This is an archive of the discontinued LLVM Phabricator instance.

Differential D135970

[InstCombine] try to determine "exact" for sdiv
ClosedPublic

Authored by spatel on Oct 14 2022, 10:00 AM.

Details

Reviewers

craig.topper
nikic

Commits

rGe5ee0b06d694: [InstCombine] try to determine "exact" for sdiv

Summary

If the divisor is a power-of-2 or negative-power-of-2 and the dividend is known to have at least as many trailing zeros as the divisor, the division is exact:
https://alive2.llvm.org/ce/z/UGBksM (general proof)
https://alive2.llvm.org/ce/z/D4yPS- (examples based on regression tests)
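
As an informal complement to the Alive2 proofs above, the claim can be checked exhaustively for 8-bit values. The following is a minimal sketch, not part of the patch: the helper names `countTrailingZeros8` and `exactnessHoldsForAllI8` are hypothetical, and the arithmetic is done in `int` so that no 8-bit overflow can occur.

```cpp
#include <cassert>
#include <cstdint>

// Count trailing zero bits of an 8-bit pattern; returns 8 for zero.
static unsigned countTrailingZeros8(uint8_t v) {
  unsigned n = 0;
  while (n < 8 && ((v >> n) & 1u) == 0)
    ++n;
  return n;
}

// Returns true if, for every i8 dividend x and every i8 divisor d of the
// form 2^k or -(2^k), "x has at least k trailing zeros" implies that
// x divided by d leaves no remainder (i.e. the division is exact).
static bool exactnessHoldsForAllI8() {
  for (unsigned k = 0; k < 8; ++k) {
    const int pow2 = 1 << k; // 1, 2, ..., 128
    const int divisors[2] = {pow2, -pow2};
    for (int d : divisors) {
      if (d > 127)
        continue; // +128 is not a signed i8 value; -128 is still covered
      for (int x = -128; x <= 127; ++x) {
        if (countTrailingZeros8(static_cast<uint8_t>(x)) < k)
          continue; // precondition not met, nothing to check
        if (x % d != 0)
          return false;
      }
    }
  }
  return true;
}
```

Calling `exactnessHoldsForAllI8()` from a test driver should return `true`; the loop only covers `i8`, so it illustrates the property rather than proving it for all bit widths.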

This isn't the most direct optimization (we could create ashr in these examples instead of relying on existing folds for exact divides), but it's possible that there's a more general constraint than just a pow2 divisor, so this might be extended in the future.
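
The relationship between an exact divide and `ashr` that this paragraph relies on can be sketched for 8-bit values as follows. This is an illustration under stated assumptions, not LLVM code: `ashr` and `exactSDivMatchesAshr` are hypothetical helper names, `int` is assumed to be two's complement (guaranteed since C++20), and comparisons are done in `int`.

```cpp
#include <cassert>
#include <cstdint>

// Arithmetic shift right that avoids shifting a negative value directly.
static int ashr(int x, unsigned k) {
  return x >= 0 ? (x >> k) : ~((~x) >> k);
}

// Returns true if, whenever x is an exact multiple of 2^k,
//   x / 2^k    == ashr(x, k)   and
//   x / -(2^k) == -ashr(x, k)
// for every i8 value x and every k in [0, 6].
static bool exactSDivMatchesAshr() {
  for (unsigned k = 0; k < 7; ++k) { // +2^7 does not fit in a signed i8
    const int pow2 = 1 << k;
    for (int x = -128; x <= 127; ++x) {
      if (x % pow2 != 0)
        continue; // division is not exact, so the flag would not apply
      if (x / pow2 != ashr(x, k))
        return false;
      if (x / -pow2 != -ashr(x, k))
        return false;
    }
  }
  return true;
}
```

Without the exactness precondition the two forms differ for negative dividends, because `sdiv` rounds toward zero while `ashr` rounds toward negative infinity (for example, -3 / 2 is -1 but `ashr(-3, 1)` is -2).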

This should solve issue #58348.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

spatel created this revision. (Oct 14 2022, 10:00 AM)

Herald added a project: Restricted Project. (Oct 14 2022, 10:00 AM)

Herald added subscribers: StephenFan, hiraditya, mcrosier.

spatel requested review of this revision. (Oct 14 2022, 10:00 AM)

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B192206: Diff 467811. (Oct 14 2022, 10:01 AM)

LGTM

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1351	Side note, as you are working on division transforms. This has an obvious generalization to any `nonneg / neg` (https://alive2.llvm.org/ce/z/bYVnFG), and similar for `neg / nonneg` and `neg / neg`. It does require adding two negations in the general case, but I believe we consider that worthwhile to relax sdiv to udiv -- or at least we do the same transform based on range information in CVP.
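
The `nonneg / neg` identity mentioned in this comment can be illustrated with an exhaustive 8-bit check. This is a sketch under assumptions, not the CVP or InstCombine implementation: the function name `nonNegDivNegMatchesUDiv` is hypothetical, and the negation of the divisor is modelled as an 8-bit wrapping negation so that -128 becomes the unsigned value 128.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if, for every non-negative i8 x and every negative i8 y,
//   x sdiv y == -(x udiv (-y))
// where -y is computed with 8-bit wraparound and treated as unsigned.
static bool nonNegDivNegMatchesUDiv() {
  for (int x = 0; x <= 127; ++x) {
    for (int y = -128; y <= -1; ++y) {
      const int sdiv = x / y; // signed division, truncates toward zero
      const uint8_t negY = static_cast<uint8_t>(-y); // 1..128
      const uint8_t udiv =
          static_cast<uint8_t>(static_cast<uint8_t>(x) / negY);
      const int negated = -static_cast<int>(udiv); // the second negation
      if (sdiv != negated)
        return false;
    }
  }
  return true;
}
```

The two negations in the sketch (`-y` and the final negation of the quotient) correspond to the "two negations in the general case" the comment refers to.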

This revision is now accepted and ready to land. (Oct 14 2022, 12:13 PM)

spatel added inline comments. (Oct 16 2022, 6:59 AM)

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1351	Thanks for pointing that out. I didn't know CVP did that transform. Looks like it was: 8d487668d09fb0e. And the instcombine change/enhancement was implemented/mentioned here: 0fdcca07ad2c0bdc2. Scanning over x86 instruction timings at least, it's not clear if unsigned div (DIV) is any faster than signed div (IDIV). So that seems questionable as an IR transform since it's not reversible in general, but if the negates are deleted/noise in most cases, then it's ok.

This revision was landed with ongoing or failed builds. (Oct 16 2022, 8:14 AM)

Closed by commit rGe5ee0b06d694: [InstCombine] try to determine "exact" for sdiv (authored by spatel).

This revision was automatically updated to reflect the committed changes.

spatel added a commit: rGe5ee0b06d694: [InstCombine] try to determine "exact" for sdiv.

Revision Contents

Path                                                                      Size
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp                  10 lines
llvm/test/Transforms/InstCombine/sdiv-exact-by-negative-power-of-two.ll   16 lines
llvm/test/Transforms/InstCombine/sdiv-exact-by-power-of-two.ll            15 lines

Diff 468074

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp

@@ Instruction *InstCombinerImpl::visitSDiv(BinaryOperator &I) {
   if (match(&I, m_c_BinOp(
                     m_OneUse(m_Intrinsic<Intrinsic::abs>(m_Value(X), m_One())),
                     m_Deferred(X)))) {
     Value *Cond = Builder.CreateIsNotNeg(X);
     return SelectInst::Create(Cond, ConstantInt::get(Ty, 1),
                               ConstantInt::getAllOnesValue(Ty));
   }

-  if (isKnownNonNegative(Op0, DL, 0, &AC, &I, &DT)) {
+  KnownBits KnownDividend = computeKnownBits(Op0, 0, &I);
+  if (!I.isExact() &&
+      (match(Op1, m_Power2(Op1C)) || match(Op1, m_NegatedPower2(Op1C))) &&
+      KnownDividend.countMinTrailingZeros() >= Op1C->countTrailingZeros()) {
+    I.setIsExact();
+    return &I;
+  }
+
+  if (KnownDividend.isNonNegative()) {
     // If both operands are unsigned, turn this into a udiv.
     if (isKnownNonNegative(Op1, DL, 0, &AC, &I, &DT)) {
       auto *BO = BinaryOperator::CreateUDiv(Op0, Op1, I.getName());
       BO->setIsExact(I.isExact());
       return BO;
     }

     if (match(Op1, m_NegatedPower2())) {
       // X sdiv (-(1 << C)) -> -(X sdiv (1 << C)) ->
       //                    -> -(X udiv (1 << C)) -> -(X u>> C)
       Constant *CNegLog2 = ConstantExpr::getExactLogBase2(
           ConstantExpr::getNeg(cast<Constant>(Op1)));
       Value *Shr = Builder.CreateLShr(Op0, CNegLog2, I.getName(), I.isExact());
       return BinaryOperator::CreateNeg(Shr);
     }
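
To make the new condition concrete, here is a standalone 8-bit model of the check. It is a sketch only and does not use LLVM's `KnownBits` or `APInt` classes: `minTrailingZeros`, `isPow2`, and `canMarkExact` are hypothetical names, and the dividend's known bits are reduced to a single "known zero" mask. The real code additionally skips the transform when the instruction is already marked exact.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for "minimum number of trailing zeros": the count of
// consecutive low bits that are set in the known-zero mask.
static unsigned minTrailingZeros(uint8_t knownZero) {
  unsigned n = 0;
  while (n < 8 && ((knownZero >> n) & 1u))
    ++n;
  return n;
}

// True if the 8-bit pattern has exactly one bit set.
static bool isPow2(uint8_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Model of the new check: the divisor's bit pattern must be a power of two
// or a negated power of two, and the dividend must be known to have at
// least as many trailing zeros as the divisor.
static bool canMarkExact(uint8_t knownZero, int8_t divisor) {
  const uint8_t bits = static_cast<uint8_t>(divisor);
  const uint8_t negBits = static_cast<uint8_t>(0u - bits);
  if (!isPow2(bits) && !isPow2(negBits))
    return false;
  unsigned divisorTZ = 0;
  while (((bits >> divisorTZ) & 1u) == 0)
    ++divisorTZ; // terminates because bits is nonzero at this point
  return minTrailingZeros(knownZero) >= divisorTZ;
}
```

For instance, `and i8 %x, -32` guarantees that the low five bits are zero, which this model represents as the mask `0x1F`; `canMarkExact(0x1F, -4)` and `canMarkExact(0x1F, -32)` then succeed, while `canMarkExact(0x1F, -64)` does not, matching the positive and negative regression tests.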

llvm/test/Transforms/InstCombine/sdiv-exact-by-negative-power-of-two.ll

 ; CHECK-NEXT: ret <2 x i8> poison
 ;
   %div = sdiv exact <2 x i8> %x, <i8 -32, i8 undef>
   ret <2 x i8> %div
 }

 define i8 @prove_exact_with_high_mask(i8 %x, i8 %y) {
 ; CHECK-LABEL: @prove_exact_with_high_mask(
-; CHECK-NEXT: [[A:%.*]] = and i8 [[X:%.*]], -32
-; CHECK-NEXT: [[D:%.*]] = sdiv i8 [[A]], -4
+; CHECK-NEXT: [[A:%.*]] = ashr i8 [[X:%.*]], 2
+; CHECK-NEXT: [[D_NEG:%.*]] = and i8 [[A]], -8
+; CHECK-NEXT: [[D:%.*]] = sub nsw i8 0, [[D_NEG]]
 ; CHECK-NEXT: ret i8 [[D]]
 ;
   %a = and i8 %x, -32
   %d = sdiv i8 %a, -4
   ret i8 %d
 }

 define i8 @prove_exact_with_high_mask_limit(i8 %x, i8 %y) {
 ; CHECK-LABEL: @prove_exact_with_high_mask_limit(
-; CHECK-NEXT: [[A:%.*]] = and i8 [[X:%.*]], -32
-; CHECK-NEXT: [[D:%.*]] = sdiv i8 [[A]], -32
+; CHECK-NEXT: [[A:%.*]] = ashr i8 [[X:%.*]], 5
+; CHECK-NEXT: [[D:%.*]] = sub nsw i8 0, [[A]]
 ; CHECK-NEXT: ret i8 [[D]]
 ;
   %a = and i8 %x, -32
   %d = sdiv i8 %a, -32
   ret i8 %d
 }

+; negative test - not enough low zeros in dividend
+
 define i8 @not_prove_exact_with_high_mask(i8 %x, i8 %y) {
 ; CHECK-LABEL: @not_prove_exact_with_high_mask(
 ; CHECK-NEXT: [[A:%.*]] = and i8 [[X:%.*]], -32
 ; CHECK-NEXT: [[D:%.*]] = sdiv i8 [[A]], -64
 ; CHECK-NEXT: ret i8 [[D]]
 ;
   %a = and i8 %x, -32
   %d = sdiv i8 %a, -64
   ret i8 %d
 }

 define <2 x i8> @prove_exact_with_high_mask_splat_vec(<2 x i8> %x, <2 x i8> %y) {
 ; CHECK-LABEL: @prove_exact_with_high_mask_splat_vec(
 ; CHECK-NEXT: [[A:%.*]] = shl <2 x i8> [[X:%.*]], <i8 5, i8 5>
-; CHECK-NEXT: [[D:%.*]] = sdiv <2 x i8> [[A]], <i8 -16, i8 -16>
+; CHECK-NEXT: [[D_NEG:%.*]] = ashr exact <2 x i8> [[A]], <i8 4, i8 4>
+; CHECK-NEXT: [[D:%.*]] = sub nsw <2 x i8> zeroinitializer, [[D_NEG]]
 ; CHECK-NEXT: ret <2 x i8> [[D]]
 ;
   %a = shl <2 x i8> %x, <i8 5, i8 5>
   %d = sdiv <2 x i8> %a, <i8 -16, i8 -16>
   ret <2 x i8> %d
 }

+; TODO: Needs knownbits to handle arbitrary vector constants.
+
 define <2 x i8> @prove_exact_with_high_mask_vec(<2 x i8> %x, <2 x i8> %y) {
 ; CHECK-LABEL: @prove_exact_with_high_mask_vec(
 ; CHECK-NEXT: [[A:%.*]] = shl <2 x i8> [[X:%.*]], <i8 3, i8 2>
 ; CHECK-NEXT: [[D:%.*]] = sdiv <2 x i8> [[A]], <i8 -8, i8 -4>
 ; CHECK-NEXT: ret <2 x i8> [[D]]
 ;
   %a = shl <2 x i8> %x, <i8 3, i8 2>
   %d = sdiv <2 x i8> %a, <i8 -8, i8 -4>
   ret <2 x i8> %d
 }

llvm/test/Transforms/InstCombine/sdiv-exact-by-power-of-two.ll

 ;
   %shl = shl nsw i8 1, %y
   %div = sdiv i8 %x, %shl
   ret i8 %div
 }

 define i8 @prove_exact_with_high_mask(i8 %x, i8 %y) {
 ; CHECK-LABEL: @prove_exact_with_high_mask(
-; CHECK-NEXT: [[A:%.*]] = and i8 [[X:%.*]], -8
-; CHECK-NEXT: [[D:%.*]] = sdiv i8 [[A]], 4
+; CHECK-NEXT: [[A:%.*]] = ashr i8 [[X:%.*]], 2
+; CHECK-NEXT: [[D:%.*]] = and i8 [[A]], -2
 ; CHECK-NEXT: ret i8 [[D]]
 ;
   %a = and i8 %x, -8
   %d = sdiv i8 %a, 4
   ret i8 %d
 }

 define i8 @prove_exact_with_high_mask_limit(i8 %x, i8 %y) {
 ; CHECK-LABEL: @prove_exact_with_high_mask_limit(
-; CHECK-NEXT: [[A:%.*]] = and i8 [[X:%.*]], -8
-; CHECK-NEXT: [[D:%.*]] = sdiv i8 [[A]], 8
-; CHECK-NEXT: ret i8 [[D]]
+; CHECK-NEXT: [[A:%.*]] = ashr i8 [[X:%.*]], 3
+; CHECK-NEXT: ret i8 [[A]]
 ;
   %a = and i8 %x, -8
   %d = sdiv i8 %a, 8
   ret i8 %d
 }

+; negative test - not enough low zeros in dividend
+
 define i8 @not_prove_exact_with_high_mask(i8 %x, i8 %y) {
 ; CHECK-LABEL: @not_prove_exact_with_high_mask(
 ; CHECK-NEXT: [[A:%.*]] = and i8 [[X:%.*]], -8
 ; CHECK-NEXT: [[D:%.*]] = sdiv i8 [[A]], 16
 ; CHECK-NEXT: ret i8 [[D]]
 ;
   %a = and i8 %x, -8
   %d = sdiv i8 %a, 16
   ret i8 %d
 }

 define <2 x i8> @prove_exact_with_high_mask_splat_vec(<2 x i8> %x, <2 x i8> %y) {
 ; CHECK-LABEL: @prove_exact_with_high_mask_splat_vec(
 ; CHECK-NEXT: [[A:%.*]] = shl <2 x i8> [[X:%.*]], <i8 3, i8 3>
-; CHECK-NEXT: [[D:%.*]] = sdiv <2 x i8> [[A]], <i8 8, i8 8>
+; CHECK-NEXT: [[D:%.*]] = ashr exact <2 x i8> [[A]], <i8 3, i8 3>
 ; CHECK-NEXT: ret <2 x i8> [[D]]
 ;
   %a = shl <2 x i8> %x, <i8 3, i8 3>
   %d = sdiv <2 x i8> %a, <i8 8, i8 8>
   ret <2 x i8> %d
 }

+; TODO: Needs knownbits to handle arbitrary vector constants.
+
 define <2 x i8> @prove_exact_with_high_mask_vec(<2 x i8> %x, <2 x i8> %y) {
 ; CHECK-LABEL: @prove_exact_with_high_mask_vec(
 ; CHECK-NEXT: [[A:%.*]] = shl <2 x i8> [[X:%.*]], <i8 3, i8 2>
 ; CHECK-NEXT: [[D:%.*]] = sdiv <2 x i8> [[A]], <i8 8, i8 4>
 ; CHECK-NEXT: ret <2 x i8> [[D]]
 ;
   %a = shl <2 x i8> %x, <i8 3, i8 2>
   %d = sdiv <2 x i8> %a, <i8 8, i8 4>
   ret <2 x i8> %d
 }
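
The updated CHECK lines for the four scalar `prove_exact_with_high_mask*` tests in the two test files can be sanity-checked against the original `and` + `sdiv` sequences with a small exhaustive loop. This is an illustrative sketch only: `ashr` and `foldedFormsMatch` are hypothetical helper names, `int` is assumed to be two's complement (guaranteed since C++20), and all values are compared in `int`.

```cpp
#include <cassert>
#include <cstdint>

// Arithmetic shift right that avoids shifting a negative value directly.
static int ashr(int x, unsigned k) {
  return x >= 0 ? (x >> k) : ~((~x) >> k);
}

// Returns true if each original "and + sdiv" sequence from the scalar tests
// computes the same value as the folded form in the new CHECK lines,
// for every i8 input x.
static bool foldedFormsMatch() {
  for (int x = -128; x <= 127; ++x) {
    // sdiv-exact-by-power-of-two.ll
    if ((x & -8) / 4 != (ashr(x, 2) & -2))
      return false;
    if ((x & -8) / 8 != ashr(x, 3))
      return false;
    // sdiv-exact-by-negative-power-of-two.ll
    if ((x & -32) / -4 != -(ashr(x, 2) & -8))
      return false;
    if ((x & -32) / -32 != -ashr(x, 5))
      return false;
  }
  return true;
}
```

The negative tests (divisors 16 and -64) are intentionally left out, since there the dividend is not known to have enough trailing zeros and no fold is expected.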