"X % C == 0" is optimized to "X & (C-1) == 0" (where C is a power of two).
However, "X % Y" can also be written as "X - (X / Y) * Y", so if I rewrite the initial expression as
"X - (X / C) * C == 0", it is currently not optimized to "X & (C-1) == 0"; see godbolt: https://godbolt.org/z/KzuXUj
This is my first contribution to LLVM so I hope I didn't mess things up
I think we should instead first fold "X - (X / C) * C" / "((X / -C1) << C2) + X" into "X % C".
We have an integer division/remainder either way, but with the remainder it's fewer instructions.
But that won't help here, because your target pattern @is_rem32_pos_decomposed_i8
is currently folded into

```llvm
define i1 @is_rem32_pos_decomposed_i8(i8 %x) {
  %d.neg = sdiv i8 %x, -32
  %m.neg = shl i8 %d.neg, 5
  %s = sub i8 0, %x
  %r = icmp eq i8 %m.neg, %s
  ret i1 %r
}
```

instead of

```llvm
define i1 @tgt(i8 %x) {
  %d.neg = sdiv i8 %x, -32
  %m.neg = shl i8 %d.neg, 5
  %m.neg.add = add i8 %m.neg, %x
  %r = icmp eq i8 %m.neg.add, 0
  ret i1 %r
}
```

For that, I'd think we should extend foldIRemByPowerOfTwoToBitTest().