This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/Transforms/InstCombine/
-
Transforms/
-
InstCombine/
5/12
InstCombineAndOrXor.cpp
-
test/Transforms/InstCombine/
-
Transforms/
-
InstCombine/
4/11
onehot_merge.ll

Differential D64275

[InstCombine] Generalize InstCombiner::foldAndOrOfICmpsOfAndWithPow2().
AbandonedPublic

Authored by huihuiz on Jul 5 2019, 8:52 PM.

Download Raw Diff

Details

Reviewers

spatel
efriedma
lebedev.ri

Summary

In the middle-end, (A & (signbit l>> K)) ==/!= 0 can be folded into
(A l<< K) s>=/s< 0.

When folding 'And' or 'Or' of ICmps, need to extend the iszero() check
of (A & (signbit l>> K)) to its equivalent form of (A l<< K) s>=/s< 0.

Diff Detail

Repository: rL LLVM

Event Timeline

huihuiz created this revision.Jul 5 2019, 8:52 PM

Herald added a subscriber: hiraditya. · View Herald TranscriptJul 5 2019, 8:53 PM

huihuiz mentioned this in D62818: [InstCombine] Introduce fold for icmp pred (and X, (sh signbit, Y)), 0..Jul 5 2019, 8:55 PM

huihuiz added a parent revision: D62818: [InstCombine] Introduce fold for icmp pred (and X, (sh signbit, Y)), 0..

Thanks for working on this!
This is for sure missing some one-use checks (i.e. increases instruction count).
Could you please add 5 copies of @foo1_and_signbit_lshr_without_shifting_signbit, each one with different instruction having extra use?
It is also a good idea to add a test where both sides of or/and are in this new form.

llvm/test/Transforms/InstCombine/onehot_merge.ll
163–166	I'm guessing all these instructions must be one-use.

This revision now requires changes to proceed.Jul 9 2019, 1:06 AM

huihuiz mentioned this in rL365729: [InstCombine][NFCI] Add test coverage to onehot_merge.ll.Jul 10 2019, 9:57 PM

huihuiz mentioned this in rG51f5079191d5: [InstCombine][NFCI] Add test coverage to onehot_merge.ll.

Addressed review comments

In D64275#1575312, @lebedev.ri wrote:

Thanks for working on this!
This is for sure missing some one-use checks (i.e. increases instruction count).
Could you please add 5 copies of @foo1_and_signbit_lshr_without_shifting_signbit, each one with different instruction having extra use?

Thank you for catching this!
Yes indeed, increases instruction count.

For (A & K) ==/!= 0, need to restrict one use for 'and' and 'cmp' , there is no need to check K ('shift') for more than one use.
For equivalent form (A l<< K) s<=/s> 0, need to restrict one use for 'shift' and 'cmp'

It is also a good idea to add a test where both sides of or/and are in this new form.

Test cases added, please expand to see (as there is no diff)
I believe we should not try to decompose (A l<< K) s<=/s> 0 into (A & (signbit l>> K)) ==/!= 0 when both sides are in this new form, as this decomposition introduce additional instruction of signbit shift.
Also, foldLogOpOfMaskedICmps() in InstCombineCompares.cpp will try to fold ((A l<< K1) s<=/> 0) &/| ((A l<< K2) s<=/> 0) into a single icmp.

I think this looks ok, but i'd like to see more test coverage :)

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
888	could just `return match(...);`
889	Some test should "regress" (improve due to this), could you please add similar extra-use test coverage for this too?
908–910	To be pointed out, this indeed creates an instruction while we don't know yet whether we will be able to do the fold, not great, but i think this is the smallest evil, the alternative solution would indeed be uglier..
llvm/test/Transforms/InstCombine/onehot_merge.ll
27–32	Similarly, could you please add 6 tests with different of each of these instructions having an extra use?

(just marking as reviewed)

This revision now requires changes to proceed.Jul 11 2019, 12:26 AM

spatel added inline comments.Jul 11 2019, 10:59 AM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
908–910	Hmm...do we have a test where the temporary instruction is created, but the fold fails? Last time I made that mistake, we infinite looped: the temporary instruction gets created and deleted triggering another round of combining and repeat forever.

lebedev.ri added inline comments.Jul 11 2019, 11:09 AM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
908–910	Oh, hmm.

huihuiz marked an inline comment as done.Jul 11 2019, 11:42 AM

huihuiz added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
908–910	Thank you for point this out! Indeed there are chances of infinite looping see test define i1 @foo(i32 %k, i32 %c1, i32 %c2) { %t0 = shl i32 3, %c1 %t1 = and i32 %t0, %k %t2 = icmp eq i32 %t1, 0 %t3 = shl i32 %k, %c2 %t4 = icmp sgt i32 %t3, -1 %or = or i1 %t2, %t4 ret i1 %or } I am working on a solution now.

lebedev.ri added inline comments.Jul 11 2019, 11:49 AM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
908–910	Can we just manually delete the instruction if the fold fails? I'm sure that can be nicely wrapped into RAII wrapper.

spatel added inline comments.Jul 11 2019, 1:12 PM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
908–910	I'm skeptical. I could be wrong, but it's not done anywhere else in instcombine? Could we send 'B2' out as a partial result (for example, if it's not null, we matched), and then have the caller create the shift if needed?

lebedev.ri added inline comments.Jul 11 2019, 1:16 PM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
908–910	If we don't create this instruction, i guess e.g. `isCompareToZeroOrEquivalentForm()`, `isKnownToBeAPowerOfTwo()` will need to be also modified, no?

huihuiz mentioned this in rL365828: [InstCombine][NFCI] Add more test coverage to onehot_merge.ll.Jul 11 2019, 2:32 PM

huihuiz mentioned this in rG7b4a59db1e01: [InstCombine][NFCI] Add more test coverage to onehot_merge.ll.

I think we should only create signbit shift instruction when fold is supposed to happen.

I update the differential, let me know what you guys think?

test case foo1_and_signbit_lshr_without_shifting_signbit_not_pwr2() aim to check that we are not creating additional shift instruction when fold fails.

delete the signbit shift instruction might not break the cycle of infinite "create and delete".

Hm, not that ugly; this is still correct?

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
893–897	Can we end up needing to create this shift on both sides? If yes, then can you make this a lambda helper? If not, create it once and `std::swap()` into proper variable?

huihuiz updated this revision to Diff 209368.Jul 11 2019, 4:09 PM

huihuiz marked an inline comment as done.

huihuiz added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
893–897	We are not supposed to create for both sides, this will be handled by early exit. lambda helper probably better.

lebedev.ri added inline comments.Jul 11 2019, 4:14 PM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
924	This needs a comment as to why we're creating a node (in what pattern are we?)
llvm/test/Transforms/InstCombine/onehot_merge.ll
187	Looks like this one should?
187	Looks like this one should?
187	Could fold, no instruction count increase

huihuiz updated this revision to Diff 209382.Jul 11 2019, 5:36 PM

huihuiz marked 3 inline comments as done.

huihuiz added inline comments.

llvm/test/Transforms/InstCombine/onehot_merge.ll
187	We should not fold when 'cmp' has more than one use, this will increase instruction count when one side is using the new form. Take this as an example, we need to keep the extra 'cmp' that can't be reused, while the benefit of folding is offset by the added signbit shift instruction. define i1 @foo(i32 %k, i32 %c1, i32 %c2, i1* %p) { %t0 = shl i32 1, %c1 %t1 = and i32 %t0, %k %t2 = icmp eq i32 %t1, 0 store i1 %t2, i1* %p ; extra use of cmp %t3 = shl i32 %k, %c2 %t4 = icmp sgt i32 %t3, -1 %or = or i1 %t2, %t4 ret i1 %or }
187	I am OK with allowing 'and' to have more than one use. Folding cut instruction count by 2, the worst case is no instruction count increase.

forget to update the test comment, sorry

lebedev.ri marked an inline comment as done.Jul 12 2019, 3:06 PM

lebedev.ri added inline comments.

llvm/test/Transforms/InstCombine/onehot_merge.ll
187	the worst case is no instruction count increase. Precisely.
187	I believe i'm missing the point in `@foo1_and_extra_use_cmp2`. We were doing this fold previously (in llvm trunk), now we don't because of use-count. But this restriction does not seem to improve anything? We had 8 instructions, and we have 8 instructions without the fold?

huihuiz marked an inline comment as done.Jul 12 2019, 4:10 PM

huihuiz added inline comments.

llvm/test/Transforms/InstCombine/onehot_merge.ll

187

This restriction doesn't improve anything, but make sure we don't increase instruction count for certain cases.

If 'cmp' has extra use, and one side is using the new form, e.g., @foo1_and_signbit_lshr_without_shifting_signbit_extra_use_cmp1/2 (please expand the test file to see)
Without this restriction, instruction count actually increased.

If 'cmp' has extra use, and both sides are using old form, e.g., @foo1_and_extra_use_cmp, we have 8 instructions with and without the fold.

Here is the diff without this restriction:

--- a/llvm/test/Transforms/InstCombine/onehot_merge.ll
+++ b/llvm/test/Transforms/InstCombine/onehot_merge.ll
@@ -274,10 +274,10 @@ define i1 @foo1_and_extra_use_cmp(i32 %k, i32 %c1, i32 %c2, i1* %p) {
 ; CHECK-NEXT:    [[T2:%.*]] = and i32 [[T0]], [[K:%.*]]
 ; CHECK-NEXT:    [[T3:%.*]] = icmp eq i32 [[T2]], 0
 ; CHECK-NEXT:    store i1 [[T3]], i1* [[P:%.*]], align 1
-; CHECK-NEXT:    [[T4:%.*]] = and i32 [[T1]], [[K]]
-; CHECK-NEXT:    [[T5:%.*]] = icmp eq i32 [[T4]], 0
-; CHECK-NEXT:    [[OR:%.*]] = or i1 [[T3]], [[T5]]
-; CHECK-NEXT:    ret i1 [[OR]]
+; CHECK-NEXT:    [[TMP1:%.*]] = or i32 [[T0]], [[T1]]
+; CHECK-NEXT:    [[TMP2:%.*]] = and i32 [[TMP1]], [[K]]
+; CHECK-NEXT:    [[TMP3:%.*]] = icmp ne i32 [[TMP2]], [[TMP1]]
+; CHECK-NEXT:    ret i1 [[TMP3]]
 ;
   %t0 = shl i32 1, %c1
   %t1 = shl i32 1, %c2
@@ -340,13 +340,13 @@ define i1 @foo1_and_extra_use_cmp2(i32 %k, i32 %c1, i32 %c2, i1* %p) {
 ; CHECK-LABEL: @foo1_and_extra_use_cmp2(
 ; CHECK-NEXT:    [[T0:%.*]] = shl i32 1, [[C1:%.*]]
 ; CHECK-NEXT:    [[T1:%.*]] = shl i32 1, [[C2:%.*]]
-; CHECK-NEXT:    [[T2:%.*]] = and i32 [[T0]], [[K:%.*]]
-; CHECK-NEXT:    [[T3:%.*]] = icmp eq i32 [[T2]], 0
-; CHECK-NEXT:    [[T4:%.*]] = and i32 [[T1]], [[K]]
+; CHECK-NEXT:    [[T4:%.*]] = and i32 [[T1]], [[K:%.*]]
 ; CHECK-NEXT:    [[T5:%.*]] = icmp eq i32 [[T4]], 0
 ; CHECK-NEXT:    store i1 [[T5]], i1* [[P:%.*]], align 1
-; CHECK-NEXT:    [[OR:%.*]] = or i1 [[T3]], [[T5]]
-; CHECK-NEXT:    ret i1 [[OR]]
+; CHECK-NEXT:    [[TMP1:%.*]] = or i32 [[T0]], [[T1]]
+; CHECK-NEXT:    [[TMP2:%.*]] = and i32 [[TMP1]], [[K]]
+; CHECK-NEXT:    [[TMP3:%.*]] = icmp ne i32 [[TMP2]], [[TMP1]]
+; CHECK-NEXT:    ret i1 [[TMP3]]
 ;
   %t0 = shl i32 1, %c1
   %t1 = shl i32 1, %c2
@@ -410,10 +410,11 @@ define i1 @foo1_and_signbit_lshr_without_shifting_signbit_extra_use_cmp1(i32 %k,
 ; CHECK-NEXT:    [[T1:%.*]] = and i32 [[T0]], [[K:%.*]]
 ; CHECK-NEXT:    [[T2:%.*]] = icmp eq i32 [[T1]], 0
 ; CHECK-NEXT:    store i1 [[T2]], i1* [[P:%.*]], align 1
-; CHECK-NEXT:    [[T3:%.*]] = shl i32 [[K]], [[C2:%.*]]
-; CHECK-NEXT:    [[T4:%.*]] = icmp sgt i32 [[T3]], -1
-; CHECK-NEXT:    [[OR:%.*]] = or i1 [[T2]], [[T4]]
-; CHECK-NEXT:    ret i1 [[OR]]
+; CHECK-NEXT:    [[TMP1:%.*]] = lshr i32 -2147483648, [[C2:%.*]]
+; CHECK-NEXT:    [[TMP2:%.*]] = or i32 [[T0]], [[TMP1]]
+; CHECK-NEXT:    [[TMP3:%.*]] = and i32 [[TMP2]], [[K]]
+; CHECK-NEXT:    [[TMP4:%.*]] = icmp ne i32 [[TMP3]], [[TMP2]]
+; CHECK-NEXT:    ret i1 [[TMP4]]
 ;
   %t0 = shl i32 1, %c1
   %t1 = and i32 %t0, %k

 define i1 @foo1_and_signbit_lshr_without_shifting_signbit_extra_use_cmp2(i32 %k, i32 %c1, i32 %c2, i1* %p) {
 ; CHECK-LABEL: @foo1_and_signbit_lshr_without_shifting_signbit_extra_use_cmp2(
 ; CHECK-NEXT:    [[T0:%.*]] = shl i32 1, [[C1:%.*]]
-; CHECK-NEXT:    [[T1:%.*]] = and i32 [[T0]], [[K:%.*]]
-; CHECK-NEXT:    [[T2:%.*]] = icmp eq i32 [[T1]], 0
-; CHECK-NEXT:    [[T3:%.*]] = shl i32 [[K]], [[C2:%.*]]
+; CHECK-NEXT:    [[T3:%.*]] = shl i32 [[K:%.*]], [[C2:%.*]]
 ; CHECK-NEXT:    [[T4:%.*]] = icmp sgt i32 [[T3]], -1
 ; CHECK-NEXT:    store i1 [[T4]], i1* [[P:%.*]], align 1
-; CHECK-NEXT:    [[OR:%.*]] = or i1 [[T2]], [[T4]]
-; CHECK-NEXT:    ret i1 [[OR]]
+; CHECK-NEXT:    [[TMP1:%.*]] = lshr i32 -2147483648, [[C2]]
+; CHECK-NEXT:    [[TMP2:%.*]] = or i32 [[T0]], [[TMP1]]
+; CHECK-NEXT:    [[TMP3:%.*]] = and i32 [[TMP2]], [[K]]
+; CHECK-NEXT:    [[TMP4:%.*]] = icmp ne i32 [[TMP3]], [[TMP2]]
+; CHECK-NEXT:    ret i1 [[TMP4]]
 ;
   %t0 = shl i32 1, %c1
   %t1 = and i32 %t0, %k

lebedev.ri added inline comments.Jul 12 2019, 11:06 PM

llvm/test/Transforms/InstCombine/onehot_merge.ll
187	What i'm saying is, yes, sure, that one-use check is nessesary for some of the patterns, but for this one it causes regressions for no benefit. Can that check be tightened up to not have this false-positive?

(just marking as reviewed)

This revision now requires changes to proceed.Jul 24 2019, 12:30 PM

This review seems to be stuck/dead, consider abandoning if no longer relevant.

This revision now requires review to proceed.Jan 12 2023, 4:43 PM

Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2023, 4:43 PM

Herald added a subscriber: StephenFan. · View Herald Transcript

huihuiz abandoned this revision.Jan 13 2023, 9:22 AM

Revision Contents

Path

Size

llvm/

lib/

Transforms/

InstCombine/

InstCombineAndOrXor.cpp

81 lines

test/

Transforms/

InstCombine/

onehot_merge.ll

22 lines

Diff 208251

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp

Show First 20 Lines • Show All 856 Lines • ▼ Show 20 Lines	foldAndOrOfEqualityCmpsWithConstants(ICmpInst LHS, ICmpInst RHS,
return nullptr;		return nullptr;
}		}

// Fold (iszero(A & K1) \| iszero(A & K2)) -> (A & (K1 \| K2)) != (K1 \| K2)		// Fold (iszero(A & K1) \| iszero(A & K2)) -> (A & (K1 \| K2)) != (K1 \| K2)
// Fold (!iszero(A & K1) & !iszero(A & K2)) -> (A & (K1 \| K2)) == (K1 \| K2)		// Fold (!iszero(A & K1) & !iszero(A & K2)) -> (A & (K1 \| K2)) == (K1 \| K2)
Value InstCombiner::foldAndOrOfICmpsOfAndWithPow2(ICmpInst LHS, ICmpInst *RHS,		Value InstCombiner::foldAndOrOfICmpsOfAndWithPow2(ICmpInst LHS, ICmpInst *RHS,
bool JoinedByAnd,		bool JoinedByAnd,
Instruction &CxtI) {		Instruction &CxtI) {
ICmpInst::Predicate Pred = LHS->getPredicate();		auto isCompareToZeroOrEquivalentForm = [](ICmpInst *Compare, bool JoinedByAnd,
if (Pred != RHS->getPredicate())		InstCombiner::BuilderTy &Builder,
return nullptr;		Value &A, Value &B) -> bool {
if (JoinedByAnd && Pred != ICmpInst::ICMP_NE)		if (Compare->isEquality()) {
return nullptr;		if (Compare->getPredicate() !=
if (!JoinedByAnd && Pred != ICmpInst::ICMP_EQ)		(JoinedByAnd ? ICmpInst::ICMP_NE : ICmpInst::ICMP_EQ))
return nullptr;		return false;

// TODO support vector splats		// TODO support vector splats
ConstantInt *LHSC = dyn_cast<ConstantInt>(LHS->getOperand(1));		ConstantInt *C = dyn_cast<ConstantInt>(Compare->getOperand(1));
ConstantInt *RHSC = dyn_cast<ConstantInt>(RHS->getOperand(1));		if (!C \|\| !C->isZero())
if (!LHSC \|\| !RHSC \|\| !LHSC->isZero() \|\| !RHSC->isZero())		return false;
return nullptr;
		if (match(Compare->getOperand(0), m_And(m_Value(A), m_Value(B))))
		return true;
		} else if (Compare->isSigned()) {
		// Generalize iszero() check to equivalent form of (A l<< K) s>=/s< 0.
		// In the middle-end, (A & (signbit l>> K)) ==/!= 0 can be folded into
		// (A l<< K) s>=/s< 0.
		if (Compare->getPredicate() !=
		(JoinedByAnd ? ICmpInst::ICMP_SLT : ICmpInst::ICMP_SGT))
		return false;

		// TODO support vector splats
		lebedev.riUnsubmitted Done Reply Inline Actions could just `return match(...);` lebedev.ri: could just `return match(...);`
		ConstantInt *C = dyn_cast<ConstantInt>(Compare->getOperand(1));
		lebedev.riUnsubmitted Done Reply Inline Actions Some test should "regress" (improve due to this), could you please add similar extra-use test coverage for this too? lebedev.ri: Some test should "regress" (improve due to this), could you please add similar extra-use test…
		if (!C \|\| (JoinedByAnd ? !C->isZero() : !C->isMinusOne()))
		return false;

		Value *B2;
		if (match(Compare->getOperand(0), m_Shl(m_Value(A), m_Value(B2)))) {
		Type *Ty = B2->getType();
		B = Builder.CreateLShr(
		ConstantInt::get(Ty, APInt::getSignMask(Ty->getScalarSizeInBits())),
		lebedev.riUnsubmitted Not Done Reply Inline Actions Can we end up needing to create this shift on both sides? If yes, then can you make this a lambda helper? If not, create it once and `std::swap()` into proper variable? lebedev.ri: Can we end up needing to create this shift on both sides? If yes, then can you make this a…
		huihuizAuthorUnsubmitted Done Reply Inline Actions We are not supposed to create for both sides, this will be handled by early exit. lambda helper probably better. huihuiz: We are not supposed to create for both sides, this will be handled by early exit. lambda…
		B2);
		return true;
		}
		}

		return false;
		};

Value A, B, C, D;		Value A, B, C, D;
if (match(LHS->getOperand(0), m_And(m_Value(A), m_Value(B))) &&		if (!isCompareToZeroOrEquivalentForm(LHS, JoinedByAnd, Builder, A, B) \|\|
match(RHS->getOperand(0), m_And(m_Value(C), m_Value(D)))) {		!isCompareToZeroOrEquivalentForm(RHS, JoinedByAnd, Builder, C, D))
		return nullptr;

		lebedev.riUnsubmitted Not Done Reply Inline Actions To be pointed out, this indeed creates an instruction while we don't know yet whether we will be able to do the fold, not great, but i think this is the smallest evil, the alternative solution would indeed be uglier.. lebedev.ri: To be pointed out, this indeed creates an instruction while we don't know yet whether we will…
		spatelUnsubmitted Not Done Reply Inline Actions Hmm...do we have a test where the temporary instruction is created, but the fold fails? Last time I made that mistake, we infinite looped: the temporary instruction gets created and deleted triggering another round of combining and repeat forever. spatel: Hmm...do we have a test where the temporary instruction is created, but the fold fails? Last…
		lebedev.riUnsubmitted Not Done Reply Inline Actions Oh, hmm. lebedev.ri: Oh, hmm.
		huihuizAuthorUnsubmitted Done Reply Inline Actions Thank you for point this out! Indeed there are chances of infinite looping see test define i1 @foo(i32 %k, i32 %c1, i32 %c2) { %t0 = shl i32 3, %c1 %t1 = and i32 %t0, %k %t2 = icmp eq i32 %t1, 0 %t3 = shl i32 %k, %c2 %t4 = icmp sgt i32 %t3, -1 %or = or i1 %t2, %t4 ret i1 %or } I am working on a solution now. huihuiz: Thank you for point this out! Indeed there are chances of infinite looping see test ```…
		lebedev.riUnsubmitted Not Done Reply Inline Actions Can we just manually delete the instruction if the fold fails? I'm sure that can be nicely wrapped into RAII wrapper. lebedev.ri: Can we just manually delete the instruction if the fold fails? I'm sure that can be nicely…
		spatelUnsubmitted Not Done Reply Inline Actions I'm skeptical. I could be wrong, but it's not done anywhere else in instcombine? Could we send 'B2' out as a partial result (for example, if it's not null, we matched), and then have the caller create the shift if needed? spatel: I'm skeptical. I could be wrong, but it's not done anywhere else in instcombine? Could we send…
		lebedev.riUnsubmitted Not Done Reply Inline Actions If we don't create this instruction, i guess e.g. `isCompareToZeroOrEquivalentForm()`, `isKnownToBeAPowerOfTwo()` will need to be also modified, no? lebedev.ri: If we don't create this instruction, i guess e.g. `isCompareToZeroOrEquivalentForm()`…
if (A == D \|\| B == D)		if (A == D \|\| B == D)
std::swap(C, D);		std::swap(C, D);
if (B == C)		if (B == C)
std::swap(A, B);		std::swap(A, B);

if (A == C &&		if (A == C && isKnownToBeAPowerOfTwo(B, false, 0, &CxtI) &&
isKnownToBeAPowerOfTwo(B, false, 0, &CxtI) &&
isKnownToBeAPowerOfTwo(D, false, 0, &CxtI)) {		isKnownToBeAPowerOfTwo(D, false, 0, &CxtI)) {
Value *Mask = Builder.CreateOr(B, D);		Value *Mask = Builder.CreateOr(B, D);
Value *Masked = Builder.CreateAnd(A, Mask);		Value *Masked = Builder.CreateAnd(A, Mask);
auto NewPred = JoinedByAnd ? ICmpInst::ICMP_EQ : ICmpInst::ICMP_NE;		auto NewPred = JoinedByAnd ? ICmpInst::ICMP_EQ : ICmpInst::ICMP_NE;
return Builder.CreateICmp(NewPred, Masked, Mask);		return Builder.CreateICmp(NewPred, Masked, Mask);
}		}
}

return nullptr;		return nullptr;
		lebedev.riUnsubmitted Done Reply Inline Actions This needs a comment as to why we're creating a node (in what pattern are we?) lebedev.ri: This needs a comment as to why we're creating a node (in what pattern are we?)
}		}

/// General pattern:		/// General pattern:
/// X & Y		/// X & Y
///		///
/// Where Y is checking that all the high bits (covered by a mask 4294967168)		/// Where Y is checking that all the high bits (covered by a mask 4294967168)
/// are uniform, i.e. %arg & 4294967168 can be either 4294967168 or 0		/// are uniform, i.e. %arg & 4294967168 can be either 4294967168 or 0
/// Pattern can be one of:		/// Pattern can be one of:
▲ Show 20 Lines • Show All 2,181 Lines • Show Last 20 Lines

llvm/test/Transforms/InstCombine/onehot_merge.ll

Show All 18 Lines
; CHECK-LABEL: @foo1_and(		; CHECK-LABEL: @foo1_and(
; CHECK-NEXT: [[T:%.]] = shl i32 1, [[C1:%.]]		; CHECK-NEXT: [[T:%.]] = shl i32 1, [[C1:%.]]
; CHECK-NEXT: [[T4:%.]] = shl i32 1, [[C2:%.]]		; CHECK-NEXT: [[T4:%.]] = shl i32 1, [[C2:%.]]
; CHECK-NEXT: [[TMP1:%.*]] = or i32 [[T]], [[T4]]		; CHECK-NEXT: [[TMP1:%.*]] = or i32 [[T]], [[T4]]
; CHECK-NEXT: [[TMP2:%.]] = and i32 [[TMP1]], [[K:%.]]		; CHECK-NEXT: [[TMP2:%.]] = and i32 [[TMP1]], [[K:%.]]
; CHECK-NEXT: [[TMP3:%.*]] = icmp ne i32 [[TMP2]], [[TMP1]]		; CHECK-NEXT: [[TMP3:%.*]] = icmp ne i32 [[TMP2]], [[TMP1]]
; CHECK-NEXT: ret i1 [[TMP3]]		; CHECK-NEXT: ret i1 [[TMP3]]
;		;
%t = shl i32 1, %c1		%t = shl i32 1, %c1
%t4 = shl i32 1, %c2		%t4 = shl i32 1, %c2
%t1 = and i32 %t, %k		%t1 = and i32 %t, %k
%t2 = icmp eq i32 %t1, 0		%t2 = icmp eq i32 %t1, 0
%t5 = and i32 %t4, %k		%t5 = and i32 %t4, %k
%t6 = icmp eq i32 %t5, 0		%t6 = icmp eq i32 %t5, 0
		lebedev.riUnsubmitted Not Done Reply Inline Actions Similarly, could you please add 6 tests with different of each of these instructions having an extra use? lebedev.ri: Similarly, could you please add 6 tests with different of each of these instructions having an…
%or = or i1 %t2, %t6		%or = or i1 %t2, %t6
ret i1 %or		ret i1 %or
}		}

; Same as above but with operands commuted one of the ands, but not the other.		; Same as above but with operands commuted one of the ands, but not the other.
define i1 @foo1_and_commuted(i32 %k, i32 %c1, i32 %c2) {		define i1 @foo1_and_commuted(i32 %k, i32 %c1, i32 %c2) {
; CHECK-LABEL: @foo1_and_commuted(		; CHECK-LABEL: @foo1_and_commuted(
; CHECK-NEXT: [[K2:%.]] = mul i32 [[K:%.]], [[K]]		; CHECK-NEXT: [[K2:%.]] = mul i32 [[K:%.]], [[K]]
▲ Show 20 Lines • Show All 107 Lines • ▼ Show 20 Lines	;
%or = and i1 %t2, %t6		%or = and i1 %t2, %t6
ret i1 %or		ret i1 %or
}		}

; Same as last two, but shift-of-signbit replaced with 'icmp s*'		; Same as last two, but shift-of-signbit replaced with 'icmp s*'
define i1 @foo1_and_signbit_lshr_without_shifting_signbit(i32 %k, i32 %c1, i32 %c2) {		define i1 @foo1_and_signbit_lshr_without_shifting_signbit(i32 %k, i32 %c1, i32 %c2) {
; CHECK-LABEL: @foo1_and_signbit_lshr_without_shifting_signbit(		; CHECK-LABEL: @foo1_and_signbit_lshr_without_shifting_signbit(
; CHECK-NEXT: [[T0:%.]] = shl i32 1, [[C1:%.]]		; CHECK-NEXT: [[T0:%.]] = shl i32 1, [[C1:%.]]
; CHECK-NEXT: [[T1:%.]] = and i32 [[T0]], [[K:%.]]		; CHECK-NEXT: [[TMP1:%.]] = lshr i32 -2147483648, [[C2:%.]]
; CHECK-NEXT: [[T2:%.*]] = icmp eq i32 [[T1]], 0		; CHECK-NEXT: [[TMP2:%.*]] = or i32 [[T0]], [[TMP1]]
; CHECK-NEXT: [[T3:%.]] = shl i32 [[K]], [[C2:%.]]		; CHECK-NEXT: [[TMP3:%.]] = and i32 [[TMP2]], [[K:%.]]
; CHECK-NEXT: [[T4:%.*]] = icmp sgt i32 [[T3]], -1		; CHECK-NEXT: [[TMP4:%.*]] = icmp ne i32 [[TMP3]], [[TMP2]]
; CHECK-NEXT: [[OR:%.*]] = or i1 [[T2]], [[T4]]		; CHECK-NEXT: ret i1 [[TMP4]]
; CHECK-NEXT: ret i1 [[OR]]
;		;
%t0 = shl i32 1, %c1		%t0 = shl i32 1, %c1
%t1 = and i32 %t0, %k		%t1 = and i32 %t0, %k
%t2 = icmp eq i32 %t1, 0		%t2 = icmp eq i32 %t1, 0
%t3 = shl i32 %k, %c2		%t3 = shl i32 %k, %c2
%t4 = icmp sgt i32 %t3, -1		%t4 = icmp sgt i32 %t3, -1
		lebedev.riUnsubmitted Not Done Reply Inline Actions I'm guessing all these instructions must be one-use. lebedev.ri: I'm guessing all these instructions must be one-use.
%or = or i1 %t2, %t4		%or = or i1 %t2, %t4
ret i1 %or		ret i1 %or
}		}

define i1 @foo1_or_signbit_lshr_without_shifting_signbit(i32 %k, i32 %c1, i32 %c2) {		define i1 @foo1_or_signbit_lshr_without_shifting_signbit(i32 %k, i32 %c1, i32 %c2) {
; CHECK-LABEL: @foo1_or_signbit_lshr_without_shifting_signbit(		; CHECK-LABEL: @foo1_or_signbit_lshr_without_shifting_signbit(
; CHECK-NEXT: [[T0:%.]] = shl i32 1, [[C1:%.]]		; CHECK-NEXT: [[T0:%.]] = shl i32 1, [[C1:%.]]
; CHECK-NEXT: [[T1:%.]] = and i32 [[T0]], [[K:%.]]		; CHECK-NEXT: [[TMP1:%.]] = lshr i32 -2147483648, [[C2:%.]]
; CHECK-NEXT: [[T2:%.*]] = icmp ne i32 [[T1]], 0		; CHECK-NEXT: [[TMP2:%.*]] = or i32 [[T0]], [[TMP1]]
; CHECK-NEXT: [[T3:%.]] = shl i32 [[K]], [[C2:%.]]		; CHECK-NEXT: [[TMP3:%.]] = and i32 [[TMP2]], [[K:%.]]
; CHECK-NEXT: [[T4:%.*]] = icmp slt i32 [[T3]], 0		; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[TMP3]], [[TMP2]]
; CHECK-NEXT: [[OR:%.*]] = and i1 [[T2]], [[T4]]		; CHECK-NEXT: ret i1 [[TMP4]]
; CHECK-NEXT: ret i1 [[OR]]
;		;
%t0 = shl i32 1, %c1		%t0 = shl i32 1, %c1
%t1 = and i32 %t0, %k		%t1 = and i32 %t0, %k
%t2 = icmp ne i32 %t1, 0		%t2 = icmp ne i32 %t1, 0
%t3 = shl i32 %k, %c2		%t3 = shl i32 %k, %c2
%t4 = icmp slt i32 %t3, 0		%t4 = icmp slt i32 %t3, 0
%or = and i1 %t2, %t4		%or = and i1 %t2, %t4
ret i1 %or		ret i1 %or
}		}
		lebedev.riUnsubmitted Not Done Reply Inline Actions Looks like this one should? lebedev.ri: Looks like this one should?
		huihuizAuthorUnsubmitted Done Reply Inline Actions I am OK with allowing 'and' to have more than one use. Folding cut instruction count by 2, the worst case is no instruction count increase. huihuiz: I am OK with allowing 'and' to have more than one use. Folding cut instruction count by 2, the…
		lebedev.riUnsubmitted Done Reply Inline Actions the worst case is no instruction count increase. Precisely. lebedev.ri: > the worst case is no instruction count increase. Precisely.
		lebedev.riUnsubmitted Not Done Reply Inline Actions Looks like this one should? lebedev.ri: Looks like this one should?
		lebedev.riUnsubmitted Not Done Reply Inline Actions Could fold, no instruction count increase lebedev.ri: Could fold, no instruction count increase
		huihuizAuthorUnsubmitted Done Reply Inline Actions We should not fold when 'cmp' has more than one use, this will increase instruction count when one side is using the new form. Take this as an example, we need to keep the extra 'cmp' that can't be reused, while the benefit of folding is offset by the added signbit shift instruction. define i1 @foo(i32 %k, i32 %c1, i32 %c2, i1* %p) { %t0 = shl i32 1, %c1 %t1 = and i32 %t0, %k %t2 = icmp eq i32 %t1, 0 store i1 %t2, i1* %p ; extra use of cmp %t3 = shl i32 %k, %c2 %t4 = icmp sgt i32 %t3, -1 %or = or i1 %t2, %t4 ret i1 %or } huihuiz: We should not fold when 'cmp' has more than one use, this will increase instruction count when…
		lebedev.riUnsubmitted Not Done Reply Inline Actions I believe i'm missing the point in `@foo1_and_extra_use_cmp2`. We were doing this fold previously (in llvm trunk), now we don't because of use-count. But this restriction does not seem to improve anything? We had 8 instructions, and we have 8 instructions without the fold? lebedev.ri: I believe i'm missing the point in `@foo1_and_extra_use_cmp2`. We were doing this fold…
		huihuizAuthorUnsubmitted Done Reply Inline Actions This restriction doesn't improve anything, but make sure we don't increase instruction count for certain cases. If 'cmp' has extra use, and one side is using the new form, e.g., @foo1_and_signbit_lshr_without_shifting_signbit_extra_use_cmp1/2 (please expand the test file to see) Without this restriction, instruction count actually increased. If 'cmp' has extra use, and both sides are using old form, e.g., @foo1_and_extra_use_cmp, we have 8 instructions with and without the fold. Here is the diff without this restriction: --- a/llvm/test/Transforms/InstCombine/onehot_merge.ll +++ b/llvm/test/Transforms/InstCombine/onehot_merge.ll @@ -274,10 +274,10 @@ define i1 @foo1_and_extra_use_cmp(i32 %k, i32 %c1, i32 %c2, i1* %p) { ; CHECK-NEXT: [[T2:%.]] = and i32 [[T0]], [[K:%.]] ; CHECK-NEXT: [[T3:%.]] = icmp eq i32 [[T2]], 0 ; CHECK-NEXT: store i1 [[T3]], i1 [[P:%.]], align 1 -; CHECK-NEXT: [[T4:%.]] = and i32 [[T1]], [[K]] -; CHECK-NEXT: [[T5:%.]] = icmp eq i32 [[T4]], 0 -; CHECK-NEXT: [[OR:%.]] = or i1 [[T3]], [[T5]] -; CHECK-NEXT: ret i1 [[OR]] +; CHECK-NEXT: [[TMP1:%.]] = or i32 [[T0]], [[T1]] +; CHECK-NEXT: [[TMP2:%.]] = and i32 [[TMP1]], [[K]] +; CHECK-NEXT: [[TMP3:%.]] = icmp ne i32 [[TMP2]], [[TMP1]] +; CHECK-NEXT: ret i1 [[TMP3]] ; %t0 = shl i32 1, %c1 %t1 = shl i32 1, %c2 @@ -340,13 +340,13 @@ define i1 @foo1_and_extra_use_cmp2(i32 %k, i32 %c1, i32 %c2, i1 %p) { ; CHECK-LABEL: @foo1_and_extra_use_cmp2( ; CHECK-NEXT: [[T0:%.]] = shl i32 1, [[C1:%.]] ; CHECK-NEXT: [[T1:%.]] = shl i32 1, [[C2:%.]] -; CHECK-NEXT: [[T2:%.]] = and i32 [[T0]], [[K:%.]] -; CHECK-NEXT: [[T3:%.]] = icmp eq i32 [[T2]], 0 -; CHECK-NEXT: [[T4:%.]] = and i32 [[T1]], [[K]] +; CHECK-NEXT: [[T4:%.]] = and i32 [[T1]], [[K:%.]] ; CHECK-NEXT: [[T5:%.]] = icmp eq i32 [[T4]], 0 ; CHECK-NEXT: store i1 [[T5]], i1 [[P:%.]], align 1 -; CHECK-NEXT: [[OR:%.]] = or i1 [[T3]], [[T5]] -; CHECK-NEXT: ret i1 [[OR]] +; CHECK-NEXT: [[TMP1:%.]] = or i32 [[T0]], [[T1]] +; CHECK-NEXT: [[TMP2:%.]] = and i32 [[TMP1]], [[K]] +; CHECK-NEXT: [[TMP3:%.]] = icmp ne i32 [[TMP2]], [[TMP1]] +; CHECK-NEXT: ret i1 [[TMP3]] ; %t0 = shl i32 1, %c1 %t1 = shl i32 1, %c2 @@ -410,10 +410,11 @@ define i1 @foo1_and_signbit_lshr_without_shifting_signbit_extra_use_cmp1(i32 %k, ; CHECK-NEXT: [[T1:%.]] = and i32 [[T0]], [[K:%.]] ; CHECK-NEXT: [[T2:%.]] = icmp eq i32 [[T1]], 0 ; CHECK-NEXT: store i1 [[T2]], i1* [[P:%.]], align 1 -; CHECK-NEXT: [[T3:%.]] = shl i32 [[K]], [[C2:%.]] -; CHECK-NEXT: [[T4:%.]] = icmp sgt i32 [[T3]], -1 -; CHECK-NEXT: [[OR:%.]] = or i1 [[T2]], [[T4]] -; CHECK-NEXT: ret i1 [[OR]] +; CHECK-NEXT: [[TMP1:%.]] = lshr i32 -2147483648, [[C2:%.]] +; CHECK-NEXT: [[TMP2:%.]] = or i32 [[T0]], [[TMP1]] +; CHECK-NEXT: [[TMP3:%.]] = and i32 [[TMP2]], [[K]] +; CHECK-NEXT: [[TMP4:%.]] = icmp ne i32 [[TMP3]], [[TMP2]] +; CHECK-NEXT: ret i1 [[TMP4]] ; %t0 = shl i32 1, %c1 %t1 = and i32 %t0, %k define i1 @foo1_and_signbit_lshr_without_shifting_signbit_extra_use_cmp2(i32 %k, i32 %c1, i32 %c2, i1* %p) { ; CHECK-LABEL: @foo1_and_signbit_lshr_without_shifting_signbit_extra_use_cmp2( ; CHECK-NEXT: [[T0:%.]] = shl i32 1, [[C1:%.]] -; CHECK-NEXT: [[T1:%.]] = and i32 [[T0]], [[K:%.]] -; CHECK-NEXT: [[T2:%.]] = icmp eq i32 [[T1]], 0 -; CHECK-NEXT: [[T3:%.]] = shl i32 [[K]], [[C2:%.]] +; CHECK-NEXT: [[T3:%.]] = shl i32 [[K:%.]], [[C2:%.]] ; CHECK-NEXT: [[T4:%.]] = icmp sgt i32 [[T3]], -1 ; CHECK-NEXT: store i1 [[T4]], i1 [[P:%.]], align 1 -; CHECK-NEXT: [[OR:%.]] = or i1 [[T2]], [[T4]] -; CHECK-NEXT: ret i1 [[OR]] +; CHECK-NEXT: [[TMP1:%.]] = lshr i32 -2147483648, [[C2]] +; CHECK-NEXT: [[TMP2:%.]] = or i32 [[T0]], [[TMP1]] +; CHECK-NEXT: [[TMP3:%.]] = and i32 [[TMP2]], [[K]] +; CHECK-NEXT: [[TMP4:%.]] = icmp ne i32 [[TMP3]], [[TMP2]] +; CHECK-NEXT: ret i1 [[TMP4]] ; %t0 = shl i32 1, %c1 %t1 = and i32 %t0, %k huihuiz: This restriction doesn't improve anything, but make sure we don't increase instruction count…
		lebedev.riUnsubmitted Not Done Reply Inline Actions What i'm saying is, yes, sure, that one-use check is nessesary for some of the patterns, but for this one it causes regressions for no benefit. Can that check be tightened up to not have this false-positive? lebedev.ri: What i'm saying is, yes, sure, that one-use check is nessesary for some of the patterns, but…

This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Generalize InstCombiner::foldAndOrOfICmpsOfAndWithPow2().AbandonedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 208251

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp

llvm/test/Transforms/InstCombine/onehot_merge.ll

[InstCombine] Generalize InstCombiner::foldAndOrOfICmpsOfAndWithPow2().
AbandonedPublic