This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Generalize InstCombiner::foldAndOrOfICmpsOfAndWithPow2().
AbandonedPublic

Authored by huihuiz on Jul 5 2019, 8:52 PM.

Details

Summary

In the middle-end, (A & (signbit l>> K)) ==/!= 0 can be folded into
(A l<< K) s>=/s< 0.

When folding 'And' or 'Or' of ICmps, need to extend the iszero() check
of (A & (signbit l>> K)) to its equivalent form of (A l<< K) s>=/s< 0.

Diff Detail

Repository
rL LLVM

Event Timeline

huihuiz created this revision.Jul 5 2019, 8:52 PM
lebedev.ri requested changes to this revision.Jul 9 2019, 1:06 AM

Thanks for working on this!
This is for sure missing some one-use checks (i.e. increases instruction count).
Could you please add 5 copies of @foo1_and_signbit_lshr_without_shifting_signbit, each one with different instruction having extra use?
It is also a good idea to add a test where both sides of or/and are in this new form.

llvm/test/Transforms/InstCombine/onehot_merge.ll
163–166

I'm guessing all these instructions must be one-use.

This revision now requires changes to proceed.Jul 9 2019, 1:06 AM
huihuiz updated this revision to Diff 209121.Jul 10 2019, 10:02 PM

Addressed review comments

huihuiz added a comment.EditedJul 10 2019, 10:15 PM

Thanks for working on this!
This is for sure missing some one-use checks (i.e. increases instruction count).
Could you please add 5 copies of @foo1_and_signbit_lshr_without_shifting_signbit, each one with different instruction having extra use?

Thank you for catching this!
Yes indeed, increases instruction count.

For (A & K) ==/!= 0, need to restrict one use for 'and' and 'cmp' , there is no need to check K ('shift') for more than one use.
For equivalent form (A l<< K) s<=/s> 0, need to restrict one use for 'shift' and 'cmp'

It is also a good idea to add a test where both sides of or/and are in this new form.

Test cases added, please expand to see (as there is no diff)
I believe we should not try to decompose (A l<< K) s<=/s> 0 into (A & (signbit l>> K)) ==/!= 0 when both sides are in this new form, as this decomposition introduce additional instruction of signbit shift.
Also, foldLogOpOfMaskedICmps() in InstCombineCompares.cpp will try to fold ((A l<< K1) s<=/> 0) &/| ((A l<< K2) s<=/> 0) into a single icmp.

I think this looks ok, but i'd like to see more test coverage :)

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
888

could just return match(...);

889

Some test should "regress" (improve due to this), could you please add similar extra-use test coverage for this too?

908–910

To be pointed out, this indeed creates an instruction while we don't know yet whether we will be able to do the fold,
not great, but i think this is the smallest evil, the alternative solution would indeed be uglier..

llvm/test/Transforms/InstCombine/onehot_merge.ll
27–32

Similarly, could you please add 6 tests with different of each of these instructions having an extra use?

lebedev.ri requested changes to this revision.Jul 11 2019, 12:26 AM

(just marking as reviewed)

This revision now requires changes to proceed.Jul 11 2019, 12:26 AM
spatel added inline comments.Jul 11 2019, 10:59 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
908–910

Hmm...do we have a test where the temporary instruction is created, but the fold fails?
Last time I made that mistake, we infinite looped: the temporary instruction gets created and deleted triggering another round of combining and repeat forever.

lebedev.ri added inline comments.Jul 11 2019, 11:09 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
908–910

Oh, hmm.

huihuiz marked an inline comment as done.Jul 11 2019, 11:42 AM
huihuiz added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
908–910

Thank you for point this out! Indeed there are chances of infinite looping

see test

define i1 @foo(i32 %k, i32 %c1, i32 %c2) {
  %t0 = shl i32 3, %c1
  %t1 = and i32 %t0, %k
  %t2 = icmp eq i32 %t1, 0
  %t3 = shl i32 %k, %c2
  %t4 = icmp sgt i32 %t3, -1
  %or = or i1 %t2, %t4
  ret i1 %or
}

I am working on a solution now.

lebedev.ri added inline comments.Jul 11 2019, 11:49 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
908–910

Can we just manually delete the instruction if the fold fails?
I'm sure that can be nicely wrapped into RAII wrapper.

spatel added inline comments.Jul 11 2019, 1:12 PM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
908–910

I'm skeptical. I could be wrong, but it's not done anywhere else in instcombine?
Could we send 'B2' out as a partial result (for example, if it's not null, we matched), and then have the caller create the shift if needed?

lebedev.ri added inline comments.Jul 11 2019, 1:16 PM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
908–910

If we don't create this instruction, i guess e.g. isCompareToZeroOrEquivalentForm(), isKnownToBeAPowerOfTwo() will need to be also modified, no?

huihuiz updated this revision to Diff 209341.Jul 11 2019, 2:37 PM

I think we should only create signbit shift instruction when fold is supposed to happen.

I update the differential, let me know what you guys think?

huihuiz marked 2 inline comments as done.Jul 11 2019, 2:42 PM

test case foo1_and_signbit_lshr_without_shifting_signbit_not_pwr2() aim to check that we are not creating additional shift instruction when fold fails.

delete the signbit shift instruction might not break the cycle of infinite "create and delete".

Hm, not that ugly; this is still correct?

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
893–897

Can we end up needing to create this shift on both sides?
If yes, then can you make this a lambda helper?
If not, create it once and std::swap() into proper variable?

huihuiz updated this revision to Diff 209368.Jul 11 2019, 4:09 PM
huihuiz marked an inline comment as done.
huihuiz added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
893–897

We are not supposed to create for both sides, this will be handled by early exit.

lambda helper probably better.

lebedev.ri added inline comments.Jul 11 2019, 4:14 PM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
924

This needs a comment as to why we're creating a node (in what pattern are we?)

llvm/test/Transforms/InstCombine/onehot_merge.ll
187

Looks like this one should?

187

Looks like this one should?

187

Could fold, no instruction count increase

huihuiz updated this revision to Diff 209382.Jul 11 2019, 5:36 PM
huihuiz marked 3 inline comments as done.
huihuiz added inline comments.
llvm/test/Transforms/InstCombine/onehot_merge.ll
187

We should not fold when 'cmp' has more than one use, this will increase instruction count when one side is using the new form.

Take this as an example, we need to keep the extra 'cmp' that can't be reused, while the benefit of folding is offset by the added signbit shift instruction.

define i1 @foo(i32 %k, i32 %c1, i32 %c2, i1* %p) {
  %t0 = shl i32 1, %c1
  %t1 = and i32 %t0, %k
  %t2 = icmp eq i32 %t1, 0
  store i1 %t2, i1* %p  ; extra use of cmp
  %t3 = shl i32 %k, %c2
  %t4 = icmp sgt i32 %t3, -1
  %or = or i1 %t2, %t4
  ret i1 %or
}
187

I am OK with allowing 'and' to have more than one use.

Folding cut instruction count by 2, the worst case is no instruction count increase.

huihuiz updated this revision to Diff 209384.Jul 11 2019, 5:50 PM

forget to update the test comment, sorry

lebedev.ri marked an inline comment as done.Jul 12 2019, 3:06 PM
lebedev.ri added inline comments.
llvm/test/Transforms/InstCombine/onehot_merge.ll
187

the worst case is no instruction count increase.

Precisely.

187

I believe i'm missing the point in @foo1_and_extra_use_cmp2.
We were doing this fold previously (in llvm trunk), now we don't because of use-count.
But this restriction does not seem to improve anything?
We had 8 instructions, and we have 8 instructions without the fold?

huihuiz marked an inline comment as done.Jul 12 2019, 4:10 PM
huihuiz added inline comments.
llvm/test/Transforms/InstCombine/onehot_merge.ll
187

This restriction doesn't improve anything, but make sure we don't increase instruction count for certain cases.

If 'cmp' has extra use, and one side is using the new form, e.g., @foo1_and_signbit_lshr_without_shifting_signbit_extra_use_cmp1/2 (please expand the test file to see)
Without this restriction, instruction count actually increased.

If 'cmp' has extra use, and both sides are using old form, e.g., @foo1_and_extra_use_cmp, we have 8 instructions with and without the fold.

Here is the diff without this restriction:

--- a/llvm/test/Transforms/InstCombine/onehot_merge.ll
+++ b/llvm/test/Transforms/InstCombine/onehot_merge.ll
@@ -274,10 +274,10 @@ define i1 @foo1_and_extra_use_cmp(i32 %k, i32 %c1, i32 %c2, i1* %p) {
 ; CHECK-NEXT:    [[T2:%.*]] = and i32 [[T0]], [[K:%.*]]
 ; CHECK-NEXT:    [[T3:%.*]] = icmp eq i32 [[T2]], 0
 ; CHECK-NEXT:    store i1 [[T3]], i1* [[P:%.*]], align 1
-; CHECK-NEXT:    [[T4:%.*]] = and i32 [[T1]], [[K]]
-; CHECK-NEXT:    [[T5:%.*]] = icmp eq i32 [[T4]], 0
-; CHECK-NEXT:    [[OR:%.*]] = or i1 [[T3]], [[T5]]
-; CHECK-NEXT:    ret i1 [[OR]]
+; CHECK-NEXT:    [[TMP1:%.*]] = or i32 [[T0]], [[T1]]
+; CHECK-NEXT:    [[TMP2:%.*]] = and i32 [[TMP1]], [[K]]
+; CHECK-NEXT:    [[TMP3:%.*]] = icmp ne i32 [[TMP2]], [[TMP1]]
+; CHECK-NEXT:    ret i1 [[TMP3]]
 ;
   %t0 = shl i32 1, %c1
   %t1 = shl i32 1, %c2
@@ -340,13 +340,13 @@ define i1 @foo1_and_extra_use_cmp2(i32 %k, i32 %c1, i32 %c2, i1* %p) {
 ; CHECK-LABEL: @foo1_and_extra_use_cmp2(
 ; CHECK-NEXT:    [[T0:%.*]] = shl i32 1, [[C1:%.*]]
 ; CHECK-NEXT:    [[T1:%.*]] = shl i32 1, [[C2:%.*]]
-; CHECK-NEXT:    [[T2:%.*]] = and i32 [[T0]], [[K:%.*]]
-; CHECK-NEXT:    [[T3:%.*]] = icmp eq i32 [[T2]], 0
-; CHECK-NEXT:    [[T4:%.*]] = and i32 [[T1]], [[K]]
+; CHECK-NEXT:    [[T4:%.*]] = and i32 [[T1]], [[K:%.*]]
 ; CHECK-NEXT:    [[T5:%.*]] = icmp eq i32 [[T4]], 0
 ; CHECK-NEXT:    store i1 [[T5]], i1* [[P:%.*]], align 1
-; CHECK-NEXT:    [[OR:%.*]] = or i1 [[T3]], [[T5]]
-; CHECK-NEXT:    ret i1 [[OR]]
+; CHECK-NEXT:    [[TMP1:%.*]] = or i32 [[T0]], [[T1]]
+; CHECK-NEXT:    [[TMP2:%.*]] = and i32 [[TMP1]], [[K]]
+; CHECK-NEXT:    [[TMP3:%.*]] = icmp ne i32 [[TMP2]], [[TMP1]]
+; CHECK-NEXT:    ret i1 [[TMP3]]
 ;
   %t0 = shl i32 1, %c1
   %t1 = shl i32 1, %c2
@@ -410,10 +410,11 @@ define i1 @foo1_and_signbit_lshr_without_shifting_signbit_extra_use_cmp1(i32 %k,
 ; CHECK-NEXT:    [[T1:%.*]] = and i32 [[T0]], [[K:%.*]]
 ; CHECK-NEXT:    [[T2:%.*]] = icmp eq i32 [[T1]], 0
 ; CHECK-NEXT:    store i1 [[T2]], i1* [[P:%.*]], align 1
-; CHECK-NEXT:    [[T3:%.*]] = shl i32 [[K]], [[C2:%.*]]
-; CHECK-NEXT:    [[T4:%.*]] = icmp sgt i32 [[T3]], -1
-; CHECK-NEXT:    [[OR:%.*]] = or i1 [[T2]], [[T4]]
-; CHECK-NEXT:    ret i1 [[OR]]
+; CHECK-NEXT:    [[TMP1:%.*]] = lshr i32 -2147483648, [[C2:%.*]]
+; CHECK-NEXT:    [[TMP2:%.*]] = or i32 [[T0]], [[TMP1]]
+; CHECK-NEXT:    [[TMP3:%.*]] = and i32 [[TMP2]], [[K]]
+; CHECK-NEXT:    [[TMP4:%.*]] = icmp ne i32 [[TMP3]], [[TMP2]]
+; CHECK-NEXT:    ret i1 [[TMP4]]
 ;
   %t0 = shl i32 1, %c1
   %t1 = and i32 %t0, %k

 define i1 @foo1_and_signbit_lshr_without_shifting_signbit_extra_use_cmp2(i32 %k, i32 %c1, i32 %c2, i1* %p) {
 ; CHECK-LABEL: @foo1_and_signbit_lshr_without_shifting_signbit_extra_use_cmp2(
 ; CHECK-NEXT:    [[T0:%.*]] = shl i32 1, [[C1:%.*]]
-; CHECK-NEXT:    [[T1:%.*]] = and i32 [[T0]], [[K:%.*]]
-; CHECK-NEXT:    [[T2:%.*]] = icmp eq i32 [[T1]], 0
-; CHECK-NEXT:    [[T3:%.*]] = shl i32 [[K]], [[C2:%.*]]
+; CHECK-NEXT:    [[T3:%.*]] = shl i32 [[K:%.*]], [[C2:%.*]]
 ; CHECK-NEXT:    [[T4:%.*]] = icmp sgt i32 [[T3]], -1
 ; CHECK-NEXT:    store i1 [[T4]], i1* [[P:%.*]], align 1
-; CHECK-NEXT:    [[OR:%.*]] = or i1 [[T2]], [[T4]]
-; CHECK-NEXT:    ret i1 [[OR]]
+; CHECK-NEXT:    [[TMP1:%.*]] = lshr i32 -2147483648, [[C2]]
+; CHECK-NEXT:    [[TMP2:%.*]] = or i32 [[T0]], [[TMP1]]
+; CHECK-NEXT:    [[TMP3:%.*]] = and i32 [[TMP2]], [[K]]
+; CHECK-NEXT:    [[TMP4:%.*]] = icmp ne i32 [[TMP3]], [[TMP2]]
+; CHECK-NEXT:    ret i1 [[TMP4]]
 ;
   %t0 = shl i32 1, %c1
   %t1 = and i32 %t0, %k
lebedev.ri added inline comments.Jul 12 2019, 11:06 PM
llvm/test/Transforms/InstCombine/onehot_merge.ll
187

What i'm saying is, yes, sure, that one-use check is nessesary for some of the patterns, but for this one
it causes regressions for no benefit. Can that check be tightened up to not have *this* false-positive?

lebedev.ri requested changes to this revision.Jul 24 2019, 12:30 PM

(just marking as reviewed)

This revision now requires changes to proceed.Jul 24 2019, 12:30 PM
lebedev.ri resigned from this revision.Jan 12 2023, 4:43 PM

This review seems to be stuck/dead, consider abandoning if no longer relevant.

This revision now requires review to proceed.Jan 12 2023, 4:43 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2023, 4:43 PM
Herald added a subscriber: StephenFan. · View Herald Transcript
huihuiz abandoned this revision.Jan 13 2023, 9:22 AM