This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fold icmp eq/ne (and %x, C), 0 iff (-C) is power of two -> %x u</u>= (-C) earlier.
ClosedPublic

Authored by huihuiz on Jun 18 2019, 11:19 AM.

Details

Summary

To generate simplified IR, make sure fold

(X & ~C) ==/!= 0 --> X u</u>= C+1

is scheduled before fold

((X << Y) & C) == 0 -> (X & (C >> Y)) == 0.

https://rise4fun.com/Alive/7ZN

Diff Detail

Repository
rL LLVM

Event Timeline

huihuiz created this revision.Jun 18 2019, 11:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 18 2019, 11:19 AM
Herald added a subscriber: hiraditya. · View Herald Transcript

This is the second fold we talked about , split from D63026

lebedev.ri edited the summary of this revision. (Show Details)Jun 18 2019, 11:34 AM

Looks good for IR, but looks like this needs an undo fold for backend?
https://godbolt.org/z/giY9Cp
At least for aarch64 this seems to result in worse ASM?

llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
1643–1645

Why are we restricting this fold to single-use and?

1656

((%x & C) == 0) --> %x u< (-C) iff (-C) is power of two.

1657–1658

Why do we care about that? Seems rather arbitrary.

huihuiz updated this revision to Diff 205409.Jun 18 2019, 11:54 AM
huihuiz marked 2 inline comments as done.
huihuiz marked an inline comment as done.
huihuiz added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
1643–1645

Same reason in D63026 (resolving PR10267)

1657–1658

take this test as example

define i1 @test_shift_and_cmp_changed2(i8 %p) {
  %shlp = shl i8 %p, 5
  %andp = and i8 %shlp, -64
  %cmp = icmp ult i8 %andp, 32
  ret i1 %cmp
}

we do miss fold for (X >> C3) & C2 != C1 --> (X & (C2 << C3)) != (C1 << C3)

lebedev.ri added inline comments.Jun 18 2019, 12:03 PM
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
1647

Despite how pointless it will look, please add the comment to each of the folds.
Nothing prevents from other folds being added, and this comment getting out of date.

1657–1658

Is that IR without this restriction?
We currently seem to handle it just fine https://godbolt.org/z/8DB1xD

huihuiz marked an inline comment as done.Jun 18 2019, 12:03 PM
huihuiz added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
1657–1658

for this test, should be better transforming into this

define i1 @test_shift_and_cmp_changed2(i8 %p) {
  %andp = and i8 %p, 6
  %cmp = icmp eq i8 %andp, 0
  ret i1 %cmp
}

However, by scheduling ((%x & C) == 0) --> %x u< (-C) iff (-C) is power of two earlier,
we end up with

define i1 @test_shift_and_cmp_changed2(i8 %p) {
  %shlp = shl i8 %p, 5
  %cmp = icmp ult i8 %shlp, 64
  ret i1 %cmp
}
huihuiz updated this revision to Diff 205413.Jun 18 2019, 12:12 PM
huihuiz marked 2 inline comments as done.
huihuiz added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
1657–1658

Yes , that is the IR without add this restriction.
We are currently handle it fine.
But when we try to schedule (X & ~C) ==/!= 0 --> X u</u>= C+1 earlier
we end up with

define i1 @test_shift_and_cmp_changed2(i8 %p) {
  %shlp = shl i8 %p, 5
  %cmp = icmp ult i8 %shlp, 64
  ret i1 %cmp
}
lebedev.ri added inline comments.Jun 18 2019, 12:55 PM
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
1657–1658

Well, nice find.
Like i guessed this rescheduling is exposing all kinds of missing folds :)

lebedev.ri added inline comments.Jun 18 2019, 2:11 PM
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
1657–1658

Can you give me a spoiler, if you drop this restriction, are there any other regressions in check-llvm?

huihuiz marked an inline comment as done.Jun 18 2019, 2:31 PM
huihuiz added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
1657–1658

Here are the failing tests when dropping this restriction:

  1. test file: test/Transforms/InstCombine/pr17827.ll

test_shift_and_cmp_changed2 and its vector version

  1. test file: test/Transforms/InstCombine/lshr-and-negC-icmpeq-zero.ll

scalar_i32_lshr_and_negC_eq_X_is_constant1
and
scalar_i32_lshr_and_negC_eq_X_is_constant1

 define i1 @scalar_i32_lshr_and_negC_eq_X_is_constant1(i32 %y) {
 ; CHECK-LABEL: @scalar_i32_lshr_and_negC_eq_X_is_constant1(
 ; CHECK-NEXT:    [[LSHR:%.*]] = lshr i32 12345, [[Y:%.*]]
-; CHECK-NEXT:    [[AND:%.*]] = and i32 [[LSHR]], -8
-; CHECK-NEXT:    [[R:%.*]] = icmp eq i32 [[AND]], 0
+; CHECK-NEXT:    [[R:%.*]] = icmp ult i32 [[LSHR]], 8
 ; CHECK-NEXT:    ret i1 [[R]]
 ;
   %lshr = lshr i32 12345, %y
   %and = and i32 %lshr, 4294967288  ; ~7
   %r = icmp eq i32 %and, 0
   ret i1 %r
 }

 define i1 @scalar_i32_lshr_and_negC_eq_X_is_constant2(i32 %y) {
 ; CHECK-LABEL: @scalar_i32_lshr_and_negC_eq_X_is_constant2(
 ; CHECK-NEXT:    [[LSHR:%.*]] = lshr i32 268435456, [[Y:%.*]]
-; CHECK-NEXT:    [[AND:%.*]] = and i32 [[LSHR]], -8
-; CHECK-NEXT:    [[R:%.*]] = icmp eq i32 [[AND]], 0
+; CHECK-NEXT:    [[R:%.*]] = icmp ult i32 [[LSHR]], 8
 ; CHECK-NEXT:    ret i1 [[R]]
 ;
   %lshr = lshr i32 268435456, %y
   %and = and i32 %lshr, 4294967288  ; ~7
   %r = icmp eq i32 %and, 0
   ret i1 %r
 }
  1. similarly for test/Transforms/InstCombine/shl-and-negC-icmpeq-zero.ll
lebedev.ri added inline comments.Jun 18 2019, 2:40 PM
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
1657–1658

I'm confused, 2. and 3. are improvements, right?
So only test_shift_and_cmp_changed2 regresses?
A missing fold should simply be added then, i'd say.
I'm not sure what it is yet though.

huihuiz marked an inline comment as done.Jun 18 2019, 2:51 PM
huihuiz added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
1657–1658

test file 2 and 3 are improvements. But the improved test cases are created mostly to check for correctness, I am not sure the improved test cases are actually popular in real applications.

test file 1 is regression, the failing test is simplified from bit filed test bug in pr17827. Probably more prevalent in application code.

Adding restriction tend to benefit more cases.

lebedev.ri added inline comments.Jun 18 2019, 3:08 PM
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
1657–1658

(TLDR: It's all such a mess, isn't it?)

I am not sure the improved test cases are actually popular in real applications.

Given just how many these transforms are, it's not particularly relevant what can be and what can't be
encountered in the "actually popular in real applications", what mattes is what IR patterns
we can encounter in "actually popular in real application" after the middle-end opt pipeline.

test file 1 is regression, the failing test is simplified from bit filed test bug in pr17827. Probably more prevalent in application code.

So naturally, if that new pattern is now being encountered in a testcase reduced from an actual code,
then by definition it can be encountered in "actually popular in real application", no?

Adding restriction tend to benefit more cases.

But do we know that? Sadly all this patternmatching is done blindly, with no attempt at cost modelling,
or trying to ultimately produce least amount of instructions/etc, it just tries to combine instructions
when they can be combined.

huihuiz updated this revision to Diff 206042.Jun 21 2019, 11:54 AM
huihuiz retitled this revision from [InstCombine] Fold icmp eq/ne (and %x, ~C), 0 -> %x u</u>= 0 earlier, C+1 is power of 2. to [InstCombine] Fold icmp eq/ne (and %x, C), 0 iff (-C) is power of two -> %x u</u>= (-C) earlier..

Yes , this reorder expose yet another missing fold. As regression in test/Transforms/InstCombine/pr17827.ll

Simplify 'shl' inequality test into 'and' equality test should fix this issue. I am posting this into another differential, link shortly.

icmp ult/uge (shl %x, C2), C1 iff C1 is power of two -> icmp eq/ne (and %x, (lshr -C1, C2)), 0

Looks good. I'll accept at the same time with D63675.

llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
1656

s/to/only for/

lebedev.ri added inline comments.Jun 22 2019, 2:15 AM
llvm/test/Transforms/InstCombine/pr17827.ll
7 ↗(On Diff #206042)

Please can you regenerate this test (and just directly commit, no review) to get rid of this noise.

huihuiz updated this revision to Diff 206307.Jun 24 2019, 2:53 PM
huihuiz marked 2 inline comments as done.

addressed review comments

Looks good, thank you.

lebedev.ri accepted this revision.Jun 24 2019, 3:14 PM
This revision is now accepted and ready to land.Jun 24 2019, 3:14 PM
This revision was automatically updated to reflect the committed changes.