This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] peek through FP casts for sign-bit compares (PR36682)
ClosedPublic

Authored by spatel on Mar 11 2018, 10:15 AM.

Details

Summary

This pattern came up in PR36682:
https://bugs.llvm.org/show_bug.cgi?id=36682
https://godbolt.org/g/LhuD9A

The 'undef' in the vector test is just there to verify a recent improvement in pattern-matching: vector constant matches can include undef elements.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision.Mar 11 2018, 10:15 AM

In the mean time i experimented with generating the testcases (clearly, overdone it), and this patch does not handle all the cases i've come up with. Or maybe the cases are just invalid, which is more likely.

(with this patch)

Generated with:

/tmp$ clang++ bitcast.cpp
/build/llvm/test/Transforms/InstCombine$ /tmp/a.out > cast-int-icmp-eq-0.ll; /build/llvm/utils/update_test_checks.py --opt=/build/llvm-build-Clang-release/bin/opt cast-int-icmp-eq-0.ll

In the mean time i experimented with generating the testcases (clearly, overdone it), and this patch does not handle all the cases i've come up with. Or maybe the cases are just invalid, which is more likely.

Nice. I thought about the X == 0 and X != 0 cases just after I posted this. I think those are safe to add in a small follow-up patch.

We should canonicalize other signbit-check predicates (sle / sge) to this form. Are you seeing any of those cases fail to match with this patch?

In the mean time i experimented with generating the testcases (clearly, overdone it), and this patch does not handle all the cases i've come up with. Or maybe the cases are just invalid, which is more likely.

Nice. I thought about the X == 0 and X != 0 cases just after I posted this. I think those are safe to add in a small follow-up patch.

We should canonicalize other signbit-check predicates (sle / sge) to this form. Are you seeing any of those cases fail to match with this patch?

Eh, hard to say, there are more not changed patterns than changed, as you can see from the attached cast-int-icmp-eq-0.ll.
In particular, if the sizes change (when input size != bitcast fp type), then it's not handled.
All uitofp are not handled, obviously. And non slt 0/sgt -1 condition codes are not touched, too.

Should some version of that cast-int-icmp-eq-0.ll be committed? (anything missing from it?)

In the mean time i experimented with generating the testcases (clearly, overdone it), and this patch does not handle all the cases i've come up with. Or maybe the cases are just invalid, which is more likely.

Nice. I thought about the X == 0 and X != 0 cases just after I posted this. I think those are safe to add in a small follow-up patch.

We should canonicalize other signbit-check predicates (sle / sge) to this form. Are you seeing any of those cases fail to match with this patch?

Eh, hard to say, there are more not changed patterns than changed, as you can see from the attached cast-int-icmp-eq-0.ll.
In particular, if the sizes change (when input size != bitcast fp type), then it's not handled.

The tests in this patch have size-changing casts (both larger and smaller than the original type). Did I miss something?

All uitofp are not handled, obviously. And non slt 0/sgt -1 condition codes are not touched, too.

I think uitofp followed by a signbit check will be optimized away completely to true/false, so that can go in InstSimplify. That's another small, independent patch.

Should some version of that cast-int-icmp-eq-0.ll be committed? (anything missing from it?)

Sure - if you want to check it in, I can rebase this patch to show the diffs. Including some weird FP types like PPC_FP128 might be interesting. And having some 'undef' in vector tests as I have here also increases coverage.
But I don't see much value in repeating every test for every combination of types, sle/sge, or vector+scalar repeats of everything (if we know the transform works for at least one vector type, then it almost certainly works in the other cases).

Ok, i have limited the test generator to just the sitofp, slt 0 / sle -1 / sgt -1 / sge 0, added undef in the middle of vector variant.
Good news is that all the scalar cases are indeed handled.
Bad news is that half of the vector cases is not handled.

<- just all the vector cases, with your patch.

$ grep "CHECK-NEXT" cast-int-icmp-eq-0.ll | grep icmp | wc -l
36
$ grep "CHECK-NEXT" cast-int-icmp-eq-0.ll | grep bitcast | wc -l
18 # (so exactly half)
$ grep "CHECK-NEXT" cast-int-icmp-eq-0.ll | grep -B 1 -A 1 bitcast
; CHECK-NEXT:    [[F:%.*]] = sitofp <3 x i32> [[I:%.*]] to <3 x float>
; CHECK-NEXT:    [[B:%.*]] = bitcast <3 x float> [[F]] to <3 x i32>
; CHECK-NEXT:    [[CMP:%.*]] = icmp slt <3 x i32> [[B]], <i32 0, i32 undef, i32 0>
--
; CHECK-NEXT:    [[F:%.*]] = sitofp <3 x i32> [[I:%.*]] to <3 x float>
; CHECK-NEXT:    [[B:%.*]] = bitcast <3 x float> [[F]] to <3 x i32>
; CHECK-NEXT:    [[CMP:%.*]] = icmp slt <3 x i32> [[B]], <i32 0, i32 undef, i32 0>
--
; CHECK-NEXT:    [[F:%.*]] = sitofp <3 x i32> [[I:%.*]] to <3 x double>
; CHECK-NEXT:    [[B:%.*]] = bitcast <3 x double> [[F]] to <3 x i64>
; CHECK-NEXT:    [[CMP:%.*]] = icmp slt <3 x i64> [[B]], <i64 0, i64 undef, i64 0>
--
; CHECK-NEXT:    [[F:%.*]] = sitofp <3 x i32> [[I:%.*]] to <3 x double>
; CHECK-NEXT:    [[B:%.*]] = bitcast <3 x double> [[F]] to <3 x i64>
; CHECK-NEXT:    [[CMP:%.*]] = icmp slt <3 x i64> [[B]], <i64 0, i64 undef, i64 0>
--
; CHECK-NEXT:    [[F:%.*]] = sitofp <3 x i32> [[I:%.*]] to <3 x half>
; CHECK-NEXT:    [[B:%.*]] = bitcast <3 x half> [[F]] to <3 x i16>
; CHECK-NEXT:    [[CMP:%.*]] = icmp slt <3 x i16> [[B]], <i16 0, i16 undef, i16 0>
--
; CHECK-NEXT:    [[F:%.*]] = sitofp <3 x i32> [[I:%.*]] to <3 x half>
; CHECK-NEXT:    [[B:%.*]] = bitcast <3 x half> [[F]] to <3 x i16>
; CHECK-NEXT:    [[CMP:%.*]] = icmp slt <3 x i16> [[B]], <i16 0, i16 undef, i16 0>
--
; CHECK-NEXT:    [[F:%.*]] = sitofp <3 x i64> [[I:%.*]] to <3 x float>
; CHECK-NEXT:    [[B:%.*]] = bitcast <3 x float> [[F]] to <3 x i32>
; CHECK-NEXT:    [[CMP:%.*]] = icmp slt <3 x i32> [[B]], <i32 0, i32 undef, i32 0>
--
; CHECK-NEXT:    [[F:%.*]] = sitofp <3 x i64> [[I:%.*]] to <3 x float>
; CHECK-NEXT:    [[B:%.*]] = bitcast <3 x float> [[F]] to <3 x i32>
; CHECK-NEXT:    [[CMP:%.*]] = icmp slt <3 x i32> [[B]], <i32 0, i32 undef, i32 0>
--
; CHECK-NEXT:    [[F:%.*]] = sitofp <3 x i64> [[I:%.*]] to <3 x double>
; CHECK-NEXT:    [[B:%.*]] = bitcast <3 x double> [[F]] to <3 x i64>
; CHECK-NEXT:    [[CMP:%.*]] = icmp slt <3 x i64> [[B]], <i64 0, i64 undef, i64 0>
--
; CHECK-NEXT:    [[F:%.*]] = sitofp <3 x i64> [[I:%.*]] to <3 x double>
; CHECK-NEXT:    [[B:%.*]] = bitcast <3 x double> [[F]] to <3 x i64>
; CHECK-NEXT:    [[CMP:%.*]] = icmp slt <3 x i64> [[B]], <i64 0, i64 undef, i64 0>
--
; CHECK-NEXT:    [[F:%.*]] = sitofp <3 x i64> [[I:%.*]] to <3 x half>
; CHECK-NEXT:    [[B:%.*]] = bitcast <3 x half> [[F]] to <3 x i16>
; CHECK-NEXT:    [[CMP:%.*]] = icmp slt <3 x i16> [[B]], <i16 0, i16 undef, i16 0>
--
; CHECK-NEXT:    [[F:%.*]] = sitofp <3 x i64> [[I:%.*]] to <3 x half>
; CHECK-NEXT:    [[B:%.*]] = bitcast <3 x half> [[F]] to <3 x i16>
; CHECK-NEXT:    [[CMP:%.*]] = icmp slt <3 x i16> [[B]], <i16 0, i16 undef, i16 0>
--
; CHECK-NEXT:    [[F:%.*]] = sitofp <3 x i16> [[I:%.*]] to <3 x float>
; CHECK-NEXT:    [[B:%.*]] = bitcast <3 x float> [[F]] to <3 x i32>
; CHECK-NEXT:    [[CMP:%.*]] = icmp slt <3 x i32> [[B]], <i32 0, i32 undef, i32 0>
--
; CHECK-NEXT:    [[F:%.*]] = sitofp <3 x i16> [[I:%.*]] to <3 x float>
; CHECK-NEXT:    [[B:%.*]] = bitcast <3 x float> [[F]] to <3 x i32>
; CHECK-NEXT:    [[CMP:%.*]] = icmp slt <3 x i32> [[B]], <i32 0, i32 undef, i32 0>
--
; CHECK-NEXT:    [[F:%.*]] = sitofp <3 x i16> [[I:%.*]] to <3 x double>
; CHECK-NEXT:    [[B:%.*]] = bitcast <3 x double> [[F]] to <3 x i64>
; CHECK-NEXT:    [[CMP:%.*]] = icmp slt <3 x i64> [[B]], <i64 0, i64 undef, i64 0>
--
; CHECK-NEXT:    [[F:%.*]] = sitofp <3 x i16> [[I:%.*]] to <3 x double>
; CHECK-NEXT:    [[B:%.*]] = bitcast <3 x double> [[F]] to <3 x i64>
; CHECK-NEXT:    [[CMP:%.*]] = icmp slt <3 x i64> [[B]], <i64 0, i64 undef, i64 0>
--
; CHECK-NEXT:    [[F:%.*]] = sitofp <3 x i16> [[I:%.*]] to <3 x half>
; CHECK-NEXT:    [[B:%.*]] = bitcast <3 x half> [[F]] to <3 x i16>
; CHECK-NEXT:    [[CMP:%.*]] = icmp slt <3 x i16> [[B]], <i16 0, i16 undef, i16 0>
--
; CHECK-NEXT:    [[F:%.*]] = sitofp <3 x i16> [[I:%.*]] to <3 x half>
; CHECK-NEXT:    [[B:%.*]] = bitcast <3 x half> [[F]] to <3 x i16>
; CHECK-NEXT:    [[CMP:%.*]] = icmp slt <3 x i16> [[B]], <i16 0, i16 undef, i16 0>

Unless it is because i'm using yesterday's trunk, I think the test coverage needs improving.
I think i should go ahead and commit that initial testfile now, unless it is too large?

Ok, i have limited the test generator to just the sitofp, slt 0 / sle -1 / sgt -1 / sge 0, added undef in the middle of vector variant.
Good news is that all the scalar cases are indeed handled.
Bad news is that half of the vector cases is not handled.

Ah, you're detecting a missing undef pattern match for m_Zero(). Nobody made that change to that matcher yet, unlike m_AllOnes().

Unless it is because i'm using yesterday's trunk, I think the test coverage needs improving.
I think i should go ahead and commit that initial testfile now, unless it is too large?

I think that's fine. We can always remove some tests if we think they don't add value.

Ok, i have limited the test generator to just the sitofp, slt 0 / sle -1 / sgt -1 / sge 0, added undef in the middle of vector variant.
Good news is that all the scalar cases are indeed handled.
Bad news is that half of the vector cases is not handled.

Ah, you're detecting a missing undef pattern match for m_Zero(). Nobody made that change to that matcher yet, unlike m_AllOnes().

To make that clear - if you make the vector constants 'zeroinitializer' or make all elements explicitly '0' rather than including undef, I expect them to be folded.

Looking more at this, as per alive-nj, for all the type combinations i checked
(input: i16, i32, i64; intermediate: half/i16, float/i32, double/i64)
for the following icmp comparisons the sitofp+bitcast can be dropped:

  • eq 0
  • ne 0
  • slt 0
  • sle 0
  • sge 0
  • sgt 0
  • slt 1
  • sge 1
  • sle -1
  • sgt -1

I did not check vectors, but i'm guessing it's the same there.


Will put up the testcase diff up next.

spatel updated this revision to Diff 138088.Mar 12 2018, 1:31 PM

Patch updated:
This is only a rebase with the new tests from rL327301 and deleting the redundant ones.

Should we handle x < 1, x > 0, x == 0, and x != 0 in this patch too?

Patch updated:
This is only a rebase with the new tests from rL327301 and deleting the redundant ones.

Thanks, looks like a start!

Should we handle x < 1, x > 0, x == 0, and x != 0 in this patch too?

Those don't really match the differential's subject, i'd say they should be in the next one[s].
(I suppose i can implement those cases now that there is an example :))

Those don't really match the differential's subject, i'd say they should be in the next one[s].
(I suppose i can implement those cases now that there is an example :))

No need - I already have the patch done. I just wasn't sure whether 1 step or multiple would be preferred.
I haven't worked on the uitofp InstSimplify patch if you want to handle that one.

Those don't really match the differential's subject, i'd say they should be in the next one[s].
(I suppose i can implement those cases now that there is an example :))

No need - I already have the patch done. I just wasn't sure whether 1 step or multiple would be preferred.

I haven't worked on the uitofp InstSimplify patch if you want to handle that one.

Ok, sounds good, i'll look into that then.

Those don't really match the differential's subject, i'd say they should be in the next one[s].
(I suppose i can implement those cases now that there is an example :))

No need - I already have the patch done. I just wasn't sure whether 1 step or multiple would be preferred.

I haven't worked on the uitofp InstSimplify patch if you want to handle that one.

Ok, sounds good, i'll look into that then.

And done, D44416 / D44421 / D44424 / D44425.
Only two patterns in each, others are either canonicalized to them, or are invalid as per alive.

This needs a rebase now that D44424 has landed, but otherwise this looks good to me, since it's very similar to those two commits.

spatel updated this revision to Diff 138949.Mar 19 2018, 9:02 AM

Patch updated:
Rebased against trunk.

lebedev.ri accepted this revision.Mar 23 2018, 11:05 AM

Patch updated:
Rebased against trunk.

Oh, sure, i did not realize this was blocked on me :)
LGTM, thank you!

This revision is now accepted and ready to land.Mar 23 2018, 11:05 AM
This revision was automatically updated to reflect the committed changes.