This is an archive of the discontinued LLVM Phabricator instance.

[InstSimplify] use ConstantRange to simplify more and-of-icmps
ClosedPublic

Authored by spatel on Apr 17 2017, 5:09 PM.

Details

Summary

We can simplify (and (icmp X, C1), (icmp X, C2)) to one of the icmps in many cases. I had to check some of these with Alive to prove to myself it's right, but everything seems to check out. Eg, the code in instcombine was completely ignoring predicates with mismatched signedness.

Handling or-of-icmps would be a follow-up step.

I initially thought I could use equivalence of the intersection with one of the input ranges rather than 'contains', but ConstantRange intersection doesn't work like I expected (not sure if it's wrong or if my expectations are off).

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision.Apr 17 2017, 5:09 PM
sanjoy edited edge metadata.Apr 17 2017, 10:08 PM

I initially thought I could use equivalence of the intersection with one of the input ranges rather than 'contains', but ConstantRange intersection doesn't work like I expected

Do you mean there are cases where A.contains(B) is true but A.intersect(B) is not B? If so, that does sound fishy.

I initially thought I could use equivalence of the intersection with one of the input ranges rather than 'contains', but ConstantRange intersection doesn't work like I expected

Do you mean there are cases where A.contains(B) is true but A.intersect(B) is not B? If so, that does sound fishy.

I saw the inverse: A.contains(B) is false, but A.intersectsWith(B) is B.

Example:

define i1 @and_ne_ne(i8 %x) {
  %a = icmp ne i8 %x, 13
  %b = icmp ne i8 %x, 17
  %c = and i1 %a, %b
  ret i1 %c
}

ConstantRange shows:
A: Lower = 14, Upper = 13
B: Lower = 18, Upper = 17
A.intersectsWith(B) returns B (Lower = 18, Upper = 17).
I thought this would return Lower = 18, Upper = 13 based on the header comment for intersectsWith:
"Return the range that results from the intersection of this range with another range.
The resultant range is guaranteed to include all elements contained in both input ranges, and to
have the smallest possible set size that does so."

I initially thought I could use equivalence of the intersection with one of the input ranges rather than 'contains', but ConstantRange intersection doesn't work like I expected

Do you mean there are cases where A.contains(B) is true but A.intersect(B) is not B? If so, that does sound fishy.

I saw the inverse: A.contains(B) is false, but A.intersectsWith(B) is B.

Okay, that does not look fishy. :)

Example:

define i1 @and_ne_ne(i8 %x) {
  %a = icmp ne i8 %x, 13
  %b = icmp ne i8 %x, 17
  %c = and i1 %a, %b
  ret i1 %c
}

ConstantRange shows:
A: Lower = 14, Upper = 13
B: Lower = 18, Upper = 17
A.intersectsWith(B) returns B (Lower = 18, Upper = 17).
I thought this would return Lower = 18, Upper = 13 based on the header comment for intersectsWith:
"Return the range that results from the intersection of this range with another range.
The resultant range is guaranteed to include all elements contained in both input ranges, and to
have the smallest possible set size that does so."

In that case, (Lower = 18, Upper = 17) is one of the right answer(s). (Lower = 18, Upper = 13) would be wrong because, e.g., 15 is both in A and in B, but not in (Lower = 18, Upper = 13), and the contract is that "The resultant range is guaranteed to include all elements contained in both input ranges".

In that case, (Lower = 18, Upper = 17) is one of the right answer(s). (Lower = 18, Upper = 13) would be wrong because, e.g., 15 is both in A and in B, but not in (Lower = 18, Upper = 13), and the contract is that "The resultant range is guaranteed to include all elements contained in both input ranges".

Ah, I get it now. Thanks for the explanation. In that case "contains" seems like my only option for this patch, and it is working as I expected.

sanjoy requested changes to this revision.Apr 20 2017, 3:37 PM
sanjoy added inline comments.
lib/Analysis/InstructionSimplify.cpp
1592 ↗(On Diff #95510)

As written, the code is correct but misleading.

I think the correct thing to do in this case is:

auto Range0 = ConstantRange::makeExactICmpRegion(Pred0, *C0);
auto Range1 = ConstantRange::makeExactICmpRegion(Pred1, *C1);

The current code is correct because for an APInt (as opposed to ConstantRange) RHS, makeExactICmpRegion just happens to be the same as makeAllowedICmpRegion. However, if *C0 and *C1 were ConstantRange s (i.e. someone generalized this code), then using makeAllowedICmpRegion would be wrong.

This revision now requires changes to proceed.Apr 20 2017, 3:37 PM
spatel updated this revision to Diff 96132.Apr 21 2017, 6:52 AM
spatel edited edge metadata.

Patch updated:
Use "makeExactICmpRegion" rather than "makeAllowedICmpRegion" to avoid a potential bug. As Sanjoy noted, we're using APInt here currently, so there should be no difference right now. But if this code was converted to take ConstantRange inputs, that would be wrong.

sanjoy accepted this revision.Apr 21 2017, 11:03 AM

LGTM! I did not sanity check the test changes, I've assumed you did that. :)

This revision is now accepted and ready to land.Apr 21 2017, 11:03 AM

LGTM! I did not sanity check the test changes, I've assumed you did that. :)

Yes, I have two 10x10 graphs for the predicate combinations...there's a pattern in here that we could probably encode into the predicate enum values (we do something like that for the fcmp predicates), but I don't see that pattern yet. :)
And I spot-checked several of these using Alive for extra confidence.

This revision was automatically updated to reflect the committed changes.

I'm seeing a performance regression which I traced back to this commit. A reduced test case to show the problem is this:

define i1 @test(i64 %a) {
  %cmpge = icmp sge i64 %a, 1
  %cmpeq = icmp eq i64 %a, 1
  %cmpne = icmp eq i1 %cmpeq, false
  %cmp = and i1 %cmpge, %cmpne
  ret i1 %cmp
}

Before this commit, InstCombine would reduce the above to simply:

define i1 @test(i64 %a) {
  %1 = icmp sgt i64 %a, 1
  ret i1 %1
}

but for some reason this no longer happens now.

Note that when directly using an "icmp ne" instead of inverting the result of the "icmp eq", the optimization does still happen.

I'm seeing a performance regression which I traced back to this commit. A reduced test case to show the problem is this:

define i1 @test(i64 %a) {
  %cmpge = icmp sge i64 %a, 1
  %cmpeq = icmp eq i64 %a, 1
  %cmpne = icmp eq i1 %cmpeq, false
  %cmp = and i1 %cmpge, %cmpne
  ret i1 %cmp
}

Before this commit, InstCombine would reduce the above to simply:

define i1 @test(i64 %a) {
  %1 = icmp sgt i64 %a, 1
  ret i1 %1
}

but for some reason this no longer happens now.

Note that when directly using an "icmp ne" instead of inverting the result of the "icmp eq", the optimization does still happen.

Thanks for the small test. I'll take a look.

This is a fun one. The ultimate problem is that by making InstSimplify smarter, we revealed a shortcoming in InstCombine. InstCombine is missing a fold for xor-of-icmps like this:

; (a > 0) ^ (a == 1) --> ((a > 0) & (a != 1)) | ((a <= 0) & (a == 1)) --> (a > 1) 
define i1 @test1(i64 %a) {
  %b = icmp sgt i64 %a, 0
  %c = icmp eq i64 %a, 1
  %and = xor i1 %c, %b
  ret i1 %and
}

I think we can avoid the problem in your original test case though. Ie, there's another InstCombine bug because it creates an unnecessary instruction:
IC: ADD: %notctmp = xor i1 %c, false

And that's what triggers another fold (the 'and' in the original code becomes an 'xor') which produces the pattern that we don't know how to fold yet.

This should reduce the chance of finding the optimization gap:
rL303312

Related clean-up commits ahead of that:
rL303295
rL303309
rL303310

We might want to treat an xor-of-icmps as an or-of-and-of-icmps to really fix this....or we just handle xor-of-icmps directly. There was a patch:
rL289813
...that might have gotten this case, but it got reverted:
rL290266

This should reduce the chance of finding the optimization gap:
rL303312

Yes, this works for me. Thanks for the quick fix!

Here's a proposal to fix xor-of-icmps better:
D33342

Also, the post-commit thread for:
https://reviews.llvm.org/rL303312
...suggests that that patch caused a regression for SimplifyCFG. I don't know how that would happen yet, but I'll investigate if I get an example.