This is an archive of the discontinued LLVM Phabricator instance.

[InstSimplify] fold and/or of compares with equality to min/max constant
ClosedPublic

Authored by spatel on Apr 18 2020, 9:34 AM.

Details

Summary

I found 12 (6 if we compress the DeMorganized forms) patterns for logic-of-compares with a min/max constant while looking at PR45510:
https://bugs.llvm.org/show_bug.cgi?id=45510

The variations on those forms multiply the test cases by 8 (unsigned/signed, swapped compare operands, commuted logic operands).
We had partial logic to deal with these for the unsigned min (zero) case, but missed everything else.

I drafted an alternate implementation that uses ConstantRange instead of predicate+constant matching. It's slightly less code, but it requires many more code comments to follow the implied logic, so I think it's not worth the effort. I don't expect there's any noticeable compile-time impact for either form.

Here's an abuse of Alive2 to show the 12 basic signed variants of the patterns in 1 function:
http://volta.cs.utah.edu:8080/z/5Vpiyg

declare void @use(i1, i1, i1, i1, i1, i1, i1, i1, i1, i1, i1, i1)
define void @src(i8 %x, i8 %y)  {
  %m1 = icmp eq i8 %x, 127
  %c1 = icmp slt i8 %x, %y
  %r1 = and i1 %m1, %c1   ; (X == MAX) && (X < Y) --> false

  %m2 = icmp ne i8 %x, 127
  %c2 = icmp sge i8 %x, %y
  %r2 = or i1 %m2, %c2    ; (X != MAX) || (X >= Y) --> true

  %m3 = icmp eq i8 %x, -128
  %c3 = icmp sgt i8 %x, %y
  %r3 = and i1 %m3, %c3   ; (X == MIN) && (X > Y) --> false

  %m4 = icmp ne i8 %x, -128
  %c4 = icmp sle i8 %x, %y
  %r4 = or i1 %m4, %c4    ; (X != MIN) || (X <= Y) --> true

  %m5 = icmp eq i8 %x, 127
  %c5 = icmp sge i8 %x, %y
  %r5 = and i1 %m5, %c5   ; (X == MAX) && (X >= Y) --> X == MAX
 
  %m6 = icmp ne i8 %x, 127
  %c6 = icmp slt i8 %x, %y
  %r6 = or i1 %m6, %c6   ; (X != MAX) || (X < Y) --> X != MAX

  %m7 = icmp eq i8 %x, -128
  %c7 = icmp sle i8 %x, %y
  %r7 = and i1 %m7, %c7   ; (X == MIN) && (X <= Y) --> X == MIN

  %m8 = icmp ne i8 %x, -128
  %c8 = icmp sgt i8 %x, %y
  %r8 = or i1 %m8, %c8   ; (X != MIN) || (X > Y) --> X != MIN

  %m9 = icmp ne i8 %x, 127
  %c9 = icmp slt i8 %x, %y
  %r9 = and i1 %m9, %c9    ; (X != MAX) && (X < Y) --> X < Y
 
  %m10 = icmp eq i8 %x, 127
  %c10 = icmp sge i8 %x, %y
  %r10 = or i1 %m10, %c10    ; (X == MAX) || (X >= Y) --> X >= Y

  %m11 = icmp ne i8 %x, -128
  %c11 = icmp sgt i8 %x, %y
  %r11 = and i1 %m11, %c11    ; (X != MIN) && (X > Y) --> X > Y

  %m12 = icmp eq i8 %x, -128
  %c12 = icmp sle i8 %x, %y
  %r12 = or i1 %m12, %c12    ; (X == MIN) || (X <= Y) --> X <= Y

  call void @use(i1 %r1, i1 %r2, i1 %r3, i1 %r4, i1 %r5, i1 %r6, i1 %r7, i1 %r8, i1 %r9, i1 %r10, i1 %r11, i1 %r12)
  ret void
}

define void @tgt(i8 %x, i8 %y)  {
  %m5 = icmp eq i8 %x, 127
  %m6 = icmp ne i8 %x, 127
  %m7 = icmp eq i8 %x, -128
  %m8 = icmp ne i8 %x, -128
  %c9 = icmp slt i8 %x, %y
  %c10 = icmp sge i8 %x, %y
  %c11 = icmp sgt i8 %x, %y
  %c12 = icmp sle i8 %x, %y
  call void @use(i1 0, i1 1, i1 0, i1 1, i1 %m5, i1 %m6, i1 %m7, i1 %m8, i1 %c9, i1 %c10, i1 %c11, i1 %c12)
  ret void
}

Diff Detail

Event Timeline

spatel created this revision.Apr 18 2020, 9:34 AM
lebedev.ri added inline comments.Apr 18 2020, 10:11 AM
llvm/lib/Analysis/InstructionSimplify.cpp
1488–1492

Any handling plan for these neighbor patterns?

1700

You literally always check !IsAnd.
Should this be IsOr instead?

In terms of correctness, this looks good to me.

llvm/lib/Analysis/InstructionSimplify.cpp
1486

Is simplification on pointer comparisons still alive after this is removed?

spatel planned changes to this revision.Apr 19 2020, 6:34 AM
spatel marked 3 inline comments as done.
spatel added inline comments.
llvm/lib/Analysis/InstructionSimplify.cpp
1486

Good catch! No, we would lose nullptr simplifications. But there were no regression tests for those cases in InstSimplify or any other other pass. Let me add some. Given that, we can't just remove the specializations for compare with 'null'.

1488–1492

I didn't look closely, but I assumed we can't generalize these because of the isKnownNonZero() dependency. We'd have to look at the regression tests to see if there's some other means of value tracking to generalize the expected cases.

1700

I thought someone would call that out. :)
My mind defaulted to the 'and' versions of the patterns when visualizing the logic, so that's how the code ended up like this. If we are comfortable inverting the pattern matching to the 'or' versions, then we can invert the first check at least to the positive 'IsAnd'.
Another option would duplicate all of the code predicated by the 'IsAnd' check.
The alternative - make this 'IsOr' - would mean this function stands different/alone compared to all of the related compare simplifications. That didn't seem like a worthy option to me.
Let me know which, if any, form you'd prefer.

spatel marked an inline comment as done.Apr 19 2020, 6:39 AM
spatel added inline comments.
llvm/lib/Analysis/InstructionSimplify.cpp
1700

Another option - we can change the later return lines to:

return IsAnd ? getFalse(Cmp0->getType()) : getTrue(Cmp0->getType());
spatel updated this revision to Diff 258744.Apr 20 2020, 8:04 AM
spatel marked 3 inline comments as done.

Patch updated:

  1. Allow null pointer constant as a special-case of zero integer (added some negative tests, so the test file from rGa2eb55d shows up here).
  2. Use positive 'IsAnd' variable to possibly make code clearer - and backwards justify that polarity choice :).
aqjune added inline comments.Apr 20 2020, 9:56 AM
llvm/lib/Analysis/InstructionSimplify.cpp
1714

The patch transforms this function into ret false, which is not correct. If %a = 0x0 and %b = 0x1, the source program should be true.

define i1 @f(i8* %a, i8* %b) {
  %c1 = icmp eq i8* %a, null
  %c2 = icmp slt i8* %a, %b
  %r = and i1 %c1, %c2
  ret i1 %r
}

And I think this miscompilation is related with assigning 1-bit integer in this line.
The line 1713 MinMaxC += APInt::getSignedMinValue(MinMaxC.getBitWidth()); will make MinMaxC full of one bits, and it makes if branch at line 1716 taken.

We can do more optimization with less code in instcombine, so here's a step towards that:
D78582

If there's agreement on that direction, I will abandon this patch.

We can do more optimization with less code in instcombine, so here's a step towards that:
D78582

I agree that we should have powerful instcombine fold,

If there's agreement on that direction, I will abandon this patch.

but note that it would be a suspended, half-baked state,
since then we'd neither generalize the existing folds,
nor drop them.

If there's agreement on that direction, I will abandon this patch.

but note that it would be a suspended, half-baked state,
since then we'd neither generalize the existing folds,
nor drop them.

Yes, it's fragmented. We could still remove the existing chunks shown here if we think that instcombine is probably good enough. There's that pesky one-use check in the other patch though, so there's a slight chance that we would lose an optimization somewhere. We have to decide if the reduction in code is worthy. Alternatively, I can fix the remaining known bug here, and we'd have full optimization power with some overlap.

If there's agreement on that direction, I will abandon this patch.

but note that it would be a suspended, half-baked state,
since then we'd neither generalize the existing folds,
nor drop them.

Yes, it's fragmented. We could still remove the existing chunks shown here if we think that instcombine is probably good enough.
We have to decide if the reduction in code is worthy.

There's that pesky one-use check in the other patch though, so there's a slight chance that we would lose an optimization somewhere.
Alternatively, I can fix the remaining known bug here, and we'd have full optimization power with some overlap.

That is my worry & preferred solution, yes.

nikic added a comment.Apr 22 2020, 1:07 AM

Yes, it's fragmented. We could still remove the existing chunks shown here if we think that instcombine is probably good enough. There's that pesky one-use check in the other patch though, so there's a slight chance that we would lose an optimization somewhere. We have to decide if the reduction in code is worthy. Alternatively, I can fix the remaining known bug here, and we'd have full optimization power with some overlap.

I think it's okay to just leave InstSimplify as it was. The existing code is specific to handling the zero case, which is generally more important than other cases, and it is located next to code that is specific to handling the zero case (isKnownNonZero). This seems like a reasonable state to leave things at here.

spatel updated this revision to Diff 259402.Apr 22 2020, 2:45 PM

Patch updated:

  1. Removed the code that would have overlapped with D78582.
  2. Left the old 'null' specific code here, but rearranged and added a code comment about the fragmentation.

Between the 2 patches, this should get all of the known patterns.

lebedev.ri accepted this revision.Apr 23 2020, 2:02 AM

This looks good to me, but we have a soundness problem with existing nullptr folds, specifically
(X == null) || (X u<= Y) --> X u<= Y and (X != null) && (X u> Y) --> X u> Y

llvm/test/Transforms/InstSimplify/and-or-icmp-nullptr.ll
217–248

Alive says all these are [preexisting] miscompiles, i think:

----------------------------------------
define i1 @ule_or_min(* %x, * %y) {
%0:
  %cmp = icmp ule * %x, %y
  %cmpeq = icmp eq * %x, null
  %r = or i1 %cmp, %cmpeq
  ret i1 %r
}
=>
define i1 @ule_or_min(* %x, * %y) {
%0:
  %cmp = icmp ule * %x, %y
  ret i1 %cmp
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
* %x = null
* %y = pointer(non-local, block_id=1, offset=0)

Source:
i1 %cmp = undef
i1 %cmpeq = #x1 (1)
i1 %r = #x1 (1)

SOURCE MEMORY STATE
===================
NON-LOCAL BLOCKS:
Block 0 >       size: 0 align: 64       alloc type: 0
Block 1 >       size: 0 align: 2        alloc type: 0
Block 2 >       size: 0 align: 4        alloc type: 0

Target:
i1 %cmp = #x0 (0)
Source value: #x1 (1)
Target value: #x0 (0)


----------------------------------------
define i1 @ule_or_min_commute(* %x, * %y) {
%0:
  %cmp = icmp ule * %x, %y
  %cmpeq = icmp eq * %x, null
  %r = or i1 %cmpeq, %cmp
  ret i1 %r
}
=>
define i1 @ule_or_min_commute(* %x, * %y) {
%0:
  %cmp = icmp ule * %x, %y
  ret i1 %cmp
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
* %x = null
* %y = pointer(non-local, block_id=2, offset=0)

Source:
i1 %cmp = undef
i1 %cmpeq = #x1 (1)
i1 %r = #x1 (1)

SOURCE MEMORY STATE
===================
NON-LOCAL BLOCKS:
Block 0 >       size: 0 align: 64       alloc type: 0
Block 1 >       size: 0 align: 2        alloc type: 0
Block 2 >       size: 0 align: 0        alloc type: 0

Target:
i1 %cmp = #x0 (0)
Source value: #x1 (1)
Target value: #x0 (0)


----------------------------------------
define i1 @ule_swap_or_min(* %x, * %y) {
%0:
  %cmp = icmp uge * %y, %x
  %cmpeq = icmp eq * %x, null
  %r = or i1 %cmp, %cmpeq
  ret i1 %r
}
=>
define i1 @ule_swap_or_min(* %x, * %y) {
%0:
  %cmp = icmp uge * %y, %x
  ret i1 %cmp
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
* %x = null
* %y = pointer(non-local, block_id=2, offset=0)

Source:
i1 %cmp = undef
i1 %cmpeq = #x1 (1)
i1 %r = #x1 (1)

SOURCE MEMORY STATE
===================
NON-LOCAL BLOCKS:
Block 0 >       size: 0 align: 64       alloc type: 0
Block 1 >       size: 0 align: 4        alloc type: 0
Block 2 >       size: 0 align: 2        alloc type: 0

Target:
i1 %cmp = #x0 (0)
Source value: #x1 (1)
Target value: #x0 (0)


----------------------------------------
define i1 @ule_swap_or_min_commute(* %x, * %y) {
%0:
  %cmp = icmp uge * %y, %x
  %cmpeq = icmp eq * %x, null
  %r = or i1 %cmpeq, %cmp
  ret i1 %r
}
=>
define i1 @ule_swap_or_min_commute(* %x, * %y) {
%0:
  %cmp = icmp uge * %y, %x
  ret i1 %cmp
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
* %x = null
* %y = pointer(non-local, block_id=1, offset=0)

Source:
i1 %cmp = undef
i1 %cmpeq = #x1 (1)
i1 %r = #x1 (1)

SOURCE MEMORY STATE
===================
NON-LOCAL BLOCKS:
Block 0 >       size: 0 align: 64       alloc type: 0
Block 1 >       size: 0 align: 2        alloc type: 0
Block 2 >       size: 0 align: 2        alloc type: 0

Target:
i1 %cmp = #x0 (0)
Source value: #x1 (1)
Target value: #x0 (0)
265–313

Likewise:

----------------------------------------
define i1 @ugt_and_not_min(* %x, * %y) {
%0:
  %cmp = icmp ugt * %x, %y
  %cmpeq = icmp ne * %x, null
  %r = and i1 %cmp, %cmpeq
  ret i1 %r
}
=>
define i1 @ugt_and_not_min(* %x, * %y) {
%0:
  %cmp = icmp ugt * %x, %y
  ret i1 %cmp
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
* %x = null
* %y = pointer(non-local, block_id=2, offset=0)

Source:
i1 %cmp = undef
i1 %cmpeq = #x0 (0)
i1 %r = #x0 (0)

SOURCE MEMORY STATE
===================
NON-LOCAL BLOCKS:
Block 0 >       size: 0 align: 64       alloc type: 0
Block 1 >       size: 0 align: 2        alloc type: 0
Block 2 >       size: 0 align: 2        alloc type: 0

Target:
i1 %cmp = #x1 (1)
Source value: #x0 (0)
Target value: #x1 (1)


----------------------------------------
define i1 @ugt_and_not_min_commute(* %x, * %y) {
%0:
  %cmp = icmp ugt * %x, %y
  %cmpeq = icmp ne * %x, null
  %r = and i1 %cmpeq, %cmp
  ret i1 %r
}
=>
define i1 @ugt_and_not_min_commute(* %x, * %y) {
%0:
  %cmp = icmp ugt * %x, %y
  ret i1 %cmp
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
* %x = null
* %y = pointer(non-local, block_id=2, offset=0)

Source:
i1 %cmp = undef
i1 %cmpeq = #x0 (0)
i1 %r = #x0 (0)

SOURCE MEMORY STATE
===================
NON-LOCAL BLOCKS:
Block 0 >       size: 0 align: 64       alloc type: 0
Block 1 >       size: 0 align: 2        alloc type: 0
Block 2 >       size: 0 align: 2        alloc type: 0

Target:
i1 %cmp = #x1 (1)
Source value: #x0 (0)
Target value: #x1 (1)


----------------------------------------
define i1 @ugt_swap_and_not_min(* %x, * %y) {
%0:
  %cmp = icmp ult * %y, %x
  %cmpeq = icmp ne * %x, null
  %r = and i1 %cmp, %cmpeq
  ret i1 %r
}
=>
define i1 @ugt_swap_and_not_min(* %x, * %y) {
%0:
  %cmp = icmp ult * %y, %x
  ret i1 %cmp
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
* %x = null
* %y = pointer(non-local, block_id=2, offset=0)

Source:
i1 %cmp = undef
i1 %cmpeq = #x0 (0)
i1 %r = #x0 (0)

SOURCE MEMORY STATE
===================
NON-LOCAL BLOCKS:
Block 0 >       size: 0 align: 64       alloc type: 0
Block 1 >       size: 0 align: 2        alloc type: 0
Block 2 >       size: 0 align: 2        alloc type: 0

Target:
i1 %cmp = #x1 (1)
Source value: #x0 (0)
Target value: #x1 (1)


----------------------------------------
define i1 @ugt_swap_and_not_min_commute(* %x, * %y) {
%0:
  %cmp = icmp ult * %y, %x
  %cmpeq = icmp ne * %x, null
  %r = and i1 %cmpeq, %cmp
  ret i1 %r
}
=>
define i1 @ugt_swap_and_not_min_commute(* %x, * %y) {
%0:
  %cmp = icmp ult * %y, %x
  ret i1 %cmp
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
* %x = null
* %y = pointer(non-local, block_id=1, offset=0)

Source:
i1 %cmp = undef
i1 %cmpeq = #x0 (0)
i1 %r = #x0 (0)

SOURCE MEMORY STATE
===================
NON-LOCAL BLOCKS:
Block 0 >       size: 0 align: 64       alloc type: 0
Block 1 >       size: 0 align: 2        alloc type: 0
Block 2 >       size: 0 align: 4        alloc type: 0

Target:
i1 %cmp = #x1 (1)
Source value: #x0 (0)
Target value: #x1 (1)
This revision is now accepted and ready to land.Apr 23 2020, 2:02 AM

Hi, Alive2 currently approximates comparison with null pointer, so please ignore the case. :/ The transformation looks good to me too.
On my queue there is a work to address it. I'll rerun the unit tests after it is supported.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 23 2020, 6:59 AM