Page MenuHomePhabricator

[InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X
Changes PlannedPublic

Authored by craig.topper on Jul 7 2020, 5:06 PM.

Details

Summary

As noted here https://lists.llvm.org/pipermail/llvm-dev/2016-October/106182.html and by alive2, this transform isn't valid. If X is poison this potentially propagates poison when it shouldn't.

It seems we don't have any tests for this in either InstSimplify or InstCombine. I can add negative tests if we want.

The same transform also exists in DAGCombiner.

Diff Detail

Event Timeline

craig.topper created this revision.Jul 7 2020, 5:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 7 2020, 5:06 PM

I do think we should have (miscompiled) tests in all places first.

nlopes accepted this revision.Jul 8 2020, 1:35 AM

Here's an end-to-end miscompilation: https://bugs.llvm.org/show_bug.cgi?id=31633

This revision is now accepted and ready to land.Jul 8 2020, 1:35 AM
nlopes added a subscriber: regehr.Jul 8 2020, 1:36 AM

Here's an end-to-end miscompilation: https://bugs.llvm.org/show_bug.cgi?id=31633

Sure, but we still need to have a test with comment that it should not be folded, referencing all this.

spatel added a comment.Jul 8 2020, 4:47 AM

Here's an end-to-end miscompilation: https://bugs.llvm.org/show_bug.cgi?id=31633

Sure, but we still need to have a test with comment that it should not be folded, referencing all this.

Yes - we need ~4 tests ( {trueval/falseval} * {scalar/vector} ) to ensure that the transform doesn't get mistakenly re-added. Also, there's a clang test that is going to fail with this change:
clang/test/CodeGen/arm-mve-intrinsics/dup.c
That file tried to be minimally dependent on opt, but still used -early-cse which calls instsimplify for analysis.

nikic added a subscriber: nikic.Jul 8 2020, 10:04 AM

Add tests for InstCombine and InstSimplfy.
Update frontend test

spatel accepted this revision.Jul 8 2020, 11:46 AM

LGTM

lebedev.ri accepted this revision.Jul 8 2020, 11:46 AM

LG, thank you.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJul 8 2020, 12:53 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Please also add testcases with select constant expressions, to test constant folding.

Please also add testcases with select constant expressions, to test constant folding.

Should we remove the handling from llvm::ConstantFoldSelectInstruction

Should we remove the handling from llvm::ConstantFoldSelectInstruction

It seems silly to remove the handling from InstSimplify, but not constant folding.

majnemer added inline comments.
llvm/lib/Analysis/InstructionSimplify.cpp
4121–4125

Can we still do these optimizations when X is a frozen value?

Wasn't @majnemer asking about

define i32 @src(i1 %cond, i32 %x) {
  %xf = freeze i32 %x
  %s = select i1 %cond, i32 %xf, i32 undef
  ret i32 %s
}

which is legal. I'm going to work on supporting known non-poison cases.

regehr added a comment.Jul 8 2020, 2:48 PM

@craig.topper ok, I agree that should work. alive doesn't like it -- is this an alive bug @nlopes? a freeze should not yield undef.
https://alive2.llvm.org/ce/z/mWAsYv

@craig.topper ok, I agree that should work. alive doesn't like it -- is this an alive bug @nlopes? a freeze should not yield undef.
https://alive2.llvm.org/ce/z/mWAsYv

Did you mean to check something like the following?

define i32 @src(i1 %cond, i32 %x) {
  %x2 = freeze i32 %x
  %s = select i1 %cond, i32 %x2, i32 undef
  ret i32 %s
}

define i32 @tgt(i1 %cond, i32 %x) {
  %x2 = freeze i32 %x
  ret i32 %x2
}

Alive does like this https://alive2.llvm.org/ce/z/yhibbe which is what I was going to implement.

nlopes added a comment.Jul 8 2020, 3:18 PM

Alive does like this https://alive2.llvm.org/ce/z/yhibbe which is what I was going to implement.

right. There's a guaranteedNonPoison (or similar name) in ValueTracking that can be used I guess.

regehr added a comment.Jul 8 2020, 6:42 PM

Did you mean to check something like the following?

define i32 @src(i1 %cond, i32 %x) {
  %x2 = freeze i32 %x
  %s = select i1 %cond, i32 %x2, i32 undef
  ret i32 %s
}

define i32 @tgt(i1 %cond, i32 %x) {
  %x2 = freeze i32 %x
  ret i32 %x2
}

that's fine but I still don't understand why the counterexample to my version says %x2 in @src can be undef

efriedma added a subscriber: tgt.Jul 9 2020, 1:05 PM

that's fine but I still don't understand why the counterexample to my version says %x2 in @src can be undef

If I'm understanding correctly, this reduces to something like the following:

define i32 @src() {

%x2 = freeze i32 undef
ret i32 %x2

}

define i32 @tgt() {

ret i32 undef

}

This seems a little suspect, yes.

efriedma removed a subscriber: tgt.Jul 9 2020, 1:06 PM
nlopes added a subscriber: tgt.Jul 9 2020, 2:18 PM

that's fine but I still don't understand why the counterexample to my version says %x2 in @src can be undef

If I'm understanding correctly, this reduces to something like the following:

define i32 @src() {

%x2 = freeze i32 undef
ret i32 %x2

}

define i32 @tgt() {

ret i32 undef

}

This seems a little suspect, yes.

This is a known bug: https://github.com/AliveToolkit/alive2/issues/3
gotta fix this soon.

After this patch we have false msan reports on code like this:

bool iv_compare2(const int *op1, const int *op2) {
  if (op1[1] != op2[1]) return op1[1] < op2[1];
  for (int i = 1; i >= 0; i--) {
    if (op1[i] != op2[i]) return op1[i] < op2[i];
  }
  return false;
}

void foo() {
  int a[2] = {};
  int b[2] = {};
  auto UNINITIALIZED= iv_compare2(a, b);
}

Here it looks fine and the same as before the patch. It returns "and undef, false" which should be false.

*** IR Dump After Simplify the CFG ***
; Function Attrs: norecurse nounwind readonly sanitize_memory uwtable
define zeroext i1 @_Z11iv_compare2PKiS0_(i32* nocapture readonly %op1, i32* nocapture readonly %op2) local_unnamed_addr #0 !dbg !8 {
entry:
  %arrayidx = getelementptr inbounds i32, i32* %op1, i64 1, !dbg !10
  %0 = load i32, i32* %arrayidx, align 4, !dbg !10
  %arrayidx1 = getelementptr inbounds i32, i32* %op2, i64 1, !dbg !11
  %1 = load i32, i32* %arrayidx1, align 4, !dbg !11
  %cmp.not = icmp eq i32 %0, %1, !dbg !12
  br i1 %cmp.not, label %for.cond, label %if.then, !dbg !10

if.then:                                          ; preds = %entry
  %cmp4 = icmp slt i32 %0, %1, !dbg !13
  ret i1 %cmp4, !dbg !14

for.cond:                                         ; preds = %entry
  %2 = load i32, i32* %op1, align 4, !dbg !15
  %3 = load i32, i32* %op2, align 4, !dbg !16
  %cmp9.not.1 = icmp eq i32 %2, %3, !dbg !17
  %cmp15 = icmp slt i32 %2, %3
  %spec.select39 = select i1 %cmp9.not.1, i1 undef, i1 %cmp15, !dbg !15
  %spec.select40 = select i1 %cmp9.not.1, i1 false, i1 true, !dbg !15
  %spec.select = and i1 %spec.select39, %spec.select40
  ret i1 %spec.select
}

However with this patch after the next transformation it breaks the code:
Now it returns undef instead of false if %2 == %3

*** IR Dump After Combine redundant instructions ***
; Function Attrs: norecurse nounwind readonly sanitize_memory uwtable
define zeroext i1 @_Z11iv_compare2PKiS0_(i32* nocapture readonly %op1, i32* nocapture readonly %op2) local_unnamed_addr #0 !dbg !8 {
entry:
  %arrayidx = getelementptr inbounds i32, i32* %op1, i64 1, !dbg !10
  %0 = load i32, i32* %arrayidx, align 4, !dbg !10
  %arrayidx1 = getelementptr inbounds i32, i32* %op2, i64 1, !dbg !11
  %1 = load i32, i32* %arrayidx1, align 4, !dbg !11
  %cmp.not = icmp eq i32 %0, %1, !dbg !12
  br i1 %cmp.not, label %for.cond, label %if.then, !dbg !10

if.then:                                          ; preds = %entry
  %cmp4 = icmp slt i32 %0, %1, !dbg !13
  ret i1 %cmp4, !dbg !14

for.cond:                                         ; preds = %entry
  %2 = load i32, i32* %op1, align 4, !dbg !15
  %3 = load i32, i32* %op2, align 4, !dbg !16
  %cmp9.not.1 = icmp eq i32 %2, %3, !dbg !17
  %cmp15 = icmp slt i32 %2, %3
  %spec.select39 = select i1 %cmp9.not.1, i1 undef, i1 %cmp15, !dbg !15
  ret i1 %spec.select39
}

Then msan reasonably reports a bug.

Seems like a bug in instsimplify:

define i1 @f(i32 %x, i32 %y) {
  %cmp9.not.1 = icmp eq i32 %x, %y
  %cmp15      = icmp slt i32 %x, %y
  %spec.select39 = select i1 %cmp9.not.1, i1 undef, i1 %cmp15
  %spec.select40 = xor i1 %cmp9.not.1, 1
  %spec.select  = and i1 %spec.select39, %spec.select40
  ret i1 %spec.select
}
=>
define i1 @f(i32 %x, i32 %y) {
  %cmp9.not.1 = icmp eq i32 %x, %y
  %cmp15 = icmp slt i32 %x, %y
  %spec.select39 = select i1 %cmp9.not.1, i1 undef, i1 %cmp15
  ret i1 %spec.select39
}

https://godbolt.org/z/a8f7hT

Alive2 says it's incorrect: https://alive2.llvm.org/ce/z/-8Q4HL

Seems to be related with ValueTracking's isImpliedCondition since this optimizations happens only when operands of the two icmps are the same.

(renaming variables for readability)

%a = select i1 %s, i1 undef, i1 %t
%b = xor i1 %s, 1
%c = and i1 %a, %b

This series of reasoning happened from a single SimplifyAndInst call:

c = a & (s ^ 1)
  = (a & s) ^ (a & 1)                    ; ExpandBinOp
  = ((select s, undef, t) & s) ^ a
  = (select s, (undef & s), (t & s)) ^ a ; ThreadBinOpOverSelect
  = (select s, (undef & s), false) ^ a   ; since s = (x == y), t = (x < y)
  = (select s, false, false) ^ a         ; choose undef to be false
  = a
  = select s, undef, t

In general, distributing a into operands of xor (second line) isn't sound because it increases the number of uses of a. We don't want to totally disable the simplification, however.

If InstSimplify never increases the number of uses in the end, we have an alternative solution: tracking to which value undef is folded.
Whenever an undef value is chosen to be a concrete value, the decision should be remembered, so the copied undefs won't be folded into different values.
In case of InstSimplify, we can identify individual undefs by Use, since InstSimplify won't do any transformation inside.
This means SimplifyXXX needs to return two things: the simplified value & the undef cache. Since InstSimplify isn't designed to do transformation directly, other optimizations like InstCombine should perform the final change.

Does this solution make sense? Then, I can prepare a patch for this.

FYI InstSimplify doing distribution of undef is a known bug: https://bugs.llvm.org/show_bug.cgi?id=33165

This seems like something that we should then revert until we know that instsimplify can be updated with a fix?

This seems like something that we should then revert until we know that instsimplify can be updated with a fix?

Possibility for a miscompile sounds much worse to me than a false-positive diag..

We're starting to see miscompiles as we do more testing as well, just nothing smaller at the moment.

We're starting to see miscompiles as we do more testing as well, just nothing smaller at the moment.

Could you clarify, do you mean that this fix is causing (new) miscompiles?

It's that even before the msan instrumentation the IR doesn't look correct - thus a miscompile.

It's that even before the msan instrumentation the IR doesn't look correct - thus a miscompile.

I'll share a prototype of the InstSimplify patch on Phabricator, in a day or two; it would be great if the miscompilation is fixed with the patch.

It's that even before the msan instrumentation the IR doesn't look correct - thus a miscompile.

I'll share a prototype of the InstSimplify patch on Phabricator, in a day or two; it would be great if the miscompilation is fixed with the patch.

Would it be reasonable to revert this in the meantime?

I'm going to revert this as Eric requested. And I'll ask to merge the revert to the 11 branch.

This revision is now accepted and ready to land.Wed, Jul 15, 10:09 PM

@aqjune did you put a patch for InstSimplify doing distribution over undef yet?

@aqjune did you put a patch for InstSimplify doing distribution over undef yet?

Sorry, making InstSimplify to safely distribute undef was a nontrivial job - other than updating InstSimplify to track uses, it needed rewinding folded undef records if simplification failed & update constant folding to track uses too. The folded value could be symbolic, making things more complex.
I'm trying an alternative solution for this problem, I will leave a link here after submission.