This is an archive of the discontinued LLVM Phabricator instance.

[ValueTracking] Use knownbits interface for determining if `div`/`rem` are safe to speculate
AcceptedPublic

Authored by goldstein.w.n on Apr 27 2023, 11:23 PM.

Details

Summary

This just replaces the exact constant requirements with known-bits
which can prove better results.

This change also adds a new flag PreservesOpCharacteristics.
If PreservesOpCharacteristics is true, isSafeToSpeculativelyExecute{WithOpcode}
will use knownbits analysis to help determine if the Inst is safe
to speculatively execute. This can improve accuracy, but makes it easier to
misuse. For example if isSafeToSpeculativelyExecute{WithOpcode} returns
true for an Inst, then a Transform modifies the operands or hoists it from a
BB that had a dominating condition relevant to one of the operands, the analysis
done by isSafeToSpeculativelyExecute{WithOpcode} may no longer hold true.
If the user is certain their use-case doesn't change any of the characteristics of
the operands, then this is the better option. So for udiv, it would be a misuse
to set PreservesOpCharacteristics to true, then use the result to assume its
safe to mask/truncate the denominator.

This patch goes through the uses of isSafeToSpeculativelyExecute{WithOpcode}
and sets that flag. I tried to be conservative about where I set it to true.

Diff Detail

Event Timeline

goldstein.w.n created this revision.Apr 27 2023, 11:23 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2023, 11:23 PM
goldstein.w.n requested review of this revision.Apr 27 2023, 11:23 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2023, 11:23 PM
nikic accepted this revision.Apr 28 2023, 1:10 AM

LGTM

llvm/lib/Analysis/ValueTracking.cpp
6059–6060

We don't know it is -1 anymore, just that it might be.

This revision is now accepted and ready to land.Apr 28 2023, 1:10 AM
goldstein.w.n marked an inline comment as done.Apr 29 2023, 9:25 AM

Update comment + rebase

Rebase with tests in LICM

nikic added a comment.May 1 2023, 12:08 AM

Ooops, I missed an important pre-condition here: You also need to check that the value isGuaranteedNotToBePoison. All the "known" APIs have an implicit "OrPoison" at the end, and this is one of the case where it makes a difference, as division by poison is UB as well. See for example https://alive2.llvm.org/ce/z/WAFYny.

nikic added a comment.May 1 2023, 12:14 AM

Ooops, I missed an important pre-condition here: You also need to check that the value isGuaranteedNotToBePoison. All the "known" APIs have an implicit "OrPoison" at the end, and this is one of the case where it makes a difference, as division by poison is UB as well. See for example https://alive2.llvm.org/ce/z/WAFYny.

Unfortunately alive2 doesn't detect this in the LICM tests. It looks like we should be able to test this in a way that alive2 understands using SimplifyCFG block speculation: https://llvm.godbolt.org/z/ehbMKqqxz Apparently this transform is willing to always speculate one instruction even if it's expensive, so it should work well for this purpose and be understood by alive.

cmtice added a subscriber: cmtice.May 1 2023, 11:12 AM

Just a heads up: This commit appears to be breaking one of our tests. I'll see if I can get a reproducible test case. Would you consider reverting this in the meantime?

Just a heads up: This commit appears to be breaking one of our tests. I'll see if I can get a reproducible test case. Would you consider reverting this in the meantime?

Done.

nikic added a comment.May 1 2023, 11:46 AM

I'd expect the test failure is due to the poison handling issue mentioned above.

I'd expect the test failure is due to the poison handling issue mentioned above.

Oh I didn't see that. Will have fix up for that shortly.

goldstein.w.n reopened this revision.May 1 2023, 12:27 PM
This revision is now accepted and ready to land.May 1 2023, 12:27 PM

Also check for poison/undef denom

Make poison only: https://alive2.llvm.org/ce/z/vEMLxe seems to be okay if undef included

@cmtice think fix is up. Any chance you can verify?

nikic added inline comments.May 1 2023, 12:39 PM
llvm/lib/Analysis/ValueTracking.cpp
6042

Omit Depth=0, which is the default.

6046

either case -> cases or need -> needs.

6062

This also needs to check that op0 is not poison.

goldstein.w.n marked 3 inline comments as done.

Also check poison on -1 denum case

nikic accepted this revision.May 1 2023, 1:01 PM

LGTM

cmtice added a comment.May 1 2023, 1:17 PM

Yes, you're updated patch seems to have fixed my problem :-)

@goldstein.w.n The newest commit seams to break speculate-div.ll test. I'm working on a reproduciton in upstream, but in our setup the
output is:

llvm/test/Transforms/LICM/speculate-div.ll:54:15: error: CHECK-NEXT: is not on the line after the previous match
; CHECK-NEXT: br label [[LOOP:%.*]]
              ^
<stdin>:37:2: note: 'next' match was here
 br label %loop
 ^
<stdin>:35:23: note: previous match ended here
 %x = and i16 %xo, 123
                      ^
<stdin>:36:1: note: non-matching line after previous match is here
 %div = sdiv i16 %n, %x
^
llvm/test/Transforms/LICM/speculate-div.ll:77:15: error: CHECK-NEXT: is not on the line after the previous match
; CHECK-NEXT: br label [[LOOP:%.*]]
              ^
<stdin>:50:2: note: 'next' match was here
 br label %loop
 ^
<stdin>:48:20: note: previous match ended here
 %x = or i16 %xx, 1
                   ^
<stdin>:49:1: note: non-matching line after previous match is here
 %div = srem i16 %n, %x
^
llvm/test/Transforms/LICM/speculate-div.ll:166:15: error: CHECK-NEXT: is not on the line after the previous match
; CHECK-NEXT: br label [[LOOP:%.*]]
              ^
<stdin>:100:2: note: 'next' match was here
 br label %loop
 ^
<stdin>:98:20: note: previous match ended here
 %x = or i16 %xx, 1
                   ^
<stdin>:99:1: note: non-matching line after previous match is here
 %div = udiv i16 %n, %x
^

Input file: <stdin>
Check file: llvm/test/Transforms/LICM/speculate-div.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
          .
          .
          .
         32: define void @sdiv_ok(i16 %n, i16 noundef %xx) { 
         33: entry: 
         34:  %xo = or i16 %xx, 1 
         35:  %x = and i16 %xo, 123 
         36:  %div = sdiv i16 %n, %x 
         37:  br label %loop 
next:54       !~~~~~~~~~~~~~  error: match on wrong line
         38:  
         39: loop: ; preds = %loop, %entry 
         40:  call void @maythrow() 
         41:  call void @use(i16 %div) 
         42:  br label %loop 
         43: } 
         44:  
         45: define void @srem_ok2(i16 noundef %nn, i16 noundef %xx) { 
         46: entry: 
         47:  %n = and i16 %nn, 123 
         48:  %x = or i16 %xx, 1 
         49:  %div = srem i16 %n, %x 
         50:  br label %loop 
next:77       !~~~~~~~~~~~~~  error: match on wrong line
         51:  
         52: loop: ; preds = %loop, %entry 
         53:  call void @maythrow() 
         54:  call void @use(i16 %div) 
         55:  br label %loop 
          .
          .
          .
         95:  
         96: define void @udiv_ok(i16 %n, i16 noundef %xx) { 
         97: entry: 
         98:  %x = or i16 %xx, 1 
         99:  %div = udiv i16 %n, %x 
        100:  br label %loop 
next:166      !~~~~~~~~~~~~~  error: match on wrong line
        101:  
        102: loop: ; preds = %loop, %entry 
        103:  call void @maythrow() 
        104:  call void @use(i16 %div) 
        105:  br label %loop 
          .
          .
          .
>>>>>>

It passes upstream though. Probably the issue is with lit version.

It passes upstream though. Probably the issue is with lit version.

@steelannelida where you able to resolve the issue or do you think the patch is still broken?

@goldstein.w.n the problem is on our side, thanks!

alexfh added a subscriber: alexfh.May 9 2023, 3:21 PM

This patch is causing a miscompile:

$ cat q.cc 
extern unsigned q();
int main() {
  unsigned qq = q();
  if (qq == 0) qq = 1;
  int rows = 8192 / qq;
  if (rows == 0) rows = 1;
  return rows;
}
$ cat w.cc
unsigned q() { return 524288; }
$ clang -fsanitize=memory -O2 q.cc w.cc && ./a.out
MemorySanitizer:DEADLYSIGNAL
==885020==ERROR: MemorySanitizer: FPE on unknown address 0x0000002cc0c4 (pc 0x0000002cc0c4 bp 0x000000000001 sp 0x7ffd25df6e50 T885020)
...

Please revert or fix soon. Thanks!

alexfh added a comment.May 9 2023, 3:29 PM
$ clang-6cdc229a64be8dacba40d30a9032c14f51ee30c0 -fsanitize=memory -O2 q.cc -S -emit-llvm -o good.ll
$ clang-6c667abf3294d61e4fbe1238e1755c79f7547f1b -fsanitize=memory -O2 q.cc -S -emit-llvm -o bad.ll
$ diff -u good.ll bad.ll
--- good.ll
+++ bad.ll
@@ -11,8 +11,10 @@
   %call = tail call noundef i32 @_Z1qv()
   %spec.store.select = tail call i32 @llvm.umax.i32(i32 %call, i32 1)
   %1 = icmp ugt i32 %spec.store.select, 8192
-  %div = udiv i32 8192, %spec.store.select
-  %spec.store.select4 = select i1 %1, i32 1, i32 %div
+  %div.rhs.trunc = trunc i32 %spec.store.select to i16
+  %div7 = udiv i16 8192, %div.rhs.trunc
+  %div.zext = zext i16 %div7 to i32
+  %spec.store.select4 = select i1 %1, i32 1, i32 %div.zext
   ret i32 %spec.store.select4
 }
 
@@ -40,4 +42,4 @@
 
 !0 = !{i32 1, !"wchar_size", i32 4}
 !1 = !{i32 7, !"uwtable", i32 2}
-!2 = !{!"clang version trunk (6cdc229a64be8dacba40d30a9032c14f51ee30c0)"}
+!2 = !{!"clang version trunk (6c667abf3294d61e4fbe1238e1755c79f7547f1b)"}
alexfh added a comment.May 9 2023, 3:33 PM

For clarity: I'm talking about the second attempt on this (https://reviews.llvm.org/rG6c667abf3294d61e4fbe1238e1755c79f7547f1b). For some reason, it's missing a link to this review thread.

$ clang-6cdc229a64be8dacba40d30a9032c14f51ee30c0 -fsanitize=memory -O2 q.cc -S -emit-llvm -o good.ll
$ clang-6c667abf3294d61e4fbe1238e1755c79f7547f1b -fsanitize=memory -O2 q.cc -S -emit-llvm -o bad.ll
$ diff -u good.ll bad.ll
--- good.ll
+++ bad.ll
@@ -11,8 +11,10 @@
   %call = tail call noundef i32 @_Z1qv()
   %spec.store.select = tail call i32 @llvm.umax.i32(i32 %call, i32 1)
   %1 = icmp ugt i32 %spec.store.select, 8192
-  %div = udiv i32 8192, %spec.store.select
-  %spec.store.select4 = select i1 %1, i32 1, i32 %div
+  %div.rhs.trunc = trunc i32 %spec.store.select to i16
+  %div7 = udiv i16 8192, %div.rhs.trunc
+  %div.zext = zext i16 %div7 to i32
+  %spec.store.select4 = select i1 %1, i32 1, i32 %div.zext
   ret i32 %spec.store.select4
 }
 
@@ -40,4 +42,4 @@
 
 !0 = !{i32 1, !"wchar_size", i32 4}
 !1 = !{i32 7, !"uwtable", i32 2}
-!2 = !{!"clang version trunk (6cdc229a64be8dacba40d30a9032c14f51ee30c0)"}
+!2 = !{!"clang version trunk (6c667abf3294d61e4fbe1238e1755c79f7547f1b)"}

I can't see how this change causes this issue. Somehow making the udiv speculative (which is it)
is affecting the ConstantRange pass which in turn allows for narrowUDivOrURem.

I think there must be a misuse of isSafeToSpeculativelyExecute somewhere but can't quite
find it.

goldstein.w.n added a comment.EditedMay 9 2023, 4:02 PM

This patch is causing a miscompile:

$ cat q.cc 
extern unsigned q();
int main() {
  unsigned qq = q();
  if (qq == 0) qq = 1;
  int rows = 8192 / qq;
  if (rows == 0) rows = 1;
  return rows;
}
$ cat w.cc
unsigned q() { return 524288; }
$ clang -fsanitize=memory -O2 q.cc w.cc && ./a.out
MemorySanitizer:DEADLYSIGNAL
==885020==ERROR: MemorySanitizer: FPE on unknown address 0x0000002cc0c4 (pc 0x0000002cc0c4 bp 0x000000000001 sp 0x7ffd25df6e50 T885020)
...

Please revert or fix soon. Thanks!

Reverting, wasn't able to track down the issue. I think this showing another bug
elsewhere but will revert while I investigate.

Edit: Checking build on revert, thats the delay

But the reason this change causes the bug is because:

if (!CurrI->hasOneUse() || !isSafeToSpeculativelyExecute(CurrI))

Which short stop getConstantRangeAtUse. I think there is almost
definetly a bug in getConstantRangeAtUse or the trunc logic. It was
just hidden because until now isSafeToSpeculativelyExecute only
worked if the div denominator was constant.

goldstein.w.n added a comment.EditedMay 9 2023, 4:27 PM
$ clang-6cdc229a64be8dacba40d30a9032c14f51ee30c0 -fsanitize=memory -O2 q.cc -S -emit-llvm -o good.ll
$ clang-6c667abf3294d61e4fbe1238e1755c79f7547f1b -fsanitize=memory -O2 q.cc -S -emit-llvm -o bad.ll
$ diff -u good.ll bad.ll
--- good.ll
+++ bad.ll
@@ -11,8 +11,10 @@
   %call = tail call noundef i32 @_Z1qv()
   %spec.store.select = tail call i32 @llvm.umax.i32(i32 %call, i32 1)
   %1 = icmp ugt i32 %spec.store.select, 8192
-  %div = udiv i32 8192, %spec.store.select
-  %spec.store.select4 = select i1 %1, i32 1, i32 %div
+  %div.rhs.trunc = trunc i32 %spec.store.select to i16
+  %div7 = udiv i16 8192, %div.rhs.trunc
+  %div.zext = zext i16 %div7 to i32
+  %spec.store.select4 = select i1 %1, i32 1, i32 %div.zext
   ret i32 %spec.store.select4
 }
 
@@ -40,4 +42,4 @@
 
 !0 = !{i32 1, !"wchar_size", i32 4}
 !1 = !{i32 7, !"uwtable", i32 2}
-!2 = !{!"clang version trunk (6cdc229a64be8dacba40d30a9032c14f51ee30c0)"}
+!2 = !{!"clang version trunk (6c667abf3294d61e4fbe1238e1755c79f7547f1b)"}

Ah, I think I found the bug.

In

declare i32 @llvm.umax.i32(i32, i32)
define dso_local noundef i32 @main(i32 %call, i32 %v) local_unnamed_addr {
entry:
  %spec.store.select = call i32 @llvm.umax.i32(i32 %call, i32 1)
  %div = udiv i32 8192, %spec.store.select
  %cmp1 = icmp ugt i32 %spec.store.select, 8192
  %spec.store.select4 = select i1 %cmp1, i32 1, i32 %div
  ret i32 %spec.store.select4
}

getConstantRangeAtUse is using the compare after the udiv:
%cmp1 = icmp ugt i32 %spec.store.select, 8192
To determine that the denominator of udiv <= 8192. This is because
if the denominator > 8192, the udiv won't be chosen by the select:
%spec.store.select4 = select i1 %cmp1, i32 1, i32 %div.
It's only allowed to search past the udiv because its speculatively executable.

The issue is CorrelatedValuePropegation is using the analysis that
relies on the udiv being speculatively executable to modify the udiv
operands which may (in this case does) invalidate the lemma of the
analysis.

Short term fix is obviously revert (still building), but longer term
I guess we need an API for "is always speculatively executable"?
or something like that.
@nikic, any thoughts on how best to proceed in trying to safely
strengthen the isSpeculativelyExecutable case?

alexfh added a comment.May 9 2023, 5:31 PM

Thanks for taking care of this!

Short term fix is obviously revert (still building), but longer term
I guess we need an API for "is always speculatively executable"?
or something like that.
@nikic, any thoughts on how best to proceed in trying to safely
strengthen the isSpeculativelyExecutable case?

I think strengthening the 'isSpeculativlyExecutable' is a bit conservative. I found that this bug is only exposed in CVP's narrowUDivOrURem, we can add some checks to avoid it.

Short term fix is obviously revert (still building), but longer term
I guess we need an API for "is always speculatively executable"?
or something like that.
@nikic, any thoughts on how best to proceed in trying to safely
strengthen the isSpeculativelyExecutable case?

I think strengthening the 'isSpeculativlyExecutable' is a bit conservative. I found that this bug is only exposed in CVP's narrowUDivOrURem, we can add some checks to avoid it.

I have implemented that at D150353 :)

Short term fix is obviously revert (still building), but longer term
I guess we need an API for "is always speculatively executable"?
or something like that.
@nikic, any thoughts on how best to proceed in trying to safely
strengthen the isSpeculativelyExecutable case?

I think strengthening the 'isSpeculativlyExecutable' is a bit conservative. I found that this bug is only exposed in CVP's narrowUDivOrURem, we can add some checks to avoid it.

I tend to agree that this issue leans more on narrowUDivOrURem but are these implicit assumptions elsewhere in the project? If the common usage of isSpeculativelyExecutable has come to mean that its speculatively executable even if the non-const operands change then it is a bug in the change to isSpeculativelyExecutable, not with narrowUDivOrURem

Short term fix is obviously revert (still building), but longer term
I guess we need an API for "is always speculatively executable"?
or something like that.
@nikic, any thoughts on how best to proceed in trying to safely
strengthen the isSpeculativelyExecutable case?

I think strengthening the 'isSpeculativlyExecutable' is a bit conservative. I found that this bug is only exposed in CVP's narrowUDivOrURem, we can add some checks to avoid it.

I tend to agree that this issue leans more on narrowUDivOrURem but are these implicit assumptions elsewhere in the project? If the common usage of isSpeculativelyExecutable has come to mean that its speculatively executable even if the non-const operands change then it is a bug in the change to isSpeculativelyExecutable, not with narrowUDivOrURem

In GuardWidening there's a similar problem: when it first analyzes code, it calls isSafeToSpeculativelyExecute for a pair {instruction, insertion point} (here) and then it starts hoisting the instruction along with its operands, starting from the operands. While hoisting, it asserts that isSafeToSpeculativelyExecute should be true (here) since it should have been checked prior to hoisting and should still remain true. With this patch that assertion in some cases fails after the operands get hoisted. The return value of isSafeToSpeculativelyExecute changes because isGuaranteedNotToBePoison for some reason fails to prove that the denominator isn't poison, even though the hoisting had no impact on what values the operands can have. I understand that isGuaranteedNotToBePoison by its nature is always allowed to return false, so I don't think there's any problem with it.
To illustrate my point, here's what the function looks like when GuardWidening calls isSafeToSpeculativelyExecute('%sdiv', '%call') for the first time:

define void @foo() {
bb:
  br label %bb1

bb1:                                              ; preds = %bb8, %bb
  %phi = phi i32 [ 1, %bb ], [ %add9, %bb8 ]
  %call = call i1 @llvm.experimental.widenable.condition()
  br i1 %call, label %bb3, label %bb2

bb2:                                              ; preds = %bb1
  call void (...) @llvm.experimental.deoptimize.isVoid() [ "deopt"() ]
  ret void

bb3:                                              ; preds = %bb1
  %zext = zext i32 %phi to i64
  %sdiv = sdiv i64 1, %zext
  %add = add nuw nsw i64 %zext, 1
  %add4 = add nsw i64 %add, 1
  %add5 = add nsw i64 %add4, %sdiv
  %icmp = icmp sgt i64 %add5, 1
  %call6 = call i1 @llvm.experimental.widenable.condition()
  %and = and i1 %icmp, %call6
  br i1 %and, label %bb8, label %bb7

bb7:                                              ; preds = %bb3
  call void (...) @llvm.experimental.deoptimize.isVoid() [ "deopt"() ]
  ret void

bb8:                                              ; preds = %bb3
  %add9 = add nuw nsw i32 %phi, 1
  br label %bb1
}

And here's what the IR looks like when it calls isSafeToSpeculativelyExecute('%sdiv', '%call') the second time:

define void @foo() {
bb:
  br label %bb1

bb1:                                              ; preds = %bb8, %bb
  %phi = phi i32 [ 1, %bb ], [ %add9, %bb8 ]
  %zext = zext i32 %phi to i64
  %add = add nuw nsw i64 %zext, 1
  %add4 = add nsw i64 %add, 1
  %call = call i1 @llvm.experimental.widenable.condition()
  br i1 %call, label %bb3, label %bb2

bb2:                                              ; preds = %bb1
  call void (...) @llvm.experimental.deoptimize.isVoid() [ "deopt"() ]
  ret void

bb3:                                              ; preds = %bb1
  %sdiv = sdiv i64 1, %zext
  %add5 = add nsw i64 %add4, %sdiv
  %icmp = icmp sgt i64 %add5, 1
  %call6 = call i1 @llvm.experimental.widenable.condition()
  %and = and i1 %icmp, %call6
  br i1 %and, label %bb8, label %bb7

bb7:                                              ; preds = %bb3
  call void (...) @llvm.experimental.deoptimize.isVoid() [ "deopt"() ]
  ret void

bb8:                                              ; preds = %bb3
  %add9 = add nuw nsw i32 %phi, 1
  br label %bb1
}

The only difference is that %zext, %add and %add4 were hoisted to bb1.

That problem isn't critical, and it probably can be resolved by removing the corresponding part of the assertion, but it demonstrates that some users indeed expect isSafeToSpeculativelyExecute to be resilient in face of some changes to the operands.

Short term fix is obviously revert (still building), but longer term
I guess we need an API for "is always speculatively executable"?
or something like that.
@nikic, any thoughts on how best to proceed in trying to safely
strengthen the isSpeculativelyExecutable case?

I think strengthening the 'isSpeculativlyExecutable' is a bit conservative. I found that this bug is only exposed in CVP's narrowUDivOrURem, we can add some checks to avoid it.

I tend to agree that this issue leans more on narrowUDivOrURem but are these implicit assumptions elsewhere in the project? If the common usage of isSpeculativelyExecutable has come to mean that its speculatively executable even if the non-const operands change then it is a bug in the change to isSpeculativelyExecutable, not with narrowUDivOrURem

In GuardWidening there's a similar problem: when it first analyzes code, it calls isSafeToSpeculativelyExecute for a pair {instruction, insertion point} (here) and then it starts hoisting the instruction along with its operands, starting from the operands. While hoisting, it asserts that isSafeToSpeculativelyExecute should be true (here) since it should have been checked prior to hoisting and should still remain true. With this patch that assertion in some cases fails after the operands get hoisted. The return value of isSafeToSpeculativelyExecute changes because isGuaranteedNotToBePoison for some reason fails to prove that the denominator isn't poison, even though the hoisting had no impact on what values the operands can have. I understand that isGuaranteedNotToBePoison by its nature is always allowed to return false, so I don't think there's any problem with it.
To illustrate my point, here's what the function looks like when GuardWidening calls isSafeToSpeculativelyExecute('%sdiv', '%call') for the first time:

define void @foo() {
bb:
  br label %bb1

bb1:                                              ; preds = %bb8, %bb
  %phi = phi i32 [ 1, %bb ], [ %add9, %bb8 ]
  %call = call i1 @llvm.experimental.widenable.condition()
  br i1 %call, label %bb3, label %bb2

bb2:                                              ; preds = %bb1
  call void (...) @llvm.experimental.deoptimize.isVoid() [ "deopt"() ]
  ret void

bb3:                                              ; preds = %bb1
  %zext = zext i32 %phi to i64
  %sdiv = sdiv i64 1, %zext
  %add = add nuw nsw i64 %zext, 1
  %add4 = add nsw i64 %add, 1
  %add5 = add nsw i64 %add4, %sdiv
  %icmp = icmp sgt i64 %add5, 1
  %call6 = call i1 @llvm.experimental.widenable.condition()
  %and = and i1 %icmp, %call6
  br i1 %and, label %bb8, label %bb7

bb7:                                              ; preds = %bb3
  call void (...) @llvm.experimental.deoptimize.isVoid() [ "deopt"() ]
  ret void

bb8:                                              ; preds = %bb3
  %add9 = add nuw nsw i32 %phi, 1
  br label %bb1
}

And here's what the IR looks like when it calls isSafeToSpeculativelyExecute('%sdiv', '%call') the second time:

define void @foo() {
bb:
  br label %bb1

bb1:                                              ; preds = %bb8, %bb
  %phi = phi i32 [ 1, %bb ], [ %add9, %bb8 ]
  %zext = zext i32 %phi to i64
  %add = add nuw nsw i64 %zext, 1
  %add4 = add nsw i64 %add, 1
  %call = call i1 @llvm.experimental.widenable.condition()
  br i1 %call, label %bb3, label %bb2

bb2:                                              ; preds = %bb1
  call void (...) @llvm.experimental.deoptimize.isVoid() [ "deopt"() ]
  ret void

bb3:                                              ; preds = %bb1
  %sdiv = sdiv i64 1, %zext
  %add5 = add nsw i64 %add4, %sdiv
  %icmp = icmp sgt i64 %add5, 1
  %call6 = call i1 @llvm.experimental.widenable.condition()
  %and = and i1 %icmp, %call6
  br i1 %and, label %bb8, label %bb7

bb7:                                              ; preds = %bb3
  call void (...) @llvm.experimental.deoptimize.isVoid() [ "deopt"() ]
  ret void

bb8:                                              ; preds = %bb3
  %add9 = add nuw nsw i32 %phi, 1
  br label %bb1
}

The only difference is that %zext, %add and %add4 were hoisted to bb1.

That problem isn't critical, and it probably can be resolved by removing the corresponding part of the assertion, but it demonstrates that some users indeed expect isSafeToSpeculativelyExecute to be resilient in face of some changes to the operands.

Yeah thats what I suspected. Maybe a new API is inorder? isSafeToSpeculativelyExecuteConst so that things like hoisting (which don't change the operands) can use the improve analysis but not things like GuardWidening or CVP. We could then replace where it makes sense rather than forcing the change to a 100 places which may rely on the _may change_ behavior.

Maybe a new API is inorder? isSafeToSpeculativelyExecuteConst so that things like hoisting (which don't change the operands) can use the improve analysis but not things like GuardWidening or CVP.

Yeah, that may be a good idea.

Short term fix is obviously revert (still building), but longer term
I guess we need an API for "is always speculatively executable"?
or something like that.
@nikic, any thoughts on how best to proceed in trying to safely
strengthen the isSpeculativelyExecutable case?

I think strengthening the 'isSpeculativlyExecutable' is a bit conservative. I found that this bug is only exposed in CVP's narrowUDivOrURem, we can add some checks to avoid it.

I tend to agree that this issue leans more on narrowUDivOrURem but are these implicit assumptions elsewhere in the project? If the common usage of isSpeculativelyExecutable has come to mean that its speculatively executable even if the non-const operands change then it is a bug in the change to isSpeculativelyExecutable, not with narrowUDivOrURem

In GuardWidening there's a similar problem: when it first analyzes code, it calls isSafeToSpeculativelyExecute for a pair {instruction, insertion point} (here) and then it starts hoisting the instruction along with its operands, starting from the operands. While hoisting, it asserts that isSafeToSpeculativelyExecute should be true (here) since it should have been checked prior to hoisting and should still remain true. With this patch that assertion in some cases fails after the operands get hoisted. The return value of isSafeToSpeculativelyExecute changes because isGuaranteedNotToBePoison for some reason fails to prove that the denominator isn't poison, even though the hoisting had no impact on what values the operands can have. I understand that isGuaranteedNotToBePoison by its nature is always allowed to return false, so I don't think there's any problem with it.
To illustrate my point, here's what the function looks like when GuardWidening calls isSafeToSpeculativelyExecute('%sdiv', '%call') for the first time:

define void @foo() {
bb:
  br label %bb1

bb1:                                              ; preds = %bb8, %bb
  %phi = phi i32 [ 1, %bb ], [ %add9, %bb8 ]
  %call = call i1 @llvm.experimental.widenable.condition()
  br i1 %call, label %bb3, label %bb2

bb2:                                              ; preds = %bb1
  call void (...) @llvm.experimental.deoptimize.isVoid() [ "deopt"() ]
  ret void

bb3:                                              ; preds = %bb1
  %zext = zext i32 %phi to i64
  %sdiv = sdiv i64 1, %zext
  %add = add nuw nsw i64 %zext, 1
  %add4 = add nsw i64 %add, 1
  %add5 = add nsw i64 %add4, %sdiv
  %icmp = icmp sgt i64 %add5, 1
  %call6 = call i1 @llvm.experimental.widenable.condition()
  %and = and i1 %icmp, %call6
  br i1 %and, label %bb8, label %bb7

bb7:                                              ; preds = %bb3
  call void (...) @llvm.experimental.deoptimize.isVoid() [ "deopt"() ]
  ret void

bb8:                                              ; preds = %bb3
  %add9 = add nuw nsw i32 %phi, 1
  br label %bb1
}

For this IR, I think isSafeToSpeculativelyExecute('%sdiv', '%call') should return false. I implemented a patch D150542 to give my opinion (may be totally wrong :)).

Short term fix is obviously revert (still building), but longer term
I guess we need an API for "is always speculatively executable"?
or something like that.
@nikic, any thoughts on how best to proceed in trying to safely
strengthen the isSpeculativelyExecutable case?

I think strengthening the 'isSpeculativlyExecutable' is a bit conservative. I found that this bug is only exposed in CVP's narrowUDivOrURem, we can add some checks to avoid it.

I tend to agree that this issue leans more on narrowUDivOrURem but are these implicit assumptions elsewhere in the project? If the common usage of isSpeculativelyExecutable has come to mean that its speculatively executable even if the non-const operands change then it is a bug in the change to isSpeculativelyExecutable, not with narrowUDivOrURem

In GuardWidening there's a similar problem: when it first analyzes code, it calls isSafeToSpeculativelyExecute for a pair {instruction, insertion point} (here) and then it starts hoisting the instruction along with its operands, starting from the operands. While hoisting, it asserts that isSafeToSpeculativelyExecute should be true (here) since it should have been checked prior to hoisting and should still remain true. With this patch that assertion in some cases fails after the operands get hoisted. The return value of isSafeToSpeculativelyExecute changes because isGuaranteedNotToBePoison for some reason fails to prove that the denominator isn't poison, even though the hoisting had no impact on what values the operands can have. I understand that isGuaranteedNotToBePoison by its nature is always allowed to return false, so I don't think there's any problem with it.
To illustrate my point, here's what the function looks like when GuardWidening calls isSafeToSpeculativelyExecute('%sdiv', '%call') for the first time:

define void @foo() {
bb:
  br label %bb1

bb1:                                              ; preds = %bb8, %bb
  %phi = phi i32 [ 1, %bb ], [ %add9, %bb8 ]
  %call = call i1 @llvm.experimental.widenable.condition()
  br i1 %call, label %bb3, label %bb2

bb2:                                              ; preds = %bb1
  call void (...) @llvm.experimental.deoptimize.isVoid() [ "deopt"() ]
  ret void

bb3:                                              ; preds = %bb1
  %zext = zext i32 %phi to i64
  %sdiv = sdiv i64 1, %zext
  %add = add nuw nsw i64 %zext, 1
  %add4 = add nsw i64 %add, 1
  %add5 = add nsw i64 %add4, %sdiv
  %icmp = icmp sgt i64 %add5, 1
  %call6 = call i1 @llvm.experimental.widenable.condition()
  %and = and i1 %icmp, %call6
  br i1 %and, label %bb8, label %bb7

bb7:                                              ; preds = %bb3
  call void (...) @llvm.experimental.deoptimize.isVoid() [ "deopt"() ]
  ret void

bb8:                                              ; preds = %bb3
  %add9 = add nuw nsw i32 %phi, 1
  br label %bb1
}

For this IR, I think isSafeToSpeculativelyExecute('%sdiv', '%call') should return false. I implemented a patch D150542 to give my opinion (may be totally wrong :)).

Took a look, think you are right.
Plan for this patch is to update the API by adding overload bool meaning isSafeToSpeculativelyExecute(...., bool PreservesOperandCharacteristics = false).
If PreservesOperandCharacteristics is set, it means we the analysis promises the use will not modify the operands in a way that will invalidate the analysis (so only trunc if we known its still safe to speculatively execute). This should be usable to things like hoisting.

Add a new flag PreservesOpCharacteristics. If its true it is
assumed that the user will not do anything to the Inst argument that
would change the result of
isSafeToSpeculativelyExecute{WithOpcode}. If it true,
isSafeToSpeculativelyExecute{WithOpcode} will use KnownBits / proper
ValueTracking analysis. If its false,
isSafeToSpeculativelyExecute{WithOpcode} not use KnownBits / proper
ValueTracking analysis and the user may modify the operands in a way
that may change their values with respect to the analysis.

ping regarding the new codes

xbolva00 added inline comments.
llvm/include/llvm/Analysis/ValueTracking.h
712 ↗(On Diff #524139)

Document PreservesOpCharacteristics?

goldstein.w.n marked an inline comment as done.

Document new flag

ping. Is this still worth pursing?

nikic added a comment.Jun 13 2023, 4:00 AM

Okay, so as I understand it, this patch ran into two separate problems:

The first is that isSafeToSpeculativeExecute() is used for two different purposes: One is to speculate the instruction (i.e. execute it where it may not have previously been executed, keeping the same arguments), and the other is to execute the instruction at the same position but with different operands (with the implicit assumption that constant operands stay the same). Previously, the same function worked for both of those, but this is no longer the case if we analyze variables.

Actually, I think this is not quite true and we have a pre-existing issue for loads. Consider this example: https://alive2.llvm.org/ce/z/weHUyL If there were a control flow exit between the load and the select, this might speculate a trapping load.

With this in mind, this is clearly something that needs to be solved. My preference would to provide two separate functions (with a shared implementation) for the two different use cases, rather than specifying a flag everywhere. For example, one stays isSafeToSpeculativelyExecute() and the other becomes isSafeToExecuteWithDifferentOperands(). I think we can do this independently of the div change, with the load case as the motivating example.

The second problem is related to the poison check and the context instruction. As far as I understand, the issue are callers of isSafeToSpeculativelyExecute() which don't speculate instructions one-by-one (as LICM does), but rather check a whole sequence for speculability first and then perform the actual transform. This ends up being problematic, because we perform poison queries on instructions in their previous positions, where the fact that they are passed to the division itself implies that they cannot be poison. So what we really want to determine here is "is this instruction non-poison at this context under the assumption that it has already been hoisted there", which is not what the existing APIs do.

This problem is not as simple as not doing the programUndefinedIfUndefOrPoison() check, it can also occur in other contexts. Let's say we have a sequence like this:

%y = load i32, ptr %p, !noundef !{}, !range !{i32 1, i32 10}
%d = udiv i32 %x, %y

Even if we ignore that %y is non-poison due to being used in the udiv, we also know that it is non-poison due to the !noundef metadata. However, that !noundef metadata would be dropped if the load were actually speculated. So in this case, the udiv is actually not safe to speculate, because %y may be poison after speculation.

The two possible ways of solving this I can see are:

  1. Add an extra argument to isGuaranteedNotToBeUndefOrPoison() that pretends instructions have been speculated.
  2. Only support cases where the div arguments dominate the context instruction (if no context, that would effectively limit to constants/arguments). In that case the plain isGuaranteedNotToBeUndefOrPoison() should work fine.

Is this still worth pursing?

Not sure, to be honest. I liked this conceptually, but the problem turned out to be much more complicated than anticipated.

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
2787 ↗(On Diff #525821)

I don't think we should worry about that here and specify whatever is correct semantically.

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
1249 ↗(On Diff #525821)

This should be false.

llvm/lib/Transforms/Scalar/SpeculativeExecution.cpp
300 ↗(On Diff #525821)

Why false here?

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
450 ↗(On Diff #525821)

Why false here? This should be simple speculation.

Okay, so as I understand it, this patch ran into two separate problems:

The first is that isSafeToSpeculativeExecute() is used for two different purposes: One is to speculate the instruction (i.e. execute it where it may not have previously been executed, keeping the same arguments), and the other is to execute the instruction at the same position but with different operands (with the implicit assumption that constant operands stay the same). Previously, the same function worked for both of those, but this is no longer the case if we analyze variables.

Actually, I think this is not quite true and we have a pre-existing issue for loads. Consider this example: https://alive2.llvm.org/ce/z/weHUyL If there were a control flow exit between the load and the select, this might speculate a trapping load.

With this in mind, this is clearly something that needs to be solved. My preference would to provide two separate functions (with a shared implementation) for the two different use cases, rather than specifying a flag everywhere. For example, one stays isSafeToSpeculativelyExecute() and the other becomes isSafeToExecuteWithDifferentOperands(). I think we can do this independently of the div change, with the load case as the motivating example.

The second problem is related to the poison check and the context instruction. As far as I understand, the issue are callers of isSafeToSpeculativelyExecute() which don't speculate instructions one-by-one (as LICM does), but rather check a whole sequence for speculability first and then perform the actual transform. This ends up being problematic, because we perform poison queries on instructions in their previous positions, where the fact that they are passed to the division itself implies that they cannot be poison. So what we really want to determine here is "is this instruction non-poison at this context under the assumption that it has already been hoisted there", which is not what the existing APIs do.

This problem is not as simple as not doing the programUndefinedIfUndefOrPoison() check, it can also occur in other contexts. Let's say we have a sequence like this:

%y = load i32, ptr %p, !noundef !{}, !range !{i32 1, i32 10}
%d = udiv i32 %x, %y

Even if we ignore that %y is non-poison due to being used in the udiv, we also know that it is non-poison due to the !noundef metadata. However, that !noundef metadata would be dropped if the load were actually speculated. So in this case, the udiv is actually not safe to speculate, because %y may be poison after speculation.

The two possible ways of solving this I can see are:

  1. Add an extra argument to isGuaranteedNotToBeUndefOrPoison() that pretends instructions have been speculated.
  2. Only support cases where the div arguments dominate the context instruction (if no context, that would effectively limit to constants/arguments). In that case the plain isGuaranteedNotToBeUndefOrPoison() should work fine.

Is a third alternative to refactor the usage of isSafeToExecuteWithDifferentOperands() s.t instead of batching the analysis, it does the analysis/transform piece-meal? Or is that dramatically more work/have compile time/codegen impact? At the very least we could add a "IsBatchedAnalysis" flag (or API) and only apply the first two constraints if its set.

Is there not a final concern that isKnownNonZero uses dominating conditions to determine zero/non-zero, so w.o a CxtI instruction we can't actually use IsKnownNonZero as the transform may change them i.e:

%c = icmp ule i32 %x, 1
br i1 %c, label %T, label %F
T:
ret i32 %y
F:
%r = udiv %y, %x
ret i32 %r

we can't transform this into a select or something.

Think you could do the same with assume i.e:

%c = icmp ule i32 %z, 1
br i1 %c, label %T, label %F
T:
ret i32 %y
F:
%nz = icmp ne i32 %x, 0
call void @llvm.assume(i1 %nz)
%r = udiv %y, %x
ret i32 %r

For this, think we would need a flag is IsKnownNonZero to indicate we can't use assumes/dominating conditions to do isKnowNonZero (or just rely on computeKnownBits.isNonZero())

Is this still worth pursing?

Not sure, to be honest. I liked this conceptually, but the problem turned out to be much more complicated than anticipated.

I think given that we have an issue with loads a fix is needed either way. no?

nikic added a comment.Jun 13 2023, 1:03 PM

Is a third alternative to refactor the usage of isSafeToExecuteWithDifferentOperands() s.t instead of batching the analysis, it does the analysis/transform piece-meal? Or is that dramatically more work/have compile time/codegen impact? At the very least we could add a "IsBatchedAnalysis" flag (or API) and only apply the first two constraints if its set.

For some transforms, we need to ensure that e.g. all instructions in a block are speculatable, otherwise the transform is not legal. We can't really do that incrementally.

Is there not a final concern that isKnownNonZero uses dominating conditions to determine zero/non-zero, so w.o a CxtI instruction we can't actually use IsKnownNonZero as the transform may change them i.e:

%c = icmp ule i32 %x, 1
br i1 %c, label %T, label %F
T:
ret i32 %y
F:
%r = udiv %y, %x
ret i32 %r

we can't transform this into a select or something.

In this case we would be performing the query on %x, with the definition point of %x as the context, at which point there should be no dominating conditions for it. So I think there would be no problem.

Think you could do the same with assume i.e:

%c = icmp ule i32 %z, 1
br i1 %c, label %T, label %F
T:
ret i32 %y
F:
%nz = icmp ne i32 %x, 0
call void @llvm.assume(i1 %nz)
%r = udiv %y, %x
ret i32 %r

The assume case may be problematic because we have limited backwards reasoning for assumes. Your example wouldn't hit this, but if you defined %x in the F block that could be a problem.

For this, think we would need a flag is IsKnownNonZero to indicate we can't use assumes/dominating conditions to do isKnowNonZero (or just rely on computeKnownBits.isNonZero())

Is this still worth pursing?

Not sure, to be honest. I liked this conceptually, but the problem turned out to be much more complicated than anticipated.

I think given that we have an issue with loads a fix is needed either way. no?

It's two separate problems: The load issue is only about the isSafeToSpeculativelyExecute/isSafeToExecuteWithDifferentOperands split, it is not affected (as far as I can tell) by the trickier context instruction problem.

The second problem is related to the poison check and the context instruction. As far as I understand, the issue are callers of isSafeToSpeculativelyExecute() which don't speculate instructions one-by-one (as LICM does), but rather check a whole sequence for speculability first and then perform the actual transform. This ends up being problematic, because we perform poison queries on instructions in their previous positions, where the fact that they are passed to the division itself implies that they cannot be poison. So what we really want to determine here is "is this instruction non-poison at this context under the assumption that it has already been hoisted there", which is not what the existing APIs do.

This problem is not as simple as not doing the programUndefinedIfUndefOrPoison() check, it can also occur in other contexts. Let's say we have a sequence like this:

%y = load i32, ptr %p, !noundef !{}, !range !{i32 1, i32 10}
%d = udiv i32 %x, %y

Even if we ignore that %y is non-poison due to being used in the udiv, we also know that it is non-poison due to the !noundef metadata. However, that !noundef metadata would be dropped if the load were actually speculated. So in this case, the udiv is actually not safe to speculate, because %y may be poison after speculation.

The two possible ways of solving this I can see are:

  1. Add an extra argument to isGuaranteedNotToBeUndefOrPoison() that pretends instructions have been speculated.
  2. Only support cases where the div arguments dominate the context instruction (if no context, that would effectively limit to constants/arguments). In that case the plain isGuaranteedNotToBeUndefOrPoison() should work fine.

Can we have the third solution:
Continue or improving what we do in D150542.
And as for

%y = load i32, ptr %p, !noundef !{}, !range !{i32 1, i32 10}
%d = udiv i32 %x, %y

When we check if %y is guaranteed not to be undef or poison, %CtxI argument should be respected. In other words, in IsGuaranteedNotToBeUndefOrPoison, if the load instruction's parent is not equal to %CtxI's, then we should not reason about the result from the instruction's metadata. Since metadata may be dropped after moving to a different block. But I am also confused that if metadata like !noundef can be dropped randomly, why can we still reason about the result from metadata?

Can we have the third solution:
Continue or improving what we do in D150542.
And as for

%y = load i32, ptr %p, !noundef !{}, !range !{i32 1, i32 10}
%d = udiv i32 %x, %y

When we check if %y is guaranteed not to be undef or poison, %CtxI argument should be respected. In other words, in IsGuaranteedNotToBeUndefOrPoison, if the load instruction's parent is not equal to %CtxI's, then we should not reason about the result from the instruction's metadata. Since metadata may be dropped after moving to a different block. But I am also confused that if metadata like !noundef can be dropped randomly, why can we still reason about the result from metadata?

I don't think this is appropriate, because it overloads the meaning of %CtxI, between "not poison at this point" and "not poison if hoisted to this point". Generally speaking, specifying a context instruction should only improve ValueTracking results, not make the more conservative.

We can do what you suggest in general, but it should not use the existing CtxI argument.

However, together with the additional concern around context-sensitive facts (like assumes) that @goldstein.w.n raised, I think that going for "require div operands to dominate the context instruction" is going to be the most robust solution.

An unrelated thought I had is that speculating these divs might run into a cost issue: We currently only speculate divs with constant divisor, which also ensures that the div will be expanded to arithmetic as a side-effect, so we never actually end up speculating real division instructions.

Can we have the third solution:
Continue or improving what we do in D150542.
And as for

%y = load i32, ptr %p, !noundef !{}, !range !{i32 1, i32 10}
%d = udiv i32 %x, %y

When we check if %y is guaranteed not to be undef or poison, %CtxI argument should be respected. In other words, in IsGuaranteedNotToBeUndefOrPoison, if the load instruction's parent is not equal to %CtxI's, then we should not reason about the result from the instruction's metadata. Since metadata may be dropped after moving to a different block. But I am also confused that if metadata like !noundef can be dropped randomly, why can we still reason about the result from metadata?

I don't think this is appropriate, because it overloads the meaning of %CtxI, between "not poison at this point" and "not poison if hoisted to this point". Generally speaking, specifying a context instruction should only improve ValueTracking results, not make the more conservative.

We can do what you suggest in general, but it should not use the existing CtxI argument.

However, together with the additional concern around context-sensitive facts (like assumes) that @goldstein.w.n raised, I think that going for "require div operands to dominate the context instruction" is going to be the most robust solution.

An unrelated thought I had is that speculating these divs might run into a cost issue: We currently only speculate divs with constant divisor, which also ensures that the div will be expanded to arithmetic as a side-effect, so we never actually end up speculating real division instructions.

Going to post refactor patch that fixes load soonish (probably tomorrow), will run div patches through comile-time-tracker and maybe we will drop (or add flag for heavy analysis).