This is an archive of the discontinued LLVM Phabricator instance.

[GVNHoist] don't hoist callbr users into the callbr's block
ClosedPublic

Authored by nickdesaulniers on Feb 27 2023, 3:27 PM.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptFeb 27 2023, 3:27 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
nickdesaulniers requested review of this revision.Feb 27 2023, 3:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 27 2023, 3:27 PM

Might be able to bail sooner in GVNHoist::hoistExpressions; will test that locally.

  • hoist (lol) "is terminator a callbr" check into GVNHoist::findHoistableCandidates
bjope added inline comments.Feb 27 2023, 4:24 PM
llvm/lib/Transforms/Scalar/GVNHoist.cpp
893

I think this is more restricted than needed. I mean, it could still be valid to hoist values into the CallBr terminated block if the hoisted instruction isn't using a value defined by the CallBr. That ofcourse also depends on if the CallBr might have other side effects or not. So ideally we would need to check if it is safe to hoist past the terminator and not only if it is safe to hoist to the end of BB.

It should at least be worth some more code comments here, explaining that we are conservative in this sense, to make a simple safe fix (while assuming that hoisting past a CallBr would be a quite rare opportunity anyway).

void added inline comments.Feb 27 2023, 5:23 PM
llvm/lib/Transforms/Scalar/GVNHoist.cpp
893

With the recent changes allowing for valid outputs on the indirect branches, it could branch off somewhere with incorrect values. Or at the very least needlessly executed code. :-)

I've verified that this fixes the problem I saw ( https://github.com/llvm/llvm-project/issues/61023 ).
Thanks!

nickdesaulniers marked 2 inline comments as done.
nickdesaulniers retitled this revision from [GVNHoist] skip hoisting into blocks terminated by callbr to [GVNHoist] don't hoist callbr users into the callbr's block.
  • only hoist non-callbr user's, add test, use update_test_checks.py
llvm/lib/Transforms/Scalar/GVNHoist.cpp
893

I think this is more restricted than needed. I mean, it could still be valid to hoist values into the CallBr terminated block if the hoisted instruction isn't using a value defined by the CallBr. That ofcourse also depends on if the CallBr might have other side effects or not. So ideally we would need to check if it is safe to hoist past the terminator and not only if it is safe to hoist to the end of BB.

It should at least be worth some more code comments here, explaining that we are conservative in this sense, to make a simple safe fix (while assuming that hoisting past a CallBr would be a quite rare opportunity anyway).

If you're going to enable hoisting past a callbr, please add a testcase to ensure we don't hoist a load past a callbr which modifies the memory in question.

If you're going to enable hoisting past a callbr, please add a testcase to ensure we don't hoist a load past a callbr which modifies the memory in question.

Will do. Curiously, GVNHoist::safeToHoistLdSt returns false even for cases of loads not modified by callbr, so HoistGVN doesn't optimize those cases. FWICT, the call to GVNHoist::firstInBB is checking the callbr against itself. It looks like MemorySSA is perhaps claiming that for this input:

@x = global i32 0
@y = global i32 0
define i32 @foo4(i1 %z) {
entry:
  callbr void asm "", "=*m,!i"(ptr elementtype(i32) @x)
          to label %a [label %b]

a:                                                ; preds = %entry
  %0 = load i32, ptr @y, align 4
  ret i32 %0

b:                                                ; preds = %entry
  %1 = load i32, ptr @y, align 4
  ret i32 %1
}

that callbr is a "defining access" for the access of @y (IIUC)? That seems incorrect...but I'm unfamiliar with MemorySSA.

  • add tests for load hoisting, as per @efriedma

If you're going to enable hoisting past a callbr, please add a testcase to ensure we don't hoist a load past a callbr which modifies the memory in question.

Will do. Curiously, GVNHoist::safeToHoistLdSt returns false even for cases of loads not modified by callbr, so HoistGVN doesn't optimize those cases. FWICT, the call to GVNHoist::firstInBB is checking the callbr against itself. It looks like MemorySSA is perhaps claiming that for this input:

@x = global i32 0
@y = global i32 0
define i32 @foo4(i1 %z) {
entry:
  callbr void asm "", "=*m,!i"(ptr elementtype(i32) @x)
          to label %a [label %b]

a:                                                ; preds = %entry
  %0 = load i32, ptr @y, align 4
  ret i32 %0

b:                                                ; preds = %entry
  %1 = load i32, ptr @y, align 4
  ret i32 %1
}

that callbr is a "defining access" for the access of @y (IIUC)? That seems incorrect...but I'm unfamiliar with MemorySSA.

Ok, a quick read of https://llvm.org/docs/MemorySSA.html and running opt -S -passes=gvn-hoist,'print<memoryssa>' llvm/test/Transforms/GVNHoist/hoist-call.ll -disable-output gives:

MemorySSA for function: foo4
define i32 @foo4(i1 %z) {
entry:
; 1 = MemoryDef(liveOnEntry)
  callbr void asm "", "=*m,!i"(ptr elementtype(i32) @x)
          to label %a [label %b]

a:                                                ; preds = %entry
; MemoryUse(1)
  %0 = load i32, ptr @y, align 4
  ret i32 %0

b:                                                ; preds = %entry
; MemoryUse(1)
  %1 = load i32, ptr @y, align 4
  ret i32 %1
}

Shouldn't callbr have it's own dedicated MemoryDef that's distinct from liveOnEntry?

Shouldn't callbr have it's own dedicated MemoryDef that's distinct from liveOnEntry?

Hmm..maybe not? We don't for vanilla inline asm (non-asm-goto) that modifies memory: https://godbolt.org/z/4144Wd7e1. Shouldn't inline asm qualify as a MemoryAccess when "m" constraints are used?

Shouldn't callbr have it's own dedicated MemoryDef that's distinct from liveOnEntry?

Hmm..maybe not? We don't for vanilla inline asm (non-asm-goto) that modifies memory: https://godbolt.org/z/4144Wd7e1. Shouldn't inline asm qualify as a MemoryAccess when "m" constraints are used?

NVM, we do: https://godbolt.org/z/9EhW14z9W ...

Ka-Ka added a subscriber: Ka-Ka.Mar 1 2023, 10:29 PM
nickdesaulniers added inline comments.Mar 2 2023, 4:03 PM
llvm/test/Transforms/GVNHoist/hoist-call.ll
126

unused

bjope added inline comments.Mar 6 2023, 2:34 AM
llvm/lib/Transforms/Scalar/GVNHoist.cpp
816

I guess CallBrInst is the only terminator that may define values today, but who knows about the future. So it would be nice generalize this somehow, i.e. not explicitly checking if the Terminator is a CallBrInst but rather checking if the terminator is a def.

Maybe we could just skip the IsCallBrTerminated check?

nickdesaulniers marked an inline comment as done.
  • remove specific isa<CallBrInst> as per bjope, remove unused param in test
nickdesaulniers added inline comments.
llvm/test/Transforms/GVNHoist/hoist-call.ll
124

I asked @george.burgess.iv if this fixme is correct; that we ought to be able to do such hoist. George mentioned that maybe AA was messing up. I wrote up a quick test case that disproves that: https://reviews.llvm.org/D145400 (not sure there's any value in landing that or not). So pretty much not sure about this comment.

@efriedma thoughts?

nikic added a subscriber: nikic.Mar 6 2023, 10:43 AM
nikic added inline comments.
llvm/test/Transforms/GVNHoist/hoist-call.ll
124

What matters here is not the aliasing relationship with the callbr arguments (which have nothing to do with the callbr itself), but whether the callbr modrefs the location. I believe we don't have any special callbr handling for that, so it will be determined purely based on memory attributes. If you add something like memory(argmem: readwrite) that might work.

efriedma accepted this revision.Mar 6 2023, 12:50 PM

LGTM with a couple minor comments

llvm/lib/Transforms/Scalar/GVNHoist.cpp
819

if (is_contained(T->users(), Insn))

llvm/test/Transforms/GVNHoist/hoist-call.ll
139

I just verified that if you mark this callbr "readnone", hoisting happens. Please adjust the testcase to demonstrate that hoisting.

This revision is now accepted and ready to land.Mar 6 2023, 12:50 PM
llvm/test/Transforms/GVNHoist/hoist-call.ll
139

I can do that...but...

If you add readnone to the above test case @foo3 (not @foo4 which you were referring to), we also hoist, which looks wrong to me...

nickdesaulniers added inline comments.Mar 6 2023, 2:03 PM
llvm/test/Transforms/GVNHoist/hoist-call.ll
139

Specifically:

-  callbr void asm "", "=*m,!i"(ptr elementtype(i32) @x)
+  callbr void asm "", "=*m,!i"(ptr elementtype(i32) @x) readnone

Using -passes='print<memoryssa> we can see that this simply eliminates the initial MemoryDef on the callbr.


I think we might need to do something like: https://reviews.llvm.org/D145400

diff --git a/llvm/lib/Transforms/Scalar/GVNHoist.cpp b/llvm/lib/Transforms/Scalar/GVNHoist.cpp
index b23182db20f0..d7d1fefcde44 100644
--- a/llvm/lib/Transforms/Scalar/GVNHoist.cpp
+++ b/llvm/lib/Transforms/Scalar/GVNHoist.cpp
@@ -818,6 +818,13 @@ void GVNHoist::checkSafety(CHIArgs C, BasicBlock *BB, GVNHoist::InsKind K,
     // the use above the def.
     if (is_contained(T->users(), Insn))
       continue;
+    if (const auto *CB = dyn_cast<CallBase>(T)) {
+      if (CB->isInlineAsm()) {
+        // For each operand
+        //   if we alias with insn
+        //     skip
+      }
+    }
     if (K == InsKind::Scalar) {
       if (safeToHoistScalar(BB, Insn->getParent(), NumBBsOnAllPaths))
         Safe.push_back(CHI);
nikic added inline comments.Mar 6 2023, 2:08 PM
llvm/test/Transforms/GVNHoist/hoist-call.ll
139

readnone means it's not allowed to read or write the global argument, so hoisting is indeed safe. As mentioned before, what you want here is memory(argmem: readwrite) to say that the callbr can access its arguments but will not clobber unrelated memory.

llvm/test/Transforms/GVNHoist/hoist-call.ll
139

What's the difference between memory(argmem: readwrite) vs no attribute? I guess https://llvm.org/docs/LangRef.html#function-attributes says that "memory(readwrite)" is the implicit default?

Shouldn't the front end be adding such attributes?
https://godbolt.org/z/zxM36Esd9

I'm happy to add whatever attributes to these test cases, my concern is that is feels like we're adding test cases for IR that frontend may never produce. Is that worthwhile to test? I kind of doubt it.

llvm/test/Transforms/GVNHoist/hoist-call.ll
139

Should I be adding tests for every combination of:

  1. memory(argmem: readwrite)
  2. memory(none) (is that the same as readnone?)
  3. <unspecified>

cross

  1. callbr accesses the same memory as the loads (MustAlias)
  2. callbr accesses different memory (NoAlias)

???

efriedma added inline comments.Mar 7 2023, 10:24 AM
llvm/test/Transforms/GVNHoist/hoist-call.ll
139

Yes, readnone is the old way to write memory(none); forgot the new syntax.

If you want to be extra careful, you could have the following:

  1. No memory attribute
  2. memory(none)
  3. memory(argmem: readwrite) with a matching argument
  4. memory(argmem: readwrite) with an unrelated argument

The point here is mostly to verify that we aren't accidentally forgetting to query alias analysis, though, so we don't need to go too deep into this.

hiraditya added inline comments.Mar 7 2023, 10:39 AM
llvm/lib/Transforms/Scalar/GVNHoist.cpp
819

I'm assuming there are only a few T->users in practice, otherwise linear search on users could be slow.

If you're going to enable hoisting past a callbr, please add a testcase to ensure we don't hoist a load past a callbr which modifies the memory in question.

Will do. Curiously, GVNHoist::safeToHoistLdSt returns false even for cases of loads not modified by callbr, so HoistGVN doesn't optimize those cases. FWICT, the call to GVNHoist::firstInBB is checking the callbr against itself. It looks like MemorySSA is perhaps claiming that for this input:

@x = global i32 0
@y = global i32 0
define i32 @foo4(i1 %z) {
entry:
  callbr void asm "", "=*m,!i"(ptr elementtype(i32) @x)
          to label %a [label %b]

a:                                                ; preds = %entry
  %0 = load i32, ptr @y, align 4
  ret i32 %0

b:                                                ; preds = %entry
  %1 = load i32, ptr @y, align 4
  ret i32 %1
}

that callbr is a "defining access" for the access of @y (IIUC)? That seems incorrect...but I'm unfamiliar with MemorySSA.

MemorySSA is a very conservative wrapper on top of alias analysis. for any global, it is unlikely to be precise.

llvm/lib/Transforms/Scalar/GVNHoist.cpp
819

maybe we can check if Insn has any operands that are defined by T

nickdesaulniers marked 6 inline comments as done.
  • add more memory attribute based tests, as per @efriedma
llvm/lib/Transforms/Scalar/GVNHoist.cpp
819

I'm assuming there are only a few T->users in practice

In practice, only CallBrInst's are terminators that produce values/can even have users. In practice, the number of users is 1.

maybe we can check if Insn has any operands that are defined by T

Equivalent.

nickdesaulniers added inline comments.Mar 8 2023, 3:47 PM
llvm/test/Transforms/GVNHoist/hoist-call.ll
139

Yes, readnone is the old way to write memory(none); forgot the new syntax.

Looks at the langref...is memory(none) the new way to write readnone?

The _only_ mention of readnone is as a parameter attribute:
https://llvm.org/docs/LangRef.html#parameter-attributes

But then I see it used all over the place in tests and other references in the langref...on call insts. Why do we have two attributes that do the same thing? And one is undocumented.

efriedma added inline comments.Mar 8 2023, 4:10 PM
llvm/test/Transforms/GVNHoist/hoist-call.ll
139
efriedma accepted this revision.Mar 8 2023, 6:38 PM

LGTM

nikic added inline comments.Mar 9 2023, 1:10 AM
llvm/lib/Transforms/Scalar/GVNHoist.cpp
819

In practice, only CallBrInst's are terminators that produce values/can even have users.

Invokes also produces values and have users. Though it's possible that those aren't supported by GVNHoist.

In practice, the number of users is 1.

Hm, why? I'd expect more than one users even for front-end generated IR, depending on number of return values and number of control-flow paths.

nickdesaulniers planned changes to this revision.Mar 9 2023, 10:13 AM
nickdesaulniers marked an inline comment as done.
nickdesaulniers added inline comments.
llvm/lib/Transforms/Scalar/GVNHoist.cpp
819

Invokes also produces values and have users.

I guess catchswitch would be, too. I'll update the comment.

Hm, why? I'd expect more than one users even for front-end generated IR, depending on number of return values and number of control-flow paths.

The common case of callbr's produced-value usage comes from asm goto ("":"=r"(x):::foo); return x; bar: return 42;.

It is technically possible for the usage of the value to be 0 (unused) to many (used along every label in the label list). In practice, the number of users is 1.

Even if it's many, a linear search for any small number of N is irrelevant. I could make the same claim that a linear search through the operands of an instruction might be slow, for an instruction with infinite operands. Do any such instructions exist? Does it matter?

  • scan the operand list rather than the users list (assume that's shorter), as per @hiraditya
  • update comment about terminators that produce values
This revision is now accepted and ready to land.Mar 9 2023, 10:24 AM
nickdesaulniers marked an inline comment as done.Mar 9 2023, 10:24 AM
nikic accepted this revision.Mar 9 2023, 11:39 AM
This revision was landed with ongoing or failed builds.Mar 13 2023, 10:21 AM
This revision was automatically updated to reflect the committed changes.

Huh, the Windows buildbot seems to be complaining about this: https://lab.llvm.org/buildbot/#/builders/127/builds/45021/steps/7/logs/stdio

Not sure what's going wrong though...