Page MenuHomePhabricator

[LoopUnroll] fix cloning callbr
Needs RevisionPublic

Authored by nickdesaulniers on Jul 2 2019, 2:19 PM.

Details

Summary

There is currently a correctness issue when unrolling loops containing
callbr's where their indirect targets are being updated correctly to the
newly created labels, but their operands are not. This manifests in
unrolled loops where the second and subsequent copies of callbr
instructions have blockaddresses of the label from the first instance of
the unrolled loop, which would result in nonsensical runtime control
flow.

When cloning a callbr, update its blockaddress operands if they were cloned, too.

Link: https://bugs.llvm.org/show_bug.cgi?id=42489
Link: https://groups.google.com/forum/#!topic/clang-built-linux/z-hRWP9KqPI

Event Timeline

nickdesaulniers created this revision.Jul 2 2019, 2:19 PM

Here's how I might modify the above test if for when we can loop unroll correctly: https://gist.github.com/nickdesaulniers/7216f6e5a17c7064285190440cb88f1d

E5ten added a subscriber: E5ten.Jul 2 2019, 3:18 PM

I have a more complete fix for this whole issue outright. Let me get the test case working, then I'll upload a v2.

  • fix the loop unroller
nickdesaulniers retitled this revision from [LoopUnroll] do not unroll loops containing callbr to [LoopUnroll] fix cloning callbr.Jul 2 2019, 5:24 PM
nickdesaulniers edited the summary of this revision. (Show Details)
nickdesaulniers marked an inline comment as done.Jul 2 2019, 5:26 PM
nickdesaulniers added inline comments.
llvm/lib/Transforms/Utils/LoopUnroll.cpp
103

Notes to reviewers: let me know if the comments are excessive and I'll remove them.

Also note that I manually peeled this loop out from the above in this function. I could place the isa<CallBrInst> and rest of this block in there. Just let me know WDYT?

srhines added inline comments.Jul 2 2019, 5:34 PM
llvm/lib/Transforms/Utils/LoopUnroll.cpp
105

Fold this line into the if statement for LLVM style.

nickdesaulniers marked an inline comment as done.Jul 2 2019, 5:42 PM
nickdesaulniers marked an inline comment as done.Jul 2 2019, 6:08 PM
nickdesaulniers added inline comments.
llvm/lib/Transforms/Utils/LoopUnroll.cpp
111

Sorry, I think this should be a dyn_cast (we're "downcasting" from a Value* to a BasicBlock*, which may fail)

  • reroll loops, prefer conciseness
fhahn added a subscriber: fhahn.Jul 3 2019, 3:26 AM
fhahn added inline comments.
llvm/lib/Transforms/Utils/LoopUnroll.cpp
100

IIUC, for block addresses, we now set the operand first by the code above and then also by this block here. It might be slightly better to first check for the special case and continue in case we hit it.

Also, a brief comment why we need special handling here would be helpful IMO.

103

Is it possible for It->second to not be a BB here? I think if the original operand was a BB, the new one also should be a BB. So this could be a regular cast?

llvm/test/Transforms/LoopUnroll/callbr.ll
2

The test only checks the emitted IR and no debug output. So the 2>&1 should not be needed, right?

24

nit: could just be a void function.

nickdesaulniers marked 3 inline comments as done.
  • check early, continue, fix test nits
llvm/lib/Transforms/Utils/LoopUnroll.cpp
103

Is it possible for It->second to not be a BB here?

I'm pretty sure It->second can ONLY be a BasicBlock here, since the values stored in VMap were created by llvm::CloneBasicBlock (see VMap there).

That said, I'm not super confident in my understanding of dyn_cast vs static_cast.

The VMap is an instance of ValueToValueMapTy, which simply maps Value* to Value*. As Value is the base class of a lot of stuff in LLVM, we end up storing all kinds of pointers to derived classes in VMap.

My understanding of when to use dyn_cast (or even dynamic_cast) vs static_cast is that it's always ok to static_cast from derived class to base class, but not vice versa. That's why dyn_cast or dynamic_cast might result in nullptr at runtime, and thus need to be checked.

Since It->second should ALWAYS (IIUC) be a BasicBlock* in this case, I would have thought we still need a checked call to dyn_cast.

Maybe I'm misunderstanding and that it's ok to static_cast from base to derived if you're certain that base is always derived?

llvm/lib/Transforms/Utils/LoopUnroll.cpp
103

Ah, a quick reread of http://llvm.org/docs/ProgrammersManual.html#the-isa-cast-and-dyn-cast-templates makes it seem like cast<BasicBlock> should match with what I expect. Brb, enabling assertions in my build...

fhahn added inline comments.Jul 3 2019, 10:58 AM
llvm/lib/Transforms/Utils/LoopUnroll.cpp
103

Yep, cast<BasicBlock> will assert that it actually is a basic block with assertions enabled. That should be sufficient cover.

  • prefer cast<> to dyn_cast<>
nickdesaulniers marked 4 inline comments as done.Jul 3 2019, 11:12 AM
fhahn accepted this revision.Jul 3 2019, 1:06 PM

LGTM, thanks!

llvm/lib/Transforms/Utils/LoopUnroll.cpp
608

nit: unrelated change?

This revision is now accepted and ready to land.Jul 3 2019, 1:06 PM
nickdesaulniers marked an inline comment as done.Jul 3 2019, 1:39 PM

Great! Thanks for the code review.

nickdesaulniers marked an inline comment as done.Jul 3 2019, 1:40 PM
nickdesaulniers added inline comments.
llvm/lib/Transforms/Utils/LoopUnroll.cpp
608

Intentional; helps readability to declare as late as possible.

Is this transform safe? The inline asm could stash the address of a destination in a variable in one loop iteration, and use it in a later loop iteration. Or is that not legal?

Is this transform safe? The inline asm could stash the address of a destination in a variable in one loop iteration, and use it in a later loop iteration. Or is that not legal?

Isn't that an issue in inline assembly regardless of whether or not it's asm goto?
GCC seems to unroll the loop, even when capturing the induction variable:
https://godbolt.org/z/HVUTC4
Should the presence of inline assembly be such an optimization barrier?

If there's some rule that distinguishes blockaddresses used in callbr from general blockaddresses, we should state that explicitly somewhere in LangRef.

fhahn added a comment.Jul 3 2019, 3:51 PM

Maybe we could make the semantics of callbr a bit more explicit in LangRef?

Currently it only mentions that the only use is to implement asm goto's like in GCC. IIUC, asm goto cannot have outputs and the inputs are explicit, those can be updated while unrolling. According to https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html 6.47.2.7 Goto Labels, if the assembly in the asm goto modifies anything, it needs a "memory" clobber. I think this should be handled like regular inline assembly already in LLVM?

it needs a "memory" clobber. I think this should be handled like regular inline assembly already in LLVM?

Neither GCC nor Clang prevent loop unrolling loops containing inline asm w/ "memory" clobber. You can observe this with my godbolt link above and manually add "memory" to the 4 position in the asm statement.

If there's some rule that distinguishes blockaddresses used in callbr from general blockaddresses, we should state that explicitly somewhere in LangRef.

I don't really understand what you're looking for @eli.friedman . Should I add a statment along the lines of "blockaddresses may be rewritten during optimization passes to refer to the address of newly created blocks"?

The question is whether something like the following is legal:

void bar(void) {
    long temp = 0;
    #pragma GCC unroll 3
    for (int j = 0; j < 3; ++j) {
        asm goto("lea %l2(%%rip), %%rax\n"
                 "cmp $0, %0\n"
                 "jne 1f\n"
                 "mov %%rax, %0\n"
                 "1:\n"
                 "jmp *%0\n" :: "m"(temp), "r"(j) :"memory","rax": baz);
        baz:;
    }
}
int main() { bar(); }

I guess the answer is that no, it isn't legal, given gcc's behavior. But there's nothing in LangRef that forbids it... we need some rule that ties the value passed to the "X" constraint to the actual successor.

Or actually, it might make more sense to change the way we generate/lower callbr, to make the label parameters implicit; instead of modeling them with blockaddress, we never write them in IR at all, and automatically generate them later based on the successor list.

hfinkel added a subscriber: hfinkel.Jul 3 2019, 5:42 PM

Or actually, it might make more sense to change the way we generate/lower callbr, to make the label parameters implicit; instead of modeling them with blockaddress, we never write them in IR at all, and automatically generate them later based on the successor list.

That's an appealing idea.

We do need to be careful, AFAIK, because the goto targets can be passed into the asm as parameters (where the block addresses come from the labels-as-values extension). However, in theory we already shouldn't unroll loops with blocks with their address taken, so I imagine that case is irrelevant here (*).

(*) LoopUnroll.cpp actually only seems to check Header->hasAddressTaken(), not all of the blocks, so maybe that's wrong?

Or actually, it might make more sense to change the way we generate/lower callbr, to make the label parameters implicit; instead of modeling them with blockaddress, we never write them in IR at all, and automatically generate them later based on the successor list.

That's an appealing idea.

I kind of agree; looking at the arguments to callbr, the label list (the final part in []) feels like duplicate information of the earlier blockaddress operands. The blockaddress operands in the common case SHOULD match or be the address of blocks in the label list, except when you're explicitly passing the address of labels around (see below link). In that first case, why specify explicitly the blockaddress at all? Surely other transforms may update the label list but not the blockaddresses, just like this patch w/ LoopUnroll. I feel like the blockaddress could be implicit based on the label list, and generated as late as possible and only when needed.

That said, I'd really like to land this patch as it addresses an observable and bad bug in the Linux kernel, and I'd like to do so to make the clang-9 release train, rather than try to rearchitect callbr here.

We do need to be careful, AFAIK, because the goto targets can be passed into the asm as parameters (where the block addresses come from the labels-as-values extension). However, in theory we already shouldn't unroll loops with blocks with their address taken, so I imagine that case is irrelevant here (*).

(*) LoopUnroll.cpp actually only seems to check Header->hasAddressTaken(), not all of the blocks, so maybe that's wrong?

Great point. Looking at GCC:
https://godbolt.org/z/lKa_HD

Notice: it does unroll/duplicate the label within the loop when the label is passed in via the final label list in the asm goto. It does not duplicate the address of label passed in, even when the induction variable is captured.

So my patch as it stands in the current revision is incorrect or will differ from GCC; I need to differentiate between is this BlockAddress operand explicitly passed in as an input, or is it from the label list. I don't think it's hard for me to change what I have to at least match GCC. To @eli.friedman 's point; then yes blockaddress is indeed handled differently in this case. Let me fix up the patch and add a test based on the godbolt link.

And thanks to everyone so far for the feedback and discussion.

Here's another interesting case for me to add a test for: https://godbolt.org/z/WwKXaI

The same address being using in the label list AND explicitly passed in as an address of label. Note the difference in behavior there.

Here's another interesting case for me to add a test for: https://godbolt.org/z/WwKXaI

The same address being using in the label list AND explicitly passed in as an address of label. Note the difference in behavior there.

Sorry, that test case would rely on https://reviews.llvm.org/D64167 first. ;)

fhahn requested changes to this revision.Jul 8 2019, 2:34 PM
fhahn added reviewers: hfinkel, efriedma.

Marking as requiring changes while the comments are being addressed.

That said, I'd really like to land this patch as it addresses an observable and bad bug in the Linux kernel, and I'd like to do so to make the clang-9 release train, rather than try to rearchitect callbr here.

IIUC the patch as is fixes an issue with unrolling asm goto, in cases unrolling is legal. But we need additional checks to prevent unrolling in the cases Eli mentioned. Maybe a safer approach would be to start with a patch to restrict the unrolling of loops with asm goto (that should also fix the bug in the linux kernel, right?) and then allow unrolling for the safe subset of cases where it is legal.

Sorry, that test case would rely on https://reviews.llvm.org/D64167 first. ;)

It's fine to have this patch depend on D64167

This revision now requires changes to proceed.Jul 8 2019, 2:34 PM

Marking as requiring changes while the comments are being addressed.

That said, I'd really like to land this patch as it addresses an observable and bad bug in the Linux kernel, and I'd like to do so to make the clang-9 release train, rather than try to rearchitect callbr here.

IIUC the patch as is fixes an issue with unrolling asm goto, in cases unrolling is legal. But we need additional checks to prevent unrolling in the cases Eli mentioned. Maybe a safer approach would be to start with a patch to restrict the unrolling of loops with asm goto (that should also fix the bug in the linux kernel, right?) and then allow unrolling for the safe subset of cases where it is legal.

Sure thing, forked off v1 of this patch into: https://reviews.llvm.org/D64368

Note to self: this needs to be rebased on top of https://reviews.llvm.org/rL366130 and a test case for LoopUnswitch needs to be added.