Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Phabricator shutdown timeline

[RFC prototype] Implementation of asm-goto support in LLVM
ClosedPublic

Authored by craig.topper on Oct 26 2018, 7:58 AM.

Details

Summary

This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html

Once there is a consensus wrt the design of the feature, I will modify
this patch and split it into pieces for a proper review.

Even though this patch is definitely a prototype, it was tested quite
well, including the heavy-lifting of building the linux kernel
(kudos to Nick Desaulniers for doing that!).

One known problem in the implementation that I would like to mention
is that terminating callbr instruction is translated into INLINEASM
machine instruction, which is not a terminator. There are several crude
workarounds that we had to put in the code (e.g. see AsmPrinter.cpp) for
that. It has to be addressed in a better way and I would appreciate your
thoughts on that.

Authored By: Alexander Ivchenko, Mikhail Dvoretckii

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
chandlerc added inline comments.Jan 29 2019, 1:51 AM
include/llvm/IR/InstVisitor.h
273–279 ↗(On Diff #184024)

Need to update this code path.

include/llvm/IR/Instructions.h
4036–4038 ↗(On Diff #184024)

See above, this I think should sink to CallBase.

lib/AsmParser/LLParser.cpp
6197–6198 ↗(On Diff #184024)

bundles missing here?

6197–6198 ↗(On Diff #184024)

The syntax here diverges from both switch and indirectbr in surprising ways here. They don't use any keywords like or jump to introduce thi list of labels, they just directly put it there.

While abstractly, I somewhat like this syntax, I wonder if it would be better te *exactly* match the syntax for switch which has the same default label with a list of other labels structure to it. Consistency seems good here unless it causes problems?

6250 ↗(On Diff #184024)

Not a small vector?

6269 ↗(On Diff #184024)

Weird indentatino.

6279 ↗(On Diff #184024)

I'd use int and llvm::seq, but maybe better to remain consistent.

lib/CodeGen/AsmPrinter/AsmPrinter.cpp
2973–2974 ↗(On Diff #184024)

Isn't one of the targets a fallthrough? Maybe i've not made it far enough, and the default target is different from the target here.

lib/CodeGen/AsmPrinter/AsmPrinterInlineAsm.cpp
439–443 ↗(On Diff #184024)

I'm a bit surprised we don't model this as an MBB operand to begin with if it *has* to be a direct block address anyways? Curious why.

lib/CodeGen/GlobalISel/IRTranslator.cpp
1246 ↗(On Diff #181183)

Maybe have the verifier reject if we can't lower here? Regardless, the comment still seems to need some amount of fixing.

deopt bundles are a reasonable thing, and just not really a reasonable thing (at this stage) to support on callbr.

1300 ↗(On Diff #184024)

This complex to thread through?

lib/CodeGen/IndirectBrExpandPass.cpp
151–153 ↗(On Diff #184024)

I think we have to fix this before this goes very far. I don't actually know how this is working in practice in our biggest customer (the Linux Kernel) as it should be using things like retpolines and needing this code to work.

I also think that this entire pass, while a reasonable thing to have, was a bad idea for retpoline lowering. We found retpolines to be faster than the branch trees created by this in most cases w/ many branches. So this needs a non-trivial update. Not sure who has the time to do this at this point though, I'll ask around w/ folks familiar with this bit of code.

It would be good to at least update the comment to English prose.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2535–2536 ↗(On Diff #184024)

Wait, we handle catchswitch combined with this? That seems .... ambitious.

2540–2547 ↗(On Diff #184024)

Weird that we handle probabilities here but not in Global ISel

lib/CodeGen/ShrinkWrap.cpp
505–507 ↗(On Diff #184024)

Wait, they do?

I don't understand -- the callbr is always a terminator. I understand that it may expand to lots of instructions, but isn't that opaque at this level?

lib/Target/X86/X86AsmPrinter.cpp
256–260 ↗(On Diff #184024)

The fact that we have to add an entire machine operand layer here makes me feel more strongly that we should just be using the existing support for MBB machine operands rather than this.... But maybe there is a reason I'm just missing so far.

craig.topper marked 4 inline comments as done.Jan 29 2019, 8:50 AM

Sorry it took so long, but made a full pass through this. Just want to point out that this patch is already in really great shape due to the huge amount of work from the original author, Craig taking it over, and the multiple rounds of review from Bill and Nick among others.

Most of my comments are pretty superficial and not hard to address I suspect. I have one big question and one big comment. I've referenced these in-line below as I went through the code but wanted to call them out at a highlevel:

  1. Why do we need the block address machine operand structure? I'm not saying to get rid of it (yet), I just don't understand the need to support both it and direct MBB operands.

Maybe I went the wrong way on this, but here's what I saw. The existence of the BlockAddress instruction in IR caused some portion of MachineIR handling to know that it needed to create a label of the form .Ltmp[0-9]+. This is helpfully annotated in the output assembly as "block address taken" and sometimes when there's a bug "address taken of block removed by CodeGen". The first version of this patch I inherited stripped the BlockAddress when the SelectionDAG was created. The resulting assembly would then print the .LBB label for the block in the inline asm instead. We ended up in a case where the LBB label didn't exist in the final output, but the Ltmp did. This failed to assemble obviously. Maybe that was really pointing to passes like BranchFolding.cpp doing bad things which we encountered later. But at the time I tried to fix it so that the inline asm printing would reference the .Ltmp label because that seemed like the label we really wanted.

  1. Why do we need to special case asm-goto targets in MI? I feel like this should somewhat fall out of the handling of address-taken basic blocks that can be jumped to somewhat opaquely. It would be really nice if there were a relatively common way of handling these rather than special casing the targets of asm-goto everywhere when it feels like the fundamental special treatment isn't actually about asm-goto at all. But maybe I've missed the critical thing that actually makes it special to asm-goto. If so, just help mme spot it1 =D

I haven't looked as closely at the actual optimizer changes yet as I'd like, but I think those are by-and-large going to be pretty straightforward.

Maybe it shouldn't be special to asm-goto. But its clear that we've already been treating exception handling pads specially, but not address-taken blocks so I've just been special casing asm goto to avoid changing anything about existing address taken. I'm not sure what we do for indirect branches, but I suspect analyzeBranch returning unanalyzable for indirect jumps is part of it. But that doesn't work for asm goto because our terminator is a direct jump which is analyzable.

lib/CodeGen/AsmPrinter/AsmPrinter.cpp
2973–2974 ↗(On Diff #184024)

The only blocks that are marked as isAsmGotoTarget are the ones that aren't the default. See the code in SelectionDAGBuilder.

lib/CodeGen/GlobalISel/IRTranslator.cpp
1300 ↗(On Diff #184024)

Original author copied that comment out of visitInvokeInst

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2535–2536 ↗(On Diff #184024)

That was copy and pasted from Invoke. I'll remove it

lib/CodeGen/ShrinkWrap.cpp
505–507 ↗(On Diff #184024)

This is MachineIR. CallBr isn't a thing there. CallBr was expanded to an INLINEASM instruction followed by an unconditional jump during SelectionDAGBuilder. The INLINEASM block is not a terminator so its effectively "the middle of a basic block". This pass tried to shove a RBP pop between the INLINEASM instruction and the unconditional jump. In the case that exposed the bug, the unconditional jump went to a block that also jumped to the same inline asm target. So as far as the dominator tree was concerned, the block containing the INLINEASM dominated the other blocks so it was chosen to hold the RBP pop. But since the INLINEASM itself can jump that wasn't correct.

I'm not sure what we do for indirect branches, but I suspect analyzeBranch returning unanalyzable for indirect jumps is part of it. But that doesn't work for asm goto because our terminator is a direct jump which is analyzable.

Did you consider making blocks which contain an asm goto unanalyzable? Intuitively, that's what I would expect given an asm goto is a terminator at the IR level. But I guess that makes handling the fallthrough case efficiently more difficult?

craig.topper marked 3 inline comments as done.Jan 29 2019, 6:09 PM
craig.topper added inline comments.
lib/AsmParser/LLParser.cpp
6250 ↗(On Diff #184024)

Call and Invoke both use std::vector here...

Address some of Chandler's comments

Looking at updated patch and some of the optimizer bits, but some inline responses to discussion points below:

Sorry it took so long, but made a full pass through this. Just want to point out that this patch is already in really great shape due to the huge amount of work from the original author, Craig taking it over, and the multiple rounds of review from Bill and Nick among others.

Most of my comments are pretty superficial and not hard to address I suspect. I have one big question and one big comment. I've referenced these in-line below as I went through the code but wanted to call them out at a highlevel:

  1. Why do we need the block address machine operand structure? I'm not saying to get rid of it (yet), I just don't understand the need to support both it and direct MBB operands.

Maybe I went the wrong way on this, but here's what I saw. The existence of the BlockAddress instruction in IR caused some portion of MachineIR handling to know that it needed to create a label of the form .Ltmp[0-9]+. This is helpfully annotated in the output assembly as "block address taken" and sometimes when there's a bug "address taken of block removed by CodeGen". The first version of this patch I inherited stripped the BlockAddress when the SelectionDAG was created. The resulting assembly would then print the .LBB label for the block in the inline asm instead. We ended up in a case where the LBB label didn't exist in the final output, but the Ltmp did. This failed to assemble obviously. Maybe that was really pointing to passes like BranchFolding.cpp doing bad things which we encountered later. But at the time I tried to fix it so that the inline asm printing would reference the .Ltmp label because that seemed like the label we really wanted.

I've spent more time thinking about this. I think the difference here is the following... With the "block address" behavior we assume the address more than taken -- it is *captured* and its uses may be unanalyzable. As a consequence, it *cannot* be updated when we do things like rework the control flow.

On the flip side, the MBB operands are likely often assumed to be easily enumerated and updated when we're rewriting the control flow, and so the fact that we can't do that results in breakages.

At least, this is my understanding. Match yours?

Given this, I think the current "block address" stuff may be correct -- I don't see anything that indicates the asm-goto couldn't escape or even capture the addresses of its successors and then re-use them in some way, expecting them to be "real" addresses. If this also makes sense to you, likely good to add some comments for future readers.

  1. Why do we need to special case asm-goto targets in MI? I feel like this should somewhat fall out of the handling of address-taken basic blocks that can be jumped to somewhat opaquely. It would be really nice if there were a relatively common way of handling these rather than special casing the targets of asm-goto everywhere when it feels like the fundamental special treatment isn't actually about asm-goto at all. But maybe I've missed the critical thing that actually makes it special to asm-goto. If so, just help mme spot it1 =D

Maybe it shouldn't be special to asm-goto. But its clear that we've already been treating exception handling pads specially, but not address-taken blocks so I've just been special casing asm goto to avoid changing anything about existing address taken. I'm not sure what we do for indirect branches, but I suspect analyzeBranch returning unanalyzable for indirect jumps is part of it. But that doesn't work for asm goto because our terminator is a direct jump which is analyzable.

Ugh, yeah.

I think that we should consolidate all of this behind something like address-taken, but I think that can be follow-up work. I suspect that a decent number of the things you've fixed here are actually latent bugs w/ indirectbr and address-of-label extensions that we've just never managed to hit. =[[[[

But I think it's fine to fix that incrementally going forward by digging in and commoning these representations as much as possible.

lib/CodeGen/ShrinkWrap.cpp
505–507 ↗(On Diff #184024)

Ah, that explains the thing I was missing...

I had assumed that we would expand this to an INLINEASM instruction that *is* a terminator. In MIR we can have multiple terminators at the end of a block, and this is a common pattern, so could we not have both instructions be marked as terminators and avoid special casing this?

Made it through the optimizer code. Really minor changes here, I think this is looking good. Probable the biggest question marks are in the MI representation.

Also, how best to track on-going work that is going to be prioritized in the short term? Mostyl looking for what would be helpful for you all if the FIXME suggested below (or already in the code) are likely to be lost.

lib/Transforms/InstCombine/InstCombineCalls.cpp
4217–4219 ↗(On Diff #184223)

Check for .isTerminator() would I think make this a bit more self-explanatory.

lib/Transforms/InstCombine/InstructionCombining.cpp
923–925 ↗(On Diff #184223)

May also be a tiny bit clearer by using terminator based terminology and APIs.

lib/Transforms/Scalar/GVN.cpp
2177–2184 ↗(On Diff #184223)

Can just directly test on the CallBase now.

lib/Transforms/Utils/LoopSimplify.cpp
245–246 ↗(On Diff #184223)

Ok, I've seen this pattern too many times. ;]

I'd suggest adding isIndirectTerminator or hasIndirectControlFlow or some such?

lib/Transforms/Utils/SimplifyCFG.cpp
1269–1270 ↗(On Diff #184223)

FIXME as we can likely build an analogous safety predicate to the invoke one for CallBr.

1451–1456 ↗(On Diff #184223)

Fold into a test on CallBase.

craig.topper marked an inline comment as done.Jan 29 2019, 7:26 PM
craig.topper added inline comments.
lib/CodeGen/ShrinkWrap.cpp
505–507 ↗(On Diff #184024)

It looks like whether an instruction is a terminator is currently encoded in the MCInstrDesc entry corresponding to an opcode. So I think that means we would need a new flavor of INLINEASM opcode?

chandlerc added inline comments.Jan 29 2019, 8:11 PM
lib/CodeGen/ShrinkWrap.cpp
505–507 ↗(On Diff #184024)

Bleh, yeah.

I think that's (marginally) cleaner, but I have no idea how much work it is to thread an INLINEASM_BR node through the backend. If it's too much, could defer this.

Reason that I think it is cleaner: I suspect we'll keep finding ways in which we mishandle this, but if it turns into a terminator sequence I think the rest of the backend will start to behave correctly.

craig.topper marked 3 inline comments as done.

-More review comments addressed.
-Rebase after committing some of the CallBase related changes

Update the CallBrInst creation APIs to consistently accept a callee-type argument. This makes CallBrInst consistent with call and invoke after r351121

Add a bailout for asm goto target to Machine Block Placement. Fixes export4.zip from here https://github.com/ClangBuiltLinux/linux/issues/336

Update terminology to replace Transfers with IndirectDests. And Fallthrough with DefaultDest

Update CallBrInst creation interfaces to use FunctionCallee instead of Function* to match CallInst and InvokeInst

A few minor comments.

I assume the big outstanding item here is figuring out the MIR representation? Anything else?

docs/LangRef.rst
6910 ↗(On Diff #185140)

Maybe "branch to" or "jump to"? I don't want to imply that it has to be done via call/return.

lib/CodeGen/GlobalISel/IRTranslator.cpp
1246 ↗(On Diff #181183)

(still seems to need update)

1300 ↗(On Diff #184024)

(comment remains)

lib/CodeGen/IndirectBrExpandPass.cpp
151–153 ↗(On Diff #184024)

FWIW, I think I now understand -- I think the Linux kernel tester we have isn't building w/ retpolines enabled in the compiler, so we just haven't seen this explode. =[

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2535–2536 ↗(On Diff #184024)

(still needs fixing)

craig.topper marked an inline comment as done and an inline comment as not done.Feb 4 2019, 7:44 PM
craig.topper added inline comments.
include/llvm/Bitcode/LLVMBitCodes.h
477 ↗(On Diff #182342)

@chandlerc do you know the answer to the question I left here?

lib/Transforms/InstCombine/InstructionCombining.cpp
923–925 ↗(On Diff #184223)

I'm just going to remove this change because it would require a CallBrInst that returned a value, but we don't support that right now.

craig.topper marked an inline comment as done and an inline comment as not done.Feb 4 2019, 7:45 PM
craig.topper added inline comments.
lib/Transforms/InstCombine/InstructionCombining.cpp
923–925 ↗(On Diff #184223)

Oops I meant to remove this comment. I've now fixed the code to use terminator terminology and apis

chandlerc added inline comments.Feb 4 2019, 7:59 PM
include/llvm/Bitcode/LLVMBitCodes.h
477 ↗(On Diff #182342)

Not directly... But I think I can explain how to answer it.

Does any version of LLVM bitcode that we still support reading use 14? If so, then we need to leave it alone. If not, I think you can take it.

But maybe good to double check w/ someone who has thought more fully about bitcode backwards compat.

craig.topper marked 2 inline comments as done.Feb 5 2019, 11:57 AM
craig.topper added inline comments.
lib/CodeGen/GlobalISel/IRTranslator.cpp
1246 ↗(On Diff #181183)

Turns out this comment was copied directly from the InvokeInst handler above.

1300 ↗(On Diff #184024)

Does global isel do anything with probabilities anywhere else? I can't see any code obviously doing it. SelectionDAGBuilder has some helper functions that we used for CallBr, but I don't see equivalent helpers here or any calls passing probability to addSuccessor.

-Change the bitcode encoding to a value that's never been used so I don't have to figure out the compatibility story.
-Add the bitcode value to the llvm-bcanalyzer
-Fix "returns" in LangRef
-Remove comment from SelectionDAGBuilder

-Update vim syntax highlight file to include 'callbr'
-Add test cases for the 2 GVN hangs related to critical edge splitting that I had fixed previously.

kees added a subscriber: kees.Feb 6 2019, 5:56 AM

I found a weird mis-compilation bug. Not sure if in LLVM or Clang half. Details here: https://reviews.llvm.org/D56571#1386973

-Fix crash in bitcode writer when writing out callbr with attributes
-Add test backend test cases and jump threading test case for previously fixed bugs

-Remove GlobalISel implementation completely. It wasn't even trying to be correct for asm-goto. It assumed a call to a function not InlineAsm. I think we can deal with this later.
-Introduce an INLINEASM_BR instruction in machine IR that is like INLINEASM but is also a terminator. Hopefully I got the right flags set on this.
-Remove isAsmGotoTarget/hasAsmGotoTargetSuccessor from MachineBasicBlock

From visual inspection of the existing tests this seems to be working, but we'll see what the kernel builds have to say

kees added a comment.Feb 6 2019, 11:15 PM

I found a weird mis-compilation bug. Not sure if in LLVM or Clang half. Details here: https://reviews.llvm.org/D56571#1386973

This appears to be fixed with latest llvm, clang, and D53765.185703.

chandlerc accepted this revision.Feb 6 2019, 11:26 PM

This is looking really close. I think we should get some basic sanity checks w/ the kernel, and even if there is an obscure bug or two, land this and move to follow-up patches.

So, LGTM pending some basic sanity checks and addressing some nits below.

include/llvm/Analysis/SparsePropagation.h
333–334 ↗(On Diff #185703)

Use TI.isIndirectTerminator() here?

lib/IR/Instruction.cpp
789 ↗(On Diff #182342)

This should probably be isa<CallBase> now with a cleaner string, but not critical to fix in this patch.

lib/Transforms/InstCombine/InstCombineCalls.cpp
9 ↗(On Diff #185703)

nit: I always prefer the Oxford comma to reduce ambiguity.

4346–4350 ↗(On Diff #185703)

Wow this seems crazily expensive.

I guess it's OK because currently it can only occur for invoke. But the big-O here is quadratic: NumberOfUses * NumberOfSuccessors.

To make matters worse, this change moves to the generic successor APIs which will put a type check in the inner loop. =[

Maybe the simpler approach is to leave the old code for Invoke, and for CallBr simply return false if there are >0 users (which should be never currently).

This revision is now accepted and ready to land.Feb 6 2019, 11:26 PM

Address the last round of comments from Chandler. I'll commit this shortly.

nickdesaulniers added a comment.EditedFeb 7 2019, 3:19 PM

With
clang asm goto diff 185489
llvm asm goto diff 185868
I see a few more warnings with the kernel's x86 object file validator (that has found bugs previously with this patch set):
https://github.com/ClangBuiltLinux/linux/issues/351
https://github.com/ClangBuiltLinux/linux/issues/336#issuecomment-461631864

I'll try to provide reproducers for these ASAP.

  • EDIT **

Looks unrelated to this patch.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 8 2019, 12:50 PM
nickdesaulniers added inline comments.Feb 9 2019, 4:00 PM
llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp
4353
[582/1886] Building CXX object lib/Transf...LVMInstCombine.dir/InstCombineCalls.cpp.o
../lib/Transforms/InstCombine/InstCombineCalls.cpp: In member function ‘bool llvm::InstCombiner::transformConstExprCastCall(llvm::CallBase&)’:
../lib/Transforms/InstCombine/InstCombineCalls.cpp:4353:23: warning: unused variable ‘CBI’ [-Wunused-variable]
       if (CallBrInst *CBI = dyn_cast<CallBrInst>(Caller))
                       ^~~
This comment was removed by nickdesaulniers.
nickdesaulniers marked an inline comment as done.Feb 10 2019, 12:42 AM
nickdesaulniers added inline comments.
llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp
4353
nhaehnle removed a subscriber: nhaehnle.Feb 12 2019, 1:14 AM