This is an archive of the discontinued LLVM Phabricator instance.

[RFC prototype] Implementation of asm-goto support in LLVM
ClosedPublic

Authored by craig.topper on Oct 26 2018, 7:58 AM.

Details

Summary

This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html

Once there is a consensus wrt the design of the feature, I will modify
this patch and split it into pieces for a proper review.

Even though this patch is definitely a prototype, it was tested quite
well, including the heavy-lifting of building the linux kernel
(kudos to Nick Desaulniers for doing that!).

One known problem in the implementation that I would like to mention
is that terminating callbr instruction is translated into INLINEASM
machine instruction, which is not a terminator. There are several crude
workarounds that we had to put in the code (e.g. see AsmPrinter.cpp) for
that. It has to be addressed in a better way and I would appreciate your
thoughts on that.

Authored By: Alexander Ivchenko, Mikhail Dvoretckii

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
lib/CodeGen/AsmPrinter/AsmPrinter.cpp
3001 ↗(On Diff #181183)

Is this comment about terminator instruction still relevant?

3007 ↗(On Diff #181183)

This comment is also added to lib/CodeGen/MachineBasicBlock.cpp. What does that mean for "asm-goto sometimes seems to be the second-to-last operand?" Second to last operand to what?

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2540–2541 ↗(On Diff #181183)

These can probably be moved closer to their uses below?

lib/Transforms/Utils/LoopSimplify.cpp
135 ↗(On Diff #181183)

From this part of the patch through below, there's multiple cases of:

if (indirectbr)
  foo();
if (callbr)
  foo();

That could be:

if (indirectbr || callbr)
  foo();

Attempt to fix a bug reported by @nickdesaulniers . We weren't using the label created for the blockaddress of the basic block when we printed the inline assembly block. So there was no guarantee the label we were using wouldn't be removed. To fix this I've kept the blockaddress object in the SelectionDAG instead of stripping it in the builder. I've created it directly as TargetBlockAddress to hide it from isel. Then in the printer I look for the operand being a blockaddress and grab the symbol for it. I had to make a few checks for TargetBlockAddress in other places to make this work. I admit I have no idea if this is the right thing to do at all, and open to any input on this.

craig.topper marked 3 inline comments as done and an inline comment as not done.Jan 14 2019, 10:51 AM
craig.topper added inline comments.
lib/Bitcode/Reader/BitcodeReader.cpp
4277–4279 ↗(On Diff #181183)

That would change the meaning of the else on the first if wouldn't it?

lib/CodeGen/AsmPrinter/AsmPrinter.cpp
3001 ↗(On Diff #181183)

I think so. The CodeGen layer INLINEASM is not a terminator instruction.

3007 ↗(On Diff #181183)

I think this meant "instruction" not "operand"

Address some review comments. Include context this time

void added inline comments.Jan 14 2019, 12:59 PM
lib/CodeGen/AsmPrinter/AsmPrinter.cpp
3006–3007 ↗(On Diff #181602)

This comment and the FIXME above it have me confused. Isn't the point of callbr to convert *all* asm goto statements into callbr? If so, then they should automatically be at the end of the basic block. What are the situations where it won't follow this?

craig.topper marked an inline comment as done.Jan 14 2019, 1:05 PM
craig.topper added inline comments.
lib/CodeGen/AsmPrinter/AsmPrinter.cpp
3006–3007 ↗(On Diff #181602)

This is MachineIR not IR. INLINEASM has its own MachineInstr opcode which is not a terminator. Currently asm-goto is still using that.

void added inline comments.Jan 14 2019, 1:10 PM
lib/CodeGen/AsmPrinter/AsmPrinter.cpp
3006–3007 ↗(On Diff #181602)

*smacks forehead* Of course. Nevermind.

Adding Hans so he is aware of the progress here.

test/Bitcode/callbr.ll
1 ↗(On Diff #181602)

This test case failed for me when running ninja check-all locally:

: 'RUN: at line 1';    llvm/build/bin/llvm-dis < llvm/test/Bitcode/callbr.ll.bc | llvm/build/bin/FileCheck llvm/test/Bitcode/callbr.ll
llvm/build/bin/llvm-dis: error: Invalid bitcode signature
void added a comment.Jan 15 2019, 12:42 PM

I like these changes. Could you add the documentation for the new instruction?

craig.topper added a subscriber: jyu2.

Add some documentation to the LangRef. Probably needs additional work.

Try to upload the patch as binary so the bitcode file go through correctly.

Add MO_BlockAddress to X86AsmPrinter's printOperand to fix a crash on some invalid inline assembly reported by @jyu2.

docs/LangRef.rst
6758 ↗(On Diff #181888)

s/adress/address/

6764 ↗(On Diff #181888)

s/calledd/called/

6769 ↗(On Diff #181888)

s/callbrs/callbr/

craig.topper marked an inline comment as done.Jan 15 2019, 4:27 PM
craig.topper added inline comments.
docs/LangRef.rst
6758 ↗(On Diff #181888)

That typo exists in Call and Invoke too. Fixed those in r351279

Fix LangRef typos

I think I see the issue. We need to return false for callbr from llvm::isSafeToSpeculativelyExecute. We probably also should fix to computeKnownBitsFromAssume to not visit an assume that is also the CxtI instruction. In that case we end up searching through the basic block until the terminator. And only because every terminator returns false from llvm::isSafeToSpeculativelyExecute do we end up terminating the loop without walking off the end of the basic block.

Add CallBr to isSafeToSpeculativelyExecute to keep isValidAssumeForContext from walking off the end of the basic block.

A few additional errors we're hitting when compiling an x86_64 kernel with allyesconfig (all configurable options and drivers "enabled").
https://github.com/ClangBuiltLinux/linux/issues/6#issuecomment-455360304

Working to triage these and provide smaller test cases. Please do not land until we understand these build failures.

craig.topper marked 10 inline comments as done.Jan 17 2019, 8:51 PM
craig.topper added inline comments.
include/llvm/Bitcode/LLVMBitCodes.h
477 ↗(On Diff #182342)

Can this use the 14 or does it need a new number? Not sure how holes get created here.

include/llvm/IR/Instructions.h
3864 ↗(On Diff #182342)

I should probably drop this comment as I'm not sure what the previous author meant.

lib/Bitcode/Reader/BitcodeReader.cpp
4294 ↗(On Diff #182342)

Should this be "(CCInfo >> bitc::CALL_CCONV) & CallingConv::MaxID" instead?

lib/CodeGen/AsmPrinter/AsmPrinterInlineAsm.cpp
441 ↗(On Diff #182342)

I think this is an 80 column violation

lib/IR/Instruction.cpp
789 ↗(On Diff #182342)

instructions is misspelled

lib/Transforms/InstCombine/InstCombineCalls.cpp
4259 ↗(On Diff #182342)

Can this be merged using CallBase?

4282 ↗(On Diff #182342)

Merge with above check for invoke

4403 ↗(On Diff #182342)

Should we have an interface that doesn't require copying the transfer list? Just ask for a specific transfer number instead?

lib/Transforms/InstCombine/InstructionCombining.cpp
929 ↗(On Diff #182342)

Should we merge these by doing isa<InvokeInst> || isa<CallBrInst> then just casting to Instruction to get the parent?

lib/Transforms/Utils/Local.cpp
1004 ↗(On Diff #182342)

Merge the dyn_cast into the if

This looks like a codegen bug, from compiling more of the Linux kernel. It seems that an optimization from -O1 is causing the problem: https://github.com/ClangBuiltLinux/linux/issues/320#issuecomment-455435791. I'm not sure how to help narrow it down further (which pass?). The output is from creduce and I've narrowed down the set of flags to Clang.

lib/Transforms/Utils/InlineFunction.cpp
1512 ↗(On Diff #182342)

While I can understand punting in v1, this does cause at least one -Wsection like warning from the kernel: https://github.com/ClangBuiltLinux/linux/issues/317. I call it section like, because the kernel has its own tool that it uses to validate sections; some run-once kernel code and data is put into init sections that get discarded after first use to help save kernel memory.

The issue is that this return false; makes __attribute__((always_inline)) fail. We can hack around that by converting always inline functions into macros, but I don't see that flying upstream.

So not the highest priority (it's just a warning, with no issues at runtime), but would be nice to have.

This looks like a codegen bug, from compiling more of the Linux kernel. It seems that an optimization from -O1 is causing the problem: https://github.com/ClangBuiltLinux/linux/issues/320#issuecomment-455435791. I'm not sure how to help narrow it down further (which pass?). The output is from creduce and I've narrowed down the set of flags to Clang.

You try using -opt-bisect-limit to narrow down the pass https://llvm.org/docs/OptBisect.html

Fix most of my own review comments. Fix what was probably a bug in SimplifyCFG.

I have not looked at the inlinining issue, the link error, or the ICE from @void yet.

Hopefully fix https://github.com/ClangBuiltLinux/linux/issues/320#issuecomment-455435791. Had to exclude CallBr predecessors in JumpThreading.

Hopefully fix https://github.com/ClangBuiltLinux/linux/issues/320#issuecomment-455435791. Had to exclude CallBr predecessors in JumpThreading.

Yes looks good; I've closed that issue as unreproducible.

For https://github.com/ClangBuiltLinux/linux/issues/317, I think we're hitting https://bugs.llvm.org/show_bug.cgi?id=39560. Reproducer https://gist.github.com/nickdesaulniers/a69b44159f4d42a34fc1b5d18cb7b608. If we fix that, then I should be able to drop my out of tree kernel patch.

Finally, https://github.com/ClangBuiltLinux/linux/issues/319 looks troubling, but I haven't really dug into it. It *may* be due to my out of tree kernel patch (https://github.com/ClangBuiltLinux/linux/commit/0670525ded48caf958b6535d388e6478b6b14a4b), but if we can support inlining then it looks like my out of tree patch should not be necessary: https://godbolt.org/z/exFrxS. Otherwise, it may be another codegen bug due to this patch.

Do you have any tests which perform a callbr with a normal function target rather than inline asm?

Do you have any tests which perform a callbr with a normal function target rather than inline asm?

No, and Verifier.cpp currently errors for it.

Do you have any tests which perform a callbr with a normal function target rather than inline asm?

No, and Verifier.cpp currently errors for it.

I think it would be good to mention in this explicitly in the LangRef.

Make the LangRef statement about only supporting inline assembly a little stronger.

-Add a new isAsmGotoTarget flag to MachineBasicBlock. Use it to preserve predecessors/successors in a few places similar to isEHPad. Probably more work to do here, but I think it at least fixes https://github.com/ClangBuiltLinux/linux/issues/319
-Remove SelectionDAGBuilder support for doing anything with CallBr other than inline assembly. The code can't be tested today so there's really no point in having it. Assuming inline assembly only made setting the flag above a little simpler.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2533 ↗(On Diff #183623)

needs a (void)Callee; otherwise release builds produce:

../lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:2531:16: warning: unused variable 'Callee' [-Wunused-variable]
  const Value *Callee(I.getCalledValue());
               ^

or can the whole expression just fit in the assert?

Fix unused variable in release builds

Hopefully make callbr.ll.bc binary again

Really make it binary

nickdesaulniers added a comment.EditedJan 25 2019, 11:50 PM

Add a new isAsmGotoTarget flag to MachineBasicBlock

Is there a way to add a test for this case that is fixed by this, so that it never regresses without setting off alarms?

-Block tail merging in BranchFolding.cpp if the predecessor block contains an asm goto. This is what we do for exception handling.

-Simplify the code in isBlockOnlyReachableByFallthrough by using the isAsmGotoTarget flag.
-Support isAsmGotoTarget() in MachineBasicBlock::print

-Add another isAsmGotoTarget check to BranchFolding.cpp in place where we already checked isEHPad. Seems to prevent hang on fault.c reported here https://github.com/ClangBuiltLinux/linux/issues/330

-Disable critical edge splitting for the non-fallthrough edge of a callbr. Otherwise the blockaddress and the control flow edge get out of sync.

-Disable GVN PRE across callbr. This prevents it from queueing up a critical edge split that we can't perform causing an infinite loop. Might be able to do the the fallthrough edge, but left fixme for now.

-Block GVN Load PRE for CallBr critical edges.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2533 ↗(On Diff #183623)

Still seeing this in Diff 183921.

Restore a lost fix for a unused variable warning

-Teach shrink wrapping pass to ensure the asm goto targets are at least at the boundary of prologue/epilogue. This is what we do for EHPad, but may not be quite right for asm-goto.

Sorry it took so long, but made a full pass through this. Just want to point out that this patch is already in really great shape due to the huge amount of work from the original author, Craig taking it over, and the multiple rounds of review from Bill and Nick among others.

Most of my comments are pretty superficial and not hard to address I suspect. I have one big question and one big comment. I've referenced these in-line below as I went through the code but wanted to call them out at a highlevel:

  1. Why do we need the block address machine operand structure? I'm not saying to get rid of it (yet), I just don't understand the need to support both it and direct MBB operands.
  1. Why do we need to special case asm-goto targets in MI? I feel like this should somewhat fall out of the handling of address-taken basic blocks that can be jumped to somewhat opaquely. It would be really nice if there were a relatively common way of handling these rather than special casing the targets of asm-goto everywhere when it feels like the fundamental special treatment isn't actually about asm-goto at all. But maybe I've missed the critical thing that actually makes it special to asm-goto. If so, just help mme spot it1 =D

I haven't looked as closely at the actual optimizer changes yet as I'd like, but I think those are by-and-large going to be pretty straightforward.

include/llvm/IR/CallSite.h
53 ↗(On Diff #184024)

Bleh. We should just finish killing CallSite. But I guess this patch shouldn't wait on that.

89–91 ↗(On Diff #184024)

FWIW, I.getInt() == 0 seems way more clear.

259–264 ↗(On Diff #184024)

Can this be simplified by just delegating to CallBase? The amount of magic constants / implicit contracts here worries me.

321–324 ↗(On Diff #184024)

Can we just sink this into CallBase?

600–608 ↗(On Diff #184024)

Delegate to CallBase?

chandlerc added inline comments.Jan 29 2019, 1:51 AM
include/llvm/IR/InstVisitor.h
273–279 ↗(On Diff #184024)

Need to update this code path.

include/llvm/IR/Instructions.h
4036–4038 ↗(On Diff #184024)

See above, this I think should sink to CallBase.

lib/AsmParser/LLParser.cpp
6197–6198 ↗(On Diff #184024)

bundles missing here?

6197–6198 ↗(On Diff #184024)

The syntax here diverges from both switch and indirectbr in surprising ways here. They don't use any keywords like or jump to introduce thi list of labels, they just directly put it there.

While abstractly, I somewhat like this syntax, I wonder if it would be better te *exactly* match the syntax for switch which has the same default label with a list of other labels structure to it. Consistency seems good here unless it causes problems?

6250 ↗(On Diff #184024)

Not a small vector?

6269 ↗(On Diff #184024)

Weird indentatino.

6279 ↗(On Diff #184024)

I'd use int and llvm::seq, but maybe better to remain consistent.

lib/CodeGen/AsmPrinter/AsmPrinter.cpp
2973–2974 ↗(On Diff #184024)

Isn't one of the targets a fallthrough? Maybe i've not made it far enough, and the default target is different from the target here.

lib/CodeGen/AsmPrinter/AsmPrinterInlineAsm.cpp
439–443 ↗(On Diff #184024)

I'm a bit surprised we don't model this as an MBB operand to begin with if it *has* to be a direct block address anyways? Curious why.

lib/CodeGen/GlobalISel/IRTranslator.cpp
1246 ↗(On Diff #181183)

Maybe have the verifier reject if we can't lower here? Regardless, the comment still seems to need some amount of fixing.

deopt bundles are a reasonable thing, and just not really a reasonable thing (at this stage) to support on callbr.

1300 ↗(On Diff #184024)

This complex to thread through?

lib/CodeGen/IndirectBrExpandPass.cpp
151–153 ↗(On Diff #184024)

I think we have to fix this before this goes very far. I don't actually know how this is working in practice in our biggest customer (the Linux Kernel) as it should be using things like retpolines and needing this code to work.

I also think that this entire pass, while a reasonable thing to have, was a bad idea for retpoline lowering. We found retpolines to be faster than the branch trees created by this in most cases w/ many branches. So this needs a non-trivial update. Not sure who has the time to do this at this point though, I'll ask around w/ folks familiar with this bit of code.

It would be good to at least update the comment to English prose.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2535–2536 ↗(On Diff #184024)

Wait, we handle catchswitch combined with this? That seems .... ambitious.

2540–2547 ↗(On Diff #184024)

Weird that we handle probabilities here but not in Global ISel

lib/CodeGen/ShrinkWrap.cpp
505–507 ↗(On Diff #184024)

Wait, they do?

I don't understand -- the callbr is always a terminator. I understand that it may expand to lots of instructions, but isn't that opaque at this level?

lib/Target/X86/X86AsmPrinter.cpp
256–260 ↗(On Diff #184024)

The fact that we have to add an entire machine operand layer here makes me feel more strongly that we should just be using the existing support for MBB machine operands rather than this.... But maybe there is a reason I'm just missing so far.

craig.topper marked 4 inline comments as done.Jan 29 2019, 8:50 AM

Sorry it took so long, but made a full pass through this. Just want to point out that this patch is already in really great shape due to the huge amount of work from the original author, Craig taking it over, and the multiple rounds of review from Bill and Nick among others.

Most of my comments are pretty superficial and not hard to address I suspect. I have one big question and one big comment. I've referenced these in-line below as I went through the code but wanted to call them out at a highlevel:

  1. Why do we need the block address machine operand structure? I'm not saying to get rid of it (yet), I just don't understand the need to support both it and direct MBB operands.

Maybe I went the wrong way on this, but here's what I saw. The existence of the BlockAddress instruction in IR caused some portion of MachineIR handling to know that it needed to create a label of the form .Ltmp[0-9]+. This is helpfully annotated in the output assembly as "block address taken" and sometimes when there's a bug "address taken of block removed by CodeGen". The first version of this patch I inherited stripped the BlockAddress when the SelectionDAG was created. The resulting assembly would then print the .LBB label for the block in the inline asm instead. We ended up in a case where the LBB label didn't exist in the final output, but the Ltmp did. This failed to assemble obviously. Maybe that was really pointing to passes like BranchFolding.cpp doing bad things which we encountered later. But at the time I tried to fix it so that the inline asm printing would reference the .Ltmp label because that seemed like the label we really wanted.

  1. Why do we need to special case asm-goto targets in MI? I feel like this should somewhat fall out of the handling of address-taken basic blocks that can be jumped to somewhat opaquely. It would be really nice if there were a relatively common way of handling these rather than special casing the targets of asm-goto everywhere when it feels like the fundamental special treatment isn't actually about asm-goto at all. But maybe I've missed the critical thing that actually makes it special to asm-goto. If so, just help mme spot it1 =D

I haven't looked as closely at the actual optimizer changes yet as I'd like, but I think those are by-and-large going to be pretty straightforward.

Maybe it shouldn't be special to asm-goto. But its clear that we've already been treating exception handling pads specially, but not address-taken blocks so I've just been special casing asm goto to avoid changing anything about existing address taken. I'm not sure what we do for indirect branches, but I suspect analyzeBranch returning unanalyzable for indirect jumps is part of it. But that doesn't work for asm goto because our terminator is a direct jump which is analyzable.

lib/CodeGen/AsmPrinter/AsmPrinter.cpp
2973–2974 ↗(On Diff #184024)

The only blocks that are marked as isAsmGotoTarget are the ones that aren't the default. See the code in SelectionDAGBuilder.

lib/CodeGen/GlobalISel/IRTranslator.cpp
1300 ↗(On Diff #184024)

Original author copied that comment out of visitInvokeInst

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2535–2536 ↗(On Diff #184024)

That was copy and pasted from Invoke. I'll remove it

lib/CodeGen/ShrinkWrap.cpp
505–507 ↗(On Diff #184024)

This is MachineIR. CallBr isn't a thing there. CallBr was expanded to an INLINEASM instruction followed by an unconditional jump during SelectionDAGBuilder. The INLINEASM block is not a terminator so its effectively "the middle of a basic block". This pass tried to shove a RBP pop between the INLINEASM instruction and the unconditional jump. In the case that exposed the bug, the unconditional jump went to a block that also jumped to the same inline asm target. So as far as the dominator tree was concerned, the block containing the INLINEASM dominated the other blocks so it was chosen to hold the RBP pop. But since the INLINEASM itself can jump that wasn't correct.

I'm not sure what we do for indirect branches, but I suspect analyzeBranch returning unanalyzable for indirect jumps is part of it. But that doesn't work for asm goto because our terminator is a direct jump which is analyzable.

Did you consider making blocks which contain an asm goto unanalyzable? Intuitively, that's what I would expect given an asm goto is a terminator at the IR level. But I guess that makes handling the fallthrough case efficiently more difficult?

craig.topper marked 3 inline comments as done.Jan 29 2019, 6:09 PM
craig.topper added inline comments.
lib/AsmParser/LLParser.cpp
6250 ↗(On Diff #184024)

Call and Invoke both use std::vector here...

Address some of Chandler's comments

Looking at updated patch and some of the optimizer bits, but some inline responses to discussion points below:

Sorry it took so long, but made a full pass through this. Just want to point out that this patch is already in really great shape due to the huge amount of work from the original author, Craig taking it over, and the multiple rounds of review from Bill and Nick among others.

Most of my comments are pretty superficial and not hard to address I suspect. I have one big question and one big comment. I've referenced these in-line below as I went through the code but wanted to call them out at a highlevel:

  1. Why do we need the block address machine operand structure? I'm not saying to get rid of it (yet), I just don't understand the need to support both it and direct MBB operands.

Maybe I went the wrong way on this, but here's what I saw. The existence of the BlockAddress instruction in IR caused some portion of MachineIR handling to know that it needed to create a label of the form .Ltmp[0-9]+. This is helpfully annotated in the output assembly as "block address taken" and sometimes when there's a bug "address taken of block removed by CodeGen". The first version of this patch I inherited stripped the BlockAddress when the SelectionDAG was created. The resulting assembly would then print the .LBB label for the block in the inline asm instead. We ended up in a case where the LBB label didn't exist in the final output, but the Ltmp did. This failed to assemble obviously. Maybe that was really pointing to passes like BranchFolding.cpp doing bad things which we encountered later. But at the time I tried to fix it so that the inline asm printing would reference the .Ltmp label because that seemed like the label we really wanted.

I've spent more time thinking about this. I think the difference here is the following... With the "block address" behavior we assume the address more than taken -- it is *captured* and its uses may be unanalyzable. As a consequence, it *cannot* be updated when we do things like rework the control flow.

On the flip side, the MBB operands are likely often assumed to be easily enumerated and updated when we're rewriting the control flow, and so the fact that we can't do that results in breakages.

At least, this is my understanding. Match yours?

Given this, I think the current "block address" stuff may be correct -- I don't see anything that indicates the asm-goto couldn't escape or even capture the addresses of its successors and then re-use them in some way, expecting them to be "real" addresses. If this also makes sense to you, likely good to add some comments for future readers.

  1. Why do we need to special case asm-goto targets in MI? I feel like this should somewhat fall out of the handling of address-taken basic blocks that can be jumped to somewhat opaquely. It would be really nice if there were a relatively common way of handling these rather than special casing the targets of asm-goto everywhere when it feels like the fundamental special treatment isn't actually about asm-goto at all. But maybe I've missed the critical thing that actually makes it special to asm-goto. If so, just help mme spot it1 =D

Maybe it shouldn't be special to asm-goto. But its clear that we've already been treating exception handling pads specially, but not address-taken blocks so I've just been special casing asm goto to avoid changing anything about existing address taken. I'm not sure what we do for indirect branches, but I suspect analyzeBranch returning unanalyzable for indirect jumps is part of it. But that doesn't work for asm goto because our terminator is a direct jump which is analyzable.

Ugh, yeah.

I think that we should consolidate all of this behind something like address-taken, but I think that can be follow-up work. I suspect that a decent number of the things you've fixed here are actually latent bugs w/ indirectbr and address-of-label extensions that we've just never managed to hit. =[[[[

But I think it's fine to fix that incrementally going forward by digging in and commoning these representations as much as possible.

lib/CodeGen/ShrinkWrap.cpp
505–507 ↗(On Diff #184024)

Ah, that explains the thing I was missing...

I had assumed that we would expand this to an INLINEASM instruction that *is* a terminator. In MIR we can have multiple terminators at the end of a block, and this is a common pattern, so could we not have both instructions be marked as terminators and avoid special casing this?

Made it through the optimizer code. Really minor changes here, I think this is looking good. Probable the biggest question marks are in the MI representation.

Also, how best to track on-going work that is going to be prioritized in the short term? Mostyl looking for what would be helpful for you all if the FIXME suggested below (or already in the code) are likely to be lost.

lib/Transforms/InstCombine/InstCombineCalls.cpp
4217–4219 ↗(On Diff #184223)

Check for .isTerminator() would I think make this a bit more self-explanatory.

lib/Transforms/InstCombine/InstructionCombining.cpp
923–925 ↗(On Diff #184223)

May also be a tiny bit clearer by using terminator based terminology and APIs.

lib/Transforms/Scalar/GVN.cpp
2177–2184 ↗(On Diff #184223)

Can just directly test on the CallBase now.

lib/Transforms/Utils/LoopSimplify.cpp
245–246 ↗(On Diff #184223)

Ok, I've seen this pattern too many times. ;]

I'd suggest adding isIndirectTerminator or hasIndirectControlFlow or some such?

lib/Transforms/Utils/SimplifyCFG.cpp
1269–1270 ↗(On Diff #184223)

FIXME as we can likely build an analogous safety predicate to the invoke one for CallBr.

1451–1456 ↗(On Diff #184223)

Fold into a test on CallBase.

craig.topper marked an inline comment as done.Jan 29 2019, 7:26 PM
craig.topper added inline comments.
lib/CodeGen/ShrinkWrap.cpp
505–507 ↗(On Diff #184024)

It looks like whether an instruction is a terminator is currently encoded in the MCInstrDesc entry corresponding to an opcode. So I think that means we would need a new flavor of INLINEASM opcode?

chandlerc added inline comments.Jan 29 2019, 8:11 PM
lib/CodeGen/ShrinkWrap.cpp
505–507 ↗(On Diff #184024)

Bleh, yeah.

I think that's (marginally) cleaner, but I have no idea how much work it is to thread an INLINEASM_BR node through the backend. If it's too much, could defer this.

Reason that I think it is cleaner: I suspect we'll keep finding ways in which we mishandle this, but if it turns into a terminator sequence I think the rest of the backend will start to behave correctly.

craig.topper marked 3 inline comments as done.

-More review comments addressed.
-Rebase after committing some of the CallBase related changes

Update the CallBrInst creation APIs to consistently accept a callee-type argument. This makes CallBrInst consistent with call and invoke after r351121

Add a bailout for asm goto target to Machine Block Placement. Fixes export4.zip from here https://github.com/ClangBuiltLinux/linux/issues/336

Update terminology to replace Transfers with IndirectDests. And Fallthrough with DefaultDest

Update CallBrInst creation interfaces to use FunctionCallee instead of Function* to match CallInst and InvokeInst

A few minor comments.

I assume the big outstanding item here is figuring out the MIR representation? Anything else?

docs/LangRef.rst
6910 ↗(On Diff #185140)

Maybe "branch to" or "jump to"? I don't want to imply that it has to be done via call/return.

lib/CodeGen/GlobalISel/IRTranslator.cpp
1246 ↗(On Diff #181183)

(still seems to need update)

1300 ↗(On Diff #184024)

(comment remains)

lib/CodeGen/IndirectBrExpandPass.cpp
151–153 ↗(On Diff #184024)

FWIW, I think I now understand -- I think the Linux kernel tester we have isn't building w/ retpolines enabled in the compiler, so we just haven't seen this explode. =[

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2535–2536 ↗(On Diff #184024)

(still needs fixing)

craig.topper marked an inline comment as done and an inline comment as not done.Feb 4 2019, 7:44 PM
craig.topper added inline comments.
include/llvm/Bitcode/LLVMBitCodes.h
477 ↗(On Diff #182342)

@chandlerc do you know the answer to the question I left here?

lib/Transforms/InstCombine/InstructionCombining.cpp
923–925 ↗(On Diff #184223)

I'm just going to remove this change because it would require a CallBrInst that returned a value, but we don't support that right now.

craig.topper marked an inline comment as done and an inline comment as not done.Feb 4 2019, 7:45 PM
craig.topper added inline comments.
lib/Transforms/InstCombine/InstructionCombining.cpp
923–925 ↗(On Diff #184223)

Oops I meant to remove this comment. I've now fixed the code to use terminator terminology and apis

chandlerc added inline comments.Feb 4 2019, 7:59 PM
include/llvm/Bitcode/LLVMBitCodes.h
477 ↗(On Diff #182342)

Not directly... But I think I can explain how to answer it.

Does any version of LLVM bitcode that we still support reading use 14? If so, then we need to leave it alone. If not, I think you can take it.

But maybe good to double check w/ someone who has thought more fully about bitcode backwards compat.

craig.topper marked 2 inline comments as done.Feb 5 2019, 11:57 AM
craig.topper added inline comments.
lib/CodeGen/GlobalISel/IRTranslator.cpp
1246 ↗(On Diff #181183)

Turns out this comment was copied directly from the InvokeInst handler above.

1300 ↗(On Diff #184024)

Does global isel do anything with probabilities anywhere else? I can't see any code obviously doing it. SelectionDAGBuilder has some helper functions that we used for CallBr, but I don't see equivalent helpers here or any calls passing probability to addSuccessor.

-Change the bitcode encoding to a value that's never been used so I don't have to figure out the compatibility story.
-Add the bitcode value to the llvm-bcanalyzer
-Fix "returns" in LangRef
-Remove comment from SelectionDAGBuilder

-Update vim syntax highlight file to include 'callbr'
-Add test cases for the 2 GVN hangs related to critical edge splitting that I had fixed previously.

kees added a subscriber: kees.Feb 6 2019, 5:56 AM

I found a weird mis-compilation bug. Not sure if in LLVM or Clang half. Details here: https://reviews.llvm.org/D56571#1386973

-Fix crash in bitcode writer when writing out callbr with attributes
-Add test backend test cases and jump threading test case for previously fixed bugs

-Remove GlobalISel implementation completely. It wasn't even trying to be correct for asm-goto. It assumed a call to a function not InlineAsm. I think we can deal with this later.
-Introduce an INLINEASM_BR instruction in machine IR that is like INLINEASM but is also a terminator. Hopefully I got the right flags set on this.
-Remove isAsmGotoTarget/hasAsmGotoTargetSuccessor from MachineBasicBlock

From visual inspection of the existing tests this seems to be working, but we'll see what the kernel builds have to say

kees added a comment.Feb 6 2019, 11:15 PM

I found a weird mis-compilation bug. Not sure if in LLVM or Clang half. Details here: https://reviews.llvm.org/D56571#1386973

This appears to be fixed with latest llvm, clang, and D53765.185703.

chandlerc accepted this revision.Feb 6 2019, 11:26 PM

This is looking really close. I think we should get some basic sanity checks w/ the kernel, and even if there is an obscure bug or two, land this and move to follow-up patches.

So, LGTM pending some basic sanity checks and addressing some nits below.

include/llvm/Analysis/SparsePropagation.h
333–334 ↗(On Diff #185703)

Use TI.isIndirectTerminator() here?

lib/IR/Instruction.cpp
789 ↗(On Diff #182342)

This should probably be isa<CallBase> now with a cleaner string, but not critical to fix in this patch.

lib/Transforms/InstCombine/InstCombineCalls.cpp
9 ↗(On Diff #185703)

nit: I always prefer the Oxford comma to reduce ambiguity.

4346–4350 ↗(On Diff #185703)

Wow this seems crazily expensive.

I guess it's OK because currently it can only occur for invoke. But the big-O here is quadratic: NumberOfUses * NumberOfSuccessors.

To make matters worse, this change moves to the generic successor APIs which will put a type check in the inner loop. =[

Maybe the simpler approach is to leave the old code for Invoke, and for CallBr simply return false if there are >0 users (which should be never currently).

This revision is now accepted and ready to land.Feb 6 2019, 11:26 PM

Address the last round of comments from Chandler. I'll commit this shortly.

nickdesaulniers added a comment.EditedFeb 7 2019, 3:19 PM

With
clang asm goto diff 185489
llvm asm goto diff 185868
I see a few more warnings with the kernel's x86 object file validator (that has found bugs previously with this patch set):
https://github.com/ClangBuiltLinux/linux/issues/351
https://github.com/ClangBuiltLinux/linux/issues/336#issuecomment-461631864

I'll try to provide reproducers for these ASAP.

  • EDIT **

Looks unrelated to this patch.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 8 2019, 12:50 PM
nickdesaulniers added inline comments.Feb 9 2019, 4:00 PM
llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp
4353
[582/1886] Building CXX object lib/Transf...LVMInstCombine.dir/InstCombineCalls.cpp.o
../lib/Transforms/InstCombine/InstCombineCalls.cpp: In member function ‘bool llvm::InstCombiner::transformConstExprCastCall(llvm::CallBase&)’:
../lib/Transforms/InstCombine/InstCombineCalls.cpp:4353:23: warning: unused variable ‘CBI’ [-Wunused-variable]
       if (CallBrInst *CBI = dyn_cast<CallBrInst>(Caller))
                       ^~~
This comment was removed by nickdesaulniers.
nickdesaulniers marked an inline comment as done.Feb 10 2019, 12:42 AM
nickdesaulniers added inline comments.
llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp
4353
nhaehnle removed a subscriber: nhaehnle.Feb 12 2019, 1:14 AM