This is an archive of the discontinued LLVM Phabricator instance.

Allow "callbr" to return non-void values
ClosedPublic

Authored by void on Nov 5 2019, 2:34 PM.

Details

Summary

Terminators in LLVM aren't prohibited from returning values. This means that
the "callbr" instruction, which is used for "asm goto", can support "asm goto
with outputs."

This patch removes all restrictions against "callbr" returning values. The
heavy lifting is done by the code generator. The "INLINEASM_BR" instruction's
a terminator, and the code generator doesn't allow non-terminator instructions
after a terminator. In order to correctly model the feature, we need to copy
outputs from "INLINEASM_BR" into virtual registers. Of course, those copies
aren't terminators.

To get around this issue, we split the block containing the "INLINEASM_BR"
right before the "COPY" instructions. This results in two cheats:

  • Any physical registers defined by "INLINEASM_BR" need to be marked as live-in into the block with the "COPY" instructions. This violates an assumption that physical registers aren't marked as "live-in" until after register allocation. But it seems as if the live-in information only needs to be correct after register allocation. So we're able to get away with this.
  • The indirect branches from the "INLINEASM_BR" are moved to the "COPY" block. This is to satisfy PHI nodes.

I've been told that MLIR can support this handily, but until we're able to
use it, we'll have to stick with the above.

Diff Detail

Event Timeline

void created this revision.Nov 5 2019, 2:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 5 2019, 2:34 PM
void updated this revision to Diff 228306.Nov 7 2019, 1:45 PM

Update language reference.

nickdesaulniers added inline comments.Nov 7 2019, 2:42 PM
llvm/docs/LangRef.rst
7331

Rather than the distinction between "normal" and "other," can we use the terms "fallthrough" and "indirect?"

7334

I assume that's the WIP part? Otherwise there's nothing else here that explicitly adds poison from what I can tell.

void marked an inline comment as done.Nov 7 2019, 4:31 PM
void added inline comments.
llvm/docs/LangRef.rst
7334

You're able to have the "fallthrough" listed in the "indirect" list as well. I'm just pointing out that if the value hasn't been calculated and shoved into the output object (register, etc.) then using that value in the fallthrough branch causes a poisoned value.

void updated this revision to Diff 228336.Nov 7 2019, 4:31 PM

Update description to use "fallthrough" and "indirect"

void updated this revision to Diff 228596.Nov 10 2019, 1:15 AM

Add testcase for multiple output constraints.

void edited the summary of this revision. (Show Details)Nov 10 2019, 6:25 PM

This change is now ready for review. PTAL.

void updated this revision to Diff 228813.Nov 11 2019, 10:32 PM

Don't split critical edges into a callbr indirect destination.

void updated this revision to Diff 228820.Nov 11 2019, 11:56 PM

Temporary change to determine if a block is used in a callbr.

void updated this revision to Diff 228926.Nov 12 2019, 11:30 AM

Simplify check that a basic block is the target of an indirect jump from callbr
and add testcase for this.

void added a comment.Nov 14 2019, 10:29 PM

Friendly ping. :-)

nickdesaulniers requested changes to this revision.Nov 18 2019, 10:51 AM

The value returned by "callbr" is only valid on the "normal" path. Return

values used on "abnormal" paths are poisoned.

Please edit the revision comment to use the terms "fallthrough" and "indirect" targets or paths, rather than "normal" and "abnormal."

llvm/docs/LangRef.rst
7332

I still don't understand how the poison value gets added. Can you please clarify, or maybe add a test case that does?

7332

If the outputs are only valid on the fallthrough, will that allow us to build the extension to the existing asm goto that has been asked for? I worry that if we implement this restriction, it may not be fully useful unless the outputs on all paths are valid. Do you have a link to the previous discussion about this?

llvm/lib/IR/BasicBlock.cpp
481 ↗(On Diff #228926)

In the past, I've been shot down for adding to BasicBlock. Can this be implemented as a method on a CallBrInst that accepts a BasicBlock?

487 ↗(On Diff #228926)

PI is only iterating BasicBlocks in this Function, right? Can't callbr refer to labels in other functions (unlike asm goto in C, due to scoping)?

This revision now requires changes to proceed.Nov 18 2019, 10:51 AM
void edited the summary of this revision. (Show Details)Nov 18 2019, 1:00 PM
void updated this revision to Diff 229918.Nov 18 2019, 1:51 PM
void marked 4 inline comments as done.

Don't add to BasicBlock, but just loop directly.

llvm/docs/LangRef.rst
7332

I still don't understand how the poison value gets added. Can you please
clarify, or maybe add a test case that does?

Poison values are a result of execution, not specified directly. So during execution, the value becomes poison if used in the indirect block, but not if used in the fallthrough. It's just fancy language for saying "don't do that!" :-)

From the docs:

Poison value behavior is defined in terms of value dependence:
  ...
  * An instruction control-depends on a terminator instruction if the terminator instruction has multiple successors and the instruction is always executed when control transfers to one of the successors, and may not be executed when control is transferred to another.
7332

After going back and forth with James and Hal, I now believe that having the value valid only on the fallthrough path will be sufficient for most purposes (if not all). However, this implementation doesn't restrict us from relaxing it in the future, but I would like to see a use case for that before we sink a lot of time into it.

Below is a link to the start of the thread.

http://lists.llvm.org/pipermail/llvm-dev/2019-June/133428.html

llvm/lib/IR/BasicBlock.cpp
481 ↗(On Diff #228926)

Better, I'll just inline it. :-)

487 ↗(On Diff #228926)

I don't believe it can refer to labels in other functions. Anyway, I changed this so that this function is gone.

void added a comment.Nov 24 2019, 1:59 PM

Ping? :-)

Ping? :-)

Sorry, busy week going into Thanksgiving holiday. I really think this needs more time to bake in code review and input from @jyknight (and maybe @rnk ). Particularly, re-reading through the thread (https://lists.llvm.org/pipermail/llvm-dev/2019-June/133428.html), I'm curious to see more tests for cases that we don't expect to work, particularly for the clang patch I'm curious if we can somehow warn the user?

Should you add a test case here for the case @jyknight describes in: https://lists.llvm.org/pipermail/llvm-dev/2019-June/133431.html (ie. differing output constraints)?

void updated this revision to Diff 231008.Nov 25 2019, 11:17 PM

Add testcase for callbrs with different output constraints jumping to the same
default location.

rnk added inline comments.Dec 3 2019, 3:48 PM
llvm/lib/CodeGen/MachineBasicBlock.cpp
1118

As a minor efficiency golf thing, I would start off this chain of checks with if (Succ->hasAddressTaken()). In most cases, which will handle splitting the critical edge to the fall through destination, which will not be marked as address taken.

1119

This code really should be looking at the machine IR to know if there is a callbr. I think INLINEASM_BR is target independent, so you really can just loop through the MachineInstrs looking for it, and then look for the MBB in the operand list. I wonder if we should have a flag on the MBB to indicate that it ends in an INLINEASM_BR, since those break the MIR invariant that terminators are grouped together at the end of the block.

1120

Is it possible for a BB to be both an indirect successor and a fallthrough successor? I suppose that could be the case with the Linux macro that gets the current PC.

In any case, it's probably safe to remove this condition, and then we don't have to worry.

llvm/test/CodeGen/X86/callbr-asm-outputs.ll
1

Please add -verify-machineinstrs to the tests, since that isn't usually on by default, and it may highlight some latent verifier issues I don't know about.

llvm/test/CodeGen/X86/callbr-asm.ll
2

Please add -verify-machineinstrs.

void updated this revision to Diff 232049.Dec 4 2019, 12:45 AM
void marked 6 inline comments as done.

Add "verify-machineinstrs" flag to callbr tests. Relax some of the verification
tests to account for callbrs.

llvm/lib/CodeGen/MachineBasicBlock.cpp
1119

The problem is that I need to between the indirect branches and the default one from the callbr instruction. It's way more cumbersome to do that with the INLINEASM_BR instruction. If you feel strongly about it I'll make the change, but the INLINEASM_BR instruction is already strongly tied to callbr. Do you expect it to change anytime in the future?

1120

It is possible. I have a testcase for it in this patch. :-)

llvm/lib/CodeGen/MachineBasicBlock.cpp
1119–1120

I think it may be worthwhile to swap these conditions with each other, too. It's highly unlikely the BB is terminated with a CallBrInst, yet highly likely the MBB refers to a BB.

1120

But you don't have a test case for what @rnk is asking about. (Unsure if it would be necessary).

I think @rnk is getting at this case from C:

void foo(void) {
  asm goto ("#NICK":: "r"(&&hello) :: hello);
hello:
  return;
}

Clang will emit this as:

define dso_local void @foo() #0 {
entry:
  callbr void asm sideeffect "#NICK", "r,X,~{dirflag},~{fpsr},~{flags}"(i8* blockaddress(@foo, %hello), i8* blockaddress(@foo, %hello)) #1
          to label %asm.fallthrough [label %hello], !srcloc !2

asm.fallthrough:                                  ; preds = %entry
  br label %hello

hello:                                            ; preds = %asm.fallthrough, %entry
  ret void
}

ie. the blockaddress is passed twice, once as the address of a label (GNU C extension), once as the indirect destination of the asm goto.

So to @rnk 's question:

Is it possible for a BB to be both an indirect successor and a fallthrough successor?

It is valid LLVM IR to have a BB be both; Clang today (or with https://reviews.llvm.org/D69876) will not emit such formation (but could).

ie. imagine the above:

callbr void asm sideeffect "#NICK", "r,X,~{dirflag},~{fpsr},~{flags}"(i8* blockaddress(@foo, %hello), i8* blockaddress(@foo, %hello)) #1

to label %asm.fallthrough [label %hello], !srcloc !2

to instead be:

callbr void asm sideeffect "#NICK", "r,X,~{dirflag},~{fpsr},~{flags}"(i8* blockaddress(@foo, %hello), i8* blockaddress(@foo, %asm.fallthrough)) #1

to label %asm.fallthrough [label %asm.fallthrough], !srcloc !2

I still don't understand @rnk 's point about

In any case, it's probably safe to remove this condition, and then we don't have to worry.

though.

1122–1123

please use a range-based for:

for (const BasicBlock* ID : cbr->getIndirectDests())
  if (ID == bb)

or llvm::any_of.

void updated this revision to Diff 234402.Dec 17 2019, 3:23 PM
void marked 3 inline comments as done.

Update

Herald added a project: Restricted Project. · View Herald TranscriptDec 17 2019, 3:23 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
void added inline comments.Dec 17 2019, 3:23 PM
llvm/lib/CodeGen/MachineBasicBlock.cpp
1120

I meant that I had a testcase that required the conditional he suggested I remove. Sorry for the confusion.

You're right that we can generate the clang code where the fallthrough is the same as the indirect. I didn't mean to imply that it wasn't the case. I can add a testcase. I asked a question on the "cfe-dev" mailing list to determine how I could identify the "fallthrough" block from the CFG. So far no one has responded. Until I can determine that, I won't be able to handle that properly with this conditional.

void updated this revision to Diff 234403.Dec 17 2019, 3:26 PM

Fix bad update

xbolva00 added inline comments.
llvm/lib/CodeGen/MachineBasicBlock.cpp
1120

Maybe @aaron.ballman can answer your question related to “fallthrough” block..

More reviewers: +@echristo @MaskRay

llvm/include/llvm/CodeGen/MachineBasicBlock.h
134

I think "IsInlineAsmBrIndirectTarget" would be better. I think of "Pad" as being something EH related, it's usually a "landing pad" or "eh pad".

llvm/lib/CodeGen/MachineBasicBlock.cpp
1118

Somehow I missed that you added isInlineAsmBrIndirectPad, you could use that here instead of hasAddressTaken, and it would be more precise.

1119

I don't think it will actually be that hard to implement. The MIR already knows a few things:

  • After block layout, MBB->getFallthrough() will return the default destination. If non-null, you can use it.
  • If getFallthrough() is null and this block terminated in callbr in IR, there must be an analyzable unconditional branch terminator (JMP or B).
  • All other address taken successors must come from abnormal control flow, i.e. exception handling or callbr.

Actually, I'm surprised this utility doesn't return false after analyzeBranch below if Succ is not one of TBB or FBB.

In any case, if these ideas don't work out, I don't feel that strongly, but I feel obligated to ask you to try a variety of ways to make this work with just MIR info.

1120

I had thought that it would be safe to return false (don't allow this edge to be split) if BB appears in the indirect destination list, even if it's also the default destination, so this condition could be reduced to contains(cbr->getIndirectDests(), bb). I looked for a test where a BB is both a default and indirect destination, so I don't see why we can't remove the default dest check.

void marked 5 inline comments as done.Jan 7 2020, 3:50 PM
void added inline comments.
llvm/lib/CodeGen/MachineBasicBlock.cpp
1119–1120

Okay. No one likes this addition. I'm going to remove it and just check IsInlineAsmBrIndirectTarget.

void updated this revision to Diff 236710.Jan 7 2020, 3:51 PM

Remove check that the successor may be in the indirect list of a callbr instruction.

void updated this revision to Diff 236716.Jan 7 2020, 4:12 PM

Missed a change.

void updated this revision to Diff 237224.Jan 9 2020, 4:35 PM

The plus constraints are now at the end of the input/output/label list.

jyknight added inline comments.Jan 10 2020, 12:09 PM
llvm/lib/CodeGen/MachineVerifier.cpp
699

This isn't correct.

This line here, is looking at a block which doesn't end in a jump to a successor. So, it's trying to verify that the successor list makes sense in that context.

The unstated assumption in the code is that the only successors will be landing pads. Instead of actually checking each one, instead it just checks that the count is the number of landing pads, with the assumption that all the successors should be landing pads, and that all the landing pads should be successors.

The next clause is then checking for the case where there's a fallthrough to the next block. In that case, the successors should've been all the landing pads, and the single fallthrough block.

Adding similar code to check for the number of callbr targets doesn't really make sense. It's certainly not the case that all callbr targets are targets of all
callbr instructions. And even if it was, this still wouldn't be counting things correctly.

However -- I think i'd expect analyzeBranch to error out (returning true) when confronted by a callbr instruction, because it cannot actually tell what's going on there. If that were the case, nothing in this block should even be invoked. But I guess that's probably not happening, due to the terminator being followed by non-terminators.

That seems likely to be a problem that needs to be fixed. (And if that is fixed, I think the changes here aren't needed anymore)

void marked an inline comment as done.Jan 10 2020, 12:23 PM
void added inline comments.
llvm/lib/CodeGen/MachineVerifier.cpp
699

Your comment is very confusing. Could you please give an example of where this fails?

jyknight added inline comments.Jan 10 2020, 3:34 PM
llvm/lib/CodeGen/MachineVerifier.cpp
699

Sorry about that, I should've delimited the parts of that message better...
Basically:

  • Paragraphs 2-4 are describing why the code before this patch appears to be correct for landing pad, even though it's taking some shortcuts and making some non-obvious assumptions.
  • Paragraph 5 ("Adding similar code"...) is why it's not correct for callbr.
  • Paragraph 6-7 are how I'd suggest to resolve it.

I believe the code as of your patch will fail validation if you have a callbr instruction which has a normal-successor block which is an indirect target of a *different* callbr in the function.

I believe it'll also fail if you have any landing-pad successors, since those aren't being added to the count of expected successors, but rather checked separately.

But more seriously than these potential verifier failures, I expect that analyzeBranch returning wrong answers (in that it may report that a block unconditionally-jumps to a successor, while it really has both a callbr and jump, separated by the non-terminator copies) will cause miscompilation. I'm not sure exactly how that will exhibit, but I'm pretty sure it's not going to be good.

And, if analyzeBranch properly said "no idea" when confronted by callbr control flow, then this code in the verifier wouldn't be reached.

void marked 2 inline comments as done.Jan 13 2020, 2:36 AM
void added inline comments.
llvm/lib/CodeGen/MachineVerifier.cpp
699

I didn't need a delineation of the parts of the comment. I needed a clearer description of what your concern is, and to give an example of code that fails here.

This bit of code is simply saying that if the block containing the INLINEASM_BR doesn't end with a BR instruction, then the number of its successors should be equal to the number of indirect successors. This is correct, as it's not valid to have a duplicate label used in a callbr instruction:

$ llc -o /dev/null x.ll
Duplicate callbr destination!
  %3 = callbr i32 asm sideeffect "testl $0, $0; testl $1, $1; jne ${2:l}", "={si},r,X,0,~{dirflag},~{fpsr},~{flags}"(i32 %2, i8* blockaddress(@test1, %asm.fallthrough), i32 %1) #2
          to label %asm.fallthrough [label %asm.fallthrough], !srcloc !6
./bin/llc: x.ll: error: input module is broken!

A callbr with a normal successor block that is the indirect target of a different callbr isn't really relevant here, unless I'm misunderstanding what analyzeBranch returns. There would be two situations:

  1. The MBB ends in a fallthrough, which is the case I mentioned above, or
  1. The MBB ends in a BR instruction, in which case it won't be in this block of code, but the block below.

If analyzeBranch is not taking into account potential COPY instructions between INLINEASM_BR and BR, then it needs to be addressed there (I'll verify that it is). I *do* know that this code is reached by the verifier, so it handles it to some degree. :-)

void marked an inline comment as done.Jan 15 2020, 4:38 PM
void marked an inline comment as done.Jan 15 2020, 5:30 PM
void added inline comments.
llvm/lib/CodeGen/MachineVerifier.cpp
699

But more seriously than these potential verifier failures, I expect that
analyzeBranch returning wrong answers (in that it may report that a block
unconditionally-jumps to a successor, while it really has both a callbr and
jump, separated by the non-terminator copies) will cause miscompilation. I'm
not sure exactly how that will exhibit, but I'm pretty sure it's not going
to be good.

Here are two proposals that may help alleviate these concerns:

  1. Have analyzeBranch skip over the COPYs between the INLINEASM_BR and the JMP. It's relatively straight-forward to do, but it would have to be done for *all* analyzeBranch calls.
  1. Create a new pseudo-instruction called INLINEASM_BR_COPY (or some better name) that's a terminator which behaves like a normal COPY, but the analyzeBranch and other methods that look at terminators will be able to handle it without modifications, since it'll look similarly to an INLINEASM_BR instruction. It doesn't require changing all of analyzeBranch implementations, but it's a much larger change.

Thoughts?

void updated this revision to Diff 239026.Jan 19 2020, 10:56 PM

Update so that each MBB has a list of indirect dests of INLINEASM_BR instructions.

void updated this revision to Diff 239231.Jan 20 2020, 11:15 PM

Split the machine basic block after an INLINEASM_BR instruction that has
outputs. The copies then end up in a separate block and the back end doesn't
have to deal with COPY instructions between two terminators.

void added a comment.Jan 21 2020, 3:08 PM

@jyknight Do you think you'll have time to review this patch this week? I'd like to get it into the 10.0 release if possible. :-)

@jyknight Do you think you'll have time to review this patch this week? I'd like to get it into the 10.0 release if possible. :-)

I volunteer as a reviewer:)

@jyknight Do you think you'll have time to review this patch this week? I'd like to get it into the 10.0 release if possible. :-)

I volunteer as a reviewer:)

W00t! :-)

The idea of moving the copies to a new MachineBasicBlock seems a reasonable solution. That said, it does mean there will be allocatable physical registers which are live-in to the following block, which is generally not allowed. As far as I can tell, I _think_ that should be fine in this particular circumstance, but I'm a little uneasy that I might be missing some reason why it'll be incorrect.

I'm more uncomfortable with the scanning/splitting after-the-fact in ScheduleDAGSDNodes.cpp, and then the successor lists subsequently being incorrect. However, I wasn't immediately able to say what I'd suggest doing instead, so I spent some time yesterday poking around with this patch to see if I could find something that seemed better. I now believe it will be cleaner to have the inline-asm tell SelectionDAGISel::FinishBasicBlock to just emit the copies in another block, which we do in some other situations, already. But, I'm still poking at that to see how it'll end up -- I can't say I'm sure that's the right answer at the moment.

void added a comment.Jan 23 2020, 3:23 PM

The idea of moving the copies to a new MachineBasicBlock seems a reasonable solution. That said, it does mean there will be allocatable physical registers which are live-in to the following block, which is generally not allowed. As far as I can tell, I _think_ that should be fine in this particular circumstance, but I'm a little uneasy that I might be missing some reason why it'll be incorrect.

There are some passes (e.g. machine CSE; MachineCSE.cpp:692) that handle physical live-in registers before register allocation. Do those passes run after a pass which handles the physical live-in registers? (By "handle" I mean analyzes and/or modifies the machine IR so that physical live-ins are "okay".)

I'm more uncomfortable with the scanning/splitting after-the-fact in ScheduleDAGSDNodes.cpp, and then the successor lists subsequently being incorrect. However, I wasn't immediately able to say what I'd suggest doing instead, so I spent some time yesterday poking around with this patch to see if I could find something that seemed better. I now believe it will be cleaner to have the inline-asm tell SelectionDAGISel::FinishBasicBlock to just emit the copies in another block, which we do in some other situations, already. But, I'm still poking at that to see how it'll end up -- I can't say I'm sure that's the right answer at the moment.

I looked at FinishBasicBlock at one point. I put it in EmitSchedule because I wanted to allow the basic block to go through any post processing that may occur. I can place it in FinishBasicBlock if you think it fits in there better.

As for the successor list, I justify it this way: The default behavior is exactly the same as executing the inline asm and continuing directly out as if it's straight line code. If INLINEASM_BR wasn't a terminator, we would add the COPYs directly after it and before the JMP. The only issue is whether the successors being on the copy block instead of the block containing the INLINEASM_BR would cause something to go wrong. Like you I was concerned about this. But I think that since the behavior is the same as if the two blocks were a single block (the assembly code isn't changed, etc.), and the fact that the successors being on the INLINEASM_BR block shouldn't affect any machine passes (i.e. no analysis or transformation should care, because the IR can't model potential branches by the asm), it should be okay. If you wish I could add a part to the machine instruction verifier to ensure that assumptions we're making are enforced.

It seems that callbr (with output) will now be similar to a catchpad. It can set live-in physical register information. (See test/CodeGen/X86/{seh-catch-all.ll,seh-exception-code.ll,wineh-coreclr.ll,wineh-exceptionpointer.ll}) Is there any caveat doing this? Add @rnk to the attention list as the author of rL249492 and rL249786...

@rnk Thoughts on passing live-in formation (a bit similar to WinEH catchpad) to callbr at the SelectionDAG stage?

void added a comment.Jan 29 2020, 5:53 PM

@rnk Thoughts on passing live-in formation (a bit similar to WinEH catchpad) to callbr at the SelectionDAG stage?

To help with the review, here are some of the assumptions I'm making about the INLINEASM_BR and the copy block this patch creates:

  1. The two blocks (CallBrBB and CopyBB) are "tightly coupled". I.e., you can't split the edge between them (not that you should want to).
  2. The CopyBB block will *always* be the fall-through block. This implies that INLINEASM_BR is the only terminator in CallBrBB.
  3. If one block moves, then both must move.

I would be happy to add code in the machine instruction verifier to ensure that these assumptions are met if you think it's necessary.

void added a comment.Feb 3 2020, 12:57 AM

Friendly ping. :-)

void added a comment.Feb 5 2020, 11:47 AM

My apologies for being a pest, but I wanted to know the status of reviews for this bug. @jyknight & @rnk, do you have further comments or need more time?

MaskRay accepted this revision.Feb 5 2020, 12:02 PM

I think this is fine, but want to hear from @jyknight and @rnk.

llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
1043

for (const MachineOperand &MO : Last.operands()) {

1066

for (const MachineOperand &MO : MI.operands()) {

void updated this revision to Diff 242752.Feb 5 2020, 2:47 PM

Use iterator for machine operands.

void marked 2 inline comments as done.Feb 5 2020, 2:47 PM
void added a comment.Feb 14 2020, 4:01 PM

Another friendly ping. :-)

void updated this revision to Diff 244840.Feb 15 2020, 2:04 PM

Use the BB when creating the MBB.

void added a comment.Feb 18 2020, 9:55 PM

It's been almost a month since the last comments on this review. If you need more time, please comment here. Otherwise, I will submit this with the current approvals by the end of the week.

It's been almost a month since the last comments on this review. If you need more time, please comment here. Otherwise, I will submit this with the current approvals by the end of the week.

Explicit ping @nickdesaulniers, who blocked this, and @jyknight and @rnk, both explicitly requested by @MaskRay.

nickdesaulniers accepted this revision.Feb 19 2020, 10:50 AM

This code has now been tested on a running Linux kernel making use of the feature.

I still would like @jyknight to clarify his comments, consider explicitly requesting changes to this CL, or consider resigning as reviewer.

llvm/include/llvm/CodeGen/MachineBasicBlock.h
137

It's likely the count here is 0, or maybe 1. We don't see too often a large list of labels here.

llvm/lib/CodeGen/MachineVerifier.cpp
699

Instead of actually checking each one, instead it just checks that the count is the number of landing pads, with the assumption that all the successors should be landing pads, and that all the landing pads should be successors.

What do you mean "instead of actually checking each one?" What check should be done?

It's certainly not the case that all callbr targets are targets of all callbr instructions. And even if it was, this still wouldn't be counting things correctly.

Right, so you could have a MachineBasicBlock that's the target of a INLINEASM_BR, and a different MachineBasicBlock that also branches to the INLINEASM_BR target (but itself wasn't an INLINEASM_BR). But IndirectTargetSuccs is only built up from successors of the current block that are isInlineAsmBrIndirectTarget's. So I don't understand how the count is "wrong" just because you could have other MBB's also target the current MBB's indirect successor.

llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
1076

Isn't Fallthrough from above one of the potential successors? Do we have to skip it in the below conditional? What happens if we call addSuccessor with the same MBB twice?

This revision is now accepted and ready to land.Feb 19 2020, 10:50 AM

Ugh, it's actually been that long, hasn't it...I'm really sorry about that. :(

I've been actively spending time to look at this over the last couple weeks. I haven't been able to convince myself that the weird-successors and having allocatable registers across BBs here is not going to cause codegen issues after optimization passes run on it. Unfortunately, despite spending time looking into it, I also wasn't able to convince myself that this *IS* broken. Maybe someone else can chime in here and either assuage or confirm my worries?

I also see some other minor issues, which I need to write up, but I'd been blocking writing up a comment for that behind the larger question.

Anyways, while looking at this I started thinking it might actually be better to actually have INLINEASM_BR (both with or without outputs) _not_ be a terminator MachineInstr. (remaining a terminator in IR form, however). This would then be similar to how "invoke" works -- at the MachineInstr level, the call is not a terminator, even though it can jump out to the EH successors. And, the return-value handling remains in the same basic block as the call. I tried making that change, but doing it correctly has other impacts -- much of the code which currently special-cases isEHPad() needs to be updated (Which does mean I now feel like I have a good handle on the problems the _previous_ version of this code, which didn't split the block, had.). MachineBasicBlock::updateTerminator is a good example of the problematic cases -- it trawls the successor list to look for the "correct" successor, filtering out isEHPad successors. But unlike EH Pads, indirect targets of a INLINEASM_BR are not used only for that purpose, so can't be so quickly distinguished in the successors list.

So, I started updating some of that code to not have that assumption. While doing so, I noticed that fixing this would also allow analyzeBranch to ignore the inlineasm branch instructions -- and remain correct -- which actually would allow llvm to do better block placement of the inlineasm_br blocks, because it enables it to move the normal successor jump/fallthrough. So, I think that would actually be a good idea to make that change.

But whether it'd be _necessary_ to make such a change (or other changes) for this patch, or just a nice thing to fix, I'm still just unsure about.

void added a comment.Feb 19 2020, 2:51 PM

Ugh, it's actually been that long, hasn't it...I'm really sorry about that. :(

No worries. Thanks for getting back to us!

I've been actively spending time to look at this over the last couple weeks. I haven't been able to convince myself that the weird-successors and having allocatable registers across BBs here is not going to cause codegen issues after optimization passes run on it. Unfortunately, despite spending time looking into it, I also wasn't able to convince myself that this *IS* broken. Maybe someone else can chime in here and either assuage or confirm my worries?

I've been concerned about the register live-ins too (I'm less concerned about the successors issue). Is there documentation on the original decision to disallow physical register live-ins for MBBs before register allocation? We could then check to see if we're violating the original reasoning.

I also see some other minor issues, which I need to write up, but I'd been blocking writing up a comment for that behind the larger question.

Anyways, while looking at this I started thinking it might actually be better to actually have INLINEASM_BR (both with or without outputs) _not_ be a terminator MachineInstr. (remaining a terminator in IR form, however). This would then be similar to how "invoke" works -- at the MachineInstr level, the call is not a terminator, even though it can jump out to the EH successors. And, the return-value handling remains in the same basic block as the call. I tried making that change, but doing it correctly has other impacts -- much of the code which currently special-cases isEHPad() needs to be updated (Which does mean I now feel like I have a good handle on the problems the _previous_ version of this code, which didn't split the block, had.). MachineBasicBlock::updateTerminator is a good example of the problematic cases -- it trawls the successor list to look for the "correct" successor, filtering out isEHPad successors. But unlike EH Pads, indirect targets of a INLINEASM_BR are not used only for that purpose, so can't be so quickly distinguished in the successors list.

I had a very similar idea. There are some people *cough*Chris*cough* who insist that INLINEASM_BR makes sense as a terminator. I don't deny that it's very tempting to make it one, but because of this instruction's behavior it doesn't *act* like a normal terminator (explicitly branching to some destination).

As for isEHPad, do you think it would make sense to have a generic isIndirectTarget predicate, which we could then use everywhere and it would automagically filter out blocks that aren't interesting?

So, I started updating some of that code to not have that assumption. While doing so, I noticed that fixing this would also allow analyzeBranch to ignore the inlineasm branch instructions -- and remain correct -- which actually would allow llvm to do better block placement of the inlineasm_br blocks, because it enables it to move the normal successor jump/fallthrough. So, I think that would actually be a good idea to make that change.

Now that I know I'm not crazy for wanting to make INLINEASM_BR a non-terminator (I'm assuming you're not crazy at least :-), I'll give it a go. Feel free to send me patches if you have them.

But whether it'd be _necessary_ to make such a change (or other changes) for this patch, or just a nice thing to fix, I'm still just unsure about.

Given your concerns over the patch in its current form, let's at least try the non-terminator path. If we can make it work, then your concerns will be assuaged (as would mine).

void added a comment.Feb 19 2020, 5:45 PM

Would you be okay with me submitting this and working on making INLINEASM_BR a non-terminator? I'd like to give this feature some bake time.

lattner accepted this revision.Feb 19 2020, 10:29 PM

I'm super excited to see this progress towards supporting 'asm goto' with results! Great work!

I've been concerned about the register live-ins too (I'm less concerned about the successors issue). Is there documentation on the original decision to disallow physical register live-ins for MBBs before register allocation? We could then check to see if we're violating the original reasoning.

IIRC the regallocators don't rely on this. Actually, I think the live-ins sets are supposed to be correct only after regalloc (expect for the entry block, I don’t know for the landing pads.)

void edited the summary of this revision. (Show Details)Feb 21 2020, 3:12 PM
void marked 2 inline comments as done.Feb 24 2020, 4:36 PM
void added inline comments.
llvm/include/llvm/CodeGen/MachineBasicBlock.h
137

I'll make it 2 instead. :-)

llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
1076

Yes. It's added to CopyBB above and the !CopyBB->isSuccessor(Succ) makes sure it's not re-added.

This revision was automatically updated to reflect the committed changes.