This is an archive of the discontinued LLVM Phabricator instance.

[TailDuplicator] prevent tail duplication for INLINEASM_BR
AbandonedPublic

Authored by nickdesaulniers on May 28 2019, 3:59 PM.

Details

Summary

Otherwise we may fail to predicate the INLINEASM_BR later in
the "If Converter" pass, particularly for targets that rely heavily
on it.

pr/42012
Link: https://github.com/ClangBuiltLinux/linux/issues/494

Event Timeline

nickdesaulniers retitled this revision from prevent tail duplication for INLINEASM_BR to [TailDuplicator] prevent tail duplication for INLINEASM_BR.May 28 2019, 4:02 PM
  • get rid of some debug info
nickdesaulniers edited the summary of this revision. (Show Details)May 28 2019, 4:31 PM

I can see how this change would prevent the TailDuplication pass from duplicating blocks with callbr instructions, however I'm not sure if that is sufficient, or if it is too restrictive.

  • Is it possible to write C code that results in IR as if the TailDuplication pass has duplicated the callbr, or is there a fundamental part of asm goto that prevents this from happening?
  • Do we know of other passes that might fail due to TailDuplication of callbr?
  • Could this be fixed in the if conversion pass IfConverter::ScanInstructions() by having TargetInstrInfo::isPredicable(MachineInstr&) reject TargetOpcode::INLINEASM_BR?

If there are many other things that are broken by duplicating callbr then this patch is definitely the right thing to do. If it is just if conversion then it may be best to fix it there? Apologies I'm not an expert in this area, so I'm happy to defer. May be best to try some more reviewers to take a look.

llvm/lib/CodeGen/TailDuplicator.cpp
789

I think it will be worth a comment saying why we can't tail duplicate blocks with asm goto?

790

Is the INLINEASM_BR more likely to terminate the basic block, if it is then maybe a reverse iterator will find them faster? Although I suspect that most basic blocks won't contain any so it won't matter much anyway.

Could this be fixed in the if conversion pass IfConverter::ScanInstructions() by having TargetInstrInfo::isPredicable(MachineInstr&) reject TargetOpcode::INLINEASM_BR?

Excellent point. Thinking more about this last night, and your comment; I think we may have 2 bugs here, and the current patch works but for the wrong reasons. I should have read through IfConverter instead of just focusing on TailDuplicator.

Towards your point quoted above (and without having looked at IfConverter), I'd naively say that:

  • INLINEASM_BR is a pseudo-op at the MIR level; basically a list of assembly instructions specified by the user that should not be peephole optimized.
  • Theoretically, we might be able to predicate the block, but we'd basically have to disassemble the user specified inline assembly, and check that all of the instructions were predicable.
  • If IfConverter has the machinery and finds it safe to do so for INLINEASM, then we can and should reuse that machinery.
  • Otherwise, conservatively do not allow INLINEASM_BR to be predicable. (I'm leaning more towards this, since such machinery would be rewriting the inline asm the user specified).

Either way, the test case can be reused for both issues (with different and improved CHECKs.

That's the first issue, observed w/ Debug builds. The second issue which was first observed with Release builds is that TailDuplicator was leaving behind references to removed branches for INLINEASM_BR. We can either skip this transformation for MachineBasicBlocks when encountering an INLINEASM_BR (as the patch currently does) with the backwards iteration improvement (you suggested), or I can see if we can update the reference correctly when we duplicate an INLINEASM_BR. I think what might make the latter difficult is that INLINEASM_BR's operands are of type BlockAddress, not MachineBlockAddress (IIRC, might be wrong).

Thanks for the code review; it definitely provided food for thought and will result in improved fixes!

craig.topper added inline comments.May 29 2019, 10:57 AM
llvm/lib/CodeGen/TailDuplicator.cpp
790

This will happen for each predecessor since the caller is in a loop right? Maybe we should check this before the loop in tailDuplicate? Are the othercallers of canTailDuplicate?

nickdesaulniers added a comment.EditedMay 29 2019, 3:39 PM

Otherwise, conservatively do not allow INLINEASM_BR to be predicable. (I'm leaning more towards this, since such machinery would be rewriting the inline asm the user specified).

So I have a patch in hand that basically does this. When I then try to build an allyesconfig arm kernel, I get the same "unable to predicate instruction" failure in a different translation unit for a BL instruction. It seems that BL is not considered isUncondBranchOpcode() in llvm/lib/Target/ARM/ARMBaseInstrInfo.h. Looking at its def in llvm/lib/Target/ARM/ARMInstrInfo.td, BL seems defined differently than Bcc (for instance) causing llvm::getMatchingCondBranchOpcode() to choke. I'm not sure if this is worth pursuing at this point.

The second issue which was first observed with Release builds is that TailDuplicator was leaving behind references to removed branches for INLINEASM_BR.

This is actually IfConverter (which only does transforms based on patterns from TailDuplicator, which is why the current patch as written works; block TailDuplicator then IfConverter can't mess up). I'm going to try now to see if I can find where this is going wrong; I think if we don't predicate MachineInstr`s when isInlineAsm, then don't remove address taken MachineBasicBlocks, that might be the ultimate solution.

So I have a patch in hand that basically does this. When I then try to build an allyesconfig arm kernel, I get the same "unable to predicate instruction" failure in a different translation unit for a BL instruction.

We shouldn't try to predicate ARM::BL, probably, given it isn't predicable. (In theory, it should be possible to predicate a call in ARM mode, I think, but it isn't implemented, and it's not clear it's actually worth implementing; it's a little tricky to handle the relocations correctly.) I'm pretty sure the ARM backend correctly reports this to if conversion, so probably a bug in if conversion. Maybe the same issue as https://bugs.llvm.org/show_bug.cgi?id=41121 .

so probably a bug in if conversion. Maybe the same issue as https://bugs.llvm.org/show_bug.cgi?id=41121 .

Interesting, and does look related. While I try to make If Converter more robust in the face of INLINEASM_BR, I still think curtailing Tail Duplicator from duplicating INLINEASM_BR (and INLINEASM) is probably also worthwhile, and kind of makes robustness changes to If Converter less important. I'm going to spend the rest of the day trying to improve If Converter, but ultimately this is holding up asm goto from working perfectly (which is more important to me), so if I don't have anything working by EOD, then tomorrow I will make the suggested edits here and focus on this patch.

So I have a patch in hand that basically does this.

In particular, I'm not super happy with it, which is why I haven't posted it yet. I think there's yet another/better way to solve the problem in If Converter, but requires more work.

nickdesaulniers marked an inline comment as done.
  • get rid of some debug info
  • hoist INLINEASM_BR check into shouldTailDuplicate
nickdesaulniers marked 3 inline comments as done.May 30 2019, 2:48 PM
nickdesaulniers added inline comments.
llvm/lib/CodeGen/TailDuplicator.cpp
790

Looks like there's a TailDuplicator::canTailDuplicate and a TailDuplicator::shouldTailDuplicate. TailDuplicator::shouldTailDuplicate runs earlier and iterates all MachineInstrs in the MachineBasicBlock. While the iteration order is front to back, moving this check there solves @craig.topper 's point about the the TailBB be repeatedly analyzed.

I still think curtailing Tail Duplicator from duplicating INLINEASM_BR (and INLINEASM) is probably also worthwhile

I don't think this makes sense. We can, and do, clone inline asm at the IR level; it should also be legal at the MIR level. (It's possible the transform isn't profitable if we're optimizing for size, but that's a completely different issue.)

nickdesaulniers marked an inline comment as done.May 30 2019, 3:24 PM

It's possible the transform isn't profitable if we're optimizing for size, but that's a completely different issue.

Right, that was more of my point.

Anyways, with this patch and D56571, I can successfully build arm32 Linux kernel allyesconfig, and build+boot arm32 Linux kernel defconfig + CONFIG_JUMP_LABEL + CONFIG_STATIC_KEYS_SELFTEST.

srhines added inline comments.May 30 2019, 5:12 PM
llvm/lib/CodeGen/TailDuplicator.cpp
627

You're missing a word in this sentence. "... leave it in its own ..."

nickdesaulniers marked an inline comment as done.May 30 2019, 5:47 PM

so probably a bug in if conversion. Maybe the same issue as https://bugs.llvm.org/show_bug.cgi?id=41121 .

Interesting, and does look related. While I try to make If Converter more robust in the face of INLINEASM_BR, I still think curtailing Tail Duplicator from duplicating INLINEASM_BR (and INLINEASM) is probably also worthwhile, and kind of makes robustness changes to If Converter less important. I'm going to spend the rest of the day trying to improve If Converter, but ultimately this is holding up asm goto from working perfectly (which is more important to me), so if I don't have anything working by EOD, then tomorrow I will make the suggested edits here and focus on this patch.

I don't have a strong opinion here. I'm happy to disable the duplication for INLINEASM_BR at this point if we can capture the information that we could remove the restriction with a more sophisticated if conversion.

llvm/lib/CodeGen/TailDuplicator.cpp
630

Is it worth something like "FIXME: improve if conversion to not consider blocks with INLINEASM_BR to remove this restriction." ? To record that improvements are possible.

The following testcase still crashes with this patch. Given that, I don't think it makes sense to merge this patch; even if it lets some current version kernel build, someone else is likely to hit the same issue in the near future.

target triple = "arm-linux-gnueabi"
@a = external dso_local local_unnamed_addr global i32*, align 4
@c = external dso_local local_unnamed_addr global i32, align 4

define dso_local i32 @f() #0 {
entry:
  %0 = load i32, i32* @c, align 4
  %tobool = icmp eq i32 %0, 0
  %1 = load i32*, i32** @a, align 4
  br i1 %tobool, label %if.else, label %if.then

if.then:                                          ; preds = %entry
  tail call void asm sideeffect "str $1, $0", "*Qo,r"(i32* %1, i32 1) #1
  callbr void asm sideeffect "", "X"(i8* blockaddress(@f, %g)) #1
          to label %asm.fallthrough [label %g]

if.else:                                          ; preds = %entry
  tail call void asm sideeffect "str $1, $0", "*Qo,r"(i32* %1, i32 0) #1
  callbr void asm sideeffect "", "X"(i8* blockaddress(@f, %g)) #1
          to label %asm.fallthrough [label %g]

asm.fallthrough:                                  ; preds = %if.end
  unreachable

g:                                                ; preds = %if.end
  ret i32 undef
}

attributes #0 = { "use-soft-float"="false" }
attributes #1 = { nounwind }
nickdesaulniers abandoned this revision.Sep 6 2019, 9:59 AM

@eli.friedman fixed this in r371111. Thanks Eli!