Prior to inserting an unconditional branch from X to its
fall through basic block, check if X has any terminators to
avoid inserting additional branches.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
llvm/lib/CodeGen/BranchRelaxation.cpp | ||
---|---|---|
501–509 | I'd expect for this to be covered by getFallThrough check for false. Did something weird happen with an unanalyzable branch? |
llvm/test/CodeGen/AMDGPU/branch-relax-no-terminators.ll | ||
---|---|---|
1 ↗ | (On Diff #462558) | Perhaps the test should check the generated output rather than the debug info? The debug info is only generated if asserts are enabled, and I'm not sure why it shows the fail/pass criteria. |
llvm/lib/CodeGen/BranchRelaxation.cpp | ||
---|---|---|
501–509 | We found a case where fixupUnconditionalBranch() appends a s_branch X to a block with terminators s_cbranch X, s_branch Y. |
llvm/lib/CodeGen/BranchRelaxation.cpp | ||
---|---|---|
501–509 | There's a comment in getFallThrough: "If there is some explicit branch to the fallthrough block, it can obviously reach, even though the branch should get folded to fall through implicitly." Maybe that bit of code is what's confusing the logic here? (I'm not sure why that logic exists...) |
llvm/lib/CodeGen/BranchRelaxation.cpp | ||
---|---|---|
501–509 | Yea there is no check for the "explicit branch" or any form of branch at all. |
llvm/lib/CodeGen/BranchRelaxation.cpp | ||
---|---|---|
501–509 |
Yes - this is the code that causes the problem. I'm also not sure why that logic is there, though it's been around since 2009. The basic block where getFallThrough returns true contains the following branches: BB1: other instructions s_cbranch_execnz BB2 s_branch BB3 BB2: |
llvm/lib/CodeGen/BranchRelaxation.cpp | ||
---|---|---|
501–509 | That makes sense... but I'm not sure checking for PrevBB->terminators().empty() is the right way to fix that. You can have fallthrough with terminators. For example: BB1: other instructions s_cbranch_execnz BB3 BB2: |
llvm/lib/CodeGen/BranchRelaxation.cpp | ||
---|---|---|
501–509 | In the example, a branch won't be inserted which is what we expect right? Or am I missing something? |
llvm/lib/CodeGen/BranchRelaxation.cpp | ||
---|---|---|
501–509 | A branch is needed because a new block is inserted between BB1 and BB2 that shouldn't be executed when control goes from BB1 -> BB2. So, after the transformation, BB1: other instructions s_cbranch_execnz BB3 s_branch BB2 BB4: BB2: |
- Instead of checking the number of terminators to allow for fallthrough, analyze the branches and only allow fallthroughs if no terminator is an unconditional branch. Remove conditional branch if the destination is the fallthrough block.
- Add an MIR test.
llvm/lib/CodeGen/BranchRelaxation.cpp | ||
---|---|---|
506 | I don't think I've ever seen someone use the and macro before |
- Removed braces for a single-line code block
- Changed return type of EnforceFallThrough to void
llvm/lib/CodeGen/BranchRelaxation.cpp | ||
---|---|---|
503 | I made EnforceFallthrough usable more generally so it would be inconsistent to insert the assertion in the lambda. |
The logic here has gotten more confusing, not less. Please just don't call getFallThrough; you can directly reason about the result of analyzeBranch.
From TargetInstrInfo.h:
/// 1. If this block ends with no branches (it just falls through to its succ) /// just return false, leaving TBB/FBB null. /// 2. If this block ends with only an unconditional branch, it sets TBB to be /// the destination block. /// 3. If this block ends with a conditional branch and it falls through to a /// successor block, it sets TBB to be the branch destination block and a /// list of operands that evaluate the condition. These operands can be /// passed to other TargetInstrInfo methods to create new branches. /// 4. If this block ends with a conditional branch followed by an /// unconditional branch, it returns the 'true' destination in TBB, the /// 'false' destination in FBB, and a list of operands that evaluate the /// condition. These operands can be passed to other TargetInstrInfo /// methods to create new branches.
1 is fallthrough. 2 is not fallthrough. 3 is fallthrough. 4 is not fallthrough.
Not sure I understand the logic here when analyzeBranch fails; you just assume the block doesn't fall through?
- calling getFallthrough() is redundant as the specific case asserts that a fallthrough exists for PrevBB. I simplified the code and eliminated the use of getFallthrough().
analyzeBranch returns false when it is able to infer the terminators of the basic block. That is, determine TBB and FBB. This patch effectively leverages the cases for fallthrough as you mentioned for a specific case - only fall through for cases 1 and 3. Albeit I agree that not using getFallthrough() makes the code simpler and probably should not be used as the case already asserts that a fall through certainly exists for PrevBB (PrevBB is defined to be the block immediately prior to DestBB).
Okay, that's more straightforward.
I think we need to distinguish between (2) and (3) somehow (unconditional branch, vs. conditional branch + fallthrough). Should we be checking Cond.empty()?
analyzeBranch returns false when it is able to infer the terminators of the basic block
Right... and when it fails to analyze (i.e. returns true), you assume there is no fallthrough. I don't think that's a safe assumption.
- Use Cond.empty() to distinguish between the case of uncondBr vs condBr + fallthrough
Two possiblities: (1) there may be a known control barrier in the end of this block or, (2) the barrier may be predicated during IfConversion. In either case, fallthrough should be allowed.
llvm/lib/CodeGen/BranchRelaxation.cpp | ||
---|---|---|
509–511 | I tried out various branching opcodes out there on AMDGPU, none of them satisfied the condition. |
llvm/test/CodeGen/AMDGPU/branch-relax-no-terminators.mir | ||
---|---|---|
4–7 | llc: Unknown command line argument '--amdgpu-s-branch-bits=5' when running update_mir_test_checks.py on this test. |
llvm/lib/CodeGen/BranchRelaxation.cpp | ||
---|---|---|
509–511 | I'd prefer to just check else if (FailToAnalyze && PrevBB->isSuccessor(DestBB)), rather than try to figure out the specific properties of the terminator of PrevBB. |
llvm/lib/CodeGen/BranchRelaxation.cpp | ||
---|---|---|
509–511 | I'm not a fan of putting untested code in to satisfy theoretical conditions. If there's a case we don't know how to handle, I would just put a report_fatal_error in for when a target that does needs this runs into it |
llvm/lib/CodeGen/BranchRelaxation.cpp | ||
---|---|---|
509–511 | You could end up in this situation with a predicated "ret" instruction on 32-bit ARM, I think, but we don't use BranchRelaxation there. Not sure if there any any unanalyzable fallthrough terminators on any of the targets that currently use BranchRelaxation; if there aren't, I guess a report_fatal_error is sufficient. |
llvm/lib/CodeGen/BranchRelaxation.cpp | ||
---|---|---|
501–509 | I am not quite sure whether a simplified approach would work here: Check whether there is explicit branch to FT in PrevBB? if no, insert unconditional branch at the end of PrevBB to FT, otherwise do nothing. Does this solve the problem here? The remaining question is how to detect whether there is pre-existing branch to FT in PrevBB? I think we can do an iteration over the terminators of PrevBB to find if there is any branch instruction that has a MBB operand referencing DestBB. if ((auto *FT = PrevBB->getFallThrough()) { iterating over the terminators of PrevBB, see if there is any branch instruction which has a MachineOperand that references FT if (explicit branch to FT NOT found) insert unconditional branch and recalculate block size } |
llvm/lib/CodeGen/BranchRelaxation.cpp | ||
---|---|---|
501–509 | That is essentially what I have done via breaking getFallThrough() and avoiding a call to it. The problem with getFallThrough(), as discussed previously, is that it consists of confusing logic. I use analyzeBranch() to determine the successors of PrevBB, which is the same as iterating over terminators and checking their operands. |
llvm/lib/CodeGen/BranchRelaxation.cpp | ||
---|---|---|
501–509 |
ok, now that you guys have made the decision like that, feel free to ignore my comments. Sorry I did not follow all the discussion carefully. |
llvm/lib/CodeGen/BranchRelaxation.cpp | ||
---|---|---|
510 | Can just move this up as if (analyzeBranch()) report_fatal_error |
Do you need this to return true/false. Doesn't seem like the return value is checked?