This is an archive of the discontinued LLVM Phabricator instance.

BreakCriticalEdges: do not split the critical edge from a CallBr indirect successor
ClosedPublic

Authored by nickdesaulniers on Sep 28 2020, 11:43 AM.

Diff Detail

Event Timeline

Herald added a project: Restricted Project.Sep 28 2020, 11:43 AM
nickdesaulniers requested review of this revision.Sep 28 2020, 11:43 AM
void added a comment.Sep 30 2020, 4:34 PM

I believe that a critical edge may be split on the default path, but not the indirect path.

I believe that a critical edge may be split on the default path, but not the indirect path.

I don't think https://github.com/llvm/llvm-project/commit/2fe457690da0fc38bc7f9f1d0aee2ba6a6a16ada made a similar distinction?

I think if I add a bool CallBrInst::isIndirectDest(BasicBlock&); method, I would be able to use that here and in the cases linked above without regressing the test case added in the above link.

The concept of splitting a critical edge is still relatively new to me; any thoughts on *why* it would not be ok to split a critical branch of an indirect-like jump?

(oh, I never git add'ed my test case, let me do so now).

  • add test case
llvm/test/CodeGen/PowerPC/loop-reduce-callbr.ll
1

Is there a better way to express "this should not trip an assertion in an asserts build"?

5

TODO(Nick): delete

The concept of splitting a critical edge is still relatively new to me; any thoughts on *why* it would not be ok to split a critical branch of an indirect-like jump?

The problem has to do with blockaddress.

For normal branches, splitting an edge is usually straightforward: you just introduce a new block that branches to the original block, and then you rewrite the branch in the predecessor. But suppose you have a block with two indirectbr predecessors, and you want to split the edge. In that case, the "rewrite the branch" step doesn't work: you can't rewrite the address argument to the indirectbr. (Or at least, you can't without performing some invasive rewrite like indirectbr->switch lowering.)
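As a sketch (function and block names made up), the problematic shape looks something like this:

define void @sketch(i8* %addr1, i8* %addr2) {
entry:
  indirectbr i8* %addr1, [label %common, label %middle]
middle:
  indirectbr i8* %addr2, [label %common, label %exit]
common:            ; two indirectbr predecessors, so both incoming edges are critical
  br label %exit
exit:
  ret void
}

The successor lists of the two indirectbr instructions could be edited, but the addresses actually jumped through are blockaddress constants feeding %addr1/%addr2, and those constants may have been stored to memory, passed across function boundaries, etc., so they can't simply be retargeted at a freshly inserted block.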

efriedma added inline comments.Sep 30 2020, 6:08 PM
llvm/test/CodeGen/PowerPC/loop-reduce-callbr.ll
1

Just running opt will automatically check the exit code, so this works. Generally you'd want to add a few CHECK lines to ensure loop-reduce is actually doing what you expect it to, though, so the test doesn't bitrot.
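For instance, something along these lines (function and block names here are hypothetical; the real patterns depend on the IR in the test):

; RUN: opt -loop-reduce -S %s | FileCheck %s
; The bare run already fails if opt crashes or trips an assertion; the CHECK
; lines additionally pin down the callbr and its successor blocks.
; CHECK-LABEL: define void @test(
; CHECK: callbr void asm sideeffect
; CHECK: fallthrough:
; CHECK: indirect: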

void added a comment.Sep 30 2020, 7:18 PM

The concept of splitting a critical edge is still relatively new to me; any thoughts on *why* it would not be ok to split a critical branch of an indirect-like jump?

The problem has to do with blockaddress.

For normal branches, splitting an edge is usually straightforward: you just introduce a new block that branches to the original block, and then you rewrite the branch in the predecessor. But suppose you have a block with two indirectbr predecessors, and you want to split the edge. In that case, the "rewrite the branch" step doesn't work: you can't rewrite the address argument to the indirectbr. (Or at least, you can't without performing some invasive rewrite like indirectbr->switch lowering.)

Correct. We had a long discussion during the original implementation. I eventually relented, because I couldn't figure out a way to implement it without making the code unbearably horrible.

The best way to think about critical edges is to imagine trying to resolve a PHI node in block BB. The PHI node is a pseudo instruction, which has to become a concrete instruction during code gen. Let's take a simple case where the value is an immediate: 42 on branch A (from predecessor P1) and 37 on branch B (from predecessor P2), and further that the result of the PHI node is some register rN. When resolving the PHI node, you'll push rN <- 42 to the bottom of P1 and rN <- 37 to the bottom of P2. Nice!

Now if you have critical edges, things get messy. You can no longer push rN <- 42 and rN <- 37 into the predecessor blocks, because rN may be used for another value on edges not going to BB. In this case, we "split" the critical edge. I.e. we create an empty block between P1 and BB on edge A and another between P2 and BB on edge B. This provides us with a place to insert the rN <- 42 and rN <- 37 instructions respectively.
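In IR terms (names made up), the before/after of splitting those two edges looks roughly like:

; Before: both edges into %bb are critical, since %p1 and %p2 each have
; another successor besides %bb, and %bb has two predecessors.
define i32 @before(i1 %c1, i1 %c2) {
p1:
  br i1 %c1, label %bb, label %p2
p2:
  br i1 %c2, label %bb, label %other
bb:
  %v = phi i32 [ 42, %p1 ], [ 37, %p2 ]
  ret i32 %v
other:
  ret i32 0
}

; After: the new, otherwise-empty blocks give codegen a place to materialize
; rN <- 42 and rN <- 37 without clobbering rN on the paths that go to %other.
define i32 @after(i1 %c1, i1 %c2) {
p1:
  br i1 %c1, label %p1.bb_crit_edge, label %p2
p1.bb_crit_edge:
  br label %bb
p2:
  br i1 %c2, label %p2.bb_crit_edge, label %other
p2.bb_crit_edge:
  br label %bb
bb:
  %v = phi i32 [ 42, %p1.bb_crit_edge ], [ 37, %p2.bb_crit_edge ]
  ret i32 %v
other:
  ret i32 0
}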

void added a comment.Sep 30 2020, 7:23 PM

I believe that a critical edge may be split on the default path, but not the indirect path.

I don't think https://github.com/llvm/llvm-project/commit/2fe457690da0fc38bc7f9f1d0aee2ba6a6a16ada made a similar distinction?

Yeah. I made an error there...

I think if I add a bool CallBrInst::isIndirectDest(BasicBlock&); method, I would be able to use that here and in the above cases linked without regressing the test case added in the above link.

It should be enough that the block isn't the default destination. (Though recent bug reports seem troubling for this.)

I don't know if it's important enough to allow splitting default edges right now. It may be okay to wait.

  • remove comment from test, simplify lambda
nickdesaulniers edited the summary of this revision.Jan 14 2021, 11:44 AM
nickdesaulniers added a reviewer: MaskRay.
nickdesaulniers added a subscriber: nathanchance.
  • add comments and FileCheck to tests

The concept of splitting a critical edge is still relatively new to me; any thoughts on *why* it would not be ok to split a critical branch of an indirect-like jump?

The problem has to do with blockaddress.

For normal branches, splitting an edge is usually straightforward: you just introduce a new block that branches to the original block, and then you rewrite the branch in the predecessor. But suppose you have a block with two indirectbr predecessors, and you want to split the edge. In that case, the "rewrite the branch" step doesn't work: you can't rewrite the address argument to the indirectbr. (Or at least, you can't without performing some invasive rewrite like indirectbr->switch lowering.)

Isn't "the address argument in the indirectbr" derived from a blockaddress? So couldn't you just find the blockaddress and rewrite it (or create a new one) that referred to the newly created block?

Correct. We had a long discussion during the original implementation. I eventually relented, because I couldn't figure out a way to implement it without making the code unbearably horrible.

The best way to think about critical edges is to imagine trying to resolve a PHI node in block BB. The PHI node is a pseudo instruction, which has to become a concrete instruction during code gen. Let's take a simple case where the value is an immediate: 42 on branch A (from predecessor P1) and 37 on branch B (from predecessor P2), and further that the result of the PHI node is some register rN. When resolving the PHI node, you'll push rN <- 42 to the bottom of P1 and rN <- 37 to the bottom of P2. Nice!

Now if you have critical edges, things get messy. You can no longer push rN <- 42 and rN <- 37 into the predecessor blocks, because rN may be used for another value on edges not going to BB. In this case, we "split" the critical edge. I.e. we create an empty block between P1 and BB on edge A and another between P2 and BB on edge B. This provides us with a place to insert the rN <- 42 and rN <- 37 instructions respectively.

Thanks! I found https://www.inf.ed.ac.uk/teaching/courses/copt/lecture-4-from-ssa.pdf pretty instructive, too.

  • run the same invocations for both tests, remove unnecessary target_triple
nickdesaulniers retitled this revision from BreakCriticalEdges: bail if loop-simplify form requested for CallBr terminated BB to BreakCriticalEdges: do not split the critical edge from a CallBr indirect successor.
jyknight accepted this revision.Jan 14 2021, 1:13 PM

Looks totally reasonable to me.

This revision is now accepted and ready to land.Jan 14 2021, 1:13 PM
MaskRay added inline comments.Jan 14 2021, 1:29 PM
llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
1 ↗(On Diff #316725)

Just use the new PM way: -passes='loop(loop-reduce)'

3 ↗(On Diff #316725)

Please add some CHECK lines to prevent bitrot.

llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
3 ↗(On Diff #316725)

Beyond the one related to the actual critical edge? Aren't those just noise?

llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
1 ↗(On Diff #316725)

If I do that for both tests:

  1. this one passes before this patch has been applied. (bad) (lol, wut)
  2. the other test still fails. (good)

Is there something funny with the NPM such that I need to specify additional passes in addition to loop(loop-reduce)?

MaskRay accepted this revision.Jan 14 2021, 2:00 PM

LGTM.

llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
3 ↗(On Diff #316756)

One RUN line opt -passes='loop(loop-reduce)' %s is sufficient.

-passes=loop-reduce is the same, just getting an implicit loop pass adapter.

The legacy PM test does not add too much value for new tests as we'll soon make a switch.

15 ↗(On Diff #316756)

Use CHECK-NEXT here.

llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting2.ll
23 ↗(On Diff #316756)

Use CHECK-NEXT here.

nickdesaulniers marked 3 inline comments as done.
  • use CHECK-NEXT, drop -passes=loop-reduce
llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
1 ↗(On Diff #316725)

Let me be more precise:

Before this patch is applied:

-loop-reduce:

callbr-critical-edge-splitting.ll:  fail
callbr-critical-edge-splitting2.ll: fail

-passes=loop-reduce:

callbr-critical-edge-splitting.ll:  pass (WTF)
callbr-critical-edge-splitting2.ll: fail

-passes='loop(loop-reduce)':

callbr-critical-edge-splitting.ll:  pass (WTF)
callbr-critical-edge-splitting2.ll: fail

So I guess it makes sense to drop -passes=loop-reduce if that's covered by -passes='loop(loop-reduce)', but it's curious to me from a TDD perspective why the test is green pre-patch for one pass manager (the new one) and red pre-patch for another pass manager (the old one).

The legacy PM test does not add too much value for new tests as we'll soon make a switch.

But the test is red pre-patch for legacy PM. WTF

llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
1 ↗(On Diff #316725)

The -debug-only=loop-pass output is a bit different between -loop-reduce and -passes='loop(loop-reduce)' as well...

llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
1 ↗(On Diff #316725)

Interesting, OK, so -print-after-all shows that -loop-reduce (i.e. the old PM) runs more passes than just -loop-reduce. Looks like -loop-reduce additionally runs:

  1. -loop-simplify
  2. -loop-reduce
  3. -verify

I don't know how to run -loop-simplify with the NPM (I guessed -passes='loop(loop-simplify,loop-reduce)', which doesn't work), but if I did, I suspect that might also make callbr-critical-edge-splitting2.ll pass pre-patch.

In that case, unless it's possible to run -loop-simplify with the NPM, I think I will simply drop the OPM run from callbr-critical-edge-splitting2.ll, since it doesn't add any signal, but keep OPM for callbr-critical-edge-splitting.ll, since it does. Thoughts?

  • drop old pass manager test from callbr-critical-edge-splitting2.ll since it does not add signal

llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
1 ↗(On Diff #316725)

Aha, -passes='loop-simplify,loop(loop-reduce)'!

llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
1 ↗(On Diff #316725)

I would have expected -passes='loop-simplify,loop(loop-reduce)' (new pass manager) to fail pre-patch similar to -lood-reduce (old pass manager). It does not. I'm out of ideas here. Anyone know any tips or reasons why we get different results between new and old pass manager for loop reduction?

llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
1 ↗(On Diff #316725)

similar to -lood-reduce

similar to -loop-reduce (I don't have it misspelled locally, nor was it a copy+paste mistake)

Otherwise, I'm thinking of committing this as is.

nickdesaulniers added inline comments.
llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
1 ↗(On Diff #316725)

@aeubanks I saw https://lists.llvm.org/pipermail/llvm-dev/2021-January/147695.html / https://bugs.llvm.org/show_bug.cgi?id=46649; is this thread of questions about different behavior between OPM and NPM something we should worry about?

aeubanks added inline comments.Jan 15 2021, 9:32 AM
llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
1 ↗(On Diff #316725)

The new PM will run LCSSA and LoopSimplify before any loop passes: https://github.com/llvm/llvm-project/blob/6227069bdce6b0c3c22f0a0c8f1aef705985125a/llvm/include/llvm/Transforms/Scalar/LoopPassManager.h#L415.

The legacy PM does not by default, but the loop-reduce legacy pass depends on loop-simplify. Adding -loop-simplify -lcssa at the beginning will probably make both RUN lines behave the same. Or you can just make the input already be in LCSSA form.
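Concretely (a sketch; the CHECK lines would be whatever the test already verifies), that suggests RUN lines like:

; Legacy PM, with the canonicalization passes spelled out so it matches what
; the new PM schedules implicitly before any loop pass:
; RUN: opt -loop-simplify -lcssa -loop-reduce -S %s | FileCheck %s
; New PM, which runs LoopSimplify and LCSSA automatically:
; RUN: opt -passes='loop(loop-reduce)' -S %s | FileCheck %s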

llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
1 ↗(On Diff #316725)

I tried -passes='loop-simplify,lcssa,loop(loop-reduce)' (and -passes='loop-simplify,lcssa,loop-reduce'), but callbr-critical-edge-splitting.ll still seems to pass before this patch is applied, which is unexpected. So there's still something different here between pass managers. (Also, I'm surprised that -print-after-all and -print-before-all don't seem to mention anything related to lcssa for LPM/OPM.)

Ok, this is weird: comparing the output of -print-before-all between -loop-reduce and -passes='loop-simplify,loop(loop-reduce)':

-loop-reduce:

*** IR Dump Before Canonicalize natural loops ***
*** IR Dump Before Loop Strength Reduction ***

-passes='loop-simplify,loop(loop-reduce)':

*** IR Dump Before VerifierPass ***
*** IR Dump Before LoopSimplifyPass ***
*** IR Dump Before LoopSimplifyPass ***
*** IR Dump Before LCSSAPass ***
*** IR Dump Before LoopStrengthReducePass ***
*** IR Dump Before LoopStrengthReducePass ***
*** IR Dump Before VerifierPass ***
*** IR Dump Before PrintModulePass ***

Canonicalize natural loops looks to me like loop-simplify, and yet when I explicitly run loop-simplify via NPM, the printed pass name is LoopSimplifyPass (I would have expected Canonicalize natural loops). What is going on?

llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
1 ↗(On Diff #316725)

llvm-project/llvm/lib/Transforms/Utils/LoopSimplify.cpp has the following:

INITIALIZE_PASS_BEGIN(LoopSimplify, "loop-simplify",
                "Canonicalize natural loops", false, false)
INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)
INITIALIZE_PASS_END(LoopSimplify, "loop-simplify",
                "Canonicalize natural loops", false, false)

so it looks like -loop-simplify is silently running 3 additional passes? (It's annoying that those passes don't print anything via -print-before-all, unless they're not being run.) Looking at DominatorTreeWrapperPass, it looks like that hasn't been ported to run with the NPM? Same for LoopInfoWrapperPass.

At this point, I'm just going to commit this test invoking both pass managers: this is a bug visible with the LPM, but it should not regress for the NPM.

aeubanks added inline comments.Jan 15 2021, 11:14 AM
llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
1 ↗(On Diff #316725)

Canonicalize natural loops = LoopSimplifyPass. The naming is different for implementation reasons.

AssumptionCacheTracker/DomTree/LoopInfo are analysis passes that the pass uses; they won't show up in --print-before/after-all. The new PM has a different way of querying analyses.

The legacy PM doesn't automatically run LCSSA, so you won't see it unless the pass depends on it.

void accepted this revision.Jan 15 2021, 11:38 AM