This is an archive of the discontinued LLVM Phabricator instance.

BreakCriticalEdges: do not split the critical edge from a CallBr indirect successor
ClosedPublic

Authored by nickdesaulniers on Sep 28 2020, 11:43 AM.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptSep 28 2020, 11:43 AM
nickdesaulniers requested review of this revision.Sep 28 2020, 11:43 AM
void added a comment.Sep 30 2020, 4:34 PM

I believe that a critical edge may be split on the default path, but not the indirect path.

I believe that a critical edge may be split on the default path, but not the indirect path.

I don't think https://github.com/llvm/llvm-project/commit/2fe457690da0fc38bc7f9f1d0aee2ba6a6a16ada made a similar distinction?

I think if I add a bool CallBrInst::isIndirectDest(BasicBlock&); method, I would be able to use that here and in the above cases linked without regressing the test case added in the above link.

The concept of splitting a critical edge is still relatively new to me; any thoughts on *why* it would not be ok to split a critical branch of an indirect-like jump?

(oh, I never git add'ed my test case, let me do so now).

  • add test case
llvm/test/CodeGen/PowerPC/loop-reduce-callbr.ll
1 ↗(On Diff #295437)

is there a better way to express: "this should not trip an assertion build?"

5 ↗(On Diff #295437)

TODO(Nick): delete

The concept of splitting a critical edge is still relatively new to me; any thoughts on *why* it would not be ok to split a critical branch of an indirect-like jump?

The problem has to do with blockaddress.

For normal branches, splitting an edge is usually straightforward: you just introduce a new block that branches to the original block, and then you rewrite the branch in the predecessor. But suppose you have a block with two indirectbr predecessors, and you want to split the edge. In that case, the "rewrite the branch" step doesn't work: you can't rewrite the address argument to the indirectbr. (Or at least, you can't without performing some invasive rewrite like indirectbr->switch lowering.)

efriedma added inline comments.Sep 30 2020, 6:08 PM
llvm/test/CodeGen/PowerPC/loop-reduce-callbr.ll
1 ↗(On Diff #295437)

Just running opt will automatically check the exit code, so this works. Generally you'd want to add a few CHECK lines to ensure loop-reduce is actually doing what you expect it to, though, so the test doesn't bitrot.

void added a comment.Sep 30 2020, 7:18 PM

The concept of splitting a critical edge is still relatively new to me; any thoughts on *why* it would not be ok to split a critical branch of an indirect-like jump?

The problem has to do with blockaddress.

For normal branches, splitting an edge is usually straightforward: you just introduce a new block that branches to the original block, and then you rewrite the branch in the predecessor. But suppose you have a block with two indirectbr predecessors, and you want to split the edge. In that case, the "rewrite the branch" step doesn't work: you can't rewrite the address argument to the indirectbr. (Or at least, you can't without performing some invasive rewrite like indirectbr->switch lowering.)

Correkt. We had a long discussion during the original implementation. I eventually relented, because I couldn't figure out a way to implement it without making the code unbearably horrible.

The best way to think about critical edges is to imagine trying to resolve a PHI node in block BB. The PHI node is a pseudo instruction, which has to become a concrete instruction during code gen. Let's take a simple case where the value is an immediate: 42 on branch A (from predecessor P1) and 37 on branch B (from predecessor P2), and further that the result of the PHI node is some register rN. When resolving the PHI node, you'll push rN <- 42 to the bottom of P1 and rN <- 37 to the bottom of P2. Nice!

Now if you have critical edges, things get messy. You can no longer push rN <- 42 and rN <- 37 into the predecessor blocks, because rN may be used for another value on edges not going to BB. In this case, we "split" the critical edge. I.e. we create an empty block between P1 and BB on edge A and another between P2 and BB on edge B. This provides us with a place to insert the rN <- 42 and rN <- 37 instructions respectively.

void added a comment.Sep 30 2020, 7:23 PM

I believe that a critical edge may be split on the default path, but not the indirect path.

I don't think https://github.com/llvm/llvm-project/commit/2fe457690da0fc38bc7f9f1d0aee2ba6a6a16ada made a similar distinction?

Yeah. I made an error there...

I think if I add a bool CallBrInst::isIndirectDest(BasicBlock&); method, I would be able to use that here and in the above cases linked without regressing the test case added in the above link.

It should be enough that the block isn't the default destination. (Though recent bug reports seem troubling for this.)

I don't know if it's important enough to allow splitting default edges right now. It may be okay to wait.

  • remove comment from test, simplify lambda
nickdesaulniers edited the summary of this revision. (Show Details)Jan 14 2021, 11:44 AM
nickdesaulniers added a reviewer: MaskRay.
nickdesaulniers added a subscriber: nathanchance.
  • add comments and FileCheck to tests

The concept of splitting a critical edge is still relatively new to me; any thoughts on *why* it would not be ok to split a critical branch of an indirect-like jump?

The problem has to do with blockaddress.

For normal branches, splitting an edge is usually straightforward: you just introduce a new block that branches to the original block, and then you rewrite the branch in the predecessor. But suppose you have a block with two indirectbr predecessors, and you want to split the edge. In that case, the "rewrite the branch" step doesn't work: you can't rewrite the address argument to the indirectbr. (Or at least, you can't without performing some invasive rewrite like indirectbr->switch lowering.)

Isn't "the address argument in the indirectbr" derived from a blockaddress? So couldn't you just find the blockaddress and rewrite it (or create a new one) that referred to the newly created block?

Correkt. We had a long discussion during the original implementation. I eventually relented, because I couldn't figure out a way to implement it without making the code unbearably horrible.

The best way to think about critical edges is to imagine trying to resolve a PHI node in block BB. The PHI node is a pseudo instruction, which has to become a concrete instruction during code gen. Let's take a simple case where the value is an immediate: 42 on branch A (from predecessor P1) and 37 on branch B (from predecessor P2), and further that the result of the PHI node is some register rN. When resolving the PHI node, you'll push rN <- 42 to the bottom of P1 and rN <- 37 to the bottom of P2. Nice!

Now if you have critical edges, things get messy. You can no longer push rN <- 42 and rN <- 37 into the predecessor blocks, because rN may be used for another value on edges not going to BB. In this case, we "split" the critical edge. I.e. we create an empty block between P1 and BB on edge A and another between P2 and BB on edge B. This provides us with a place to insert the rN <- 42 and rN <- 37 instructions respectively.

Thanks! I found https://www.inf.ed.ac.uk/teaching/courses/copt/lecture-4-from-ssa.pdf pretty instructive, too.

  • run the same invocations for both tests, remove unnecessary target_triple
nickdesaulniers retitled this revision from BreakCriticalEdges: bail if loop-simplify form requested for CallBr terminated BB to BreakCriticalEdges: do not split the critical edge from a CallBr indirect successor.
jyknight accepted this revision.Jan 14 2021, 1:13 PM

Looks totally reasonable to me.

This revision is now accepted and ready to land.Jan 14 2021, 1:13 PM
MaskRay added inline comments.Jan 14 2021, 1:29 PM
llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
2

Just use the new PM way: -passes='loop(loop-reduce)'

4

Please add some CHECK lines to prevent bitrot.

llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
4

Beyond the one related to actual critical edge? Aren't those just noise?

llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
2

If I do that for both tests:

  1. this one passes before this patch has been applied. (bad)(lol, wut)
  2. the other test still fails. (good)

Is there something funny with NPM that I need to specify additional passes in addition to loop(loop-reduce)?

MaskRay accepted this revision.Jan 14 2021, 2:00 PM

LGTM.

llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
4

One RUN line opt -passes='loop(loop-reduce)' %s is sufficient.

-passes=loop-reduce is the same, just getting an implicit loop pass adapter.

The legacy PM test does not add too much value for new tests as we'll soon make a switch.

16

-NEXT

llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting2.ll
24

-NEXT

nickdesaulniers marked 3 inline comments as done.
  • use CHECK-NEXT, drop -passes=loop-reduce
llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
2

Let me be more precise:

Before this patch is applied:

-loop-reduce:

callbr-critical-edge-splitting.ll:  fail
callbr-critical-edge-splitting2.ll: fail

-passes=loop-reduce:

callbr-critical-edge-splitting.ll:  pass (WTF)
callbr-critical-edge-splitting2.ll: fail

-passes='loop(loop-reduce)':

callbr-critical-edge-splitting.ll:  pass (WTF)
callbr-critical-edge-splitting2.ll: fail

So I guess it makes sense to drop -passes=loop-reduce if that's covered by -passes='loop(loop-reduce)', but it's curious to me from a TDD perspective why the test is green pre-patch for one pass manager (the new one) and red pre-patch for another pass manager (the old one).

The legacy PM test does not add too much value for new tests as we'll soon make a switch.

But the test is red pre-patch for legacy PM. WTF

llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
2

The output between -debug-only=loop-pass is a bit different between -loop-reduce and -passes='loop(loop-reduce)' as well...

llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
2

Interesting, ok, so -print-after-all shows that -loop-reduce (ie. old PM) runs more passes than just -loop-reduce. Looks like -loop-reduce additionally runs:

  1. -loop-simplify
  2. -loop-reduce
  3. -verify

I don't know how to run -loop-simplify with NPM (guessed: -passes='loop(loop-simplify,loop-reduce)'; doesn't work), but if I did I guess that might also make callbr-critical-edge-splitting2.ll pass pre-patch.

In that case, unless it's possible to run -loop-simplify with NPM, then I think I will simply drop the OPM run from callbr-critical-edge-splitting2.ll, since it doesn't add any signal, but keep OPM for callbr-critical-edge-splitting.ll since it does. Thoughts?

drop old pass manager test from callbr-critical-edge-splitting2.ll since it does not add signal

llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
2

Aha, -passes='loop-simplify,loop(loop-reduce)'!

llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
2

I would have expected -passes='loop-simplify,loop(loop-reduce)' (new pass manager) to fail pre-patch similar to -lood-reduce (old pass manager). It does not. I'm out of ideas here. Anyone know any tips or reasons why we get different results between new and old pass manager for loop reduction?

llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
2

similar to -lood-reduce

similar to -loop-reduce (I don't have it misspelled locally or copy+paste mistakes)

Otherwise, I'm thinking of committing this as is.

nickdesaulniers added inline comments.
llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
2

@aeubanks I saw https://lists.llvm.org/pipermail/llvm-dev/2021-January/147695.html / https://bugs.llvm.org/show_bug.cgi?id=46649; is this thread of questions about different behavior between OPM and NPM something we should worry about?

aeubanks added inline comments.Jan 15 2021, 9:32 AM
llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
2

The new PM will run LCSSA and LoopSimplify before any loop passes: https://github.com/llvm/llvm-project/blob/6227069bdce6b0c3c22f0a0c8f1aef705985125a/llvm/include/llvm/Transforms/Scalar/LoopPassManager.h#L415.

The legacy PM does not by default, but the loop-reduce legacy pass depends on loop-simplify. Adding -loop-simplify -lcssa at the beginning will probably make both RUN lines behave the same. Or just you can make the input already in LCSSA form.

llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
2

I tried -passes='loop-simplify,lcssa,loop(loop-reduce)' (and -passes='loop-simplify,lcssa,loop-reduce'), but callbr-critical-edge-splitting.ll still seems to pass before this patch is applied, which is unexpected. So there's still something different here between pass managers. (Also, I'm surprised that -print-after-all and -print-before-all don't seem to mention anything related to lcssa for LPM/OPM.)

Ok, this is weird: comparing the output of -print-before-all between -loop-reduce and -passes='loop-simplify,loop(loop-reduce)':

-loop-reduce:

*** IR Dump Before Canonicalize natural loops ***
*** IR Dump Before Loop Strength Reduction ***

-passes='loop-simplify,loop(loop-reduce)':

*** IR Dump Before VerifierPass ***
*** IR Dump Before LoopSimplifyPass ***
*** IR Dump Before LoopSimplifyPass ***
*** IR Dump Before LCSSAPass ***
*** IR Dump Before LoopStrengthReducePass ***
*** IR Dump Before LoopStrengthReducePass ***
*** IR Dump Before VerifierPass ***
*** IR Dump Before PrintModulePass ***

Canonicalize natural loops looks to me like loop-simplify, and yet when I explicitly run loop-simplify via NPM, the printed pass name is LoopSimplifyPass (I would have expected Canonicalize natural loops). What is going on?

llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
2

llvm-project/llvm/lib/Transforms/Utils/LoopSimplify.cpp has the following:

INITIALIZE_PASS_BEGIN(LoopSimplify, "loop-simplify",
                "Canonicalize natural loops", false, false)
INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)
INITIALIZE_PASS_END(LoopSimplify, "loop-simplify",
                "Canonicalize natural loops", false, false)

so it looks like -loop-simplify is silently running 3 additional passes? (That's annoying that those passes don't print anything via -print-before-all, unless they're not being run). Looking at DominatorTreeWrapperPass, it looks like that's not ported to run with NPM? Same for LoopInfoWrapperPass.

At this point; I'm just going to commit this test invoking both pass managers: this is a bug visible with LPM, but it should not regress for NPM.

aeubanks added inline comments.Jan 15 2021, 11:14 AM
llvm/test/Transforms/LoopStrengthReduce/callbr-critical-edge-splitting.ll
2

Canonicalize natural loops = LoopSimplifyPass. The naming is different for implementation reasons.

AssumptionCacheTracker/DomTree/LoopInfo are analysis passes that the pass uses, they won't show up in --print-before/after-all. The new PM has a different way of querying analyses.

The legacy PM doesn't automatically run LCSSA, so you won't see it unless the pass depends on it.

void accepted this revision.Jan 15 2021, 11:38 AM