This is an archive of the discontinued LLVM Phabricator instance.

[UnifyLoopExits] Reduce number of guard blocks
ClosedPublic

Authored by bcahoon on Apr 6 2022, 9:05 AM.

Details

Summary

UnifyLoopExits creates a single exit, a control flow hub, for
loops with multiple exits. There is an input to the block for
each loop exiting block and and output from the block for each
loop exit block. Multiple checks, or guard blocks, are needed
to branch to the correct exit block.

For large loops with lots of exit blocks, all the extra guard
blocks cause problems for StructurizeCFG and subsequent passes.
This patch reduces the number of guard blocks needed when the
exit blocks branch to a common block (e.g., an unreachable
block). The guard blocks are reduced by changing the inputs
and outputs of the control flow hub. The inputs are the exit
blocks and the outputs are the common block.

Reducing the guard blocks enables StructurizeCFG to reorder the
basic blocks in the CFG to reduce the values that exit a loop
with multiple exits. This reduces the compile-time of
StructurizeCFG and also reduces register pressure.

Diff Detail

Event Timeline

bcahoon created this revision.Apr 6 2022, 9:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 6 2022, 9:05 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
bcahoon requested review of this revision.Apr 6 2022, 9:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 6 2022, 9:05 AM
sameerds added inline comments.Apr 7 2022, 12:59 AM
llvm/lib/Transforms/Utils/UnifyLoopExits.cpp
163–164 ↗(On Diff #420906)

This can be rewritten as "if (auto *Succ = ...) "

182 ↗(On Diff #420906)

Can you rephrase a bit? The join blocks are for what?

207 ↗(On Diff #420906)

Is it true that an exit block always has a single predecessor? This will return nullptr otherwise. Maybe a comment will help.

215 ↗(On Diff #420906)

What does this do? We haven't done any updates yet, right?

235–242 ↗(On Diff #420906)

Scary stuff. Could you please add a comment? I think this is related to the test "unreachable_from_inner_loop" ... but an explanation in English will help a lot.

bcahoon updated this revision to Diff 421257.Apr 7 2022, 10:02 AM

Changes based upon review comments.

bcahoon marked 5 inline comments as done.Apr 7 2022, 10:08 AM
bcahoon added inline comments.
llvm/lib/Transforms/Utils/UnifyLoopExits.cpp
207 ↗(On Diff #420906)

Great point. I've added a check when CommonSuccs is created that checks for a single predecessor. It's the common case to try to optimize, along with the changes needed to StructurzeCFG. It's possible to allow multiple predecessors though that could complicate the node ordering changes in StructurizeCFG.

215 ↗(On Diff #420906)

Thanks for catching this.

235–242 ↗(On Diff #420906)

Agreed. Please let me know if this requires additional clarification.

sameerds added inline comments.Apr 9 2022, 3:30 AM
llvm/lib/Transforms/Utils/UnifyLoopExits.cpp
235–242 ↗(On Diff #420906)

I am able to follow the test unreachable_from_inner_loop, but not seeing how the part about "if (PredLoop) remove pred" works. In the test, { B D F} is an inner loop, inside { A B D F G} and two guards are generated. But neither guard causes a node to move from a non-child loop to ParentLoop.

bcahoon updated this revision to Diff 421755.Apr 9 2022, 2:27 PM
bcahoon marked 3 inline comments as done.

Added a test to show the case when we need to move the guard block predecessor from an outer loop to an inner loop. Three nested loops are needed for this. Initially, the guard block predecessor is added to the outer loop, but then need to change to the middle loop.

bcahoon updated this revision to Diff 421757.Apr 9 2022, 2:43 PM

Messed up the last revision by including another patch. This hopefully fixes it. The only new change is an additional test case, three_loops, added to test/Transforms/UnifyLoopExits/reduce_guards.ll

bcahoon added inline comments.Apr 9 2022, 2:46 PM
llvm/lib/Transforms/Utils/UnifyLoopExits.cpp
235–242 ↗(On Diff #420906)

I've added a test case to show this condition. It requires at least 3 nested loops. A guard block predecessor is moved from the outermost loop to the middle loop.

sameerds accepted this revision.Apr 10 2022, 10:36 PM
This revision is now accepted and ready to land.Apr 10 2022, 10:36 PM
This revision was landed with ongoing or failed builds.Jun 22 2022, 1:49 PM
This revision was automatically updated to reflect the committed changes.
foad added a subscriber: ruiling.Jun 23 2022, 7:46 AM
foad added a subscriber: foad.

Unfortunately this can breaks reconvergence in some edge cases.
This occurs because exits in an inner loop may end up no longer reconvergencing with exits in an outer loop.
I am trying to clean up on a test case based on a Vulkan CTS test failure this causes.

I believe the solution is to test that all predecessors of the common successor are members of the set of exits.
If not then it indicates the common successor is part of a reconvergence structure shared by inner and outer loop exits and this optimization cannot be used.

e.g.

for (auto CS : CommonSuccs) {
    // ...

    if (!llvm::all_of(predecessors(CB), [Exits](auto Pred) {return Exits.contains(Pred);}))
      continue;
    
    // ...
  }

Hi @critson, thanks for pointing out the issue, and a potential fix. I'm looking at it now.

@bcahoon Thanks! I have attached what I think is a semi-representative test for the issue.

In the test Exit2, Exit3 and Exit4 should all reconverged at the TrueExit.
However with this change Exit3 and Exit4 are reconverged with C, prior to control transfer to TrueExit or Loop2.preheader.
Again this test has been heavily cut down from an actual shader, and has lost some nuance in the process, so let me know if you want the original.

Hi @critson. Thanks for reducing the test case. I don't see an error with the test case though, so I may be missing something. Or, perhaps I need to see the complete test case? Here's my understanding. With the patch, control flow still converges at TrueExit. It's just that the Exit3 and Exit4 blocks have moved in the CFG so that they appear prior to loop.exit.guard rather than after. Following the exit paths from F, G, and C. I think that should be ok?

Before patch:
F -> loop.exit.guard ->loop.exit.guard1 -> Exit3 -> TrueExit
G -> loop.exit.guard -> loop.exit.guard1 -> Exit4 -> TrueExit
C -> loop.exit.guard -> Loop2.preheader -> Loop2 -> Exit2 -> TrueExit

After patch:
F -> Exit3 -> loop.exit.guard -> TrueExit
G -> Exit4 -> loop.exit.guard => TrueExit
C -> loop.exit.guard -> Loop2 -> Exit2 -> TrueExit

The CFG structure (hopefully the formatting works)
Before patch:

              C   F   G
              \   |   /
          loop.exit.guard
            /         \  
Loop2PH/Loop2   loop.exit.guard1
     |               /      \
     \             Exit3 Exit4
      Exit2       /     /
           \     |     /
             TrueExit

After patch:

              F    G
              |    |
      C    Exit3 Exit4
        \     |     |
       loop.exit.guard
          /      \
Loop2PH/Loop2    |
          |      |
       Exit2     /
           \    /
          TrueExit