This is an archive of the discontinued LLVM Phabricator instance.

[LSR] Preserve LCSSA when rewriting instruction with PHI user
ClosedPublic

Authored by dmakogon on Mar 24 2023, 7:18 AM.

Details

Summary

Fixes https://github.com/llvm/llvm-project/issues/61182.
LoopStrengthReduce may sometimes break LCSSA form when applying a rewrite for an instruction used in a PHI.
It happens if:

  1. The PHI is in a loop exit block,
  2. The edge from the corresponding exiting block to that exit is critical,
  3. The PHI has at least two inputs coming from loop blocks,
  4. The rewritten instruction is inserted in the loop.

An example of initial CFG:

LoopBlock  <--  ExitingBlock1   ExitingBlock2  
                    |                 |
                    v                 |
                ExitBlock  <----------+

In such case we split the critical edge:

LoopBlock  <--  ExitingBlock1   ExitingBlock2  
                    |                |
                    v                v
                 CritEdge          split
                    |                |
                    v                |
                ExitBlock  <---------+
           (not exit anymore)

and then replace PHI inputs with the rewritten instruction. However ExitBlock is no longer a loop exit, so LCSSA form is broken.

This patch fixes it by collecting all inserted instructions for PHIs whose parent block is not a loop exit and then forming LCSSA for them.

Diff Detail

Event Timeline

dmakogon created this revision.Mar 24 2023, 7:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 24 2023, 7:18 AM
dmakogon requested review of this revision.Mar 24 2023, 7:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 24 2023, 7:18 AM
dmakogon updated this revision to Diff 508085.Mar 24 2023, 7:22 AM

Accidentally changed one test's run line to use new PM, reverted back now.

mkazantsev added inline comments.Mar 27 2023, 3:33 AM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
5623

I propose to form InsertedNonLCSSAInsts here, because here the order is deterministic.

5668

The order of iteration here is not deterministic. I don't know if it's a source of any real problem, but we may end up doing slightly different things in different order. If we are especially unlucky, it will be different resulting code. See my other comment where I think it's appropriate to form this vector.

mkazantsev added inline comments.Mar 27 2023, 3:35 AM
llvm/test/Transforms/LoopStrengthReduce/depth-limit-overrun.ll
77

That's weird.

mkazantsev added inline comments.Mar 27 2023, 3:39 AM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
5540–5547

I'm curious, is it possible that second value here is *not* an instruction? Might be a good NFC change beforehand.

dmakogon added inline comments.Mar 27 2023, 10:30 AM
llvm/test/Transforms/LoopStrengthReduce/depth-limit-overrun.ll
77

I guess it's ok because we get exactly the same IR as with DEFAULT checks above which are generated by running the pass through new pass manager.

dmakogon added inline comments.Mar 28 2023, 1:15 AM
llvm/test/Transforms/LoopStrengthReduce/depth-limit-overrun.ll
77

Ok, I found out why this happens. LSR here calculates the exit value of the inner loop IV (%i33) using SCEV, and it is able to do so because its only use is in loop exit block (in LCSSA PHI which is inserted in outer_tail.loopexit block because of my changes). Previously we wouldn't create this LCSSA PHI and the IV's use would be in a inner loop exit's successor.
LSR checks for LCSSA form of the loop explicitly, and if the check fails, we don't even try this exit value calculation.
The corresponding code for that:

if (L->isRecursivelyLCSSAForm(DT, LI) && L->getExitBlock()) {
    ...
    int Rewrites = rewriteLoopExitValues(L, &LI, &TLI, &SE, &TTI, Rewriter, &DT,
                                             UnusedIndVarInLoop, DeadInsts);
dmakogon added inline comments.Mar 28 2023, 1:34 AM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
5540–5547

Yes, it's possible. LSR uses SCEVExpander here. There are case where it expands the expression to constants.

dmakogon updated this revision to Diff 508932.Mar 28 2023, 1:35 AM

Address comment: collect instructions right where they are inserted.

dmakogon marked 3 inline comments as done.Mar 28 2023, 1:36 AM
mkazantsev accepted this revision.Mar 28 2023, 1:41 AM
mkazantsev added inline comments.
llvm/test/Transforms/LoopStrengthReduce/depth-limit-overrun.ll
77

So it's an improvement then. Allright, let it be.

This revision is now accepted and ready to land.Mar 28 2023, 1:41 AM
This revision was landed with ongoing or failed builds.Mar 30 2023, 12:49 AM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Mar 30 2023, 2:19 AM

This seems to break tests: http://45.33.8.238/linux/103038/step_12.txt

Please take a look and revert for now if it takes a while to fix.

The mve-float-loops.ll change that was added after the review is a performance regression because of the extra mov instruction. The mov comes from a copy that can't be merged because of "interference" in RegisterCoalescer.cpp.

Considering merging to rGPR with %73 in %106
RHS = %73 [384r,448r:0) 0@384r  weight:0.000000e+00
LHS = %106 [192r,240B:0)[240B,432r:3)[448r,896B:4)[896B,1008r:5)[1312r,1504B:1)[1616r,2096B:2) 0@192r 1@1312r 2@1616r 3@240B-phi 4@448r 5@896B-phi  weight:0.000000e+00
merge %106:4@448r into %73:0@384r --> @384r
interference at %73:0@384r
Interference!

Do you think you'd be able to fix that in this patch or a follow-up?