Page MenuHomePhabricator

[LSR] Fold terminating condition to other IV when possible
ClosedPublic

Authored by eopXD on Aug 23 2022, 12:53 AM.

Details

Summary

When the IV is only used by the terminating condition (say IV-A) and the loop
has a predictable back-edge count and we have another IV (say IV-B) that is an
affine add recursion, we will be able to calculate the terminating value of
IV-B in the loop pre-header. This patch adds attempts to replace IV-B as the
new terminating condition and remove IV-A. It is safe to do so since IV-A is
only used as the terminating condition.

This transformation is suitable to be appended after LSR as it may optimize the
loop into the situation mentioned above. The transformation can reduce number of
IV-s in the loop by one.

A cli option lsr-term-fold is added and default disabled.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
craig.topper added inline comments.Aug 26 2022, 10:15 AM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6697

Combine this with the previous if

eopXD updated this revision to Diff 456126.Aug 27 2022, 9:22 AM
eopXD marked 12 inline comments as done.

Address comments from @craig.topper.

llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
189

Replaced all use of "terminal condition" to "terminating condition".

191–193

Thank you for the simplification. Was trying to explain in detail but this is much better.

6579

Thank you for the tip, done.

6611

Done.

6667

This simplifies the code very much. Thank you for the tip.

6697

Done. Thank you.

6785

Yes, thank you.

6786

The problem is elegantly solved by using PHINode:: getIncomingValueForBlock you've mentioned.

craig.topper added inline comments.Aug 27 2022, 2:55 PM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
189

The option name still says Terminal

Review needs to re-titled and the description needs to be updated

craig.topper added inline comments.Aug 27 2022, 2:57 PM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6604

neq -> ne

craig.topper added inline comments.Aug 27 2022, 3:12 PM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6634

Is PN.getNumIncomingValues() == 2 implied by LoopSimplifyForm?

6661

Is PN.getNumIncomingValues() == 2 implied by LoopSimplifyForm?

6668

Are the LoopPreheader and LoopLatch being predecessors of the phi implied by LoopSimplify form?

eopXD retitled this revision from [LSR] Fold terminal condition to other IV when possible to [LSR] Fold terminating condition to other IV when possible.Aug 27 2022, 10:57 PM
eopXD edited the summary of this revision. (Show Details)
eopXD updated this revision to Diff 456171.Aug 27 2022, 11:14 PM
eopXD marked 5 inline comments as done.

Address review comments.

By definition of LoopSimplifyForm (contains exactly 1 preheader, exactly 1 latch, and has dedicated exit), the code can be simplified.
Thank you @craig.topper for the review.

eopXD updated this revision to Diff 456612.Aug 30 2022, 4:30 AM

Rebased upon change in D132443.

eopXD updated this revision to Diff 456825.Aug 30 2022, 5:43 PM

Rename testcase.

fhahn added inline comments.Aug 31 2022, 10:59 AM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6609

Looks like a negative test with different branch conditions is missing.

6670

it looks like there's a negative test missing with types that aren't scev-able (e.g. vector types I think).

6675

It looks like there's a test case missing where one of the phis isn't an AddRec/non-affine AddRec.

nikic added a subscriber: nikic.Sep 4 2022, 3:02 AM

Could you please update the patch description to describe in more detailed what this patch does?

eopXD edited the summary of this revision. (Show Details)Sep 6 2022, 12:50 AM
eopXD edited the summary of this revision. (Show Details)Sep 6 2022, 12:52 AM
eopXD updated this revision to Diff 458123.Sep 6 2022, 2:18 AM
eopXD marked 2 inline comments as done.
eopXD edited the summary of this revision. (Show Details)

Added missing negative testcases.

eopXD marked an inline comment as done.Sep 6 2022, 2:19 AM
eopXD added inline comments.
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6670

Done, thank you for the reminder.

6675

Done, thank you for the reminder.

eopXD updated this revision to Diff 458174.Sep 6 2022, 7:43 AM

Fix type-o.

eopXD added a comment.Sep 7 2022, 7:36 PM

Some statistics to help push the patch, here are the number of loops affected when compiling spec2k6 with downstream compiler and this patch applied:

400.perlbench 56
401.bzip2 6
403.gcc 155
429.mcf 1
445.gobmk 45
456.hmmer 44
458.sjeng 9
462.libquantum 4
464.h264ref 125
471.omnetpp 7
473.astar 14
483.xalancbmk 81
eopXD updated this revision to Diff 458630.Sep 7 2022, 7:44 PM

Add counter to record number of folding done.

eopXD added a comment.Sep 8 2022, 2:50 AM

Gentle ping, thank you. @fhahn @craig.topper

Is the description acceptable now? @nikic

eopXD edited the summary of this revision. (Show Details)Sep 8 2022, 2:51 AM

ping, thank you.

craig.topper added inline comments.Sep 14 2022, 9:38 AM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6579

Looking at this again. Could this be an Optional<std::pair<PHINode *, PHINode *>> Then you could return None for CantFold.

craig.topper added inline comments.Sep 14 2022, 9:51 AM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6840

What are the other users of OldTermCond and why is it ok to replace them with poison?

6845

Is there a DCE after this? Earlier optimizations in this function used DeleteDeadPHIs to recursively clean up.

eopXD updated this revision to Diff 460140.Sep 14 2022, 9:55 AM
eopXD marked an inline comment as done.

Address comment.

eopXD added inline comments.Sep 14 2022, 11:18 AM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6579

Modified, thank you for the tip.

eopXD updated this revision to Diff 460179.Sep 14 2022, 12:14 PM
eopXD marked 2 inline comments as done.

Address comment.

llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6840

Nice catch. Added check and negative test for it.

6845

Yes. Removing the terminal condition and leveraging DeleteDeadPHIs is a much better approach. Thank you.

craig.topper added inline comments.Sep 15 2022, 6:22 PM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6840

What if OldTermCond isn't the direct user of the ToFold phi? Do we clean everything?

eopXD added inline comments.Sep 16 2022, 4:09 AM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6840

Yes, DeleteDeadPHIs is able to delete the chain of uses of the ToFold phi.

eopXD updated this revision to Diff 460705.Sep 16 2022, 5:30 AM

Fix testcase.

This revision is now accepted and ready to land.Sep 18 2022, 7:42 PM
eopXD updated this revision to Diff 461455.Sep 19 2022, 7:49 PM

Rebase to latest main.

eopXD edited the summary of this revision. (Show Details)Sep 19 2022, 7:51 PM
eopXD updated this revision to Diff 461457.Sep 19 2022, 7:52 PM

Rebase to latest main.

This revision was landed with ongoing or failed builds.Sep 20 2022, 1:38 AM
This revision was automatically updated to reflect the committed changes.

I recently stumbled into a test case which would benefit from this, and as a result, took a close look at the code in tree.

I posted one generalization https://reviews.llvm.org/D142240 so that this covers my case of interest.

However, I also think we've got a couple of correctness issues here which need addressed.

First, I don't see anything in the code which checks that the alternate IV is at least as wide as the starting IV. As a result, we can try to rewrite a loop with a BE count of say 1024 to use an IV which is an i8. This can change the BE count of the loop.

Second, we're adding a use to an existing IV. When doing that, we need to worry about about poison correctness. The case to consider is the alternate IV overflowing on the last iteration, and that poison value previously being unused (since we don't take backedge), and now used in the latch test (resulting in UB).

(More observations)

For example of code which gets both of the above issues correct, look at the LFTR and rewriteLoopExitValues transformations.

Third, we need some form of cost model. The simplest would be to add a isHighCostExpansion check on the new bound. SCEV expressions can (in edge cases) be quite involved, and we probably don't want to do this rewrite universally.