This is an archive of the discontinued LLVM Phabricator instance.

[LSR] Fold terminating condition to other IV when possible
ClosedPublic

Authored by eopXD on Aug 23 2022, 12:53 AM.

Details

Summary

When the IV is only used by the terminating condition (say IV-A) and the loop
has a predictable back-edge count and we have another IV (say IV-B) that is an
affine add recursion, we will be able to calculate the terminating value of
IV-B in the loop pre-header. This patch adds attempts to replace IV-B as the
new terminating condition and remove IV-A. It is safe to do so since IV-A is
only used as the terminating condition.

This transformation is suitable to be appended after LSR as it may optimize the
loop into the situation mentioned above. The transformation can reduce number of
IV-s in the loop by one.

A cli option lsr-term-fold is added and default disabled.

Diff Detail

Event Timeline

eopXD created this revision.Aug 23 2022, 12:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 23 2022, 12:53 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
eopXD requested review of this revision.Aug 23 2022, 12:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 23 2022, 12:53 AM
eopXD added reviewers: Restricted Project, Whitney, Meinersbur, congzhe, bmahjour.Aug 23 2022, 12:54 AM
fhahn added a subscriber: fhahn.Aug 23 2022, 1:02 AM
fhahn added inline comments.
llvm/test/Transforms/LoopStrengthReduce/lsr-fold-iv-complicate-add-rec.ll
46 ↗(On Diff #454725)

is this needed?

50 ↗(On Diff #454725)

is this needed? Would be good to try to simplify the test as much as possible. Also, it would be good to add a patch just for the tests and then only include the diff here.

eopXD added inline comments.Aug 23 2022, 1:19 AM
llvm/test/Transforms/LoopStrengthReduce/lsr-fold-iv-complicate-add-rec.ll
46 ↗(On Diff #454725)

Thanks for dropping by. Yes, my code checks if the phi node has a value that comes from the loop pre-header. Without this the condition check won't pass.

50 ↗(On Diff #454725)

Thanks for the reminder. Let me add a pre-commit test for this.

fhahn added inline comments.Aug 23 2022, 1:25 AM
llvm/test/Transforms/LoopStrengthReduce/lsr-fold-iv-complicate-add-rec.ll
46 ↗(On Diff #454725)

right, but it doesn't need to be defined in the preheader, this could just be %mark I assume?

50 ↗(On Diff #454725)

It would be good to clean them up before committing the test. Also, it looks like it is missing negative tests.

59 ↗(On Diff #454725)

It looks like this function doesn't need to return a pointer?

eopXD updated this revision to Diff 454751.Aug 23 2022, 2:18 AM

Update code based on pre-commit test.

eopXD marked 5 inline comments as done.Aug 23 2022, 2:19 AM
eopXD added inline comments.
llvm/test/Transforms/LoopStrengthReduce/lsr-fold-iv-complicate-add-rec.ll
46 ↗(On Diff #454725)

Done.

50 ↗(On Diff #454725)

Will add negative test in pre-commit test.

59 ↗(On Diff #454725)

Yeah, this is real code extracted, will prune in pre-commit test.

eopXD updated this revision to Diff 454755.Aug 23 2022, 2:25 AM
eopXD marked 3 inline comments as done.

Update testcase effected by this patch.

mcberg2021 added inline comments.Aug 23 2022, 7:47 PM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6597

With isLoopSimplifyForm() you do not need to check for LoopLatch and LoopPreheader here, it already does.

6616

What happens if there is more than one use here? Why not use hasOneUse() and return if false?

6625

If you use hasOneUse, you will not need a loop here, you can still get the beginning and check

eopXD edited the summary of this revision. (Show Details)Aug 24 2022, 12:52 AM
eopXD added inline comments.Aug 24 2022, 2:32 AM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6616

The main intention here is that the primary IV of the loop, may be transformed by multiple LLVM-IR instructions, but is only used to determine the terminal condition. In this case the use-def chain will be a long chain until the final value that is for

  1. used by the terminal condition
  2. goes into the phi-node in the loop header

So loop here is to travel through the long chain until meeting value that is used exactly twice.

When the scenario 1 and 2 is satisfied, we can say that this IV (and its corresponding PHINode) is the one we are looking to fold. We will need to traverse through the chain to verify the final usage is for 1 and 2.

Here is a illustration.

eopXD updated this revision to Diff 455119.Aug 24 2022, 2:51 AM

Add negative testcase.

eopXD updated this revision to Diff 455265.EditedAug 24 2022, 10:08 AM

Since LSR is used on many targets, adding this transformation will affect many
testcases and it will make this patch hard to review. For now we can open up
an option for the added transformation and only observe on the middle-end
testcases.

After this patch lands, we can then go on and create a TTI to allow reviewers
from each target to see if they want this transformation in their backend.

eopXD updated this revision to Diff 455266.Aug 24 2022, 10:11 AM

Recover testcase LoopStrengthReduce/opaque-ptr.ll since an option is now added
to guard the transformation.

eopXD marked an inline comment as done.Aug 24 2022, 10:15 AM
eopXD added inline comments.
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6597

Done. Thank you for the tip.

eopXD updated this revision to Diff 455293.Aug 24 2022, 11:06 AM
eopXD marked an inline comment as done.

Only perform transformation and set Changed to true when the SCEV is safe to expand.

eopXD updated this revision to Diff 455301.Aug 24 2022, 11:22 AM

Add comments and illustration on how the transformation is qualified.
Add TODO for follow-ups.

mcberg2021 accepted this revision.Aug 24 2022, 11:28 AM

LGTM, you might allow time for others though...

This revision is now accepted and ready to land.Aug 24 2022, 11:28 AM
eopXD updated this revision to Diff 455489.EditedAug 25 2022, 12:46 AM

Update code based on slight adjustment in pre-commit test.

eopXD updated this revision to Diff 455492.Aug 25 2022, 12:48 AM

Fix minor type-o.

hiraditya added inline comments.Aug 26 2022, 9:52 AM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
192–194
cl::desc("Attempt to replace primary IV with other IV."));

Should "terminal condition" in the description by "terminating condition"?

craig.topper requested changes to this revision.Aug 26 2022, 10:08 AM
craig.topper added inline comments.
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
190

Terminal -> Terminating? The word terminal does not appear in LSR prior to this patch.

6583

std::tuple?

6794

Is this loop just ToHelpFold->getIncomingValueForBlock(LoopPreheader)?

6795

Do you really need a cast here? If you do, I'd use U.get() instead.

6797

Same here

This revision now requires changes to proceed.Aug 26 2022, 10:08 AM
craig.topper added inline comments.Aug 26 2022, 10:15 AM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6615

Combine this if with the previous if

if (!isa<ICmpInst>(TermCond) || !cast<ICmpInst>(TermCond)->isEquality())
6638

std::function<bool(PHINode &)> -> auto

6641

This cast is unecessary.

6665

std::function<bool(PHINode &)> -> auto

6671

Can this use PN.getIncomingValueForBlock or getBasicBlockIndex?

craig.topper added inline comments.Aug 26 2022, 10:15 AM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6701

Combine this with the previous if

eopXD updated this revision to Diff 456126.Aug 27 2022, 9:22 AM
eopXD marked 12 inline comments as done.

Address comments from @craig.topper.

llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
190

Replaced all use of "terminal condition" to "terminating condition".

192–194

Thank you for the simplification. Was trying to explain in detail but this is much better.

6583

Thank you for the tip, done.

6615

Done.

6671

This simplifies the code very much. Thank you for the tip.

6701

Done. Thank you.

6794

Yes, thank you.

6795

The problem is elegantly solved by using PHINode:: getIncomingValueForBlock you've mentioned.

craig.topper added inline comments.Aug 27 2022, 2:55 PM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
190

The option name still says Terminal

Review needs to re-titled and the description needs to be updated

craig.topper added inline comments.Aug 27 2022, 2:57 PM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6608

neq -> ne

craig.topper added inline comments.Aug 27 2022, 3:12 PM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6638

Is PN.getNumIncomingValues() == 2 implied by LoopSimplifyForm?

6665

Is PN.getNumIncomingValues() == 2 implied by LoopSimplifyForm?

6672

Are the LoopPreheader and LoopLatch being predecessors of the phi implied by LoopSimplify form?

eopXD retitled this revision from [LSR] Fold terminal condition to other IV when possible to [LSR] Fold terminating condition to other IV when possible.Aug 27 2022, 10:57 PM
eopXD edited the summary of this revision. (Show Details)
eopXD updated this revision to Diff 456171.Aug 27 2022, 11:14 PM
eopXD marked 5 inline comments as done.

Address review comments.

By definition of LoopSimplifyForm (contains exactly 1 preheader, exactly 1 latch, and has dedicated exit), the code can be simplified.
Thank you @craig.topper for the review.

eopXD updated this revision to Diff 456612.Aug 30 2022, 4:30 AM

Rebased upon change in D132443.

eopXD updated this revision to Diff 456825.Aug 30 2022, 5:43 PM

Rename testcase.

fhahn added inline comments.Aug 31 2022, 10:59 AM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6613

Looks like a negative test with different branch conditions is missing.

6674

it looks like there's a negative test missing with types that aren't scev-able (e.g. vector types I think).

6679

It looks like there's a test case missing where one of the phis isn't an AddRec/non-affine AddRec.

nikic added a subscriber: nikic.Sep 4 2022, 3:02 AM

Could you please update the patch description to describe in more detailed what this patch does?

eopXD edited the summary of this revision. (Show Details)Sep 6 2022, 12:50 AM
eopXD edited the summary of this revision. (Show Details)Sep 6 2022, 12:52 AM
eopXD updated this revision to Diff 458123.Sep 6 2022, 2:18 AM
eopXD marked 2 inline comments as done.
eopXD edited the summary of this revision. (Show Details)

Added missing negative testcases.

eopXD marked an inline comment as done.Sep 6 2022, 2:19 AM
eopXD added inline comments.
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6674

Done, thank you for the reminder.

6679

Done, thank you for the reminder.

eopXD updated this revision to Diff 458174.Sep 6 2022, 7:43 AM

Fix type-o.

eopXD added a comment.Sep 7 2022, 7:36 PM

Some statistics to help push the patch, here are the number of loops affected when compiling spec2k6 with downstream compiler and this patch applied:

400.perlbench 56
401.bzip2 6
403.gcc 155
429.mcf 1
445.gobmk 45
456.hmmer 44
458.sjeng 9
462.libquantum 4
464.h264ref 125
471.omnetpp 7
473.astar 14
483.xalancbmk 81
eopXD updated this revision to Diff 458630.Sep 7 2022, 7:44 PM

Add counter to record number of folding done.

eopXD added a comment.Sep 8 2022, 2:50 AM

Gentle ping, thank you. @fhahn @craig.topper

Is the description acceptable now? @nikic

eopXD edited the summary of this revision. (Show Details)Sep 8 2022, 2:51 AM

ping, thank you.

craig.topper added inline comments.Sep 14 2022, 9:38 AM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6583

Looking at this again. Could this be an Optional<std::pair<PHINode *, PHINode *>> Then you could return None for CantFold.

craig.topper added inline comments.Sep 14 2022, 9:51 AM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6849

What are the other users of OldTermCond and why is it ok to replace them with poison?

6854

Is there a DCE after this? Earlier optimizations in this function used DeleteDeadPHIs to recursively clean up.

eopXD updated this revision to Diff 460140.Sep 14 2022, 9:55 AM
eopXD marked an inline comment as done.

Address comment.

eopXD added inline comments.Sep 14 2022, 11:18 AM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6583

Modified, thank you for the tip.

eopXD updated this revision to Diff 460179.Sep 14 2022, 12:14 PM
eopXD marked 2 inline comments as done.

Address comment.

llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6849

Nice catch. Added check and negative test for it.

6854

Yes. Removing the terminal condition and leveraging DeleteDeadPHIs is a much better approach. Thank you.

craig.topper added inline comments.Sep 15 2022, 6:22 PM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6849

What if OldTermCond isn't the direct user of the ToFold phi? Do we clean everything?

eopXD added inline comments.Sep 16 2022, 4:09 AM
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
6849

Yes, DeleteDeadPHIs is able to delete the chain of uses of the ToFold phi.

eopXD updated this revision to Diff 460705.Sep 16 2022, 5:30 AM

Fix testcase.

This revision is now accepted and ready to land.Sep 18 2022, 7:42 PM
eopXD updated this revision to Diff 461455.Sep 19 2022, 7:49 PM

Rebase to latest main.

eopXD edited the summary of this revision. (Show Details)Sep 19 2022, 7:51 PM
eopXD updated this revision to Diff 461457.Sep 19 2022, 7:52 PM

Rebase to latest main.

This revision was landed with ongoing or failed builds.Sep 20 2022, 1:38 AM
This revision was automatically updated to reflect the committed changes.

I recently stumbled into a test case which would benefit from this, and as a result, took a close look at the code in tree.

I posted one generalization https://reviews.llvm.org/D142240 so that this covers my case of interest.

However, I also think we've got a couple of correctness issues here which need addressed.

First, I don't see anything in the code which checks that the alternate IV is at least as wide as the starting IV. As a result, we can try to rewrite a loop with a BE count of say 1024 to use an IV which is an i8. This can change the BE count of the loop.

Second, we're adding a use to an existing IV. When doing that, we need to worry about about poison correctness. The case to consider is the alternate IV overflowing on the last iteration, and that poison value previously being unused (since we don't take backedge), and now used in the latch test (resulting in UB).

(More observations)

For example of code which gets both of the above issues correct, look at the LFTR and rewriteLoopExitValues transformations.

Third, we need some form of cost model. The simplest would be to add a isHighCostExpansion check on the new bound. SCEV expressions can (in edge cases) be quite involved, and we probably don't want to do this rewrite universally.