This is an archive of the discontinued LLVM Phabricator instance.

[ScalarEvolution] Handle <= and >= in non infinite loops
ClosedPublic

Authored by wsmoses on Jan 24 2022, 6:10 PM.

Details

Summary

Extend scalar evolution to handle >= and <= if a loop is known to be finite and the induction variable guards the condition. Specifically, with these assumptions lhs <= rhs is equivalent to lhs < rhs + 1 and lhs >= rhs to lhs > rhs -1.

In the case of lhs <= rhs, this is true since the only case these are not equivalent
is when rhs == unsigned/signed intmax, which would have resulted in an infinite loop.

In the case of lhs >= rhs, this is true since the only case these are not equivalent
is when rhs == unsigned/signed intmin, which would again have resulted in an infinite loop.

Diff Detail

Event Timeline

wsmoses created this revision.Jan 24 2022, 6:10 PM
wsmoses requested review of this revision.Jan 24 2022, 6:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 24 2022, 6:10 PM
nikic requested changes to this revision.Jan 25 2022, 12:54 AM
nikic added a reviewer: reames.

Why does this add a FiniteLoops flag to SCEV, rather than using mustprogress attributes/metadata? I'm not convinced that should be added, and in either case this change is independent from the change to exit limit logic.

llvm/unittests/Analysis/ScalarEvolutionTest.cpp
1750

Is there any particular reason these can't use normal print<scalar-evolution> tests? See various llvm/test/Analysis/ScalarEvolution/trip-count*.ll tests for example.

This revision now requires changes to proceed.Jan 25 2022, 12:54 AM

I second @nikic comments. Also, we already have quite similar in SimplifyICmpOperands, so if there's a way to common the code to use the mustprogress/guarded exit fact there, that would be preferred.

Can do regarding SimplifyICmp and also the trip count tests.

As for the flag, a question for you all:
The existing llvm metadata for mustprogress is insufficient as it only implies the loop is finite if it does not interact with the environment of memory. Here, however, it would be nice if we could more strongly apply this to codes (e.g. if there was a call in the loop). This code came from a project where we did so with an additional flag (in that case all loops can be assumed finite), though I concur an existing way in the IR would be nice. Perhaps a finite loop metadata?

wsmoses updated this revision to Diff 402957.Jan 25 2022, 10:13 AM

Move logic into simplifyicmp

As for the flag, a question for you all:
The existing llvm metadata for mustprogress is insufficient as it only implies the loop is finite if it does not interact with the environment of memory. Here, however, it would be nice if we could more strongly apply this to codes (e.g. if there was a call in the loop). This code came from a project where we did so with an additional flag (in that case all loops can be assumed finite), though I concur an existing way in the IR would be nice. Perhaps a finite loop metadata?

Adding loop metadata for this purpose sounds sensible to me.

wsmoses updated this revision to Diff 403141.Jan 25 2022, 10:57 PM

Use isFinite

@nikic, mind taking another look here. I've gone ahead and made the requisite changes and set up the relevant infrastructure.

This now does not have a dependency on the langref change (though once that lands this should automatically become more powerful whenever that loop metadata is available).

lebedev.ri added inline comments.Jan 26 2022, 4:42 AM
llvm/lib/Analysis/ScalarEvolution.cpp
10120
wsmoses updated this revision to Diff 403290.Jan 26 2022, 8:46 AM

Change order evaluating condition

jdoerfert added inline comments.Jan 27 2022, 11:36 AM
llvm/lib/Analysis/ScalarEvolution.cpp
10123

Nit: format above.

wsmoses updated this revision to Diff 403739.Jan 27 2022, 11:52 AM

Fix format

Conceptually, this makes sense to me, but it would be much better to first get the langref change codified,
because this makes the same legality reasoning, and clearly langref is the place for legality wording.

reames added inline comments.Jan 27 2022, 1:25 PM
llvm/lib/Analysis/LoopInfo.cpp
1122 ↗(On Diff #403739)

This is - in the current patch - a form of attribute inference. We prefer inference be done once - either in the core API itself, or explicitly via materialization in IR - not duplicated for each consumer.

i.e. Calling code should assume that willreturn loops are mustprogress, not check for that case explicitly.

llvm/lib/Analysis/ScalarEvolution.cpp
7020

This change by itself should be testable and profitable. I'd advise separating the SimplifyICmpOperands bits into a following change.

Or said differently, a patch which causes loopIsFiniteByAssumption for loops in willreturn functions seems entirely reasonable on it's own without any new metadata.

wsmoses added inline comments.Jan 27 2022, 8:54 PM
llvm/lib/Analysis/ScalarEvolution.cpp
7020

I've made a version of this PR that just contains the isFinite component here (https://reviews.llvm.org/D118429) though frankly I'm not sure how to set up a test for it.

wsmoses updated this revision to Diff 403873.Jan 27 2022, 8:57 PM

Remove mustprogress if willreturn

wsmoses marked 2 inline comments as done.Jan 27 2022, 8:57 PM
wsmoses updated this revision to Diff 404102.Jan 28 2022, 11:20 AM

Rebase after removing/landing the willreturn finite

lebedev.ri accepted this revision.Jan 28 2022, 11:27 AM
lebedev.ri added a subscriber: mkazantsev.

LG, but please wait for @nikic / @reames / @mkazantsev.

Looks fine to me as well.

llvm/include/llvm/Analysis/ScalarEvolution.h
1120

controlling

llvm/lib/Analysis/ScalarEvolution.cpp
8478

As the same check is repeated below, extract it into a variable?

10120

controlling

llvm/test/Analysis/ScalarEvolution/finite-trip-count.ll
9 ↗(On Diff #404102)

Please avoid attribute groups for test input, just put the willreturn directly here.

It would also be good to have a negative test that shows that the BECount is not inferred without willreturn.

wsmoses updated this revision to Diff 404154.Jan 28 2022, 1:34 PM

Address comments

This revision was not accepted when it landed; it landed in state Needs Review.Jan 28 2022, 2:41 PM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
mtrofin added a subscriber: mtrofin.Mar 1 2022, 8:12 AM

This change has the side-effect of making SCEVExpander::isHighCostExpansion return true and thus block loop unrolling, which is the root cause of the performance degradation that led to 7e3606f43c63.

Before, the expression was:

(1 + ((-2 + %1) /u 2))<nuw>

with this change, the expression becomes:

(1 + ((-3 + (4 smax (1 + %1)<nsw>))<nsw> /u 2))<nuw><nsw>

The Budget is always 4, but the second expression trips over it (the (4 smax (1 + %1)<nsw>) subexpression raises the cost to 5)

@wsmoses, is there a follow-up for mitigating cost estimate implications?

nikic added a comment.Mar 1 2022, 8:23 AM

This change has the side-effect of making SCEVExpander::isHighCostExpansion return true and thus block loop unrolling, which is the root cause of the performance degradation that led to 7e3606f43c63.

Before, the expression was:

(1 + ((-2 + %1) /u 2))<nuw>

with this change, the expression becomes:

(1 + ((-3 + (4 smax (1 + %1)<nsw>))<nsw> /u 2))<nuw><nsw>

The Budget is always 4, but the second expression trips over it (the (4 smax (1 + %1)<nsw>) subexpression raises the cost to 5)

Could you please provide the input IR that is affected? We'd probably want to generate a better BECount than try to adjust cost modelling.

This change has the side-effect of making SCEVExpander::isHighCostExpansion return true and thus block loop unrolling, which is the root cause of the performance degradation that led to 7e3606f43c63.

Before, the expression was:

(1 + ((-2 + %1) /u 2))<nuw>

with this change, the expression becomes:

(1 + ((-3 + (4 smax (1 + %1)<nsw>))<nsw> /u 2))<nuw><nsw>

The Budget is always 4, but the second expression trips over it (the (4 smax (1 + %1)<nsw>) subexpression raises the cost to 5)

Could you please provide the input IR that is affected? We'd probably want to generate a better BECount than try to adjust cost modelling.

Ack - let me try to distill it down to something shareable (so it can easily become a testcase, I imagine we'd want that). Wanted to first check if this was expected and there was a plan, before spending more time.

I concur with @nikic

Sorry for the delay, I managed to extract a repro, please see https://github.com/llvm/llvm-project/issues/54191

The repro could be smaller, but I wasn't successful at shrinking it any further. It's down to the 1 function though, and I left additional pointers in the issue.

Thanks!

Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2022, 7:25 PM

I am also getting performance degradation from this patch. The backedge count with the max is considered as high cost and it blocks to rewrite exit value in IndVarSimplify pass. It affects LSR's decision to rewrite IV and it causes additional instructions in loop...
As far as I understand, the expression of backedge count describes (max(End,Start)-Start)/Stride normally. If the expression satisfies with (max(RHS,Start) > Start - Stride, the expression is refined simply.
In order to prove (max(RHS,Start) > Start - Stride, SCEV is using below code.

// Can we prove (max(RHS,Start) > Start - Stride?
if (isLoopEntryGuardedByCond(L, Cond, OrigStartMinusStride, OrigStart) &&
    isLoopEntryGuardedByCond(L, Cond, OrigStartMinusStride, OrigRHS)) {

Let's see the reduced example of @nikic on https://github.com/llvm/llvm-project/issues/54191 with above code.

; RUN: opt -S -passes='print<scalar-evolution>' < %s
define void @test(i64 %n) mustprogress {
entry:
  %guard = icmp sgt i64 %n, 1
  br i1 %guard, label %loop, label %exit
  
loop:
  %iv = phi i64 [ 2, %entry ], [ %iv.next, %loop ]
  %iv.next = add nuw nsw i64 %iv, 2
  %cmp = icmp sle i64 %iv.next, %n
  br i1 %cmp, label %loop, label %exit
  
exit:
  ret void
}

With this patch, the second isLoopEntryGuardedByCond is failed because the loop condition's upper bound has been changed from %n to %n + 1 and it is over the guard condition's upper bound which is %n.
Without this patch, on SimplifyICmpOperands, the lower bound is changed from 4 to 3because it is not minimum signed value and upper bound is maximum signed value.
At this point, I have questions.

  1. Can we check LHS first on SimplifyICmpOperands as below?
    if (!getSignedRangeMin(LHS).isMinSignedValue()) {
...
    } else if (ControllingFiniteLoop || !getSignedRangeMax(RHS).isMaxSignedValue()) {
...
    }
  1. Can we change the default value of SCEVCheapExpansionBudget to bigger value?
    • The smax's cost checks the cost of Instruction::ICmp and Instruction::Select. The default value 4 could not be good enough for the smax.
  2. If the upper bound is signed maximum signed value, the upper bound change with +1 could cause overflow and isLoopEntryGuardedByCond returns false...
    • It could be safe to check just maximum signed value of upper bound !getSignedRangeMax(RHS).isMaxSignedValue() without ControllingFiniteLoop .

How do you think about it? If I missed something, please let me know.