Page MenuHomePhabricator

[SCEV] Iteratively compute ranges for deeply nested expressions.
Needs ReviewPublic

Authored by fhahn on Jul 28 2022, 1:01 PM.

Details

Summary

At the moment, getRangeRef may overflow the stack for very deeply nested
expressions.

This patch introduces a new getRangeRefIter function, which first builds
a worklist of N-ary expressions and phi nodes, followed by their
operands iteratively.

getRangeRef has been extended to also take a Depth argument and it
switches to use getRangeRefIter once the depth reaches a certain
threshold.

This ensures compile-time is not impacted in general. Note that
the iterative algorithm may lead to a slightly different evaluation
order, which could result in slightly worse ranges for cyclic phis.

https://llvm-compile-time-tracker.com/compare.php?from=23c3eb7cdf3478c9db86f6cb5115821a8f0f5f40&to=e0e09fa338e77e53242bfc846e1484350ad79773&stat=instructions

Fixes #49579.

Diff Detail

Event Timeline

fhahn created this revision.Jul 28 2022, 1:01 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 28 2022, 1:01 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
fhahn requested review of this revision.Jul 28 2022, 1:01 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 28 2022, 1:01 PM
hiraditya added inline comments.Jul 28 2022, 2:21 PM
llvm/lib/Analysis/ScalarEvolution.cpp
6477

Can the traversal order be preserved if we populated the Worklist with DFS traversal for operands starting with S?

fhahn updated this revision to Diff 448994.Aug 1 2022, 4:53 AM
fhahn marked an inline comment as done.

Rebased and addede logic to avoid infinite cycles when evaluating phis using PendingPhiRangesIter.

llvm/lib/Analysis/ScalarEvolution.cpp
6477

I had a look at some of the differences and the main issue is that we now may evaluate ranges earlier than previously.

One example is that for AddRecs, getRangeRef may not evaluate the range of the start value, e.g. because it may not be necessary if the addrec overflows. With the iterative approach, we will evaluate the range of the start value before evaluating the addrec. I don't think there anything we could do about it, except adding the expression-dependent logic of when to not evaluate an operand.

I think this is undesirable and should also not be necessary in practice. The differences caused by this are very minor (in some cases it even leads to small improvements) and won't materialize except in very deep expressions. I am also open to increasing the threshold when the iterative logic triggers. SCEV should just not overflow the stack for valid IR inputs.

Here's where the ranges would be a bit tighter if we unconditionally use the iterative approach:

diff --git a/llvm/test/Analysis/ScalarEvolution/pr49856.ll b/llvm/test/Analysis/ScalarEvolution/pr49856.ll
index 751677f1f9f8..661fd5482ad5 100644
--- a/llvm/test/Analysis/ScalarEvolution/pr49856.ll
+++ b/llvm/test/Analysis/ScalarEvolution/pr49856.ll
@@ -5,7 +5,7 @@ define void @test() {
 ; CHECK-LABEL: 'test'
 ; CHECK-NEXT:  Classifying expressions for: @test
 ; CHECK-NEXT:    %tmp = phi i32 [ 2, %bb ], [ %tmp2, %bb3 ]
-; CHECK-NEXT:    --> %tmp U: [1,-2147483648) S: [0,-2147483648)
+; CHECK-NEXT:    --> %tmp U: [1,-2147483648) S: [1,-2147483648)
 ; CHECK-NEXT:    %tmp2 = add nuw nsw i32 %tmp, 1
 ; CHECK-NEXT:    --> (1 + %tmp)<nuw> U: [1,-2147483647) S: [1,-2147483647)
 ; CHECK-NEXT:  Determining loop execution counts for: @test
diff --git a/llvm/test/Analysis/ScalarEvolution/shift-recurrences.ll b/llvm/test/Analysis/ScalarEvolution/shift-recurrences.ll
index bc0f62e827ea..e5199e027ab3 100644
--- a/llvm/test/Analysis/ScalarEvolution/shift-recurrences.ll
+++ b/llvm/test/Analysis/ScalarEvolution/shift-recurrences.ll
@@ -446,7 +446,7 @@ define void @nonloop_recurrence() {
 ; CHECK-LABEL: 'nonloop_recurrence'
 ; CHECK-NEXT:  Classifying expressions for: @nonloop_recurrence
 ; CHECK-NEXT:    %tmp = phi i32 [ 2, %bb ], [ %tmp2, %bb3 ]
-; CHECK-NEXT:    --> %tmp U: [1,-2147483648) S: [0,-2147483648)
+; CHECK-NEXT:    --> %tmp U: [1,-2147483648) S: [1,-2147483648)
 ; CHECK-NEXT:    %tmp2 = add nuw nsw i32 %tmp, 1
 ; CHECK-NEXT:    --> (1 + %tmp)<nuw> U: [1,-2147483647) S: [1,-2147483647)
 ; CHECK-NEXT:  Determining loop execution counts for: @nonloop_recurrence
@@ -470,7 +470,7 @@ define void @nonloop_recurrence_2() {
 ; CHECK-LABEL: 'nonloop_recurrence_2'
 ; CHECK-NEXT:  Classifying expressions for: @nonloop_recurrence_2
 ; CHECK-NEXT:    %tmp = phi i32 [ 2, %loop ], [ %tmp2, %bb3 ]
-; CHECK-NEXT:    --> %tmp U: [1,-2147483648) S: [0,-2147483648) Exits: <<Unknown>> LoopDispositions: { %loop: Variant }
+; CHECK-NEXT:    --> %tmp U: [1,-2147483648) S: [1,-2147483648) Exits: <<Unknown>> LoopDispositions: { %loop: Variant }
 ; CHECK-NEXT:    %tmp2 = add nuw nsw i32 %tmp, 1
 ; CHECK-NEXT:    --> (1 + %tmp)<nuw> U: [1,-2147483647) S: [1,-2147483647) Exits: <<Unknown>> LoopDispositions: { %loop: Variant }
 ; CHECK-NEXT:  Determining loop execution counts for: @nonloop_recurrence_2

The only regression AFAICT would be in llvm/test/Analysis/ScalarEvolution/addrec-computed-during-addrec-calculation.ll where we fail to hoist the sext out of (sext i32 {%iv,+,1}<nsw><%loop2> to i64)

nikic added a comment.Aug 1 2022, 9:38 AM

High-level concern: We have lots of code that is working with SCEVs recursively. If we fix getRangeRef(), are we just shifting the problem over to GetMinTrailingZeroes(), or getSCEVAtScope(), or ...?

I wonder whether it would make sense to prevent the creation of very deeply nested SCEVs instead. We already have various limits, and the one that is most relevant is probably the huge expression limit, but that one limits the total number of (recursive) children, not the nesting.

fhahn added a comment.Aug 2 2022, 1:38 AM

High-level concern: We have lots of code that is working with SCEVs recursively. If we fix getRangeRef(), are we just shifting the problem over to GetMinTrailingZeroes(), or getSCEVAtScope(), or ...?

That's true, I think we have another known crash due to recursion in getSCEVAtScope, GetMinTrailingZeroes also looks like a likely candidate for issues. As far as I know there are no other reported crashes caused by stack overflows other than the one in getRangeRef and getSCEVAtScope.

The getRangeRef situation is particularly unfortunate, because not only are we traversing operands of a single SCEV expression tree, but we also look through SCEVUnknown phis.

I wonder whether it would make sense to prevent the creation of very deeply nested SCEVs instead. We already have various limits, and the one that is most relevant is probably the huge expression limit, but that one limits the total number of (recursive) children, not the nesting.

A limit on the nesting may help in some (most?) cases, but one potential drawback would be that SCEVs may depend on the order they have been constructed (e.g. if we start constructing form the bottom of the tree we may miss folds for expressions higher up in the tree that may reduce overall height; if we construct from the top first we could end up with a different expression). It probably won't matter in too many cases in practice, but at least conceptually avoiding hard limits seems desirable if possible IMO. Updating most or all places to work iteratively if needed is probably going to take longer and more work, but so far the changes have been fairly targeted/isolated I think.

As mentioned earlier, getRangeRef looks through SCEVUnknown phis so it may traverse multiple SCEV expressions, so a limit on the size of a single expression may not help in all cases there.

Switching everything in SCEV to use iterative algorithms to avoid stack recursion depth issues seems like the better long term approach to me. I really dislike the notion of one more limit here.

The other option is we could use the musttail attributes to explicitly write recursion based code without the stack depth problems, but I think that's currently a clang only extension. Not sure if GCC supports it, or where it stands in certification efforts.

mkazantsev added inline comments.Aug 5 2022, 1:35 AM
llvm/lib/Analysis/ScalarEvolution.cpp
6513

This should be an option.

fhahn updated this revision to Diff 452991.Aug 16 2022, 7:00 AM

Add option to customize threshold, ping :)

Sorry, was on vacation. Taking a look...

I'm not sure if the threshold is ever reached in the tests. Could you pls add a test that

  • runs default computation
  • runs iterative computation (with cut limit)
  • shows that the results don't differ?
llvm/lib/Analysis/ScalarEvolution.cpp
6449

What about div?