This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
include/llvm/Analysis/
-
llvm/
-
Analysis/
-
ScalarEvolution.h
-
lib/Analysis/
-
Analysis/
8/13
ScalarEvolution.cpp
-
test/Analysis/ScalarEvolution/
-
Analysis/
-
ScalarEvolution/
1/1
no-wrap-symbolic-becount.ll

Differential D89381

[SCEV] Re-enable "Use nw flag and symbolic iteration count to sharpen ranges of AddRecs", attempt 3
ClosedPublic

Authored by mkazantsev on Oct 14 2020, 4:05 AM.

Download Raw Diff

Details

Reviewers

lebedev.ri
fhahn
reames
skatkov
efriedma
nikic
bollu

Commits

rG624fc63a0510: [SCEV] Re-enable "Use nw flag and symbolic iteration count to sharpen ranges of…
rGa10a64e7e334: [SCEV] Recommit "Use nw flag and symbolic iteration count to sharpen ranges of…
rG32b72c3165bf: Recommit "[SCEV] Use nw flag and symbolic iteration count to sharpen ranges of…
rG905101c36025: [SCEV] Use nw flag and symbolic iteration count to sharpen ranges of AddRecs

Summary

We can sharpen the range of a AddRec if we know that it does not
self-wrap and know the symbolic iteration count in the loop. If we can
evaluate the value of AddRec on the last iteration and prove that at least
one its intermediate value lies between start and end, then no-wrap flag
allows us to conclude that all of them also lie between start and end. So
the estimate of range can be improved to union of ranges of start and end.

Switched off by default, can be turned on by flag.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

mkazantsev created this revision.Oct 14 2020, 4:05 AM

Herald added a project: Restricted Project. · View Herald TranscriptOct 14 2020, 4:05 AM

Herald added subscribers: llvm-commits, hiraditya. · View Herald Transcript

mkazantsev requested review of this revision.Oct 14 2020, 4:05 AM

mkazantsev added inline comments.Oct 14 2020, 5:10 AM

llvm/test/Analysis/ScalarEvolution/no-wrap-symbolic-becount.ll
11	Unsigned range here can also be made [0,0,4294967296)...

xbolva00 added a subscriber: xbolva00.Oct 14 2020, 5:12 AM

xbolva00 added inline comments.

llvm/test/Transforms/IndVarSimplify/widen-loop-comp.ll
548 ↗	(On Diff #298101)	remove todo

mkazantsev planned changes to this revision.Oct 14 2020, 5:19 AM

mkazantsev added inline comments.

llvm/test/Transforms/IndVarSimplify/widen-loop-comp.ll
548 ↗	(On Diff #298101)	Not until the zext is removed.

Sharpened unsigend flag too.

mkazantsev added inline comments.Oct 14 2020, 5:46 AM

llvm/test/Transforms/IndVarSimplify/widen-loop-comp.ll
548 ↗	(On Diff #298101)	Ideally, we should be able to widen everything to i64. Currently IV gives up because zext(iv) is not an addrec after the range is sharpened. It's weird but fixable I think.

efriedma added a subscriber: efriedma.Oct 14 2020, 10:30 AM

efriedma added inline comments.

llvm/lib/Analysis/ScalarEvolution.cpp
5731	I'm concerned that evaluating at iteration MaxBECount won't do what you want. In particular, if MaxBECount is larger than the number of steps required to violate nw, then the way you're comparing Start and End doesn't work. The actual backedge-taken count at runtime can't be that large, of course, but MaxBECount is an approximation in general; I don't think we make any relevant guarantees. In particular, if the step isn't constant, we can't get this right. I think you could explicitly compute the largest number of non-wrapping iterations, and then take the minimum of that and MaxBECount to compute the "real" maximum count.
5770	At first I wasn't sure what `isKnownPredicate(LEPred, Start, End))` was checking, but then I figured out it's accounting for the possibility of crossing INT_MAX/UINT_MAX, since nw doesn't exclude that. I think an alternate way of satisfying this condition would be to check for nsw/nuw.

fhahn added inline comments.Oct 14 2020, 10:59 AM

llvm/lib/Analysis/ScalarEvolution.cpp
5731	I think you could explicitly compute the largest number of non-wrapping iterations, and then take the minimum of that and MaxBECount to compute the "real" maximum count. Unrelated to the patch directly, but I suppose we could also use this to clamp down the computed MaxBECount directly, if we do not already do so? But we still won't have a reliable way to indicate that the computed count can be used directly I think.

mkazantsev added inline comments.Oct 14 2020, 10:18 PM

llvm/lib/Analysis/ScalarEvolution.cpp
5731	I don't think this situation is possible, but I don't have strong enough arguments to prove that. We can add an extra check that step * max num iterations must be known less than UINT_MAX for this type.

mkazantsev added inline comments.Oct 14 2020, 10:24 PM

llvm/lib/Analysis/ScalarEvolution.cpp
5731	I'll think about adding clamp into `MaxBECount` computation. I'm concerned that SCEV may not always be able to derive required facts from clamped expressions. Let's not do it in this patch and just add a check against this situation.

mkazantsev added inline comments.Oct 14 2020, 10:50 PM

llvm/lib/Analysis/ScalarEvolution.cpp
5770	`nuw/nsw` checks should be, in different ways, already incorporated inside of `isKnownPredicate`.

mkazantsev updated this revision to Diff 298307.Oct 14 2020, 11:05 PM

mkazantsev marked an inline comment as done.

Added check to address Eli's comment. Now we make sure that we can *prove* nw.

mkazantsev added a reviewer: efriedma.Oct 14 2020, 11:05 PM

LGTM

This revision is now accepted and ready to land.Oct 15 2020, 11:18 AM

mkazantsev added inline comments.Oct 15 2020, 8:21 PM

llvm/lib/Analysis/ScalarEvolution.cpp
5783	Actually we do not need to prove anything about V1. Knowing `start < end` && `step is positive` is enough, as well as `start > end` is negative.

mkazantsev planned changes to this revision.Oct 15 2020, 8:23 PM

Removed V1 check which is actually not needed.

This revision is now accepted and ready to land.Oct 15 2020, 9:26 PM

Closed by commit rG905101c36025: [SCEV] Use nw flag and symbolic iteration count to sharpen ranges of AddRecs (authored by mkazantsev). · Explain WhyOct 15 2020, 10:01 PM

This revision was automatically updated to reflect the committed changes.

mkazantsev added a commit: rG905101c36025: [SCEV] Use nw flag and symbolic iteration count to sharpen ranges of AddRecs.

nikic added a reverting change: rG7d3b475810ba: Revert "[SCEV] Use nw flag and symbolic iteration count to sharpen ranges of….Oct 16 2020, 12:47 AM

I've reverted this change because it causes large compile-time regressions without a commensurate change in codegen: https://llvm-compile-time-tracker.com/compare.php?from=cc175c2cc8e638462bab74e0781e06f9b6eb5017&to=905101c36025fe1c8ecdf9a20cd59db036676073&stat=instructions For example, we have a 4% regression on mafft without any codegen change. This either needs a more efficient implementation or a stronger motivation.

I'll try using less powerful proof methods here; isKnownPredicate might be an overkill.

mkazantsev added a commit: rG32b72c3165bf: Recommit "[SCEV] Use nw flag and symbolic iteration count to sharpen ranges of….Oct 16 2020, 3:36 AM

nikic added a reverting change: rG74c8c2d903e2: Revert "Recommit "[SCEV] Use nw flag and symbolic iteration count to sharpen….Oct 16 2020, 12:36 PM

fhahn mentioned this in D89586: [SCEV] Use the number of steps of an IV to clamp max BTC (WIP)..Oct 16 2020, 12:36 PM

fhahn added inline comments.Oct 16 2020, 12:40 PM

llvm/lib/Analysis/ScalarEvolution.cpp
5731	I started working on using this info for MaxBECount: D89586. Still needs some further work though.

I've reverted this change again for a couple of reasons:

First, the compile-time impact is now better, but still larger than the change justifies (at least without knowing the broader context): https://llvm-compile-time-tracker.com/compare.php?from=fbd62fe60fb2281ca33da35dc25ca3c87ec0bb51&to=32b72c3165bf65cca2e8e6197b59eb4c4b60392a&stat=instructions The mafft regression is no longer 4%, but it's still 3%.

Less importantly, I don't think the logic here is quite correct. The first issue is the use of the unionWith() API, which probably doesn't do what you want it to do. The way it is used here, it will return the smallest union of Start and End, which is not necessarily the range [Start, End], it can also be Start], [End, if we simplify this down to single-element ranges. Here is a test-case that I think illustrates the issue, if I didn't mess it up:

define i32 @test_02(i32 %start, i32* %p, i32* %q) {
; CHECK-LABEL: 'test_02'
; CHECK-NEXT:  Classifying expressions for: @test_02
; CHECK-NEXT:    %zext = zext i32 %start to i64
; CHECK-NEXT:    --> (zext i32 %start to i64) U: [0,4294967296) S: [0,4294967296)
; CHECK-NEXT:    %shl = shl i64 %zext, 31
; CHECK-NEXT:    --> (2147483648 * (zext i32 %start to i64))<nuw><nsw> U: [0,9223372034707292161) S: [0,9223372034707292161)
; CHECK-NEXT:    %indvars.iv = phi i64 [ %indvars.iv.next, %loop ], [ %shl, %entry ]
; CHECK-NEXT:    --> {(2147483648 * (zext i32 %start to i64))<nuw><nsw>,+,-1}<nsw><%loop> U: [-9223372036854775808,9223372034707292161) S: [-9223372036854775808,9223372034707292161) Exits: -9223372036854775806 LoopDispositions: { %loop: Computable }
; CHECK-NEXT:    %indvars.iv.next = add nsw i64 %indvars.iv, -1
; CHECK-NEXT:    --> {(-1 + (2147483648 * (zext i32 %start to i64))<nuw><nsw>)<nsw>,+,-1}<nw><%loop> U: full-set S: [-1,-9223372036854775806) Exits: -9223372036854775807 LoopDispositions: { %loop: Computable }
; CHECK-NEXT:  Determining loop execution counts for: @test_02
; CHECK-NEXT:  Loop %loop: backedge-taken count is (9223372036854775806 + (2147483648 * (zext i32 %start to i64))<nuw><nsw>)<nuw>
; CHECK-NEXT:  Loop %loop: max backedge-taken count is -2147483650
; CHECK-NEXT:  Loop %loop: Predicated backedge-taken count is (9223372036854775806 + (2147483648 * (zext i32 %start to i64))<nuw><nsw>)<nuw>
; CHECK-NEXT:   Predicates:
; CHECK:       Loop %loop: Trip multiple is 1
;
entry:
  %zext = zext i32 %start to i64
  %shl = shl i64 %zext, 31
  br label %loop

loop:                                             ; preds = %backedge, %entry
  %indvars.iv = phi i64 [ %indvars.iv.next, %loop ], [ %shl, %entry ]
  %cond = icmp eq i64 %indvars.iv, -9223372036854775806
  %indvars.iv.next = add nsw i64 %indvars.iv, -1
  br i1 %cond, label %exit, label %loop

exit:                                             ; preds = %loop
  ret i32 0
}

Note that the signed range of %indvars.iv.next is [-1,-9223372036854775806), while it should be [-9223372036854775807,9223372034707292161), give or take a few off by one errors.

This could be partially avoided by passing a sign preference to unionWith(). However, even with a sign preference, this may return a smaller wrapping range, if that avoid returning a full set. Similarly, the signed/unsigned ranges you request for start/end are also only sign preferences, you may still get back a wrapping range. I haven't followed it through in detail, but I don't think the logic is correct for wrapping ranges.

That's a good point, thanks! I'll see what I can do.

@nikic the output on your test is same with and without the patch. The point of checking of range's start and end is valid, but not in this case.

Scratch that, I used wrong revision.

@nikic I've checked in your test separately and added handling for it here, to make sure that its behavior does not change. Proof methods is reduced to isKnownPredicateViaConstantRanges which does not create new SCEVs and therefore should not lead to recursive behavior. Could you please confirm that the compile time is OK with this version?

mkazantsev added a commit: rGa10a64e7e334: [SCEV] Recommit "Use nw flag and symbolic iteration count to sharpen ranges of….Oct 19 2020, 9:33 PM

Unfortunately the compile-time issue still exists and is not materially affected by the switch to isKnownPredicateViaConstantRanges(). I've looked at callgrind profiles for partSalignmm.c, which has a 5% regression with this change, and found that the additional 166M instructions are spread roughly as follows:

72M: evaluateAtIteration
35M: isKnownPredicate for the TODO check
32M: Various SCEV expression construction for the TODO check
9M: computeMaxBackedgeTakenCount

Resolving the TODO would be a significant optimization, but unfortunately the largest part of the cost is inside evaluateAtIteration(), which seems like a more fundamental part of this calculation.

Is this patch being reverted again? It also broke the Polly buildbots:

Failed Tests (1):
  Polly :: ScopInfo/NonAffine/non-affine-loop-condition-dependent-access_3.ll

MaskRay added a reverting change: rGd9f91a3d1452: Revert D89381 "[SCEV] Recommit "Use nw flag and symbolic iteration count to….Oct 20 2020, 9:06 PM

In D89381#2343087, @Meinersbur wrote:
Is this patch being reverted again? It also broke the Polly buildbots:
Failed Tests (1):
  Polly :: ScopInfo/NonAffine/non-affine-loop-condition-dependent-access_3.ll

Verified this did break the Polly test. Reverted. Changed nikic to a blocking reviewer.

Thanks, The buildbot is green again: http://lab.llvm.org:8014/#/builders/5/builds/5

I would work on a fix (an integer range changed), but it would cause failure again if this patch was reverted.

How come that polly buildbot breach is a reason for a revert?

Hi @nikic, evaluateAtIteration is supposed to be completely free in this case, returning simply a+b. Do you have any info regarding what types of start/step are causing this?

mkazantsev added a child revision: D89548: [SCEV] Prove implications of different type via truncation.Oct 20 2020, 11:16 PM

In D89381#2343569, @mkazantsev wrote:

How come that polly buildbot breach is a reason for a revert?

The patch has been reverted multiple times (it is likely there are other unhandled issues) and the polly test demonstrates an internal difference which did not seem cosmetic. Plus nikic had several unaddressed comments. Hence the revert.

I think if we want to gatekeep new improvements based on their compile time impact, we should document that first in an appropriate documentation.

All comments have been addressed, except for the compile time issue, and we are trying to figure out what case we are dealing with. I have no info about what, how and in which config is being regressed. https://llvm-compile-time-tracker.com/ gives no info in this. This is not a part of LLVM test suite, and the change is not introducing a functional bug.

I would expect that, if the compile time impact is treated as a problem, this problem would be filed on bugs.llvm.com. I will gladly help to resolve this issue once I have info about what, how, in which circumstances and which environment has regressed. I don't think that reverting patches without providing info about problematic tests is a constructive approach that will move us anywhere.

As for Polly, I'm pretty sure it wouldn't be a big problem to update the test. It's not the part of core LLVM, and this patch can only sharpen the analyzes results. If the polly test suffers from more precise SCEV ranges, it's clearly not a SCEV problem.

Fixed Polly test. Looks like a clear improvement to me.

Herald added a reviewer: bollu. · View Herald TranscriptOct 21 2020, 2:06 AM

In D89381#2343856, @mkazantsev wrote:

Fixed Polly test. Looks like a clear improvement to me.

I agree, with this patch Polly is able to deduce that the range of k is between zero and 1024*1024 (instead of MAX_INT)

Now isKnownPredicateViaConstantRanges is used for TODO check too, and Step is limited to the constant. If it still is not helping, then I have no slightest clue what it can be and need a reproducer for the problem.

In D89381#2344199, @mkazantsev wrote:

Now isKnownPredicateViaConstantRanges is used for TODO check too, and Step is limited to the constant. If it still is not helping, then I have no slightest clue what it can be and need a reproducer for the problem.

From what I understand the revert commit specifically mentioned a large (+3%) regression in compile-time for mafft from CTMark, which is part of llvm-test-suite, built with -O3. And in particular mentioned a larger regression in a specific file of mafft (partSalignmm.c). IIUC it should be possible to reproduce the regression by building that file with -O3 for X86 with release builds (no assertions) with and without the change?

@nikic is there a way to measure the CT impact of this version without actually merging it? If it's still too bad, please provide me something that I can analyze (either C++ test + run instructions or IR test).

In D89381#2344217, @fhahn wrote:

In D89381#2344199, @mkazantsev wrote:

Now isKnownPredicateViaConstantRanges is used for TODO check too, and Step is limited to the constant. If it still is not helping, then I have no slightest clue what it can be and need a reproducer for the problem.

From what I understand the revert commit specifically mentioned a large (+3%) regression in compile-time for mafft from CTMark, which is part of llvm-test-suite, built with -O3. And in particular mentioned a larger regression in a specific file of mafft (partSalignmm.c). IIUC it should be possible to reproduce the regression by building that file with -O3 for X86 with release builds (no assertions) with and without the change?

I don't have it in my tree; is it https://github.com/llvm/llvm-test-suite/tree/master/CTMark what you are referencing? I can try this one...

In D89381#2344230, @mkazantsev wrote:

In D89381#2344217, @fhahn wrote:

In D89381#2344199, @mkazantsev wrote:

Now isKnownPredicateViaConstantRanges is used for TODO check too, and Step is limited to the constant. If it still is not helping, then I have no slightest clue what it can be and need a reproducer for the problem.

From what I understand the revert commit specifically mentioned a large (+3%) regression in compile-time for mafft from CTMark, which is part of llvm-test-suite, built with -O3. And in particular mentioned a larger regression in a specific file of mafft (partSalignmm.c). IIUC it should be possible to reproduce the regression by building that file with -O3 for X86 with release builds (no assertions) with and without the change?

I don't have it in my tree; is it https://github.com/llvm/llvm-test-suite/tree/master/CTMark what you are referencing? I can try this one...

Yes, it's part of the separate llvm-test-suite repo, the sources for mafft specifically are here https://github.com/llvm/llvm-test-suite/tree/master/MultiSource/Benchmarks/mafft. The CTMark directory only contains links to the actual sources, because it is only a small subset targeted at tracking compile-time.

It would probably be helpful to add a short snippet about how to build & run CTMark to reproduce the setup on the compile-time tracker website (clone, cmake, build and llvm-lit commands to run). And maybe linking to https://llvm.org/docs/TestSuiteGuide.html (and improving this page :))

Thank you @fhahn for providing some context. I have updated https://llvm-compile-time-tracker.com/about.php to add some basic reproducing instructions as well.

For reference, here are the results for the current patch, which still has roughly the same characteristics: https://llvm-compile-time-tracker.com/compare.php?from=4b7dafd9046f0ceaadacaafe0ea4a1fb00cf70a5&to=c713c1ef75612a3dae1ef4b99f766e0c7784381f&stat=instructions

In D89381#2343573, @mkazantsev wrote:

Hi @nikic, evaluateAtIteration is supposed to be completely free in this case, returning simply a+b. Do you have any info regarding what types of start/step are causing this?

I added some debug output and compiled the previously mentioned partSalignmm.c test case, with the following output: https://gist.github.com/nikic/40d676b9eeacee3ed38cb896a111e9f1 I couldn't say which ones of these are particularly slow though, and nothing seems particularly out of the ordinary there.

As you say, evaluateAtIteration() is simply a+b in this case, but this is not necessarily free, because potentially expensive expression folding may be involved. That expression folding is uncached, unless the expression doesn't fold.

I think the core issue here is that this optimization is a bit of a layering violation: All the existing getRangeRef() logic just recursively combines ranges of existing SCEV expressions. It does not create any new SCEV expressions. (Apart from calculating the constant max BE count, which is at least a cached calculation.)

I was going to suggest that we could use getSCEVAtScope on the parent loop to at least make this a cached addrec exit value, but that would not cover your motivating example (though probably would cover most other test changes). Unfortunately I'm not familiar enough with SCEV to make a good suggestion here.

What's the impact on O1 and O2 modes?

mkazantsev added a comment.Oct 21 2020, 9:52 PM

This comment was removed by mkazantsev.

Let's try to cache symbolic max BE count, it's the only thing that still can be improved here.

mkazantsev added a parent revision: D89954: [SCEV][NFC] Cache symbolic max exit count.Oct 22 2020, 6:00 AM

Rebased using caching version of BE count computation. Should be cheaper now.

@nikic @fhahn thanks for help with the reproduction! I think I can reproduce the problem locally now. Investigating where the ticks are spent and why.

Need to investigate why End computation is so expensive.

The common pattern where the CT drops that I ovserve is following: Start is a constant (in most cases zero), Step is also a constant and MaxBECount is a pretty big expression. The time is spent in attempts to prove nuw/nsw flags on End computation in this case.

mkazantsev updated this revision to Diff 300208.Oct 23 2020, 3:13 AM

mkazantsev edited the summary of this revision. (Show Details)

Herald added a subscriber: javed.absar. · View Herald TranscriptOct 23 2020, 3:13 AM

Here's the study that I made on the example of rdopt_coding_state. Units below are Ir (hopefully it's "instructions retired") on compilation with clang O3.

Almost queries are made for start = 0, constant step & BE count expression that varies from small to really huge expressions.
IR without patch: 220370413
IR with previous version of patch (no SCEV size limits): 224114149 (2% regression).
When I started limiting the size of IRs involved in transform, the dynamics was following:
Limit = 2: 220517261
Limit = 3: 221761742
Limit = 4-6: 222841240

So starting size 4, we have 1% regression on this test (but wide picture might be different). In the patch I set the limit of 6 because it's the minimum value needed to pass all tests in the way as they passed without limits. @nikic could you please evaluate this one?

Here are the numbers for the new patch: https://llvm-compile-time-tracker.com/compare.php?from=13edfcc97d29574d1d38ded4fa9c2af6e6519472&to=45962d4244d46b1df0618701686a8756c6fb6b78&stat=instructions The geomean regression is now at 0.6% and the mafft regression is at a bit over 2%.

In that case, I see no other way than merge it with threshold 0 for clang. Frankly, I don't care if we lose 2% CT on some corner case tests if we get better SCEV ranges from it. Limiting SCEV sizes is a dirty and unstable hack, so it's better be an "all or nothing" flag.

mkazantsev updated this revision to Diff 300581.Oct 25 2020, 11:23 PM

mkazantsev edited the summary of this revision. (Show Details)

It would be great to get some benchmark data to see runtime performance / improvements as well.

This patch alone does not give any improvements itself. However, it's an important enabling step for fighting incomplete IV widening (see dependent patches), which is a source of performance problems in many various cases, mostly with decrementing loops. I only ran Java benchmarks and see up to 30% boosts in some cases.

mkazantsev mentioned this in D89692: [SCEV] SCEVPtrToIntExpr simplifications.Oct 27 2020, 3:24 AM

@nikic @MaskRay any reasons to keep this out? It's effectively an NFC now. If we want to further measure its impact on compile time & performance of clang benchmarks, it can be a separate change switching the flag value.

@nikic any reason for this compile time re-measuring dance instead of explaining the onboarding process?
(I.e. telling https://github.com/max-azul to make a github fork off https://github.com/llvm/llvm-project
into https://github.com/max-azul/llvm-project and adding that as a tracked remote for http://llvm-compile-time-tracker.com/ ?
(then, push branch to be measured as perf/<...>, and it will get measured))

Current numbers: (+0.6% geomean)
https://llvm-compile-time-tracker.com/compare.php?from=0ac56e8eaaeb41b59a952fa465872f647c930347&to=b36e80eaf4b4e35c74d814babe722160ef164f9b&stat=instructions

In D89381#2355899, @mkazantsev wrote:

@nikic @MaskRay any reasons to keep this out? It's effectively an NFC now. If we want to further measure its impact on compile time & performance of clang benchmarks, it can be a separate change switching the flag value.

I don't see any reasons, especially if this doesn't change the default just yet.

LG for landing under the flag.

I already landed a couple of SCEV improvements on Sunday that improved compile-time (while also improving nowrap inference), and am still pursuing some more leads, so I hope we can drop the flag in the future.

llvm/lib/Analysis/ScalarEvolution.cpp
5735	You can use the recently added getMinusOne() here.
5764	It might make more sense to use `getRangeRef(Start, SignHint)` and `getRangeRef(End, SignHint)` here, rather than going SignHint -> IsSigned -> SignHint.
5773	You can use isWrappedSet() and isSignWrappedSet() here (there are some special cases due to the half-open range).

This revision is now accepted and ready to land.Oct 27 2020, 10:23 AM

mkazantsev added inline comments.Oct 27 2020, 10:06 PM

llvm/lib/Analysis/ScalarEvolution.cpp
5764	Good point!
5773	Will do.

Closed by commit rG624fc63a0510: [SCEV] Re-enable "Use nw flag and symbolic iteration count to sharpen ranges of… (authored by mkazantsev). · Explain WhyOct 27 2020, 10:41 PM

This revision was automatically updated to reflect the committed changes.

mkazantsev marked an inline comment as done.

mkazantsev added a commit: rG624fc63a0510: [SCEV] Re-enable "Use nw flag and symbolic iteration count to sharpen ranges of….

Revision Contents

Path

Size

llvm/

include/

llvm/

Analysis/

ScalarEvolution.h

7 lines

lib/

Analysis/

ScalarEvolution.cpp

84 lines

test/

Analysis/

ScalarEvolution/

no-wrap-symbolic-becount.ll

8 lines

Diff 301182

llvm/include/llvm/Analysis/ScalarEvolution.h

Show First 20 Lines • Show All 1,501 Lines • ▼ Show 20 Lines	private:
/// copied if its needed for longer.		/// copied if its needed for longer.
const ConstantRange &getRangeRef(const SCEV *S, RangeSignHint Hint);		const ConstantRange &getRangeRef(const SCEV *S, RangeSignHint Hint);

/// Determines the range for the affine SCEVAddRecExpr {\p Start,+,\p Stop}.		/// Determines the range for the affine SCEVAddRecExpr {\p Start,+,\p Stop}.
/// Helper for \c getRange.		/// Helper for \c getRange.
ConstantRange getRangeForAffineAR(const SCEV Start, const SCEV Stop,		ConstantRange getRangeForAffineAR(const SCEV Start, const SCEV Stop,
const SCEV *MaxBECount, unsigned BitWidth);		const SCEV *MaxBECount, unsigned BitWidth);

		/// Determines the range for the affine non-self-wrapping SCEVAddRecExpr {\p
		/// Start,+,\p Stop}<nw>.
		ConstantRange getRangeForAffineNoSelfWrappingAR(const SCEVAddRecExpr *AddRec,
		const SCEV *MaxBECount,
		unsigned BitWidth,
		RangeSignHint SignHint);

/// Try to compute a range for the affine SCEVAddRecExpr {\p Start,+,\p		/// Try to compute a range for the affine SCEVAddRecExpr {\p Start,+,\p
/// Stop} by "factoring out" a ternary expression from the add recurrence.		/// Stop} by "factoring out" a ternary expression from the add recurrence.
/// Helper called by \c getRange.		/// Helper called by \c getRange.
ConstantRange getRangeViaFactoring(const SCEV Start, const SCEV Stop,		ConstantRange getRangeViaFactoring(const SCEV Start, const SCEV Stop,
const SCEV *MaxBECount, unsigned BitWidth);		const SCEV *MaxBECount, unsigned BitWidth);

/// We know that there is no SCEV for the specified value. Analyze the		/// We know that there is no SCEV for the specified value. Analyze the
/// expression.		/// expression.
▲ Show 20 Lines • Show All 631 Lines • Show Last 20 Lines

llvm/lib/Analysis/ScalarEvolution.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 220 Lines • ▼ Show 20 Lines	HugeExprThreshold("scalar-evolution-huge-expr-threshold", cl::Hidden,
cl::desc("Size of the expression which is considered huge"),		cl::desc("Size of the expression which is considered huge"),
cl::init(4096));		cl::init(4096));

static cl::opt<bool>		static cl::opt<bool>
ClassifyExpressions("scalar-evolution-classify-expressions",		ClassifyExpressions("scalar-evolution-classify-expressions",
cl::Hidden, cl::init(true),		cl::Hidden, cl::init(true),
cl::desc("When printing analysis, include information on every instruction"));		cl::desc("When printing analysis, include information on every instruction"));

		static cl::opt<bool> UseExpensiveRangeSharpening(
		"scalar-evolution-use-expensive-range-sharpening", cl::Hidden,
		cl::init(false),
		cl::desc("Use more powerful methods of sharpening expression ranges. May "
		"be costly in terms of compile time"));

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// SCEV class definitions		// SCEV class definitions
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Implementation of the SCEV class.		// Implementation of the SCEV class.
//		//
▲ Show 20 Lines • Show All 5,285 Lines • ▼ Show 20 Lines	if (AddRec->isAffine()) {
ConservativeResult.intersectWith(RangeFromAffine, RangeType);		ConservativeResult.intersectWith(RangeFromAffine, RangeType);

auto RangeFromFactoring = getRangeViaFactoring(		auto RangeFromFactoring = getRangeViaFactoring(
AddRec->getStart(), AddRec->getStepRecurrence(*this), MaxBECount,		AddRec->getStart(), AddRec->getStepRecurrence(*this), MaxBECount,
BitWidth);		BitWidth);
ConservativeResult =		ConservativeResult =
ConservativeResult.intersectWith(RangeFromFactoring, RangeType);		ConservativeResult.intersectWith(RangeFromFactoring, RangeType);
}		}

		// Now try symbolic BE count and more powerful methods.
		if (UseExpensiveRangeSharpening) {
		const SCEV *SymbolicMaxBECount =
		getSymbolicMaxBackedgeTakenCount(AddRec->getLoop());
		if (!isa<SCEVCouldNotCompute>(SymbolicMaxBECount) &&
		getTypeSizeInBits(MaxBECount->getType()) <= BitWidth &&
		AddRec->hasNoSelfWrap()) {
		auto RangeFromAffineNew = getRangeForAffineNoSelfWrappingAR(
		AddRec, SymbolicMaxBECount, BitWidth, SignHint);
		ConservativeResult =
		ConservativeResult.intersectWith(RangeFromAffineNew, RangeType);
		}
		}
}		}

return setRange(AddRec, SignHint, std::move(ConservativeResult));		return setRange(AddRec, SignHint, std::move(ConservativeResult));
}		}

if (const SCEVUnknown *U = dyn_cast<SCEVUnknown>(S)) {		if (const SCEVUnknown *U = dyn_cast<SCEVUnknown>(S)) {
// Check if the IR explicitly contains !range metadata.		// Check if the IR explicitly contains !range metadata.
Optional<ConstantRange> MDRange = GetRangeFromMetadata(U->getValue());		Optional<ConstantRange> MDRange = GetRangeFromMetadata(U->getValue());
▲ Show 20 Lines • Show All 153 Lines • ▼ Show 20 Lines	ConstantRange ScalarEvolution::getRangeForAffineAR(const SCEV *Start,
ConstantRange UR = getRangeForAffineARHelper(		ConstantRange UR = getRangeForAffineARHelper(
getUnsignedRangeMax(Step), getUnsignedRange(Start),		getUnsignedRangeMax(Step), getUnsignedRange(Start),
MaxBECountValue, BitWidth, /* Signed = */ false);		MaxBECountValue, BitWidth, /* Signed = */ false);

// Finally, intersect signed and unsigned ranges.		// Finally, intersect signed and unsigned ranges.
return SR.intersectWith(UR, ConstantRange::Smallest);		return SR.intersectWith(UR, ConstantRange::Smallest);
}		}

		ConstantRange ScalarEvolution::getRangeForAffineNoSelfWrappingAR(
		const SCEVAddRecExpr AddRec, const SCEV MaxBECount, unsigned BitWidth,
		ScalarEvolution::RangeSignHint SignHint) {
		assert(AddRec->isAffine() && "Non-affine AddRecs are not suppored!\n");
		assert(AddRec->hasNoSelfWrap() &&
		"This only works for non-self-wrapping AddRecs!");
		const bool IsSigned = SignHint == HINT_RANGE_SIGNED;
		const SCEV Step = AddRec->getStepRecurrence(this);
		// Only deal with constant step to save compile time.
		if (!isa<SCEVConstant>(Step))
		return ConstantRange::getFull(BitWidth);
		// Let's make sure that we can prove that we do not self-wrap during
		// MaxBECount iterations. We need this because MaxBECount is a maximum
		// iteration count estimate, and we might infer nw from some exit for which we
		efriedmaUnsubmitted Done Reply Inline Actions I'm concerned that evaluating at iteration MaxBECount won't do what you want. In particular, if MaxBECount is larger than the number of steps required to violate nw, then the way you're comparing Start and End doesn't work. The actual backedge-taken count at runtime can't be that large, of course, but MaxBECount is an approximation in general; I don't think we make any relevant guarantees. In particular, if the step isn't constant, we can't get this right. I think you could explicitly compute the largest number of non-wrapping iterations, and then take the minimum of that and MaxBECount to compute the "real" maximum count. efriedma: I'm concerned that evaluating at iteration MaxBECount won't do what you want. In particular…
		fhahnUnsubmitted Not Done Reply Inline Actions I think you could explicitly compute the largest number of non-wrapping iterations, and then take the minimum of that and MaxBECount to compute the "real" maximum count. Unrelated to the patch directly, but I suppose we could also use this to clamp down the computed MaxBECount directly, if we do not already do so? But we still won't have a reliable way to indicate that the computed count can be used directly I think. fhahn: > I think you could explicitly compute the largest number of non-wrapping iterations, and then…
		mkazantsevAuthorUnsubmitted Done Reply Inline Actions I don't think this situation is possible, but I don't have strong enough arguments to prove that. We can add an extra check that step * max num iterations must be known less than UINT_MAX for this type. mkazantsev: I don't think this situation is possible, but I don't have strong enough arguments to prove…
		mkazantsevAuthorUnsubmitted Done Reply Inline Actions I'll think about adding clamp into `MaxBECount` computation. I'm concerned that SCEV may not always be able to derive required facts from clamped expressions. Let's not do it in this patch and just add a check against this situation. mkazantsev: I'll think about adding clamp into `MaxBECount` computation. I'm concerned that SCEV may not…
		fhahnUnsubmitted Not Done Reply Inline Actions I started working on using this info for MaxBECount: D89586. Still needs some further work though. fhahn: I started working on using this info for MaxBECount: D89586. Still needs some further work…
		// do not know max exit count (or any other side reasoning).
		// TODO: Turn into assert at some point.
		MaxBECount = getNoopOrZeroExtend(MaxBECount, AddRec->getType());
		const SCEV *RangeWidth = getMinusOne(AddRec->getType());
		nikicUnsubmitted Done Reply Inline Actions You can use the recently added getMinusOne() here. nikic: You can use the recently added getMinusOne() here.
		const SCEV *StepAbs = getUMinExpr(Step, getNegativeSCEV(Step));
		const SCEV *MaxItersWithoutWrap = getUDivExpr(RangeWidth, StepAbs);
		if (!isKnownPredicateViaConstantRanges(ICmpInst::ICMP_ULE, MaxBECount,
		MaxItersWithoutWrap))
		return ConstantRange::getFull(BitWidth);

		ICmpInst::Predicate LEPred =
		IsSigned ? ICmpInst::ICMP_SLE : ICmpInst::ICMP_ULE;
		ICmpInst::Predicate GEPred =
		IsSigned ? ICmpInst::ICMP_SGE : ICmpInst::ICMP_UGE;
		const SCEV End = AddRec->evaluateAtIteration(MaxBECount, this);

		// We know that there is no self-wrap. Let's take Start and End values and
		// look at all intermediate values V1, V2, ..., Vn that IndVar takes during
		// the iteration. They either lie inside the range [Min(Start, End),
		// Max(Start, End)] or outside it:
		//
		// Case 1: RangeMin ... Start V1 ... VN End ... RangeMax;
		// Case 2: RangeMin Vk ... V1 Start ... End Vn ... Vk + 1 RangeMax;
		//
		// No self wrap flag guarantees that the intermediate values cannot be BOTH
		// outside and inside the range [Min(Start, End), Max(Start, End)]. Using that
		// knowledge, let's try to prove that we are dealing with Case 1. It is so if
		// Start <= End and step is positive, or Start >= End and step is negative.
		const SCEV *Start = AddRec->getStart();
		ConstantRange StartRange = getRangeRef(Start, SignHint);
		ConstantRange EndRange = getRangeRef(End, SignHint);
		ConstantRange RangeBetween = StartRange.unionWith(EndRange);
		// If they already cover full iteration space, we will know nothing useful
		nikicUnsubmitted Not Done Reply Inline Actions It might make more sense to use `getRangeRef(Start, SignHint)` and `getRangeRef(End, SignHint)` here, rather than going SignHint -> IsSigned -> SignHint. nikic: It might make more sense to use `getRangeRef(Start, SignHint)` and `getRangeRef(End, SignHint)`…
		mkazantsevAuthorUnsubmitted Done Reply Inline Actions Good point! mkazantsev: Good point!
		// even if we prove what we want to prove.
		if (RangeBetween.isFullSet())
		return RangeBetween;
		// Only deal with ranges that do not wrap (i.e. RangeMin < RangeMax).
		bool IsWrappedSet = IsSigned ? RangeBetween.isSignWrappedSet()
		: RangeBetween.isWrappedSet();
		efriedmaUnsubmitted Not Done Reply Inline Actions At first I wasn't sure what `isKnownPredicate(LEPred, Start, End))` was checking, but then I figured out it's accounting for the possibility of crossing INT_MAX/UINT_MAX, since nw doesn't exclude that. I think an alternate way of satisfying this condition would be to check for nsw/nuw. efriedma: At first I wasn't sure what `isKnownPredicate(LEPred, Start, End))` was checking, but then I…
		mkazantsevAuthorUnsubmitted Done Reply Inline Actions `nuw/nsw` checks should be, in different ways, already incorporated inside of `isKnownPredicate`. mkazantsev: `nuw/nsw` checks should be, in different ways, already incorporated inside of…
		if (IsWrappedSet)
		return ConstantRange::getFull(BitWidth);

		nikicUnsubmitted Not Done Reply Inline Actions You can use isWrappedSet() and isSignWrappedSet() here (there are some special cases due to the half-open range). nikic: You can use isWrappedSet() and isSignWrappedSet() here (there are some special cases due to the…
		mkazantsevAuthorUnsubmitted Done Reply Inline Actions Will do. mkazantsev: Will do.
		if (isKnownPositive(Step) &&
		isKnownPredicateViaConstantRanges(LEPred, Start, End))
		return RangeBetween;
		else if (isKnownNegative(Step) &&
		isKnownPredicateViaConstantRanges(GEPred, Start, End))
		return RangeBetween;
		return ConstantRange::getFull(BitWidth);
		}

ConstantRange ScalarEvolution::getRangeViaFactoring(const SCEV *Start,		ConstantRange ScalarEvolution::getRangeViaFactoring(const SCEV *Start,
		mkazantsevAuthorUnsubmitted Done Reply Inline Actions Actually we do not need to prove anything about V1. Knowing `start < end` && `step is positive` is enough, as well as `start > end` is negative. mkazantsev: Actually we do not need to prove anything about V1. Knowing `start < end` && `step is positive`…
const SCEV *Step,		const SCEV *Step,
const SCEV *MaxBECount,		const SCEV *MaxBECount,
unsigned BitWidth) {		unsigned BitWidth) {
// RangeOf({C?A:B,+,C?P:Q}) == RangeOf(C?{A,+,P}:{B,+,Q})		// RangeOf({C?A:B,+,C?P:Q}) == RangeOf(C?{A,+,P}:{B,+,Q})
// == RangeOf({A,+,P}) union RangeOf({B,+,Q})		// == RangeOf({A,+,P}) union RangeOf({B,+,Q})

struct SelectPattern {		struct SelectPattern {
Value *Condition = nullptr;		Value *Condition = nullptr;
▲ Show 20 Lines • Show All 7,162 Lines • Show Last 20 Lines

llvm/test/Analysis/ScalarEvolution/no-wrap-symbolic-becount.ll

	; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
	; RUN: opt < %s -S -analyze -scalar-evolution -enable-new-pm=0 \| FileCheck %s			; RUN: opt < %s -S -scalar-evolution-use-expensive-range-sharpening -analyze -scalar-evolution -enable-new-pm=0 \| FileCheck %s
	; RUN: opt < %s -S -passes='print<scalar-evolution>' 2>&1 \| FileCheck %s			; RUN: opt < %s -S -scalar-evolution-use-expensive-range-sharpening -passes='print<scalar-evolution>' 2>&1 \| FileCheck %s

	define i32 @test_01(i32 %start, i32* %p, i32* %q) {			define i32 @test_01(i32 %start, i32* %p, i32* %q) {
	; CHECK-LABEL: 'test_01'			; CHECK-LABEL: 'test_01'
	; CHECK-NEXT: Classifying expressions for: @test_01			; CHECK-NEXT: Classifying expressions for: @test_01
	; CHECK-NEXT: %0 = zext i32 %start to i64			; CHECK-NEXT: %0 = zext i32 %start to i64
	; CHECK-NEXT: --> (zext i32 %start to i64) U: [0,4294967296) S: [0,4294967296)			; CHECK-NEXT: --> (zext i32 %start to i64) U: [0,4294967296) S: [0,4294967296)
	; CHECK-NEXT: %indvars.iv = phi i64 [ %indvars.iv.next, %backedge ], [ %0, %entry ]			; CHECK-NEXT: %indvars.iv = phi i64 [ %indvars.iv.next, %backedge ], [ %0, %entry ]
	; CHECK-NEXT: --> {(zext i32 %start to i64),+,-1}<nsw><%loop> U: [-4294967295,4294967296) S: [-4294967295,4294967296) Exits: <<Unknown>> LoopDispositions: { %loop: Computable }			; CHECK-NEXT: --> {(zext i32 %start to i64),+,-1}<nsw><%loop> U: [0,4294967296) S: [0,4294967296) Exits: <<Unknown>> LoopDispositions: { %loop: Computable }
				mkazantsevAuthorUnsubmitted Done Reply Inline Actions Unsigned range here can also be made [0,0,4294967296)... mkazantsev: Unsigned range here can also be made [0,0,4294967296)...
	; CHECK-NEXT: %iv = phi i32 [ %start, %entry ], [ %iv.next, %backedge ]			; CHECK-NEXT: %iv = phi i32 [ %start, %entry ], [ %iv.next, %backedge ]
	; CHECK-NEXT: --> {%start,+,-1}<%loop> U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Computable }			; CHECK-NEXT: --> {%start,+,-1}<%loop> U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Computable }
	; CHECK-NEXT: %iv.next = add i32 %iv, -1			; CHECK-NEXT: %iv.next = add i32 %iv, -1
	; CHECK-NEXT: --> {(-1 + %start),+,-1}<%loop> U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Computable }			; CHECK-NEXT: --> {(-1 + %start),+,-1}<%loop> U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Computable }
	; CHECK-NEXT: %index = zext i32 %iv.next to i64			; CHECK-NEXT: %index = zext i32 %iv.next to i64
	; CHECK-NEXT: --> (zext i32 {(-1 + %start),+,-1}<%loop> to i64) U: [0,4294967296) S: [0,4294967296) Exits: <<Unknown>> LoopDispositions: { %loop: Computable }			; CHECK-NEXT: --> (zext i32 {(-1 + %start),+,-1}<%loop> to i64) U: [0,4294967296) S: [0,4294967296) Exits: <<Unknown>> LoopDispositions: { %loop: Computable }
	; CHECK-NEXT: %store.addr = getelementptr i32, i32* %p, i64 %index			; CHECK-NEXT: %store.addr = getelementptr i32, i32* %p, i64 %index
	; CHECK-NEXT: --> ((4 * (zext i32 {(-1 + %start),+,-1}<%loop> to i64))<nuw><nsw> + %p) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Computable }			; CHECK-NEXT: --> ((4 * (zext i32 {(-1 + %start),+,-1}<%loop> to i64))<nuw><nsw> + %p) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Computable }
	; CHECK-NEXT: %load.addr = getelementptr i32, i32* %q, i64 %index			; CHECK-NEXT: %load.addr = getelementptr i32, i32* %q, i64 %index
	; CHECK-NEXT: --> ((4 * (zext i32 {(-1 + %start),+,-1}<%loop> to i64))<nuw><nsw> + %q) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Computable }			; CHECK-NEXT: --> ((4 * (zext i32 {(-1 + %start),+,-1}<%loop> to i64))<nuw><nsw> + %q) U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Computable }
	; CHECK-NEXT: %stop = load i32, i32* %load.addr, align 4			; CHECK-NEXT: %stop = load i32, i32* %load.addr, align 4
	; CHECK-NEXT: --> %stop U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant }			; CHECK-NEXT: --> %stop U: full-set S: full-set Exits: <<Unknown>> LoopDispositions: { %loop: Variant }
	; CHECK-NEXT: %indvars.iv.next = add nsw i64 %indvars.iv, -1			; CHECK-NEXT: %indvars.iv.next = add nsw i64 %indvars.iv, -1
	; CHECK-NEXT: --> {(-1 + (zext i32 %start to i64))<nsw>,+,-1}<nsw><%loop> U: [-4294967296,4294967295) S: [-4294967296,4294967295) Exits: <<Unknown>> LoopDispositions: { %loop: Computable }			; CHECK-NEXT: --> {(-1 + (zext i32 %start to i64))<nsw>,+,-1}<nsw><%loop> U: [-4294967296,4294967295) S: [-1,4294967295) Exits: <<Unknown>> LoopDispositions: { %loop: Computable }
	; CHECK-NEXT: Determining loop execution counts for: @test_01			; CHECK-NEXT: Determining loop execution counts for: @test_01
	; CHECK-NEXT: Loop %loop: <multiple exits> Unpredictable backedge-taken count.			; CHECK-NEXT: Loop %loop: <multiple exits> Unpredictable backedge-taken count.
	; CHECK-NEXT: exit count for loop: (zext i32 %start to i64)			; CHECK-NEXT: exit count for loop: (zext i32 %start to i64)
	; CHECK-NEXT: exit count for backedge: *COULDNOTCOMPUTE*			; CHECK-NEXT: exit count for backedge: *COULDNOTCOMPUTE*
	; CHECK-NEXT: Loop %loop: max backedge-taken count is 4294967295			; CHECK-NEXT: Loop %loop: max backedge-taken count is 4294967295
	; CHECK-NEXT: Loop %loop: Unpredictable predicated backedge-taken count.			; CHECK-NEXT: Loop %loop: Unpredictable predicated backedge-taken count.
	;			;
	entry:			entry:
	▲ Show 20 Lines • Show All 60 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] Re-enable "Use nw flag and symbolic iteration count to sharpen ranges of AddRecs", attempt 3ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 301182

llvm/include/llvm/Analysis/ScalarEvolution.h

llvm/lib/Analysis/ScalarEvolution.cpp

llvm/test/Analysis/ScalarEvolution/no-wrap-symbolic-becount.ll

[SCEV] Re-enable "Use nw flag and symbolic iteration count to sharpen ranges of AddRecs", attempt 3
ClosedPublic