This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] limit recursion depth of CompareSCEVComplexity
ClosedPublic

Authored by dfukalov on Nov 8 2016, 3:22 AM.

Details

Summary

CompareSCEVComplexity goes too deep (50+ on a quite a big unrolled loop) and runs almost infinite time.

Added cache of "equal" SCEV pairs to earlier cutoff of further estimation. Recursion depth limit was also introduced as a parameter.

Diff Detail

Repository
rL LLVM

Event Timeline

dfukalov updated this revision to Diff 77163.Nov 8 2016, 3:22 AM
dfukalov retitled this revision from to [SCEV] limit recursion depth of CompareSCEVComplexity.
dfukalov updated this object.
dfukalov added a reviewer: sanjoy.
dfukalov added a subscriber: llvm-commits.

current test input .ll is quite big (~1000 lines). I'm trying to reduce it, but not sure I'll be able to. Should I add it's big version to the change?

sanjoy edited edge metadata.Nov 8 2016, 9:32 AM

Hi Dmitry,

Please upload the patch with context (http://llvm.org/docs/Phabricator.html)

Have you checked if this is solely due to the depth of the expressions, or if this is due to some exponential behavior? Can this be solved using a cache, like in CompareValueComplexity (though it also has a Depth so perhaps it isn't a good example).

As far as the test case goes, you can always add a C++ test case to unittests/ that generates the IR or SCEV expression in memory. I agree with you in that we should avoid adding excessively large test files to tests/

sanjoy requested changes to this revision.Nov 8 2016, 9:33 AM
sanjoy edited edge metadata.
This revision now requires changes to proceed.Nov 8 2016, 9:33 AM
dfukalov updated this revision to Diff 77322.Nov 9 2016, 2:08 AM
dfukalov edited edge metadata.

Hi Sanjoy,

the issue is in SLSR pass when it tries to construct with getAddExpr and getMulExpr on a source that is similar to test case from https://llvm.org/bugs/show_bug.cgi?id=28721 but much more bigger: >1000 lines of muls+adds in main block and has 8 phis instead of 2.

Tom,
as I mentioned above, the input caused issue is very similar to test case of bug 28721 and in my case it hangs in SLSR pass too. But both bugs are not reproducible for me.

I'll try to create a test case by expanding one from bug 28721.

Hi Sanjoy,

I've reduced testcase to 400 lines. Then I tried to use cache as you suggested and it fixes the issue indeed. I'm going to re-test and up[date patch.

But I would add recursion depth control since there is no guarantee the cache will help in all possible cases. For example, during test case reduction I had input variants of input that caused hangs in CompareSCEVComplexity called from other (not SLSR) passes. What do you think? If you agree with recursion depth, should it be new parameter (as in my first implementation) or a hard-coded value as for CompareValueComplexity?

sanjoy added a comment.Nov 9 2016, 4:23 PM

Hi Daniil,

I've reduced testcase to 400 lines. Then I tried to use cache as you suggested and it fixes the issue indeed. I'm going to re-test and up[date patch.

Great!

As I said before, if the test case is repetitive then it is probably better to generate the IR in memory using IRBuilder and put the test case in unittests/. Otherwise a 400 line file does not sound that bad.

But I would add recursion depth control since there is no guarantee the cache will help in all possible cases. For example, during test case reduction I had input variants of input that caused hangs in CompareSCEVComplexity called from other (not SLSR) passes.

I think a recursion depth as a stopgap measure is fine. For now I'd add a max depth as a cl::opt and use the same depth in both CompareValueComplexity and CompareSCEVComplexity (that is, don't "forget" the depth when CompareSCEVComplexity calls into CompareValueComplexity).

I'd make this max depth fairly high by default (perhaps 64 or something) since, as I understand it, the caching should take care of avoiding any exponential behavior, and if the we're spending lots of time in CompareSCEVComplexity then the SCEV expression itself is very complex. In this (latter) case it is defensible for CompareSCEVComplexity to take time (it simply has more work to do), but only up to a point.

What do you think? If you agree with recursion depth, should it be new parameter (as in my first implementation) or a hard-coded value as for CompareValueComplexity?

dfukalov updated this revision to Diff 77530.Nov 10 2016, 12:33 PM
dfukalov updated this object.
dfukalov edited edge metadata.

Hi Sanjoy,

I've updated the change and added unittest. I made it since reduced to 400 lines testcase can be compiled by ~10 sec on a fast machine without the fix, but the suggested unittest has "softhang" behavior.

For my initial big testcase recursion depth was up to 50+ an I never completed the compilation on a quite fast CPU. So I suggest to set it 32 by default.

Hi Sanjoy, would you please check the updated diff and test?

sanjoy accepted this revision.Nov 14 2016, 10:13 AM
sanjoy edited edge metadata.

lgtm

lib/Analysis/ScalarEvolution.cpp
704 ↗(On Diff #77530)

IMO a slightly better pattern would have been to have EqCache be a SmallSet of std::pair<PointerUnion<Value *, const SCEV *>, PointerUnion<Value *, const SCEV *>> and to re-use the same cache in CompareValueComplexity.

If you agree, can you do that as a followup change?

This revision is now accepted and ready to land.Nov 14 2016, 10:13 AM
dfukalov added inline comments.Nov 14 2016, 2:56 PM
lib/Analysis/ScalarEvolution.cpp
704 ↗(On Diff #77530)

I'm not sure this will actually be a re-use: any correct SCEV * is not equal to any correct Value * (except nullptr values), so the combined cache will contain all of items from both current caches, without memory usage reducing. And if it so, any search in a combined cache will be the same or slower.
E.g. there is an edge case: we can try to find actual value typed std::pair<const SCEV*, const SCEV*> in a set that contains std::pair<Value*, Value*> only. Do you agree?

sanjoy added inline comments.Nov 14 2016, 3:05 PM
lib/Analysis/ScalarEvolution.cpp
704 ↗(On Diff #77530)

We might win in some edge cases by since we'll persist the same value cache across multiple SCEVUnknown instances in the same SCEV tree. That is:

%x = ...
%y = ...
%x1 = add %x, 20
%y1 = add %y, 20

If LHS has SCEVUnknowns for %x and %x1, and RHS has SCEVUnknowns for %y and %y1, and then a combined cache will save us from recomputing CompareValueComplexity(%x, %y) twice.

But this is a minimal gain -- please don't block the commit on getting this right.

sanjoy added inline comments.Nov 14 2016, 3:10 PM
lib/Analysis/ScalarEvolution.cpp
704 ↗(On Diff #77530)

So, for instance, persisting a (Value *, Value *) set for the entirety of CompareSCEVComplexity will also do the same thing (and will probably be mildly cleaner).

dfukalov added inline comments.Nov 14 2016, 3:31 PM
lib/Analysis/ScalarEvolution.cpp
704 ↗(On Diff #77530)

Yes, I got it. I'll try to get a statistic on my hard case to check actual sizes of both caches (for an entire CompareSCEVComplexity call tree). It seems if they will be quite big and will have almost same size it will be better to combine caches of pairs (Value *, Value *) separately.

What do you think about improving caches further to cache all already estimated values, not only zeros?

sanjoy added inline comments.Nov 14 2016, 3:47 PM
lib/Analysis/ScalarEvolution.cpp
704 ↗(On Diff #77530)

What do you think about improving caches further to cache all already estimated values, not only zeros?

IIUC that won't help as much since (by design) only the last leaf query and the "path" from the last leaf query to the root query in CompareXXXComplexity returns a non-zero value, so it sound less "bang for the buck" to me. The big gains are by speeding up expressions that look "almost" equal which you've already addressed in this patch.

However, if you want to do the work to cache non-zero complexity differences then I won't stop you. :)

This revision was automatically updated to reflect the committed changes.