Page MenuHomePhabricator

[LoopUnroll] avoid assumption clone explosion

Authored by spatel on Apr 1 2021, 11:59 AM.



The PhaseOrdering test example is based on:

The test goes over the complexity cliff somewhere between -O2 and -O3 (more unrolling is done at -O3).
The initial assume is created by SimplifyCFG, multiplied by the LoopVectorizer, and then copied across numerous clone blocks by LoopUnroll. LoopUnroll then calls SimplifyInstruction() which calls simplifyICmpWithDominatingAssume() and we seem to be spending all of our time in there computing known bits, etc.

It seems unlikely that we have much to gain by cloning assumes late in the opt pipeline, so I'm proposing to stub that out when called during LoopUnroll.

Diff Detail

Event Timeline

spatel created this revision.Apr 1 2021, 11:59 AM
spatel requested review of this revision.Apr 1 2021, 11:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 1 2021, 11:59 AM
nikic added a comment.Apr 9 2021, 11:36 AM

Taking your example from, it seems to be scaling with about O(n^5.5) with iteration count, which is beyond all reasonable bounds. I think this needs to be mitigated at a lower level than unrolling.

nikic added a comment.Apr 9 2021, 1:09 PM

For reference, this is the IR after unrolling but before simplification for the PhaseOrdering test: (note that mass of assumes on extractvalues is unproblematic, the issue are the assumes on %.pre.pre).

Limiting the number of assumes we look at for one value helps (and is something we should probably do as well), but I think there's a more fundamental problem with the recursion in computeKnownBitsForAssume. Possibly the exclusion mechanism there needs to be strengthened.

spatel abandoned this revision.Apr 16 2021, 5:52 AM

We decided that this really is a ValueTracking problem.
I committed D100573 / bb907b26e2bf as an alternate fix for the bug.
Let's see if there are any regressions -- and potentially compile-time improvements -- from the loss of assumption analysis power.

Also, I think it's still worth including the test here as a regression test, so we will know quickly if we fall into this problem again.
But it will be 6K+ lines of IR with all of the current -O3 expansion, so this may be one of the rare times where it's better to not auto-generate the full CHECK lines!