This is an archive of the discontinued LLVM Phabricator instance.

[SimpleLoopUnswitch] adding cost multiplier to cap exponential unswitch with
ClosedPublic

Authored by fedor.sergeev on Nov 7 2018, 1:05 PM.

Details

Summary

We need to control exponential behavior of loop-unswitch so we do not get
run-away compilation.

Suggested solution is to introduce a multiplier for an unswitch cost that
makes cost prohibitive as soon as there are too many candidates and too
many sibling loops (meaning we have already started duplicating loops
by unswitching).

It does solve the currently known problem with compile-time degradation
(PR 39544).

Diff Detail

Event Timeline

fedor.sergeev created this revision.Nov 7 2018, 1:05 PM

Honestly, this seems like a really reasonable approach in practice.

Between the sibling threshold and the candidate threshold I think gives highly stable results as well as pruning unreasonable search spaces.

I think this does in practice prevent the explosion, not just make it less likely.

One thing that might be interesting to consider is whether unswitching that typically creates siblings could through some surprise create non-sibling loops due to restructuring of the nest? If not, then this seems pretty solid. If so, we could consider look at the sibling loop nest sizes rather than just the sibling loop nest counts.

Can a contrived test case hit this quickly enough to make sense to add to the tests and show the limit being applied? Potentially by lowering these thresholds on the commandline?

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2486–2487

s/since it a/since it is a/

or maybe....

"since it is designed to limit the worst case behavior and not an attempt to provide a nuanced heuristic size prediction"

2493–2494

Maybe:

"When the number of unswitch candidates is below our "bottom cap" we disable the scaling of the cost and use the directly computed cost."

2496

I would just cast the size to an int rather than the larger function style cast?

2503–2504

Maybe:

"Handle possible overlow."
->
"Compute the cost multiplier in a way that won't overflow by saturating at an upper bound."

In general I think it is what we can go/start with.
This formula seems to be a bit pessimistic. Specifically not every candidate creates sibling loops. Only if all successors is in loop or there are sub-loops before unswitch candidate. So applying the power multiplier for this candidates seems unreasonable.
Also it is possible that several unswitch candidates can be done at one shot (the same condition), in this case the number of siblings will not grow so fast.
But these improvements can be done as a follow-up if needed.

General comment, please consider to extract the helper function findBestUnswitchCandidate to get these computation in one place because it becomes not so trivial.

mkazantsev added inline comments.Nov 7 2018, 10:34 PM
lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2489

The check here should be more accurate, at least it should consider conditional branches that has a dest outside the loop, because unswitching of such branches won't produce a sibling loop.

2513

Space before (

2518

How can CostMultiplier be less than 1? Integer overflow is undefined, so you won't catch it, otherwise it's just impossible. :)

introducing two tests, no comments addressed yet

fedor.sergeev edited the summary of this revision. (Show Details)Nov 8 2018, 1:12 PM

I like you suggestion of writing FileCheck against print<loops> output.

The 96 loop case .... is painful. Is it possible to get something smaller? Or possible to add something like CHECK-COUNT-<N>: ... ?

Not trying to make this too much yak shaving / work invention so if adding CHECK-COUNT is ... too painful and you can get the large ones closer to 32, I can tolerate that. Still seems better than grep.

test/Transforms/SimpleLoopUnswitch/exponential-nontrivial-unswitch.ll
20–27 ↗(On Diff #173215)

Maybe use cap=4 or something here so that you still see the cap applying compared to the entire thing disabled below?

addressing some of the comments, will work on improving tests next

fedor.sergeev retitled this revision from WIP... [SimpleLoopUnswitch] adding cost multiplier to cap exponential unswitch with to [SimpleLoopUnswitch] adding cost multiplier to cap exponential unswitch with.Nov 8 2018, 2:50 PM
fedor.sergeev edited the summary of this revision. (Show Details)
fedor.sergeev marked 6 inline comments as done.

updating tests, using not-yet-integrated CHECK-COUNT- FileCheck directives

introducing cap=4 tests in nested case; use sort -b

mkazantsev added inline comments.Nov 12 2018, 3:12 AM
lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
75

That name is not super-informative. Maybe something like LimitUnswitchingFromExponentExplosion or something of this variety?

2281

I'd suggest to make this check outside calculateUnswitchCostMultiplier and only multiply cost by this factor if needed.

2487

Maybe 1..UnswitchThreshold?

fedor.sergeev added inline comments.Nov 12 2018, 3:39 AM
lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
75

There might be other means of controlling this explosion things.
I would prefer naming by actual semantics and not by the most interesting side-effect.
This is definitely not an option that anybody should use lightly.

update as per Max' comments

fedor.sergeev marked 2 inline comments as done.Nov 12 2018, 3:41 AM
chandlerc accepted this revision.Nov 12 2018, 7:43 AM

I think this is looking pretty good as an initial cut. I'm still very interested in follow-ups to use a more precise heuristic of course, but I think those can be follow-ups.

Make sure to sync with Max before landing to make sure he's OK with the current state.

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
82–85

Maybe UnswitchNumInitialUnscaledCandidates for a name?

2295–2296

Just return here?

This revision is now accepted and ready to land.Nov 12 2018, 7:43 AM

more extensive checking for exiting branches; adding tests with exiting branches

fedor.sergeev added inline comments.Nov 12 2018, 12:48 PM
lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2295–2296

Nope, power 0 means 1 for this part of a multiplier.
Still can get nontrivial siblings multiplier part.

renamed bottom-cap option, updated comments, tests

fedor.sergeev planned changes to this revision.Nov 14 2018, 6:24 AM

Discovered a tricky testcase with 16-way switch and nested exiting branches which manages to skip the multiplier introduced here and still lead to exponential explosion.
Definitely need to check if exiting branch dominates the latch before skipping its multiplier calculation.
Also the testcase shows that we really need to go further calculating costs per candidate and using that in calculation of exponential-explosion threshold, just as Chandler asked to.

Will do an update to a multiplier skipping check here.
Other changes will go as a follow up (hopefully, soon :-/ ).

handle switch candidates; exponential switch case added

This revision is now accepted and ready to land.Nov 14 2018, 1:51 PM
fedor.sergeev marked 6 inline comments as done.Nov 14 2018, 1:55 PM

Makes sense to me generally. Some questions below really.

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
84

nit: s/Amount/Number/

2273–2275

Not sure how simple this is now?

2299–2300

I'd change the name here. You're not counting candidates, you're counting unswitched clones I think.

To that end, should you do the same filtering here that you do above for guards and exiting conditions?

fedor.sergeev added inline comments.Nov 14 2018, 2:56 PM
lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2273–2275

Err... intent was for it to be simple :)

2299–2300

Hmm... yes, filtering them here makes sense.
Speaking of which, I dont believe my filtering is right for the switch.
Even if switch has one exiting case it still can have many other cases remaining for the unswitch (and thus duplication of loops).
Will try to do something here, though for now I'm still going to keep it "simple" ... at least to the extent possible.

updating names, comments etc

chandlerc added inline comments.Nov 14 2018, 3:22 PM
lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2299–2300

I'd change here and above to skip when: DT.dominates(CondBlock, Latch) && (isGuard(&TI) || CondBlock->getNumSuccessors() - count_if(CondBlock->successors(), [](BasicBlock *SuccBB) { return !L.contains(SuccBB); }) <= 1).

chandlerc added inline comments.Nov 14 2018, 3:24 PM
lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2299–2300

Er, that should somewhat obviously be: DT.dominates(CondBlock, Latch) && (isGuard(&TI) || count_if(CondBlock->successors(), [](BasicBlock *SuccBB) { return L.contains(SuccBB); }) <= 1)

getting clones calculation more precise

fedor.sergeev marked 7 inline comments as done.Nov 14 2018, 4:26 PM
chandlerc accepted this revision.Nov 14 2018, 9:08 PM

still LGTM.

lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2295

I think the yoda-comparison makes this harder to read. I'd rather say count_if(...) <= 1.

fedor.sergeev added inline comments.Nov 14 2018, 10:54 PM
lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2295

Specifically wanted to try this and check how people react. :)
I find both variants somewhat uneasy to glance over.
Standard one because of the multi-line nature of a lambda and many closing brackets/parens :(

This revision was automatically updated to reflect the committed changes.