This is an archive of the discontinued LLVM Phabricator instance.

[Unroll] Improve the brute force loop unroll estimate by propagating through PHI nodes across iterations.
ClosedPublic

Authored by chandlerc on Aug 2 2015, 12:06 AM.

Details

Summary

This patch teaches the new advanced loop unrolling heuristics to propagate
constants into the loop from the preheader and around the backedge after
simulating each iteration. This lets us brute force solve simple recurrances
that aren't modeled effectively by SCEV. It also makes it more clear why we
need to process the loop in-order rather than bottom-up which might otherwise
make much more sense (for example, for DCE).

This came out of an attempt I'm making to develop a principled way to account
for dead code in the unroll estimation. When I implemented
a forward-propagating version of that it produced incorrect results due to
failing to propagate *cost* between loop iterations through the PHI nodes, and
it occured to me we really should at least propagate simplifications across
those edges, and it is quite easy thanks to the loop being in canonical and
LCSSA form.

Diff Detail

Repository
rL LLVM

Event Timeline

chandlerc updated this revision to Diff 31198.Aug 2 2015, 12:06 AM
chandlerc retitled this revision from to [Unroll] Improve the brute force loop unroll estimate by propagating through PHI nodes across iterations..
chandlerc updated this object.
chandlerc added a reviewer: mzolotukhin.
chandlerc added a subscriber: llvm-commits.
mzolotukhin edited edge metadata.Aug 2 2015, 11:32 PM

Hi Chandler,

That's really cool! Looks good to me.

Michael

lib/Transforms/Scalar/LoopUnrollPass.cpp
551–553 ↗(On Diff #31198)

Shouldn't we just bail out in this case instead of crashing with assert?
IOW, is it possible to make a loop, that won't be simplified, but will reach this point?

test/Transforms/LoopUnroll/full-unroll-heuristics-phi-prop.ll
14–22 ↗(On Diff #31198)

Maybe just leave first 2-3 of them to make the test shorter?

%x1 = or i64 %x0, 1
%x2 = or i64 %x1, 2
mzolotukhin accepted this revision.Aug 2 2015, 11:32 PM
mzolotukhin edited edge metadata.
This revision is now accepted and ready to land.Aug 2 2015, 11:32 PM

Thanks, will submit with a minimal test case. See below for detailed replies.

lib/Transforms/Scalar/LoopUnrollPass.cpp
551–553 ↗(On Diff #31198)

My intent is for that to be the responsibility of the caller.

In this case, a LoopPass operates under these specific invariants. This is why we require and preserve LoopSimplify and LCSSA above.

So currently, no, it isn't possible without a bug somewhere.

test/Transforms/LoopUnroll/full-unroll-heuristics-phi-prop.ll
14–22 ↗(On Diff #31198)

I mostly wanted it to be more clear that these couldn't just be simplified via SCEV. The add and icmp are simplified already for example. I'll see if there is a useful smaller number that still clearly fails without my change.

This revision was automatically updated to reflect the committed changes.

Thanks!

lib/Transforms/Scalar/LoopUnrollPass.cpp
551–553 ↗(On Diff #31198)

Sounds good, just wanted to verify this.