If a loop contains a Phi node which has an invariant input from back
edge, it is profitable to peel such loops (rather than unroll them) to
use the advantage that this Phi is always invariant starting from 2nd
iteration. After the 1st iteration is peeled, other optimizations can
potentially simplify calculations with this invariant.
Details
Diff Detail
Event Timeline
Hi Max,
I'm not very sure about the change in priority of the loop peeling. There can be performance repercussions for that change alone, and might be better to get another opinion on the priority change.
Could you perhaps have this change as a strict improvement for loop peeling in itself?
lib/Transforms/Scalar/LoopUnrollPass.cpp | ||
---|---|---|
788 ↗ | (On Diff #89091) | Can this patch be separated into 2 parts, where the second part is this change to the priority? |
lib/Transforms/Utils/LoopUnrollPeel.cpp | ||
145 | There is a method for this getNumBackEdges. I think you can just reuse this and have a check against 1. | |
147 | can change this to for (auto *Pred: predecessors(Header)) |
lib/Transforms/Scalar/LoopUnrollPass.cpp | ||
---|---|---|
788 ↗ | (On Diff #89091) | The problem here is that without this change, if UP.Partial is not set, line 803 prevents us from peeling (we return from method with false), and if UP.Partial is set, we may unroll the loop rather than peel it and have a worse result. So the test in this patch doesn't work without this re-prioritizing. I can split it into 2 patches, but priority change will be a parent and peeling change will depend on it. |
lib/Transforms/Utils/LoopUnrollPeel.cpp | ||
---|---|---|
144 | Can you just use L->getLoopLatch()? Or is that different from what you're looking for? |
lib/Transforms/Utils/LoopUnrollPeel.cpp | ||
---|---|---|
149 | Out of curiosity, why the complexity about finding the backedge? Wouldn't *all* of the inputs to the phi be loop invariant in the case you're interested in? |
lib/Transforms/Utils/LoopUnrollPeel.cpp | ||
---|---|---|
149 | Off course, all inputs from preheaders will be invariant. Finding the backedge for header with n predecessors takes O(n) (it needs traversal over all predecessors with "contains" check in set that takes O(1). Acquiring its input also takes O(n) for every Phi, so total complexity being O(n*m) for m Phis. If we just check all inputs for being invariants, it will also take O(n*m), but we will have positive results for loops with multiple back edges. The current implementation of peeling expects the loop to have 1 back edge, otherwise it will bail and we do unneded work with such loops. |
This generally LGTM, except for some nits. But please wait for @reames as well.
lib/Transforms/Utils/LoopUnrollPeel.cpp | ||
---|---|---|
83 | Why the parens around (Phi = ...)? | |
test/Transforms/LoopUnroll/peel-loop-not-forced.ll | ||
2 | We generally prefer to test passes in isolation. Can you please make this a test for loop-unroll only? |
llvm/trunk/lib/Transforms/Utils/LoopUnrollPeel.cpp | ||
---|---|---|
95 ↗ | (On Diff #90508) | Do we need to do some sort of threshold check here? At first glance, it looks like this will peel a loop of any size. |
Why the parens around (Phi = ...)?