ScalarEvolution::getMulExpr could take a very long time to execute when there is a long change of dependent multiplication. Worse yet, when the number of operands was very high, Overflow would be set consistently in the middle of the loop, so no progress was actually made in simplifying the SCEV. By testing for overflow early, we can avoid entering the loop in the first place. I have included a test case (choose-overflow-fast.ll) that takes a very long time (probably hours) to execute without this patch. After applying the patch, the test completes in about 1.5 seconds.
Details
Diff Detail
Event Timeline
Good idea! Your change looks good, but I'm not sure I understand the math.
Instead of:
Choose(AddRec->getNumOperands() - 2, AddRec->getNumOperands() / 2 - 1,
Why not:
// max(choose(n, k)) = choose(max(n), max(k)/2) // max(n) = max(x) // = AddRec->getNumOperands() + OtherAddRec->getNumOperands() - 2 // where OtherAddRec->getNumOperands >= 2 // // max(k)/2 = max(2x - y)/2 = max(x)/2 = AddRec->getNumOperands()/2 Choose(AddRec->getNumOperands(), AddRec->getNumOperands() / 2,
Regardless of whether I'm right, please add similar comments.
Instead of modifying the loop condition, it might me more clear just to write:
if (EarlyOut)
continue
(otherwise it looks like EarlyOut is loop variant).
On more thing. Can you please explain the state of the existing code? (note, I didn't write this, I only reformatted it at one point). Why do we need two attempts to loop over OtherIdx?
for (unsigned OtherIdx = Idx+1; OtherIdx < Ops.size() && isa<SCEVAddRecExpr>(Ops[OtherIdx]); ++OtherIdx) {
That makes the getMulExpr reduction cubic! (Probably n^4 considering the cost of Choose.)
I stared at this for a long time and just don't get it. We only iterate over the OuterIdx loop when all previous attempts to multiply AddRec * OtherAddRec failed with Overflow. This seems to me to guarantee that all subsequent attempts will fail.
I have made the changes you suggested. The code is functionally equivalent to the original version, but the design of the original version is a bit unclear to me. Basically, the code is trying to combine many AddRecs into a single expression. The run-time is "only" N^2 in the size of Ops, because the second and third loop levels share indexes. I think the underlying issue is the expensive recursive calls to getMulExpr in the inner-most loop.
I know the problem with the loop structure isn't a problem with your change per-say, but it makes your change a lot harder to understand.
It seems obvious to me that the 2nd and 3rd loop are redundant. But I have been wrong before. Can you confirm? If you agree, would you be willing to remove that extra loop as a separate checkin before committing your change?
Comments could be a little more clear.
lib/Analysis/ScalarEvolution.cpp | ||
---|---|---|
2064–2070 | I think this comment needs to be clarified. The loop actually computes up to C(xe - 1, xe - 1) right? We just happen to know that the Choose function's maximum is at C(xe - 1, (xe - 1) / 2). Let's make that explicit. |
Andrew Trick wrote:
Good idea! Your change looks good, but I'm not sure I understand the math.
Instead of:
Choose(AddRec->getNumOperands() - 2, AddRec->getNumOperands() / 2 - 1,Why not:
// max(choose(n, k)) = choose(max(n), max(k)/2) // max(n) = max(x) // = AddRec->getNumOperands() + OtherAddRec->getNumOperands() - 2 // where OtherAddRec->getNumOperands>= 2 // // max(k)/2 = max(2x - y)/2 = max(x)/2 = AddRec->getNumOperands()/2 Choose(AddRec->getNumOperands(), AddRec->getNumOperands() / 2,Regardless of whether I'm right, please add similar comments.
Instead of modifying the loop condition, it might me more clear just to write:
if (EarlyOut)
continue(otherwise it looks like EarlyOut is loop variant).
On more thing. Can you please explain the state of the existing code? (note, I didn't write this, I only reformatted it at one point).
Guilty here, reporting as ordered.
Why do we need two attempts to loop over OtherIdx?
for (unsigned OtherIdx = Idx+1; OtherIdx< Ops.size()&& isa<SCEVAddRecExpr>(Ops[OtherIdx]); ++OtherIdx) {That makes the getMulExpr reduction cubic! (Probably n^4 considering the cost of Choose.)
I stared at this for a long time and just don't get it. We only iterate over the OuterIdx loop when all previous attempts to multiply AddRec * OtherAddRec failed with Overflow. This seems to me to guarantee that all subsequent attempts will fail.
My best guess is that this is a buggy attempt to revisit Ops and
continue folding when the inner loop modified Ops, before going back
through getMulExpr. What we actually do is through getMulExpr
immediately as soon as a modification is made, so the outer loop is
completely pointless.
Removing it passes tests, I'll remove it.
Nick
llvm-commits mailing list
llvm-commits@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
lib/Analysis/ScalarEvolution.cpp | ||
---|---|---|
2120 | You tested whether we'll overflow up front. Is that the exact case where we overflow or does it only catch some of the cases where we overflow? If it's exactly right, then we can stop updating and checking Overflow in here, right? | |
test/Analysis/ScalarEvolution/choose-overflow-fast.ll | ||
1 | Can you use "opt < %s -analyze -scalar-evolution" instead? |
I think this comment needs to be clarified.
The loop actually computes up to C(xe - 1, xe - 1) right?
We just happen to know that the Choose function's maximum is at C(xe - 1, (xe - 1) / 2). Let's make that explicit.