This is an archive of the discontinued LLVM Phabricator instance.

Exit ScalarEvolution::getMulExpr Early when Choose Overflows
Needs ReviewPublic

Authored by tjablin on Aug 28 2014, 5:17 PM.

Details

Reviewers
atrick
Summary

ScalarEvolution::getMulExpr could take a very long time to execute when there is a long change of dependent multiplication. Worse yet, when the number of operands was very high, Overflow would be set consistently in the middle of the loop, so no progress was actually made in simplifying the SCEV. By testing for overflow early, we can avoid entering the loop in the first place. I have included a test case (choose-overflow-fast.ll) that takes a very long time (probably hours) to execute without this patch. After applying the patch, the test completes in about 1.5 seconds.

Diff Detail

Event Timeline

tjablin updated this revision to Diff 13065.Aug 28 2014, 5:17 PM
tjablin retitled this revision from to Exit ScalarEvolution::getMulExpr Early when Choose Overflows.
tjablin updated this object.
tjablin edited the test plan for this revision. (Show Details)
tjablin added a reviewer: atrick.
tjablin set the repository for this revision to rL LLVM.
tjablin added a subscriber: Unknown Object (MLST).
atrick requested changes to this revision.Aug 28 2014, 11:07 PM
atrick edited edge metadata.

Good idea! Your change looks good, but I'm not sure I understand the math.

Instead of:

Choose(AddRec->getNumOperands() - 2,
       AddRec->getNumOperands() / 2 - 1,

Why not:

// max(choose(n, k)) = choose(max(n), max(k)/2)
// max(n) = max(x)
//   = AddRec->getNumOperands() + OtherAddRec->getNumOperands() - 2
//   where OtherAddRec->getNumOperands >= 2
//
// max(k)/2 = max(2x - y)/2 = max(x)/2 = AddRec->getNumOperands()/2
Choose(AddRec->getNumOperands(),
       AddRec->getNumOperands() / 2,

Regardless of whether I'm right, please add similar comments.

Instead of modifying the loop condition, it might me more clear just to write:

if (EarlyOut)

continue

(otherwise it looks like EarlyOut is loop variant).

On more thing. Can you please explain the state of the existing code? (note, I didn't write this, I only reformatted it at one point). Why do we need two attempts to loop over OtherIdx?

for (unsigned OtherIdx = Idx+1;
     OtherIdx < Ops.size() && isa<SCEVAddRecExpr>(Ops[OtherIdx]);
     ++OtherIdx) {

That makes the getMulExpr reduction cubic! (Probably n^4 considering the cost of Choose.)

I stared at this for a long time and just don't get it. We only iterate over the OuterIdx loop when all previous attempts to multiply AddRec * OtherAddRec failed with Overflow. This seems to me to guarantee that all subsequent attempts will fail.

This revision now requires changes to proceed.Aug 28 2014, 11:07 PM
tjablin updated this revision to Diff 13119.Aug 29 2014, 5:15 PM
tjablin edited edge metadata.

I have made the changes you suggested. The code is functionally equivalent to the original version, but the design of the original version is a bit unclear to me. Basically, the code is trying to combine many AddRecs into a single expression. The run-time is "only" N^2 in the size of Ops, because the second and third loop levels share indexes. I think the underlying issue is the expensive recursive calls to getMulExpr in the inner-most loop.

I know the problem with the loop structure isn't a problem with your change per-say, but it makes your change a lot harder to understand.

It seems obvious to me that the 2nd and 3rd loop are redundant. But I have been wrong before. Can you confirm? If you agree, would you be willing to remove that extra loop as a separate checkin before committing your change?

Comments could be a little more clear.

lib/Analysis/ScalarEvolution.cpp
2064–2070

I think this comment needs to be clarified.

The loop actually computes up to C(xe - 1, xe - 1) right?

We just happen to know that the Choose function's maximum is at C(xe - 1, (xe - 1) / 2). Let's make that explicit.

Andrew Trick wrote:

Good idea! Your change looks good, but I'm not sure I understand the math.

Instead of:

Choose(AddRec->getNumOperands() - 2,
       AddRec->getNumOperands() / 2 - 1,

Why not:

// max(choose(n, k)) = choose(max(n), max(k)/2)
// max(n) = max(x)
//   = AddRec->getNumOperands() + OtherAddRec->getNumOperands() - 2
//   where OtherAddRec->getNumOperands>= 2
//
// max(k)/2 = max(2x - y)/2 = max(x)/2 = AddRec->getNumOperands()/2
Choose(AddRec->getNumOperands(),
       AddRec->getNumOperands() / 2,

Regardless of whether I'm right, please add similar comments.

Instead of modifying the loop condition, it might me more clear just to write:

if (EarlyOut)

continue

(otherwise it looks like EarlyOut is loop variant).

On more thing. Can you please explain the state of the existing code? (note, I didn't write this, I only reformatted it at one point).

Guilty here, reporting as ordered.

Why do we need two attempts to loop over OtherIdx?
for (unsigned OtherIdx = Idx+1;
     OtherIdx<  Ops.size()&&  isa<SCEVAddRecExpr>(Ops[OtherIdx]);
     ++OtherIdx) {

That makes the getMulExpr reduction cubic! (Probably n^4 considering the cost of Choose.)

I stared at this for a long time and just don't get it. We only iterate over the OuterIdx loop when all previous attempts to multiply AddRec * OtherAddRec failed with Overflow. This seems to me to guarantee that all subsequent attempts will fail.

My best guess is that this is a buggy attempt to revisit Ops and
continue folding when the inner loop modified Ops, before going back
through getMulExpr. What we actually do is through getMulExpr
immediately as soon as a modification is made, so the outer loop is
completely pointless.

Removing it passes tests, I'll remove it.

Nick

http://reviews.llvm.org/D5113


llvm-commits mailing list
llvm-commits@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

nicholas added inline comments.Aug 31 2014, 10:35 PM
lib/Analysis/ScalarEvolution.cpp
2120

You tested whether we'll overflow up front. Is that the exact case where we overflow or does it only catch some of the cases where we overflow? If it's exactly right, then we can stop updating and checking Overflow in here, right?

test/Analysis/ScalarEvolution/choose-overflow-fast.ll
2

Can you use "opt < %s -analyze -scalar-evolution" instead?