This is an archive of the discontinued LLVM Phabricator instance.

Exit ScalarEvolution::getMulExpr Early when Choose Overflows
Needs Review, Public

Authored by tjablin on Aug 28 2014, 5:17 PM.

Details

Reviewers

Summary

ScalarEvolution::getMulExpr could take a very long time to execute when there is a long chain of dependent multiplications. Worse yet, when the number of operands was very high, Overflow would be set consistently in the middle of the loop, so no progress was actually made in simplifying the SCEV. By testing for overflow early, we can avoid entering the loop in the first place. I have included a test case (choose-overflow-fast.ll) that takes a very long time (probably hours) to execute without this patch. After applying the patch, the test completes in about 1.5 seconds.
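
To illustrate the failure mode described in the summary, here is a minimal sketch of a binomial-coefficient routine that reports 64-bit overflow through a flag. The name chooseChecked and its exact arithmetic are assumptions made for illustration; this is not a copy of the Choose helper in ScalarEvolution.cpp.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a Choose-style helper: returns C(n, k) and sets
// Overflow (it never clears it) if an intermediate product exceeds 64 bits.
uint64_t chooseChecked(uint64_t n, uint64_t k, bool &Overflow) {
  if (k > n)
    return 0;
  if (k > n - k)
    k = n - k; // C(n, k) == C(n, n - k); iterate over the smaller k.
  uint64_t r = 1;
  for (uint64_t i = 1; i <= k; ++i) {
    uint64_t factor = n - (i - 1);
    if (r > std::numeric_limits<uint64_t>::max() / factor) {
      Overflow = true; // r * factor would not fit in 64 bits.
      return 0;
    }
    r = r * factor / i; // Exact division: r * factor == i * C(n, i).
  }
  return r;
}
```

A caller can probe the largest coefficient it expects to need first and skip the expensive work entirely if the flag comes back set, which is the idea behind the early exit proposed here.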

Diff Detail

Event Timeline

tjablin updated this revision to Diff 13065. Aug 28 2014, 5:17 PM

tjablin retitled this revision to Exit ScalarEvolution::getMulExpr Early when Choose Overflows.

tjablin updated this object.

tjablin edited the test plan for this revision.

tjablin added a reviewer: atrick.

tjablin set the repository for this revision to rL LLVM.

tjablin added a subscriber: Unknown Object (MLST).

Good idea! Your change looks good, but I'm not sure I understand the math.

Instead of:

Choose(AddRec->getNumOperands() - 2,
       AddRec->getNumOperands() / 2 - 1,

Why not:

// max(choose(n, k)) = choose(max(n), max(k)/2)
// max(n) = max(x)
//   = AddRec->getNumOperands() + OtherAddRec->getNumOperands() - 2
//   where OtherAddRec->getNumOperands >= 2
//
// max(k)/2 = max(2x - y)/2 = max(x)/2 = AddRec->getNumOperands()/2
Choose(AddRec->getNumOperands(),
       AddRec->getNumOperands() / 2,

Regardless of whether I'm right, please add similar comments.

Instead of modifying the loop condition, it might be clearer just to write:

if (EarlyOut)
  continue;

(otherwise it looks like EarlyOut is loop variant).
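
The two ways of skipping the inner loop can be compared side by side. In this sketch the function names and the earlyOut vector are hypothetical and stand in for the real per-AddRec flag. It only shows that hoisting the test into an if/continue visits the same pairs as testing the flag in the loop condition, while making it obvious that the flag does not change inside the inner loop.

```cpp
#include <cstddef>
#include <vector>

// Variant A: the flag is re-tested in the inner loop condition on every
// iteration, which can make it look as if the loop body modifies it.
std::size_t visitWithConditionGuard(const std::vector<bool> &earlyOut) {
  std::size_t visited = 0;
  for (std::size_t idx = 0; idx < earlyOut.size(); ++idx)
    for (std::size_t other = idx + 1;
         !earlyOut[idx] && other < earlyOut.size(); ++other)
      ++visited;
  return visited;
}

// Variant B: the flag is tested once, before the inner loop is entered.
std::size_t visitWithContinueGuard(const std::vector<bool> &earlyOut) {
  std::size_t visited = 0;
  for (std::size_t idx = 0; idx < earlyOut.size(); ++idx) {
    if (earlyOut[idx])
      continue; // Nothing to do for this operand.
    for (std::size_t other = idx + 1; other < earlyOut.size(); ++other)
      ++visited;
  }
  return visited;
}
```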

One more thing. Can you please explain the state of the existing code? (Note: I didn't write this; I only reformatted it at one point.) Why do we need two attempts to loop over OtherIdx?

for (unsigned OtherIdx = Idx+1;
     OtherIdx < Ops.size() && isa<SCEVAddRecExpr>(Ops[OtherIdx]);
     ++OtherIdx) {

That makes the getMulExpr reduction cubic! (Probably n^4 considering the cost of Choose.)

I stared at this for a long time and just don't get it. We only iterate over the OuterIdx loop when all previous attempts to multiply AddRec * OtherAddRec failed with Overflow. This seems to me to guarantee that all subsequent attempts will fail.

This revision now requires changes to proceed. Aug 28 2014, 11:07 PM

I have made the changes you suggested. The code is functionally equivalent to the original version, but the design of the original version is a bit unclear to me. Basically, the code is trying to combine many AddRecs into a single expression. The run-time is "only" N^2 in the size of Ops, because the second and third loop levels share indexes. I think the underlying issue is the expensive recursive calls to getMulExpr in the inner-most loop.

I know the problem with the loop structure isn't a problem with your change per se, but it makes your change a lot harder to understand.

It seems obvious to me that the 2nd and 3rd loop are redundant. But I have been wrong before. Can you confirm? If you agree, would you be willing to remove that extra loop as a separate checkin before committing your change?

Comments could be a little more clear.

lib/Analysis/ScalarEvolution.cpp
2064–2070	I think this comment needs to be clarified. The loop actually computes up to C(xe - 1, xe - 1) right? We just happen to know that the Choose function's maximum is at C(xe - 1, (xe - 1) / 2). Let's make that explicit.
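
The claim that a row of binomial coefficients peaks at its centre can be checked numerically. The sketch below uses hypothetical helper names (pascalRow, maxIsCentral) and builds each row with Pascal's recurrence, assuming n is small enough (at most 60) that every entry fits in 64 bits.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Builds row n of Pascal's triangle, i.e. C(n, 0) .. C(n, n).
// Assumes n <= 60 so that no entry overflows uint64_t.
std::vector<uint64_t> pascalRow(unsigned n) {
  std::vector<uint64_t> row(n + 1, 1);
  for (unsigned r = 2; r <= n; ++r)
    // Update in place from right to left so row[k - 1] is still the old value.
    for (unsigned k = r - 1; k >= 1; --k)
      row[k] += row[k - 1];
  return row;
}

// Returns true if the largest entry of row n is the one at index n / 2.
bool maxIsCentral(unsigned n) {
  std::vector<uint64_t> row = pascalRow(n);
  return *std::max_element(row.begin(), row.end()) == row[n / 2];
}
```

With n = xe - 1 this corresponds to the statement that C(xe - 1, (xe - 1) / 2) is the largest value Choose can return for that n.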

Andrew Trick wrote:

Good idea! Your change looks good, but I'm not sure I understand the math.

Instead of:
Choose(AddRec->getNumOperands() - 2,
       AddRec->getNumOperands() / 2 - 1,
Why not:
// max(choose(n, k)) = choose(max(n), max(k)/2)
// max(n) = max(x)
//   = AddRec->getNumOperands() + OtherAddRec->getNumOperands() - 2
//   where OtherAddRec->getNumOperands >= 2
//
// max(k)/2 = max(2x - y)/2 = max(x)/2 = AddRec->getNumOperands()/2
Choose(AddRec->getNumOperands(),
       AddRec->getNumOperands() / 2,
Regardless of whether I'm right, please add similar comments.

Instead of modifying the loop condition, it might be clearer just to write:
if (EarlyOut)
  continue;
(otherwise it looks like EarlyOut is loop variant).

One more thing. Can you please explain the state of the existing code? (Note: I didn't write this; I only reformatted it at one point.)

Guilty here, reporting as ordered.

Why do we need two attempts to loop over OtherIdx?

for (unsigned OtherIdx = Idx+1;
     OtherIdx < Ops.size() && isa<SCEVAddRecExpr>(Ops[OtherIdx]);
     ++OtherIdx) {
That makes the getMulExpr reduction cubic! (Probably n^4 considering the cost of Choose.)

I stared at this for a long time and just don't get it. We only iterate over the OuterIdx loop when all previous attempts to multiply AddRec * OtherAddRec failed with Overflow. This seems to me to guarantee that all subsequent attempts will fail.

My best guess is that this is a buggy attempt to revisit Ops and
continue folding when the inner loop modified Ops, before going back
through getMulExpr. What we actually do is go back through getMulExpr
immediately, as soon as a modification is made, so the outer loop is
completely pointless.

Removing it passes tests, I'll remove it.
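
The "fold once, then restart through the top-level routine" pattern described above can be sketched on a toy problem. Everything here is hypothetical: foldProducts works on plain positive integers rather than SCEVs, and the folding rule (merge a pair when its product does not exceed a limit) is invented for illustration. The point is that the recursive call already rescans the modified operand list, so no extra retry loop around the pair search is needed.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Toy analogue of a reduction that restarts after every successful fold.
// Assumes small positive inputs so that products do not overflow long.
std::vector<long> foldProducts(std::vector<long> ops, long limit) {
  for (std::size_t i = 0; i < ops.size(); ++i) {
    for (std::size_t j = i + 1; j < ops.size(); ++j) {
      long product = ops[i] * ops[j];
      if (product > limit)
        continue; // This pair cannot be folded; try the next one.
      // Fold the pair into one operand and drop the other.
      ops[i] = product;
      ops.erase(ops.begin() + static_cast<std::ptrdiff_t>(j));
      // Restart from the top on the modified list.
      return foldProducts(std::move(ops), limit);
    }
  }
  return ops; // No pair could be folded.
}
```

For example, with a limit of 100 the operands 2, 3, 4 collapse to the single value 24, while 50, 3, 4 become 50, 12 because 50 cannot be merged with anything.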

Nick

http://reviews.llvm.org/D5113


nicholas added inline comments. Aug 31 2014, 10:35 PM

lib/Analysis/ScalarEvolution.cpp
2120	You tested whether we'll overflow up front. Is that the exact case where we overflow or does it only catch some of the cases where we overflow? If it's exactly right, then we can stop updating and checking Overflow in here, right?
test/Analysis/ScalarEvolution/choose-overflow-fast.ll
2	Can you use "opt < %s -analyze -scalar-evolution" instead?
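
The question on line 2120, whether the up-front probe is exact, can be explored experimentally for the Choose calls alone. The sketch below rests on assumptions: chooseChecked is a hypothetical overflow-reporting binomial routine, not LLVM's Choose, and the comparison says nothing about any other overflow checks in the real loop that are not shown on this page. Under those assumptions, it compares "some C(x, k) with x <= n overflows" against "the single central probe C(n, n / 2) overflows".

```cpp
#include <cstdint>
#include <limits>

// Hypothetical overflow-reporting binomial coefficient (not LLVM's Choose).
uint64_t chooseChecked(uint64_t n, uint64_t k, bool &Overflow) {
  if (k > n)
    return 0;
  if (k > n - k)
    k = n - k; // Use the symmetric, smaller k.
  uint64_t r = 1;
  for (uint64_t i = 1; i <= k; ++i) {
    uint64_t factor = n - (i - 1);
    if (r > std::numeric_limits<uint64_t>::max() / factor) {
      Overflow = true; // The next product would not fit in 64 bits.
      return 0;
    }
    r = r * factor / i;
  }
  return r;
}

// True if any chooseChecked(x, k) with x <= n and k <= x sets the flag.
bool anyChooseOverflows(uint64_t n) {
  bool overflow = false;
  for (uint64_t x = 0; x <= n; ++x)
    for (uint64_t k = 0; k <= x; ++k)
      chooseChecked(x, k, overflow);
  return overflow;
}

// True if the single probe at the centre of row n sets the flag.
bool centralChooseOverflows(uint64_t n) {
  bool overflow = false;
  chooseChecked(n, n / 2, overflow);
  return overflow;
}
```

For this particular routine the two predicates agree, because the central call walks through the same intermediate products as every call with a smaller k or a smaller n. Whether that carries over to the real code depends on what else in the loop can set Overflow.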

Revision Contents

Path	Size
lib/Analysis/ScalarEvolution.cpp	17 lines
test/Analysis/ScalarEvolution/choose-overflow-fast.ll	48 lines

Diff 13065

lib/Analysis/ScalarEvolution.cpp

Context not available.
   // Okay, if there weren't any loop invariants to be folded, check to see if
   // there are multiple AddRec's with the same loop induction variable being
   // multiplied together. If so, we can fold them.
+
+  // If the number of AddRec operands is too high, Choose will overflow, so
+  // don't bother entering the loop.
+  bool EarlyOut = false;
+  Choose(AddRec->getNumOperands() - 2,
+         AddRec->getNumOperands() / 2 - 1,
+         EarlyOut);
   for (unsigned OtherIdx = Idx+1;
-       OtherIdx < Ops.size() && isa<SCEVAddRecExpr>(Ops[OtherIdx]);
+       !EarlyOut && OtherIdx < Ops.size() &&
+       isa<SCEVAddRecExpr>(Ops[OtherIdx]);
        ++OtherIdx) {
     if (AddRecLoop != cast<SCEVAddRecExpr>(Ops[OtherIdx])->getLoop())
       continue;
Context not available.
         if (!OtherAddRec || OtherAddRec->getLoop() != AddRecLoop)
           continue;
 
+        // If we are going to overflow. Let's get it over with early.
         bool Overflow = false;
+        int xe = AddRec->getNumOperands() + OtherAddRec->getNumOperands() - 1;
+        Choose(xe - 1, (xe - 1) / 2, Overflow);
+
         Type *Ty = AddRec->getType();
         bool LargerThan64Bits = getTypeSizeInBits(Ty) > 64;
         SmallVector<const SCEV*, 7> AddRecOps;
-        for (int x = 0, xe = AddRec->getNumOperands() +
-               OtherAddRec->getNumOperands() - 1; x != xe && !Overflow; ++x) {
+        for (int x = 0; x != xe && !Overflow; ++x) {
           const SCEV *Term = getConstant(Ty, 0);
           for (int y = x, ye = 2*x+1; y != ye && !Overflow; ++y) {
             uint64_t Coeff1 = Choose(x, 2*x - y, Overflow);
Context not available.

test/Analysis/ScalarEvolution/choose-overflow-fast.ll

; RUN: opt < %s -iv-users

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: nounwind uwtable
define void @main() {
entry:
  br label %for.body33

for.body33: ; preds = %for.body33, %entry
  %inc4064 = phi i32 [ %inc40, %for.body33 ], [ 1, %entry ]
  %mul35.lcssa63 = phi i32 [ %mul35.15, %for.body33 ], [ undef, %entry ]
  %mul34 = mul i32 %mul35.lcssa63, %inc4064
  %mul35 = mul i32 %mul34, %mul35.lcssa63
  %mul34.1 = mul i32 %mul35, %inc4064
  %mul35.1 = mul i32 %mul34.1, %mul35
  %mul34.2 = mul i32 %mul35.1, %inc4064
  %mul35.2 = mul i32 %mul34.2, %mul35.1
  %mul34.3 = mul i32 %mul35.2, %inc4064
  %mul35.3 = mul i32 %mul34.3, %mul35.2
  %mul34.4 = mul i32 %mul35.3, %inc4064
  %mul35.4 = mul i32 %mul34.4, %mul35.3
  %mul34.5 = mul i32 %mul35.4, %inc4064
  %mul35.5 = mul i32 %mul34.5, %mul35.4
  %mul34.6 = mul i32 %mul35.5, %inc4064
  %mul35.6 = mul i32 %mul34.6, %mul35.5
  %mul34.7 = mul i32 %mul35.6, %inc4064
  %mul35.7 = mul i32 %mul34.7, %mul35.6
  %mul34.8 = mul i32 %mul35.7, %inc4064
  %mul35.8 = mul i32 %mul34.8, %mul35.7
  %mul34.9 = mul i32 %mul35.8, %inc4064
  %mul35.9 = mul i32 %mul34.9, %mul35.8
  %mul34.10 = mul i32 %mul35.9, %inc4064
  %mul35.10 = mul i32 %mul34.10, %mul35.9
  %mul34.11 = mul i32 %mul35.10, %inc4064
  %mul35.11 = mul i32 %mul34.11, %mul35.10
  %mul34.12 = mul i32 %mul35.11, %inc4064
  %mul35.12 = mul i32 %mul34.12, %mul35.11
  %mul34.13 = mul i32 %mul35.12, %inc4064
  %mul35.13 = mul i32 %mul34.13, %mul35.12
  %mul34.14 = mul i32 %mul35.13, %inc4064
  %mul35.14 = mul i32 %mul34.14, %mul35.13
  %mul34.15 = mul i32 %mul35.14, %inc4064
  %mul35.15 = mul i32 %mul34.15, %mul35.14
  %inc40 = add i32 %inc4064, 1
  br label %for.body33
}
No newline at end of file