This is an archive of the discontinued LLVM Phabricator instance.

The current test input .ll is quite big (~1000 lines). I'm trying to reduce it, but I'm not sure I'll be able to. Should I add its big version to the change?

Hi Dmitry,

Please upload the patch with context (http://llvm.org/docs/Phabricator.html)

Have you checked if this is solely due to the depth of the expressions, or if this is due to some exponential behavior? Can this be solved using a cache, like in CompareValueComplexity (though it also has a Depth so perhaps it isn't a good example)?
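To illustrate the "exponential behavior" question above, here is a minimal, self-contained sketch. It is not the LLVM code: `Node`, `Arena`, `compare`, and `countCalls` are hypothetical names, and `Node` only stands in for a uniqued expression whose operands may be shared. It compares two structurally equal expression DAGs in which every node uses the same child twice, and counts recursive calls with and without a cache of pairs already known to be equal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for an expression node; not the LLVM SCEV class.
struct Node {
  int Kind;                       // operator kind, compared first
  std::vector<const Node *> Ops;  // operands, possibly shared
};

// Owns the nodes so that raw pointers stay valid.
struct Arena {
  std::vector<std::unique_ptr<Node>> Nodes;
  const Node *make(int Kind, std::vector<const Node *> Ops = {}) {
    Nodes.push_back(std::make_unique<Node>(Node{Kind, std::move(Ops)}));
    return Nodes.back().get();
  }
  // Builds Depth binary nodes on top of a leaf; both operands of every
  // node are the same child, so the DAG is small but its tree view is not.
  const Node *chain(unsigned Depth) {
    const Node *N = make(0);
    for (unsigned I = 0; I < Depth; ++I)
      N = make(1, {N, N});
    return N;
  }
};

using PairSet = std::set<std::pair<const Node *, const Node *>>;

// Three-way structural compare. When Cache is non-null, pairs that compared
// equal are remembered. Calls counts every invocation.
int compare(const Node *L, const Node *R, PairSet *Cache,
            std::uint64_t &Calls) {
  assert(L && R);
  ++Calls;
  if (L == R)
    return 0;
  if (Cache && Cache->count({L, R}))
    return 0;
  if (L->Kind != R->Kind)
    return L->Kind < R->Kind ? -1 : 1;
  if (L->Ops.size() != R->Ops.size())
    return L->Ops.size() < R->Ops.size() ? -1 : 1;
  for (std::size_t I = 0; I != L->Ops.size(); ++I)
    if (int X = compare(L->Ops[I], R->Ops[I], Cache, Calls))
      return X;
  if (Cache)
    Cache->insert({L, R});
  return 0;
}

// Compares two separately built chains of the given depth and returns the
// number of compare() invocations that were needed.
std::uint64_t countCalls(unsigned Depth, bool UseCache) {
  Arena A;
  const Node *L = A.chain(Depth);
  const Node *R = A.chain(Depth);
  PairSet Cache;
  std::uint64_t Calls = 0;
  int Result = compare(L, R, UseCache ? &Cache : nullptr, Calls);
  assert(Result == 0 && "the two chains are structurally equal");
  (void)Result;
  return Calls;
}
```

In this toy setup the call count without a cache doubles with every extra level, while with the cache it grows linearly, because the second visit of each operand pair is a cache hit. Only equal results are cached here, which mirrors the "only zeros" caching discussed later in the thread.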

As far as the test case goes, you can always add a C++ test case to unittests/ that generates the IR or SCEV expression in memory. I agree with you in that we should avoid adding excessively large test files to tests/

sanjoy requested changes to this revision. Nov 8 2016, 9:33 AM

sanjoy edited edge metadata.

This revision now requires changes to proceed. Nov 8 2016, 9:33 AM

Can you check if this fixes the test cases from these bugs:
https://llvm.org/bugs/show_bug.cgi?id=28721
https://llvm.org/bugs/show_bug.cgi?id=30257

dfukalov updated this revision to Diff 77322. Nov 9 2016, 2:08 AM

dfukalov edited edge metadata.

Herald added a subscriber: mzolotukhin. Nov 9 2016, 2:08 AM

Hi Sanjoy,

the issue is in the SLSR pass when it tries to construct SCEV expressions with getAddExpr and getMulExpr on a source that is similar to the test case from https://llvm.org/bugs/show_bug.cgi?id=28721 but much bigger: >1000 lines of muls+adds in the main block, and it has 8 phis instead of 2.

Tom,
as I mentioned above, the input that caused the issue is very similar to the test case of bug 28721, and in my case it hangs in the SLSR pass too. But neither bug is reproducible for me.

I'll try to create a test case by expanding one from bug 28721.

Hi Sanjoy,

I've reduced the test case to 400 lines. Then I tried to use a cache as you suggested, and it does indeed fix the issue. I'm going to re-test and update the patch.

But I would add recursion depth control, since there is no guarantee the cache will help in all possible cases. For example, during test case reduction I had input variants that caused hangs in CompareSCEVComplexity called from other (not SLSR) passes. What do you think? If you agree with the recursion depth, should it be a new parameter (as in my first implementation) or a hard-coded value as for CompareValueComplexity?

Hi Daniil,

In D26389#590731, @dfukalov wrote:

I've reduced the test case to 400 lines. Then I tried to use a cache as you suggested, and it does indeed fix the issue. I'm going to re-test and update the patch.

Great!

As I said before, if the test case is repetitive then it is probably better to generate the IR in memory using IRBuilder and put the test case in unittests/. Otherwise a 400 line file does not sound that bad.

But I would add recursion depth control, since there is no guarantee the cache will help in all possible cases. For example, during test case reduction I had input variants that caused hangs in CompareSCEVComplexity called from other (not SLSR) passes.

I think a recursion depth as a stopgap measure is fine. For now I'd add a max depth as a cl::opt and use the same depth in both CompareValueComplexity and CompareSCEVComplexity (that is, don't "forget" the depth when CompareSCEVComplexity calls into CompareValueComplexity).

I'd make this max depth fairly high by default (perhaps 64 or something) since, as I understand it, the caching should take care of avoiding any exponential behavior, and if we're spending lots of time in CompareSCEVComplexity then the SCEV expression itself is very complex. In this (latter) case it is defensible for CompareSCEVComplexity to take time (it simply has more work to do), but only up to a point.
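The suggestion to use one depth budget in both comparators can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: `Val`, `Expr`, `compareVals`, `compareExprs`, `valChain`, and `compareDeepLeaves` are hypothetical names, and `MaxDepth` is passed as a plain parameter here instead of being read from a `cl::opt`. The point is that the expression comparator hands `Depth + 1` to the value comparator instead of restarting at zero.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-ins; these are not the LLVM Value and SCEV classes.
struct Val {
  int Opcode;
  std::vector<const Val *> Ops;
};
struct Expr {
  int Kind;                       // 0 = leaf wrapping a Val, otherwise n-ary
  const Val *Leaf;                // used only when Kind == 0
  std::vector<const Expr *> Ops;  // used only when Kind != 0
};

// Three-way compare of values; gives up and reports "equal" past MaxDepth.
int compareVals(const Val *L, const Val *R, unsigned Depth,
                unsigned MaxDepth) {
  assert(L && R);
  if (L == R)
    return 0;
  if (Depth > MaxDepth)
    return 0;
  if (L->Opcode != R->Opcode)
    return L->Opcode < R->Opcode ? -1 : 1;
  if (L->Ops.size() != R->Ops.size())
    return L->Ops.size() < R->Ops.size() ? -1 : 1;
  for (std::size_t I = 0; I != L->Ops.size(); ++I)
    if (int X = compareVals(L->Ops[I], R->Ops[I], Depth + 1, MaxDepth))
      return X;
  return 0;
}

// Three-way compare of expressions sharing the same depth budget.
int compareExprs(const Expr *L, const Expr *R, unsigned Depth,
                 unsigned MaxDepth) {
  assert(L && R);
  if (L == R)
    return 0;
  if (L->Kind != R->Kind)
    return L->Kind < R->Kind ? -1 : 1;
  if (Depth > MaxDepth)
    return 0;
  if (L->Kind == 0)  // do not "forget" the depth when switching comparators
    return compareVals(L->Leaf, R->Leaf, Depth + 1, MaxDepth);
  if (L->Ops.size() != R->Ops.size())
    return L->Ops.size() < R->Ops.size() ? -1 : 1;
  for (std::size_t I = 0; I != L->Ops.size(); ++I)
    if (int X = compareExprs(L->Ops[I], R->Ops[I], Depth + 1, MaxDepth))
      return X;
  return 0;
}

// Builds N unary values on top of a leaf with the given opcode. A deque is
// used because push_back does not invalidate pointers to earlier elements.
const Val *valChain(std::deque<Val> &Pool, unsigned N, int LeafOpcode) {
  Pool.push_back(Val{LeafOpcode, {}});
  const Val *V = &Pool.back();
  for (unsigned I = 0; I < N; ++I) {
    Pool.push_back(Val{1, {V}});
    V = &Pool.back();
  }
  return V;
}

// Compares two leaf expressions whose values differ only five levels down.
int compareDeepLeaves(unsigned MaxDepth) {
  std::deque<Val> Pool;
  const Val *A = valChain(Pool, 5, 100);
  const Val *B = valChain(Pool, 5, 200);
  Expr EA{0, A, {}}, EB{0, B, {}};
  return compareExprs(&EA, &EB, 0, MaxDepth);
}
```

With a generous limit the difference at the bottom of the value chains is found; with a small limit the comparison is cut off and the two expressions are reported as equal, which is the trade-off a depth cap accepts.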

What do you think? If you agree with the recursion depth, should it be a new parameter (as in my first implementation) or a hard-coded value as for CompareValueComplexity?

Hi Sanjoy,

I've updated the change and added a unittest. I did so because the test case reduced to 400 lines compiles in ~10 sec on a fast machine without the fix, whereas the suggested unittest shows "soft hang" behavior.

For my initial big test case the recursion depth was up to 50+ and the compilation never completed on a quite fast CPU. So I suggest setting it to 32 by default.

Hi Sanjoy, would you please check the updated diff and test?

lgtm

lib/Analysis/ScalarEvolution.cpp
690	IMO a slightly better pattern would have been to have `EqCache` be a `SmallSet` of `std::pair<PointerUnion<Value *, const SCEV *>, PointerUnion<Value *, const SCEV *>>` and to re-use the same cache in `CompareValueComplexity`. If you agree, can you do that as a followup change?

This revision is now accepted and ready to land. Nov 14 2016, 10:13 AM

dfukalov added inline comments. Nov 14 2016, 2:56 PM

lib/Analysis/ScalarEvolution.cpp
690	I'm not sure this will actually be a re-use: any correct `SCEV *` is not equal to any correct `Value *` (except nullptr values), so the combined cache will contain all of the items from both current caches, without reducing memory usage. And if so, any search in a combined cache will be the same or slower. E.g. there is an edge case: we can try to find an actual value typed `std::pair<const SCEV *, const SCEV *>` in a set that contains `std::pair<Value *, Value *>` only. Do you agree?

sanjoy added inline comments. Nov 14 2016, 3:05 PM

lib/Analysis/ScalarEvolution.cpp
690	We might win in some edge cases, since we'll persist the same value cache across multiple `SCEVUnknown` instances in the same SCEV tree. That is:

%x = ...
%y = ...
%x1 = add %x, 20
%y1 = add %y, 20

If LHS has SCEVUnknowns for %x and %x1, and RHS has SCEVUnknowns for %y and %y1, then a combined cache will save us from computing `CompareValueComplexity(%x, %y)` twice. But this is a minimal gain -- please don't block the commit on getting this right.

sanjoy added inline comments. Nov 14 2016, 3:10 PM

lib/Analysis/ScalarEvolution.cpp
690	So, for instance, persisting a `(Value *, Value *)` set for the entirety of `CompareSCEVComplexity` will also do the same thing (and will probably be mildly cleaner).
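The `%x`/`%y` scenario from the comments above can be sketched like this. It is a toy model under stated assumptions, not LLVM code: `Val`, `EqCache`, `compareVals`, and `structuralCompares` are hypothetical names, and two distinct values with the same opcode are simply treated as equal. The sketch counts how many pairs are compared structurally when the second query gets a fresh cache versus the cache left behind by the first query.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for an IR value; not llvm::Value.
struct Val {
  int Opcode;
  std::vector<const Val *> Ops;
};

using EqCache = std::set<std::pair<const Val *, const Val *>>;

// Three-way compare that remembers pairs found equal. Work counts the
// pairs that had to be compared structurally (cache hits are free).
int compareVals(const Val *L, const Val *R, EqCache &Cache, unsigned &Work) {
  assert(L && R);
  if (L == R || Cache.count({L, R}))
    return 0;
  ++Work;
  if (L->Opcode != R->Opcode)
    return L->Opcode < R->Opcode ? -1 : 1;
  if (L->Ops.size() != R->Ops.size())
    return L->Ops.size() < R->Ops.size() ? -1 : 1;
  for (std::size_t I = 0; I != L->Ops.size(); ++I)
    if (int X = compareVals(L->Ops[I], R->Ops[I], Cache, Work))
      return X;
  Cache.insert({L, R});
  return 0;
}

// Models: %x = ..., %y = ..., %x1 = add %x, 20, %y1 = add %y, 20.
// Runs the query (%x, %y) and then (%x1, %y1), and returns the total work.
unsigned structuralCompares(bool ShareCache) {
  enum { Load = 1, Const = 2, Add = 3 };
  Val X{Load, {}}, Y{Load, {}}, C20{Const, {}};
  Val X1{Add, {&X, &C20}}, Y1{Add, {&Y, &C20}};
  EqCache Shared, Fresh;
  unsigned Work = 0;
  compareVals(&X, &Y, Shared, Work);
  compareVals(&X1, &Y1, ShareCache ? Shared : Fresh, Work);
  return Work;
}
```

With separate caches the pair `(%x, %y)` is compared structurally twice, once per query; with one persistent cache the second occurrence is a lookup. As the thread notes, the saving in a case this small is minimal.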

dfukalov added inline comments. Nov 14 2016, 3:31 PM

lib/Analysis/ScalarEvolution.cpp
690	Yes, I got it. I'll try to gather statistics on my hard case to check the actual sizes of both caches (for an entire `CompareSCEVComplexity` call tree). It seems that if they turn out to be quite big and of almost the same size, it will be better to combine the caches of `(Value *, Value *)` pairs separately. What do you think about improving the caches further to cache all already estimated values, not only zeros?

sanjoy added inline comments. Nov 14 2016, 3:47 PM

lib/Analysis/ScalarEvolution.cpp
690	> What do you think about improving caches further to cache all already estimated values, not only zeros?

IIUC that won't help as much since (by design) only the last leaf query and the "path" from the last leaf query to the root query in `CompareXXXComplexity` return a non-zero value, so it sounds like less "bang for the buck" to me. The big gains come from speeding up expressions that look "almost" equal, which you've already addressed in this patch. However, if you want to do the work to cache non-zero complexity differences then I won't stop you. :)

Closed by commit rL287232: [SCEV] limit recursion depth of CompareSCEVComplexity (authored by dfukalov). Nov 17 2016, 8:17 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path: lib/Analysis/ScalarEvolution.cpp
Size: 23 lines

Diff 77322

lib/Analysis/ScalarEvolution.cpp

@@ 121 lines not shown @@ VerifySCEVMap("verify-scev-maps",
                   cl::desc("Verify no dangling value in ScalarEvolution's "
                            "ExprValueMap (slow)"));
 
 static cl::opt<unsigned> MulOpsInlineThreshold(
     "scev-mulops-inline-threshold", cl::Hidden,
     cl::desc("Threshold for inlining multiplication operands into a SCEV"),
     cl::init(1000));
 
+static cl::opt<unsigned>
+    MaxCompareDepth("scalar-evolution-max-compare-depth", cl::Hidden,
+                    cl::desc("Maximum depth of recursive compare complexity"),
+                    cl::init(10));
+
 //===----------------------------------------------------------------------===//
 // SCEV class definitions
 //===----------------------------------------------------------------------===//
 
 //===----------------------------------------------------------------------===//
 // Implementation of the SCEV class.
 //

@@ 408 lines not shown @@ CompareValueComplexity(SmallSet<std::pair<Value *, Value *>, 8> &EqCache,
 
   return 0;
 }
 
 // Return negative, zero, or positive, if LHS is less than, equal to, or greater
 // than RHS, respectively. A three-way result allows recursive comparisons to be
 // more efficient.
 static int CompareSCEVComplexity(const LoopInfo *const LI, const SCEV *LHS,
-                                 const SCEV *RHS) {
+                                 const SCEV *RHS, unsigned Depth = 0) {
   // Fast-path: SCEVs are uniqued so we can do a quick equality check.
   if (LHS == RHS)
     return 0;
 
   // Primarily, sort the SCEVs by their getSCEVType().
   unsigned LType = LHS->getSCEVType(), RType = RHS->getSCEVType();
   if (LType != RType)
     return (int)LType - (int)RType;
 
+  // Limit recursion depth
+  if (Depth > MaxCompareDepth)
+    return 0;
+
   // Aside from the getSCEVType() ordering, the particular ordering
   // isn't very important except that it's beneficial to be consistent,
   // so that (a + b) and (b + a) don't end up as different expressions.
   switch (static_cast<SCEVTypes>(LType)) {
   case scUnknown: {
     const SCEVUnknown *LU = cast<SCEVUnknown>(LHS);
     const SCEVUnknown *RU = cast<SCEVUnknown>(RHS);

@@ 28 lines not shown @@ case scAddRecExpr: {
 
     // Addrec complexity grows with operand count.
     unsigned LNumOps = LA->getNumOperands(), RNumOps = RA->getNumOperands();
     if (LNumOps != RNumOps)
       return (int)LNumOps - (int)RNumOps;
 
     // Lexicographically compare.
     for (unsigned i = 0; i != LNumOps; ++i) {
-      long X = CompareSCEVComplexity(LI, LA->getOperand(i), RA->getOperand(i));
+      long X = CompareSCEVComplexity(LI, LA->getOperand(i), RA->getOperand(i),
+                                     Depth + 1);
       if (X != 0)
         return X;
     }
 
     return 0;
   }
 
   case scAddExpr:
   case scMulExpr:
   case scSMaxExpr:
   case scUMaxExpr: {
     const SCEVNAryExpr *LC = cast<SCEVNAryExpr>(LHS);
     const SCEVNAryExpr *RC = cast<SCEVNAryExpr>(RHS);
 
     // Lexicographically compare n-ary expressions.
     unsigned LNumOps = LC->getNumOperands(), RNumOps = RC->getNumOperands();
     if (LNumOps != RNumOps)
       return (int)LNumOps - (int)RNumOps;
 
     for (unsigned i = 0; i != LNumOps; ++i) {
       if (i >= RNumOps)
         return 1;
-      long X = CompareSCEVComplexity(LI, LC->getOperand(i), RC->getOperand(i));
+      long X = CompareSCEVComplexity(LI, LC->getOperand(i), RC->getOperand(i),
+                                     Depth + 1);
       if (X != 0)
         return X;
     }
     return (int)LNumOps - (int)RNumOps;
   }
 
   case scUDivExpr: {
     const SCEVUDivExpr *LC = cast<SCEVUDivExpr>(LHS);
     const SCEVUDivExpr *RC = cast<SCEVUDivExpr>(RHS);
 
     // Lexicographically compare udiv expressions.
-    long X = CompareSCEVComplexity(LI, LC->getLHS(), RC->getLHS());
+    long X = CompareSCEVComplexity(LI, LC->getLHS(), RC->getLHS(), Depth + 1);
     if (X != 0)
       return X;
-    return CompareSCEVComplexity(LI, LC->getRHS(), RC->getRHS());
+    return CompareSCEVComplexity(LI, LC->getRHS(), RC->getRHS(), Depth + 1);
   }
 
   case scTruncate:
   case scZeroExtend:
   case scSignExtend: {
     const SCEVCastExpr *LC = cast<SCEVCastExpr>(LHS);
     const SCEVCastExpr *RC = cast<SCEVCastExpr>(RHS);
 
     // Compare cast expressions by operand.
-    return CompareSCEVComplexity(LI, LC->getOperand(), RC->getOperand());
+    return CompareSCEVComplexity(LI, LC->getOperand(), RC->getOperand(), Depth + 1);
   }
 
   case scCouldNotCompute:
     llvm_unreachable("Attempt to use a SCEVCouldNotCompute object!");
   }
   llvm_unreachable("Unknown SCEV kind!");
 }
 
 /// Given a list of SCEV objects, order them by their complexity, and group
 /// objects of the same complexity together by value. When this routine is
 /// finished, we know that any duplicates in the vector are consecutive and that
 /// complexity is monotonically increasing.
 ///
 /// Note that we go take special precautions to ensure that we get deterministic
 /// results from this routine. In other words, we don't want the results of
 /// this to depend on where the addresses of various SCEV objects happened to
 /// land in memory.
 ///
 static void GroupByComplexity(SmallVectorImpl<const SCEV *> &Ops,
                               LoopInfo *LI) {
   if (Ops.size() < 2) return; // Noop
   if (Ops.size() == 2) {
     // This is the common case, which also happens to be trivially simple.
     // Special case it.
     const SCEV *&LHS = Ops[0], *&RHS = Ops[1];
     if (CompareSCEVComplexity(LI, RHS, LHS) < 0)
       std::swap(LHS, RHS);
     return;
   }
 
   // Do the rough sort by complexity.
@@ 9,888 lines not shown @@

[SCEV] limit recursion depth of CompareSCEVComplexity (Closed, Public)
