This is an archive of the discontinued LLVM Phabricator instance.

Limit number of blocks scanned for non-local dependences
ClosedPublic

Authored by joerg on Jan 12 2016, 12:26 PM.

Details

Reviewers

Commits

rG6e1dc54f7c18: Merging rr261430: -------------------------------------------------------------…
rG36894dcfedf8: When MemoryDependenceAnalysis hits a CFG with many transparent blocks, the…
rL261430: When MemoryDependenceAnalysis hits a CFG with many transparent blocks,

Summary

When MemoryDependenceAnalysis hits a CFG with many transparent blocks, the algorithm easily degrades into quadratic memory and time complexity. The easiest example is a long chain of BBs that don't otherwise use a location. The patch introduces a limit similar to the existing instructions-per-block limit, counting the total number of blocks checked. If the limit is reached, entries are considered unknown.
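The quadratic behaviour described in the summary can be illustrated with a toy model. This is only a sketch under stated assumptions, not LLVM's code: the chain layout, `walkChain`, `totalVisits`, and `WalkResult` are hypothetical names, every block is treated as transparent, and caching is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: block i has exactly one predecessor, block i - 1, and
// block 0 is the entry block. Every block is transparent for the queried
// location, so a backward walk never finds a dependence.
struct WalkResult {
  std::size_t visited; // number of blocks popped from the worklist
  bool unknown;        // true if the walk gave up because of the limit
};

// Walk backwards from `start`, enqueueing at most `blockLimit` predecessors.
WalkResult walkChain(std::size_t start, std::size_t blockLimit) {
  std::size_t budget = blockLimit;
  std::size_t visited = 0;
  std::vector<std::size_t> worklist{start};
  while (!worklist.empty()) {
    std::size_t bb = worklist.back();
    worklist.pop_back();
    ++visited;
    if (bb == 0)
      break; // reached the entry block, nothing left to scan
    if (budget < 1)
      return {visited, true}; // budget exhausted: treat the result as unknown
    budget -= 1;
    worklist.push_back(bb - 1);
  }
  return {visited, false};
}

// Total work when one query is issued from every block of the chain.
std::size_t totalVisits(std::size_t numBlocks, std::size_t blockLimit) {
  assert(numBlocks > 0 && "chain must contain at least the entry block");
  std::size_t total = 0;
  for (std::size_t i = 0; i < numBlocks; ++i)
    total += walkChain(i, blockLimit).visited;
  return total;
}
```

In this model a chain of 1000 blocks costs 500500 block visits when the limit is never hit (the sum 1 + 2 + ... + 1000), while a limit of 100 caps the total at 95950, because each walk stops after at most 101 blocks.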

I'm not entirely sure how best to test this. The test cases for the original bug have to be huge to trigger the problem on a moderately large machine.

Diff Detail

Repository: rL LLVM

Event Timeline

joerg updated this revision to Diff 44660. Jan 12 2016, 12:26 PM

joerg retitled this revision to Limit number of blocks scanned for non-local dependences.

joerg updated this object.

joerg added a reviewer: reames.

joerg added a subscriber: llvm-commits.

This is https://llvm.org/bugs/show_bug.cgi?id=25897

Ping?

Hi Joerg,

Thanks for working on this!

Given your testcase (original bug), could you show how much time it takes for different BlockNumberLimit values? Although 100 seems reasonable at first glance, I wonder how far we can push the limit for the default value.

Copying a comment that didn't make it into Phab:

Run time in steps of 100, absolute value not necessarily stable:
2.5s 3.0s 3.3s 4.1s 4.1s 4.3s 4.8s 5.1s 5.4s 5.6s

It's not exactly surprising that the initial ramp up is mostly linear. I
simply don't know what the implications for runtime are at this point,
can't easily run LNT.
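As a rough check of the "mostly linear" remark, the quoted timings can be compared against a straight line through the first and last samples. This is only a sketch: the helper names are hypothetical, and it assumes that sample i was measured with the limit set to 100 * (i + 1), which the comment does not state explicitly.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Run times in seconds, copied from the comment above.
// Assumption: sample i corresponds to a block limit of 100 * (i + 1).
const std::vector<double> kRunTimes = {2.5, 3.0, 3.3, 4.1, 4.1,
                                       4.3, 4.8, 5.1, 5.4, 5.6};

// Average increase in run time per step of 100 blocks.
double averageStep(const std::vector<double> &t) {
  assert(t.size() >= 2 && "need at least two samples");
  return (t.back() - t.front()) / static_cast<double>(t.size() - 1);
}

// Largest gap between a sample and the straight line through the first and
// last samples; a small value suggests roughly linear growth.
double maxDeviationFromLine(const std::vector<double> &t) {
  const double step = averageStep(t);
  double worst = 0.0;
  for (std::size_t i = 0; i < t.size(); ++i) {
    const double onLine = t.front() + step * static_cast<double>(i);
    worst = std::max(worst, std::fabs(t[i] - onLine));
  }
  return worst;
}
```

For these ten samples the average step is about 0.34 s per 100 blocks, and the largest deviation from the line is about 0.57 s, at the fourth sample. Since the absolute values are described as not necessarily stable, this is indicative only.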

I'm really not sure this patch is going in the right direction. In general, we try to avoid arbitrary thresholds. Is there an algorithmic fix which can address the compile time issue?

Can you give a bit more information on the IR characteristics which are causing this?

Ping? As discussed in the associated PR, the algorithmic issues are pretty much unavoidable due to the nature of the analysis. As this fixes a real issue for the 3.8 release, I'd still like to see this go in. For trunk, the work on MemSSA likely makes this whole code obsolete with NewGVN.

Closed by commit rL261430: When MemoryDependenceAnalysis hits a CFG with many transparent blocks, (authored by joerg). Feb 20 2016, 3:29 AM

This revision was automatically updated to reflect the committed changes.

lebedev.ri mentioned this in D84609: [MemDepAnalysis] Cut-off threshold reshuffling. Jul 26 2020, 3:31 PM

lebedev.ri mentioned this in D84742: [NFCI]MemDepAnalysis] Introduce global limit on a number of instructions to be traversed during single query. Jul 28 2020, 3:37 AM

Revision Contents

Path: llvm/trunk/lib/Analysis/MemoryDependenceAnalysis.cpp

Size: 32 lines

Diff 48586

llvm/trunk/lib/Analysis/MemoryDependenceAnalysis.cpp

@@ [51 unchanged lines hidden] @@

 // Limit for the number of instructions to scan in a block.

 static cl::opt<unsigned> BlockScanLimit(
     "memdep-block-scan-limit", cl::Hidden, cl::init(100),
     cl::desc("The number of instructions to scan in a block in memory "
              "dependency analysis (default = 100)"));

+static cl::opt<unsigned> BlockNumberLimit(
+    "memdep-block-number-limit", cl::Hidden, cl::init(1000),
+    cl::desc("The number of blocks to scan during memory "
+             "dependency analysis (default = 1000)"));
+
 // Limit on the number of memdep results to process.
 static const unsigned int NumResultsLimit = 100;

 char MemoryDependenceAnalysis::ID = 0;

 // Register this pass...
 INITIALIZE_PASS_BEGIN(MemoryDependenceAnalysis, "memdep",
                       "Memory Dependence Analysis", false, true)

@@ [1,173 unchanged lines hidden] @@ bool MemoryDependenceAnalysis::getNonLocalPointerDepFromBB(

   SmallVector<std::pair<BasicBlock*, PHITransAddr>, 16> PredList;

   // Keep track of the entries that we know are sorted. Previously cached
   // entries will all be sorted. The entries we add we only sort on demand (we
   // don't insert every element into its sorted position). We know that we
   // won't get any reuse from currently inserted values, because we don't
   // revisit blocks after we insert info for them.
   unsigned NumSortedEntries = Cache->size();
+  unsigned WorklistEntries = BlockNumberLimit;
+  bool GotWorklistLimit = false;
   DEBUG(AssertSorted(*Cache));

   while (!Worklist.empty()) {
     BasicBlock *BB = Worklist.pop_back_val();

     // If we do process a large number of blocks it becomes very expensive and
     // likely it isn't worth worrying about
     if (Result.size() > NumResultsLimit) {

@@ [62 unchanged lines hidden] @@ if (!Pointer.NeedsPHITranslationFromBlock(BB)) {

         if (InsertRes.first->second != Pointer.getAddr()) {
           // Make sure to clean up the Visited map before continuing on to
           // PredTranslationFailure.
           for (unsigned i = 0; i < NewBlocks.size(); i++)
             Visited.erase(NewBlocks[i]);
           goto PredTranslationFailure;
         }
       }
+      if (NewBlocks.size() > WorklistEntries) {
+        // Make sure to clean up the Visited map before continuing on to
+        // PredTranslationFailure.
+        for (unsigned i = 0; i < NewBlocks.size(); i++)
+          Visited.erase(NewBlocks[i]);
+        GotWorklistLimit = true;
+        goto PredTranslationFailure;
+      }
+      WorklistEntries -= NewBlocks.size();
       Worklist.append(NewBlocks.begin(), NewBlocks.end());
       continue;
     }

     // We do need to do phi translation, if we know ahead of time we can't phi
     // translate this value, don't even try.
     if (!Pointer.IsPotentiallyPHITranslatable())
       goto PredTranslationFailure;

@@ [129 unchanged lines hidden] @@ PredTranslationFailure:

     // If nothing works, mark the pointer as unknown.
     //
     // If this is the magic first block, return this as a clobber of the whole
     // incoming value. Since we can't phi translate to one of the predecessors,
     // we have to bail out.
     if (SkipFirstBlock)
       return true;

-    for (NonLocalDepInfo::reverse_iterator I = Cache->rbegin(); ; ++I) {
-      assert(I != Cache->rend() && "Didn't find current block??");
-      if (I->getBB() != BB)
+    bool foundBlock = false;
+    for (NonLocalDepEntry &I: llvm::reverse(*Cache)) {
+      if (I.getBB() != BB)
         continue;

-      assert((I->getResult().isNonLocal() || !DT->isReachableFromEntry(BB)) &&
+      assert((GotWorklistLimit || I.getResult().isNonLocal() || \
+              !DT->isReachableFromEntry(BB)) &&
              "Should only be here with transparent block");
-      I->setResult(MemDepResult::getUnknown());
-      Result.push_back(NonLocalDepResult(I->getBB(), I->getResult(),
+      foundBlock = true;
+      I.setResult(MemDepResult::getUnknown());
+      Result.push_back(NonLocalDepResult(I.getBB(), I.getResult(),
                                          Pointer.getAddr()));
       break;
     }
+    (void)foundBlock;
+    assert((foundBlock || GotWorklistLimit) && "Current block not in cache?");
   }

   // Okay, we're done now. If we added new values to the cache, re-sort it.
   SortNonLocalDepInfoCache(*Cache, NumSortedEntries);
   DEBUG(AssertSorted(*Cache));
   return false;
 }
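
The budget accounting that the patch adds around `WorklistEntries` and `GotWorklistLimit` can be sketched in isolation. This is a simplified stand-in, not the LLVM implementation: `QueryState` and `enqueuePreds` are hypothetical names, blocks are plain integers, and the caller is assumed to have already recorded the new blocks in `visited`, as the surrounding loop does in the patch.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical per-query state; block IDs are plain ints in this sketch.
struct QueryState {
  std::vector<int> worklist;
  std::set<int> visited;
  unsigned remaining;    // plays the role of WorklistEntries
  bool hitLimit = false; // plays the role of GotWorklistLimit

  explicit QueryState(unsigned budget) : remaining(budget) {}
};

// Try to enqueue a batch of newly discovered predecessor blocks.
// If the batch does not fit into the remaining budget, undo the caller's
// `visited` entries for the batch, record that the limit was hit, and
// return false so the caller can take its failure path.
bool enqueuePreds(QueryState &q, const std::vector<int> &newBlocks) {
  if (newBlocks.size() > q.remaining) {
    for (int bb : newBlocks)
      q.visited.erase(bb);
    q.hitLimit = true;
    return false;
  }
  q.remaining -= static_cast<unsigned>(newBlocks.size());
  q.worklist.insert(q.worklist.end(), newBlocks.begin(), newBlocks.end());
  return true;
}
```

Checking the whole batch before enqueueing anything keeps the state consistent on failure: either all predecessors of a block are scheduled, or none are and their `visited` entries are rolled back.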