This is an archive of the discontinued LLVM Phabricator instance.

[LVI] Switch from BFS to DFS exploration strategy
ClosedPublic

Authored by reames on Dec 30 2016, 7:34 PM.

Details

Reviewers

chandlerc
wmi
dberlin
nicholas
sanjoy
hfinkel

Commits

rGc80bd0486d1b: [LVI] Switch from BFS to DFS exploration order
rL294264: [LVI] Switch from BFS to DFS exploration order

Summary

This patch changes the order in which LVI explores previously unexplored paths.

Previously, the code used a BFS strategy where each unexplored input was added to the search queue before any of them were explored. This causes all inputs to be explored before returning to re-evaluate the merge point (non-local or phi node), which does redundant work if one of the inputs to the merge is found to be overdefined (i.e. unanalysable). If any input is overdefined, the result of the merge will be too, regardless of the values of the other inputs.

The new code uses a DFS strategy where we re-evaluate the merge after evaluating each input. If we discover an overdefined input, we immediately return without exploring other inputs.
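
To make the difference concrete, here is a minimal standalone sketch of the two merge strategies. It is not LVI code: the three-point `Lattice` enum, `mergeIn`, and both `mergeExplore*` functions are hypothetical stand-ins, and "exploring" an input is modelled simply as counting it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical three-point lattice standing in for LVILatticeVal.
enum class Lattice { Undefined, Known, Overdefined };

// Overdefined absorbs everything; Undefined is the identity.
// (Simplification: all Known values are treated as compatible.)
Lattice mergeIn(Lattice A, Lattice B) {
  if (A == Lattice::Overdefined || B == Lattice::Overdefined)
    return Lattice::Overdefined;
  return A == Lattice::Undefined ? B : A;
}

struct MergeStats {
  Lattice Result;
  std::size_t InputsExplored;
};

// Old, BFS-like order: every input is explored before the merge is evaluated.
MergeStats mergeExploreAllFirst(const std::vector<Lattice> &Inputs) {
  MergeStats S{Lattice::Undefined, Inputs.size()};
  for (Lattice In : Inputs)
    S.Result = mergeIn(S.Result, In);
  return S;
}

// New, DFS-like order: explore one input, re-evaluate the merge, and stop as
// soon as the result is overdefined.
MergeStats mergeExploreOnDemand(const std::vector<Lattice> &Inputs) {
  MergeStats S{Lattice::Undefined, 0};
  for (Lattice In : Inputs) {
    ++S.InputsExplored;
    S.Result = mergeIn(S.Result, In);
    if (S.Result == Lattice::Overdefined)
      break; // the remaining inputs cannot change the answer
  }
  return S;
}
```

Both functions return the same merged value; only the number of inputs touched differs, which is the redundant work described above.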

I don't believe this patch changes the observed results produced by LVI, but the interactions between CVP/JT, LVI, and the partial cache reset that happens on jump-threading are complicated enough that I can't state that confidently. I can state that on at least one example (pr10584), this variant is substantially faster (80% improvement in compile time).

I also can't find any clear reason why the original code uses BFS, despite it being clearly intentional. Does anyone know the history here?

Diff Detail

Event Timeline

reames updated this revision to Diff 82754. Dec 30 2016, 7:34 PM

reames retitled this revision to [LVI] Switch from BFS to DFS exploration strategy.

reames updated this object.

reames added reviewers: nicholas, sanjoy, chandlerc, hfinkel, wmi, dberlin.

Herald added a subscriber: mcrosier. Dec 30 2016, 7:34 PM

reames added a subscriber: llvm-commits. Dec 30 2016, 7:34 PM

DFS is clearly the optimal exploration order for this problem.
As mentioned on the bug, the optimal solution is for this to do a DFS backwards, and for callers to call it in RPO forwards.

That will ensure the minimal work per call.
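
For readers unfamiliar with the term, a reverse post-order (RPO) over a CFG can be sketched as below. This is a plain illustration rather than LLVM's own traversal utilities; the `CFG` alias, `postOrderVisit`, and `reversePostOrder` are hypothetical names, and blocks are modelled as integer indices.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical CFG: Succs[B] lists the successor block indices of block B.
using CFG = std::vector<std::vector<int>>;

// Depth-first walk that records a block only after all of its successors.
static void postOrderVisit(const CFG &Succs, int B, std::vector<bool> &Seen,
                           std::vector<int> &Out) {
  Seen[B] = true;
  for (int S : Succs[B])
    if (!Seen[S])
      postOrderVisit(Succs, S, Seen, Out);
  Out.push_back(B);
}

// Reversing the post-order yields an order in which, ignoring back edges,
// every block comes after all of its predecessors.
std::vector<int> reversePostOrder(const CFG &Succs, int Entry) {
  std::vector<bool> Seen(Succs.size(), false);
  std::vector<int> Order;
  postOrderVisit(Succs, Entry, Seen, Order);
  std::reverse(Order.begin(), Order.end());
  return Order;
}
```

A caller that queries blocks in this order has usually already computed a block's predecessors by the time it reaches the block, so a backwards DFS from the query has little left to explore.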

Once you commit this, i will test and update callers like CVP, etc to iterate in the correct order.

It would be nice if you had a way to test that this changes nothing. If anything, it actually should improve results, since the optimal lattice values for loops and irreducible control flow may only be achievable with proper iteration order.

My usual plan is to save-temps llvm into bc files, and then generate results before and after.

Assuming you do *something* to verify we aren't getting worse answers, i think this can go in.

This revision is now accepted and ready to land. Feb 2 2017, 11:07 AM

(Also, just to record for history, this produces a 4-10x speedup on value propagation pretty much everywhere i've tried it, which is not surprising, since the old exploration strategy is almost optimally bad :P)

I have one case in pkgsrc (gmic) which currently hits the 2GB VA limit for build processes. Taking the problematic example and comparing before/after on Linux I get:
before: 1.9GB max RSS
after: 0.9GB max RSS

I.e. in this case, it effectively cuts memory use in half. That's quite significant and something I would like to see in 4.0.

FYI: i ran this on all LLVM files for correlatedvalueprop and jumpthreading.
As expected, it produces strictly better answers.
For CVP, it is significantly better at propagating non-nullness (but otherwise no code-differences).
For jump threading, no differences are found.

Closed by commit rL294264: [LVI] Switch from BFS to DFS exploration order (authored by reames). Feb 6 2017, 4:36 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path: lib/Analysis/LazyValueInfo.cpp
Size: 30 lines

Diff 82754

lib/Analysis/LazyValueInfo.cpp

[833 lines not shown; context: if (BB == &BB->getParent()->getEntryBlock()) {]

     } else {
       Result = LVILatticeVal::getOverdefined();
     }
     BBLV = Result;
     return true;
   }

   // Loop over all of our predecessors, merging what we know from them into
-  // result.
-  bool EdgesMissing = false;
+  // result. If we encounter an unexplored predecessor, we eagerly explore it
+  // in a depth first manner. In practice, this has the effect of discovering
+  // paths we can't analyze eagerly without spending compile times analyzing
+  // other paths. This heuristic benefits from the fact that predecessors are
+  // frequently arranged such that dominating ones come first and we quickly
+  // find a path to function entry. TODO: We should consider explicitly
+  // canonicalizing to make this true rather than relying on this happy
+  // accident.
   for (pred_iterator PI = pred_begin(BB), E = pred_end(BB); PI != E; ++PI) {
     LVILatticeVal EdgeResult;
-    EdgesMissing |= !getEdgeValue(Val, *PI, BB, EdgeResult);
-    if (EdgesMissing)
-      continue;
+    if (!getEdgeValue(Val, *PI, BB, EdgeResult))
+      // Explore that input, then return here
+      return false;

     Result.mergeIn(EdgeResult, DL);

     // If we hit overdefined, exit early. The BlockVals entry is already set
     // to overdefined.
     if (Result.isOverdefined()) {
       DEBUG(dbgs() << " compute BB '" << BB->getName()
                    << "' - overdefined because of pred (non local).\n");
       // Before giving up, see if we can prove the pointer non-null local to
       // this particular block.
       if (Val->getType()->isPointerTy() &&
           isObjectDereferencedInBlock(Val, BB)) {
         PointerType *PTy = cast<PointerType>(Val->getType());
         Result = LVILatticeVal::getNot(ConstantPointerNull::get(PTy));
       }

       BBLV = Result;
       return true;
     }
   }
-  if (EdgesMissing)
-    return false;

   // Return the merged value, which is more precise than 'overdefined'.
   assert(!Result.isOverdefined());
   BBLV = Result;
   return true;
 }

 bool LazyValueInfoImpl::solveBlockValuePHINode(LVILatticeVal &BBLV,
                                                PHINode *PN, BasicBlock *BB) {
   LVILatticeVal Result; // Start Undefined.

   // Loop over all of our predecessors, merging what we know from them into
-  // result.
-  bool EdgesMissing = false;
+  // result. See the comment about the chosen traversal order in
+  // solveBlockValueNonLocal; the same reasoning applies here.
   for (unsigned i = 0, e = PN->getNumIncomingValues(); i != e; ++i) {
     BasicBlock *PhiBB = PN->getIncomingBlock(i);
     Value *PhiVal = PN->getIncomingValue(i);
     LVILatticeVal EdgeResult;
     // Note that we can provide PN as the context value to getEdgeValue, even
     // though the results will be cached, because PN is the value being used as
     // the cache key in the caller.
-    EdgesMissing |= !getEdgeValue(PhiVal, PhiBB, BB, EdgeResult, PN);
-    if (EdgesMissing)
-      continue;
+    if (!getEdgeValue(PhiVal, PhiBB, BB, EdgeResult, PN))
+      // Explore that input, then return here
+      return false;

     Result.mergeIn(EdgeResult, DL);

     // If we hit overdefined, exit early. The BlockVals entry is already set
     // to overdefined.
     if (Result.isOverdefined()) {
       DEBUG(dbgs() << " compute BB '" << BB->getName()
                    << "' - overdefined because of pred (local).\n");

       BBLV = Result;
       return true;
     }
   }
-  if (EdgesMissing)
-    return false;

   // Return the merged value, which is more precise than 'overdefined'.
   assert(!Result.isOverdefined() && "Possible PHI in entry block?");
   BBLV = Result;
   return true;
 }

 static LVILatticeVal getValueFromCondition(Value *Val, Value *Cond,

[865 lines not shown]
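
The "return false, explore that input, then return here" pattern in the diff relies on a caller that keeps a stack of pending work. The following toy solver sketches that interaction under stated assumptions: `ToySolver`, `solveOne`, `solve`, and the two-value `Val` enum are all hypothetical, the graph must be acyclic (real LVI has to cope with cycles, which this sketch does not), and none of it reflects the actual LazyValueInfoImpl data structures.

```cpp
#include <map>
#include <vector>

enum class Val { Known, Overdefined };

// Hypothetical toy graph: Inputs[N] lists the nodes merged at N; a node with
// no inputs is a leaf whose value is LeafVal[N]. Assumes no cycles.
struct ToySolver {
  std::vector<std::vector<int>> Inputs;
  std::vector<Val> LeafVal;
  std::map<int, Val> Cache; // nodes solved so far
  std::vector<int> Stack;   // pending work, most recent on top

  // Returns true if N was solved, false if an unsolved input was pushed.
  bool solveOne(int N) {
    if (Inputs[N].empty()) {
      Cache[N] = LeafVal[N];
      return true;
    }
    for (int In : Inputs[N]) {
      auto It = Cache.find(In);
      if (It == Cache.end()) {
        Stack.push_back(In); // explore that input, then come back to N
        return false;
      }
      if (It->second == Val::Overdefined) {
        Cache[N] = Val::Overdefined; // early exit: later inputs never visited
        return true;
      }
    }
    Cache[N] = Val::Known;
    return true;
  }

  Val solve(int Root) {
    Stack.push_back(Root);
    while (!Stack.empty())
      if (solveOne(Stack.back()))
        Stack.pop_back(); // only pop when the top node was actually solved
    return Cache.at(Root);
  }
};
```

Because `solveOne` pushes at most one missing input before returning, the merge at `N` is re-evaluated after each input is solved, and an overdefined input ends the walk without touching its siblings.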