This is an archive of the discontinued LLVM Phabricator instance.


Differential D92066

[LAA] Relax restrictions on early exits in loop structure
Closed · Public

Authored by reames on Nov 24 2020, 6:23 PM.


Details

Reviewers

fhahn
Ayal

Commits

rGf5fe8493e5ac: [LAA] Relax restrictions on early exits in loop structure

Summary

This is a preparation patch for supporting multiple exits in the loop vectorizer; by itself, it should be purely NFC. This patch moves the loop structure checks from LAA to their respective consumers (where duplicates don't already exist).

Why do this? Well, after auditing the code, I can't actually find anything in LAA itself which relies on having all instructions within a loop execute an equal number of times. This patch simply makes this explicit so that if one consumer - say LV in the near future (hopefully) - wants to handle a broader class of loops, it can do so.
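One way to see why the patch should be NFC for LoopLoadElimination: its new guard (!L->isRotatedForm() || !L->getExitingBlock()) rejects the same loops as the two checks removed from LAA, given LAA's retained single-backedge requirement. The standalone C++17 sketch below models this on a toy CFG. ToyLoop, oldLAAAccepts and newLLEGuard are hypothetical names, not LLVM APIs, and the model assumes isRotatedForm() means "the loop has a latch that is also an exiting block".

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <utility>
#include <vector>

// Toy model of a loop CFG (hypothetical; not llvm::Loop). Blocks are ints.
struct ToyLoop {
  std::set<int> Blocks;                   // blocks inside the loop
  int Header;                             // the loop header
  std::vector<std::pair<int, int>> Edges; // CFG edges as (From, To)

  bool contains(int B) const { return Blocks.count(B) != 0; }

  // The unique in-loop predecessor of the header, or nullopt if there is
  // none or more than one (cf. Loop::getLoopLatch()).
  std::optional<int> latch() const {
    assert(contains(Header) && "header must be a loop block");
    std::optional<int> Latch;
    for (auto [From, To] : Edges) {
      if (To != Header || !contains(From))
        continue;
      if (Latch && *Latch != From)
        return std::nullopt;
      Latch = From;
    }
    return Latch;
  }

  // True if block B has an edge that leaves the loop.
  bool isExiting(int B) const {
    for (auto [From, To] : Edges)
      if (From == B && !contains(To))
        return true;
    return false;
  }

  // The single exiting block, or nullopt if zero or several blocks exit
  // (cf. Loop::getExitingBlock()).
  std::optional<int> exitingBlock() const {
    std::optional<int> Exiting;
    for (int B : Blocks) {
      if (!isExiting(B))
        continue;
      if (Exiting)
        return std::nullopt;
      Exiting = B;
    }
    return Exiting;
  }

  // Assumed meaning of Loop::isRotatedForm(): the latch is also exiting.
  bool isRotatedForm() const {
    std::optional<int> Latch = latch();
    return Latch && isExiting(*Latch);
  }
};

// The two checks removed from LoopAccessInfo::canAnalyzeLoop():
// a single exiting block, and that block is the latch.
bool oldLAAAccepts(const ToyLoop &L) {
  std::optional<int> Exiting = L.exitingBlock();
  return Exiting && Exiting == L.latch();
}

// The guard LoopLoadElimination now applies before asking LAA.
bool newLLEGuard(const ToyLoop &L) {
  return L.isRotatedForm() && L.exitingBlock().has_value();
}
```

For a bottom-tested loop (header 0, latch 1 exiting to block 9) both predicates accept; for a top-tested loop (only the header exits) and for a loop where both header and latch exit, both reject. The equivalence holds because a single exiting block that coincides with an exiting latch is the latch.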

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

reames created this revision · Nov 24 2020, 6:23 PM

Herald added a project: Restricted Project · Nov 24 2020, 6:23 PM

Herald added subscribers: dantrushin, bollu, hiraditya, mcrosier.

reames requested review of this revision · Nov 24 2020, 6:23 PM

Harbormaster completed remote builds in B80032: Diff 307490 · Nov 24 2020, 7:11 PM

ping

This looks good to me, adding some minor comments. Thanks!

> I can't actually find anything in LAA itself which relies on having all instructions within a loop execute an equal number of times.

Well, LAA's RuntimePointerChecking::insert() relies on getBackedgeTakenCount() to provide "the" trip count, which may vary for distinct instructions if the loop is not bottom-tested. But it does provide an upper bound for all instructions, i.e., a safe, conservative value. In other words, a slightly tighter runtime check could potentially be provided for pointers that are accessed in one fewer iteration, namely those appearing 'below' the exiting block.

> (where duplicates don't already exist)

LV indeed checks for single-exit-at-bottom by itself, in LVL.canVectorize*(). Curious to see this restriction being lifted! It is somewhat related to "massaging" inner loops when vectorizing an outer loop.

Inline comments (Ayal):

llvm/lib/Analysis/LoopAccessAnalysis.cpp
Line 1789: Note: these DEBUG messages get lost.
llvm/lib/Transforms/Scalar/LoopDistribute.cpp
Line 674: This existing !getExitBlock() check indeed saves us from introducing an additional !getExitingBlock() check; worth a comment, here and/or in LoopInfo? Note that L may have a single exit block B and multiple exiting blocks, all jumping to B. But getExitBlock() returns null for such an L, in contrast to getUniqueExitBlock(), which returns B. I.e., a non-null getExitBlock() does imply a non-null getExitingBlock().
Line 680: May be worth noting in the patch summary that some ORE and LAA debug messages are added and dropped, respectively.
Lines 683–684: Hoist or remove the above comment?
llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp
Line 635: Does this (TODO?) comment imply that this limitation can or should be dropped, now or in the future?
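The exit-block distinction in the LoopDistribute comment can be modeled on a toy CFG. The sketch assumes the semantics the comment implies: getExitBlocks() yields one entry per edge leaving the loop, getExitBlock() is non-null only when there is exactly one such entry, getUniqueExitBlock() collapses duplicates, and getExitingBlock() returns the single block that has an exit edge. ToyLoopCFG and its methods are hypothetical stand-ins, not LLVM's LoopBase.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <utility>
#include <vector>

// Toy loop CFG (hypothetical; not llvm::LoopBase). Blocks are ints.
struct ToyLoopCFG {
  std::set<int> Blocks;                   // blocks inside the loop
  std::vector<std::pair<int, int>> Edges; // CFG edges as (From, To)

  bool contains(int B) const { return Blocks.count(B) != 0; }

  // One entry per edge leaving the loop, so an exit block reached from two
  // exiting blocks is listed twice (assumed getExitBlocks() behavior).
  std::vector<int> exitBlocks() const {
    assert(!Blocks.empty() && "a loop has at least a header");
    std::vector<int> Exits;
    for (auto [From, To] : Edges)
      if (contains(From) && !contains(To))
        Exits.push_back(To);
    return Exits;
  }

  // cf. getExitBlock(): non-empty only if exactly one edge leaves the loop.
  std::optional<int> exitBlock() const {
    std::vector<int> Exits = exitBlocks();
    if (Exits.size() == 1)
      return Exits.front();
    return std::nullopt;
  }

  // cf. getUniqueExitBlock(): duplicate exit blocks collapse into one.
  std::optional<int> uniqueExitBlock() const {
    std::vector<int> Exits = exitBlocks();
    std::set<int> Unique(Exits.begin(), Exits.end());
    if (Unique.size() == 1)
      return *Unique.begin();
    return std::nullopt;
  }

  // cf. getExitingBlock(): the single in-loop block with an exit edge.
  std::optional<int> exitingBlock() const {
    std::set<int> Exiting;
    for (auto [From, To] : Edges)
      if (contains(From) && !contains(To))
        Exiting.insert(From);
    if (Exiting.size() == 1)
      return *Exiting.begin();
    return std::nullopt;
  }
};
```

With header 0 and latch 1 both branching to block 9, exitBlock() is empty while uniqueExitBlock() is 9, and exitingBlock() is empty as well. With a single exit edge from block 1, exitBlock() is 9 and exitingBlock() is 1. Since a single exit edge has a single source, a non-empty exitBlock() always implies a non-empty exitingBlock().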

This revision is now accepted and ready to land · Dec 12 2020, 3:23 PM

LGTM, thanks! It looks like there are some formatting issues; please reformat before landing.

In D92066#2450541, @Ayal wrote:

>> I can't actually find anything in LAA itself which relies on having all instructions within a loop execute an equal number of times.

> Well, LAA's RuntimePointerChecking::insert() relies on getBackedgeTakenCount() to provide "the" trip count, which may vary for distinct instructions if the loop is not bottom-tested. But it does provide an upper bound for all instructions, i.e., a safe, conservative value. In other words, a slightly tighter runtime check could potentially be provided for pointers that are accessed in one fewer iteration, namely those appearing 'below' the exiting block.

That matches my understanding as well. The current approach should be conservatively correct, with potential for making LAA smarter when it comes to reasoning about dependences only along certain exit paths in the future.
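The trip-count point above can be illustrated with a toy top-tested loop, for (i = 0; i < N; ++i): the header, which is the only exiting block, runs BTC + 1 times, while the body below it runs BTC times. Assuming, as the discussion describes, that the runtime-check bound is derived from the backedge-taken count so that it spans iterations 0 through BTC, that bound is exact for an access in the header and one element too wide for an access in the body. This is a standalone simulation with hypothetical names (AccessLog, runTopTested, boundFromBTC, allWithin), not LAA's RuntimePointerChecking code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy simulation (hypothetical; not LAA code) of the top-tested loop
//   for (i = 0; i < N; ++i) body;
// The header evaluates i < N and is the only exiting block; the body runs
// below it. Each records the index i it touches.
struct AccessLog {
  std::vector<std::int64_t> HeaderIdx; // indices touched by a header access
  std::vector<std::int64_t> BodyIdx;   // indices touched by a body access
  std::int64_t BackedgesTaken = 0;     // what SCEV would call the BTC
};

AccessLog runTopTested(std::int64_t N) {
  assert(N >= 0 && "toy model expects a non-negative trip count");
  AccessLog Log;
  for (std::int64_t I = 0;; ++I) {
    Log.HeaderIdx.push_back(I); // the header executes on every entry
    if (!(I < N))
      break;                    // exit from the header
    Log.BodyIdx.push_back(I);   // the body runs only if we did not exit
    ++Log.BackedgesTaken;       // latch -> header
  }
  return Log;
}

// Inclusive range of iterations a BTC-derived bound covers: 0..BTC.
struct IterRange {
  std::int64_t Lo, Hi;
};

IterRange boundFromBTC(std::int64_t BTC) { return {0, BTC}; }

// True if every recorded index lies inside R.
bool allWithin(const std::vector<std::int64_t> &Idx, IterRange R) {
  for (std::int64_t V : Idx)
    if (V < R.Lo || V > R.Hi)
      return false;
  return true;
}
```

For N = 4 the simulation takes 4 backedges; the header touches indices 0 to 4 and the body 0 to 3, so both fit the bound [0, 4], with one index of slack for the body. That slack is the "slightly tighter runtime check" left on the table, and why the current approach stays conservatively correct.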

reames added inline comments · Dec 14 2020, 12:29 PM

llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp
Line 635: I was thinking of it more as explaining why a random check exists. Someone interested could certainly relax this, but I have no intention of pursuing it.

This revision was landed with ongoing or failed builds · Dec 14 2020, 12:44 PM

Closed by commit rGf5fe8493e5ac: [LAA] Relax restrictions on early exits in loop structure (authored by reames).

This revision was automatically updated to reflect the committed changes.

reames added a commit: rGf5fe8493e5ac: [LAA] Relax restrictions on early exits in loop structure.

Revision Contents

Path                                                  Size
llvm/lib/Analysis/LoopAccessAnalysis.cpp              20 lines
llvm/lib/Transforms/Scalar/LoopDistribute.cpp         4 lines
llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp    3 lines
llvm/lib/Transforms/Utils/LoopVersioning.cpp          5 lines

Diff 311674

llvm/lib/Analysis/LoopAccessAnalysis.cpp

... 1,775 unchanged lines not shown; nearest context line: bool LoopAccessInfo::canAnalyzeLoop() {
   if (TheLoop->getNumBackEdges() != 1) {
     LLVM_DEBUG(
         dbgs() << "LAA: loop control flow is not understood by analyzer\n");
     recordAnalysis("CFGNotUnderstood")
         << "loop control flow is not understood by analyzer";
     return false;
   }

-  // We must have a single exiting block.
-  if (!TheLoop->getExitingBlock()) {
-    LLVM_DEBUG(
-        dbgs() << "LAA: loop control flow is not understood by analyzer\n");
-    recordAnalysis("CFGNotUnderstood")
-        << "loop control flow is not understood by analyzer";
-    return false;
-  }
-
-  // We only handle bottom-tested loops, i.e. loop in which the condition is
-  // checked at the end of each iteration. With that we can assume that all
-  // instructions in the loop are executed the same number of times.
-  if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {
-    LLVM_DEBUG(
-        dbgs() << "LAA: loop control flow is not understood by analyzer\n");
-    recordAnalysis("CFGNotUnderstood")
-        << "loop control flow is not understood by analyzer";
-    return false;
-  }
-
   // ScalarEvolution needs to be able to find the exit count.
   const SCEV *ExitCount = PSE->getBackedgeTakenCount();
   if (isa<SCEVCouldNotCompute>(ExitCount)) {
     recordAnalysis("CantComputeNumberOfIterations")
         << "could not determine number of loop iterations";
     LLVM_DEBUG(dbgs() << "LAA: SCEV could not compute the loop exit count.\n");
     return false;
   }
... 520 unchanged lines not shown

llvm/lib/Transforms/Scalar/LoopDistribute.cpp

... 664 unchanged lines not shown; nearest context line: public:
   /// Try to distribute an inner-most loop.
   bool processLoop(std::function<const LoopAccessInfo &(Loop &)> &GetLAA) {
     assert(L->isInnermost() && "Only process inner loops.");

     LLVM_DEBUG(dbgs() << "\nLDist: In \""
                       << L->getHeader()->getParent()->getName()
                       << "\" checking " << *L << "\n");

+    // Having a single exit block implies there's also one exiting block.
     if (!L->getExitBlock())
       return fail("MultipleExitBlocks", "multiple exit blocks");
     if (!L->isLoopSimplifyForm())
       return fail("NotLoopSimplifyForm",
                   "loop is not in loop-simplify form");
+    if (!L->isRotatedForm())
+      return fail("NotBottomTested", "loop is not bottom tested");

     BasicBlock *PH = L->getLoopPreheader();

-    // LAA will check that we only have a single exiting block.
     LAI = &GetLAA(*L);

     // Currently, we only distribute to isolate the part of the loop with
     // dependence cycles to enable partial vectorization.
     if (LAI->canVectorizeMemory())
       return fail("MemOpsCanBeVectorized",
                   "memory operations are safe for vectorization");

     auto *Dependences = LAI->getDepChecker().getDependences();
... 397 unchanged lines not shown

llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp

... 626 unchanged lines not shown; nearest context line: for (Loop *L : depth_first(TopLevelLoop)) {
       Changed |= simplifyLoop(L, &DT, &LI, SE, AC, /*MSSAU*/ nullptr, false);
       // We only handle inner-most loops.
       if (L->isInnermost())
         Worklist.push_back(L);
     }

   // Now walk the identified inner loops.
   for (Loop *L : Worklist) {
+    // Match historical behavior
+    if (!L->isRotatedForm() || !L->getExitingBlock())
+      continue;
     // The actual work is performed by LoadEliminationForLoop.
     LoadEliminationForLoop LEL(L, &LI, GetLAI(*L), &DT, BFI, PSI);
     Changed |= LEL.processLoop();
   }
   return Changed;
 }

 namespace {
... 94 unchanged lines not shown

llvm/lib/Transforms/Utils/LoopVersioning.cpp

... 263 unchanged lines not shown; nearest context line: for (Loop *TopLevelLoop : *LI)
     for (Loop *L : depth_first(TopLevelLoop))
       // We only handle inner-most loops.
       if (L->isInnermost())
         Worklist.push_back(L);

   // Now walk the identified inner loops.
   bool Changed = false;
   for (Loop *L : Worklist) {
+    if (!L->isLoopSimplifyForm() || !L->isRotatedForm() ||
+        !L->getExitingBlock())
+      continue;
     const LoopAccessInfo &LAI = GetLAA(*L);
-    if (L->isLoopSimplifyForm() && !LAI.hasConvergentOp() &&
+    if (!LAI.hasConvergentOp() &&
         (LAI.getNumRuntimePointerChecks() ||
          !LAI.getPSE().getUnionPredicate().isAlwaysTrue())) {
       LoopVersioning LVer(LAI, LAI.getRuntimePointerChecking()->getChecks(), L,
                           LI, DT, SE);
       LVer.versionLoop();
       LVer.annotateLoopWithNoAlias();
       Changed = true;
     }
... 84 unchanged lines not shown