This is an archive of the discontinued LLVM Phabricator instance.

Differential D21041

[GVN] PRE can't hoist loads across calls in general.
Needs Review, Public

Authored by eli.friedman on Jun 6 2016, 2:19 PM.

Details

Reviewers

chandlerc
dberlin
reames
sanjoy

Summary

Issue exposed by noalias or more aggressive alias analysis.

I'm not particularly happy with the extra loop this patch adds, but I'm
not sure how to go about fixing it.

I'm also not very happy with the use of mayHaveSideEffects; we only care
specifically about control flow here.

Event Timeline

eli.friedman updated this revision to Diff 59784. Jun 6 2016, 2:19 PM

eli.friedman retitled this revision to [GVN] PRE can't hoist loads across calls in general.

eli.friedman updated this object.

eli.friedman added reviewers: chandlerc, dberlin, reames, sanjoy.

eli.friedman added a subscriber: llvm-commits.

I'm very concerned about adding these loops everywhere that are N^2 and
require checking literally every instruction in every block to figure
things out.

Why are we not just adding edges to an exit block or something that the CFG
based algorithms will naturally see as a hoist blocker?

It probably doesn't make sense to turn every call instruction into an invoke or equivalent; that would cause the size of the IR to explode, and make the IR more difficult to use for passes which don't actually care about "trivial" edges (including most of GVN itself).

It might be possible to add some sort of sub-block abstraction over basic blocks... for example:

block:
  call void @a() ; sub-block 0
  %y = add i32 %x, 1 ; sub-block 1
  call void @b() ; sub-block 1
  ret i32 %y ; sub-block 2

Then we provide some sort of sub-block tree that includes edges to trivial unwind blocks.

Assuming we have such a thing, the PRE algorithm can iterate over it to behave correctly. (I'm assuming MemoryDependenceAnalysis itself wouldn't use it because of the overhead involved.)
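The sub-block numbering in the example above can be sketched in a few lines. This is a simplified model rather than LLVM code: `Inst`, `numberSubBlocks`, `sameSubBlock`, and the `MayNotTransfer` flag are hypothetical names, and the flag stands in for whatever query decides that a call may throw, exit, or never return.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR instruction; only the property that
// matters for hoisting is modelled.
struct Inst {
  std::string Text;
  bool MayNotTransfer; // true for calls that may throw, exit, or loop forever
};

// Assign each instruction a sub-block index. A new sub-block starts right
// after any instruction that might not pass control to its successor.
std::vector<unsigned> numberSubBlocks(const std::vector<Inst> &Block) {
  std::vector<unsigned> Index;
  Index.reserve(Block.size());
  unsigned Current = 0;
  for (const Inst &I : Block) {
    Index.push_back(Current);
    if (I.MayNotTransfer)
      ++Current;
  }
  return Index;
}

// Moving the instruction at position From up to just before position To
// crosses the instructions in [To, From). That is safe in this model only
// if both positions carry the same sub-block index.
bool sameSubBlock(const std::vector<unsigned> &Index, std::size_t To,
                  std::size_t From) {
  assert(To <= From && From < Index.size());
  return Index[To] == Index[From];
}
```

Applied to the four-instruction block above, with both calls flagged, this yields the indices 0, 1, 1, 2. Computing the vector on the fly is the same linear walk as the loop in the patch; the open question in the thread is where such a table would live and how it would stay valid.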

The question, of course, is how exactly you would implement this. If you try to compute it on the fly, it boils down to exactly the same loop I've already written. If you try to maintain it as a side-table, you now have a gigantic hashtable containing every instruction in the function, and modifying the IR requires special sub-block-aware methods to insert, remove, or move instructions. Attaching it to the IR itself adds an overhead of at least one pointer per Instruction, plus extra overhead for passes which don't care.

One possibility is writing a "strong post dominance" analysis pass that was brought up by @broune earlier [0]. If it looks like a lot of LLVM's transform passes are buggy in the face of exit(0) or throwing or inf-looping calls, perhaps we will be okay paying the cost of computing and preserving this analysis?

We also need some infrastructural work around "halting" or "always_returns" attributes; I suspect the bug you're fixing here will also occur if the function being called does volatile int i = 0; while (true) i++; instead of exit(0).

[0]: http://lists.llvm.org/pipermail/llvm-dev/2015-July/087744.html

Whatever we choose, I think we need to take a step back and evaluate the
larger problem and whether we are solving it in the best way, rather than
just add tons of "check every instruction between here and there" code :)

(If we end up thinking that is truly the best way, great, I'll not object.
I'm more concerned that we are just trying to patch what we see as bugs
instead of stopping to think about whether the whole thing is just broken
and in need of rethinking, because I suspect it is.)

Improved control-flow check, fix for scalar PRE.

Still has crappy O(N^2) loop. My current thinking is that it's possible to precompute it on a
per-block basis in GVN::runImpl.
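One way the per-block precomputation could look, as a rough sketch under simplifying assumptions: each block is reduced to a vector of flags, one linear scan records the position of the first instruction that may not transfer execution, and every later query is a comparison. `BlockSummary`, `summarize`, and the two query functions are hypothetical names, not part of the patch, and invalidation when GVN inserts or deletes instructions is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-block summary, computed once per function.
struct BlockSummary {
  // Position of the first instruction that may not transfer execution to
  // its successor, or the block size if there is none.
  std::size_t FirstBlocker;
  std::size_t Size;
};

// MayNotTransfer[i] is true if instruction i may throw, exit, or not return.
BlockSummary summarize(const std::vector<bool> &MayNotTransfer) {
  std::size_t First = 0;
  while (First < MayNotTransfer.size() && !MayNotTransfer[First])
    ++First;
  return {First, MayNotTransfer.size()};
}

// Can the instruction at position Pos be hoisted to the start of its block?
// True when no blocker sits in [0, Pos).
bool canHoistToBlockStart(const BlockSummary &S, std::size_t Pos) {
  assert(Pos <= S.Size);
  return S.FirstBlocker >= Pos;
}

// Can an instruction be hoisted across the whole block?
bool canHoistAcrossBlock(const BlockSummary &S) {
  return S.FirstBlocker == S.Size;
}
```

The two queries correspond to the two ways the patch calls `canHoistAcross`: from the block start to the load, and across an entire predecessor block.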

Yuck, how have we managed to be this wrong for this long and not notice it?

Like Danny, I'm hesitant to add the instruction walks here. MDA already did that once, so doing the walk again seems very wasteful. I can see a couple of options here:

Add an analysis which tracks where a basic block always exits if entered. By returning a safe result ("maybe?"), the invalidation wouldn't be too painful. This would be a step in a direction I've been thinking about for a while to consolidate all of our various dereferenceability checks.

Extend MDA to track this information when doing its walk. This would be strictly weaker than the current code though.

At minimum, the patch should be rewritten as an extension to isValueFullyAvailableInBlock; that's the bit that's supposed to be reasoning about anticipation for LoadPRE.

lib/Transforms/Scalar/GVN.cpp
Line 2422: Can you point to the test case which requires this part? ScalarPRE is not supposed to need to reason about availability. Where does this break? Also, can this be rephrased in terms of the same isValueFullyAvailableInBlock helper?

In D21041#457240, @reames wrote:

Yuck, how have we managed to be this wrong for this long and not notice it?

For the load case, you have to have a pointer which is both NoAlias relative to an arbitrary call and not dereferenceable... so it's basically impossible to trigger at the moment unless you have noalias metadata.

Like Danny, I'm hesitant to add the instruction walks here. MDA already did that once, so doing the walk again seems very wasteful. I can see a couple of options here:

Add an analysis which tracks where a basic block always exits if entered. By returning a safe result ("maybe?"), the invalidation wouldn't be too painful. This would be a step in a direction I've been thinking about for a while to consolidate all of our various dereferenceability checks.

Extend MDA to track this information when doing its walk. This would be strictly weaker than the current code though.

I'm hesitant to add a new analysis pass because keeping the analysis up-to-date is inevitably painful... especially given that GVN probably wants to check this at higher resolution than just per-block. Maybe it's the right approach, though.

At minimum, the patch should be rewritten as an extension to isValueFullyAvailableInBlock; that's the bit that's supposed to be reasoning about anticipation for LoadPRE.

I'm not following; this isn't something we need to check on a per-predecessor basis. The issue is basically whether it's legal to hoist a given load from the middle of its parent BB to the beginning of that BB.
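To make the hazard concrete, here is a small self-contained model of why the hoist is illegal. It is an illustration only: exceptions stand in for both "the callee called exit()" and "the load faulted", and all names (`mayExit`, `checkedLoad`, `original`, `hoisted`, `run`) are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>

// Stand-ins for the two ways execution can stop; both are modelling
// assumptions, not anything from the patch.
struct ProgramExited : std::runtime_error {
  ProgramExited() : std::runtime_error("exit") {}
};
struct LoadTrapped : std::runtime_error {
  LoadTrapped() : std::runtime_error("trap") {}
};

// Models a call that never returns when ShouldExit is set.
void mayExit(bool ShouldExit) {
  if (ShouldExit)
    throw ProgramExited();
}

// Models a load that faults on a non-dereferenceable pointer.
int checkedLoad(const int *P) {
  if (P == nullptr)
    throw LoadTrapped();
  return *P;
}

// Original order: the load runs only if the call returns.
int original(const int *P, bool ShouldExit) {
  mayExit(ShouldExit);
  return checkedLoad(P);
}

// After an invalid hoist: the load runs before the call.
int hoisted(const int *P, bool ShouldExit) {
  int V = checkedLoad(P);
  mayExit(ShouldExit);
  return V;
}

enum class Outcome { Returned, Exited, Trapped };

// Runs F and reports how it finished.
template <typename Fn> Outcome run(Fn F) {
  try {
    F();
    return Outcome::Returned;
  } catch (const ProgramExited &) {
    return Outcome::Exited;
  } catch (const LoadTrapped &) {
    return Outcome::Trapped;
  }
}

// Usage: with a bad pointer and an exiting call, only the hoisted version
// reaches the faulting load.
void demo() {
  assert(run([] { original(nullptr, true); }) == Outcome::Exited);
  assert(run([] { hoisted(nullptr, true); }) == Outcome::Trapped);
}
```

When the pointer is known to be dereferenceable, both orders behave the same, which matches the patch skipping the check for allocas.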

lib/Transforms/Scalar/GVN.cpp
Line 2422: See testcase; this is basically just avoiding dividing by zero.

I'm hesitant to add a new analysis pass because keeping the analysis
up-to-date is inevitably painful... especially given that GVN probably
wants to check this at higher resolution than just per-block.

Again, it's very wasteful for it to have to know/compute (even once), where
the blocks that it has to look at in more detail are, when nobody makes
things *more* maythrow than they were before.

If the CFG properly represented "this block, somewhere, throws", it would
know "okay, if I want to hoist out of this block, I have to look harder".

Instead, we now have a bunch of passes that are, at a minimum, going to
look at every instruction in each block and start keeping track of the same
info. In fact, every patch we've done so far here (even mine!) tries to
recompute this very info so it can figure out where it needs to look harder
in constant time.

Also, the current PRE does not really do good PRE, and makes no attempt to
place instructions like a real PRE would.
Any real PRE or store sinking pass is going to want to know what blocks are
"transparent", that is, it is possible to hoist or sink through.

MemorySSA will tell them if the memory state is killed in the block in
constant time.
SSA will tell them if the operands are killed or reusable in the block in
constant time.

The per-block bit you are talking about is precisely the last piece that
tells you if a block is transparent in constant time.

Without it, you must touch every instruction in the program to know which
blocks are transparent.
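The transparency test described here can be written down as three per-block facts combined with a logical AND. The sketch below is a hypothetical model: `BlockFacts`, `isTransparent`, and `transparentPrefix` are illustrative names, and the comments only indicate where each fact would come from according to the discussion, not an existing LLVM interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-block facts, each assumed to be answerable in constant
// time once the supporting analysis exists.
struct BlockFacts {
  bool KillsMemoryState; // would come from MemorySSA
  bool KillsOperands;    // would come from SSA form
  bool MayNotTransfer;   // the per-block bit under discussion
};

// A block is transparent if an expression can be hoisted or sunk through it.
bool isTransparent(const BlockFacts &B) {
  return !B.KillsMemoryState && !B.KillsOperands && !B.MayNotTransfer;
}

// Chain[0] is the block directly above the expression's block, Chain[1] the
// one above that, and so on. Returns how many consecutive blocks the
// expression could move through before hitting a non-transparent one.
std::size_t transparentPrefix(const std::vector<BlockFacts> &Chain) {
  std::size_t N = 0;
  while (N < Chain.size() && isTransparent(Chain[N]))
    ++N;
  assert(N <= Chain.size());
  return N;
}
```

Without the third field, deciding transparency needs a scan of every instruction in each candidate block, which is the cost the comment objects to.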

Maybe it's the right approach, though.

At minimum, the patch should be rewritten as an extension to

isValueFullyAvailableInBlock; that's the bit that's supposed to be
reasoning about anticipation for LoadPRE.

I'm not following; this isn't something we need to check on a
per-predecessor basis. The issue is basically whether it's legal to hoist
a given load from the middle of its parent BB to the beginning of that BB.

This is just because of how current PRE works (which, as mentioned, is
pretty wasteful). If you only want to fix that behavior, then yeah, compute
it once, use that.

hiraditya added a subscriber: hiraditya. Jun 14 2016, 3:13 PM

Inactive, as far as I can tell.

davide added a subscriber: davide. Jun 24 2017, 12:50 PM

efriedma mentioned this in D37460: [GVN] Prevent LoadPRE from hoisting across instructions that don't pass control flow to successors. Sep 12 2017, 11:31 AM

Resigning from a stale review (2016). Feel free to re-add me if the thread is ever revived.

Revision Contents

Path                              Size
lib/Transforms/Scalar/GVN.cpp     24 lines
test/Transforms/GVN/local-pre.ll  37 lines
test/Transforms/GVN/pre-load.ll   52 lines

Diff 60478

lib/Transforms/Scalar/GVN.cpp

[1,385 lines hidden; context: if (AnalyzeLoadAvailability(LI, DepInfo, Address, AV)) {]
       UnavailableBlocks.push_back(DepBB);
     }
   }

   assert(NumDeps == ValuesPerBlock.size() + UnavailableBlocks.size() &&
          "post condition violation");
 }

+static bool canHoistAcross(BasicBlock::iterator BBI, BasicBlock::iterator BBE) {
+  // Don't hoist a load across a call which could throw an exception
+  // or call exit().
+  // FIXME: Potential O(N^2) performance issue?
+  for (; BBI != BBE; ++BBI)
+    if (!isGuaranteedToTransferExecutionToSuccessor(&*BBI))
+      return false;
+  return true;
+}
+
 bool GVN::PerformLoadPRE(LoadInst *LI, AvailValInBlkVect &ValuesPerBlock,
                          UnavailBlkVect &UnavailableBlocks) {
   // Okay, we have some definitions of the value. This means that the value
   // is available in some of our (transitive) predecessors. Lets think about
   // doing PRE of this load. This will involve inserting a new load into the
   // predecessor when it's not available. We could do this in general, but
   // prefer to not increase code size. As such, we only do this when we know
   // that we only have to insert one load (which means we're basically moving
   // the load, not inserting a new one).

   SmallPtrSet<BasicBlock *, 4> Blockers(UnavailableBlocks.begin(),
                                         UnavailableBlocks.end());

   // Let's find the first basic block with more than one predecessor. Walk
   // backwards through predecessors if needed.
   BasicBlock *LoadBB = LI->getParent();
   BasicBlock *TmpBB = LoadBB;

+  const DataLayout &DL = LI->getModule()->getDataLayout();
+  Value* UnderlyingObject = GetUnderlyingObject(LI->getPointerOperand(), DL);
+  bool SafeToLoadUnconditionally = isa<AllocaInst>(UnderlyingObject);
+  if (!SafeToLoadUnconditionally && !canHoistAcross(LoadBB->begin(), LI->getIterator()))
+    return false;
+
   while (TmpBB->getSinglePredecessor()) {
     TmpBB = TmpBB->getSinglePredecessor();
     if (TmpBB == LoadBB) // Infinite (unreachable) loop.
       return false;
     if (Blockers.count(TmpBB))
       return false;

     // If any of these blocks has more than one successor (i.e. if the edge we
     // just traversed was critical), then there are other paths through this
     // block along which the load may not be anticipated. Hoisting the load
     // above this block would be adding the load to execution paths along
     // which it was not previously executed.
     if (TmpBB->getTerminator()->getNumSuccessors() != 1)
       return false;
+
+    if (!SafeToLoadUnconditionally && !canHoistAcross(TmpBB->begin(), TmpBB->end()))
+      return false;
   }

   assert(TmpBB);
   LoadBB = TmpBB;

   // Check to see how many predecessors have the loaded value fully
   // available.
   MapVector<BasicBlock *, Value *> PredLoads;
[57 lines hidden; context: for (BasicBlock *OrigPred : CriticalEdgePred) {]
     assert(!PredLoads.count(OrigPred) && "Split edges shouldn't be in map!");
     PredLoads[NewPred] = nullptr;
     DEBUG(dbgs() << "Split critical edge " << OrigPred->getName() << "->"
                  << LoadBB->getName() << '\n');
   }

   // Check if the load can safely be moved to all the unavailable predecessors.
   bool CanDoPRE = true;
-  const DataLayout &DL = LI->getModule()->getDataLayout();
   SmallVector<Instruction*, 8> NewInsts;
   for (auto &PredLoad : PredLoads) {
     BasicBlock *UnavailablePred = PredLoad.first;

     // Do PHI translation to get its value in the predecessor if necessary. The
     // returned pointer (if non-null) is guaranteed to dominate UnavailablePred.

     // If all preds have a single successor, then we know it is safe to insert
[888 lines hidden; context: bool GVN::performScalarPRE(Instruction *CurInst) {]
   // insertion.
   Instruction *PREInstr = nullptr;

   if (NumWithout != 0) {
     // Don't do PRE across indirect branch.
     if (isa<IndirectBrInst>(PREPred->getTerminator()))
       return false;

+    if (!isSafeToSpeculativelyExecute(CurInst) &&
+        !canHoistAcross(CurrentBlock->begin(), CurInst->getIterator()))
+      return false;
+
     // We can't do PRE safely on a critical edge, so instead we schedule
     // the edge to be split and perform the PRE the next time we iterate
     // on the function.
     unsigned SuccNum = GetSuccessorNumber(PREPred, CurrentBlock);
     if (isCriticalEdge(PREPred->getTerminator(), SuccNum)) {
       toSplit.push_back(std::make_pair(PREPred->getTerminator(), SuccNum));
       return false;
     }
[323 lines hidden]

test/Transforms/GVN/local-pre.ll

-; RUN: opt < %s -gvn -enable-pre -S | grep "b.pre"
+; RUN: opt < %s -gvn -S | FileCheck %s

-define i32 @main(i32 %p, i32 %q) {
+define i32 @test1(i32 %p, i32 %q) {
+; CHECK-LABEL: @test1
 block1:
   %cmp = icmp eq i32 %p, %q
   br i1 %cmp, label %block2, label %block3

 block2:
   %a = add i32 %p, 1
   br label %block4

+; CHECK: block3:
+; CHECK-NEXT: add i32 %p, 1
 block3:
   br label %block4

 block4:
+; CHECK: block4:
+; CHECK-NEXT: phi
+; CHECK-NEXT: ret
   %b = add i32 %p, 1
   ret i32 %b
 }
+
+define i32 @test2(i32 %p, i32 %q) {
+; CHECK-LABEL: @test2
+; CHECK: block1:
+block1:
+  %cmp = icmp eq i32 %p, %q
+  br i1 %cmp, label %block2, label %block3
+
+block2:
+  %a = sdiv i32 %p, %q
+  br label %block4
+
+block3:
+  br label %block4
+
+; CHECK: block4
+; CHECK-NEXT: call
+; CHECK-NEXT: %b = sdiv
+; CHECK-NEXT: ret i32 %b
+
+block4:
+  call void @may_exit() nounwind
+  %b = sdiv i32 %p, %q
+  ret i32 %b
+}
+
+declare void @may_exit() nounwind

test/Transforms/GVN/pre-load.ll

[424 lines hidden]
 ; CHECK-NEXT: %c2 = cleanuppad within none []
 ; CHECK-NEXT: %NOTPRE = load i32, i32* %p
 cleanup2:
   %c2 = cleanuppad within none []
   %NOTPRE = load i32, i32* %p
   call void @g(i32 %NOTPRE)
   cleanupret from %c2 unwind to caller
 }
+
+; Don't PRE load across call which could throw or call exit().
+define i32 @test13(i32* noalias nocapture readonly %x, i32* noalias nocapture %r, i32 %a) {
+; CHECK-LABEL: @test13(
+; CHECK: entry:
+; CHECK-NEXT: icmp eq
+; CHECK-NEXT: br i1
+entry:
+  %tobool = icmp eq i32 %a, 0
+  br i1 %tobool, label %if.end, label %if.then
+
+; CHECK: if.then:
+; CHECK-NEXT: load i32
+; CHECK-NEXT: store i32
+if.then:
+  %uu = load i32, i32* %x, align 4
+  store i32 %uu, i32* %r, align 4
+  br label %if.end
+
+; CHECK: if.end:
+; CHECK-NEXT: call void @f()
+; CHECK-NEXT: load i32
+if.end:
+  call void @f()
+  %vv = load i32, i32* %x, align 4
+  ret i32 %vv
+}
+
+; Okay to PRE load from alloca across call.
+declare void @h(i32* nocapture)
+define i32 @test14(i32* noalias nocapture %r, i32 %a) {
+; CHECK-LABEL: @test14(
+entry:
+  %x = alloca i32
+  call void @h(i32* %x)
+  %tobool = icmp eq i32 %a, 0
+  br i1 %tobool, label %if.end, label %if.then
+
+if.then:
+  %uu = load i32, i32* %x, align 4
+  store i32 %uu, i32* %r, align 4
+  br label %if.end
+
+; CHECK: if.end:
+; CHECK-NEXT: %vv = phi i32
+; CHECK-NEXT: call void @f()
+; CHECK-NEXT: ret i32 %vv
+if.end:
+  call void @f()
+  %vv = load i32, i32* %x, align 4
+  ret i32 %vv
+}