This is an archive of the discontinued LLVM Phabricator instance.

Differential D35475

[PATCH] [GVN] Ensure replacement instruction dominates its uses
Needs ReviewPublic

Authored by PFerreira on Jul 17 2017, 2:04 AM.

Download Raw Diff

This revision needs review, but all specified reviewers are disabled or inactive.

Details

Reviewers

• dberlin

Summary

Related to bugzilla 33731 https://bugs.llvm.org/show_bug.cgi?id=33731

This is caused by a a PRE insertion when the predecessor block hasn't yet been through Value Numbering (due to a back-edge).
The PRE insertion adds a duplicate instruction at the end of the block, but the already-existing instruction is already used in the predecessor block, like so:

L.LB1_346: ; preds = %L.LB1_423

%7 = sext i32 %2 to i64
%8 = mul i64 %7, 4

%.pre = sext i32 %1 to i64** ; inserted by GVN::performScalarPREInsertion

When we then iterate over this block (later in the pass), GVN chooses to delete the first instance of the instruction as it is duplicated (instead of the later one which was created before). This is because "GVN::performScalarPREInsertion" will number the instruction as it is created (so it is in the leader-list).

The attached patch moves the "%.pre" instruction up so that the replacement is valid.

I am not sure this is a valid fix, but I couldn't figure out another solution.
I though about stoping "GVN::performScalarPREInsertion" from inserting the instruction by numbering the BasicBlock, but that might lead to infinite loops. I also tried to stop PREInsertion from running if the BasicBlock had not been numbered yet (by keeping a list of numbered blocks) but that wouldn't work since GVN only does one pass over the function (so this case would never be optimised out).

The patch does fix the bug though, without adding regressions.

Diff Detail

Event Timeline

PFerreira created this revision.Jul 17 2017, 2:04 AM

Herald added a reviewer: • dberlin. · View Herald TranscriptJul 17 2017, 2:04 AM

Analyzing more, i'm confused, see the last paragraph.

This is caused by a a PRE insertion when the predecessor block hasn't yet been through Value Numbering (due to a back-edge).

I'm not surprised this is buggy, sadly.
I suspect there are a bunch of issues around this kind of thing.

When we then iterate over this block (later in the pass), GVN chooses to delete the first instance of the instruction as it is duplicated (instead of the later one which was created before).

This sounds like an issue with findLeader/addToLeaderTable not checking local dominance if they are in the same block, based on the assumption that it will process the block in order.
addToLeaderTable always adds to the end, and so it will hit the PRE leader before any other.

If so, the fastest fix i can think of is to number the instructions in the block starting at 0 as we process them. An instruction can only be a leader when BB = LeaderBB iff the number is < than the current instruction's number.
Assign all PRE'd instructions numbers that are MAX_INT.

If you have two MAX_INT instructions, they are in-order already (since PRE will only add them in order).

This is because "GVN::performScalarPREInsertion" will number the instruction as it is created (so it is in the leader-list).

The attached patch moves the "%.pre" instruction up so that the replacement is valid.

I am not sure this is a valid fix, but I couldn't figure out another solution.

There are a couple other ways to fix this.
Another would be "always check local dominance using OrderedInstruction in FindLeader ". This is equivalent to the above.

I also tried to stop PREInsertion from running if the BasicBlock had not been numbered yet (by keeping a list of numbered blocks) but that wouldn't work since GVN only does one pass over the function (so this case would never be optimised out).

I'm unclear on how this happens at all. Maybe you are talking about Load PRE and not scalar PRE?

Otherwise, from what I see, scalar PRE doesn't run until value numbering is completely done, except in *one* case, which is for GEP's.

while (ShouldContinue) {
  DEBUG(dbgs() << "GVN iteration: " << Iteration << "\n");
  ShouldContinue = iterateOnFunction(F);
  Changed |= ShouldContinue;
  ++Iteration;
}

if (EnablePRE) {
  // Fabricate val-num for dead-code in order to suppress assertion in
  // performPRE().
  assignValNumForDeadCode();
  bool PREChanged = true;
  while (PREChanged) {
    PREChanged = performPRE(F);
    Changed |= PREChanged;
  }

Note that GVN above is completed before scalar PRE.

The only case this isn't true is here:

If this load follows a GEP, see if we can PRE the indices before analyzing.
 if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(LI->getOperand(0))) {
   for (GetElementPtrInst::op_iterator OI = GEP->idx_begin(),
                                       OE = GEP->idx_end();
        OI != OE; ++OI)
     if (Instruction *I = dyn_cast<Instruction>(OI->get()))
       performScalarPRE(I);
 }

Obviously, this is broken if we haven't value numbered the predecessors.
In fact, even with your fix, i don't see how it could ever get correct answers.

So i'm confused how your issue occurs at all, and feel like i'm missing something.
Can you give me a code path/example, other than the GEP one (which is not mentioned in your bugzilla bug), where we try to perform scalar PRE before value numbering a predecessor?

I set a breakpoint in the R->moveBefore(I) which I added and this was the block in question:

L.LB1_346:                                        ; preds = %L.LB1_423
  %6 = sext i32 %1 to i64
  %7 = mul i64 %6, 4
  %8 = getelementptr i8, i8* null, i64 %7
  %9 = bitcast void (...)* @poisn2_ to void ([...])*
  call [...]
  %.pre = sext i32 %1 to i64
  br label %L.LB1_350

The way I debugged this was to track the creation of the "%.pre" instruction. This led me to "GVN::performScalarPREInsertion". I set a breakpoint in that function and it gets hit before the BasicBlock above has a chance to go through "processBlock" (during iterateOnFunction). Hence the %7 has not been found yet (not yet in the leader list). This violates the assumption in performScalarPREInsertion, as you stated before (I think) which is documented by the comment in that function:

// Because we are going top-down through the block, all value numbers
// will be available in the predecessor by the time we need them.

Thus "%.pre" gets created.
The iteration order doesn't process %7's parent block due to a loop.

Does that make sense?

In D35475#811435, @PFerreira wrote:
I set a breakpoint in the R->moveBefore(I) which I added and this was the block in question:
L.LB1_346:                                        ; preds = %L.LB1_423
  %6 = sext i32 %1 to i64
  %7 = mul i64 %6, 4
  %8 = getelementptr i8, i8* null, i64 %7
  %9 = bitcast void (...)* @poisn2_ to void ([...])*
  call [...]
  %.pre = sext i32 %1 to i64
  br label %L.LB1_350
The way I debugged this was to track the creation of the "%.pre" instruction. This led me to "GVN::performScalarPREInsertion". I set a breakpoint in that function and it gets hit before the BasicBlock above has a chance to go through "processBlock" (during iterateOnFunction). Hence the %7 has not been found yet (not yet in the leader list). This violates the assumption in performScalarPREInsertion, as you stated before (I think) which is documented by the comment in that function:

This is, in fact, because of the GEP case, where it is PRE'ing in the middle of value numbering.
There is no other case it does this.

This seems fairly dangerous.
I'd actually really like to understand if this produces any real performance improvements, because i'd rather just disable this.
There are a bunch of cases we know such things will fail (for example, if GVN has discovered dead code, etc).

Because GVN's VN is so weak, we may be able to prove it safe, but it seems ... a bad plan.

A-ha, I see your point. Apologies for the confusion, but as I mentioned before, it was my hack on GVN.

I wouldn't be able to say if this improves performance or not; the original bug isn't mine. I found it on bugzilla and thought it might be something interesting to hack on.
If you'd prefer to fix the bug by disabling parts of the transformation, should I discard this review?

Keep track of BasicBlocks that have been processed and avoid doing PRE on blocks that haven't been operated on.

This causes the transformation not to be applied in some of the PRE test cases (but fixes the original problem). Not sure if this is desirable.

Revision Contents

Path

Size

include/

llvm/

Transforms/

Scalar/

GVN.h

1 line

lib/

Transforms/

Scalar/

GVN.cpp

12 lines

Diff 108080

include/llvm/Transforms/Scalar/GVN.h

Show First 20 Lines • Show All 105 Lines • ▼ Show 20 Lines	private:
friend class gvn::GVNLegacyPass;		friend class gvn::GVNLegacyPass;
friend struct DenseMapInfo<Expression>;		friend struct DenseMapInfo<Expression>;

MemoryDependenceResults *MD;		MemoryDependenceResults *MD;
DominatorTree *DT;		DominatorTree *DT;
const TargetLibraryInfo *TLI;		const TargetLibraryInfo *TLI;
AssumptionCache *AC;		AssumptionCache *AC;
SetVector<BasicBlock *> DeadBlocks;		SetVector<BasicBlock *> DeadBlocks;
		SmallSet<BasicBlock *, 32> NumberedBlocks;
OptimizationRemarkEmitter *ORE;		OptimizationRemarkEmitter *ORE;

ValueTable VN;		ValueTable VN;

/// A mapping from value numbers to lists of Value*'s that		/// A mapping from value numbers to lists of Value*'s that
/// have that value number. Use findLeader to query it.		/// have that value number. Use findLeader to query it.
struct LeaderTableEntry {		struct LeaderTableEntry {
Value *Val;		Value *Val;
▲ Show 20 Lines • Show All 128 Lines • Show Last 20 Lines

lib/Transforms/Scalar/GVN.cpp

Show First 20 Lines • Show All 2,047 Lines • ▼ Show 20 Lines	if (NumWithout != 0) {
// We can't do PRE safely on a critical edge, so instead we schedule		// We can't do PRE safely on a critical edge, so instead we schedule
// the edge to be split and perform the PRE the next time we iterate		// the edge to be split and perform the PRE the next time we iterate
// on the function.		// on the function.
unsigned SuccNum = GetSuccessorNumber(PREPred, CurrentBlock);		unsigned SuccNum = GetSuccessorNumber(PREPred, CurrentBlock);
if (isCriticalEdge(PREPred->getTerminator(), SuccNum)) {		if (isCriticalEdge(PREPred->getTerminator(), SuccNum)) {
toSplit.push_back(std::make_pair(PREPred->getTerminator(), SuccNum));		toSplit.push_back(std::make_pair(PREPred->getTerminator(), SuccNum));
return false;		return false;
}		}

		// Don't PRE on blocks that have not been numbered.
		if (NumberedBlocks.count(PREPred) == 0)
		return false;

// We need to insert somewhere, so let's give it a shot		// We need to insert somewhere, so let's give it a shot
PREInstr = CurInst->clone();		PREInstr = CurInst->clone();
if (!performScalarPREInsertion(PREInstr, PREPred, ValNo)) {		if (!performScalarPREInsertion(PREInstr, PREPred, ValNo)) {
// If we failed insertion, make sure we remove the instruction.		// If we failed insertion, make sure we remove the instruction.
DEBUG(verifyRemoved(PREInstr));		DEBUG(verifyRemoved(PREInstr));
PREInstr->deleteValue();		PREInstr->deleteValue();
return false;		return false;
}		}
Show All 34 Lines	bool GVN::performScalarPRE(Instruction *CurInst) {

return true;		return true;
}		}

/// Perform a purely local form of PRE that looks for diamond		/// Perform a purely local form of PRE that looks for diamond
/// control flow patterns and attempts to perform simple PRE at the join point.		/// control flow patterns and attempts to perform simple PRE at the join point.
bool GVN::performPRE(Function &F) {		bool GVN::performPRE(Function &F) {
bool Changed = false;		bool Changed = false;
		NumberedBlocks.clear();
for (BasicBlock *CurrentBlock : depth_first(&F.getEntryBlock())) {		for (BasicBlock *CurrentBlock : depth_first(&F.getEntryBlock())) {
		NumberedBlocks.insert(CurrentBlock);
// Nothing to PRE in the entry block.		// Nothing to PRE in the entry block.
if (CurrentBlock == &F.getEntryBlock())		if (CurrentBlock == &F.getEntryBlock())
continue;		continue;

// Don't perform PRE on an EH pad.		// Don't perform PRE on an EH pad.
if (CurrentBlock->isEHPad())		if (CurrentBlock->isEHPad())
continue;		continue;

Show All 33 Lines	bool GVN::splitCriticalEdges() {
} while (!toSplit.empty());		} while (!toSplit.empty());
if (MD) MD->invalidateCachedPredecessors();		if (MD) MD->invalidateCachedPredecessors();
return true;		return true;
}		}

/// Executes one iteration of GVN		/// Executes one iteration of GVN
bool GVN::iterateOnFunction(Function &F) {		bool GVN::iterateOnFunction(Function &F) {
cleanupGlobalSets();		cleanupGlobalSets();
		NumberedBlocks.clear();

// Top-down walk of the dominator tree		// Top-down walk of the dominator tree
bool Changed = false;		bool Changed = false;
// Needed for value numbering with phi construction to work.		// Needed for value numbering with phi construction to work.
// RPOT walks the graph in its constructor and will not be invalidated during		// RPOT walks the graph in its constructor and will not be invalidated during
// processBlock.		// processBlock.
ReversePostOrderTraversal<Function *> RPOT(&F);		ReversePostOrderTraversal<Function *> RPOT(&F);
for (BasicBlock *BB : RPOT)		for (BasicBlock *BB : RPOT) {
		NumberedBlocks.insert(BB);
Changed \|= processBlock(BB);		Changed \|= processBlock(BB);
		}

return Changed;		return Changed;
}		}

void GVN::cleanupGlobalSets() {		void GVN::cleanupGlobalSets() {
VN.clear();		VN.clear();
LeaderTable.clear();		LeaderTable.clear();
TableAllocator.Reset();		TableAllocator.Reset();
▲ Show 20 Lines • Show All 204 Lines • Show Last 20 Lines