This is an archive of the discontinued LLVM Phabricator instance.

DSE miscompile when store is clobbered across loop iterations
Abandoned, Public

Authored by apilipenko on Sep 20 2019, 5:41 PM.


Details

Reviewers

eli.friedman
hfinkel
bjope

Summary

Currently DSE miscompiles the following example:

a = calloc(n+1)
for (int i = 0; i < n; i++) {
  store 1, a[i+1] // (1)
  store 0, a[i]   // (2)
}

It eliminates store (2) as redundant, on the grounds that it writes zero into memory that calloc already zeroed. This happens because memoryIsNotModifiedBetween doesn't see a[i] being clobbered between the allocation and the store.

memoryIsNotModifiedBetween does a backwards scan through the CFG from SecondI to FirstI. It looks for instructions which can modify the memory location accessed by SecondI. For every may-write-to-memory instruction, it asks AA whether this instruction modifies the memory location accessed by SecondI.

The problem occurs when it visits the loop block through a backedge. It asks AA about aliasing between stores (1) and (2). BasicAA sees that the two stores access addresses at distinct offsets from the same base and concludes that they don't alias. This is true for accesses within one loop iteration, but not across iterations: store (1) in iteration i-1 writes a[i], which is the same location that store (2) writes in iteration i.

The change is to keep track of whether we visit a block through a backedge. If we do, be conservative and treat any may-write-to-memory instruction as a clobber.

Note that there is a somewhat similar problem in DA:
https://bugs.llvm.org/show_bug.cgi?id=42143

Diff Detail

Event Timeline

apilipenko created this revision. Sep 20 2019, 5:41 PM

The problem occurs when it visits the loop block through a backedge. It asks the AA about aliasing between stores (1) and (2). BasicAA sees that the two stores access addresses which are distinct offsets from the same base and concludes that they don't alias. This is true for accesses within one loop iteration, but this is not true across iterations.

This really sounds like a bug in BasicAA, not a bug in DSE. Reading through BasicAA, we have a number of cases where we explicitly reason about values in loops from different iterations being potentially equal. I think you've simply found one more. I will certainly admit, the documentation on expected behaviour is a tad bit lacking here though.

How do others interpret this?

It looks like you forgot to CC llvm-commits? Please post a new differential so the patch is sent to the list.

In some sense, the alias() method of AliasAnalysis really has three parameters: the two memory locations are explicit, and the implicit third parameter is a position in the source code where both locations are valid. Given that context, you can prove "obvious" identities like aliasing between x and x+1, aliasing with restrict pointers, etc. This is generally useful for a lot of transforms. In contexts where that doesn't work, we use some other mechanism, like LoopAccessAnalysis. BasicAA does look through PHI nodes in certain cases, but when it does, it's very careful to use a different set of assumptions (although we've had bugs with this in the past).

This is specifically a hole in the calloc() handling, right? This case can't come up in regular load->store no-op stores: the address of the load dominates the load, so the address dominates all the operations between the load and the store, so alias() does the right thing.

fhahn added a subscriber: fhahn. Sep 24 2019, 9:21 AM

In D67870#1680256, @efriedma wrote:

It looks like you forgot to CC llvm-commits? Please post a new differential so the patch is sent to the list.

Resubmitted as D68006.

In D67870#1680256, @efriedma wrote:

This is specifically a hole in the calloc() handling, right? This case can't come up in regular load->store no-op stores: the address of the load dominates the load, so the address dominates all the operations between the load and the store, so alias() does the right thing.

That's right. I don't think that load->store case is affected.

This change was resubmitted as D68006.

Revision Contents

Path                                              Size
lib/Transforms/Scalar/DeadStoreElimination.cpp    77 lines
test/Transforms/DeadStoreElimination/simple.ll    72 lines

Diff 221125

lib/Transforms/Scalar/DeadStoreElimination.cpp

@@ first 18 lines hidden @@
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/SetVector.h"
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/Analysis/CaptureTracking.h"
+#include "llvm/Analysis/CFG.h"
 #include "llvm/Analysis/GlobalsModRef.h"
 #include "llvm/Analysis/MemoryBuiltins.h"
 #include "llvm/Analysis/MemoryDependenceAnalysis.h"
 #include "llvm/Analysis/MemoryLocation.h"
 #include "llvm/Analysis/OrderedBasicBlock.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
 #include "llvm/Analysis/ValueTracking.h"
 #include "llvm/IR/Argument.h"
@@ 50 lines hidden @@ EnablePartialStoreMerging("enable-dse-partial-store-merging",
   cl::init(true), cl::Hidden,
   cl::desc("Enable partial store merging in DSE"));

 //===----------------------------------------------------------------------===//
 // Helper functions
 //===----------------------------------------------------------------------===//
 using OverlapIntervalsTy = std::map<int64_t, int64_t>;
 using InstOverlapIntervalsTy = DenseMap<Instruction *, OverlapIntervalsTy>;
+using BackedgesTy =
+    SmallSet<std::pair<const BasicBlock *, const BasicBlock *>, 8>;

 /// Delete this instruction. Before we do, go through and zero out all the
 /// operands of this instruction. If any of them become dead, delete them and
 /// the computation tree that feeds them.
 /// If ValueSet is non-null, remove any deleted instructions from it as well.
 static void
 deleteDeadInstruction(Instruction *I, BasicBlock::iterator *BBI,
                       MemoryDependenceResults &MD, const TargetLibraryInfo &TLI,
@@ 478 lines hidden @@
 }

 /// Returns true if the memory which is accessed by the second instruction is not
 /// modified between the first and the second instruction.
 /// Precondition: Second instruction must be dominated by the first
 /// instruction.
 static bool memoryIsNotModifiedBetween(Instruction *FirstI,
                                        Instruction *SecondI,
-                                       AliasAnalysis *AA) {
-  SmallVector<BasicBlock *, 16> WorkList;
+                                       AliasAnalysis *AA,
+                                       BackedgesTy &Backedges) {
+  // Do a backwards scan through the CFG from SecondI to FirstI. Look for
+  // instructions which can modify the memory location accessed by SecondI.
+  //
+  // While scanning through the CFG keep track whether we visited the block
+  // through a back edge. If we visited through a back edge it means that we are
+  // analyzing instructions across different iterations of the loop. We
+  // need to be more conservative in this case. Consider this example:
+  //   a = calloc(n+1)
+  //   for (int i = 0; i < n; i++) {
+  //     store 1, a[i+1] // (1)
+  //     store 0, a[i]   // (2)
+  //   }
+  // Scanning from store (2) to the calloc we'll visit the loop basic block
+  // through the backedge. If we ask BasicAA whether store (1) aliases with
+  // store (2) it would report NoAlias since the two addresses are distinct
+  // offsets from the same base (a[i] and a[i+1]). It's correct within one
+  // iteration of the loop, but across different iterations they alias.
+  //
+  // If we visit a block through a backedge we treat any may-write-to-memory
+  // instruction as a clobber.
+  //
+  // We use PointerIntPair to track the BasicBlock to visit along with the fact
+  // whether we visited it through a back edge or not.
+  using Block = PointerIntPair<BasicBlock *, 1, bool>;
+  SmallVector<Block, 16> WorkList;
   SmallPtrSet<BasicBlock *, 8> Visited;
   BasicBlock::iterator FirstBBI(FirstI);
   ++FirstBBI;
   BasicBlock::iterator SecondBBI(SecondI);
   BasicBlock *FirstBB = FirstI->getParent();
   BasicBlock *SecondBB = SecondI->getParent();
   MemoryLocation MemLoc = MemoryLocation::get(SecondI);

   // Start checking the store-block.
-  WorkList.push_back(SecondBB);
+  WorkList.emplace_back(SecondBB, /*VisitedThroughBackedge=*/false);

   bool isFirstBlock = true;

   // Check all blocks going backward until we reach the load-block.
   while (!WorkList.empty()) {
-    BasicBlock *B = WorkList.pop_back_val();
+    Block CurBlock = WorkList.pop_back_val();
+    BasicBlock *B = CurBlock.getPointer();
+    bool VisitedThroughBackedge = CurBlock.getInt();

     // Ignore instructions before LI if this is the FirstBB.
     BasicBlock::iterator BI = (B == FirstBB ? FirstBBI : B->begin());

     BasicBlock::iterator EI;
     if (isFirstBlock) {
       // Ignore instructions after SI if this is the first visit of SecondBB.
       assert(B == SecondBB && "first block is not the store block");
       EI = SecondBBI;
       isFirstBlock = false;
     } else {
       // It's not SecondBB or (in case of a loop) the second visit of SecondBB.
       // In this case we also have to look at instructions after SI.
       EI = B->end();
     }
     for (; BI != EI; ++BI) {
       Instruction *I = &*BI;
-      if (I->mayWriteToMemory() && I != SecondI)
+      if (I->mayWriteToMemory() && I != SecondI) {
+        if (VisitedThroughBackedge)
+          // If we visited this block through a backedge we are analyzing
+          // instructions across different iterations. Be conservative and
+          // treat any may-write-to-memory as clobbering.
+          return false;
         if (isModSet(AA->getModRefInfo(I, MemLoc)))
           return false;
+      }
     }
     if (B != FirstBB) {
       assert(B != &FirstBB->getParent()->getEntryBlock() &&
           "Should not hit the entry block because SI must be dominated by LI");
       for (auto PredI = pred_begin(B), PE = pred_end(B); PredI != PE; ++PredI) {
-        if (!Visited.insert(*PredI).second)
+        auto *Pred = *PredI;
+        if (!Visited.insert(Pred).second)
           continue;
-        WorkList.push_back(*PredI);
+        bool IsBackedge = Backedges.count(std::make_pair(B, Pred));
+        WorkList.emplace_back(Pred, VisitedThroughBackedge | IsBackedge);
       }
     }
   }
   return true;
 }

 /// Find all blocks that will unconditionally lead to the block BB and append
 /// them to F.
@@ 383 lines hidden @@ static bool removePartiallyOverlappedStores(AliasAnalysis *AA,
   return Changed;
 }

 static bool eliminateNoopStore(Instruction *Inst, BasicBlock::iterator &BBI,
                                AliasAnalysis *AA, MemoryDependenceResults *MD,
                                const DataLayout &DL,
                                const TargetLibraryInfo *TLI,
                                InstOverlapIntervalsTy &IOL,
-                               OrderedBasicBlock &OBB) {
+                               OrderedBasicBlock &OBB,
+                               BackedgesTy &Backedges) {
   // Must be a store instruction.
   StoreInst *SI = dyn_cast<StoreInst>(Inst);
   if (!SI)
     return false;

   // If we're storing the same value back to a pointer that we just loaded from,
   // then the store can be removed.
   if (LoadInst *DepLoad = dyn_cast<LoadInst>(SI->getValueOperand())) {
     if (SI->getPointerOperand() == DepLoad->getPointerOperand() &&
-        isRemovable(SI) && memoryIsNotModifiedBetween(DepLoad, SI, AA)) {
+        isRemovable(SI) &&
+        memoryIsNotModifiedBetween(DepLoad, SI, AA, Backedges)) {

       LLVM_DEBUG(
           dbgs() << "DSE: Remove Store Of Load from same pointer:\n LOAD: "
                  << *DepLoad << "\n STORE: " << *SI << '\n');

       deleteDeadInstruction(SI, &BBI, *MD, *TLI, IOL, OBB);
       ++NumRedundantStores;
       return true;
     }
   }

   // Remove null stores into the calloc'ed objects
   Constant *StoredConstant = dyn_cast<Constant>(SI->getValueOperand());
   if (StoredConstant && StoredConstant->isNullValue() && isRemovable(SI)) {
     Instruction *UnderlyingPointer =
         dyn_cast<Instruction>(GetUnderlyingObject(SI->getPointerOperand(), DL));

     if (UnderlyingPointer && isCallocLikeFn(UnderlyingPointer, TLI) &&
-        memoryIsNotModifiedBetween(UnderlyingPointer, SI, AA)) {
+        memoryIsNotModifiedBetween(UnderlyingPointer, SI, AA, Backedges)) {
       LLVM_DEBUG(
           dbgs() << "DSE: Remove null store to the calloc'ed object:\n DEAD: "
                  << *Inst << "\n OBJECT: " << *UnderlyingPointer << '\n');

       deleteDeadInstruction(SI, &BBI, *MD, *TLI, IOL, OBB);
       ++NumRedundantStores;
       return true;
     }
   }
   return false;
 }

 static bool eliminateDeadStores(BasicBlock &BB, AliasAnalysis *AA,
                                 MemoryDependenceResults *MD, DominatorTree *DT,
-                                const TargetLibraryInfo *TLI) {
+                                const TargetLibraryInfo *TLI,
+                                BackedgesTy &Backedges) {
   const DataLayout &DL = BB.getModule()->getDataLayout();
   bool MadeChange = false;

   OrderedBasicBlock OBB(&BB);
   Instruction *LastThrowing = nullptr;

   // A map of interval maps representing partially-overwritten value parts.
   InstOverlapIntervalsTy IOL;
@@ 16 lines hidden @@ if (Inst->mayThrow()) {
       continue;
     }

     // Check to see if Inst writes to memory. If not, continue.
     if (!hasAnalyzableMemoryWrite(Inst, *TLI))
       continue;

     // eliminateNoopStore will update in iterator, if necessary.
-    if (eliminateNoopStore(Inst, BBI, AA, MD, DL, TLI, IOL, OBB)) {
+    if (eliminateNoopStore(Inst, BBI, AA, MD, DL, TLI, IOL, OBB, Backedges)) {
       MadeChange = true;
       continue;
     }

     // If we find something that writes memory, get its memory dependence.
     MemDepResult InstDep = MD->getDependency(Inst, &OBB);

     // Ignore any store where we can't find a local dependence.
@@ 94 lines hidden @@ while (InstDep.isDef() || InstDep.isClobber()) {
         auto *Earlier = dyn_cast<StoreInst>(DepWrite);
         auto *Later = dyn_cast<StoreInst>(Inst);
         if (Earlier && isa<ConstantInt>(Earlier->getValueOperand()) &&
             DL.typeSizeEqualsStoreSize(
                 Earlier->getValueOperand()->getType()) &&
             Later && isa<ConstantInt>(Later->getValueOperand()) &&
             DL.typeSizeEqualsStoreSize(
                 Later->getValueOperand()->getType()) &&
-            memoryIsNotModifiedBetween(Earlier, Later, AA)) {
+            memoryIsNotModifiedBetween(Earlier, Later, AA, Backedges)) {
           // If the store we find is:
           //   a) partially overwritten by the store to 'Loc'
           //   b) the later store is fully contained in the earlier one and
           //   c) they both have a constant value
           //   d) none of the two stores need padding
           // Merge the two stores, replacing the earlier store's value with a
           // merge of both values.
           // TODO: Deal with other constant types (vectors, etc), and probably
@@ 79 lines hidden @@ if (BB.getTerminator()->getNumSuccessors() == 0)
     MadeChange |= handleEndBlock(BB, AA, MD, TLI, IOL, OBB);

   return MadeChange;
 }

 static bool eliminateDeadStores(Function &F, AliasAnalysis *AA,
                                 MemoryDependenceResults *MD, DominatorTree *DT,
                                 const TargetLibraryInfo *TLI) {
+  SmallVector<std::pair<const BasicBlock *, const BasicBlock *>, 8>
+      BackedgesVector;
+  llvm::FindFunctionBackedges(F, BackedgesVector);
+  BackedgesTy Backedges;
+  for (auto BE : BackedgesVector)
+    Backedges.insert(BE);
+
   bool MadeChange = false;
   for (BasicBlock &BB : F)
     // Only check non-dead blocks. Dead blocks may have strange pointer
     // cycles that will confuse alias analysis.
     if (DT->isReachableFromEntry(&BB))
-      MadeChange |= eliminateDeadStores(BB, AA, MD, DT, TLI);
+      MadeChange |= eliminateDeadStores(BB, AA, MD, DT, TLI, Backedges);

   return MadeChange;
 }

 //===----------------------------------------------------------------------===//
 // DSE Pass
 //===----------------------------------------------------------------------===//
 PreservedAnalyses DSEPass::run(Function &F, FunctionAnalysisManager &AM) {
@@ last 69 lines hidden @@

test/Transforms/DeadStoreElimination/simple.ll

@@ first 887 lines hidden @@
 ; CHECK-NEXT: ret void
 ;

   tail call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %P, i8* align 1 %Q, i64 12, i32 1)
   tail call void @llvm.memcpy.element.unordered.atomic.p0i8.p0i8.i64(i8* align 1 %P, i8* align 1 %R, i64 8, i32 1)
   ret void
 }

+define i32 @test40() {
+; CHECK-LABEL: @test40(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[M:%.*]] = call i8* @calloc(i32 9, i32 20)
+; CHECK-NEXT: br label [[LOOP:%.*]]
+; CHECK: loop:
+; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; CHECK-NEXT: [[P_NEXT:%.*]] = getelementptr inbounds i8, i8* [[M]], i64 [[INDVARS_IV_NEXT]]
+; CHECK-NEXT: store i8 1, i8* [[P_NEXT]]
+; CHECK-NEXT: [[P:%.*]] = getelementptr inbounds i8, i8* [[M]], i64 [[INDVARS_IV]]
+; CHECK-NEXT: store i8 0, i8* [[P]]
+; CHECK-NEXT: [[CONTINUE:%.*]] = icmp ugt i64 [[INDVARS_IV]], 15
+; CHECK-NEXT: br i1 [[CONTINUE]], label [[LOOP]], label [[RETURN:%.*]]
+; CHECK: return:
+; CHECK-NEXT: ret i32 0
+;
+entry:
+  %m = call i8* @calloc(i32 9, i32 20)
+  br label %loop
+loop:
+  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %loop ]
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %p.next = getelementptr inbounds i8, i8* %m, i64 %indvars.iv.next
+  store i8 1, i8* %p.next
+  %p = getelementptr inbounds i8, i8* %m, i64 %indvars.iv
+  store i8 0, i8* %p
+  %continue = icmp ugt i64 %indvars.iv, 15
+  br i1 %continue, label %loop, label %return
+return:
+  ret i32 0
+}
+
+define i32 @test41() {
+; CHECK-LABEL: @test41(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[M:%.*]] = call i8* @calloc(i32 9, i32 20)
+; CHECK-NEXT: br label [[LOOP:%.*]]
+; CHECK: loop:
+; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[CONT:%.*]] ]
+; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; CHECK-NEXT: [[P_NEXT:%.*]] = getelementptr inbounds i8, i8* [[M]], i64 [[INDVARS_IV_NEXT]]
+; CHECK-NEXT: store i8 1, i8* [[P_NEXT]]
+; CHECK-NEXT: br label [[CONT]]
+; CHECK: cont:
+; CHECK-NEXT: [[P:%.*]] = getelementptr inbounds i8, i8* [[M]], i64 [[INDVARS_IV]]
+; CHECK-NEXT: store i8 0, i8* [[P]]
+; CHECK-NEXT: [[CONTINUE:%.*]] = icmp ugt i64 [[INDVARS_IV]], 15
+; CHECK-NEXT: br i1 [[CONTINUE]], label [[LOOP]], label [[RETURN:%.*]]
+; CHECK: return:
+; CHECK-NEXT: ret i32 0
+;
+entry:
+  %m = call i8* @calloc(i32 9, i32 20)
+  br label %loop
+loop:
+  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %cont ]
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %p.next = getelementptr inbounds i8, i8* %m, i64 %indvars.iv.next
+  store i8 1, i8* %p.next
+  br label %cont
+
+cont:
+  %p = getelementptr inbounds i8, i8* %m, i64 %indvars.iv
+  store i8 0, i8* %p
+  %continue = icmp ugt i64 %indvars.iv, 15
+  br i1 %continue, label %loop, label %return
+
+return:
+  ret i32 0
+}
+
 declare void @llvm.memmove.p0i8.p0i8.i64(i8* nocapture, i8* nocapture readonly, i64, i1)
 declare void @llvm.memmove.element.unordered.atomic.p0i8.p0i8.i64(i8* nocapture, i8* nocapture readonly, i64, i32)