This is an archive of the discontinued LLVM Phabricator instance.

[CodeExtractor] Only lift lifetime markers present in the extraction region
Closed, Public

Authored by vsk on Feb 6 2019, 11:39 AM.

Details

Reviewers

kachkov98
davidxl
tejohnson
kuhar

Commits

rG4b0cc9a7c80f: [CodeExtractor] Only lift lifetime markers present in the extraction region
rL353973: [CodeExtractor] Only lift lifetime markers present in the extraction region

Summary

When CodeExtractor finds lifetime markers referencing inputs to the
extraction region, it lifts these markers out of the region and inserts
them around the call to the extracted function (see r350420, PR39671).

However, it should *only* lift lifetime markers that are actually
present in the extraction region. I.e., if a start marker is present in
the extraction region but a corresponding end marker isn't (or vice
versa), only the start marker (or end marker, resp.) should be lifted.

This fixes a miscompile in which a lifetime start marker was lifted
inappropriately, causing an argument to the extracted function to
be optimized out as undef.
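
The rule above can be illustrated with a small standalone model. This is only a sketch in plain C++: the `Marker` and `LiftedMarkers` types and the `collectLiftedMarkers` helper are hypothetical names invented for illustration, not part of the LLVM API. It shows start and end markers being tracked in separate sets, so an object whose start marker alone sits in the region only gets a start marker lifted.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a lifetime marker found inside the extraction region.
// (Hypothetical type; the real pass inspects llvm::IntrinsicInst.)
enum class MarkerKind { Start, End };

struct Marker {
  MarkerKind Kind;
  std::string Object; // name of the input object the marker refers to
};

struct LiftedMarkers {
  std::set<std::string> Starts; // objects that get a start marker before the call
  std::set<std::string> Ends;   // objects that get an end marker after the call
};

// Record start and end markers separately, so that an object only receives
// the kind of marker that was actually present in the region.
LiftedMarkers collectLiftedMarkers(const std::vector<Marker> &Region) {
  LiftedMarkers Result;
  for (const Marker &M : Region) {
    if (M.Kind == MarkerKind::Start)
      Result.Starts.insert(M.Object);
    else
      Result.Ends.insert(M.Object);
  }
  return Result;
}
```

With a region holding start markers for `local1` and `local2` but an end marker only for `local2`, the model lifts two start markers and a single end marker.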

rdar://47802482

Diff Detail

Repository: rL LLVM

Event Timeline

vsk created this revision. Feb 6 2019, 11:39 AM

Herald added a subscriber: hiraditya. Feb 6 2019, 11:39 AM

LGTM

This revision is now accepted and ready to land. Feb 12 2019, 9:48 PM

Herald added a project: Restricted Project. Feb 12 2019, 9:48 PM

Herald added a subscriber: jdoerfert.

Closed by commit rL353973: [CodeExtractor] Only lift lifetime markers present in the extraction region (authored by vedantk). Feb 13 2019, 11:55 AM

This revision was automatically updated to reflect the committed changes.

@kachkov98 @davidxl apologies, but this patch is still not correct.

If a lifetime.end marker occurs along one path through the extraction region, but not another, then it's still incorrect to lift the marker, because there is some path through the extracted function which would ordinarily not reach the marker. Example (extract blocks extract{1,2}):

entry:
  lifetime_start(%slot)
  br label %header

header:
  use(%slot)
  br i1 undef, label %extract1, label %extract2

extract1:
  ; Backedge.
  br label %header 

extract2:
  ; Lifting this marker would result in %slot being dead in the header block.
  lifetime_end(%slot)
  br label %exit

I think we have two options to fix this:

1. Do not lift any lifetime.end markers. Continue to lift lifetime.start markers for inputs, as this should still be safe.
2. Only lift a lifetime.end marker for an input if that marker would be reached on every path through the extraction region. If an end marker can't be lifted, erase all lifetime markers for the input in the parent function to prevent bad stack slot merging.

Both options may cause some stack slot merging opportunities to go away. Wdyt?
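
The condition in option 2 can be sketched as a reachability check over a toy CFG. The following standalone C++ is an illustration under stated assumptions, not CodeExtractor code: blocks are plain strings, markers are tracked per block rather than per instruction, and the names `CFG` and `endMarkerOnEveryPath` are made up for this example.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Block = std::string;
using CFG = std::map<Block, std::vector<Block>>; // block -> successors

// Returns true if every path that enters the region at Header and then
// leaves it passes through a block in EndBlocks (the blocks assumed to
// hold a lifetime.end marker for the input of interest).
bool endMarkerOnEveryPath(const CFG &G, const std::set<Block> &Region,
                          const Block &Header,
                          const std::set<Block> &EndBlocks) {
  std::set<Block> Visited;
  std::vector<Block> Worklist{Header};
  while (!Worklist.empty()) {
    Block B = Worklist.back();
    Worklist.pop_back();
    // Stop exploring once the marker is reached or the block was seen before.
    if (EndBlocks.count(B) || !Visited.insert(B).second)
      continue;
    auto It = G.find(B);
    if (It == G.end() || It->second.empty())
      return false; // the function returns here without reaching the marker
    for (const Block &Succ : It->second) {
      if (!Region.count(Succ))
        return false; // left the region without reaching the marker
      Worklist.push_back(Succ);
    }
  }
  return true;
}
```

For a region whose header branches to two blocks that both exit, the check fails when only one of them holds the end marker and succeeds when both do.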

In this example there are two possible entry points into the region, so it can't be extracted (the first block in the region should dominate the others). Maybe there is a better example of the issue?
As I understand it, the correct (but conservative) transformation is to move lifetime.start upwards to some dominator block. According to this, lifting lifetime.start before the call must be right. Moving lifetime.end downwards to some post-dominator block is generally incorrect. The case where lifetime.end for the same value occurs on every execution path doesn't seem very common, but placing lifetime.end after the call gives more accurate information. What is unclear to me is how erasing all markers for an input, when lifetime.end can't be lifted, helps stack coloring?
BTW, it seems there is no simple solution that avoids reducing stack coloring opportunities in all cases (we can't tell the caller whether lifetime.end for a given value occurred or not).
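
The objection about entry points can be checked mechanically on the example CFG. The sketch below is standalone C++ with hypothetical names (`CFG`, `regionEntries`, `exampleCFG`); it uses "blocks targeted by an edge from outside the region" as a simple proxy for the dominance requirement, which is an assumption of this illustration rather than the exact check CodeExtractor performs.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Block = std::string;
using CFG = std::map<Block, std::vector<Block>>; // block -> successors

// Collect the region blocks that are targeted by an edge from outside the
// region. More than one such block means the region has multiple entries.
std::set<Block> regionEntries(const CFG &G, const std::set<Block> &Region) {
  std::set<Block> Entries;
  for (const auto &Node : G) {
    if (Region.count(Node.first))
      continue; // edges that start inside the region are not entries
    for (const Block &Succ : Node.second)
      if (Region.count(Succ))
        Entries.insert(Succ);
  }
  return Entries;
}

// The CFG from the example in the earlier comment. The exit block is
// assumed to have no successors.
CFG exampleCFG() {
  return {{"entry", {"header"}},
          {"header", {"extract1", "extract2"}},
          {"extract1", {"header"}},
          {"extract2", {"exit"}},
          {"exit", {}}};
}
```

On that example, both extract1 and extract2 are reported as entries, since each is a direct successor of header, which lies outside the region.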

In D57834#1398530, @kachkov98 wrote:

In this example there are two possible entry points into the region, so it can't be extracted (the first block in the region should dominate the others). Maybe there is a better example of the issue?

I've posted an example in a follow-up patch: https://reviews.llvm.org/D58253

As I understand it, the correct (but conservative) transformation is to move lifetime.start upwards to some dominator block. According to this, lifting lifetime.start before the call must be right.

Agreed.

Moving lifetime.end downwards to some post-dominator block is generally incorrect. The case where lifetime.end for the same value occurs on every execution path doesn't seem very common, but placing lifetime.end after the call gives more accurate information. What is unclear to me is how erasing all markers for an input, when lifetime.end can't be lifted, helps stack coloring?

I don't think it does. For option (2), I meant, if we can't lift a lifetime.end marker then don't lift the lifetime.start marker either, and also erase all markers for the input to prevent bad stack coloring. It seems that option (1) is both simpler and sufficient.

BTW, it seems there is no simple solution that avoids reducing stack coloring opportunities in all cases (we can't tell the caller whether lifetime.end for a given value occurred or not).

Agreed, this seems unavoidable.

Just a note: it seems that skipping the extraction of lifetime.start when lifetime.end can't be lifted (in the 2nd option) can again lead to a miscompile as in https://reviews.llvm.org/D55967 (assume that the markers for x and y in rhs are not lifted). It's safer to extract all lifetime.start markers referencing inputs to mark them all as simultaneously used, as in option (1).

Revision Contents

Path                                                                      Size
llvm/trunk/lib/Transforms/Utils/CodeExtractor.cpp                         96 lines
llvm/trunk/test/Transforms/HotColdSplit/lifetime-markers-on-inputs-1.ll   66 lines
llvm/trunk/test/Transforms/HotColdSplit/lifetime-markers-on-inputs-2.ll   135 lines
llvm/trunk/test/Transforms/HotColdSplit/lifetime-markers-on-inputs.ll     66 lines

Diff 186714

llvm/trunk/lib/Transforms/Utils/CodeExtractor.cpp

[...]
for (unsigned i = 0, e = Users.size(); i != e; ++i)		for (unsigned i = 0, e = Users.size(); i != e; ++i)
if (Instruction *I = dyn_cast<Instruction>(Users[i]))		if (Instruction *I = dyn_cast<Instruction>(Users[i]))
if (I->isTerminator() && !Blocks.count(I->getParent()) &&		if (I->isTerminator() && !Blocks.count(I->getParent()) &&
I->getParent()->getParent() == oldFunction)		I->getParent()->getParent() == oldFunction)
I->replaceUsesOfWith(header, newHeader);		I->replaceUsesOfWith(header, newHeader);

return newFunction;		return newFunction;
}		}

/// Scan the extraction region for lifetime markers which reference inputs.		/// Erase lifetime.start markers which reference inputs to the extraction
/// Erase these markers. Return the inputs which were referenced.		/// region, and insert the referenced memory into \p LifetimesStart. Do the same
		/// with lifetime.end markers (but insert them into \p LifetimesEnd).
///		///
/// The extraction region is defined by a set of blocks (\p Blocks), and a set		/// The extraction region is defined by a set of blocks (\p Blocks), and a set
/// of allocas which will be moved from the caller function into the extracted		/// of allocas which will be moved from the caller function into the extracted
/// function (\p SunkAllocas).		/// function (\p SunkAllocas).
static SetVector<Value *>		static void eraseLifetimeMarkersOnInputs(const SetVector<BasicBlock *> &Blocks,
eraseLifetimeMarkersOnInputs(const SetVector<BasicBlock *> &Blocks,		const SetVector<Value *> &SunkAllocas,
const SetVector<Value *> &SunkAllocas) {		SetVector<Value *> &LifetimesStart,
SetVector<Value *> InputObjectsWithLifetime;		SetVector<Value *> &LifetimesEnd) {
for (BasicBlock *BB : Blocks) {		for (BasicBlock *BB : Blocks) {
for (auto It = BB->begin(), End = BB->end(); It != End;) {		for (auto It = BB->begin(), End = BB->end(); It != End;) {
auto *II = dyn_cast<IntrinsicInst>(&*It);		auto *II = dyn_cast<IntrinsicInst>(&*It);
++It;		++It;
if (!II || !II->isLifetimeStartOrEnd())		if (!II || !II->isLifetimeStartOrEnd())
continue;		continue;

// Get the memory operand of the lifetime marker. If the underlying		// Get the memory operand of the lifetime marker. If the underlying
// object is a sunk alloca, or is otherwise defined in the extraction		// object is a sunk alloca, or is otherwise defined in the extraction
// region, the lifetime marker must not be erased.		// region, the lifetime marker must not be erased.
Value *Mem = II->getOperand(1)->stripInBoundsOffsets();		Value *Mem = II->getOperand(1)->stripInBoundsOffsets();
if (SunkAllocas.count(Mem) || definedInRegion(Blocks, Mem))		if (SunkAllocas.count(Mem) || definedInRegion(Blocks, Mem))
continue;		continue;

InputObjectsWithLifetime.insert(Mem);		if (II->getIntrinsicID() == Intrinsic::lifetime_start)
		LifetimesStart.insert(Mem);
		else
		LifetimesEnd.insert(Mem);
II->eraseFromParent();		II->eraseFromParent();
}		}
}		}
return InputObjectsWithLifetime;
}		}

/// Insert lifetime start/end markers surrounding the call to the new function		/// Insert lifetime start/end markers surrounding the call to the new function
/// for objects defined in the caller.		/// for objects defined in the caller.
static void insertLifetimeMarkersSurroundingCall(Module *M,		static void insertLifetimeMarkersSurroundingCall(
ArrayRef<Value *> Objects,		Module *M, ArrayRef<Value *> LifetimesStart, ArrayRef<Value *> LifetimesEnd,
CallInst *TheCall) {		CallInst *TheCall) {
if (Objects.empty())
return;

LLVMContext &Ctx = M->getContext();		LLVMContext &Ctx = M->getContext();
auto Int8PtrTy = Type::getInt8PtrTy(Ctx);		auto Int8PtrTy = Type::getInt8PtrTy(Ctx);
auto NegativeOne = ConstantInt::getSigned(Type::getInt64Ty(Ctx), -1);		auto NegativeOne = ConstantInt::getSigned(Type::getInt64Ty(Ctx), -1);
auto StartFn = llvm::Intrinsic::getDeclaration(
M, llvm::Intrinsic::lifetime_start, Int8PtrTy);
auto EndFn = llvm::Intrinsic::getDeclaration(M, llvm::Intrinsic::lifetime_end,
Int8PtrTy);
Instruction *Term = TheCall->getParent()->getTerminator();		Instruction *Term = TheCall->getParent()->getTerminator();

		// The memory argument to a lifetime marker must be a i8*. Cache any bitcasts
		// needed to satisfy this requirement so they may be reused.
		DenseMap<Value *, Value *> Bitcasts;

		// Emit lifetime markers for the pointers given in \p Objects. Insert the
		// markers before the call if \p InsertBefore, and after the call otherwise.
		auto insertMarkers = [&](Function *MarkerFunc, ArrayRef<Value *> Objects,
		bool InsertBefore) {
for (Value *Mem : Objects) {		for (Value *Mem : Objects) {
assert((!isa<Instruction>(Mem) ||		assert((!isa<Instruction>(Mem) || cast<Instruction>(Mem)->getFunction() ==
cast<Instruction>(Mem)->getFunction() == TheCall->getFunction()) &&		TheCall->getFunction()) &&
"Input memory not defined in original function");		"Input memory not defined in original function");
Value *MemAsI8Ptr = nullptr;		Value *&MemAsI8Ptr = Bitcasts[Mem];
		if (!MemAsI8Ptr) {
if (Mem->getType() == Int8PtrTy)		if (Mem->getType() == Int8PtrTy)
MemAsI8Ptr = Mem;		MemAsI8Ptr = Mem;
else		else
MemAsI8Ptr =		MemAsI8Ptr =
CastInst::CreatePointerCast(Mem, Int8PtrTy, "lt.cast", TheCall);		CastInst::CreatePointerCast(Mem, Int8PtrTy, "lt.cast", TheCall);
		}

		auto Marker = CallInst::Create(MarkerFunc, {NegativeOne, MemAsI8Ptr});
		if (InsertBefore)
		Marker->insertBefore(TheCall);
		else
		Marker->insertBefore(Term);
		}
		};

		if (!LifetimesStart.empty()) {
		auto StartFn = llvm::Intrinsic::getDeclaration(
		M, llvm::Intrinsic::lifetime_start, Int8PtrTy);
		insertMarkers(StartFn, LifetimesStart, /*InsertBefore=*/true);
		}

auto StartMarker = CallInst::Create(StartFn, {NegativeOne, MemAsI8Ptr});		if (!LifetimesEnd.empty()) {
StartMarker->insertBefore(TheCall);		auto EndFn = llvm::Intrinsic::getDeclaration(
auto EndMarker = CallInst::Create(EndFn, {NegativeOne, MemAsI8Ptr});		M, llvm::Intrinsic::lifetime_end, Int8PtrTy);
EndMarker->insertBefore(Term);		insertMarkers(EndFn, LifetimesEnd, /*InsertBefore=*/false);
}		}
}		}

/// emitCallAndSwitchStatement - This method sets up the caller side by adding		/// emitCallAndSwitchStatement - This method sets up the caller side by adding
/// the call instruction, splitting any PHI nodes in the header block as		/// the call instruction, splitting any PHI nodes in the header block as
/// necessary.		/// necessary.
CallInst *CodeExtractor::emitCallAndSwitchStatement(Function *newFunction,		CallInst *CodeExtractor::emitCallAndSwitchStatement(Function *newFunction,
BasicBlock *codeReplacer,		BasicBlock *codeReplacer,
[...]
default:		default:
TheSwitch->setDefaultDest(TheSwitch->getSuccessor(NumExitBlocks));		TheSwitch->setDefaultDest(TheSwitch->getSuccessor(NumExitBlocks));
// Remove redundant case		// Remove redundant case
TheSwitch->removeCase(SwitchInst::CaseIt(TheSwitch, NumExitBlocks-1));		TheSwitch->removeCase(SwitchInst::CaseIt(TheSwitch, NumExitBlocks-1));
break;		break;
}		}

// Insert lifetime markers around the reloads of any output values. The		// Insert lifetime markers around the reloads of any output values. The
// allocas output values are stored in are only in-use in the codeRepl block.		// allocas output values are stored in are only in-use in the codeRepl block.
insertLifetimeMarkersSurroundingCall(M, ReloadOutputs, call);		insertLifetimeMarkersSurroundingCall(M, ReloadOutputs, ReloadOutputs, call);

return call;		return call;
}		}

void CodeExtractor::moveCodeToFunction(Function *newFunction) {		void CodeExtractor::moveCodeToFunction(Function *newFunction) {
Function *oldFunc = (*Blocks.begin())->getParent();		Function *oldFunc = (*Blocks.begin())->getParent();
Function::BasicBlockListType &oldBlocks = oldFunc->getBasicBlockList();		Function::BasicBlockListType &oldBlocks = oldFunc->getBasicBlockList();
Function::BasicBlockListType &newBlocks = newFunction->getBasicBlockList();		Function::BasicBlockListType &newBlocks = newFunction->getBasicBlockList();
[...]
if (!HoistingCands.empty()) {		if (!HoistingCands.empty()) {
for (auto *II : HoistingCands)		for (auto *II : HoistingCands)
cast<Instruction>(II)->moveBefore(TI);		cast<Instruction>(II)->moveBefore(TI);
}		}

// Collect objects which are inputs to the extraction region and also		// Collect objects which are inputs to the extraction region and also
// referenced by lifetime start/end markers within it. The effects of these		// referenced by lifetime start/end markers within it. The effects of these
// markers must be replicated in the calling function to prevent the stack		// markers must be replicated in the calling function to prevent the stack
// coloring pass from merging slots which store input objects.		// coloring pass from merging slots which store input objects.
ValueSet InputObjectsWithLifetime =		ValueSet LifetimesStart, LifetimesEnd;
eraseLifetimeMarkersOnInputs(Blocks, SinkingCands);		eraseLifetimeMarkersOnInputs(Blocks, SinkingCands, LifetimesStart,
		LifetimesEnd);

// Construct new function based on inputs/outputs & add allocas for all defs.		// Construct new function based on inputs/outputs & add allocas for all defs.
Function *newFunction =		Function *newFunction =
constructFunction(inputs, outputs, header, newFuncRoot, codeReplacer,		constructFunction(inputs, outputs, header, newFuncRoot, codeReplacer,
oldFunction, oldFunction->getParent());		oldFunction, oldFunction->getParent());

// Update the entry count of the function.		// Update the entry count of the function.
if (BFI) {		if (BFI) {
auto Count = BFI->getProfileCountFromFreq(EntryFreq.getFrequency());		auto Count = BFI->getProfileCountFromFreq(EntryFreq.getFrequency());
if (Count.hasValue())		if (Count.hasValue())
newFunction->setEntryCount(		newFunction->setEntryCount(
ProfileCount(Count.getValue(), Function::PCT_Real)); // FIXME		ProfileCount(Count.getValue(), Function::PCT_Real)); // FIXME
BFI->setBlockFreq(codeReplacer, EntryFreq.getFrequency());		BFI->setBlockFreq(codeReplacer, EntryFreq.getFrequency());
}		}

CallInst *TheCall =		CallInst *TheCall =
emitCallAndSwitchStatement(newFunction, codeReplacer, inputs, outputs);		emitCallAndSwitchStatement(newFunction, codeReplacer, inputs, outputs);

moveCodeToFunction(newFunction);		moveCodeToFunction(newFunction);

// Replicate the effects of any lifetime start/end markers which referenced		// Replicate the effects of any lifetime start/end markers which referenced
// input objects in the extraction region by placing markers around the call.		// input objects in the extraction region by placing markers around the call.
insertLifetimeMarkersSurroundingCall(oldFunction->getParent(),		insertLifetimeMarkersSurroundingCall(oldFunction->getParent(),
InputObjectsWithLifetime.getArrayRef(),		LifetimesStart.getArrayRef(),
TheCall);		LifetimesEnd.getArrayRef(), TheCall);

// Propagate personality info to the new function if there is one.		// Propagate personality info to the new function if there is one.
if (oldFunction->hasPersonalityFn())		if (oldFunction->hasPersonalityFn())
newFunction->setPersonalityFn(oldFunction->getPersonalityFn());		newFunction->setPersonalityFn(oldFunction->getPersonalityFn());

// Update the branch weights for the exit block.		// Update the branch weights for the exit block.
if (BFI && NumExitBlocks > 1)		if (BFI && NumExitBlocks > 1)
calculateNewCallTerminatorWeights(codeReplacer, ExitWeights, BPI);		calculateNewCallTerminatorWeights(codeReplacer, ExitWeights, BPI);
[...]

llvm/trunk/test/Transforms/HotColdSplit/lifetime-markers-on-inputs-1.ll

				; RUN: opt -S -hotcoldsplit -hotcoldsplit-threshold=0 < %s 2>&1 | FileCheck %s

				declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture)

				declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture)

				declare void @use(i8*)

				declare void @cold_use2(i8*, i8*) cold

				; CHECK-LABEL: define {{.*}}@foo(
				define void @foo() {
				entry:
				%local1 = alloca i256
				%local2 = alloca i256
				%local1_cast = bitcast i256* %local1 to i8*
				%local2_cast = bitcast i256* %local2 to i8*
				br i1 undef, label %normalPath, label %outlinedPath

				normalPath:
				; These two uses of stack slots are non-overlapping. Based on this alone,
				; the stack slots could be merged.
				call void @llvm.lifetime.start.p0i8(i64 1, i8* %local1_cast)
				call void @use(i8* %local1_cast)
				call void @llvm.lifetime.end.p0i8(i64 1, i8* %local1_cast)
				call void @llvm.lifetime.start.p0i8(i64 1, i8* %local2_cast)
				call void @use(i8* %local2_cast)
				call void @llvm.lifetime.end.p0i8(i64 1, i8* %local2_cast)
				ret void

				; CHECK-LABEL: codeRepl:
				; CHECK: [[local1_cast:%.*]] = bitcast i256* %local1 to i8*
				; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 -1, i8* [[local1_cast]])
				; CHECK-NEXT: [[local2_cast:%.*]] = bitcast i256* %local2 to i8*
				; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 -1, i8* [[local2_cast]])
				; CHECK-NEXT: call i1 @foo.cold.1(i8* %local1_cast, i8* %local2_cast)
				; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 -1, i8* [[local1_cast]])
				; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 -1, i8* [[local2_cast]])
				; CHECK-NEXT: br i1

				outlinedPath:
				; These two uses of stack slots are overlapping. This should prevent
				; merging of stack slots. CodeExtractor must replicate the effects of
				; these markers in the caller to inhibit stack coloring.
				%gep1 = getelementptr inbounds i8, i8* %local1_cast, i64 1
				call void @llvm.lifetime.start.p0i8(i64 1, i8* %gep1)
				call void @llvm.lifetime.start.p0i8(i64 1, i8* %local2_cast)
				call void @cold_use2(i8* %local1_cast, i8* %local2_cast)
				call void @llvm.lifetime.end.p0i8(i64 1, i8* %gep1)
				call void @llvm.lifetime.end.p0i8(i64 1, i8* %local2_cast)
				br i1 undef, label %outlinedPath2, label %outlinedPathExit

				outlinedPath2:
				; These extra lifetime markers are used to test that we emit only one
				; pair of guard markers in the caller per memory object.
				call void @llvm.lifetime.start.p0i8(i64 1, i8* %local2_cast)
				call void @use(i8* %local2_cast)
				call void @llvm.lifetime.end.p0i8(i64 1, i8* %local2_cast)
				ret void

				outlinedPathExit:
				ret void
				}

				; CHECK-LABEL: define {{.*}}@foo.cold.1(
				; CHECK-NOT: @llvm.lifetime

llvm/trunk/test/Transforms/HotColdSplit/lifetime-markers-on-inputs-2.ll

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -S -hotcoldsplit -hotcoldsplit-threshold=0 < %s 2>&1 | FileCheck %s

				declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture)

				declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture)

				declare void @cold_use(i8*) cold

				; In this CFG, splitting will extract the blocks extract{1,2}. I.e., it will
				; extract a lifetime.start marker, but not the corresponding lifetime.end
				; marker. Make sure that a lifetime.start marker is emitted before the call to
				; the split function, and only that marker.
				;
				;       entry
				;     /       \
				; extract1  no-extract1
				; (lt.start)    |
				;    /          |
				; extract2      |
				;    \_____     |
				;          \   /
				;          exit
				;        (lt.end)
				;
				; After splitting, we should see:
				;
				;       entry
				;     /       \
				; codeRepl  no-extract1
				; (lt.start)    |
				;    \         /
				;        exit
				;      (lt.end)
				define void @only_lifetime_start_is_cold() {
				; CHECK-LABEL: @only_lifetime_start_is_cold(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[LOCAL1:%.*]] = alloca i256
				; CHECK-NEXT: [[LOCAL1_CAST:%.*]] = bitcast i256* [[LOCAL1]] to i8*
				; CHECK-NEXT: br i1 undef, label [[CODEREPL:%.*]], label [[NO_EXTRACT1:%.*]]
				; CHECK: codeRepl:
				; CHECK-NEXT: [[LT_CAST:%.*]] = bitcast i256* [[LOCAL1]] to i8*
				; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 -1, i8* [[LT_CAST]])
				; CHECK-NEXT: [[TARGETBLOCK:%.*]] = call i1 @only_lifetime_start_is_cold.cold.1(i8* [[LOCAL1_CAST]]) #3
				; CHECK-NEXT: br i1 [[TARGETBLOCK]], label [[NO_EXTRACT1]], label [[EXIT:%.*]]
				; CHECK: no-extract1:
				; CHECK-NEXT: br label [[EXIT]]
				; CHECK: exit:
				; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 1, i8* [[LOCAL1_CAST]])
				; CHECK-NEXT: ret void
				;
				entry:
				%local1 = alloca i256
				%local1_cast = bitcast i256* %local1 to i8*
				br i1 undef, label %extract1, label %no-extract1

				extract1:
				; lt.start
				call void @llvm.lifetime.start.p0i8(i64 1, i8* %local1_cast)
				call void @cold_use(i8* %local1_cast)
				br i1 undef, label %extract2, label %no-extract1

				extract2:
				br label %exit

				no-extract1:
				br label %exit

				exit:
				; lt.end
				call void @llvm.lifetime.end.p0i8(i64 1, i8* %local1_cast)
				ret void
				}

				; In this CFG, splitting will extract the block extract1. I.e., it will extract
				; a lifetime.end marker, but not the corresponding lifetime.start marker. Make
				; sure that a lifetime.end marker is emitted after the call to the split
				; function, and only that marker.
				;
				;         entry
				;       (lt.start)
				;        /      \
				; no-extract1  extract1
				;  (lt.end)    (lt.end)
				;        \      /
				;          exit
				;
				; After splitting, we should see:
				;
				;         entry
				;       (lt.start)
				;        /      \
				; no-extract1  codeRepl
				;  (lt.end)    (lt.end)
				;        \      /
				;          exit
				define void @only_lifetime_end_is_cold() {
				; CHECK-LABEL: @only_lifetime_end_is_cold(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[LOCAL1:%.*]] = alloca i256
				; CHECK-NEXT: [[LOCAL1_CAST:%.*]] = bitcast i256* [[LOCAL1]] to i8*
				; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 1, i8* [[LOCAL1_CAST]])
				; CHECK-NEXT: br i1 undef, label [[NO_EXTRACT1:%.*]], label [[CODEREPL:%.*]]
				; CHECK: no-extract1:
				; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 1, i8* [[LOCAL1_CAST]])
				; CHECK-NEXT: br label [[EXIT:%.*]]
				; CHECK: codeRepl:
				; CHECK-NEXT: [[LT_CAST:%.*]] = bitcast i256* [[LOCAL1]] to i8*
				; CHECK-NEXT: call void @only_lifetime_end_is_cold.cold.1(i8* [[LOCAL1_CAST]]) #3
				; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 -1, i8* [[LT_CAST]])
				; CHECK-NEXT: br label [[EXIT]]
				; CHECK: exit:
				; CHECK-NEXT: ret void
				;
				entry:
				; lt.start
				%local1 = alloca i256
				%local1_cast = bitcast i256* %local1 to i8*
				call void @llvm.lifetime.start.p0i8(i64 1, i8* %local1_cast)
				br i1 undef, label %no-extract1, label %extract1

				no-extract1:
				; lt.end
				call void @llvm.lifetime.end.p0i8(i64 1, i8* %local1_cast)
				br label %exit

				extract1:
				; lt.end
				call void @cold_use(i8* %local1_cast)
				call void @llvm.lifetime.end.p0i8(i64 1, i8* %local1_cast)
				br label %exit

				exit:
				ret void
				}

llvm/trunk/test/Transforms/HotColdSplit/lifetime-markers-on-inputs.ll

	; RUN: opt -S -hotcoldsplit -hotcoldsplit-threshold=0 < %s 2>&1 | FileCheck %s

	declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture)

	declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture)

	declare void @use(i8*)

	declare void @cold_use2(i8*, i8*) cold

	; CHECK-LABEL: define {{.*}}@foo(
	define void @foo() {
	entry:
	%local1 = alloca i256
	%local2 = alloca i256
	%local1_cast = bitcast i256* %local1 to i8*
	%local2_cast = bitcast i256* %local2 to i8*
	br i1 undef, label %normalPath, label %outlinedPath

	normalPath:
	; These two uses of stack slots are non-overlapping. Based on this alone,
	; the stack slots could be merged.
	call void @llvm.lifetime.start.p0i8(i64 1, i8* %local1_cast)
	call void @use(i8* %local1_cast)
	call void @llvm.lifetime.end.p0i8(i64 1, i8* %local1_cast)
	call void @llvm.lifetime.start.p0i8(i64 1, i8* %local2_cast)
	call void @use(i8* %local2_cast)
	call void @llvm.lifetime.end.p0i8(i64 1, i8* %local2_cast)
	ret void

	; CHECK-LABEL: codeRepl:
	; CHECK: [[local1_cast:%.*]] = bitcast i256* %local1 to i8*
	; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 -1, i8* [[local1_cast]])
	; CHECK-NEXT: [[local2_cast:%.*]] = bitcast i256* %local2 to i8*
	; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 -1, i8* [[local2_cast]])
	; CHECK-NEXT: call i1 @foo.cold.1(i8* %local1_cast, i8* %local2_cast)
	; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 -1, i8* [[local1_cast]])
	; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 -1, i8* [[local2_cast]])
	; CHECK-NEXT: br i1

	outlinedPath:
	; These two uses of stack slots are overlapping. This should prevent
	; merging of stack slots. CodeExtractor must replicate the effects of
	; these markers in the caller to inhibit stack coloring.
	%gep1 = getelementptr inbounds i8, i8* %local1_cast, i64 1
	call void @llvm.lifetime.start.p0i8(i64 1, i8* %gep1)
	call void @llvm.lifetime.start.p0i8(i64 1, i8* %local2_cast)
	call void @cold_use2(i8* %local1_cast, i8* %local2_cast)
	call void @llvm.lifetime.end.p0i8(i64 1, i8* %gep1)
	call void @llvm.lifetime.end.p0i8(i64 1, i8* %local2_cast)
	br i1 undef, label %outlinedPath2, label %outlinedPathExit

	outlinedPath2:
	; These extra lifetime markers are used to test that we emit only one
	; pair of guard markers in the caller per memory object.
	call void @llvm.lifetime.start.p0i8(i64 1, i8* %local2_cast)
	call void @use(i8* %local2_cast)
	call void @llvm.lifetime.end.p0i8(i64 1, i8* %local2_cast)
	ret void

	outlinedPathExit:
	ret void
	}

	; CHECK-LABEL: define {{.*}}@foo.cold.1(
	; CHECK-NOT: @llvm.lifetime