This is an archive of the discontinued LLVM Phabricator instance.

[Statepoints] Reuse stack slots for assignment idioms
Abandoned Public

Authored by reames on Jun 11 2015, 2:19 PM.


Details

Reviewers

igor-laevsky
pgavlin
sanjoy

Summary

Reading over @igor's changes in 239472, it hit me that the same scheme could be easily extended to implement one of the existing TODOs in the lowering code.

If we have an idiom that looks like assignment (i.e. a new SSA value produced from some operation where its input is no longer used), we can reuse the stack slot of the original SSA value for the updated value. This works really nicely for GEPs on x86 because we can apply the update directly against the stack slot and avoid a fill, modify, spill pattern entirely.

It doesn't work quite as cleanly for deoptimization arguments, mostly because the original spill slot is not reused for the update. I suspect this might be because we're marking the slot as being updated by the previous statepoint, but I decided to split that into a separate patch. We at least get better stack slot usage, even if we don't yet get ideal code gen.

(The best option would of course be to directly use the register allocator, but a) that's hard, and b) this is an incremental improvement.)
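
The slot-reuse idea in the summary can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the patch itself: `ToyValue`, `SlotMap`, `findPreferredSlot`, `assignSlots` and the lookup depth of 6 are all hypothetical names and values, and no LLVM API is used. It mirrors the two pieces the summary relies on: a lookup that walks through bitcasts and single-use updates to find a previously used slot, and a caller that resolves conflicting reservations by handing the loser a fresh slot.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <unordered_map>
#include <vector>

// Toy stand-in for an SSA value (hypothetical; not llvm::Value).
struct ToyValue {
  enum Kind { Plain, Bitcast, SimpleUpdate };
  Kind kind = Plain;
  const ToyValue *operand = nullptr; // source of a Bitcast or SimpleUpdate
  int numUses = 0;                   // how many users this value has
};

using SlotMap = std::unordered_map<const ToyValue *, int>;

// Returns a slot that looks profitable for `v`, or nullopt if none is known.
// The returned slot does not necessarily hold `v` yet; it is a preference.
std::optional<int> findPreferredSlot(const ToyValue *v, const SlotMap &previous,
                                     int depth) {
  if (v == nullptr || depth <= 0)
    return std::nullopt;
  if (auto it = previous.find(v); it != previous.end())
    return it->second; // value was spilled at an earlier statepoint
  if (v->kind == ToyValue::Bitcast)
    return findPreferredSlot(v->operand, previous, depth - 1);
  // Single-use check is a heuristic only; conflicts are resolved by the caller.
  if (v->kind == ToyValue::SimpleUpdate && v->numUses == 1)
    return findPreferredSlot(v->operand, previous, depth - 1);
  return std::nullopt;
}

// Reserve preferred slots first, then give every remaining value a fresh slot.
SlotMap assignSlots(const std::vector<const ToyValue *> &incoming,
                    const SlotMap &previous) {
  SlotMap assigned;
  std::set<int> taken;
  for (const ToyValue *v : incoming) {
    std::optional<int> slot = findPreferredSlot(v, previous, /*depth=*/6);
    if (slot && taken.insert(*slot).second)
      assigned[v] = *slot; // first claimant wins the slot
  }
  int next = 0;
  for (const ToyValue *v : incoming) {
    if (assigned.count(v))
      continue;
    while (taken.count(next))
      ++next;
    taken.insert(next);
    assigned[v] = next;
  }
  return assigned;
}
```

In this model, when only the updated value is live at the second statepoint it inherits the original's slot. When both the original and the update are live, whichever is visited first keeps the slot and the other gets a new one, so the result is always valid but depends on visiting order.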

Diff Detail

Event Timeline

reames updated this revision to Diff 27543. Jun 11 2015, 2:19 PM

reames retitled this revision from to [Statepoints] Reuse stack slots for assignment idioms.

reames updated this object.

reames edited the test plan for this revision.

reames added reviewers: igor-laevsky, sanjoy, pgavlin.

reames added subscribers: Unknown Object (MLST), • igor.

Do I understand correctly that you have relaxed the constraints on findPreviousSpillSlot so that it may return a location which does not necessarily contain the requested value? If so, maybe it's better to rename it to something like findPreferredSpillSlot.

Also, with this approach we can do better for phi nodes. When visiting a phi node we can return a known location even if the locations of some of its incoming values are unknown. There is a TODO for this somewhere in findPreviousSpillSlot. Not sure how profitable it would be, though.

lib/CodeGen/SelectionDAG/StatepointLowering.cpp
422–424	Are you certain about this? I do remember experimenting with this approach, but I was seeing quite the opposite picture: almost no redundant stores were removed. It was quite a while ago, though, and I didn't dig too deep into it, so maybe my code had some simple mistake. I will try to run a few of my old tests on your code tomorrow.
test/CodeGen/X86/statepoint-stack-usage.ll
98	Is that different from back_to_back_calls test?

igor-laevsky added inline comments. Jun 12 2015, 8:37 AM

lib/CodeGen/SelectionDAG/StatepointLowering.cpp
422–424	Yes, as I suspected, there is little llvm can do about these redundant stores. From a quick glance at the code I don't see a place in llvm where it removes dead stores in machine code. There is a small function in StackSlotColoring, but it is very limited. MachineCSE and RemoveDeadInstructions explicitly discard all mayStore values. I am wondering whether there is any good reason why there is no dead store elimination for machine code. In theory it should not be very hard to implement.
test/CodeGen/X86/statepoint-stack-usage.ll
137	This actually passes even without your changes. Maybe add some additional oops to the second statepoint, so that by default we would assign different slots for a1 and a1_derived.
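
The dead-store question raised in the inline comments above can be made concrete with a block-local toy. This is a minimal sketch, not LLVM code: `ToyInstr`, `findDeadStores` and the three opcodes are hypothetical, and a call is assumed to read every slot (as a statepoint may read its spill slots). It only finds stores that are overwritten before any read; it does not catch a store that writes back a value the slot already holds, which is the other kind of redundancy discussed here.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy machine instruction touching one stack slot (hypothetical model).
struct ToyInstr {
  enum class Op { Store, Load, Call };
  Op op;
  int slot; // ignored for Call
};

// Marks stores whose slot is overwritten later in the same block before any
// load or call could observe it. Slots are assumed live at the block's end.
std::vector<bool> findDeadStores(const std::vector<ToyInstr> &block) {
  std::vector<bool> dead(block.size(), false);
  std::set<int> overwrittenLater; // slots stored again before any read
  for (std::size_t i = block.size(); i-- > 0;) {
    const ToyInstr &instr = block[i];
    switch (instr.op) {
    case ToyInstr::Op::Store:
      if (overwrittenLater.count(instr.slot))
        dead[i] = true; // a later store clobbers this one unread
      else
        overwrittenLater.insert(instr.slot);
      break;
    case ToyInstr::Op::Load:
      overwrittenLater.erase(instr.slot); // earlier stores are observed
      break;
    case ToyInstr::Op::Call:
      overwrittenLater.clear(); // conservatively: the call reads every slot
      break;
    }
  }
  return dead;
}
```

Because every call is treated as a read of all slots, a spill that feeds a statepoint is never reported dead in this model; only back-to-back stores to the same slot with nothing in between are.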

@igor - Your restatement is correct. I like your idea of renaming the method, but I hadn't really come up with a great name. As your follow on question about PHIs points out, the current notion is actually stronger than just "preferred". It's more like "preferred so strongly that there can't be a better slot to pick".

Hm, I think I just understood your question about phis after writing that. Were you intending to say that we could special case PHIs where exactly one input had a spill slot assigned and that input's only use is the phi? That avoids the problem of picking one spill slot when multiple are available. Might be worth exploring in a future change.

w.r.t. your comments about DSE, no, I'm not really sure. :) I may be utterly wrong in fact. Having said that, none of the existing test cases break with this change and I didn't see any differences in codegen for the example I happened to look at. I'll revert this part of the patch, but I'll have to introduce another data structure to do it. Can I ask you to submit a test which would have caught this?

In D10402#188451, @reames wrote:

@igor - Your restatement is correct. I like your idea of renaming the method, but I hadn't really come up with a great name. As your follow on question about PHIs points out, the current notion is actually stronger than just "preferred". It's more like "preferred so strongly that there can't be a better slot to pick".

Maybe findBestSpillSlot? Or findOptimalSpillSlot.

Hm, I think I just understood your question about phis after writing that. Were you intending to say that we could special case PHIs where exactly one input had a spill slot assigned and that input's only use is the phi? That avoids the problem of picking one spill slot when multiple are available. Might be worth exploring in a future change.

Yes, that's correct. However, we don't need to special-case only two-entry phi nodes. The condition could be "spill slots for all incoming values are the same, but some might be unknown". (Currently it is "spill slots for all incoming values are known and the same".)
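
The two phi conditions quoted above differ only in how an unknown incoming slot is treated. The following sketch contrasts them; it is a standalone illustration, with `mergeStrict` and `mergeRelaxed` as hypothetical names and each incoming value reduced to an optional slot index.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Current rule: every incoming value must have a known slot and all agree.
std::optional<int> mergeStrict(const std::vector<std::optional<int>> &incoming) {
  std::optional<int> merged;
  for (const std::optional<int> &slot : incoming) {
    if (!slot)
      return std::nullopt; // any unknown input makes the result unknown
    if (merged && *merged != *slot)
      return std::nullopt; // two different slots: no single answer
    merged = slot;
  }
  return merged;
}

// Proposed rule: unknown inputs are skipped; the known ones must agree.
std::optional<int> mergeRelaxed(const std::vector<std::optional<int>> &incoming) {
  std::optional<int> merged;
  for (const std::optional<int> &slot : incoming) {
    if (!slot)
      continue; // ignore inputs without a recorded slot
    if (merged && *merged != *slot)
      return std::nullopt;
    merged = slot;
  }
  return merged;
}
```

Under the relaxed rule the result is a preferred slot rather than a guarantee that the value is already stored there, so a caller would still need to emit the spill on paths where the slot was unknown.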

w.r.t. your comments about DSE, no, I'm not really sure. :) I may be utterly wrong in fact. Having said that, none of the existing test cases break with this change and I didn't see any differences in codegen for the example I happened to look at. I'll revert this part of the patch, but I'll have to introduce another data structure to do it. Can I ask you to submit a test which would have caught this?

I submitted a test case in r239842.

I may come back to this at some point, but I can't justify the time now or any time in the near future.

Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/StatepointLowering.h	4 lines
lib/CodeGen/SelectionDAG/StatepointLowering.cpp	90 lines
test/CodeGen/X86/statepoint-stack-usage.ll	85 lines

Diff 27543

lib/CodeGen/SelectionDAG/StatepointLowering.h

Show All 37 Lines	public:

/// Clear the memory usage of this object. This is called from		/// Clear the memory usage of this object. This is called from
/// SelectionDAGBuilder::clear. We require this is never called in the		/// SelectionDAGBuilder::clear. We require this is never called in the
/// midst of processing a statepoint sequence.		/// midst of processing a statepoint sequence.
void clear();		void clear();

/// Returns the spill location of a value incoming to the current		/// Returns the spill location of a value incoming to the current
/// statepoint. Will return SDValue() if this value hasn't been		/// statepoint. Will return SDValue() if this value hasn't been
/// spilled. Otherwise, the value has already been spilled and no		/// assigned a spill slot. Note that the value may have been assigned a
/// further action is required by the caller.		/// spill slot, but not yet actually spilled.
SDValue getLocation(SDValue val) {		SDValue getLocation(SDValue val) {
if (!Locations.count(val))		if (!Locations.count(val))
return SDValue();		return SDValue();
return Locations[val];		return Locations[val];
}		}
void setLocation(SDValue val, SDValue Location) {		void setLocation(SDValue val, SDValue Location) {
assert(!Locations.count(val) &&		assert(!Locations.count(val) &&
"Trying to allocate already allocated location");		"Trying to allocate already allocated location");
▲ Show 20 Lines • Show All 61 Lines • Show Last 20 Lines

lib/CodeGen/SelectionDAG/StatepointLowering.cpp

Show First 20 Lines • Show All 108 Lines • ▼ Show 20 Lines	for (int i = 0; i < 40000; i++) {
// minor bug previously. Unless performance shows this matters, please		// minor bug previously. Unless performance shows this matters, please
// keep this code as simple as possible.		// keep this code as simple as possible.
NextSlotToAllocate++;		NextSlotToAllocate++;
}		}
llvm_unreachable("infinite loop?");		llvm_unreachable("infinite loop?");
}		}

/// Utility function for reservePreviousStackSlotForValue. Tries to find		/// Utility function for reservePreviousStackSlotForValue. Tries to find
/// stack slot index to which we have spilled value for previous statepoints.		/// stack slot index which is obviously profitable to use for this value.
/// LookUpDepth specifies maximum DFS depth this function is allowed to look.		/// Looks for values which spills of the same value, values generated from
		/// reloads, and simple updates to previously spilled values. LookUpDepth
		/// specifies maximum DFS depth this function is allowed to look. This function
		/// doesn't worry about conflicting spill slots, that's handled by its
		/// caller.
static Optional<int> findPreviousSpillSlot(const Value *Val,		static Optional<int> findPreviousSpillSlot(const Value *Val,
SelectionDAGBuilder &Builder,		SelectionDAGBuilder &Builder,
int LookUpDepth) {		int LookUpDepth) {
// Can not look any futher - give up now		// Can not look any futher - give up now
if (LookUpDepth <= 0)		if (LookUpDepth <= 0)
return Optional<int>();		return Optional<int>();

// Spill location is known for gc relocates		// Spill location is known for gc relocates
if (isGCRelocate(Val)) {		if (isGCRelocate(Val)) {
GCRelocateOperands RelocOps(cast<Instruction>(Val));		GCRelocateOperands RelocOps(cast<Instruction>(Val));

FunctionLoweringInfo::StatepointSpilledValueMapTy &SpillMap =		FunctionLoweringInfo::StatepointSpilledValueMapTy &SpillMap =
Builder.FuncInfo.StatepointRelocatedValues[RelocOps.getStatepoint()];		Builder.FuncInfo.StatepointRelocatedValues[RelocOps.getStatepoint()];

auto It = SpillMap.find(RelocOps.getDerivedPtr());		auto It = SpillMap.find(RelocOps.getDerivedPtr());
if (It == SpillMap.end())		if (It == SpillMap.end())
return Optional<int>();		return Optional<int>();

return It->second;		return It->second;
}		}

// Look through bitcast instructions.		// Look through bitcast instructions. We can do this since bitcast don't
		// change the value in the slot and we can store both the original and
		// bitcasted value in the same slot.
if (const BitCastInst *Cast = dyn_cast<BitCastInst>(Val)) {		if (const BitCastInst *Cast = dyn_cast<BitCastInst>(Val)) {
return findPreviousSpillSlot(Cast->getOperand(0), Builder, LookUpDepth - 1);		return findPreviousSpillSlot(Cast->getOperand(0), Builder, LookUpDepth - 1);
}		}

		// We can place a new value in an existing stack slot if we can be confident
		// that the original value won't be live over the statepoint and thus
		// preferable to put in that slot. Note that hasOneUse check is an
		// optimization heuristic, not a correctness check. Conflicting reservations
		// are handled explicitly in our caller. Single use updates have a
		// particularly nice property on x86 in that many of them can be folded to
		// updates against memory if the same spill slot can be used on both sides.
		// This really helps cases like:
		// statepoint @foo() (p)
		// p2 = gep p+1
		// statepoint @bar() (p2) <-- where p is dead here
		// Note that the spill slot reuse works for deopt state as well, but we don't
		// get the nice update in place property since the data dependency is on p,
		// not reload(spill(p)) and the register allocator doesn't know that the
		// spill slot contains the value of (p). This is probably fixable.
		if (Val->hasOneUse()) {
		if (auto *GEP = dyn_cast<GetElementPtrInst>(Val))
		return findPreviousSpillSlot(GEP->getPointerOperand(), Builder,
		LookUpDepth - 1);
		if (auto *BOp = dyn_cast<BinaryOperator>(Val))
		if (isa<Constant>(BOp->getOperand(1)))
		return findPreviousSpillSlot(BOp->getOperand(0), Builder,
		LookUpDepth - 1);
		}

// Look through phi nodes		// Look through phi nodes
// All incoming values should have same known stack slot, otherwise result		// All incoming values should have same known stack slot, otherwise result
// is unknown.		// is unknown.
if (const PHINode *Phi = dyn_cast<PHINode>(Val)) {		if (const PHINode *Phi = dyn_cast<PHINode>(Val)) {
Optional<int> MergedResult = None;		Optional<int> MergedResult = None;

for (auto &IncomingValue : Phi->incoming_values()) {		for (auto &IncomingValue : Phi->incoming_values()) {
Optional<int> SpillSlot =		Optional<int> SpillSlot =
Show All 15 Lines	static Optional<int> findPreviousSpillSlot(const Value *Val,
// We will return that stack slot for ptr is unknown. And later we might		// We will return that stack slot for ptr is unknown. And later we might
// assign different stack slots for ptr and relocated_pointer. This limits		// assign different stack slots for ptr and relocated_pointer. This limits
// llvm's ability to remove redundant stores.		// llvm's ability to remove redundant stores.
// Unfortunately it's hard to accomplish in current infrastructure.		// Unfortunately it's hard to accomplish in current infrastructure.
// We use this function to eliminate spill store completely, while		// We use this function to eliminate spill store completely, while
// in example we still need to emit store, but instead of any location		// in example we still need to emit store, but instead of any location
// we need to use special "preferred" location.		// we need to use special "preferred" location.

// TODO: handle simple updates. If a value is modified and the original
// value is no longer live, it would be nice to put the modified value in the
// same slot. This allows folding of the memory accesses for some
// instructions types (like an increment).
// statepoint (i)
// i1 = i+1
// statepoint (i1)
// However we need to be careful for cases like this:
// statepoint(i)
// i1 = i+1
// statepoint(i, i1)
// Here we want to reserve spill slot for 'i', but not for 'i+1'. If we just
// put handling of simple modifications in this function like it's done
// for bitcasts we might end up reserving i's slot for 'i+1' because order in
// which we visit values is unspecified.

// Don't know any information about this instruction		// Don't know any information about this instruction
return Optional<int>();		return Optional<int>();
}		}

/// Try to find existing copies of the incoming values in stack slots used for		/// Try to find existing copies of the incoming values in stack slots used for
/// statepoint spilling. If we can find a spill slot for the incoming value,		/// statepoint spilling. If we can find a spill slot for the incoming value,
/// mark that slot as allocated, and reuse the same slot for this safepoint.		/// mark that slot as allocated, and reuse the same slot for this safepoint.
/// This helps to avoid series of loads and stores that only serve to resuffle		/// This helps to avoid series of loads and stores that only serve to resuffle
/// values on the stack between calls.		/// values on the stack between calls. We may also be able to reuse an
		/// existing slot if we can trivially tell the original value isn't live at
		/// this statepoint but a value computed from it is.
static void reservePreviousStackSlotForValue(const Value *IncomingValue,		static void reservePreviousStackSlotForValue(const Value *IncomingValue,
SelectionDAGBuilder &Builder) {		SelectionDAGBuilder &Builder) {

SDValue Incoming = Builder.getValue(IncomingValue);		SDValue Incoming = Builder.getValue(IncomingValue);

if (isa<ConstantSDNode>(Incoming) || isa<FrameIndexSDNode>(Incoming)) {		if (isa<ConstantSDNode>(Incoming) || isa<FrameIndexSDNode>(Incoming)) {
// We won't need to spill this, so no need to check for previously		// We won't need to spill this, so no need to check for previously
// allocated stack slots		// allocated stack slots
Show All 16 Lines	static void reservePreviousStackSlotForValue(const Value *IncomingValue,
assert(Itr != Builder.FuncInfo.StatepointStackSlots.end() &&		assert(Itr != Builder.FuncInfo.StatepointStackSlots.end() &&
"value spilled to the unknown stack slot");		"value spilled to the unknown stack slot");

// This is one of our dedicated lowering slots		// This is one of our dedicated lowering slots
const int Offset =		const int Offset =
std::distance(Builder.FuncInfo.StatepointStackSlots.begin(), Itr);		std::distance(Builder.FuncInfo.StatepointStackSlots.begin(), Itr);
if (Builder.StatepointLowering.isStackSlotAllocated(Offset)) {		if (Builder.StatepointLowering.isStackSlotAllocated(Offset)) {
// stack slot already assigned to someone else, can't use it!		// stack slot already assigned to someone else, can't use it!
// TODO: currently we reserve space for gc arguments after doing
// normal allocation for deopt arguments. We should reserve for
// _all_ deopt and gc arguments, then start allocating. This
// will prevent some moves being inserted when vm state changes,
// but gc state doesn't between two calls.
return;		return;
}		}
// Reserve this stack slot		// Reserve this stack slot
Builder.StatepointLowering.reserveStackSlot(Offset);		Builder.StatepointLowering.reserveStackSlot(Offset);

// Cache this slot so we find it when going through the normal		// Cache this slot so we find it when going through the normal
// assignment loop.		// assignment loop.
SDValue Loc = Builder.DAG.getTargetFrameIndex(*Index, Incoming.getValueType());		SDValue Loc = Builder.DAG.getTargetFrameIndex(*Index, Incoming.getValueType());
▲ Show 20 Lines • Show All 148 Lines • ▼ Show 20 Lines
/// is a null constant. Return pair with first element being frame index		/// is a null constant. Return pair with first element being frame index
/// containing saved value and second element with outgoing chain from the		/// containing saved value and second element with outgoing chain from the
/// emitted store		/// emitted store
static std::pair<SDValue, SDValue>		static std::pair<SDValue, SDValue>
spillIncomingStatepointValue(SDValue Incoming, SDValue Chain,		spillIncomingStatepointValue(SDValue Incoming, SDValue Chain,
SelectionDAGBuilder &Builder) {		SelectionDAGBuilder &Builder) {
SDValue Loc = Builder.StatepointLowering.getLocation(Incoming);		SDValue Loc = Builder.StatepointLowering.getLocation(Incoming);

// Emit new store if we didn't do it for this ptr before		// If we couldn't find a profitable stack slot previously, assign one.
if (!Loc.getNode()) {		if (!Loc.getNode()) {
Loc = Builder.StatepointLowering.allocateStackSlot(Incoming.getValueType(),		Loc = Builder.StatepointLowering.allocateStackSlot(Incoming.getValueType(),
Builder);		Builder);
assert(isa<FrameIndexSDNode>(Loc));		assert(isa<FrameIndexSDNode>(Loc));
		Builder.StatepointLowering.setLocation(Incoming, Loc);
		}

		// Now spill the value. When we have multiple values which map the same
		// stack slot this may result in redundant spills, but we can generally rely
		// on the backend to clean these up for us and don't need to be fancy here.

int Index = cast<FrameIndexSDNode>(Loc)->getIndex();		int Index = cast<FrameIndexSDNode>(Loc)->getIndex();
// We use TargetFrameIndex so that isel will not select it into LEA		// We use TargetFrameIndex so that isel will not select it into LEA
Loc = Builder.DAG.getTargetFrameIndex(Index, Incoming.getValueType());		Loc = Builder.DAG.getTargetFrameIndex(Index, Incoming.getValueType());

// TODO: We can create TokenFactor node instead of		// TODO: We can create TokenFactor node instead of
// chaining stores one after another, this may allow		// chaining stores one after another, this may allow
// a bit more optimal scheduling for them		// a bit more optimal scheduling for them
Chain = Builder.DAG.getStore(Chain, Builder.getCurSDLoc(), Incoming, Loc,		Chain = Builder.DAG.getStore(Chain, Builder.getCurSDLoc(), Incoming, Loc,
MachinePointerInfo::getFixedStack(Index),		MachinePointerInfo::getFixedStack(Index),
false, false, 0);		false, false, 0);

Builder.StatepointLowering.setLocation(Incoming, Loc);
}

assert(Loc.getNode());		assert(Loc.getNode());
return std::make_pair(Loc, Chain);		return std::make_pair(Loc, Chain);
}		}

/// Lower a single value incoming to a statepoint node. This value can be		/// Lower a single value incoming to a statepoint node. This value can be
/// either a deopt value or a gc value, the handling is the same. We special		/// either a deopt value or a gc value, the handling is the same. We special
/// case constants and allocas, then fall back to spilling if required.		/// case constants and allocas, then fall back to spilling if required.
static void lowerIncomingStatepointValue(SDValue Incoming,		static void lowerIncomingStatepointValue(SDValue Incoming,
▲ Show 20 Lines • Show All 452 Lines • Show Last 20 Lines

test/CodeGen/X86/statepoint-stack-usage.ll

; RUN: llc < %s | FileCheck %s		; RUN: llc < %s | FileCheck %s

target datalayout = "e-i64:64-f80:128-n8:16:32:64-S128"		target datalayout = "e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"		target triple = "x86_64-pc-linux-gnu"

; This test is checking to make sure that we reuse the same stack slots		; This test is checking to make sure that we reuse the same stack slots
; for GC values spilled over two different call sites. Since the order		; for GC values spilled over two different call sites. Since the order
; of GC arguments differ, niave lowering code would insert loads and		; of GC arguments differ, niave lowering code would insert loads and
; stores to rearrange items on the stack. We need to make sure (for		; stores to rearrange items on the stack. We need to make sure (for
; performance) that this doesn't happen.		; performance) that this doesn't happen.
define i32 @back_to_back_calls(i32 addrspace(1)* %a, i32 addrspace(1)* %b, i32 addrspace(1)* %c) #1 gc "statepoint-example" {		define i32 @back_to_back_calls(i32 addrspace(1)* %a, i32 addrspace(1)* %b, i32 addrspace(1)* %c) gc "statepoint-example" {
; CHECK-LABEL: back_to_back_calls		; CHECK-LABEL: back_to_back_calls
; The exact stores don't matter, but there need to be three stack slots created		; The exact stores don't matter, but there need to be three stack slots created
; CHECK: movq %rdi, 16(%rsp)		; CHECK: movq %rdi, 16(%rsp)
; CHECK: movq %rdx, 8(%rsp)		; CHECK: movq %rdx, 8(%rsp)
; CHECK: movq %rsi, (%rsp)		; CHECK: movq %rsi, (%rsp)
%safepoint_token = tail call i32 (i64, i32, void (), i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void () undef, i32 0, i32 0, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0, i32 addrspace(1)* %a, i32 addrspace(1)* %b, i32 addrspace(1)* %c)		%safepoint_token = tail call i32 (i64, i32, void (), i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void () undef, i32 0, i32 0, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0, i32 addrspace(1)* %a, i32 addrspace(1)* %b, i32 addrspace(1)* %c)
%a1 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token, i32 12, i32 12)		%a1 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token, i32 12, i32 12)
%b1 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token, i32 12, i32 13)		%b1 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token, i32 12, i32 13)
%c1 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token, i32 12, i32 14)		%c1 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token, i32 12, i32 14)
; CHECK: callq		; CHECK: callq
; This is the key check. There should NOT be any memory moves here		; This is the key check. There should NOT be any memory moves here
; CHECK-NOT: movq		; CHECK-NOT: movq
%safepoint_token2 = tail call i32 (i64, i32, void (), i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void () undef, i32 0, i32 0, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0, i32 addrspace(1)* %c1, i32 addrspace(1)* %b1, i32 addrspace(1)* %a1)		%safepoint_token2 = tail call i32 (i64, i32, void (), i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void () undef, i32 0, i32 0, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0, i32 addrspace(1)* %c1, i32 addrspace(1)* %b1, i32 addrspace(1)* %a1)
%a2 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token2, i32 12, i32 14)		%a2 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token2, i32 12, i32 14)
%b2 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token2, i32 12, i32 13)		%b2 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token2, i32 12, i32 13)
%c2 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token2, i32 12, i32 12)		%c2 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token2, i32 12, i32 12)
; CHECK: callq		; CHECK: callq
ret i32 1		ret i32 1
}		}

; This test simply checks that minor changes in vm state don't prevent slots		; This test simply checks that minor changes in vm state don't prevent slots
; being reused for gc values.		; being reused for gc values.
define i32 @reserve_first(i32 addrspace(1)* %a, i32 addrspace(1)* %b, i32 addrspace(1)* %c) #1 gc "statepoint-example" {		define i32 @reserve_first(i32 addrspace(1)* %a, i32 addrspace(1)* %b, i32 addrspace(1)* %c) gc "statepoint-example" {
; CHECK-LABEL: reserve_first		; CHECK-LABEL: reserve_first
; The exact stores don't matter, but there need to be three stack slots created		; The exact stores don't matter, but there need to be three stack slots created
; CHECK: movq %rdi, 16(%rsp)		; CHECK: movq %rdi, 16(%rsp)
; CHECK: movq %rdx, 8(%rsp)		; CHECK: movq %rdx, 8(%rsp)
; CHECK: movq %rsi, (%rsp)		; CHECK: movq %rsi, (%rsp)
%safepoint_token = tail call i32 (i64, i32, void (), i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void () undef, i32 0, i32 0, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0, i32 addrspace(1)* %a, i32 addrspace(1)* %b, i32 addrspace(1)* %c)		%safepoint_token = tail call i32 (i64, i32, void (), i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void () undef, i32 0, i32 0, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0, i32 addrspace(1)* %a, i32 addrspace(1)* %b, i32 addrspace(1)* %c)
%a1 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token, i32 12, i32 12)		%a1 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token, i32 12, i32 12)
%b1 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token, i32 12, i32 13)		%b1 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token, i32 12, i32 13)
%c1 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token, i32 12, i32 14)		%c1 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token, i32 12, i32 14)
; CHECK: callq		; CHECK: callq
; This is the key check. There should NOT be any memory moves here		; This is the key check. There should NOT be any memory moves here
; CHECK-NOT: movq		; CHECK-NOT: movq
%safepoint_token2 = tail call i32 (i64, i32, void (), i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void () undef, i32 0, i32 0, i32 0, i32 5, i32 addrspace(1)* %a1, i32 0, i32 addrspace(1)* %c1, i32 0, i32 0, i32 addrspace(1)* %c1, i32 addrspace(1)* %b1, i32 addrspace(1)* %a1)		%safepoint_token2 = tail call i32 (i64, i32, void (), i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void () undef, i32 0, i32 0, i32 0, i32 5, i32 addrspace(1)* %a1, i32 0, i32 addrspace(1)* %c1, i32 0, i32 0, i32 addrspace(1)* %c1, i32 addrspace(1)* %b1, i32 addrspace(1)* %a1)
%a2 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token2, i32 12, i32 14)		%a2 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token2, i32 12, i32 14)
%b2 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token2, i32 12, i32 13)		%b2 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token2, i32 12, i32 13)
%c2 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token2, i32 12, i32 12)		%c2 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token2, i32 12, i32 12)
; CHECK: callq		; CHECK: callq
ret i32 1		ret i32 1
}		}

; Test that stack slots are reused for invokes		; Test that stack slots are reused for invokes
define i32 @back_to_back_invokes(i32 addrspace(1)* %a, i32 addrspace(1)* %b, i32 addrspace(1)* %c) #1 gc "statepoint-example" {		define i32 @back_to_back_invokes(i32 addrspace(1)* %a, i32 addrspace(1)* %b, i32 addrspace(1)* %c) gc "statepoint-example" {
; CHECK-LABEL: back_to_back_invokes		; CHECK-LABEL: back_to_back_invokes
entry:		entry:
; The exact stores don't matter, but there need to be three stack slots created		; The exact stores don't matter, but there need to be three stack slots created
; CHECK: movq %rdi, 16(%rsp)		; CHECK: movq %rdi, 16(%rsp)
; CHECK: movq %rdx, 8(%rsp)		; CHECK: movq %rdx, 8(%rsp)
; CHECK: movq %rsi, (%rsp)		; CHECK: movq %rsi, (%rsp)
; CHECK: callq		; CHECK: callq
%safepoint_token = invoke i32 (i64, i32, void (), i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void () undef, i32 0, i32 0, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0, i32 addrspace(1)* %a, i32 addrspace(1)* %b, i32 addrspace(1)* %c)		%safepoint_token = invoke i32 (i64, i32, void (), i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void () undef, i32 0, i32 0, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0, i32 addrspace(1)* %a, i32 addrspace(1)* %b, i32 addrspace(1)* %c)
Show All 23 Lines	exceptional_return:
ret i32 0		ret i32 0

exceptional_return2:		exceptional_return2:
%landing_pad2 = landingpad { i8, i32 } personality i32 () @"personality_function"		%landing_pad2 = landingpad { i8, i32 } personality i32 () @"personality_function"
cleanup		cleanup
ret i32 0		ret i32 0
}		}

		; A simple forwarding case where the same value is used across two
		; safepoints. We'll build on this pattern for other tests.
		define i32 addrspace(1)* @test3(i32 addrspace(1)* %a) gc "statepoint-example" {
		; CHECK-LABEL: test3
		; CHECK: movq %rdi, (%rsp)
		; CHECK-NEXT: callq
		; CHECK-NOT: movq
		; CHECK: callq
		; CHECK: movq (%rsp), %rax

		%safepoint_token = tail call i32 (i64, i32, void (), i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void () undef, i32 0, i32 0, i32 0, i32 0, i32 addrspace(1)* %a)
		%a1 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token, i32 7, i32 7)
		%safepoint_token2 = tail call i32 (i64, i32, void (), i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void () undef, i32 0, i32 0, i32 0, i32 0, i32 addrspace(1)* %a1)
		%a2 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token2, i32 7, i32 7)
		ret i32 addrspace(1)* %a2
		}

		; A bitcast does not change the value stored, so we can record the same stack slot
		; for both a bitcast and its source value. We'll end up with one stack slot,
		; but two locations in the stackmap.
		define i32 addrspace(1)* @test4(i32 addrspace(1)* %a) gc "statepoint-example" {
		; CHECK-LABEL: test4
		; CHECK: movq %rdi, (%rsp)
		; CHECK-NEXT: callq
		; CHECK-NOT: movq
		; CHECK: callq
		; CHECK: movq (%rsp), %rax

		%safepoint_token = tail call i32 (i64, i32, void (), i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void () undef, i32 0, i32 0, i32 0, i32 0, i32 addrspace(1)* %a)
		%a1 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token, i32 7, i32 7)
		%a1_cast = bitcast i32 addrspace(1)* %a1 to i64 addrspace(1)*
		%safepoint_token2 = tail call i32 (i64, i32, void (), i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void () undef, i32 0, i32 0, i32 0, i32 0, i32 addrspace(1)* %a1, i64 addrspace(1)* %a1_cast)
		%unused = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token2, i32 7, i32 7)
		%a2 = tail call coldcc i64 addrspace(1)* @llvm.experimental.gc.relocate.p1i64(i32 %safepoint_token2, i32 8, i32 8)
		%a2_cast = bitcast i64 addrspace(1)* %a2 to i32 addrspace(1)*
		ret i32 addrspace(1)* %a2_cast
		}

		; For certain RMW idioms between statepoints, we can reuse the original stack
		; slot and update the memory slot directly. This is obviously profitable if
		; the original value is only live for the update.
		define i32 addrspace(1)* @test5(i32 addrspace(1)* %a) gc "statepoint-example" {
		igor-laevsky (inline comment): This actually passes even without your changes. Maybe add some additional oops to the second statepoint, so that by default we would assign different slots for a1 and a1_derived.
		; CHECK-LABEL: test5
		; CHECK: movq %rdi, (%rsp)
		; CHECK-NEXT: callq
		; CHECK-NOT: movq
		; CHECK: addq $32, (%rsp)
		; CHECK-NEXT: callq
		; CHECK: movq (%rsp), %rax

		%safepoint_token = tail call i32 (i64, i32, void (), i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void () undef, i32 0, i32 0, i32 0, i32 0, i32 addrspace(1)* %a)
		%a1 = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token, i32 7, i32 7)
		%a1_derived = getelementptr i32, i32 addrspace(1)* %a1, i64 8
		%safepoint_token2 = tail call i32 (i64, i32, void (), i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void () undef, i32 0, i32 0, i32 0, i32 0, i32 addrspace(1)* %a1_derived)
		%relocated = tail call coldcc i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token2, i32 7, i32 7)
		ret i32 addrspace(1)* %relocated
		}

		; Same as test5, but with an integer add operation on deopt state rather than
		; a GEP on a GC pointer. Note that we don't yet get an in-place update.
		define void @test6(i32 %a) gc "statepoint-example" {
		; CHECK-LABEL: test6
		; CHECK: movl %edi, %ebx
		; CHECK-NEXT: movl %ebx, 12(%rsp)
		; CHECK-NEXT: callq
		; CHECK-NOT: movq
		; CHECK: incl %ebx
		; CHECK-NEXT: movl %ebx, 12(%rsp)
		; CHECK-NEXT: callq

		call i32 (i64, i32, void (), i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void () undef, i32 0, i32 0, i32 0, i32 1, i32 %a)
		%a1_add = add i32 %a, 1
		tail call i32 (i64, i32, void (), i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void () undef, i32 0, i32 0, i32 0, i32 1, i32 %a1_add)
		ret void
		}


; Function Attrs: nounwind		; Function Attrs: nounwind
declare i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32, i32, i32) #3		declare i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32, i32, i32) #3
		declare i64 addrspace(1)* @llvm.experimental.gc.relocate.p1i64(i32, i32, i32) #3
declare i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(i32, i32, i32) #3		declare i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(i32, i32, i32) #3

declare i32 @llvm.experimental.gc.statepoint.p0f_isVoidf(i64, i32, void ()*, i32, i32, ...)		declare i32 @llvm.experimental.gc.statepoint.p0f_isVoidf(i64, i32, void ()*, i32, i32, ...)

declare i32 @"personality_function"()		declare i32 @"personality_function"()

attributes #1 = { uwtable }
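
The slot-reuse rule that test3 through test5 exercise can be sketched as a small toy model. This is only an illustration of the idea described in the summary, not the StatepointLowering code: the names `Value`, `SlotAllocator`, `preferred_slot`, `spill`, and `also_live` are all hypothetical, and real lowering has to deal with phis, invokes, and slot sizes that are ignored here. The model treats relocates and bitcasts as carrying the same bits as their source, and treats a GEP or add as an "assignment" only when its source is not itself live at the statepoint being lowered.

```python
# Toy model of stack-slot reuse across statepoints.
# All names here are hypothetical; this is NOT LLVM's implementation.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Value:
    name: str
    op: str = "arg"                    # "arg", "relocate", "bitcast", "gep", "add"
    source: Optional["Value"] = None   # the value this one is derived from


class SlotAllocator:
    def __init__(self):
        self.slot_of = {}    # value name -> slot index
        self.next_slot = 0

    def preferred_slot(self, v, also_live):
        """Return a slot already tied to v or to the value it replaces, else None."""
        if v.name in self.slot_of:
            return self.slot_of[v.name]
        if v.source is None:
            return None
        if v.op in ("relocate", "bitcast"):
            # Same bits as the source, so the source's slot can be shared.
            return self.preferred_slot(v.source, also_live)
        if v.op in ("gep", "add") and v.source.name not in also_live:
            # Assignment idiom: the source is dead, so its slot can be updated in place.
            return self.preferred_slot(v.source, also_live)
        return None

    def spill(self, v, also_live=frozenset()):
        """Assign v a slot for one statepoint; also_live names other values live there."""
        slot = self.preferred_slot(v, also_live)
        if slot is None:
            slot = self.next_slot
            self.next_slot += 1
        self.slot_of[v.name] = slot
        return slot


# Shape of test5: %a -> relocate -> GEP, with %a1 dead after the GEP.
a = Value("a")
a1 = Value("a1", "relocate", a)
a1_derived = Value("a1_derived", "gep", a1)

reuse = SlotAllocator()
slot_a = reuse.spill(a)
slot_derived = reuse.spill(a1_derived)   # reuses the slot of %a

# Variant in the spirit of the inline review comment: %a1 is also live at the
# second statepoint, so %a1_derived must get a slot of its own.
no_reuse = SlotAllocator()
no_reuse.spill(a)
slot_a1 = no_reuse.spill(a1, also_live={"a1_derived"})
slot_derived_2 = no_reuse.spill(a1_derived, also_live={"a1"})
```

In the first scenario both spills land in the same slot, which corresponds to the single `(%rsp)` slot that test5 checks for. In the second, keeping `%a1` live forces a second slot, which is the situation the reviewer suggests the test should set up so that it fails without the patch.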