Commit 2b1084a

committed Aug 31, 2016
[statepoints][experimental] Add support for live-in semantics of values in deopt bundles
This is a first step towards supporting deopt value lowering and reporting entirely with the register allocator. I hope to build on this in the near future to support live-on-return semantics, but I have a use case which allows me to test and investigate code quality with just the live-in semantics, so I've chosen to start there. For those curious, my use case is our implementation of the "__llvm_deoptimize" function we bind to @llvm.deoptimize. I'm choosing not to hard code that fact in the patch and instead make it configurable via function attributes.

The basic approach here is modelled on what is done for the "Live In" values on stackmaps and patchpoints. (A secondary goal here is to remove one of the last barriers to merging the pseudo instructions.) We start by adding the operands directly to the STATEPOINT SDNode. Once we've lowered to MI, we extend the remat logic used by the register allocator to fold virtual register uses into StackMap::Indirect entries as needed. This does rely on the fact that the register allocator rematerializes. If it didn't along some code path, we could end up with more vregs than physical registers and fail to allocate.

Today, we *only* fold in the register allocator. This can create some weird effects when combined with arguments passed on the stack because we don't fold them appropriately. I have an idea how to fix that, but it needs this patch in place to work on that effectively. (There's some weird interaction with the scheduler as well, more investigation needed.)

My near term plan is to land this patch off-by-default, experiment in my local tree to identify any correctness issues, and then start fixing codegen problems one by one as I find them. Once I have the live-in lowering fully working (both correctness and code quality), I'm hoping to move on to the live-on-return semantics.

Note: I don't have any *known* miscompiles with this patch enabled, but I'm pretty sure I'll find at least a couple. Thus, the "experimental" tag and the fact it's off by default.

Differential Revision: https://reviews.llvm.org/D24000

llvm-svn: 280250
1 parent 6199b4f commit 2b1084a

6 files changed, +233 -9 lines changed

Diff for: ‎llvm/include/llvm/IR/Statepoint.h

+9 -2
@@ -34,8 +34,15 @@ enum class StatepointFlags {
   None = 0,
   GCTransition = 1, ///< Indicates that this statepoint is a transition from
                     ///< GC-aware code to code that is not GC-aware.
-
-  MaskAll = GCTransition ///< A bitmask that includes all valid flags.
+  /// Mark the deopt arguments associated with the statepoint as only being
+  /// "live-in". By default, deopt arguments are "live-through". "live-through"
+  /// requires that the value be live on entry, on exit, and at any point
+  /// during the call. "live-in" only requires the value be available at the
+  /// start of the call. In particular, "live-in" values can be placed in
+  /// unused argument registers or other non-callee saved registers.
+  DeoptLiveIn = 2,
+
+  MaskAll = 3 ///< A bitmask that includes all valid flags.
 };

 class GCRelocateInst;
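
The flag values above combine as a bitmask, with `MaskAll` covering both `GCTransition` and `DeoptLiveIn`. The following is a minimal standalone sketch of how such flags might be tested. The helper names `isValidFlagSet` and `hasDeoptLiveIn` and the explicit `uint64_t` underlying type are assumptions made for illustration; they are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Standalone copy of the enum from the hunk above. The explicit underlying
// type is an assumption made so the sketch compiles on its own.
enum class StatepointFlags : uint64_t {
  None = 0,
  GCTransition = 1,
  DeoptLiveIn = 2,
  MaskAll = 3
};

// Hypothetical helper: true if Flags contains no bits outside MaskAll.
constexpr bool isValidFlagSet(uint64_t Flags) {
  return (Flags & ~static_cast<uint64_t>(StatepointFlags::MaskAll)) == 0;
}

// Hypothetical helper: true if the deopt arguments only need to be live-in.
constexpr bool hasDeoptLiveIn(uint64_t Flags) {
  return (Flags & static_cast<uint64_t>(StatepointFlags::DeoptLiveIn)) != 0;
}

// MaskAll is written as the literal 3 in the patch; this checks that it is
// the union of the two individual flags.
static_assert(static_cast<uint64_t>(StatepointFlags::MaskAll) ==
                  (static_cast<uint64_t>(StatepointFlags::GCTransition) |
                   static_cast<uint64_t>(StatepointFlags::DeoptLiveIn)),
              "MaskAll must cover every flag");
```

In the tests later in this commit, the statepoint's flags argument is `i32 2` for calls to `@bar`, which corresponds to `DeoptLiveIn` alone.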

Diff for: ‎llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp

+35 -5
@@ -370,7 +370,7 @@ spillIncomingStatepointValue(SDValue Incoming, SDValue Chain,
 /// Lower a single value incoming to a statepoint node. This value can be
 /// either a deopt value or a gc value, the handling is the same. We special
 /// case constants and allocas, then fall back to spilling if required.
-static void lowerIncomingStatepointValue(SDValue Incoming,
+static void lowerIncomingStatepointValue(SDValue Incoming, bool LiveInOnly,
                                          SmallVectorImpl<SDValue> &Ops,
                                          SelectionDAGBuilder &Builder) {
   SDValue Chain = Builder.getRoot();
@@ -389,6 +389,14 @@ static void lowerIncomingStatepointValue(SDValue Incoming,
     // relocate the address of the alloca itself?)
     Ops.push_back(Builder.DAG.getTargetFrameIndex(FI->getIndex(),
                                                   Incoming.getValueType()));
+  } else if (LiveInOnly) {
+    // If this value is live in (not live-on-return, or live-through), we can
+    // treat it the same way patchpoint treats its "live in" values. We'll
+    // end up folding some of these into stack references, but they'll be
+    // handled by the register allocator. Note that we do not have the notion
+    // of a late use so these values might be placed in registers which are
+    // clobbered by the call. This is fine for live-in.
+    Ops.push_back(Incoming);
   } else {
     // Otherwise, locate a spill slot and explicitly spill it so it
     // can be found by the runtime later. We currently do not support
@@ -439,19 +447,38 @@ lowerStatepointMetaArgs(SmallVectorImpl<SDValue> &Ops,
                "non gc managed derived pointer found in statepoint");
       }
     }
+    assert(SI.Bases.size() == SI.Ptrs.size() && "Pointer without base!");
   } else {
     assert(SI.Bases.empty() && "No gc specified, so cannot relocate pointers!");
     assert(SI.Ptrs.empty() && "No gc specified, so cannot relocate pointers!");
   }
 #endif

+  // Figure out what lowering strategy we're going to use for each part.
+  // Note: It is conservatively correct to lower both "live-in" and "live-out"
+  // as "live-through". A "live-through" variable is one which is "live-in",
+  // "live-out", and live throughout the lifetime of the call (i.e. we can find
+  // it from any PC within the transitive callee of the statepoint). In
+  // particular, if the callee spills callee preserved registers we may not
+  // be able to find a value placed in that register during the call. This is
+  // fine for live-out, but not for live-through. If we were willing to make
+  // assumptions about the code generator producing the callee, we could
+  // potentially allow live-through values in callee saved registers.
+  const bool LiveInDeopt =
+    SI.StatepointFlags & (uint64_t)StatepointFlags::DeoptLiveIn;
+
+  auto isGCValue = [&](const Value *V) {
+    return is_contained(SI.Ptrs, V) || is_contained(SI.Bases, V);
+  };
+
   // Before we actually start lowering (and allocating spill slots for values),
   // reserve any stack slots which we judge to be profitable to reuse for a
   // particular value. This is purely an optimization over the code below and
   // doesn't change semantics at all. It is important for performance that we
   // reserve slots for both deopt and gc values before lowering either.
   for (const Value *V : SI.DeoptState) {
-    reservePreviousStackSlotForValue(V, Builder);
+    if (!LiveInDeopt || isGCValue(V))
+      reservePreviousStackSlotForValue(V, Builder);
   }
   for (unsigned i = 0; i < SI.Bases.size(); ++i) {
     reservePreviousStackSlotForValue(SI.Bases[i], Builder);
@@ -468,7 +495,8 @@ lowerStatepointMetaArgs(SmallVectorImpl<SDValue> &Ops,
   // what type of values are contained within.
   for (const Value *V : SI.DeoptState) {
     SDValue Incoming = Builder.getValue(V);
-    lowerIncomingStatepointValue(Incoming, Ops, Builder);
+    const bool LiveInValue = LiveInDeopt && !isGCValue(V);
+    lowerIncomingStatepointValue(Incoming, LiveInValue, Ops, Builder);
   }

   // Finally, go ahead and lower all the gc arguments. There's no prefixed
@@ -478,10 +506,12 @@ lowerStatepointMetaArgs(SmallVectorImpl<SDValue> &Ops,
   // (base[0], ptr[0], base[1], ptr[1], ...)
   for (unsigned i = 0; i < SI.Bases.size(); ++i) {
     const Value *Base = SI.Bases[i];
-    lowerIncomingStatepointValue(Builder.getValue(Base), Ops, Builder);
+    lowerIncomingStatepointValue(Builder.getValue(Base), /*LiveInOnly*/ false,
+                                 Ops, Builder);

     const Value *Ptr = SI.Ptrs[i];
-    lowerIncomingStatepointValue(Builder.getValue(Ptr), Ops, Builder);
+    lowerIncomingStatepointValue(Builder.getValue(Ptr), /*LiveInOnly*/ false,
+                                 Ops, Builder);
   }

   // If there are any explicit spill slots passed to the statepoint, record
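
The per-value decision made in `lowerStatepointMetaArgs` can be modelled outside of LLVM. The sketch below is a simplified illustration only: values are represented as strings, the `StatepointValues` struct and `Lowering` enum are hypothetical, and the constant and alloca special cases handled by the real code are left out. It shows the one rule the patch adds: a deopt value is passed as a live-in operand only when the live-in flag is set and the value is not also a GC base or pointer.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical outcome of lowering one deopt value (constants and allocas,
// which the real code special-cases first, are not modelled).
enum class Lowering { LiveInOperand, SpillToStack };

// Hypothetical stand-in for the statepoint lowering info; values are named
// by string purely for illustration.
struct StatepointValues {
  std::vector<std::string> DeoptState;
  std::vector<std::string> Bases;
  std::vector<std::string> Ptrs;
  bool LiveInDeopt; // corresponds to the DeoptLiveIn flag being set
};

// A value is a GC value if it appears among the bases or derived pointers.
bool isGCValue(const StatepointValues &SI, const std::string &V) {
  auto Contains = [&V](const std::vector<std::string> &Vec) {
    return std::find(Vec.begin(), Vec.end(), V) != Vec.end();
  };
  return Contains(SI.Ptrs) || Contains(SI.Bases);
}

// Mirrors `LiveInDeopt && !isGCValue(V)` from the hunk above.
Lowering lowerDeoptValue(const StatepointValues &SI, const std::string &V) {
  return (SI.LiveInDeopt && !isGCValue(SI, V)) ? Lowering::LiveInOperand
                                               : Lowering::SpillToStack;
}
```

A value that is both a deopt value and a GC pointer keeps the spill-slot lowering, since the GC side already requires a live-through stack slot and reusing it avoids extra register pressure.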

Diff for: ‎llvm/lib/CodeGen/TargetInstrInfo.cpp

+9 -2
@@ -448,6 +448,11 @@ static MachineInstr *foldPatchpoint(MachineFunction &MF, MachineInstr &MI,
     StartIdx = PatchPointOpers(&MI).getVarIdx();
     break;
   }
+  case TargetOpcode::STATEPOINT: {
+    // For statepoints, fold deopt and gc arguments, but not call arguments.
+    StartIdx = StatepointOpers(&MI).getVarIdx();
+    break;
+  }
   default:
     llvm_unreachable("unexpected stackmap opcode");
   }
@@ -513,7 +518,8 @@ MachineInstr *TargetInstrInfo::foldMemoryOperand(MachineInstr &MI,
   MachineInstr *NewMI = nullptr;

   if (MI.getOpcode() == TargetOpcode::STACKMAP ||
-      MI.getOpcode() == TargetOpcode::PATCHPOINT) {
+      MI.getOpcode() == TargetOpcode::PATCHPOINT ||
+      MI.getOpcode() == TargetOpcode::STATEPOINT) {
     // Fold stackmap/patchpoint.
     NewMI = foldPatchpoint(MF, MI, Ops, FI, *this);
     if (NewMI)
@@ -794,7 +800,8 @@ MachineInstr *TargetInstrInfo::foldMemoryOperand(MachineInstr &MI,
   int FrameIndex = 0;

   if ((MI.getOpcode() == TargetOpcode::STACKMAP ||
-       MI.getOpcode() == TargetOpcode::PATCHPOINT) &&
+       MI.getOpcode() == TargetOpcode::PATCHPOINT ||
+       MI.getOpcode() == TargetOpcode::STATEPOINT) &&
       isLoadFromStackSlot(LoadMI, FrameIndex)) {
     // Fold stackmap/patchpoint.
     NewMI = foldPatchpoint(MF, MI, Ops, FrameIndex, *this);
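
The role of `StartIdx` can be illustrated with a small model. This is a sketch under stated assumptions, not the body of `foldPatchpoint`: the `Operand` type and `foldIntoStackRefs` function are hypothetical, and the rule that a fold request touching any operand before `StartIdx` is rejected as a whole is inferred from the comment "fold deopt and gc arguments, but not call arguments". The real function builds a new `MachineInstr` rather than rewriting operands in place.

```cpp
#include <cassert>
#include <vector>

// Simplified operand model: either a virtual register use or an indirect
// stack reference. Illustrative only; this is not LLVM's MachineOperand.
struct Operand {
  enum Kind { VReg, IndirectStackRef } K;
  int Id; // vreg number, or frame index for a stack reference
};

// Try to replace the operands listed in FoldIdxs with a reference to
// FrameIndex. Returns false and leaves Ops untouched if any requested
// operand lies before StartIdx (assumed here to be a call argument or other
// fixed operand) or is out of range.
bool foldIntoStackRefs(std::vector<Operand> &Ops, unsigned StartIdx,
                       const std::vector<unsigned> &FoldIdxs,
                       int FrameIndex) {
  for (unsigned Idx : FoldIdxs)
    if (Idx < StartIdx || Idx >= Ops.size())
      return false;
  for (unsigned Idx : FoldIdxs)
    Ops[Idx] = Operand{Operand::IndirectStackRef, FrameIndex};
  return true;
}
```

With this shape, the register allocator can ask to fold a spilled virtual register into the statepoint's deopt or gc operands, while call arguments continue to need real registers.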

Diff for: ‎llvm/lib/Transforms/Scalar/RewriteStatepointsForGC.cpp

+26
@@ -1273,6 +1273,24 @@ class DeferredReplacement {
 };
 }

+static StringRef getDeoptLowering(CallSite CS) {
+  const char *DeoptLowering = "deopt-lowering";
+  if (CS.hasFnAttr(DeoptLowering)) {
+    // FIXME: CallSite has a *really* confusing interface around attributes
+    // with values.
+    const AttributeSet &CSAS = CS.getAttributes();
+    if (CSAS.hasAttribute(AttributeSet::FunctionIndex,
+                          DeoptLowering))
+      return CSAS.getAttribute(AttributeSet::FunctionIndex,
+                               DeoptLowering).getValueAsString();
+    Function *F = CS.getCalledFunction();
+    assert(F && F->hasFnAttribute(DeoptLowering));
+    return F->getFnAttribute(DeoptLowering).getValueAsString();
+  }
+  return "live-through";
+}
+
+
 static void
 makeStatepointExplicitImpl(const CallSite CS, /* to replace */
                            const SmallVectorImpl<Value *> &BasePtrs,
@@ -1314,6 +1332,14 @@ makeStatepointExplicitImpl(const CallSite CS, /* to replace */
   if (SD.StatepointID)
     StatepointID = *SD.StatepointID;

+  // Pass through the requested lowering if any. The default is live-through.
+  StringRef DeoptLowering = getDeoptLowering(CS);
+  if (DeoptLowering.equals("live-in"))
+    Flags |= uint32_t(StatepointFlags::DeoptLiveIn);
+  else {
+    assert(DeoptLowering.equals("live-through") && "Unsupported value!");
+  }
+
   Value *CallTarget = CS.getCalledValue();
   if (Function *F = dyn_cast<Function>(CallTarget)) {
     if (F->getIntrinsicID() == Intrinsic::experimental_deoptimize) {
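
The lookup order in `getDeoptLowering` (call-site attribute first, then the callee's attribute, then the "live-through" default) and its translation into a flag bit can be sketched without LLVM. The `StrAttr` and `CallSiteAttrs` types and the `applyDeoptLowering` helper below are hypothetical stand-ins for the attribute queries in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for an optional string-valued attribute.
struct StrAttr {
  bool Present;
  std::string Value;
};

// Hypothetical stand-in for the two places "deopt-lowering" may appear.
struct CallSiteAttrs {
  StrAttr OnCallSite; // attribute attached to the call itself
  StrAttr OnCallee;   // attribute on the called function's declaration
};

// The call-site attribute takes precedence over the callee's; with neither
// present the default is "live-through".
std::string getDeoptLowering(const CallSiteAttrs &CS) {
  if (CS.OnCallSite.Present)
    return CS.OnCallSite.Value;
  if (CS.OnCallee.Present)
    return CS.OnCallee.Value;
  return "live-through";
}

// Translate the attribute value into the statepoint flags word. The value 2
// is DeoptLiveIn from Statepoint.h.
uint32_t applyDeoptLowering(uint32_t Flags, const std::string &Lowering) {
  const uint32_t DeoptLiveIn = 2;
  if (Lowering == "live-in")
    return Flags | DeoptLiveIn;
  assert(Lowering == "live-through" && "Unsupported value!");
  return Flags;
}
```

This matches the expectations in the `opt` test in this commit: calls to `@foo` (no attribute) and `@baz` ("live-through") get flags `0`, while the call to `@bar` ("live-in") gets flags `2`.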

Diff for: ‎llvm/test/CodeGen/X86/statepoint-live-in.ll

+131
@@ -0,0 +1,131 @@
+; RUN: llc -O3 < %s | FileCheck %s
+target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-apple-macosx10.11.0"
+
+declare void @bar() #0
+declare void @baz()
+
+define void @test1(i32 %a) gc "statepoint-example" {
+entry:
+; We expect the argument to be passed in an extra register to bar
+; CHECK-LABEL: test1
+; CHECK: pushq %rax
+; CHECK-NEXT: Ltmp0:
+; CHECK-NEXT: .cfi_def_cfa_offset 16
+; CHECK-NEXT: callq _bar
+  %statepoint_token1 = call token (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @bar, i32 0, i32 2, i32 0, i32 1, i32 %a)
+  ret void
+}
+
+define void @test2(i32 %a, i32 %b) gc "statepoint-example" {
+entry:
+; Because the first call clobbers esi, we have to move the values into
+; new registers. Note that they stay in the registers for both calls.
+; CHECK-LABEL: @test2
+; CHECK: movl %esi, %ebx
+; CHECK-NEXT: movl %edi, %ebp
+; CHECK-NEXT: callq _bar
+  call token (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @bar, i32 0, i32 2, i32 0, i32 2, i32 %a, i32 %b)
+  call token (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @bar, i32 0, i32 2, i32 0, i32 2, i32 %b, i32 %a)
+  ret void
+}
+
+define void @test3(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e, i32 %f, i32 %g, i32 %h, i32 %i) gc "statepoint-example" {
+entry:
+; TODO: We should have folded the reload into the statepoint.
+; CHECK-LABEL: @test3
+; CHECK: movl 32(%rsp), %r10d
+; CHECK-NEXT: movl 24(%rsp), %r11d
+; CHECK-NEXT: movl 16(%rsp), %eax
+; CHECK-NEXT: callq _bar
+  %statepoint_token1 = call token (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @bar, i32 0, i32 2, i32 0, i32 9, i32 %a, i32 %b, i32 %c, i32 %d, i32 %e, i32 %f, i32 %g, i32 %h, i32 %i)
+  ret void
+}
+
+; This case just confirms that we don't crash when given more live values
+; than registers. This is a case where we *have* to use a stack slot.
+define void @test4(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e, i32 %f, i32 %g, i32 %h, i32 %i, i32 %j, i32 %k, i32 %l, i32 %m, i32 %n, i32 %o, i32 %p, i32 %q, i32 %r, i32 %s, i32 %t, i32 %u, i32 %v, i32 %w, i32 %x, i32 %y, i32 %z) gc "statepoint-example" {
+entry:
+; TODO: We should have folded the reload into the statepoint.
+; CHECK-LABEL: test4
+; CHECK: pushq %r15
+; CHECK: pushq %r14
+; CHECK: pushq %r13
+; CHECK: pushq %r12
+; CHECK: pushq %rbx
+; CHECK: pushq %rax
+; CHECK: movl 128(%rsp), %r13d
+; CHECK-NEXT: movl 120(%rsp), %r12d
+; CHECK-NEXT: movl 112(%rsp), %r15d
+; CHECK-NEXT: movl 104(%rsp), %r14d
+; CHECK-NEXT: movl 96(%rsp), %ebp
+; CHECK-NEXT: movl 88(%rsp), %ebx
+; CHECK-NEXT: movl 80(%rsp), %r11d
+; CHECK-NEXT: movl 72(%rsp), %r10d
+; CHECK-NEXT: movl 64(%rsp), %eax
+; CHECK-NEXT: callq _bar
+  %statepoint_token1 = call token (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @bar, i32 0, i32 2, i32 0, i32 26, i32 %a, i32 %b, i32 %c, i32 %d, i32 %e, i32 %f, i32 %g, i32 %h, i32 %i, i32 %j, i32 %k, i32 %l, i32 %m, i32 %n, i32 %o, i32 %p, i32 %q, i32 %r, i32 %s, i32 %t, i32 %u, i32 %v, i32 %w, i32 %x, i32 %y, i32 %z)
+  ret void
+}
+
+; A live-through gc-value must be spilled even if it is also a live-in deopt
+; value. For live-in, we could technically report the register copy, but from
+; a code quality perspective it's better to reuse the required stack slot so
+; as to put less stress on the register allocator for no benefit.
+define i32 addrspace(1)* @test5(i32 %a, i32 addrspace(1)* %p) gc "statepoint-example" {
+entry:
+; CHECK-LABEL: test5
+; CHECK: movq %rsi, (%rsp)
+; CHECK-NEXT: callq _bar
+  %token = call token (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @bar, i32 0, i32 2, i32 0, i32 1, i32 %a, i32 addrspace(1)* %p, i32 addrspace(1)* %p)
+  %p2 = call i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(token %token, i32 9, i32 9)
+  ret i32 addrspace(1)* %p2
+}
+
+; Show the interaction of live-through spilling followed by live-in.
+define void @test6(i32 %a) gc "statepoint-example" {
+entry:
+; TODO: We could have reused the previous spill slot at zero additional cost.
+; CHECK-LABEL: test6
+; CHECK: movl %edi, %ebx
+; CHECK: movl %ebx, 12(%rsp)
+; CHECK-NEXT: callq _baz
+; CHECK-NEXT: Ltmp30:
+; CHECK-NEXT: callq _bar
+  call token (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @baz, i32 0, i32 0, i32 0, i32 1, i32 %a)
+  call token (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @bar, i32 0, i32 2, i32 0, i32 1, i32 %a)
+  ret void
+}
+
+
+; CHECK: Ltmp1-_test1
+; CHECK: .byte 1
+; CHECK-NEXT: .byte 4
+; CHECK-NEXT: .short 5
+; CHECK-NEXT: .long 0
+
+; CHECK: Ltmp7-_test2
+; CHECK: .byte 1
+; CHECK-NEXT: .byte 4
+; CHECK-NEXT: .short 6
+; CHECK-NEXT: .long 0
+; CHECK: .byte 1
+; CHECK-NEXT: .byte 4
+; CHECK-NEXT: .short 3
+; CHECK-NEXT: .long 0
+; CHECK: Ltmp8-_test2
+; CHECK: .byte 1
+; CHECK-NEXT: .byte 4
+; CHECK-NEXT: .short 3
+; CHECK-NEXT: .long 0
+; CHECK: .byte 1
+; CHECK-NEXT: .byte 4
+; CHECK-NEXT: .short 6
+; CHECK-NEXT: .long 0
+
+declare token @llvm.experimental.gc.statepoint.p0f_isVoidf(i64, i32, void ()*, i32, i32, ...)
+declare i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(token, i32, i32)
+
+
+attributes #0 = { "deopt-lowering"="live-in" }
+attributes #1 = { "deopt-lowering"="live-through" }
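
The stackmap `CHECK` lines in this test (`.byte`, `.byte`, `.short`, `.long`) describe one location record per live-in value. The sketch below decodes such a record, assuming an 8-byte little-endian layout of type, size, DWARF register number, and offset, and assuming type `1` denotes a value held in a register; both assumptions are read off the directives above rather than stated by the patch. The register names follow the System V x86-64 DWARF numbering, and the `Location` struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumed layout of one stackmap location record, matching the
// .byte/.byte/.short/.long sequence in the CHECK lines.
struct Location {
  uint8_t Type;      // 1 is assumed to mean "value lives in a register"
  uint8_t Size;      // size of the value in bytes
  uint16_t DwarfReg; // DWARF register number
  int32_t Offset;    // unused for register locations
};

// Decode 8 little-endian bytes into a Location.
Location decodeLocation(const uint8_t *Bytes) {
  Location L;
  L.Type = Bytes[0];
  L.Size = Bytes[1];
  L.DwarfReg = static_cast<uint16_t>(Bytes[2] | (Bytes[3] << 8));
  uint32_t Raw = static_cast<uint32_t>(Bytes[4]) |
                 (static_cast<uint32_t>(Bytes[5]) << 8) |
                 (static_cast<uint32_t>(Bytes[6]) << 16) |
                 (static_cast<uint32_t>(Bytes[7]) << 24);
  std::memcpy(&L.Offset, &Raw, sizeof(Raw));
  return L;
}

// First eight DWARF register numbers for x86-64 (System V ABI).
const char *x86_64DwarfRegName(uint16_t Reg) {
  static const char *const Names[] = {"rax", "rdx", "rcx", "rbx",
                                      "rsi", "rdi", "rbp", "rsp"};
  return Reg < 8 ? Names[Reg] : "other";
}
```

Read this way, the record for `test1` (`.short 5`) reports `%a` in `rdi`, its incoming argument register, and the records for `test2` (`.short 6`, `.short 3`) report `rbp` and `rbx`, matching the `movl %edi, %ebp` and `movl %esi, %ebx` copies checked earlier in the test.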
@@ -0,0 +1,23 @@
+; RUN: opt -rewrite-statepoints-for-gc -S < %s | FileCheck %s
+; Check that the "deopt-lowering" function attribute gets transcoded into
+; flags on the resulting statepoint
+
+target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-apple-macosx10.11.0"
+
+declare void @foo()
+declare void @bar() "deopt-lowering"="live-in"
+declare void @baz() "deopt-lowering"="live-through"
+
+define void @test1() gc "statepoint-example" {
+; CHECK-LABEL: @test1(
+; CHECK: @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 1, i32 57)
+; CHECK: @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @bar, i32 0, i32 2, i32 0, i32 1, i32 42)
+; CHECK: @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @baz, i32 0, i32 0, i32 0, i32 1, i32 13)
+
+entry:
+  call void @foo() [ "deopt"(i32 57) ]
+  call void @bar() [ "deopt"(i32 42) ]
+  call void @baz() [ "deopt"(i32 13) ]
+  ret void
+}
