This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Prefer static frame index for STATEPOINT liveness args
ClosedPublic

Authored by cherry on Oct 30 2018, 1:33 PM.


Details

Reviewers

thanm
niravd
reames

Commits

rG0e0a8a3fee0c: [CodeGen] Prefer static frame index for STATEPOINT liveness args
rL347998: [CodeGen] Prefer static frame index for STATEPOINT liveness args

Summary

If a given liveness arg of STATEPOINT is at a fixed frame index
(e.g. a function argument passed on the stack), prefer to use this
fixed location even if the address is also in a register. Using
the register would generate a spill, which is unnecessary, since
the fixed frame index can be recorded directly in the stack map.
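To make the trade-off above concrete, here is a minimal standalone C++ model. It is a sketch under stated assumptions, not LLVM code: `LivenessArg`, `Location`, `LocKind`, `SpillState` and `lowerLivenessArg` are hypothetical names, the 8-byte spill slots are illustrative, and the two kinds only mimic the stack map's Direct (the value is Reg + Offset) and Indirect (the value is stored at [Reg + Offset]) location types.

```cpp
#include <cassert>
#include <climits>

// Hypothetical subset of stack map location kinds:
//   Direct   - the recorded value is the address SP + Offset itself
//   Indirect - the recorded value is stored in memory at SP + Offset
enum class LocKind { Direct, Indirect };

struct Location {
  LocKind Kind;
  int Offset; // byte offset from the stack pointer
};

// Hypothetical liveness argument: FixedOffset is INT_MAX when the value
// has no fixed stack location (e.g. it only exists in a register).
struct LivenessArg {
  int FixedOffset;
};

// Hypothetical per-call-site spill bookkeeping.
struct SpillState {
  int NextSlotOffset = 0; // next free spill slot, relative to SP
  int NumSpills = 0;
};

// Prefer the fixed location; only a register-only value needs a spill slot.
Location lowerLivenessArg(const LivenessArg &A, SpillState &S) {
  if (A.FixedOffset != INT_MAX)
    return {LocKind::Direct, A.FixedOffset}; // recorded as-is, no spill
  Location L{LocKind::Indirect, S.NextSlotOffset};
  S.NextSlotOffset += 8; // one 8-byte slot per spilled value
  ++S.NumSpills;
  return L;
}
```

In this model, `LivenessArg{16}` yields a Direct location at offset 16 with no spill, which is the shape of the `Direct RSP+16` record the new test expects; a register-only value costs one spill slot and is recorded as Indirect.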

Diff Detail

Repository: rL LLVM

Event Timeline

cherry created this revision. Oct 30 2018, 1:33 PM

Herald added subscribers: llvm-commits, arphaman. Oct 30 2018, 1:33 PM

anna added a subscriber: anna. Nov 7 2018, 11:56 AM

anna added inline comments.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
Line 1383 (On Diff #171776): Is this just being conservative against a corner case?

Thanks for the review!

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
Line 1383 (On Diff #171776): INT_MAX is the sentinel value getArgumentFrameIndex returns to mean "no frame index". Other uses of getArgumentFrameIndex have similar checks.
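The sentinel convention can be sketched outside LLVM. `Argument`, `ArgFrameIndexTable`, `set` and `lookup` are hypothetical; only the `getArgumentFrameIndex` name and its INT_MAX "no" result come from the comment above. A likely reason for choosing INT_MAX rather than -1 is that fixed stack objects in LLVM use negative frame indices, so -1 can be a legitimate answer.

```cpp
#include <cassert>
#include <climits>
#include <map>
#include <optional>

// Hypothetical stand-in for an IR function argument.
struct Argument {
  unsigned ArgNo;
};

// Hypothetical table mapping arguments to fixed frame indices.
class ArgFrameIndexTable {
  std::map<const Argument *, int> Map;

public:
  void set(const Argument *A, int FI) { Map[A] = FI; }

  // Sentinel style: INT_MAX means "this argument has no frame index".
  // Negative values are valid results (fixed objects use negative indices),
  // which is why -1 cannot serve as the "no" value.
  int getArgumentFrameIndex(const Argument *A) const {
    auto It = Map.find(A);
    return It == Map.end() ? INT_MAX : It->second;
  }

  // Alternative interface: the "no" case is part of the return type.
  std::optional<int> lookup(const Argument *A) const {
    int FI = getArgumentFrameIndex(A);
    if (FI == INT_MAX)
      return std::nullopt;
    return FI;
  }
};
```

The `lookup` variant moves the "no" check into the type so a caller cannot forget it; the sentinel style instead relies on every call site comparing against INT_MAX, as the patch does.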

niravd added inline comments. Nov 9 2018, 8:19 AM

lib/CodeGen/SelectionDAG/StatepointLowering.cpp
Line 527 (On Diff #171776): Can we fold this into getValue/getValueImpl? IIUC, this is a reasonable expectation of getValue for these inputs, and it'd be nicer there. I haven't checked, but I think all possible matches must always be non-register values in any context where we encounter them, so there's no reason for getValue to be defined otherwise.

reames requested changes to this revision. Nov 25 2018, 7:12 PM

reames added a subscriber: reames.

reames added inline comments.

test/CodeGen/X86/statepoint-stack-usage.ll
Line 143 (On Diff #171776): Can you expand the check statements here? And check the actual debug output for the stackmap? I can't judge whether the encoding is correct from the test.

This revision now requires changes to proceed. Nov 25 2018, 7:12 PM

cherry updated this revision to Diff 175703. Nov 28 2018, 9:10 AM

cherry marked 3 inline comments as done.

cherry added inline comments.

lib/CodeGen/SelectionDAG/StatepointLowering.cpp
Line 527 (On Diff #171776): I tried this, but it turns out a number of tests fail, mostly on 32-bit architectures handling 64-bit values. I think I'll keep the current approach in this patch, as a targeted change, and look into the 32-bit architecture issue and do a cleanup later. Thanks.
test/CodeGen/X86/statepoint-stack-usage.ll
Line 143 (On Diff #171776): Done. Added CHECK lines for the stack map output, and moved the test to statepoint-stackmap-format.ll, which emits and checks the stack map output.

reames added inline comments. Nov 28 2018, 10:04 AM

lib/CodeGen/SelectionDAG/StatepointLowering.cpp
Line 529 (On Diff #175703): Hm, I'm really uncomfortable with this bit of code, for two reasons:
1. I agree with the general idea that this should be generic and buried inside getValue. If it can't be, I want to understand why, because I'm missing something.
2. You shouldn't need to change the handling of allocas, since that should already work.
Is it possible to let the virtual register be lowered and then peek back through to the source? Basically, I'm looking to be convinced this is the right approach.

cherry marked an inline comment as done. Nov 28 2018, 11:37 AM

cherry added inline comments.

lib/CodeGen/SelectionDAG/StatepointLowering.cpp
Line 529 (On Diff #175703): You're right that allocas already work; I'll remove that case. I'm happy to try another approach, either trying harder to do this in getValue, or some other way.

In addition to the detailed implementation comment below, I thought of a possible semantic problem. Per the ABI, who "owns" the memory for the arguments? Does either the callee or the caller assume it's immutable? If so, then your optimization needs to be restricted to a non-relocating collector, since otherwise the collector might be updating a memory location that alias analysis (AA) assumes to be immutable, and bad things could happen...

I'm 97% sure we do assume arguments are immutable within a function...
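The concern can be illustrated with a toy model of a relocating collector. `StackSlot`, `collectAndRelocate`, `Reads` and `readAcrossSafepoint` are hypothetical and greatly simplified; the point is only that a value read from a stack-map slot before a safepoint is stale afterwards if the collector rewrites the slot, so an analysis that treats that memory as immutable could reuse the stale value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one stack slot recorded in a stack map.
struct StackSlot {
  std::uintptr_t Ref; // address of a heap object
};

// Hypothetical relocating collector: moves the object to NewAddr and
// rewrites the slot in place so the mutator sees the new address.
void collectAndRelocate(StackSlot &S, std::uintptr_t NewAddr) { S.Ref = NewAddr; }

// Values observed around one safepoint.
struct Reads {
  std::uintptr_t Cached;   // read before the safepoint and reused afterwards
  std::uintptr_t Reloaded; // re-read from the slot after the safepoint
};

Reads readAcrossSafepoint(StackSlot &S, std::uintptr_t NewAddr) {
  std::uintptr_t Before = S.Ref;  // what an "immutable memory" assumption would reuse
  collectAndRelocate(S, NewAddr); // the safepoint: the collector updates the slot
  return {Before, S.Ref};
}
```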

lib/CodeGen/SelectionDAG/StatepointLowering.cpp
Line 529 (On Diff #175703): I have two alternatives I'd like you to try. Both start with removing the code from the generic getNonRegisterValue, since I'm unsure about the implications of that.

Option 1: Right here, just check whether the Value V is an argument of the right type, and inline the logic from getNonRegisterValue into its only user.

Option 2: Inside lowerIncomingStatepointValue, add a special case that matches the DAG pattern Load(FrameIndex) and uses the optimized lowering. There are examples of things like this in the argument handling in SelectionDAG; the rough structure will look something like:

    if (N.getNode()) // Check if frame index is available.
      if (LoadSDNode *LNode = dyn_cast<LoadSDNode>(N.getNode()))
        if (FrameIndexSDNode *FINode =
                dyn_cast<FrameIndexSDNode>(LNode->getBasePtr().getNode()))
          Op = MachineOperand::CreateFI(FINode->getIndex());
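Option 2's pattern match can be tried outside LLVM with a toy node hierarchy. `Node`, `LoadNode`, `FrameIndexNode` and `matchLoadOfFrameIndex` are hypothetical stand-ins for SDNode, LoadSDNode, FrameIndexSDNode and the snippet in the comment, and standard `dynamic_cast` replaces LLVM's `dyn_cast` (which relies on LLVM-style RTTI rather than C++ RTTI).

```cpp
#include <cassert>
#include <optional>

// Hypothetical, heavily simplified stand-ins for SelectionDAG nodes.
struct Node {
  virtual ~Node() = default; // polymorphic, so dynamic_cast works
};

struct FrameIndexNode : Node {
  int Index;
  explicit FrameIndexNode(int FI) : Index(FI) {}
};

struct LoadNode : Node {
  const Node *BasePtr; // address operand of the load
  explicit LoadNode(const Node *Ptr) : BasePtr(Ptr) {}
};

// Returns the frame index if N has the shape Load(FrameIndex), else nullopt.
std::optional<int> matchLoadOfFrameIndex(const Node *N) {
  if (!N) // mirrors the N.getNode() null check
    return std::nullopt;
  if (const auto *LN = dynamic_cast<const LoadNode *>(N))
    if (const auto *FIN = dynamic_cast<const FrameIndexNode *>(LN->BasePtr))
      return FIN->Index;
  return std::nullopt;
}
```

Any other shape, including a bare frame index or a load of a load, falls through to the normal lowering path.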

I think the argument, as an SSA Value, should be immutable. But here, an argument at a fixed frame index is, at the IR level, a pointer argument with the byval attribute. We don't modify the pointer. A function could modify the contents of the memory the argument points to, just like with a regular pointer argument. The only difference is that we know the pointer points to a fixed stack location.

If the GC were to modify the content of the pointed-to memory, it would do it with or without this change. So I don't think this change affects that.
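The distinction drawn here, an immutable pointer versus mutable pointee, mirrors how a by-value aggregate behaves in C++: the callee owns a private copy it may write to. In this sketch `Triple` and `mutateCopy` are hypothetical; whether a frontend lowers such a parameter to a `byval` pointer depends on the target ABI and is an assumption here, not something the code checks.

```cpp
#include <cassert>

// Hypothetical 24-byte aggregate, the same shape as %struct in the test.
struct Triple {
  long long A, B, C;
};

// The callee receives its own copy. Even if the parameter is passed as a
// pointer to that copy, the callee may freely write to the pointed-to memory.
long long mutateCopy(Triple X) {
  X.A = 42; // writes the callee's copy, not the caller's object
  return X.A + X.B + X.C;
}
```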

I tried Option 1. It works, thanks!

I also have another approach, which handles arguments at fixed stack locations more generally in getValue. See https://reviews.llvm.org/D55072. I had to tweak some tests slightly. Let me know which one you think is better. Thanks!

LGTM in the current form. The alternate framing might also be good, but we can come back and undo this later if that works out.

This revision is now accepted and ready to land.Nov 29 2018, 2:57 PM

Thanks for the review!

Closed by commit rL347998: [CodeGen] Prefer static frame index for STATEPOINT liveness args (authored by thanm). Nov 30 2018, 8:25 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                          Size
llvm/trunk/lib/CodeGen/SelectionDAG/StatepointLowering.cpp    11 lines
llvm/trunk/test/CodeGen/X86/statepoint-stackmap-format.ll     87 lines

Diff 176132

llvm/trunk/lib/CodeGen/SelectionDAG/StatepointLowering.cpp

[516 unchanged lines not shown; context: #endif]
   // lowered. Note that this is the number of Values not the
   // number of SDValues required to lower them.
   const int NumVMSArgs = SI.DeoptState.size();
   pushStackMapConstant(Ops, Builder, NumVMSArgs);
 
   // The vm state arguments are lowered in an opaque manner. We do not know
   // what type of values are contained within.
   for (const Value *V : SI.DeoptState) {
-    SDValue Incoming = Builder.getValue(V);
+    SDValue Incoming;
+    // If this is a function argument at a static frame index, generate it as
+    // the frame index.
+    if (const Argument *Arg = dyn_cast<Argument>(V)) {
+      int FI = Builder.FuncInfo.getArgumentFrameIndex(Arg);
+      if (FI != INT_MAX)
+        Incoming = Builder.DAG.getFrameIndex(FI, Builder.getFrameIndexTy());
+    }
+    if (!Incoming.getNode())
+      Incoming = Builder.getValue(V);
     const bool LiveInValue = LiveInDeopt && !isGCValue(V);
     lowerIncomingStatepointValue(Incoming, LiveInValue, Ops, Builder);
   }
 
   // Finally, go ahead and lower all the gc arguments. There's no prefixed
   // length for this one. After lowering, we'll have the base and pointer
   // arrays interwoven with each (lowered) base pointer immediately followed by
   // it's (lowered) derived pointer. i.e
[488 unchanged lines not shown]

llvm/trunk/test/CodeGen/X86/statepoint-stackmap-format.ll

[90 unchanged lines not shown; context: define i32 @test_spadj(i32 addrspace(1)* %p) gc "statepoint-example" {]
 ; CHECK: addq $16, %rsp
 ; CHECK: movq (%rsp)
   %statepoint_token = call token (i64, i32, void (i64, i64, i64, i64, i64, i64, i64, i64)*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidi64i64i64i64i64i64i64i64f(i64 0, i32 0, void (i64, i64, i64, i64, i64, i64, i64, i64)* @many_arg, i32 8, i32 0, i64 0, i64 0, i64 0, i64 0, i64 0, i64 0, i64 0, i64 0, i32 0, i32 0, i32 addrspace(1)* %p)
   %p.relocated = call i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(token %statepoint_token, i32 15, i32 15) ; (%p, %p)
   %ld = load i32, i32 addrspace(1)* %p.relocated
   ret i32 %ld
 }
 
+; Test that function arguments at fixed stack offset
+; can be directly encoded in the stack map, without
+; spilling.
+%struct = type { i64, i64, i64 }
+
+declare void @use(%struct*)
+
+define void @test_fixed_arg(%struct* byval %x) gc "statepoint-example" {
+; CHECK-LABEL: test_fixed_arg
+; CHECK: pushq %rax
+; CHECK: leaq 16(%rsp), %rdi
+; Should not spill fixed stack address.
+; CHECK-NOT: movq %rdi, (%rsp)
+; CHECK: callq use
+; CHECK: popq %rax
+; CHECK: retq
+entry:
+  br label %bb
+
+bb: ; preds = %entry
+  %statepoint_token = call token (i64, i32, void (%struct*)*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidp0s_structsf(i64 0, i32 0, void (%struct*)* @use, i32 1, i32 0, %struct* %x, i32 0, i32 1, %struct* %x)
+  ret void
+}
+
 declare token @llvm.experimental.gc.statepoint.p0f_i1f(i64, i32, i1 ()*, i32, i32, ...)
 declare token @llvm.experimental.gc.statepoint.p0f_isVoidi64i64i64i64i64i64i64i64f(i64, i32, void (i64, i64, i64, i64, i64, i64, i64, i64)*, i32, i32, ...)
+declare token @llvm.experimental.gc.statepoint.p0f_isVoidp0s_structsf(i64, i32, void (%struct*)*, i32, i32, ...)
 declare i1 @llvm.experimental.gc.result.i1(token)
 declare i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(token, i32, i32) #3
 
 ; CHECK-LABEL: .section .llvm_stackmaps
 ; CHECK-NEXT: __LLVM_StackMaps:
 ; Header
 ; CHECK-NEXT: .byte 3
 ; CHECK-NEXT: .byte 0
 ; CHECK-NEXT: .short 0
 ; Num Functions
-; CHECK-NEXT: .long 4
+; CHECK-NEXT: .long 5
 ; Num LargeConstants
 ; CHECK-NEXT: .long 0
 ; Num Callsites
-; CHECK-NEXT: .long 4
+; CHECK-NEXT: .long 5
 
 ; Functions and stack size
 ; CHECK-NEXT: .quad test
 ; CHECK-NEXT: .quad 40
 ; CHECK-NEXT: .quad 1
 ; CHECK-NEXT: .quad test_derived_arg
 ; CHECK-NEXT: .quad 40
 ; CHECK-NEXT: .quad 1
 ; CHECK-NEXT: .quad test_id
 ; CHECK-NEXT: .quad 8
 ; CHECK-NEXT: .quad 1
 ; CHECK-NEXT: .quad test_spadj
 ; CHECK-NEXT: .quad 8
 ; CHECK-NEXT: .quad 1
+; CHECK-NEXT: .quad test_fixed_arg
+; CHECK-NEXT: .quad 8
+; CHECK-NEXT: .quad 1
 
 ;
 ; test
 ;
 
 ; Statepoint ID
 ; CHECK-NEXT: .quad 0
 
[277 unchanged lines not shown]
 ; CHECK: .short 7
 ; CHECK-NEXT: .short 0
 ; CHECK: .long 16
 
 ; No padding or LiveOuts
 ; CHECK: .short 0
 ; CHECK: .short 0
 ; CHECK: .p2align 3
 
+;
+; test_fixed_arg
+
+; Statepoint ID
+; CHECK-NEXT: .quad 0
+
+; Instruction Offset
+; CHECK-NEXT: .long .Ltmp4-test_fixed_arg
+
+; Reserved:
+; CHECK: .short 0
+
+; NumLocations:
+; CHECK: .short 4
+
+; StkMapRecord[0]:
+; SmallConstant(0):
+; CHECK: .byte 4
+; CHECK-NEXT: .byte 0
+; CHECK: .short 8
+; CHECK: .short 0
+; CHECK-NEXT: .short 0
+; CHECK: .long 0
+
+; StkMapRecord[1]:
+; SmallConstant(0):
+; CHECK: .byte 4
+; CHECK-NEXT: .byte 0
+; CHECK: .short 8
+; CHECK: .short 0
+; CHECK-NEXT: .short 0
+; CHECK: .long 0
+
+; StkMapRecord[2]:
+; SmallConstant(1):
+; CHECK: .byte 4
+; CHECK-NEXT: .byte 0
+; CHECK: .short 8
+; CHECK: .short 0
+; CHECK-NEXT: .short 0
+; CHECK: .long 1
+
+; StkMapRecord[3]:
+; Direct RSP+16
+; CHECK: .byte 2
+; CHECK-NEXT: .byte 0
+; CHECK: .short 8
+; CHECK: .short 7
+; CHECK-NEXT: .short 0
+; CHECK: .long 16
+
+; No padding or LiveOuts
+; CHECK: .short 0
+; CHECK: .short 0
+; CHECK: .p2align 3
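To read the StkMapRecord CHECK lines in the test, note that each location is a 12-byte entry in the StackMap section: a type byte, a reserved byte, a 16-bit size, a 16-bit DWARF register number, 16 reserved bits, and a 32-bit offset or small constant, matching the `.byte`/`.byte`/`.short`/`.short`/`.short`/`.long` sequence of each record. The decoder below is a minimal sketch, assuming a little-endian target; `StackMapLocation`, `readU16`, `readI32` and `decodeLocation` are hypothetical names. Type 2 is Direct and DWARF register 7 is RSP on x86-64, which is how `Direct RSP+16` is encoded.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// One 12-byte Location entry of a stack map record.
struct StackMapLocation {
  std::uint8_t Type;        // 1 Register, 2 Direct, 3 Indirect, 4 Constant, 5 ConstIndex
  std::uint8_t Reserved0;
  std::uint16_t Size;       // location size in bytes
  std::uint16_t DwarfRegNum;
  std::uint16_t Reserved1;
  std::int32_t OffsetOrSmallConstant;
};

// Little-endian field readers (assumes an x86-64-style byte order).
static std::uint16_t readU16(const std::uint8_t *P) {
  return static_cast<std::uint16_t>(P[0] | (P[1] << 8));
}

static std::int32_t readI32(const std::uint8_t *P) {
  std::uint32_t V = std::uint32_t(P[0]) | (std::uint32_t(P[1]) << 8) |
                    (std::uint32_t(P[2]) << 16) | (std::uint32_t(P[3]) << 24);
  std::int32_t R;
  std::memcpy(&R, &V, sizeof R); // reinterpret the bits as signed
  return R;
}

StackMapLocation decodeLocation(const std::uint8_t *P) {
  StackMapLocation L;
  L.Type = P[0];
  L.Reserved0 = P[1];
  L.Size = readU16(P + 2);
  L.DwarfRegNum = readU16(P + 4);
  L.Reserved1 = readU16(P + 6);
  L.OffsetOrSmallConstant = readI32(P + 8);
  return L;
}
```

Feeding it the bytes of StkMapRecord[3] (`2, 0, 8, 0, 7, 0, 0, 0, 16, 0, 0, 0`) gives a Direct location of size 8 based on RSP at offset 16, i.e. the argument's fixed stack address recorded without a spill slot.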