
Details

Reviewers

andreadb
DavidKreitzer
aaboud
tari

Commits

rG0389f62879fc: x86 interrupt calling convention: re-align stack pointer on 64-bit if an error…
rL299383: x86 interrupt calling convention: re-align stack pointer on 64-bit if an error…

Summary

The x86_64 ABI requires the stack to be 16-byte aligned at function calls. The 8-byte error code that the CPU pushes for certain exceptions therefore leaves the stack misaligned. This results in bugs such as Bug 26413, where misaligned movaps instructions are generated.

This commit fixes the misalignment by adjusting the stack pointer in these cases. The adjustment is done at the beginning of the prologue generation by subtracting another 8 bytes from the stack pointer. These additional bytes are popped again in the function epilogue.

Fixes Bug 26413

Diff Detail

Repository: rL LLVM

Event Timeline

phil-opp created this revision. Feb 16 2017, 11:44 AM

phil-opp edited the summary of this revision. Feb 16 2017, 11:54 AM

phil-opp edited the summary of this revision.

This should also fix the 64-bit case of Bug 26477. I'm not quite sure what the problem is in the 32-bit case.

aaboud added a reviewer: DavidKreitzer. Feb 16 2017, 1:14 PM

Please generate the patch using this command:
svn diff --diff-cmd=diff -x -U999999 > file.patch

Aligning the stack is certainly the right thing to do. But this isn't the only problem with 26413. I will add a note to the report explaining what I mean.

As far as I am concerned, this patch looks good, but I would like someone else to comment on whether this is the best mechanism for accomplishing the desired stack alignment.

Thanks,
Dave

lib/Target/X86/X86ISelLowering.cpp
3117 ↗	(On Diff #88761)	Please add a '.' here.

Amjad pointed out to me that the incoming alignment to an interrupt handler is only guaranteed to be 0 mod 8, not 8 mod 16 as is the case with the normal x86-64 ABI. HJ mentions this in 26477.

So this fix is insufficient. We need to dynamically realign the stack as in HJ's 32-bit example.

Also, I added notes to 26413 explaining the issues with attempting to save & restore XMM/YMM/ZMM register state in an interrupt handler.


The x86_64 architecture unconditionally aligns the stack on a 16-byte boundary when an interrupt occurs. From the AMD64 manual (Section 8.9.3):

In legacy mode, the interrupt-stack pointer can be aligned at any address boundary. Long mode, however, aligns the stack on a 16-byte boundary. This alignment is performed by the processor in hardware before pushing items onto the stack frame. The previous RSP is saved unconditionally on the new stack by the interrupt mechanism. A subsequent IRET instruction automatically restores the previous RSP.

It even says:

Aligning the stack on a 16-byte boundary allows optimal performance for saving and restoring the 16-byte XMM registers. The interrupt handler can save and restore the XMM registers using the faster 16-byte aligned loads and stores (MOVAPS), rather than unaligned loads and stores (MOVUPS).

The problem is that the CPU pushes an error code for some exceptions, which destroys the 16-byte alignment. This patch fixes this problem.

Regarding the YMM and ZMM registers: I think they are already saved if the target supports them: https://github.com/llvm-mirror/llvm/blob/47cf6aadec0bc58d970052092ee85a69b3625792/lib/Target/X86/X86RegisterInfo.cpp#L336-L342

The situation on 32-bit x86 is vastly different. The CPU performs no stack alignment at all, so you're correct that we need dynamic realignment.
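To illustrate why a fixed adjustment is not enough when the incoming alignment is unknown, here is a minimal sketch. The helper `realignDown` is hypothetical and not part of LLVM; it only models the masking step. A real handler would also have to keep the original stack pointer (for example in a frame pointer) so it can be restored before returning.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not LLVM code): round a stack pointer down to a
// power-of-two boundary, which is what dynamic realignment boils down to.
constexpr std::uint32_t realignDown(std::uint32_t SP, std::uint32_t Align) {
  return SP & ~(Align - 1);
}

// A fixed adjustment preserves whatever residue the incoming pointer had,
// so it only helps when that residue is known in advance.
static_assert((0x1004u - 8u) % 16 == 12, "still misaligned");
static_assert((0x1008u - 8u) % 16 == 0, "aligned only by luck");

// Masking lands on the boundary for every incoming value.
static_assert(realignDown(0x1004u, 16) % 16 == 0, "aligned");
static_assert(realignDown(0x100Cu, 16) == 0x1000u, "rounded down");
```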

re-created the patch with -U999999

add a . in comment

No, there is no need to realign stack in 64-bit mode.


I'm not sure if I understand you correctly. Yes, there is no need to dynamically realign the stack in 64-bit mode since the CPU aligns the stack on a 16 byte boundary on interrupt entry. However, for some exceptions, the CPU pushes an 8 byte error code afterwards. In that case it is necessary to subtract another 8 bytes from RSP to restore the 16-byte alignment.

What's the state of this? Are there any problems with this PR?

In D30049#682363, @phil-opp wrote:

No, there is no need to realign stack in 64-bit mode.

I'm not sure if I understand you correctly. Yes, there is no need to dynamically realign the stack in 64-bit mode since the CPU aligns the stack on a 16 byte boundary on interrupt entry. However, for some exceptions, the CPU pushes an 8 byte error code afterwards. In that case it is necessary to subtract another 8 bytes from RSP to restore the 16-byte alignment.

Can we verify how stack is aligned when there is an error code?


The documentation that Phil cited is pretty clear in this regard, HJ. In addition to the following text (where I removed some unrelated steps), there is figure 8-13.

In long mode, when a control transfer to an interrupt handler occurs, the processor performs the following:

Aligns the new interrupt-stack frame by masking RSP with FFFF_FFFF_FFFF_FFF0h.

...

Pushes the return stack pointer (old SS:RSP) onto the new stack. The SS value is padded with six bytes to form a quadword.

Pushes the 64-bit RFLAGS register onto the stack. The upper 32 bits of the RFLAGS image on the stack are written as zeros.

...

Pushes the return CS register and RIP register onto the stack. The CS value is padded with six bytes to form a quadword.

If the interrupt vector number has an error code associated with it, pushes the error code onto the stack. The error code is padded with four bytes to form a quadword.
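The arithmetic behind the quoted steps can be sketched as follows. This is only a model of the sequence described above, not LLVM or hardware-vendor code; the function name `rspAfterInterruptEntry` and the sample address are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: the value of RSP when the handler starts executing,
// following the long-mode entry sequence quoted above.
constexpr std::uint64_t rspAfterInterruptEntry(std::uint64_t OldRSP,
                                               bool HasErrorCode) {
  // The processor first masks RSP down to a 16-byte boundary.
  std::uint64_t RSP = OldRSP & 0xFFFFFFFFFFFFFFF0ull;
  // It then pushes five quadwords: SS, old RSP, RFLAGS, CS, RIP.
  RSP -= 5 * 8;
  // For some vectors it additionally pushes a quadword error code.
  if (HasErrorCode)
    RSP -= 8;
  return RSP;
}

// Without an error code the handler starts at 8 mod 16, the same state a
// normal function sees right after `call` pushed the return address.
static_assert(rspAfterInterruptEntry(0x7FFF1234, false) % 16 == 8, "");
// With an error code the handler starts at 0 mod 16 instead.
static_assert(rspAfterInterruptEntry(0x7FFF1234, true) % 16 == 0, "");
// Subtracting 8 more bytes, as the patch does, restores 8 mod 16.
static_assert((rspAfterInterruptEntry(0x7FFF1234, true) - 8) % 16 == 8, "");
```

Because the hardware alignment step makes the starting point known, a constant 8-byte adjustment is sufficient here and no dynamic realignment is needed.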

So this patch LGTM. But I think you should get someone else to confirm the correctness of lines 962-970. Amjad, can you do that?

Thanks for your patience.

Now that we agreed on why this fix is correct, the code LGTM.

This revision is now accepted and ready to land. Mar 14 2017, 6:50 AM

I just tested this version again and found a bug in the implementation: I forgot to adjust the offset of the error code and the pointer to the exception stack frame. So please do not merge this yet.

The argument offsets (exception stack frame and error code) are now updated if the stack is aligned. I also updated the tests accordingly.
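As a rough illustration of how the extra 8 bytes shift the argument locations, the sketch below models the RSP-relative layout seen in the test_isr_ecode expectations, where two registers (rax and rcx) are spilled. The helper names are hypothetical; the only inputs taken from the review are the offsets 16/40 before the change and 24/48 after it.

```cpp
#include <cassert>

// Hypothetical model: RSP-relative offset of the error code after the
// prologue has spilled `SavedRegs` 8-byte registers. With realignment,
// one 8-byte padding slot sits between the spills and the error code.
constexpr int errorCodeOffset(int SavedRegs, bool Realigned) {
  return SavedRegs * 8 + (Realigned ? 8 : 0);
}

// The interrupt frame starts right above the error code; each of its
// fields is one quadword (index 2 is the flags field used by the test).
constexpr int frameFieldOffset(int SavedRegs, bool Realigned, int Field) {
  return errorCodeOffset(SavedRegs, Realigned) + 8 + Field * 8;
}

// Before the patch: movq 16(%rsp), %rax and movq 40(%rsp), %rcx.
static_assert(errorCodeOffset(2, false) == 16, "");
static_assert(frameFieldOffset(2, false, 2) == 40, "");
// After the patch: movq 24(%rsp), %rax and movq 48(%rsp), %rcx.
static_assert(errorCodeOffset(2, true) == 24, "");
static_assert(frameFieldOffset(2, true, 2) == 48, "");
```

The same model gives 24 and 32 for the start of the frame, which matches the `leaq` offsets in the CHECK0 lines.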

(updated the test_isr_clobbers test too, even though the CHECK-SSE-NEXT directives are broken at the moment)

Code looks good, but I am wondering about one thing, see below.

test/CodeGen/X86/x86-64-intrcc.ll
33 ↗	(On Diff #91781)	Was this push not generated before your last fix? Did you just forget to update the test, or did we need the last change to get this push?
35 ↗	(On Diff #91781)	This is a good catch. Indeed we should update the argument offsets after we re-align the stack with +8 bytes.

phil-opp added inline comments. Mar 15 2017, 8:24 AM

test/CodeGen/X86/x86-64-intrcc.ll
33 ↗	(On Diff #91781)	I just forgot to update this test, because it still succeeded (it's only a `CHECK` and not a `CHECK-NEXT`). However, by adding the additional check we ensure that the stack alignment isn't removed accidentally in the future.

Good.
I think this patch is ready now to land.
Does anybody have any final comments?

Not from me. Looks good.

It seems like this patch is ready to land. I don't have SVN access, so it would be great if someone could commit it for me.

Friendly ping :)

I will commit that.
Sorry for the late response.

Closed by commit rL299383: x86 interrupt calling convention: re-align stack pointer on 64-bit if an error… (authored by aaboud). Apr 3 2017, 1:41 PM

This revision was automatically updated to reflect the committed changes.

Diff 93936

llvm/trunk/lib/Target/X86/X86FrameLowering.cpp

@@ ... @@ void X86FrameLowering::emitPrologue(MachineFunction &MF,
   // The default stack probe size is 4096 if the function has no stackprobesize
   // attribute.
   unsigned StackProbeSize = 4096;
   if (Fn->hasFnAttribute("stack-probe-size"))
     Fn->getFnAttribute("stack-probe-size")
         .getValueAsString()
         .getAsInteger(0, StackProbeSize);

+  // Re-align the stack on 64-bit if the x86-interrupt calling convention is
+  // used and an error code was pushed, since the x86-64 ABI requires a 16-byte
+  // stack alignment.
+  if (Fn->getCallingConv() == CallingConv::X86_INTR && Is64Bit &&
+      Fn->arg_size() == 2) {
+    StackSize += 8;
+    MFI.setStackSize(StackSize);
+    emitSPUpdate(MBB, MBBI, -8, /*InEpilogue=*/false);
+  }
+
   // If this is x86-64 and the Red Zone is not disabled, if we are a leaf
   // function, and use up to 128 bytes of stack space, don't have a frame
   // pointer, calls, or dynamic alloca then we do not need to adjust the
   // stack pointer (we fit in the Red Zone). We also check that we don't
   // push and pop from the stack.
   if (Is64Bit && !Fn->hasFnAttribute(Attribute::NoRedZone) &&
       !TRI->needsStackRealignment(MF) &&
       !MFI.hasVarSizedObjects() && // No dynamic alloca.

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp


@@ ... @@ X86TargetLowering::LowerMemArgument(SDValue Chain, CallingConv::ID CallConv,
   // taken by a return address.
   int Offset = 0;
   if (CallConv == CallingConv::X86_INTR) {
     // X86 interrupts may take one or two arguments.
     // On the stack there will be no return address as in regular call.
     // Offset of last argument need to be set to -4/-8 bytes.
     // Where offset of the first argument out of two, should be set to 0 bytes.
     Offset = (Subtarget.is64Bit() ? 8 : 4) * ((i + 1) % Ins.size() - 1);
+    if (Subtarget.is64Bit() && Ins.size() == 2) {
+      // The stack pointer needs to be realigned for 64 bit handlers with error
+      // code, so the argument offset changes by 8 bytes.
+      Offset += 8;
+    }
   }

   // FIXME: For now, all byval parameter objects are marked mutable. This can be
   // changed with more analysis.
   // In case of tail call optimization mark all arguments mutable. Since they
   // could be overwritten by lowering of arguments in case of a tail call.
   if (Flags.isByVal()) {
     unsigned Bytes = Flags.getByValSize();
@@ ... @@ if (isVarArg && MFI.hasMustTailInVarArgFunc()) {
     }
   }

   // Some CCs need callee pop.
   if (X86::isCalleePop(CallConv, Is64Bit, isVarArg,
                        MF.getTarget().Options.GuaranteedTailCallOpt)) {
     FuncInfo->setBytesToPopOnReturn(StackSize); // Callee pops everything.
   } else if (CallConv == CallingConv::X86_INTR && Ins.size() == 2) {
-    // X86 interrupts must pop the error code if present
-    FuncInfo->setBytesToPopOnReturn(Is64Bit ? 8 : 4);
+    // X86 interrupts must pop the error code (and the alignment padding) if
+    // present.
+    FuncInfo->setBytesToPopOnReturn(Is64Bit ? 16 : 4);
   } else {
     FuncInfo->setBytesToPopOnReturn(0); // Callee pops nothing.
     // If this is an sret function, the return should pop the hidden pointer.
     if (!Is64Bit && !canGuaranteeTCO(CallConv) &&
         !Subtarget.getTargetTriple().isOSMSVCRT() &&
         argsAreStructReturn(Ins, Subtarget.isTargetMCU()) == StackStructReturn)
       FuncInfo->setBytesToPopOnReturn(4);
   }
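The offset expression in LowerMemArgument is compact, so a worked example may help. The sketch below re-implements only the arithmetic shown in the diff as a standalone function; `interruptArgOffset` and its parameters are hypothetical names, and plain `int` is used instead of the types in the LLVM source.

```cpp
#include <cassert>

// Hypothetical standalone version of the offset arithmetic from the diff.
// ArgIndex corresponds to `i`, NumArgs to `Ins.size()`.
constexpr int interruptArgOffset(int ArgIndex, int NumArgs, bool Is64Bit) {
  // The last argument gets -4/-8; the first of two arguments gets 0.
  int Offset = (Is64Bit ? 8 : 4) * ((ArgIndex + 1) % NumArgs - 1);
  // 64-bit handlers with an error code are realigned by 8 bytes, so both
  // argument offsets move up by 8.
  if (Is64Bit && NumArgs == 2)
    Offset += 8;
  return Offset;
}

// 64-bit, frame pointer plus error code: offsets become 8 and 0.
static_assert(interruptArgOffset(0, 2, true) == 8, "");
static_assert(interruptArgOffset(1, 2, true) == 0, "");
// 64-bit, frame pointer only: unchanged at -8.
static_assert(interruptArgOffset(0, 1, true) == -8, "");
// 32-bit, two arguments: unchanged at 0 and -4.
static_assert(interruptArgOffset(0, 2, false) == 0, "");
static_assert(interruptArgOffset(1, 2, false) == -4, "");
```

The matching change to setBytesToPopOnReturn (16 instead of 8 on 64-bit) accounts for the same padding slot on the way out.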

llvm/trunk/test/CodeGen/X86/x86-64-intrcc-nosse.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=-sse < %s | FileCheck %s

 %struct.interrupt_frame = type { i64, i64, i64, i64, i64 }

 @llvm.used = appending global [1 x i8*] [i8* bitcast (void (%struct.interrupt_frame*, i64)* @test_isr_sse_clobbers to i8*)], section "llvm.metadata"

 ; Clobbered SSE must not be saved when the target doesn't support SSE
 define x86_intrcc void @test_isr_sse_clobbers(%struct.interrupt_frame* %frame, i64 %ecode) {
 ; CHECK-LABEL: test_isr_sse_clobbers:
 ; CHECK: # BB#0:
+; CHECK-NEXT: pushq %rax
 ; CHECK-NEXT: cld
 ; CHECK-NEXT: #APP
 ; CHECK-NEXT: #NO_APP
-; CHECK-NEXT: addq $8, %rsp
+; CHECK-NEXT: addq $16, %rsp
 ; CHECK-NEXT: iretq
   call void asm sideeffect "", "~{xmm0},~{xmm6}"()
   ret void
 }

llvm/trunk/test/CodeGen/X86/x86-64-intrcc.ll

@@ ... @@ define x86_intrcc void @test_isr_no_ecode(%struct.interrupt_frame* %frame) {
   ret void
 }

 ; Spills rax and rcx, putting original rsp at +16. Stack is adjusted up another 8 bytes
 ; before return, popping the error code.
 define x86_intrcc void @test_isr_ecode(%struct.interrupt_frame* %frame, i64 %ecode) {
   ; CHECK-LABEL: test_isr_ecode
   ; CHECK: pushq %rax
+  ; CHECK: pushq %rax
   ; CHECK: pushq %rcx
-  ; CHECK: movq 16(%rsp), %rax
-  ; CHECK: movq 40(%rsp), %rcx
+  ; CHECK: movq 24(%rsp), %rax
+  ; CHECK: movq 48(%rsp), %rcx
   ; CHECK: popq %rcx
   ; CHECK: popq %rax
-  ; CHECK: addq $8, %rsp
+  ; CHECK: addq $16, %rsp
   ; CHECK: iretq
   ; CHECK0-LABEL: test_isr_ecode
   ; CHECK0: pushq %rax
+  ; CHECK0: pushq %rax
   ; CHECK0: pushq %rcx
-  ; CHECK0: movq 16(%rsp), %rax
-  ; CHECK0: leaq 24(%rsp), %rcx
+  ; CHECK0: movq 24(%rsp), %rax
+  ; CHECK0: leaq 32(%rsp), %rcx
   ; CHECK0: movq 16(%rcx), %rcx
   ; CHECK0: popq %rcx
   ; CHECK0: popq %rax
-  ; CHECK0: addq $8, %rsp
+  ; CHECK0: addq $16, %rsp
   ; CHECK0: iretq
   %pflags = getelementptr inbounds %struct.interrupt_frame, %struct.interrupt_frame* %frame, i32 0, i32 2
   %flags = load i64, i64* %pflags, align 4
   call void asm sideeffect "", "r,r"(i64 %flags, i64 %ecode)
   ret void
 }

 ; All clobbered registers must be saved
 define x86_intrcc void @test_isr_clobbers(%struct.interrupt_frame* %frame, i64 %ecode) {
   call void asm sideeffect "", "~{rax},~{rbx},~{rbp},~{r11},~{xmm0}"()
   ; CHECK-LABEL: test_isr_clobbers
   ; CHECK-SSE-NEXT: pushq %rax
+  ; CHECK-SSE-NEXT: pushq %rax
   ; CHECK-SSE-NEXT; pushq %r11
   ; CHECK-SSE-NEXT: pushq %rbp
   ; CHECK-SSE-NEXT: pushq %rbx
   ; CHECK-SSE-NEXT: movaps %xmm0
   ; CHECK-SSE-NEXT: movaps %xmm0
   ; CHECK-SSE-NEXT: popq %rbx
   ; CHECK-SSE-NEXT: popq %rbp
   ; CHECK-SSE-NEXT: popq %r11
   ; CHECK-SSE-NEXT: popq %rax
   ; CHECK-SSE-NEXT: addq $8, %rsp
   ; CHECK-SSE-NEXT: iretq
   ; CHECK0-LABEL: test_isr_clobbers
   ; CHECK0-SSE-NEXT: pushq %rax
   ; CHECK0-SSE-NEXT; pushq %r11
   ; CHECK0-SSE-NEXT: pushq %rbp
   ; CHECK0-SSE-NEXT: pushq %rbx
   ; CHECK0-SSE-NEXT: movaps %xmm0
   ; CHECK0-SSE-NEXT: movaps %xmm0
   ; CHECK0-SSE-NEXT: popq %rbx
   ; CHECK0-SSE-NEXT: popq %rbp
   ; CHECK0-SSE-NEXT: popq %r11
   ; CHECK0-SSE-NEXT: popq %rax
-  ; CHECK0-SSE-NEXT: addq $8, %rsp
+  ; CHECK0-SSE-NEXT: addq $16, %rsp
   ; CHECK0-SSE-NEXT: iretq
   ret void
 }

 @f80 = common global x86_fp80 0xK00000000000000000000, align 4

 ; Test that the presence of x87 does not crash the FP stackifier
 define x86_intrcc void @test_isr_x87(%struct.interrupt_frame* %frame) {

This is an archive of the discontinued LLVM Phabricator instance.

x86 interrupt calling convention: re-align stack pointer on 64-bit if an error code was pushed
ClosedPublic
