This is an archive of the discontinued LLVM Phabricator instance.

x86 interrupt calling convention: re-align stack pointer on 64-bit if an error code was pushed
ClosedPublic

Authored by phil-opp on Feb 16 2017, 11:44 AM.

Details

Summary

The x86_64 ABI requires that the stack is 16 byte aligned on function calls. Thus, the 8-byte error code, which is pushed by the CPU for certain exceptions, leads to a misaligned stack. This results in bugs such as Bug 26413, where misaligned movaps instructions are generated.

This commit fixes the misalignment by adjusting the stack pointer in these cases. The adjustment is done at the beginning of the prologue generation by subtracting another 8 bytes from the stack pointer. These additional bytes are popped again in the function epilogue.

Fixes Bug 26413

Diff Detail

Repository
rL LLVM

Event Timeline

phil-opp created this revision.Feb 16 2017, 11:44 AM
phil-opp edited the summary of this revision. (Show Details)Feb 16 2017, 11:54 AM
phil-opp edited the summary of this revision. (Show Details)

This should also fix the 64-bit case of Bug 26477. I'm not quite sure what's the problem in the 32-bit case.

aaboud edited edge metadata.Feb 17 2017, 4:53 AM

Please, generate the patch using this command:
svn diff --diff-cmd=diff -x -U999999 > file.patch

DavidKreitzer edited edge metadata.Feb 17 2017, 8:21 AM

Aligning the stack is certainly the right thing to do. But this isn't the only problem with 26413. I will add a note to the report explaining what I mean.

As far as I am concerned, this patch looks good, but I would like someone else to comment on whether this is best mechanism for accomplishing the desired stack alignment.

Thanks,
Dave

lib/Target/X86/X86ISelLowering.cpp
3117 ↗(On Diff #88761)

Please add a '.' here.

Amjad pointed out to me that the incoming alignment to an interrupt handler is only guaranteed to be 0 mod 8, not 8 mod 16 as is the case with the normal x86-64 ABI. HJ mentions this in 26477.

So this fix is insufficient. We need to dynamically realign the stack as in HJ's 32-bit example.

Also, I added notes to 26413 explaining the issues with attempting to save & restore XMM/YMM/ZMM register state in an interrupt handler.

Amjad pointed out to me that the incoming alignment to an interrupt handler is only guaranteed to be 0 mod 8, not 8 mod 16 as is the case with the normal x86-64 ABI. HJ mentions this in 26477.

The x86_64 architecture unconditionally aligns the stack on a 16-byte boundary when an interrupt occurs. From the AMD64 manual (Section 8.9.3):

In legacy mode, the interrupt-stack pointer can be aligned at any address boundary. Long mode, however, aligns the stack on a 16-byte boundary. This alignment is performed by the processor in hardware before pushing items onto the stack frame. The revious RSP is saved unconditionally on the new stack by the interrupt mechanism. A subsequent IRET instruction automatically restores the previous RSP.

It even says:

Aligning the stack on a 16-byte boundary allows optimal performance for saving and restoring the 16-byte XMM registers. The interrupt handler can save and restore the XMM registers using the faster 16-byte aligned loads and stores (MOVAPS), rather than unaligned loads and stores (MOVUPS).

The problem is that the CPU pushes an error code for some exceptions, which destroys the 16-byte alignment. This patches fixes this problem.

Regarding the YMM and ZMM registers: I think they are already saved if the target supports them: https://github.com/llvm-mirror/llvm/blob/47cf6aadec0bc58d970052092ee85a69b3625792/lib/Target/X86/X86RegisterInfo.cpp#L336-L342

The situation on 32-bit x86 is vastly different. The CPU performs no stack alignment at all, so you're correct that we need dynamic realignment.

phil-opp updated this revision to Diff 89023.Feb 18 2017, 6:47 AM

re-created the patch with -U999999

phil-opp updated this revision to Diff 89024.Feb 18 2017, 6:51 AM

add a . in comment

No, there is no need to realign stack in 64-bit mode.

phil-opp marked an inline comment as done.Feb 21 2017, 9:04 AM

No, there is no need to realign stack in 64-bit mode.

I'm not sure if I understand you correctly. Yes, there is no need to dynamically realign the stack in 64-bit mode since the CPU aligns the stack on a 16 byte boundary on interrupt entry. However, for some exceptions, the CPU pushes an 8 byte error code afterwards. In that case it is necessary to subtract another 8 bytes from RSP to restore the 16-byte alignment.

What's the state of this? Are there any problems with this PR?

No, there is no need to realign stack in 64-bit mode.

I'm not sure if I understand you correctly. Yes, there is no need to dynamically realign the stack in 64-bit mode since the CPU aligns the stack on a 16 byte boundary on interrupt entry. However, for some exceptions, the CPU pushes an 8 byte error code afterwards. In that case it is necessary to subtract another 8 bytes from RSP to restore the 16-byte alignment.

Can we verify how stack is aligned when there is an error code?

Can we verify how stack is aligned when there is an error code?

The documentation that Phil cited is pretty clear in this regard, HJ. In addition to the following text (where I removed some unrelated steps), there is figure 8-13.

In long mode, when a control transfer to an interrupt handler occurs, the processor performs the
following:

  1. Aligns the new interrupt-stack frame by masking RSP with FFFF_FFFF_FFFF_FFF0h.

...

  1. Pushes the return stack pointer (old SS:RSP) onto the new stack. The SS value is padded with six

bytes to form a quadword.

  1. Pushes the 64-bit RFLAGS register onto the stack. The upper 32 bits of the RFLAGS image on

the stack are written as zeros.
...

  1. Pushes the return CS register and RIP register onto the stack. The CS value is padded with six

bytes to form a quadword.

  1. If the interrupt vector number has an error code associated with it, pushes the error code onto the

stack. The error code is padded with four bytes to form a quadword.

So this patch LGTM. But I think you should get someone else to confirm the correctness of lines 962-970. Amjad, can you do that?

Thanks for your patience.

aaboud accepted this revision.Mar 14 2017, 6:50 AM

Now that we agreed on why this fix is correct, the code LGTM.

This revision is now accepted and ready to land.Mar 14 2017, 6:50 AM

I just tested this version again and found a bug in the implementation: I forgot to adjust the offset of the error code and the pointer to the exception stack frame. So please do not merge this yet.

phil-opp updated this revision to Diff 91775.Mar 14 2017, 3:08 PM

The argument offsets (exception stack frame and error code) are now updated if the stack is aligned. I also updated the tests accordingly.

phil-opp updated this revision to Diff 91781.Mar 14 2017, 3:13 PM

(updated the test_isr_clobbers test too, even though the CHECK-SSE-NEXT directives are broken at the moment)

Code looks good, but I am wondering about one thing, see below.

test/CodeGen/X86/x86-64-intrcc.ll
33 ↗(On Diff #91781)

I am wondering if this push was not generated before your last fix?
Did you just forget to update the test? or we needed the last change to have this push?

35 ↗(On Diff #91781)

This is a good catch.
Indeed we should update the argument offsets after we re-align the stack with +8 bytes.

phil-opp added inline comments.Mar 15 2017, 8:24 AM
test/CodeGen/X86/x86-64-intrcc.ll
33 ↗(On Diff #91781)

I just forgot to update this test, because it still succeeded (it's only a CHECK and not a CHECK-NEXT). However, by adding the additional check we ensure that the stack alignment isn't removed accidentally in the future.

Good.
I think this patch is ready now to land.
Anybody has any final comment?

Not from me. Looks good.

It seems like this patch is ready to land. I don't have SVN access, so it would be great if someone could commit it for me.

Friendly ping :)

aaboud added a comment.Apr 3 2017, 8:58 AM

I will commit that.
Sorry for the late response.

This revision was automatically updated to reflect the committed changes.