This is an archive of the discontinued LLVM Phabricator instance.

[X86] Emit fewer instructions to allocate >16GB stack frames
ClosedPublic

Authored by rnk on Feb 16 2017, 1:11 PM.

Details

Summary

Use this code pattern when RAX is live, instead of emitting up to 2
billion adjustments:

pushq %rax
movabsq $Offset, %rax
addq/subq %rsp, %rax
xchg %rax, (%rsp)
movq (%rsp), %rsp

Try to clean this code up a bit while I'm here. In particular, hoist the
logic that handles the entire adjustment with movabsq $imm, %rax out
of the loop.

Fixes PR31962

Event Timeline

rnk created this revision.Feb 16 2017, 1:11 PM
rnk updated this revision to Diff 88773.Feb 16 2017, 1:14 PM
  • Fix buglet
sdardis accepted this revision.Feb 17 2017, 4:26 AM

LGTM with the first inlined comment addressed. When I tested this, the new stack pointer in the epilogue case was pointing at the value above the spilled base pointer. So %rbp got filled with junk when popped, at then %rip got the value of the base pointer.

lib/Target/X86/X86FrameLowering.cpp
327–345

Given the assert(Is64Bit..) above this hunk, can't PushOpc be replaced with PUSH64r directly? Likewise for LoadOpc, XchgOpc.

334

In the epilogue case, this needs to be Offset + SlotSize, as we need to account for the spill we just performed.

This revision is now accepted and ready to land.Feb 17 2017, 4:26 AM
rnk updated this revision to Diff 88909.Feb 17 2017, 9:36 AM
  • Account for 8 byte push SP adjustment
  • Avoid two operand sub, it does the wrong thing here
rnk added inline comments.Feb 17 2017, 9:38 AM
lib/Target/X86/X86FrameLowering.cpp
327–345

I kind of waffled on that assertion. I added it at the end after deciding on the 16GB stack frame threshold. I guess I'll do the simplification.

Also, it's easy to hit the assertion by running the new test targeting i686. Nothing else in our pass pipeline validates that stack objects are small enough to fit in the address space, which is pretty silly. This doesn't feel like the right place to report that, though.

334

Thanks, nice catch!

I went ahead and actually did a manual execution test for this by tweaking the conditions to come here in the common case, and this executes correctly now. I also found a bug where x86's two operand SUB instruction was doing the wrong thing. I want to subtract RAX from RSP and store the result in RAX, which is not possible. Instead, I've negated the offset and used ADD.

This revision was automatically updated to reflect the committed changes.