This is an archive of the discontinued LLVM Phabricator instance.

[PATCH] Fix for http://llvm.org/bugs/show_bug.cgi?id=19905
AbandonedPublic

Authored by sanjoy on Jun 6 2014, 4:36 PM.

Details

Reviewers
asl
Summary

This patch fixes issue 19905. This solution isn't as clean as I'd
like it to be, but I haven't been able to come up with something that
would be both cleaner and not super invasive.

Another possible fix for this that seems cleaner to me is to have the
x86 spillCalleeSavedRegisters routine emit MOVs and not PUSHes (except
maybe for %rbp, to spare the debugger), with MO_FrameIndex operands.
These MOV instructions could then be peephole-optimized into sequences
of PUSHes and POPs if possible, after the frame indices have been
mapped to actual stack slot offsets. I'm not sure how feasible this
is, though.

Diff Detail

Event Timeline

sanjoy updated this revision to Diff 10198.Jun 6 2014, 4:36 PM
sanjoy retitled this revision from to [PATCH] Fix for http://llvm.org/bugs/show_bug.cgi?id=19905.
sanjoy updated this object.
sanjoy edited the test plan for this revision. (Show Details)
sanjoy added reviewers: vadimcn, asl.
sanjoy added a subscriber: Unknown Object (MLST).
vadimcn edited edge metadata.EditedJun 6 2014, 7:06 PM

Hi Sanjay,
Please take a look at my preliminary patch to support Win64 SEH I've put here: http://reviews.llvm.org/D4053 (specifically stuff in X86FrameLowering.cpp). Not directly related to 19905, however, it touches the same code, and could benefit from the same refactoring.

Firstly, I think that spill slots will have to go into fixed frame slots, because in order to describe them in terms of Win64 SEH directives, the spilled XMM registers need to be addressable from the frame pointer. Right now they end up on the wrong side of the "stack realignment gap" in cases when stack is realigned.

Also, as I was working on this code, a couple of questions popped in my mind:

  1. Why does PrologueEpilogueInserter pre-assign spill slots to registers before calling spillCalleeSavedRegisters? Wouldn't it be more logical to let spillCalleeSavedRegisters do that (only if it is going to return true, of course).
  2. Wouldn't it make sense to move the actual emission of PUSHes and MOVs into emitPrologue? I think this would make CFI/SEH info generation much more straightforward. spillCalleeSavedRegisters would still have to assign the spill slots, of course. The only downside I see, is that this code wouldn't be able to use TII.storeRegToStackSlot(), because by that point frame slots have already been reified.

Looking at the llc output you provided in the bug, I notice that it spilled and emitted CFI info for XMM registers. What target was that compiled for? AFAIK, only Windows 64 ABI specifies XMMs as callee-saved. So if it was for Win64, my patch would directly address your problem.

asl added inline comments.Jun 7 2014, 9:47 PM
test/CodeGen/X86/pr19905.ll
26

You redefined CFA register above to be %rbp. So, not, this is not stack pointer just before the call anymore.

-120 is because the rule here is"take the address from %rbp and add 16 to it". %rbx is 13th register to be pushed, so we have to offset by 13 * -8 - 16 ;)

In D4052#5, @vadimcn wrote:

Looking at the llc output you provided in the bug, I notice that it spilled and emitted CFI info for XMM registers. What target was that compiled for? AFAIK, only Windows 64 ABI specifies XMMs as callee-saved. So if it was for Win64, my patch would directly address your problem.

Hi Vadim,

I haven't looked at your change yet, but this change has nothing to do
with Win64 specifically. The test case simply forces llvm (using
llvm.eh.unwind.init) to spill out all the registers as CSRs, and the
.cfi_offset values emitted demonstrate the bug I'm trying to address.

In D4052#4, @vadimcn wrote:

Hi Sanjay,
Please take a look at my preliminary patch to support Win64 SEH I've put here: http://reviews.llvm.org/D4053 (specifically stuff in X86FrameLowering.cpp). Not directly related to 19905, however, it touches the same code, and could benefit from the same refactoring.

Yes, I think you could do away with MaxSpillSlotOffset once this
change (or something like this) is in, and just use getObjectOffset.
This is assuming I correctly understand what MaxSpillSlotOffset is
used for. :)

Firstly, I think that spill slots will have to go into fixed frame slots, because in order to describe them in terms of Win64 SEH directives, the spilled XMM registers need to be addressable from the frame pointer. Right now they end up on the wrong side of the "stack realignment gap" in cases when stack is realigned.

I don't know anything about the Win64 ABI, so I won't comment on this.
However, I think the above issue is orthogonal to this change.

Also, as I was working on this code, a couple of questions popped in my mind:

  1. Why does PrologueEpilogueInserter pre-assign spill slots to registers before calling spillCalleeSavedRegisters? Wouldn't it be more logical to let spillCalleeSavedRegisters do that (only if it is going to return true, of course).

I think PEI pre-assigns frame indices, not actual spill slots. Also
spillCalleeSavedRegisters is a TargetFrameLowering function, and the
logic to assign frame indices is target agnostic, so maybe that's the
reason for putting it in PEI. A point to note is that this issue
doesn't have any direct relation to eager / early assignment of frame
indices.

  1. Wouldn't it make sense to move the actual emission of PUSHes and MOVs into emitPrologue? I think this would make CFI/SEH info generation much more straightforward. spillCalleeSavedRegisters would still have to assign the spill slots, of course. The only downside I see, is that this code wouldn't be able to use TII.storeRegToStackSlot(), because by that point frame slots have already been reified.

At this time I don't have enough insight to comment on the pros and
cons of such a change; especially whether such a change simplifies x86
code at the cost of making other backends more complex.

@sanjoy, I wonder what happens with your patch in case of stack re-alignment.
Sorry, I can't figure out how to get git to apply your diff. Could you please try it on the following IR:

declare void @llvm.eh.unwind.init()

define coldcc void @calls_unwind_init() {
  %s = alloca i32, align 64
  call void @llvm.eh.unwind.init()
  ret void
}

If CFI info for XMM registers still correct?

sanjoy added inline comments.Jun 8 2014, 1:59 AM
test/CodeGen/X86/pr19905.ll
26

Makes sense, I'll fix the comment in the next iteration of the patch. :)

sanjoy added a comment.Jun 9 2014, 5:40 PM
In D4052#9, @vadimcn wrote:

@sanjoy, I wonder what happens with your patch in case of stack re-alignment.
Sorry, I can't figure out how to get git to apply your diff. Could you please try it on the following IR:

declare void @llvm.eh.unwind.init()

define coldcc void @calls_unwind_init() {
  %s = alloca i32, align 64
  call void @llvm.eh.unwind.init()
  ret void
}

If CFI info for XMM registers still correct?

Hi Vadim,

With my changes the CFI info for the above snippet isn't correct, for the reason you mentioned earlier -- the xmm registers are spilled *after* the stack realignment, and I don't see how you could even correctly access them using %rbp.

I've updated my patch. It should now solve your problem too.

vadimcn resigned from this revision.Dec 4 2014, 5:58 PM
vadimcn removed a reviewer: vadimcn.
sanjoy abandoned this revision.Dec 11 2014, 8:31 PM

The issue has already been fixed on llvm upstream.