This is an archive of the discontinued LLVM Phabricator instance.

[WinEH] Allocate unique stack slots for xmm CSRs in funclets
AbandonedPublic

Authored by andrew.w.kaylor on Feb 8 2019, 12:23 PM.

Details

Reviewers
rnk
Summary

Currently funclets reuse the same stack slots that are used in the parent function for saving callee-saved xmm registers. If the parent function modifies a callee-saved xmm register before an excpetion is thrown, the catch handler will overwrite the original saved value.

This patch allocates a second stack slot to be used in the EH funclets for saving these same registers. Long term, it would be better to determine actual CSR use by the funclets and only allocate the extra space when needed.

Diff Detail

Repository
rL LLVM

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptFeb 8 2019, 12:23 PM
rnk added inline comments.Feb 8 2019, 12:39 PM
test/CodeGen/X86/catchpad-realign-savexmm.ll
65–66

I don't think this will work long in practice because we are storing XMM into the parent function's stack frame, but the .seh_savexmm directive describes locations relative the the funclet's RSP. So, when the stack is unwound, (throw an exception out of a catch block) XMM CSRs will not be restored correctly.

Another idea I had for fixing this was to change X86FrameLowering::getFrameIndexReference to do something special for XMM CSRs (we should be able to find the list of them somewhere), and resolve them to some SP-relative offset in the funclet's frame. We'll have to adjust the "SUB RSP, 32" that we currently emit for every funclet as well for that to work.

andrew.w.kaylor marked an inline comment as done.Feb 8 2019, 1:38 PM
andrew.w.kaylor added inline comments.
test/CodeGen/X86/catchpad-realign-savexmm.ll
65–66

I see what you mean, and that makes the hardcoded offset suggestion make more sense.

That said, I think this would probably "work" (i.e. not fail) for a lot of cases. If the funclet throws an exception the unwinder will put garbage in the CSRs, but if the exception isn't caught by the parent function the correct value will be filled in when the parent's stack unwinds. If my thinking on this is correct, we'd really only keep incorrect values when a funclet throws an exception and the parent catches it. I'm not suggesting that this is in any way good, but it's slightly less bad than what we're doing now.

I'll see if I can figure out how to do it the way you are suggesting. In the meantime do you think what I have here is possibly useful as a workaround for desperate people (of which I am currently one)? And if so, do you think it's worth committing if I'm not going to have a proper fix soon?

I haven't had a lot more time to spend on this, but I did do a bit of investigation. I still only have a vague idea of what would be required to implement hard-coded offsets, but it didn't look to me like there is existing infrastructure for this so all of the things that just kind of magically worked with this patch (like growing the stack allocation and writing the .seh_savexmm directive) would need to be explicitly updated, right?

On the other hand, it looked like it might be relatively easy to update the code that writes the savexmm directive to describe the actual location of the stack slot I'm using in the current version of this patch. If I was reading the code correctly, it looks like it's already using the offset between the parent stack pointer and the funclet stack pointer to figure out how to build the instruction that actually stores the register to the stack. So we should be able to do a similar calculation to fill out the savexmm directive correctly. Does that seem worth exploring?

rnk added a comment.Feb 15 2019, 4:02 PM

I haven't had a lot more time to spend on this, but I did do a bit of investigation. I still only have a vague idea of what would be required to implement hard-coded offsets, but it didn't look to me like there is existing infrastructure for this so all of the things that just kind of magically worked with this patch (like growing the stack allocation and writing the .seh_savexmm directive) would need to be explicitly updated, right?

On the other hand, it looked like it might be relatively easy to update the code that writes the savexmm directive to describe the actual location of the stack slot I'm using in the current version of this patch. If I was reading the code correctly, it looks like it's already using the offset between the parent stack pointer and the funclet stack pointer to figure out how to build the instruction that actually stores the register to the stack. So we should be able to do a similar calculation to fill out the savexmm directive correctly. Does that seem worth exploring?

I don't think it's possible, because the XMM registers are stored relative to RBP, which points into the stack frame of the parent function. I believe the savexmm directive offsets are always relative to the RSP at the end of the prologue, even if a frame pointer is in use, and that points into the frame of the funclet, which is usually empty (we always use a frame size of 32 or something).

Anyway, I think you are right that this intermediate state would fix the issue for a lot of folks. Typically you don't need to restore XMM registers to unwind the stack (it's OK if their contents are wrong), so this is only a problem if someone recovers from an exception thrown out of a catch block.

I haven't had time to work on the alternate approach suggested, but one of my colleagues is working on it.

The patch as I have it here has been working very well in our internal testing, but I hope we'll have something better to offer in the next couple of weeks.

andrew.w.kaylor abandoned this revision.Dec 4 2019, 12:00 PM

This was fixed in a different way.