This is an archive of the discontinued LLVM Phabricator instance.

[MC] [Win64EH] Try to generate packed unwind info where possible
ClosedPublic

Authored by mstorsjo on Sep 9 2020, 4:46 AM.

Details

Summary

Unfortunately, in practice, this isn't ever matched by the code generated by LLVM, as the frame layout doesn't match the canonical layout required by the packed format.

Diff Detail

Event Timeline

mstorsjo created this revision.Sep 9 2020, 4:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 9 2020, 4:46 AM
Herald added a subscriber: hiraditya. · View Herald Transcript

Unfortunately, in practice, this isn't ever matched by the code generated by LLVM

Are you planning this as a followup?

llvm/lib/MC/MCWin64EH.cpp
955

Is it always exactly zero? Spec says in some cases "No instruction corresponding to mov x29,sp is present in the epilog"

Unfortunately, in practice, this isn't ever matched by the code generated by LLVM

Are you planning this as a followup?

Originally I didn't, I was just trying to fulfill my curiosity regarding this (that has ended up taking way much longer than I originally anticipated), but we'll see if I end up nerd-sniping myself into it in the end.

I'm also toying with a relaxed version of this patch, that just checks whether the layout of stored registers on the stack is right, even though the exact sequence of operations isn't the expected one. E.g. in one case, llvm generates

sub sp, sp, #32
stp x19, x20, [sp, #16]

which generates the same stack layout as

stp x19, x20, [sp, #-16]!
sub sp, sp, #16

which is possible to describe with packed unwind info. As long as unwinding doesn't happen within prologues/epilogues, it should be equal. Not sure if one wants to do that for real (the length of the synthesized canonical prologue might differ a little from the actual one maybe), but it's at least interesting to play with, to see how big gains there potentially could be by tweaking the frame lowering.

llvm/lib/MC/MCWin64EH.cpp
955

I'm not entirely sure how to interpret that clause.

I'm looking at one case, where MSVC generates the following prologue:

0: fd 7b bb a9   stp     x29, x30, [sp, #-80]!
4: fd 03 00 91   mov     x29, sp
8: ff 43 00 d1   sub     sp, sp, #16

However, the final sub sp, sp, #16 isn't part of the canonical prologue, and this is described as packed unwind info RegI=0, CR=3, FrameSize=80, which llvm-readobj prints as

Prologue [
  mov x29, sp
  stp x29, lr, [sp, #-80]!
  end
]

However the actual epilogue of the generated code looks like this:

dc: ff 43 00 91   add     sp, sp, #16
e0: fd 7b c5 a8   ldp     x29, x30, [sp], #80

(Here, instead of mov sp, x29 it has add sp, sp #16, but it doesn't matter as they're equivalent here.)

So if unwinding from the body, the prologue is executed, and it starts off by reversing mov x29, sp from the prologue. But if choosing to execute the epilogue instead of the prologue, it has to execute that instruction, to get the right stack pointer.

From testing the unwinder behaviour with RtlVirtualUnwind, it does look like it does execute mov sp, x29 as part of the epilogue for these cases though, which would make sense.

So I guess it should be ok to generate packed data if the epilogue is an exact match of the prologue, except for the final set_fp opcode - but that wording actually makes me worry more about the cases where the epilogue does contain that opcode (and rely on it). And the docs say

Packed unwind data can't be used if a function requires restoration of sp from x29.

But I'd say that the code generated by MSVC that I quoted does seem to rely on it.

mstorsjo updated this revision to Diff 290900.Sep 10 2020, 1:04 AM

Updated to allow cases where the epilog is missing the SetFP opcode.

mstorsjo added inline comments.Sep 10 2020, 1:07 AM
llvm/lib/MC/MCWin64EH.cpp
719

After updating the other patch, where a "stp x19, x20" can't be made into SaveR19R20X if the offset is too large, I tried update the code here to expect SaveRegPX for x19 - but it turns out it's impossible to make such a case with a large enough offset not to be canonicalized into SaveR19R20X, while maintaining the expected stack layout here, so I just updated the comment here.

mstorsjo added inline comments.Sep 10 2020, 5:07 AM
llvm/lib/MC/MCWin64EH.cpp
1126

The false parameter (to the parameter TryPacked) here was the reason why this didn't match anything for real code generation. With D87448 in place, we don't call this needlessly.

Alternatively, we could remove the TryPacked parameter, and just look at info->HandlesExceptions (as we already do). In that case, a plain .seh_handlerdata wouldn't skip making packed info, but that feels a bit risky (as .seh_handlerdata implies switching to the xdata section, supposedly after emitting the preceding unwind info there). But then again, if there's no unwind handler set, nothing would be able to find the trailing data written to the xdata section anyway? So maybe we could just remove this extra check as well?

the length of the synthesized canonical prologue might differ a little from the actual one maybe

Getting the length wrong has caused miscompiles in the past.

llvm/lib/MC/MCWin64EH.cpp
955

Makes sense.

Maybe we can get a spec clarification for this.

the length of the synthesized canonical prologue might differ a little from the actual one maybe

Getting the length wrong has caused miscompiles in the past.

I don't see how that would cause miscompiles here. But in any case, that experiment turned out to be less exciting than expected in the end; this patch, when properly working without being blocked by extra .seh_handlerdata, shaves off 2.5 KB from a xdata section of 228 KB, and such a relaxed matcher only got rid of another 2 KB. So for it to get any notable effect, the frame lowering needs to be adjusted anyway. But at least it has *some* effect now.

llvm/lib/MC/MCWin64EH.cpp
955

I'll see if I can get @TomTan to comment on it.

mstorsjo added inline comments.Sep 22 2020, 1:51 AM
llvm/lib/MC/MCWin64EH.cpp
955

I discussed this with @TomTan, and he clarified that the synthesized epilogue for these cases lacks the corresponding set_fp opcode - because makes it suitable in a lot more cases for the kind of code MSVC often generates.

But he also agreed that the wording in the docs seems to be overly strict and the statement "Packed unwind data can't be used if a function requires restoration of sp from x29." doesn't seem to hold up (and agrees with the reasoning above that it would unwind correctly anyway), and he checked with the original document author who can't remember a reason why it wouldn't work either. So he agrees we can proceed with this patch, generating the packed format even if the epilogue is an exact mirror of the prologue, including the set_fp opcode.

mstorsjo updated this revision to Diff 293381.Sep 22 2020, 2:04 AM

Expanded the comment regarding matching/mismatching prologues to include more about the backstory.

This revision is now accepted and ready to land.Sep 22 2020, 11:42 AM