This is an archive of the discontinued LLVM Phabricator instance.

[3/5] [MC] [Win64EH] Produce well-formed xdata records when info is missing
ClosedPublic

Authored by mstorsjo on Aug 25 2020, 4:34 AM.

Details

Summary

If there's no unwinding opcodes, omit writing the xdata/pdata records.

If writing of an xdata record is forced via the .seh_handlerdata directive, make sure the xdata record is of a valid length even though it's bogus.

Previously, this generated truncated xdata records, and llvm-readobj would error out when trying to print them.

Diff Detail

Event Timeline

mstorsjo created this revision.Aug 25 2020, 4:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 25 2020, 4:34 AM
Herald added a subscriber: hiraditya. · View Herald Transcript

If there aren't any unwind codes, the compiler should completely skip emitting the .pdata record. That's what the spec suggests the compiler should do, and probably matches what tooling expects. So I think if we do end up with SEH directives without any SEH opcodes, we should either omit the .pdata record, or reject the assembly.

If there aren't any unwind codes, the compiler should completely skip emitting the .pdata record. That's what the spec suggests the compiler should do, and probably matches what tooling expects. So I think if we do end up with SEH directives without any SEH opcodes, we should either omit the .pdata record, or reject the assembly.

Fair enough. In most cases it's straightforward to do that, but there's one main exception. There's a .seh_handlerdata directive, and on that, it triggers emission of the xdata record right away. And when emitting the xdata record, the FrameInfo gets cleared of the opcodes, so it's not entirely evident at that point whether the xdata already emitted was bogus or not.

Or I guess one possible solution to that would be to add another flag to FrameInfo after the xdata was written, indicating whether it was sensible or not. (Or just clear the info->Symbol member to pretend that it wasn't ever emitted, even if there's a few orphaned bytes in the xdata section?)

mstorsjo updated this revision to Diff 287866.Aug 26 2020, 1:40 AM
mstorsjo edited the summary of this revision. (Show Details)

Avoid writing the xdata/pdata entries where easily possible, but kept the code for avoiding malformed entries for the cases where we are forced to emit them.

If there aren't any unwind codes, the compiler should completely skip emitting the .pdata record. That's what the spec suggests the compiler should do, and probably matches what tooling expects. So I think if we do end up with SEH directives without any SEH opcodes, we should either omit the .pdata record, or reject the assembly.

Fair enough. In most cases it's straightforward to do that, but there's one main exception. There's a .seh_handlerdata directive, and on that, it triggers emission of the xdata record right away. And when emitting the xdata record, the FrameInfo gets cleared of the opcodes, so it's not entirely evident at that point whether the xdata already emitted was bogus or not.

Or I guess one possible solution to that would be to add another flag to FrameInfo after the xdata was written, indicating whether it was sensible or not. (Or just clear the info->Symbol member to pretend that it wasn't ever emitted, even if there's a few orphaned bytes in the xdata section?)

I don't understand how we end up in this position in the first place. Why does .seh_handlerdata need to trigger the emission of unwind data? Why is it okay if the resulting unwind data is nonsense?

I don't understand how we end up in this position in the first place. Why does .seh_handlerdata need to trigger the emission of unwind data?

Because .seh_handlerdata changes the active section to the xdata section, emits the unwind info record, letting you append handler specific data which is supposed to follow directly after the unwind data itself. (See the testcase in this patch.)

Why is it okay if the resulting unwind data is nonsense?

Well it's at least parsable, but nonsense, which is better than unparseable records.

But I don't believe this case, .seh_handlerdata without any valid opcodes really happens in the wild - so the exact handling doesn't matter much. Functions with just .seh_proc/.seh_endproc do occur though, but those are easy to omit and handle cleanly (which the current revision if the patch does).

For the contrieved .seh_handlerdata case, we could avoid outputting the unwind info itself, leave the orphaned handler specific data in the section, and not hook up the pdata entry. Can we output warnings from this layer?

llvm/test/MC/AArch64/seh.s
97

The .long 0 here goes in the xdata section, appended after the unwind info.

letting you append handler specific data which is supposed to follow directly after the unwind data itself

Oh, that's the part I was missing; thanks. So in well-formed code, .seh_handlerdata should come after an .seh_endprologue, and there shouldn't be any .seh_* directives or instructions between the .seh_handlerdata and the .seh_endproc?

For the contrieved .seh_handlerdata case, we could avoid outputting the unwind info itself, leave the orphaned handler specific data in the section, and not hook up the pdata entry.

That's probably makes the most sense, yes.

Can we output warnings from this layer?

Technically yes, but you've lost the source location by the time you get this deep, so it wouldn't be pretty. Probably we should do some primitive tracking of the SEH state in the asmparser, and emit a warning from there.

letting you append handler specific data which is supposed to follow directly after the unwind data itself

Oh, that's the part I was missing; thanks. So in well-formed code, .seh_handlerdata should come after an .seh_endprologue, and there shouldn't be any .seh_* directives or instructions between the .seh_handlerdata and the .seh_endproc?

It's actually even a bit stricter/worse than that. Not only does the xdata record contain the unwind opcodes themselves, but it also contains the function length field. So ideally .seh_handlerdata comes after .seh_endfunclet, so that the full function length is known.

In practice, there can be cases where .seh_handlerdata comes before .seh_endfunclet (or functions without that altogether), and then we need to set the function length up to the current point, as I do in D86528. This means that the actual unwindable region of the function only is up to this point. So if we have .seh_handlerdata directly after the prologue, one can't actually unwind from the body of the function, only within the prologue itself. So ideally .seh_handlerdata really should be as far to the end of the function as possible.

Then there's real world messes like https://github.com/mingw-w64/mingw-w64/blob/master/mingw-w64-crt/crt/crtexe.c#L179-L198, where .seh_handlerdata is injected via inline assembly in C code. This works fine in x86_64, because the function length itself isn't embedded in the xdata record, but is handled via the BeginAddress/EndAddress pair in the pdata record. But for the aarch64 case, that code needs to be adjusted to move the .seh_handlerdata bit to the end of the function. (I'll try to get that fixed after these patches settle.) It won't cover the epilogue of the function, but would at least cover the body.

For the contrieved .seh_handlerdata case, we could avoid outputting the unwind info itself, leave the orphaned handler specific data in the section, and not hook up the pdata entry.

That's probably makes the most sense, yes.

Ok, will try to do that then.

Can we output warnings from this layer?

Technically yes, but you've lost the source location by the time you get this deep, so it wouldn't be pretty. Probably we should do some primitive tracking of the SEH state in the asmparser, and emit a warning from there.

Even without the source location, just giving the function name might be context enough - you'd probably only have this in cases with assembly involved anyway. But it's probably not necessary.

mstorsjo updated this revision to Diff 288269.Aug 27 2020, 4:20 AM

Omitting empty unwind info even when triggered by a .seh_handlerdata directive, added an error for the case if such a skipped entry (that was supposed to be emitted, but wasn't) later got more unwind info - which should trigger users to move the .seh_handlerdata further ahead if that ever happens.

This revision is now accepted and ready to land.Aug 27 2020, 2:14 PM
This revision was automatically updated to reflect the committed changes.