This is an archive of the discontinued LLVM Phabricator instance.

[WinEH][ARM64] Split unwind info for functions larger than 1MB
ClosedPublic

Authored by zzheng on Jul 18 2022, 4:03 PM.

Details

Summary

Create function segments and emit unwind info of them.

A segment must be less than 1MB and no prolog or epilog can cross segments boundary.

Reference: https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling#function-fragments

TODOs:

  1. Enable packed unwind info (.pdata only) for multi-segment functions. (currently only enabled for functions less than 1MB, that is: functions have one segment)
  2. Emit packed unwind info (.pdata only) for segments that have neithor prolog nor epilog.

These two will be done in next patch (WIP).

  1. Handle functions that requires more than 255 (0xFF) words to encode all prolog and epilogs unwind op or have more than 65535 (0xFFFF) epilogs.

This one needs some change as we need to move counting code words for epilog earlier in the current process. We may not see either case in practice, IMHO.

  1. Shrink Wrapping where some stack ops are moved out of prolog/epilog and put in different segment. See example 2 of above link.
  2. Align directive or aligning loops where we cannot reliably compute function size.

4 and 5 are kind of future work, I'm think about what to do as well as getting test cases for them.

Diff Detail

Event Timeline

zzheng created this revision.Jul 18 2022, 4:03 PM
zzheng requested review of this revision.Jul 18 2022, 4:03 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 18 2022, 4:03 PM
zzheng edited the summary of this revision. (Show Details)Jul 18 2022, 4:14 PM
efriedma added inline comments.Jul 19 2022, 12:22 PM
llvm/lib/MC/MCWin64EH.cpp
1099

This FIXME doesn't really make sense; in general, we can't represent segments in pdata-only form. Among other things, pdata-only form is limited to 8KB function length.


I'm still not really convinced this "fake epilog" form is really useful, even if the Microsoft documentation suggests using it.

1180

for (auto Inst : llvm::reverse(info->Instructions)

zzheng marked 2 inline comments as done.Jul 19 2022, 4:50 PM
zzheng added inline comments.
llvm/lib/MC/MCWin64EH.cpp
1099

Let's keep it this way for now. We can remove it once we verified that MS unwinder is ok with zero epilog scope.

zzheng updated this revision to Diff 445981.Jul 19 2022, 4:51 PM
zzheng marked an inline comment as done.
zzheng edited the summary of this revision. (Show Details)

ping.. @mstorsjo

Yep, I've noted the patch and I'll definitely have a look, but it'll take a day or two, unfortunately!

ping.. @mstorsjo

Yep, I've noted the patch and I'll definitely have a look, but it'll take a day or two, unfortunately!

no problem, we're doing some internal testing on it.

Overall this looks quite good to me, I mostly have a bunch of readability/understandability comments.

llvm/include/llvm/MC/MCWinEH.h
72

Can you add a comment about what the int64_t is here? It's not obvious to me on the first read through the code.

llvm/lib/MC/MCWin64EH.cpp
1010

Should this be >? Isn't it theoretically possible here that we have RemainingLength == SegLimit, we produce one segment and consume all of RemainingLength, and we'd end up adding a zero length segment at the end?

1016

To me, it would be clearer if we'd move the declaration of EpilogsInSegment here, so we don't need to clear it, but we're operating on a new instance of it each time. Or are there (tangible?) performance benefits to doing it like this?

1018

Can we assume that all the epilogs are monotonically ordered? (I think it can be reasonable to do that, but maybe it'd be good to have an assert somewhere that tests it?)

1025

By point 1., I presume you mean "we included all completely contained epilogs, and found another one which is entirely after the current segment" - or would you consider that a third case? It feels confusing to not have that case listed in any case (even if the code probably is correct).

1031

If you don't want to move the declaration of EpilogsInSegment to line 1012 above, it would be clearer to me to do the .clear() here, directly after the std::move, even if that does one unnecessary clean for the last iteration.

1042

Please add the `/*HasProlog*/ comment here for the third parameter.

1144–1147

Isn't SEH unwind data splitting kinda what's being implemented here? It's just that no split is being done for the condition of too many opcode bytes, only for the condition of too large functions. I.e. I'd like to rephrase the error message a bit more.

1180–1181

I'd like to have a more verbose comment here to remind myself about how prologs in !HasProlog segments work - I always end up confused about this, e.g. something like this:

// Even for a segment with !HasProlog, we do need to emit the opcodes for the prolog, for the unwinder to use for unwinding from this segment.
// The end_c opcode at the start indicates to the unwinder that the actual prolog is outside of the current segment, and the unwinder shouldn't try to check for unwinding from a partial prolog.
llvm/test/MC/AArch64/seh-large-func.s
10

Is it meaningful to include all the section headers in the testcase here? They're mostly distracting here I think, and just extra work to update when working on the testcase in my experience.

zzheng updated this revision to Diff 447834.Jul 26 2022, 2:38 PM
zzheng marked 10 inline comments as done.
zzheng retitled this revision from [WinSEH][ARM64] Split unwind info for functions larger than 1MB to [WinEH][ARM64] Split unwind info for functions larger than 1MB.

Addressed @mstorsjo 's comments

Added an unit test that has multiple epilogs in one segment.

Fixed an oversight in previous version that we only call getARM64OffsetInProlog() for the first segment.

zzheng added inline comments.Jul 26 2022, 2:42 PM
llvm/lib/MC/MCWin64EH.cpp
1018

I put an assert in the loop that finds and records epilogs' start and end offset (line-990 to line-994), the idea is:
assert(epilog[n].end <= epilog[n+1].start),

llvm/test/MC/AArch64/seh-large-func.s
10

makes sense.
will just keep .pdata and .xdata sections

zzheng added inline comments.Jul 26 2022, 3:06 PM
llvm/lib/MC/MCWin64EH.cpp
1027

case 1 is when we exit the above while loop with (E == Epilogs.size()), i.e. there's no epilog left to be put into a segment.

1028

case 2 is when we exit the loop above with (E < Epilogs.size) but (Epilogs[E].End >= SegEnd).

1031

this is (E < Epilogs.size()) and (Epilogs[E].end >= SegEnd) when we exit the loop above.

zzheng added inline comments.Jul 26 2022, 3:11 PM
llvm/lib/MC/MCWin64EH.cpp
1033

this is to find out whether we're in case 2 or 3.

gentle ping..

mstorsjo accepted this revision.Jul 29 2022, 1:56 PM

LGTM, thanks! Nothing more from me here, if @efriedma is ok with it. A couple very minor stylistic comments remain.

llvm/lib/MC/MCWin64EH.cpp
1034

Nitpick: the indentation of the comment, between the unscoped if() and the statement it controls, is a bit weird. I'd prefer to move the comment to the end of the SegLength = .. line to keep it clearer, or add braces around it (and fix the indentation of the comment).

1145

Super-nitpick: You're missing a space after the comma, before the following text in the next string literal on the next line.

llvm/test/MC/AArch64/seh-large-func.s
24

Super nitpick: You seem to have lost a space before Section { here, making it misaligned.

This revision is now accepted and ready to land.Jul 29 2022, 1:56 PM
This revision was landed with ongoing or failed builds.Aug 5 2022, 12:01 PM
This revision was automatically updated to reflect the committed changes.
zzheng marked 2 inline comments as done.

This triggers failed asserts in two cases in my builds - this one:

clang: ../lib/MC/MCWin64EH.cpp:998: void ARM64FindSegmentsInFunction(llvm::MCStreamer&, llvm::WinEH::FrameInfo*, int64_t): Assertion `(Epilogs.size() == 0 || Offset >= Epilogs.back().End) && "Epilogs should be monotonically ordered"' failed.

It’s reproducible with https://martin.st/temp/wine-corruntimehost.c and https://martin.st/temp/dav1d-thread.c with this command:
clang -target aarch64-windows -c src.c -O2

This triggers failed asserts in two cases in my builds - this one:

clang: ../lib/MC/MCWin64EH.cpp:998: void ARM64FindSegmentsInFunction(llvm::MCStreamer&, llvm::WinEH::FrameInfo*, int64_t): Assertion `(Epilogs.size() == 0 || Offset >= Epilogs.back().End) && "Epilogs should be monotonically ordered"' failed.

It’s reproducible with https://martin.st/temp/wine-corruntimehost.c and https://martin.st/temp/dav1d-thread.c with this command:
clang -target aarch64-windows -c src.c -O2

I ran into this issue in a number of more cases too, hitting compilation of at least 3 different projects.

I had a look at it - see D131393 for a quick tweak that avoids triggering the failed assert in at least the particular case that I looked at.

I also posted D131394, which adds a similar consistency check as for ARM, which triggers and shows the real root cause for this issue. For the dav1d-thread.c testcase, we're currently generating the following code:

$ clang -target aarch64-windows -S dav1d-thread.c -O2 -o -
[...]
dav1d_set_thread_name:
[...]
        ldp     x2, x19, [sp, #8]               // 16-byte Folded Reload
        .seh_startepilogue
        ldr     x30, [sp, #24]                  // 8-byte Folded Reload
        .seh_save_reg   x30, 24
        .seh_save_reg   x19, 16
        add     sp, sp, #32
        .seh_stackalloc 32
        .seh_endepilogue
        br      x2

There's three .seh_ opcodes in the epilogue, even if there's only two actual instructions - while the .seh_save_reg x19 has been merged into an ldp x2, x19 before the start of the epilogue.

I thought we had resolved all the issues with instruction mismatches... I guess not. We need to suppress the ldp formation transform in that case.

I thought we had resolved all the issues with instruction mismatches... I guess not. We need to suppress the ldp formation transform in that case.

Did we address this yet?

I thought we had resolved all the issues with instruction mismatches... I guess not. We need to suppress the ldp formation transform in that case.

Did we address this yet?

I'm not aware of any fix at least; including D131394 should expose a bunch of issues if they're still present. (If there's reason to believe things are fixed, I can try to include that patch in my next round of nightly builds.)