This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Async unwind - function prologues
ClosedPublic

Authored by chill on Oct 8 2021, 7:53 AM.

Details

Summary

This patch rearranges emission of CFI instructions, so the resulting DWARF and .eh_frame information is
precise at every instruction.

The current state is that the unwind info is emitted only after the function prologue. This is fine for synchronous (e.g. C++) exceptions, but
the information is generally incorrect when the program counter is at an instruction in the prologue or the epilogue, for example:

stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
mov	x29, sp
.cfi_def_cfa w29, 16
...

after the stp is executed the (initial) rule for the CFA still says the CFA is in the sp, even though it's already offset by 16 bytes

A correct unwind info could look like:

stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov	x29, sp
.cfi_def_cfa w29, 16
...

Having this information precise up to an instruction is useful for sampling profilers that would like to get a stack backtrace. The end goal
(towards this patch is just a step) is to have fully working -fasynchronous-unwind-tables.

Diff Detail

Event Timeline

chill created this revision.Oct 8 2021, 7:53 AM
chill requested review of this revision.Oct 8 2021, 7:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 8 2021, 7:53 AM
danielkiss accepted this revision.Oct 22 2021, 8:29 AM

Did not apply clearly for me, maybe a rebase is needed but otherwise LGTM thanks!

This revision is now accepted and ready to land.Oct 22 2021, 8:29 AM
chill planned changes to this revision.Oct 22 2021, 8:40 AM

Thanks. I'm sorry, but work on epilogues made more changes necessary.

chill updated this revision to Diff 381580.Oct 22 2021, 9:24 AM
This revision is now accepted and ready to land.Oct 22 2021, 9:24 AM
chill planned changes to this revision.Oct 22 2021, 9:24 AM
chill retitled this revision from [AArch64] Asynchronous unwind - function prologues to [AArch64] Async unwind (5/6) - function prologues.Oct 22 2021, 9:29 AM
chill updated this revision to Diff 389540.Nov 24 2021, 10:24 AM
chill retitled this revision from [AArch64] Async unwind (5/6) - function prologues to [AArch64] Async unwind - function prologues.
chill added a reviewer: MaskRay.
This revision is now accepted and ready to land.Nov 24 2021, 10:24 AM
chill requested review of this revision.Nov 24 2021, 10:25 AM
chill updated this revision to Diff 392394.Dec 7 2021, 7:22 AM
chill updated this revision to Diff 399014.Jan 11 2022, 10:38 AM
chill updated this revision to Diff 408893.Feb 15 2022, 8:42 AM
MaskRay accepted this revision.Feb 22 2022, 4:05 PM

.eh_table

.eh_frame

This revision is now accepted and ready to land.Feb 22 2022, 4:05 PM
danielkiss accepted this revision.Feb 24 2022, 10:36 AM
chill updated this revision to Diff 411781.Feb 28 2022, 3:49 AM
chill edited the summary of this revision. (Show Details)
This revision was landed with ongoing or failed builds.Feb 28 2022, 5:43 AM
This revision was automatically updated to reflect the committed changes.
hans added a subscriber: hans.Mar 4 2022, 8:23 AM

We're hitting an assert after this commit:

Assertion failed: (StackSize == 0 && "We already have the CFA offset!"), function generateCompactUnwindEncoding, file AArch64AsmBackend.cpp, line 624.

See https://bugs.chromium.org/p/chromium/issues/detail?id=1302998#c2 for a reproducer.

Herald added a project: Restricted Project. · View Herald TranscriptMar 4 2022, 8:23 AM
chill reopened this revision.Mar 4 2022, 11:53 AM
This revision is now accepted and ready to land.Mar 4 2022, 11:53 AM
chill planned changes to this revision.Mar 4 2022, 11:53 AM
chill updated this revision to Diff 413419.Mar 7 2022, 4:13 AM
This revision is now accepted and ready to land.Mar 7 2022, 4:13 AM
chill updated this revision to Diff 417921.Mar 24 2022, 7:30 AM

Updated on top of latest main and on top of D121017
Now https://bugs.chromium.org/p/chromium/issues/detail?id=1302998 should be fixed.

This revision was automatically updated to reflect the committed changes.

Sorry for the late comment; I'm catching up on the async unwind changes as part of our LLVM 15 upgrade.

From the description of this change and https://discourse.llvm.org/t/rfc-asynchronous-unwind-tables-attribute/59282, it seems like the extra directives being omitted should only be required for asynchronous unwind tables. I was wondering why the emission of the extra directives in this diff wasn't guarded by a check for async unwind tables in that case. E.g. https://godbolt.org/z/h4ceMe5ja shows Clang 15 emitting an extra .cfi_def_cfa_offset vs. Clang 14, even with -fno-asynchronous-unwind-tables. I don't think the size overhead is too significant for us, but I was wondering why that emission wasn't guarded by a check for async unwind tables.

I was also curious about the long-term plans here. https://godbolt.org/z/jnoc8YvKz shows that the .cfi_offset directives for saved registers are emitted together at the end of the prolog, vs. right after the instruction which saves the registers. Will that also be changing eventually for async unwind tables, or is that considered unnecessary?

chill added a comment.Sep 7 2022, 2:20 AM

Sorry for the late comment; I'm catching up on the async unwind changes as part of our LLVM 15 upgrade.

From the description of this change and https://discourse.llvm.org/t/rfc-asynchronous-unwind-tables-attribute/59282, it seems like the extra directives being omitted should only be required for asynchronous unwind tables. I was wondering why the emission of the extra directives in this diff wasn't guarded by a check for async unwind tables in that case. E.g. https://godbolt.org/z/h4ceMe5ja shows Clang 15 emitting an extra .cfi_def_cfa_offset vs. Clang 14, even with -fno-asynchronous-unwind-tables. I don't think the size overhead is too significant for us, but I was wondering why that emission wasn't guarded by a check for async unwind tables.

It's just for simplicity - unwind info emission in prologues is the same and suitable for synchronous and asynchronous unwind tables. IMHO, the benefit is not worth the engineering effort, but
I certainly wouldn't object against it.

That said, I would suggest a good general strategy would be for the prologue/epilogue emission functions to output the most detailed information available, which could
then be optimised by follow-up passes, depending on specific use cases (and extra command line options) - for example, a release/production build of an application
which uses no C++ exceptions could have its unwind tables stripped down only to the bare minimum, required by a profiler to obtain a stack trace - basically removing
any mention of callee-saved registers and retaining only information about frame pointer, return address and CFA offset.

I was also curious about the long-term plans here. https://godbolt.org/z/jnoc8YvKz shows that the .cfi_offset directives for saved registers are emitted together at the end of the prolog, vs. right after the instruction which saves the registers. Will that also be changing eventually for async unwind tables, or is that considered unnecessary?

That's deliberate and I don't have plans to change it. Having unwind instructions grouped together saves quite a lot of unwind tables space, because you don't need a DW_CFA_advance_loc
between DW_CFA_offsets. and as long as no callee-saved register is modified before its .cfi_offset you can obtain its value from the register itself.