We have the clang -cc1 command-line option -funwind-tables=1|2 and
the codegen option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind
tables (1) or asynchronous unwind tables (2)`. However, this is
encoded in LLVM IR by the presence or the absence of the uwtable
attribute, i.e. we lose the information whether to generate just
some unwind tables or asynchronous unwind tables.
Asynchronous unwind tables take more space in the runtime image; I'd
estimate something like 80-90% more, as the difference is adding
roughly the same number of CFI directives for epilogues as for
prologues, only a bit simpler (e.g. .cfi_offset reg, off vs.
.cfi_restore reg). Or even more, if you consider tail duplication of
epilogue blocks.
Asynchronous unwind tables could also restrict code generation to
having only a finite number of frame pointer adjustments (an example
of *not* having a finite number of SP adjustments is on AArch64 when
untagging the stack (MTE): in some cases the compiler can modify SP
in a loop).
Having the CFI precise up to an instruction generally also means one
cannot bundle together CFI instructions once the prologue is done;
they need to be interspersed with ordinary instructions, which means
extra DW_CFA_advance_loc commands, further increasing the unwind
table size.
That is to say, async unwind tables impose a non-negligible overhead,
yet for the most common use cases (like C++ exceptions), they are not
even needed.
This patch extends the uwtable attribute with an optional
value:
- uwtable (defaults to async)
- uwtable(sync), synchronous unwind tables
- uwtable(async), asynchronous (instruction precise) unwind tables
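
As a rough sketch, the three spellings above might appear on function
definitions in textual IR as follows. The function names @f, @g and @h
are made up purely for illustration and do not come from the patch or
its tests.

```llvm
; Hypothetical functions, named only for illustration.

; No value given: per the list above, this defaults to async.
define void @f() uwtable {
  ret void
}

; Synchronous unwind tables.
define void @g() uwtable(sync) {
  ret void
}

; Asynchronous (instruction precise) unwind tables.
define void @h() uwtable(async) {
  ret void
}
```
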
The comment applying to the whole file is generally placed at the top, before all RUN lines.