This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Emit fewer CFI instructions for synchronous unwind tables
ClosedPublic

Authored by ikudrin on Jun 15 2023, 9:51 PM.

Details

Summary

The instruction-precise, or asynchronous, unwind tables usually take up much more space than the synchronous ones. If a user is concerned about the load size of the program and does not need the features provided with the asynchronous tables, the compiler should be able to generate the more compact variant.

This patch changes the generation of CFI instructions for these cases so that they all come in one chunk in the prolog; it emits only one .cfi_def_cfa* instruction followed by .cfi_offset ones after all stack adjustments and register spills, and avoids generating CFI instructions in the epilog(s) as well as any other exceeding CFI instructions like .cfi_remember_state and .cfi_restore_state. Effectively, it reverses the effects of D111411 and D114545 on functions with the uwtable(sync) attribute. As a side effect, it also restores the behavior on functions that have neither uwtable nor nounwind attributes.

Diff Detail

Event Timeline

ikudrin created this revision.Jun 15 2023, 9:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 15 2023, 9:51 PM
ikudrin requested review of this revision.Jun 15 2023, 9:51 PM

Do you know what's causing the changes in instruction scheduling in some testcases? For example, sve-fixed-length-permute-zip-uzp-trn.ll and sve-split-int-pred-reduce.ll.

Do you know what's causing the changes in instruction scheduling in some testcases? For example, sve-fixed-length-permute-zip-uzp-trn.ll and sve-split-int-pred-reduce.ll.

I did not investigate that; it looks like the CFI instructions work like a barrier. Anyway, these tests are just returning to the state they were before D111411.

MaskRay added inline comments.Jun 16 2023, 5:14 PM
llvm/test/CodeGen/AArch64/addsub-constant-folding.ll
363–364

.cfi_def_cfa_offset 32 move like this seems unnecessary. It degrades precision without reducing the number of CFI instructions for sync.

ikudrin added inline comments.Jun 16 2023, 5:37 PM
llvm/test/CodeGen/AArch64/addsub-constant-folding.ll
363–364

The goal is to reduce the size of the unwind table, not just the number of CFI instructions. Having only one chunk removes DWA_CFA_advance_loc instructions that would otherwise be required.

MaskRay added inline comments.Jun 16 2023, 5:52 PM
llvm/test/CodeGen/AArch64/addsub-constant-folding.ll
363–364

Yes, the goal is fine. However, when the number of instructions does not change, it would be better not to degrade the precision by moving .cfi_def_cfa_offset...

ikudrin added inline comments.Jun 16 2023, 6:10 PM
llvm/test/CodeGen/AArch64/addsub-constant-folding.ll
363–364

In this test, there is a .cfi_offset w30, -16 instruction below. If we kept .cfi_def_cfa_offset 32 there it was, the size of the unwind table would be bigger. Yes, only by two bytes, but still.

smeenai added inline comments.Jun 18 2023, 2:29 AM
llvm/test/CodeGen/AArch64/addsub-constant-folding.ll
363–364

Those bytes do add up ... I measured this change to be a 72 KiB size reduction for the Facebook for Android app, which is quite nice.

Matt added a subscriber: Matt.Jun 20 2023, 2:08 PM
MaskRay added inline comments.Jun 21 2023, 12:49 AM
llvm/test/CodeGen/AArch64/addsub-constant-folding.ll
363–364

I wonder whether we just need a mode to omit most of the registers? Many registers are probably not useful for some use cases, e.g. profiling.

smeenai added inline comments.Jun 21 2023, 10:04 AM
llvm/test/CodeGen/AArch64/addsub-constant-folding.ll
363–364

That would make that mode unusable for exception handling, right? My interest here is having smaller unwind tables that can still be used for exceptions.

MaskRay added inline comments.Jun 21 2023, 1:38 PM
llvm/test/CodeGen/AArch64/addsub-constant-folding.ll
363–364

Am I mistaken that the number of CHECK lines in addsub-constant-folding.ll does not change with this patch? However, the patch degrades the precision of .cfi_def_cfa_offset 32. This seems a regression to me.
If the number of CFI instructions decreases, we can consider it different.

I wonder whether we just need a mode to omit most of the registers? Many registers are probably not useful for some use cases, e.g. profiling.

I agree that for exception handling we need to preserve callee-saved registers.

I am mainly thinking of nounwind functions that get an unwind table due to -funwind-tables/-fasynchronous-unwind-tables. If we just need to get stack trace for these functions, it seems that we can have a mode to omit most callee-saved registers.

efriedma added inline comments.Jun 21 2023, 2:06 PM
llvm/test/CodeGen/AArch64/addsub-constant-folding.ll
363–364

Am I mistaken that the number of CHECK lines in addsub-constant-folding.ll does not change with this patch?

It's the same number of lines of assembly, but it's fewer bytes in DWARF encoding.

MaskRay added inline comments.Jun 21 2023, 2:46 PM
llvm/test/CodeGen/AArch64/addsub-constant-folding.ll
363–364

Presumably DW_CFA_advance_loc. Yes, it's fewer bytes.

smeenai added inline comments.Jun 21 2023, 4:06 PM
llvm/test/CodeGen/AArch64/addsub-constant-folding.ll
363–364

Yup, exactly.

It makes sense to emit only the minimal amount of information for functions which don't need to be unwound through, but that seems like a separate change. With unwindabort and -fno-exceptions=noexcept, we'd have way more nounwind functions as well, so that'd be even more beneficial. I haven't looked into it very much, but what's your opinion on using SFrame for that?

ikudrin added inline comments.Jun 21 2023, 4:39 PM
llvm/test/CodeGen/AArch64/addsub-constant-folding.ll
363–364

My interest here is having smaller unwind tables that can still be used for exceptions.

Mine is the same. The application shipped to end users does not need to support profiling, so it is beneficial to emit more compact unwind tables. They should be just enough for exceptions and crash dumps.

@MaskRay, do you think it's okay to move forward with this?

MaskRay added a comment.EditedJun 29 2023, 6:03 PM

@MaskRay, do you think it's okay to move forward with this?

I am happy if @chill is happy.

Have you checked all the lost .cfi_def_cfa_offset and verified that they are all non-important?

Have you checked all the lost .cfi_def_cfa_offset and verified that they are all non-important?

The patch preserves the last .cfi_def_cfa or .cfi_def_cfa_offset instruction in the prolog, which is enough for synchronous unwind tables.

MaskRay accepted this revision.Jun 29 2023, 8:26 PM
This revision is now accepted and ready to land.Jun 29 2023, 8:26 PM