This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Async unwind - add a pass to fix CFI information
ClosedPublic

Authored by chill on Nov 24 2021, 10:32 AM.

Details

Summary

This pass inserts the necessary CFI instructions to compensate for the
inconsistency of the call-frame information caused by linear (non-CGA
aware) nature of the unwind tables.

Unlike the CFIInstrInserer pass, this one almost always emits only
.cfi_remember_state/.cfi_restore_state, which results in smaller
unwind tables and also transparently handles custom unwind info
extensions like CFA offset adjustement and save locations of SVE
registers.

This pass takes advantage of the constraints taht LLVM imposes on the
placement of save/restore points (cf. ShrinkWrap.cpp):

  • there is a single basic block, containing the function prologue
  • possibly multiple epilogue blocks, where each epilogue block is complete and self-contained, i.e. CSR restore instructions (and the corresponding CFI instructions are not split across two or more blocks.
  • prologue and epilogue blocks are outside of any loops

Thus, during execution, at the beginning and at the end of each basic
block the function can be in one of two states:

  • "has a call frame", if the function has executed the prologue, or has not executed any epilogue
  • "does not have a call frame", if the function has not executed the prologue, or has executed an epilogue

These properties can be computed for each basic block by a single RPO
traversal.

From the point of view of the unwind tables, the "has/does not have
call frame" state at beginning of each block is determined by the
state at the end of the previous block, in layout order.

Where these states differ, we insert compensating CFI instructions,
which come in two flavours:

  • CFI instructions, which reset the unwind table state to the initial one. This is done by a target specific hook and is expected to be trivial to implement, for example it could be:
.cfi_def_cfa <sp>, 0
.cfi_same_value <rN>
.cfi_same_value <rN-1>
...

where <rN> are the callee-saved registers.

  • CFI instructions, which reset the unwind table state to the one created by the function prologue. These are the sequence:
.cfi_restore_state
.cfi_remember_state

In this case we also insert a .cfi_remember_state after the
last CFI instruction in the function prologue.

Diff Detail

Event Timeline

chill created this revision.Nov 24 2021, 10:32 AM
chill requested review of this revision.Nov 24 2021, 10:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 24 2021, 10:32 AM
xgupta added a subscriber: jrtc27.Nov 28 2021, 11:13 AM

This seems like it should be a generic pass conditionally added (when supported, until all backends support it) by TargetPassConfig with TRI/TLI/etc hooks, not something each target needs to subclass and manually add to the pass order

jrtc27 added inline comments.Nov 28 2021, 11:35 AM
llvm/lib/CodeGen/CFIFixup.cpp
146

Having both even in the case when you don't need it is a bit weird. You should be able to just keep track of the last MBB seen that had a frame and insert the remember at the end of it when you discover you need a restore in the current MBB. That also removes the need to special-case the prologue.

(This code is also confusing due to using MBB.begin() twice rather than an interator that gets advanced, as the insertion order ends up being backwards so the code looks the wrong way round at first glance)

chill updated this revision to Diff 392748.Dec 8 2021, 6:40 AM
chill marked an inline comment as done.Dec 8 2021, 6:41 AM
chill added inline comments.
llvm/lib/CodeGen/CFIFixup.cpp
146

Indeed, thanks for the suggestion!

jrtc27 added inline comments.Dec 8 2021, 6:55 AM
llvm/lib/CodeGen/CFIFixup.cpp
191

Ok but doing that here doesn't make sense, now you restore back to the prologue but other things may have happened? I was imagining updating the last MBB with a frame whenever Info.HasFrameOnExit. Then you also don't need InsertPt, just insert at the end of the that MBB. That'll keep the remember/restore tight around just the MBBs that clobber the CFI state.

jrtc27 added inline comments.Dec 8 2021, 7:04 AM
llvm/lib/CodeGen/CFIFixup.cpp
191

Although that doesn't work when you have multiple epilogues next to each other. You can instead keep InsertPt lying around, and updated it to CurrBB->end() whenever HasFrameOnExit is true, and in this case here set it to what you currently do just in case you don't see a non-epilogue frame before the next restore is needed.

chill marked an inline comment as done.Dec 8 2021, 7:13 AM

This seems like it should be a generic pass conditionally added (when supported, until all backends support it) by TargetPassConfig with TRI/TLI/etc hooks, not something each target needs to subclass and manually add to the pass order

If that target specific functionality was needed by multiple separate passes, I would consider using hooks. While technically possible in this case, I fail to see
any advantage in doing so.

jrtc27 added a comment.Dec 8 2021, 7:55 AM

This seems like it should be a generic pass conditionally added (when supported, until all backends support it) by TargetPassConfig with TRI/TLI/etc hooks, not something each target needs to subclass and manually add to the pass order

If that target specific functionality was needed by multiple separate passes, I would consider using hooks. While technically possible in this case, I fail to see
any advantage in doing so.

I'm not aware of any passes that apply to all targets and follow the model of requiring you to subclass them rather than using TLI/TRI/TFI hooks? This pass is very similar in its model to PrologEpilogInserter, which uses TFI hooks, not subclassing. I would expect this to follow the same approach. It's certainly much easier to use from a backend author's perspective, less boilerplate needed, you just implement the hooks in your TFI and, at least for now, opt in via a bool or bool-returning method in your TFI, and that's it. No new files, no messing with TM to add your pass, no target-specific name for the pass that gets annoying when running tests/using -debug-only/etc, ...

chill added inline comments.Dec 8 2021, 8:50 AM
llvm/lib/CodeGen/CFIFixup.cpp
191

I'm not sure I understand.

Conceptually, we do two operations: reset to initial state and reset to after-prologue state.
The pair .cfi_restore_state/.cfi_remember_state is a workaround for the lack of suitable CFI directive that sets the current state without popping the stack, in other words, these make sense to go together.

As an optimisation from this base case, as you suggested, we can omit the last .cfi_remember_state, saving a bit of space in the unwind tables.

Keeping directives adjacent to one another reduces the size of the unwind tables, as you don't need intervening DW_CFA_advance_loc instructions.

Do you suggest the instead of inserting .cfi_remember_state just after the last .cfi_restore_state we could insert it at the end of the last block, known to have a stack frame?
That would separate the directive from the previous .cfi_restore_state.

Also, I think I had an earlier variant that inserted at the ->end() of blocks, which caused failures in the verifier. Didn't investigate it further, but that would mean the directive has to go before the terminator, which would separate it from a following .cfi_restore_state, too.

Apologies if I misunderstood you.

chill updated this revision to Diff 393508.Dec 10 2021, 8:41 AM
chill added a comment.Dec 10 2021, 8:44 AM

This seems like it should be a generic pass conditionally added (when supported, until all backends support it) by TargetPassConfig with TRI/TLI/etc hooks, not something each target needs to subclass and manually add to the pass order

I reworked it like this^, I think, in the last update.

Sorry to ping reviewers (i am not confident to review it myself) to review, as this patch's .cfi_remeber_state & .cfi_remember_state support is a prerequisite to fix/complete the RISCV CFI directive support. Just a question: the patch is adding .cfi_same_value support but no test case shows that functionality?

chill added a comment.Dec 20 2021, 2:04 AM

Sorry to ping reviewers (i am not confident to review it myself) to review, as this patch's .cfi_remeber_state & .cfi_remember_state support is a prerequisite to fix/complete the RISCV CFI directive support. Just a question: the patch is adding .cfi_same_value support but no test case shows that functionality?

cfi-fixup.mir has some .cfi_same_values

xgupta resigned from this revision.Dec 20 2021, 3:57 AM
chill updated this revision to Diff 399016.Jan 11 2022, 10:40 AM

Could you please rebase the series? it generated a lot of conflict for me.
the generated directives in the tests are looking good.

llvm/lib/CodeGen/CFIFixup.cpp
103

NIT, I'm not sure the bitfield here brings anything in terms of performance vs used memory.

121

NIT HasPrologue maybe better to call AfterPrologue since recalculated during the traversal but the function could has a prolog while HasPrologue false.
HasEpilogue -> AfterEpilog?

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
624–625

later the for loop can handle the empt list.

chill updated this revision to Diff 408927.Feb 15 2022, 9:36 AM
chill updated this revision to Diff 409194.Feb 16 2022, 3:17 AM
chill marked 5 inline comments as done.
chill added inline comments.
llvm/lib/CodeGen/CFIFixup.cpp
103

It brings 200% reduction in the size of an auxiliary data structure (from 3 to 1 byte) and
we have one entry of these per basic block. At the same time it's extremely cheap
as coding effort (6 key presses :D) and does not impact readability in any way: if you
delete the bitfields, nothing of the rest of the code needs to change.

Is it a measurable difference? Maybe yes, maybe no, but if no, I will claim the rest of the compiler
is just written in a sloppy and lazy manner :D

121

It's ok, these two variables pertain only to the block examined in the current iteration of the loop.

chill marked 2 inline comments as done.Feb 16 2022, 7:50 AM
danielkiss accepted this revision.Feb 24 2022, 11:03 AM

LGMT

llvm/include/llvm/Target/TargetOptions.h
347–348
llvm/lib/CodeGen/CFIFixup.cpp
103

I'm happy with the bitfield just was interested in :D

This revision is now accepted and ready to land.Feb 24 2022, 11:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 4 2022, 8:10 AM
chill requested review of this revision.Mar 4 2022, 8:30 AM

Brought forward, emitting epilogue would no be correct without this patch. A little bit of tests moved to the epilogues patch, no other changes.

chill updated this revision to Diff 413421.Mar 7 2022, 4:17 AM
chill marked an inline comment as done.Mar 7 2022, 4:18 AM
chill updated this revision to Diff 417923.Mar 24 2022, 7:32 AM

Updated on top of latest main, will commit it the following days, based on previous
acceptance, as it contains no non-trivial changes.

chill accepted this revision.Mar 24 2022, 7:34 AM
This revision is now accepted and ready to land.Mar 24 2022, 7:34 AM
MaskRay added inline comments.Mar 25 2022, 10:02 PM
llvm/lib/CodeGen/CMakeLists.txt
15

Move before CFIInstrInserter.cpp to keep alphabetical order.

llvm/lib/CodeGen/TargetPassConfig.cpp
145

cl::init(false), can be deleted

MaskRay added inline comments.Mar 25 2022, 10:36 PM
llvm/lib/CodeGen/CFIFixup.cpp
29

"strong no call frame on entry"

The "on entry" seems important because a block with a prologue may have the StrongNoFrameOnEntry flag set.

88

llvm::reverse(MBB)

Though using reverse iteration may not improve much.

114
115
115
137
139

&& !HasEpilogue seems unneeded.

146

... Every block inherits the frame state (as recorded in the unwind tables) of the previous block. If the intended frame state is different, insert compensating CFI instructions.

160

delete braces

for (MachineInstr &MI : *PrologueBlock) is more common.

MaskRay accepted this revision.Mar 25 2022, 11:08 PM
MaskRay added inline comments.
llvm/lib/CodeGen/CFIFixup.cpp
182

Super nit: if (!HasFrame && Info.HasFrameOnEntry && !Info.StrongNoFrameOnEntry) {

HasFrame represents the previous state and Info.HasFrameOnEntry represents the current state. The order implies a state transition.


The term "Reset" seems to imply transiting to the initial state.

"Transit to" may be better. To describe !HasFrame as well, the comment is better changed to something like Transit from ... state to ... state.

200

Similarly, HasFrame && !Info.HasFrameOnEntry

MaskRay added inline comments.Mar 25 2022, 11:18 PM
llvm/lib/CodeGen/CFIFixup.cpp
29

Does the Info.StrongNoFrameOnEntry semantic drift from the description here? In addition, removing !Info.StrongNoFrameOnEntry && from the .cfi_remember_state code path's condition below doesn't break any test, which indicates missing coverage.

MaskRay added a comment.EditedMar 26 2022, 7:01 PM

there is a single basic block, containing the function prologue

There is the current limitation of LLVM's shrink wrapping.
I think GCC can place a prologue in multiple basic blocks.
It seems that the algorithm can process multiple prologues with some conditions.

If we consider (a) a prologue, (b) an epilogue, or (c) a region delimitered by a pair of prologue/epilogue, as a vertex in a graph,
the algorithm can handle any linear basic block layout which is a pre-order traversal.
(This condition is sufficient but not necessary: if we require that all prologues are the same, more traversal orders can be handled)

Conceptually, the basic blocks can be partitioned into multiple segments where each segment describes a prologue and its associated epilogues/inner regions.
For each segment, the prologue must precede all its associated epilogues.

"prologue and epilogue blocks are outside of any loops"

I think this is naturally satisfied and does not need to be emphasized here.
I think the algorithm can handle prologue and epilogue in a loop (a prologue vertex cannot be recursively entered)

"has a call frame", if the function has executed the prologue, or has not executed any epilogue

or => "... and has not executed any epilogue"

From the point of view of the unwind tables, the "has/does not have call frame" state at beginning of each block is determined by the state at the end of the previous block, in layout order.

"layout order" seems to suggest the linear placement order.
Here we seem to suggest the order related to reachability, but I do not know the appropriate term.


In a function using SP as the CFA, if a call needs to temporarily adjust the stack size (e.g. x86 pushl/pushq; in x86-32 pushl is particular common because of the calling convention), llvm may produce something like .cfi_adjust_cfa_offset {4,8}. Can AArch64 codegen produce something like this? Whose responsibility is to emit a negative .cfi_adjust_cfa_offset to cancel the effect? CFIFixup cannot handle such a case while CFIInstrInserter can handle it.

See test/CodeGen/X86/segmented-stacks-dynamic.ll: rbx may be used as a base register and CFIInstrInserter can insert .cfi_restore %rbx. CFIFixup doesn't support this.

MaskRay added inline comments.Mar 26 2022, 7:27 PM
llvm/lib/CodeGen/CFIFixup.cpp
182

Actually, I think Info.StrongNoFrameOnEntry can just be removed.

Info.HasFrameOnEntry implies that this is a block in "after prologue" state and Info.StrongNoFrameOnEntry must be false.

MaskRay requested changes to this revision.Mar 26 2022, 10:22 PM
MaskRay added inline comments.
llvm/lib/CodeGen/CFIFixup.cpp
99

MF.getNumBlockIDs() may be larger than MF.size()

This revision now requires changes to proceed.Mar 26 2022, 10:22 PM

"layout order" seems to suggest the linear placement order.

That's right, it's the order of increasing instruction addresses, corresponding to the basic block layout and the order of instructions in the basic block.

In a function using SP as the CFA, if a call needs to temporarily adjust the stack size (e.g. x86 pushl/pushq; in x86-32 pushl is particular common
because of the calling convention), llvm may produce something like .cfi_adjust_cfa_offset {4,8}. Can AArch64 codegen produce something like this?

I don't think AArch64 ever generates .cfi_adjust_cfa_offset.

Whose responsibility is to emit a negative .cfi_adjust_cfa_offset to cancel the effect? CFIFixup cannot handle such a case while CFIInstrInserter can handle it.

Indeed, if a backend uses SP as a frame pointer and allocates/deallocates outgoing arguments space in different basic blocks, CFIFixup won't fix it.

See test/CodeGen/X86/segmented-stacks-dynamic.ll: rbx may be used as a base register and CFIInstrInserter can insert .cfi_restore %rbx. CFIFixup doesn't support this.

As far as I can understand this testcase, the blocks containing .cfi_restore %rbx would have StrongNoFrameOnEntry set, which would make CFIFixup
fall into the "reset to initial state".

FWIW, for the epilogue blocks the test, the PEI should have already inserted .cfi_restore for the pops of the CSRs.

"layout order" seems to suggest the linear placement order.

That's right, it's the order of increasing instruction addresses, corresponding to the basic block layout and the order of instructions in the basic block.

OK

In a function using SP as the CFA, if a call needs to temporarily adjust the stack size (e.g. x86 pushl/pushq; in x86-32 pushl is particular common
because of the calling convention), llvm may produce something like .cfi_adjust_cfa_offset {4,8}. Can AArch64 codegen produce something like this?

I don't think AArch64 ever generates .cfi_adjust_cfa_offset.

OK. Agreed according to the call sites of emitCFIAdjustCfaOffset.

Whose responsibility is to emit a negative .cfi_adjust_cfa_offset to cancel the effect? CFIFixup cannot handle such a case while CFIInstrInserter can handle it.

Indeed, if a backend uses SP as a frame pointer and allocates/deallocates outgoing arguments space in different basic blocks, CFIFixup won't fix it.

Agree.

See test/CodeGen/X86/segmented-stacks-dynamic.ll: rbx may be used as a base register and CFIInstrInserter can insert .cfi_restore %rbx. CFIFixup doesn't support this.

As far as I can understand this testcase, the blocks containing .cfi_restore %rbx would have StrongNoFrameOnEntry set, which would make CFIFixup
fall into the "reset to initial state".

FWIW, for the epilogue blocks the test, the PEI should have already inserted .cfi_restore for the pops of the CSRs.

Perhaps StrongNoFrameOnEntry has not clearly been explained:) hence my confusion.

chill updated this revision to Diff 419186.Mar 30 2022, 9:41 AM
chill marked 16 inline comments as done.Mar 30 2022, 9:54 AM
chill added inline comments.
llvm/lib/CodeGen/CFIFixup.cpp
29

There was a bug in the conditions, that I fixed and it triggers in a few of the tests (those with .cfi_same_value appearing)

182

My mental model is that both represent the "opinion" of the unwind tables on one hand (HasFrame), and the "opinion" of the CFG, on the other hand (Info.HasFrameOnEntry), about the current state at entry of the block. CFG is right (as far as we can tell form the available CFI instructions) and we fix the unwind tables.

182

If there are missing unwind instructions in the epilogue, a block can have HasFrameOnEntrySet, even though it's reachable from function entry without passing through a prologue. In this case StroingNoFrameOnEntry will be true. Example of this is llvm/test/CodeGen/AArch64/fast-isel-branch-cond-split.ll

MaskRay added inline comments.Mar 30 2022, 3:34 PM
llvm/lib/CodeGen/CFIFixup.cpp
10

"This pass inserts the necessary .cfi_remember_state/.cfi_remember_state and calls resetCFIToInitialState"

resetCFIToInitialState may insert CFI instructions other than .cfi_remember_state/.cfi_remember_state

28

I think this comment should be moved before StrongNoFrameOnEntry is declared in the code below.

29

Thanks:) I think I fully understand it now.

167

Delete the TODO: I believe this is not a TODO, but an assumption. The stack based remember/restore approach just cannot handle epilogue before prologue in theory.

This is what I wanted to express but probably not in a clear way:
"If we consider (a) a prologue, (b) an epilogue, or (c) a region delimitered by a pair of prologue/epilogue, as a vertex in a graph,
the algorithm can handle any linear basic block layout which is a pre-order traversal.
(This condition is sufficient but not necessary: if we require that all prologues are the same, more traversal orders can be handled)"

llvm/test/CodeGen/AArch64/arm64-shrink-wrapping.ll
29 ↗(On Diff #419186)

Thanks for the update to foo. It makes me understand the purpose of StrongNoFrameOnEntry. Right now there is no epilogue CFI, so BB1's HasFrameOnExit is (incorrectly) false. We rely on StrongNoFrameOnEntry being true to generate .cfi_def_cfa wsp, 0 and .cfi_same_value *.

However, using MF.getFrameInfo().getCalleeSavedInfo(); has the problem that we may add redundant .cfi_same_value. As this test shows, none of w18/w30/w29 is updated in previous blocks but we add them: (a) it unnecessarily increases unwind table size (b) it may confuse a reader.

We may need to check whether the registers have actually been modified by previous CFI instructions, before emitting .cfi_same_value.

MaskRay added inline comments.Mar 30 2022, 3:44 PM
llvm/lib/CodeGen/CFIFixup.cpp
196

It would be nice to have a test that a cfi_remember_state is inserted after a cfi_restore_state.
Right now if I delete ++InsertPt I don't break any test.

chill updated this revision to Diff 419432.Mar 31 2022, 6:51 AM
chill marked 2 inline comments as done.
chill marked 6 inline comments as done.Mar 31 2022, 7:09 AM
chill added inline comments.
llvm/lib/CodeGen/CFIFixup.cpp
28

I'd prefer to have an overall description in one place at the start of the file. It easy to relate the name StrongNoFrameOnEntry to the description.

167

I've moved it to the start of the file. If an actual need appears, I think it's implementable
and the pass need not restrict itself to remember/restore state.

196

cfi-fixup.mir tests this now.

llvm/test/CodeGen/AArch64/arm64-shrink-wrapping.ll
29 ↗(On Diff #419186)

These (except for x18) are not redundant. At LBB0_2 the unwind info says the CSR are on the stack, but they aren't there.

x18 is because of it's potential use as a shadow call stack pointer, I've fixed the code to emit .cfi_same_value x18 only if the function indeed uses shadow stack.

MaskRay accepted this revision.Mar 31 2022, 11:01 PM

Looks great! Thanks again for the hard work! And sorry again for being so slow to get to this important pass :-(

llvm/include/llvm/CodeGen/CFIFixup.h
2

"C++" is missing. Hyphens on the left are too long. See https://llvm.org/docs/CodingStandards.html#file-headers

llvm/lib/CodeGen/CFIFixup.cpp
167

If an epilogue proceeds the associated prologue, the pass needs to copy the prologue state and remember/restore cannot be used. This is a case I'd call an assumption instead of a TODO since implementing it seems has nearly no advantage, at the cost of large complexity. OK, you make the call whether the TODO should remain here:)

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
301

Optional: wonder whether this can be avoided by moving AArch64FrameLowering::resetCFIToInitialState after the body of needsShadowCallStackPrologueEpilogue

llvm/test/CodeGen/AArch64/arm64-shrink-wrapping.ll
29 ↗(On Diff #419186)

Thanks. I did not notice .cfi_offset w30 yesterday. My bad. Seems that there is some mechanism to not emit .cfi_same_value for unreferenced registers. Sounds good.

This revision is now accepted and ready to land.Mar 31 2022, 11:01 PM
chill updated this revision to Diff 419689.Apr 1 2022, 2:46 AM
chill marked 4 inline comments as done.
chill marked 4 inline comments as done.Apr 1 2022, 2:49 AM
chill added inline comments.
llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
301

I'd prefer various functions, which deal with CFI emission to be closer together.

chill marked an inline comment as done.Apr 4 2022, 2:00 AM

Looks great! Thanks again for the hard work! And sorry again for being so slow to get to this important pass :-(

Thanks, much appreciated!

This revision was landed with ongoing or failed builds.Apr 4 2022, 6:38 AM
This revision was automatically updated to reflect the committed changes.

Hi I have temporarily reverted this change as it has caused LLVM buildbot failures with failing fuzzer-threaded.test with clang crashing over an assertion failure in llvm/lib/CodeGen/CFIFixup.cpp. Kindly look into it. Thanks
.

chill reopened this revision.Apr 5 2022, 9:48 AM
This revision is now accepted and ready to land.Apr 5 2022, 9:48 AM
chill added a comment.Apr 5 2022, 9:54 AM

Hi I have temporarily reverted this change as it has caused LLVM buildbot failures with failing fuzzer-threaded.test with clang crashing over an assertion failure in llvm/lib/CodeGen/CFIFixup.cpp. Kindly look into it. Thanks
.

Thanks. From what I can tell the reason is the test ends up with unreachable block(s) for which the invariants we maintain do not hold. I'll make modification to ignore those blocks.
The block(s) have their address taken (recorded in the "sancov_pcs" section), but no predecessors.

chill planned changes to this revision.Apr 5 2022, 9:54 AM
chill updated this revision to Diff 420877.Apr 6 2022, 8:23 AM

Updated to exclude unreachable blocks.

This revision is now accepted and ready to land.Apr 6 2022, 8:23 AM
chill requested review of this revision.Apr 6 2022, 8:23 AM
chill added a reviewer: omjavaid.
MaskRay accepted this revision.Apr 7 2022, 8:26 PM
MaskRay added inline comments.
llvm/lib/CodeGen/CFIFixup.cpp
180
This revision is now accepted and ready to land.Apr 7 2022, 8:26 PM
chill updated this revision to Diff 421870.Apr 11 2022, 3:47 AM
chill marked an inline comment as done.
This revision was landed with ongoing or failed builds.Apr 11 2022, 5:36 AM
This revision was automatically updated to reflect the committed changes.