Page MenuHomePhabricator

Correct dwarf unwind information in function epilogue

Authored by violetav on Jul 25 2017, 9:17 AM.



This patch aims to provide correct dwarf unwind information in function epilogue for X86.
It consists of two parts. The first part inserts CFI instructions that set appropriate cfa offset and cfa register in emitEpilogue() in X86FrameLowering. This part is X86 specific.

The second part is platform independent and ensures that:

  • CFI instructions do not affect code generation
  • Unwind information remains correct when a function is modified by different passes. This is done in a late pass by analyzing information about cfa offset and cfa register in BBs and inserting additional CFI directives where necessary.

Changed CFI instructions so that they:

  • are duplicable
  • are not counted as instructions when tail duplicating or tail merging
  • can be compared as equal

Added CFIInstrInserter pass:

  • analyzes each basic block to determine cfa offset and register valid at its entry and exit
  • verifies that outgoing cfa offset and register of predecessor blocks match incoming values of their successors
  • inserts additional CFI directives at basic block beginning to correct the rule for calculating CFA

Having CFI instructions in function epilogue can cause incorrect CFA calculation rule for some basic blocks. This can happen if, due to basic block reordering, or the existence of multiple epilogue blocks, some of the blocks have wrong cfa offset and register values set by the epilogue block above them.
CFIInstrInserter is currently run only on X86, but can be used by any target that implements support for adding CFI instructions in epilogue.

Diff Detail


Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

This patch has been reviewed in D18046 and commited. It was reverted because it caused a number of sanitizer test failures on PPC64 buildbot, and also, it reported incorrect cfa offset values in recursive LLVM builds. These issues have been resolved in this patch.

The following was changed:

Changed CFIInfoVerifier to report errors if it finds incorrect offset/register (not just to print them out).
Added support for unhandled cases that were the cause of incorrect offset values in recursive builds.
There were two issues.
The first one is the issue of a MBB having an adjust_cfa_offset and being its own successor. Because of this, the mechanism for updating CFI info was changed. Added AdjustIncomingCFAOffset and AdjustOutgoingCFAOffset information to a basic block. These values present the amount of adjustment (originating from adjust_cfa_offset directives) in effect at basic block entry and exit. These values get updated during different passes. IncomingCFAOffset and OutgoingCFAOffset now contain only values set by CFI instructions that set absolute offset (def_cfa, def_cfa_offset). In CFIInfoVerifier adjust in/out cfa offset values are added to in/out cfa offsets before checking this information for correctness.
The second issue was the case of creating a new MBB in SplitCriticalEdge() and not updating its CFI info.
Added maintainsCFIInfo() in TargetFrameLowering so that CFIInfoVerifier and CFIInstrInserter run only for targets that maintain CFI information (only X86 for now).

While trying to make this work with AArch64 and add the support for .cfi_offset / .cfi_restore directives, I came across a few questions and comments:

  • Do you plan on adding MIR support? I guess you would have to (de)serialize the state that was added in MachineBasicBlock.h.
  • This pass is required to have correct unwind information, but it's disabled on Darwin, because of compact unwinding. Compact unwinding has a mode where it can fallback on DWARF CFI (X86, AArch64), and it is used for cases where the information can't be encoded in that format. I guess this can take a similar path.
1489 ↗(On Diff #108098)

SmallPtrSet<MachineBasicBlock *> ?

1546 ↗(On Diff #108098)

SmallPtrSet<MachineBasicBlock *> ?

241 ↗(On Diff #108098)

Looks like this is already called in PEI::insertPrologEpilogCode, which sounds like a better idea than having it in target specific passes.

Do you plan on adding MIR support?

Not at the moment.

This pass is required to have correct unwind information, but it's disabled on Darwin, because of compact unwinding. Compact unwinding has a mode where it can fallback on DWARF CFI (X86, AArch64), and it is used for cases where the information can't be encoded in that format. I guess this can take a similar path.

When CFI instructions are inserted in epilogue, wrong compact unwind encoding is generated. I have previously posted a question to llvm-dev about this and received an answer that "Darwin compact unwinder is never going to learn anything to its benefit from a CFI instruction in the epilogue". I have tried the suggested approach (marking CFI instructions with FrameSetup/FrameDestroy, and then disregarding the ones from epilogue, marked with FrameDestroy). However, there was a problem when you wanted to generate compact unwind section from an existing .s file; there is no info about FrameSetup/FrameDestroy flags, so again you end up generating the wrong encoding.
In the end I disabled this support for Darwin.

I am not sure what you are suggesting here. Doesn't that mean that compact unwind would practically be disabled (would always fallback to DWARF)?

241 ↗(On Diff #108098)

You are right. This was needed for previous version of the patch, and I haven't removed it. This pass runs before PEI and needed to have initial values set, because cfa offset and cfa adjustment were both contained in cfa offset field (in the previous patch). Now that they are apart, and because this pass adds only adjust_cfa_offset directives, this initialization is no longer needed here; absolute cfa offsets can be initialized in PEI.

I am not sure what you are suggesting here. Doesn't that mean that compact unwind would practically be disabled (would always fallback to DWARF)?

The difference here is that your pass is not even running if the target is Darwin. Compact unwinding is generated from CFI. The CFI generated without your pass can be incorrect, so the compact unwinding will be too.

What I'm suggesting is that when we have extra CFI directives that can't be encoded in the compact unwinding format, we use the fallback on DWARF (as we already do in other cases). That would generate correct unwinding information (since your pass fixed it), but won't use compact unwinding.

Let me know if there's something I missed here.

violetav updated this revision to Diff 109065.Aug 1 2017, 2:49 AM

Removed unnecessary call to initializeCFIInfo() in X86CallFrameOptimization.cpp.
Using SmallPtrSet instead of std::set in MachineBasicBlock.cpp.

iteratee edited edge metadata.Aug 3 2017, 1:06 PM

Can you provide a description of what you had to change relative to the rollback, and how you're verifying that the issue that caused the rollback has been fixed?

petarj added a comment.Aug 3 2017, 2:22 PM

Can you provide a description of what you had to change relative to the rollback

Violeta has provided that information in the first comment.

and how you're verifying that the issue that caused the rollback has been fixed?

She reproduced all the issues, and verified that none of these occur with this version of the patch.

Hi Kyle,
Here are a few more details, on top of what Petar said:

So, there were a few issues, the first one being that some sanitizer tests on PPC64 were failing due to this change (in stage 2). I saw that the CFIInfoVerifier reported incorrect offsets during stage2 build for PPC, and realised what was causing this. Even if the values for cfa offset and register are not initialized for platforms other that X86, cfa offset value can get set in recalculateCFIInfo(), so it doesn't remain uninitialized. I confirmed that CFIInstrInserter and CFIInfoVerifier were causing the failures on PPC, and not the changes made to CFI instructions in general. So, I added a target hook so that these two passes run only for platforms that maintain CFI info. This resolved the issue for PPC.
The second was the group of issues that caused incorrect offset reports with recursive LLVM builds (which I unfortunately didn't test before). I started debugging the example that Eli reported: This example had two issues, which I have described in the first comment. Resolving these two issues resulted in successfully building the CGStmtOpenMP.cpp.o example. I have also changed the CFIInfoVerifier to emit errors as Eli suggested, and not just to print them out. After that, I have built clang+llvm+compiler-rt+lld, ran make check-all, did a recursive build, ran make check with that, ran the llvm-test suite, and got no failures.

iteratee added inline comments.Aug 4 2017, 1:16 PM
781 ↗(On Diff #109065)

I don't follow the comment about why these are necessary again.
Can you elaborate?

772 ↗(On Diff #109065)

I thought part of the problem was that these passes should not be run generally.

MatzeB edited edge metadata.Aug 4 2017, 4:52 PM

In order to avoid the discussion being split between two places I added some comments to D18046

violetav added inline comments.Aug 17 2017, 8:37 AM
781 ↗(On Diff #109065)

I will try. So, let's say there is a function that looks like this:

  BB_1 <-----|
  /  \       |
BB_2 BB_3    |
       |     |
     BB_4    |
       |     |
     BB_5 ___|

and that the following is true:

  1. BB_1 has following cfi directives:


.cfi_adjust_cfa_offset 4
.cfi_adjust_cfa_offset -4
  1. BB_4 has the same:


.cfi_adjust_cfa_offset 4
.cfi_adjust_cfa_offset -4
  1. BB_1 is a successor of BB_5 (if it's not clear from the 'illustration', BB_0 has one successor, BB_1 has two, and BB_3,BB_4,BB_5 each have one).
  1. For this example, first the cfi_adjust_cfa directives with positive values are inserted (in both BB_1 and BB_4) and then the ones with negative values.

What happens here with the previous approach is that, although BB_1 and BB_4 do not change cfa offset (their incoming and outgoing offsets are the same), they and their successors end up having the wrong offset set.

This was the way that in/out offset was set to successors in the previous approach (from updateCFIInfoSucc()):

// the difference remains the same, calculate it by using previous in/out offset values
int AdjustAmount = Succ->getOutgoingCFAOffset() - Succ->getIncomingCFAOffset();
// set outgoing offset to be sum of the outgoing offset of the predecessor and the amount that this successor adjusts the offset itself (with adjust_cfa_directives)
Succ->setOutgoingCFAOffset(CurrSucc->getOutgoingCFAOffset() + AdjustAmount);
// set incoming offset to be the outgoing offset of the predecessor

Let's say that all blocks have in/out offset set to zero at the beginning (just to simplify the numbers).
When first 'adjust_cfa_offset 4' is inserted in BB_1, after updating the successors, the in/out cfa offsets of BBs in the function look like this:

BB_0, in 0, out 0
BB_1, in 0, out 4
BB_2, in 4, out 4
BB_3, in 4, out 4
BB_4, in 4, out 4
BB_5, in 4, out 4

Next, when 'adjust_cfa_offset 4' is inserted in BB_4:

BB_0, in 0, out 0
BB_1, in 8, out 12
BB_2, in 12, out 12
BB_3, in 12, out 12
BB_4, in 4, out 8
BB_5, in 8, out 8

Next, when 'adjust_cfa_offset -4' is inserted in BB_1:

BB_0, in 0, out 0
BB_1, in 8, out 8
BB_2, in 8, out 8
BB_3, in 8, out 8
BB_4, in 8, out 12
BB_5, in 12, out 12

Next, when 'adjust_cfa_offset -4' is inserted in BB_4:

BB_0, in 0, out 0
BB_1, in 8, out 8
BB_2, in 8, out 8
BB_3, in 8, out 8
BB_4, in 8, out 8
BB_5, in 8, out 8

And then you end up having wrong in/out cfa offset for blocks 1-5 because the adjustments are contained in OutgoingCFAOffset which is the value that gets propagated further. That is what is propagated to the successors, instead of the increment/decrement from adjust directives. This way of calculating in/out cfa offset values produces wrong results for the case where you have a cycle (like in the example above).

What is changed now is that each 'adjust_cfa_offset' directive is propagated to the successors when it's inserted and updates their incoming and outgoing adjust values appropriately. In the end, before verifying CFI info and inserting additional CFI instructions if needed, these values are added to incoming and outgoing cfa offset (that now contain only absolute offsets that are set).

772 ↗(On Diff #109065)

I put the added target hook in the passes themselves. Should I move it to TargetPassConfig?

thanm added a subscriber: thanm.Sep 27 2017, 7:57 AM
cherry added a subscriber: cherry.Sep 27 2017, 12:01 PM
thanm added a comment.Sep 28 2017, 2:56 PM

Remark: there are several outstanding LLVM bugs for this problem:

bug 25916 filed in 2015 and
bug 20774 filed in 2014
bug 13161 filed way back in 2012

My team also ran into the same problem this month. Looking in the Google internal bug tracker I can find reports there too.

It would be great to get this bugfix in -- seems as though it has caused problems for many folks over the years.

violetav updated this revision to Diff 117536.Oct 3 2017, 8:47 AM

Here is the new patch written based on the comments received and the previously implemented solution.
What is changed:

  • Removed all changes from MachineBasicBlock.
  • Removed changes related to updating in/out CFI information from common passes (BranchFolding.cpp and TailDuplicator.cpp).
  • Merged CFIInfoVerifier and CFIInstrInserter into one pass that now calculates in/out CFI info for BBs, verifies it and inserts additional CFI instructions if needed. This pass isn't added in TargetPassConfig.cpp if the target does not maintain CFA info.

These are the tests that still haven't been updated (and that currently fail with this patch):


They need to have appropriate .cfi directives added.
I will add these changes when this patch gets approved.

aprantl added inline comments.Oct 3 2017, 9:03 AM
346 ↗(On Diff #117536)

/// please

64 ↗(On Diff #117536)

please use /// for all comments describing declarations.

Hi Violeta,

Thanks a lot for working on this! This looks really nice! Seems a lot easier to adapt to other targets and other CFI state like CSRs.

33 ↗(On Diff #117536)

I think most of the members can be private here.

53 ↗(On Diff #117536)

Should this be guarded by the EXPENSIVE_CHECKS macro?

62 ↗(On Diff #117536)

In LLVM we usually use

struct MBBCFAInfo {
87 ↗(On Diff #117536)

Why not a reference instead of a pointer? It can never be null, right?

131 ↗(On Diff #117536)

No need for struct here.

156 ↗(On Diff #117536)

You can already initialize the variables here.

193 ↗(On Diff #117536)

You could limit the scope of SuccInfo to the loop body.

254 ↗(On Diff #117536)

I think you can assume the reference is always non-null here, right?

Thanks @violetav this looks a lot better as a self-contained pass!

795 ↗(On Diff #117536)

This name is unintuitive to me. And it feels like this isn't really a category of machine instructions but rather just a combination that BranchFolding is able to deal with after this patch.

Maybe it would be better to move this into BranchFolding.cpp then in the form of
static bool isDirective(const MachineInstr &MI) { return MI.isDebugValue() || MI.isCFIInstruction(); }.

420–424 ↗(On Diff #117536)

Even though there is a lot of precedent in this file, I'm starting to wonder how useful it is to have a description of a pass at the beginning of the passes implementation file and another one in front of the corresponding function in Passes.h. Maybe it would be best to shorten this to:

/// Creates CFI Instruction Inserter pass. \see CFIInstrInserter.cpp

so we only have to maintain one description of the passes details.

345–347 ↗(On Diff #117536)
  • Use doxygen /// comments.
  • I guess const MachineFunction &MF should work as well?
349–351 ↗(On Diff #117536)


268–272 ↗(On Diff #117536)

I think we should rather add a TargetPassConfig method instead of another TargetMachine callback (see below).

10–18 ↗(On Diff #117536)

Use doxygen comments:

/// \file This pass verifies ...
35 ↗(On Diff #117536)

This feels slightly out of place as a member variable. Why not make it a return value of verify()?

53 ↗(On Diff #117536)

I wouldn't go as far as EXPENSIVE_CHECKS. But I think #ifndef NDEBUG would be good to skip the verification in release builds.

77–79 ↗(On Diff #117536)

MachineBasicBlocks have a dense numbering, so you can use a more efficient std::vector of length MachineFunction::getNumBlockIDs() instead. (See MachineTraceMetrics::BlockInfo for an example)

114 ↗(On Diff #117536)

We tend to use lowercase pass names here. So maybe "cfi-instr-inserter"?

130 ↗(On Diff #117536)

Please avoid auto in cases where the actual type is only slightly longer. I think that is friendlier to readers of the code. (same for a number of other loops below).

164–165 ↗(On Diff #117536)

You could use for (MachineInstr &MI : make_range(MBBInfo->MBB->instr_begin(), MBBInfo->MBB->instr_end()) { ... } here.

205 ↗(On Diff #117536)

You could move this declaration down to the use (and combine it with the first assignment).

254 ↗(On Diff #117536)

References must be non-null otherwise it's undefined behavior. This means the compiler will most likely optimize this assert away and it won't do anything even if the reference happened to illegally be nullptr.

266 ↗(On Diff #117536)

The declarations could be move downwards to their first use.

331–347 ↗(On Diff #117536)

This looks like a good candidate for a switch().

109–112 ↗(On Diff #117536)

I'd simply use llvm_unreachable("getInitialCFAOffset not implemented!"); and don't even bother checking the flag. Same in the next function.

877–881 ↗(On Diff #117536)
  • I'd vote to remove the maintainsCFAInfo() flag and simply let the targets that need it add the pass as part of addPreEmitPass().
  • Is the , false really necessary? I wouldn't expect the pass to produce anything that fails the machine verifier.
violetav updated this revision to Diff 117836.Oct 5 2017, 9:20 AM

Addressed comments. Added changes for tests that were previously failing.

violetav marked 23 inline comments as done.Oct 5 2017, 9:30 AM
violetav added inline comments.
795 ↗(On Diff #117536)

This is used in BranchFolding and TailDuplication. It was suggested previously to be done this way

MatzeB added a comment.Oct 5 2017, 9:48 AM

This is starting to look good, a few more nitpicks below and hopefullly @iteratee will comment on the isDirective() thing.

795 ↗(On Diff #117536)

@iteratee Have you seen isMetaInstruction() and isTransient()? This feels like yet another variation of machine instructions that don't produce code. However I don't know why this specific combination is interesting and the generic name and no comment doesn't help understanding that either.

333–353 ↗(On Diff #117836)

Keep case/default at the same indentation level as the switch.

354–355 ↗(On Diff #117836)

As there are only 2 or 3 cases that are not handled yet, I would recommend to explicitely add cases for them and not add a default: at all. The benefit of this is that when someone adds new OpTypes in the future he will immediately get a compiler warning for a missing case when he forgets to add a case here.

110 ↗(On Diff #117836)

llvm_unreachable never returns so you can leave out the return statement behind it. Same in the next function.

iteratee added inline comments.Oct 5 2017, 1:32 PM
795 ↗(On Diff #117536)

I hadn't.
Violeta, are there any reasons those 2 categories won't work?

thegameg added inline comments.Oct 5 2017, 2:35 PM
202 ↗(On Diff #117836)

This could be a const MBBCFAInfo * to avoid copies.

210 ↗(On Diff #117836)

And this const MBBCFAInfo &.

262 ↗(On Diff #117836)


264 ↗(On Diff #117836)


292 ↗(On Diff #117836)


violetav updated this revision to Diff 118012.Oct 6 2017, 8:58 AM

Addressed comments.

violetav marked 8 inline comments as done.Oct 6 2017, 9:02 AM
violetav added inline comments.
795 ↗(On Diff #117536)

Four regression tests fail when isDirective() is replaced with isMetaInstruction():

Different code gets generated because of some tail merging and tail duplicating that don't happen with clean version of the code. At first glance, it seems that presence of IMPLICIT_DEF affects BranchFolding, and presence of KILL affects TailDuplicator. I haven't looked at this further than comparing .s files.

iteratee added inline comments.Oct 6 2017, 10:09 AM
795 ↗(On Diff #117536)


For Tail duplication I'm certain that it's fine. It's just counting instructions, and so it should skip all the meta instructions.

If you post the diff for BranchFolding, I'll look at it. I'm guessing it's fine too, but I'm less certain.

It would be easier if you did that first in a separate patch.

violetav added inline comments.Oct 10 2017, 10:13 AM
795 ↗(On Diff #117536)


Other than different code being generated in some tests, there are at least two problems when isMetaInstruction() is used instead of isDirective() in BranchFolding.
First of all, even without this patch (only when places where there's isDirective() in it are replaced with isMetaInstruction()) there is a test that fails with 'Bad machine code' errors. That's test/CodeGen/PowerPC/extra-toc-reg-deps.ll and it reports:

Bad machine code: MBB has more than one landing pad successor ***
function: _ZN4Foam13checkTopologyERKNS_8polyMeshEbb
basic block: BB#49 lpad175 (0x16665d0)

Bad machine code: MBB exits via unconditional branch but doesn't have exactly one CFG successor! ***
function: _ZN4Foam13checkTopologyERKNS_8polyMeshEbb
basic block: BB#49 lpad175 (0x16665d0)

Bad machine code: MBB has more than one landing pad successor ***
function: _ZN4Foam13checkTopologyERKNS_8polyMeshEbb
basic block: BB#53 lpad230 (0x1666da0)

Bad machine code: MBB exits via unconditional branch but doesn't have exactly one CFG successor! ***
function: _ZN4Foam13checkTopologyERKNS_8polyMeshEbb
basic block: BB#53 lpad230 (0x1666da0)

Bad machine code: MBB has more than one landing pad successor ***
function: _ZN4Foam13checkTopologyERKNS_8polyMeshEbb
basic block: BB#63 _ZN4Foam4ListIiED2Ev.exit.i3073 (0x1668510)

Bad machine code: MBB exits via unconditional branch but doesn't have exactly one CFG successor! ***
function: _ZN4Foam13checkTopologyERKNS_8polyMeshEbb
basic block: BB#63 _ZN4Foam4ListIiED2Ev.exit.i3073 (0x1668510)

Bad machine code: MBB has more than one landing pad successor ***
function: _ZN4Foam13checkTopologyERKNS_8polyMeshEbb
basic block: BB#70 lpad905.loopexit.split-lp (0x1668e00)

Bad machine code: MBB exits via unconditional branch but doesn't have exactly one CFG successor! ***
function: _ZN4Foam13checkTopologyERKNS_8polyMeshEbb
basic block: BB#70 lpad905.loopexit.split-lp (0x1668e00)
LLVM ERROR: Found 8 machine code errors.

The second problem is seen in test/CodeGen/X86/conditional-tailcall.ll, which fails with following error:

The incoming offset of a successor is inconsistent. ***
function: pr31257
basic block: BB#12 if.else28 (0x903c28)
Successor BB#14 has incoming offset (32), while BB#12 has outgoing offset (48).

The outgoing offset of a predecessor is inconsistent. ***
function: pr31257
basic block: BB#14 cleanup.thread (0x903e98)
Predecessor BB#12 has outgoing offset (48), while BB#14 has incoming offset (32).
LLVM ERROR: Found 2 in/out CFI information errors.

What happens here is that a case appears where an iterator that points to the start of the common tail, points to a CFI instruction, and then that CFI instruction gets cut off, with a jump to the common tail, and gets lost. This then results in setting wrong outgoing CFA offset for that block (because one CFI instruction is missing), and the CFI Info Verifier reports a mismatch between outgoing offset of that block and incoming offset of its successor.

Errors about incorrect cfa offset are also reported with recursive LLVM build.

Considering these failures, I am not sure that isMetaInstruction() is safe to use here. I am not familiar with how these types of instructions should be handled.

It makes sense that isDirective() method is not added to MachineInstr, as it is used only for this purpose. If isMetaInstruction() turns out to be unsuitable for use here, we could, as Matthias suggested, add something such as shouldSkipInstr(MI) to BranchFolding and TailDuplication.

Is there anything else that should be done for this patch?

Is there anything else that should be done for this patch?

Could you try changing it to not add MachineInstr::isDirective()? The rest looked good.

Could you try changing it to not add MachineInstr::isDirective()?

Would changing the name 'isDirective()' into something more sensible and adding a description be acceptable?
Or maybe moving it to BranchFolding and using isMetaInstruction() in TailDuplication?
Or leaving the 'isDebugValue() || isCFIInstruction()' condition without adding a special method for this particular case?

I know that isDirective() would currently be used only in BranchFolding, but I suspect that it will also need to be used elsewhere when support for CFI instructions in epilogue is added to other targets.

violetav updated this revision to Diff 119589.Oct 19 2017, 9:25 AM

Removed MachineInstr::isDirective(). Added countsAsInstruction(const MachineInstr &MI) in BranchFolding. Replaced isDirective() with isMetaInstruction() in TailDuplication.

rnk edited edge metadata.Oct 23 2017, 5:57 PM

I'd like to see this go in so we can iterate on it in tree.

92 ↗(On Diff #119589)

These are gone now. You will need to rebase this patch.

MatzeB accepted this revision.Oct 23 2017, 10:37 PM

LGTM, thanks!

This revision is now accepted and ready to land.Oct 23 2017, 10:37 PM
violetav updated this revision to Diff 121130.Nov 1 2017, 8:13 AM
violetav edited the summary of this revision. (Show Details)

Rebased the patch.
Thank you all for the comments!

This revision was automatically updated to reflect the committed changes.

As this change was reverted in r317726, is the way to solve this issue to leave CFI_INSTRUCTION to be notDuplicable, and allow TailDuplication to generate different code when cfi_directives are present (e.g. not to duplicate epilogue blocks, if they contain cfi_directives) ?