This is an archive of the discontinued LLVM Phabricator instance.

[DWARF] Allow duplication of tails with CFI instructions
ClosedPublic

Authored by djokov on Dec 7 2017, 12:00 PM.

Details

Summary

This commit came as a result for revert of patch r317579 (originally committed as r317100). The patch made CFI instructions duplicable, because their existence in the epilogue block was affecting the Tail duplication pass. However, duplicating blocks with CFI instructions was an issue for compact unwind info on Darwin, which is why the patch was reverted.

This patch allows duplicating tails with CFI instructions, though they are not duplicable, by copying them 'manually'.

Diff Detail

Repository
rL LLVM

Event Timeline

djokov created this revision.Dec 7 2017, 12:00 PM
MatzeB added a comment.EditedDec 14 2017, 11:17 AM

I haven't worked much with CFI instruction before, but seeing this it appears the only reason they are marked as NotDuplicable is because they use this strange system of referencing the FrameInstructions side table in the MachineFunction instead of encoding the information with normal machine operand.

But if that is really all there is to it then I would rather see us implement this logic in the MachineInstr::MachineInstr(MachineFunction &MF, const MachineInstr &MI) constructor, set isNotDuplicable=0 for CFI_INSTRUCTION.
That avoids all the special casing and allows us to use the normal MachineFunction::CloneMachineInstruction()/TargetInstrInfo::duplicate() methods.

lib/CodeGen/TailDuplicator.cpp
370–372 ↗(On Diff #126014)

Use references instead of pointers for things that mustn't be nullptr.

379–411 ↗(On Diff #126014)

Why not implement a copy constructor for MCCFIInstruction instead?

I just see that you mention compact unwind being in the way of duplicating CFI instructions. I wonder if there isn't something that can be done there, I'll research that a bit.

I think the bug described by @rnk in r317726 is still going to trigger with this patch.

The problem with compact unwind is that its purpose is to encode the unwind info for the whole function. This happens in AArch64AsmBackend.cpp, in the function generateCompactUnwindEncoding, where it tries to interpret the .cfi_ directives and encode them to a specific format. Not all functions can be represented in AArch64's compact unwinding. Things like stack adjustments > 65520, .cfi_offset directives that don't come up as a save of pairs, and mostly everything that makes generateCompactUnwindEncoding return CU::UNWIND_ARM64_MODE_DWARF fall back to using regular DWARF for unwind info.

After we allow the duplication of CFI_INSTRUCTIONS, the TailDuplicator that runs in the BasicBlockPlacement pass duplicates a shrink-wrapped function prologue, and we end up with two prologues instead of one, which is not supported by compact unwinding (I am even wondering if any backtracer / debugger supports this). I am also wondering if we prevent a shrink-wrapped prologue to be duplicated in any way, or we're just lucky that it never happens because things like CFI_INSTRUCTIONS stay in its way. Just by looking at the code it seems that duplicating instructions with the FrameSetup flag is not prohibited...

In cases like this, we can probably teach the compact unwind generator to just fallback on DWARF for multiple prologues instead of asserting, even though I certainly say unwinders will act correctly on it.

The idea behind this code is to fix Linux (which uses DWARF) and leave Darwin (which uses compact-unwind) as-is. That's why nonDuplicableInstruction returns true if (MBB->getParent()->getTarget().getTargetTriple().isOSDarwin()).

AArch64's generateCompactUnwindEncoding is part of Darwin's backend so it is not triggered with this change.

If we fix generateCompactUnwindEncoding(Impl) in ARM, AArch64 and x86 backends to fallback to DWARF when the info cannot fit into compact encoding, this change might be unnecessary.
Though even if we do that, I wonder whether we should leave the choice to users that they can require compact encoding even if that affects code generation (for example when object file size matters).

BTW, the test in this patch came up from Reid's example attached to r317726.

The idea behind this code is to fix Linux (which uses DWARF) and leave Darwin (which uses compact-unwind) as-is. That's why nonDuplicableInstruction returns true if (MBB->getParent()->getTarget().getTargetTriple().isOSDarwin()).

Oh, I'm sorry, I misunderstood your intent.

AArch64's generateCompactUnwindEncoding is part of Darwin's backend so it is not triggered with this change.

I agree.

If we fix generateCompactUnwindEncoding(Impl) in ARM, AArch64 and x86 backends to fallback to DWARF when the info cannot fit into compact encoding, this change might be unnecessary.

I agree.

Though even if we do that, I wonder whether we should leave the choice to users that they can require compact encoding even if that affects code generation (for example when object file size matters).

I think that is also a concern of emitting "correct" cfi directives in epilogues, right? Even on Linux it would have some size impact (and runtime slowdown in the unwinder) because we have more directives.

Ok, then I would suggest that instead of this code we

  1. make CFI instructions duplicable;
  2. let TailDuplicator::shouldTailDuplicate return false for CFI and Darwin,

or better yet -

  1. fix generateCompactUnwindEncoding in ARM, AArch64 and x86 backends to fallback to DWARF when the info cannot fit into compact encoding.

Still, we should probably use 3. only for corner cases like Reid's example.

If I'm not missing something, if we used it for CFI instructions in epilogue (which will start appearing when Violeta resubmits r317579) then practically every function would fallback to DWARF so compact unwind would become quite meaningless.

I think that is also a concern of emitting "correct" cfi directives in epilogues, right? Even on Linux it would have some size impact (and runtime slowdown in the unwinder) because we have more directives.

I guess emitting CFI directives in epilogues won't impact file size on Linux because it already uses DWARF, but it would impact it on Darwin if compact unwinder would fallback to DWARF for every function.

Ok I didn't realize where/how our code fails to deal with duplicated CFI.

Ok, then I would suggest that instead of this code we

  1. make CFI instructions duplicable;

I would advise to keep the NotDuplicable instruction attribute for now as long as our code cannot handle it. But moving the duplication logic into MachineInstr/MCCFIInstruction is a good idea anyway.

  1. let TailDuplicator::shouldTailDuplicate return false for CFI and Darwin,

or better yet -

  1. fix generateCompactUnwindEncoding in ARM, AArch64 and x86 backends to fallback to DWARF when the info cannot fit into compact encoding.

Still, we should probably use 3. only for corner cases like Reid's example.

If I'm not missing something, if we used it for CFI instructions in epilogue (which will start appearing when Violeta resubmits r317579) then practically every function would fallback to DWARF so compact unwind would become quite meaningless.

Hmm yeah, I guess compact unwind is important for code size. So we would prefer to stay with the current behavior on darwin as you did with this patch. (Though I've never seen actual numbers of how big an impact it has).

Hmm yeah, I guess compact unwind is important for code size. So we would prefer to stay with the current behavior on darwin as you did with this patch. (Though I've never seen actual numbers of how big an impact it has).

I compiled the test suite + SPEC for AArch64 -Os. The diff is between trunk and trunk with generateCompactUnwinding returning UNWIND_ARM64_MODE_DWARF if there are any CFI directives.

Program                                       diff

cfp2000
183.equake                                    30.9%
188.ammp                                      11.5%
177.mesa                                      6.8%

cfp2006
433.milc                                      10.9%
482.sphinx3                                   8.1%
450.soplex                                    7.9%
447.dealII                                    7.1%
453.povray                                    7.0%

cint2000
197.parser                                    10.3%
254.gap                                       9.9%
255.vortex                                    8.1%
253.perlbmk                                   8.0%
300.twolf                                     6.9%
176.gcc                                       5.7%

cint2006
462.libquantum                                22.5%
471.omnetpp                                   11.9%
456.hmmer                                     11.2%
483.xalancbmk                                 10.2%
400.perlbench                                 7.6%
403.gcc                                       6.4%
464.h264ref                                   6.1%
445.gobmk                                     3.0%

cint95
132.ijpeg                                     17.6%
130.li                                        16.4%
124.m88ksim                                   8.9%
147.vortex                                    8.1%
134.perl                                      5.7%
099.go                                        4.8%

test_suite
SingleSour...nchmarks/Adobe-C++/loop_unroll   81.5%
MultiSourc...enchmarks/VersaBench/dbms/dbms   30.1%
SingleSour...arks/Adobe-C++/stepanov_vector   29.0%
MultiSourc.../Reductions-flt/Reductions-flt   24.2%
SingleSour.../simple_types_constant_folding   23.8%
MultiSourc...nchmarks/FreeBench/pifft/pifft   23.5%
MultiSourc...ALAC/encode/alacconvert-encode   21.9%
MultiSourc...ALAC/decode/alacconvert-decode   21.9%
MultiSource/Applications/siod/siod            20.2%
MultiSourc...plications/lambda-0.1.3/lambda   20.0%
MultiSource/Benchmarks/Ptrdist/bc/bc          17.9%
SingleSour...++/simple_types_loop_invariant   14.3%
MultiSourc...Benchmarks/7zip/7zip-benchmark   14.1%
MultiSource/Benchmarks/PAQ8p/paq8p            11.8%
MultiSourc...OE-ProxyApps-C++/miniFE/miniFE   11.0%
MultiSource/Applications/SPASS/SPASS          10.9%
MultiSourc...ch/consumer-jpeg/consumer-jpeg   10.0%
MultiSourc...-ProxyApps-C++/PENNANT/PENNANT    9.8%
MultiSourc.../mediabench/jpeg/jpeg-6a/cjpeg    9.1%
MultiSourc.../MallocBench/espresso/espresso    8.9%
MultiSource/Applications/lua/lua               8.6%
MultiSourc...e/Applications/sqlite3/sqlite3    8.3%
MultiSourc...ks/ASCI_Purple/SMG2000/smg2000    7.9%
MultiSourc...e/Benchmarks/MallocBench/gs/gs    7.2%
MultiSource/Applications/kimwitu++/kc          7.2%
MultiSourc...ch/consumer-lame/consumer-lame    6.9%
MultiSource/Applications/d/make_dparser        5.8%
MultiSourc.../Applications/JM/ldecod/ldecod    5.0%
MultiSourc...e/Applications/ClamAV/clamscan    4.9%
MultiSourc...enchmarks/mafft/pairlocalalign    4.5%
MultiSourc.../DOE-ProxyApps-C++/CLAMR/CLAMR    4.0%
MultiSource/Benchmarks/Bullet/bullet           3.9%
SingleSour.../Benchmarks/Misc-C++-EH/spirit    3.5%
MultiSourc...sumer-typeset/consumer-typeset    3.0%
MultiSourc.../Applications/JM/lencod/lencod    2.5%
MultiSource/Applications/oggenc/oggenc         0.8%
SingleSour...Adobe-C++/stepanov_abstraction    0.1%
djokov updated this revision to Diff 129872.EditedJan 15 2018, 9:23 AM

I would advise to keep the NotDuplicable instruction attribute for now as long as our code cannot handle it. But moving the duplication logic into MachineInstr/MCCFIInstruction is a good idea anyway.

I agree. I meant to change that attribute only after fixing generateCompactUnwindEncoding in ARM, AArch64 and x86 backends to fallback to DWARF.

Anyway, I updated this patch. CFI instructions are still 'NotDuplicable', and they are duplicated by using default copy constructor for MCCFIInstruction.

djokov marked 2 inline comments as done.Jan 15 2018, 9:30 AM
lib/CodeGen/TailDuplicator.cpp
379–411 ↗(On Diff #126014)

Turns out default copy constructor for MCCFIInstruction already does what is needed.

djokov marked an inline comment as done.Jan 15 2018, 9:31 AM
thegameg added inline comments.Jan 15 2018, 9:47 AM
lib/CodeGen/TailDuplicator.cpp
381 ↗(On Diff #129872)

Do we actually need a copy? Does anything disallow us to share the CFIIndex (and reference the MCCFIInstruction more than once) within the same function?

djokov added inline comments.Jan 15 2018, 11:21 AM
lib/CodeGen/TailDuplicator.cpp
381 ↗(On Diff #129872)

According to tests, we don't need a copy. It works fine with referencing the existing MCCFIInstruction.

djokov updated this revision to Diff 129894.Jan 15 2018, 11:25 AM

Removed copying CFI Instructions via copy constructor.

djokov marked an inline comment as done.Jan 15 2018, 11:27 AM

Does this solution look good?

thegameg accepted this revision.Jan 29 2018, 5:14 PM

This LGTM, thanks.

I just had one more question (it's not blocking this patch in any way):

If we end up duplicating a prologue like in the test case that r317726 broke, will the CFI be correct without Correct dwarf unwind information in function epilogue? I know this patch is needed to fix the other patch, I'm just wondering if it's ok to commit them separately. Also we can argue that CFI unwind info is not entirely correct/complete today since the epilogue is not described at all.

test/CodeGen/AArch64/taildup-cfi.ll
6 ↗(On Diff #129894)

You can get rid of these 2 lines.

This revision is now accepted and ready to land.Jan 29 2018, 5:14 PM

If we end up duplicating a prologue like in the test case that r317726 broke, will the CFI be correct without Correct dwarf unwind information in function epilogue?

They will not be completely correct (one correcting .cfi_def_cfa_offset instruction on the beginning of the duplicated block would be missing), but hopefully this won't be a problem, because it is planned to reland Correct dwarf unwind information in function epilogue right after this patch.

This revision was automatically updated to reflect the committed changes.