This is an archive of the discontinued LLVM Phabricator instance.

[LLD] [COFF] Fix including the personality function for DWARF EH when linking with --gc-sections
Closed · Public

Authored by mstorsjo on May 9 2021, 1:50 PM.

Details

Summary

Since c579a5b1d92a9bc2046d00ee2d427832e0f5ddec we don't traverse
.eh_frame when doing GC. But the exception handling personality
function needs to be included, and is only referenced from within
.eh_frame.

Long term, we probably would want to split .eh_frame into associative
COMDATs just like we do for .xdata/.pdata (and then include them in
the GC traversal), but for now, just try to pull in the name of the
known personality function in mingw DWARF EH.
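
A rough sketch of the workaround described above, under a much simplified linker model. `Symbol`, `SymbolTable`, and `addPersonalityGcRoot` are hypothetical names used only for illustration and are not lld's actual API; the symbol name `__gxx_personality_v0` is the DWARF personality routine named later in this review.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-ins for linker data structures.
struct Symbol {
  std::string name;
  bool defined = false; // true if some input object file defines the symbol
};

struct SymbolTable {
  std::unordered_map<std::string, Symbol> symbols;

  // Returns the symbol if it is present and defined, otherwise nullptr.
  Symbol *findDefined(const std::string &name) {
    auto it = symbols.find(name);
    if (it == symbols.end() || !it->second.defined)
      return nullptr;
    return &it->second;
  }
};

// If the DWARF EH personality routine has already been pulled into the link,
// register it as a GC root so that --gc-sections does not discard it even
// though .eh_frame (its only referrer) is not traversed during GC.
void addPersonalityGcRoot(SymbolTable &symtab, std::vector<Symbol *> &gcRoots) {
  if (Symbol *s = symtab.findDefined("__gxx_personality_v0"))
    gcRoots.push_back(s);
}
```

Note that the sketch only keeps a symbol that is already defined; it does not force the personality routine into links that never referenced it.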

Event Timeline

mstorsjo requested review of this revision. May 9 2021, 1:50 PM
mstorsjo created this revision.
Herald added a project: Restricted Project. May 9 2021, 1:50 PM
rnk added inline comments. May 11 2021, 11:08 AM
lld/COFF/Driver.cpp
2237

GCC produces a handful of these:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Analysis/EHPersonalities.cpp#L26

.Case("__gxx_personality_v0", EHPersonality::GNU_CXX)
.Case("__gxx_personality_seh0", EHPersonality::GNU_CXX)
.Case("__gxx_personality_sj0", EHPersonality::GNU_CXX_SjLj)
.Case("__gcc_personality_v0", EHPersonality::GNU_C)
.Case("__gcc_personality_seh0", EHPersonality::GNU_C)
.Case("__gcc_personality_sj0", EHPersonality::GNU_C_SjLj)

The __gcc variants mostly come from __attribute__((cleanup)) usage, _sj0 is presumably for sjlj eh, and I'm not sure about _v0 vs. _seh0.

We could replicate the list here, or we could parse eh_frame and look for personalities. Don't we already parse eh_frame in ELF? Can we use the logic easily?

mstorsjo added inline comments. May 11 2021, 11:20 AM
lld/COFF/Driver.cpp
2237

SJLJ doesn't use unwind tables but just direct references in code, so that works as-is.

SEH is handled via xdata/pdata, which is handled correctly via either properly associative comdats or gnu-style suffixed comdats, so there the personality function is kept.

So for now, dwarf is the only one that needs this, and dwarf uses the _v0 suffixed personality. But we could indeed check for both the gcc and gxx variants.

I'm not sure how much we'd gain by parsing the eh_frame - we only retain the gcc/gxx personality functions if they're already pulled in by object files. Sure - if they're only referenced by a function that is discarded, we don't need to keep them...

But I'd rather (long term) split up eh_frame in similar associative comdats like xdata/pdata - although that seems to be a bit more work. (Or is there a reason not to do that - how is that handled in ELF?)
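
The reasoning in this comment about which personality routines need special handling can be sketched as a small classifier. This is an illustration only: `EHModel`, `classifyPersonality`, and `needsExplicitGcRoot` are hypothetical names, and the suffix-to-model mapping simply follows what is said in this thread (`_v0` for DWARF, `_seh0` for SEH, `_sj0` for SjLj).

```cpp
#include <string_view>

// Hypothetical enum; the mapping below follows the discussion in this review.
enum class EHModel { Dwarf, SEH, SjLj, Unknown };

// Map a GCC-style personality routine name to the EH model its suffix implies.
EHModel classifyPersonality(std::string_view name) {
  auto startsWith = [&](std::string_view prefix) {
    return name.substr(0, prefix.size()) == prefix;
  };
  auto endsWith = [&](std::string_view suffix) {
    return name.size() >= suffix.size() &&
           name.substr(name.size() - suffix.size()) == suffix;
  };
  if (!startsWith("__gxx_personality_") && !startsWith("__gcc_personality_"))
    return EHModel::Unknown;
  if (endsWith("_v0"))
    return EHModel::Dwarf;
  if (endsWith("_seh0"))
    return EHModel::SEH;
  if (endsWith("_sj0"))
    return EHModel::SjLj;
  return EHModel::Unknown;
}

// Only the DWARF variants are referenced solely from .eh_frame. SjLj uses
// direct references in code, and SEH is kept alive via .xdata/.pdata, so
// neither needs an explicit GC root.
bool needsExplicitGcRoot(std::string_view name) {
  return classifyPersonality(name) == EHModel::Dwarf;
}
```

Under these assumptions, `needsExplicitGcRoot` is true for `__gxx_personality_v0` and `__gcc_personality_v0` and false for the `_seh0` and `_sj0` variants.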

rnk added a subscriber: MaskRay. May 11 2021, 11:28 AM
rnk added inline comments.
lld/COFF/Driver.cpp
2237

Great, let's just do the gcc and gxx variants and call it a day.

On ELF, .eh_frame is generally merged together into one section, and the linker parses it to discard the unnecessary FDEs for discarded functions. This is not ideal, but until recently there wasn't consensus about how to represent something like an associative comdat section. And, I believe splitting .eh_frame would have high overhead, both from ELF and DWARF. Maybe @MaskRay can say more about the state of the art.

MaskRay added a comment. Edited May 11 2021, 2:04 PM

.eh_frame consists of FDE (text section contribution) and CIE (common part shared by multiple FDE).
It needs to be monolithic in -ffunction-sections mode because:

  • A section header has a large overhead (sizeof(Elf64_Shdr)=64).
  • A CIE must precede a FDE which references it (CIE Pointer in a FDE is a non-zero unsigned integer). The input section order in one object file may not match the output section order, breaking this requirement. (I am surprised that mingw can have fragmented .eh_frame$
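
To illustrate the CIE/FDE layout mentioned in the list above, here is a minimal sketch of walking the records of a little-endian .eh_frame section. It assumes the usual 32-bit record format (a 4-byte length, then a 4-byte field that is zero for a CIE and a non-zero backwards offset for an FDE) and stops at 64-bit extended lengths. `EhRecord` and `walkEhFrame` are hypothetical names, not ld.lld's implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical summary of one .eh_frame record.
struct EhRecord {
  std::size_t offset;    // offset of the record's length field in the section
  bool isCie;            // true for a CIE, false for an FDE
  std::size_t cieOffset; // for an FDE: offset of the CIE it refers to
};

static std::uint32_t read32le(const std::vector<std::uint8_t> &d,
                              std::size_t off) {
  return std::uint32_t(d[off]) | std::uint32_t(d[off + 1]) << 8 |
         std::uint32_t(d[off + 2]) << 16 | std::uint32_t(d[off + 3]) << 24;
}

std::vector<EhRecord> walkEhFrame(const std::vector<std::uint8_t> &data) {
  std::vector<EhRecord> records;
  std::size_t off = 0;
  while (off + 4 <= data.size()) {
    std::uint32_t length = read32le(data, off);
    if (length == 0)
      break; // zero-length terminator
    if (length == 0xffffffffu)
      break; // 64-bit extended length: not handled in this sketch
    std::size_t idOff = off + 4;
    if (length < 4 || idOff + length > data.size())
      break; // malformed record
    std::uint32_t id = read32le(data, idOff);
    EhRecord r{off, id == 0, 0};
    if (!r.isCie) {
      // The CIE pointer is an unsigned distance counted backwards from this
      // field, which is why the CIE has to come before the FDE.
      if (id > idOff)
        break; // would point before the start of the section
      r.cieOffset = idOff - id;
    }
    records.push_back(r);
    off = idOff + length;
  }
  return records;
}
```

Because the CIE pointer can only reach backwards, reordering input pieces so that an FDE lands before its CIE would leave that FDE unrepresentable without rewriting it.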

.eh_frame pieces can be stale (GC, ICF), so ld.lld has a pass merging .eh_frame pieces and discarding unneeded pieces.

Where GC is concerned: ld.lld computes --gc-sections (markLive<ELFT>()) before combining .eh_frame pieces (combineEHSections).
So we now have a phase ordering problem.
Since GC runs earlier, we have to conservatively assume all .eh_frame pieces are live.
However, we don't necessarily retain everything referenced by .eh_frame pieces.
See the comment in MarkLive.cpp:scanEHFrameSection and MarkLive<ELFT>::resolveReloc: we keep most referenced symbols live, except
a symbol defined in an executable non-linkorder-non-group section.

In ELF, you may find test/ELF/eh-frame-gc{,2}.s and their history useful.
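
The conservative marking rule described in this comment can be modeled roughly as follows. This is a toy sketch loosely based on the description above, not ld.lld's actual MarkLive code; `Section`, its fields, and this `markLive` function are hypothetical and heavily simplified.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified input section; not ld.lld's real data structure.
struct Section {
  bool isEhFrame = false;
  bool isExecutable = false;
  bool inGroupOrLinkOrder = false;
  std::vector<std::size_t> refs; // indices of sections referenced by relocations
  bool live = false;
};

void markLive(std::vector<Section> &secs,
              const std::vector<std::size_t> &roots) {
  std::vector<std::size_t> worklist;
  auto enqueue = [&](std::size_t i) {
    if (secs[i].live)
      return;
    secs[i].live = true;
    worklist.push_back(i);
  };

  // GC runs before .eh_frame pieces are combined, so conservatively treat
  // every .eh_frame piece as live.
  for (Section &s : secs)
    if (s.isEhFrame)
      s.live = true;

  // Do not follow every reference from .eh_frame: a target in an executable
  // section that is neither in a group nor link-ordered is skipped, so the
  // unwind data alone does not keep plain functions alive.
  for (const Section &s : secs) {
    if (!s.isEhFrame)
      continue;
    for (std::size_t t : s.refs) {
      const Section &target = secs[t];
      if (target.isExecutable && !target.inGroupOrLinkOrder)
        continue;
      enqueue(t);
    }
  }

  // Ordinary mark phase from the real roots.
  for (std::size_t r : roots)
    enqueue(r);
  while (!worklist.empty()) {
    std::size_t i = worklist.back();
    worklist.pop_back();
    for (std::size_t t : secs[i].refs)
      enqueue(t);
  }
}
```

In this model, a function referenced only from an .eh_frame piece stays dead, while non-executable data referenced from it is retained.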

MaskRay added inline comments. May 11 2021, 2:07 PM
lld/COFF/Driver.cpp
2237

Does mingw still use sjlj? GCC used to have an option -fsjlj-exceptions which has been removed for many years now.

lld/test/COFF/gc-dwarf-eh.s
38

Is .cfi_personality supported? Would be good to use correct content. The personality (__gxx_personality_v0) relocation precedes the text section symbols.

mati865 added inline comments.
lld/COFF/Driver.cpp
2237

There are mingw-w64 toolchains out there that still use SJLJ for 32-bit builds, but most of them have switched to DWARF already. For 64-bit builds AFAIK every popular toolchain uses SEH.

> .eh_frame consists of FDE (text section contribution) and CIE (common part shared by multiple FDE).
> It needs to be monolithic in -ffunction-sections mode because:
>
>   • A section header has a large overhead (sizeof(Elf64_Shdr)=64).
>   • A CIE must precede a FDE which references it (CIE Pointer in a FDE is a non-zero unsigned integer). The input section order in one object file may not match the output section order, breaking this requirement. (I am surprised that mingw can have fragmented .eh_frame$

Thanks for the info! No, I'm not saying that mingw can have fragmented .eh_frame$; that was just my initial assumption for how to implement it, mirroring the SEH case. But now it does indeed sound like that wouldn't be the right thing to do, so that saves me some amount of wasted time. Thanks!

> .eh_frame pieces can be stale (GC, ICF), so ld.lld has a pass merging .eh_frame pieces and discarding unneeded pieces.
>
> Where GC is concerned: ld.lld computes --gc-sections (markLive<ELFT>()) before combining .eh_frame pieces (combineEHSections).
> So we now have a phase ordering problem.
> Since GC runs earlier, we have to conservatively assume all .eh_frame pieces are live.
> However, we don't necessarily retain everything referenced by .eh_frame pieces.
> See the comment in MarkLive.cpp:scanEHFrameSection and MarkLive<ELFT>::resolveReloc: we keep most referenced symbols live, except
> a symbol defined in an executable non-linkorder-non-group section.
>
> In ELF, you may find test/ELF/eh-frame-gc{,2}.s and their history useful.

Thanks! That sounds like it would be the way to go, but it also sounds like a bit more effort than I was expecting to spend on it, so in that case, I think this particular approach might be good enough for now - dwarf EH on mingw isn't used on the most important targets anyway.

lld/test/COFF/gc-dwarf-eh.s
38

I can try rewriting this using proper CFI directives. Written this way, it felt a bit clearer exactly what the object file looks like, though (although I could reorder things to fake a real .eh_frame a bit better too).

mstorsjo updated this revision to Diff 344743. May 12 2021, 2:45 AM

Checking for both __gxx_personality_v0 and __gcc_personality_v0. Reworded the comment to clarify what the proper full solution would be, changed the testcase to use .cfi directives instead of a fake .eh_frame section.

rnk accepted this revision. May 12 2021, 10:41 AM

lgtm

This revision is now accepted and ready to land. May 12 2021, 10:41 AM