This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Associate compact unwind entries with function symbols
ClosedPublic

Authored by int3 on Sep 16 2021, 8:35 PM.

Details

Summary

Compact unwind entries (CUEs) contain pointers to their respective
function symbols. However, during the link process, it's far more useful
to have pointers from the function symbol to the CUE than vice versa.
This diff adds that pointer in the form of Defined::compactUnwind.

In particular, when doing dead-stripping, we want to mark CUEs live when
their function symbol is live; and when doing ICF, we want to dedup
sections iff the symbols in that section have identical CUEs. In both
cases, we want to be able to locate the symbols within a given section,
as well as locate the CUEs belonging to those symbols. So this diff also
adds InputSection::symbols.

The ultimate goal of this refactor is to have ICF support dedup'ing
functions with unwind info, but that will be handled in subsequent
diffs. This diff focuses on simplifying -dead_strip --
findFunctionsWithUnwindInfo is no longer necessary, and
Defined::isLive() is now a lot simpler. Moreover, UnwindInfoSection no
longer has to check for dead CUEs -- we simply avoid adding them in the
first place.

Additionally, we now support stripping of dead LSDAs, which follows
quite naturally since markLive() can now reach them via the CUEs.

Diff Detail

Event Timeline

int3 created this revision.Sep 16 2021, 8:35 PM
Herald added a project: Restricted Project. · View Herald Transcript
int3 requested review of this revision.Sep 16 2021, 8:35 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 16 2021, 8:35 PM
int3 updated this revision to Diff 373133.Sep 16 2021, 8:44 PM

edit comment

gkm accepted this revision.Oct 25 2021, 1:57 PM

LGTM! Thank you for this. It makes __eh_frame handling much nicer too.
Please fix test failure for MachO/compact-unwind-both-local-and-dylib-personality.s.

lld/MachO/InputSection.cpp
104–107

Nit: According to LLVM Coding Standards, if one body of an if/else chain has braces, then all should have them.

lld/MachO/Writer.cpp
690–695

Nit: LLVM Coding Standards wants braces when nested statements are more than 2 deep, though there is no clear guidance about where to put them here. My suggestion is around the for bodies.

This revision is now accepted and ready to land.Oct 25 2021, 1:57 PM
int3 updated this revision to Diff 382371.Oct 26 2021, 9:27 AM
int3 marked an inline comment as done.

fix test + style

int3 added a subscriber: oontvoo.Oct 26 2021, 9:31 AM

@oontvoo want to have a look at the changes to compact-unwind-both-local-and-dylib-personality.s before I land it?

lld/test/MachO/compact-unwind-both-local-and-dylib-personality.s
42–45

It seems that in the case of b.out, we don't emit this extra local binding. I haven't dug into why but I think it should be safe since that binding wasn't being used anyway

42–46

I removed the -DAG part here because it seems we are actually relying on the local symbols matching in exactly that order

oontvoo added inline comments.Oct 26 2021, 10:34 AM
lld/MachO/Writer.cpp
693

Why is dyn_cast used in 691 but dyn_cast_or_null here? (seems like the same use case to me )

lld/test/MachO/compact-unwind-both-local-and-dylib-personality.s
42–45

This is probably fine.
(See also https://reviews.llvm.org/D111852#inline-1066410)

42–46

Any guarantee that the LOCALs will get printed after non-locals?

While it's true that we rely on the partial order (ie., amongst the LOCALs and amongst the __gxx_personaliys ) but I wasn't sure we could rely on the ordering overall being deterministic

oontvoo added inline comments.Oct 26 2021, 12:04 PM
lld/MachO/InputSection.cpp
119

do you want to assert the two compactUnwind's are actually the same first?

int3 marked 2 inline comments as done.Oct 26 2021, 1:00 PM
int3 added inline comments.
lld/MachO/InputSection.cpp
119

in general the compactUnwinds won't have pointer equality, and doing a recursive comparison here seems like too much work

lld/MachO/Writer.cpp
693

because sym can be null here. See ObjFile::parseSymbols -- not even index in the symbols array gets initialized.

lld/test/MachO/compact-unwind-both-local-and-dylib-personality.s
42–46

No guarantee per se -- i.e. it *could* change -- that's just how it's currently implemented. But yes the behavior should be deterministic (and if not it's a bug).