This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Private label aliases to weak symbols should not retain section data
ClosedPublic

Authored by int3 on Nov 30 2022, 7:09 PM.

Details

Summary

If we have two files with the same weak symbol like so:

ltmp0:
_weak:
  <contents>

and

ltmp1:
_weak:
  <contents>

Linking them together should leave only one copy of <contents>, not
two. Previously, we would keep around both copies because of the
private-label ltmp<N> symbols (i.e. symbols that start with l) -- we
would not coalesce those, so we would treat them as retaining the
contents.

This matters for more than just size -- we are depending upon this
behavior internally for emitting a certain file format. This file
format's header is repeated in each object file, but we want it to
appear just once in our output.

Why can't we not emit those aliases to _weak, or reference the
ltmp<N> symbols instead of _weak? Well, MC actually adds ltmp<N>
symbols as part of the assembly-to-binary translation step. So any
codegen at the clang level can't access them.

All that said... this solution is actually kind of hacky. Here, we avoid
creating the private-label symbols at parse time. This is acceptable
since we never emit those symbols in our output. However, in ld64, any
aliasing temporary symbols (ignored or otherwise) won't retain coalesced
data. But implementing this is harder -- we would have to create those
symbols first (so we can emit their names later), but we would have to
ensure the linker correctly shuffles them around when their aliasees get
coalesced.

Additionally, ld64 treats these temporary symbols as functionally
equivalent to the weak symbols themselves -- that is, it will emit weak
binds when those non-weak temporary aliases are referenced. We have
imitated this behavior for private-label symbols, but implementing it for
local aliases in general seems substantially more difficult. I'm not
sure if any programs actually depend on this behavior though, so maybe
it's a moot point.

Finally, ld64 does all this regardless of whether
.subsections_via_symbols is specified. We don't. But again, given how
rare the lack of that directive is (I've only seen it from hand-written
assembly inputs), I don't think we need to worry about it.

Diff Detail

Event Timeline

int3 created this revision.Nov 30 2022, 7:09 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptNov 30 2022, 7:09 PM
int3 requested review of this revision.Nov 30 2022, 7:09 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 30 2022, 7:09 PM
oontvoo added inline comments.
lld/MachO/InputFiles.cpp
625

super nit ...

843–852

This seems like we'd be repeating the strtab + sym.n_strx a lot ... i wonder if it's worth pre-computing the names once and just use the stored values?

lld/test/MachO/weak-def-alias-ignored.s
16 ↗(On Diff #479139)
oontvoo added inline comments.Nov 30 2022, 7:39 PM
lld/MachO/UnwindInfoSection.cpp
212

i wonder if this assert could be stricter (ie., how do we know this p.first->second == d is true because of the alias-symbol reuse (from InputFiles ~line 872) and not some bug where the same set of symbols are being added twice ....

int3 marked an inline comment as done.Nov 30 2022, 9:28 PM
int3 added inline comments.
lld/MachO/InputFiles.cpp
625

will add the comment. honestly I don't think either name is super good heh (I went back and forth before deciding on this). ld64 calls them "ignored labels" which doesn't seem great either because the rest of the code doesn't use the word "label"

but the symbol isn't exactly ignored either :/ it's used at parse time, just immediately dropped as part of the parse...

will mull on this a bit

843–852

I think it's an extremely cheap computation so probably not worth caching

int3 marked 3 inline comments as done.Nov 30 2022, 9:42 PM
int3 added inline comments.
lld/MachO/InputFiles.cpp
625

MC calls these "private labels" so let's go with that (see https://github.com/llvm/llvm-project/blob/main/llvm/lib/MC/MCAsmInfo.cpp#L46)

lld/MachO/UnwindInfoSection.cpp
212

I feel that too, but not sure there's a great solution here

we could delete the duplicate (aliased) symbols after parsing the relocations but that seems like unnecessary overhead (in terms of iterating over all the symbols an extra time)

int3 updated this revision to Diff 479165.Nov 30 2022, 9:46 PM

rename ignored -> private in the test too

int3 added a comment.Nov 30 2022, 9:49 PM

TODO: also update the commit message ("ignored" -> "private") assuming no one objects to the new name

oontvoo accepted this revision.Dec 1 2022, 6:05 AM

LG!

(Not sure why git-clang-format is failing for the debian build ... probably unrelated?)

lld/MachO/InputFiles.cpp
625

SG!

This revision is now accepted and ready to land.Dec 1 2022, 6:05 AM
int3 retitled this revision from [lld-macho] Ignored aliases to weak symbols should not retain section data to [lld-macho] Private label aliases to weak symbols should not retain section data.Dec 1 2022, 8:59 AM
int3 edited the summary of this revision. (Show Details)
This revision was landed with ongoing or failed builds.Dec 1 2022, 9:01 AM
This revision was automatically updated to reflect the committed changes.

FYI, this change caused an iOS/arm64 test in Chromium to crash (https://bugs.chromium.org/p/chromium/issues/detail?id=1400716). Will try to investigate more, though.

int3 added a comment.Dec 15 2022, 3:14 PM

Doesn't look like I have permissions to see that link. Could you share it?

We have been running into some issues too that blames to this, mostly around Swift-related stuff. Don't have a small repro yet though, so a small failing test would be helpful

I guess we should revert this for now

Doesn't look like I have permissions to see that link. Could you share it?

We have been running into some issues too that blames to this, mostly around Swift-related stuff. Don't have a small repro yet though, so a small failing test would be helpful

I guess we should revert this for now

(fixed the permisisons) This one is also Swift-related.