This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Handle encoding personalities of same names but different kinds
ClosedPublic

Authored by oontvoo on Aug 4 2021, 10:01 PM.

Details

Summary

Sometimes people intentionally re-define a dylib personlity symbol as a local defined symbol as a workaround to a ld -r bug.
As a result, we could see "too many personalities" to encode. This patch tries to handle this case by ignoring the local symbols entirely.

More generally, this patch also tries to resolve which personality symbols getting "picked" when there are more than one with the same name to avoid encoding too many personalities. (eg., personality symbols could be from 3 groups, local, global, and dylib. Granted, this is not a usual scenario but it does happen sometimes and LD64 seems to allow it - so I think LLD should , too.)

    
PR: 51262
    
Differential Revision: https://reviews.llvm.org/D107533

Diff Detail

Event Timeline

oontvoo created this revision.Aug 4 2021, 10:01 PM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a reviewer: Restricted Project. · View Herald Transcript
oontvoo requested review of this revision.Aug 4 2021, 10:01 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 4 2021, 10:01 PM
oontvoo edited the summary of this revision. (Show Details)Aug 4 2021, 10:04 PM
thakis added a subscriber: thakis.Aug 5 2021, 6:25 AM
thakis added inline comments.
lld/MachO/UnwindInfoSection.cpp
285

Due to this comment, I think you'll have to change some encoding code in addition to bumping the number too. Two bits aren't enough for 4 personalities.

int3 added a subscriber: int3.Aug 5 2021, 11:10 AM
int3 added inline comments.
lld/MachO/UnwindInfoSection.cpp
285

Yeah. To be clear, it's not enough because in addition to the 4 personalities, we need to encode the "no personality" value.

ld64 should have the same limitation too. I found some links mentioning the issue, with the suggested workaround of passing -no_compact_unwind: https://bugzilla.mozilla.org/show_bug.cgi?id=1289847. However https://stackoverflow.com/a/30733047/149330 mentions that it only works if you crash on every exception (which I think Rust does?)

In any case, LLD doesn't yet support this flag. But I'm wondering if the ld64-linked build in PR51262 is succeeding because of something else, e.g. perhaps dead-stripping is removing the code that requires the extra personality.

oontvoo updated this revision to Diff 364558.Aug 5 2021, 11:13 AM
oontvoo marked an inline comment as done.

Encode the personality index as 0-based

oontvoo added inline comments.Aug 5 2021, 11:14 AM
lld/MachO/UnwindInfoSection.cpp
285

How about we keep it as 2-bit and change the personalityIndex to be 0-based (below)?
Should still be enough for now - unless we find a fifth one ...

oontvoo marked an inline comment as not done.Aug 5 2021, 11:15 AM
oontvoo added inline comments.
lld/MachO/UnwindInfoSection.cpp
285

Ooops - crosstalking ... didn't see int3's comment

oontvoo marked an inline comment as not done.Aug 5 2021, 1:58 PM
oontvoo added inline comments.
lld/MachO/UnwindInfoSection.cpp
285
  • in practice, how often do you see 4+ personalty symbols?
  • it's a limitation from ld64, too - but doesn't mean have to imitate that, right?
int3 added inline comments.Aug 5 2021, 2:19 PM
lld/MachO/UnwindInfoSection.cpp
285

in practice, how often do you see 4+ personality symbols?

none of the things I've been building so far need it :)

it's a limitation from ld64, too - but doesn't mean have to imitate that, right?

nope, but it's a limitation of the format. I assume -no_compact_unwind makes ld64 use the non-compact DWARF format that doesn't have that limitation.

oontvoo planned changes to this revision.Aug 5 2021, 9:53 PM
oontvoo added inline comments.
lld/MachO/UnwindInfoSection.cpp
285

Ah! I think I saw the problem (at least in this build I'm trying to get working).
The personality symbols could come from both Defined and DylibSymbol.
So while there are only 3 uniquely named personalities , there were 4 or 5 personalities (ie., different got indices).

I guess for this case, we could just pick the Defined personalities over the others?

oontvoo updated this revision to Diff 364694.Aug 5 2021, 9:53 PM

Pick Defined

oontvoo retitled this revision from [lld-macho] Allow encode up to 4 personality symbols. to Handle encoding personalities of same names but different kinds..Aug 5 2021, 9:54 PM
oontvoo edited the summary of this revision. (Show Details)
oontvoo updated this revision to Diff 364850.Aug 6 2021, 11:21 AM
oontvoo edited the summary of this revision. (Show Details)

Lint precheck

int3 added a comment.Aug 6 2021, 12:53 PM

Huh, interesting. I assume that the Defined symbols here are local symbols? Since global Defineds should overwrite the dylib symbols.

Can we have a test to verify that we're matching ld64's behavior?

P.S. I'm on semi-vacation till the middle of next week, will be a bit sporadic in responding

Huh, interesting. I assume that the Defined symbols here are local symbols? Since global Defineds should overwrite the dylib symbols.

Yep! Local symbols (ie., 't' ) (more specifically private_extern ('T') symbol that got changed to local symbol via ld -r)

I've tried to make the repro as less convoluted as I could:

// defined.s

.private_extern ___gxx_personality_v0

.text
.no_dead_strip ___gxx_personality_v0	
___gxx_personality_v0:	
  .cfi_startproc
  .cfi_def_cfa_offset 16
  .cfi_endproc
  nop

.subsections_via_symbols

// user_1.s

.globl _foo
.text
.p2align 2
_foo:
  .cfi_startproc
  .cfi_personality 155, ___gxx_personality_v0
  .cfi_def_cfa_offset 16
  ret
  .cfi_endproc

.subsections_via_symbols

// user_2.s

.globl _main

.text
.p2align 2
_main:
  .cfi_startproc
  .cfi_personality 155, ___gxx_personality_v0
  .cfi_def_cfa_offset 16
  ret
  .cfi_endproc

.subsections_via_symbols

Link commands:

clang -fexceptions -c -o defined.o defined.s
clang -fexceptions -c -o user_1.o user_1.s
clang -fexceptions -c -o user_2.o user_2.s

##  concat defined.o and user_1.o with `ld -r`
## this will change  'T' ___gxx_personality_v0  ===> 't' ___gxx_personality_v0
ld -r -o combined.o defined.o user_1.o

## finally link with lld-macho:
ld64.lld.darwinnew   -arch "x86_64" -platform_version macos 10.15 11.0   -syslibroot "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk"  -lc++ -lSystem user_2.o combined.o

Can we have a test to verify that we're matching ld64's behavior?

maybe we could have lld-macho implement -r to make it easier to write tests like these ...

P.S. I'm on semi-vacation till the middle of next week, will be a bit sporadic in responding

no worries! thanks!

oontvoo updated this revision to Diff 365266.Aug 9 2021, 12:40 PM

added test

oontvoo updated this revision to Diff 365281.Aug 9 2021, 1:38 PM

readjust triple version

oontvoo edited the summary of this revision. (Show Details)Aug 9 2021, 1:45 PM
oontvoo updated this revision to Diff 365287.Aug 9 2021, 2:09 PM

add postlink check and fix lint error

oontvoo updated this revision to Diff 365328.Aug 9 2021, 6:26 PM

ignore tombstoned personality too

int3 added a comment.EditedAug 11 2021, 3:05 PM

Interesting example... after playing around with it for a bit, I'm not convinced that ld64 is actually behaving sensibly here 😕

First, when linking against combined.o, the dylib symbol "wins":

(base) ~/unwind: ld -arch "x86_64" -platform_version macos 10.15 11.0   -syslibroot "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk" -lSystem -lc++ user_2.o combined.o -o out-ld64
ld: warning: object file (user_2.o) was built for newer macOS version (11.0) than being linked (10.15)
ld: warning: object file (combined.o) was built for newer macOS version (11.0) than being linked (10.15)
(base) ~/unwind: llvm-objdump --macho --bind out-ld64
out-ld64:

Bind table:
segment  section            address    type       addend dylib            symbol
__DATA_CONST __got              0x100004000 pointer         0 libc++           ___gxx_personality_v0

In contrast, the code in this diff lets the local symbol win. But... I put "wins" above in quotes because ld64 doesn't actually seem to consider the local symbol a valid candidate at all! If we remove -lc++:

(base) ~/unwind: ld -arch "x86_64" -platform_version macos 10.15 11.0   -syslibroot "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk" -lSystem user_2.o combined.o -o out-ld64
ld: warning: object file (user_2.o) was built for newer macOS version (11.0) than being linked (10.15)
ld: warning: object file (combined.o) was built for newer macOS version (11.0) than being linked (10.15)
Undefined symbols for architecture x86_64:
  "___gxx_personality_v0", referenced from:
      _main in user_2.o
      Dwarf Exception Unwind Info (__eh_frame) in user_2.o
      _foo in combined.o
      Dwarf Exception Unwind Info (__eh_frame) in combined.o
ld: symbol(s) not found for architecture x86_64

However, linking user_1.o and defined.o in place of combined.o does cause the personalities to be resolved to the global symbol (as expected):

(base) ~/unwind: ld -arch "x86_64" -platform_version macos 10.15 11.0   -syslibroot "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk" -lSystem user_2.o user_1.o defined.o -o out-ld64
ld: warning: object file (user_2.o) was built for newer macOS version (11.0) than being linked (10.15)
ld: warning: object file (user_1.o) was built for newer macOS version (11.0) than being linked (10.15)
ld: warning: object file (defined.o) was built for newer macOS version (11.0) than being linked (10.15)
ld: warning: could not create compact unwind for ___gxx_personality_v0: FDE found for zero size function
(base) ~/unwind: llvm-objdump --macho --bind out-ld64
out-ld64:

Bind table:
segment  section            address    type       addend dylib            symbol

I'm not convinced that we should attempt to emulate this bizarre behavior. I don't see why the result of a link should differ depending on whether some of its intermediate inputs have been run through ld -r. Would it be possible to change your build system to work around this instead?

Interesting example...

Yep! this is "inspired" by for ios apps that try really hard to do funny things static linking

after playing around with it for a bit, I'm not convinced that ld64 is actually behaving sensibly here 😕

First, when linking against combined.o, the dylib symbol "wins":
In contrast, the code in this diff lets the local symbol win. But... I put "wins" above in quotes because ld64 doesn't actually seem to consider the local symbol a valid candidate at all! If we remove -lc++:

Actually this diff doesn't change which symbol "wins" in the final output - (it's pretty inconsequential to that)
Similarly, without supplying -lc++, LLD would also fail with undefined symbol here.

$  ../bin/ld64.lld.darwinnew   -arch "x86_64" -platform_version macos 10.15 11.0   -syslibroot "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk"  -lc++ -lSystem user_2.o combined.o -o lld_output
ld64.lld.darwinnew: warning: user_2.o has version 11.0.0, which is newer than target minimum of 10.15
ld64.lld.darwinnew: warning: combined.o has version 11.0.0, which is newer than target minimum of 10.15

$ llvm-objdump --macho --bind lld_output 
lld_output:
Bind table:
segment  section            address    type       addend dylib            symbol
__DATA_CONST __got              0x100002000 pointer         0 libc++           ___gxx_personality_v0

$  ../bin/ld64.lld.darwinnew   -arch "x86_64" -platform_version macos 10.15 11.0   -syslibroot "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk"   -lSystem user_2.o combined.o -o lld_output 
ld64.lld.darwinnew: warning: user_2.o has version 11.0.0, which is newer than target minimum of 10.15
ld64.lld.darwinnew: warning: combined.o has version 11.0.0, which is newer than target minimum of 10.15
ld64.lld.darwinnew: error: undefined symbol: ___gxx_personality_v0
>>> referenced by user_2.o
ld64.lld.darwinnew: error: undefined symbol: ___gxx_personality_v0
>>> referenced by user_2.o

However, linking user_1.o and defined.o in place of combined.o does cause the personalities to be resolved to the global symbol (as expected):

user_1.o + defined.o is not the same thing as combined.o .
(The trick here was that ___gxx_personality_v0 was defined as private-extern in defined.o, ie., T, and ld -r changed it to local defined, ie t in combined.o)

I'm not convinced that we should attempt to emulate this bizarre behavior. I don't see why the result of a link should differ depending on whether some of its intermediate inputs have been run through ld -r. Would it be possible to change your build system to work around this instead?

I think the most convoluted part here is how ld -r changes a a private extern to local 🤔
The rest of the things that we've observed here are by-products

I think the most convoluted part here is how ld -r changes a a private extern to local 🤔
The rest of the things that we've observed here are by-products

I believe -keep_private_externs (which isn't the default) will turn that off.

int3 added a comment.Aug 16 2021, 2:26 PM

Actually this diff doesn't change which symbol "wins" in the final output - (it's pretty inconsequential to that)

It changes which symbol "wins" as the personality symbol (as your "Prioritize Defined over others." comment indicates). ld64 always picks the dylib symbol; your diff seems to pick whichever symbol comes first:

(base) ~/unwind: ld64.lld -arch "x86_64" -platform_version macos 10.15 11.0   -syslibroot "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk" -lSystem -lc++ combined.o user_2.o -o out-lld
ld64.lld: warning: combined.o has version 11.0.0, which is newer than target minimum of 10.15
ld64.lld: warning: user_2.o has version 11.0.0, which is newer than target minimum of 10.15
(base) ~/unwind:  llvm-objdump --macho --bind --unwind-info out-lld
out-lld:
Contents of __unwind_info section:
  Version:                                   0x1
  Common encodings array section offset:     0x1c
  Number of common encodings in array:       0x3
  Personality function array section offset: 0x28
  Number of personality functions in array:  0x1
  Index array section offset:                0x2c
  Number of indices in array:                0x2
  Common encodings: (count = 3)
    encoding[0]: 0x12020000
    encoding[1]: 0x02020000
    encoding[2]: 0x00000000
  Personality functions: (count = 1)
    personality[1]: 0x00002000
  Top level indices: (count = 2)
    [0]: function offset=0x00000000, 2nd level page offset=0x00000044, LSDA offset=0x00000044
    [1]: function offset=0x00000439, 2nd level page offset=0x00000000, LSDA offset=0x00000044
  LSDA descriptors:
  Second level indices:
    Second level index[0]: offset in section=0x00000044, base function offset=0x00000000
      [0]: function offset=0x00000000, encoding=0x12020000
      [1]: function offset=0x00000001, encoding=0x02020000
      [2]: function offset=0x00000430, encoding=0x00000000
      [3]: function offset=0x00000438, encoding=0x12020000

Bind table:
segment  section            address    type       addend dylib            symbol
__DATA_CONST __got              0x100002008 pointer         0 libc++           ___gxx_personality_v0

Note that the personality symbol does not match the dynamically bound address. (We should probably not be binding ___gxx_personality_v0 at all, but anyway...)

user_1.o + defined.o is not the same thing as combined.o .

I understood that. My point is that it *should* be the same thing. I would rather we not try to handle this weird edge case of ld64 if possible.

I believe -keep_private_externs (which isn't the default) will turn that off.

Nice, this seems to work in my local testing. @oontvoo, can you use that flag to work around the issue? Or are you trying to link some binary blob for which you don't have the source?

I believe -keep_private_externs (which isn't the default) will turn that off.

Thanks for the suggestion!

Nice, this seems to work in my local testing. @oontvoo, can you use that flag to work around the issue? Or are you trying to link some binary blob for which you don't have the source?

Well, in theory do have access to all the source files - but for this specific build, it's just a pre-compiled object file we have to deal with.
Let me talk to them and see if setting this flag would be an option.

Just so I have a sense, how strongly opposed are you to supporting this? :) 100% ? 80%, etc?

int3 added a comment.Aug 16 2021, 7:30 PM

Well, assuming it's multiple-choice, I pick 80% ;p

I mean, if you can't find a workaround, and/or there are multiple builds all over the place that need this, then I guess we should support it somehow

Re: setting -keep_private_externs

Turnt out, the flag was *not* set for a specific reason

https://chromium.googlesource.com/chromium/src/+/refs/heads/main/components/cronet/tools/hide_symbols.py#124

# When compiling for 64bit targets, the symbols in call_with_eh_frame.o are
# referenced in assembly and eventually stripped by the call to ld -r above,
# perhaps because the linker incorrectly assumes that those symbols are not
# used. Using -keep_private_externs fixes the compile issue, but breaks
# other parts of cronet. Instead, simply add a second .o file with the
# personality routine. Note that this issue was not caught by Chrome tests,
# it was only detected when apps tried to link the resulting .a file.

Maybe when/if we implement lld -r we can fix the bug here ...

But anyways, back the previous question of whether we want this, cronet (part of Chrominium) is a large part of what we want to support, so I'd really like to get this fixed somehow
(I see now this patch has a bug but it can be fixed).
Thoughts?

int3 added a comment.Aug 18 2021, 6:56 PM

Fair enough, let's support it then :)

I think we can have a considerably simpler implementation though, by handling this case as part of UnwindInfoSectionImpl<Ptr>::prepareRelocations. In particular, we already have logic that special-cases the personality relocation in order to handle section relocations: https://github.com/llvm/llvm-project/blob/main/lld/MachO/UnwindInfoSection.cpp#L212. We could add something that replaces local defined symbols with dylib symbols of the same name. Also, I would guess that it's pretty rare for personality symbols to be local, so we can skip the expensive string hash in the common case.

(Worth checking: what happens if we have local defined + global defined? Presumably the same sort of replacement happens?)

lld/MachO/UnwindInfoSection.cpp
306–307

do we really need to care about <internal> symbols?

oontvoo planned changes to this revision.Aug 30 2021, 11:08 AM

make picking a symbol deterministic

oontvoo updated this revision to Diff 371179.Sep 7 2021, 1:53 PM

addressed comment

(Worth checking: what happens if we have local defined + global defined? Presumably the same sort of replacement happens?)

I think if you have both local and global, then the local would take precedence (as expected).

oontvoo planned changes to this revision.Sep 7 2021, 2:58 PM

needs test for local vs global too

oontvoo updated this revision to Diff 371416.Sep 8 2021, 12:22 PM

more tests

I think this is ready for review. PTAL. Thanks!

I only reviewed the mechanical changes as I'm not too familiar with the functional changes. I'll let someone more experienced comment on that.

lld/MachO/UnwindInfoSection.cpp
225

Probably it's just my lack of knowledge here, but is this line useful? It eventually gets overridden in the start of the loop on L228 right?

298

Why does this need to be initialized to 0? Isn't it always going to be eventually defined below?

lld/test/MachO/compact-unwind-both-local-and-dylib-personality.s
26

Shouldn't there also be a check on the symbol table for b.out and c.out?

oontvoo marked an inline comment as done.Sep 9 2021, 6:04 PM
oontvoo added inline comments.
lld/MachO/UnwindInfoSection.cpp
225

No, it's a reference to the entry into the map. I guess it would've been clearer to have written:

Line: 228: auto it = personalityTable.find(....);
< .... >

HERE: it->second = s;
298

sorry - spurious change .

int3 added inline comments.Sep 9 2021, 10:57 PM
lld/MachO/UnwindInfoSection.cpp
145–146

hm, what do you mean "intentionally defined ... as a hack to get around another bug"? Did you figure out why ld -r is emitting these local symbols?

175–178

omg... I assume you've verified that this what ld64 does? this seems super bizarre. It's not even a valid total order!

179
213
oontvoo marked an inline comment as done.Sep 10 2021, 10:21 AM
oontvoo added inline comments.
lld/MachO/UnwindInfoSection.cpp
175–178

Yeah - the last two I could understand a little. The first two are completely unexpected.
Now that we're here, actually. I, personally, would be fine with LLD re-defining its own rules - as long as it's consistent.

(The problem I'm trying to fix here is the duplicate personality symbols - hence as long as we have a set of rules to pick one, i'm good)
Any inclination on which way to go from here?

oontvoo retitled this revision from Handle encoding personalities of same names but different kinds. to [lld-macho] Handle encoding personalities of same names but different kinds.Sep 10 2021, 11:47 AM
oontvoo edited the summary of this revision. (Show Details)
oontvoo added inline comments.Sep 10 2021, 2:21 PM
lld/MachO/UnwindInfoSection.cpp
145–146

To be clear, the "bug" here is *NOT* that ld -r produces local symbols for previously private extern symbols . In fact, it seems sort of reasonable(in hindsight 😊) - if you consider the output of ld -r to be a shared library, then making sure private symbols remain local makes sense.

The "bug" here is that the LD64 incorrectly thinks symbols from one of our .o files were not used, and therefore strip them from the final output when we run ld -r. We've been working around this by having an additional .o file containing the personality symbol as a local symbol.

int3 added inline comments.Sep 13 2021, 10:49 PM
lld/MachO/UnwindInfoSection.cpp
175–178

I was thinking we could special-case just local symbols. For each local personality symbol, we check the symbol table to see if an entry with the same name exists there. If so, we replace the relocation's reference to the local symbol with the symbol in the symbol table.

oontvoo marked 2 inline comments as done.Sep 16 2021, 11:24 AM
oontvoo added inline comments.
lld/MachO/UnwindInfoSection.cpp
175–178

Ok - let's try that - it might be simpler. (I wasn't super in favour of that because we might end up special-casing a lot of things ... in which case, it would've been better to do it in a separate pass, as attempted here).

oontvoo updated this revision to Diff 373017.Sep 16 2021, 11:25 AM
oontvoo marked an inline comment as done.
oontvoo retitled this revision from [lld-macho] Handle encoding personalities of same names but different kinds to [lld-macho] Handle encoding personalities of same names but different kinds.
oontvoo edited the summary of this revision. (Show Details)
  • Simplify the handling to just deal with local sym. (left a fixme on the possibly non-determinism of this)
  • Added more checks in test
int3 added inline comments.Sep 16 2021, 12:57 PM
lld/MachO/UnwindInfoSection.cpp
199

why not use SymbolTable::find() instead?

Right now, this just pick whatever comes first

There will only be one symbol of a given name in the global symbol table, so that issue is avoided too :)

oontvoo updated this revision to Diff 373067.Sep 16 2021, 2:12 PM
oontvoo marked an inline comment as done.

simplify

int3 accepted this revision.Sep 16 2021, 11:06 PM

Thanks!

lld/MachO/UnwindInfoSection.cpp
187

symtab->find will never return an Undefined, so we can use else if here

195
lld/test/MachO/compact-unwind-both-local-and-dylib-personality.s
18

let's try to be consistent with commas and spaces :) (same for the lines below)

19

-pie is enabled by default

36

suggestion: instead of _LOW and _HI, how about GXX_PERSONALITY_DYLIB and GXX_PERSONALITY?

57
58
This revision is now accepted and ready to land.Sep 16 2021, 11:06 PM
oontvoo updated this revision to Diff 373248.Sep 17 2021, 9:09 AM
oontvoo marked 6 inline comments as done.

fixed formating issues

lld/MachO/UnwindInfoSection.cpp
187

Ah, no actually - it *can* find an undefined (dylib) symbol. (The third testcase (c.out) covers this).

195

🧐 🧐 🧐

lld/test/MachO/compact-unwind-both-local-and-dylib-personality.s
18

(the 3 RUNs were consistent with each other tho)

36

But it's not the _DYLIB that's picked in b.out and c.out . so calling it that is a bit mis-leading, I think.

int3 added inline comments.Sep 17 2021, 9:52 AM
lld/MachO/UnwindInfoSection.cpp
187

oh yeah I forgot that while we exit early if some undefined symbols remain unresolved, that exit happens after this function runs...

lld/test/MachO/compact-unwind-both-local-and-dylib-personality.s
36

ah gotcha

oontvoo updated this revision to Diff 373265.Sep 17 2021, 9:59 AM

more minor formatting + rebase

thanks for the review!

This revision was landed with ongoing or failed builds.Sep 17 2021, 10:01 AM
This revision was automatically updated to reflect the committed changes.