Page MenuHomePhabricator

[lld-macho] Implement -bundle_loader
ClosedPublic

Authored by oontvoo on Feb 2 2021, 6:27 PM.

Details

Summary

Usage: -bundle_loader <executable>
This option specifies the executable that will load the build output file being linked.
When building a bundle, users can use the --bundle_loader to specify an executable that contains symbols referenced, but not implemented in the bundle.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
int3 added inline comments.Feb 5 2021, 4:13 PM
lld/MachO/InputFiles.cpp
666–672

it would be ideal if this TBD code path were tested as well. We have some example TBD files e.g. at test/MachO/Inputs/MacOSX.sdk/usr/lib/libSystem.tbd. But I'm fine with having this added later so long as you leave a TODO.

lld/MachO/InputFiles.h
128–130

would be good to assert(!isBundleLoader || !umbrella). Or just create a makeBundleLoader function that doesn't take an umbrella.

oontvoo updated this revision to Diff 321912.Feb 5 2021, 6:49 PM
oontvoo marked 12 inline comments as done.

addressed comments

lld/MachO/Driver.cpp
333–334

This was because some tests was expecting this error upon seeing executable file being passed here. (It was a correct error, except for the case where the executable being the bundle-loader)

lld/MachO/InputFiles.cpp
666–672

Added a FIXME for this case.

lld/test/MachO/bundle_loader-darwin.test
29 ↗(On Diff #321878)

llvm-objdump didn't seem to produce anything useful to assert on for me:

llvm-objdump-9 --macho --bind /mnt/ssd/repo/llvm-project/build/tools/lld/test/MachO/Output/bundle_loader-darwin.test.tmp/bundle.bundle
/mnt/ssd/repo/llvm-project/build/tools/lld/test/MachO/Output/bundle_loader-darwin.test.tmp/bundle.bundle:
Bind table:
segment  section            address    type       addend dylib            symbol
__DATA_CONST __got              0x00002000 pointer         0 libSystem        dyld_stub_binder
int3 added inline comments.Feb 5 2021, 11:47 PM
lld/MachO/Driver.cpp
333–334

aha I see.

lld/MachO/InputFiles.h
144

Thanks for adding the comment. However, while it explains what a bundle loader is, it does not talk about why we are creating a DylibFile from things that are not dylibs, which was the main point of my comment :)

145

this line is unnecessary (that much is clear from the variable name)

lld/test/MachO/bundle_loader-darwin.test
1 ↗(On Diff #321912)

to follow the other tests

also can you name this test bundle-loader.s? I.e. don't use underscores, no need to mention darwin (all the tests here are for darwin), and since the test inputs are assembly, the extension should match.

14 ↗(On Diff #321912)
  1. use hyphen to follow other tests
  2. missing colon
29 ↗(On Diff #321878)

Sorry, it should have been --lazy-bind, since _main is a function and all functions are lazily bound.

Anyway I tried it locally and ran into this error:

(base) test/MachO: llvm-objdump --macho --lazy-bind /Users/jezng/src/llvm-project/build/release/tools/lld/test/MachO/Output/bundle_loader-darwin.test.tmp/bundle2.bundle
/Users/jezng/src/llvm-project/build/release/tools/lld/test/MachO/Output/bundle_loader-darwin.test.tmp/bundle2.bundle:
Lazy bind table:
segment  section            address     dylib            symbol
llvm-objdump: error: '/Users/jezng/src/llvm-project/build/release/tools/lld/test/MachO/Output/bundle_loader-darwin.test.tmp/bundle2.bundle': truncated or malformed object (for BIND_OPCODE_SET_DYLIB_ORDINAL_ULEB bad library ordinal: 255 (max 1) for opcode at: 0x2)

I wasn't sure if this was a problem in our output or in llvm-objdump, so I tried replacing %lld in the test with ld to see what ld64 was emitting. But it appears that ld64 isn't even emitting any references to _main in bundle2.bundle:

(base) test/MachO: llvm-nm -m /Users/jezng/src/llvm-project/build/release/tools/lld/test/MachO/Output/bundle_loader-darwin.test.tmp/bundle2.bundle
                 (undefined) external dyld_stub_binder (from libSystem)
0000000000003fb7 (__TEXT,__text) external my_func
0000000000003fb1 (__TEXT,__text) non-external my_user

I was testing this on Catalina 10.15.7 with ld64-609.7. Does the behavior differ on your system?

oontvoo updated this revision to Diff 322269.Feb 8 2021, 7:29 PM
oontvoo marked 10 inline comments as done.

updated diff

Herald added a project: Restricted Project. · View Herald TranscriptFeb 8 2021, 7:29 PM
oontvoo added inline comments.Feb 8 2021, 7:29 PM
lld/MachO/InputFiles.cpp
633–634

No, it's not technically needed. I just thought it should be made a bit clearer (Otherwise, it was calling isImplicitlyLinked() on an empty dylibName). WDYT?

lld/MachO/Writer.cpp
489

@int3 I think this needs to emit the __mh_bundle_header here too?

(not because of this patch, but it seems it's currently lacking that)

lld/test/MachO/bundle_loader-darwin.test
29 ↗(On Diff #321878)

Ah, I think the error was because the EXECUTABLE_ORDINAL should not have been emitted. (really, the point of setting that EXECUTABLE_ORDINAL was for llvm-nm to know to emit "from executable" when it prints the symbol).

Anyways, I've fixed that and updated the tests to also use llvm-objdump.

About` _main`, yeah, maybe there needs to be a special case where _main doesn't get emitted?

jyknight added inline comments.Feb 10 2021, 11:22 AM
lld/MachO/SyntheticSections.cpp
277

I think this needs to emit BIND_OPCODE_SET_DYLIB_SPECIAL_IMM when ordinal < 0, instead of checking isBundleLoader and not outputting any command.

I guess the default is executable, perhaps? So the test works if it's the first symbol, but if there was a symbol refering to another lib first, then this would incorrectly remain pointing to that other library since no command is emitted.

799–804

Probably should explicitly convert back to EXECUTABLE_ORDINAL here -- it's working anyways because BIND_SPECIAL_DYLIB_MAIN_EXECUTABLE is -1, which converts to 0xff, but that seems kinda sketchy.

oontvoo updated this revision to Diff 322839.Feb 10 2021, 3:20 PM

changed ordinal types to signed type

oontvoo marked 2 inline comments as done.Feb 10 2021, 3:20 PM
oontvoo updated this revision to Diff 322895.Feb 10 2021, 7:08 PM
oontvoo marked an inline comment as done.

rebase

int3 added inline comments.Feb 11 2021, 4:51 PM
lld/MachO/InputFiles.cpp
633–634

I think it's confusing either way...

I'm actually wondering if it we should have separate DylibFile and BundleLoader classes that inherit from some common base class. Not asking you to do it in this diff though. But for now I would prefer we just remove the isBundleLoader check here

lld/MachO/Writer.cpp
489

yup I'm aware it's a missing feature

504

hm why did this change from EXECUTABLE_ORDINAL to BIND_SPECIAL_DYLIB_MAIN_EXECUTABLE?

Also, shouldn't there be a continue; here since I assume we don't want to emit an LC_LOAD_DYLIB for this? (Can you add a test that would catch this issue?)

also, no need for llvm::

int3 added inline comments.Feb 11 2021, 4:54 PM
lld/test/MachO/bundle_loader-darwin.test
15 ↗(On Diff #322895)

I think this comment is kinda redundant

17 ↗(On Diff #322895)

Since the address is not relevant to the test, let's not match on it (to make future changes to this file easier)

I was also confused for a bit about how this was testing for EXECUTABLE_ORDINAL. Then I realized that the lack of dylib name in the output indicates that it's not from a dylib. I think it would be clearer if you included the segment/section/address etc header before this line.

oontvoo updated this revision to Diff 323511.Feb 12 2021, 7:31 PM
oontvoo marked 5 inline comments as done.

updated diff

lld/MachO/InputFiles.cpp
633–634

Done. That makes sense.

lld/MachO/Writer.cpp
504

hm why did this change from EXECUTABLE_ORDINAL to BIND_SPECIAL_DYLIB_MAIN_EXECUTABLE?

These two are kind of the same thing (The former being 0xff and the latter -1). BIND_SPECIAL_DYLIB_MAIN_EXECUTABLE is the right thing to use when encoding because it's consistent with others BIND_* things.
Also (from looking at what ld64 does), it's more convenient to set BIND_SPECIAL_DYLIB_MAIN_EXECUTABLE here so that you can have the encodeDylibOrdinal function do the right thing for the special cases where ordinal <=0. ( 0 is for self, and -1 is for dylib-main-exec)

Also, shouldn't there be a continue; here since I assume we don't want to emit an LC_LOAD_DYLIB for this? (Can you add a test that would catch this issue?)

No, I think it doesn't "reexport" the symbols but it still needs the load_dylib for the symbols that are referenced.
(otherwise, we're missing those. And the test is that when you do llvm-objdump, you'd get an error for "truncated or malformed object")

also, no need for llvm::

Done

lld/test/MachO/bundle_loader-darwin.test
17 ↗(On Diff #322895)

I was also confused for a bit about how this was testing for EXECUTABLE_ORDINAL.

It's a bit clearer to use llvm-nm to test for this. (because it'd say "from executable")

Then I realized that the lack of dylib name in the output indicates that it's not from a dylib.

Right.

I think it would be clearer if you included the segment/section/address etc header before this line.

done

int3 added a comment.Feb 15 2021, 11:44 AM

Okay, this seems like it's almost there. Thanks for putting up with all the nitty comments and explaining the details, I know it's taken a while :) Once the remaining comments are addressed I'll stamp this. Thank you!

lld/MachO/SyntheticSections.cpp
270

hmm doesn't BIND_OPCODE_SET_DYLIB_SPECIAL_IMM encode its argument as an immediate rather than in a trailing ULEB?

lld/MachO/Writer.cpp
504

These two are kind of the same thing (The former being 0xff and the latter -1).

Ah, I see... the encoded ordinal value isn't actually a signed integer, but an unsigned byte with a few special values.

No, I think it doesn't "reexport" the symbols but it still needs the load_dylib for the symbols that are referenced.

Got it, thanks!

lld/test/MachO/bundle_loader-darwin.test
17 ↗(On Diff #322895)

It's a bit clearer to use llvm-nm to test for this. (because it'd say "from executable")

Fair. Though I'm actually not sure if llvm-nm is getting that information from the bind table or from the symbol table. (I think it's actually the symbol table.)

oontvoo updated this revision to Diff 324044.Feb 16 2021, 10:03 AM
oontvoo marked an inline comment as done.

Updated diff:

  • fixed BIND_OPCODE_SET_DYLIB_SPECIAL_IMM encoding
  • rename test to bundle-loader.s

Okay, this seems like it's almost there. Thanks for putting up with all the nitty comments and explaining the details, I know it's taken a while :) Once the remaining comments are addressed I'll stamp this. Thank you!

No worries. Thanks for taking the time to review this!

One more thing, some tests started to fail after the signed => unsigned types change ... I must've missed something obvious ... thoughts?

lld/MachO/SyntheticSections.cpp
270

hmm doesn't BIND_OPCODE_SET_DYLIB_SPECIAL_IMM encode its argument as an immediate rather than in a trailing ULEB?

Whoops, yes - it should be. (fixed now).

lld/test/MachO/bundle_loader-darwin.test
17 ↗(On Diff #322895)

Fair. Though I'm actually not sure if llvm-nm is getting that information from the bind table or from the symbol table. (I think it's actually the symbol table.)

Yeah, it gets it from the symbol table[0] - Specifically, it looks at the nlist_64::n_desc field and prints from executable for ordinal-exec [1]

[0] https://github.com/llvm/llvm-project/blob/6c05005238a805a699d9dec39a61971affd1cab4/llvm/tools/llvm-nm/llvm-nm.cpp#L414
[1] https://github.com/llvm/llvm-project/blob/6c05005238a805a699d9dec39a61971affd1cab4/llvm/tools/llvm-nm/llvm-nm.cpp#L613

oontvoo updated this revision to Diff 324053.Feb 16 2021, 10:50 AM

updated diff:

  • removed accidental incr on the dylibOrdinal (twice)
  • added missing handling of the BIND_OPCODE_SET_DYLIB_SPECIAL_IMM
oontvoo updated this revision to Diff 324062.Feb 16 2021, 11:19 AM

updated diff: refactored code to reuse encodeDylibOrdinal

One more thing, some tests started to fail after the signed => unsigned types change ... I must've missed something obvious ... thoughts?

P.S: never mind - I'd forgot to also handle the special_imm case in LazyBindingSection::encode().
(resolved by refactoring the code to use encodeDylibOrdinal)

int3 accepted this revision.Feb 18 2021, 1:00 PM
int3 added inline comments.
lld/test/MachO/bundle-loader.s
2

This currently works because split-file is being used, which means the uncommented line doesn't get passed to llvm-mc, but we should still comment it out for uniformity

14

missing colon + consider not adding the _CHECK suffix which is kinda redundant. Also I would suggest using hyphens instead of underscores as separators, since that's what llvm-lit uses for its native prefixes (e.g. CHECK-NEXT).

lld/test/MachO/bundle_loader-darwin.test
17 ↗(On Diff #322895)

Great, we're testing both outputs then :)

This revision is now accepted and ready to land.Feb 18 2021, 1:00 PM
oontvoo updated this revision to Diff 324749.Feb 18 2021, 1:10 PM
oontvoo marked 3 inline comments as done and an inline comment as not done.

updated all the check prefixes

This revision was landed with ongoing or failed builds.Feb 18 2021, 1:12 PM
This revision was automatically updated to reflect the committed changes.

http://lab.llvm.org:8011/#/builders/5/builds/4761

/b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/MachO/Writer.cpp:269:17: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:43:28: note: nonnull attribute specified here
    #0 0x50b3cf2 in (anonymous namespace)::LCDylib::writeTo(unsigned char*) const /b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/MachO/Writer.cpp:269:5
    #1 0x5081da7 in lld::macho::MachHeaderSection::writeTo(unsigned char*) const /b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/MachO/SyntheticSections.cpp:89:9
    #2 0x509fea5 in writeSections /b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/MachO/Writer.cpp:761:13
    #3 0x509fea5 in (anonymous namespace)::Writer::run() /b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/MachO/Writer.cpp:815:3
    #4 0x509dab0 in lld::macho::writeResult() /b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/MachO/Writer.cpp:822:38
    #5 0x505a851 in lld::macho::link(llvm::ArrayRef<char const*>, bool, llvm::raw_ostream&, llvm::raw_ostream&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/MachO/Driver.cpp:921:3
    #6 0x4b35e25 in lldMain(int, char const**, llvm::raw_ostream&, llvm::raw_ostream&, bool) /b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/tools/lld/lld.cpp:159:13
    #7 0x4b385b3 in operator() /b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/tools/lld/lld.cpp:180:15
    #8 0x4b385b3 in void llvm::function_ref<void ()>::callback_fn<lld::safeLldMain(int, char const**, llvm::raw_ostream&, llvm::raw_ostream&)::$_0>(long) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/ADT/STLExtras.h:185:12
    #9 0x4b5dc82 in operator() /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/ADT/STLExtras.h:209:12
    #10 0x4b5dc82 in llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:424:3
    #11 0x4b357df in lld::safeLldMain(int, char const**, llvm::raw_ostream&, llvm::raw_ostream&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/tools/lld/lld.cpp:179:14
    #12 0x4b359b6 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/tools/lld/lld.cpp:222:14
    #13 0x7f16b4e9d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
    #14 0x4b121f9 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/lld+0x4b121f9)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/MachO/Writer.cpp:269:17 in
vitalybuka reopened this revision.Feb 19 2021, 5:41 PM
This revision is now accepted and ready to land.Feb 19 2021, 5:41 PM

@vitalybuka How do I rerun the test with uban enabled?
Thanks!

@vitalybuka How do I rerun the test with uban enabled?
Thanks!

https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild

For particular this case I'd just rerun check-lld with "assert()/if (is null) abort()" in the place of already known error.

int3 added a comment.Feb 19 2021, 7:14 PM

@oontvoo this is what I do using cmake:

llvm-project/build/asan: cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS='lld' -DLLVM_USE_SANITIZER='Address;Undefined' ../../llvm
llvm-project/build/asan: ninja check-lld-macho
oontvoo updated this revision to Diff 325272.Feb 20 2021, 7:41 PM

Updated diff: fix ubsan bug. The executable didn't have a dylib path/name.

lld/MachO/Writer.cpp
504

Actually, I think you were right about this. The bundle-loader also didn't need this extra load command. (I've jsut retried the test case with ld64, and it skipped this.)
This was also why we hit the ubsan one (the path name is going to be empty for executable).

oontvoo added inline comments.Feb 20 2021, 7:45 PM
lld/MachO/Writer.cpp
504

P.S: updated test was:

llvm-objdump  --macho --lazy-bind %t/bundle.bundle | FileCheck %s --check-prefix BUNDLE-OBJ
segment   section                    address        dylib                          symbol
 __DATA    __la_symbol_ptr     0x{{[0-9a-f]*}}    main-executable       my_fun

Will sleep on this reland it on Monday, just in case.