Usage: -bundle_loader <executable>
This option specifies the executable that will load the build output file being linked.
When building a bundle, users can use the --bundle_loader to specify an executable that contains symbols referenced, but not implemented in the bundle.
Details
- Reviewers
jyknight int3 - Group Reviewers
Restricted Project - Commits
- rG5a856f5b4499: Reland [lld-macho]Implement bundle_loader
rG1a0afcf51871: Implement -bundle_loader
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Please prefix the title of the commit with "[lld-macho]" so that it's easier to understand what bundle_loader is being implemented in.
And, I'm not an expert on this codebase, so while I have some comments, I'm also adding a code-owner of lld macho to the review.
lld/MachO/Driver.cpp | ||
---|---|---|
801 | Why is this check before opening needed? | |
lld/MachO/InputFiles.cpp | ||
626 | As discussed via chat, you'll need to make sure that the proper "EXECUTABLE_ORDINAL" ordinal gets used, rather than emitting a LOAD_DYLIB with an empty dylib name. | |
lld/test/MachO/bundle_loader-darwin.test | ||
2 | The tests for LLD shouldn't invoke clang; this will need to use assembly, and invoke llvm-mc and lld, instead. Also, for this sort of multi-input test, the "split-file" util will be better than a bunch of "echo" commands. See e.g. lld/test/MachO/archive.s for examples of how you might structure it. |
Some context (in the commit message and/or comments) on what semantics -bundle_loader is supposed to achieve would be helpful :)
Could you add more detail to the commit message? Assume the reader (like me) has general knowledge of linkers and Mach-O but no knowledge of what specifying a bundle loader does. What kind of restrictions does this impose on the binary / what kind of optimizations does it allow / how is the current implementation falling short? The other commits in lld-macho should give you a flavor of how detailed things should be.
If there are things in the diff that you are uncertain about (e.g. I see the test is not quite done yet) do add it to the message as well (or in a comment, that works too).
Updated test. Also added more comments to the test to explain what bundle_loader is used for
lld/MachO/Driver.cpp | ||
---|---|---|
328–329 | That an error message is emitted here but not in the above call to loadDylib confuses me. I don't know if that's because it's extraneous here or missing above, but it seems like one or the other must be wrong. | |
801 | Usually failing on actual open is sufficient. Also, the call to addFile right below this doesn't do so -- so I'd expect that either that call is missing such a check, or this one is extraneous. | |
lld/MachO/SyntheticSections.cpp | ||
792 | I'd expect to see the code that sets ->ordinal modified to set it to EXECUTABLE_ORDINAL, rather than the code that uses ->ordinal modified to override it. | |
lld/test/MachO/bundle_loader-darwin.test | ||
20–21 | Is the dylib necessary in this test? It's weird to link the same "2.o" file into both a dylib and the main binary. And, assuming not, I think my_func definition can just be moved into main.s, eliminating 2.s. | |
37 | There's no need for the test to be a correctly-executing function -- you can delete all the asm except the call (in order to have the needed symbol reference) or ret (just so there's at least an instruction in the function). |
updated diff to address comments
lld/MachO/Driver.cpp | ||
---|---|---|
328–329 | I've removed it to be consistent with the others... | |
lld/MachO/SyntheticSections.cpp | ||
792 | Lookin at the code, I think DylibFile::ordinal means a different thing than "ordinal" in EXECUTABLE_ORDINAL. Also, the line right above, which sets the WEAK_REF flag also didn't set it earlier. | |
lld/test/MachO/bundle_loader-darwin.test | ||
20–21 | Yes, I think the point of the test to show that even though both (the dylib and the executable) have the symbol, the output bundle gets the symbol from the executable |
lld/MachO/SyntheticSections.cpp | ||
---|---|---|
792 | pls ignore the comment above(should've been discarded) . |
lld/MachO/Driver.cpp | ||
---|---|---|
323 | nit | |
324 | ultra nit: here you refer to it as bundle_loader, in Writer.cpp you refer to it as bundle-loader. Personally I think it reads more naturally with a space in between, but anything is fine as long as it's consistent :P | |
325–326 | this should be an assert() -- error()s are things that can arise from user input | |
798 | OPT_bundle_loader should be handled using getLastArg (like the other options in lines 735-773 above) instead of in this loop. This loop should be reserved for options whose values can be specified multiple times. | |
799–800 | should test this | |
lld/MachO/InputFiles.cpp | ||
628 | No need for llvm::. Also this should probably live in Writer.cpp (see comment there) | |
632–634 | I don't think this change is necessary, unless bundle loaders can somehow be sub-libraries | |
lld/MachO/InputFiles.h | ||
144 | this needs a detailed comment. I think it's pretty surprising that we're making DylibFiles out of bundles and executables, which are clearly not dylibs, but I get why -- they serve the same role here as binary interfaces whose contents do not appear in the final output. Still, surprising behavior should always be explained. (Would also be good to preface it with some explanation of what a bundle loader is, i.e. the stuff you have written in the commit message.) | |
lld/MachO/SyntheticSections.cpp | ||
269–270 | since EXECUTABLE_ORDINAL is larger than BIND_IMMEDIATE_MASK, this seems unnecessary | |
lld/MachO/Writer.cpp | ||
506–507 ↗ | (On Diff #321878) | I think we should assign ordinal = EXECUTABLE_ORDINAL here instead of in InputFiles.cpp. It's kind of weird to have ordinal set to its final value for bundle loaders at the parsing phase while all other ordinal values get set at the writing phase. |
lld/test/MachO/bundle_loader-darwin.test | ||
9–18 | ah I guess you were following archive.s, but most of our other tests don't do these smoke checks, I think they make things unnecessarily verbose. We're not testing llvm-mc here, we can just assume that the object files are correct | |
20 | the three "Produce" comments in this file aren't really necessary -- it's easy enough to see what the commands are doing | |
27 | -dynamic isn't necessary (it's currently a no-op in LLD, and even if it weren't, it's implied by -bundle) I don't think -demangle is necessary either | |
27 | Also, I guess if we created bundle.bundle from 3.o and 2.o (instead of mylib.dylib) then my_func would come from 2.o, right? Could we have a test for that too? | |
30 | doesn't seem like the grep is necessary | |
30 | I would prefer we use llvm-objdump --macho --bind instead of llvm-nm here -- it should provide a more "raw" view of the file data since it's Mach-O-specific and isn't trying to shoehorn multiple file formats into one output format | |
37 | nit: most test files use two-space indents | |
43 | ||
53 |
lld/MachO/InputFiles.cpp | ||
---|---|---|
666–670 | it would be ideal if this TBD code path were tested as well. We have some example TBD files e.g. at test/MachO/Inputs/MacOSX.sdk/usr/lib/libSystem.tbd. But I'm fine with having this added later so long as you leave a TODO. | |
lld/MachO/InputFiles.h | ||
128–130 | would be good to assert(!isBundleLoader || !umbrella). Or just create a makeBundleLoader function that doesn't take an umbrella. |
addressed comments
lld/MachO/Driver.cpp | ||
---|---|---|
325–326 | This was because some tests was expecting this error upon seeing executable file being passed here. (It was a correct error, except for the case where the executable being the bundle-loader) | |
lld/MachO/InputFiles.cpp | ||
666–670 | Added a FIXME for this case. | |
lld/test/MachO/bundle_loader-darwin.test | ||
30 | llvm-objdump didn't seem to produce anything useful to assert on for me: llvm-objdump-9 --macho --bind /mnt/ssd/repo/llvm-project/build/tools/lld/test/MachO/Output/bundle_loader-darwin.test.tmp/bundle.bundle /mnt/ssd/repo/llvm-project/build/tools/lld/test/MachO/Output/bundle_loader-darwin.test.tmp/bundle.bundle: Bind table: segment section address type addend dylib symbol __DATA_CONST __got 0x00002000 pointer 0 libSystem dyld_stub_binder |
lld/MachO/Driver.cpp | ||
---|---|---|
325–326 | aha I see. | |
lld/MachO/InputFiles.h | ||
144 | Thanks for adding the comment. However, while it explains what a bundle loader is, it does not talk about why we are creating a DylibFile from things that are not dylibs, which was the main point of my comment :) | |
145 | this line is unnecessary (that much is clear from the variable name) | |
lld/test/MachO/bundle_loader-darwin.test | ||
2 | to follow the other tests also can you name this test bundle-loader.s? I.e. don't use underscores, no need to mention darwin (all the tests here are for darwin), and since the test inputs are assembly, the extension should match. | |
15 |
| |
30 | Sorry, it should have been --lazy-bind, since _main is a function and all functions are lazily bound. Anyway I tried it locally and ran into this error: (base) test/MachO: llvm-objdump --macho --lazy-bind /Users/jezng/src/llvm-project/build/release/tools/lld/test/MachO/Output/bundle_loader-darwin.test.tmp/bundle2.bundle /Users/jezng/src/llvm-project/build/release/tools/lld/test/MachO/Output/bundle_loader-darwin.test.tmp/bundle2.bundle: Lazy bind table: segment section address dylib symbol llvm-objdump: error: '/Users/jezng/src/llvm-project/build/release/tools/lld/test/MachO/Output/bundle_loader-darwin.test.tmp/bundle2.bundle': truncated or malformed object (for BIND_OPCODE_SET_DYLIB_ORDINAL_ULEB bad library ordinal: 255 (max 1) for opcode at: 0x2) I wasn't sure if this was a problem in our output or in llvm-objdump, so I tried replacing %lld in the test with ld to see what ld64 was emitting. But it appears that ld64 isn't even emitting any references to _main in bundle2.bundle: (base) test/MachO: llvm-nm -m /Users/jezng/src/llvm-project/build/release/tools/lld/test/MachO/Output/bundle_loader-darwin.test.tmp/bundle2.bundle (undefined) external dyld_stub_binder (from libSystem) 0000000000003fb7 (__TEXT,__text) external my_func 0000000000003fb1 (__TEXT,__text) non-external my_user I was testing this on Catalina 10.15.7 with ld64-609.7. Does the behavior differ on your system? |
lld/MachO/InputFiles.cpp | ||
---|---|---|
632–634 | No, it's not technically needed. I just thought it should be made a bit clearer (Otherwise, it was calling isImplicitlyLinked() on an empty dylibName). WDYT? | |
lld/MachO/Writer.cpp | ||
485 ↗ | (On Diff #321912) | @int3 I think this needs to emit the __mh_bundle_header here too? (not because of this patch, but it seems it's currently lacking that) |
lld/test/MachO/bundle_loader-darwin.test | ||
30 | Ah, I think the error was because the EXECUTABLE_ORDINAL should not have been emitted. (really, the point of setting that EXECUTABLE_ORDINAL was for llvm-nm to know to emit "from executable" when it prints the symbol). Anyways, I've fixed that and updated the tests to also use llvm-objdump. About` _main`, yeah, maybe there needs to be a special case where _main doesn't get emitted? |
lld/MachO/SyntheticSections.cpp | ||
---|---|---|
272 | I think this needs to emit BIND_OPCODE_SET_DYLIB_SPECIAL_IMM when ordinal < 0, instead of checking isBundleLoader and not outputting any command. I guess the default is executable, perhaps? So the test works if it's the first symbol, but if there was a symbol refering to another lib first, then this would incorrectly remain pointing to that other library since no command is emitted. | |
789 | Probably should explicitly convert back to EXECUTABLE_ORDINAL here -- it's working anyways because BIND_SPECIAL_DYLIB_MAIN_EXECUTABLE is -1, which converts to 0xff, but that seems kinda sketchy. |
lld/MachO/InputFiles.cpp | ||
---|---|---|
632–634 | I think it's confusing either way... I'm actually wondering if it we should have separate DylibFile and BundleLoader classes that inherit from some common base class. Not asking you to do it in this diff though. But for now I would prefer we just remove the isBundleLoader check here | |
lld/MachO/Writer.cpp | ||
485 ↗ | (On Diff #321912) | yup I'm aware it's a missing feature |
504 ↗ | (On Diff #322895) | hm why did this change from EXECUTABLE_ORDINAL to BIND_SPECIAL_DYLIB_MAIN_EXECUTABLE? Also, shouldn't there be a continue; here since I assume we don't want to emit an LC_LOAD_DYLIB for this? (Can you add a test that would catch this issue?) also, no need for llvm:: |
lld/test/MachO/bundle_loader-darwin.test | ||
---|---|---|
16 | I think this comment is kinda redundant | |
18 | Since the address is not relevant to the test, let's not match on it (to make future changes to this file easier) I was also confused for a bit about how this was testing for EXECUTABLE_ORDINAL. Then I realized that the lack of dylib name in the output indicates that it's not from a dylib. I think it would be clearer if you included the segment/section/address etc header before this line. |
updated diff
lld/MachO/InputFiles.cpp | ||
---|---|---|
632–634 | Done. That makes sense. | |
lld/MachO/Writer.cpp | ||
504 ↗ | (On Diff #322895) |
These two are kind of the same thing (The former being 0xff and the latter -1). BIND_SPECIAL_DYLIB_MAIN_EXECUTABLE is the right thing to use when encoding because it's consistent with others BIND_* things.
No, I think it doesn't "reexport" the symbols but it still needs the load_dylib for the symbols that are referenced.
Done |
lld/test/MachO/bundle_loader-darwin.test | ||
18 |
It's a bit clearer to use llvm-nm to test for this. (because it'd say "from executable")
Right.
done |
Okay, this seems like it's almost there. Thanks for putting up with all the nitty comments and explaining the details, I know it's taken a while :) Once the remaining comments are addressed I'll stamp this. Thank you!
lld/MachO/SyntheticSections.cpp | ||
---|---|---|
270 | hmm doesn't BIND_OPCODE_SET_DYLIB_SPECIAL_IMM encode its argument as an immediate rather than in a trailing ULEB? | |
lld/MachO/Writer.cpp | ||
504 ↗ | (On Diff #322895) |
Ah, I see... the encoded ordinal value isn't actually a signed integer, but an unsigned byte with a few special values.
Got it, thanks! |
lld/test/MachO/bundle_loader-darwin.test | ||
18 |
Fair. Though I'm actually not sure if llvm-nm is getting that information from the bind table or from the symbol table. (I think it's actually the symbol table.) |
Updated diff:
- fixed BIND_OPCODE_SET_DYLIB_SPECIAL_IMM encoding
- rename test to bundle-loader.s
No worries. Thanks for taking the time to review this!
One more thing, some tests started to fail after the signed => unsigned types change ... I must've missed something obvious ... thoughts?
lld/MachO/SyntheticSections.cpp | ||
---|---|---|
270 |
Whoops, yes - it should be. (fixed now). | |
lld/test/MachO/bundle_loader-darwin.test | ||
18 |
Yeah, it gets it from the symbol table[0] - Specifically, it looks at the nlist_64::n_desc field and prints from executable for ordinal-exec [1] [0] https://github.com/llvm/llvm-project/blob/6c05005238a805a699d9dec39a61971affd1cab4/llvm/tools/llvm-nm/llvm-nm.cpp#L414 |
updated diff:
- removed accidental incr on the dylibOrdinal (twice)
- added missing handling of the BIND_OPCODE_SET_DYLIB_SPECIAL_IMM
P.S: never mind - I'd forgot to also handle the special_imm case in LazyBindingSection::encode().
(resolved by refactoring the code to use encodeDylibOrdinal)
lld/test/MachO/bundle-loader.s | ||
---|---|---|
1 ↗ | (On Diff #324062) | This currently works because split-file is being used, which means the uncommented line doesn't get passed to llvm-mc, but we should still comment it out for uniformity |
13 ↗ | (On Diff #324062) | missing colon + consider not adding the _CHECK suffix which is kinda redundant. Also I would suggest using hyphens instead of underscores as separators, since that's what llvm-lit uses for its native prefixes (e.g. CHECK-NEXT). |
lld/test/MachO/bundle_loader-darwin.test | ||
18 | Great, we're testing both outputs then :) |
http://lab.llvm.org:8011/#/builders/5/builds/4761
/b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/MachO/Writer.cpp:269:17: runtime error: null pointer passed as argument 2, which is declared to never be null /usr/include/string.h:43:28: note: nonnull attribute specified here #0 0x50b3cf2 in (anonymous namespace)::LCDylib::writeTo(unsigned char*) const /b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/MachO/Writer.cpp:269:5 #1 0x5081da7 in lld::macho::MachHeaderSection::writeTo(unsigned char*) const /b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/MachO/SyntheticSections.cpp:89:9 #2 0x509fea5 in writeSections /b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/MachO/Writer.cpp:761:13 #3 0x509fea5 in (anonymous namespace)::Writer::run() /b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/MachO/Writer.cpp:815:3 #4 0x509dab0 in lld::macho::writeResult() /b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/MachO/Writer.cpp:822:38 #5 0x505a851 in lld::macho::link(llvm::ArrayRef<char const*>, bool, llvm::raw_ostream&, llvm::raw_ostream&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/MachO/Driver.cpp:921:3 #6 0x4b35e25 in lldMain(int, char const**, llvm::raw_ostream&, llvm::raw_ostream&, bool) /b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/tools/lld/lld.cpp:159:13 #7 0x4b385b3 in operator() /b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/tools/lld/lld.cpp:180:15 #8 0x4b385b3 in void llvm::function_ref<void ()>::callback_fn<lld::safeLldMain(int, char const**, llvm::raw_ostream&, llvm::raw_ostream&)::$_0>(long) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/ADT/STLExtras.h:185:12 #9 0x4b5dc82 in operator() /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/ADT/STLExtras.h:209:12 #10 0x4b5dc82 in llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Support/CrashRecoveryContext.cpp:424:3 #11 0x4b357df in lld::safeLldMain(int, char const**, llvm::raw_ostream&, llvm::raw_ostream&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/tools/lld/lld.cpp:179:14 #12 0x4b359b6 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/tools/lld/lld.cpp:222:14 #13 0x7f16b4e9d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) #14 0x4b121f9 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/lld+0x4b121f9) SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /b/sanitizer-x86_64-linux-fast/build/llvm-project/lld/MachO/Writer.cpp:269:17 in
https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild
For particular this case I'd just rerun check-lld with "assert()/if (is null) abort()" in the place of already known error.
@oontvoo this is what I do using cmake:
llvm-project/build/asan: cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS='lld' -DLLVM_USE_SANITIZER='Address;Undefined' ../../llvm llvm-project/build/asan: ninja check-lld-macho
Updated diff: fix ubsan bug. The executable didn't have a dylib path/name.
lld/MachO/Writer.cpp | ||
---|---|---|
504 ↗ | (On Diff #322895) | Actually, I think you were right about this. The bundle-loader also didn't need this extra load command. (I've jsut retried the test case with ld64, and it skipped this.) |
lld/MachO/Writer.cpp | ||
---|---|---|
504 ↗ | (On Diff #322895) | P.S: updated test was: llvm-objdump --macho --lazy-bind %t/bundle.bundle | FileCheck %s --check-prefix BUNDLE-OBJ segment section address dylib symbol __DATA __la_symbol_ptr 0x{{[0-9a-f]*}} main-executable my_fun Will sleep on this reland it on Monday, just in case. |
nit