Implement pass 3 of bind opcodes from ld64 (which supports both 32-bit and 64-bit).
Pass 3 implementation condenses BIND_OPCODE_DO_BIND_ADD_ADDR_ULEB opcode
to BIND_OPCODE_DO_BIND_ADD_ADDR_IMM_SCALED. This change is already behind an
O2 flag so it shouldn't impact current performance. I verified ld64's output with x86_64 LLD
and they were both emitting the same optimized bind opcodes (although in a slightly different
order). Tested with arm64_32 LLD and compared that with x86 LLD that the order of the bind
opcodes are the same (offset values are different which should be expected).
Details
- Reviewers
int3 gkm MaskRay - Group Reviewers
Restricted Project - Commits
- rG33ab995617d0: Recommit "[lld-macho] Use DO_BIND_ADD_ADDR_IMM_SCALED for bind opcodes"
rG321b2bef0985: [lld-macho] Use DO_BIND_ADD_ADDR_IMM_SCALED for bind opcodes
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
lld/MachO/SyntheticSections.cpp | ||
---|---|---|
367–368 | can we have a comment explaining how BIND_OPCODE_DO_BIND_ADD_ADDR_IMM_SCALED works, and where the 15 and sizeof(uint64_t) is coming from? |
lld/MachO/SyntheticSections.cpp | ||
---|---|---|
367 | I believe ld64 switches off the type based on whether it operates on a 32-bit or a 64-bit. I didn't get a chance to verify it because LLD doesn't seem to support i386 (https://github.com/llvm/llvm-project/blob/main/lld/MachO/Driver.cpp#L700-L711). |
lld/MachO/SyntheticSections.cpp | ||
---|---|---|
367–368 | That was quick! Will add a comment about the 15. As far as the sizeof(uint64_t), I wrote a comment above describing that situation. |
lld/MachO/SyntheticSections.cpp | ||
---|---|---|
367 | I'm confused as to why LLD not supporting i386 matters for testing ld64's behavior. llvm-mc can emit i386 object files that we can pass to ld64... |
lld/MachO/SyntheticSections.cpp | ||
---|---|---|
367 | I get the following error when trying to pass an i386 object file to LLD using x86_64 arch. ld64.lld: error: /Users/leevince/local/llvm-project/build/Debug/tools/lld/test/MachO/Output/bind-opcodes.s.tmp/foo.o has architecture i386 which is incompatible with target architecture x86_64. I'm unsure how to go about this without having to support i386. |
lld/MachO/SyntheticSections.cpp | ||
---|---|---|
367 | Spoken offline - tested 32-bit arch using arm64_32, and used that as comparison with x86_64. | |
367–368 | I switched the use of 15 to BIND_IMMEDIATE_MASK which I think should be enough to cover adding the extra comment, but I'm happy to add it if it's still confusing to readers. | |
lld/test/MachO/bind-opcodes.s | ||
1 | This file is pretty messy and hard to read. So I'll try to condense what I did here. |
lld/MachO/SyntheticSections.cpp | ||
---|---|---|
365–366 | the ternary is unnecessary, offsetWidth is just target->wordSize :) also, I think offsetWidth is kind of a misleading name... pointerSize is probably more apt. Or we could just use target->wordSize directly. I think I understand the motivation behind this opcode design: Since every binding is the size of one pointer, the next binding must be at least wordSize away. Most likely it's some multiple of wordSize away (if there are multiple intervening pointers). Hence the scaling by pointer size. (Might be worth to write something like this as a comment) | |
369 | how about p->data / offsetWidth <= BIND_IMMEDIATE_MASK, to mirror the assignment below? | |
372 | ||
lld/test/MachO/bind-opcodes.s | ||
1 | I think we can make it less messy :) 1: yeah this makes sense .ifdef PTR64 .macro ptr val .quad val .endm .endif .ifdef PTR32 .macro ptr val .int val .endm .endif ptr _foo ptr _bar ... PTR64 and PTR32 will have to be defined as part of llvm-mc's invocation. See https://github.com/llvm/llvm-project/blob/main/llvm/test/DebugInfo/X86/dwarfdump-header-64.s for an example. | |
15–20 | something like this should allow reusing the check across both archs (see 'Numeric Substitutions' in the FileCheck manual for details) | |
29 | you can just do %lld -arch arm64_32, the later arch will override the earlier one. no need to define another substitution |
lld/MachO/SyntheticSections.cpp | ||
---|---|---|
369 | I changed it to p->data / target->wordSize < BIND_IMMEDIATE_MASK. I removed the equals comparison because when dyld uncompacts, it seems to add an extra sizeof(intptr_t). See https://opensource.apple.com/source/dyld/dyld-852/src/ImageLoaderMachOCompressed.cpp.auto.html and search for address += immediate*sizeof(intptr_t) + sizeof(intptr_t); | |
lld/test/MachO/bind-opcodes.s | ||
1 | All of them make sense and have been fixed. The file looks a lot cleaner now :) |
Thanks!
lld/MachO/SyntheticSections.cpp | ||
---|---|---|
369 | I think the previous implementation was the right one. The DO_BIND opcode itself adds a sizeof(intptr_t), which is probably why dyld is doing that. From http://www.m4b.io/reverse/engineering/mach/binaries/2015/03/29/mach-binaries.html 's description of DO_BIND:
It would be good to have a test case that covers this edge case :) |
one nit about the commit title / description: there are lots of opcodes with immediates, I think it would be clearer to mention DO_BIND_ADD_ADDR_IMM_SCALED specifically in the title. Also "Implement pass 3 of bind opcodes from ld64" should be followed by a short description of what pass 3 does -- we shouldn't expect future readers to have to look it up :)
lld/test/MachO/bind-opcodes.s | ||
---|---|---|
1 | almost forgot about this :) we need it now that we're building arm binaries |
Done.
lld/MachO/SyntheticSections.cpp | ||
---|---|---|
369 | Re-capping offline convo. In ld64's implementation, it uses "<" instead of "<=". It makes more sense to keep the same behavior to prevent any unknown deviations from ld64. It may be a typo on ld64, but this will prevent unknown mysterious bugs down the road. |
Reverted by 88e2268a344a0ab3df455af08f32c2c354ea55a4
for (BindIR *p = &opcodes[0]; p->opcode != BIND_OPCODE_DONE; ++p) { has a heap-buffer-overflow with test/MachO/bind-opcodes.s
-DLLVM_USE_SANITIZER=Address check-lld-macho to reproduce.
Fixing ASAN
In D106128, buffer overflow was detected when incrementing BindIR pointers. This would
most likely not happen in an ideal world, but switching it to array indexing
is safer.
Ran with -DLLVM_USE_SANITIZER=Address and check-lld-macho.
Fix looks good but can we figure out how it was happening in the first place? I would guess that we were running optimizeOpcodes on an empty vector, but I'm not sure how that would happen in the given test...
lld/MachO/SyntheticSections.cpp | ||
---|---|---|
369 | I think this can be a for-range loop |
We assumed that the BIND_OPCODE_DONE would exist in the vector but it actually doesn't. It actually never gets stored in the vector and is just emitted after everything is optimized. Printing out the opcodes as well shows that BIND_OPCODE_DONE never existed. Now, why this wasn't caught in testing is that this pass happens on specific checks. Sooner or later, it will randomly encounter the an opcode of 0 (by random chance) and exit the loop. The size and contents of the vector are still unchanged so testing without ASAN continued to show correct results.
lld/MachO/SyntheticSections.cpp | ||
---|---|---|
369 | Done. Re-tested with ASAN. Everything looks good :) |
It actually never gets stored in the vector and is just emitted after everything is optimized.
D'oh, that makes sense :)
Some patterns may not exist in the real world but it'd be good to think of such scenarios and don't crash the linker.
buffer overrun is worse, because it makes the linker vulnerable.
the ternary is unnecessary, offsetWidth is just target->wordSize :)
also, I think offsetWidth is kind of a misleading name... pointerSize is probably more apt. Or we could just use target->wordSize directly.
I think I understand the motivation behind this opcode design: Since every binding is the size of one pointer, the next binding must be at least wordSize away. Most likely it's some multiple of wordSize away (if there are multiple intervening pointers). Hence the scaling by pointer size. (Might be worth to write something like this as a comment)