This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objdump] Exclude __mh_*_header symbols during MachO disassembly
ClosedPublic

Authored by gkm on May 3 2021, 1:55 PM.

Details

Summary

__mh_(execute|dylib|dylinker|bundle|preload|object)_header are special symbols whose values hold the VMA of the Mach header to support introspection. They are attached to the first section in __TEXT, even though their addresses are outside __TEXT, and they do not refer to code.

It is normally harmless, but when the first section of __TEXT has no other symbols, __mh_*_header is considered by the disassembler when determing function boundaries. Since __mh_*_header refers to an address outside __TEXT, the boundary determination fails and disassembly quits.

Since __TEXT,__text normally has symbols, this bug is obscured. Experiments placing __stubs and __stub_helper first exposed the bug, since neither has symbols.

Diff Detail

Event Timeline

gkm created this revision.May 3 2021, 1:55 PM
gkm requested review of this revision.May 3 2021, 1:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 3 2021, 1:55 PM
MaskRay added inline comments.May 4 2021, 12:51 PM
llvm/test/tools/llvm-objdump/MachO/no-text-symbols-disassembly.test
2

Use yaml2obj to create the test file. A prebuilt binary is difficult to inspect/modify.

gkm updated this revision to Diff 343877.May 8 2021, 6:17 PM
  • revise according to review feedback
int3 added a subscriber: int3.May 11 2021, 9:47 AM
int3 added inline comments.
llvm/tools/llvm-objdump/llvm-objdump.cpp
1082

there are a bunch of other __mh_*_header symbols (you can see them in the LLD source)... should we ignore them all?

int3 added inline comments.May 11 2021, 9:53 AM
llvm/test/tools/llvm-objdump/MachO/no-text-symbols-disassembly.test
77

I believe LC_UNIXTHREAD is emitted when linking for older target versions. If you pass a modern macOS version to the ld64 you should get LC_MAIN instead. Not that it really matters, but LC_MAIN is generally more compact to encode, especially in yaml -- it doesn't have all those PayloadBytes.

gkm marked 2 inline comments as done.May 11 2021, 1:44 PM
gkm added inline comments.
llvm/test/tools/llvm-objdump/MachO/no-text-symbols-disassembly.test
77

LC_UNIXTHREAD is required for static linking regardless of version, and there is no way to override that. YAML output for a dynamic link is larger.

llvm/tools/llvm-objdump/llvm-objdump.cpp
1082

Yes, all should be ignored. They serve the same purpose of header introspection and do not offer any useful information about code boundaries to the disassembler.

int3 added inline comments.May 11 2021, 1:50 PM
llvm/test/tools/llvm-objdump/MachO/no-text-symbols-disassembly.test
77

Ah I see. I went to look at my recent diff that added a yaml test, and it seems like yaml2obj doesn't require either load command: rG9260760235261a5cd150b15a3499f7988da65a02

Not sure if objdump needs it to work, but the guidance I've gotten in the past wrt yaml inputs is to make them as small as possible, even if the resulting binary doesn't execute.

gkm marked 2 inline comments as done.May 11 2021, 2:38 PM
gkm added inline comments.
llvm/test/tools/llvm-objdump/MachO/no-text-symbols-disassembly.test
77

I manually edited the YAML file to remove PayloadBytes for LC_UNIXTHREAD. I couldn't remove the entire command without getting a non-obvious error from yaml2obj. Is manual editing how we're supposed to trim these files?

int3 added inline comments.May 11 2021, 2:52 PM
llvm/test/tools/llvm-objdump/MachO/no-text-symbols-disassembly.test
77

Is manual editing how we're supposed to trim these files?

yeah that's what I've done.

LC_UUID should be removable too.

gkm updated this revision to Diff 344590.May 11 2021, 4:36 PM
  • revise according to review feedback
gkm updated this revision to Diff 344619.May 11 2021, 5:50 PM
  • trim a few more load commands from the test YAML
gkm marked 2 inline comments as done.May 11 2021, 5:51 PM
int3 accepted this revision.May 11 2021, 6:33 PM

lgtm

This revision is now accepted and ready to land.May 11 2021, 6:33 PM
int3 added a comment.May 11 2021, 6:53 PM

Maybe update the commit title since we now support all the __mh_* symbols

gkm updated this revision to Diff 344804.May 12 2021, 6:38 AM
gkm retitled this revision from [llvm-objdump] Exclude __mh_execute_header symbol during MachO disassembly to [llvm-objdump] Exclude __mh_*_header symbols during MachO disassembly.
gkm edited the summary of this revision. (Show Details)
  • update title & summary
This revision was landed with ongoing or failed builds.May 12 2021, 6:39 AM
This revision was automatically updated to reflect the committed changes.