This is an archive of the discontinued LLVM Phabricator instance.

[MachO] Fix symbol merging during symtab parsing.
ClosedPublic

Authored by JDevlieghere on Oct 4 2019, 10:58 PM.

Details

Summary

The symtab parser in ObjectFileMachO has logic to coalesce debug (STAB) and non-debug symbols, based on the address and the symbol name for static (STSYM) and global symbols (GSYM) respectively. It makes the assumption that the debug variant is always encountered first. Rather than creating a second entry in the symbol table for the non-debug symbol, the latter gets merged into the existing debug symbol.

This breaks when the linker emits the non-debug symbol first. We'd end up with two entries in the symbol table, each containing part of the information LLDB relies on. Indeed, commenting out the merging logic breaks the test suite spectacularly.

This patch solves that problem by always parsing the debug symbols first. This guarantees that the assumption for merging holds.

PS: I'm not particularly proud of the way this turned out, but after numerous attempts this is the best solution I could come up with. The symtab parsing logic is pretty complex in that it touches a lot of things. I've experienced first hand that it's very easy to break things. I believe this approach strikes a balance between fixing the issue while limiting the risk of regressions.

Diff Detail

Event Timeline

JDevlieghere created this revision.Oct 4 2019, 10:58 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 4 2019, 10:58 PM
Herald added a subscriber: abidh. · View Herald Transcript
JDevlieghere edited the summary of this revision. (Show Details)Oct 4 2019, 11:03 PM

I don't know much about this stuff, but it seems relatively reasonable to me. Tagging @kwk and @jankratochvil because they were involved in some elf symbol merging discussions recently, and so they may find this interesting.

lldb/lit/ObjectFile/Inputs/SymbolTable.yaml
1 ↗(On Diff #223363)

Somewhat confusingly, the "object file" tests currently live in lit/Modules/$OBJ_FORMAT. In the current arrangement, I believe this should go there too, but if you want to rename the whole Modules folder into ObjectFile, I would be fine with that.

lldb/lit/ObjectFile/TestSymbolTable.test
1 ↗(On Diff #223363)

Is that necessary? I believe our MachO capabilities should work fine everywhere. At least, it doesn't seem to be necessary for all (four) of our existing MachO obj file tests.

lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
3695

Personally, I'd just move this check out of the lambda and into the code which invokes it.

JDevlieghere marked 2 inline comments as done.

Address code review feedback.

JDevlieghere added inline comments.Oct 7 2019, 1:43 PM
lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
3695

Given that you need is_debug all throughout the parsing code I'm not convinced that'd be an improvement. Either you'd have to compute it twice or you'd have to pass it in which case you still have the extra argument.

aprantl added inline comments.Oct 7 2019, 2:58 PM
lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
3736

because of the early exit above — isn't this either always true or always false?

4477

reserve symtab_load_command.nsyms elements?

aprantl added inline comments.Oct 7 2019, 3:11 PM
lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
3736

Never mind, I thought the RHS above was the enum, but it's actually the parameter.

JDevlieghere marked 2 inline comments as done.

Code review feedback Adrian.

aprantl accepted this revision.Oct 7 2019, 4:23 PM
This revision is now accepted and ready to land.Oct 7 2019, 4:23 PM
This revision was automatically updated to reflect the committed changes.