This is an archive of the discontinued LLVM Phabricator instance.

Use dSYM SymbolFile Sections file addr instead of ObjectFile Section file addr when they differ in ObjectFileMachO
ClosedPublic

Authored by jasonmolenda on Aug 29 2021, 1:55 AM.

Details

Summary

This is addressing a regression introduced by Greg's change in https://reviews.llvm.org/D86122 . His change was fine, but the original code was written unclearly as to what it was doing.

When a binary is in the system's "shared cache", the file addresses will be different from the dSYM's file addresses. The binary (both binary file & dSYM) usually start with a file address of 0, but when the binary is integrated into the "shared cache" of all common binaries, the segments are rearranged and now the TEXT segment has a non-0 file address. This is pretty unusual normally -- when the UUID matches for a binary and its dSYMs, they should have identical file addresses for their segments, for example.

When we add a dSYM SymbolFile to an ObjectFile that is in the shared cache, we're going to use the symbol table and debug information from the dSYM (SymbolFile) -- we need the Sections to have file addresses that match the symbol table and debug info in the dSYM. In other words, the dSYM's file addresses must win.

This is handled in ObjectFileMachO::ProcessSegmentCommand. This was previously behind a check of "does this section have a valid base address" or something that didn't make sense. Greg noticed this and changed it to "Is this ObjectFile a memory image". This seems like the right thing to do but "IsInMemory()" doesn't apply to the way we create shared cache binaries when lldb and the inferior are using the same shared cache. lldb thinks of these as the same as an on-disk ObjectFile that we've mmap'ed into our space, except it's contents are represented by a DataBufferUnowned. With this "file addr fixup" code behind a "IsInMemory()" check, now we weren't fixing the file addresses and none of the debug info was ever found because the fileaddrs were contained by any of the Sections.

I shouldn't intermix changes, but I was also annoyed by how many places we check if a binary is in the shared cache by checking flags in ObjectFileMachO, and put it in a little method.

Diff Detail

Event Timeline

jasonmolenda created this revision.Aug 29 2021, 1:55 AM
jasonmolenda requested review of this revision.Aug 29 2021, 1:55 AM

I should note two more things. First, I make this code which was previously checking if the ObjectFile was a memory image unconditional if the dSYM's segment file address differs from the ObjectFile's segment file address. I can't see how this is possible in any case but a shared cache -- so I could check that the ObjectFile is in the shared cache -- but however we got to this point, the only correct thing to do is to use the dSYM (SymbolFile)'s file addresses, it's immaterial.

Second, man, I can't think of a way to test this. You need to be adding a dSYM to a binary in the system's shared cache on a darwin system. We have access to the dSYMs for system binaries inside Apple, but we can't use those for automated testing in any way, and which dSYM you need depends on the system. This is a drag because this regression has snuck past us at least once in the past too, looking over the history of the code.

Nice catch, LGTM but I'll defer to Greg to accept.

clayborg accepted this revision.Aug 30 2021, 2:14 PM

So do the shared library files on disk that are exploded from a shared cache of an iOS device have the file addresses set to where they are in the shared cache? I am guessing this must the issue that you were running into where the module was not in memory by still it needed to have its file addresses adjusted.

Besides some indentation, and possibly logging when we modify the file address, this LGTM.

lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp
1691–1692

This seems like a good thing to log to a log channel maybe?

1691–1714

unindent this code? Or is phabricator just not showing that this was unindented?

This revision is now accepted and ready to land.Aug 30 2021, 2:14 PM

So do the shared library files on disk that are exploded from a shared cache of an iOS device have the file addresses set to where they are in the shared cache? I am guessing this must the issue that you were running into where the module was not in memory by still it needed to have its file addresses adjusted.

Yes, when a shared cache is expanded into discrete binaries, they still have the file addresses from the shared cache. So a dylib starts with file address based at 0, then it gets a new file address when incorporated into the shared cache, and then when expanded out of the shared cache, it retains the non-zero base file address.

But also on macOS now, it behaves like iOS where the binaries don't exist on the filesystem any more, they only exist in the shared cache blob / in memory. When the lldb process & the inferior process are using the same shared cache (99% of the time this is true), lldb creates ObjectFiles in its own address space for the shared cache binaries. That's where this bug was especially hitting.

Adding a log message here is a good idea.

Besides some indentation, and possibly logging when we modify the file address, this LGTM.

Update patch to incorporate Greg's suggestion.

This revision was landed with ongoing or failed builds.Aug 31 2021, 1:35 AM
This revision was automatically updated to reflect the committed changes.