This is an archive of the discontinued LLVM Phabricator instance.

make debugserver able to inspect mach-o binaries present in memory, but not yet registered with dyld
ClosedPublic

Authored by jasonmolenda on Jun 30 2022, 3:57 PM.

Details

Summary

lldb sends the jGetLoadedDynamicLibrariesInfos packet to debugserver with a list of addresses of mach-o binaries. Today, debugserver collects a list of all binaries that are loaded in the process, along with their filenames, and then debugserver parsers the mach-o binaries in memory and returns the important details in a JSON response so lldb doesn't need to read this data from the inferior itself. If lldb asks about a binary that isn't officially loaded in the process yet -- but is present in memory -- then this packet will return nothing about it.

This patch handles the case where dyld does not know about the binary. We cannot get a filepath for the binary, so "" is returned there, but debugserver can examine memory and if there's something that looks like a mach-o header + load commands, return that information to lldb.

In macOS Ventura, as part of the early startup, we launch with one dynamic linker (dyld) and that dyld hands off control to a dyld in the shared cache. During that transition, the new dyld is not officially loaded in the process yet. I'd like to switch to having lldb parse the not-yet-running dyld and put a breakpoint in it. This is needed first, to avoid a perf hit when lldb inspects the load commands of new-dyld.

I added a test case that constructs a tiny little mach-o binary in a memory buffer and have lldb send that buffer address to jGetLoadedDynamicLibrariesInfos. Current debugserver will return an empty list of binaries found; with this patch, debugserver will have an empty filepath but return everything from the load commands.

There's no real good reviewer in this case, this is all jason code, but if anyone has any questions or comments they'd like to make, I'd be interested in hearing them.

Diff Detail

Event Timeline

jasonmolenda created this revision.Jun 30 2022, 3:57 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 30 2022, 3:57 PM
jasonmolenda requested review of this revision.Jun 30 2022, 3:57 PM
JDevlieghere added inline comments.Jun 30 2022, 4:56 PM
lldb/tools/debugserver/source/MacOSX/MachProcess.mm
1186–1191

What happens when it turns out that there's no Mach-O header at this load address? I expect it doesn't end up in the jGetLoadedDynamicLibrariesInfos response then? Where does this filtering take place?

jasonmolenda added inline comments.Jul 1 2022, 12:35 AM
lldb/tools/debugserver/source/MacOSX/MachProcess.mm
1186–1191

Yeah, it's a good question. MachProcess::GetMachOInformationFromMemory does the Mach-O parsing, and reviewing that code, I do some sanity checking but I don't check for the magic number in the header. At the very least, I should add that check. I should also check the return value from GetMachOInformationFromMemory and not add the entry to my list of binaries I'm going to respond to lldb with. So, a bit more work here would be a good idea, agreed.

Some random comments.

Also agree that a test for what happens when there is no header found would be good.

lldb/test/API/macosx/unregistered-macho/TestUnregisteredMacho.py
21

You can drop the () around the variable names.

lldb/test/API/macosx/unregistered-macho/main.c
21

It seems like this should be the size of the commands only, not including the header. Though I can't find documentation for it so maybe this is just an unfortunate name.

33

Does this need to be sizeof (struct mach_header_64)? Probably doesn't and/or it doesn't matter for this test.

52

No comment here other than I always like to see this kind of info in a test!

Thanks for the review David. I am still fiddling with handling the case where debugserver is handed the address of something that isn't actually a Mach-O binary in memory, but I'll update the patch soon with these.

lldb/test/API/macosx/unregistered-macho/TestUnregisteredMacho.py
21

Ah, yes thanks.

lldb/test/API/macosx/unregistered-macho/main.c
21

No, you're exactly correct. When I was writing the test case originally I had a little bug and switched sizeofcmds to be size of the mach header and the commands, and it's not correct. Most parsers limit their parsing to the number of load commands and the total size of load commands, so they wouldn't read past the actual load commands. Thanks for noticing, I fixed it after confirming I was wrong, hah.

Updated patch to address Jonas and David's feedback.

Updated the test C file so the mach header sizeofcmds is correct. Update test to test querying an invalid mach-o binary, and the combination of a valid & invalid mach-o binary addresses, and check that we get the correct replies.

Add simple checks to MachProcess::GetMachOInformationFromMemory() which parses inferior memory as a Mach-O binary, to detect things that are not mach-o binaries and return false in those case.

Update callers to GetMachOInformationFromMemory() to mark any entry that had a problem parsing it to mark that entry is is_valid=false.

Update MachProcess::FormatDynamicLibrariesIntoJSON() to check the is_valid flag for each entry, and don't add those entries to the StructuredData return array. Do return validly formatted mach-o's that were found during the scan, though.

MachProcess::GetLoadedDynamicLibrariesInfos() (not used on modern Darwin systems any more) - in the process of updating its handling of macho-parsing call, noticed that the indentation was all incorrect for this method. Fixed it, removed an extra return at the end. This looks like it's undergone some incorrect revision over the years -- probably by me. It's near time this could be removed altogether most likely, but I don't want to do that just yet.

I'll add an NDEBUG assert to DynamicLoaderMacOS later which checks that lldb gets back the same number of binary image entries that it asked for -- so we can flag any case where debugserver omits a binary in our testing. I can't imagine this would happen.

DavidSpickett added inline comments.Jul 7 2022, 6:29 AM
lldb/test/API/macosx/unregistered-macho/TestUnregisteredMacho.py
47

For these would it be more reliable to just look for the load_address: bits?

If the packet contains multiple response, then fine but if it's one response then the second one wouldn't ever match even if debugserver did somehow find the one at the invalid address. Also if it did would it put it in a images section and therefore not match because of that too?

lldb/test/API/macosx/unregistered-macho/main.c
49

This is redundant (the test uses macho_buf).

lldb/tools/debugserver/source/MacOSX/MachProcess.mm
756

Could this set inf.valid instead of having to loop later to set it?

768

Personally I'd put this as a static function just before this one as it doesn't capture anything, but up to you.

jasonmolenda added inline comments.Jul 7 2022, 3:13 PM
lldb/test/API/macosx/unregistered-macho/TestUnregisteredMacho.py
47

Yeah, good suggestion. I wrote the two tests in the order they're in, and realized I couldn't match the images key in the second one; didn't go back and make the first one consistent.

lldb/test/API/macosx/unregistered-macho/main.c
33

I don't think it matters. I'm saying that this segment doesn't exist in the file. Quick check, I created a hello world program with a single BSS int, so it doesn't exist in the binary but is initialized to zero at runtime.

Load command 2
      cmd LC_SEGMENT_64
  cmdsize 152
  segname __DATA
   vmaddr 0x0000000100004000
   vmsize 0x0000000000004000
  fileoff 0
 filesize 0
49

I needed to increment p so it points to the end of the mach-o image if I want to write it to disk and try sending it through otool for testing. You're right though, this has no effect to the program itself.

lldb/tools/debugserver/source/MacOSX/MachProcess.mm
768

Yeah good point. It's operating on either a mach_header or mach_header_64 so there wasn't a single variable to capture and test.

One more patch update to address David's further feedback. This is ready to land now, will give it a day in case anyone has comments or suggestions.

DavidSpickett accepted this revision.Jul 8 2022, 1:37 AM

Looks good from my perspective.

lldb/test/API/macosx/unregistered-macho/main.c
49

Got it, now I see it in the comments below. Fair enough.

This revision is now accepted and ready to land.Jul 8 2022, 1:37 AM
JDevlieghere accepted this revision.Jul 8 2022, 10:52 AM

LGTM

lldb/tools/debugserver/source/MacOSX/MachProcess.mm
749

Seems like a slightly odd name. is_valid_mach_header?

1021–1022

Nit: If you're touching this line this might be a good opportunity to fix up the weird quadruple/triple slash and additional space.

Final small patch revision.