This is an archive of the discontinued LLVM Phabricator instance.

Complete support of loading a darwin kernel over a live gdb-remote connection given the address of a mach-o fileset
ClosedPublic

Authored by jasonmolenda on Sep 8 2022, 3:05 PM.

Details

Summary

In an internal development environment, lldb will attach to a gdb stub that can provide the address of a Mach-O fileset including the kernel binary. I added some preliminary support for the binary-addresses key in qProcessInfo in https://reviews.llvm.org/D130813; this patch fixes that support now that we have a live stub to run against, and passes the addr_t it gets through the PlatformList to see if any Platform knows how to handle it.

PlatformDarwinKernel will test the addr_t to see if it is a Mach-O Fileset, using Jonas' mach-o fileset plugin (https://reviews.llvm.org/D132433 et al). If it is a fileset, it will search for a com.apple.kernel entry, and if found, it will set the Process DynamicLoader to be DynamicLoaderDarwinKernel, and register the address of the com.apple.kernel entry in the fileset, to be loaded later by the DynamicLoader plugin.

The big trick with this is that I needed to do several Darwin specific things -- parse a Mach-O fileset in memory, detect if this is a Darwin kernel, set the Process DynamicLoader -- when I'm given only an addr_t in ProcessGDBRemote. Going through the Platform plugins and asking them if they can do these things given the addr_t was a way of accomplishing this from generic code.

This does mean that I introduced cross-platform dependencies in PlatformDarwinKernel. It needs to directly create a ObjectContainerMachOFileset object to use the FindEntry methods which are only available to the subclass. I need to instantiate a DynamicLoaderDarwinKernel object and set the Process to use it as the m_dyld. I like isolating all of this very-darwin-specific logic down in PlatformDarwinKernel, but I didn't see a good way of working around the dependencies.

Testing this is a trick I haven't figured out yet; I'd need a remote stub that will return the qProcessInfo binary-addresses key, have a Mach-O fileset there with a com.apple.kernel entry pointing to a binary that is structured like a Darwin kernel binary enough to trick lldb into loading it. It's possible to hand-test against a live environment, but faking enough of that to create a test would be a lot.

There will be a second patch dealing with corefiles, where I ended up making a lot of changes to ProcessMachCore so it's a rather large diff, and it was possible to separate the two halves, so I did.

Any feedback appreciated, I've been working on this one for a little bit now & a fresh set of eyes is welcome.

Diff Detail

Event Timeline

jasonmolenda created this revision.Sep 8 2022, 3:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 8 2022, 3:05 PM
Herald added a subscriber: mgorny. · View Herald Transcript
jasonmolenda requested review of this revision.Sep 8 2022, 3:05 PM

So far just a bunch of random comments...

lldb/include/lldb/Target/Platform.h
1055

Are there any restrictions on when in the process lifecycle this should be called? If so it would be good to call that out.

lldb/source/Core/DynamicLoader.cpp
191

This isn't your addition, but it seems dangerous to assume process will always be non-null here...

233

It seems weird that we would get all the way here, with a live process and everything, but not have a target architecture. How does that happen?

lldb/source/Plugins/DynamicLoader/Darwin-Kernel/DynamicLoaderDarwinKernel.cpp
699–700

This second test is redundant, isn't it?

lldb/source/Plugins/Platform/MacOSX/PlatformDarwinKernel.cpp
747

Is this comment accurate anymore? You only get to this code if module_spec.GetUUID is NOT valid, in which case DebugSymbols isn't going to find anything.

928

Do you really need to pass in a module_sp with a pretty much totally bogus module in it? That seems weird to me. The Module constructor does a bunch of work, all of which is going to fail with an empty ModuleSpec, so at some point CreateMemoryInstance is going to replace whatever is in this module_sp with something real. So what's the point of passing it something it's going to throw away?

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp
2214

llvm::to_integer(..., 16) doesn't handle strings that start with 0x? That seems weak.

lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
597

I'm a little surprised we don't have a platform at this point?

What happens if we had a Platform at this point, but it wasn't the one that knew how to load this binary? That seems awkward.

jasonmolenda marked an inline comment as done.Sep 8 2022, 9:59 PM

Thanks Jim! Really helpful feedback and questions.

lldb/source/Core/DynamicLoader.cpp
191

Thanks, I don't even know why this is here, I don't seem to be calling the platform anywhere.

233

I hit this with a macho corefile debugging case where the Target arch *could* have been set by the corefile's cputype / cpusubtype, but it hadn't been yet (and that may not be the most accurate arch), and this crashed when it created a DataExtractor and tried to get addrsize. It's pretty early in the target's lifetime, so very little has been set up; it seemed simplest to make sure there's an architecture for the Target before I run code that I know may need it.

lldb/source/Plugins/DynamicLoader/Darwin-Kernel/DynamicLoaderDarwinKernel.cpp
699–700

This code is really not filling me with joy, I'm sure I wrote it a decade+ ago myself. The idea was that once I know the actual running UUID of the kernel in this process, I want to remove any non-matching modules that may have been added manually by a user specifying the wrong kernel binary. Otherwise we have extra kernel symbols loaded and symbol name lookups can get the wrong one. Let me rewrite this to do a clearer pass over all the Target's modules and remove any kernels with non-matching UUIDs, instead of just doing this with the 0th module in the list (the "executable module").

lldb/source/Plugins/Platform/MacOSX/PlatformDarwinKernel.cpp
747

The old code behaved this way -- if the FileSpec in the ModuleSpec was empty or we had no UUID, we would call into the more generic PlatformDarwin::GetSharedModule. If I had a FileSpec, if it was the special name "mach_kernel", I would look in the list of local mach kernels I'd found by scanning; else it was a kext BundleID and I would look up the kexts I scanned locally by that bundle ID.

In the new code path, I might not be able to set "mach_kernel" filename for the kernel because I might only have gotten it by a bare address. So now I will check the UUIDs of all the kernels on the local system for a match. Else I'll try the kexts by bundle ID. If the "mach_kernel"/empty name finds a module, we return it directly. Kexts return directly.

Then we fall through to calling Platform::GetSharedModule with whatever we had in the ModuleSpec -- a FileSpec maybe, a UUID maybe. I believe this will be correct behavior.

928

Yeah, this is how the CreateMemoryInstance for ObjectFiles work too - Process::ReadModuleFromMemory creates a Module, possibly adding an ObjectFile from memory, and returns the ModuleSP. In my case here, I'm not going to use/return the ObjectContainer being constructed, but Jonas' ObjectContainerMachOFileset::CreateMemoryInstance requires a Module that it locks before starting its operations, and adds an ObjectFile to it. I think this is an artifact of following the ObjectFile API pattern, I checked with Jonas and he didn't want to deviate.

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp
2214

Looking at llvm::consumeUnsignedInteger in lib/Support/StringRef.cpp it will call GetAutoSenseRadix() if a radix isn't specified, it will see & advance past "0x" and work. But if you also want to accept base16 numbers that aren't 0x prefixed, you have to specify the radix to llvm::to_integer() and the GetAutoSenseRadix() won't be called. It only does one or the other, it doesn't accept a mix.

lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
597

We do have a platform, but it's a Host platform. We've just connected to a gdb remote serial protocol port -- it could be debugserver, it could be a jtag device, it could be a kernel debugger. Once we get a real binary, we can start figuring out the best platform and dynamic linker, but we're starting with almost nothing here, so given this special binary address, we need to ask any of the platform plugins if they know what to do with it.

jasonmolenda marked 7 inline comments as done.Sep 8 2022, 10:05 PM

Update the patch to address Jim's comments/suggestions. Biggest change was to clean up how DynamicLoaderDarwinKernel filters out any kernels in the Target that don't match the UUID of the actually running kernel. There was a vague attempt at this before, checking the target's ExecutableModule, but that assumes there was only one module loaded and it was the wrong kernel. Easier to step through everything loaded in the target and eliminate these, and clearer what's being done. I had the same sequence for detecting if a module is a kernel in a bunch of places so I added module_is_a_kernel() as a static function there.

I also renamed the method in Platform and PlatformList to LoadSpecialBinaryAndSetDynamicLoader(), to help make it clear that this is an uncommon use case, so someone doesn't try to use it for loading general binaries.

labath added a subscriber: labath.Sep 9 2022, 5:35 AM
labath added inline comments.
lldb/unittests/Interpreter/CMakeLists.txt
17–18 ↗(On Diff #458970)

These dependencies should be added to the PlatformDarwinKernel CMakeLists.txt, not to the every file that depends on it.

The current patch didn't have context so I just left a bunch of nits.

lldb/include/lldb/Target/Platform.h
870–874

What makes the binary special? The comment above doesn't say. Can we give this a more descriptive name or omit the "Special" part?

lldb/source/Plugins/DynamicLoader/Darwin-Kernel/DynamicLoaderDarwinKernel.cpp
133

This seems needless verbose: is_kernel conveys the same information.

574–577

m_kernel_image = is_kernel(module_sp.get())

668–669

is_kernel = is_kernel(memory_module_sp.get())

747–752

The ModuleList has a Modules() function that returns a locking iterator. You can rewrite this as:

for(ModuleSP module : target.GetImages().Modules()) {
  ...
}
lldb/source/Plugins/Platform/MacOSX/PlatformDarwinKernel.cpp
929
ModuleSP module_sp = std::make_shared<Module>()
970
if(platform_sp)
973–974

std::make_shared

Update patch to include more surrounding context.

jasonmolenda marked 8 inline comments as done.Sep 9 2022, 1:39 PM
jasonmolenda added inline comments.
lldb/include/lldb/Target/Platform.h
870–874

I'm still not happy with this method name, I agree this name isn't clear. I'll try to think of something better.

lldb/source/Plugins/DynamicLoader/Darwin-Kernel/DynamicLoaderDarwinKernel.cpp
133

Good point, will do.

668–669

ah yes, it made more sense previously when the conditional expression was larger.

747–752

Ach, thanks, missed that.

lldb/unittests/Interpreter/CMakeLists.txt
17–18 ↗(On Diff #458970)

Thanks Pavel, I should have thought of that.

jasonmolenda marked 5 inline comments as done.

Update patch to incorporate Jonas & Pavel's feedback.

JDevlieghere accepted this revision.Sep 9 2022, 2:29 PM

A small comment but otherwise this LGTM.

lldb/source/Plugins/Platform/MacOSX/PlatformDarwinKernel.cpp
924

Looks like we're never checking this status below. Is there any chance that ReadMemory succeeds but the status is set to an error? If not, should we add an assert?

lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
597–601

This is the crux of the patch. I think it would be helpful here to explain what's going on, i.e. that we give the platform a chance first, before deferring to the DynamicLoader.

This revision is now accepted and ready to land.Sep 9 2022, 2:29 PM
jasonmolenda marked 2 inline comments as done.Sep 9 2022, 2:44 PM
jasonmolenda added inline comments.
lldb/source/Plugins/Platform/MacOSX/PlatformDarwinKernel.cpp
924

Good point. Process::ReadMemoryFromInferior returns the number of bytes read, and a Status object. So if were were only able to read part of the mach header object, this would not return an error condition. (and we'd have uninitialized fields and could try to read too many load commands or something)

This revision was automatically updated to reflect the committed changes.
jasonmolenda marked an inline comment as done.
thakis added a subscriber: thakis.Sep 9 2022, 4:15 PM
thakis added inline comments.
lldb/source/Plugins/Platform/MacOSX/CMakeLists.txt
47

This causes a dependency cycle:

//lldb/source/Plugins/Platform/MacOSX:MacOSX ->
//lldb/source/Plugins/DynamicLoader/Darwin-Kernel:Darwin-Kernel ->
//lldb/source/Plugins/Platform/MacOSX:MacOSX
jasonmolenda added inline comments.Sep 9 2022, 4:50 PM
lldb/source/Plugins/Platform/MacOSX/CMakeLists.txt
47

Ach, naturally. DynamicLoaderDarwinKernel has to create a PlatformDarwinKernel to set in the Target when it is initializing itself. :/ Maybe I'll just add DynamicLoaderDarwinKernel to the unit tests that have PlatformMacOSX in them.

jasonmolenda added inline comments.Sep 9 2022, 5:40 PM
lldb/source/Plugins/Platform/MacOSX/CMakeLists.txt
47

I removed the dependency in DynamicLoaderDarwinKernel, a very specialized plugin, and left the dependency in PlatformMacOSX which includes all of the darwin platforms and is a common one to import. I believe any target that is linking against DynamicLoaderDarwinKernel will also have a dependency on PlatformMacOSX already. I landed this as 30578c08568bc8de79dea72e41f49899ba10ea55 to make sure this causes no problems, we can fix it better if someone has a suggestion.

thakis added inline comments.Sep 13 2022, 7:49 AM
lldb/source/Plugins/Platform/MacOSX/CMakeLists.txt
47

Thanks!

I believe anything linking the darwin kernel DynamicLoader plugin
will already have lldbPluginPlatformMacOSX in its dependency list,
so not explicitly expressing this dependency is safe.

That sounds like it doesn't really remove the dependency, it just lies about the build system about it and things still happen to work :)

Normally, if you have a cycle between libraries A and B, you'd either:

  • decide which of the two should depend on the other
  • make the "lower" library expose some virtual delegate
  • implement that in the "higher" library

…or move the needed shared bits to an even lower library that both A and B naturally depend on already.

https://chromium.googlesource.com/chromium/src/+/lkgr/docs/patterns/inversion-of-control.md has some (slightly, but not overly so) chromium-specific notes on this.

But layering in lldb is generally a mess, so I think your workaround is ok for now. It'd be nice to clean up lldb's layering at some point!