This is an archive of the discontinued LLVM Phabricator instance.

Implement GetLoadAddress for the Windows process plugin
ClosedPublic

Authored by asmith on Jan 2 2019, 5:08 PM.

Details

Summary

When a process is loaded, update its sections with the load address to resolve any created breakpoints. For the remote debugging case, the debugged process is launched remotely so GetLoadAddress is intended to pass the load address from remote to LLDB (client).

Diff Detail

Event Timeline

asmith created this revision.Jan 2 2019, 5:08 PM

We should probably have a test for this. It sounds like all you need to do to test this is create a target, set a breakpoint, run the process, then verify that you see some text that says the breakpoint was resolved. With the new build.py script this should be able to work in a cross-platform manner, so maybe it can go under lit/ExecControl and use compiler=any (or don't specify a compiler at all, since that's the default).

labath added a subscriber: labath.Jan 3 2019, 1:59 AM

I think there is something wrong here. Normally, it should be the job of the dynamic loader plugin to figure out where a module got loaded (and populate the target section load list). Process::GetFileLoadAddress is there to help it with this job. So, the fact that ProcessWindows::GetFileLoadAddress goes back to the target section load list to get the data creates a weird loop. The only way I can see this working is if this implementation happens to return the module base address, as specified in the object file, and that base address happens to be correct (no ASLR). The kind of code I would expect to see here is a call to some windows API to determine the load address straight from the OS (if such a thing is possible). For example, on Linux, we parse the /proc/<pid>/maps file to determine this information.

source/Plugins/Process/Windows/Common/ProcessWindows.cpp
878

I think FileOffset will always be zero here. The only case where this is not zero is for fat macho binaries. And even in that case, I don't think adding that offset to the load address is the right thing to do.

886

typo

Seems like DynamicLoaderWindowsDYLD isn't doing its job correctly here and DynamicLoaderWindowsDYLD should be fixed. We shouldn't have to go to the process at all in order to set the section load addresses correctly. Why? As you might have seen lldb-server.exe is being worked on now and then we will have two clients for DynamicLoaderWindowsDYLD: the native ProcessWindows plug-in and ProcessGDBRemote. The

source/Plugins/Process/Windows/Common/ProcessWindows.cpp
858

This entire function is doing the job of the dynamic loader and should be moved into the dynamic loader.

clayborg requested changes to this revision.Jan 3 2019, 11:20 AM

So it seems like DynamicLoaderWindowsDYLD is not doings its job correctly and it needs to be fixed. DynamicLoaderWindowsDYLD is responsible for setting all of the section load addresses correctly using an abstract lldb_private::Process plug-in. No special calls should be added if that is possible. We have someone working on lldb-server.exe and then we will have two clients for DynamicLoaderWindowsDYLD: ProcessWindows and ProcessGDBRemote. DynamicLoaderWindowsDYLD should work for any lldb_private::Process subclass without the need to call back into a specific lldb_private::Process subclass method.

This revision now requires changes to proceed.Jan 3 2019, 11:20 AM
asmith marked an inline comment as done.Jan 14 2019, 10:25 PM
source/Plugins/Process/Windows/Common/ProcessWindows.cpp
858

when do you think the changes you mentioned for the loader and lldb-server.exe would be available for review? should we abandon this and wait for those?

clayborg added inline comments.Jan 15 2019, 7:12 AM
source/Plugins/Process/Windows/Common/ProcessWindows.cpp
858

I am saying the DynamicLoaderWindowsDYLD should be fixed, so that no matter how we connect to a windows process, we get correct DYLD functionality. The way this patch is currently coded, you require every process plug-in that connects to windows processes to have this work around.

Hui added a subscriber: Hui.EditedJan 24 2019, 10:49 AM

I think there is something wrong here. Normally, it should be the job of the dynamic loader plugin to figure out where a module got loaded (and populate the target section load list). Process::GetFileLoadAddress is there to help it with this job. So, the fact that ProcessWindows::GetFileLoadAddress goes back to the target section load list to get the data creates a weird loop. The only way I can see this working is if this implementation happens to return the module base address, as specified in the object file, and that base address happens to be correct (no ASLR). The kind of code I would expect to see here is a call to some windows API to determine the load address straight from the OS (if such a thing is possible). For example, on Linux, we parse the /proc/<pid>/maps file to determine this information.

The Process::GetFileLoadAddress is needed for both process clients mentioned here. For the remote process, it will be the remote platform's responsibility to provide the load address. The remote platform could be lldb-server or other server eligible to connect to lldb. Currently the processwindows plugin knows about the load address since the plugin is notified the process base address (or say the randomized image base) by windows debug event when the process is loaded on Windows. So the processwindows::GetFileLoadAddress actually does a job for dyld's convenience. Later on, if a native windows process is just a pass-through to processwindows, it will benefit the same feature (I am thinking processwindows is working as a common implementation for windows specific).

It is doable on windows to get the process base address by enumerate the process's module. I tried the following. The load address has the same result as processwindows::GetFileLoaddAddress().

+lldb::addr_t Host::GetProcessBaseAddress(lldb::pid_t pid) {
+  AutoHandle snapshot(CreateToolhelp32Snapshot(TH32CS_SNAPMODULE, pid));
+  if (!snapshot.IsValid())
+    return LLDB_INVALID_ADDRESS;
+
+  MODULEENTRY32W me;
+  me.dwSize = sizeof(MODULEENTRY32W);
+  if (Module32FirstW(snapshot.get(), &me)) {
+    // The first module is always the EXE or DLL itself.
+    return (addr_t)me.modBaseAddr;
+  }
+
+  return LLDB_INVALID_ADDRESS;
+}
+

I think there are some architectural issues here in how ProcessWindows works. Normally (i.e., on all non-windows process classes), it is the job of the dynamic loader plugin to create modules and set their load addresses. However, ProcessWindows does things differently, and creates the modules by itself (ProcessWindows::OnLoadDLL).
I think the behavior of ProcessWindows is fine (it flows naturally from how modules are handled on windows), but if it's going to handle module loading by itself, then I think it should do the job completely, and not only half-way. So, if we want ProcessWindows to do the module loading, then it should return a null value for GetDynamicLoader() (or a reasonably stubbed out implementation) instead of relying on a generic dynamic loader plugin to finish the job.

Or, if we want to have the DynamicLoader plugin, then ProcessWindows should stop meddling with the module list in the OnLoadDLL function. One way to achieve that would be:

  • ProcessWindows::OnLoadDLL calls DynamicLoaderWindowsDYLD::OnLoadDll
  • dynamic loader loads the module and sets load address
  • dynamic loader does not have to call ProcessWindows::GetFileLoadAddress as it already got the load address from in the OnLoadDLL function

This would make the flow very similar to how other platforms handle it (the only difference would be that the "OnLoadDLL" equivalent is called from a breakpoint callback function, instead of from a special debug event callback, but that's just due to a difference in how modules are reported).

I think there are some architectural issues here in how ProcessWindows works. Normally (i.e., on all non-windows process classes), it is the job of the dynamic loader plugin to create modules and set their load addresses. However, ProcessWindows does things differently, and creates the modules by itself (ProcessWindows::OnLoadDLL).
I think the behavior of ProcessWindows is fine (it flows naturally from how modules are handled on windows), but if it's going to handle module loading by itself, then I think it should do the job completely, and not only half-way. So, if we want ProcessWindows to do the module loading, then it should return a null value for GetDynamicLoader() (or a reasonably stubbed out implementation) instead of relying on a generic dynamic loader plugin to finish the job.

Or, if we want to have the DynamicLoader plugin, then ProcessWindows should stop meddling with the module list in the OnLoadDLL function. One way to achieve that would be:

  • ProcessWindows::OnLoadDLL calls DynamicLoaderWindowsDYLD::OnLoadDll
  • dynamic loader loads the module and sets load address
  • dynamic loader does not have to call ProcessWindows::GetFileLoadAddress as it already got the load address from in the OnLoadDLL function

This would make the flow very similar to how other platforms handle it (the only difference would be that the "OnLoadDLL" equivalent is called from a breakpoint callback function, instead of from a special debug event callback, but that's just due to a difference in how modules are reported).

I would like the see the dynamic loader do all of the work. We can easily add any new virtual functions to the DynamicLoader class to fit how windows does things.

asmith updated this revision to Diff 184323.Jan 30 2019, 10:18 AM
asmith retitled this revision from Implement GetFileLoadAddress for the Windows process plugin to Implement GetLoadAddress for the Windows process plugin.
asmith edited the summary of this revision. (Show Details)
labath requested changes to this revision.Feb 4 2019, 5:58 AM

Going to the Host straight from the DynamicLoader is the worst possible option, since then this code will not be correct for remote processes (it will just randomly pick up some data from the local process with the same pid, if you happen to have one). I'd like to hear what you think about the latest comments by Greg & myself.

I think there are some architectural issues here in how ProcessWindows works. Normally (i.e., on all non-windows process classes), it is the job of the dynamic loader plugin to create modules and set their load addresses. However, ProcessWindows does things differently, and creates the modules by itself (ProcessWindows::OnLoadDLL).
I think the behavior of ProcessWindows is fine (it flows naturally from how modules are handled on windows), but if it's going to handle module loading by itself, then I think it should do the job completely, and not only half-way. So, if we want ProcessWindows to do the module loading, then it should return a null value for GetDynamicLoader() (or a reasonably stubbed out implementation) instead of relying on a generic dynamic loader plugin to finish the job.

Or, if we want to have the DynamicLoader plugin, then ProcessWindows should stop meddling with the module list in the OnLoadDLL function. One way to achieve that would be:

  • ProcessWindows::OnLoadDLL calls DynamicLoaderWindowsDYLD::OnLoadDll
  • dynamic loader loads the module and sets load address
  • dynamic loader does not have to call ProcessWindows::GetFileLoadAddress as it already got the load address from in the OnLoadDLL function

This would make the flow very similar to how other platforms handle it (the only difference would be that the "OnLoadDLL" equivalent is called from a breakpoint callback function, instead of from a special debug event callback, but that's just due to a difference in how modules are reported).

I would like the see the dynamic loader do all of the work. We can easily add any new virtual functions to the DynamicLoader class to fit how windows does things.

That should be fairly easy to achieve. The process class is the one that creates the dynamic loader (in GetDynamicLoader()), so it can easily call any special method on it that it needs from OnLoadDLL (i.e., we don't even need to extend the public DynamicLoader API).

However, note that this would not be the first process class that handles module loading on its own. ProcessMinidump already does that too. It could be changed to use a special dynamic loader too, but it's not clear to me whether that is worth the trouble, as the loading process for minidumps is very simple.

This revision now requires changes to proceed.Feb 4 2019, 5:58 AM
Hui added inline comments.Feb 4 2019, 7:19 AM
source/Plugins/DynamicLoader/Windows-DYLD/DynamicLoaderWindowsDYLD.cpp
82

Shall not come to this point for a remote process. Need to bail out ahead.

labath added inline comments.Feb 4 2019, 7:35 AM
source/Plugins/DynamicLoader/Windows-DYLD/DynamicLoaderWindowsDYLD.cpp
82

Yes, but what's the reason for the fallback in the first place? ProcessWindows is already by definition on the host, so you could do the same work in ProcessWindows::GetFileLoadAddress (but I'm not saying you should, since that information should already be known through ProcessWindows, as it received it through the windows debug api in OnLoadDLL ).

Hui added a comment.Feb 4 2019, 8:08 AM

Going to the Host straight from the DynamicLoader is the worst possible option, since then this code will not be correct for remote processes (it will just randomly pick up some data from the local process with the same pid, if you happen to have one). I'd like to hear what you think about the latest comments by Greg & myself.

I think there are some architectural issues here in how ProcessWindows works. Normally (i.e., on all non-windows process classes), it is the job of the dynamic loader plugin to create modules and set their load addresses. However, ProcessWindows does things differently, and creates the modules by itself (ProcessWindows::OnLoadDLL).
I think the behavior of ProcessWindows is fine (it flows naturally from how modules are handled on windows), but if it's going to handle module loading by itself, then I think it should do the job completely, and not only half-way. So, if we want ProcessWindows to do the module loading, then it should return a null value for GetDynamicLoader() (or a reasonably stubbed out implementation) instead of relying on a generic dynamic loader plugin to finish the job.

Or, if we want to have the DynamicLoader plugin, then ProcessWindows should stop meddling with the module list in the OnLoadDLL function. One way to achieve that would be:

  • ProcessWindows::OnLoadDLL calls DynamicLoaderWindowsDYLD::OnLoadDll
  • dynamic loader loads the module and sets load address
  • dynamic loader does not have to call ProcessWindows::GetFileLoadAddress as it already got the load address from in the OnLoadDLL function

This would make the flow very similar to how other platforms handle it (the only difference would be that the "OnLoadDLL" equivalent is called from a breakpoint callback function, instead of from a special debug event callback, but that's just due to a difference in how modules are reported).

I would like the see the dynamic loader do all of the work. We can easily add any new virtual functions to the DynamicLoader class to fit how windows does things.

That should be fairly easy to achieve. The process class is the one that creates the dynamic loader (in GetDynamicLoader()), so it can easily call any special method on it that it needs from OnLoadDLL (i.e., we don't even need to extend the public DynamicLoader API).

However, note that this would not be the first process class that handles module loading on its own. ProcessMinidump already does that too. It could be changed to use a special dynamic loader too, but it's not clear to me whether that is worth the trouble, as the loading process for minidumps is very simple.

I am thinking getting load address and setting load address by dyld could be done in separate reviews? For the latter, it might lead to some architecture changes in processwindows plugin.

For this review, getting load address of a host process can just leverage the OS API if no processwindows plugin's help is wanted.

Servers other than lldb-server.exe can easily implement the qFileLoadAddress packet (using their own loader) as a reply to m_process->GetFileLoadAddress().

Hui added inline comments.Feb 4 2019, 8:23 AM
source/Plugins/DynamicLoader/Windows-DYLD/DynamicLoaderWindowsDYLD.cpp
82

Server other lldb-server.exe could respond with a bogus address to qFileLoadAddress packet and then lldb (on host) will come to this point and result in the case you mentioned.

since then this code will not be correct for remote processes (it will just randomly pick up some data from the local process with the same pid, if you happen to have one)

labath added a comment.Feb 4 2019, 8:41 AM
In D56237#1383232, @Hui wrote:

Going to the Host straight from the DynamicLoader is the worst possible option, since then this code will not be correct for remote processes (it will just randomly pick up some data from the local process with the same pid, if you happen to have one). I'd like to hear what you think about the latest comments by Greg & myself.

I think there are some architectural issues here in how ProcessWindows works. Normally (i.e., on all non-windows process classes), it is the job of the dynamic loader plugin to create modules and set their load addresses. However, ProcessWindows does things differently, and creates the modules by itself (ProcessWindows::OnLoadDLL).
I think the behavior of ProcessWindows is fine (it flows naturally from how modules are handled on windows), but if it's going to handle module loading by itself, then I think it should do the job completely, and not only half-way. So, if we want ProcessWindows to do the module loading, then it should return a null value for GetDynamicLoader() (or a reasonably stubbed out implementation) instead of relying on a generic dynamic loader plugin to finish the job.

Or, if we want to have the DynamicLoader plugin, then ProcessWindows should stop meddling with the module list in the OnLoadDLL function. One way to achieve that would be:

  • ProcessWindows::OnLoadDLL calls DynamicLoaderWindowsDYLD::OnLoadDll
  • dynamic loader loads the module and sets load address
  • dynamic loader does not have to call ProcessWindows::GetFileLoadAddress as it already got the load address from in the OnLoadDLL function

This would make the flow very similar to how other platforms handle it (the only difference would be that the "OnLoadDLL" equivalent is called from a breakpoint callback function, instead of from a special debug event callback, but that's just due to a difference in how modules are reported).

I would like the see the dynamic loader do all of the work. We can easily add any new virtual functions to the DynamicLoader class to fit how windows does things.

That should be fairly easy to achieve. The process class is the one that creates the dynamic loader (in GetDynamicLoader()), so it can easily call any special method on it that it needs from OnLoadDLL (i.e., we don't even need to extend the public DynamicLoader API).

However, note that this would not be the first process class that handles module loading on its own. ProcessMinidump already does that too. It could be changed to use a special dynamic loader too, but it's not clear to me whether that is worth the trouble, as the loading process for minidumps is very simple.

I am thinking getting load address and setting load address by dyld could be done in separate reviews? For the latter, it might lead to some architecture changes in processwindows plugin.

For this review, getting load address of a host process can just leverage the OS API if no processwindows plugin's help is wanted.

Yes, two reviews would be great. However, the way I see this, the second patch would basically remove the need for the special Host function you're trying to introduce here. So I'd suggest to do the second part first, and then this should be very simple addon on top of that.

Servers other than lldb-server.exe can easily implement the qFileLoadAddress packet (using their own loader) as a reply to m_process->GetFileLoadAddress().

I am aware of that. However, my point is that you shouldn't need *any* fallback here. If ProcessWindows::GetFileLoadAddress does the right thing, then all the dyld plugin would not need to call any host functions *at all* (which is just a fundamentally wrong thing to do).

asmith updated this revision to Diff 186401.Feb 11 2019, 10:01 PM

I like the direction this is going. I have one small comment about the code organization inline, but the bigger question I have in mind is: After these changes, is there any use for Host::GetProcessBaseAddress left? If so, what is it? I'd like to understand it and remove it.

source/Plugins/Process/Windows/Common/ProcessWindows.cpp
868

You could have this function return DynamicLoaderWindowsDYLD * (thanks to the covariant return type thingy), and avoid having to cast the result in a bunch of places.

1039–1044

On other platforms, it is the responsibility of the DYLD plugin to add the module to the target list (i.e. call Target::GetSharedModule or something equivalent). I think it would be more consistent to move this code there too. At that point you may just delete ProcessWindows::OnLoadDll and have the debugger thread call process_windows->GetDynamicLoader()->OnLoadDLL() directly. The same goes for module UnloadDLL.

Hui added a comment.Feb 12 2019, 10:47 PM

I like the direction this is going. I have one small comment about the code organization inline, but the bigger question I have in mind is: After these changes, is there any use for Host::GetProcessBaseAddress left? If so, what is it? I'd like to understand it and remove it.

From my understanding, lldb-server.exe is to debug a process instantiated by NativeProcessProtocol which might or might not relate to ProcessWindows plugin.

To remove the host codes mentioned above, it requires the implementation of NativeProcessProtocol also notify the DYLD the load address of the main executable otherwise
DYLD has to do the job with the OS help. For now I think just remove them.

labath added a subscriber: sas.Feb 12 2019, 11:27 PM
In D56237#1395791, @Hui wrote:

I like the direction this is going. I have one small comment about the code organization inline, but the bigger question I have in mind is: After these changes, is there any use for Host::GetProcessBaseAddress left? If so, what is it? I'd like to understand it and remove it.

From my understanding, lldb-server.exe is to debug a process instantiated by NativeProcessProtocol which might or might not relate to ProcessWindows plugin.

To remove the host codes mentioned above, it requires the implementation of NativeProcessProtocol also notify the DYLD the load address of the main executable otherwise
DYLD has to do the job with the OS help. For now I think just remove them.

Yes, the NativeProcessWindows will need to notify the DYLD about the load address somehow. If DYLD went and did that on it's own, it would kill the "remote" aspect of lldb-server, as the DYLD plugin lives in the client. I think @sas is already passing that information somehow with ds2, so it might be good to sync up with him, so we don't reinvent something new. (My guess would be that they just have ds2 implement the packet underlying the GetFileLoadAddress() call.)

It short, let's remove the host code for now, and we can discuss that again after we get around to implementing NativeProcessWindows or whatever.

asmith updated this revision to Diff 186796.Feb 13 2019, 9:32 PM

Looks good to me now. I think that, for completeness, you ought to add the DYLD plugin to the list of dependencies for in the CMakeLists.txt for ProcessWindows. Thanks for putting up with me.

source/Plugins/Process/Windows/Common/ProcessWindows.h
93

add override

labath accepted this revision.Feb 13 2019, 10:59 PM
asmith updated this revision to Diff 186799.Feb 14 2019, 12:03 AM
asmith marked 9 inline comments as done.Feb 14 2019, 12:04 AM
This revision was not accepted when it landed; it landed in state Needs Review.Feb 14 2019, 8:34 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 14 2019, 8:34 PM