Page MenuHomePhabricator

Initial support for native debugging of x86/x64 Windows processes
Needs RevisionPublic

Authored by asmith on Jun 11 2019, 1:45 PM.

Diff Detail

Event Timeline

asmith created this revision.Jun 11 2019, 1:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 11 2019, 1:45 PM

I don't see much I would change here as long as this works and gets tested by the generic GDB remote protocol testing? Any others have comments?

source/Plugins/Process/Windows/Common/DebuggerThread.cpp
350 ↗(On Diff #204154)

Define STATUS_WX86_BREAKPOINT somewhere and don't use a magic number?

compnerd added inline comments.
source/Plugins/Process/Utility/RegisterContextWindows_i386.cpp
40 ↗(On Diff #204154)

I believe that this bounds the range, and needs to be re-enabled. Why not permit clang-format to reflow DEFINE_GPR?

76 ↗(On Diff #204154)

Could you use llvm_unreachable instead please? Same throughout.

84 ↗(On Diff #204154)

Why not llvm::array_lengthof?

source/Plugins/Process/Utility/RegisterContextWindows_wow64.h
14 ↗(On Diff #204154)

Could you spell this WoW64 to match the Microsoft naming scheme for the "proper" noun?

Sorry for the stupid question, but ...

What exactly is meant here by "Native"? How is a NativeProcessWindows different from ProcessWindows?

source/Plugins/Process/Windows/Common/NativeProcessWindows.h
16 ↗(On Diff #204154)

Where is ProcessDebugger.h? Should that be part of this patch?

I take it that D63166 is a prerequisite for this patch. Is that all, or is there something else that we ought to look at first?

Overall, this patch is slightly larger that would be ideal for a proper review, though I appreciate the difficulty in bootstrapping something like this. However, given that the register context gunk is actually a significant portion of this patch (though that's not your fault), maybe you could just restrict this patch to a single architecture, and do others as follow-ups? This would let the overall structure of the code stand out more prominently.

Sorry for the stupid question, but ...

What exactly is meant here by "Native"? How is a NativeProcessWindows different from ProcessWindows?

The Native*** classes are meant to be used from lldb-server. They look somewhat similar to their non-native counterpart because they still do debugging, but they're a lot dumber, because they only deal with basic process control, and none of the fancy symbolic stuff that you'd need debug info for.

source/Plugins/Process/Windows/Common/NativeProcessWindows.cpp
352 ↗(On Diff #204154)

s/matin/main

529–530 ↗(On Diff #204154)

Fetching the architecture this way looks like it could be racy if the pid is recycled before you get a chance to attach to the process (can that happen?). Is there a way to fetch the architecture *after* you perform the attach operation.

536 ↗(On Diff #204154)

passing attach_info seems pretty redundant, as at this point the process should already be know the pid and the architecture (they were passed in the constructor). In fact, I think you should just delete DoAttachToProcessWithID, do that work in the constructor, and use the ErrorAsOutParameter pattern to return the error from there. (The Factory::Attach pattern was trying to avoid/discourage the use fallible constructors, but that's hard to avoid when you need to instantiate another object and set up callbacks. In that case, a fallible constructor is better over a separate function, as this way at least the initialization happens in a single step.)

source/Plugins/Process/Windows/Common/NativeProcessWindows.h
16 ↗(On Diff #204154)

I guess that's introduced in D63166

source/Plugins/Process/Windows/Common/NativeRegisterContextWindows.h
31 ↗(On Diff #204154)

Is this overriding something? Can you please use override to indicate that (throughout this patch)?

source/Plugins/Process/Windows/Common/NativeRegisterContextWindows_x86_64.cpp
13 ↗(On Diff #204154)

The empty lines here look completely random. I'd just delete all of the empty lines and let clang-format sort/group the includes for you.

Hui added a comment.Jun 12 2019, 3:01 PM

Sorry for the stupid question, but ...

What exactly is meant here by "Native"? How is a NativeProcessWindows different from ProcessWindows?

The Native*** classes are meant to be used from lldb-server. They look somewhat similar to their non-native counterpart because they still do debugging, but they're a lot dumber, because they only deal with basic process control, and none of the fancy symbolic stuff that you'd need debug info for.

They differ in APIs but most of them have common implementations. The APIs from native process classes are more easy to apply process/thread control.
Hope the native and non-native ones can be merged. The similar thing to the RegisterContext and NativeRegisterContext classes.

The other thing is that using "native" classes can avoid linking a lot of unnecessary lldb libs (LLDB plugins or whatever comes with the plugins) to lldb-server.
The nativeprocesswindows could just be a pass-through to processwindows plugin, but the usage is a sort of strange since the
lldb-server needs to initialize the plugin, create a target, and create a instance just like what lldb does. This means literally
there will be two lldb debuggers, one on host and the other one on remote. It is doable, but not that applicable.

source/Plugins/Process/Windows/Common/DebuggerThread.cpp
350 ↗(On Diff #204154)

The definition is in ntstatus.h which has quite a few macros redefined in winnt.h that is implicitly included in both LLDB. and LLVM.
Could force to include ntstatus.h but need to make a handful of extra patches to replace EXCEPTION_BREAKPOINT macro etc with ntstatus's
in all concerned sources.

source/Plugins/Process/Windows/Common/NativeProcessWindows.cpp
529–530 ↗(On Diff #204154)

This factory attach will return a constructed native process which needs process architecture ahead to construct its native thread with a proper register context. So I think can't do it after the attach operation. At least need to do it before the factory routine returns. Do you mean to put these codes before line 540, i.e. return std::move(process_up)?

In D63165#1540606, @Hui wrote:

Sorry for the stupid question, but ...

What exactly is meant here by "Native"? How is a NativeProcessWindows different from ProcessWindows?

The Native*** classes are meant to be used from lldb-server. They look somewhat similar to their non-native counterpart because they still do debugging, but they're a lot dumber, because they only deal with basic process control, and none of the fancy symbolic stuff that you'd need debug info for.

They differ in APIs but most of them have common implementations. The APIs from native process classes are more easy to apply process/thread control.
Hope the native and non-native ones can be merged. The similar thing to the RegisterContext and NativeRegisterContext classes.

The other thing is that using "native" classes can avoid linking a lot of unnecessary lldb libs (LLDB plugins or whatever comes with the plugins) to lldb-server.
The nativeprocesswindows could just be a pass-through to processwindows plugin, but the usage is a sort of strange since the
lldb-server needs to initialize the plugin, create a target, and create a instance just like what lldb does. This means literally
there will be two lldb debuggers, one on host and the other one on remote. It is doable, but not that applicable.

So the idea is ProcessWindows will be deleted once lldb-server supports windows debugging. A bit of history here: when we first started we started making ProcessXXXX for each platform (mac, linux, windows). Then we thought about remote debugging and decided to have everyone just make a lldb-server for their platform and even when we are locally debugging, we launch a lldb-server. This is how linux, macOS, the BSDs and other targets do it. Why? Because if you do it the other way you have more code to support: one native plug-in and one remote plug-in. This means the remote debugging usually has more issues since it is the less tested debugging scenario. It also means that if you had a native process implementation only, like ProcessWindows is currently, you can't remotely debug to a windows machine.

So yes there is duplicated code for now, but ProcessWindows.cpp and ThreadWindows.cpp should go away in the near future once lldb-server takes over so the temporary code duplication is just to keep things working until lldb-server takes over.

Hui added inline comments.Jun 14 2019, 2:49 PM
source/Plugins/Process/Windows/Common/NativeRegisterContextWindows.h
31 ↗(On Diff #204154)

No, it doesn't override anything. It has different signature from the pure virtual method with the same name.

NativeRegisterContext::virtual Status ReadAllRegisterValues(lldb::DataBufferSP &data_sp) = 0;

It would be better to change the name to be ReadAllRegisterValuesWithSize or something else.

labath added inline comments.Jun 17 2019, 6:34 AM
source/Plugins/Process/Windows/Common/NativeProcessWindows.cpp
315 ↗(On Diff #204154)

This doesn't look right. You're completely ignoring the file_name argument. The idea of this function is to return the load (base) address of the module specified by that argument. Here I guess you're always returning the load address of the main module.

529–530 ↗(On Diff #204154)

I meant doing this *before* the factory returns, but *after* you perform the actual OS attach syscall is completed.

Combining this with the constructor suggestion below, I'd imagine doing something like

Error E = Error::success();
auto process_up = std::make_unique<NativeProcessWindows>(pid, -1, native_delegate, E);
if (E)
  return E;
return std::move(process_up);

Then in the process constructor, you'd do:

ErrorAsOutParameter EOut(E);
auto delegate_sp = std::make_shared<NativeDebugDelegate>(*this);
ProcessAttachInfo attach_info;
attach_info.SetPID(pid); // TODO: This is dumb, we are already passing the pid separately
E = AttachProcess(pid, attach_info, delegate).ToError();
if (E)
  return;
SetID(GetDebuggedProcessId());
ProcessInstanceInfo info;
if (!Host::GetProcessInfo(pid, info)) {
  E = createStringError(inconvertibleErrorCode(), "Cannot get process information")
  return;
}
m_arch = info.GetArchitecture();

Among other things, I'm trying to cut down on the number of layers one has to go through in order to launch/attach. This file already has a number of functions called "attach", D63166 has a bunch more, and then there even more in DebuggerThread. By the time one gets to the bottom of the stack and figures out what's actually happening, he's forgotten what was he actually trying to achieve.

source/Plugins/Process/Windows/Common/NativeProcessWindows.h
31 ↗(On Diff #204154)

I'm not sure this multiple inheritance is really helping anything here. While looking through the code i kept wondering "why doesn't this function just do the job itself instead of delegating to somebody else" only to have to remind myself that the function it is delegating to is actually a different object, which does not know anything about the bigger picture.

If using composition here wouldn't introduce additional complexities/code (which it doesn't look like it would), I think it might be better to do that instead.

133 ↗(On Diff #204154)

It doesn't look like process should ever be null. So, I'd suggest replacing the pointer by a reference, and deleting all the null checks.

source/Plugins/Process/Windows/Common/NativeRegisterContextWindows.h
31 ↗(On Diff #204154)

Hm.., interesting. I think I am starting to understand what's going on here. IIUC, the size of the register context is a constant for any given architecture, but it varies between architectures (i.e., NativeRegisterContextWindows_XXX instances). If that's the case, then it sounds like the data_size should be an argument to the NativeRegisterContextWindows constructor, instead it being passed down for every call. Would that work?

asmith updated this revision to Diff 205164.Jun 17 2019, 1:23 PM

Address reviewers comments

asmith updated this revision to Diff 205175.Jun 17 2019, 1:46 PM

Fix caps for WoW64

Hui added inline comments.Jun 21 2019, 8:35 AM
source/Plugins/Process/Windows/Common/NativeProcessWindows.h
31 ↗(On Diff #204154)

Do you mean to move what is in ProcessDebugger into nativeprocess instead?

labath added inline comments.Jun 21 2019, 8:42 AM
source/Plugins/Process/Windows/Common/NativeProcessWindows.h
31 ↗(On Diff #204154)

No, just instead of inheriting from a ProcessDebugger, making it a member variable instead. Unless there are reasons which make that hard/impossible...

Hui added inline comments.Jun 21 2019, 9:37 AM
source/Plugins/Process/Windows/Common/NativeProcessWindows.h
31 ↗(On Diff #204154)

Debug event callback in ProcessDebugger need to be overridden by both processwindows and nativeprocess. Not sure if the change is doable. Also same change is required to processwindows. That will make the patch a little bigger.

asmith updated this revision to Diff 206146.Sun, Jun 23, 4:18 PM
asmith marked an inline comment as done.
asmith marked 10 inline comments as done.Sun, Jun 23, 4:28 PM
labath added inline comments.Mon, Jun 24, 1:49 AM
lldb/source/Plugins/Process/Windows/Common/NativeProcessWindows.h
46–50

I guess these shouldn't be public as these object should be constructed through the factory, right?

lldb/source/Plugins/Process/Windows/Common/NativeRegisterContextWindows.cpp
32–37

new doesn't fail. This is dead code.

53–58

This can't ever be true.

69–90

These look like they should be operating invariants (at most guarded by an assertion, but probably even that is not needed). Right now, they're just making it hard to figure out what this function actually does...

lldb/source/Plugins/Process/Windows/Common/NativeRegisterContextWindows_WoW64.cpp
83

How about a helper method like NativeThreadWindows &GetThread() to hide the static_cast everywhere. You could also change the constructor argument to NativeThreadWindows& to make it clear that the register context is only to be constructed with threads of this type.

a GetThreadHandle might be nice too since fetching the thread seems to be invariably followed by the GetHostThread().GetNativeThread().GetSystemHandle() blurb.

lldb/source/Plugins/Process/Windows/Common/NativeThreadWindows.cpp
81–85

What's the purpose of this function? It seems to be only called immediately before deleting the thread, so nobody gets to read it's stop reason anyway...

source/Plugins/Process/Windows/Common/NativeProcessWindows.h
31 ↗(On Diff #204154)

Ok, let's leave it like that for now. However, I find it very weird that the ProcessDebugger class takes a "delegate" object *and* it requires you to inherit from it to override some of it's methods. Seems like one of these extension mechanisms ought to be enough.

source/Plugins/Process/Windows/Common/NativeRegisterContextWindows.h
31 ↗(On Diff #204154)

I still think the register context size would be better handled as an constructor argument.

asmith updated this revision to Diff 207002.Fri, Jun 28, 12:27 AM

removed the "clang-format off" in a few places

asmith added inline comments.Sun, Jun 30, 1:37 PM
lldb/source/Plugins/Process/Windows/Common/NativeRegisterContextWindows.cpp
32–37

The code is copied from/following the other targets. Maybe all the targets could be cleaned up in another PR. For example Arm64,

NativeRegisterContextLinux_arm64::ReadAllRegisterValues()

data_sp.reset(new DataBufferHeap(REG_CONTEXT_SIZE, 0));
if (!data_sp)
53–58

Copied from Arm64

asmith updated this revision to Diff 207234.Sun, Jun 30, 1:38 PM

Add a define for STATUS_WX86_BREAKPOINT.

labath added inline comments.Mon, Jul 1, 6:21 AM
lldb/source/Plugins/Process/Windows/Common/NativeRegisterContextWindows.cpp
32–37

Done in r364744.

53–58

Done in r364754.

asmith updated this revision to Diff 207635.Tue, Jul 2, 2:43 PM
asmith edited the summary of this revision. (Show Details)

removed the redundant null checks
added GetThreadHandle to simplify code

asmith marked 6 inline comments as done.Tue, Jul 2, 2:45 PM
asmith updated this revision to Diff 210480.Wed, Jul 17, 9:07 PM
asmith edited the summary of this revision. (Show Details)

I believe this address all the comments.

I am sorry, but the code still seems a lot more verbose to me than it should be needed to implement the given functionality. I'd like to understand why/if it's that necessary..

lldb/source/Plugins/Process/Utility/RegisterContextWindows_WoW64.cpp
72–105

This is very verbose. You could just have GetRegisterInfo return g_register_infos_WoW64 directly (and similarly for other functions. Then all you'd be left with is putting an assert(target_arch.GetMachine() == llvm::Triple::x86) into the RegisterContextWindows_WoW64 constructor. Same goes for other register context classes.

lldb/source/Plugins/Process/Utility/RegisterContextWindows_x86_64.cpp
141

Why is all of this complexity necessary? Couldn't you just switch on the target architecture in CreateRegisterInfoInterface in NativeRegisterContextWindows_x86_64.cpp and create either RegisterContextWindows_WoW64 or RegisterContextWindows_x86_64 ?

In fact, if RegisterContextWindows_WoW64 is identical to RegisterContextWindows_i386, then why do you even need the _WoW64 version of the class in the first place? FWIW, linux also does not have a RegisterContextLinux_32_bit_process_on_64_bit_kernel class...

lldb/source/Plugins/Process/Windows/Common/NativeProcessWindows.cpp
285–286

I think it would be better to return an error here. llvm::errc::not_supported ?

311–337

This doesn't seem right. You only retrieve the module list the first time this function was called, but the list of loaded modules can change over time. You probably need to invalidate this cache every time the process is resumed...

lldb/source/Plugins/Process/Windows/Common/NativeProcessWindows.h
46–50

This doesn't seem to be addressed.

lldb/source/Plugins/Process/Windows/Common/NativeRegisterContextWindows_x86_64.cpp
104–109

Are these suspend/resume calls necessary? You should be able to assume that the process is stopped (due to breakpoint, exception or whatever). Nobody will be calling functions that get/set registers on a running process. (linux does not support that either, but we don't bother explicitly stopping the process before we attempt to do this).

lldb/source/Plugins/Process/Windows/Common/NativeThreadWindows.cpp
139–199

IIUC, none of the current register contexts implement watchpoint support right now. Which means this code is dead/untested. Let's remove it for the time being...

lldb/unittests/tools/lldb-server/tests/TestClient.cpp
258–260

This doesn't sound right. If the inferior exits, lldb-server should exit too. It doesn't matter if that's implemented via SIGHUP, or whatever. If some of the tests don't pass because of this, maybe you can disable the tests for the time being..

amccarth requested changes to this revision.Thu, Jul 18, 8:17 AM
In D63165#1540606, @Hui wrote:

Sorry for the stupid question, but ...

What exactly is meant here by "Native"? How is a NativeProcessWindows different from ProcessWindows?

The Native*** classes are meant to be used from lldb-server. They look somewhat similar to their non-native counterpart because they still do debugging, but they're a lot dumber, because they only deal with basic process control, and none of the fancy symbolic stuff that you'd need debug info for.

They differ in APIs but most of them have common implementations. The APIs from native process classes are more easy to apply process/thread control.
Hope the native and non-native ones can be merged. The similar thing to the RegisterContext and NativeRegisterContext classes.

The other thing is that using "native" classes can avoid linking a lot of unnecessary lldb libs (LLDB plugins or whatever comes with the plugins) to lldb-server.
The nativeprocesswindows could just be a pass-through to processwindows plugin, but the usage is a sort of strange since the
lldb-server needs to initialize the plugin, create a target, and create a instance just like what lldb does. This means literally
there will be two lldb debuggers, one on host and the other one on remote. It is doable, but not that applicable.

So the idea is ProcessWindows will be deleted once lldb-server supports windows debugging. A bit of history here: when we first started we started making ProcessXXXX for each platform (mac, linux, windows). Then we thought about remote debugging and decided to have everyone just make a lldb-server for their platform and even when we are locally debugging, we launch a lldb-server. This is how linux, macOS, the BSDs and other targets do it. Why? Because if you do it the other way you have more code to support: one native plug-in and one remote plug-in. This means the remote debugging usually has more issues since it is the less tested debugging scenario. It also means that if you had a native process implementation only, like ProcessWindows is currently, you can't remotely debug to a windows machine.

So yes there is duplicated code for now, but ProcessWindows.cpp and ThreadWindows.cpp should go away in the near future once lldb-server takes over so the temporary code duplication is just to keep things working until lldb-server takes over.

Thanks for all the clarification. "Native" is an overloaded term in LLDB.

I'm trying to think through the implications of this always-use-an-lldb-server approach on cross-platform postmortem debugging. I'll have to do some studying, but I guess this patch doesn't change anything in that regard.

lldb/source/Plugins/Process/Windows/Common/NativeProcessWindows.cpp
95

This code appears to be based on ProcessWindows::DoResume, which has a pending patch to fix a race condition by moving the checking of the exception record to later in the function. I assume this will need the same fix. See https://reviews.llvm.org/D62183 for details.

This revision now requires changes to proceed.Thu, Jul 18, 8:17 AM

I'm trying to think through the implications of this always-use-an-lldb-server approach on cross-platform postmortem debugging. I'll have to do some studying, but I guess this patch doesn't change anything in that regard.

I think the implications of this are zero. In the post mortem case we would still create an (in-process) instance of ProcessMinidump, and it work the the same way as it does now. Just like we create a ProcessElfCore for opening linux core files even though we always use lldb-server (via ProcessGdbRemote) for live debugging.

Pretty much nothing of what lldb-server does is useful for debugging core files anyway. (Though I have had some ideas about creating a mock lldb-server operating on core files for the purposes of testing lldb.)

Hui added inline comments.Thu, Jul 18, 9:09 PM
lldb/source/Plugins/Process/Windows/Common/NativeProcessWindows.h
46–50

These constructors contain a call to function template make_unique that is not allowed to access private members of NativeProcessWindows.
Simply do the following could address this concern.

-  auto process_up =
-      std::make_unique<NativeProcessWindows>(launch_info, native_delegate, E);

+  auto process_up = std::unique_ptr<NativeProcessWindows>(
+      new NativeProcessWindows(launch_info, native_delegate, E));
Hui added inline comments.Thu, Jul 18, 10:46 PM
lldb/source/Plugins/Process/Windows/Common/NativeRegisterContextWindows_x86_64.cpp
104–109

Yes, agreed. But such codes are added to be consistent with what is now in upstream from https://reviews.llvm.org/rL364216. (CacheAllRegisterValues).

Hui added inline comments.Thu, Jul 18, 10:54 PM
lldb/source/Plugins/Process/Utility/RegisterContextWindows_x86_64.cpp
141

I think WoW64 is i686 that shall deserve a separate arch specific implementation?

labath added inline comments.Fri, Jul 19, 3:14 AM
lldb/source/Plugins/Process/Utility/RegisterContextWindows_x86_64.cpp
141

I am sorry, but I don't follow. Can you repeat the question?

(FWIW, I don't believe that the fact that two things are different from a certain point of view justifies copy-pasting (at least) 150 LOC. If you think it's confusing to use something called "i386" in things dealing with WoW64, you can always typedef the WOW64 context to it, or rename the i386 context to something more generic.)

lldb/source/Plugins/Process/Windows/Common/NativeProcessWindows.h
46–50

The make_unique thing is unfortunate, but that is the nature of c++. Since you're concerned about consistency, I'll mention that other classes (e.g. NativeProcessLinux) also prioritize clarifying what is the public interface of a class over the niceness of being able to call make_unique.

lldb/source/Plugins/Process/Windows/Common/NativeRegisterContextWindows_x86_64.cpp
104–109

Interesting. I don't think it's worth being consistent with that, as that code is hopefully going to go away soon, and lldb-server can offer much stronger guarantees about when these things can be called (due to everything being funneled through the gdb-remote protocol, which doesn't even support sending any packets while the process is running).

However, I am not familiar with windows APIs, so I'll leave this decision up to you...