This is an archive of the discontinued LLVM Phabricator instance.

[lldb] [Process] Watch for fork/vfork notifications
ClosedPublic

Authored by mgorny on Mar 17 2021, 3:19 PM.

Details

Summary

Watch for fork(2)/vfork(2) (also fork/vfork-style clone(2) on Linux)
notifications and explicitly detach the forked child process, and add
initial tests for these cases. The code covers FreeBSD, Linux
and NetBSD process plugins. There is no new user-visible functionality
provided -- this change lays the foundation for subsequent work on fork
support.

Diff Detail

Event Timeline

mgorny requested review of this revision.Mar 17 2021, 3:19 PM
mgorny created this revision.
mgorny added inline comments.Mar 17 2021, 3:21 PM
lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
2152

This is not intended to send errors like this; I've just added it to test logging.

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
187 ↗(On Diff #331393)

@labath, I'm not convinced this is the best way of doing it. Do you have any suggestions?

mgorny updated this revision to Diff 331417.Mar 17 2021, 5:06 PM

Added option setting to the Linux plugin.

mgorny updated this revision to Diff 331609.Mar 18 2021, 10:20 AM

Add a helper function to get the PID (TGID) from a TID. Recognize fork notifications from the parent (SIGTRAP) and from the child (SIGSTOP with a TGID != GetID()) and call a helper for them. The helper doesn't do anything except logging right now.

mgorny added inline comments.Mar 18 2021, 10:22 AM
lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
459

For some reason, this doesn't print the correct tgid value (plain printf works). Any clue what's wrong here?

Seems mostly ok. The getPidForTid business does not seem completely ideal, though. I think it may be cleaner to keep the threads for which we have received the creation event in some kind of a purgatory list until we receive the corresponding parent event (which will clarify what kind of a child we are dealing with).

We actually did something like that a very long time ago, but I removed it for being too complicated. Now, I think it may be worth going back to it.

BTW, this made me realize a funky scenario we may need to handle somehow. I think it's possible (on linux) that two threads will initiate a fork operation concurrently, and so by the time that everything stops, we end up with three (or more) processes...

lldb/include/lldb/Host/common/NativeProcessProtocol.h
409–412

The llvm way to do this is via an enum class in combination with LLVM_MARK_AS_BITMASK_ENUM.

lldb/source/Host/linux/Host.cpp
319

just say lldb_private::getPIDForTID

lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
459

what does it print?

651

I guess we should also do something like the WaitForNewThread call in the clone case.

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
187 ↗(On Diff #331393)

I'd probably do this by creating a new virtual function (with a signature like `vector<string>(ArrayRef<StringRef>)`), which would receive the client features and return the server features. The base qSupported handler would be implemented in terms of this function.

This would also allow us to avoid sending features that clearly have no place in the platform connections.

Seems mostly ok. The getPidForTid business does not seem completely ideal, though. I think it may be cleaner to keep the threads for which we have received the creation event in some kind of a purgatory list until we receive the corresponding parent event (which will clarify what kind of a child we are dealing with).

We actually did something like that a very long time ago, but I removed it for being too complicated. Now, I think it may be worth going back to it.

BTW, this made me realize a funky scenario we may need to handle somehow. I think it's possible (on linux) that two threads will initiate a fork operation concurrently, and so by the time that everything stops, we end up with three (or more) processes...

I'm actually working on such a thing right now, though initially for forks. I can surely extend it to threads later. I will update the diff with it in a few minutes.

mgorny added inline comments.Mar 18 2021, 2:30 PM
lldb/include/lldb/Host/common/NativeProcessProtocol.h
409–412

Heh, and you've just told me to use constexprs instead of enums ;-).

lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
459

tid 118029 belongs to a different tgid 4295085325, assuming child

When tgid was 0, it also printed this huge number. If I cast it to int, it prints fine. But then, the first arg is also lldb::pid_t and it prints fine.

651

Yeah, I was wondering about it. On the other hand, won't the main loop handle it anyway?

Does doing it explicitly have any real gain? One thing I was worried about is that the parent-signal-first case is much more common than child-signal-first, so having different logic on the two branches would make it more likely for bugs in the child-first branch to go unnoticed.

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
187 ↗(On Diff #331393)

Yeah, that's better than my ideas ;-).

mgorny updated this revision to Diff 331688.Mar 18 2021, 2:33 PM

Actually wait for two signals. Store the extra info on parent signal, and use it when handling the fork. For now, the handler just detaches the child and resumes the parent.

mgorny updated this revision to Diff 331824.Mar 19 2021, 4:42 AM
mgorny marked 3 inline comments as done.

Switched Extension to bitmask-enum, fixed prototype and decl for getPIDForTID and switched server features into a virtual as suggested by @labath.

mgorny updated this revision to Diff 331846.Mar 19 2021, 6:02 AM

Fix typo: qSupported needs to be joined using ;, not ,.

The gdb model - since gdb only supports one debuggee per gdb - is to either follow the fork or follow the parent. It would be more in keeping with lldb's model to make a new target for the child side of the fork, and use that to follow the child. That way you can continue to debug both the parent and the child processes. It doesn't look like you've gotten that far yet, but IMO that's the direction we should be going for lldb.

mgorny updated this revision to Diff 331982.Mar 19 2021, 12:48 PM

Now using the new thingie for threads. getTIDForPID() temporarily commented out but I plan to use it to distinguish clone() for new thread from clone() for new process.

The gdb model - since gdb only supports one debuggee per gdb - is to either follow the fork or follow the parent. It would be more in keeping with lldb's model to make a new target for the child side of the fork, and use that to follow the child. That way you can continue to debug both the parent and the child processes. It doesn't look like you've gotten that far yet, but IMO that's the direction we should be going for lldb.

Long-term, sure. Between life and deadlines, we're only working on the simpler one process model right now.

mgorny updated this revision to Diff 332013.Mar 19 2021, 3:26 PM

Add a super-trivial test that the debugger doesn't lose parent after the fork.

Next on the list: clear software breakpoints in the detached fork.

mgorny updated this revision to Diff 332086.Mar 20 2021, 12:53 AM

Generalize the waiting function and use it for forks as well.

The gdb model - since gdb only supports one debuggee per gdb - is to either follow the fork or follow the parent. It would be more in keeping with lldb's model to make a new target for the child side of the fork, and use that to follow the child. That way you can continue to debug both the parent and the child processes. It doesn't look like you've gotten that far yet, but IMO that's the direction we should be going for lldb.

This is no longer true -- gdb supports multiple inferiors these days (I'm not sure when exactly it grew that support). The automatic detaching of the fork child can be disabled by the "detach-on-fork" setting, at which point you get two inferiors for each fork.

Michal doesn't appear to be interested in this feature right now, just the ability to follow the child process. I think that's fair, as the two inferior setup would be more complicated. However, I am keeping this scenario in mind and making sure that it's possible to support it in the future. The idea is that at the gdb protocol level we will have the notion of two inferiors (using the same syntax that gdb does), but the only thing that the client (and server, mostly) support is to immediately detach from one of the processes after a fork. Later this could be changed to do something more interesting.

The gdb model - since gdb only supports one debuggee per gdb - is to either follow the fork or follow the parent. It would be more in keeping with lldb's model to make a new target for the child side of the fork, and use that to follow the child. That way you can continue to debug both the parent and the child processes. It doesn't look like you've gotten that far yet, but IMO that's the direction we should be going for lldb.

This is no longer true -- gdb supports multiple inferiors these days (I'm not sure when exactly it grew that support). The automatic detaching of the fork child can be disabled by the "detach-on-fork" setting, at which point you get two inferiors for each fork.

Michal doesn't appear to be interested in this feature right now, just the ability to follow the child process. I think that's fair, as the two inferior setup would be more complicated. However, I am keeping this scenario in mind and making sure that it's possible to support it in the future. The idea is that at the gdb protocol level we will have the notion of two inferiors (using the same syntax that gdb does), but the only thing that the client (and server, mostly) support is to immediately detach from one of the processes after a fork. Later this could be changed to do something more interesting.

Oh, I'm glad to hear that, both about gdb and what sounds like a good plan for lldb going forward. We just need to make sure supporting both sides of the fork stays the natural direction, even if we don't support it right off.

Using background apps to create sandbox barriers for lower-permissions tasks is becoming a more and more common architecture. Having a framework for doing "task level" debugging that can cross these process barriers as the tasks move back and forth would be a great addition to lldb, but for that we need to be able to keep both sides of the fork under lldb's control.

I'm certainly not planning to put any obstacles in the way of full multiprocess debugging, and I'm going to try to pave the way for it whenever possible. One thing I'm considering right now is adding some support for tracking additional NativeProcess* instances to LLGS. While it's not technically necessary, I think it will make clearing software breakpoints before detaching cleaner. However, I'm not yet sure how to integrate vfork into this later.

mgorny updated this revision to Diff 332792.Mar 23 2021, 2:24 PM
mgorny retitled this revision from [lldb] follow-fork/vfork support [WIP] to [lldb] [Process/Linux] Watch for fork notifications.
mgorny edited the summary of this revision. (Show Details)

So I've decided to circle back a bit and start by adding server-side detaching-on-fork support. This means that this patch doesn't depend on multiprocess support anymore.

It includes refactoring clone/fork/vfork event monitoring into a single function. Right now, only forks are supported — the handler creates a local NativeProcessLinux instance (@labath, any suggestions how to make this less hacky?), duplicates parent's breakpoints into it (just like the child gets parent's memory) and uses it to clear these breakpoints and detach the child process. I suppose this will make things easier when we want to make the class more persistent in the future.

TODO: vfork, clone

mgorny updated this revision to Diff 333064.Mar 24 2021, 11:19 AM
mgorny retitled this revision from [lldb] [Process/Linux] Watch for fork notifications to [lldb] [Process/Linux] Watch for fork/vfork notifications.
mgorny edited the summary of this revision. (Show Details)

Included vfork(). Still have to look into other clone() variants, then other platforms.

mgorny updated this revision to Diff 333122.Mar 24 2021, 2:22 PM

Handle clone(2) for new process as well (and add a test for it).

mgorny updated this revision to Diff 333124.Mar 24 2021, 2:24 PM

Also add clone+watchpoint test.

mgorny added inline comments.Mar 24 2021, 2:27 PM
lldb/test/Shell/Subprocess/fork-follow-parent.test
7

The tests sometimes fail if the child starts after the breakpoint is hit (and therefore this check happens before the stop reason below). Any suggestion on how to make it work independently of where 'function run in child' happens?

mgorny marked an inline comment as done.Mar 24 2021, 2:30 PM
mgorny added inline comments.Mar 25 2021, 12:45 AM
lldb/include/lldb/Host/common/NativeProcessProtocol.h
145

Long term, these functions are probably unnecessary. It just occurred to me to check how GDB handles it, and it handles clearing and restoring breakpoints on client side (i.e. by sending z and Z packets).

mgorny added inline comments.Mar 25 2021, 1:45 AM
lldb/include/lldb/Host/common/NativeProcessProtocol.h
145

Hmm, actually that depends on how much compatibility with old client versions we want to preserve. If we want forks/vforks to stop being broken with breakpoints when using an old client, we need to keep them as a fallback logic for when fork/vfork-events aren't supported by client.

mgorny updated this revision to Diff 333254.Mar 25 2021, 4:19 AM

Move saved breakpoint list directly into NativeProcessProtocol.

mgorny updated this revision to Diff 333441.Mar 25 2021, 3:32 PM
mgorny retitled this revision from [lldb] [Process/Linux] Watch for fork/vfork notifications to [lldb] [Process] Watch for fork/vfork notifications.

Now includes initial FreeBSD support. The watchpoint test still fails, we probably need to clean dbregs on fork too.

mgorny updated this revision to Diff 333671.Mar 27 2021, 7:00 AM

Complete FreeBSD and NetBSD support. Includes a fix to Detach() on NetBSD.

mgorny updated this revision to Diff 333834.Mar 29 2021, 6:15 AM

Fix wrongly using the debugged process' TID instead of PID.

It includes refactoring clone/fork/vfork event monitoring into a single function. Right now, only forks are supported — the handler creates a local NativeProcessLinux instance (@labath, any suggestions how to make this less hacky?), duplicates parent's breakpoints into it (just like the child gets parent's memory) and uses it to clear these breakpoints and detach the child process.

I don't find this particularly hacky. However, I have doubts about the breakpoint clearing aspect. If we add that now, I think it will be tricky to move that process to the client in the future (how will the client know whether the breakpoints were auto-removed?). Given that breakpoints in forked processes have been borked since forever, I don't see harm in keeping that broken for a little while longer. Even a client with extremely limited support for multiple processes (like the one we had in mind) should not have a problem with sending the appropriate z packets before it detaches. This patch could still come first (it's great that you've separated that out) -- it just needs to ensure that there are no breakpoints in the child path which would need clearing.

Btw, have you by any chance looked at how gdb determines which clone()s constitute a new thread?

lldb/include/lldb/Host/common/NativeProcessProtocol.h
145

I certainly don't think we need to try to "fix" older clients by doing stuff to the server. (And on top of that, I am less worried about people using older clients, than I am about older servers -- those can be harder to update.)

409–412

That's cause you were using an enum to define constants. If you were using it to define a type, then I would be fine (though I might ask to make it a scoped enum).

lldb/source/Host/linux/Host.cpp
323

Why is this checking for zero? (When it's being set to INVALID_PID above?)

lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
459

I dunno. I recommend debugging it. It smells like some kind of undefined behavior.

925

if (pgid_ret != child_pid) ?

lldb/test/Shell/Subprocess/fork-follow-parent.test
7

There are several ways around that, but I don't know which one is the best, as I'm not sure what exactly you're trying to test. You could try to capture this nondeterminism with CHECK-DAGs, but CHECK-DAGs don't combine well with CHECK-NOTs. Another option is to make this predictable by adding synchronization to the inferior. If you just wanted to check that the child process finishes correctly, you could have the parent process check its exit status and match that...

It includes refactoring clone/fork/vfork event monitoring into a single function. Right now, only forks are supported — the handler creates a local NativeProcessLinux instance (@labath, any suggestions how to make this less hacky?), duplicates parent's breakpoints into it (just like the child gets parent's memory) and uses it to clear these breakpoints and detach the child process.

I don't find this particularly hacky. However, I have doubts about the breakpoint clearing aspect. If we add that now, I think it will be tricky to move that process to the client in the future (how will the client know whether the breakpoints were auto-removed?). Given that breakpoints in forked processes have been borked since forever, I don't see harm in keeping that broken for a little while longer. Even a client with extremely limited support for multiple processes (like the one we had in mind) should not have a problem with sending the appropriate z packets before it detaches. This patch could still come first (it's great that you've separated that out) -- it just needs to ensure that there are no breakpoints in the child path which would need clearing.

Actually, that's not a problem at all. If client doesn't indicate fork-events support via qSupported, we handle breakpoints server-side. If it does, we deliver fork stop reason and then the client is responsible for dealing with breakpoints.

Btw, have you by any chance looked at how gdb determines which clone()s constitute a new thread?

If I read the code correctly, GDB always assumes PTRACE_EVENT_CLONE is a new thread, and does not support spawning processes via it.

mgorny added inline comments.Mar 30 2021, 6:20 AM
lldb/test/Shell/Subprocess/fork-follow-parent.test
7

The goal is to check that the parent is stopped before printing but the child is not. I suppose your last suggestion, i.e. checking for exit status, should be good enough.

It includes refactoring clone/fork/vfork event monitoring into a single function. Right now, only forks are supported — the handler creates a local NativeProcessLinux instance (@labath, any suggestions how to make this less hacky?), duplicates parent's breakpoints into it (just like the child gets parent's memory) and uses it to clear these breakpoints and detach the child process.

I don't find this particularly hacky. However, I have doubts about the breakpoint clearing aspect. If we add that now, I think it will be tricky to move that process to the client in the future (how will the client know whether the breakpoints were auto-removed?). Given that breakpoints in forked processes have been borked since forever, I don't see harm in keeping that broken for a little while longer. Even a client with extremely limited support for multiple processes (like the one we had in mind) should not have a problem with sending the appropriate z packets before it detaches. This patch could still come first (it's great that you've separated that out) -- it just needs to ensure that there are no breakpoints in the child path which would need clearing.

Actually, that's not a problem at all. If client doesn't indicate fork-events support via qSupported, we handle breakpoints server-side. If it does, we deliver fork stop reason and then the client is responsible for dealing with breakpoints.

Yeah, but then we have to maintain two copies of breakpoint-unsetting code in perpetuity. Given that we've managed to come this far with this being completely broken, I think we can wait a little longer until the client support is in place.

Btw, have you by any chance looked at how gdb determines which clone()s constitute a new thread?

If I read the code correctly, GDB always assumes PTRACE_EVENT_CLONE is a new thread, and does not support spawning processes via it.

Ah, cute. I guess someone should let them know about this (and maybe also tell kernel folks that differentiating cloned processes is really hard). :)

Yeah, but then we have to maintain two copies of breakpoint-unsetting code in perpetuity. Given that we've managed to come this far with this being completely broken, I think we can wait a little longer until the client support is in place.

Ok, I guess I can do that.

What about debug registers on FreeBSD? This system is pretty unique because child processes inherit dbregs from the parent, and therefore we need to clear them. GDB doesn't do that and watchpoints crash forked children. Should we keep the clearing part as a hack specific to FreeBSD, or have lldb generically clear all hardware breakpoints and watchpoints on fork? Independently of that, I'm going to file a bug for FreeBSD to see if this is really the intended behavior.

mgorny updated this revision to Diff 334296.Mar 30 2021, 4:09 PM
mgorny marked 4 inline comments as done.

Removed software breakpoint cleanup. Modified tests not to expect child's output at any specific point.

mgorny edited the summary of this revision. (Show Details)Mar 30 2021, 4:09 PM
labath added a comment.Apr 1 2021, 1:46 AM

What about debug registers on FreeBSD? This system is pretty unique because child processes inherit dbregs from the parent, and therefore we need to clear them. GDB doesn't do that and watchpoints crash forked children. Should we keep the clearing part as a hack specific to FreeBSD, or have lldb generically clear all hardware breakpoints and watchpoints on fork?

I think we can keep the freebsd-specific clearing code (in the server). Clearing all hardware breakpoints in the client would be pretty logical, but if we wanted to make that work consistently, then we might need to add code to /other/ operating system implementations to /set/ the breakpoints (so that the client can clear them properly)...

lldb/include/lldb/Host/linux/Host.h
14

These spaces are pretty common in lldb, but that's not actually consistent with how the rest of llvm formats them...

lldb/source/Plugins/Process/FreeBSD/NativeProcessFreeBSD.cpp
738 ↗(On Diff #334296)

And the netbsd comment might apply to freebsd as well...

lldb/source/Plugins/Process/FreeBSD/NativeRegisterContextFreeBSD_arm64.cpp
290–291 ↗(On Diff #334296)
lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
647–665

It looks like these two (and probably also PTRACE_EVENT_CLONE) could be bundled together.

659

If this fails (I guess that can only happen if the entire process was SIGKILLED), how does continuing with pid=0 help?
I'm not sure that resuming the parent (as the clone case does) is significantly better, but it seems like we should at least be consistent...

915

auto *

940

I think we have some macro or attribute for that these days...

945

Copying the breakpoints is not needed yet, right? We could just group these two (fork/vfork) paths for now...

961

llvm_unreachable("unknown clone_info.event");

964

erase(find_it) would be slightly more efficient. (And it might be better to put this before the switch, to make sure it executes in case of early exits.)

lldb/source/Plugins/Process/NetBSD/NativeProcessNetBSD.cpp
784 ↗(On Diff #334296)

For NetBSD, it might be cleaner to keep waiting for the parent only, and then do a separate waitpid for the child once you get its process id. That would make the sequence of events (and the relevant code) predictable.

I think this is how this interface was meant to be used, even on linux, but linux makes it tricky because it's not possible to wait for all threads belonging to a given process in a single call -- one would have to iterate through the threads and wait for each one separately (it might be interesting to try to implement and benchmark that).

lldb/test/Shell/Subprocess/Inputs/clone.c
23 ↗(On Diff #334296)

Better include an alignas directive, just in case.

lldb/test/Shell/Subprocess/Inputs/vfork.c
2

Maybe merge these three files into one, parameterized by a define or something?

lldb/test/Shell/Subprocess/clone-follow-parent-wp.test
1 ↗(On Diff #334296)

Why only these three arches?

lldb/test/Shell/Subprocess/fork-follow-parent.test
2

This (and maybe others) should have UNSUPPORTED: system-windows, I guess.

mgorny added inline comments.Apr 1 2021, 3:52 PM
lldb/test/Shell/Subprocess/clone-follow-parent-wp.test
1 ↗(On Diff #334296)

Right, I was supposed to replace this with something better later. It's supposed to limit the test to arches where we support watchpoints.

mgorny updated this revision to Diff 334947.Apr 2 2021, 6:27 AM
mgorny marked 11 inline comments as done.

Update the Linux plugin and tests, mostly.

TODO: FreeBSD, NetBSD and watchpoint feature for tests.

mgorny added inline comments.Apr 2 2021, 7:11 AM
lldb/include/lldb/Host/linux/Host.h
14

Will remove. However, note that this causes clang-format to reorder the includes.

lldb/source/Plugins/Process/NetBSD/NativeProcessNetBSD.cpp
784 ↗(On Diff #334296)

I've been thinking about it and I suppose it makes sense. I was considering what would better future-proof it for full multiprocess support, but I suppose we could just loop over all the process classes and let each one wait for its own process rather than creating a dedicated dispatcher that calls waitpid(-1...).

mgorny updated this revision to Diff 334957.Apr 2 2021, 7:51 AM
mgorny edited the summary of this revision. (Show Details)

Simplified FreeBSD/NetBSD to use per-PID waitpid(). Removed arch restrictions from wp tests — made them consistent with Python tests.

mgorny marked 5 inline comments as done.Apr 2 2021, 7:52 AM

@labath, I think I've made all the requested changes.

labath added a comment.Apr 8 2021, 4:05 AM

@labath, I think I've made all the requested changes.

This looks pretty good now. Just some small nits inline...

lldb/include/lldb/Host/linux/Host.h
14

And it should do that -- that's part of the point :)

lldb/source/Plugins/Process/FreeBSD/NativeProcessFreeBSD.cpp
723 ↗(On Diff #334957)

Is the loop still necessary?

981–987 ↗(On Diff #334957)

assert(event & PL_FLAG_FORKED) would be shorter, but I am not sure if we need even that, as this is already checked in the caller. You could just drop the entire event argument, if it is unused.

lldb/source/Plugins/Process/FreeBSD/NativeProcessFreeBSD.h
124 ↗(On Diff #334957)

move this next to the other Monitor functions ?

lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
651

Maybe obsolete by now, but I just realized what you actually meant here.

There will always be some divergence between the two sequence of events. Letting this be handled from the main loop may make it smaller though. I don't really mind doing it that way, as long as the (v)fork and clone cases do it the same way.

lldb/source/Plugins/Process/NetBSD/NativeProcessNetBSD.cpp
784 ↗(On Diff #334296)

I think that the looping would be the cleanest solution for the BSDs. For linux, we will need to call waitpid centrally, and then somehow choose which instance to notify.

760 ↗(On Diff #334957)

same here

lldb/source/Plugins/Process/NetBSD/NativeProcessNetBSD.h
117 ↗(On Diff #334957)

same here

lldb/test/API/functionalities/gdb_remote_client/TestMultiprocess.py
3 ↗(On Diff #334957)

bad merge

lldb/test/Shell/Subprocess/Inputs/fork.cpp
30–42 ↗(On Diff #334957)

How about:

#ifdef TEST_CLONE
  pid_t pid = clone(child_func, &stack[sizeof(stack)], 0, NULL);
#elif defined TEST_FORK
  pid_t pid = TEST_FORK(); // defined to fork or vfork
  if (pid == 0)
    _exit(child_func(NULL));
#endif

?

lldb/test/Shell/Subprocess/clone-follow-parent.test
2 ↗(On Diff #334957)

The UNSUPPORTED:windows is _not_ required if you explicitly list supported operating systems. :)

mgorny marked 3 inline comments as done.Apr 8 2021, 4:23 AM

Answered where answer was due, will update the rest once I finish retesting the multiprocess patch.

lldb/source/Plugins/Process/FreeBSD/NativeProcessFreeBSD.cpp
723 ↗(On Diff #334957)

I am not sure but I think there's no harm in keeping it, and it might save us from some hard-to-debug corner conditions in the future.

981–987 ↗(On Diff #334957)

The distinction is necessary to know how to deal with software breakpoints, and it will be used in the followup patch (D99864, when I extend it to non-Linux targets). I think removing event now only to reintroduce it later would be wasted effort.

That said, I guess this is yet another case for llvm_unreachable(). Also, I think it's worth keeping assert()s like this as they make sure we don't break stuff accidentally when changing code on the caller's end.

labath added inline comments.Apr 8 2021, 4:44 AM
lldb/source/Plugins/Process/FreeBSD/NativeProcessFreeBSD.cpp
723 ↗(On Diff #334957)

I don't mind that much, though I would be surprised if this can help with anything. IIUC, the BSD kernel always sends one notification when the process stops, and I believe this code is written in a way which expects that (only linux has special code to delay some actions until every thread registers as stopped). If, for whatever reason, we end up getting two notifications here, I expect things would blow up spectacularly...

981–987 ↗(On Diff #334957)

True, but it's even better when the code is written such that the asserts/unreachables are unnecessary. One way to do that might be to do the decoding in the caller:

if (info.pl_flags & PL_FLAG_FORKED) {
  MonitorClone(info.pl_child_pid, /*bool is_vfork*/ info.pl_flags & PL_FLAG_VFORKED);
  return;
}

It's hard to say whether that would work without seeing the real implementation.

mgorny added inline comments.Apr 8 2021, 5:23 AM
lldb/source/Plugins/Process/FreeBSD/NativeProcessFreeBSD.cpp
981–987 ↗(On Diff #334957)

In the long run, we would probably also want a separate event for posix_spawn on NetBSD. I've figured out that reusing native constants avoids reinventing the wheel unnecessarily.

mgorny marked 10 inline comments as done.Apr 8 2021, 5:38 AM

Ok, I think I've implemented all the requested changes. I'm going to test it on all three platforms now and attach the patch if it miraculously still works ;-).

lldb/source/Plugins/Process/FreeBSD/NativeProcessFreeBSD.cpp
981–987 ↗(On Diff #334957)

Hmm, though I guess it doesn't matter for FreeBSD right now and since this is entirely platform-specific code, I'll simplify it for now.

mgorny updated this revision to Diff 336107.Apr 8 2021, 7:54 AM

Implemented @labath's requests.

labath accepted this revision.Apr 8 2021, 8:11 AM

ship it

This revision is now accepted and ready to land.Apr 8 2021, 8:11 AM
This revision was landed with ongoing or failed builds.Apr 8 2021, 9:49 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 8 2021, 9:49 AM

test/Shell/Subprocess/vfork-follow-parent-wp.test is failing on GreenDragon:

http://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/31016/testReport/junit/lldb-shell/Subprocess/vfork_follow_parent_wp_test/

I'm going to temporarily disable the test on Darwin. Happy to provide help to investigate :-)

mgorny added a comment.Apr 8 2021, 3:33 PM

test/Shell/Subprocess/vfork-follow-parent-wp.test is failing on GreenDragon:

http://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/31016/testReport/junit/lldb-shell/Subprocess/vfork_follow_parent_wp_test/

I'm going to temporarily disable the test on Darwin. Happy to provide help to investigate :-)

I guess that a vfork()-ed process doesn't start with cleared dbregs on Darwin. I suppose you don't handle forks at all, so skipping/xfailing the test is the thing to do: there's a potential issue, but it's extremely unlikely in real life.

Forks are handled explicitly on FreeBSD, Linux and NetBSD right now but I've left these tests open to all platforms since they merely verify that LLDB doesn't explode when the debugged program forks, and that it (probably) doesn't try to debug the child accidentally.

I've reverted this (121cff78a8032a73aa4fb820625dc1ecae8e3997) because it made the linux bot super-flaky. I think it's likely that this affects _only_ linux (and the BSD code is fine -- though we don't have a bot to verify that), but I didn't want to experiment with partial reverts while the bots are wonky. However, it may be interesting to re-land the linux support in a separate commit, once we figure out the problem -- my money is on the nondeterminism of thread creation.

I don't really understand what's going on, but in the one case I was able to catch this "in the act" the inferior process had one thread which was stopped one byte after the breakpoint location (there's no valid instruction there). This leads me to believe that the FixupBreakpointPCAsNeeded logic did not fire for some reason.

I've reverted this (121cff78a8032a73aa4fb820625dc1ecae8e3997) because it made the linux bot super-flaky. I think it's likely that this affects _only_ linux (and the BSD code is fine -- though we don't have a bot to verify that), but I didn't want to experiment with partial reverts while the bots are wonky. However, it may be interesting to re-land the linux support in a separate commit, once we figure out the problem -- my money is on the nondeterminism of thread creation.

How about recommitting without the clone() change, i.e. assuming that clone always means thread, and seeing if that causes trouble still?

I've reverted this (121cff78a8032a73aa4fb820625dc1ecae8e3997) because it made the linux bot super-flaky. I think it's likely that this affects _only_ linux (and the BSD code is fine -- though we don't have a bot to verify that), but I didn't want to experiment with partial reverts while the bots are wonky. However, it may be interesting to re-land the linux support in a separate commit, once we figure out the problem -- my money is on the nondeterminism of thread creation.

How about recommitting without the clone() change, i.e. assuming that clone always means thread, and seeing if that causes trouble still?

I think I have it figured out -- see the inlined comment. It seems this was one of those (rare) cases where enabling logging actually makes a race _more_ likely...

lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
931–932

This is resuming the thread a second time. If the thread manages to stop between the resume inside AddThread, and here, we will miss the breakpoint (and subsequently crash).

Deleting these two lines should fix things.

mgorny marked an inline comment as done.Apr 13 2021, 5:35 AM
mgorny added inline comments.
lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
931–932

Hmm, I wonder why it ended up being added here. Probably my mistake, sorry, and thanks for debugging it. I'm going to test and reland it shortly!