This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Detach the child process when stepping over a fork
ClosedPublic

Authored by labath on Jan 12 2023, 5:13 AM.

Details

Summary

Step over thread plans were claiming to explain the fork stop reasons,
which prevented the default fork logic (detaching from the child
process) from kicking in. This patch changes that.

Diff Detail

Event Timeline

labath created this revision.Jan 12 2023, 5:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2023, 5:13 AM
labath requested review of this revision.Jan 12 2023, 5:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2023, 5:13 AM

(The fix seems to make LLDB do the right thing, although I can't claim to fully understand what is going on. From what I can tell, we're currently detaching from the fork child when the fork event gets pulled off the public event queue, and this code is preventing that from happening. Intuitively it seems like the public event queue pull is slightly too late to be doing this kind of thing.)

The part of handling the fork where we decide we're going to follow the child and so we need to switch the process PID & TID does have to happen on event receipt. The point there is that until the client pulls an event from the public event queue, it doesn't know that the state change has happened yet. We don't want the current state that you access with GetState, GetPID, GetSelectedThread, etc. to get out of sync with the event flow, so to the public world it should appear to it as if the process were in the state of the last event it actually received. Having the PID or TID change before the fork event gets pulled off the queue would break this illusion.

However, the state of the process we aren't following isn't relevant to the state of the process we are following, so the part of the DidFork dealing with the discarded side of the fork can happen when the fork event is received, in ProcessGDBRemote::SetThreadStopInfo or thereabouts. That does seem like a better way to do this.

However, I don't think this patch is wrong in itself. Formally, the step thread plans might want to know that a fork has happened. But it's not terribly important as they can't really do anything about except maybe report an error. And the same could be said of exec events which are already in this list. What's actually going to happen is once we've stopped for the fork, all the step plans will be asked whether they are Stale, and given the thread they were on is no longer present, they should unship themselves. So we're not losing anything by bypassing the plans for this event delivery.

Note, it might be worth checking that the stepping thread plan's ShouldStop check the PID, not just the TID. It's possible that the new TID was the same as the old TID and the plan might try to keep operating, which would be wrong.

Thanks for your response, Jim.

The part of handling the fork where we decide we're going to follow the child and so we need to switch the process PID & TID does have to happen on event receipt. The point there is that until the client pulls an event from the public event queue, it doesn't know that the state change has happened yet. We don't want the current state that you access with GetState, GetPID, GetSelectedThread, etc. to get out of sync with the event flow, so to the public world it should appear to it as if the process were in the state of the last event it actually received. Having the PID or TID change before the fork event gets pulled off the queue would break this illusion.

However, the state of the process we aren't following isn't relevant to the state of the process we are following, so the part of the DidFork dealing with the discarded side of the fork can happen when the fork event is received, in ProcessGDBRemote::SetThreadStopInfo or thereabouts. That does seem like a better way to do this.

The first part makes sense to me, but I'm not sure how to reconcile it with the second paragraph. It seems to me that, if follow-fork-mode=child, it'd be pretty hard to detach from the original (parent) process early on, but still have the public API report on its state until the fork event gets actually pulled from the event queue. Technically, it's doable, but I think we'd have to maintain separate "private" and "public" copies of most of the process state (thread lists, etc.). And I'm not sure how useful would that be given that when the process is running, it's hard to guarantee the correctness of a piece of information anyway.

However, I don't think this patch is wrong in itself. Formally, the step thread plans might want to know that a fork has happened. But it's not terribly important as they can't really do anything about except maybe report an error. And the same could be said of exec events which are already in this list. What's actually going to happen is once we've stopped for the fork, all the step plans will be asked whether they are Stale, and given the thread they were on is no longer present, they should unship themselves. So we're not losing anything by bypassing the plans for this event delivery.

Ok, so shall we go with this patch then?

Note, it might be worth checking that the stepping thread plan's ShouldStop check the PID, not just the TID. It's possible that the new TID was the same as the old TID and the plan might try to keep operating, which would be wrong.

AFAICT, the plans do not check the process PID, but ShouldStop seems like it's a bit too late for a plan to figure out it's working with a completely different thread that it was supposed to have. Wouldn't it be better to just remove/flush all the thread plans upon forking (if following the child)?

jingham accepted this revision.EditedJan 23 2023, 5:52 PM

Thanks for your response, Jim.

The part of handling the fork where we decide we're going to follow the child and so we need to switch the process PID & TID does have to happen on event receipt. The point there is that until the client pulls an event from the public event queue, it doesn't know that the state change has happened yet. We don't want the current state that you access with GetState, GetPID, GetSelectedThread, etc. to get out of sync with the event flow, so to the public world it should appear to it as if the process were in the state of the last event it actually received. Having the PID or TID change before the fork event gets pulled off the queue would break this illusion.

However, the state of the process we aren't following isn't relevant to the state of the process we are following, so the part of the DidFork dealing with the discarded side of the fork can happen when the fork event is received, in ProcessGDBRemote::SetThreadStopInfo or thereabouts. That does seem like a better way to do this.

The first part makes sense to me, but I'm not sure how to reconcile it with the second paragraph. It seems to me that, if follow-fork-mode=child, it'd be pretty hard to detach from the original (parent) process early on, but still have the public API report on its state until the fork event gets actually pulled from the event queue. Technically, it's doable, but I think we'd have to maintain separate "private" and "public" copies of most of the process state (thread lists, etc.). And I'm not sure how useful would that be given that when the process is running, it's hard to guarantee the correctness of a piece of information anyway.

I don't think this is an actual problem currently in lldb. After all, our current process model requires that when a process is stopped, it is stopped all the way and can't change state on us (so no new fork events). And when it's running you can't ask questions (other than whether it is in the running state) till you stop it. Since you can't fork while you are stopped, the fork will always happen while the public state is Running, at which point you can only ask "what is your state" or request an interruption. And by the time you have fetched the Stop event so that you can start asking questions the parent -> child transition will have already happened, and you will have to factor that new information into the questions you were going to ask anyway.

So I don't actually think it would cause problems to handle releasing the child earlier.

This is one of the things we're going to have to handle more carefully once we actually do "no stop" debugging. We're relying in a bunch of places on the simplification that when you're stopped, nothing can happen, and when you're running you have to stop first before you can ask questions of the process. In this case, however, I don't think it will be that hard. We can just set lldb up to always follow both sides of the fork, making a target for the child. Then depending on the user settings we can kill off one or the other at our leisure, and be able to ask questions of either in the meantime.

However, I don't think this patch is wrong in itself. Formally, the step thread plans might want to know that a fork has happened. But it's not terribly important as they can't really do anything about except maybe report an error. And the same could be said of exec events which are already in this list. What's actually going to happen is once we've stopped for the fork, all the step plans will be asked whether they are Stale, and given the thread they were on is no longer present, they should unship themselves. So we're not losing anything by bypassing the plans for this event delivery.

Ok, so shall we go with this patch then?

You should do this patch anyway. It doesn't seem like the step plans should be the ones handling fork events, just like they aren't expected to handle exec events. So this change is good anyway.

If it's not going to cause a problem to postpone continuing the side of the fork you aren't following to the next event receipt, then this patch should suffice.

Note, it might be worth checking that the stepping thread plan's ShouldStop check the PID, not just the TID. It's possible that the new TID was the same as the old TID and the plan might try to keep operating, which would be wrong.

AFAICT, the plans do not check the process PID, but ShouldStop seems like it's a bit too late for a plan to figure out it's working with a completely different thread that it was supposed to have. Wouldn't it be better to just remove/flush all the thread plans upon forking (if following the child)?

We always have to do a cleanup pass on the thread plans after a stop: e.g. if we were doing a bunch of nested steps & nexts and then hit an exception that threw past a bunch of the frames we were stepping in, we're going to have to discard all the plans whose frames are now gone, and ShouldStop is where that cleanup happens... We also do that if we were stepping on a thread and the thread died. So cleaning up when the PID changes seems also like a natural part of that cleanup. I'm not sure you gain much by adding another place where you do some ThreadPlanStack cleanups, but if you can find a convenient place to do it, it shouldn't cause any harm.

This revision is now accepted and ready to land.Jan 23 2023, 5:52 PM
This revision was automatically updated to reflect the committed changes.