This is an archive of the discontinued LLVM Phabricator instance.

Fix stepping over watchpoints in architectures that raise the exception before executing the instruction
Closed, Public

Authored by jingham on Jul 14 2022, 4:10 PM.

Details

Summary

On some architectures - e.g. aarch64 - the watchpoint exception is raised before the instruction that would have triggered it gets run. We want to report old/new values, however, so we need to have executed the instruction before reporting the stop. That means before reporting the stop, we need to do one stepi on the thread that got the watchpoint exception.

The original version of this code implemented this incorrectly: it did the step-over synchronously in StopInfoWatchpoint::ShouldStopSynchronous. You should NEVER run the target in ShouldStopSynchronous. If you do, then any other threads that had interesting pending stop-info actions won't be able to perform their actions, because the target will have moved on and invalidated their StopInfos.

You should always do this kind of thing by pushing the action you want to perform onto the ThreadPlanStack for the thread in question. That way all the threads get to register what they need to do, and then the ThreadList runner can arbitrate among them and do the right thing. For instance, in the case of stepping over watchpoints, each thread stopped on the watchpoint pushes a "step instruction" thread plan that reinstates the original watchpoint StopInfo when complete. If there is more than one thread in this state, lldb notices that each of them has a "stop others" plan, and one by one lets each thread step over the watchpoint. When all this work is done, we report all the threads stopped at their watchpoints. (A sketch of this arbitration follows.)
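As a rough, self-contained sketch of that arbitration (this is not LLDB's actual ThreadPlan API; every type and method name below is invented for illustration):

// Minimal model of the thread-plan arbitration described above.
// All names are invented; LLDB's real API differs.
#include <cstdio>
#include <vector>

struct StepOverWatchpointPlan {
  int tid;
  bool stop_others = true; // while this thread steps, the others stay suspended
  void StepInstruction() { std::printf("thread %d: stepi past watchpoint\n", tid); }
};

int main() {
  // Three threads stopped on the same watchpoint; each pushes a plan
  // onto its plan stack instead of running the target directly.
  std::vector<StepOverWatchpointPlan> plans{{1}, {2}, {3}};

  // The runner sees that every plan wants the other threads stopped,
  // so it lets the threads step one at a time...
  for (auto &plan : plans)
    plan.StepInstruction();

  // ...and only when all plans are complete is the stop reported.
  std::puts("all threads stepped; report the watchpoint stop");
}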

I also fixed a bug I noticed in the same code while reworking this. We were not decrementing the HitCount for the watchpoint when the condition failed. We consider a failed condition to mean "the stop point wasn't hit"; it's weird to see a hit count of 5 when the target only told you about one stop. But that wasn't being enforced in StopInfoWatchpoint::PerformAction.
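A minimal sketch of that rule, with hypothetical names (not the actual LLDB code):

#include <cstdio>

struct Watchpoint {
  unsigned hit_count = 0;
  bool condition_result = false; // stand-in for evaluating a user condition
};

// Returns true if the stop should be reported to the user.
bool PerformAction(Watchpoint &wp) {
  ++wp.hit_count;           // counted when the exception arrives...
  if (!wp.condition_result) {
    --wp.hit_count;         // ...but undone on a failed condition:
    return false;           // "the stop point wasn't hit"
  }
  return true;
}

int main() {
  Watchpoint wp;
  PerformAction(wp);            // condition fails: not counted
  wp.condition_result = true;
  PerformAction(wp);            // condition passes: counted
  std::printf("hit count = %u\n", wp.hit_count); // prints 1, not 2
}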

There was a test expecting the now-incorrect hit-count value in this case, so I fixed that, and I also re-enabled all the tests that had been failing because of concurrent watchpoint access; they run successfully for me.

There's a bit of funniness for MIPS in the watchpoint support. I am pretty sure I got that right, but I don't have any MIPS hardware to test on. The same concurrent watchpoint tests that were marked failing for arm64 Darwin are also marked failing for MIPS, but I am not sure whether that's for the same reason. Anyway, if someone has access to a MIPS system, it would be good to try this out and see if it works there as well.

Diff Detail

Event Timeline

jingham created this revision. Jul 14 2022, 4:10 PM
jingham requested review of this revision. Jul 14 2022, 4:10 PM
Herald added a project: Restricted Project. Jul 14 2022, 4:10 PM

Ack, bad memory. It was actually originally done in PerformAction, but that has the same effect. You really shouldn't run the target in PerformAction if you can manage it. That's why, for instance, I added a --auto-continue flag to breakpoints (we probably should do the same for watchpoints), so that the thread-list code gets a chance to reason about the state before continuing. We do run expressions there, but we should really only do that if (a) we know the expression will only run the current thread, or (b) we're doing it for the user, in which case, if they have an expression that requires all threads to run, they asked for that and presumably were willing to trade changing the state of the other threads from what it was at the initial stop for running the expression.

I read through the patch and I'm not sure I have much useful feedback, because I haven't worked on the ThreadPlan code very much, but this does look like what we were discussing. I believe this patch will fix https://github.com/llvm/llvm-project/issues/48777. Regarding MIPS support: in reading through the patch I was reviewing the relevant calls and saw this bit in GDBRemoteCommunicationClient::GetWatchpointsTriggerAfterInstruction() - so I believe you're right that this change should also fix multithreaded watchpoint hits on those targets.

// On targets like MIPS and ppc64, watchpoint exceptions are always
// generated before the instruction is executed. The connected target may
// not support qHostInfo or qWatchpointSupportInfo packets.
after = !(triple.isMIPS() || triple.isPPC64());

Generally, this makes sense to me, but I do have one question (not necessarily for Jim). These tests work on (arm) linux, and I am wondering why that is the case. Could it be related to the fact that lldb-server preserves the stop reason for threads that are not running? I.e. if we have two threads hit a breakpoint (or whatever), and then step one of them, then the other thread will still report a stop_reason=breakpoint. Do you know if this is supposed to happen (e.g. does debugserver do that)?

(It wouldn't surprise me if this was a bug in lldb-server, as I think it's related to some strange behavior we see when multiple threads are stopped (in these situations, lldb will sometimes keep switching away from the debugged thread after every stop), but it was never clear to me whether this was a problem in lldb or lldb-server.)

lldb/source/Target/StopInfo.cpp
804

This creates a shared pointer loop, does it not?

IIUC, this m_step_over_wp_sp is used only to check whether the plan is complete. Maybe, instead of the pointer, we could have a flag that the thread plan sets when it finishes?
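The concern, in miniature (illustrative types, not the real StopInfo/ThreadPlan classes): two objects that hold shared_ptrs to each other keep each other alive forever.

#include <cstdio>
#include <memory>

struct Plan;
struct StopInfo {
  std::shared_ptr<Plan> plan_sp;     // the stop info keeps the plan alive...
  ~StopInfo() { std::puts("~StopInfo"); }
};
struct Plan {
  std::shared_ptr<StopInfo> info_sp; // ...and the plan keeps the info alive.
  ~Plan() { std::puts("~Plan"); }
};

int main() {
  auto info = std::make_shared<StopInfo>();
  auto plan = std::make_shared<Plan>();
  info->plan_sp = plan;
  plan->info_sp = info;
  // Both locals die here, but each object still holds a reference to the
  // other, so neither destructor ever prints -- the pair leaks.
}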

897–901

How does this relate to the thread plan part of the patch?

> Generally, this makes sense to me, but I do have one question (not necessarily for Jim). These tests work on (arm) linux, and I am wondering why that is the case. Could it be related to the fact that lldb-server preserves the stop reason for threads that are not running? I.e. if we have two threads hit a breakpoint (or whatever), and then step one of them, then the other thread will still report a stop_reason=breakpoint. Do you know if this is supposed to happen (e.g. does debugserver do that)?

I'd have to go back and test it to be 100% sure, but from the behavior I see on Darwin systems: when we receive a watchpoint hit on two threads at the same time and instruction-step the first thread, then ask debugserver for the current stop-reason on thread two, it says there is no reason. The watchpoint exception on the second thread is lost. I could imagine lldb-server behaving differently. (I think when we fetch the mach exceptions for the threads, they've now been delivered, and when we ask the kernel for any pending mach exceptions for the threads a second time -- even if those threads haven't been allowed to run -- I think the state is lost, and debugserver didn't save that initial state it received.)

> Generally, this makes sense to me, but I do have one question (not necessarily for Jim). These tests work on (arm) linux, and I am wondering why that is the case. Could it be related to the fact that lldb-server preserves the stop reason for threads that are not running? I.e. if we have two threads hit a breakpoint (or whatever), and then step one of them, then the other thread will still report a stop_reason=breakpoint. Do you know if this is supposed to happen (e.g. does debugserver do that)?
>
> I'd have to go back and test it to be 100% sure, but from the behavior I see on Darwin systems: when we receive a watchpoint hit on two threads at the same time and instruction-step the first thread, then ask debugserver for the current stop-reason on thread two, it says there is no reason. The watchpoint exception on the second thread is lost. I could imagine lldb-server behaving differently. (I think when we fetch the mach exceptions for the threads, they've now been delivered, and when we ask the kernel for any pending mach exceptions for the threads a second time -- even if those threads haven't been allowed to run -- I think the state is lost, and debugserver didn't save that initial state it received.)

Yes, that sounds plausible. In lldb-server (on linux) it happens differently because we store each stop reason inside the thread object, and since we can control each thread independently, there's no reason to touch it if we're not running the thread. Although it also wouldn't be hard to clear it on every resume.

So which one of these behaviors would you say is the correct one?
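To make the two stub behaviors concrete, here is a schematic gdb-remote exchange. The qThreadStopInfo packet and T stop-replies are real, but the thread ids, addresses, and reply details below are invented for illustration, and the exact keys vary by stub:

# Two threads hit the same watchpoint; both initially report it.
qThreadStopInfo 1a2b   ->  T05 ... reason:watchpoint ... thread:1a2b
qThreadStopInfo 1a2c   ->  T05 ... reason:watchpoint ... thread:1a2c
# lldb instruction-steps thread 1a2b, then asks about thread 1a2c again:
qThreadStopInfo 1a2c
#   debugserver (Darwin): no stop reason -- the watchpoint hit is lost
#   lldb-server (Linux):  still reason:watchpoint -- the hit is preserved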

DavidSpickett added inline comments.
lldb/source/Target/StopInfo.cpp
724

Blank line after this.

748

Depending on how you read it, this seems backwards: if m_reset_watchpoint is true, don't reset it? Maybe m_did_reset_watchpoint?

758

return m_step_over_wp_sp->IsPlanComplete(); ?

808

return error.Success() ?

> Generally, this makes sense to me, but I do have one question (not necessarily for Jim). These tests work on (arm) linux, and I am wondering why that is the case. Could it be related to the fact that lldb-server preserves the stop reason for threads that are not running? I.e. if we have two threads hit a breakpoint (or whatever), and then step one of them, then the other thread will still report a stop_reason=breakpoint. Do you know if this is supposed to happen (e.g. does debugserver do that)?
>
> I'd have to go back and test it to be 100% sure, but from the behavior I see on Darwin systems: when we receive a watchpoint hit on two threads at the same time and instruction-step the first thread, then ask debugserver for the current stop-reason on thread two, it says there is no reason. The watchpoint exception on the second thread is lost. I could imagine lldb-server behaving differently. (I think when we fetch the mach exceptions for the threads, they've now been delivered, and when we ask the kernel for any pending mach exceptions for the threads a second time -- even if those threads haven't been allowed to run -- I think the state is lost, and debugserver didn't save that initial state it received.)
>
> Yes, that sounds plausible. In lldb-server (on linux) it happens differently because we store each stop reason inside the thread object, and since we can control each thread independently, there's no reason to touch it if we're not running the thread. Although it also wouldn't be hard to clear it on every resume.
>
> So which one of these behaviors would you say is the correct one?

Yeah, as I was writing my explanation for why we see this problem on Darwin, I realized the obvious next question was, "why isn't this a debugserver fix" - I think that would be another valid way to approach this, although we still need to interop with older debugservers on iOS etc devices from the past five years. (To be fair, it's relatively uncommon that this issue is hit in real-world programs; our API tests stress this area specifically to expose bugs.) I do wonder about the failure mgorny reported in https://github.com/llvm/llvm-project/issues/48777, which sounds like exactly the same problem as we have on Darwin, but on a BSD or linux system running under AArch64 qemu? (He doesn't say which OS he was running under qemu.)

That being said, compensating for this debugserver deficiency in lldb doesn't seem like a bad thing to me, given that the patch exists.

jingham updated this revision to Diff 445059. Jul 15 2022, 10:56 AM

Address review comments.

  1. Name changes and formatting
  2. The ThreadPlan needs to keep the StopInfo alive, but not vice versa. So I removed the shared pointer to the StopInfo and replaced it with a callback on the StopInfo that the thread plan invokes when it finishes (sketched after this list).
  3. I removed the "ResetWatchpoint" call in the ThreadPlan destructor. On second thought, I don't think that's right: by the time you've gotten to the ThreadPlan destructor, you don't know that you are still the one in charge of that watchpoint, so the plan shouldn't change it at that point.
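A minimal sketch of the ownership arrangement in item 2, with invented names: the plan owns the stop info, and the stop info holds only a flag that the plan flips through a callback, so no reference cycle forms.

#include <cstdio>
#include <functional>
#include <memory>

struct StopInfo {
  bool step_over_done = false;
  ~StopInfo() { std::puts("~StopInfo"); }
};

struct Plan {
  std::shared_ptr<StopInfo> info_sp;   // plan -> info: owning shared_ptr
  std::function<void()> on_complete;   // info <- plan: only a callback
  void Finish() {
    on_complete();                     // tell the stop info we are done
    std::puts("plan complete");
  }
  ~Plan() { std::puts("~Plan"); }
};

int main() {
  auto info = std::make_shared<StopInfo>();
  // The callback captures a non-owning pointer; the plan's info_sp
  // guarantees the stop info outlives the plan's use of it.
  Plan plan{info, [raw = info.get()] { raw->step_over_done = true; }};
  plan.Finish();
  // No cycle: both destructors run when the objects go out of scope.
}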
jingham added inline comments. Jul 15 2022, 10:58 AM
lldb/source/Target/StopInfo.cpp
804

Ack, yes. I added an "I'm complete" callback to the StopInfoWatchpoint, and the ThreadPlan calls that.

808

Yeah, sometimes it's useful to break on one or the other condition without having to add conditional breakpoints, but that doesn't seem to be the prevailing style.

897–901

This is a drive-by fix. The point of this effort was to make the hit counts correct for Watchpoints, and as I was reading through the code I noticed that we were actually failing to do that in two ways: mishandling the case of multiple simultaneous watchpoint hits, and mishandling failed condition hits.

I mentioned in the initial comments that I fixed them both in one patch, but I can separate them into two patches if that bugs you.

jingham added a comment. Edited Jul 15 2022, 11:05 AM

> Generally, this makes sense to me, but I do have one question (not necessarily for Jim). These tests work on (arm) linux, and I am wondering why that is the case. Could it be related to the fact that lldb-server preserves the stop reason for threads that are not running? I.e. if we have two threads hit a breakpoint (or whatever), and then step one of them, then the other thread will still report a stop_reason=breakpoint. Do you know if this is supposed to happen (e.g. does debugserver do that)?
>
> I'd have to go back and test it to be 100% sure, but from the behavior I see on Darwin systems: when we receive a watchpoint hit on two threads at the same time and instruction-step the first thread, then ask debugserver for the current stop-reason on thread two, it says there is no reason. The watchpoint exception on the second thread is lost. I could imagine lldb-server behaving differently. (I think when we fetch the mach exceptions for the threads, they've now been delivered, and when we ask the kernel for any pending mach exceptions for the threads a second time -- even if those threads haven't been allowed to run -- I think the state is lost, and debugserver didn't save that initial state it received.)
>
> Yes, that sounds plausible. In lldb-server (on linux) it happens differently because we store each stop reason inside the thread object, and since we can control each thread independently, there's no reason to touch it if we're not running the thread. Although it also wouldn't be hard to clear it on every resume.
>
> So which one of these behaviors would you say is the correct one?

Darwin has always worked in such a way that, by the time it stops for an exception, it will have given the other threads enough chance to run that, if they were going to, they will also hit the exception before the first exception is delivered to lldb. In gdb we tried to hide that fact by only publishing one exception, but that was awkward, since when you tried to resume to do some piece of work, you'd immediately stop again without getting to run. That just led to confusing interactions (and was a little harder to deal with in the debugger as well). I think it's clearer to stop and see that 5 threads have hit your breakpoint, then decide what to do, than to have one breakpoint reported and then, when you "continue", immediately turn around and get another without anybody really seeming to make progress, rinse, repeat...

Rereading this, I wasn't sure what question you were asking. I answered "what should you tell lldb if lldb-server sees 5 threads stopped for reasons at one stop point". But the other question is "what to do with stop infos if you only run one thread". If lldb knows it has stopped a thread from running when it resumes the target, it also preserves the StopInfo for that thread. That seems to me the most useful behavior.

jingham added a comment. Edited Jul 15 2022, 4:37 PM

> Generally, this makes sense to me, but I do have one question (not necessarily for Jim). These tests work on (arm) linux, and I am wondering why that is the case. Could it be related to the fact that lldb-server preserves the stop reason for threads that are not running? I.e. if we have two threads hit a breakpoint (or whatever), and then step one of them, then the other thread will still report a stop_reason=breakpoint. Do you know if this is supposed to happen (e.g. does debugserver do that)?
>
> I'd have to go back and test it to be 100% sure, but from the behavior I see on Darwin systems: when we receive a watchpoint hit on two threads at the same time and instruction-step the first thread, then ask debugserver for the current stop-reason on thread two, it says there is no reason. The watchpoint exception on the second thread is lost. I could imagine lldb-server behaving differently. (I think when we fetch the mach exceptions for the threads, they've now been delivered, and when we ask the kernel for any pending mach exceptions for the threads a second time -- even if those threads haven't been allowed to run -- I think the state is lost, and debugserver didn't save that initial state it received.)
>
> Yes, that sounds plausible. In lldb-server (on linux) it happens differently because we store each stop reason inside the thread object, and since we can control each thread independently, there's no reason to touch it if we're not running the thread. Although it also wouldn't be hard to clear it on every resume.
>
> So which one of these behaviors would you say is the correct one?
>
> Yeah, as I was writing my explanation for why we see this problem on Darwin, I realized the obvious next question was, "why isn't this a debugserver fix" - I think that would be another valid way to approach this, although we still need to interop with older debugservers on iOS etc devices from the past five years. (To be fair, it's relatively uncommon that this issue is hit in real-world programs; our API tests stress this area specifically to expose bugs.) I do wonder about the failure mgorny reported in https://github.com/llvm/llvm-project/issues/48777, which sounds like exactly the same problem as we have on Darwin, but on a BSD or linux system running under AArch64 qemu? (He doesn't say which OS he was running under qemu.)
>
> That being said, compensating for this debugserver deficiency in lldb doesn't seem like a bad thing to me, given that the patch exists.

I'm pretty sure "saving suspended threads' stop reasons across a resume" is not a requirement of the gdb-remote protocol, so I don't think we should rely on that behavior.

Plus, this is the wrong way to use the execution control part of lldb. It's much harder to reason about the execution control machinery if we're in one round of "How should I react to stop X" and someone just moves the target to stop X+1 in the middle of that process... We are in a bit of a tough spot with expressions that run all threads, but (a) that really only happens when users explicitly ask for it and (b) we already make a distinction between "resumes for expression evaluation" and "natural resumes". We should certainly not make it worse. So even if it hadn't caused this bug, that wasn't the right way to implement this bit of business.

labath accepted this revision. Jul 18 2022, 1:34 AM

> Generally, this makes sense to me, but I do have one question (not necessarily for Jim). These tests work on (arm) linux, and I am wondering why that is the case. Could it be related to the fact that lldb-server preserves the stop reason for threads that are not running? I.e. if we have two threads hit a breakpoint (or whatever), and then step one of them, then the other thread will still report a stop_reason=breakpoint. Do you know if this is supposed to happen (e.g. does debugserver do that)?
>
> I'd have to go back and test it to be 100% sure, but from the behavior I see on Darwin systems: when we receive a watchpoint hit on two threads at the same time and instruction-step the first thread, then ask debugserver for the current stop-reason on thread two, it says there is no reason. The watchpoint exception on the second thread is lost. I could imagine lldb-server behaving differently. (I think when we fetch the mach exceptions for the threads, they've now been delivered, and when we ask the kernel for any pending mach exceptions for the threads a second time -- even if those threads haven't been allowed to run -- I think the state is lost, and debugserver didn't save that initial state it received.)
>
> Yes, that sounds plausible. In lldb-server (on linux) it happens differently because we store each stop reason inside the thread object, and since we can control each thread independently, there's no reason to touch it if we're not running the thread. Although it also wouldn't be hard to clear it on every resume.
>
> So which one of these behaviors would you say is the correct one?
>
> Darwin has always worked in such a way that, by the time it stops for an exception, it will have given the other threads enough chance to run that, if they were going to, they will also hit the exception before the first exception is delivered to lldb. In gdb we tried to hide that fact by only publishing one exception, but that was awkward, since when you tried to resume to do some piece of work, you'd immediately stop again without getting to run. That just led to confusing interactions (and was a little harder to deal with in the debugger as well). I think it's clearer to stop and see that 5 threads have hit your breakpoint, then decide what to do, than to have one breakpoint reported and then, when you "continue", immediately turn around and get another without anybody really seeming to make progress, rinse, repeat...
>
> Rereading this, I wasn't sure what question you were asking. I answered "what should you tell lldb if lldb-server sees 5 threads stopped for reasons at one stop point". But the other question is "what to do with stop infos if you only run one thread". If lldb knows it has stopped a thread from running when it resumes the target, it also preserves the StopInfo for that thread. That seems to me the most useful behavior.

Given your latest comment, I think you've understood my question, but to avoid confusion, the situation I was referring to is when lldb-server sees 5 threads stopped for some reason, and then lldb steps one of them. The question was whether lldb-server should still report the original stop reason of those threads (if asked -- theoretically lldb should know that the reason hasn't changed and might decide not to ask).

I agree that preserving the stop reason is the most useful behavior for lldb (the threads haven't run, so why should their state change?), but then one could apply the same logic to the lldb-/debugserver component as well.

> Generally, this makes sense to me, but I do have one question (not necessarily for Jim). These tests work on (arm) linux, and I am wondering why that is the case. Could it be related to the fact that lldb-server preserves the stop reason for threads that are not running? I.e. if we have two threads hit a breakpoint (or whatever), and then step one of them, then the other thread will still report a stop_reason=breakpoint. Do you know if this is supposed to happen (e.g. does debugserver do that)?
>
> I'd have to go back and test it to be 100% sure, but from the behavior I see on Darwin systems: when we receive a watchpoint hit on two threads at the same time and instruction-step the first thread, then ask debugserver for the current stop-reason on thread two, it says there is no reason. The watchpoint exception on the second thread is lost. I could imagine lldb-server behaving differently. (I think when we fetch the mach exceptions for the threads, they've now been delivered, and when we ask the kernel for any pending mach exceptions for the threads a second time -- even if those threads haven't been allowed to run -- I think the state is lost, and debugserver didn't save that initial state it received.)
>
> Yes, that sounds plausible. In lldb-server (on linux) it happens differently because we store each stop reason inside the thread object, and since we can control each thread independently, there's no reason to touch it if we're not running the thread. Although it also wouldn't be hard to clear it on every resume.
>
> So which one of these behaviors would you say is the correct one?
>
> Yeah, as I was writing my explanation for why we see this problem on Darwin, I realized the obvious next question was, "why isn't this a debugserver fix" - I think that would be another valid way to approach this, although we still need to interop with older debugservers on iOS etc devices from the past five years. (To be fair, it's relatively uncommon that this issue is hit in real-world programs; our API tests stress this area specifically to expose bugs.) I do wonder about the failure mgorny reported in https://github.com/llvm/llvm-project/issues/48777, which sounds like exactly the same problem as we have on Darwin, but on a BSD or linux system running under AArch64 qemu? (He doesn't say which OS he was running under qemu.)
>
> That being said, compensating for this debugserver deficiency in lldb doesn't seem like a bad thing to me, given that the patch exists.
>
> I'm pretty sure "saving suspended threads' stop reasons across a resume" is not a requirement of the gdb-remote protocol, so I don't think we should rely on that behavior.

What makes you so sure of that? As you've said before, the idea of having multiple stop reasons per stop is our own invention, so it's not like we have a precedent or a spec we compare ourselves to.

The way I see it, one can fairly easily find justifications for both positions. In a world where we don't save the stop reasons, the reasons are not so much a property of a thread as they are about a thread. They are a property of a stop, and once we move on from that "stop", the stop reasons go away. If you wanted to preserve the stop reasons, you'd argue that the stop reasons are an intrinsic property of a thread, and since the thread hasn't run, its stop reason shouldn't change.

I don't think either of them is wrong. Just, depending on how you look at things, one of them feels more natural. And if you're on a system where you have to actively suspend a thread to prevent it from running during a resume, you might be inclined towards a different interpretation than if you're able to control each thread individually (where "suspending a thread" means "doing nothing").

> Plus, this is the wrong way to use the execution control part of lldb. It's much harder to reason about the execution control machinery if we're in one round of "How should I react to stop X" and someone just moves the target to stop X+1 in the middle of that process... We are in a bit of a tough spot with expressions that run all threads, but (a) that really only happens when users explicitly ask for it and (b) we already make a distinction between "resumes for expression evaluation" and "natural resumes". We should certainly not make it worse. So even if it hadn't caused this bug, that wasn't the right way to implement this bit of business.

Yeah, I'm not arguing against this patch. The reason I brought this up is because it seems we have different behavior of our stubs in these situations. I think that is causing some (other) bugs, and I'd like to know where to fix them.

lldb/source/Target/StopInfo.cpp
730

Change type of m_stop_info_sp (and the relevant constructor argument) to shared_ptr<StopInfoWatchpoint> to avoid the cast here.

(btw, the right cast for upcasts is static_cast. reinterpret_cast can return wrong results in more complicated scenarios)

805

And use static_pointer_cast<StopInfoWatchpoint>(shared_from_this()) here. That way the casting remains confined within the origin class.
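A self-contained sketch of the suggested pattern, with simplified class names:

#include <memory>

struct StopInfo : std::enable_shared_from_this<StopInfo> {
  virtual ~StopInfo() = default;
};

struct StopInfoWatchpoint : StopInfo {
  std::shared_ptr<StopInfoWatchpoint> GetSelf() {
    // Shares the control block of the original shared_ptr, so the
    // reference count stays correct, and the cast never leaves the class.
    return std::static_pointer_cast<StopInfoWatchpoint>(shared_from_this());
  }
};

int main() {
  auto info = std::make_shared<StopInfoWatchpoint>();
  auto self = info->GetSelf(); // same object, same control block
  // info.use_count() == 2 here.
}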

897–901

Sorry, I missed that part of the patch description. Well, the main question on my mind was whether I am missing something. Now that that's cleared up, it's not so much of a problem (I am assuming this is related to/tested by the change in TestWatchpointConditionCmd.py), though I would generally recommend splitting that off into a separate patch in the future (and I don't think you'd need to create a review for a simple change like this).

994

delete

lldb/test/API/commands/watchpoints/watchpoint_commands/condition/TestWatchpointConditionCmd.py
82–83

I guess this comment was wrong before, but better update it now. (or delete it -- it's kind of self-evident)

This revision is now accepted and ready to land. Jul 18 2022, 1:34 AM
jingham updated this revision to Diff 445591. Jul 18 2022, 12:02 PM

Address review comments

jingham marked 3 inline comments as done. Jul 18 2022, 12:11 PM
jingham added inline comments.
lldb/source/Target/StopInfo.cpp
730

> Change type of m_stop_info_sp (and the relevant constructor argument) to shared_ptr<StopInfoWatchpoint> to avoid the cast here.

Cute, I didn't know the voodoo to change the type of the pointer inside a shared pointer while retaining its reference counts.

> (btw, the right cast for upcasts is static_cast. reinterpret_cast can return wrong results in more complicated scenarios)

Huh, interesting. The names seem backwards to me: static_cast sounds like it takes the pointer you gave it and casts it to the target type, whereas reinterpret_cast sounds like the one that figures out what the class hierarchy actually is and does the right thing, but it looks like it's the other way round.
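A small example of the difference; the "more complicated scenarios" are typically multiple inheritance, where static_cast knows the class layout and adjusts the pointer, while reinterpret_cast just relabels the bits:

#include <cstdio>

struct A { int a = 1; };
struct B { int b = 2; };
struct C : A, B {};

int main() {
  C c;
  B *good = static_cast<B *>(&c);      // adjusted past the A subobject
  B *bad = reinterpret_cast<B *>(&c);  // same address as &c: it aliases A
  // Prints "2 1" on typical ABIs; reading through 'bad' is undefined
  // behavior, which is exactly the point.
  std::printf("%d %d\n", good->b, bad->b);
}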

805

I didn't know that voodoo; it looks like we use it in other places, however, so I switched to doing it that way.

jingham marked an inline comment as done. Jul 18 2022, 12:27 PM

> Generally, this makes sense to me, but I do have one question (not necessarily for Jim). These tests work on (arm) linux, and I am wondering why that is the case. Could it be related to the fact that lldb-server preserves the stop reason for threads that are not running? I.e. if we have two threads hit a breakpoint (or whatever), and then step one of them, then the other thread will still report a stop_reason=breakpoint. Do you know if this is supposed to happen (e.g. does debugserver do that)?
>
> I'd have to go back and test it to be 100% sure, but from the behavior I see on Darwin systems: when we receive a watchpoint hit on two threads at the same time and instruction-step the first thread, then ask debugserver for the current stop-reason on thread two, it says there is no reason. The watchpoint exception on the second thread is lost. I could imagine lldb-server behaving differently. (I think when we fetch the mach exceptions for the threads, they've now been delivered, and when we ask the kernel for any pending mach exceptions for the threads a second time -- even if those threads haven't been allowed to run -- I think the state is lost, and debugserver didn't save that initial state it received.)
>
> Yes, that sounds plausible. In lldb-server (on linux) it happens differently because we store each stop reason inside the thread object, and since we can control each thread independently, there's no reason to touch it if we're not running the thread. Although it also wouldn't be hard to clear it on every resume.
>
> So which one of these behaviors would you say is the correct one?
>
> Darwin has always worked in such a way that, by the time it stops for an exception, it will have given the other threads enough chance to run that, if they were going to, they will also hit the exception before the first exception is delivered to lldb. In gdb we tried to hide that fact by only publishing one exception, but that was awkward, since when you tried to resume to do some piece of work, you'd immediately stop again without getting to run. That just led to confusing interactions (and was a little harder to deal with in the debugger as well). I think it's clearer to stop and see that 5 threads have hit your breakpoint, then decide what to do, than to have one breakpoint reported and then, when you "continue", immediately turn around and get another without anybody really seeming to make progress, rinse, repeat...
>
> Rereading this, I wasn't sure what question you were asking. I answered "what should you tell lldb if lldb-server sees 5 threads stopped for reasons at one stop point". But the other question is "what to do with stop infos if you only run one thread". If lldb knows it has stopped a thread from running when it resumes the target, it also preserves the StopInfo for that thread. That seems to me the most useful behavior.
>
> Given your latest comment, I think you've understood my question, but to avoid confusion, the situation I was referring to is when lldb-server sees 5 threads stopped for some reason, and then lldb steps one of them. The question was whether lldb-server should still report the original stop reason of those threads (if asked -- theoretically lldb should know that the reason hasn't changed and might decide not to ask).
>
> I agree that preserving the stop reason is the most useful behavior for lldb (the threads haven't run, so why should their state change?), but then one could apply the same logic to the lldb-/debugserver component as well.

I think the useful part for lldb is that if it knows it was the one responsible for a thread not continuing (it told the stub to stop it), then it should preserve the stop info. If the stub mirrors this, that's okay, but lldb shouldn't rely on it. OTOH, if lldb told the stub to continue all threads, but the stub for some reason of its own didn't continue one of the threads, and then it re-reported the stop info, that would be confusing to lldb. I can't think of any reason why the stub would do this, but that's the only case where having the stub also do this work is anything but redundant.

> Generally, this makes sense to me, but I do have one question (not necessarily for Jim). These tests work on (arm) linux, and I am wondering why that is the case. Could it be related to the fact that lldb-server preserves the stop reason for threads that are not running? I.e. if we have two threads hit a breakpoint (or whatever), and then step one of them, then the other thread will still report a stop_reason=breakpoint. Do you know if this is supposed to happen (e.g. does debugserver do that)?
>
> I'd have to go back and test it to be 100% sure, but from the behavior I see on Darwin systems: when we receive a watchpoint hit on two threads at the same time and instruction-step the first thread, then ask debugserver for the current stop-reason on thread two, it says there is no reason. The watchpoint exception on the second thread is lost. I could imagine lldb-server behaving differently. (I think when we fetch the mach exceptions for the threads, they've now been delivered, and when we ask the kernel for any pending mach exceptions for the threads a second time -- even if those threads haven't been allowed to run -- I think the state is lost, and debugserver didn't save that initial state it received.)
>
> Yes, that sounds plausible. In lldb-server (on linux) it happens differently because we store each stop reason inside the thread object, and since we can control each thread independently, there's no reason to touch it if we're not running the thread. Although it also wouldn't be hard to clear it on every resume.
>
> So which one of these behaviors would you say is the correct one?
>
> Yeah, as I was writing my explanation for why we see this problem on Darwin, I realized the obvious next question was, "why isn't this a debugserver fix" - I think that would be another valid way to approach this, although we still need to interop with older debugservers on iOS etc devices from the past five years. (To be fair, it's relatively uncommon that this issue is hit in real-world programs; our API tests stress this area specifically to expose bugs.) I do wonder about the failure mgorny reported in https://github.com/llvm/llvm-project/issues/48777, which sounds like exactly the same problem as we have on Darwin, but on a BSD or linux system running under AArch64 qemu? (He doesn't say which OS he was running under qemu.)
>
> That being said, compensating for this debugserver deficiency in lldb doesn't seem like a bad thing to me, given that the patch exists.
>
> I'm pretty sure "saving suspended threads' stop reasons across a resume" is not a requirement of the gdb-remote protocol, so I don't think we should rely on that behavior.
>
> What makes you so sure of that? As you've said before, the idea of having multiple stop reasons per stop is our own invention, so it's not like we have a precedent or a spec we compare ourselves to.

Interesting. We introduced this many years ago, and I assumed the gdb version must have some way of returning independent stop info for all the threads, but it looks like it still doesn't. OTOH, it might in the future, and more generally it would be nice to rely on the simplest behavior on the part of the stub that we can get away with while still preserving as much functionality as we can. So to me that argues for not having lldb rely on this behavior.

> The way I see it, one can fairly easily find justifications for both positions. In a world where we don't save the stop reasons, the reasons are not so much a property of a thread as they are about a thread. They are a property of a stop, and once we move on from that "stop", the stop reasons go away. If you wanted to preserve the stop reasons, you'd argue that the stop reasons are an intrinsic property of a thread, and since the thread hasn't run, its stop reason shouldn't change.
>
> I don't think either of them is wrong. Just, depending on how you look at things, one of them feels more natural. And if you're on a system where you have to actively suspend a thread to prevent it from running during a resume, you might be inclined towards a different interpretation than if you're able to control each thread individually (where "suspending a thread" means "doing nothing").

As I said above, the only case where having the stub do this on its own isn't purely redundant is when the stub stops a thread that lldb didn't think was stopped. In that case, I think this would be more confusing than helpful.

> Plus, this is the wrong way to use the execution control part of lldb. It's much harder to reason about the execution control machinery if we're in one round of "How should I react to stop X" and someone just moves the target to stop X+1 in the middle of that process... We are in a bit of a tough spot with expressions that run all threads, but (a) that really only happens when users explicitly ask for it and (b) we already make a distinction between "resumes for expression evaluation" and "natural resumes". We should certainly not make it worse. So even if it hadn't caused this bug, that wasn't the right way to implement this bit of business.
>
> Yeah, I'm not arguing against this patch. The reason I brought this up is because it seems we have different behavior of our stubs in these situations. I think that is causing some (other) bugs, and I'd like to know where to fix them.

Unless lldb is getting into trouble because it lacks some piece of information needed to figure out what to do, I think we should be fixing this sort of problem on the lldb end if we can. That's presumably the agent that knows everything.