This is an archive of the discontinued LLVM Phabricator instance.

[NFC] Have ThreadPlans hold onto the Process & TID, rather than the Thread
ClosedPublic

Authored by jingham on Mar 5 2020, 3:44 PM.

Details

Summary

This is the first part of a multi-part commit. This change makes the ThreadPlans use the process & TID to specify which thread they belong to, rather than holding onto the Thread * directly. It doesn't change any behavior; it just means ThreadPlans have to get their thread from the ThreadPlan::GetThread() API, and OTOH can access their process and TID directly.
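
As a rough sketch of the shape of the change (simplified, using lldb types; the exact lookup and the assert are illustrative assumptions rather than the committed code), a plan now stores the Process and TID and resolves its Thread on demand:

class ThreadPlanSketch {
public:
  ThreadPlanSketch(Process &process, lldb::tid_t tid)
      : m_process(process), m_tid(tid) {}

  // Resolve the Thread from the process's current thread list each time it
  // is needed, instead of caching a Thread * across stops.
  Thread &GetThread() {
    lldb::ThreadSP thread_sp = m_process.GetThreadList().FindThreadByID(m_tid);
    assert(thread_sp && "ThreadPlan asked for its thread while detached");
    return *thread_sp;
  }

  Process &GetProcess() { return m_process; }
  lldb::tid_t GetTID() const { return m_tid; }

private:
  Process &m_process;
  lldb::tid_t m_tid;
};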

The point of this multi-part commit is to make it possible to preserve the stepping intentions for a thread in the target, even if the target doesn't report all threads on every stop. This is particularly a problem for targets that provide their threads using an OSPlugin thread provider. The Darwin kernel uses this extensively, and they found that if they reported all the current threads on every stop, it made stepping unusably slow. So they only report the "on core" threads when they stop.

That made stepping perform acceptably, but it caused stepping to behave badly. For instance, suppose you were next-ing Thread A, and the code had stepped into a function and so needed to step out. So lldb would set a breakpoint at the return address of the function, then continue. If you were unlucky, Thread B might hit a breakpoint before Thread A completed its step out, and when Thread B stopped, Thread A happened not to be on another core. Then lldb would think Thread A had exited, and would remove the step-out breakpoint. That meant when you continued again, your step-over had turned into a continue.

To fix this problem, I am going to move the ThreadPlan stacks into a separate class, and then have the Process hold onto the map of TID -> ThreadPlanStacks. Then if a thread goes away, we can leave its entry in the map, and if the thread comes back to life, we will know what we were doing with it.
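
As a rough sketch of where the series is headed (the member and method names here are assumptions for illustration, not the committed interface), the ownership would look something like this:

// The Process owns the plan stacks, keyed by TID, so a stack can outlive the
// lldb_private::Thread object that happened to represent that TID at a stop.
class ProcessPlanMapSketch {
public:
  ThreadPlanStack *FindThreadPlans(lldb::tid_t tid) {
    auto pos = m_thread_plans.find(tid);
    return pos == m_thread_plans.end() ? nullptr : &pos->second;
  }

private:
  std::map<lldb::tid_t, ThreadPlanStack> m_thread_plans;
};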

Couple of other random observations:

  • I tried preserving the whole Thread object for a while, but the logic for merging the old & new thread lists is pretty tricky, and intervening at that point made that already tricky code even harder to understand. Plus this was preserving a lot of unneeded information. Just keeping the ThreadPlanStack off to one side was much cleaner.
  • Leaving actually obsolete thread plan stacks around does have a cost. For instance, if the thread in the example above had really gone away, then we would leave a step-out breakpoint stranded. It wouldn't do any harm, since it is thread-specific for a non-existent thread. But it would slow down execution.

In the patches in this sequence, I will only preserve vanished threads' ThreadPlanStacks if there is an OperatingSystem plugin. So it won't affect the current ProcessGDBRemote clients, since ProcessGDBRemote always gets all "real" threads on every stop at present.

In the future, we could change ProcessGDBRemote to NOT gather all threads on stop, but just read the threads that stopped "for a reason". We would have to encode that in the stop packet somehow.

That could make stepping more efficient. If lldb was going to continue immediately (it was in the middle of stepping or whatever) it would be great to only have to reconstruct one or two threads. In the case of ProcessGDBRemote, we could even refine this by asking about the existence of all the threads that had non-trivial thread plans in flight, along with the stopped-for-a-reason threads. That would allow us to reap any of the ThreadPlanStacks that had any effect on the process (setting breakpoints, etc...) without having to enumerate all threads.

Future:

The sequence of patches should be:

  1. This patch deals with how ThreadPlans know what thread they belong to.
  2. Do the same change to ThreadPlanTracers.
  3. Introduce the ThreadPlanStacks, and move the plans there, but keep them in the Thread.
  4. Move the ThreadPlanStacks into the Process, and manage detaching & reattaching them from Threads as they come and go.

Diff Detail

Event Timeline

jingham created this revision. Mar 5 2020, 3:44 PM
jingham added a comment. Edited Mar 5 2020, 4:14 PM

Another point to make clear here. You might wonder whether ThreadPlan::GetThread should return an Optional<Thread> or something like that, since there might not be a Thread associated with the ThreadPlan.

I tried that out but it made the code really ugly. When I thought about it a little more, I realized that besides the destructor and the Description methods, all the other methods are only callable when the ThreadPlanStack is attached to a thread, since they are all only called as part of the "ShouldStop" and "WillResume" negotiation, which always works on Thread objects in the ThreadList. So nobody has any business calling any of these other methods when detached from a Thread. And conversely, it's pointless to check whether you have a thread in these methods, you always will...

It doesn't cause practical problems that you can't assume anything about the state of your Thread in these two methods. When you are being destroyed you can't assume anything about the state of the underlying thread, since its target side may very well have gone away, so the destructors don't actually use the thread. And actually all the Description methods really wanted the TID, or were using the thread to get to the process. So I didn't have to change either's behavior.

In the patch that detaches the ThreadPlanStacks from the Thread, I'll assert if you call GetThread and there isn't one. But that won't 100% prevent the problem, since in most scenarios there will always be a thread in these situations.

I was planning on adding the assert and then documenting this restriction, and leaving it at that. That's in part because I don't know how to say ThreadPlan::GetThread can be called from anywhere in ThreadPlan BUT ThreadPlan::GetDescription and its descendants, and ThreadPlan::~ThreadPlan or its descendants. But if anyone knows a clever way to do this, I'd happily use that.

jingham updated this revision to Diff 248638. Mar 5 2020, 6:03 PM

Attempt to appease clang-format.

labath added a subscriber: labath. Mar 6 2020, 12:39 AM

I am somewhat worried about the direction this is taking. Maybe vanishing threads are fine for os-level debugging (because there the "real" threads are the cpu cores, and the os threads are somewhat imaginary), but I definitely wouldn't want to do it for regular debugging. If my process has 1000 threads (not completely uncommon) then I don't think my debugger would be doing me a service showing me just the three it thinks are interesting and completely disregarding the others. Having a mode (it could even be default) which only highlights the interesting ones -- yes, that would definitely be useful. Making sure it doesn't touch the other threads when it doesn't need to -- absolutely. But I don't think it should pretend the other threads don't exist at all, because I may still have a reason to inspect those threads (or just to know that they're there). In fact, I wouldn't be surprised if this happened to the said kernel folks too, and they end up adding a custom command, which would enumerate all "threads".

What was actually the slow part here? Was it the os plugin reading the thread lists from the kernel memory? Or the fact that lldb is slow when it is confronted with them?

If it is the latter, then maybe we should fix that. For example, I think I remember lldb likes to check the PC of every thread before it does any kind of a resume -- I am not convinced that is really needed.

If fetching the threads is slow, then perhaps we could have an api, where the os threads are produced incrementally, and we ensure that lldb does not query for os threads needlessly during the hot paths (e.g. internal stops for servicing thread plans).

I am somewhat worried about the direction this is taking. Maybe vanishing threads are fine for os-level debugging (because there the "real" threads are the cpu cores, and the os threads are somewhat imaginary), but I definitely wouldn't want to do it for regular debugging. If my process has 1000 threads (not completely uncommon) then I don't think my debugger would be doing me a service showing me just the three it thinks are interesting and completely disregarding the others. Having a mode (it could even be default) which only highlights the interesting ones -- yes, that would definitely be useful. Making sure it doesn't touch the other threads when it doesn't need to -- absolutely. But I don't think it should pretend the other threads don't exist at all, because I may still have a reason to inspect those threads (or just to know that they're there). In fact, I wouldn't be surprised if this happened to the said kernel folks too, and they end up adding a custom command, which would enumerate all "threads".

First off, I also think that hiding some threads from users, or forcing people to say "thread list --force" or something like that to see them all would not be a user-friendly design. I'm not suggesting imposing that on general debugging. But if an OS plugin, for instance, decides it is too expensive to do this except as an explicit gesture, we need to support that decision.

But lldb stops many times during the course of a step where it doesn't return control to the user - all the stepping into code with no debug info, stepping out again, stepping across dynamic linker stubs, or stepping through ObjC message dispatch dances that we do involve many private stops per public stop... If a thread has not stopped "for a reason" at one of those stops, it takes no part in the "should stop" and "will resume" negotiations, and reconstructing it for that stop really does serve no purpose. There's no way that the user will get a chance to see it.

So if we could avoid reading them in, that might make stepping go faster in the presence of lots of threads. BTW, with the accelerated stop packets where we get all the threads and their stop reasons on stop, I'm not sure this would really save all that much time. But talking to less capable stubs, it really can make a difference. Greg did a bunch of work to make sure stepping stopped as few times as possible because the Facebook gdb-remote stub didn't support these accelerated packets and getting all the thread info on each private stop was making stepping go really slowly. I didn't understand the motivation at first because for debugserver the changes didn't really make any difference.

Also, I don't think these two desires: not fetching threads when we don't need them (e.g. private stops) and showing them all to the user whenever they ask, are contradictory. After all we still have the gate of UpdateThreadListIfNeeded. All the commands that touch the thread list go through this. So even if we don't gather all the threads when we stop, we already have a mechanism in place whereby any command that actually touches the thread list can fetch them on demand.

What was actually the slow part here? Was it the os plugin reading the thread lists from the kernel memory? Or the fact that lldb is slow when it is confronted with them?

If it is the latter, then maybe we should fix that. For example, I think I remember lldb likes to check the PC of every thread before it does any kind of a resume -- I am not convinced that is really needed.

If fetching the threads is slow, then perhaps we could have an api, where the os threads are produced incrementally, and we ensure that lldb does not query for os threads needlessly during the hot paths (e.g. internal stops for servicing thread plans).

The thing that is slow is gathering all the threads. On Darwin, there's a kernel activation per user space thread and then a whole bunch of kernel threads. On a busy system, that ends up being a whole lot of threads. Producing them through the OS plugin interface involves a bunch of memory reads to fetch the thread data, and following pointers around to get the register state for the threads, etc.

Note also, the way the OS Plugins currently work, when the process stops the OS Plugin just gets asked to produce the threads backing the "on core" threads. That's all UpdateThreadListIfNeeded will fetch, and there's no lldb way to get the OS Plugin to fetch all the threads it can produce. The only way to do that with the Darwin kernel is to run other commands in the set of Python commands provided by the kernel team that either fetches a non-current thread by ID or forces fetching all the threads.

So the current state when working with OS Plugins violates your principle given above, but I think it fits the kernel folks' workflow, since the actual number of threads in the kernel is overwhelming and not terribly useful. But you are right, if we wanted to delay getting threads on private stops, but make UpdateThreadListIfNeeded force a fetch, then we would have to add some API to allow us to fetch only the "on core" threads along the code path that handles internal stops.

In any case, in order to support having internal stops not fetch all threads - which we are currently doing with OS Plugins - we need to be able to persist the stateful data lldb holds for that thread across stops where we don't realize the Thread object. That is the sole ambition of this set of patches.

As long as keeping thread plans around for threads that aren't there is only enabled when an OS plug-in is around, I am fine with this direction. One question: do we currently only allow stepping on real threads or OS threads that are backed by real threads right now? And if an OS thread is backed by a real thread, do we associate the step plan with the OS thread ID or the real thread ID (the core)? If we use the OS thread ID, another way to potentially fix this is to know in a thread plan that we are stepping an OS thread, and avoid fetching the complete OS list of threads for non-public stops, but only fetch the thread list for any OS threads that have active thread plans and any OS threads for threads that are active on a core. Then we don't end up fetching all OS threads all the time for stepping, but it does allow us to maintain and track our OS thread correctly. This would require changes to the OS plug-ins where getting the thread list needs to be able to grab only the OS threads that are on core, and we can manually fetch the OS thread info for any thread plans, and the plug-in would still need to be able to fetch the complete list on public stop events.

Stepping on Android could really benefit from some of the plans you mention where we might not end up fetching all of the thread IDs on a stop from the GDB remote. One idea is, when we do a continue where we specify only one thread to run, or only a few threads get to run, then the stop reply would only need to mention these threads and any new threads that showed up during the last resume. We can safely assume any other threads were suspended and have the exact same state. This will take some careful tracking in the Process class, but it could be done. For some reason Android often has 150 threads running, which causes stepping to take a while in many cases. A lot of these issues were fixed with my patches submitted a while back where, when we are stepping over stuff, we don't always stop at all branches, but only stop at branches that are not function calls; but stepping can still be quite slow.

As long as the keeping thread plans around for threads that aren't there is only enabled when OS plug-in is around, I am fine with this direction. One questions: do we currently only allow stepping on real threads or OS threads that are backed by real threads right now? And if an OS thread is backed by a real thread, do we associate the step plan with the OS thread ID or the real thread ID (the core)? If we use the OS thread ID, another way to potentially fix this is to know in a thread plan that we are stepping an OS thread, and avoid fetching the complete OS list of threads for non public stops, but only fetch the thread list for any OS threads that have active thread plans and any OS threads for threads that are active on a core. Then we don't end up fetching all OS threads all the time for stepping, but it does allow us to maintain and track our OS thread correctly. This would require changes to the OS plug-ins where getting the thread list needs to be able to grab only the OS threads that are on core, and we can manually fetch the OS thread info for any thread plans, and the plug-in would still need to be able to fetch the complete list on public stop events.

The way I have it in another patch in the series is: when UpdateThreadList decides a thread is not present, I ask whether there is an OS Plugin. If there is, I preserve the ThreadPlanStack while deleting the Thread; if there isn't, I ditch the ThreadPlanStack along with the Thread. We could do other things if we want to get cleverer, but that's what I do at present.
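
A condensed sketch of that policy (the helper and the SetThread call are hypothetical names describing the later patch, not code in this one):

// Called when UpdateThreadList decides a TID is no longer in the new list.
void HandleVanishedThread(lldb::tid_t tid) {
  auto pos = m_thread_plans.find(tid);
  if (pos == m_thread_plans.end())
    return;
  if (GetOperatingSystem() != nullptr)
    pos->second.SetThread(nullptr); // OS plugin: thread may just be off core, keep its plans
  else
    m_thread_plans.erase(pos);      // no OS plugin: the thread is really gone, ditch the stack
}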

Core threads are really the fiction in the case of the xnu OS Plugin. They are just the activations that happen to be on a processor when something stops. You can't do anything persistent with them because an activation could move from one core to another, be off core at a stop, etc. lldb_private::Threads are tricky, they can be an OS Plugin Thread, a Core Thread with a backing OS Plugin thread, a Core thread with no OS Plugin thread... And the same TID can change modes across stops.

The thing that isn't a fiction is the TID for the thread. That's why it makes more sense to me that the persistent part of the thread data in lldb be linked to the TID and not the Thread object we happen to have made for it this time. Pulling the persistent part out and having it just depend on the TID, not on the details of how we produced the lldb_private::Thread representing that TID, makes all the updating logic easier to manage.

Stepping on Android could really benefit from some of the plans you mention where we might not end up fetching all of the thread IDs on a stop from the GDB remote. One idea is, when we do a continue where we specify only one thread to run, or only a few threads get to run, then the stop reply would only need to mention these threads and any new threads that showed up during the last resume. We can safely assume any other threads were suspended and have the exact same state. This will take some careful tracking in the Process class, but it could be done. For some reason Android has 150 threads running a lot of times which causes stepping to take a while on many cases. A lot of these issues were fixed with my patches submitted a while back where when we are stepping over stuff, we don't always stop at all branches, but only stop at branches that are not function calls, but stepping can still be quite slow.

Right, I think this is an enabling step for that sort of optimization.

jingham updated this revision to Diff 248868. Mar 6 2020, 5:41 PM
labath added a comment. Mar 9 2020, 5:18 AM

Thanks for the detailed response. I'll try to be equally understandable.

I am somewhat worried about the direction this is taking. Maybe vanishing threads are fine for os-level debugging (because there the "real" threads are the cpu cores, and the os threads are somewhat imaginary), but I definitely wouldn't want to do it for regular debugging. If my process has 1000 threads (not completely uncommon) then I don't think my debugger would be doing me a service showing me just the three it thinks are interesting and completely disregarding the others. Having a mode (it could even be default) which only highlights the interesting ones -- yes, that would definitely be useful. Making sure it doesn't touch the other threads when it doesn't need to -- absolutely. But I don't think it should pretend the other threads don't exist at all, because I may still have a reason to inspect those threads (or just to know that they're there). In fact, I wouldn't be surprised if this happened to the said kernel folks too, and they end up adding a custom command, which would enumerate all "threads".

First off, I also think that hiding some threads from users, or forcing people to say "thread list --force" or something like that to see them all would not be a user-friendly design. I'm not suggesting imposing that on general debugging. But if an OS plugin, for instance, decides it is too expensive to do this except as an explicit gesture, we need to support that decision.

But lldb stops many times during the course of a step where it doesn't return control to the user - all the step into code with no debug info, step out again, step across dynamic linker stubs or step through ObjC message dispatch dances we do involve many private stops per public spot... If a thread has not stopped "for a reason" at one of those stops, it takes no part in the "should stop" and "will resume" negotiations and reconstructing it for that stop really does serve no purpose. There's no way that the user will get a chance to see it.

So if we could avoid reading them in, that might make stepping go faster in the presence of lots of threads. BTW, with the accelerated stop packets where we get all the threads and their stop reasons on stop, I'm not sure this would really save all that much time. But talking to less capable stubs, it really can make a difference. Greg did a bunch of work to make sure stepping stopped as few times as possible because the Facebook gdb-remote stub didn't support these accelerated packets and getting all the thread info on each private stop was making stepping go really slowly. I didn't understand the motivation at first because for debugserver the changes didn't really make any difference.

Also, I don't think these two desires: not fetching threads when we don't need them (e.g. private stops) and showing them all to the user whenever they ask, are contradictory. After all we still have the gate of UpdateThreadListIfNeeded. All the commands that touch the thread list go through this. So even if we don't gather all the threads when we stop, we already have a mechanism in place whereby any command that actually touches the thread list can fetch them on demand.

Yes, I agree with everything above. If lldb does not need information about all threads in order to service an internal stop, then it shouldn't read it in, and the os plugin (or the remote stub) should not need to provide it. In fact I would even go so far as to say that we shouldn't need to read in this information for a public stop until a user performs an operation (e.g. thread list) that requires information about all threads.

What was actually the slow part here? Was it the os plugin reading the thread lists from the kernel memory? Or the fact that lldb is slow when it is confronted with them?

If it is the latter, then maybe we should fix that. For example, I think I remember lldb likes to check the PC of every thread before it does any kind of a resume -- I am not convinced that is really needed.

If fetching the threads is slow, then perhaps we could have an api, where the os threads are produced incrementally, and we ensure that lldb does not query for os threads needlessly during the hot paths (e.g. internal stops for servicing thread plans).

The thing that is slow is gathering all the threads. On Darwin, there's a kernel activation per user space thread and then a whole bunch of kernel threads. On a busy system, that ends up being a whole lot of threads. Producing them through the OS plugin interface involves a bunch of memory reads to fetch the thread data, and following pointers around to get the register state for the threads, etc.

Note also, the way the OS Plugins currently work, when the process stops the OS Plugin just gets asked to produce the threads backing the "on core" threads. That's all UpdateThreadListIfNeeded will fetch, and there's no lldb way to get the OS Plugin to fetch all the threads it can produce. The only way to do that with the Darwin kernel is to run other commands in the set of Python commands provided by the kernel team that either fetches a non-current thread by ID or forces fetching all the threads.

So the current state when working with OS Plugins violates your principle given above,

IIUC, this principle is only violated for this particular os plugin, which violates the plugin contract and does not return all threads through the get_thread_info interface. I don't know whether this contract is spelled out anywhere, but it seems like at least the code in lldb behaves that way.

So this is technically not "our" fault. The plugin is doing something unsupported and gets weird behavior as a result. Now, I am not saying we should throw this plugin (maybe the only actually used os plugin) overboard. However, I was hoping that the process of supporting it would involve some improvements to the os plugin interface (so that it can do what it wants in a supported way), rather than declaring support for whatever it is the plugin is doing, even though we know it to be problematic (like, not being able to tell when we can throw thread plans away).

That said, I don't really use or care about the os plugins, so if using it results in leaking thread plans, then whatever. The thing I want to understand is what kind of requirements does this place on the rest of lldb. Which leads me to my next comment..

but I think it fits the kernel folks workflow, since the actual number of threads in the kernel is overwhelming and not terribly useful. But you are right, if we wanted to delay getting threads on private stops, but make UpdateThreadListIfNeeded force a fetch, then we would have to add some API to allow us to fetch only the "on core" threads along the code path that handles internal stops.

In any case, in order to support having internal stops not fetch all threads - which we are currently doing with OS Plugins - we need to be able to persist the stateful data lldb holds for that thread across stops where we don't realize the Thread object. That is the sole ambition of this set of patches.

We've agreed that we would like to avoid gathering the thread data for internal stops that don't need it. Avoiding creating (updating) lldb_private::Thread objects, and having the not-(yet)-realized threads live as a tid_t, is one way to achieve this goal, but I am not sure if it is the right one. I am not really against the idea -- since threads can come and go between stops pretty much arbitrarily (even without os plugins), it may make sense to have everything hold onto them as simple handles and force a fetch through some Process api when one needs to access them; that may simplify some things (for one, the lifetime of Threads could become much stricter).

I can certainly believe that changing thread plans to use a tid_t instead of a Thread& is simpler than making Thread object management lazy, but I am not sure that is a fair comparison. Thread plans are only a single component which juggles Thread objects. There are many other pieces of code, which assume that the lifetime of a Thread object matches that of the entity it describes. SBThread is one such thing. If SBThread keeps referring to the actual thread as a pointer, then it will lose track of it when the actual thread goes away and reappears. Same goes for ValueObjects and SBValues -- if lldb_private::Thread is meant to be transient, then those should also be updated to use a tid_t(*), but then the changes required to do that become much larger, and it's not clear to me whether that is really worth it, since this is just kind of reinventing weak_ptr<Thread> (but with the downside mentioned in (*)).

(*) I don't believe tid_t is actually a good way to keep track of the identity of a thread across time. It does uniquely identify a thread at any given point in time, but thread ids can be recycled (sometimes very quickly -- running the lldb test suite just once causes linux to wrap its pid space), which means you can't guarantee it will always refer to the same physical thread. If we're going to be using numbers to identify threads I think it should be some other kind of an identifier -- but that is sort of what a Thread* already is.

jingham added a comment. Edited Mar 9 2020, 10:30 AM

Thanks for the detailed response. I'll try to be equally understandable.

I am somewhat worried about the direction this is taking. Maybe vanishing threads are fine for os-level debugging (because there the "real" threads are the cpu cores, and the os threads are somewhat imaginary), but I definitely wouldn't want to do it for regular debugging. If my process has 1000 threads (not completely uncommon) then I don't think my debugger would be doing me a service showing me just the three it thinks are interesting and completely disregarding the others. Having a mode (it could even be default) which only highlights the interesting ones -- yes, that would definitely be useful. Making sure it doesn't touch the other threads when it doesn't need to -- absolutely. But I don't think it should pretend the other threads don't exist at all, because I may still have a reason to inspect those threads (or just to know that they're there). In fact, I wouldn't be surprised if this happened to the said kernel folks too, and they end up adding a custom command, which would enumerate all "threads".

First off, I also think that hiding some threads from users, or forcing people to say "thread list --force" or something like that to see them all would not be a user-friendly design. I'm not suggesting imposing that on general debugging. But if an OS plugin, for instance, decides it is too expensive to do this except as an explicit gesture, we need to support that decision.

But lldb stops many times during the course of a step where it doesn't return control to the user - all the step into code with no debug info, step out again, step across dynamic linker stubs or step through ObjC message dispatch dances we do involve many private stops per public spot... If a thread has not stopped "for a reason" at one of those stops, it takes no part in the "should stop" and "will resume" negotiations and reconstructing it for that stop really does serve no purpose. There's no way that the user will get a chance to see it.

So if we could avoid reading them in, that might make stepping go faster in the presence of lots of threads. BTW, with the accelerated stop packets where we get all the threads and their stop reasons on stop, I'm not sure this would really save all that much time. But talking to less capable stubs, it really can make a difference. Greg did a bunch of work to make sure stepping stopped as few times as possible because the Facebook gdb-remote stub didn't support these accelerated packets and getting all the thread info on each private stop was making stepping go really slowly. I didn't understand the motivation at first because for debugserver the changes didn't really make any difference.

Also, I don't think these two desires: not fetching threads when we don't need them (e.g. private stops) and showing them all to the user whenever they ask, are contradictory. After all we still have the gate of UpdateThreadListIfNeeded. All the commands that touch the thread list go through this. So even if we don't gather all the threads when we stop, we already have a mechanism in place whereby any command that actually touches the thread list can fetch them on demand.

Yes, I agree with everything above. If lldb does not need information about all threads in order to service a internal stop, then it shouldn't read it in, and the os plugin (or the remote stub) should not need to provide it. In fact I would even go so far as to say that we shouldn't need to read in this information for a public stop until a user performs an operation (e.g. thread list) that requires information about all threads.

What was actually the slow part here? Was it the os plugin reading the thread lists from the kernel memory? Or the fact that lldb is slow when it is confronted with them?

If it is the latter, then maybe we should fix that. For example, I think I remember lldb likes to check the PC of every thread before it does any kind of a resume -- I am not convinced that is really needed.

If fetching the threads is slow, then perhaps we could have an api, where the os threads are produced incrementally, and we ensure that lldb does not query for os threads needlessly during the hot paths (e.g. internal stops for servicing thread plans).

The thing that is slow is gathering all the threads. On Darwin, there's a kernel activation per user space thread and then a whole bunch of kernel threads. On a busy system, that ends up being a whole lot of threads. Producing them through the OS plugin interface involves a bunch of memory reads to fetch the thread data, and following pointers around to get the register state for the threads, etc.

Note also, the way the OS Plugins currently work, when the process stops the OS Plugin just gets asked to produce the threads backing the "on core" threads. That's all UpdateThreadListIfNeeded will fetch, and there's no lldb way to get the OS Plugin to fetch all the threads it can produce. The only way to do that with the Darwin kernel is to run other commands in the set of Python commands provided by the kernel team that either fetches a non-current thread by ID or forces fetching all the threads.

So the current state when working with OS Plugins violates your principle given above,

IIUC, this principle is only violated for this particular os plugin, which violates the plugin contract and does not return all threads through the get_thread_info interface. I don't know whether this contract is spelled out anywhere, but it seems like at least the code in lldb behaves that way.

So this is technically not "our" fault. The plugin is doing something unsupported and gets weird behavior as a result. Now, I am not saying we should throw this plugin (maybe the only actually used os plugin) overboard. However, I was hoping that the process of supporting it would involve some improvements to the os plugin interface (so that it can do what it wants in a supported way), rather than declaring support for whatever it is the plugin is doing, even though we know it to be problematic (like, not being able to tell when we can throw thread plans away).

I think when presenting an OS's threads through the funnel of a monitor stub that only knows about cores and not about the OS's concept of threads, you will very likely run into the problem where reconstructing all the threads at every stop is expensive and not particularly desirable. Two of the three instances of OS plugins that I know about do it this way. The other is for wrapping a cooperative multi-threading library, where fetching all the threads was pretty easy and there weren't too many. But if getting all the threads really is slow, we need to allow the OS plugin to do partial reporting for it to be useful.

Anyway, I don't know that we need anything more added to the OS interface. There's a "make a thread for a TID" interface, which we can use to reap the persistent part of the thread, for instance when we make a public stop. If that's fast enough (I haven't gotten to that point with the xnu plugin) we can use that. We can add an API to query whether the plugin returns all threads, and gate our behavior on that rather than on "Has OS Plugin". But that's about all I can think of. I also have to support all the OS plugins in all the xnu dSYMs that are currently in the wild, so whatever we do has to work without requiring added behaviors.

That said, I don't really use or care about the os plugins, so if using it results in leaking thread plans, then whatever. The thing I want to understand is what kind of requirements does this place on the rest of lldb. Which leads me to my next comment..

but I think it fits the kernel folks workflow, since the actual number of threads in the kernel is overwhelming and not terribly useful. But you are right, if we wanted to delay getting threads on private stops, but make UpdateThreadListIfNeeded force a fetch, then we would have to add some API to allow us to fetch only the "on core" threads along the code path that handles internal stops.

In any case, in order to support having internal stops not fetch all threads - which we are currently doing with OS Plugins - we need to be able to persist the stateful data lldb holds for that thread across stops where we don't realize the Thread object. That is the sole ambition of this set of patches.

We've agreed that we would like to avoid gathering the thread data for internals stops that don't need it. Avoiding creating (updating) lldb_private::Thread objects, and having the not-(yet)-realized threads live as a tid_t is one way to achieve this goal, but I am not sure if it is the right one. I am not really against the idea -- since threads can come and go between stops pretty much arbitrarily (even without os plugins), it may make sense to have everything hold onto them as simple handles, and force a fetch through some Process api when one needs to access them may simplify some things (for one, the lifetime of Threads could become much stricter).

I can certainly believe that changing thread plans to use a tid_t instead of a Thread& is simpler than making Thread object management lazy, but I am not sure that is a fair comparison. Thread plans are only a single component which juggles Thread objects. There are many other pieces of code, which assume that the lifetime of a Thread object matches that of the entity it describes. SBThread is one such thing. If SBThread keeps referring to the actual thread as a pointer, then it will lose track of it when the actual thread goes away and reappears. Same goes for ValueObjects and SBValues -- if lldb_private::Thread is meant to be transient, then those should also be updated to use a tid_t(*), but then the changes required to do that become much larger, and it's not clear to me whether that is really worth it, since this is just kind of reinventing weak_ptr<Thread> (but with the downside mentioned in (*)).

Everything that holds onto its state using an ExecutionContextRef already holds onto the m_tid as the real entity, and the Thread is only a cache (see ExecutionContextRef::GetThreadSP). That probably hasn't been pushed hard, and I've only written a few tests with actually vanishing threads so far. But if it doesn't work properly then that's a bug to fix not a design to change.

ValueObjects (and therefore SBValues) use an ExecutionContextRef in their UpdatePoint, they don't hold onto a thread. SBThread's also hold ExecutionContextRef's not ThreadWP's to get back to their actual thread. In general, anything that wants to know "where am I in the state space of the process" is supposed to use an ExecutionContextRef to store its state not individual target/process/thread/etc. So I'm not sure this is so far off, but we would have to write some more tests with threads coming and going to make sure this is used consistently.
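
A simplified model of the re-fetch behavior being described (paraphrased from the description above, not the verbatim ExecutionContextRef source):

struct ExecutionContextRefModel {
  lldb::ProcessWP m_process_wp;
  lldb::ThreadWP m_thread_wp; // only a cache
  lldb::tid_t m_tid;          // the real identity

  lldb::ThreadSP GetThreadSP() const {
    lldb::ThreadSP thread_sp = m_thread_wp.lock();
    if (!thread_sp || !thread_sp->IsValid()) {
      // Cache miss or stale cache: re-resolve the thread by TID.
      if (lldb::ProcessSP process_sp = m_process_wp.lock())
        thread_sp = process_sp->GetThreadList().FindThreadByID(m_tid);
    }
    return thread_sp;
  }
};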

(*) I don't believe tid_t is actually a good way to keep track of the identity of a thread across time. It does uniquely identify a thread at any given point in time, but thread ids can be recycled (sometimes very quickly -- running the lldb test suite just once causes linux to wrap its pid space), which means you can't guarantee it will always refer to the same physical thread. If we're going to be using numbers to identify threads I think it should be some other kind of an identifier -- but that is sort of what a Thread* already is.

Not sure I understand this. If I run the process and a real thread exits and another one is created with the same TID, all before stopping, when I go to merge the thread lists on stop I'm going to think this different thread was the same thread because it had the same TID. I don't see how using Thread pointers rather than TID's helps with this. I still have to trust the TID I got from the monitor. We are always going to have a problem merging up Threads when TID's can be reused. I don't see how our Thread *'s - which don't represent anything in the target, and thus don't help with reasoning about persistence, help this at all.

For instance, on MacOS user space the TID we use is the "globally unique thread ID" so it never recurs. The xnu OS plugin has a unique identifier as well. Similarly, if this becomes a problem on Linux, IIRC you observe thread creation and deletion so you can do pruning at that point. But this reasoning is all about TID's not about Thread *'s. So I still think that TID's are a better identifier for our representation of threads than the Thread * that we happen to wrap them in. Note also, when OS Plugin's are around, the mapping from Thread * to TID is not quite so straightforward, and one of the things I like about moving the persistent parts of the Thread (which for now are only the ThreadPlanStack and the TID) out of the whole thread production logic is that it gets us out of knowing anything about this representation, and in the end we can ask the question we really care about: "Is the TID still here".

labath added a comment. Mar 9 2020, 1:07 PM

Everything that holds onto its state using an ExecutionContextRef already holds onto the m_tid as the real entity, and the Thread is only a cache (see ExecutionContextRef::GetThreadSP).

I'm sorry I missed that part. (I saw the ThreadWP, but didn't look further into how it is used).

I'm going to respond in more detail tomorrow, but for now I just wanted to let you know that my main concern about this has been addressed.

So this is technically not "our" fault. The plugin is doing something unsupported and gets weird behavior as a result. Now, I am not saying we should throw this plugin (maybe the only actually used os plugin) overboard. However, I was hoping that the process of supporting it would involve some improvements to the os plugin interface (so that it can do what it wants in a supported way), rather than declaring support for whatever it is the plugin is doing, even though we know it to be problematic (like, not being able to tell when we can throw thread plans away).

I think when presenting an OS's threads through the funnel of a monitor stub that only knows about cores and not about the OS's concept of threads, you will very likely run into the problem where reconstructing all the threads at every stop is expensive and not particularly desirable. Two of the three instances of OS plugins that I know about do it this way. The other is for wrapping a cooperative multi-threading library, where fetching all the threads was pretty easy and there weren't too many. But for instance if getting all the threads really is slow, for the OS plugin to be useful we need to allow it to do partial reporting to be useful.

Anyway, I don't know that we need anything more added to the OS interface. There's a "make a thread for a TID" interface, which we can use to reap the persistent part of the thread, for instance when we make a public stop. If that's fast enough (I haven't gotten to that point with the xnu plugin) we can us that. We can add an API to query whether the plugin returns all threads, and gate our behavior on that rather than on "Has OS Plugin". But that's about all I can think of.

Yes, that's sort of what I was expecting. Another idea I had was to have a way to avoid reading the register context initially, and only have it be fetched on demand.

I also have to support all the OS plugins in all the xnu dSYM's that are currently in the wild, too. So whatever we do has to work without requiring added behaviors.

Yep, anything we do shouldn't break existing plugins, but I think it's fair to require some modification to an existing plugin in order to enable a new feature (such as stepping across thread reschedules).

That said, I don't really use or care about the os plugins, so if using it results in leaking thread plans, then whatever. The thing I want to understand is what kind of requirements does this place on the rest of lldb. Which leads me to my next comment..

but I think it fits the kernel folks workflow, since the actual number of threads in the kernel is overwhelming and not terribly useful. But you are right, if we wanted to delay getting threads on private stops, but make UpdateThreadListIfNeeded force a fetch, then we would have to add some API to allow us to fetch only the "on core" threads along the code path that handles internal stops.

In any case, in order to support having internal stops not fetch all threads - which we are currently doing with OS Plugins - we need to be able to persist the stateful data lldb holds for that thread across stops where we don't realize the Thread object. That is the sole ambition of this set of patches.

We've agreed that we would like to avoid gathering the thread data for internals stops that don't need it. Avoiding creating (updating) lldb_private::Thread objects, and having the not-(yet)-realized threads live as a tid_t is one way to achieve this goal, but I am not sure if it is the right one. I am not really against the idea -- since threads can come and go between stops pretty much arbitrarily (even without os plugins), it may make sense to have everything hold onto them as simple handles, and force a fetch through some Process api when one needs to access them may simplify some things (for one, the lifetime of Threads could become much stricter).

I can certainly believe that changing thread plans to use a tid_t instead of a Thread& is simpler than making Thread object management lazy, but I am not sure that is a fair comparison. Thread plans are only a single component which juggles Thread objects. There are many other pieces of code, which assume that the lifetime of a Thread object matches that of the entity it describes. SBThread is one such thing. If SBThread keeps referring to the actual thread as a pointer, then it will lose track of it when the actual thread goes away and reappears. Same goes for ValueObjects and SBValues -- if lldb_private::Thread is meant to be transient, then those should also be updated to use a tid_t(*), but then the changes required to do that become much larger, and it's not clear to me whether that is really worth it, since this is just kind of reinventing weak_ptr<Thread> (but with the downside mentioned in (*)).

Everything that holds onto its state using an ExecutionContextRef already holds onto the m_tid as the real entity, and the Thread is only a cache (see ExecutionContextRef::GetThreadSP). That probably hasn't been pushed hard, and I've only written a few tests with actually vanishing threads so far. But if it doesn't work properly then that's a bug to fix not a design to change.

ValueObjects (and therefore SBValues) use an ExecutionContextRef in their UpdatePoint, they don't hold onto a thread. SBThread's also hold ExecutionContextRef's not ThreadWP's to get back to their actual thread. In general, anything that wants to know "where am I in the state space of the process" is supposed to use an ExecutionContextRef to store its state not individual target/process/thread/etc. So I'm not sure this is so far off, but we would have to write some more tests with threads coming and going to make sure this is used consistently.

This sounds very convincing to me. I was under the impression that this is kind of inventing something new, but as there already is a notion of floating Thread objects, then I don't have a problem with that. I'd say the existing ExecutionContextRef mechanism is working fairly well, as I haven't yet noticed how it re-fetches threads.

(*) I don't believe tid_t is actually a good way to keep track of the identity of a thread across time. It does uniquely identify a thread at any given point in time, but thread ids can be recycled (sometimes very quickly -- running the lldb test suite just once causes linux to wrap its pid space), which means you can't guarantee it will always refer to the same physical thread. If we're going to be using numbers to identify threads I think it should be some other kind of an identifier -- but that is sort of what a Thread* already is.

Not sure I understand this. If I run the process and a real thread exits and another one is created with the same TID, all before stopping, when I go to merge the thread lists on stop I'm going to think this different thread was the same thread because it had the same TID. I don't see how using Thread pointers rather than TID's helps with this. I still have to trust the TID I got from the monitor. We are always going to have a problem merging up Threads when TID's can be reused. I don't see how our Thread *'s - which don't represent anything in the target, and thus don't help with reasoning about persistence, help this at all.

a Thread* helps precisely because it "doesn't represent anything in the target". Since tid_ts are supposed to match the OS's notion of a thread identifier, and these are not unique across time on all systems, we'd need to come up with some surrogate identifier to be able to tell them apart. A pointer is one kind of unique identifier.

However, you are right that this alone would not help us maintain accurate thread identity because the tid could come and go without us noticing (though it would help if we do observe the thread to disappear). Further changes would be needed to make that work (and it's not clear whether they are actually worth it). I was mainly trying to ensure that any new concept we come up with does not get in the way if that becomes necessary. But as we are not inventing a new concept here, this point is moot.

So this is technically not "our" fault. The plugin is doing something unsupported and gets weird behavior as a result. Now, I am not saying we should throw this plugin (maybe the only actually used os plugin) overboard. However, I was hoping that the process of supporting it would involve some improvements to the os plugin interface (so that it can do what it wants in a supported way), rather than declaring support for whatever it is the plugin is doing, even though we know it to be problematic (like, not being able to tell when we can throw thread plans away).

I think when presenting an OS's threads through the funnel of a monitor stub that only knows about cores and not about the OS's concept of threads, you will very likely run into the problem where reconstructing all the threads at every stop is expensive and not particularly desirable. Two of the three instances of OS plugins that I know about do it this way. The other is for wrapping a cooperative multi-threading library, where fetching all the threads was pretty easy and there weren't too many. But for instance if getting all the threads really is slow, for the OS plugin to be useful we need to allow it to do partial reporting to be useful.

Anyway, I don't know that we need anything more added to the OS interface. There's a "make a thread for a TID" interface, which we can use to reap the persistent part of the thread, for instance when we make a public stop. If that's fast enough (I haven't gotten to that point with the xnu plugin) we can us that. We can add an API to query whether the plugin returns all threads, and gate our behavior on that rather than on "Has OS Plugin". But that's about all I can think of.

Yes, that's sort of what I was expecting. Another idea I had was to have a way to avoid reading the register context initially, and only have it be fetched on demand.

I also have to support all the OS plugins in all the xnu dSYM's that are currently in the wild, too. So whatever we do has to work without requiring added behaviors.

Yep, anything we do shouldn't break existing plugins, but I think its fair to require some modification to an existing plugin in order to enable a new feature (such as stepping across thread reschedules).

That said, I don't really use or care about the os plugins, so if using it results in leaking thread plans, then whatever. The thing I want to understand is what kind of requirements does this place on the rest of lldb. Which leads me to my next comment..

but I think it fits the kernel folks workflow, since the actual number of threads in the kernel is overwhelming and not terribly useful. But you are right, if we wanted to delay getting threads on private stops, but make UpdateThreadListIfNeeded force a fetch, then we would have to add some API to allow us to fetch only the "on core" threads along the code path that handles internal stops.

In any case, in order to support having internal stops not fetch all threads - which we are currently doing with OS Plugins - we need to be able to persist the stateful data lldb holds for that thread across stops where we don't realize the Thread object. That is the sole ambition of this set of patches.

We've agreed that we would like to avoid gathering the thread data for internals stops that don't need it. Avoiding creating (updating) lldb_private::Thread objects, and having the not-(yet)-realized threads live as a tid_t is one way to achieve this goal, but I am not sure if it is the right one. I am not really against the idea -- since threads can come and go between stops pretty much arbitrarily (even without os plugins), it may make sense to have everything hold onto them as simple handles, and force a fetch through some Process api when one needs to access them may simplify some things (for one, the lifetime of Threads could become much stricter).

I can certainly believe that changing thread plans to use a tid_t instead of a Thread& is simpler than making Thread object management lazy, but I am not sure that is a fair comparison. Thread plans are only a single component which juggles Thread objects. There are many other pieces of code, which assume that the lifetime of a Thread object matches that of the entity it describes. SBThread is one such thing. If SBThread keeps referring to the actual thread as a pointer, then it will lose track of it when the actual thread goes away and reappears. Same goes for ValueObjects and SBValues -- if lldb_private::Thread is meant to be transient, then those should also be updated to use a tid_t(*), but then the changes required to do that become much larger, and it's not clear to me whether that is really worth it, since this is just kind of reinventing weak_ptr<Thread> (but with the downside mentioned in (*)).

Everything that holds onto its state using an ExecutionContextRef already holds onto the m_tid as the real entity, and the Thread is only a cache (see ExecutionContextRef::GetThreadSP). That probably hasn't been pushed hard, and I've only written a few tests with actually vanishing threads so far. But if it doesn't work properly then that's a bug to fix not a design to change.

ValueObjects (and therefore SBValues) use an ExecutionContextRef in their UpdatePoint, they don't hold onto a thread. SBThread's also hold ExecutionContextRef's not ThreadWP's to get back to their actual thread. In general, anything that wants to know "where am I in the state space of the process" is supposed to use an ExecutionContextRef to store its state not individual target/process/thread/etc. So I'm not sure this is so far off, but we would have to write some more tests with threads coming and going to make sure this is used consistently.

This sounds very convincing to me. I was under the impression that this is kind of inventing something new, but as there already is a notion of floating Thread objects, then I don't have a problem with that. I'd say the existing ExecutionContextRef mechanism is working fairly well, as I haven't yet noticed how it re-fetches threads.

To be fair, I think in the current state of things, this code is probably seldom used, since we track threads accurately and you would very seldom have a TID the process knows about that doesn't have a backing thread by the time you get to the level where people were using ExecutionContextRef's. So it's possible this doesn't work perfectly when threads do vanish - though I haven't seen this in any of my hand-testing. But it shows that when we were thinking about how to specify threads we made the same choice we did with frames: specifying them by PC/CFA for frames and TID for threads rather than by the native lldb objects.

(*) I don't believe tid_t is actually a good way to keep track of the identity of a thread across time. It does uniquely identify a thread at any given point in time, but thread ids can be recycled (sometimes very quickly -- running the lldb test suite just once causes linux to wrap its pid space), which means you can't guarantee it will always refer to the same physical thread. If we're going to be using numbers to identify threads I think it should be some other kind of an identifier -- but that is sort of what a Thread* already is.

Not sure I understand this. If I run the process and a real thread exits and another one is created with the same TID, all before stopping, when I go to merge the thread lists on stop I'm going to think this different thread was the same thread because it had the same TID. I don't see how using Thread pointers rather than TID's helps with this. I still have to trust the TID I got from the monitor. We are always going to have a problem merging up Threads when TID's can be reused. I don't see how our Thread *'s - which don't represent anything in the target, and thus don't help with reasoning about persistence, help this at all.

a Thread* helps precisely because it "doesn't represent anything in the target". Since tid_ts are supposed to match what the OS notion of a thread identifier is, and these are not unique across time on all systems, we'd need to come up with some surrogate identifier to be able to tell them apart. A pointer one kind of a unique identifier.

However, you are right that this alone would not help us maintain accurate thread identity because the tid could come and go without us noticing (though it would help if we do observe the thread to disappear). Further changes would be needed to make that work (and it's not clear whether they are actually worth it). I was mainly trying to ensure that any new concept we come up with does not get in the way if that becomes necessary. But as we are not inventing a new concept here, this point is moot.

Yes, I think we're okay there.

Even if we don't notice that a TID has changed its underlying Thread object, the thread plans all get asked "Are you stale?" as part of the should-stop mechanism. For instance a "step-out" plan knows that it is stepping out to some Frame, and if that frame isn't on the stack it ditches itself. Similarly, step plans expect to see the current stepping frame. So if you did get it wrong and attached the Thread Plans from one underlying thread to another one which adopted its TID, they would all get discarded when applied to a thread that didn't make sense.
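
A hedged illustration of that staleness check for a step-out plan (modeled on the description above; the real ThreadPlanStepOut logic handles more cases, and this free-standing function is a stand-in, not the actual method):

bool StepOutPlanIsStale(Thread &thread, const StackID &step_out_to_id) {
  // If the frame we planned to return to is no longer on this thread's stack,
  // the plan no longer makes sense and should discard itself.
  return !thread.GetFrameWithStackID(step_out_to_id);
}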

Is this one okay as well?

Everything looks good, just a question in inlined comment about having a thread plan hold onto a pointer to a thread. Seems dangerous

lldb/include/lldb/Target/ThreadPlan.h:601

This seems dangerous to hold onto? It can go stale if the thread goes away, right? If we have the m_tid, why do we need this? If we want a thread pointer in this class, should it be a std::weak_ptr? It also seems that we could make a ThreadRef class, something like:

struct ThreadRef {
  std::weak_ptr<lldb_private::Thread> m_thread_wp;
  lldb::tid_t m_tid;
};

We could re-use this in this class and ExecutionContextRef.

Everything looks good, just a question in inlined comment about having a thread plan hold onto a pointer to a thread. Seems dangerous

The way the ThreadPlanStacks will get used, every time a process stops, lldb will build up a new thread list like it does already, then we update the collection of thread plan stacks, passing in the new thread list. That will grab the TID for each new thread and look it up in the map of stacks (which is indexed by TID). Each time we find a TID from the new thread list that's already in the map, we set the Thread * in the stack map to the thread we got the TID from. Then for all the TID's that aren't in the new thread list, we either discard the stack (if we're not holding onto plans for vanished threads) or we set its Thread * to nullptr. So the Thread * is just acting as a cache so you don't need to keep looking up the TID every time you need to get the Thread * for a ThreadPlan between one stop and another.
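
A sketch of that update pass (the container and the SetThread method are guesses at the shape of the later patch, not this one):

void UpdatePlansOnStop(std::map<lldb::tid_t, ThreadPlanStack> &stacks,
                       ThreadList &new_thread_list, bool preserve_vanished) {
  for (auto it = stacks.begin(); it != stacks.end();) {
    // Look the stack's TID up in the freshly built thread list.
    lldb::ThreadSP thread_sp =
        new_thread_list.FindThreadByID(it->first, /*can_update=*/false);
    if (thread_sp) {
      it->second.SetThread(thread_sp.get()); // refresh the cached Thread *
      ++it;
    } else if (preserve_vanished) {
      it->second.SetThread(nullptr);         // keep the plans, clear the cache
      ++it;
    } else {
      it = stacks.erase(it);                 // the thread and its plans go away
    }
  }
}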

Unless there's some way that a thread that we saw when we stopped can go away before the next resume without the ThreadList being notified so it can re-sync, this mechanism is safe, and I can't think of any mechanism of that sort in lldb.

If you're still worried about this, I think the safe solution is to remove the Thread * and look up the TID in the Process every time you need the Thread *. Using a ThreadWP doesn't really express what you want here, because the Thread may have been discarded but its SP is still alive because it's in some list we haven't gotten rid of or something like that. So WP.lock() working doesn't really mean the thread is still alive in the process, but TID in ThreadList does.

Is this one okay as well?

All my concerns have been addressed.

This revision was not accepted when it landed; it landed in state Needs Review. Apr 3 2020, 3:12 PM
This revision was automatically updated to reflect the committed changes.