This is an archive of the discontinued LLVM Phabricator instance.

Add the stop count to "statistics dump" in each target's dictionary.
ClosedPublic

Authored by clayborg on Nov 12 2021, 3:27 PM.

Details

Summary

It is great to know how many times the target has stopped over its lifetime, since the target can stop, and possibly resume without the user seeing it, for things like shared library loading and signals that are not notified and are auto continued. A high stop count can help explain why a debug session might be slow. This is now included as "stopCount" inside each target's JSON.
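As a sketch of how a tool might consume the new field: the snippet below parses a "statistics dump" excerpt and pulls out "stopCount" per target. The "stopCount" key is the one this patch adds; the surrounding JSON shape here is abbreviated and assumed for illustration.

```python
import json

# Hypothetical excerpt of "statistics dump" output; only "stopCount" is
# the field added by this patch, the rest of the shape is an assumption.
stats_json = """
{
  "targets": [
    { "stopCount": 17 },
    { "stopCount": 1024 }
  ]
}
"""

def stop_counts(stats_text):
    """Return the stopCount for each target in a statistics dump."""
    stats = json.loads(stats_text)
    return [t.get("stopCount", 0) for t in stats.get("targets", [])]

print(stop_counts(stats_json))  # → [17, 1024]
```

A very high value relative to the number of user-visible stops is the signal that something (signals, shared library loads, breakpoint callbacks) is stopping and resuming behind the scenes.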

Diff Detail

Event Timeline

clayborg requested review of this revision. Nov 12 2021, 3:27 PM
clayborg created this revision.
Herald added a project: Restricted Project. Nov 12 2021, 3:27 PM
jingham accepted this revision. Nov 12 2021, 4:04 PM

The stop id includes all the user expression stops. So the same "drive through code" could produce very different stop counts if you ran a bunch of p's in one session but not the other. You can't do better than that at present, since we only track the natural vs. expression stop id for the last stop. But if it mattered you could keep a separate "natural stop" counter and report that instead. You can't quite get back to the natural stops by subtracting the number of expressions, because not all expressions run the target, and some run it twice (if the one-thread run times out).
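The separate "natural stop" counter suggested here is easy to picture: bump the total stop id on every stop, and a second counter only when no expression is driving the target. All names below are hypothetical, not LLDB API.

```python
class StopCounters:
    """Sketch of keeping a separate "natural stop" counter alongside the
    total stop id, as suggested above. Hypothetical names, not LLDB API."""

    def __init__(self):
        self.stop_id = 0          # bumps on every stop, expressions included
        self.natural_stop_id = 0  # bumps only when no expression is running

    def on_stop(self, for_expression):
        self.stop_id += 1
        if not for_expression:
            self.natural_stop_id += 1

counters = StopCounters()
counters.on_stop(for_expression=False)  # breakpoint hit
counters.on_stop(for_expression=True)   # expression ran the target
counters.on_stop(for_expression=True)   # one-thread timeout: ran again
print(counters.stop_id, counters.natural_stop_id)  # → 3 1
```

This also illustrates why subtracting expression counts from the stop id does not work: the second and third stops came from one logical expression, so only per-stop bookkeeping gets the right answer.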

I'm not sure that's terribly important, provided people doing this analysis know that both kinds of stops are getting counted, and this is a useful statistic either way.

Anyway, other than that, LGTM.

This revision is now accepted and ready to land. Nov 12 2021, 4:04 PM
wallace accepted this revision. Nov 12 2021, 8:59 PM

> The stop id includes all the user expression stops. So the same "drive through code" could produce very different stop counts if you ran a bunch of p's in one session but not the other. You can't do better than that at present, since we only track the natural vs. expression stop id for the last stop. But if it mattered you could keep a separate "natural stop" counter and report that instead. You can't quite get back to the natural stops by subtracting the number of expressions, because not all expressions run the target, and some run it twice (if the one-thread run times out).

I actually don't care about the number of natural user stops, since the user will be aware of those stops. The info I am interested in is when a debug session has thousands of stops that were auto resumed. This can happen if unix signals are used for program flow control or to work around things. Android uses SIGSEGV for catching Java NULL derefs, and many times we will auto resume from these depending on signal settings. If a debug session is taking a long time, it is nice to know why. Some users might also have a python module that gets loaded and ends up setting a breakpoint with a callback that auto resumes.
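For reference, the per-signal behavior described here is controlled in LLDB with the `process handle` command. A session that wants SIGSEGV delivered to the process without stopping or notifying (the Android-style setup above) would look something like:

```
(lldb) process handle SIGSEGV --pass true --stop false --notify false
(lldb) process handle
```

With `--stop false` the process still round-trips through the debugger on each signal, which is exactly why a session like this can rack up thousands of invisible stops.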

The one metric I do want in the future is the time spent in stops that auto resume, like shared library loading stops where the process stops, does some work, and auto resumes. The unix signals that cause stops where we auto resume would fit into that category. So one idea for the future is to start a timer each time we stop, stop the timer right before we resume, and report this time as something like "transparentStopTime". Expression evaluation time could also be counted as a separate stat.
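The timer idea can be sketched in a few lines: record the clock on stop, and accumulate the elapsed time on auto resume. The class and stat names below are hypothetical; a clock is injected so the logic can be exercised without real delays.

```python
import time

class TransparentStopTimer:
    """Sketch of the "transparentStopTime" idea: accumulate time spent in
    stops that auto resume. Hypothetical names, not LLDB's implementation."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._stopped_at = None
        self.transparent_stop_time = 0.0

    def on_stop(self):
        self._stopped_at = self._clock()

    def on_auto_resume(self):
        if self._stopped_at is not None:
            self.transparent_stop_time += self._clock() - self._stopped_at
            self._stopped_at = None

# Fake clock: each call advances 0.5s, so two stop/resume pairs total 1.0s.
ticks = iter(x * 0.5 for x in range(100))
timer = TransparentStopTimer(clock=lambda: next(ticks))
for _ in range(2):
    timer.on_stop()
    timer.on_auto_resume()
print(timer.transparent_stop_time)  # → 1.0
```

Only calling `on_auto_resume` for transparent resumes (not user-driven continues) keeps the stat focused on time the user never saw.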

> I'm not sure that's terribly important, provided people doing this analysis know that both kinds of stops are getting counted, and this is a useful statistic either way.

> Anyway, other than that, LGTM.

I was only looking for the total stops for now, as it was really easy to add and could give us some insight into slow debug sessions if this number is really high.

jingham added a comment (edited). Nov 15 2021, 7:19 PM

>> The stop id includes all the user expression stops. So the same "drive through code" could produce very different stop counts if you ran a bunch of p's in one session but not the other. You can't do better than that at present, since we only track the natural vs. expression stop id for the last stop. But if it mattered you could keep a separate "natural stop" counter and report that instead. You can't quite get back to the natural stops by subtracting the number of expressions, because not all expressions run the target, and some run it twice (if the one-thread run times out).

> I actually don't care about the number of natural user stops, since the user will be aware of those stops. The info I am interested in is when a debug session has thousands of stops that were auto resumed. This can happen if unix signals are used for program flow control or to work around things. Android uses SIGSEGV for catching Java NULL derefs, and many times we will auto resume from these depending on signal settings. If a debug session is taking a long time, it is nice to know why. Some users might also have a python module that gets loaded and ends up setting a breakpoint with a callback that auto resumes.

IMO, the real solution for this issue is for lldb to invent (if there isn't already) an "auto-continue signals" packet, so we could tell the stub to restart from those signals without even telling lldb about it. Oh, but I guess you can't do that for SIGSEGV...
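For what it's worth, the gdb-remote protocol already defines a packet in this spirit: QPassSignals tells the stub which signals to deliver to the inferior without reporting a stop at all. Whether lldb-server implements it is a separate question, but the wire format is just semicolon-separated hex signal numbers:

```
QPassSignals:0b;11
```

Here 0b is SIGSEGV (11) and 11 is SIGCHLD (17) on typical Linux signal numbering; the caveat above still applies, since a signal the stub passes straight through can never trigger lldb-side logic like the Android SIGSEGV handling.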

> The one metric I do want in the future is the time spent in stops that auto resume, like shared library loading stops where the process stops, does some work, and auto resumes. The unix signals that cause stops where we auto resume would fit into that category. So one idea for the future is to start a timer each time we stop, stop the timer right before we resume, and report this time as something like "transparentStopTime". Expression evaluation time could also be counted as a separate stat.

Also, ThreadPlans stop a lot without any user-visible stops, so a complicated step can also rack up the stops. Right now we don't do any accounting for who is responsible for auto-continuing, so you'd have to add that in.
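The accounting being described here is essentially a counter keyed by whoever triggered the auto continue. A minimal sketch, with hypothetical reason names:

```python
from collections import Counter

# Sketch of per-responsible-party accounting for auto resumes, which the
# comment notes LLDB does not do today. Reason names are hypothetical.
auto_resume_reasons = Counter()

def record_auto_resume(reason):
    auto_resume_reasons[reason] += 1

# A complicated step racks up thread-plan stops; signals and shared
# library loads add more.
for _ in range(5):
    record_auto_resume("thread-plan")
record_auto_resume("signal:SIGSEGV")
record_auto_resume("shared-library-load")

print(dict(auto_resume_reasons))
# → {'thread-plan': 5, 'signal:SIGSEGV': 1, 'shared-library-load': 1}
```

With a breakdown like this, a high total stop count immediately says whether stepping machinery, signals, or loader stops are to blame.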

>> I'm not sure that's terribly important, provided people doing this analysis know that both kinds of stops are getting counted, and this is a useful statistic either way.

>> Anyway, other than that, LGTM.

> I was only looking for the total stops for now, as it was really easy to add and could give us some insight into slow debug sessions if this number is really high.

Sounds fine.

>>> The stop id includes all the user expression stops. So the same "drive through code" could produce very different stop counts if you ran a bunch of p's in one session but not the other. You can't do better than that at present, since we only track the natural vs. expression stop id for the last stop. But if it mattered you could keep a separate "natural stop" counter and report that instead. You can't quite get back to the natural stops by subtracting the number of expressions, because not all expressions run the target, and some run it twice (if the one-thread run times out).

>> I actually don't care about the number of natural user stops, since the user will be aware of those stops. The info I am interested in is when a debug session has thousands of stops that were auto resumed. This can happen if unix signals are used for program flow control or to work around things. Android uses SIGSEGV for catching Java NULL derefs, and many times we will auto resume from these depending on signal settings. If a debug session is taking a long time, it is nice to know why. Some users might also have a python module that gets loaded and ends up setting a breakpoint with a callback that auto resumes.

> IMO, the real solution for this issue is for lldb to invent (if there isn't already) an "auto-continue signals" packet, so we could tell the stub to restart from those signals without even telling lldb about it. Oh, but I guess you can't do that for SIGSEGV...

Agreed. It would help speed things up if lldb-server supports it, for signals like SIGCHLD and others you might want to just ignore.

>> The one metric I do want in the future is the time spent in stops that auto resume, like shared library loading stops where the process stops, does some work, and auto resumes. The unix signals that cause stops where we auto resume would fit into that category. So one idea for the future is to start a timer each time we stop, stop the timer right before we resume, and report this time as something like "transparentStopTime". Expression evaluation time could also be counted as a separate stat.

> Also, ThreadPlans stop a lot without any user-visible stops, so a complicated step can also rack up the stops. Right now we don't do any accounting for who is responsible for auto-continuing, so you'd have to add that in.

Yep, it was more work than I have time for right now.

>>> I'm not sure that's terribly important, provided people doing this analysis know that both kinds of stops are getting counted, and this is a useful statistic either way.

>>> Anyway, other than that, LGTM.

>> I was only looking for the total stops for now, as it was really easy to add and could give us some insight into slow debug sessions if this number is really high.

> Sounds fine.

Keep an eye out for other stats and things that can help explain slow sessions.

One thing I will be doing soon is timing how long it takes to create TypeSystem AST instances. I know for Swift this can be really slow when you have many binaries that each have their own Swift AST, and when they have a lot of clang modules that might need to be created and cached. I am sure the Clang AST type system instances can also take a while if there is any clang module caching they want to do. The first time can take a while, and it would be good to measure.
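Timing a one-shot creation like this fits naturally in a context manager around the construction site. The stat names and structure below are hypothetical, not LLDB's actual implementation:

```python
import time
from contextlib import contextmanager

# Hypothetical per-module timing table for TypeSystem AST creation.
ast_create_times = {}

@contextmanager
def time_ast_creation(module_name, clock=time.monotonic):
    """Record how long the body took as the module's AST creation time."""
    start = clock()
    try:
        yield
    finally:
        ast_create_times[module_name] = clock() - start

with time_ast_creation("a.out"):
    pass  # construct the (Clang or Swift) TypeSystem AST here

print(sorted(ast_create_times))  # → ['a.out']
```

Reporting the table per target in "statistics dump", alongside "stopCount", would show whether a slow session is paying for AST and module-cache construction rather than stops.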