This is an archive of the discontinued LLVM Phabricator instance.

Add a new command "target metrics".
AbandonedPublic

Authored by clayborg on Sep 30 2021, 12:56 AM.

Details

Summary

The new target metrics command will report internal timings and information that can be used to track performance in LLDB. The information can also be used to help reproduce the debug session and exact breakpoint state if the appropriate options are used. This allows users to report performance issues or bugs where breakpoints were not resolved and LLDB engineers can use this information to track the issue.

The command emits JSON so that the structured data can be used to report timings to databases for performance tracking.

(lldb) target metrics
{

"firstStopTime": 0.33988644499999998,
"launchOrAttachTime": 0.31808012899999999,
"targetCreateTime": 31.168737819,
"totalBreakpointResolveTime": 6.6650758780000006,
"totalDebugInfoIndexTime": 0,
"totalDebugInfoParseTime": 4.2728297900000118,
"totalDebugInfoSize": 6581323413,
"totalSymbolTableIndexTime": 17.815137097000008,
"totalSymbolTableParseTime": 12.261250563000004

}

With no options we report some timings at the top level:

"launchOrAttachTime" is the time it takes from the time launch or attach is started, to when we hit our first private eStateStopped state
"firstStopTime" is the time it takes to stop at our first public stop. This time only has meaning if you run your program to a known breakpoint and want to compare timings with new versions of LLDB. This is because your program might not stop at a breakpoint right away unless a targetted breakpoint has been set for measuring.
"targetCreateTime" is the time it takes to set the main executable in the target and all dependent shared libraries. This time will include the symbol preloading if symbol preloading is enabled. This time can help is measure how long it takes to parse the symbol tables, index the symbol tables, and index the debug info along with any debug info parsing that might be required.
"totalBreakpointResolveTime" is the time it took to resolve any breakpoints that were set. If you specify the "--breakpoints" or "-b" option, you can get a breakdown of resolve time on a per breakpoint level.
"totalDebugInfoIndexTime" is the time it took to index the debug information for all current modules in the target.
"totalDebugInfoParseTime" is the time it took to parse the debug information for all current modules in the target.
"totalDebugInfoSize" is the sum of all debug information sections from all object files in the module.
"totalSymbolTableIndexTime" is the time it took to parse the symbol tables for each object file in all modules.
"totalSymbolTableParseTime" is the time it took to index the symbol tables for each object file in all modules.

Adding the "--modules" options will include a top level "modules" array that contains full details on each module, breaking down the debug info parse/index times and size, symbol table parse/index times along with enough info to identify the module (path, triple, uuid).

(lldb) target metrics --modules
...

"modules": [
  {
    "debugInfoIndexTime": 0,
    "debugInfoParseTime": 0.054413763000000004,
    "debugInfoSize": 18118626,
    "path": "/Users/gclayton/Documents/src/lldb/main/Debug/bin/lldb",
    "symbolTableIndexTime": 0.22065103899999999,
    "symbolTableParseTime": 0.138976874,
    "triple": "x86_64-apple-macosx11.0.0",
    "uuid": "F59B8D3F-67B1-35EC-A1AA-E0ABB5CB3601"
  },
  {
    "debugInfoIndexTime": 0,
    "debugInfoParseTime": 0,
    "debugInfoSize": 0,
    "path": "/usr/lib/dyld",
    "symbolTableIndexTime": 0.01367178,
    "symbolTableParseTime": 0.0093677970000000006,
    "triple": "x86_64-apple-macosx11.6.0",
    "uuid": "0CC19410-FD43-39AE-A32A-50273F8303A4"
  },
  {
    "debugInfoIndexTime": 0,
    "debugInfoParseTime": 4.2184160270000115,
    "debugInfoSize": 6563204787,
    "path": "/Users/gclayton/Documents/src/lldb/main/Debug/bin/LLDB.framework/Versions/A/LLDB",
    "symbolTableIndexTime": 17.112896844000002,
    "symbolTableParseTime": 11.240680348,
    "triple": "x86_64-apple-macosx11.0.0",
    "uuid": "CB1763B7-BB3D-38A1-AB32-237E59244745"
  },

Adding the "--breakpoints option will include details on all breakpoints including the breakpoint ID, individual resolve time, and "details" which contains the serialized breakpoint settings that were already built into each breakpoint, so we can know the exact details of how a breakpoint was set.

(lldb) target metrics --breakpoints
{

"breakpoints": [
  {
    "details": {...},
    "id": 1,
    "resolveTime": 2.2936575110000001
  }

Adding the "--locations" options will dump the details of all resolved breakpoint locations withing each breakpoint:

(lldb) target metrics --locations
{

"breakpoints": [
  {
    ...,
    "locations": [
      {
        "enabled": true,
        "fileAddress": 15783486,
        "hardward": false,
        "hitCount": 0,
        "id": 1,
        "indirect": false,
        "loadAddress": 4319532606,
        "reexported": false,
        "resolved": true,
        "symbolContext": {
          "compUnit": "/Users/gclayton/Documents/src/lldb/main/llvm-project/lldb/source/Target/Target.cpp",
          "function": "lldb_private::Target::Target(lldb_private::Debugger&, lldb_private::ArchSpec const&, std::__1::shared_ptr<lldb_private::Platform> const&, bool)",
          "module": "CB1763B7-BB3D-38A1-AB32-237E59244745",
          "sourceColumn": 1,
          "sourceFile": "/Users/gclayton/Documents/src/lldb/main/llvm-project/lldb/source/Target/Target.cpp",
          "sourceLine": 123,
          "symbol": "lldb_private::Target::Target(lldb_private::Debugger&, lldb_private::ArchSpec const&, std::__1::shared_ptr<lldb_private::Platform> const&, bool)"
        }
      },
      {
        "enabled": true,
        "fileAddress": 15787038,
        "hardward": false,
        "hitCount": 0,
        "id": 2,
        "indirect": false,
        "loadAddress": 4319536158,
        "reexported": false,
        "resolved": true,
        "symbolContext": {
          "compUnit": "/Users/gclayton/Documents/src/lldb/main/llvm-project/lldb/source/Target/Target.cpp",
          "function": "lldb_private::Target::Target(lldb_private::Debugger&, lldb_private::ArchSpec const&, std::__1::shared_ptr<lldb_private::Platform> const&, bool)",
          "module": "CB1763B7-BB3D-38A1-AB32-237E59244745",
          "sourceColumn": 1,
          "sourceFile": "/Users/gclayton/Documents/src/lldb/main/llvm-project/lldb/source/Target/Target.cpp",
          "sourceLine": 123,
          "symbol": "lldb_private::Target::Target(lldb_private::Debugger&, lldb_private::ArchSpec const&, std::__1::shared_ptr<lldb_private::Platform> const&, bool)"
        }
      }
    ],
    "resolveTime": 4.3714183670000004

Event Timeline

clayborg created this revision.Sep 30 2021, 12:56 AM
clayborg requested review of this revision.Sep 30 2021, 12:56 AM
Herald added a project: Restricted Project. · View Herald Transcript

I have a feeling that there's a lot of overlap between this and the "statistics" command. As a user, I don't think I would be able to tell the difference between these two things.

Having two systems which do vaguely similar things does not seem like a good idea. Have you considered unifying them? Or maybe just deleting the old one, if its not used (I don't think it has gotten any improvements since it was initially introduced)?

+Vedant who had/has some plans with the statistics command at some point IIRC.

FWIW, I wanted to throw away the current statistics implementation for quite a while but I didn't have anything to replace it with. If we can replace it with this then that would be nice.

lldb/include/lldb/Core/Module.h
941

Could the double and uint64_t here/below be a typedef to some unit? typedef double Seconds/typedef uint64_t ByteCount or something along those lines. Would make the unit we're expecting more obvious and I can easily find all the related code.

lldb/test/API/commands/target/metrics/TestTargetMetrics.py
89 ↗(On Diff #376126)

We stopped using assertTrue unless there is really no better assert* alternative as it doesn't produce any error diagnostics on failure. There's a good overview of the replacement methods here which all print nice errors.

199 ↗(On Diff #376126)

This and the following line can be just target = self.createTestTarget() (here and in the rest of the test).

lldb/test/API/commands/target/metrics/main.cpp
2 ↗(On Diff #376126)

Unused function?

labath added inline comments.Sep 30 2021, 4:08 AM
lldb/include/lldb/Core/Module.h
941

*cough* *cough* std::chrono *cough* *cough*

teemperor added inline comments.Sep 30 2021, 4:25 AM
lldb/include/lldb/Core/Module.h
941

Yeah I didn't want to bring std::chrono in just yet but that would be the next logical step. FWIW, ElapsedTime::Duration already seems to be pretty much what we would want here, so we could just use that?

I have a feeling that there's a lot of overlap between this and the "statistics" command. As a user, I don't think I would be able to tell the difference between these two things.

Having two systems which do vaguely similar things does not seem like a good idea. Have you considered unifying them? Or maybe just deleting the old one, if its not used (I don't think it has gotten any improvements since it was initially introduced)?

+1 on unifying the two.

+Vedant who had/has some plans with the statistics command at some point IIRC.

FWIW, I wanted to throw away the current statistics implementation for quite a while but I didn't have anything to replace it with. If we can replace it with this then that would be nice.

We have (short term) plans to start using the statistics to collect more data. Vedant created some patches last year that I've rebased but haven't landed yet. I'll try to get these up on Phabricator ASAP. They're not that invasive, expect maybe the decision to move the statistics above the debugger level. That's something we need to capture things like how many lldb_asserts we've hit in a release build, as well as other metrics that span multiple debuggers (a common use case in Xcode). We're mostly interested in capturing how often something happens as opposed to how long something took (like in this patch) but I think there's room for both in the unified solution.

I have a feeling that there's a lot of overlap between this and the "statistics" command. As a user, I don't think I would be able to tell the difference between these two things.

Having two systems which do vaguely similar things does not seem like a good idea. Have you considered unifying them? Or maybe just deleting the old one, if its not used (I don't think it has gotten any improvements since it was initially introduced)?

I would be willing to take over the "statistics" command, but as it is written, it has to be enabled and disabled which I didn't like so I didn't want to have to worry about that. If we were to take over the "statistics" multiword command we would need to keep old "statistics enable" and "statistics disable" and make them no-ops and then add a JSON output. I started off this way, but it felt like a hack. And to be honest the statistics command doesn't do much. It also isn't placed correctly as I feel "target statistics" would be a better place and I would be fine renaming "metrics" to "statistics", but that would also be confusing to have two commands with the same name in different places.

clayborg updated this revision to Diff 376358.Sep 30 2021, 2:04 PM

Changes:

  • Swithed from using "double" to using ElapsedTime::Duration
  • Stop using assertTrue in tests, and use better assertXXX methods
clayborg marked 6 inline comments as done.Sep 30 2021, 2:05 PM
clayborg added inline comments.
lldb/include/lldb/Core/Module.h
941

I can switch to ElapsedTime::Duration

I would like to get a consensus on if we should move this functionality to "statistics" or not. The reasons that I didn't do it were:

  • "statistics" as a top level method doesn't really do what I wanted here, which was "target statistics", or stats that are related only to the current target. I am not all that interested in overall stats over all targets
  • I didn't want to have to make a human readable and JSON output
  • the "statistics enable"/"statistics disable" didn't seem to make sense as they were coded in top of tree. Maybe enabling and disabling makes more sense when more info is gathered in hidden patches that have not been upstreamed?

I would like to get a consensus on if we should move this functionality to "statistics" or not. The reasons that I didn't do it were:

  • "statistics" as a top level method doesn't really do what I wanted here, which was "target statistics", or stats that are related only to the current target. I am not all that interested in overall stats over all targets

How about we keep the statistics as a top-level entity (even above the debugger) and then have have the equivalent of metrics per debugger and per target? You can define different metrics at different levels. The commands could remain as is, where target metrics (or target statistics) dumps the metrics for the current target and the old statistics dumps the JSON for everything hierarchically.

  • I didn't want to have to make a human readable and JSON output

I'm not interested in human readable output either. For our purposes JSON only is fine. If we considered removing the statistics altogether I don't think anyone will miss the current human readable output?

  • the "statistics enable"/"statistics disable" didn't seem to make sense as they were coded in top of tree. Maybe enabling and disabling makes more sense when more info is gathered in hidden patches that have not been upstreamed?

I see two potential use cases for enabling/disabling statistics:

  • The most obvious one being a way to work around a bug or performance overhead caused by collecting the statistics.
  • The data that's being collected could serve as the basis for something like anonymous analytics. Enabling/disabling the statistics could be a way to opt-in or out from this data being automatically collected and shared.

I would like to get a consensus on if we should move this functionality to "statistics" or not. The reasons that I didn't do it were:

  • "statistics" as a top level method doesn't really do what I wanted here, which was "target statistics", or stats that are related only to the current target. I am not all that interested in overall stats over all targets

It seems confusing to have two ways to gather internal statistics (well three really because we also still have "log timers"). So it does seem good to gather at least the two obviously similar ones in one place.

We have loads of commands that operate on the "currently selected target" even though they aren't in the target hierarchy. So having the target specific parts of the statistics gathering do that as well is in line with many other commands. I don't think that's a problem. It might even be nice to have a --target-id so that if I wanted statistics on several of the targets I have loaded, I could do it with one command rather than having to issue the command over and over.

On a side note, I'm not sure it's really correct to limit the statistics to only one target, at least for any symbol or debug info parsing information. Since all the parsing goes on in the modules in the global shared cache, if you did have two targets running in one lldb session, then parsing module A will formally have to have been done on behalf of both targets, but will be triggered by only one or the other target. If the target that triggered parsing of A is not the current target, will the time to parse that get dropped? The work had to be done, and was only accounted to the other target by a timing accident.

  • I didn't want to have to make a human readable and JSON output

We have a pretty printer for StructuredData objects which are trivial to make from JSON, so a human-readable version should be easy to knock together. That will do for the first pass, and since we seem to be leaning to JSON as a format for various data gathering, if we want to make the output better we could improve the pretty printer, and add header names to our JSON, etc. But we can do that incrementally

  • the "statistics enable"/"statistics disable" didn't seem to make sense as they were coded in top of tree. Maybe enabling and disabling makes more sense when more info is gathered in hidden patches that have not been upstreamed?

If we are confident that gathering statistics really does have and always will have negligible performance impact, then we can drop the "enable" and "disable". I don't know that we can promise we'll never want to gather something that's a bit costly. And there are folks who run lldb in environments where they would rather we not spend time & memory on anything we don't absolutely have to.

So unless it's somehow prohibitively hard to disable gathering statistics, my feeling is we should offer that option.

Ok, so how does this sound:

  • This new command will replace the existing "statistics dump" command and emit JSON only by default
  • The "statistics enable/disable" will stay in place and allow expensive metric gathering to be enabled/disabled if needed. If the gathering is really cheap for certain metrics, then nothing needs to happen for "statistics enable/disable"
  • By default we will emit stats for the current target, and options can be added in the future to enable dumping for all targets and for debugger level stuff

Let me know if this sounds ok to everyone!?

Ok, so how does this sound:

  • This new command will replace the existing "statistics dump" command and emit JSON only by default
  • The "statistics enable/disable" will stay in place and allow expensive metric gathering to be enabled/disabled if needed. If the gathering is really cheap for certain metrics, then nothing needs to happen for "statistics enable/disable"
  • By default we will emit stats for the current target, and options can be added in the future to enable dumping for all targets and for debugger level stuff

Let me know if this sounds ok to everyone!?

This sounds fine to me.

I would mildly argue for adding the --target-id to start with. That should be super-cheap to implement since you just have to look up the target in the debugger. The reason I think it's a little important is that it is the way that you tell that the statistics you are dumping are debugger-wide or target specific. If --breakpoints supports the --target-id, then I know right away it's a target specific statistic, etc... So having this in place from the start will help somebody else who comes along to add some debugger-wide stats.

But I am not insisting on this.

We have (short term) plans to start using the statistics to collect more data. Vedant created some patches last year that I've rebased but haven't landed yet. I'll try to get these up on Phabricator ASAP.

D110893 & D110895

clayborg updated this revision to Diff 376421.Sep 30 2021, 9:11 PM

Changes:

  • Removed "target metrics" and took over "statistics dump"
  • Hooked old statistics into for expression evaluation and frame variable success and fails into this patch
  • No more textual output for "statistics dump", it is now JSON.
  • "statistics enable" and "statistics disable" are still valid commands, but they don't enable or disable anything since all stats currently gathered are very cheap to gather:

We have (short term) plans to start using the statistics to collect more data. Vedant created some patches last year that I've rebased but haven't landed yet. I'll try to get these up on Phabricator ASAP.

D110893 & D110895

See what I did in this patch. I think all of the functionality and splitting the stats out into a separate class are taken care of by this patch. Let me know if it works.

Thanks Greg! Following Pavel's suggestion in D110893, could you split up this patch into smaller ones? I think that will make it a lot easier to review and discuss.

Thanks Greg! Following Pavel's suggestion in D110893, could you split up this patch into smaller ones? I think that will make it a lot easier to review and discuss.

Will do

I have broken out the target only settings to make this patch smaller into: https://reviews.llvm.org/D111686

rmaz added a subscriber: rmaz.Oct 19 2021, 2:42 PM

is this going to be abandoned?