This is an archive of the discontinued LLVM Phabricator instance.

Add modules stats into the "statistics dump" command.
ClosedPublic

Authored by clayborg on Oct 21 2021, 4:05 PM.

Details

Summary

The new module stats adds the ability to measure the time it takes to parse and index the symbol tables for each module, and reports modules statistics in the output of "statistics dump" along with the path, UUID and triple of the module. The time it takes to parse and index the symbol tables are also aggregated into new top level key/value pairs at the target level.

Diff Detail

Event Timeline

clayborg created this revision.Oct 21 2021, 4:05 PM
clayborg requested review of this revision.Oct 21 2021, 4:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 21 2021, 4:05 PM

The new output looks like:

(lldb) statistics dump
{
  "expressionEvaluation": {
    "failures": 0,
    "successes": 0
  },
  "firstStopTime": 0.35030829200000002,
  "frameVariable": {
    "failures": 0,
    "successes": 0
  },
  "launchOrAttachTime": 0.32964795800000002,
  "modules": [
    {
      "path": "/Users/gclayton/Documents/src/args/build/a.out",
      "symbolTableIndexTime": 1.375e-05,
      "symbolTableParseTime": 0.000191375,
      "triple": "arm64-apple-macosx11.0.0",
      "uuid": "B7F60DCD-68BF-3F2D-AE27-85253725636A"
    },
    {
      "path": "/usr/lib/dyld",
      "symbolTableIndexTime": 0.011123958999999999,
      "symbolTableParseTime": 0.0067836249999999997,
      "triple": "arm64e-apple-macosx11.6.0",
      "uuid": "38657979-1ABE-3C9A-BF64-EF3B746216AB"
    },
    {
      "path": "/usr/lib/libc++.1.dylib",
      "symbolTableIndexTime": 0.014424457999999999,
      "symbolTableParseTime": 0.0095513330000000004,
      "triple": "arm64e-apple-macosx11.6.0",
      "uuid": "BED05B96-CCAE-365A-B3F5-F8D72F5E77E1"
    },
    ...
  ],
  "targetCreateTime": 0.16556795899999999,
  "totalSymbolTableIndexTime": 0.047531957999999992,
  "totalSymbolTableParseTime": 0.073819333000000015
}
wallace accepted this revision.Oct 22 2021, 2:11 PM

looks good, does anyone else have any objections?

This revision is now accepted and ready to land.Oct 22 2021, 2:11 PM
JDevlieghere added inline comments.Oct 22 2021, 2:49 PM
lldb/include/lldb/Target/Statistics.h
106

As modules can span (and survive) targets, I'm confused this is part of the target stats?

clayborg added inline comments.Oct 22 2021, 4:28 PM
lldb/include/lldb/Target/Statistics.h
106

In order to create performance tests that can track and identify regressions, and because modules have no backlink to the lldb_private::Target object that is eventually might get added to, we can't get any signal in how long a target takes to load up and parse the modules (and soon debug info) that the target contains without asking each module to report the time it took the parse/index itself.

The stats of how long it took to load the module in the first place is very important information for tracking regressions. Not sure if there is a better way to do this.

We could iterate over all of the modules from the global module list, but I do like the idea of adding up the contributions for a given target by iterating over its current modules. But if you have more than one debug session going, I would hate to report all of the modules from both targets, possibly including all of the .o files from DWARF in .o files and then try to make any sense of the information.

One idea would be to emit stats on all modules in a top level list like "modules", which would contain all of the stats for all modules loaded into LLDB in the global module list, and then have a "targetModules" which could just be an array of UUID values that refer to the modules in the global "modules" list of dicts? But this still doesn't change the fact that aggregating the values at the target level would be off.

I am open to ideas on how to report this better.

After reading Jonas' comment about the module information being at the wrong level, I came up with a solution I think will work well. Instead of "statistics dump" emitting a dictionary for a single target when --all isn't specified, and then emitting a dictionary with a top level "targets" array when it is, we can always emit the top level dictionary that is equivalent to the DebuggerStats object. This top level dictionary will contain "modules" which is the global list of all modules that is currently loaded into LLDB, including all .o files from DWARF in .o, "targets" which is a list of target JSON stats, and then the total symbol table index/parse times are aggregated at this top level instead of in the target. The actual output now looks like:

{
  "modules": [
    {
      "identifier": 5837856648,
      "path": "/Users/gclayton/Documents/src/args/build/a.out",
      "symbolTableIndexTime": 1.4625e-05,
      "symbolTableParseTime": 0.00064383300000000001,
      "triple": "arm64-apple-macosx11.0.0",
      "uuid": "B7F60DCD-68BF-3F2D-AE27-85253725636A"
    },
    {
      "identifier": 5837128360,
      "path": "/usr/lib/dyld",
      "symbolTableIndexTime": 0.011349625,
      "symbolTableParseTime": 0.0066760420000000001,
      "triple": "arm64e-apple-macosx11.6.0",
      "uuid": "38657979-1ABE-3C9A-BF64-EF3B746216AB"
    },
  ],
  "targets": [
    {
      "expressionEvaluation": {
        "failures": 0,
        "successes": 0
      },
      "firstStopTime": 0.35298558299999999,
      "frameVariable": {
        "failures": 0,
        "successes": 0
      },
      "launchOrAttachTime": 0.33194058300000001,
      "moduleIdentifiers": [
        5837856648,
        5837128360
      ],
      "targetCreateTime": 0.16910445800000001
    }
  ],
  "totalSymbolTableIndexTime": 0.048417122000000014,
  "totalSymbolTableParseTime": 0.076428670000000004
}

Then each "target contains a "moduleIdentifiers" array, that identify the modules that are currently part of that target. Each module JSON now has a new "identifier" that can be matched up. We can't use the "UUID" from the module since not all modules have UUIDs.

Jonas, does this alleviate your concerns and fix the representation for the statistics?

clayborg updated this revision to Diff 381693.Oct 22 2021, 6:21 PM

Updated "statistics dump" command to always outout a dictionary that represents DebuggerStats. This means a top level dictionary will contain "modules", a list of all modules from the global module list, and "targets" which will be one target, or multiple if "--all" was supplied. With "modules" at the top level, it represents how modules are represented in LLDB better having them at the DebuggerStats level. Then target dictionaries can refer to these modules with "identifiers" which are unique integers.

Jonas, let me know if this new layout works for you! I updated the tests for the new output format.

this seems like a better solution. I like it

JDevlieghere accepted this revision.Oct 25 2021, 9:16 AM

Can we use the UUID as the module identifier (at least in the JSON)? Other than that, this LGTM.

Can we use the UUID as the module identifier (at least in the JSON)? Other than that, this LGTM.

Some binaries don't have UUID values and I didn't want to use a UUID when it was there and something else when the UUID wasn't there. The main example is all .o files don't have UUID values and many linux binaries may or may not have them.

This revision was automatically updated to reflect the committed changes.