This is an archive of the discontinued LLVM Phabricator instance.

[Command] Implement `stats`
ClosedPublic

Authored by davide on Apr 11 2018, 5:14 PM.

Details

Summary

This allows us to collect useful metrics about lldb debugging sessions.

I thought that an example would be better than a thousand words:

Process 19705 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step in
    frame #0: 0x0000000100000fb4 blah`main at blah.c:3
   1    int main(void) {
   2      int a = 6;
-> 3      return 0;
   4    }
(lldb) stats enable
(lldb) frame var a
(int) a = 6
(lldb) expr a
(int) $1 = 6
(lldb) stats disable
### Start STATISTICS dump ###
Number of expr evaluation successes : 1
Number of expr evaluation failures : 0
Number of frame var successes : 1
Number of frame var failures : 0
### End STATISTICS dump ###

Future improvements might include:

  1. Passing a file, or implementing categories. The way this patch has been implements is generic enough to allow this to be extended easily without breaking the grammar
  2. Adding an SBAPI and Python API for use in scripts.

Thanks to Jim Ingham for discussing the design with me.

Diff Detail

Repository
rL LLVM

Event Timeline

davide created this revision.Apr 11 2018, 5:14 PM

My two cents,

Why print the '# Start STATISTICS dump #' header/footer when printing the results? I don't think we demarcate result output like that anywhere else in lldb. I don't think it adds anything for the user.

I would probably name the command statistics and have a built in "stats" alias.

Should statistics disable print the results, or should there be a 'statistics status' or 'statistics show' or 'statistics dump' command to show the statistics results accumulated so far?

Ah, no you couldn't set up a command alias like that. Still, if the full name was statistics, you could type 'stat' and it would match. 'stats' wouldn't, though.

I do think decoupling the disabling action from the dumping action would be an improvement. You may want to dump the current statistics multiple times during a sequence of commands, without disabling / re-enabling every time. And dropping the header/footer from the results.

Ah, no you couldn't set up a command alias like that. Still, if the full name was statistics, you could type 'stat' and it would match. 'stats' wouldn't, though.

I do think decoupling the disabling action from the dumping action would be an improvement. You may want to dump the current statistics multiple times during a sequence of commands, without disabling / re-enabling every time. And dropping the header/footer from the results.

OK! I don't mind adding another dump subcommand. I think it makes sense.
I can also disable the footer/header, I largely got that format from opt -time-passes. I don't feel strongly about it, so I'll let everybody else decide :)

I don't see myself using this command personally, but I agree that splitting the "disable stats collection" and "dump accumulated stats" into two actions seems a better choice (maybe disable could do a final auto-dump though). I also think the header/footer is not needed for a primarily interactive tool. If we have a command that dumps so much data that the user cannot tell where the output starts and ends then we're doing something wrong (PS: I wish we had automatic pagination for the backtrace command).

I am also wondering whether statistics should be a top-level command, or should we put it under "log". We already have "log timers enable/disable/...", which are doing something similar to this, so it seems like a good idea to move these two things closer together.

lldb/packages/Python/lldbsuite/test/functionalities/stats/main.c
12 ↗(On Diff #142101)

Maybe test for failure as well (I guess trying to print an non-existing variable should do the trick)?

lldb/source/Commands/CommandObjectStats.cpp
31–35 ↗(On Diff #142101)

I think we have some way (should involve eCommandRequiresTarget) to avoid the need for checking this.

43–46 ↗(On Diff #142101)

Are you planning to have some kind of decentralized method of registering statistics?
If not, then I don't see a need for explicitly registering the statistic kinds here, since the single source of truth about the available kinds can be StatisticKind enum, and Target can just get the list from there automatically when need (when stats are enabled?)

This way we could even simplify the code by avoiding the "statistic is not registered but someone still called IncrementStats" state altogether) and the stats map could be a simple array with NumStatistics elements.

I prefer having it as a top level command rather than part of log. If you think about it LLVM does the same distinction and it worked quite well in practice.
We plan to collect more metrics to the command so I'd very much like to have this living as a separate entity.

lldb/packages/Python/lldbsuite/test/functionalities/stats/main.c
12 ↗(On Diff #142101)

Testing for failure here is a little tricky because the expression parser reports a failure only when it fails internally (and not when succeeds but returns an error because clangAST can't find the variable). Eventually we could make it more fine grained but for now I just would like to get the number of failures resolving variable due to expression parsing logic failing (and not because typos).

lldb/source/Commands/CommandObjectStats.cpp
31–35 ↗(On Diff #142101)

I'll check this out, thanks!

43–46 ↗(On Diff #142101)

No. I originally thought to make it that way but I have to say that maybe it's easier to have a centralized mechanism. I'm still unsure whether this should be a vector or a map, I'll think about it more.

I prefer having it as a top level command rather than part of log. If you think about it LLVM does the same distinction and it worked quite well in practice.
We plan to collect more metrics to the command so I'd very much like to have this living as a separate entity.

I agree that "logging" and "statistics" sound like two different entities. Though in that case, maybe it's the "log timers" command that is misplaced. "performance timers" sound more like "statistics" than "logging" to me. Maybe eventually these two things should be merged even? I don't know what's the right solution here, and I don't care much about it even, though I would be sad if the choice we make now prevents some future unification due to backwards compatibility issues.

lldb/packages/Python/lldbsuite/test/functionalities/stats/main.c
12 ↗(On Diff #142101)

Makes sense, thanks for the explanation.

lldb/source/Commands/CommandObjectStats.cpp
43–46 ↗(On Diff #142101)

llvm manual recommends http://llvm.org/docs/ProgrammersManual.html#map-like-containers-std-map-densemap-etc vector as a first choice data structure for a map. :P

I'm under the impression that we should either merge log timers with stats or just remove log timers altogether and start from scratch.
From what I hear from Jim, it was really useful for a few people, so maybe a fresh start would be a better way of handling things. Thanks!

lldb/source/Commands/CommandObjectStats.cpp
43–46 ↗(On Diff #142101)

Allright, you convinced me :)

I really like this idea! It will be very helpful for @sas and I. I'd like to +1 creating a separate stats dump subcommand instead of dumping stats on stats disable.

lldb/source/Commands/CommandObjectStats.cpp
37 ↗(On Diff #142101)

nit: You can drop the == true

88 ↗(On Diff #142101)

nit:

if (!target->GetCollectingStats()) {

Timers seemed like they would be really useful for collection of data about operations in lldb, but for most things I think they end up being hard to use because actual wall-clock time is so variable from run to run, and especially for disk and inter-process heavy operations, which lldb tends to be. I'm not sure we should give up on timers, sometimes you want to know "how many times did I do X" and other times "how long did X take" and the Timers are more useful for this than just a sample or wall-clock times because you can find out how long it took "in the Dwarf parser", etc.

But in many cases our performance is more driven by unnecessary lookups, and that sort of error. For that sort of error it will be much more useful to say "given program A and expression B, how many DWARF DIE lookups did I do" than "how long did I spent wall-clock in the DWARF parser..." When the former goes from 20 to 2000, that will be a much clearer symbol that we probably introduced a performance regression.

davide updated this revision to Diff 142301.Apr 12 2018, 4:47 PM

Timers seemed like they would be really useful for collection of data about operations in lldb, but for most things I think they end up being hard to use because actual wall-clock time is so variable from run to run, and especially for disk and inter-process heavy operations, which lldb tends to be. I'm not sure we should give up on timers, sometimes you want to know "how many times did I do X" and other times "how long did X take" and the Timers are more useful for this than just a sample or wall-clock times because you can find out how long it took "in the Dwarf parser", etc.

But in many cases our performance is more driven by unnecessary lookups, and that sort of error. For that sort of error it will be much more useful to say "given program A and expression B, how many DWARF DIE lookups did I do" than "how long did I spent wall-clock in the DWARF parser..." When the former goes from 20 to 2000, that will be a much clearer symbol that we probably introduced a performance regression.

Yes, I agree. I addressed all the comments.
Jim, can you please take a look? Thanks!

davide added inline comments.Apr 12 2018, 4:54 PM
lldb/source/Commands/CommandObjectStats.cpp
37 ↗(On Diff #142101)

Thanks, I'll fix these!

I made a few comments all to the same effect, you can use CommandObject::GetSelectedOrDummyTarget to get the target that the command is operating on, or you can get is straight from m_exe_ctx if you want to make sure that you are getting a real target. In the case of stats, I think running with the dummy target is fine, since for instance you can run expressions in the dummy target...

Other than that I think this is a good place to start from.

lldb/source/Commands/CommandObjectExpression.cpp
626 ↗(On Diff #142301)

This is what CommandObject::GetSelectedOrDummy target is for.

lldb/source/Commands/CommandObjectStats.cpp
30–31 ↗(On Diff #142301)

By the time the DoExecute is run, m_exe_ctx has the execution context, so you can get the target from there, or use GetSelectedOrDummyTarget. You can run expressions in the dummy target so there's no reason not to collect stats from it.

davide added inline comments.Apr 12 2018, 5:41 PM
lldb/source/Commands/CommandObjectExpression.cpp
626 ↗(On Diff #142301)

I just moved existing code, but I agree it's better using this API.

This revision was not accepted when it landed; it landed in state Needs Review.Apr 13 2018, 11:07 AM
This revision was automatically updated to reflect the committed changes.

I really like this idea! It will be very helpful for @sas and I. I'd like to +1 creating a separate stats dump subcommand instead of dumping stats on stats disable.

Thanks! If you has ideas or want to add stuffs, don't hesitate to start a discussion and/or submit a review!