This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Make callback-based formatter matching available from the CLI.
ClosedPublic

Authored by jgorbe on Oct 28 2022, 5:15 PM.

Details

Summary

This change adds a --recognizer-function (-R) to type summary add
and type synth add that allows users to specify that the names in
the command are not type names but python function names.

It also adds an example to lldb/examples, and a section in the data
formatters documentation on how to use recognizer functions.

Diff Detail

Event Timeline

jgorbe created this revision.Oct 28 2022, 5:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 28 2022, 5:15 PM
jgorbe requested review of this revision.Oct 28 2022, 5:15 PM

Seems fine to me.

lldb/source/Commands/Options.td
1238

There's a little-known trick to avoid a short version of an option by using a non-printable character here (like "\\x01", see disassemble --force). Given that we don't expect this to be widely used (and that R could also mean Regex), maybe this is the time to use it?

jingham requested changes to this revision.Oct 31 2022, 9:56 AM

This looks good. The one detail that remains is error handling for the command. You can't provide the --regex option and this matcher at the same time, and it is also an error to have both this matcher function and any arguments in the argument list. We handle the former by judicious use of "option groups" (see the Group(N) entries in the Option.td). The latter you will have to do by hand, checking that if match_type is eFormatterMatchCallback, then if there are any arguments, you should raise an error.

This revision now requires changes to proceed.Oct 31 2022, 9:56 AM

You can't provide the --regex option and this matcher at the same time

Thanks for spotting this. I'll get to it.

it is also an error to have both this matcher function and any arguments in the argument list

Not with the code as it is. I've made the new flag have similar semantics to --regex. If I'm not mistaken, with --regex, all the arguments in the argument list are treated as regular expressions instead of type names, and with --recognizer-function all the arguments in the argument list will be treated as recognizer functions instead.

Would you prefer having the --recognizer-function flag take an argument instead and error out if the argument list is not empty? On one hand I can see the argument for that being somewhat more intuitive. On the other hand, making the new flag be "special" (in the sense that it doesn't behave like --regex) makes the command set a little less consistent which IMO is less intuitive. I'm happy to go either way if you have a strong opinion about the way it should work.

I'm looking at the option of using a non-printable character for the short flag, and at the same time make --regex and --recognizer-function mutually exclusive using option groups. One problem I see is that the command help gets pretty confusing. Using a non-printable character as the short option name makes the option completely disappear from the "Command Options Usage" summary part of the help.

For example, for disassemble the --force option is defined with Groups[2,3,4,5,7] but there's no indication in the usage summary that --force is only usable in some of the groups:

Command Options Usage:
  disassemble [-bkmr] -s <address-expression> [-A <arch>] [-C <num-lines>] [-e <address-expression>] [-F <disassembly-flavor>] [-P <plugin>]
  disassemble [-bkmr] -s <address-expression> [-A <arch>] [-C <num-lines>] [-c <num-lines>] [-F <disassembly-flavor>] [-P <plugin>]
  disassemble [-bkmr] [-A <arch>] [-C <num-lines>] [-c <num-lines>] [-F <disassembly-flavor>] [-n <function-name>] [-P <plugin>]
  disassemble [-bfkmr] [-A <arch>] [-C <num-lines>] [-c <num-lines>] [-F <disassembly-flavor>] [-P <plugin>]
  disassemble [-bkmpr] [-A <arch>] [-C <num-lines>] [-c <num-lines>] [-F <disassembly-flavor>] [-P <plugin>]
  disassemble [-bklmr] [-A <arch>] [-C <num-lines>] [-F <disassembly-flavor>] [-P <plugin>]
  disassemble [-bkmr] [-a <address-expression>] [-A <arch>] [-C <num-lines>] [-c <num-lines>] [-F <disassembly-flavor>] [-P <plugin>]

In the type summary add and type synth add class it would be even worse IMO, because we're now using the groups to enforce two mutually exclusive flags. --regex (-x) is currently accepted everywhere:

type summary add -c [-Oprvx] [-C <boolean>] [-w <name>] <name> [<name> [...]]
type summary add [-ehprvx] -s <summary-string> [-C <boolean>] [-w <name>] [-n <name>] <name> [<name> [...]]
type summary add [-Pehprvx] [-C <boolean>] [-w <name>] [-n <name>] [-F <python-function>] [-o <python-script>] <name> [<name> [...]]

so we need to double the number of groups. Half would have -x and the other half would have -R. But if we don't have a short flag, then we end up with something like this:

type summary add -c [-Oprvx] [-C <boolean>] [-w <name>] <name> [<name> [...]]
type summary add [-ehprvx] -s <summary-string> [-C <boolean>] [-w <name>] [-n <name>] <name> [<name> [...]]
type summary add [-Pehprvx] [-C <boolean>] [-w <name>] [-n <name>] [-F <python-function>] [-o <python-script>] <name> [<name> [...]]
type summary add -c [-Oprv] [-C <boolean>] [-w <name>] <name> [<name> [...]]
type summary add [-ehprv] -s <summary-string> [-C <boolean>] [-w <name>] [-n <name>] <name> [<name> [...]]
type summary add [-Pehprv] [-C <boolean>] [-w <name>] [-n <name>] [-F <python-function>] [-o <python-script>] <name> [<name> [...]]

... where the only visible difference between groups 1-3 and 4-6 is that they don't have the optional -x.

I think a reasonable compromise would be to remove the short -R flag, not use groups to enforce the mutual exclusion with -x, and have some ad-hoc error logic to alert the user if they specify both options at the same time. This way we don't have to duplicate the number of groups just for the sake of an option that will presumably be only very rarely used. I'll give that a try.

jgorbe updated this revision to Diff 472185.Oct 31 2022, 4:56 PM
  • Removed -R short option to avoid confusion with regex. Updated doc, test case, and example.
  • Added some logic to make --recognizer-function and --regex mutually exclusive. Specifying both now results in an error being reported to the user.
  • Rebased on top of current HEAD.
labath added a comment.Nov 2 2022, 7:11 AM

I'm looking at the option of using a non-printable character for the short flag, and at the same time make --regex and --recognizer-function mutually exclusive using option groups. One problem I see is that the command help gets pretty confusing. Using a non-printable character as the short option name makes the option completely disappear from the "Command Options Usage" summary part of the help.

For example, for disassemble the --force option is defined with Groups[2,3,4,5,7] but there's no indication in the usage summary that --force is only usable in some of the groups:

Hmm.. you're right, I didn't know that. I'll leave it up to you, but personally I wouldn't consider it a blocker, as I think that the way we print our option group help needs an overhaul anyway. I mean, the idea is kinda nice, but after a certain number of options (and groups) it just becomes ridiculous. For example, I'd like to meet the person who can make sense of this:

Command Options Usage:
  breakpoint set [-DHd] -l <linenum> [-G <boolean>] [-C <command>] [-c <expr>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-R <address>] [-N <breakpoint-name>] [-u <column>] [-f <filename>] [-m <boolean>] [-s <shlib-name>] [-K <boolean>]
  breakpoint set [-DHd] -a <address-expression> [-G <boolean>] [-C <command>] [-c <expr>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-N <breakpoint-name>] [-s <shlib-name>]
  breakpoint set [-DHd] -n <function-name> [-G <boolean>] [-C <command>] [-c <expr>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-R <address>] [-N <breakpoint-name>] [-f <filename>] [-L <source-language>] [-s <shlib-name>] [-K <boolean>]
  breakpoint set [-DHd] -F <fullname> [-G <boolean>] [-C <command>] [-c <expr>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-R <address>] [-N <breakpoint-name>] [-f <filename>] [-L <source-language>] [-s <shlib-name>] [-K <boolean>]
  breakpoint set [-DHd] -S <selector> [-G <boolean>] [-C <command>] [-c <expr>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-R <address>] [-N <breakpoint-name>] [-f <filename>] [-L <source-language>] [-s <shlib-name>] [-K <boolean>]
  breakpoint set [-DHd] -M <method> [-G <boolean>] [-C <command>] [-c <expr>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-R <address>] [-N <breakpoint-name>] [-f <filename>] [-L <source-language>] [-s <shlib-name>] [-K <boolean>]
  breakpoint set [-DHd] -r <regular-expression> [-G <boolean>] [-C <command>] [-c <expr>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-R <address>] [-N <breakpoint-name>] [-f <filename>] [-L <source-language>] [-s <shlib-name>] [-K <boolean>]
  breakpoint set [-DHd] -b <function-name> [-G <boolean>] [-C <command>] [-c <expr>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-R <address>] [-N <breakpoint-name>] [-f <filename>] [-L <source-language>] [-s <shlib-name>] [-K <boolean>]
  breakpoint set [-ADHd] -p <regular-expression> [-G <boolean>] [-C <command>] [-c <expr>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-N <breakpoint-name>] [-f <filename>] [-m <boolean>] [-s <shlib-name>] [-X <function-name>]
  breakpoint set [-DHd] -E <source-language> [-G <boolean>] [-C <command>] [-c <expr>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-N <breakpoint-name>] [-h <boolean>] [-w <boolean>]
  breakpoint set [-DHd] -P <python-class> [-k <none>] [-v <none>] [-G <boolean>] [-C <command>] [-c <expr>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-N <breakpoint-name>] [-f <filename>] [-s <shlib-name>]
  breakpoint set [-DHd] -y <linespec> [-G <boolean>] [-C <command>] [-c <expr>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-R <address>] [-N <breakpoint-name>] [-m <boolean>] [-s <shlib-name>] [-K <boolean>]
jingham accepted this revision.Nov 8 2022, 2:01 PM

It is a bit odd to follow the what the help summary says is a valid option set and then get an error. But this is a corner case, and since you say explicitly that you can't provide both that's probably good enough. Would you mind changing "Incompatible with" to: "Cannot be specified at the same time as". Incompatible isn't precise - does it mean you can't use them both, or that if you do something weird will happen?

Other than that, this is fine. Thanks for working on this.

This revision is now accepted and ready to land.Nov 8 2022, 2:01 PM

I'm looking at the option of using a non-printable character for the short flag, and at the same time make --regex and --recognizer-function mutually exclusive using option groups. One problem I see is that the command help gets pretty confusing. Using a non-printable character as the short option name makes the option completely disappear from the "Command Options Usage" summary part of the help.

For example, for disassemble the --force option is defined with Groups[2,3,4,5,7] but there's no indication in the usage summary that --force is only usable in some of the groups:

Hmm.. you're right, I didn't know that. I'll leave it up to you, but personally I wouldn't consider it a blocker, as I think that the way we print our option group help needs an overhaul anyway. I mean, the idea is kinda nice, but after a certain number of options (and groups) it just becomes ridiculous. For example, I'd like to meet the person who can make sense of this:

Command Options Usage:
  breakpoint set [-DHd] -l <linenum> [-G <boolean>] [-C <command>] [-c <expr>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-R <address>] [-N <breakpoint-name>] [-u <column>] [-f <filename>] [-m <boolean>] [-s <shlib-name>] [-K <boolean>]
  breakpoint set [-DHd] -a <address-expression> [-G <boolean>] [-C <command>] [-c <expr>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-N <breakpoint-name>] [-s <shlib-name>]
  breakpoint set [-DHd] -n <function-name> [-G <boolean>] [-C <command>] [-c <expr>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-R <address>] [-N <breakpoint-name>] [-f <filename>] [-L <source-language>] [-s <shlib-name>] [-K <boolean>]
  breakpoint set [-DHd] -F <fullname> [-G <boolean>] [-C <command>] [-c <expr>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-R <address>] [-N <breakpoint-name>] [-f <filename>] [-L <source-language>] [-s <shlib-name>] [-K <boolean>]
  breakpoint set [-DHd] -S <selector> [-G <boolean>] [-C <command>] [-c <expr>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-R <address>] [-N <breakpoint-name>] [-f <filename>] [-L <source-language>] [-s <shlib-name>] [-K <boolean>]
  breakpoint set [-DHd] -M <method> [-G <boolean>] [-C <command>] [-c <expr>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-R <address>] [-N <breakpoint-name>] [-f <filename>] [-L <source-language>] [-s <shlib-name>] [-K <boolean>]
  breakpoint set [-DHd] -r <regular-expression> [-G <boolean>] [-C <command>] [-c <expr>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-R <address>] [-N <breakpoint-name>] [-f <filename>] [-L <source-language>] [-s <shlib-name>] [-K <boolean>]
  breakpoint set [-DHd] -b <function-name> [-G <boolean>] [-C <command>] [-c <expr>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-R <address>] [-N <breakpoint-name>] [-f <filename>] [-L <source-language>] [-s <shlib-name>] [-K <boolean>]
  breakpoint set [-ADHd] -p <regular-expression> [-G <boolean>] [-C <command>] [-c <expr>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-N <breakpoint-name>] [-f <filename>] [-m <boolean>] [-s <shlib-name>] [-X <function-name>]
  breakpoint set [-DHd] -E <source-language> [-G <boolean>] [-C <command>] [-c <expr>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-N <breakpoint-name>] [-h <boolean>] [-w <boolean>]
  breakpoint set [-DHd] -P <python-class> [-k <none>] [-v <none>] [-G <boolean>] [-C <command>] [-c <expr>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-N <breakpoint-name>] [-f <filename>] [-s <shlib-name>]
  breakpoint set [-DHd] -y <linespec> [-G <boolean>] [-C <command>] [-c <expr>] [-i <count>] [-o <boolean>] [-q <queue-name>] [-t <thread-id>] [-x <thread-index>] [-T <thread-name>] [-R <address>] [-N <breakpoint-name>] [-m <boolean>] [-s <shlib-name>] [-K <boolean>]

If I were starting over, I would make "break set" be a multiword command with sub-commands for all the primary specification operations (break set file, break set name etc.) We could use unique first letters for all the actual operations, so this would actually be LESS typing by one dash, and would allow for better documentation of the shared options. But I don't think we can get away with that at this point.

This display while noisy is the only way you can see all the kinds of breakpoint setting operations, so we really have to have something like this, and it should be fairly reliable... It would also be nice if help also understood options, so for instance:help break set -a should only show you the variants that include that option.

It is a bit odd to follow the what the help summary says is a valid option set and then get an error. But this is a corner case, and since you say explicitly that you can't provide both that's probably good enough. Would you mind changing "Incompatible with" to: "Cannot be specified at the same time as". Incompatible isn't precise - does it mean you can't use them both, or that if you do something weird will happen?

After following Pavel's suggestion to remove the short flag it doesn't show up at all in the help summary, so the help summary still shows a valid option set. It just lists the long flag for the new option in the detailed list after the summary, like this:

Command Options Usage:
  type summary add -c [-Oprvx] [-C <boolean>] [-w <name>] <name> [<name> [...]]
  type summary add [-ehprvx] -s <summary-string> [-C <boolean>] [-w <name>] [-n <name>] <name> [<name> [...]]
  type summary add [-Pehprvx] [-C <boolean>] [-w <name>] [-n <name>] [-F <python-function>] [-o <python-script>] <name> [<name> [...]]

       --recognizer-function
            The names in the argument list are actually the names of python functions that decide whether to use this summary for any given type. Cannot be specified at the same time as --regex (-x).

Other than that, this is fine. Thanks for working on this.

Thank you both for helping me land this feature! :)

jgorbe updated this revision to Diff 474573.Nov 10 2022, 10:32 AM
  • Clarified help text for --recognizer-function as suggested by reviewer. Instead of "Incompatible with --regex (-x)." now it says "Cannot be specified at the same time as --regex (-x)."
  • Rebased on top of current HEAD.
This revision was landed with ongoing or failed builds.Nov 10 2022, 10:34 AM
This revision was automatically updated to reflect the committed changes.