This is an archive of the discontinued LLVM Phabricator instance.

[lldb][NFC] Use an enum instead of chars when handling options [WIP]
Changes PlannedPublic

Authored by teemperor on Jul 29 2019, 2:41 AM.

Details

Reviewers
JDevlieghere
Summary

Currently in LLDB we handle options like this:

switch(short_option) {
case 'g': m_force = true; break;
case 'f': m_global = true; supportGlobal(); break;
default:
  error.SetErrorStringWithFormat("unrecognized options '%c'",
                                       short_option);
  break;

This format has a two problems:

  1. If we don't handle an option in this switch statement (because of a typo, a changed short-option

name, or because someone just forgot to implement it), we only find out when we have a test or user
that notices we get an error when using this option. There is no compile-time verification of this.

  1. It's not very easy to read unless you know all the short options for all commands.

This patch makes our tablegen emit an enum that represents the options, which changes the code above to this (using SettingsSet as an example):

auto opt = toSettingsSetEnum(short_option, error);
if (!opt)
  return error;

switch (opt) {
case SettingsSet::Global: m_force = true; break;
case SettingsSet::Force: m_global = true; supportGlobal(); break;
// No default with error handling, already handled in toSettingsSetEnum.
// If you don't implement an option, this will trigger a compiler warning for unhandled enum value.
}
NOTE: This is just a dummy patch to get some feedback if people are for some reason opposed to this change (which would save me from converting all our switch-cases to the new format). I only changed the code for settings set to demonstrate the change.

Diff Detail

Event Timeline

teemperor created this revision.Jul 29 2019, 2:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 29 2019, 2:41 AM

For the sake of completeness, that's how Clang would warn about unimplemented options:

llvm-project/lldb/source/Commands/CommandObjectSettings.cpp:102:15: warning: enumeration value 'Global' not handled in switch [-Wswitch]
      switch (*opt) {
              ^
llvm-project/lldb/source/Commands/CommandObjectSettings.cpp:102:15: note: add missing switch cases
      switch (*opt) {
              ^
labath added a subscriber: labath.Jul 29 2019, 7:58 AM

How about codegenning the entire implementation of SetOptionValue? That way the user won't have to write any switch statements at all. Ideally, the option-setting code would be something like:

void (Status?, Error?) SetOptionForce(StringRef arg, ExecutionContext *ctx) { m_force = true; }
void (Status?, Error?) SetOptionGlobal(StringRef arg, ExecutionContext *ctx) { m_global = true; }

#include "The_thing_which_generates_SetOptionValue.inc"

The generated implementation of SetOptionValue could be the same as the current one, except that it calls into these user-specified functions instead of setting the values itself

It worries me a little bit that we are making it harder and harder to figure out "where does the option for "-t" get stored once this CommandObject's options have been parsed. Can you show the steps I would have to go through to get from "-f" to OptionEnumSettingsSet::Force or whatever.

clayborg added a subscriber: clayborg.EditedJul 29 2019, 12:59 PM

It worries me a little bit that we are making it harder and harder to figure out "where does the option for "-t" get stored once this CommandObject's options have been parsed. Can you show the steps I would have to go through to get from "-f" to OptionEnumSettingsSet::Force or whatever.

Yeah, that has been my experience with table gen stuff. Does the table gen stuff generate code that exists in the build folder, like headers and/or C++ code? Navigating using an IDE often fails on these because the build system doesn't know about it directly. Any way to generate code in the source tree from the .inc files that we check in? Then if anyone modifies the .inc file it would regenerate (during the normal build process which could depend on the .inc _and_ on the .h or .cpp file that gets generated) the .h and .cpp file in the source tree and they would show up as modifications.

JDevlieghere added a comment.EditedJul 29 2019, 1:29 PM

It worries me a little bit that we are making it harder and harder to figure out "where does the option for "-t" get stored once this CommandObject's options have been parsed. Can you show the steps I would have to go through to get from "-f" to OptionEnumSettingsSet::Force or whatever.

TableGen's goal is "... to help a human develop and maintain records of domain-specific information." I think having a bit of code generation is nice, as longs as things don't become magical. The current patch is fine I think and I think Pavel's suggestion sounds good, as long as it's well documented. It's also the reason I suggest using the record names, instead of converting the name to CamelCase, so that you can at least grep for them.

Yeah, that has been my experience with table gen stuff. Does the table gen stuff generate code that exists in the build folder, like headers and/or C++ code? Navigating using an IDE often fails on these because the build system doesn't know about it directly. Any way to generate code in the source tree from the .inc files that we check in? Then if anyone modifies the .inc file it would regenerate (during the normal build process which could depend on the .inc _and_ on the .h or .cpp file that gets generated) the .h and .cpp file in the source tree and they would show up as modifications.

I really wouldn't want the generated code to be checked-in. The whole idea is that you don't really care what the (generated) table looks like, you only care about its content, which is defined in a human-readable way in the .td file. For me, TabelGen is about records rather than generating arbitrary code. Including some boiler plate is good (like the array definitions), but the surrounding code needs to be understandable, without having to look at the TableGen backend or generated code.

lldb/utils/TableGen/LLDBOptionDefEmitter.cpp
211

Can we use the option name instead, like I did for the properties? Or would that cause conflicts?

The current patch is fine I think and I think Pavel's suggestion sounds good, as long as it's well documented.

We could make that less magical by including the name of the "setter" method in the tablegen file instead of relying on some convention for deriving it from the setting name. Ideally, in the longer term, I think we shouldn't even need to generate the "SetOptionValue" method, as I think there should be a way to write a generic, non-generated piece of code which loops over available settings and calls the appropriate setter method. But that may be easier to achieve once there is a single source of truth for all "SetOptionValue" methods (i.e., tablegen).

The thing I don't like about the enum approach is that it adds another layer to the option-setting code, whereas I think that the main problem is that the option-setting code has one too many layers already.

Also, -1 to checking in generated code.

teemperor marked an inline comment as done.Aug 1 2019, 2:33 AM

It worries me a little bit that we are making it harder and harder to figure out "where does the option for "-t" get stored once this CommandObject's options have been parsed. Can you show the steps I would have to go through to get from "-f" to OptionEnumSettingsSet::Force or whatever.

That's actually just toOptionEnumSettingsSet('-f', error). I want to get rid of the whole generated method by just placing the enum value in the OptionDefinition struct (which requires some refactoring, but should be doable in the long-term).

The thing I don't like about the enum approach is that it adds another layer to the option-setting code, whereas I think that the main problem is that the option-setting code has one too many layers already.

Agreed.

How about codegenning the entire implementation of SetOptionValue? That way the user won't have to write any switch statements at all. Ideally, the option-setting code would be something like:

void (Status?, Error?) SetOptionForce(StringRef arg, ExecutionContext *ctx) { m_force = true; }
void (Status?, Error?) SetOptionGlobal(StringRef arg, ExecutionContext *ctx) { m_global = true; }

#include "The_thing_which_generates_SetOptionValue.inc"

The generated implementation of SetOptionValue could be the same as the current one, except that it calls into these user-specified functions instead of setting the values itself

This seems like a lot of boilerplate when we have to write 300+ one-statement methods for assigning options. Also I would prefer to not use tablegen for generating executable code if possible because that is just hard to read (the function we generate here is already something I only consider as a temporary workaround).

lldb/utils/TableGen/LLDBOptionDefEmitter.cpp
211

If you mean if we can just call it OptionEnumSet instead of OptionEnumSettingsSet, then I assume that could cause conflicts if we implement multiple smaller commands in the same file (which we currently do).

labath added a comment.Aug 1 2019, 3:37 AM

How about codegenning the entire implementation of SetOptionValue? That way the user won't have to write any switch statements at all. Ideally, the option-setting code would be something like:

void (Status?, Error?) SetOptionForce(StringRef arg, ExecutionContext *ctx) { m_force = true; }
void (Status?, Error?) SetOptionGlobal(StringRef arg, ExecutionContext *ctx) { m_global = true; }

#include "The_thing_which_generates_SetOptionValue.inc"

The generated implementation of SetOptionValue could be the same as the current one, except that it calls into these user-specified functions instead of setting the values itself

This seems like a lot of boilerplate when we have to write 300+ one-statement methods for assigning options.

If they would really be just one-liners, then this might still be an improvement over the current solution, because now you need at least three lines for each option:

case eFoo:
  foo = bar;
  break;

And this doesn't include the boilerplate around the switch statement -- i.e., checking the result of toOptionEnumSettingsSet -- it's not even clear to me under which circumstances can toOptionEnumSettingsSet` fail to return a value. Shouldn't the caller verify that it is calling us with the correct argument? If we're able to avoid that and just have the user code be:

switch(toOptionEnumSettingsSet(???)) {
...
}

then I would find the enum solution fine. However, right now, it seems more complicated than it ought to be..

Additionally, if setting the option is really that simple, then we could have tablegen generate that too (by just giving it a variable name to set), possibly with the option to fall back to a function for more complex options (which is better handled in a separate function anyway).

Also I would prefer to not use tablegen for generating executable code if possible because that is just hard to read (the function we generate here is already something I only consider as a temporary workaround).

I agree, but on the other hand, temporary workarounds have a habit of becoming permanent, so I'd like to avoid introducing a sub-optimal solution, if there's a better way to do that available..

How about codegenning the entire implementation of SetOptionValue? That way the user won't have to write any switch statements at all. Ideally, the option-setting code would be something like:

void (Status?, Error?) SetOptionForce(StringRef arg, ExecutionContext *ctx) { m_force = true; }
void (Status?, Error?) SetOptionGlobal(StringRef arg, ExecutionContext *ctx) { m_global = true; }

#include "The_thing_which_generates_SetOptionValue.inc"

The generated implementation of SetOptionValue could be the same as the current one, except that it calls into these user-specified functions instead of setting the values itself

This seems like a lot of boilerplate when we have to write 300+ one-statement methods for assigning options.

If they would really be just one-liners, then this might still be an improvement over the current solution, because now you need at least three lines for each option:

case eFoo:
  foo = bar;
  break;

The setter function still seems more verbose than three short lines, but I don't have a strong opinion about that so setters are fine. Still, if we generate some function calling the appropriate setters, we will end up with control flow going through that function which I would like to avoid. Grepping for SetOptionGlobal for example would not yield any call location when people try to understand who calls these methods (which is one of the things that @jingham says he's worried about, correct me if I'm wrong).

And this doesn't include the boilerplate around the switch statement -- i.e., checking the result of toOptionEnumSettingsSet -- it's not even clear to me under which circumstances can toOptionEnumSettingsSet` fail to return a value. Shouldn't the caller verify that it is calling us with the correct argument? If we're able to avoid that and just have the user code be:

switch(toOptionEnumSettingsSet(???)) {
...
}

It fails on unrecognized options (which is currently handled in every single command separately). If we can lift this up (I hope every command actually handles invalid options in the same name) then the switch-syntax should be possible.

then I would find the enum solution fine. However, right now, it seems more complicated than it ought to be..

Additionally, if setting the option is really that simple, then we could have tablegen generate that too (by just giving it a variable name to set), possibly with the option to fall back to a function for more complex options (which is better handled in a separate function anyway).

Also I would prefer to not use tablegen for generating executable code if possible because that is just hard to read (the function we generate here is already something I only consider as a temporary workaround).

I agree, but on the other hand, temporary workarounds have a habit of becoming permanent, so I'd like to avoid introducing a sub-optimal solution, if there's a better way to do that available..

I think I phrased that badly. I meant that it's a workaround until the patch is ready to land, not the usual temporary fix(TM) :)

labath added a comment.Aug 1 2019, 4:25 AM

The setter function still seems more verbose than three short lines, but I don't have a strong opinion about that so setters are fine. Still, if we generate some function calling the appropriate setters, we will end up with control flow going through that function which I would like to avoid. Grepping for SetOptionGlobal for example would not yield any call location when people try to understand who calls these methods (which is one of the things that @jingham says he's worried about, correct me if I'm wrong).

I think we understand @jingham's concerns the same way. My answer to that would be that you'd find "SetOptionGlobal" somewhere in the tablegen file, which should be enough to signal that something tablegen-y is going on..

And this doesn't include the boilerplate around the switch statement -- i.e., checking the result of toOptionEnumSettingsSet -- it's not even clear to me under which circumstances can toOptionEnumSettingsSet` fail to return a value. Shouldn't the caller verify that it is calling us with the correct argument? If we're able to avoid that and just have the user code be:

switch(toOptionEnumSettingsSet(???)) {
...
}

It fails on unrecognized options (which is currently handled in every single command separately). If we can lift this up (I hope every command actually handles invalid options in the same name) then the switch-syntax should be possible.

Hmm.. how hard would it be to achieve that? If we can make that happen, then I think this might be the most reasonable compromise...

In fact, how sure are you that the unrecognised options are not already handled somewhere higher up? Doesn't all parsing already go through Options::Parse, which already checks for '?' (and other error-ish results from getopt). After that point, I'd hope that we can assume that the returned value does indeed correspond to some entry in our option vector...

I agree, but on the other hand, temporary workarounds have a habit of becoming permanent, so I'd like to avoid introducing a sub-optimal solution, if there's a better way to do that available..

I think I phrased that badly. I meant that it's a workaround until the patch is ready to land, not the usual temporary fix(TM) :)

Ah, ok, thanks for explaining that. I guess that would avoid generating code completely, but I think we should still try find a way to streamline the code that the user writes.

teemperor planned changes to this revision.Sep 2 2019, 1:11 AM

(mark as WIP)