This is an archive of the discontinued LLVM Phabricator instance.


Differential D65386

[lldb][NFC] Use an enum instead of chars when handling options [WIP]
Changes Planned · Public

Authored by teemperor on Jul 29 2019, 2:41 AM.


Details

Reviewers

JDevlieghere

Summary

Currently in LLDB we handle options like this:

switch(short_option) {
case 'f': m_force = true; break;
case 'g': m_global = true; supportGlobal(); break;
default:
  error.SetErrorStringWithFormat("unrecognized options '%c'",
                                 short_option);
  break;
}

This format has two problems:

If we don't handle an option in this switch statement (because of a typo, a changed short-option name, or because someone just forgot to implement it), we only find out when a test or a user notices an error when using this option. There is no compile-time verification of this.

It's not very easy to read unless you know all the short options for all commands.

This patch makes our tablegen emit an enum that represents the options, which changes the code above to this (using SettingsSet as an example):

auto opt = toSettingsSetEnum(short_option, error);
if (!opt)
  return error;

switch (*opt) {
case SettingsSet::Force: m_force = true; break;
case SettingsSet::Global: m_global = true; supportGlobal(); break;
// No default with error handling, already handled in toSettingsSetEnum.
// If you don't implement an option, this will trigger a compiler warning for unhandled enum value.
}
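As a rough, self-contained illustration of the idea in the summary, the sketch below hand-writes what the generated pieces might look like. It is not the patch itself: `std::optional` and `std::string` stand in for `llvm::Optional` and `Status`, and the enum, the helper and the `SettingsSetOptions` struct are hypothetical stand-ins written only for this example.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the enum that tablegen would emit.
enum class SettingsSet { Force, Global };

// Hypothetical stand-in for the generated char-to-enum helper. Unknown
// options are reported through `error` and yield an empty optional.
std::optional<SettingsSet> toSettingsSetEnum(char short_option,
                                             std::string &error) {
  switch (short_option) {
  case 'f':
    return SettingsSet::Force;
  case 'g':
    return SettingsSet::Global;
  default:
    error = std::string("unrecognized option '") + short_option + "'";
    return std::nullopt;
  }
}

// Hypothetical options holder; std::string replaces Status here.
struct SettingsSetOptions {
  bool m_force = false;
  bool m_global = false;

  // Returns an empty string on success, otherwise the error text.
  std::string SetOptionValue(char short_option) {
    std::string error;
    auto opt = toSettingsSetEnum(short_option, error);
    if (!opt)
      return error;

    // No default case: with -Wswitch enabled, removing one of the cases
    // below makes the compiler warn about the unhandled enumerator.
    switch (*opt) {
    case SettingsSet::Force:
      m_force = true;
      break;
    case SettingsSet::Global:
      m_global = true;
      break;
    }
    return error;
  }
};
```

The error path lives entirely in the helper, so the hand-written switch only lists enumerators and the compiler can check that list for completeness.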

NOTE: This is just a dummy patch to get some feedback if people are for some reason opposed to this change (which would save me from converting all our switch-cases to the new format). I only changed the code for settings set to demonstrate the change.

Diff Detail

Event Timeline

teemperor created this revision. Jul 29 2019, 2:41 AM

Herald added a project: Restricted Project. Jul 29 2019, 2:41 AM

Herald added subscribers: lldb-commits, abidh.

For the sake of completeness, that's how Clang would warn about unimplemented options:

llvm-project/lldb/source/Commands/CommandObjectSettings.cpp:102:15: warning: enumeration value 'Global' not handled in switch [-Wswitch]
      switch (*opt) {
              ^
llvm-project/lldb/source/Commands/CommandObjectSettings.cpp:102:15: note: add missing switch cases
      switch (*opt) {
              ^

How about codegenning the entire implementation of SetOptionValue? That way the user won't have to write any switch statements at all. Ideally, the option-setting code would be something like:

void (Status?, Error?) SetOptionForce(StringRef arg, ExecutionContext *ctx) { m_force = true; }
void (Status?, Error?) SetOptionGlobal(StringRef arg, ExecutionContext *ctx) { m_global = true; }

#include "The_thing_which_generates_SetOptionValue.inc"

The generated implementation of SetOptionValue could be the same as the current one, except that it calls into these user-specified functions instead of setting the values itself
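A minimal sketch of how that split could look, assuming a simplified `CommandOptions` with two flags. The `SetOptionValue` body here is written by hand as a stand-in for what the included `.inc` file might expand to; the `bool` return type and the `std::string` argument are assumptions made to keep the example self-contained.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified options class for the "settings set" command.
struct CommandOptions {
  bool m_force = false;
  bool m_global = false;

  // Hand-written setters, one per option.
  void SetOptionForce(const std::string & /*arg*/) { m_force = true; }
  void SetOptionGlobal(const std::string & /*arg*/) { m_global = true; }

  // Stand-in for the generated code: the same dispatch as before, but each
  // case only forwards to the matching setter.
  bool SetOptionValue(char short_option, const std::string &arg) {
    switch (short_option) {
    case 'f':
      SetOptionForce(arg);
      return true;
    case 'g':
      SetOptionGlobal(arg);
      return true;
    default:
      return false; // unrecognized option
    }
  }
};
```

With this shape, forgetting to write a setter would surface as a compile error in the generated dispatch rather than as a runtime failure.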

It worries me a little bit that we are making it harder and harder to figure out where the option for "-t" gets stored once this CommandObject's options have been parsed. Can you show the steps I would have to go through to get from "-f" to OptionEnumSettingsSet::Force or whatever?

In D65386#1604927, @jingham wrote:

It worries me a little bit that we are making it harder and harder to figure out where the option for "-t" gets stored once this CommandObject's options have been parsed. Can you show the steps I would have to go through to get from "-f" to OptionEnumSettingsSet::Force or whatever?

Yeah, that has been my experience with table gen stuff. Does the table gen stuff generate code that exists in the build folder, like headers and/or C++ code? Navigating using an IDE often fails on these because the build system doesn't know about it directly. Any way to generate code in the source tree from the .inc files that we check in? Then if anyone modifies the .inc file it would regenerate (during the normal build process which could depend on the .inc _and_ on the .h or .cpp file that gets generated) the .h and .cpp file in the source tree and they would show up as modifications.

In D65386#1604927, @jingham wrote:

It worries me a little bit that we are making it harder and harder to figure out where the option for "-t" gets stored once this CommandObject's options have been parsed. Can you show the steps I would have to go through to get from "-f" to OptionEnumSettingsSet::Force or whatever?

TableGen's goal is "... to help a human develop and maintain records of domain-specific information." I think having a bit of code generation is nice, as long as things don't become magical. The current patch is fine I think and I think Pavel's suggestion sounds good, as long as it's well documented. It's also the reason I suggest using the record names, instead of converting the name to CamelCase, so that you can at least grep for them.

In D65386#1605054, @clayborg wrote:

Yeah, that has been my experience with table gen stuff. Does the table gen stuff generate code that exists in the build folder, like headers and/or C++ code? Navigating using an IDE often fails on these because the build system doesn't know about it directly. Any way to generate code in the source tree from the .inc files that we check in? Then if anyone modifies the .inc file it would regenerate (during the normal build process which could depend on the .inc _and_ on the .h or .cpp file that gets generated) the .h and .cpp file in the source tree and they would show up as modifications.

I really wouldn't want the generated code to be checked in. The whole idea is that you don't really care what the (generated) table looks like, you only care about its content, which is defined in a human-readable way in the .td file. For me, TableGen is about records rather than generating arbitrary code. Including some boilerplate is good (like the array definitions), but the surrounding code needs to be understandable, without having to look at the TableGen backend or generated code.

lldb/utils/TableGen/LLDBOptionDefEmitter.cpp
Line 211: Can we use the option name instead, like I did for the properties? Or would that cause conflicts?

In D65386#1605119, @JDevlieghere wrote:

The current patch is fine I think and I think Pavel's suggestion sounds good, as long as it's well documented.

We could make that less magical by including the name of the "setter" method in the tablegen file instead of relying on some convention for deriving it from the setting name. Ideally, in the longer term, I think we shouldn't even need to generate the "SetOptionValue" method, as I think there should be a way to write a generic, non-generated piece of code which loops over available settings and calls the appropriate setter method. But that may be easier to achieve once there is a single source of truth for all "SetOptionValue" methods (i.e., tablegen).
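One way to picture the "generic, non-generated" dispatch is a table that pairs each short option with a setter, plus a single loop that walks it. The sketch below is only an illustration under assumed names: `OptionSetter`, `g_settings_set_setters` and the free function `SetOptionValue` are hypothetical, and the table is written by hand where tablegen would presumably emit it as data.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified options class with explicit setters.
struct CommandOptions {
  bool m_force = false;
  bool m_global = false;
  void SetForce(const std::string & /*arg*/) { m_force = true; }
  void SetGlobal(const std::string & /*arg*/) { m_global = true; }
};

// Hypothetical table entry: a short option and the setter named for it.
struct OptionSetter {
  char short_option;
  void (CommandOptions::*setter)(const std::string &arg);
};

// Written by hand here; the setter names would come from the .td file.
const OptionSetter g_settings_set_setters[] = {
    {'f', &CommandOptions::SetForce},
    {'g', &CommandOptions::SetGlobal},
};

// Generic dispatch: no per-command switch, only a loop over the table.
bool SetOptionValue(CommandOptions &options, char short_option,
                    const std::string &arg) {
  for (const OptionSetter &entry : g_settings_set_setters) {
    if (entry.short_option == short_option) {
      (options.*entry.setter)(arg);
      return true;
    }
  }
  return false; // not in the table
}
```

Because the setter names appear literally in the table, grepping for a setter leads to the table entry, which addresses part of the discoverability concern raised earlier in the thread.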

The thing I don't like about the enum approach is that it adds another layer to the option-setting code, whereas I think that the main problem is that the option-setting code has one too many layers already.

Also, -1 to checking in generated code.

It worries me a little bit that we are making it harder and harder to figure out where the option for "-t" gets stored once this CommandObject's options have been parsed. Can you show the steps I would have to go through to get from "-f" to OptionEnumSettingsSet::Force or whatever?

That's actually just toOptionEnumSettingsSet('f', error). I want to get rid of the whole generated method by just placing the enum value in the OptionDefinition struct (which requires some refactoring, but should be doable in the long-term).
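A hedged sketch of what "placing the enum value in the OptionDefinition struct" could mean. `OptionDefinitionSketch` is a hypothetical, heavily reduced stand-in for the real struct (which has many more fields), and the long option names are assumed from the enumerator names.

```cpp
#include <cassert>
#include <cstddef>

// Enum as it would be emitted for "settings set" in this patch.
enum class OptionEnumSettingsSet { Force, Global };

// Hypothetical, reduced version of OptionDefinition. The enumerator is
// stored as an int so that one struct type can serve every command.
struct OptionDefinitionSketch {
  const char *long_option;
  char short_option;
  int enum_value;
};

// Assumed table contents for the two options discussed in this review.
constexpr OptionDefinitionSketch g_settings_set_options[] = {
    {"force", 'f', static_cast<int>(OptionEnumSettingsSet::Force)},
    {"global", 'g', static_cast<int>(OptionEnumSettingsSet::Global)},
};

// With the value stored in the table, the lookup is a plain index plus a
// cast, and no generated char-to-enum function is needed.
OptionEnumSettingsSet GetOptionEnum(std::size_t option_idx) {
  return static_cast<OptionEnumSettingsSet>(
      g_settings_set_options[option_idx].enum_value);
}
```

The handler would then switch on `GetOptionEnum(option_idx)` directly, since `option_idx` is already what `SetOptionValue` receives.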

The thing I don't like about the enum approach is that it adds another layer to the option-setting code, whereas I think that the main problem is that the option-setting code has one too many layers already.

Agreed.

In D65386#1604498, @labath wrote:
How about codegenning the entire implementation of SetOptionValue? That way the user won't have to write any switch statements at all. Ideally, the option-setting code would be something like:
void (Status?, Error?) SetOptionForce(StringRef arg, ExecutionContext *ctx) { m_force = true; }
void (Status?, Error?) SetOptionGlobal(StringRef arg, ExecutionContext *ctx) { m_global = true; }

#include "The_thing_which_generates_SetOptionValue.inc"
The generated implementation of SetOptionValue could be the same as the current one, except that it calls into these user-specified functions instead of setting the values itself

This seems like a lot of boilerplate when we have to write 300+ one-statement methods for assigning options. Also I would prefer to not use tablegen for generating executable code if possible because that is just hard to read (the function we generate here is already something I only consider as a temporary workaround).

lldb/utils/TableGen/LLDBOptionDefEmitter.cpp
Line 211: If you mean if we can just call it `OptionEnumSet` instead of `OptionEnumSettingsSet`, then I assume that could cause conflicts if we implement multiple smaller commands in the same file (which we currently do).

In D65386#1609875, @teemperor wrote:
In D65386#1604498, @labath wrote:
How about codegenning the entire implementation of SetOptionValue? That way the user won't have to write any switch statements at all. Ideally, the option-setting code would be something like:
void (Status?, Error?) SetOptionForce(StringRef arg, ExecutionContext *ctx) { m_force = true; }
void (Status?, Error?) SetOptionGlobal(StringRef arg, ExecutionContext *ctx) { m_global = true; }

#include "The_thing_which_generates_SetOptionValue.inc"
The generated implementation of SetOptionValue could be the same as the current one, except that it calls into these user-specified functions instead of setting the values itself
This seems like a lot of boilerplate when we have to write 300+ one-statement methods for assigning options.

If they would really be just one-liners, then this might still be an improvement over the current solution, because now you need at least three lines for each option:

case eFoo:
  foo = bar;
  break;

And this doesn't include the boilerplate around the switch statement -- i.e., checking the result of toOptionEnumSettingsSet -- it's not even clear to me under which circumstances toOptionEnumSettingsSet can fail to return a value. Shouldn't the caller verify that it is calling us with the correct argument? If we're able to avoid that and just have the user code be:

switch(toOptionEnumSettingsSet(???)) {
...
}

then I would find the enum solution fine. However, right now, it seems more complicated than it ought to be..

Additionally, if setting the option is really that simple, then we could have tablegen generate that too (by just giving it a variable name to set), possibly with the option to fall back to a function for more complex options (which is better handled in a separate function anyway).
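To make the "variable name, with a function fallback" idea concrete, here is a small sketch using pointers to members. Everything in it is hypothetical: `OptionBinding`, `g_bindings` and `ApplyOption` are names invented for this example, and the table is hand-written where tablegen would presumably emit it.

```cpp
#include <cassert>

// Hypothetical, simplified options class. SetGlobal stands for an option
// whose handling is more than a plain assignment.
struct CommandOptions {
  bool m_force = false;
  bool m_global = false;
  int m_global_calls = 0;
  void SetGlobal() {
    m_global = true;
    ++m_global_calls;
  }
};

// Hypothetical binding: either a flag to set to true or a fallback function.
struct OptionBinding {
  char short_option;
  bool CommandOptions::*flag;         // simple case, may be nullptr
  void (CommandOptions::*fallback)(); // complex case, may be nullptr
};

const OptionBinding g_bindings[] = {
    {'f', &CommandOptions::m_force, nullptr},
    {'g', nullptr, &CommandOptions::SetGlobal},
};

// Applies one option; returns false if the option is not in the table.
bool ApplyOption(CommandOptions &options, char short_option) {
  for (const OptionBinding &binding : g_bindings) {
    if (binding.short_option != short_option)
      continue;
    if (binding.flag)
      options.*binding.flag = true;
    else
      (options.*binding.fallback)();
    return true;
  }
  return false;
}
```

Simple boolean options need no hand-written code at all in this arrangement, while anything more involved still gets its own named function.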

Also I would prefer to not use tablegen for generating executable code if possible because that is just hard to read (the function we generate here is already something I only consider as a temporary workaround).

I agree, but on the other hand, temporary workarounds have a habit of becoming permanent, so I'd like to avoid introducing a sub-optimal solution, if there's a better way to do that available..

In D65386#1609956, @labath wrote:
In D65386#1609875, @teemperor wrote:
In D65386#1604498, @labath wrote:
How about codegenning the entire implementation of SetOptionValue? That way the user won't have to write any switch statements at all. Ideally, the option-setting code would be something like:
void (Status?, Error?) SetOptionForce(StringRef arg, ExecutionContext *ctx) { m_force = true; }
void (Status?, Error?) SetOptionGlobal(StringRef arg, ExecutionContext *ctx) { m_global = true; }

#include "The_thing_which_generates_SetOptionValue.inc"
The generated implementation of SetOptionValue could be the same as the current one, except that it calls into these user-specified functions instead of setting the values itself
This seems like a lot of boilerplate when we have to write 300+ one-statement methods for assigning options.
If they would really be just one-liners, then this might still be an improvement over the current solution, because now you need at least three lines for each option:
case eFoo:
  foo = bar;
  break;

The setter function still seems more verbose than three short lines, but I don't have a strong opinion about that so setters are fine. Still, if we generate some function calling the appropriate setters, we will end up with control flow going through that function which I would like to avoid. Grepping for SetOptionGlobal for example would not yield any call location when people try to understand who calls these methods (which is one of the things that @jingham says he's worried about, correct me if I'm wrong).

And this doesn't include the boilerplate around the switch statement -- i.e., checking the result of toOptionEnumSettingsSet -- it's not even clear to me under which circumstances toOptionEnumSettingsSet can fail to return a value. Shouldn't the caller verify that it is calling us with the correct argument? If we're able to avoid that and just have the user code be:
switch(toOptionEnumSettingsSet(???)) {
...
}

It fails on unrecognized options (which is currently handled in every single command separately). If we can lift this up (I hope every command actually handles invalid options in the same way) then the switch-syntax should be possible.

then I would find the enum solution fine. However, right now, it seems more complicated than it ought to be..

Additionally, if setting the option is really that simple, then we could have tablegen generate that too (by just giving it a variable name to set), possibly with the option to fall back to a function for more complex options (which is better handled in a separate function anyway).

Also I would prefer to not use tablegen for generating executable code if possible because that is just hard to read (the function we generate here is already something I only consider as a temporary workaround).

I agree, but on the other hand, temporary workarounds have a habit of becoming permanent, so I'd like to avoid introducing a sub-optimal solution, if there's a better way to do that available..

I think I phrased that badly. I meant that it's a workaround until the patch is ready to land, not the usual temporary fix(TM) :)

In D65386#1609986, @teemperor wrote:

In D65386#1609956, @labath wrote:

The setter function still seems more verbose than three short lines, but I don't have a strong opinion about that so setters are fine. Still, if we generate some function calling the appropriate setters, we will end up with control flow going through that function which I would like to avoid. Grepping for SetOptionGlobal for example would not yield any call location when people try to understand who calls these methods (which is one of the things that @jingham says he's worried about, correct me if I'm wrong).

I think we understand @jingham's concerns the same way. My answer to that would be that you'd find "SetOptionGlobal" somewhere in the tablegen file, which should be enough to signal that something tablegen-y is going on..

And this doesn't include the boilerplate around the switch statement -- i.e., checking the result of toOptionEnumSettingsSet -- it's not even clear to me under which circumstances toOptionEnumSettingsSet can fail to return a value. Shouldn't the caller verify that it is calling us with the correct argument? If we're able to avoid that and just have the user code be:
switch(toOptionEnumSettingsSet(???)) {
...
}
It fails on unrecognized options (which is currently handled in every single command separately). If we can lift this up (I hope every command actually handles invalid options in the same way) then the switch-syntax should be possible.

Hmm.. how hard would it be to achieve that? If we can make that happen, then I think this might be the most reasonable compromise...

In fact, how sure are you that the unrecognised options are not already handled somewhere higher up? Doesn't all parsing already go through Options::Parse, which already checks for '?' (and other error-ish results from getopt). After that point, I'd hope that we can assume that the returned value does indeed correspond to some entry in our option vector...
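The argument here is that if validation happens once, before dispatch, the per-command handler never sees an unknown option and needs no error path. The sketch below only illustrates that division of labour; it is not a description of how `Options::Parse` is actually implemented, and `OptionEntry`, `Handler` and `ParseShortOptions` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical option table for one command.
struct OptionEntry {
  char short_option;
};
const OptionEntry g_options[] = {{'f'}, {'g'}};
const std::size_t g_num_options = sizeof(g_options) / sizeof(g_options[0]);

// Hypothetical per-command handler. Precondition: option_idx is a valid
// index into g_options, so no "unrecognized option" branch is needed.
struct Handler {
  std::string seen;
  void SetOptionValue(std::size_t option_idx) {
    seen.push_back(g_options[option_idx].short_option);
  }
};

// Central parse step: unknown options are rejected here, once, before any
// handler is called.
bool ParseShortOptions(const std::string &flags, Handler &handler,
                       std::string &error) {
  for (char c : flags) {
    std::size_t idx = 0;
    while (idx < g_num_options && g_options[idx].short_option != c)
      ++idx;
    if (idx == g_num_options) {
      error = std::string("unrecognized option '") + c + "'";
      return false;
    }
    handler.SetOptionValue(idx);
  }
  return true;
}
```

If the real parser already gives this guarantee, the `default:` branches in the individual commands are unreachable, which is what the follow-up revision mentioned later in this thread (D66522) describes as dead code.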

I agree, but on the other hand, temporary workarounds have a habit of becoming permanent, so I'd like to avoid introducing a sub-optimal solution, if there's a better way to do that available..

I think I phrased that badly. I meant that it's a workaround until the patch is ready to land, not the usual temporary fix(TM) :)

Ah, ok, thanks for explaining that. I guess that would avoid generating code completely, but I think we should still try to find a way to streamline the code that the user writes.

teemperor mentioned this in D66522: [lldb][NFC] Remove dead code that is supposed to handle invalid command options. Aug 21 2019, 3:18 AM

teemperor mentioned this in rG36162014c469: [lldb][NFC] Remove dead code that is supposed to handle invalid command options. Aug 22 2019, 1:12 AM

teemperor mentioned this in rL369625: [lldb][NFC] Remove dead code that is supposed to handle invalid command options.

(mark as WIP)

Revision Contents

Path                                              Size
lldb/source/Commands/CommandObjectSettings.cpp    15 lines
lldb/utils/TableGen/LLDBOptionDefEmitter.cpp      48 lines

Diff 212134

lldb/source/Commands/CommandObjectSettings.cpp

[... 88 lines not shown; context: insert-before or insert-after."); ...]
   public:
     CommandOptions() : Options(), m_global(false) {}

     ~CommandOptions() override = default;

     Status SetOptionValue(uint32_t option_idx, llvm::StringRef option_arg,
                           ExecutionContext *execution_context) override {
       Status error;
-      const int short_option = m_getopt_table[option_idx].val;

-      switch (short_option) {
-      case 'f':
+      auto opt = toOptionEnumSettingsSet(m_getopt_table[option_idx].val, error);
+      if (!opt)
+        return error;
+
+      switch (*opt) {
+      case OptionEnumSettingsSet::Force:
         m_force = true;
         break;
-      case 'g':
+      case OptionEnumSettingsSet::Global:
         m_global = true;
         break;
-      default:
-        error.SetErrorStringWithFormat("unrecognized options '%c'",
-                                       short_option);
-        break;
       }

       return error;
     }

     void OptionParsingStarting(ExecutionContext *execution_context) override {
       m_global = false;
       m_force = false;
[... 1,060 lines not shown ...]

lldb/utils/TableGen/LLDBOptionDefEmitter.cpp

[... 154 lines not shown; context: if (!O.Description.empty()) { ...]
     OS << "\"";
     llvm::printEscapedString(O.Description, OS);
     OS << "\"";
   } else
     OS << "\"\"";
   OS << "},\n";
 }

+static std::string Capitalize(llvm::StringRef s) {
+  return s.substr(0, 1).upper() + s.substr(1).str();
+}
+static std::string ToCamelCase(llvm::StringRef In) {
+  llvm::SmallVector<StringRef, 4> Parts;
+  llvm::SplitString(In, Parts, "-_ ");
+  std::string Result;
+  for (llvm::StringRef S : Parts)
+    Result.append(Capitalize(S));
+  return Result;
+}
+
+static void emitEnumDeclaration(llvm::StringRef EnumName,
+                                const std::vector<CommandOption> &Options,
+                                raw_ostream &OS) {
+  OS << "namespace { enum class " << EnumName << " {\n";
+  for (const CommandOption &CO : Options)
+    OS << " " << ToCamelCase(CO.FullName) << ",\n";
+  OS << "}; }\n";
+}
+
+static void emitEnumSwitch(llvm::StringRef EnumName,
+                           const std::vector<CommandOption> &Options,
+                           raw_ostream &OS) {
+  OS << "static llvm::Optional<" << EnumName << "> to" << EnumName
+     << "(char c, Status error) {\n";
+  OS << " switch(c) {";
+  for (const CommandOption &CO : Options)
+    OS << " case '" << CO.ShortName << "': return " << EnumName
+       << "::" << ToCamelCase(CO.FullName) << ";\n";
+  OS << R"cpp(
+ default:
+ error.SetErrorStringWithFormat("unrecognized option '%c'", c);
+ return {};
+ )cpp";
+  OS << " }\n}\n";
+}
+
 /// Emits all option initializers to the raw_ostream.
 static void emitOptions(std::string Command, std::vector<Record *> Records,
                         raw_ostream &OS) {
   std::vector<CommandOption> Options;
   for (Record *R : Records)
     Options.emplace_back(R);

   std::string ID = Command;
   std::replace(ID.begin(), ID.end(), ' ', '_');

+  std::string CamelCaseID = ToCamelCase(Command);

   [Inline comment] JDevlieghere: Can we use the option name instead, like I did for the properties? Or would that cause conflicts?
   [Inline comment] teemperor: If you mean if we can just call it `OptionEnumSet` instead of `OptionEnumSettingsSet`, then I assume that could cause conflicts if we implement multiple smaller commands in the same file (which we currently do).

+
   // Generate the macro that the user needs to define before including the
   // *.inc file.
   std::string NeededMacro = "LLDB_OPTIONS_" + ID;

   // All options are in one file, so we need put them behind macros and ask the
   // user to define the macro for the options that are needed.
   OS << "// Options for " << Command << "\n";
   OS << "#ifdef " << NeededMacro << "\n";
   OS << "constexpr static OptionDefinition g_" + ID + "_options[] = {\n";
   for (CommandOption &CO : Options)
     emitOption(CO, OS);
-  // We undefine the macro for the user like Clang's include files are doing it.
   OS << "};\n";
+
+  std::string Enum = "OptionEnum" + CamelCaseID;
+  emitEnumDeclaration(Enum, Options, OS);
+  emitEnumSwitch(Enum, Options, OS);
+
+  // We undefine the macro for the user like Clang's include files are doing it.
   OS << "#undef " << NeededMacro << "\n";
   OS << "#endif // " << Command << " command\n\n";
 }

 void lldb_private::EmitOptionDefs(RecordKeeper &Records, raw_ostream &OS) {

   std::vector<Record *> Options = Records.getAllDerivedDefinitions("Option");

   emitSourceFileHeader("Options for LLDB command line commands.", OS);

   RecordsByCommand ByCommand = getCommandList(Options);

   for (auto &CommandRecordPair : ByCommand) {
     emitOptions(CommandRecordPair.first, CommandRecordPair.second, OS);
   }
 }
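For readers without an LLVM checkout, the name conversion used by the emitter can be approximated with the standard library alone. This is a sketch of the same idea (split on '-', '_' and ' ', capitalize the first letter of each part, join), not the code from the diff; it assumes ASCII option names.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Standard-library approximation of the emitter's ToCamelCase helper.
// Separators are dropped and the character after each one is upper-cased;
// all other characters are copied unchanged.
std::string ToCamelCase(const std::string &in) {
  std::string result;
  bool start_of_part = true;
  for (char c : in) {
    if (c == '-' || c == '_' || c == ' ') {
      start_of_part = true;
      continue;
    }
    if (start_of_part)
      result.push_back(
          static_cast<char>(std::toupper(static_cast<unsigned char>(c))));
    else
      result.push_back(c);
    start_of_part = false;
  }
  return result;
}
```

Under this rule the command name "settings set" becomes "SettingsSet", which is where the `OptionEnumSettingsSet` enum name in the patch comes from once the "OptionEnum" prefix is added.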