Page MenuHomePhabricator

Sketch support for generating CC1 command line from CompilerInvocation
ClosedPublic

Authored by dang on May 12 2020, 11:00 AM.

Details

Summary

This is an add-on to the RFC at http://lists.llvm.org/pipermail/cfe-dev/2020-May/065421.html

This introduces optional additional information in Option records that can
specify a mapping between an option and a field of a class (used here for
CompilerInvocation).
Also it adds a GenerateCC1CommandLine method to Compiler invocation that uses
the TableGen definitions to generate command line argument strings.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Bigcheese added inline comments.May 19 2020, 4:41 PM
clang/lib/Frontend/CompilerInvocation.cpp
3939

It's a little sad that we need to allocation every string just because of the -. We definitely need to be able to allocate strings for options with data, but it would be good if we could just have the strings with - prefixed in the option table if that's reasonable to do.

dang updated this revision to Diff 266195.May 26 2020, 6:28 AM

Address some code review comments and sketch support for options that take a single value.

dang marked 5 inline comments as done.May 26 2020, 6:40 AM
dang added inline comments.
clang/include/clang/Frontend/CompilerInvocation.h
156

You're right updated the wording to be more accurate

clang/lib/Frontend/CompilerInvocation.cpp
3704

I updated the patch to handle options that take a value. This doesn't show how to handle everything yet (it is missing how to handle options that take multiple values notably) although there are not that many of them (a quick skim of the code seems to indicate there are less than 100)

dang marked an inline comment as done.May 26 2020, 6:47 AM
Bigcheese added inline comments.Jun 3 2020, 5:11 PM
clang/include/clang/Frontend/CompilerInvocation.h
247

Is there a reason for this to be a member of CompilerInvocation? The rest of the argument parsing functions are static file local functions.

clang/lib/Frontend/CompilerInvocation.cpp
3700

Put formatting fixups in a separate commit.

clang/unittests/Frontend/CompilerInvocationTest.cpp
42

This reminded me that some of the -cc1 arguments are positional such as -xc++. I think we'll need custom handling for INPUT arguments, although we don't need them (or want them) for the modules use case. Do you have any thoughts on how to handle them? I don't think answering that should block the initial patch though.

llvm/include/llvm/Option/OptParser.td
151

I'm not sure Required is a great name here. I initially read that as the option was required, not that it should always be emitted.

165–169

I noticed that this isn't being used for the enum case you added (-mrelocation-model). Do you intend to use it? You should either use it in this patch or remove it until it actually is used.

dang updated this revision to Diff 268746.Jun 5 2020, 4:51 AM
dang marked 3 inline comments as done.

Address some code review feedback

dang marked 4 inline comments as done.Jun 5 2020, 4:55 AM
dang added inline comments.
clang/include/clang/Frontend/CompilerInvocation.h
247

All the keypaths have to be "relative" to CompilerInvocation so that different options can be processed uniformly i.e. we don't want to add extra specialization to the table-gen file that indicates if an option is for the analyzer or for codegen e.t.c. Some of the members of CompilerInvocation that store options are private, which means that if parseSimpleArgs which precludes it from being an free function with internal linkage. Unless you think all the CompilerInvocation members should be public, but that is a different can of worms.

clang/unittests/Frontend/CompilerInvocationTest.cpp
42

-x flags are no INPUT arguments as in it can appear anywhere but will apply to the inputs. The only part of CompilerInvocation that uses positional arguments is the input file(s) which is quite small (21 LOC) so I am happy to leave it as is, having a few line to add it at the end of the command line is not a big problem IMO.
The thing I am more worried about is that the processing for some options depends on other options having already been processed, I was planning to reorder them accordingly in the td file although I am not a fan of doing this, I think it would be better to addd the information using table-gen's DAG support although that would complicated the table-gen backend.

llvm/include/llvm/Option/OptParser.td
165–169

Yeah, I forgot to remove it, since the normalizer scheme subsumes the functionality.

dang updated this revision to Diff 268869.Jun 5 2020, 10:26 AM
dang marked an inline comment as done.

Updating the patch with the correct merge base

dang updated this revision to Diff 268872.Jun 5 2020, 10:35 AM

This is the good diff

I have a couple of drive-by suggestions, up to @dang and @Bigcheese whether to incorporate them.

clang/lib/Frontend/CompilerInvocation.cpp
136–141

I wonder if it's worth creating a .def file for the driver enums, something like:

#ifdef HANDLE_RELOCATION_MODEL
HANDLE_RELOCATION_MODEL("static", llvm::Reloc::Static)
HANDLE_RELOCATION_MODEL("pic", llvm::Reloc::PIC_)
HANDLE_RELOCATION_MODEL("ropi", llvm::Reloc::ROPI)
HANDLE_RELOCATION_MODEL("rwpi", llvm::Reloc::RWPI)
HANDLE_RELOCATION_MODEL("ropi-rwpio", llvm::Reloc::ROPI_RWPI)
HANDLE_RELOCATION_MODEL("dynamic-no-pic", llvm::Reloc::DynamicNoPIC)
#undef HANDLE_RELOCATION_MODEL
#endif // HANDLE_RELOCATION_MODEL

#ifdef HANDLE_DEBUG_INFO_KIND
HANDLE_DEBUG_INFO_KIND("line-tables-only", codegenoptions::DebugLineTablesOnly)
HANDLE_DEBUG_INFO_KIND("line-directives-only", codegenoptions::DebugDirectivesOnly)
HANDLE_DEBUG_INFO_KIND("limited", codegenoptions::LimitedDebugInfo)
HANDLE_DEBUG_INFO_KIND("standalone", codegenoptions::FullDebugInfo)
#undef HANDLE_DEBUG_INFO_KIND
#endif // HANDLE_DEBUG_INFO_KIND

// ...

Then you can use HANDLE_RELOCATION_MODEL in both normalize and denormalize, rather than duplicating the table.

Maybe we can go even further. Can you expand the Values array from the tablegen to include this info? Or rejigger the help text to leverage HANDLE_RELOCATION_MODEL (maybe pass in ValuesDefine<HANDLE_RELOCATION_MODEL>)? The current patch adds a value table; my first suggestion leaves us even; but maybe we can get rid of one.

3665

Seems like Options.inc could provide this as a default definition, not sure if that seems error-prone?

3683–3685

I prefer the style where the .inc file is responsible for the #undef calls. It avoids having to duplicate it everywhere (and the risk of forgetting it). WDYT?

dang marked 3 inline comments as done.Jun 8 2020, 7:18 AM
dang added inline comments.
clang/lib/Frontend/CompilerInvocation.cpp
136–141

I think this suggestion, I can definitely do at least this to generate the necessary switch statements. I don't see why this should be a separate .def file, I can generate this from the tablegen with an extra field named ValuesAssociatedDefinitions or something to that effect. I'll do that now.

3665

Not sure how much of this do you mean? OPTION_WITH_MARSHALLING is a convenience thing that forces users to opt into getting the marshalling definitions. I think it might be better to provide default empty definitions for OPTION_WITH_MARSHALLING_FLAG and friends to achieve a similar effect without needing OPTION_WITH_MARSHALLING. I you mean the actual definitions here they rely on the ArgList & to be named Args which might make it error prone to include these as defaults.

3683–3685

I prefer it too, but the current tablegen backend doesn't generate them for other macros it defines, so I wanted to make it consistent... I could change the existing backend to work that way but I would then need to track all the usages of this across all the llvm-projects which I don't fancy doing. Let me know if you thing it is fine to break with the existing style for this one.

dexonsmith added inline comments.Jun 8 2020, 1:25 PM
clang/lib/Frontend/CompilerInvocation.cpp
3665

Right, I just mean you shouldn't have to define OPTION_WITH_MARSHALLING here.

3683–3685

It's probably pretty easy to clean up the existing ones, just grep for #undef of any of the defined macros and delete the lines. But if you don't have time I'm fine with you leaving it as-is (i.e. it seems good to be consistent with the rest of the backend).

dang updated this revision to Diff 269926.Jun 10 2020, 11:43 AM

This address a bunch of existing feedback and makes the TableGen definitions of the additional information significantly nicer to write.

dexonsmith added inline comments.Jun 10 2020, 12:20 PM
clang/include/clang/Driver/CC1Options.td
216–222

Maybe you can avoid boilerplate with something like this:

ValuesAssociatedDefinitionsScope<"llvm::Reloc">,
ValuesAssociatedDefinitions<["Static", "PIC_", "RPOI", "RWPI", "ROPI_RWPI", "DynamicNoPIC"]>,
MarshallingInfoString<"CodeGenOpts.RelocationModel", "PIC_">,

Note that I shortened the second argument to MarshallingInfoString as well.

@Bigcheese, WDYT?

Also wondering if ValuesAssociatedDefinitions could be shorter. Maybe NormalizedValues (and then NormalizedValuesScope)?

clang/lib/Frontend/CompilerInvocation.cpp
124–126

Should we drop parameter names for ArgList, Diags, and DefaultTriple?

149–157

This looks like boilerplate that's going to be repeated a lot. I'm curious how many of the enum options will end up like this.

If more than a few, maybe we can add a .inc file that includes these function definitions? Triggered by adding:

AutoNormalizeEnum<"llvm::Reloc::Model", "RelocationModel">,

or something (instead of Normalizer<> and Denormalizer<>).

dang updated this revision to Diff 270117.Thu, Jun 11, 6:19 AM

This addresses the usability concern with having to specify fully scoped definitions for normalized values in the TableGen files.

dang marked an inline comment as done.Thu, Jun 11, 10:46 AM
dang updated this revision to Diff 270445.Fri, Jun 12, 10:14 AM
dang edited the summary of this revision. (Show Details)

Implemented a draft of normalizer generation for simple enum based options

dexonsmith added inline comments.Fri, Jun 12, 5:48 PM
clang/lib/Frontend/CompilerInvocation.cpp
3939

I want to highlight @Bigcheese's comment again so it doesn't get lost. I think a reasonable name would be SPELLING.

3951

The denormalizer should take the StringAllocator as an argument, and only allocate when necessary.

llvm/utils/TableGen/OptParserEmitter.cpp
488

Since this is meant to be included exactly once (as opposed to being an x-macro), perhaps this should be split out into a separate .inc.

512–515

It seems unnecessary to generate macros here; you can just generate the code directly...

But looking more closely, I wonder if we really need this generated code at all.

Instead, can we create a table like this in Options.inc?

struct EnumValue {
  const char *Name;
  unsigned Value;
};

struct EnumValueTable {
  const EnumValue *Table;
  unsigned Size;
};

static const EnumValue RelocationModelTable[] = {
  {"static", static_cast<unsigned>(llvm::Reloc::Static)},
  {"pic", static_cast<unsigned>(llvm::Reloc::_PIC)},
  // ...
};

static const EnumValue SomeOtherTable[] = {
  {"name", static_cast<unsigned>(clang::SomeOtherEnumValue)},
  // ...
};

static const EnumValueTable *EnumValueTables[] = {
  {&RelocationModelTable, sizeof(RelocationModelTable) / sizeof(EnumValue)},
  {&SomeOtherTable, sizeof(SomeOtherTable) / sizeof(EnumValue)},
};
static const unsigned EnumValueTablesSize =
    sizeof(EnumValueTables) / sizeof(EnumValueTable);

We can then have this code directly in Options.cpp:

static llvm::Optional<unsigned> extractSimpleEnum(
    llvm::opt::OptSpecifier Opt, int TableIndex,
    const ArgList &ArgList, DiagnosticsEngine &Diags) {
  assert(TableIndex >= 0);
  assert(TableIndex < EnumValueTablesSize);
  const EnumValueTable &Table = *EnumValueTables[TableIndex];

  auto Arg = Args.getLastArg(Opt);
  if (!Arg)
    return None;

  StringRef ArgValue = Arg->getValue();
  for (int I = 0, E = Table.Size; I != E; ++I)
    if (ArgValue == Table.Table[I].Name)
      return Table.Table[I].Value;

  Diags.Report(diag::err_drv_invalid_value)
      << Arg->getAsString(ArgList) << ArgValue;
  return None;
}

and change the normalizer contract to:

  • return an Optional,
  • take an llvm::opt::OptSpecifier,
  • take an index into a table (set to -1 if there's no relevant table), and
  • be responsible for the boilerplate call to getLastArg.

Simple enums would have NORMALIZER hardcoded to extractSimpleEnum. The handler code that calls it needs new arguments TABLE_INDEX and TYPE but also looks simpler this way:

if (auto MaybeValue = NORMALIZER(Opt##ID, TABLE_INDEX, Args, Diags)) \
  this->KEYPATH = static_cast<TYPE>(*MaybeValue);                    \
else                                                                 \
  this->KEYPATH = DEFAULT_VALUE;

We could similarly have a single serializeSimpleEnum function in Options.cpp for a hardcoded DENORMALIZER for simple enums:

static const char *serializeSimpleEnum(int TableIndex, unsigned Value) {
  assert(TableIndex >= 0);
  assert(TableIndex < EnumValueTablesSize);
  const EnumValueTable &Table = *EnumValueTables[TableIndex];
  for (int I = 0, E = Table.Size; I != E; ++I)
    if (Value == Table.Table[I].Value)
      return Table.Table[I].Name;

  llvm::report_fatal_error("good error message");
}
dang updated this revision to Diff 270802.Mon, Jun 15, 10:53 AM
dang marked an inline comment as done.

Implement constant table based marshaling for simple enum based options.

dang marked 2 inline comments as done.Mon, Jun 15, 10:58 AM
dang added inline comments.
clang/lib/Frontend/CompilerInvocation.cpp
3939

Yes I haven't forgotten just have not gotten around to it yet. I just want to do some form of suffix merging with the string the the OptTable receives

llvm/utils/TableGen/OptParserEmitter.cpp
512–515

I originally wanted to generate the normalizers without the table to allow and arbitrary code fragment to be used as the normalized value (one that is potentially not a constant evaluated expression) and just regenerating the macros was the path of least resistance at the time. Having had a closer look at the existing cases of enum based options there doesn't seem to be a need for this functionality so I went ahead and did this. Although I haven't split the backends yet.

dang updated this revision to Diff 271006.Tue, Jun 16, 2:57 AM
dang marked 5 inline comments as done and an inline comment as not done.

Implement suffix merging to avoid allocations for option spelling when generating the command line. This update also removes unnecessary allocation for std::string based options and for enum based options.

dang updated this revision to Diff 271392.Wed, Jun 17, 9:38 AM

Fixed a couple of bugs.

dang updated this revision to Diff 271790.Thu, Jun 18, 11:25 AM

Address the clang-tidy issues.

dang updated this revision to Diff 271942.Fri, Jun 19, 12:57 AM
dang updated this revision to Diff 272125.Fri, Jun 19, 10:20 AM

Allocate string when denormalizing an option backed by an std::string.

Bigcheese added inline comments.Fri, Jun 19, 2:34 PM
clang/include/clang/Frontend/CompilerInvocation.h
193

inserting

clang/lib/Frontend/CompilerInvocation.cpp
159–160

llvm::report_fatal_error is only used for code paths we actually expect to potentially be hit. For programming errors we instead use assert or llvm_unreachable. See https://llvm.org/docs/ProgrammersManual.html#error-handling .

llvm/utils/TableGen/OptParserEmitter.cpp
430–440

Just to verify, this doesn't change the output for options that don't have marshalling info, right? Just want to make sure this doesn't change anything for other users of libOption.

468–469

What happens if there are none?

dang marked 3 inline comments as done.Sun, Jun 21, 10:28 PM
dang added inline comments.
llvm/utils/TableGen/OptParserEmitter.cpp
430–440

Yes of course all the previous behavior is in the WriteOptRecordFields lambda capture. This adds an entirely new macro table and leaves the existing one untouched.

468–469

There is never going to be none as there currently already is an enum-based option that uses this setup, but it would still be nice to have a guard for that. Getting on it...

dang marked 2 inline comments as done.Sun, Jun 21, 10:38 PM
dang added inline comments.
llvm/utils/TableGen/OptParserEmitter.cpp
468–469

Actually this does not need any changes, it will correctly report 0 if there are no value tables.

dang updated this revision to Diff 272337.Sun, Jun 21, 11:32 PM

Address code review feedback.

dang marked 11 inline comments as done and an inline comment as not done.Sun, Jun 21, 11:35 PM
dang updated this revision to Diff 272416.Mon, Jun 22, 7:06 AM

Add a KeyPathPrefix field to factor out common key path prefixes, for example all CodeGenOpts.

Bigcheese accepted this revision.Tue, Jun 23, 10:49 AM

LGTM with KeyPathPrefix moved to the patch that actually uses it.

This revision is now accepted and ready to land.Tue, Jun 23, 10:49 AM
This revision was automatically updated to reflect the committed changes.