Page MenuHomePhabricator

Sketch support for generating CC1 command line from CompilerInvocation

Authored by dang on May 12 2020, 11:00 AM.



This is an add-on to the RFC at

This introduces optional additional information in Option records that can
specify a mapping between an option and a field of a class (used here for
Also it adds a GenerateCC1CommandLine method to Compiler invocation that uses
the TableGen definitions to generate command line argument strings.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Bigcheese added inline comments.May 19 2020, 4:41 PM

It's a little sad that we need to allocation every string just because of the -. We definitely need to be able to allocate strings for options with data, but it would be good if we could just have the strings with - prefixed in the option table if that's reasonable to do.

dang updated this revision to Diff 266195.May 26 2020, 6:28 AM

Address some code review comments and sketch support for options that take a single value.

dang marked 5 inline comments as done.May 26 2020, 6:40 AM
dang added inline comments.

You're right updated the wording to be more accurate


I updated the patch to handle options that take a value. This doesn't show how to handle everything yet (it is missing how to handle options that take multiple values notably) although there are not that many of them (a quick skim of the code seems to indicate there are less than 100)

dang marked an inline comment as done.May 26 2020, 6:47 AM
Bigcheese added inline comments.Jun 3 2020, 5:11 PM

Is there a reason for this to be a member of CompilerInvocation? The rest of the argument parsing functions are static file local functions.


Put formatting fixups in a separate commit.


This reminded me that some of the -cc1 arguments are positional such as -xc++. I think we'll need custom handling for INPUT arguments, although we don't need them (or want them) for the modules use case. Do you have any thoughts on how to handle them? I don't think answering that should block the initial patch though.


I'm not sure Required is a great name here. I initially read that as the option was required, not that it should always be emitted.


I noticed that this isn't being used for the enum case you added (-mrelocation-model). Do you intend to use it? You should either use it in this patch or remove it until it actually is used.

dang updated this revision to Diff 268746.Jun 5 2020, 4:51 AM
dang marked 3 inline comments as done.

Address some code review feedback

dang marked 4 inline comments as done.Jun 5 2020, 4:55 AM
dang added inline comments.

All the keypaths have to be "relative" to CompilerInvocation so that different options can be processed uniformly i.e. we don't want to add extra specialization to the table-gen file that indicates if an option is for the analyzer or for codegen e.t.c. Some of the members of CompilerInvocation that store options are private, which means that if parseSimpleArgs which precludes it from being an free function with internal linkage. Unless you think all the CompilerInvocation members should be public, but that is a different can of worms.


-x flags are no INPUT arguments as in it can appear anywhere but will apply to the inputs. The only part of CompilerInvocation that uses positional arguments is the input file(s) which is quite small (21 LOC) so I am happy to leave it as is, having a few line to add it at the end of the command line is not a big problem IMO.
The thing I am more worried about is that the processing for some options depends on other options having already been processed, I was planning to reorder them accordingly in the td file although I am not a fan of doing this, I think it would be better to addd the information using table-gen's DAG support although that would complicated the table-gen backend.


Yeah, I forgot to remove it, since the normalizer scheme subsumes the functionality.

dang updated this revision to Diff 268869.Jun 5 2020, 10:26 AM
dang marked an inline comment as done.

Updating the patch with the correct merge base

dang updated this revision to Diff 268872.Jun 5 2020, 10:35 AM

This is the good diff

I have a couple of drive-by suggestions, up to @dang and @Bigcheese whether to incorporate them.


I wonder if it's worth creating a .def file for the driver enums, something like:

HANDLE_RELOCATION_MODEL("static", llvm::Reloc::Static)
HANDLE_RELOCATION_MODEL("ropi-rwpio", llvm::Reloc::ROPI_RWPI)
HANDLE_RELOCATION_MODEL("dynamic-no-pic", llvm::Reloc::DynamicNoPIC)

HANDLE_DEBUG_INFO_KIND("line-tables-only", codegenoptions::DebugLineTablesOnly)
HANDLE_DEBUG_INFO_KIND("line-directives-only", codegenoptions::DebugDirectivesOnly)
HANDLE_DEBUG_INFO_KIND("limited", codegenoptions::LimitedDebugInfo)
HANDLE_DEBUG_INFO_KIND("standalone", codegenoptions::FullDebugInfo)

// ...

Then you can use HANDLE_RELOCATION_MODEL in both normalize and denormalize, rather than duplicating the table.

Maybe we can go even further. Can you expand the Values array from the tablegen to include this info? Or rejigger the help text to leverage HANDLE_RELOCATION_MODEL (maybe pass in ValuesDefine<HANDLE_RELOCATION_MODEL>)? The current patch adds a value table; my first suggestion leaves us even; but maybe we can get rid of one.


Seems like could provide this as a default definition, not sure if that seems error-prone?


I prefer the style where the .inc file is responsible for the #undef calls. It avoids having to duplicate it everywhere (and the risk of forgetting it). WDYT?

dang marked 3 inline comments as done.Jun 8 2020, 7:18 AM
dang added inline comments.

I think this suggestion, I can definitely do at least this to generate the necessary switch statements. I don't see why this should be a separate .def file, I can generate this from the tablegen with an extra field named ValuesAssociatedDefinitions or something to that effect. I'll do that now.


Not sure how much of this do you mean? OPTION_WITH_MARSHALLING is a convenience thing that forces users to opt into getting the marshalling definitions. I think it might be better to provide default empty definitions for OPTION_WITH_MARSHALLING_FLAG and friends to achieve a similar effect without needing OPTION_WITH_MARSHALLING. I you mean the actual definitions here they rely on the ArgList & to be named Args which might make it error prone to include these as defaults.


I prefer it too, but the current tablegen backend doesn't generate them for other macros it defines, so I wanted to make it consistent... I could change the existing backend to work that way but I would then need to track all the usages of this across all the llvm-projects which I don't fancy doing. Let me know if you thing it is fine to break with the existing style for this one.

dexonsmith added inline comments.Jun 8 2020, 1:25 PM

Right, I just mean you shouldn't have to define OPTION_WITH_MARSHALLING here.


It's probably pretty easy to clean up the existing ones, just grep for #undef of any of the defined macros and delete the lines. But if you don't have time I'm fine with you leaving it as-is (i.e. it seems good to be consistent with the rest of the backend).

dang updated this revision to Diff 269926.Jun 10 2020, 11:43 AM

This address a bunch of existing feedback and makes the TableGen definitions of the additional information significantly nicer to write.

dexonsmith added inline comments.Jun 10 2020, 12:20 PM

Maybe you can avoid boilerplate with something like this:

ValuesAssociatedDefinitions<["Static", "PIC_", "RPOI", "RWPI", "ROPI_RWPI", "DynamicNoPIC"]>,
MarshallingInfoString<"CodeGenOpts.RelocationModel", "PIC_">,

Note that I shortened the second argument to MarshallingInfoString as well.

@Bigcheese, WDYT?

Also wondering if ValuesAssociatedDefinitions could be shorter. Maybe NormalizedValues (and then NormalizedValuesScope)?


Should we drop parameter names for ArgList, Diags, and DefaultTriple?


This looks like boilerplate that's going to be repeated a lot. I'm curious how many of the enum options will end up like this.

If more than a few, maybe we can add a .inc file that includes these function definitions? Triggered by adding:

AutoNormalizeEnum<"llvm::Reloc::Model", "RelocationModel">,

or something (instead of Normalizer<> and Denormalizer<>).

dang updated this revision to Diff 270117.Thu, Jun 11, 6:19 AM

This addresses the usability concern with having to specify fully scoped definitions for normalized values in the TableGen files.

dang marked an inline comment as done.Thu, Jun 11, 10:46 AM
dang updated this revision to Diff 270445.Fri, Jun 12, 10:14 AM
dang edited the summary of this revision. (Show Details)

Implemented a draft of normalizer generation for simple enum based options

dexonsmith added inline comments.Fri, Jun 12, 5:48 PM

I want to highlight @Bigcheese's comment again so it doesn't get lost. I think a reasonable name would be SPELLING.


The denormalizer should take the StringAllocator as an argument, and only allocate when necessary.


Since this is meant to be included exactly once (as opposed to being an x-macro), perhaps this should be split out into a separate .inc.


It seems unnecessary to generate macros here; you can just generate the code directly...

But looking more closely, I wonder if we really need this generated code at all.

Instead, can we create a table like this in

struct EnumValue {
  const char *Name;
  unsigned Value;

struct EnumValueTable {
  const EnumValue *Table;
  unsigned Size;

static const EnumValue RelocationModelTable[] = {
  {"static", static_cast<unsigned>(llvm::Reloc::Static)},
  {"pic", static_cast<unsigned>(llvm::Reloc::_PIC)},
  // ...

static const EnumValue SomeOtherTable[] = {
  {"name", static_cast<unsigned>(clang::SomeOtherEnumValue)},
  // ...

static const EnumValueTable *EnumValueTables[] = {
  {&RelocationModelTable, sizeof(RelocationModelTable) / sizeof(EnumValue)},
  {&SomeOtherTable, sizeof(SomeOtherTable) / sizeof(EnumValue)},
static const unsigned EnumValueTablesSize =
    sizeof(EnumValueTables) / sizeof(EnumValueTable);

We can then have this code directly in Options.cpp:

static llvm::Optional<unsigned> extractSimpleEnum(
    llvm::opt::OptSpecifier Opt, int TableIndex,
    const ArgList &ArgList, DiagnosticsEngine &Diags) {
  assert(TableIndex >= 0);
  assert(TableIndex < EnumValueTablesSize);
  const EnumValueTable &Table = *EnumValueTables[TableIndex];

  auto Arg = Args.getLastArg(Opt);
  if (!Arg)
    return None;

  StringRef ArgValue = Arg->getValue();
  for (int I = 0, E = Table.Size; I != E; ++I)
    if (ArgValue == Table.Table[I].Name)
      return Table.Table[I].Value;

      << Arg->getAsString(ArgList) << ArgValue;
  return None;

and change the normalizer contract to:

  • return an Optional,
  • take an llvm::opt::OptSpecifier,
  • take an index into a table (set to -1 if there's no relevant table), and
  • be responsible for the boilerplate call to getLastArg.

Simple enums would have NORMALIZER hardcoded to extractSimpleEnum. The handler code that calls it needs new arguments TABLE_INDEX and TYPE but also looks simpler this way:

if (auto MaybeValue = NORMALIZER(Opt##ID, TABLE_INDEX, Args, Diags)) \
  this->KEYPATH = static_cast<TYPE>(*MaybeValue);                    \
else                                                                 \

We could similarly have a single serializeSimpleEnum function in Options.cpp for a hardcoded DENORMALIZER for simple enums:

static const char *serializeSimpleEnum(int TableIndex, unsigned Value) {
  assert(TableIndex >= 0);
  assert(TableIndex < EnumValueTablesSize);
  const EnumValueTable &Table = *EnumValueTables[TableIndex];
  for (int I = 0, E = Table.Size; I != E; ++I)
    if (Value == Table.Table[I].Value)
      return Table.Table[I].Name;

  llvm::report_fatal_error("good error message");
dang updated this revision to Diff 270802.Mon, Jun 15, 10:53 AM
dang marked an inline comment as done.

Implement constant table based marshaling for simple enum based options.

dang marked 2 inline comments as done.Mon, Jun 15, 10:58 AM
dang added inline comments.

Yes I haven't forgotten just have not gotten around to it yet. I just want to do some form of suffix merging with the string the the OptTable receives


I originally wanted to generate the normalizers without the table to allow and arbitrary code fragment to be used as the normalized value (one that is potentially not a constant evaluated expression) and just regenerating the macros was the path of least resistance at the time. Having had a closer look at the existing cases of enum based options there doesn't seem to be a need for this functionality so I went ahead and did this. Although I haven't split the backends yet.

dang updated this revision to Diff 271006.Tue, Jun 16, 2:57 AM
dang marked 5 inline comments as done and an inline comment as not done.

Implement suffix merging to avoid allocations for option spelling when generating the command line. This update also removes unnecessary allocation for std::string based options and for enum based options.

dang updated this revision to Diff 271392.Wed, Jun 17, 9:38 AM

Fixed a couple of bugs.

dang updated this revision to Diff 271790.Thu, Jun 18, 11:25 AM

Address the clang-tidy issues.

dang updated this revision to Diff 271942.Fri, Jun 19, 12:57 AM
dang updated this revision to Diff 272125.Fri, Jun 19, 10:20 AM

Allocate string when denormalizing an option backed by an std::string.

Bigcheese added inline comments.Fri, Jun 19, 2:34 PM



llvm::report_fatal_error is only used for code paths we actually expect to potentially be hit. For programming errors we instead use assert or llvm_unreachable. See .


Just to verify, this doesn't change the output for options that don't have marshalling info, right? Just want to make sure this doesn't change anything for other users of libOption.


What happens if there are none?

dang marked 3 inline comments as done.Sun, Jun 21, 10:28 PM
dang added inline comments.

Yes of course all the previous behavior is in the WriteOptRecordFields lambda capture. This adds an entirely new macro table and leaves the existing one untouched.


There is never going to be none as there currently already is an enum-based option that uses this setup, but it would still be nice to have a guard for that. Getting on it...

dang marked 2 inline comments as done.Sun, Jun 21, 10:38 PM
dang added inline comments.

Actually this does not need any changes, it will correctly report 0 if there are no value tables.

dang updated this revision to Diff 272337.Sun, Jun 21, 11:32 PM

Address code review feedback.

dang marked 11 inline comments as done and an inline comment as not done.Sun, Jun 21, 11:35 PM
dang updated this revision to Diff 272416.Mon, Jun 22, 7:06 AM

Add a KeyPathPrefix field to factor out common key path prefixes, for example all CodeGenOpts.

Bigcheese accepted this revision.Tue, Jun 23, 10:49 AM

LGTM with KeyPathPrefix moved to the patch that actually uses it.

This revision is now accepted and ready to land.Tue, Jun 23, 10:49 AM
This revision was automatically updated to reflect the committed changes.