Page MenuHomePhabricator

[clang][cli] Port CodeGen option flags to new option parsing system
ClosedPublic

Authored by jansvoboda11 on Jul 15 2020, 11:08 AM.

Diff Detail

Event Timeline

dang created this revision.Jul 15 2020, 11:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 15 2020, 11:08 AM
jansvoboda11 commandeered this revision.Nov 25 2020, 12:41 AM
jansvoboda11 added a reviewer: dang.
jansvoboda11 added a subscriber: jansvoboda11.

Taking over this patch as Daniel is no longer involved.

jansvoboda11 retitled this revision from Port CodeGen option flags to new option parsing system to [clang][cli] Port CodeGen option flags to new option parsing system.
jansvoboda11 edited the summary of this revision. (Show Details)

Rebase, undo unnecessary moves, port recently added options, add TODOs (WIP)

Introduce OptOutPositiveFFlag, simplify a couple of boolean options with multiclasses

Add squashed commits

Port AAPCS flags, add tests

Remove whitespace change

Revert accidental help change, do not AlwaysEmit

Herald added a project: Restricted Project. · View Herald TranscriptNov 26 2020, 12:57 AM

This is now ready to be reviewed. If you see a deleted option in the diff, it's either because it was moved closer to its counterpart, or because it's now generated by a multiclass.
I've added a bunch of TODOs I plan to address in a future patch, most of them aim to compress two options into a single multiclass that describes their relationship (e.g. OptOutPositiveFlag).
I'd like to make OptInFFlag et al. modular with something like OptInFlag that does not imply Flags<[CC1Option]> and Group<f_Group> and can be marked with IsPositive or IsNegative.

This is now ready to be reviewed. If you see a deleted option in the diff, it's either because it was moved closer to its counterpart, or because it's now generated by a multiclass.
I've added a bunch of TODOs I plan to address in a future patch, most of them aim to compress two options into a single multiclass that describes their relationship (e.g. OptOutPositiveFlag).
I'd like to make OptInFFlag et al. modular with something like OptInFlag that does not imply Flags<[CC1Option]> and Group<f_Group> and can be marked with IsPositive or IsNegative.

I think it will be easier to review if we work out the "right" scheme, land prep patches so the existing options match that scheme, and then land the CodeGen options with whatever additional functionality is useful. I think there are really two things to decide on for the scheme: what are the axes, and how are they represented by multiclass?

For the axes, I think this patch changes them to:

  • matching of command-line+storage polarity: do the command-line and the storage have the same polarity? ("opt-in" means: "default==false" == "-fno-flag is implied").
  • default storage value: does the command-line default correspond to false or true?

I find this a bit hard to reason about. I feel like "opt-in" vs. "opt-out" should refer purely to command-line option behaviour, independently of storage implementation. Then the second axis can map that somehow to storage / marshalling.

IIUC, the previous / in-tree axes are:

  • command-line behaviour: is "flag" opt-in or opt-out in -cc1? (opt-in implies -fno-flag, opt-out implies -fflag)
  • default storage value: does the command-line default correspond to false or true? (I think "Positive" means "default is false"?)

For me, that's a bit easier to reason about since the axes seem quite independent. Another option (I'm not sure it's as good) is:

  • command-line behaviour: is "flag" opt-in or opt-out in -cc1? (opt-in implies -fno-flag, opt-out implies -fflag)
  • flag storage value: does "flag" being on correspond to true or false?

For the second axis, I don't find the existing "positive" vs. "negative" intuitive. Maybe it could be DefaultStoredAsFalse vs. DefaultStoredAsTrue (or, if we changed the axis, OnMeansTrue vs OnMeansFalse) or maybe there are better/shorter names.

WDYT?

For the multiclass, I don't have a strong opinion, as long as it's declarative and clear. One possibility is to enumerate Out{Out,In}FFlagDefaultStoredAs{False,True} (or Out{Out,In}FFlag{,DefaultStoredAsTrue}); I think this would be fine. It's also fine to take the axes as arguments in a single multiclass, or have Opt{Out,In}FFlag and take the second axis as an argument.

clang/include/clang/Driver/Options.td
250

Whatever names we go with, I think it'll be cleaner to rename existing multiclasses first, and then add in the new multiclasses and/or functionality. That minimizes how inconsistency in-tree (compared to landing new options with a different naming scheme, and then fixing the old options as a follow-up).

261–263

I find the terminology here unintuitive. the thing being declared is when using this multiclass is a command-line option, so I feel like the word "option" should refer to -fuse-jump-tables, and NoUseJumpTables is a storage location / keypath for the option. In that case, -fuse-jump-tables is enabled by default, can be disabled by -fno-use-jump-tables, and this default state is stored in NoUseJumpTables with the value false.

264–265

I'm finding the suggested rename a bit confusing; per my inline comment above, this seems like it should be OptOut...FFlag, since (per my terminology) the command-line option -fuse-jump-tables is enabled by default.

268

Should the new OptOutPositiveFFlag have disablers as well?

dexonsmith requested changes to this revision.Nov 30 2020, 5:14 PM
This revision now requires changes to proceed.Nov 30 2020, 5:14 PM
jansvoboda11 added a comment.EditedDec 1 2020, 6:00 AM

I wanted to defer the resolution of the todos after all existing patches (https://reviews.llvm.org/search/query/wPZcRaH7zHgu/#R) are upstreamed to avoid conflicts when rebasing and refactoring. Looking at the patches more closely, only two of them seem to be dealing with boolean flags, so we might as well come up with the right naming convention and put it in a prep patch. Below are my notes and though process.

The naming convention used upstream works like this:

+-----------------------------------+-------------+
|        Keypath value with:        |             |
+-----------+-----------+-----------+  Multiclass |
|     ''    |  '-foo'   | '-no-foo' |             |
+-----------+-----------+-----------+-------------+
|   false   |   false   |   true    | OptOutFFlag |
|   false   |   true    |   false   | OptInFFlag  |
|   true    |   false   |   true    |             |
|   true    |   true    |   false   |             |
+-----------+-----------+-----------+-------------+

To decide what multiclass to use, you ask:

  • Both Opt{Out,In}FFlag multiclasses default the keypath to false. What command-line flag makes it possible to change the default value of the keypath?
    • OptIn = -ffoo
    • OptOut = -fno-foo

This patch currently works like so:

+-----------------------------------+---------------------+
|        Keypath value with:        |                     |
+-----------+-----------+-----------+      Multiclass     |
|     ''    |  '-foo'   | '-no-foo' |                     |
+-----------+-----------+-----------+---------------------+
|   false   |   false   |   true    | OptInNegativeFFlag  |
|   false   |   true    |   false   | OptInPositiveFFlag  |
|   true    |   false   |   true    | OptOutNegativeFFlag |
|   true    |   true    |   false   | OptOutPositiveFFlag |
+-----------+-----------+-----------+---------------------+

To decide what multiclass to use, you'd ask two questions:

  • What is the semantics of the keypath?
    • Positive = positive, e.g. DoThis
    • Negative = negative, e.g. NoDoThis
  • What is the default value?
    • OptIn = keypath defaults to false; it can be set to true with -ffoo if it has positive semantics, or with -fno-foo if it has negative semantics
    • OptOut = keypath defaults to true; it can be set to false with -fno-foo if it has positive semantics, or with -ffoo if it has negative semantics

I agree the axes and their naming is confusing. So how to make the behavior and usage clearer?

I think it's necessary to drop the assumption that exactly one of the flags is available in -cc1. Some of the record pairs I've marked with a todo don't conform to this scheme (e.g. f[no_]sanitize_address_use_after_scope) but it would still be nice to tie them together via a multiclass. I think it would now make sense to drop the Opt{In,Out} wording.

I think the simplest way to think about this is to encode these properties:

  • What is the keypath default? Is it true or false?
  • Which flag is used to invert the default keypath value? Is it -ffoo or -fno-foo?

How about something like this?

+-----------------------------------+----------------------------------------------------------+
|        Keypath value with:        |                                                          |
+-----------+-----------+-----------+                         Multiclass                       |
|     ''    |  '-foo'   | '-no-foo' |                                                          |
+-----------+-----------+-----------+----------------------------------------------------------+
|   false   |   false   |   true    | BoolOption<DefaultsToFalse, InvertedByNegativeFlag, ...> |
|   false   |   true    |   false   | BoolOption<DefaultsToFalse, InvertedByPositiveFlag, ...> |
|   true    |   false   |   true    | BoolOption<DefaultsToTrue, InvertedByPositiveFlag, ...>  |
|   true    |   true    |   false   | BoolOption<DefaultsToTrue, InvertedByNegativeFlag, ...>  |
+-----------+-----------+-----------+----------------------------------------------------------+

I think it's really similar to the approach you outlined, but takes more keypath-centric approach.
To decide whether we can get away with one parametrized multiclass, I'd need to dig a bit deeper into TableGen.

Another problem to solve: how to declare a mixin that only applies to one of the two generated records?

Option 1:

Linear list of arguments with default values:

multiclass BoolOption<??? defaults_to, ??? inverted_by, string name, string keypath, list<OptionFlag> pos_flags = [], string pos_help = "", list<OptionFlag> neg_flags = [], string neg_help ="", list<OptionFlag> both_flags = [], string both_help_suffix = ""> { ... }

The downside here is that this gets hard to understand:

defm inline_line_tables : BoolGOption<DefaultsToFalse, InvertedByNegativeFlag,
  "inline-line-tables", "CodeGenOpts.NoInlineLineTables", [], "", [CC1Option], "Don't emit inline line tables", [CoreOption], "">;

This could be solved by named arguments, but I'm not sure TableGen supports them.

Option 2:

Group the pos_*, neg_*, both_* arguments into PositiveFlag, NegativeFlag, BothFlags bags:

defm inline_line_tables : BoolGOption<DefaultsToFalse, InvertedByNegativeFlag,
  "inline-line-tables", "CodeGenOpts.NoInlineLineTables", PositiveFlag<>,
  NegativeFlag<[CC1Option], "Don't emit inline line tables">, BothFlags<[CoreOption]>>;

@dexonsmith What do you think about DefaultsTo{True,False} and InvertedBy{Negative,Positive}Flag?

clang/include/clang/Driver/Options.td
250

Fair enough.

268

Probably yes, it's mentioned in a todo above. I wanted to add this only when an option needs it, to minimize untested code paths.

@Bigcheese brought up that if you wanted to mimic the current logic in OptInFFlag that automatically creates the -cc1 option with my current proposal, you'd need to remember to pass it to the right Flags (e.g.: use NegativeFlag<[CC1Option]> when using InvertedByNegativeFlag). My hope is that this could be handled automatically by a thin wrapper around BoolOption.

I think DefaultsTo{True,False} and InvertedBy{Negative,Positive}Flag makes the axes clear, it's way better; thanks for thinking this through.

Thinking out loud: I'm still a bit resistant to making this keypath-centric. The reason is that llvm::Option is a library for command-line options, where command-options are declared, and the "keypath" feature is optional; changing the core concept in clang's usage of llvm::Option feels incongruent somehow (especially since clang is the main client). On the other hand (in favour of being keypath-centric), we're bundling multiple command-line options into a single conceptual declaration, doing doing one declaration per keypath. I guess I'm okay with the new semantics. Maybe part of what makes it work is that we're not declaring a "flag" anymore.

That said, I think you can go further:

defm inline_line_tables : BoolGOption<DefaultsToFalse, InvertedByNegativeFlag,
  "inline-line-tables", "CodeGenOpts.NoInlineLineTables", PositiveFlag<>,
  NegativeFlag<[CC1Option], "Don't emit inline line tables">, BothFlags<[CoreOption]>>;

If this is keypath-centric, then a better name for the declaration would be no_inline_line_tables, which matches the keypath. This gives:

defm no_inline_line_tables : BoolOption<DefaultsToFalse, InvertedByNegativeFlag,
  "inline-line-tables", "CodeGenOpts.NoInlineLineTables", PositiveFlag<>,
  NegativeFlag<[CC1Option], "Don't emit inline line tables">, BothFlags<[CoreOption]>>;

... which doesn't seem quite right yet, since the keypath isn't front and centre.

Here's an idea that's a bit more keypath-centric, that connects the polarity of the option with the option spelling, and that avoids some of the repetitive clutter (I added -something to show the positive case):

defm no_inline_line_tables : BoolOption<"CodeGenOpts.NoInlineLineTables", DefaultsToFalse,
  ChangedByFlag_No<"inline-line-tables", [CC1Option, CoreOption],
                   "Don't emit inline line tables">,
  CancelFlag<[CoreOption]>>;

defm something : BoolOption<"CodeGenOpts.Something", DefaultsToFalse,
  ChangedByFlag<"something", [CC1Option,CoreOption], "Do something">,
  CancelFlag<[CoreOption]>>;

This is similar, but might be easier to implement:

defm no_inline_line_tables : BoolOption<"CodeGenOpts.NoInlineLineTables", DefaultsToFalse,
  ChangedBy_No<"inline-line-tables", [CC1Option, CoreOption], [CoreOption],
               "Don't emit inline line tables">>;

defm something : BoolOption<"CodeGenOpts.Something", DefaultsToFalse,
  ChangedBy<"something", [CC1Option,CoreOption], [CoreOption],
            "Do something">>;

I think both of these syntaxes allow you to think primarily about the keypath and the matching flag.

WDYT?

clang/include/clang/Driver/Options.td
264

Since you're planning to leave this to-do behind, I have a few nits:

  • I think it should be capitalized (TODO:) since I feel like I've seen that more often here.
  • It should probably be its own paragraph.
  • Complete sentences are a bit nicer (start with a capital and end with a period).
  • The wording "if necessary" suggests we don't know if we'll need to do this, in which case having a to-do would just be adding noise. But as you mentioned we do know. I think it's better to be clear about it.
// TODO: Add support for ImpliedByAnyOf once it can be
// tested with an option that depends on it.
268

Ah, thanks, missed that.

Rebase, use BoolOption

This is now ready for review.

dexonsmith accepted this revision.Dec 10 2020, 6:16 PM

Thanks for working through this! Updated patch LGTM, with one nit.

clang/lib/Frontend/CompilerInvocation.cpp
325

Please leave this whitespace change out, or commit separately.

This revision is now accepted and ready to land.Dec 10 2020, 6:16 PM

Remove whitespace change, rebase

This revision was landed with ongoing or failed builds.Dec 16 2020, 1:01 AM
This revision was automatically updated to reflect the committed changes.

One thing this patch does, is make decisions about default behavior static. Meaning, the option behavior cannot depend on other options; specifically, it can't be based on the triple, which allows target-specific customization. PS4 certainly has cases where our defaults are different from the usual ones, and I'd kind of think that was true for other targets as well.

Sorry I didn't notice this patch before, our CI has just tried to merge it. We've patched it up in our main branch but I'm not sure what the upstream intent is here.

One thing this patch does, is make decisions about default behavior static. Meaning, the option behavior cannot depend on other options; specifically, it can't be based on the triple, which allows target-specific customization. PS4 certainly has cases where our defaults are different from the usual ones, and I'd kind of think that was true for other targets as well.

Sorry I didn't notice this patch before, our CI has just tried to merge it. We've patched it up in our main branch but I'm not sure what the upstream intent is here.

BoolOption doesn't support dynamic defaults, but BoolOptionBase does. Here's an example of how to use it:

defm legacy_pass_manager : BoolOptionBase<"legacy-pass-manager",
  "CodeGenOpts.LegacyPassManager", Default<"!static_cast<unsigned>(LLVM_ENABLE_NEW_PASS_MANAGER)">,
  FlagDef<PosFlag, true, [], "Use the legacy pass manager in LLVM">,
  FlagDef<NegFlag, false, [], "Use the new pass manager in LLVM">,
  FlagDefSuffix<[CC1Option], "">, "f">, Group<f_clang_Group>;

For depending on -triple, you should be able to do something like:

defm this_option : BoolOptionBase<...,
    Default<"getDefaultForThisOption(TargetOptions->Triple)",
    ...>

if I understand your use case correctly (it's important to define this_option after the definition of triple).

Maybe it's worth defining another set of derived multiclasses, something like:

defm legacy_pass_manager : DynamicBoolFOption<"legacy-pass-manager",
  "CodeGenOpts.LegacyPassManager", Default<"!static_cast<unsigned>(LLVM_ENABLE_NEW_PASS_MANAGER)">,
  TrueFlag<PosFlag, [], "Use the legacy">,
  FalseFlag<NegFlag, [], "Use the new">,
  BothFlags<[], " pass manager in LLVM">>;

WDYT?

Maybe it's worth defining another set of derived multiclasses, [...]

Another possibility of course is to make Bool*Option work this way (removing the ChangedBy, ResetBy, DefaultsToTrue, and DefaultsToFalse helpers).

(Either way, if we do add better support for dynamic defaults, I think the names SetsTo{True,False} are more clear than my previous example / suggestion {True,False}Flag.)

Thanks Duncan! I (or someone) will play around with that and see what we need to do. May take a little while, as we're about to freeze a release and then go on break for two weeks, but good to know there's a straightforward path.

Heads-up: Looks like one of these changes, likely this one for CodeGen, made sanitizer output way larger: https://bugs.chromium.org/p/chromium/issues/detail?id=1161230

I verified it's due to this change. https://bugs.chromium.org/p/chromium/issues/detail?id=1161230#c15 has a stand-alone repro.

I reverted this (and a bunch of stuff that landed on top of it) in 7ad666798f12456d9e663e763e17e29007c3728d for now.

...and two more in 1876a2914fe0bedf50f7be6a305f5bf35493e496. Sorry for the churn!

rnk added a subscriber: rnk.Tue, Jan 5, 5:22 PM

Reposting my comment here, since this is where the discussion was:

I think this change broke -gline-tables-only functionality. I compared object files before and after this change. The new object file had a large .debug_loc section, which is only present when full debug info is enabled, and is not desired in this case.

@thakis Thanks for the heads-up and sorry for the inconvenience.

@rnk You're right. This patch accidentally changed Group<g_flags_Group> to Group<g_Group> for -g[no-]column-info. In the linked build, this caused expansion of the driver flag -gcolumn-info to cc1 flag -debug-info-kind=limited instead of the correct -debug-info-kind=tables-only.

I've fixed the Group for -g[no-]column-info and for other two flags that were affected in a similar way in. https://reviews.llvm.org/rGce8c59e6af487f0b8786ae921aa926341f0ae04f

Let me know if there are any outstanding issues.