This is an archive of the discontinued LLVM Phabricator instance.

[C++2a][Preprocessor] Implement p0306 __VA_OPT__ (Comma omission and comma deletion)
AbandonedPublic

Authored by faisalv on Jul 23 2017, 8:48 PM.

Details

Summary

This patch implements an extension to the preprocessor: VA_OPT which expands into its contents if variadic arguments are supplied, or behaves as an empty token if none.

  • currently it is only enabled for C++2a (we could always enable this for various other dialects with the appropriate extension or compatibility warnings)

Given:

#define F(a,...) a #__VA_OPT__(a ## a)  a ## __VA_OPT__(__VA_ARGS__)

A few technicalities (most which were clarified through private correspondence between rsmith, hubert and thomas) are worth mentioning:

  • the call F(,) Does not supply any tokens for the variadic arguments and hence VA_OPT behaves as a placeholder
  • when expanding VA_OPT (for e.g. F(,1) token pasting occurs eagerly within its contents
  • a hash or a hashhash prior to VA_OPT does not inhibit expansion of arguments if they are the first token within VA_OPT.
  • essentially, when a variadic argument is supplied, argument substitution occurs within the contents as does stringification and concatenation - and this is substituted, potentially strinigified and concatenated content (but not rescanned) is inserted into the macro expansions token stream just prior to the entire stream being rescanned and concatenated.

See wg21.link/P0306 for further details.
See the example files for more involved examples.

Would greatly appreciate timely feedback =)

Diff Detail

Repository
rL LLVM

Event Timeline

faisalv created this revision.Jul 23 2017, 8:48 PM
rsmith edited edge metadata.Jul 24 2017, 5:08 PM

I think we need more consideration of what the end result of __VA_OPT__ should be, and particularly how it should affect the SourceLocation data. It seems there are two models:

  1. Model __VA_OPT__ following the specification in the standard: form an imaginary macro argument by expanding the contents, and then substitute that for the __VA_OPT__(...) tokens
  2. Model __VA_OPT__( ... ) as if it is just some tokens "in the way" of regular expansion, with macro expansion just walking into the contents.

In particular, option 1 would give us a macro argument expansion sloc for each __VA_OPT__ where option 2 would not. I think it's somewhat unclear which is the better model, but option 1 is more faithful to the standard wording at least.

If we want to try option 1, I think the way to model it would be to snip the __VA_OPT__(...) out of the replacement list when forming the MacroInfo, to store separate replacement lists for each such sequence, and to build additional arguments in the MacroArgs for each such replacement when we expand a macro using __VA_OPT__.

Have you considered the other option? If so, why do you think this way is better? (I haven't formed an opinion myself yet, but my default inclination would be to follow the model in the standard rather than using a different one, even one that is formally equivalent.)

include/clang/Basic/DiagnosticLexKinds.td
351

Seems weird to mention C99 in a diagnostic about a C++20 feature. Maybe just drop the "C99" here?

355

Including the "( contents" here seems pretty awkward. Can we just use the normal style of a "missing ) for this (" diagnostic, with a note pointing to the '('?

360

"at start within VA_OPT" might be clearer as "at start of VA_OPT argument" or similar.

include/clang/Lex/Preprocessor.h
133

Drop the "(contents)" from the comment here. The identifier we're tracking here is just __VA_OPT__.

include/clang/Lex/VariadicMacroSupport.h
1

Typo "gaurds".

24–27

Please factor out this class (for the __VA_ARGS__ case) and commit it separately.

30–31

Don't look these up by string each time, instead grab the value cached on the preprocessor.

35

Rather than repeating this connection between language mode and this particular feature here, maybe check whether the cached IdentifierInfo* on the preprocessor is null?

57–59

LLVM / Clang style uses CamelCase for names such as this.

61

This should not be called State if what it's tracking is the number of nested parens. Please give this a better name.

74

Doc comments should start with three slashes, not repeat the function name, and be formatted as a sentence (leading capital letter, terminating period).

lib/Lex/TokenLexer.cpp
192

It's a bit weird to use the unexpanded argument here:

#define EMPTY
#define X(a, ...) __VA_OPT__(not) empty
X(1) // empty
X(1, ) // empty
X(1, EMPTY) // not empty

... but I think this ultimately is the right thing. Especially when you consider that #__VA_ARGS__ would produce "" for the first two cases and "EMPTY" for the third. Maybe extend your comment with the empty macro case?

faisalv updated this revision to Diff 109883.Aug 5 2017, 10:14 AM
faisalv marked 7 inline comments as done.

Hi Richard,

This patch attempts to incorporate all the changes you requested.

You asked for some consideration on the rationale for sticking with my approach (as opposed to the alternative you posed), so here are my reasons (FWTW) for not modeling the VA_OPT as a macro argument:

  • the standard does not specify it as behaving exactly as a macro argument (specifically no rescanning, after placemarker token removal) - and so i would have to special case the argument expansion logic - which currently composes well by just launching a tokenlexer on the argument.
  • it would be the only argument that would need to have *other* arguments substituted into it
  • unlike every other argument it would not be represented by just 1 token
  • in regards to the source-location mapping, my sense is that the ExpansionInfo that represents the various expansions might need to add a representation for this 'fourth' kind of expansion (since expanded args are represented by a valid LocStart but an invalid LocEnd, and since this is more than one token, things could get hairy)
  • it might require complicating 'Parsing' the macro definition (and the MacroArgs, MacroInfo objects) and the general tokenLexer, ExpandFunctionArguments() logic more than any gained conceptual elegance (in my opinion)

Thanks Richard - Look forward to your review and thoughts!

majnemer added inline comments.
include/clang/Lex/TokenLexer.h
169

I think llvm::ArrayRef<Token>() can just be llvm::None.

This generally looks good. I have some specific ideas about moving more of the recognition of the __VA_OPT__ ( ... ) pattern into VAOptDefinitionContext, so if you prefer you can go ahead with something like this patch and I'll do the refactoring afterwards.

include/clang/Basic/DiagnosticLexKinds.td
360

'##' cannot appear at end of __VA_OPT__ argument please, to match the change to the previous diagnostic.

include/clang/Lex/VariadicMacroSupport.h
1

nit: "state machines" and "scope guards" should not be hyphenated

69

Please add a documentation comment explaining what this is. Maybe something like:

"A class for tracking whether we're inside a VA_OPT during a traversal of the tokens of a variadic macro."

119

And here:

"A class for tracking whether we're inside a VA_OPT during traversal of the tokens of a macro as part of macro expansion."

lib/Lex/PPDirectives.cpp
2384–2418

I think I would be inclined to move more of this logic into VAOptDefinitionContext, maybe a TokenState process(const Token&) member that returns an enum saying whether the token should be handled normally, skipped, diagnosed due to missing lparen, diagnosed due to nested va_opt, ...

lib/Lex/Preprocessor.cpp
130

Have you considered allowing this as an extension in other language modes?

lib/Lex/TokenLexer.cpp
189–195

Please make this a member function of MacroArgs (passing in the MacroInfo*).

275–284

Is this extra check really necessary? We only set PasteBefore to true if there was a hashhash token in the token stream prior to the __VA_OPT__, so it seems like we should be able to assume the token is still there now.

285

I think it might be more obvious to move this hasStringifyOrCharifyBefore check upwards, and define a bool IsPlacemarkerToken = !NumVAOptTokens && !VCtx.hasStringifyOrCharifyBefore();, then conditionalize these special cases on IsPlacemarkerToken not !NumVAOptTokens.

321–322

ConcatenatedVAOPTResultToks.back() = LHS;

357

The first part of this while condition has no effect. The second part has an effect, but it's tremendously confusing to put this check here, when it corresponds only to the continue on line 251. Can you move the I != E check to line 251 and make this just be a while (true) loop?

(I also think that having the normal fallthrough path through the loop be an exit path is far from ideal. Moving to a process(Token) mechanism on the VAOpt...Context class should help with this.)

418–420

This seems excessive. Just "unexpected ## in result tokens" would do, I think.

572–575

This is a fun corner case. :)

723–725

Please go ahead and do that refactoring now :)

faisalv abandoned this revision.Oct 15 2017, 7:04 PM
faisalv marked 12 inline comments as done.

committed as r315840.
https://reviews.llvm.org/rL315840