Page MenuHomePhabricator

[Annotation] Allow parameter pack expansions in annotate attribute
Needs ReviewPublic

Authored by steffenlarsen on Nov 23 2021, 7:01 AM.

Details

Summary

These changes make the Clang parser recognize parameter pack expansion in attribute arguments and consume them if the pack expansion is in the last argument and the last argument is a variadic expression argument that has been marked as supporting pack expansion.
Currently only the last argument of the clang::annotate attribute will support parameter pack expansions. The parser will issue an error diagnostic if other attributes are passed parameter pack expansion in their last argument.

Diff Detail

Event Timeline

steffenlarsen created this revision.Nov 23 2021, 7:01 AM
steffenlarsen requested review of this revision.Nov 23 2021, 7:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 23 2021, 7:01 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
Tyker added a comment.Nov 23 2021, 9:34 AM

This seems like a good change to me. but i don't think my approval is enough

Aaron is way more familiar with this code than I am, but I've got some suggestions for more tests in the parsing, we need to make sure that we handle pack expansion completely here.

clang/lib/Parse/ParseDecl.cpp
504

Do you have a test for something that isn't a pack followed by an ellipsis? What is the behavior there? I'm also concerned that there doesn't seem to be anything here that handles complex-pack expansions (like folds), can you test that as well?

Also, why is this 'if' not up on line 429 (that is, outside of this 'else'?). I believe ParseAssignmentExpression is going to muck-up the tokens, so I'm not particularly sure what is going to happen here?

clang/test/Parser/cxx0x-attributes.cpp
262

What is the point of this change?

aaron.ballman requested changes to this revision.Dec 1 2021, 11:40 AM
aaron.ballman added inline comments.
clang/include/clang/Basic/Attr.td
549–551

From a design perspective, I think this is a property of the parameters of an attribute rather than of an attribute itself. So I'd rather not add a bit to the Attr class, but instead modify the Argument class. There are a few approaches to that which I can think of, and I'm not certain which is the best approach.

  1. Introduce ExpressionPackArgument and attributes can opt into using it if they want to support packs.
  2. Add this bit to VariadicExprArgument and attributes can opt into using it if they want to support packs.
  3. Change VariadicExprArgument to also mean "allows packs" and all attributes using this automatically support packs.

I think that my conservative preference is for #2, but my intuition is that #3 is where we probably ultimately want to get to.

There are definitely some open questions with this approach that we'll have to figure out:

  • Can you mix packs with variadics in the same attribute?
  • Can you have two pack arguments in the same attribute?
  • Does the tablegen design also seem likely to work when we want to introduce type packs instead of value packs?

I think the safest bet is to be conservative and not allow mixing packs with variadics, and not allowing multiple packs. We should be able to diagnose that situation from ClangAttrEmitter.cpp to help attribute authors out. However, it'd be worth adding a FIXME comment to that diagnostic logic asking whether we want to relax the behavior at some point. But if you feel strongly that we should support these situations initially, I wouldn't be opposed.

This revision now requires changes to proceed.Dec 1 2021, 11:40 AM

I think the safest bet is to be conservative and not allow mixing packs with variadics, and not allowing multiple packs. We should be able to diagnose that situation from ClangAttrEmitter.cpp to help attribute authors out. However, it'd be worth adding a FIXME comment to that diagnostic logic asking whether we want to relax the behavior at some point. But if you feel strongly that we should support these situations initially, I wouldn't be opposed.

From a practical perspective, the only difference between a list of ExprArgument and VariadicExprArgument is the enforced arity.... I would also consider seeing if we can just generalize the pack logic to work for BOTH cases. I'd be all for limiting the pack/ExprArgument list to be the 'last' argument to simplify it.

steffenlarsen added inline comments.Dec 2 2021, 2:35 AM
clang/include/clang/Basic/Attr.td
549–551

Having ParmExpansionArgsSupport in Attr would allow attributes to populate with parameter packs across argument boundaries. It seems more permissive, but frankly I have no attachment to it, so I am okay with tying it to Argument instead.

Of the mentioned options, I agree #2 and #3 are the preferred paths. For simplicity I prefer #2, though I will have to figure out how best to add a check on individual arguments, which @erichkeane's suggestion to limit the attributes to only allowing parameter packs in variadic arguments as the last argument would help with. On the other hand, the population logic for most of the relevant attributes are explicitly defined in clang/lib/Sema/SemaDeclAttr.cpp (which will have to be changed with #3) so being more permissive - like the current version of the patch is - gives more freedom to the attributes.

Does the tablegen design also seem likely to work when we want to introduce type packs instead of value packs?

I am not sure about how tablegen would take it, but the parser doesn't currently parse more than a single type argument, so I don't see a need for allowing type packs at the moment.

I think the safest bet is to be conservative and not allow mixing packs with variadics, and not allowing multiple packs. We should be able to diagnose that situation from ClangAttrEmitter.cpp to help attribute authors out. However, it'd be worth adding a FIXME comment to that diagnostic logic asking whether we want to relax the behavior at some point. But if you feel strongly that we should support these situations initially, I wouldn't be opposed.

If we limit packs to VariadicExprArgument - like annotate does in the current state of this patch - having multiple packs in the same argument is simple to handle. Both are represented as Expr and the variadic argument will simply contain two expressions until the templates are instantiated and the packs are expanded. This is also a feature I would like to keep, though it should be possible to work around the limitation if we conclude it can't stay.

clang/lib/Parse/ParseDecl.cpp
504

Do you have a test for something that isn't a pack followed by an ellipsis? What is the behavior there? I'm also concerned that there doesn't seem to be anything here that handles complex-pack expansions (like folds), can you test that as well?

Good question! I would expect it to fail in kind during template instantiation, or earlier if it is not an expression, but I will test it out once we converge on a new solution.

Also, why is this 'if' not up on line 429 (that is, outside of this 'else'?). I believe ParseAssignmentExpression is going to muck-up the tokens, so I'm not particularly sure what is going to happen here?

I originally tried having the ActOnPackExpansion earlier, but it did indeed cause trouble. However, if my understanding of the previous two branches is correct, they are:

  1. Type argument. This could be consuming the ellipsis as well, though I suspect it needs another path given that it does not seem to produce any expressions. Note however that it currently seems to stop argument parsing after reading the first type (confirmed by the FIXME) so parameter pack logic for this branch would probably be more suitable once it actually allows attributes to take multiple types.
  2. Identifier argument. Correct me if I am wrong, but I don't think this can be populated by a parameter pack.

I could add the diagnostic to these if ellipsis is found, but if my assumptions are correct it doesn't make much sense to expand these branches beyond that.

clang/test/Parser/cxx0x-attributes.cpp
262

It makes more sense for the new test cases to get expressions rather than types and since Ts wasn't used for its contents it did not seem like an intrusive change.

steffenlarsen added inline comments.Dec 2 2021, 2:40 AM
clang/lib/Parse/ParseDecl.cpp
504

I'm also concerned that there doesn't seem to be anything here that handles complex-pack expansions (like folds), can you test that as well?

Sorry, I forgot to address this. I have not tested it, but I would expect complex-pack expansions like folds to be at expression-level. I will make sure to test this as well.

aaron.ballman added inline comments.Dec 2 2021, 8:34 AM
clang/include/clang/Basic/Attr.td
549–551

Having ParmExpansionArgsSupport in Attr would allow attributes to populate with parameter packs across argument boundaries.

That's what I want to avoid. :-D I would prefer that the arguments have to opt into that behavior because the alternative could get ambiguous. I'd rather we do some extra work to explicitly support that situation once we have a specific attribute in mind for it.

... which @erichkeane's suggestion to limit the attributes to only allowing parameter packs in variadic arguments as the last argument would help with.

I agree with that suggestion, btw.

I am not sure about how tablegen would take it, but the parser doesn't currently parse more than a single type argument, so I don't see a need for allowing type packs at the moment.

Fair point, and yet another reason not to allow type packs initially. :-)

If we limit packs to VariadicExprArgument - like annotate does in the current state of this patch - having multiple packs in the same argument is simple to handle. Both are represented as Expr and the variadic argument will simply contain two expressions until the templates are instantiated and the packs are expanded.

Hmm, let's make sure we're on the same page. The situations I think we should avoid are:

// Mixing variadics and packs.
let Args = [VariadicExprArgument<"VarArgs">, VariadicExprArgument<"Pack", /*AllowPack*/1>];

// Multiple packs.
let Args = [VariadicExprArgument<"FirstPack", /*AllowPack*/1>, VariadicExprArgument<"SecondPack", /*AllowPack*/1>];
erichkeane added inline comments.Dec 2 2021, 8:40 AM
clang/include/clang/Basic/Attr.td
549–551

I agree, BOTH of those should be prohibited. Any amount of mixing of VariadicExprArgument (or mixing of any of the variadics IMO) need to be illegal. Otherwise the ambiguous nature of it is untenable.

While I like the possibility of
let Args = [ExprArgument<"Arg1">, ExprArgument<"Arg2">];

ALSO accepting a pack, I think we end up getting a little too 'hacky' with how to store this in the non-instantiated version.

steffenlarsen edited the summary of this revision. (Show Details)

Thank you both for the reviews! Consensus seems to be having support for pack expansion at argument level for now and let more complicated logic be added when there is an actual need. I have applied these changes as I understood them and have added/adjusted the requested test cases. Please have a look and let me know what you think. 😄

That's what I want to avoid. :-D I would prefer that the arguments have to opt into that behavior because the alternative could get ambiguous. I'd rather we do some extra work to explicitly support that situation once we have a specific attribute in mind for it.

I'm okay with that! I have made the changes to only allow it in the last argument if it is marked as supporting pack expansion. Note that it assumes that there are no other variadic parameters except the last one, as suggested.

Hmm, let's make sure we're on the same page. The situations I think we should avoid are:

// Mixing variadics and packs.
let Args = [VariadicExprArgument<"VarArgs">, VariadicExprArgument<"Pack", /*AllowPack*/1>];

// Multiple packs.
let Args = [VariadicExprArgument<"FirstPack", /*AllowPack*/1>, VariadicExprArgument<"SecondPack", /*AllowPack*/1>];

Oh, I see what you mean. Yeah I agree for sure. I think the only exceptions we currently have to this is omp attributes, but given that they are pragmas they should be nowhere close to this.

erichkeane added inline comments.Dec 6 2021, 8:51 AM
clang/include/clang/Basic/DiagnosticParseKinds.td
724

I don't really see why this is required? I would think the 722 would be all you would need.

clang/lib/Parse/ParseDecl.cpp
498

I don't think this should be diagnosed here, and I don't think it is 'right'. I think the ClangAttrEmitter should ensure that the VariadicExprArgument needs to be the 'last' thing, but I think that REALLY means "supports a pack anywhere inside of it".

See my test examples below, I don't think this parsing is sufficient for that.

clang/test/Parser/cxx0x-attributes.cpp
270

what about:
void foz [[clang::annotate("D", Is)]] ();

I would expect that to error.

Another test I'd like to see:

void foz[[clang::annotate("E", 1, 2, 3, Is...)]]

Also, I don't see why if THAT works, that:
void foz[[clang::annotate("E", 1, Is..., 2, 3)]]

shouldn't be allowed as well.

clang/utils/TableGen/ClangAttrEmitter.cpp
1166–1168

The rule of 'only the last argument is allowed to support a pack' should be in the attribute emitter.

steffenlarsen added inline comments.Dec 7 2021, 12:51 AM
clang/include/clang/Basic/DiagnosticParseKinds.td
724

Intention was to make a distinction between the two cases, since we had the granularity to do so. I.e. clang::annotate would never use 722 as it explicitly states that the attribute doesn't support pack expansion (in the last argument) which is untrue for clang::annotate, but if a user was to do pack expansion on the first argument of clang::annotate they would get this error telling them that the given argument cannot accept the pack expansion.

clang/lib/Parse/ParseDecl.cpp
498

That is also the intention here. All it checks is that we are in or beyond the last argument of the attribute. For example, clang::annotate will return 2 from attributeNumberOfArguments as VariadicExprArgument only counts as a single argument. Thus, any pack expansion expressions parsed after the first will be accepted. I do agree though that the error message may be a little confusing for users.

I will add the suggested tests and rethink the diagnostics.

The new revision does the following:

  1. Revisits the diagnostics. Now parameter packs will cause the same error on attributes that do not support parameter packs in arguments, while attributes that support it will cause an error if parameter packs are used in the incorrect argument, specifying which argument should be the earliest for it.
  2. Adds additional tests to ensure that variadic arguments supporting packs also allow pack expansion intermingled with other arguments.
  3. Adds assertions to attribute generation to check that attributes only ever support packs in the last variadic argument and that there are no other variadic arguments if they do.
steffenlarsen added inline comments.Dec 7 2021, 4:35 AM
clang/test/Parser/cxx0x-attributes.cpp
270

what about:
void foz [[clang::annotate("D", Is)]] ();

I would expect that to error.

So would I, but it seems that the parser and annotate is more than happy to consume this example, even in the current state of Clang: https://godbolt.org/z/rx64EKeeE
I am not sure as to why, but seeing as it is a preexisting bug I don't know if it makes sense to include it in the scope of this patch.

Another test I'd like to see:

void foz[[clang::annotate("E", 1, 2, 3, Is...)]]

Also, I don't see why if THAT works, that:
void foz[[clang::annotate("E", 1, Is..., 2, 3)]]

shouldn't be allowed as well.

They seem to work as expected. I have added these cases.

clang/utils/TableGen/ClangAttrEmitter.cpp
1166–1168

The rule of 'only the last argument is allowed to support a pack' should be in the attribute emitter.

I have added some assertions around the area where attributes are generate, as that carries more surrounding information. Is that approximately what you had in mind?

erichkeane added inline comments.Dec 7 2021, 5:56 AM
clang/lib/Parse/ParseDecl.cpp
495

So I was thinking about this overnight... I wonder if this logic is inverted here? Aaron can better answer, but I wonder if we should be instead detecting when we are on the 'last' parameter and it is one of these VariadicExprArgument things (that accept a pack), then dropping the parser into a loop of:

while (Tok.is(tok::comma)){

Tok.Consume();
ArgExpr = CorrectTypos(ParseAssignmentExpr(...));

}
// finally, consume the closing paren.

WDYT?

steffenlarsen added inline comments.Dec 8 2021, 2:50 AM
clang/lib/Parse/ParseDecl.cpp
495

The parsing of attribute arguments is already this lenient. It does not actually care how many arguments the attribute says it takes (unless it takes a type and then it will only currently supply _one_ argument). It simply consumes as many expressions it can before reaching the end parenthesis, then leaves the attributes to consume them in clang/lib/Sema/SemaDeclAttr.cpp.
As a result we do not need any additional work here to handle variadic arguments, but it also means that we have to introduce the restrictions that we can only allow packs in the last argument and allow only that argument to be variadic, as otherwise we have no way of knowing when and where the packs pass between argument boundaries.

erichkeane added inline comments.Dec 8 2021, 6:09 AM
clang/lib/Parse/ParseDecl.cpp
495

My point is, I don't think THIS function should be handling the ellipsis, the expression parsing functions we send the parsing of each component off to should do that.

We unfortunately cannot move the entirety of this parsing section to do that, since 'identifier'/'type', and 'Function' argument types end up needing special handling, but it would be nice if our 'expr' types would all be able to do this, and I suggested earlier that we could start with the VariadicExprArgument to make this an easier patch that didn't have to deal with the issues that come with that.

That said, it would be nice if we could get CLOSE to that.

steffenlarsen added inline comments.Dec 9 2021, 4:32 AM
clang/lib/Parse/ParseDecl.cpp
495

Sorry. I am still not sure I am following. Do you mean the consumption of pack expressions should be moved into the expression parsing so other places (not just attributes) would accept it? If so, I am not opposed to the idea though I have no idea which expressions these would be nor the full implications of such a change. For these we at least have the isolation of having to explicitly support the packs in the given attributes.

erichkeane added inline comments.Dec 9 2021, 6:07 AM
clang/lib/Parse/ParseDecl.cpp
495

I'm saying the other expression parsers should ALREADY properly handle packs, and we should just use those.

steffenlarsen added inline comments.Dec 9 2021, 6:43 AM
clang/lib/Parse/ParseDecl.cpp
495

I think I see what you mean. There is the Parser::ParseExpressionList which may fit the bill, though it seems to also accept initializer lists. But since it is an expression the attributes can decide if it is a valid expression for them, so maybe that is not a problem?

erichkeane added inline comments.Dec 9 2021, 6:45 AM
clang/lib/Parse/ParseDecl.cpp
495

Hmm... I don't have a good idea of that, Aaron has better visibility into that sort of thing.

Part of the advantage to doing it the way I suggest is that it ends up being a single call (that is, the parser should parse the commas!), so we should only have to consume the closing paren.

aaron.ballman added inline comments.Dec 15 2021, 7:25 AM
clang/lib/Parse/ParseDecl.cpp
495

I think we want something more like Parser::ParseSimpleExpressionList() but... that doesn't handle packs, so those appear to be handled special: https://github.com/llvm/llvm-project/blob/main/clang/lib/Parse/ParseExpr.cpp#L3059. But one of these two functions seems to be along the correct lines of what we want (with modification).

FWIW, I like the sound of Erich's approach better than what's in the current patch. I'd rather not try to specially handle the ellipsis so much as recognizing when we expect an arbitrary list of variadic expression arguments as the last "parameter" for an attribute.

I agree that reusing existing parser logic for this would be preferable, but as @aaron.ballman mentions Parser::ParseSimpleExpressionList does not give us the parameter pack expansion parsing we need. We could potentially unify it with the logic from https://github.com/llvm/llvm-project/blob/main/clang/lib/Parse/ParseExpr.cpp#L3059 but on the other hand there could be potential benefits to using Parser::ParseExpressionList instead.

The primary difference between the behavior of Parser::ParseExpressionList and what we are looking for is that it will allow initializer lists as arguments. Since most (if not all) attributes check the validity of their expression arguments, they would likely fail accordingly if they get an initializer list, likely specifying they expected an argument of type X. It may be a blessing in disguise to parse these arguments as well as we are more permissive w.r.t. attribute arguments. Please let me know if there is anything fundamentally wrong with this line of logic. One concern I have is that we currently run Actions.CorrectDelayedTyposInExpr on the expression right after parsing it, then we run pack expansion, whereas Parser::ParseExpressionList only runs the former action in failure case. I am unsure whether or not that might pose a problem.

One problem is that if we only use Parser::ParseExpressionList to parse the arguments of the last argument (if it is variadic) then we get the odd behavior that the initializer lists will only be permitted in variadic arguments, which arguably is a little obscure. However, upon further investigation the only case that stops us from using this for all expression arguments are when the attribute has either VariadicIdentifierArgument or VariadicParamOrParamIdxArgument as the first argument. If we assume that these variadic expressions are limited to the last argument as well (as they are variadic) leaving them as the only argument of the attributes we are parsing here, then we can parse expressions as we did before if we are parsing one of these arguments or switch to using e.g. Parser::ParseExpressionList for all expressions of the attribute (even non-variadic) if not. Then we would just have to iterate over the produced expressions to make sure no parameter pack expansions occur before the last (variadic) argument of the parsed attribute.

Summary of suggested changes:
Split attribute parsing into 3: First case parsing a type (currently exists). Second case keeping the current logic for parsing identifiers and single assignment-expressions if the attribute has a single argument of type VariadicIdentifierArgument or VariadicParamOrParamIdxArgument. Third case, applicable if none of the former ran, parses all expression parameters in one using Parser::ParseExpressionList and then checks that parameter packs only occur in the last variadic argument (if it allows it). Side effect is it allows initializer lists in the third case, but attributes are likely to complain about unexpected expression types if these are passed.

I have made the changes suggested in my previous comment. This makes significantly more changes to the parsing of attribute arguments as the old path was needed for attributes that allow both expressions and types. It also required some new controlling arguments for ParseExpressionList.

Because these changes also allow intializer lists in attribute arguments I have added a test case showing that clang::annotate parses these but will not accept them.

@erichkeane & @aaron.ballman - Is this in line with what you were thinking?

Thanks for this, I think it's shaping up well!

clang/include/clang/Basic/DiagnosticParseKinds.td
723
725
clang/lib/Parse/ParseDecl.cpp
436–438

I'm a bit surprised this logic moved -- are there no times when we'd need it within the new else clause?

497

Coding style nits.

clang/lib/Parse/ParseExpr.cpp
3360

or something along those lines (note, the logic has reversed meaning). "Fail immediately" doesn't quite convey the behavior to me because we fail immediately either way, it's more about whether we attempt to skip tokens to get to the end of the expression list. I'm not strongly tied to the name I suggested either, btw.

3374–3375

Not that I think this is a bad change, but it seems unrelated to the patch. Can you explain why you made the change?

clang/test/Parser/cxx0x-attributes.cpp
262

Non-type templates are different from type templates, so I don't really want to lose the test coverage for that (if nothing else, it ensures the parser doesn't explode when it encounters one). It's fine to add a variadic_nttp() function or something along those lines.

Thank you, @aaron.ballman ! I will update the revision in a moment with some of the suggested changes.

clang/lib/Parse/ParseDecl.cpp
436–438

This is done because ParseExpressionList wouldn't know what to do with identifiers, so it keeps the old logic. This means that attributes that consist of a variadic identifier argument cannot parse packs and initializer lists. It shouldn't be a problem for now, but it does kill some generality.

An alternative is to split the individual expression parsing from ParseExpressionList and use that in here in a similar way to how it parsed before. Would that be preferred?

clang/lib/Parse/ParseExpr.cpp
3360

"Fail immediately" doesn't quite convey the behavior to me because we fail immediately either way

Am I misunderstanding FailUntil or is the intention of SkipUntil(tok::comma, tok::r_paren, StopBeforeMatch) to either stop at the next "," or ")"? If I am not misunderstanding, I believe it will continue parsing expressions after comma if SkipUntil stopped before it. As such I don't think FailImmediatelyOnInvalidExpr is inherently wrong, but I agree that skipping the skipping isn't very well conveyed by the name as is.

3374–3375

Admittedly I am not sure why it is needed, but in the previous version of the parsing for attribute parameters it does the typo-correction immediately after parsing the expression and if this isn't added a handful of tests fail.

Applied suggestions.

steffenlarsen marked 5 inline comments as done.Thu, Jan 20, 3:30 AM
aaron.ballman added inline comments.Thu, Jan 20, 8:11 AM
clang/lib/Parse/ParseDecl.cpp
436–438

This is done because ParseExpressionList wouldn't know what to do with identifiers, so it keeps the old logic.

Ah, I think I confused myself into thinking {a, b, c} would be using a DeclRefExpr for each of the identifiers, but this code path is for identifiers that are *not* expressions. Got it.

An alternative is to split the individual expression parsing from ParseExpressionList and use that in here in a similar way to how it parsed before. Would that be preferred?

I think this is okay for now.

468

I think it'd be good to move this comment up closer to the else on line 461 so that it's clear what the else clause covers (and more clear that this is the common case).

clang/lib/Parse/ParseExpr.cpp
3360

You're correct about the behavior (I had missed that we stopped on a comma as well as closing paren, I was thinking we were stopping on the closing curly brace). So the parameter name isn't inherently wrong, just seems a bit unclear to me. I don't insist on a change, but if someone has a better name for the parameter, I won't be sad either.

3374–3375

Ah, thanks for the info! So this is basically doing the same work as what's done on line 3410 in the late failure case -- I wonder if this should actually be tied to the FailImmediatelyOnInvalidExpr parameter instead of having its own parameter?

Moved comment.

steffenlarsen marked an inline comment as done.Fri, Jan 21, 2:55 AM
steffenlarsen added inline comments.
clang/lib/Parse/ParseDecl.cpp
468

I completely agree. It has been moved to right after the else.

clang/lib/Parse/ParseExpr.cpp
3374–3375

They could be unified in the current version, but I don't think there is an actual relation between them.