This is an archive of the discontinued LLVM Phabricator instance.

[Clang] Record tokens in attribute arguments for user-defined C++/C2x attributes
AbandonedPublic

Authored by Qix- on Apr 4 2021, 9:48 AM.

Details

Summary

Currently, user-defined attributes (e.g. those registered via Clang plugins) don't hold much information or allow for much to be done with them, as their argument tokens are discarded entirely in almost all cases.

However, since they are quite flexible with their syntax (pretty much anything can go in there), it would be massively helpful if plugins could be handed the pre-processed and parsed tokens as a list to be consumed by third-party plugins and the like.

This diff creates a trailing data list for ParsedAttr objects and places the "recorded" tokens there from the lexer. I would have piggy-backed off of the normal backtrack system but unfortunately it doesn't call the handler for tokens being replayed from the backtrack cache, which in many cases we *do* care about. So I had to create a second recording callback.

The new plugin directory shows how this would be used from a plugin.

NOTE: I'm posting a bit prematurely, as on my WSL1 machine this caused a random test to fail, and I'm not entirely sure why. Hoping someone else can point me in the right direction to debug it.

The three tests that failed before hand were:

Failed Tests (3):
  Clang :: Frontend/plugin-delayed-template.cpp
  libc++ :: libcxx/gdb/gdb_pretty_printer_test.sh.cpp
  libc++ :: std/numerics/numeric.ops/numeric.ops.midpoint/midpoint.float.pass.cpp

And after:

Failed Tests (4):
  Clang :: Frontend/plugin-delayed-template.cpp
  Clang :: SemaObjC/externally-retained-no-arc.m
  libc++ :: libcxx/gdb/gdb_pretty_printer_test.sh.cpp
  libc++ :: std/numerics/numeric.ops/numeric.ops.midpoint/midpoint.float.pass.cpp

EDIT: Looks like the errors were transient on my end; successive runs of the identical changes on WSL1 produces different failures each time. CI appears to be green so hopefully everything is OK.

Diff Detail

Event Timeline

Qix- created this revision.Apr 4 2021, 9:48 AM
Qix- requested review of this revision.Apr 4 2021, 9:48 AM
Qix- edited the summary of this revision. (Show Details)Apr 4 2021, 9:56 AM
Qix- edited the summary of this revision. (Show Details)Apr 4 2021, 12:37 PM
Qix- updated this revision to Diff 336173.Apr 8 2021, 11:10 AM

Updated the diff to include a lot more context (-U9999). Thanks again for the tip :)

Thank you for this patch, I think it's really useful functionality for plugin authors!

Adding some additional reviewers for more opinions on the changes in the preprocessor.

clang/examples/PrintAttributeTokens/CMakeLists.txt
4–11

I'm not certain what this is actually doing -- the .exports file is empty (and no other plugin has this sort of .exports file thing). Is this needed?

clang/examples/PrintAttributeTokens/PrintAttributeTokens.cpp
38

Should probably use llvm::outs() instead (here and elsewhere in the plugin).

45–46

Per the usual coding conventions.

49–51

It'd probably be handy to also print identifier tokens.

clang/include/clang/Sema/ParsedAttr.h
236–238

Rather than use a bit-field for this, I'd add it as a regular unsigned after the declaration of UnavailableLoc.

284
490

I think it'd be reasonable to return nullptr in this case, WDYT?

clang/lib/Parse/ParseDeclCXX.cpp
4100–4102

I think plugins will expect these tokens to be available regardless of which attribute syntax is used, so you may need to look into also doing work for GNU and declspec attribute argument lists.

4121–4122

To be clear, we should have at least *two* tokens, right? One for the left paren and one for the right?

clang/test/Frontend/plugin-print-attr-tokens.cpp
10–13

Can you also verify that the token stream output from the plugin looks correct?

Can you also add additional tests for GNU and declspec attribute syntaxes?

Qix- planned changes to this revision.Apr 9 2021, 11:04 AM
Qix- marked 5 inline comments as done.
Qix- added inline comments.
clang/examples/PrintAttributeTokens/CMakeLists.txt
4–11

Most likely not - the directory was copied from the PrintFunctionNames directory, which also has an empty exports file. I'm not completely familiar with the LLVM build configs so I figured it was required. Happy to remove it if need be :)

clang/examples/PrintAttributeTokens/PrintAttributeTokens.cpp
38

Sure thing, though should PrintFunctionNames be updated as well? Mostly copied the conventions over from that.

49–51

Yeah I caught that the other day too, thanks for reminding me.

clang/include/clang/Sema/ParsedAttr.h
284

Agreed; should 279 be updated too?

490

Sure, that's fine. I was trying to think why I did the assertion but nullptr seems fine too.

clang/lib/Parse/ParseDeclCXX.cpp
4100–4102

As far as I understand (and perhaps poorly communicated in the comment) is that plugin-defined attributes will always hit this codepath as opposed to the built-in attribute parsers.

I could be wrong here, though. Are arbitrary tokens even allowed in GNU/Declspec attributes? I thought it was just the C2x/C++ attributes syntax that allowed non-identifier tokens in the first place.

Either way, from what I could tell (trying a few different implementations of this change), this is the only place where user-defined attributes are parsed. I certainly could be missing something though.

4121–4122

This was confusing for me too at first.

By 4112, the left paren token has already been emitted by the lexer; ConsumeParen advances to the next token *after* the left-paren; _that_ token then becomes the first in the recording session.

Further, SkipUntil(tok::r_paren) checks if the current token is an r_paren, and if not, advances and then starts over. It doesn't advance _then_ check.

It's the presence of the left paren that advances us to this stage of the parsing in the first place and thus is emitted before our recording session starts. An empty argument list ([[some_attr()]]) thus only has an r_paren token emitted to the recording session, which is then ignored.

This is the existing design of the lexer it seems - I wasn't expecting the "reversed" order of operations, and the names of the lexer methods didn't help, either :)

clang/test/Frontend/plugin-print-attr-tokens.cpp
10–13

Can you also verify that the token stream output from the plugin looks correct?

Absolutely; how do I do this? The attributes don't make it into the final AST as far as I can tell, so is there a filecheck mechanism for checking the stdio streams directly? Is there an existing test that can do this I can reference?

Can you also add additional tests for GNU and declspec attribute syntaxes?

See comment above about GNU/declspec syntaxes. Depending on the outcome there, I'll certainly add them here.

Qix- marked 3 inline comments as done.Apr 9 2021, 1:20 PM

So I went back and checked and I remember why I didn't add explicit support for GNU/declspec attributes - they actually perform symbol lookups in the surrounding scope. I believe there's an issue right now with plugins that GNU-style attributes parser assumes that they're being parsed from the Tablegen stuff and thus expect the attribute classes to be parameterized as e.g. taking arguments or not. Therefore, putting arbitrary tokens in GNU-style and (as far as I can tell) declspec-style attributes is expressely not supported by the compiler. Also I believe this makes sense, too, because those syntaxes aren't spec'd to allow arbitrary tokens like C2x and C++ attributes are.

In fact, in my testing, the GNU-style attributes in plugins can't accept arguments whatsoever without getting some sort of compilation error. Maybe I'm not using the plugin system correctly, though.

Qix- marked an inline comment as not done.Apr 9 2021, 1:23 PM
Qix- added inline comments.
clang/lib/Parse/ParseDeclCXX.cpp
4100–4102

See my latest top-level comment.

Qix- updated this revision to Diff 336614.Apr 10 2021, 7:38 AM

@aaron.ballman updated with comments.

See also https://bugs.llvm.org/show_bug.cgi?id=46446. when I first fell into this issue, I did think it was trying to save the token stream as this patch is doing. Neat I thought :) although I'm a clang weenie, saving the tokens is putting this into deferred-parse territory, which makes me nervous. Wouldn't it be better for the ParsedAttrInfo objects to determine whether and how to parse their arguments. They could do so immediately, or save tokens, or whatever on a per-attribute per-argument basis. Isn't that more flexible? Add some ParsedAttrInfo defaults for the default cxx11, gnu & clang flavours of attributes?

See also https://bugs.llvm.org/show_bug.cgi?id=46446. when I first fell into this issue, I did think it was trying to save the token stream as this patch is doing. Neat I thought :) although I'm a clang weenie, saving the tokens is putting this into deferred-parse territory, which makes me nervous.

FWIW, I've been stewing on the deferred parse issues with the approach from this patch because they make me nervous as well. Because [[]] attributes can have arbitrary token soup for their arguments and that includes arbitrary expressions, I worry that just giving the plugin author a bunch of tokens and saying "you deal with it" will only work for very trivial arguments.

Wouldn't it be better for the ParsedAttrInfo objects to determine whether and how to parse their arguments. They could do so immediately, or save tokens, or whatever on a per-attribute per-argument basis. Isn't that more flexible? Add some ParsedAttrInfo defaults for the default cxx11, gnu & clang flavours of attributes?

I think this would be an improved approach over just replaying the tokens for the plugin to handle. I would hope that we'd be able to design something that handles the easy cases of passing a string literal, an integer value, an enumerated list of named values, etc so that plugin authors don't have to reinvent the wheel for basic needs.

Qix- added a comment.Apr 18 2021, 12:32 PM

I'm not sure exactly how to continue after the last few comments - what should the approach be for this patch? Or are these things we can shoot for in later patches?

I'm not sure exactly how to continue after the last few comments - what should the approach be for this patch? Or are these things we can shoot for in later patches?

I don't think they're things we should shoot for in a later patch; the token replay approach doesn't seem like it would work for more complicated attribute arguments. As a concrete example of what would be super difficult to support would be:

struct S {
  int member;
  void func(int i, int j) [[plugin::attr(i + j + member)]];
};

because it would be very difficult to recognize that member is looked up in the context of the declaration of S::func() after having left the parsing context for the structure.

I think the one way to continue is along the lines of what's described in https://bugs.llvm.org/show_bug.cgi?id=46446#c11 so that the plugin handles parsing the arguments by invoking calls on the parser.

Qix- abandoned this revision.Apr 22 2021, 6:35 AM

Closing in favor of more complete plugin attribute changes discussed in IRC.