This is a preliminary patch that I hope can serve as a basis for discussion. The
design summary below is copied from https://bugs.llvm.org/show_bug.cgi?id=46446#c7.
I'm intentionally posting a work in progress to make sure things look roughly
right; I'd like to defer details (clang-format'ing my changes, naming
conventions, tests) until the design is settled.

The patch essentially lets a plugin handle the parsing of a CXX11 attribute's
arguments. With this patch, plugin-defined CXX11 attributes are therefore no
longer limited to constant attributes that take no arguments.

The patch intentionally gives the plugin a fair amount of freedom, to enable
interesting use cases (see below).

The patch does not (yet) tackle the ability for out-of-tree plugins to extend
Attr.td. I'd be happy to work on that once the present patch makes progress.

Design

I chose to extend the ParsedAttrInfo class. This lets me reuse both the logic
that searches for an attribute plugin that has registered a matching spelling
and the existing registry of plugins. (I considered a separate registry, but it
would duplicate the spelling-search logic and also lead to code duplication in
the plugin itself.)
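
To make the reuse argument concrete, here is a toy, self-contained model of a single registry searched by spelling. `AttrInfo`, `Spelling`, `Syntax`, `registry()` and `findBySpelling` are hypothetical stand-ins, not clang's actual ParsedAttrInfo API; the sketch only shows that when built-in and plugin-provided infos share one list and one lookup, a plugin never has to reimplement spelling matching.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; not clang's actual types.
enum class Syntax { GNU, CXX11, C2x };

struct Spelling {
  Syntax Syn;
  std::string NormalizedFullName;  // e.g. "my_analyzer::invariant"
};

// Plays the role of ParsedAttrInfo: built-in and plugin infos share this base.
class AttrInfo {
public:
  explicit AttrInfo(std::vector<Spelling> S) : Spellings(std::move(S)) {}
  virtual ~AttrInfo() = default;
  std::vector<Spelling> Spellings;
};

// A single registry shared by built-in and plugin-provided infos.
std::vector<std::unique_ptr<AttrInfo>> &registry() {
  static std::vector<std::unique_ptr<AttrInfo>> Infos;
  return Infos;
}

// The one spelling search, reused regardless of where an info came from.
const AttrInfo *findBySpelling(Syntax Syn, const std::string &Scope,
                               const std::string &Name) {
  const std::string Full = Scope.empty() ? Name : Scope + "::" + Name;
  for (const auto &Info : registry())
    for (const Spelling &S : Info->Spellings)
      if (S.Syn == Syn && S.NormalizedFullName == Full)
        return Info.get();
  return nullptr;
}
```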

By default, ParsedAttrInfo instances (most of which do *not* come from plugins)
let clang's built-in parsing logic take effect, which enforces tuning knobs
such as NumArgs and OptArgs. Plugins can, if they are so inclined, elect to
bypass clang's parsing logic (which, as discussed on Bugzilla, ignores CXX11
attribute arguments when the attribute is not known to clang). In that case,
plugins are on their own: they are expected to handle all of the parsing, as
well as push the resulting ParsedAttr into the ParsedAttributes. They then
report whether the arguments are syntactically valid, and the rest of the
parsing code handles error reporting.

(Side note: this makes the return value three-state. I'm currently reusing the
existing AttrHandling enum, but this could be improved with either better names
or a separate enum.)
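
The control flow described above can be sketched as a small standalone model. Everything here is hypothetical: `ArgParseResult` is one possible shape for the separate enum mentioned in the side note, `AttrInfoBase` stands in for ParsedAttrInfo, `ParsedAttrStub` for ParsedAttr, and arguments are pre-split strings rather than real tokens. It shows the three outcomes: defer to the built-in NumArgs/OptArgs checks, accept an attribute the plugin pushed itself, or let the parser report the error.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical dedicated three-state result (the patch reuses AttrHandling).
enum class ArgParseResult {
  NotHandled,  // defer to the built-in argument parsing
  Parsed,      // the plugin parsed the arguments and pushed the attribute
  Invalid      // the plugin consumed the arguments but found them malformed
};

// Toy stand-in for ParsedAttr: a name plus raw argument strings.
struct ParsedAttrStub {
  std::string Name;
  std::vector<std::string> Args;
};

// Toy stand-in for ParsedAttrInfo.
class AttrInfoBase {
public:
  virtual ~AttrInfoBase() = default;
  unsigned NumArgs = 0;  // required arguments (built-in path only)
  unsigned OptArgs = 0;  // optional arguments (built-in path only)

  // Default: most infos let the built-in logic run.
  virtual ArgParseResult parseArgs(const std::string &,
                                   const std::vector<std::string> &,
                                   std::vector<ParsedAttrStub> &) const {
    return ArgParseResult::NotHandled;
  }
};

// A plugin that bypasses built-in parsing entirely.
class ConstraintInfo : public AttrInfoBase {
public:
  ArgParseResult parseArgs(const std::string &Name,
                           const std::vector<std::string> &Toks,
                           std::vector<ParsedAttrStub> &Out) const override {
    if (Toks.empty())
      return ArgParseResult::Invalid;
    Out.push_back({Name, Toks});  // the plugin pushes the attribute itself
    return ArgParseResult::Parsed;
  }
};

// Parser side: ask the info first, fall back to built-in parsing, and own
// all error reporting. Returns true if an attribute was added.
bool parseAttributeArgs(const AttrInfoBase &Info, const std::string &Name,
                        const std::vector<std::string> &Toks,
                        std::vector<ParsedAttrStub> &Out,
                        std::vector<std::string> &Diags) {
  const std::size_t Before = Out.size();
  switch (Info.parseArgs(Name, Toks, Out)) {
  case ArgParseResult::Parsed:
    assert(Out.size() > Before && "plugin reported success but added nothing");
    return true;
  case ArgParseResult::Invalid:
    Diags.push_back("invalid arguments to '" + Name + "'");
    return false;
  case ArgParseResult::NotHandled:
    break;
  }
  // Built-in path: treat each string as one argument and enforce the counts.
  if (Toks.size() < Info.NumArgs || Toks.size() > Info.NumArgs + Info.OptArgs) {
    Diags.push_back("wrong number of arguments to '" + Name + "'");
    return false;
  }
  Out.push_back({Name, Toks});
  return true;
}
```

The design choice this illustrates: a plugin that returns `Invalid` never emits diagnostics itself, so error reporting stays uniform across built-in and plugin attributes.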

The plugin receives, in addition to location information, two key parameters:
- the parser, obviously
- a possibly-null Declarator representing the partially parsed declaration that the attribute is attached to, if any.
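
A sketch of the hook's shape, assuming hypothetical minimal `Parser` and `Declarator` types (the location parameters are omitted). `AttrArgsPlugin` and `FieldConstraintPlugin` are illustrative names, not part of the patch. The main obligation it illustrates is that a plugin must cope with a null Declarator, for instance by rejecting an attribute that only makes sense on a declaration.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical minimal stand-ins; clang's Parser and Declarator are far richer.
struct Parser {
  std::string EnclosingRecord;  // e.g. the struct currently being parsed
};
struct Declarator {
  std::string Identifier;
};

// Hypothetical hook shape: the parser plus a possibly-null declarator.
class AttrArgsPlugin {
public:
  virtual ~AttrArgsPlugin() = default;
  virtual bool parseArgs(Parser &P, const Declarator *D) = 0;
};

// A field-constraint attribute only makes sense on a declaration.
class FieldConstraintPlugin : public AttrArgsPlugin {
public:
  std::optional<std::string> AttachedTo;

  bool parseArgs(Parser &P, const Declarator *D) override {
    if (!D)
      return false;  // not attached to a declarator: reject
    AttachedTo = P.EnclosingRecord + "::" + D->Identifier;
    return true;
  }
};
```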

The second parameter may be the most surprising part of this patch, so let me
try to provide some motivation and context.

My goal is to enable plugins to perform advanced static analysis and code
generation, relying on rich CXX11 attributes that capture deep semantic
information relating, say, a function's parameters or a struct's fields.

You can easily imagine someone authoring a static analyzer that recognizes:
```cpp
typedef struct {
  size_t len;
  uint8_t *chars;
  [[ my_analyzer::invariant(strlen(chars) == len) ]];
} managed_string;
```

The point here is to leverage a real CXX11 attribute that benefits from
typo-checking, proper name and macro resolution, etc., rather than some
unreliable syntax embedded in comments, as so many other tools do, or, worse,
a custom C++ parser.

In our case, we're auto-generating parsers and serializers from a struct type
definition (see https://www.usenix.org/system/files/sec19-ramananandro_0.pdf for
more info), so I'd like to author:
```cpp
typedef struct {
  uint32_t min_version;
  uint32_t max_version [[ everparse::constraint (max_version >= min_version &&
                                                 min_version >= 2 &&
                                                 max_version <= 4) ]];
} my_network_format;
```

Of course, our actual network formats are much more complicated; this is just
for the sake of example. My point is that giving plugins a substantial degree of
freedom is likely to enable a flurry of novel and exciting use cases built on
clang.

Back to the design of the patch: receiving the Declarator allows me to do things
like this (in the plugin):
```cpp
// P is the Parser and D the (here non-null) Declarator passed to the plugin.
Sema &S = P->getActions();
// Compute the declared type of the declarator being parsed.
TypeSourceInfo *T = S.GetTypeForDeclarator(*D, P->getCurScope());
QualType R = T->getType();
// Create a variable with the declarator's name and type.
VarDecl *VD = VarDecl::Create(S.getASTContext(), S.CurContext,
                              D->getBeginLoc(), D->getIdentifierLoc(),
                              D->getIdentifier(), R, T, SC_None);
// Doing this makes sure Sema is aware of the new scope entry, meaning this name
// will be in scope when parsing the expression. (Parsing and scope
// resolution are intertwined.)
VD->setImplicit(true);
S.PushOnScopeChains(VD, P->getCurScope());
```

This allows my plugin to successfully parse and recognize complex arguments that
refer to the current declarator, in effect letting my plugin define its own
extended syntax and scoping rules.
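
Why the implicit VarDecl matters can be shown with a toy scope chain: until the declarator's own name is pushed into the current scope, an identifier such as `max_version` inside its constraint does not resolve. `ScopeChain` and `firstUnresolved` are hypothetical and model only name visibility, not clang's Scope or Sema.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy scope chain: innermost scope is at the back.
class ScopeChain {
public:
  void push() { Scopes.emplace_back(); }
  void pop() { Scopes.pop_back(); }
  void declare(const std::string &Name) {
    assert(!Scopes.empty() && "declare() needs an open scope");
    Scopes.back().insert(Name);
  }
  // Search from the innermost scope outwards.
  bool isVisible(const std::string &Name) const {
    for (auto It = Scopes.rbegin(); It != Scopes.rend(); ++It)
      if (It->count(Name))
        return true;
    return false;
  }

private:
  std::vector<std::set<std::string>> Scopes;
};

// Returns the first identifier of a constraint that does not resolve,
// or an empty string if all of them do.
std::string firstUnresolved(const ScopeChain &S,
                            const std::vector<std::string> &Idents) {
  for (const auto &Id : Idents)
    if (!S.isVisible(Id))
      return Id;
  return "";
}
```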

I hope this helps. I'll keep working on the patch (this is the first version
that seems to do something useful), but I thought it would be helpful to get
the discussion started in the meantime.

Details (optional reading)

- I was able to refactor the lookup-by-spelling logic in ParsedAttrInfo::get; I'm curious whether the two overloaded methods conform to clang's style
- I had to switch ParsedAttrInfo from a struct to a class because of a warning related to Microsoft ABI standards violations
- I'm wondering whether the Declarator that the plugin receives should instead be a more general class (with a Kind()) covering the union of all the "things currently being parsed that an attribute may be attached to". Since I couldn't come up with other compelling examples, I've left it as is until I find a strong argument in favor of introducing the extra complexity.
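
For comparison, the more general alternative might look like the following discriminated class. `AttachmentContext` and its kinds are hypothetical; a real version would presumably wrap clang's Declarator (and whatever other parse state applies) rather than a name string. The kind-checking accessor shows the extra bookkeeping every plugin would take on.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical generalization of the Declarator parameter: describes whatever
// the attribute is attached to, discriminated by Kind.
class AttachmentContext {
public:
  enum class Kind { None, Declarator, Statement };

  static AttachmentContext none() { return AttachmentContext(Kind::None, {}); }
  static AttachmentContext statement() {
    return AttachmentContext(Kind::Statement, {});
  }
  static AttachmentContext declarator(std::string Name) {
    return AttachmentContext(Kind::Declarator, std::move(Name));
  }

  Kind getKind() const { return K; }

  // Only meaningful for declarators; callers must check getKind() first.
  const std::string &getDeclaratorName() const {
    assert(K == Kind::Declarator && "not attached to a declarator");
    return Name;
  }

private:
  AttachmentContext(Kind TheKind, std::string TheName)
      : K(TheKind), Name(std::move(TheName)) {}
  Kind K;
  std::string Name;
};
```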

I think this should also have a parameter for the syntax used for the attribute, because different syntaxes may have different rules for how to parse the arguments to the attribute (this is one of the reasons why we have common attribute argument parsing as well as syntax-specific attribute argument parsing in Parser).
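
A minimal sketch of how such a syntax parameter could be used: the plugin picks its argument rules per syntax and can still defer to the built-in logic. `AttrSyntax`, `ArgParseResult`, `parseConstraintArgs` and the per-syntax policy are all hypothetical; a real version would presumably take clang's existing AttributeCommonInfo::Syntax rather than a new enum.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class AttrSyntax { GNU, CXX11, Declspec };  // hypothetical subset

enum class ArgParseResult { NotHandled, Parsed, Invalid };

// Hypothetical policy: a free-form expression in [[...]], a single
// identifier in __attribute__((...)), and built-in parsing otherwise.
ArgParseResult parseConstraintArgs(AttrSyntax Syn,
                                   const std::vector<std::string> &Toks) {
  switch (Syn) {
  case AttrSyntax::CXX11:
    return Toks.empty() ? ArgParseResult::Invalid : ArgParseResult::Parsed;
  case AttrSyntax::GNU:
    return Toks.size() == 1 ? ArgParseResult::Parsed : ArgParseResult::Invalid;
  case AttrSyntax::Declspec:
    return ArgParseResult::NotHandled;
  }
  return ArgParseResult::NotHandled;  // unreachable; silences -Wreturn-type
}
```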