This is an archive of the discontinued LLVM Phabricator instance.

Support Attr in DynTypedNode and ASTMatchers.
ClosedPublic

Authored by sammccall on Oct 19 2020, 2:37 PM.

Diff Detail

Event Timeline

sammccall created this revision.Oct 19 2020, 2:37 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 19 2020, 2:37 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
sammccall requested review of this revision.Oct 19 2020, 2:37 PM

Looks like there are some failing tests in premerge bots, e.g. https://reviews.llvm.org/harbormaster/unit/view/185822/

Also run clang/docs/tools/dump_ast_matchers.py script to update the LibASTMatchersReference.html file.

sammccall updated this revision to Diff 299324.Oct 20 2020, 3:40 AM

Update docs, dynamic registry and tests.

hokein accepted this revision.Oct 20 2020, 7:09 AM
This revision is now accepted and ready to land.Oct 20 2020, 7:09 AM
aaron.ballman added a subscriber: aaron.ballman.

This is *awesome*, thank you so much for working on it! One question I have is: as it stands, this is not a particularly useful matcher because it can only be used to say "yup, there's an attribute" -- are you planning to extend this functionality so that you can do something like attr(hasName("foo")), or specify the syntax the attribute uses, isImplicit(), etc?

clang/include/clang/ASTMatchers/ASTMatchers.h
6738

Can you also add an example using __declspec()?

Should I expect this to work on attributes added in other, perhaps surprising ways, like through #pragma or keywords?

// Some pragmas add attributes under the hood
#pragma omp declare simd 
int min(int a, int b) { return a < b ? a : b; }

// Other pragmas only exist to add attributes
#pragma clang attribute push (__attribute__((annotate("custom"))), apply_to = function)
void function(); // The function now has the annotate("custom") attribute
#pragma clang attribute pop

// Still other attributes come through keywords
alignas(16) int i;

If this is expected to match, we should document it more clearly.

clang/lib/ASTMatchers/ASTMatchFinder.cpp
238

Should we be properly handling IgnoreUnlessSpelledInSource for implicit attributes? e.g., [[gnu::interrupt]] void func(int *i); which gets two attributes (the interrupt attribute and an implicit used attribute).

clang/unittests/ASTMatchers/ASTMatchersNodeTest.cpp
1879

Why?

aaron.ballman requested changes to this revision.Oct 20 2020, 7:40 AM
This revision now requires changes to proceed.Oct 20 2020, 7:40 AM

This is *awesome*, thank you so much for working on it!

Thanks for the comments - haven't addressed them yet, but wanted to clarify scope first.

One question I have is: as it stands, this is not a particularly useful matcher because it can only be used to say "yup, there's an attribute"

Yes definitely, I should have mentioned this...

My main goal here was to support it in DynTypedNode, as we have APIs (clangd::SelectionTree) that can only handle nodes that fit there.
But the common test pattern in ASTTypeTraitsTest used matchers, and I figured a basic matcher wasn't hard to add.

are you planning to extend this functionality so that you can do something like attr(hasName("foo")), or specify the syntax the attribute uses, isImplicit(), etc?

I hadn't planned to - it definitely makes sense though I don't have any immediate use cases. I can do any of:

  • leave as-is, waiting for someone to add matchers to make it useful
  • scope down the patch to exclude the matcher (and write the ASTTypeTraitsTest another way)
  • add some simple matcher like hasName (I guess in a followup patch) to make it minimally useful, with space for more

What do you think?

clang/include/clang/ASTMatchers/ASTMatchers.h
6738

I think the answer to all of these is yes, I'll update the docs.

clang/lib/ASTMatchers/ASTMatchFinder.cpp
238

I think we should, I just wasn't thinking clearly about where that should go :-)

This is *awesome*, thank you so much for working on it!

Thanks for the comments - haven't addressed them yet, but wanted to clarify scope first.

One question I have is: as it stands, this is not a particularly useful matcher because it can only be used to say "yup, there's an attribute"

Yes definitely, I should have mentioned this...

My main goal here was to support it in DynTypedNode, as we have APIs (clangd::SelectionTree) that can only handle nodes that fit there.
But the common test pattern in ASTTypeTraitsTest used matchers, and I figured a basic matcher wasn't hard to add.

Okay, that makes sense to me.

are you planning to extend this functionality so that you can do something like attr(hasName("foo")), or specify the syntax the attribute uses, isImplicit(), etc?

I hadn't planned to - it definitely makes sense though I don't have any immediate use cases. I can do any of:

  • leave as-is, waiting for someone to add matchers to make it useful
  • scope down the patch to exclude the matcher (and write the ASTTypeTraitsTest another way)
  • add some simple matcher like hasName (I guess in a followup patch) to make it minimally useful, with space for more

What do you think?

My preference is to add support for hasName() as that's going to be the most common need for matching attributes. If you also added support for matching which syntax is used, support attributes for hasArgument() (which I suppose could be interesting given that there are the parsed attribute arguments and the semantic attribute arguments, and they may differ), etc, I certainly wouldn't complain, but I think at least name matching support is required to make the matcher marginally useful. I'm totally fine if this happens in a follow-up patch (or patches). WDYT?

sammccall updated this revision to Diff 300106.Oct 22 2020, 2:43 PM
sammccall marked 2 inline comments as done.

Address review comments.
Add hasAttrName() and make isImplicit() support Attr too.

This is *awesome*, thank you so much for working on it!

Thanks for the comments - haven't addressed them yet, but wanted to clarify scope first.

One question I have is: as it stands, this is not a particularly useful matcher because it can only be used to say "yup, there's an attribute"

Yes definitely, I should have mentioned this...

My main goal here was to support it in DynTypedNode, as we have APIs (clangd::SelectionTree) that can only handle nodes that fit there.
But the common test pattern in ASTTypeTraitsTest used matchers, and I figured a basic matcher wasn't hard to add.

Okay, that makes sense to me.

are you planning to extend this functionality so that you can do something like attr(hasName("foo")), or specify the syntax the attribute uses, isImplicit(), etc?

I hadn't planned to - it definitely makes sense though I don't have any immediate use cases. I can do any of:

  • leave as-is, waiting for someone to add matchers to make it useful
  • scope down the patch to exclude the matcher (and write the ASTTypeTraitsTest another way)
  • add some simple matcher like hasName (I guess in a followup patch) to make it minimally useful, with space for more

What do you think?

My preference is to add support for hasName() as that's going to be the most common need for matching attributes.

Sounds good (though I ran into issues calling this hasName specifically, see below). There's overlap with hasAttr(Kind) but I think that only handles decls, you can't bind the attribute itself, and it's harder to extend to the other stuff you mentioned.
So maybe hasAttr gets deprecated, or maybe has(attr(hasName("foo"))) is a mouthful for a common case so the shorthand is nice.

If you also added support for matching which syntax is used, support attributes for hasArgument() (which I suppose could be interesting given that there are the parsed attribute arguments and the semantic attribute arguments, and they may differ), etc, I certainly wouldn't complain, but I think at least name matching support is required to make the matcher marginally useful.

Yeah, in the interests of time I think i'll need to stop there (my understanding of the attribute model is pretty superficial).

I'm totally fine if this happens in a follow-up patch (or patches). WDYT?

For hasAttrName and isImplicit (which simplifies the tests), I've tried to fold it into this patch, it's simple enough.
I went with hasAttrName (matching Attr::getAttrName()) because I wasn't easily able to get attrName to work.

The issue is that hasName is already the name of a Decl matcher with the same signature, so we'd have to make it polymorphic.
(This means they'd appear to be one matcher, with one set of docs etc, despite having very different semantics - doesn't really seem right...)
The implementation of the Decl version is the fairly complex HasNameMatcher (shared with hasAnyName) that copies strings into a vector on construction. This doesn't fit into the polymorphic matcher model as far as I can find, so we're adding heap allocations to the fastpath of matching hasName("foo") which seems... not obviously a good tradeoff.

If you think the hasName name is important I'm happy to drop hasAttrName from this patch, and someone can work out how to add it with the better name, but I'm not sure I want to pick this battle with the template gods...

sammccall updated this revision to Diff 300107.Oct 22 2020, 2:47 PM

Fix nullptr condition (probably dead, but...)

Harbormaster completed remote builds in B76094: Diff 300107.

My preference is to add support for hasName() as that's going to be the most common need for matching attributes.

Sounds good (though I ran into issues calling this hasName specifically, see below). There's overlap with hasAttr(Kind) but I think that only handles decls, you can't bind the attribute itself, and it's harder to extend to the other stuff you mentioned.
So maybe hasAttr gets deprecated, or maybe has(attr(hasName("foo"))) is a mouthful for a common case so the shorthand is nice.

I was sort of expecting hasAttr() to be deprecated because it is a leaky abstraction (consumers of the API have to know about our internal naming convention for attribute kind enumerations, it makes it harder for use to change the definition of the enumeration, etc.

If you also added support for matching which syntax is used, support attributes for hasArgument() (which I suppose could be interesting given that there are the parsed attribute arguments and the semantic attribute arguments, and they may differ), etc, I certainly wouldn't complain, but I think at least name matching support is required to make the matcher marginally useful.

Yeah, in the interests of time I think i'll need to stop there (my understanding of the attribute model is pretty superficial).

That's fine by me, thank you for the bits you've done!

I'm totally fine if this happens in a follow-up patch (or patches). WDYT?

For hasAttrName and isImplicit (which simplifies the tests), I've tried to fold it into this patch, it's simple enough.
I went with hasAttrName (matching Attr::getAttrName()) because I wasn't easily able to get attrName to work.

The issue is that hasName is already the name of a Decl matcher with the same signature, so we'd have to make it polymorphic.
(This means they'd appear to be one matcher, with one set of docs etc, despite having very different semantics - doesn't really seem right...)

I definitely don't agree that hasName() has different semantics for attributes from declarations -- both of them are named identifiers, potentially with scope information as part of the name. Making the matcher polymorphic matches my expectations (pun intended).

The implementation of the Decl version is the fairly complex HasNameMatcher (shared with hasAnyName) that copies strings into a vector on construction. This doesn't fit into the polymorphic matcher model as far as I can find, so we're adding heap allocations to the fastpath of matching hasName("foo") which seems... not obviously a good tradeoff.

This, on the other hand, is a very good reason to not push forward with those changes yet.

If you think the hasName name is important I'm happy to drop hasAttrName from this patch, and someone can work out how to add it with the better name, but I'm not sure I want to pick this battle with the template gods...

I'm curious if @klimek agrees, but I think hasName() is the correct way we want to eventually go. That said, I think having an attribute matcher with no way to match by name encourages an anti-pattern where people write a very generic matcher that matches a lot and then do further refinement work for every match (rather than taking advantage of memoization and other optimizations), so I think hasAttrName() is still better than nothing. Would it make sense to add the API but mark it as deprecated to alert users that this API may go away some day if we can swing it? It's a bit unfriendly since we don't have a replacement API lined up for users to use, so I wouldn't want to issue any diagnostics for using the matcher.

clang/include/clang/ASTMatchers/ASTMatchers.h
6764

If you want to avoid duplicating the call to getAttrName() (I don't insist).

if (const IdentifierInfo *II = Node.getAttrName())
  return II->isStr(Name);
return false;
clang/include/clang/ASTMatchers/ASTMatchersInternal.h
729

Are the changes to the HasNameMatcher needed?

clang/unittests/ASTMatchers/ASTMatchersNodeTest.cpp
1893

nodes an -> nodes create an

njames93 added inline comments.
clang/include/clang/ASTMatchers/ASTMatchers.h
6763

AST_MATCHER macros that take StringRef paramaters don't own the data for the string, which will result in UB if they are passed a reference to a string that is destroyed while the matcher is still matching. It's much safer to use std::string in this case.

sammccall marked 5 inline comments as done.

Address comments

I was sort of expecting hasAttr() to be deprecated because it is a leaky abstraction (consumers of the API have to know about our internal naming convention for attribute kind enumerations, it makes it harder for use to change the definition of the enumeration, etc.

Sure, SGTM

I'm totally fine if this happens in a follow-up patch (or patches). WDYT?

I definitely don't agree that hasName() has different semantics for attributes from declarations -- both of them are named identifiers, potentially with scope information as part of the name. Making the matcher polymorphic matches my expectations (pun intended).

Fair enough - this isn't really my area, but the differences I was thinking of were:

  • the Attr uses a name rather than owning it - it's more like DeclRefExpr than like NamedDecl
  • it wasn't clear to me that you'd want the same rules about optional scope handling, though it sounds like you do

If you think the hasName name is important I'm happy to drop hasAttrName from this patch, and someone can work out how to add it with the better name, but I'm not sure I want to pick this battle with the template gods...

I'm curious if @klimek agrees, but I think hasName() is the correct way we want to eventually go. That said, I think having an attribute matcher with no way to match by name encourages an anti-pattern where people write a very generic matcher that matches a lot and then do further refinement work for every match (rather than taking advantage of memoization and other optimizations), so I think hasAttrName() is still better than nothing. Would it make sense to add the API but mark it as deprecated to alert users that this API may go away some day if we can swing it? It's a bit unfriendly since we don't have a replacement API lined up for users to use, so I wouldn't want to issue any diagnostics for using the matcher.

I've added a "deprecated" comment describing the intent to merge into hasName.

clang/include/clang/ASTMatchers/ASTMatchersInternal.h
729

Oops, these were left over from a polymorphism experiment

aaron.ballman accepted this revision.Oct 28 2020, 12:39 PM

I was sort of expecting hasAttr() to be deprecated because it is a leaky abstraction (consumers of the API have to know about our internal naming convention for attribute kind enumerations, it makes it harder for use to change the definition of the enumeration, etc.

Sure, SGTM

I'm totally fine if this happens in a follow-up patch (or patches). WDYT?

I definitely don't agree that hasName() has different semantics for attributes from declarations -- both of them are named identifiers, potentially with scope information as part of the name. Making the matcher polymorphic matches my expectations (pun intended).

Fair enough - this isn't really my area, but the differences I was thinking of were:

  • the Attr uses a name rather than owning it - it's more like DeclRefExpr than like NamedDecl
  • it wasn't clear to me that you'd want the same rules about optional scope handling, though it sounds like you do

Thank you for giving this feedback, it's helpful! I may have to rethink this a bit because it never dawned on me that someone would think of attributes being more like a DeclRefExpr than a NamedDecl. If I'm following along properly, this is because the AST node for the attribute could have been generated from various ways of spelling it? e.g., gcc::aligned vs _Alignas vs align vs alignas all making an AlignedAttr. I was thinking this is more like NamedDecl because *one* of those names generated the AST node and I'd hate for AST matchers to have to do something like anyOf(attr(hasSpelling("GNU"), hasName("aligned")), attr(hasSpelling("CXX"), hasName("gcc::aligned)), attr(hasSpelling("declspec"), hasName("align")), attr(hasSpelling("keyword"), hasAnyName("_Alignas", "alignas"))) to match an AlignedAttr regardless of how it is spelled. I figured it would make more sense for attr(hasName("aligned")) to match any AlignedAttr regardless of how the user wrote it in their source code, but perhaps others have different ideas here.

The scope handling is a bit of a question mark. Attributes don't really have namespaces like other identifiers, but they sort of have them when you squint at the name just right. For instance, the scope handling currently allows you to match ::foo in C mode even though C doesn't have the notion of namespaces. So I thought it wouldn't be unintuitive for it to also work for namespace scopes as well.

If you think the hasName name is important I'm happy to drop hasAttrName from this patch, and someone can work out how to add it with the better name, but I'm not sure I want to pick this battle with the template gods...

I'm curious if @klimek agrees, but I think hasName() is the correct way we want to eventually go. That said, I think having an attribute matcher with no way to match by name encourages an anti-pattern where people write a very generic matcher that matches a lot and then do further refinement work for every match (rather than taking advantage of memoization and other optimizations), so I think hasAttrName() is still better than nothing. Would it make sense to add the API but mark it as deprecated to alert users that this API may go away some day if we can swing it? It's a bit unfriendly since we don't have a replacement API lined up for users to use, so I wouldn't want to issue any diagnostics for using the matcher.

I've added a "deprecated" comment describing the intent to merge into hasName.

Thanks! LGTM! Please wait a day or two before landing for @klimek to respond if he has any concerns about introducing a new matcher that's already deprecated.

This revision is now accepted and ready to land.Oct 28 2020, 12:39 PM

Adding @klimek as a reviewer since I forgot to do that last time.

(while this is useful to you, let's keep discussing, but I'm also happy to stop and land this with your preferred API/semantics - just LMK if changes are needed)

Thank you for giving this feedback, it's helpful! I may have to rethink this a bit because it never dawned on me that someone would think of attributes being more like a DeclRefExpr than a NamedDecl. If I'm following along properly, this is because the AST node for the attribute could have been generated from various ways of spelling it? e.g., gcc::aligned vs _Alignas vs align vs alignas all making an AlignedAttr.

My mental model is that the Attr classes own the names, and the Attr instances refer to the classes using the names.

So the class AlignedAttr has the name aligned, and the instance [[gnu::aligned(8)]] refers to it. Just like the FunctionDecl double sqrt(double); has the name sqrt, and the DeclRefExpr in sqrt(8) refers to it.

To reinforce this, I can't choose a different name for the instance: [[gnu::whatever]] doesn't produce an Attr instance at all. But I can choose a different name for the class (by modifying Attr.td).
So an Attr instance (what we're matching) feels more like a reference than a primary entity. (Similarly, the semantics are associated with the Attr class rather than the instance).

If clang's AST modeled syntax rather than semantics, I imagine there'd be one Attr class and each instance could have an arbitrary name and arg list. In that case an Attr would seem to "have" a name in the same sense a decl does.

I figured it would make more sense for attr(hasName("aligned")) to match any AlignedAttr regardless of how the user wrote it in their source code, but perhaps others have different ideas here.

That definitely seems like the common case. However I can certainly imagine wanting to e.g. write a migration from __attribute__((xxx)) to [[gnu::xxx]].

(If attr classes were AST nodes too, like DeclRefExpr/Decl or TypeLoc/Type then you'd resolve this by using a matcher on one or the other)

As it stands it seems too surprising that Attr::getAttrName() returns the particular name used for that instance, but attr(hasAttrName("...")) would match any name associated with the same class.
Maybe attr(equivalentAttrName("...")) or so, possibly constrasting to attr(exactAttrName("..."))?

The scope handling is a bit of a question mark. Attributes don't really have namespaces like other identifiers, but they sort of have them when you squint at the name just right. For instance, the scope handling currently allows you to match ::foo in C mode even though C doesn't have the notion of namespaces. So I thought it wouldn't be unintuitive for it to also work for namespace scopes as well.

Yup, I don't have a real opinion here, I just assumed it was different. Seems like it would work fine.

(while this is useful to you, let's keep discussing, but I'm also happy to stop and land this with your preferred API/semantics - just LMK if changes are needed)

Let's hold off on landing for just a little bit. I don't think changes are needed yet, but the discussion may change my opinion and we might as well avoid churn. However, if you need to land part of this while we discuss other parts, LMK.

Thank you for giving this feedback, it's helpful! I may have to rethink this a bit because it never dawned on me that someone would think of attributes being more like a DeclRefExpr than a NamedDecl. If I'm following along properly, this is because the AST node for the attribute could have been generated from various ways of spelling it? e.g., gcc::aligned vs _Alignas vs align vs alignas all making an AlignedAttr.

My mental model is that the Attr classes own the names, and the Attr instances refer to the classes using the names.

So the class AlignedAttr has the name aligned, and the instance [[gnu::aligned(8)]] refers to it. Just like the FunctionDecl double sqrt(double); has the name sqrt, and the DeclRefExpr in sqrt(8) refers to it.

To reinforce this, I can't choose a different name for the instance: [[gnu::whatever]] doesn't produce an Attr instance at all. But I can choose a different name for the class (by modifying Attr.td).
So an Attr instance (what we're matching) feels more like a reference than a primary entity. (Similarly, the semantics are associated with the Attr class rather than the instance).

If clang's AST modeled syntax rather than semantics, I imagine there'd be one Attr class and each instance could have an arbitrary name and arg list. In that case an Attr would seem to "have" a name in the same sense a decl does.

Oh, how interesting, thank you for expounding!

My mental model is that there is no declaration of an attribute (in the usual sense, because users cannot specify their own attributes without changing the compiler), and so there's not a referential model like there are with a DeclRefExpr that refers back to a specific declaration. Instead, to me, an attribute is a bit more like a builtin type -- you may have multiple ways to spell the type (char vs signed char or unsigned char, long int vs long, etc), but it's the same type under the hood.

However, that brings up an interesting observation: you can't do hasType(hasName("long")) and instead have to do hasType(asString("long")). I don't recall the background about why there is asString() for matching string types vs hasName() -- you wouldn't happen to remember that context, would you? For instance, is the correct pattern to allow attr(asString("aligned")) to map back to an AlignedAttr regardless of how the attribute was spelled in source, similar to how asString("long") matches long int?

I figured it would make more sense for attr(hasName("aligned")) to match any AlignedAttr regardless of how the user wrote it in their source code, but perhaps others have different ideas here.

That definitely seems like the common case. However I can certainly imagine wanting to e.g. write a migration from __attribute__((xxx)) to [[gnu::xxx]].

Definitely true! I was imagining we'd want a separate checker for narrowing based on attribute syntax.

(If attr classes were AST nodes too, like DeclRefExpr/Decl or TypeLoc/Type then you'd resolve this by using a matcher on one or the other)

As it stands it seems too surprising that Attr::getAttrName() returns the particular name used for that instance, but attr(hasAttrName("...")) would match any name associated with the same class.
Maybe attr(equivalentAttrName("...")) or so, possibly constrasting to attr(exactAttrName("..."))?

Yeah, I am starting to see this more readily, so thank you for continuing the discussion!

The scope handling is a bit of a question mark. Attributes don't really have namespaces like other identifiers, but they sort of have them when you squint at the name just right. For instance, the scope handling currently allows you to match ::foo in C mode even though C doesn't have the notion of namespaces. So I thought it wouldn't be unintuitive for it to also work for namespace scopes as well.

Yup, I don't have a real opinion here, I just assumed it was different. Seems like it would work fine.

(while this is useful to you, let's keep discussing, but I'm also happy to stop and land this with your preferred API/semantics - just LMK if changes are needed)

Let's hold off on landing for just a little bit. I don't think changes are needed yet, but the discussion may change my opinion and we might as well avoid churn. However, if you need to land part of this while we discuss other parts, LMK.

Sorry I haven't had more to say on this. I would like to land at least the attr-in-DynTypedNode part, as this unblocks some stuff unrelated to matchers in clangd (someone just filed a dupe bug, which reminded me...)

Happy to cut out all of the matcher part, or land just attr() without any attr matchers, or all of it, or make another round of changes...

My mental model is that there is no declaration of an attribute (in the usual sense, because users cannot specify their own attributes without changing the compiler), and so there's not a referential model like there are with a DeclRefExpr that refers back to a specific declaration. Instead, to me, an attribute is a bit more like a builtin type -- you may have multiple ways to spell the type (char vs signed char or unsigned char, long int vs long, etc), but it's the same type under the hood.

Ah, this also makes sense! Again, the wrinkle is IIUC attr::Kind is like a builtin type, while Attr is more like VectorType - it specifies a builtin type, but also some other stuff.

However, that brings up an interesting observation: you can't do hasType(hasName("long")) and instead have to do hasType(asString("long")). I don't recall the background about why there is asString() for matching string types vs hasName() -- you wouldn't happen to remember that context, would you?

That was before my time I'm afraid.
It seems the implementation checks whether QualType::getAsString() exactly matches the argument. asString("long") matches a type specified as long or long int. But asString("long int") doesn't match anything! (Sugar types muddle things a bit but don't fundamentally change this).
My guess is this implementation was just a question of practicality: parsing the matcher argument as a type isn't feasible, but having the node producing a single string to compare to is easy. And it generalizes to complicated template types.

It's reasonable to think of these as doing different things hasName is querying a property, while asString is doing a specialized comparison of the whole thing.
I'm not *sure* that's why the different names where chosen though. And the fact that hasName allows qualifiers definitely feels a bit bolted on here.

For instance, is the correct pattern to allow attr(asString("aligned")) to map back to an AlignedAttr regardless of how the attribute was spelled in source, similar to how asString("long") matches long int?

If we're following precedents (not sure we have to):

  • per your mental model, asString would pick one fixed name for each attribute kind - which may not be the one used!
  • ... but also asString should roughly be "node as string" which logically includes arguments and should use the right name (since the node records it)
  • hasName should match against the name attribute of Attr, possibly with optional qualifiers
  • ... but there's no precedent for rejecting synonymous names per se, and this may be confusing behavior

So I don't think either of these are a *great* fit, assuming we think "attribute named XYZ or any synonym" is the most important operation (which I agree with).
I can't think of a pithy name, but I also think it's OK to be a bit verbose and explicit here - attributes aren't the most commonly used bit of the AST.

Maybe attr(equivalentAttrName("...")) or so, possibly constrasting to attr(exactAttrName("..."))?

(while this is useful to you, let's keep discussing, but I'm also happy to stop and land this with your preferred API/semantics - just LMK if changes are needed)

Let's hold off on landing for just a little bit. I don't think changes are needed yet, but the discussion may change my opinion and we might as well avoid churn. However, if you need to land part of this while we discuss other parts, LMK.

Sorry I haven't had more to say on this. I would like to land at least the attr-in-DynTypedNode part, as this unblocks some stuff unrelated to matchers in clangd (someone just filed a dupe bug, which reminded me...)

That part LGTM and is fine to land.

Happy to cut out all of the matcher part, or land just attr() without any attr matchers, or all of it, or make another round of changes...

I think the attr() matcher is definitely reasonable as it stands and can be landed as well.

My mental model is that there is no declaration of an attribute (in the usual sense, because users cannot specify their own attributes without changing the compiler), and so there's not a referential model like there are with a DeclRefExpr that refers back to a specific declaration. Instead, to me, an attribute is a bit more like a builtin type -- you may have multiple ways to spell the type (char vs signed char or unsigned char, long int vs long, etc), but it's the same type under the hood.

Ah, this also makes sense! Again, the wrinkle is IIUC attr::Kind is like a builtin type, while Attr is more like VectorType - it specifies a builtin type, but also some other stuff.

Yeah, that's a fair point.

However, that brings up an interesting observation: you can't do hasType(hasName("long")) and instead have to do hasType(asString("long")). I don't recall the background about why there is asString() for matching string types vs hasName() -- you wouldn't happen to remember that context, would you?

That was before my time I'm afraid.
It seems the implementation checks whether QualType::getAsString() exactly matches the argument. asString("long") matches a type specified as long or long int. But asString("long int") doesn't match anything! (Sugar types muddle things a bit but don't fundamentally change this).
My guess is this implementation was just a question of practicality: parsing the matcher argument as a type isn't feasible, but having the node producing a single string to compare to is easy. And it generalizes to complicated template types.

That's my guess as well -- also, I wasn't aware of the long vs long int behavioral difference with asString(), that's.. neat.

It's reasonable to think of these as doing different things hasName is querying a property, while asString is doing a specialized comparison of the whole thing.
I'm not *sure* that's why the different names where chosen though. And the fact that hasName allows qualifiers definitely feels a bit bolted on here.

For instance, is the correct pattern to allow attr(asString("aligned")) to map back to an AlignedAttr regardless of how the attribute was spelled in source, similar to how asString("long") matches long int?

If we're following precedents (not sure we have to):

  • per your mental model, asString would pick one fixed name for each attribute kind - which may not be the one used!
  • ... but also asString should roughly be "node as string" which logically includes arguments and should use the right name (since the node records it)
  • hasName should match against the name attribute of Attr, possibly with optional qualifiers
  • ... but there's no precedent for rejecting synonymous names per se, and this may be confusing behavior

So I don't think either of these are a *great* fit, assuming we think "attribute named XYZ or any synonym" is the most important operation (which I agree with).

I agree that this is the most common use-case.

I can't think of a pithy name, but I also think it's OK to be a bit verbose and explicit here - attributes aren't the most commonly used bit of the AST.

Maybe attr(equivalentAttrName("...")) or so, possibly constrasting to attr(exactAttrName("..."))?

I think this design is the most flexible and will be the most understandable in the long-run because it makes it clear which name-matching behavior is being used. But do we want to stick attr in the name of this to limit it to just attributes or do we think this is a generalized concept that should apply to matching other names as well? e.g., hasType(equivalentName("long")) to match either long, long int or signed long int and hasType(exactName(long int`)) to only match long int` and not the other forms? Or do you think this is a bridge too far because it would be reasonable to expect hasType(equivalentName("long")) to match typedef names because those are equivalent (or perhaps more problematically, would the user expect hasType(equivalentName("struct foo<bar>::baz<quux>")) to match as an equivalent name for the parameter type (spelled with a using declaration) in void foo(blah b))? (I think I've just about talked myself into keeping attr in the name so we don't try to use this for other kinds of identifiers with equivalent names.)

Go ahead and land the dyn node bits and the attr matcher. If you feel like being on the hook for doing the other matchers that we're discussing in another review, then yay, but don't feel obligated (though I appreciate the design discussion!).

OK, I'm going to drop hasAttrName from the patch (it's a relatively tiny piece of the code/tests) and land this, and put that back up as another review for discussion. And thanks, it's been interesting and I learned a bunch!
We didn't talk about overloading isImplicit to apply to attrs too, but it doesn't seem like that was controversial (and it does help with the tests).

Maybe attr(equivalentAttrName("...")) or so, possibly constrasting to attr(exactAttrName("..."))?

I think this design is the most flexible and will be the most understandable in the long-run because it makes it clear which name-matching behavior is being used. But do we want to stick attr in the name of this to limit it to just attributes or do we think this is a generalized concept that should apply to matching other names as well? e.g., hasType(equivalentName("long")) to match either long, long int or signed long int and hasType(exactName(long int`)) to only match long int` and not the other forms? Or do you think this is a bridge too far because it would be reasonable to expect hasType(equivalentName("long")) to match typedef names because those are equivalent (or perhaps more problematically, would the user expect hasType(equivalentName("struct foo<bar>::baz<quux>")) to match as an equivalent name for the parameter type (spelled with a using declaration) in void foo(blah b))? (I think I've just about talked myself into keeping attr in the name so we don't try to use this for other kinds of identifiers with equivalent names.)

Putting attr in the name is the *safe* option - as we've seen, going for a generic name and hoping it can work consistently for all types over time is risky.
(The cost of a confusing "overload" set is also high because the polymorphism mechanism is poorly understood - most users probably can't easily look up the variant they're using and ignore the others).
On the other hand, when it is consistent, I think the uniformity is nice - so a generic name isn't without merit, it's just a bit of a gamble.

We didn't talk about overloading isImplicit to apply to attrs too, but it doesn't seem like that was controversial (and it does help with the tests).

I spoke too soon about this...
This prevents hasAncestor(isImplicit()) from compiling, because hasAncestor needs to deduce the node type from its argument to call ASTMatchFinder::matchesAncestorOf<T>().
This occurs in a few places in tree and many places in our private codebase...
The workaround is hasAncestor(decl(isImplicit())) which is reasonable, except that "is contained in *any* implicit node" is probably actually the intent. Well, at least it's not a regression.

In addition, while digging into this, I realized Attrs are not traversed in the parent map, and not supported by the parent/child/ancestor/descendant traversals.
So I'm fixing that... and adding some tests.

I'll need to send this for another round, even without the name matcher.

sammccall updated this revision to Diff 306795.Nov 20 2020, 3:51 PM

Drop hasAttrName matcher.

Update uses of hasAncestor(isImplicit()) to hasAncestor(decl(isImplicit())),
as the former no longer compiles due to lack of deduction.

Implement and test up/down traversals involving attrs.

aaron.ballman edited subscribers, added: ymandel; removed: aaron.ballman.Nov 23 2020, 5:36 AM

We didn't talk about overloading isImplicit to apply to attrs too, but it doesn't seem like that was controversial (and it does help with the tests).

I spoke too soon about this...
This prevents hasAncestor(isImplicit()) from compiling, because hasAncestor needs to deduce the node type from its argument to call ASTMatchFinder::matchesAncestorOf<T>().
This occurs in a few places in tree and many places in our private codebase...
The workaround is hasAncestor(decl(isImplicit())) which is reasonable, except that "is contained in *any* implicit node" is probably actually the intent. Well, at least it's not a regression.

Users can always traverse in IgnoreUnlessSpelledInSource mode for that situation though, so at least there's a reasonable path forward.

In addition, while digging into this, I realized Attrs are not traversed in the parent map, and not supported by the parent/child/ancestor/descendant traversals.
So I'm fixing that... and adding some tests.

Good catch!

I'll need to send this for another round, even without the name matcher.

Thank you!

clang/lib/AST/ASTTypeTraits.cpp
138

Oye, this brings up an interesting point. Plugin-based attributes currently cannot create their own semantic attribute, but will often instead reuse an existing semantic attribute like annotate. This means code like [[clang::plugin_attr]] int x; may or may not be possible to match. Further, some builtin attributes have no semantic attribute associated with them whatsoever: https://github.com/llvm/llvm-project/blob/master/clang/include/clang/Basic/Attr.td#L2740

I think the switch statement logic here is correct in these weird cases and we won't hit the llvm_unreachable. For attributes with no AST representation, there's no Attr object that could be passed in the first place. Unknown attributes similarly won't get here because there's no way to get an AST node for them. Plugin-based attributes are still going to be similarly surprising, but... I don't know that we can solve that here given there's no way to create a plugin-based semantic attribute yet.

Pining @ymandel to raise awareness of these sorts of issues that stencil may run into. For the AST matchers, I think it's reasonable for us to say "if there's no AST node, we can't match on it", but IIRC, stencil was looking to stay a bit closer to the user's source code rather than be strongly tied to the AST.

clang/unittests/ASTMatchers/ASTMatchersNodeTest.cpp
1887

Can you add an expects false test for an unknown attribute and another one for an attribute with no AST node associated with it?

ymandel added inline comments.Nov 30 2020, 11:42 AM
clang/lib/AST/ASTTypeTraits.cpp
138

@aaron.ballman Thanks for pinging me. However, Stencil is limited to AST nodes, for better or worse. They make it somewhat easier to _generate_ source plainly, but they are fundamentaly an abstraction over the AST. I think that the only way we'll get beyond the AST is Syntax Trees.

Still, nice to see this patch! I've been meaning to do the same for LambaCapture for a while and this will be a handy guide to what needs to be changed.

aaron.ballman accepted this revision.Dec 4 2020, 4:46 AM

LGTM aside from a minor testing request.

clang/lib/AST/ASTTypeTraits.cpp
138

That's good to know, @ymandel, thank you!

clang/unittests/ASTMatchers/ASTMatchersNodeTest.cpp
1887

This comment may have been missed during the discussion.

sammccall marked 2 inline comments as done.Aug 6 2021, 12:55 PM

Finally getting around to landing this, sorry for the long delay!

Herald added a project: Restricted Project. · View Herald TranscriptAug 6 2021, 12:55 PM
This revision was landed with ongoing or failed builds.Aug 6 2021, 1:06 PM
This revision was automatically updated to reflect the committed changes.