This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Fix `readability-static-accessed-through-instance` false negative for static methods
AbandonedPublic

Authored by PiotrZSL on Dec 4 2021, 3:30 PM.

Details

Summary

Fixes PR#52519. The problem actually has nothing to do with the out-of-line definition being inline; the problem is that hasDeclaration() of the memberExpr() will match the out-of-line definition, which obviously isn't marked static, so isStaticStorageClass() won't match. To find out whether a member function is static, one needs to look at the canonical declaration, but there doesn't seem to be a matcher for that yet, so I've added an isStatic() narrowing matcher for CXXMethodDecls, similar to the isConst() matcher that already exists.

Diff Detail

Event Timeline

fwolff created this revision.Dec 4 2021, 3:30 PM
fwolff requested review of this revision.Dec 4 2021, 3:30 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 4 2021, 3:30 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Thanks a lot for addressing this issue! I am just trying it on our codebase.

The problem actually has nothing to do with the out-of-line definition being inline; the problem is that hasDeclaration() of the memberExpr() will match the out-of-line definition, which obviously isn't marked static, so isStaticStorageClass() won't match.

Hm, an out-of-line definition *cannot* have the static keyword. I wonder if it's actually a bug (in the AST? or just the matcher?) that isStaticStorageClass doesn't match here? I guess that other checks that use isStaticStorageClass might be affected by this too?

Thanks a lot for addressing this issue! I am just trying it on our codebase.

The problem actually has nothing to do with the out-of-line definition being inline; the problem is that hasDeclaration() of the memberExpr() will match the out-of-line definition, which obviously isn't marked static, so isStaticStorageClass() won't match.

Hm, an out-of-line definition *cannot* have the static keyword. I wonder if it's actually a bug (in the AST? or just the matcher?) that isStaticStorageClass doesn't match here? I guess that other checks that use isStaticStorageClass might be affected by this too?

isStaticStorageClass() calls through to FunctionDecl::getStorageClass() which reports the storage as written on that declaration in source. So this isn't a bug in the AST, it's a more of a misunderstanding of whether we're querying the storage duration/linkage for the method or whether we're querying whether it was written with the static keyword.

clang/include/clang/ASTMatchers/ASTMatchers.h
6017

I would prefer we not expose this AST matcher as a public one. There's sufficient confusion around "as written in source" vs "computed" vs "linkage" in this area that I think we should be careful when introducing matchers around storage classes and storage durations. Then again, the water is already pretty muddy here, so maybe this ship sailed...

Other potential solutions to this issue would be to expose an AST matcher to traverse to the canonical decl or only matches the canonical decl, then we could use that to wrap the existing call to isStaticStorageClass(). (Of course, we could make this matcher local to just that source file, as well.)

Thoughts?

Thanks a lot for addressing this issue! I am just trying it on our codebase.

The problem actually has nothing to do with the out-of-line definition being inline; the problem is that hasDeclaration() of the memberExpr() will match the out-of-line definition, which obviously isn't marked static, so isStaticStorageClass() won't match.

Hm, an out-of-line definition *cannot* have the static keyword. I wonder if it's actually a bug (in the AST? or just the matcher?) that isStaticStorageClass doesn't match here? I guess that other checks that use isStaticStorageClass might be affected by this too?

isStaticStorageClass() calls through to FunctionDecl::getStorageClass() which reports the storage as written on that declaration in source. So this isn't a bug in the AST, it's a more of a misunderstanding of whether we're querying the storage duration/linkage for the method or whether we're querying whether it was written with the static keyword.

I see, makes sense.

simon.giesecke added inline comments.Dec 9 2021, 1:54 AM
clang/include/clang/ASTMatchers/ASTMatchers.h
6017

I think if we have a isStaticStorageClass matcher, then we can also have isStatic matcher.

But maybe we shouldn't have either of those in the public header, but maybe add a different one as you suggested, under a suitable name with as little ambiguity as possible.

And maybe keep the isStaticStorageClass matcher but rename it to reduce the chance of misunderstanding?

Then, as mentioned in my original top-level comment, I can imagine that there are more uses of the isStaticStorageClass matcher in clang-tidy rules (and maybe elsewhere) that actually should use the new matcher.

Adding some more reviewers, as this is starting to touch on AST matcher design more deeply and needs a wider audience.

clang/include/clang/ASTMatchers/ASTMatchers.h
6017

My current thinking on this is that I'm not opposed to exposing the functionality, but I have Opinions about the names of how we expose these and what they model. There are quite a few ways to think about "is static" (with some overlap between them):

  1. Was this declaration marked with the static keyword?
  2. Does this declaration have static storage duration? http://eel.is/c++draft/basic.stc.static#def:storage_duration,static
  3. Does this declaration have internal linkage? http://eel.is/c++draft/basic.link#3.1
  4. Does this declaration have a static array extent? C2x 6.7.6.3p6
  5. Does the definition a member function have access to a this pointer/does the definition of a data member contribute to the subobjects of the class? http://eel.is/c++draft/class.static
  6. All of the above questions, but instead of specific to one declaration, is about all (re)declarations of the entity.

I don't think we want to cover #4 as part of this patch (that's more about types and less about declarations), and I think #6 is a general matcher question.

FWIW, we currently have matchers for:

isStaticLocal
hasStaticStorageDuration
isStaticStorageClass

So I don't think isStatic is viable -- it's just too ambiguous given all the many uses of the keyword and the existing set of matchers.

My intuition is that we might want a new kind of traversal matcher for matching over a set of redeclarations. e.g., functionDecl(isStaticStorageClass()) vs anyRedeclaration(functionDecl(isStaticStorageClass())):

static void func(); // #1
void func(); // #2
void func() {} // #3

where functionDecl(isStaticStorageClass()) would match #1 and anyRedeclaration(functionDecl(isStaticStorageClass)) would match #1, #2, and #3. I think if we had this sort of facility, we could probably make use of our existing matchers (perhaps with improved names) to achieve the same goal as you're going for, but I'm curious what other folks think as well.

Yeah, this is a thorny question. The language hasn't chosen clear names and neither has the AST in many cases.
(Just yesterday I lost a lot of time to CXXRecordDecl->getBodyRBrace()...)

I think anyRedeclaration(functionDecl(isStaticStorageClass))is too difficult to get right and too easy to get wrong for such a common/simple concept.
I know what a static method is, but I would have made the same mistake here. In general asking people to describe syntax to answer a semantic question invites this.

Here's a hack: CXXMethodDecl has both isStatic() and isInstance(). I agree that isStatic is a risky name, but I don't think isInstance is. Can we just spell the matcher you want not(isInstance())?

Yeah, this is a thorny question. The language hasn't chosen clear names and neither has the AST in many cases.
(Just yesterday I lost a lot of time to CXXRecordDecl->getBodyRBrace()...)

Yeah, I've run into this problem with attribute subject lists as well, so it's definitely an AST-level problem with how we expose the information in the AST.

I think anyRedeclaration(functionDecl(isStaticStorageClass))is too difficult to get right and too easy to get wrong for such a common/simple concept.

Get right in terms of implementation, or in terms of use, or both?

I know what a static method is, but I would have made the same mistake here. In general asking people to describe syntax to answer a semantic question invites this.

Because our AST is intended to reflect the syntax, our AST matchers kind of *do* match syntax -- I think it's the reason we're in the problem (and this is not the first time it's come up). There's a cognitive disconnect between the AST matchers traversing the AST and applying a match on all node and the C & C++ concept of redeclarations. People writing the matchers are thinking exactly what you said -- I know what a static method is -- but they don't realize they get this behavior with the most straight-forward matching code (https://godbolt.org/z/rWW48MT8T):

static void f(); // Get a match for this
void f() {} // But not this, despite it being static

Here's a hack: CXXMethodDecl has both isStatic() and isInstance(). I agree that isStatic is a risky name, but I don't think isInstance is. Can we just spell the matcher you want not(isInstance())?

This could work, but I don't think it's particularly satisfying in situations that aren't binary. Like asking "does this have static storage duration" (as opposed to thread, automatic, or dynamic storage duration). And it turns out that the behavior there is opposite to asking about the storage class specifier: https://godbolt.org/z/hqserfxzs (the extern redeclaration is still matched because that declaration retains the internal linkage from the first declaration). So we sometimes take prior declarations into account and we sometimes don't. :-(

I think anyRedeclaration(functionDecl(isStaticStorageClass))is too difficult to get right and too easy to get wrong for such a common/simple concept.

Get right in terms of implementation, or in terms of use, or both?

Use, if I'm understanding your question.
If we don't expose this as a matcher with some name, it's a recipe the person writing the matcher has to remember.
There's no good way to look it up, and it's easy to write something that seems good but isn't.

(This problem is pervasive with AST and matchers, and this isn't the worst example.)

I know what a static method is, but I would have made the same mistake here. In general asking people to describe syntax to answer a semantic question invites this.

Because our AST is intended to reflect the syntax, our AST matchers kind of *do* match syntax -- I think it's the reason we're in the problem (and this is not the first time it's come up).

The clang AST is the best tool we have to describe syntax *and* to describe semantics.
Maybe by design it captures "all" syntax and "enough" semantics, but users reasonably care less about the design than about their alternatives and problems. The clang AST is useful for answering syntax questions (better than regex!) but it's _irreplaceable_ for answering semantics questions (better than... err?).

So we have syntax and semantics in one namespace, and static is the worst because it means like 3 syntactic and 5 semantic things :-(

There's a cognitive disconnect between the AST matchers traversing the AST and applying a match on all node and the C & C++ concept of redeclarations.

Definitely, this highlights the disconnect really well.
And some of the accessors on a redecl will delegate to the canonical while others won't. (Can we call this "syntax semantics" vs "semantics semantics"?!)

I definitely think we *should* have a redecl matcher - I just don't think we should make users use it to answer common semantic questions.
That leaves matcher authors remembering and reciting the rules about how syntax aggregates over redecls to produce semantics. We'll make mistakes.

Here's a hack: CXXMethodDecl has both isStatic() and isInstance(). I agree that isStatic is a risky name, but I don't think isInstance is. Can we just spell the matcher you want not(isInstance())?

This could work, but I don't think it's particularly satisfying in situations that aren't binary.

Oh, I totally agree. I just meant specifically for this case, as a way to avoid the dilemma of ambiguous vs unwieldy.

This could work, but I don't think it's particularly satisfying in situations that aren't binary. Like asking "does this have static storage duration" (as opposed to thread, automatic, or dynamic storage duration). And it turns out that the behavior there is opposite to asking about the storage class specifier: https://godbolt.org/z/hqserfxzs

I'm not totally surprised by that, because "storage duration" sounds like a semantic property to me, while "storage class" (specifier) is a syntactic feature.
But it's not a distinction I'd pick up on if I wasn't looking for it, and different people are going to read things different ways.

My favorite naming convention I've seen in the AST is e.g. getBlob() vs getBlobAsWritten().
Maybe getEffectiveBlob() would be even clearer for the semantic version.
But I'm not sure how much it's possible to realign matcher names around that sort of thing if the AST names aren't going to change.

I think anyRedeclaration(functionDecl(isStaticStorageClass))is too difficult to get right and too easy to get wrong for such a common/simple concept.

Get right in terms of implementation, or in terms of use, or both?

Use, if I'm understanding your question.
If we don't expose this as a matcher with some name, it's a recipe the person writing the matcher has to remember.
There's no good way to look it up, and it's easy to write something that seems good but isn't.

(This problem is pervasive with AST and matchers, and this isn't the worst example.)

Ah, I see what you're driving at better now, thank you!

I know what a static method is, but I would have made the same mistake here. In general asking people to describe syntax to answer a semantic question invites this.

Because our AST is intended to reflect the syntax, our AST matchers kind of *do* match syntax -- I think it's the reason we're in the problem (and this is not the first time it's come up).

The clang AST is the best tool we have to describe syntax *and* to describe semantics.
Maybe by design it captures "all" syntax and "enough" semantics, but users reasonably care less about the design than about their alternatives and problems. The clang AST is useful for answering syntax questions (better than regex!) but it's _irreplaceable_ for answering semantics questions (better than... err?).

So we have syntax and semantics in one namespace, and static is the worst because it means like 3 syntactic and 5 semantic things :-(

Heh, those are fair points. I especially appreciate the clarity about whether we're matching syntax or semantics, that's really the crux of my concerns when you distill it.

There's a cognitive disconnect between the AST matchers traversing the AST and applying a match on all node and the C & C++ concept of redeclarations.

Definitely, this highlights the disconnect really well.
And some of the accessors on a redecl will delegate to the canonical while others won't. (Can we call this "syntax semantics" vs "semantics semantics"?!)

I definitely think we *should* have a redecl matcher - I just don't think we should make users use it to answer common semantic questions.
That leaves matcher authors remembering and reciting the rules about how syntax aggregates over redecls to produce semantics. We'll make mistakes.

Ahhhh, okay, I thought you were opposed to the notion entirely. I agree with you that we need to be very careful not to add undue user burdens for common semantic questions!

Here's a hack: CXXMethodDecl has both isStatic() and isInstance(). I agree that isStatic is a risky name, but I don't think isInstance is. Can we just spell the matcher you want not(isInstance())?

This could work, but I don't think it's particularly satisfying in situations that aren't binary.

Oh, I totally agree. I just meant specifically for this case, as a way to avoid the dilemma of ambiguous vs unwieldy.

Now that I understand better, +1 -- that would unblock the needs for this patch without having to solve the larger problem, which is great.

This could work, but I don't think it's particularly satisfying in situations that aren't binary. Like asking "does this have static storage duration" (as opposed to thread, automatic, or dynamic storage duration). And it turns out that the behavior there is opposite to asking about the storage class specifier: https://godbolt.org/z/hqserfxzs

I'm not totally surprised by that, because "storage duration" sounds like a semantic property to me, while "storage class" (specifier) is a syntactic feature.
But it's not a distinction I'd pick up on if I wasn't looking for it, and different people are going to read things different ways.

My favorite naming convention I've seen in the AST is e.g. getBlob() vs getBlobAsWritten().
Maybe getEffectiveBlob() would be even clearer for the semantic version.
But I'm not sure how much it's possible to realign matcher names around that sort of thing if the AST names aren't going to change.

Yeah, there's definitely some tension between our desire for AST matchers to use the same terminology as used by the AST itself and picking names that are usable and clear for users not steeped in compiler internals lore. Personally, I am very comfortable with the idea of renaming some of the AST functionality within Clang in this area (at least regarding storage duration, linkage, etc) because this problem comes up outside of AST matchers as well. It's an NFC change and there will be some churn from it, but I think the churn is well motivated.

PiotrZSL commandeered this revision.Nov 10 2023, 10:16 AM
PiotrZSL added a reviewer: fwolff.
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: ChuanqiXu. · View Herald Transcript
PiotrZSL abandoned this revision.Nov 10 2023, 10:17 AM

Already fixed by D157326