This is an archive of the discontinued LLVM Phabricator instance.

[-Wunsafe-buffer-usage] Add a new recursive matcher to replace `forEachDescendant` in unsafe buffer check
ClosedPublic

Authored by ziqingluo-90 on Nov 18 2022, 1:32 PM.

Details

Summary

Add a new matcher forEveryDescendant that recursively matches descendants of a Stmt but skips nested callable definitions.
This matcher has same effect as using forEachDescendant and skipping forCallable explicitly but does not require the AST construction to be complete.

Diff Detail

Event Timeline

ziqingluo-90 created this revision.Nov 18 2022, 1:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 18 2022, 1:32 PM
Herald added a subscriber: rnkovacs. · View Herald Transcript
ziqingluo-90 requested review of this revision.Nov 18 2022, 1:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 18 2022, 1:32 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Is the problem forEachDescendant matching statements inside blocks and lambdas? I wonder if this behavior would surprise people, so I think it would be better to:

  • Potentially add a template bool parameter to forEachDescendant controlling this behavior.
  • Review existing uses because I am not entirely sure if the current behavior is the right default.
ziqingluo-90 added a comment.EditedNov 18 2022, 5:12 PM

Is the problem forEachDescendant matching statements inside blocks and lambdas? I wonder if this behavior would surprise people, so I think it would be better to:

  • Potentially add a template bool parameter to forEachDescendant controlling this behavior.
  • Review existing uses because I am not entirely sure if the current behavior is the right default.

Let me try to describe the problem in more detail. The goal is to recursively visit every descendant of a node n but skip all nested callable declarations in n. With forEachDescendant, one needs to explicitly test and skip, using forCallable, if a descendant does not belong to the same "callable" as n. forCallable looks for ancestors of a node through the AST so it requires the AST to be complete. But in our case, the AST is under construction. Given a node, we want the visitor only strictly look at descendants of the node.

For the questions, I think blocks and lambdas are the only cases we will have in analyzing "C/C++" + "blocks" programs. Our plan is to generalize this matcher to be an optional (off-by-default) mode of forEachDescendant once it becomes stable. So this implementation also skips other declarations. So far, this matcher is local to unsafe buffer checks.

steakhal added inline comments.
clang/lib/Analysis/UnsafeBufferUsage.cpp
55–56

Can Node be ever null if the visitor is initiated only be AST_MATCHER_P?

NoQ added a comment.Nov 28 2022, 1:28 PM

Is the problem forEachDescendant matching statements inside blocks and lambdas? I wonder if this behavior would surprise people, so I think it would be better to:

  • Potentially add a template bool parameter to forEachDescendant controlling this behavior.

Yeah, I honestly think that it is the current behavior of forEachDescendant that's counterintuitive and almost never does what the user actually wants. People typically get surprised when it *does* descend into callables, something they didn't even think about.

And the idiom

decl(forEachDescendant(..., forCallable(equalsBoundNode("D"))).bind("D")

is not only clumsy and obscure but also algorithmically slower.

We do need a better name though, the difference between "Each" and "Every" isn't exactly the same as the difference between this matcher's behavior and that matcher's behavior. I was thinking of forEachStmtDescendant, as in it matches "statement-descendants". An even better name could be forEachDescendantInCallable which describes the behavior precisely and also connects to the old idiom for discoverability.

  • Review existing uses because I am not entirely sure if the current behavior is the right default.

Yeah that's definitely a good idea.

clang/lib/Analysis/UnsafeBufferUsage.cpp
51–52

^^

60

llvm:: is unnecessary, we're under using namespace llvm.

ziqingluo-90 added inline comments.Nov 29 2022, 1:20 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
55–56

Honestly I do not know the exact answer to your question. I was imagining that an AST node could have a null to be one of its children.

Our plan later is to make this matcher a general alternative to forEachDescendant, so I think the check for null here is not over-defensive.

NoQ added inline comments.Nov 29 2022, 2:42 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
55–56

There can definitely be null children in the AST. Eg. for(;;) {} has null initializer, null condition, null increment, non-null body. I guess this is more about whether RecursiveASTVisitor checks for that automatically before invoking callbacks.

ziqingluo-90 added inline comments.Dec 1 2022, 12:22 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
55–56

I guess this is more about whether RecursiveASTVisitor checks for that automatically before invoking callbacks.

The override-ed TraverseDecl method in the base class RecursiveASTVisitor also has a null-check. So I feel like this null-check is needed here.

60
72

Did a rebase and addressed comments.

NoQ accepted this revision.Jan 3 2023, 1:22 PM

Ok LGTM! I hope we'll get a chance to implement a reusable solution later🤞

This revision is now accepted and ready to land.Jan 3 2023, 1:22 PM
This revision was landed with ongoing or failed builds.Jan 4 2023, 3:53 PM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Jan 4 2023, 5:02 PM

The clang binary should not depend on ASTMatchers.

I see this isn't new in this patch, but ASTMatchers should only be used from clang-tidy, not from the compiler itself.

thakis added a comment.Jan 4 2023, 5:07 PM

Unrelatedly, this doesn't build on Windows: http://45.33.8.238/win/72750/step_4.txt

NoQ added a comment.Jan 4 2023, 6:02 PM

The clang binary should not depend on ASTMatchers.

I see this isn't new in this patch, but ASTMatchers should only be used from clang-tidy, not from the compiler itself.

ASTMatchers have been baked into clang binary for at least a few years already, actively used by the static analyzer (some backstory in D25429). This warning is, however, probably the first use in clang proper, so they'll be there even if static analyzer support is turned off through cmake flags.

I'm open to a broader discussion. To me it's natural that there exist "analysis-based warnings" that take advantage of advanced analysis tools, and ASTMatchers is just one such tool, definitely not the most expensive one and not the most hazardous one (CFG and flow-sensitive analysis are arguably much scarier in both regards, and they're actively used in essential warnings such as -Wuninitialized).

With respect to this warning, we're also paying a lot of attention to make sure that performance is acceptable for the compiler. This patch is actually an example of our commitment to performance as it flattens the performance of this analysis down to linear time over AST size, something that wasn't previously possible with ASTMatchers.

MaskRay added a subscriber: MaskRay.EditedJan 5 2023, 4:34 PM

For relands, you can just use Differential Revision: https://reviews.llvm.org/D138321 instead of Related differential revision: https://reviews.llvm.org/D138329.

Phabricator doesn't recognize Related differential revision:, so it does not connect ef47a0a711f12add401394f7af07a0b4d1635b56 to this differential.

Revert "Revert can be replaced with Reland to emphasize that it is a reland (it can be omitted as well). A reland should include the original commit message. You may additionally mention what issue it fixes.

Is it possible these errors are due to your change: https://lab.llvm.org/buildbot/#/builders/5/builds/30522/steps/13/logs/stdio? If so please consider reverting/fixing.

ASTMatchers have been baked into clang binary for at least a few years already, actively used by the static analyzer (some backstory in D25429). This warning is, however, probably the first use in clang proper, so they'll be there even if static analyzer support is turned off through cmake flags.

I'm open to a broader discussion. To me it's natural that there exist "analysis-based warnings" that take advantage of advanced analysis tools, and ASTMatchers is just one such tool, definitely not the most expensive one and not the most hazardous one (CFG and flow-sensitive analysis are arguably much scarier in both regards, and they're actively used in essential warnings such as -Wuninitialized).

This has been discussed extensively when ASTMatchers were introduced. Project leadership back then (Doug Gregor and iirc Richard Smith) thought ASTMatches weren't a great approach and agreed to have them merged only if they aren't used in clang itself, but only in clang-tools-extra.