This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] ASTMatchers: Extend with Static Analyzer data types
Needs ReviewPublic

Authored by Ignotus on Apr 8 2020, 10:26 AM.

Details

Summary

This is an extension of the existing AST matcher API so that we can
traverse on symbolic values.

Diff Detail

Event Timeline

Ignotus created this revision.Apr 8 2020, 10:26 AM

What is the concrete problem you are trying to solve? Could you please elaborate on why do we need this raw copy of the original ASTMatchers library? Why is it not possible to use the matchers from libAST? I am not sure that a raw copy paste of a whole library is the most appropriate solution to any problem.

Ignotus added subscribers: rsmith, alexfh, a_sidorin.

Thanks for the response!

What is the concrete problem you are trying to solve?

Some high-level goals: Deprecate the reachability and cluster analysis of RegionStore. Simplify the checker-writing so obtaining a std::string's StringRegion does not require an ASTMatcher to match for the StringLiteral. It could be used as a base for Aleksei's graph-matcher-based checker-writing concept: http://lists.llvm.org/pipermail/cfe-dev/2018-May/057985.html

Could you please elaborate on why do we need this raw copy of the original ASTMatchers library?

This solution is the most appropriate solution, I believe. There is no common-sense way to adjust the ASTMatchers with our data types. For a single use-case it does not make sense to create another layer on top of the already complex template and macro layer so that we could inherit from the ASTMatchers. Most likely if the community needs Aleksei's work we need a lot more fine-grained matcher logic using the context and state information. With that copy-paste abstraction, we could create our own matcher-logic step by step.

Why is it not possible to use the matchers from libAST?

The goal of ASTMatchers is to glue types together and match them using the class DynTypedNode. We do not use the types which are the ASTMatchers work on, and the ASTMatchers should not be shipped with Static Analyzer types. Every part of the ASTMatchers are type-heavy, please do a Ctrl+F and search for the word SVal. That is the issue in order to re-use the existing facilities.

I am not sure that a raw copy paste of a whole library is the most appropriate solution to any problem.

I could reinvent the copy-pasted part of the ASTMatchers for my own wording and own class-hierarchies, and I could sell it as being my own invention. That would be even more silly than reusing it. It is an exotic way of reusing code, I understand it and I feel embarrassed to do so.


@alexfh, @rsmith, would we like to use namespace ento types in the ASTMatchers or we want to create a template-macro-magic layer on top of the template-macro-magic layer to reuse code (if that is even possible)?

This looks like quite a change -- did I miss out on a discussion on the mailing list?

For a patch of this scale I'd prefer to have a formal proposal, a problem statement that details the current state of things and why they aren't ideal, a survey of already existing solutions or ideas (Alexey's idea and proof of concept, as you said yourself, seems strongly related, and there are years of discussion about it in the mailing list, should we pick that up?), and then a high level overview of what your approach is like. You seem to aim for a bigger fish then what is found in here, so I suspect it would be great to be on the same page.

Despite your comment to @martong, I, and seemingly many other regular maintainers of the analyzer I talked to seem to be confused about the above mentioned points. I added some folks as reviewers (as well as myself), I have strong confidence in their expertise of the AST matcher library (on which this solution is based) and the static analyzer itself.

Thanks for the idea!

This looks like quite a change -- did I miss out on a discussion on the mailing list?

No, there is no discussion.

For a patch of this scale I'd prefer to have a formal proposal, a problem statement that details the current state of things and why they aren't ideal, a survey of already existing solutions or ideas (Alexey's idea and proof of concept, as you said yourself, seems strongly related, and there are years of discussion about it in the mailing list, should we pick that up?), and then a high level overview of what your approach is like. You seem to aim for a bigger fish then what is found in here, so I suspect it would be great to be on the same page.

It is a very tiny change for my later patches, which will consist of large changes. None of them need to be discussed because they are my own research project.

Despite your comment to @martong, I, and seemingly many other regular maintainers of the analyzer I talked to seem to be confused about the above mentioned points. I added some folks as reviewers (as well as myself), I have strong confidence in their expertise of the AST matcher library (on which this solution is based) and the static analyzer itself.

Of course this solution is somewhat shameless. I do not like it as well. Given that people seems to refuse this poor man's solution, I will try better. Other than that I am all ears whether we want to pollute the LibASTMatchers with template magic. My goal is not simplicity, but a common-sense solution.

Ignotus updated this revision to Diff 256885.Apr 12 2020, 3:27 PM
Ignotus retitled this revision from [analyzer] SADMatchers: Static Analyzer Data matchers to [analyzer] ASTMatchers: Extend with Static Analyzer data types.
Ignotus edited the summary of this revision. (Show Details)

This is a minimalist solution. I hope that it will show why it would be near impossible to implement/maintain over-stressed and templated inheritance.

Pros:

  • No template magic required for the implementation.
  • We can traverse into ASTMatchers so that the expressibility is increased.
  • With descending into ASTMatchers we could match parts of the CFG/Environment/ProgramPoint (power user features).

Cons:

  • The placing of the implementation is ridiculous.
  • We put core logic and maintenance outside of the Static Analyzer's scope.
  • Traversing SVals became difficult because we need to forward declare CompoundVal and LazyCompoundVal (not sure how).

Hi, 19n07u5!
It's neat that my research gained some value for community members. Unfortunately, my initial work had a number of design problems that prevented me from upstreaming it. And the first one was module dependencies. The whole dependency picture for graph matchers looks like this:

  1. ASTTypeTraits should depend on Static Analyzer (not an option!) because graph types are trying to mimic common DynTypedNodes. This was done to make matchers for graph and AST nodes operate seamlessly but what was good for PoC was not good for upstreaming.
  2. ASTMatchers should depend on Static Analyzer (requires massive refactoring). One of the problems here is that Clang can be built without CSA so ASTMatcherInternals.h have to become full of preprocessor guards.
  3. ASTMatchers should depend on GraphMatchers and vice versa. While the backward dependency is fine, the direct is required to make matchers like functionDecl(hasPath(....)) work. This is not a problem for this patch, however.

AST should not depend on StaticAnalyzer and should not use its data structures. I was thinking about some mechanism that allows extending ASTNodeKind and DynTypedNodes by adding types that can be located in another modules (by registering them, for example). Unfortunately, I had to abandon this work due to lack of time.
As a result of these issues, I haven't started an RFC for adding the graph node matchers in the mailing list. This discussion is strongly required for such a change. I think the community will suggest some design improvements if we need such a feature. Also feel free to contact me in case of any question regarding this work.

whisperity added inline comments.Apr 21 2020, 8:15 AM
clang/CMakeLists.txt
469–472

Why is this needed? Given that CLANG_ENABLE_STATIC_ANALYZER is a toggleable option by the user, the definition should be present in the config header <build>/tools/clang/include/clang/Config/config.h:

/* Enable each functionality of modules */
#define CLANG_ENABLE_ARCMT 0
#define CLANG_ENABLE_OBJC_REWRITER 0
#define CLANG_ENABLE_STATIC_ANALYZER 0

(this is from a build on my machine that has CSA turned off.)

One way or another, this config header should find its way to the TUs, and then the defined-ness of the macro should be available without having to manually add the preprocessor flag to the compiler calls.

(Another options related to ARCMT that are in this same CMakeLists file are also present in the confiig header)

clang/include/clang/AST/RecursiveASTVisitor.h
159–160

This here feels like a massive ODR issue just waiting for biting us in the back. We have to think about potential downstream projects, and introducing this new inheritance seems very, very intrusive.