This is an archive of the discontinued LLVM Phabricator instance.

[clang][dataflow] Parameterize analysis by explicit map of analyzable functions.
Needs ReviewPublic

Authored by ymandel on Aug 5 2022, 12:14 PM.

Details

Summary

This patch evolves the cache of ControlFlowContexts into a parameter that the
user can supply. Specifically, the system now limits inline analysis of function
calls specifically to those functions contained in a user-supplied map from
"fully qualified name" to ControlFlowContext.

To avoid the cost of generating the fully qualified function name for every
CallExpr, we cache the results of the map lookup in an intermediary map over
FunctionDecl*s.

Additionally, this patch adds a utility function to construct such a map from an ASTUnit. This allows either admitting all the functions in the main AST (for example, a test code snippet) or from a separate AST. This latter functionality is a step towards supporting models from independent ASTs.

Part of issue #56879

Diff Detail

Event Timeline

ymandel created this revision.Aug 5 2022, 12:14 PM
Herald added a project: Restricted Project. · View Herald Transcript
ymandel requested review of this revision.Aug 5 2022, 12:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 5 2022, 12:14 PM
ymandel updated this revision to Diff 450353.Aug 5 2022, 12:22 PM

add comment

ymandel edited the summary of this revision. (Show Details)Aug 8 2022, 5:45 AM
sgatev added inline comments.Aug 8 2022, 7:55 AM
clang/include/clang/Analysis/FlowSensitive/ControlFlowContext.h
83

#include "llvm/ADT/StringMap.h"

clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
27

This should be StringMap.

60

How about InlineableFunctions?

60
clang/lib/Analysis/FlowSensitive/Transfer.cpp
549–552

How is that different from F? Why not let Environment::pushCall get this from the CallExpr argument?

clang/unittests/Analysis/FlowSensitive/TestingSupport.h
74
ymandel updated this revision to Diff 450823.Aug 8 2022, 8:36 AM

Address reviewer comments.

The concept of fully qualified name is somewhat ambiguous for me. Do we expect the user to specify inline namespaces as well? I'd love to see some test cases and comments that clarify this.

clang/lib/Analysis/FlowSensitive/ControlFlowContext.cpp
96

Do we exclude non-toplevel declarations on purpuse? Or would this work for methods of inline classes, methods of classes defined within a function?

clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
341

Out of curiosity, I'd expect getCanonicalDecl to always succeed. Do you know cases where it would not?

ymandel marked an inline comment as done.Aug 8 2022, 9:12 AM

The concept of fully qualified name is somewhat ambiguous for me. Do we expect the user to specify inline namespaces as well? I'd love to see some test cases and comments that clarify this.

Sure. This is probably worth some discussion. Fully qualified names, however we define them, will not be enough, since they don't cover overload sets. So, regardless, we need some internally-consistent (in clang) mechanism of uniquely identifying function declarations. Since we're also supporting methods and constructors, that implies IDs for classes as well.

I'd like some mechanism that matches how identifiers are used. So, for example, inline namespaces should *not* be necessary, since they are an implementation detail from this perspective.

clang/lib/Analysis/FlowSensitive/ControlFlowContext.cpp
96

Do we exclude non-toplevel declarations on purpuse? Or would this work for methods of inline classes, methods of classes defined within a function?

I think that for the current use case -- models of library types and their methods/functions -- we don't have a good usecase for this. But, I can see this becoming an issue if we want to expand to inlining other declarations. So, I'm inclined to hold off on this for the time being, since that's a larger design discussion.

Yet, I also think this should be generalized to take any decl and extract the functions/methods. I limited to ASTUnit for convenience, since that was the immediate need. I'm happy to either:

  1. Add a FIXME, and/or,
  2. Split this function into two: one that takes two decl iterators (begin, end) and does this work here and another which is just a convenience function for ASTUnit to apply the above to the top-level decls.

WDYT?

clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
341

Hah. No, I don't but I also didn't find any mention in the headers guaranteeing otherwise, so I added this. I'm fine removing but I feel a little uncomfortable with the fact that it's not documented.

Sure. This is probably worth some discussion. Fully qualified names, however we define them, will not be enough, since they don't cover overload sets.

I see. This is not a unique problem. I think there were multiple discussions about API Notes and those need to solve the same problem. @gribozavr2 probably has more context on the current status of API Notes. An alternative to fully qualified names is Clang's USR that is often used for cross-referencing functions across translation units. Less user friendly, but will support overloads.

I'd like some mechanism that matches how identifiers are used. So, for example, inline namespaces should *not* be necessary, since they are an implementation detail from this perspective.

A similar matching is already implemented for the Clang Static Analyzer. See https://clang.llvm.org/doxygen/classclang_1_1ento_1_1CallDescription.html and https://clang.llvm.org/doxygen/classclang_1_1ento_1_1CallDescriptionMap.html

One example use is in the CStringChecker: https://github.com/llvm/llvm-project/blob/main/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp#L136

clang/lib/Analysis/FlowSensitive/ControlFlowContext.cpp
96

I am fine with the current behavior, but I think the docstring "Builds a map of all declared functions in the given AST" is misleading in this case. Specifying in the docstring that this function will only map the top-level functions and methods of top-level classes sounds good to me.

clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
341

I see other places where we don't check for nullness, e.g.: https://github.com/llvm/llvm-project/blob/bb297024fad2f6c3ccaaa6a5c3a270f73f15f3ac/clang-tools-extra/clang-tidy/misc/UnusedParametersCheck.cpp#L83

I think the CanonicalDecl is defined as the first declaration for redeclarable entities. I think getting the first element of a non-empty list should always succeed. But I definitely agree, this fact should be documented, or even better, this should probably return a reference.

ymandel updated this revision to Diff 451144.Aug 9 2022, 7:29 AM
ymandel marked an inline comment as done.

respond to comments.

ymandel updated this revision to Diff 451145.Aug 9 2022, 7:34 AM
ymandel marked 9 inline comments as done.

fix typo

Sure. This is probably worth some discussion. Fully qualified names, however we define them, will not be enough, since they don't cover overload sets.

I see. This is not a unique problem. I think there were multiple discussions about API Notes and those need to solve the same problem. @gribozavr2 probably has more context on the current status of API Notes. An alternative to fully qualified names is Clang's USR that is often used for cross-referencing functions across translation units. Less user friendly, but will support overloads.

I'd like some mechanism that matches how identifiers are used. So, for example, inline namespaces should *not* be necessary, since they are an implementation detail from this perspective.

A similar matching is already implemented for the Clang Static Analyzer. See https://clang.llvm.org/doxygen/classclang_1_1ento_1_1CallDescription.html and https://clang.llvm.org/doxygen/classclang_1_1ento_1_1CallDescriptionMap.html

One example use is in the CStringChecker: https://github.com/llvm/llvm-project/blob/main/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp#L136

Thanks. That looks good, but I'm concerned that it only counts the arguments and doesn't look at their types. I'd imagine this will be a limitation down the line when we want to deal with overload sets w/ the same number of arguments, but different types.

Aside: why the const char * interface? Do you think owners would be open to a llvm::StringRef overload for the constructor?

clang/lib/Analysis/FlowSensitive/ControlFlowContext.cpp
96

Thanks, done.

clang/lib/Analysis/FlowSensitive/Transfer.cpp
549–552

Added comment to explain.

Thanks. That looks good, but I'm concerned that it only counts the arguments and doesn't look at their types. I'd imagine this will be a limitation down the line when we want to deal with overload sets w/ the same number of arguments, but different types.

Yeah, it is not a fully baked solution at this point, but it does implement some of the features that you plan to add (like skipping inline namespaces), and some more (argument count, checking if a function is from a system header).

Aside: why the const char * interface? Do you think owners would be open to a llvm::StringRef overload for the constructor?

I am sure that the analyzer community is open to any improvements. The main reason I'd be glad if that facility could be shared across the two static analysis solution because improvements from one community could be benefited by the other, also the code would be more similar which could be great for cross pollination of ideas.

If you think it is feasible to reuse some of those facilities for your purposes, I am happy to review all of those patches. If it is not a good fit for some reason, I am ok with having a new custom solution here.

Thanks. That looks good, but I'm concerned that it only counts the arguments and doesn't look at their types. I'd imagine this will be a limitation down the line when we want to deal with overload sets w/ the same number of arguments, but different types.

Yeah, it is not a fully baked solution at this point, but it does implement some of the features that you plan to add (like skipping inline namespaces), and some more (argument count, checking if a function is from a system header).

Aside: why the const char * interface? Do you think owners would be open to a llvm::StringRef overload for the constructor?

I am sure that the analyzer community is open to any improvements. The main reason I'd be glad if that facility could be shared across the two static analysis solution because improvements from one community could be benefited by the other, also the code would be more similar which could be great for cross pollination of ideas.

If you think it is feasible to reuse some of those facilities for your purposes, I am happy to review all of those patches. If it is not a good fit for some reason, I am ok with having a new custom solution here.

I think we should try to reuse for the reasons you mention. We can evolve it with more features as needed. I'm kind of allergic to const char * so I propose that I first send a patch for that and can then follow up here.

I think we should try to reuse for the reasons you mention. We can evolve it with more features as needed. I'm kind of allergic to const char * so I propose that I first send a patch for that and can then follow up here.

This sounds wonderful, thanks!

samestep added inline comments.Aug 9 2022, 12:42 PM
clang/include/clang/Analysis/FlowSensitive/ControlFlowContext.h
70–73