This is part of the implementation of the dataflow analysis framework.
See "[RFC] A dataflow analysis framework for Clang AST" on cfe-dev.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
LGTM but deferring approval to @xazax.hun .
clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp | ||
---|---|---|
31 | Add assert(BlockStates.empty()); ? | |
59 | Please prefer StringRef to const char * if possible. |
This patch looks good to me, most of my comments are things to consider in follow-up patches.
clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp | ||
---|---|---|
32 | Alternatively, if we have a bottom, forall e in lattice join(e, bottom) == e, so we do not really need optionals and can express nodes that do not contribute to the state using the bottom value. (Or using top, depending on the orientation for our lattices, unfortunately the literature is using both orientations which is really confusing.) I'm fine with the current approach, but I'd love to see a function level comment about nullopts representing nodes that were not yet evaluated. | |
35 | I did not catch this earlier, but I wonder if we should pass the block in to typeErasedInitialElement. There are some analysis where the initial element might be different for certain nodes. E.g. here is the algorithms for computing dominators from wikipedia: // dominator of the start node is the start itself Dom(n0) = {n0} // for all other nodes, set all nodes as the dominators for each n in N - {n0} Dom(n) = N; // iteratively eliminate nodes that are not dominators while changes in any Dom(n) for each n in N - {n0}: Dom(n) = {n} union with intersection over Dom(p) for all p in pred(n) Here the initial analysis state for the entry node differs from the initial state for the rest of the nodes. | |
69 | I think this is fine for now, but in general, CFGStmt is probably not the only kind of CFGElement that we want to evaluate. E.g., CFGImplicitDtor or CFGLifetimeEnds might also be interesting in the future if we want to find certain problems, like dereferencing an invalid pointer. I'd love to see a FIXME. | |
104 | I'd strongly suggest setting up some statistics in the future: https://llvm.org/docs/ProgrammersManual.html#the-statistic-class-stats-option Having some counters, like how many function timed out, what is the average iteration count and so on could be very handy to monitor the performance of the analysis before and after some changes (both for client analyses or for the framework itself). | |
134 | If a block still has None as its state at this point, it is unreachable in the CFG. One option is to ignore them, as we do at the moment and it is completely fine and should not change this patch. But for some analysis in the future, we might want to check dead code as well (it might be dead for the current platform due to some preprocessor defines but alive for some other). E.g., it would also be nice to diagnose a null dereference in a block that was not reached. | |
clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp | ||
38–41 | I wonder if the dataflow analysis framework should provide a factory that that returns a CFG::BuildOptions which is a good default for most clients. Can be in a follow-up patch, or completely ignored. Or maybe even a convenience function running the analysis on a function definition without the user having to know about the CFG. |
clang/unittests/Analysis/FlowSensitive/CMakeLists.txt | ||
---|---|---|
2 | Why create a new sub directory? From what I've seen elsewhere, it seems uncommon. I'm fine either way, but want to be sure we have a good reason for differing. |
Thanks everyone!
clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp | ||
---|---|---|
32 | Right. Perhaps we can use Analysis.typeErasedInitialElement() and InitEnv to express states of blocks that were not evaluated? In practice, these often coincide with bottom. In that case we're loosing the ability to understand which blocks have been evaluated. An alternative, as you mentioned, would be to require an additional bottom element to be provided by the analysis. I added a function level comment for now and will consider this a bit more. | |
35 | Good point, though, I'm not sure if this algorithm is the best choice for such an analysis. It seems that in that case we don't need to evaluate the statements in the basic blocks, right? Perhaps we can generalize the algorithm or offer a separate one for more efficient implementation of such analyses. I added a FIXME and will consider this a bit more. | |
134 | I added a FIXME to consider evaluating unreachable blocks. However, I think it's preferable to run the analysis with the same configuration that will be used to compile the code. Do you think this isn't always possible/practical? | |
clang/unittests/Analysis/FlowSensitive/CMakeLists.txt | ||
2 | I created the directory to match the structure in llvm-project/clang/include/clang/Analysis/FlowSensitive and llvm-project/clang/lib/Analysis/FlowSensitive. In general, I think it'd be easier to find related code if it's structured similarly. If this doesn't follow the established practices for the project I'd be happy to change it. | |
clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp | ||
31 | Done, and also changed the other ASSERT_THAT to assert. | |
38–41 | I added a FIXME. A utility that runs an analysis on a function definition would mean that the CFG will need to be built each time we want to run an analysis which could be wasteful if we want to run several analyses on the same CFG. Perhaps we can consider providing a function that builds the CFG using default options and returns it, possibly wrapped in another type. |
Thanks!
clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp | ||
---|---|---|
35 | Dominators is one example, but there might be other analyses where the initial state differs :) |
clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp | ||
---|---|---|
31 | Marked it as static. |
clang/unittests/Analysis/FlowSensitive/CMakeLists.txt | ||
---|---|---|
2 | I'm also confused by this. Would this work: % git diff diff --git a/clang/lib/Analysis/CMakeLists.txt b/clang/lib/Analysis/CMakeLists.txt index 16e3f474060c..eec02b31a9a8 100644 --- a/clang/lib/Analysis/CMakeLists.txt +++ b/clang/lib/Analysis/CMakeLists.txt @@ -18,6 +18,7 @@ add_clang_library(clangAnalysis CodeInjector.cpp Dominators.cpp ExprMutationAnalyzer.cpp + FlowSensitive/TypeErasedDataflowAnalysis.cpp IssueHash.cpp LiveVariables.cpp MacroExpansionContext.cpp @@ -44,4 +45,3 @@ add_clang_library(clangAnalysis ) add_subdirectory(plugins) -add_subdirectory(FlowSensitive) diff --git a/clang/unittests/Analysis/CMakeLists.txt b/clang/unittests/Analysis/CMakeLists.txt index 7e2a00b96057..f1cb402268c0 100644 --- a/clang/unittests/Analysis/CMakeLists.txt +++ b/clang/unittests/Analysis/CMakeLists.txt @@ -8,6 +8,7 @@ add_clang_unittest(ClangAnalysisTests CFGTest.cpp CloneDetectionTest.cpp ExprMutationAnalyzerTest.cpp + FlowSensitive/TypeErasedDataflowAnalysisTest.cpp MacroExpansionContextTest.cpp ) Or do the flow-sensitive analyses have different library dependencies and there are places that want to depend on analysis without the flow sensitive bits or the other way round? (Why are these files in a subdirectory at all?) |
should this be declared in the header? If not, please mark static.