This is an archive of the discontinued LLVM Phabricator instance.

[clang][dataflow] Implement a basic algorithm for dataflow analysis
ClosedPublic

Authored by sgatev on Dec 7 2021, 4:18 AM.

Details

Summary

This is part of the implementation of the dataflow analysis framework.
See "[RFC] A dataflow analysis framework for Clang AST" on cfe-dev.

Diff Detail

Event Timeline

sgatev created this revision.Dec 7 2021, 4:18 AM
sgatev requested review of this revision.Dec 7 2021, 4:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 7 2021, 4:18 AM
sgatev updated this revision to Diff 392346.Dec 7 2021, 4:28 AM

Minor changes to parameter names.

LGTM but deferring approval to @xazax.hun .

clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
31

Add assert(BlockStates.empty()); ?

59

Please prefer StringRef to const char * if possible.

xazax.hun accepted this revision.Dec 7 2021, 9:32 AM

This patch looks good to me, most of my comments are things to consider in follow-up patches.

clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
32

Alternatively, if we have a bottom, forall e in lattice join(e, bottom) == e, so we do not really need optionals and can express nodes that do not contribute to the state using the bottom value. (Or using top, depending on the orientation for our lattices, unfortunately the literature is using both orientations which is really confusing.)

I'm fine with the current approach, but I'd love to see a function level comment about nullopts representing nodes that were not yet evaluated.

35

I did not catch this earlier, but I wonder if we should pass the block in to typeErasedInitialElement. There are some analysis where the initial element might be different for certain nodes. E.g. here is the algorithms for computing dominators from wikipedia:

// dominator of the start node is the start itself
Dom(n0) = {n0}
// for all other nodes, set all nodes as the dominators
for each n in N - {n0}
    Dom(n) = N;
// iteratively eliminate nodes that are not dominators
while changes in any Dom(n)
    for each n in N - {n0}:
        Dom(n) = {n} union with intersection over Dom(p) for all p in pred(n)

Here the initial analysis state for the entry node differs from the initial state for the rest of the nodes.

69

I think this is fine for now, but in general, CFGStmt is probably not the only kind of CFGElement that we want to evaluate.

E.g., CFGImplicitDtor or CFGLifetimeEnds might also be interesting in the future if we want to find certain problems, like dereferencing an invalid pointer. I'd love to see a FIXME.

118

I'd strongly suggest setting up some statistics in the future: https://llvm.org/docs/ProgrammersManual.html#the-statistic-class-stats-option

Having some counters, like how many function timed out, what is the average iteration count and so on could be very handy to monitor the performance of the analysis before and after some changes (both for client analyses or for the framework itself).

148

If a block still has None as its state at this point, it is unreachable in the CFG. One option is to ignore them, as we do at the moment and it is completely fine and should not change this patch. But for some analysis in the future, we might want to check dead code as well (it might be dead for the current platform due to some preprocessor defines but alive for some other). E.g., it would also be nice to diagnose a null dereference in a block that was not reached.

clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
38–41

I wonder if the dataflow analysis framework should provide a factory that that returns a CFG::BuildOptions which is a good default for most clients. Can be in a follow-up patch, or completely ignored. Or maybe even a convenience function running the analysis on a function definition without the user having to know about the CFG.

This revision is now accepted and ready to land.Dec 7 2021, 9:32 AM
ymandel added inline comments.Dec 7 2021, 12:24 PM
clang/unittests/Analysis/FlowSensitive/CMakeLists.txt
2

Why create a new sub directory? From what I've seen elsewhere, it seems uncommon. I'm fine either way, but want to be sure we have a good reason for differing.

sgatev updated this revision to Diff 392772.Dec 8 2021, 7:35 AM
sgatev marked 8 inline comments as done.

Address reviewers' comments.

sgatev added a comment.Dec 8 2021, 7:37 AM

Thanks everyone!

clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
32

Right. Perhaps we can use Analysis.typeErasedInitialElement() and InitEnv to express states of blocks that were not evaluated? In practice, these often coincide with bottom. In that case we're loosing the ability to understand which blocks have been evaluated. An alternative, as you mentioned, would be to require an additional bottom element to be provided by the analysis. I added a function level comment for now and will consider this a bit more.

35

Good point, though, I'm not sure if this algorithm is the best choice for such an analysis. It seems that in that case we don't need to evaluate the statements in the basic blocks, right? Perhaps we can generalize the algorithm or offer a separate one for more efficient implementation of such analyses. I added a FIXME and will consider this a bit more.

148

I added a FIXME to consider evaluating unreachable blocks. However, I think it's preferable to run the analysis with the same configuration that will be used to compile the code. Do you think this isn't always possible/practical?

clang/unittests/Analysis/FlowSensitive/CMakeLists.txt
2

I created the directory to match the structure in llvm-project/clang/include/clang/Analysis/FlowSensitive and llvm-project/clang/lib/Analysis/FlowSensitive. In general, I think it'd be easier to find related code if it's structured similarly. If this doesn't follow the established practices for the project I'd be happy to change it.

clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
31

Done, and also changed the other ASSERT_THAT to assert.

38–41

I added a FIXME. A utility that runs an analysis on a function definition would mean that the CFG will need to be built each time we want to run an analysis which could be wasteful if we want to run several analyses on the same CFG. Perhaps we can consider providing a function that builds the CFG using default options and returns it, possibly wrapped in another type.

xazax.hun accepted this revision.Dec 8 2021, 9:18 AM

Thanks!

clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
35

Dominators is one example, but there might be other analyses where the initial state differs :)
Since this change is really easy to make, I'm ok deferring this until we see the actual need.

ymandel added inline comments.Dec 8 2021, 2:47 PM
clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
31

should this be declared in the header? If not, please mark static.

62

Please declare this function in the header.

sgatev updated this revision to Diff 393156.Dec 9 2021, 7:09 AM

Declare transferBlock in the header.

sgatev marked 4 inline comments as done.Dec 9 2021, 7:10 AM
sgatev added inline comments.
clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
31

Marked it as static.

gribozavr2 accepted this revision.Dec 10 2021, 1:39 AM
thakis added a subscriber: thakis.Dec 10 2021, 4:45 AM
thakis added inline comments.
clang/unittests/Analysis/FlowSensitive/CMakeLists.txt
2

I'm also confused by this.

Would this work:

% git diff
diff --git a/clang/lib/Analysis/CMakeLists.txt b/clang/lib/Analysis/CMakeLists.txt
index 16e3f474060c..eec02b31a9a8 100644
--- a/clang/lib/Analysis/CMakeLists.txt
+++ b/clang/lib/Analysis/CMakeLists.txt
@@ -18,6 +18,7 @@ add_clang_library(clangAnalysis
   CodeInjector.cpp
   Dominators.cpp
   ExprMutationAnalyzer.cpp
+  FlowSensitive/TypeErasedDataflowAnalysis.cpp
   IssueHash.cpp
   LiveVariables.cpp
   MacroExpansionContext.cpp
@@ -44,4 +45,3 @@ add_clang_library(clangAnalysis
   )

 add_subdirectory(plugins)
-add_subdirectory(FlowSensitive)
diff --git a/clang/unittests/Analysis/CMakeLists.txt b/clang/unittests/Analysis/CMakeLists.txt
index 7e2a00b96057..f1cb402268c0 100644
--- a/clang/unittests/Analysis/CMakeLists.txt
+++ b/clang/unittests/Analysis/CMakeLists.txt
@@ -8,6 +8,7 @@ add_clang_unittest(ClangAnalysisTests
   CFGTest.cpp
   CloneDetectionTest.cpp
   ExprMutationAnalyzerTest.cpp
+  FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
   MacroExpansionContextTest.cpp
   )

Or do the flow-sensitive analyses have different library dependencies and there are places that want to depend on analysis without the flow sensitive bits or the other way round?

(Why are these files in a subdirectory at all?)