This is an archive of the discontinued LLVM Phabricator instance.

[clang][dataflow] Implement functionality for flow condition variable substitution.
ClosedPublic

Authored by wyt on Jun 22 2022, 10:45 AM.

Details

Summary

This patch introduces buildAndSubstituteFlowCondition - given a flow condition token, this function returns the expression of constraints defining the flow condition, with values substituted where specified.

As an example:
Say we have tokens FC1, FC2, FC3:

FlowConditionConstraints: {
 FC1: C1,
 FC2: C2,
 FC3: (FC1 v FC2) ^ C3,
}

buildAndSubstituteFlowCondition(FC3, /*Substitutions:*/{{C1 -> C1'}})
returns a value corresponding to (C1' v C2) ^ C3.

Note:
This function returns the flow condition expressed directly as its constraints, which differs to how we currently represent the flow condition as a token bound to a set of constraints and dependencies. Making the representation consistent may be an option to consider in the future.

Depends On D128357

Diff Detail

Event Timeline

wyt created this revision.Jun 22 2022, 10:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 22 2022, 10:45 AM
wyt requested review of this revision.Jun 22 2022, 10:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 22 2022, 10:45 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
wyt edited the summary of this revision. (Show Details)Jun 22 2022, 11:10 AM
gribozavr2 added inline comments.Jun 22 2022, 11:53 AM
clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
174

Could you refactor it into a free function? It does not use 'this', seems like.

236–240

It is better to put special cases (early returns) first.

clang/unittests/Analysis/FlowSensitive/DataflowAnalysisContextTest.cpp
235

Could you do the following: build explicitly a BoolValue that represents the expected result formula, and then ask the SAT solver to prove equivalence between the two? That would be a stronger test than merely checking one implication.

Same in the test below.

xazax.hun added inline comments.Jun 22 2022, 4:42 PM
clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
209

Could you elaborate on why do we need this? Why do we care about how flow condition constraints are represented? We should use the solver to ask questions and as far as I understand the solver should work with both representations. Also this function feels like doing two things, building the flow condition from the tokens and doing substitutions. Any reason why those two are not separate functions? Is this for performance reasons?

wyt updated this revision to Diff 439719.Jun 24 2022, 5:54 AM
wyt marked an inline comment as done.

Address comments - improve test clarity by comparing substituted flow condition with an expected value using equivalentBoolValues utility.

wyt marked 3 inline comments as done.Jun 24 2022, 6:32 AM

@xazax.hun

Could you elaborate on why do we need this?

We are currently working on a pointer nullability analysis here: https://github.com/google/crubit/tree/main/nullability_verification.
One of the things we are trying to do is to check if the flow condition is independent of a variable, done by checking if the formula regardless if the variable is substituted to true or false - requiring buildAndSubstituteFlowCondition.

Why do we care about how flow condition constraints are represented? We should use the solver to ask questions and as far as I understand the solver should work with both representations.

The first approach of representing flow conditions as an atomic boolean token, places the responsibility on the user to handle adding the constraints of the flow condition when checking with the solver.
The second approach of representing flow conditions as a direct conjunction of its constraints, has performance implications as any boolean values with the flow condition appearing in it may grow in size, and the entire conjunction of the flow condition has to be added repeatedly if the flow condition appears in multiple places.

Also this function feels like doing two things, building the flow condition from the tokens and doing substitutions. Any reason why those two are not separate functions? Is this for performance reasons?

You'd build the flow condition by passing in an empty substitutions set. Also, just building the flow condition does not see any use currently.

clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
188

@gribozavr2

Could you refactor it into a free function? It does not use 'this', seems like.

We store and retrieve BoolVals in DataflowAnalysisContext. Example as highlighted.

xazax.hun accepted this revision.Jun 24 2022, 8:49 AM
xazax.hun added inline comments.
clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
219

I think DenseMap has a constructor that accepts two iterators.

235–238

Should we move this check to the beginning to save work?

This revision is now accepted and ready to land.Jun 24 2022, 8:49 AM
gribozavr2 accepted this revision.Jun 24 2022, 3:22 PM
gribozavr2 added inline comments.
clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
209

Could you copy the example from the commit description into the doc comment?

clang/unittests/Analysis/FlowSensitive/DataflowAnalysisContextTest.cpp
319

Could you also add a test for negations?

wyt updated this revision to Diff 440119.Jun 27 2022, 1:11 AM
wyt marked an inline comment as done.

Address comments - add example to buildAndSubstituteFlowCondition doc comment, add tests for atomic and negated flow condition.

wyt marked an inline comment as done.Jun 27 2022, 1:13 AM
gribozavr2 accepted this revision.Jun 27 2022, 2:17 AM