This patch extends the join logic for environments to explicitly handle
boolean values. It creates the disjunction of both source values, guarded by the
respective flow conditions from each input environment. This change allows the
framework to reason about boolean correlations across multiple branches (and
subsequent joins).
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp | ||
---|---|---|
79 | Hmm, interesting. So it would be interpreted as: (FlowConditionPred1 || FlowConditionPred2) && (FormulaAtPred1 || FormulaAtPred2). My main concern is whether we would end up seeing an exponential explosion in the size of these formulas in the number of branches following each other in a sequence. |
clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp | ||
---|---|---|
79 | Yeap, I agree this is suboptimal and I believe I'm the one to blame for introducing it downstream. I wonder if we can represent the flow condition of each environment using a bool atom and have a mapping of bi-conditionals between flow condition atoms and flow condition constraints. Something like: FC1 <=> C1 ^ C2 FC2 <=> C2 ^ C3 ^ C4 FC3 <=> (FC1 v FC2) ^ C5 ... We can use that to simplify the formulas here and in joinConstraints. The mapping can be stored in DataflowAnalysisContext. We can track dependencies between flow conditions (e.g. above FC3 depends on FC1 and FC2) and modify flowConditionImplies to construct a formula that includes the bi-conditionals for all flow condition atoms in the transitive set before invoking the solver. I suggest putting the optimization in its own patch. I'd love to look into it right after this patch is submitted if both of you think it makes sense on a high level. |
clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp | ||
---|---|---|
79 | This sounds good to me. That said, I'm not sure how often we'd expect this to be an issue in practice, since, IIUC, this specialized merge only occurs when the value is handled differently in the two branches. So, a series of branches alone won't trigger the exponential behavior. |
clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp | ||
---|---|---|
79 | Fair point. It might never cause a problem in practice. Since we already have one proposal how to try to fix this if a problem ever surfaces, I'd love to have a comment about this in the code. But I think this is something that probably does not need to be addressed in the near future. |
clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp | ||
---|---|---|
79 | Good idea -- I reformatted essentially sgatev's response above as a comment in the code. |
Hmm, interesting.
I think we view every boolean formula at a certain program point implicitly as FlowConditionAtThatPoint && Formula. And the flow condition at a program point should already be a disjunction of its predecessors.
So it would be interpreted as: (FlowConditionPred1 || FlowConditionPred2) && (FormulaAtPred1 || FormulaAtPred2).
While this is great, this is not the strongest condition we could derive.
(FlowConditionPred1 && FormulaAtPred1) || (FormulaAtPred2 && FlowConditionPred2) created by this code snippet is stronger which is great.
My main concern is whether we would end up seeing an exponential explosion in the size of these formulas in the number of branches following each other in a sequence.