This is an archive of the discontinued LLVM Phabricator instance.

[clang][dataflow] Optimize flow condition representation
ClosedPublic

Authored by sgatev on Apr 25 2022, 8:34 AM.

Details

Summary

Enable efficient implementation of context-aware joining of distinct
boolean values. It can be used to join distinct boolean values while
preserving flow condition information.

Flow conditions are represented as Token <=> Clause iff formulas. To
perform context-aware joining, one can simply add the tokens of flow
conditions to the formula when joining distinct boolean values, e.g:
makeOr(makeAnd(FC1, Val1), makeAnd(FC2, Val2)). This significantly
simplifies the implementation of Environment::join.

This patch removes the DataflowAnalysisContext::getSolver method.
The DataflowAnalysisContext::flowConditionImplies method should be
used instead.

Diff Detail

Event Timeline

sgatev created this revision.Apr 25 2022, 8:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 25 2022, 8:34 AM
sgatev requested review of this revision.Apr 25 2022, 8:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 25 2022, 8:34 AM
sgatev updated this revision to Diff 424930.Apr 25 2022, 9:08 AM

Mark methods as const.

Nice! Did you do some measurements? Does this improve the performance or decrease the memory consumption?

clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
157

Could this be const?

221

I think an example would be nice what "depending on" a flow condition means. A simplistic example with a nested if would probably be the easiest to understand.

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
194–195

Do we ever expect to call these functions independently of each other? If no, maybe addFlowConditionDependency should add the constraint as well.

sgatev updated this revision to Diff 425143.Apr 26 2022, 1:04 AM
sgatev marked 2 inline comments as done.

Address comments.

sgatev marked an inline comment as done.Apr 26 2022, 1:18 AM

Nice! Did you do some measurements? Does this improve the performance or decrease the memory consumption?

I didn't do any measurements :( I implemented this as I'd like to add context-aware join (what we do for booleans, preserving flow condition context) to UncheckedOptionalAccessModel and don't really want to make the current approach a pattern. I suspect this could also be useful when modeling other language constructs.

clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
157

I think no because we need to pass it to getOrCreateDisjunctionValue and getOrCreateNegationValue, and their sub-values are not const currently.

221

I didn't add an example, but I rephrased this section after adding the forkFlowCondition and joinFlowConditions methods. Let me know what you think.

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
194–195

I added forkFlowCondition and joinFlowConditions methods so dependencies are an implementation detail now.

ymandel accepted this revision.Apr 26 2022, 7:20 AM

Nice work. On the surface, this adds complexity to the system and so should be justified in terms of performance improvements. However, having read through the patch, I think it overall reduces the complexity, since both environment join and boolean-value join have been radically simplified. So, I'm fine with the patch as is, even without performance measurement. That said, I'm not against such measurement -- just saying it's not blocking. :)

If you agree, you may want to reword the description of the patch to focuse on the design improvements rather than (exclusively) the optimization aspect.

clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
49

nit: Maybe mention this removal in the patch description.

152

I think it would be helpful to expand here with a brief explanation of the role of flow condition tokens.

155
212

Is the idea that this is an encoding of the <=>? If so, maybe use "encoded" in the internal description, as in "Internally, a flow condition clause is encoded as ..."?

Or, a bigger change:

Flow conditions are tracked symbolically: each unique flow condition is associated with a fresh symbolic variable, bound to the clause that defines the flow condition. Conceptually, each binding corresponds to an "iff" over the form FC <=> (C1 ^ C2 ^ ...), where FC is a flow condition token (an atomic boolean) and Cis are the set of constraints that define the flow condition. However, we do not record the formula directly as an iff. Instead, internally, a flow condition clause is ...

This revision is now accepted and ready to land.Apr 26 2022, 7:20 AM
sgatev updated this revision to Diff 425221.Apr 26 2022, 8:03 AM
sgatev marked 4 inline comments as done.

Address comments.

sgatev edited the summary of this revision. (Show Details)Apr 26 2022, 8:05 AM
sgatev marked an inline comment as done.

Nice work. On the surface, this adds complexity to the system and so should be justified in terms of performance improvements. However, having read through the patch, I think it overall reduces the complexity, since both environment join and boolean-value join have been radically simplified. So, I'm fine with the patch as is, even without performance measurement. That said, I'm not against such measurement -- just saying it's not blocking. :)

If you agree, you may want to reword the description of the patch to focuse on the design improvements rather than (exclusively) the optimization aspect.

Agreed. I updated the description.

xazax.hun accepted this revision.Apr 29 2022, 8:09 AM
This revision was automatically updated to reflect the committed changes.