This is an archive of the discontinued LLVM Phabricator instance.

[clang][dataflow] Intersect ExprToLoc when joining environments
ClosedPublic

Authored by sgatev on Jan 20 2022, 1:33 AM.

Details

Summary

This is part of the implementation of the dataflow analysis framework.
See "[RFC] A dataflow analysis framework for Clang AST" on cfe-dev.

Diff Detail

Event Timeline

sgatev created this revision.Jan 20 2022, 1:33 AM
sgatev requested review of this revision.Jan 20 2022, 1:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 20 2022, 1:33 AM
xazax.hun accepted this revision.Jan 20 2022, 2:40 AM
This revision is now accepted and ready to land.Jan 20 2022, 2:40 AM
xazax.hun added inline comments.Jan 20 2022, 2:42 AM
clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
49

Are we sure that the memory addresses of CFGBlocks are stable enough for a deterministic order? Alternatively, we could use the block ids for the ordering.

xazax.hun added inline comments.Jan 20 2022, 2:44 AM
clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
49

Also, could you describe where the flakiness is originated from? Naively, I'd expect that the order in which we process the predecessors should not change the results of the analysis.

sgatev updated this revision to Diff 401581.Jan 20 2022, 3:33 AM
sgatev marked an inline comment as done.

Address reviewers' comments.

sgatev marked an inline comment as done.Jan 20 2022, 3:35 AM
sgatev added inline comments.
clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
49

You're right, using block ids for ordering is better. I updated the code.

Also, could you describe where the flakiness is originated from?

Say we have a block B1 with predecessors B2 and B3. Let the environment of B2 after evaluating all of its statements is B2Env = { Expr1 -> Loc1 } and the environment of B3 after evaluating all of its statement is B3Env = { Expr2 -> Loc2 } where ExprX -> LocX refers to a particular mapping of storage locations to expressions.

What we want for the input environment of B1 is {} because B2Env and B3Env do not contain common assignments of storage locations to expressions. What we got before this patch is either B2Env.join(B3Env) = { Expr1 -> Loc1 } or B3Env.join(B2Env) = { Expr2 -> Loc2 }.

Without deterministic ordering of predecessors the test that I'm introducing in this patch is flaky.

xazax.hun added inline comments.Jan 20 2022, 3:47 AM
clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
49

What we got before this patch is either B2Env.join(B3Env) = { Expr1 -> Loc1 } or B3Env.join(B2Env) = { Expr2 -> Loc2 }.

I think I'm still missing something. With this patch, wouldn't both B2Env.join(B3Env) and B3Env.join(B2Env) produce the empty environment? If that is the case, do we still care about a deterministic order?

sgatev marked 2 inline comments as done.Jan 20 2022, 5:11 AM
sgatev added inline comments.
clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
49

That's right. With the patch in DataflowEnvironment.cpp the particular order of predecessors doesn't affect the result. However, one of the properties that I'm looking for in tests is being able to remove functionality from the code and have the tests that exercise this functionality fail. This won't necessarily be the case here if the order wasn't deterministic. I don't have a strong preference so please let me know if you have concerns about it. I should also note that we expect all of this to be removed once temporary destructors are handled better in the CFG.

xazax.hun added inline comments.Jan 20 2022, 5:45 AM
clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
49

Strictly speaking, making this deterministic is not a requirement, it should not have any observable effect for the end user. On the other hand, we could think of this non-determinism as a feature. A well behaved analysis should produce the same answer regardless of the order in which we process the nodes. (This requirement follows from the algebraic properties of the join operation.) So in the future I would even anticipate a feature that deliberately randomize the order to ensure that the clients are well behaved.

I think eliminating this non-determinism could potentially mask bugs in the future and also it requires extra code. I think I prefer the original version until we see some evidence that determinism is desired.

sgatev updated this revision to Diff 401615.Jan 20 2022, 6:04 AM
sgatev marked an inline comment as done.

Address reviewers' comments.

sgatev marked an inline comment as done.Jan 20 2022, 6:05 AM
sgatev added inline comments.
clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp
49

Sure. Reverted that part.

sgatev marked an inline comment as done.Jan 20 2022, 6:06 AM

Thanks, this looks good to me!