This is an archive of the discontinued LLVM Phabricator instance.

[clang][dataflow] Enable merging distinct values in Environment::join
ClosedPublic

Authored by sgatev on Jan 24 2022, 5:37 AM.

Details

Summary

Make specializations of DataflowAnalysis extendable with domain-specific
logic for merging distinct values when joining environments. This could be
a strict lattice join or a more general widening operation.

This is part of the implementation of the dataflow analysis framework.
See "[RFC] A dataflow analysis framework for Clang AST" on cfe-dev.

Diff Detail

Event Timeline

sgatev created this revision.Jan 24 2022, 5:37 AM
sgatev requested review of this revision.Jan 24 2022, 5:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 24 2022, 5:37 AM
sgatev updated this revision to Diff 402490.Jan 24 2022, 5:42 AM

Add missing include.

Interesting. So clients can affect how the environment is being merged. As a result, we potentially cannot run multiple clients in the same fixed-point iteration, each of them will require separate passes.

I think this is OK, just wanted to to be explicit.

clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
66

Previously, when the values were distinct, we did not include anything in the merged environment. With the new model, we will end up creating "default" values for every one of them. I wonder if this is wasteful. We could potentially also defer this until we have some real world data and can benchmark this. But I think we could consider changing the return type to bool to specify if the merged value should be included in the resulting environment at all and this could return false by default.

sgatev updated this revision to Diff 402529.Jan 24 2022, 7:42 AM
sgatev marked an inline comment as done.

Address reviewers' comments.

Right. To model the behavior of a language feature (e.g. a simple model that tracks whether a std::optional value is engaged) clients can attach properties to symbolic values and decide how distinct values are joined. This doesn't completely rule out the possibility of running multiple clients in the same fixed-point iteration. In fact, we think this could be beneficial as clients can use each others' results by enriching the environment (e.g. dead code analysis can take advantage of understanding of whether a std::optional value is always (not) engaged). This is a direction that we're currently exploring. It does require some care to ensure that the clients are not stepping on each others' toes. For example, we should probably replace the current Properties mechanism with something more robust to ensure that properties are stored in isolated namespaces.

clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
66

Good idea. Let's only include the merge value if necessary for now. I changed the interface to return a bool indicating whether the merged value should be included or not. I added a FIXME to consider not storing the constructed value in the context if it isn't necessary. I'd prefer to address that separately, to keep the scope of this patch smaller.

xazax.hun accepted this revision.Jan 24 2022, 7:52 AM

I think the canonical approach most of the time is to have a map lattice that maps from values to your lattice elements. Storing properties directly at the object sounds like an interesting approach. I'll be curious to see how it works out. Do we expect the framework to propagate these properties automatically across copy ctors/copy assignments, move operations etc?

clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
364

An alternative approach would be to just return false here and whenever the property lookup fails just assume the top value. Since this is just a simple test, feel free to leave this as is.

This revision is now accepted and ready to land.Jan 24 2022, 7:52 AM
sgatev updated this revision to Diff 402550.Jan 24 2022, 8:45 AM

Address reviewers' comments.

sgatev marked an inline comment as done.Jan 24 2022, 8:47 AM

Yes, I think we should be able to propagate properties across copy and move operations in the framework. Clients can override this behavior by modeling the copy and move operations and storing different values in the environment.

clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
364

Right. This seems simpler so let's go with it for now.

sgatev marked an inline comment as done.Jan 24 2022, 10:01 AM
xazax.hun accepted this revision.Jan 25 2022, 10:04 AM