This is an archive of the discontinued LLVM Phabricator instance.

[clang][dataflow] Add transfer functions for assignment
ClosedPublic

Authored by sgatev on Jan 4 2022, 5:48 AM.

Details

Summary

This is part of the implementation of the dataflow analysis framework.
See "[RFC] A dataflow analysis framework for Clang AST" on cfe-dev.

Diff Detail

Event Timeline

sgatev created this revision.Jan 4 2022, 5:48 AM
sgatev requested review of this revision.Jan 4 2022, 5:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 4 2022, 5:48 AM
sgatev updated this revision to Diff 397275.Jan 4 2022, 6:16 AM

Extend TransferTest.

xazax.hun added inline comments.Jan 4 2022, 9:33 AM
clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
34

I am just wondering if this is the right level of abstraction. Maybe users do think of this in terms of skipping references?
Or would they think in terms of value categories? E.g., having getLValue vs getRValue. I think we can leave this as is for now, I just like the idea of exploring alternatives.

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
95

What is the model for expressions that evaluate to different locations in every loop iterations, e.g.:

for(int *p = array; p != array + size; ++p)
  *p; // The dereference expression

Here, having *p always evaluate to the same location is not correct, we'd probably need a way to widen this.

clang/lib/Analysis/FlowSensitive/Transfer.cpp
46

I think you probably also want to set the correct location for the whole expression. Or is that handled somewhere else?

sgatev updated this revision to Diff 397508.Jan 5 2022, 3:40 AM
sgatev marked 3 inline comments as done.

Address reviewers' comments.

sgatev added inline comments.Jan 5 2022, 3:42 AM
clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
34

Absolutely! It makes sense to me to consider this (and other) alternatives so I added a FIXME as a reminder.

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
95

In the current model, the storage location for p is stable and the value that is assigned to it changes. So, in one iteration the value assigned to the storage location for p is val1 which is a PointerValue that points to loc1 and in another iteration the value assigned to the storage location for p is val2 which is a PointerValue that points to loc2. The storage location for *p is also stable and the value that is assigned to it is a ReferenceValue that points to either loc1 or loc2. I implemented this and added a test.

clang/lib/Analysis/FlowSensitive/Transfer.cpp
46

You're right. I updated the code and added a test.

ymandel added inline comments.Jan 5 2022, 5:59 AM
clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
36

I'm not sure that value categories are right here. The key is that skipping is optional, unlike value categories which are fixed. I think the problem is just that C++ references are weird (for lack of a technical term). A reference variable exists simultaneously in two different categories -- a pointer and the value it points to. That's what this is trying to capture.

So, I'd recommend just changing the name to reflect that. WDYT of something like this?

/// Indicates how to treat references, if encountered -- as a reference or the value referred to -- when retrieving
/// storage locations or values.
enum class RefTreatment {
   /// Consider the reference itself.
  AsSelf,
  /// Consider the referent.
  AsReferent,
};
113

Why have a SkipPast argument here? I understand the concept for considering var-decls - it corresponds to the weirdness of lvalue-refs themselves. But StorageLocation is a lower-level (and simpler) concept, to which this doesn't seem to map. I'm fine with resolving indirections, but what is the value of only doing so conditionally?

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
242–247
clang/lib/Analysis/FlowSensitive/Transfer.cpp
37

maybe comment as to why this is necessary? I'd have thought that CFG's flattening of expressions would handle this (why, in general, we don't need to look deep into expressions).

143

nit: might be easier to read if you invert this:

const Expr *InitExpr = D.getInit();
if (InitExpr == nullptr) return;
152–153

nit: else-if (one line)?

xazax.hun added inline comments.Jan 5 2022, 9:23 AM
clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
36

Yeah, value categories are indeed fixed, I was thinking along the lines of treating expressions as if there was an LValueToRValue or RValueToLValue conversion. I think some of this problem might come from the ambiguity. E.g., when I want to query the analysis state about the value of *p, what do I want? Do I want the value for *p as an lvalue or *p as an rvalue? I would get a different answer in both cases. If I see *p in the source code it is fixed but when I want to query the analysis state it is not.

On the other hand I see that SkipPast is used in the transfer functions, and I am wondering if that is justified. According to the language rules, I can never observe the actual value of the reference. I can modify the pointee, (ref = val) or create a pointer to the same value (&ref), but I cannot actually read or modify what is inside the reference. So I wonder if it might be less error prone if there would be no way in the transfer functions to observe the values of the references either.

But I do not have a strong feeling about any of this at this point so feel free to leave everything as it is.

So, I'd recommend just changing the name to reflect that.

I am not sure how clear Referent is. For a non-native speaker Pointee might be clearer, although I do see how it can be confusing.

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
95

In this case, does it really make sense to materialize this mapping? If every subexpression corresponds to a unique location, and always the same location, we could use the subexpression directly as a location.

sgatev updated this revision to Diff 398629.Jan 10 2022, 7:21 AM
sgatev marked 8 inline comments as done.

Address reviewers' comments.

sgatev added inline comments.Jan 10 2022, 7:27 AM
clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
36

Thanks both! I agree with the comments. I prefer to keep it as is for now. I updated the FIXME with a pointer to this discussion so that we remember to get back to it at some point.

113

You're right. It's not necessary here. Removed it.

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
95

True, but a StorageLocation can also be assigned to a ValueDecl, not only to Expr. We also keep some additional information in the StorageLocation sub-classes so I don't think we can replace it directly.

clang/lib/Analysis/FlowSensitive/Transfer.cpp
37

Added a comment. The CFG does not contain ParenExpr as top-level statements in basic blocks, however sub-expressions can still be of that type.

ymandel accepted this revision.Jan 10 2022, 7:48 AM
This revision is now accepted and ready to land.Jan 10 2022, 7:48 AM
xazax.hun accepted this revision.Jan 10 2022, 10:58 AM
This revision was landed with ongoing or failed builds.Jan 10 2022, 11:37 AM
This revision was automatically updated to reflect the committed changes.