This is an archive of the discontinued LLVM Phabricator instance.

[clang][dataflow] Add `Strict` versions of `Value` and `StorageLocation` accessors.
ClosedPublic

Authored by mboehme on May 16 2023, 2:55 AM.

Details

Summary

This is part of the gradual migration to strict handling of value categories, as described in the RFC at https://discourse.llvm.org/t/70086.

This patch migrates some representative calls of the newly deprecated accessors to the new Strict functions. Followup patches will migrate the remaining callers. (There are a large number of callers, with some subtlety involved in some of them, so it makes sense to split this up into multiple patches rather than migrating all callers in one go.)

The Strict accessors as implemented here have some differences in semantics compared to the semantics originally proposed in the RFC; specifically:

  • setStorageLocationStrict(): The RFC proposes to create an intermediate ReferenceValue that then refers to the StorageLocation for the glvalue. It turns out though that, even today, most places in the code are not doing this but are instead associating glvalues directly with their StorageLocation. It therefore didn't seem to make sense to introduce new ReferenceValues where there were none previously, so I have chosen to instead make setStorageLocationStrict() simply call through to setStorageLocation(const Expr &, StorageLocation &) and merely add the assertion that the expression must be a glvalue.
  • getStorageLocationStrict(): The RFC proposes that this should assert that the storage location for the glvalue expression is associated with an intermediate ReferenceValue, but, as explained, this is often not true. The current state is inconsistent: Sometimes the intermediate ReferenceValue is there, sometimes it isn't. For this reason, getStorageLocationStrict() skips past a ReferenceValue if it is there but otherwise directly returns the storage location associated with the expression. This behavior is equivalent to the existing behavior of SkipPast::Reference.
  • setValueStrict(): The RFC proposes that this should always create the same StorageLocation for a given Value, but, in fact, the transfer functions that exist today don't guarantee this; almost all transfer functions unconditionally create a new StorageLocation when associating an expression with a Value.

    There appears to be one special case: TerminatorVisitor::extendFlowCondition() checks whether the expression is already associated with a StorageLocation and, if so, reuses the existing StorageLocation instead of creating a new one.

    For this reason, setValueStrict() implements this logic (preserve an existing StorageLocation) but makes no attempt to always associate the same StorageLocation with a given Value, as nothing in the framework appers to require this.

    As TerminatorVisitor::extendFlowCondition() is an interesting special case, the setValue() call there is among the ones that this patch migrates to setValueStrict().

Diff Detail

Event Timeline

mboehme created this revision.May 16 2023, 2:55 AM
Herald added a project: Restricted Project. · View Herald Transcript
mboehme requested review of this revision.May 16 2023, 2:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 16 2023, 2:55 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
mboehme edited the summary of this revision. (Show Details)May 16 2023, 2:56 AM
mboehme added reviewers: sammccall, ymandel, xazax.hun.
sammccall accepted this revision.May 16 2023, 5:27 AM
sammccall added inline comments.
clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
644

why does it make sense to allow setStorageLocationStrict but not getStorageLocationStrict for builtin functions?

705

this isn't documented, should it be?

This revision is now accepted and ready to land.May 16 2023, 5:27 AM

I'm a little confused by one of the points in the description:

setValueStrict(): The RFC proposes that this should always create the same StorageLocation for a given Value, but, in fact, the transfer functions that exist today don't guarantee this; almost all transfer functions unconditionally create a new StorageLocation when associating an expression with a Value.

Since values are immutable, we (intentionally) share them between different storage locations. So, it makes sense that a single value could be associated with multiple storage locations. Now, there's a bug in that values aren't really immutable since you can update the fields, so we have a false sharing bug in the framework, but maybe that only applies to glvalues?

clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
273

Use LLVM_DEPRECATED? here and below.

ymandel accepted this revision.May 16 2023, 5:44 AM
mboehme updated this revision to Diff 522591.May 16 2023, 6:44 AM

Tweaked preconditions (in comments and code)

mboehme marked 3 inline comments as done.May 16 2023, 6:54 AM

I'm a little confused by one of the points in the description:

setValueStrict(): The RFC proposes that this should always create the same StorageLocation for a given Value, but, in fact, the transfer functions that exist today don't guarantee this; almost all transfer functions unconditionally create a new StorageLocation when associating an expression with a Value.

Since values are immutable, we (intentionally) share them between different storage locations. So, it makes sense that a single value could be associated with multiple storage locations. Now, there's a bug in that values aren't really immutable since you can update the fields, so we have a false sharing bug in the framework, but maybe that only applies to glvalues?

Yes, values are and should definitely be shared between different storage locations.

The comment in the RFC is specifically in the context of the "fake" storage locations that we create for prvalues (since setValueStrict() only deals with prvalues). I call these storage locations "fake" because a prvalue doesn't actually have a storage location. During the review of the RFC, one of the reviewers asked whether we should guarantee that a given Value, when associated with a prvalue Expr, should always use the same "fake" storage location, instead of creating a new storage location every time. The idea was that this might be required for convergence. This argument made sense to me at the time, but having looked at what the existing code does, almost every place that associates a Value with a prvalue expression unconditionally create a new "fake" storage location. So this implies to me that creating durable fake storage locations isn't necessary.

Does this make sense?

clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
273

There are some buildbots that are configured to produce an error (instead of a warning) when an LLVM_DEPRECATED function is used. (I've unfortunately broken builds more than once because of this.)

The functions that are being deprecated in this patch are still being used in many places, so we can only deprecate them with a comment, but not with LLVM_DEPRECATED.

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
644

Thanks for pointing this out -- this was an inconsistency that I just missed (and obviously there aren't any tests that exercise this case for getStorageLocationStrict()).

I've changed this (in comments and code) to be consistent with setStorageLocationStrict().

705

Yes! Done.

xazax.hun accepted this revision.May 16 2023, 9:46 AM

I'm a little confused by one of the points in the description:

setValueStrict(): The RFC proposes that this should always create the same StorageLocation for a given Value, but, in fact, the transfer functions that exist today don't guarantee this; almost all transfer functions unconditionally create a new StorageLocation when associating an expression with a Value.

Since values are immutable, we (intentionally) share them between different storage locations. So, it makes sense that a single value could be associated with multiple storage locations. Now, there's a bug in that values aren't really immutable since you can update the fields, so we have a false sharing bug in the framework, but maybe that only applies to glvalues?

Yes, values are and should definitely be shared between different storage locations.

The comment in the RFC is specifically in the context of the "fake" storage locations that we create for prvalues (since setValueStrict() only deals with prvalues). I call these storage locations "fake" because a prvalue doesn't actually have a storage location. During the review of the RFC, one of the reviewers asked whether we should guarantee that a given Value, when associated with a prvalue Expr, should always use the same "fake" storage location, instead of creating a new storage location every time. The idea was that this might be required for convergence. This argument made sense to me at the time, but having looked at what the existing code does, almost every place that associates a Value with a prvalue expression unconditionally create a new "fake" storage location. So this implies to me that creating durable fake storage locations isn't necessary.

Does this make sense?

Yes, makes sense.

mboehme marked 3 inline comments as done.May 17 2023, 2:30 AM

Failing x64 debian build is due to a wrong CMake version, which is unrelated to this patch.

This revision was landed with ongoing or failed builds.May 17 2023, 2:30 AM
This revision was automatically updated to reflect the committed changes.