This is an archive of the discontinued LLVM Phabricator instance.

Differential D156658

[clang][dataflow] When checking `ExprToLoc` convergence, only consider children of block terminator.
Abandoned · Public

Authored by mboehme on Jul 31 2023, 2:38 AM.

Details

Reviewers

NoQ

Summary

The only entries in ExprToLoc that will be read by a different block are the direct children of the block terminator (if one exists). For the purposes of determining whether ExprToLoc has converged, it is therefore sufficient to look at these entries, as any differences in other entries will not be seen by other blocks.

The other entries in ExprToLoc are only read during processing of the block that contains the corresponding expressions. To be clear, these entries can affect the results of the block, but only indirectly, in one of two ways:

- If they are indirect descendants of the terminator and therefore affect the values of the terminator's direct children.
- If they affect the entries in one of the other mappings in Environment.

Before this patch, we were comparing all entries in ExprToLoc, even if they were never accessed by other blocks, which could cause non-convergence. This patch adds two tests that demonstrate this; they do not converge without the other changes in this patch.
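As a rough illustration of the restriction described in this summary, the following toy model compares only the entries for a given set of expressions. It is a sketch under assumptions, not the patch's code: `ToyExprToLoc`, `lookup`, and `equivalentOnTerminatorChildren` are hypothetical names, and expressions and storage locations are reduced to integer ids.

```cpp
#include <map>
#include <vector>

// Hypothetical stand-ins: expressions and storage locations are plain ids.
using ExprId = int;
using LocId = int;
using ToyExprToLoc = std::map<ExprId, LocId>;

// Returns the location mapped to `E`, or -1 if `E` has no entry.
inline LocId lookup(const ToyExprToLoc &M, ExprId E) {
  auto It = M.find(E);
  return It == M.end() ? -1 : It->second;
}

// Compares only the entries for the terminator's direct children and
// ignores every other entry in the two maps.
inline bool
equivalentOnTerminatorChildren(const ToyExprToLoc &A, const ToyExprToLoc &B,
                               const std::vector<ExprId> &TerminatorChildren) {
  for (ExprId E : TerminatorChildren)
    if (lookup(A, E) != lookup(B, E))
      return false;
  return true;
}
```

In this model, two maps that differ only in an entry outside the terminator's children compare as equivalent, which is the behavior the patch aimed for.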

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

mboehme created this revision. · Jul 31 2023, 2:38 AM

Herald added a reviewer: NoQ. · Jul 31 2023, 2:38 AM

Herald added a project: Restricted Project.

Herald added subscribers: martong, xazax.hun.

mboehme requested review of this revision. · Jul 31 2023, 2:38 AM

Herald added a project: Restricted Project. · Jul 31 2023, 2:38 AM

Herald added a subscriber: cfe-commits.

mboehme edited the summary of this revision. · Jul 31 2023, 2:39 AM

mboehme added reviewers: ymandel, xazax.hun.

Retracting from review momentarily -- I've just realized that I should be comparing values the same way this is done for the LocToVal map.

mboehme removed reviewers: ymandel, xazax.hun. · Jul 31 2023, 2:44 AM

Make check for Value equivalence consistent with the corresponding check for
LocToVal.

mboehme added reviewers: ymandel, xazax.hun. · Jul 31 2023, 2:52 AM

Use dyn_cast_or_null instead of dyn_cast. (Children can be null.)

Harbormaster completed remote builds in B249150: Diff 545571. · Jul 31 2023, 4:47 AM

Retracting from review. I think I was overly hasty with this change:

There are edges between expressions that cross CFG block boundaries but don't involve block terminators. Here's an example -- [B1.1] has [B2.1] as a child, yet [B2] doesn't even have a terminator.
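The example referred to in this comment is not preserved in the archive. As an illustration only (an assumption, not the author's example), a conditional operator is one construct for which Clang's CFG typically places each arm in its own block without a terminator, while the `?:` expression is evaluated in the join block and has the arm expressions as children. The function name `pick` is hypothetical.

```cpp
// Hypothetical illustration; `pick` is not taken from the patch.
// In a typical Clang CFG, `A + 1` and `B + 2` are each evaluated in their own
// block, and neither of those blocks has a terminator. The conditional
// operator is evaluated in the join block and has both arms as children, so
// the entries for the arms are read from a different block.
inline int pick(bool Cond, int A, int B) {
  int X = Cond ? A + 1 : B + 2;
  return X;
}
```

The block structure of a function like this can be inspected by dumping its CFG, for instance with the static analyzer's `debug.DumpCFG` checker.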

More fundamentally, convergence is not only important when considering the environment as an input to successor blocks, but also as a basis for the diagnostics that we will emit, which may look at arbitrary expressions in the block. So we really want all expressions to be fully converged.

It looks as if, instead, what we should be doing to improve convergence in a sound manner is to implement widening for ExprToLoc. I'll submit a corresponding patch shortly.

mboehme removed reviewers: ymandel, xazax.hun. · Jul 31 2023, 7:00 AM

It looks as if, instead, what we should be doing to improve convergence in a sound manner is to implement widening for ExprToLoc. I'll submit a corresponding patch shortly.

+1, I believe we want ExprToLoc to converge. That being said, if we can get away with not checking some parts, it could potentially be implemented as an optimization. In that case, I'd expect to still have full checking in debug builds and a strong argument for why it is OK not to compare some parts.
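One way to read the suggestion of keeping full checking in debug builds is sketched below. This is a toy model under assumptions, not code from the framework: `ToyExprToLoc`, `fullyEquivalent`, `equivalentOn`, and `hasConverged` are hypothetical names, and the maps hold integer ids instead of expressions and storage locations.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical toy model: expression ids mapped to storage location ids.
using ToyExprToLoc = std::map<int, int>;

// Full comparison over every entry.
inline bool fullyEquivalent(const ToyExprToLoc &A, const ToyExprToLoc &B) {
  return A == B;
}

// Cheaper comparison that only looks at the entries named in `Keys`.
inline bool equivalentOn(const ToyExprToLoc &A, const ToyExprToLoc &B,
                         const std::vector<int> &Keys) {
  for (int K : Keys) {
    auto ItA = A.find(K);
    auto ItB = B.find(K);
    bool InA = ItA != A.end();
    bool InB = ItB != B.end();
    if (InA != InB || (InA && ItA->second != ItB->second))
      return false;
  }
  return true;
}

inline bool hasConverged(const ToyExprToLoc &A, const ToyExprToLoc &B,
                         const std::vector<int> &Keys) {
  bool Fast = equivalentOn(A, B, Keys);
  // Debug builds cross-check the shortcut against the full comparison; the
  // assert compiles away when NDEBUG is defined.
  assert((!Fast || fullyEquivalent(A, B)) &&
         "restricted check claimed convergence but full check disagrees");
  return Fast;
}
```

With this shape, a release build pays only for the restricted comparison, while a debug build aborts as soon as the shortcut hides a difference that the full comparison would have caught.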

In D156658#4546955, @xazax.hun wrote:

It looks as if, instead, what we should be doing to improve convergence in a sound manner is to implement widening for ExprToLoc. I'll submit a corresponding patch shortly.

+1, I believe we want ExprToLoc to converge. That being said, if we can get away with not checking some parts, it could potentially be implemented as an optimization. In that case, I'd expect to still have full checking in debug builds and a strong argument for why it is OK not to compare some parts.

I've investigated this in more detail. Unfortunately, it turns out that it's not quite as simple as just implementing widening on ExprToLoc.

One of the reasons for this is that we only apply widening at loop heads, but the expressions that are "blocking" convergence may be contained in a block that is not a loop head.

We could solve this by applying widening everywhere, but AFAIU, that's really not desirable because you lose a lot of precision that way.

But really, I think ExprToLoc is just a red herring here. The real issue is:

- We lack widening on PointerValues.
- Instead, for the purposes of convergence, we simply ignore differences in PointerValues.
- However, different PointerValues can lead to different locations for expressions that dereference these PointerValues, and we do consider the difference in these locations to be relevant for convergence.

In other words, we have an inconsistency between when we consider a PointerValue to be converged and when we consider the storage location for an expression to be converged.

I think the real solution to this would be to introduce widening for PointerValues. Essentially, what we want is a "top" PointerValue that does not have an associated StorageLocation. However, we don't want to eliminate the PointerValue entirely; we still want to be able to attach properties to it, so that, for example, an analysis can record that the PointerValue is non-null, even though we don't know what its exact location is. This is important for us to be able to handle cases like this one correctly.

The most obvious way to implement a "top" PointerValue would be to make PointerValue::getPointeeLoc() return a nullable pointer instead of a reference. When dereferencing a PointerValue without a storage location, we would then not associate the corresponding Expr with a storage location at all, thereby solving the convergence issue. However, this approach would require some effort, as it would involve changing callers of PointerValue::getPointeeLoc() so that they can deal with the case where no pointee location is available.
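The idea of a "top" pointer value with a nullable pointee location can be sketched with a small toy model. Everything here is hypothetical and simplified: `ToyLoc`, `ToyPointerValue`, `widen`, and `derefLoc` are illustrative names, properties are reduced to named booleans, and keeping only the properties on which both inputs agree is an assumption about how widening might treat them.

```cpp
#include <map>
#include <string>

// Hypothetical storage location.
struct ToyLoc {
  int Id;
};

// Hypothetical pointer value. A null `PointeeLoc` models the "top" pointer:
// the pointee is unknown, but properties can still be attached.
struct ToyPointerValue {
  const ToyLoc *PointeeLoc = nullptr;
  std::map<std::string, bool> Props; // e.g. "is_nonnull"
};

// Widening sketch: keep the pointee location only if both inputs agree on it,
// and keep only the properties on which both inputs agree (an assumption).
inline ToyPointerValue widen(const ToyPointerValue &Prev,
                             const ToyPointerValue &Cur) {
  ToyPointerValue Result;
  Result.PointeeLoc =
      Prev.PointeeLoc == Cur.PointeeLoc ? Cur.PointeeLoc : nullptr;
  for (const auto &[Name, Val] : Prev.Props) {
    auto It = Cur.Props.find(Name);
    if (It != Cur.Props.end() && It->second == Val)
      Result.Props.emplace(Name, Val);
  }
  return Result;
}

// Dereferencing a "top" pointer yields no location, so a caller would simply
// not record a storage location for the dereferencing expression.
inline const ToyLoc *derefLoc(const ToyPointerValue &P) {
  return P.PointeeLoc;
}
```

Once a pointer has been widened to "top" in this model, widening it again with any other pointer keeps it at "top", so the dereferencing expression no longer alternates between locations. Callers do have to handle the null result, which is the migration cost mentioned above.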

I therefore think that we should consider a shorter-term solution: In Environment::equivalentTo(), ignoring glvalue entries in ExprToLoc for certain expressions where it's unlikely that any analysis will ever want to retrieve their storage location. See https://reviews.llvm.org/D156856, which I've just submitted for review. I hope that, in practice, this will cover a majority of the cases that are causing non-convergence.

In D156658#4552965, @mboehme wrote:

I've investigated this in more detail. Unfortunately, it turns out that it's not quite as simple as just implementing widening on ExprToLoc.

One of the reasons for this is that we only apply widening at loop heads, but the expressions that are "blocking" convergence may be contained in a block that is not a loop head.

I am probably missing something, but why does it matter where we are doing the widening? It is possible that we might need to redesign how parts of the program state are represented, but I do not immediately see any fundamental roadblocks.

I think the real solution to this would be to introduce widening for PointerValues.

Fully agreed.

Essentially, what we want is a "top" PointerValue that does not have an associated StorageLocation. However, we don't want to eliminate the PointerValue entirely; we still want to be able to attach properties to it, so that, for example, an analysis can record that the PointerValue is non-null, even though we don't know what its exact location is.

Another way to interpret "top": it points to a "summary" StorageLocation that can be any other StorageLocation, we just do not know which one. This interpretation/formulation has some advantages:

- We have a StorageLocation to use when we dereference these top pointers.
- It is compatible with the alias sets representation.
- It is compatible with some other representations where we have other "summary" locations, like "UnknownStackLocation" or "UnknownHeapLocation".

These summary memory locations are sort of the union of all the potential memory locations they could represent. I think in general it might be useful to embrace this idea, e.g., when we model arrays, we can have a single element region representing all the knowledge we know to be true for all elements of the array.
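The "summary location" reading of top can be sketched in the same toy style. This is an assumption-laden illustration rather than a proposed API: `ToyStorageLoc`, `summaryLoc`, and `joinPointee` are hypothetical names, and a single global summary location is only one of several possible designs.

```cpp
// Hypothetical storage location.
struct ToyStorageLoc {
  int Id;
};

// Hypothetical singleton "summary" location standing for "any location, we do
// not know which one".
inline const ToyStorageLoc &summaryLoc() {
  static const ToyStorageLoc Summary{-1};
  return Summary;
}

// Join of two pointee locations: identical locations are kept, anything else
// collapses to the summary location. The result is always a valid location,
// so dereferencing never has to handle a missing pointee.
inline const ToyStorageLoc &joinPointee(const ToyStorageLoc &A,
                                        const ToyStorageLoc &B) {
  return &A == &B ? A : summaryLoc();
}
```

Compared with a nullable pointee, callers in this model always get a location back, and joining with the summary location again yields the summary location, so the result stabilizes after one step.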

Sorry for the late response -- I was on vacation.

In D156658#4554347, @xazax.hun wrote:

In D156658#4552965, @mboehme wrote:

I've investigated this in more detail. Unfortunately, it turns out that it's not quite as simple as just implementing widening on ExprToLoc.

One of the reasons for this is that we only apply widening at loop heads, but the expressions that are "blocking" convergence may be contained in a block that is not a loop head.

I am probably missing something, but why does it matter where we are doing the widening?

We only do widening at loop heads, and this means that widening only affects locations and values that flow into the loop from the outside or from a previous loop iteration.

But convergence can also be blocked by locations and values that are only used within the loop body. If these change from loop iteration to loop iteration and we don't perform widening on them, we will conclude that the state of the loop body never converges.

Essentially, what we want is a "top" PointerValue that does not have an associated StorageLocation. However, we don't want to eliminate the PointerValue entirely; we still want to be able to attach properties to it, so that, for example, an analysis can record that the PointerValue is non-null, even though we don't know what its exact location is.

Another way to interpret "top": it points to a "summary" StorageLocation that can be any other StorageLocation, we just do not know which one. This interpretation/formulation has some advantages:

- We have a StorageLocation to use when we dereference these top pointers.
- It is compatible with the alias sets representation.
- It is compatible with some other representations where we have other "summary" locations, like "UnknownStackLocation" or "UnknownHeapLocation".

These summary memory locations are sort of the union of all the potential memory locations they could represent. I think in general it might be useful to embrace this idea, e.g., when we model arrays, we can have a single element region representing all the knowledge we know to be true for all elements of the array.

I like this!

I'll have to do some thinking about how we want to represent these unknown / "top" storage locations exactly. Is there going to be a singleton "top" storage location? Are we going to allow associating the "top" storage location with a value (probably not...)? And so on. But this seems like a good direction to investigate.

mboehme abandoned this revision. · Aug 22 2023, 6:50 AM

mboehme mentioned this in D158513: [clang][dataflow] Add two repros for non-convergence involving pointers in loops. · Aug 22 2023, 6:53 AM

I've broken out the tests into https://reviews.llvm.org/D158513, as it seems clear we want to get these submitted.

In D156658#4606400, @mboehme wrote:

We only do widening at loop heads, and this means that widening only affects locations and values that flow into the loop from the outside or from a previous loop iteration.

But convergence can also be blocked by locations and values that are only used within the loop body. If these change from loop iteration to loop iteration and we don't perform widening on them, we will conclude that the state of the loop body never converges.

I wonder if this is something we need to solve. Maybe when we do the widening at the loop head that should also affect values that are not flowing into that node. I.e., at the loop heads we might want to also look at the values from the previous iteration that are only used within the loop body.

mboehme mentioned this in rGa1a63d68a468: [clang][dataflow] Add two repros for non-convergence involving pointers in…. · Aug 23 2023, 12:03 AM

Revision Contents

Path · Size

clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h · 25 lines

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp · 41 lines

clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp · 3 lines

clang/unittests/Analysis/FlowSensitive/TransferTest.cpp · 52 lines

Diff 545571

clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h

@@ public:
   Environment pushCall(const CallExpr *Call) const;
   Environment pushCall(const CXXConstructExpr *Call) const;

   /// Moves gathered information back into `this` from a `CalleeEnv` created via
   /// `pushCall`.
   void popCall(const CallExpr *Call, const Environment &CalleeEnv);
   void popCall(const CXXConstructExpr *Call, const Environment &CalleeEnv);

-  /// Returns true if and only if the environment is equivalent to `Other`, i.e
-  /// the two environments:
+  /// Returns true if and only if the environment for a particular CFG block is
+  /// equivalent to `Other`, i.e the two environments:
   ///  - have the same mappings from declarations to storage locations,
-  ///  - have the same mappings from expressions to storage locations,
+  ///  - have the same mappings from expressions accessed outside the block to
+  //     storage locations,
   ///  - have the same or equivalent (according to `Model`) values assigned to
   ///    the same storage locations.
   ///
+  /// Note that the expressions accessed outside the block are exactly the
+  /// children of the block terminator. `Terminator` should be this block
+  /// terminator, or null if the block does not have a terminator.
+  ///
+  /// The storage locations for all other expressions are only accessed while
+  /// processing the block. They can still affect the results of the block, but
+  /// only indirectly, in one of two ways:
+  /// - If they are indirect descendants of the terminator and therefore affect
+  ///   the values of the terminator's direct children.
+  /// - If they affect the entries in one of the other mappings.
+  ///
   /// Requirements:
   ///
-  ///  `Other` and `this` must use the same `DataflowAnalysisContext`.
-  bool equivalentTo(const Environment &Other,
-                    Environment::ValueModel &Model) const;
+  ///  `Other` and `this` must be environments for the same block and must use
+  ///  the same `DataflowAnalysisContext`.
+  bool equivalentTo(const Environment &Other, Environment::ValueModel &Model,
+                    const Stmt *Terminator) const;

   /// Joins two environments by taking the intersection of storage locations and
   /// values that are stored in them. Distinct values that are assigned to the
   /// same storage locations in `EnvA` and `EnvB` are merged using `Model`.
   ///
   /// Requirements:
   ///
   ///  `EnvA` and `EnvB` must use the same `DataflowAnalysisContext`.

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp

@@ void Environment::popCall(const CXXConstructExpr *Call,
   this->LocToVal = std::move(CalleeEnv.LocToVal);
   this->FlowConditionToken = std::move(CalleeEnv.FlowConditionToken);

   if (Value *Val = CalleeEnv.getValue(*CalleeEnv.ThisPointeeLoc)) {
     setValueStrict(*Call, *Val);
   }
 }

+static bool exprToLocEquivalent(const Environment &Env1,
+                                const Environment &Env2, const Stmt *Terminator,
+                                Environment::ValueModel &Model) {
+  if (Terminator == nullptr)
+    return true;
+
+  for (const Stmt *Child : Terminator->children()) {
+    auto *E = dyn_cast_or_null<Expr>(Child);
+    if (E == nullptr)
+      continue;
+
+    if (E->isGLValue()) {
+      if (Env1.getStorageLocationStrict(*E) !=
+          Env2.getStorageLocationStrict(*E))
+        return false;
+    } else {
+      // For prvalues, locations don't matter -- just look at whether they map
+      // to equivalent values.
+      Value *Val1 = Env1.getValue(*E);
+      Value *Val2 = Env2.getValue(*E);
+
+      if (Val1 == nullptr || Val2 == nullptr) {
+        if (Val1 == nullptr && Val2 == nullptr)
+          continue;
+        return false;
+      }
+
+      if (!areEquivalentValues(*Val1, *Val2) &&
+          !compareDistinctValues(E->getType(), *Val1, Env1, *Val2, Env2, Model))
+        return false;
+    }
+  }
+
+  return true;
+}
+
 bool Environment::equivalentTo(const Environment &Other,
-                               Environment::ValueModel &Model) const {
+                               Environment::ValueModel &Model,
+                               const Stmt *Terminator) const {
   assert(DACtx == Other.DACtx);

   if (ReturnVal != Other.ReturnVal)
     return false;

   if (ReturnLoc != Other.ReturnLoc)
     return false;

   if (ThisPointeeLoc != Other.ThisPointeeLoc)
     return false;

   if (DeclToLoc != Other.DeclToLoc)
     return false;

-  if (ExprToLoc != Other.ExprToLoc)
+  if (!exprToLocEquivalent(*this, Other, Terminator, Model))
     return false;

   // Compare the contents for the intersection of their domains.
   for (auto &Entry : LocToVal) {
     const StorageLocation *Loc = Entry.first;
     assert(Loc != nullptr);

     Value *Val = Entry.second;

clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp

@@ if (OldBlockState) {
             Effect2 == LatticeJoinEffect::Unchanged) {
           // The state of `Block` didn't change from widening so there's no need
           // to revisit its successors.
           AC.Log.blockConverged();
           continue;
         }
       } else if (Analysis.isEqualTypeErased(OldBlockState->Lattice,
                                             NewBlockState.Lattice) &&
-                 OldBlockState->Env.equivalentTo(NewBlockState.Env, Analysis)) {
+                 OldBlockState->Env.equivalentTo(NewBlockState.Env, Analysis,
+                                                 Block->getTerminatorStmt())) {
         // The state of `Block` didn't change after transfer so there's no need
         // to revisit its successors.
         AC.Log.blockConverged();
         continue;
       }
     }

     BlockStates[Block->getBlockID()] = std::move(NewBlockState);

clang/unittests/Analysis/FlowSensitive/TransferTest.cpp

@@ runDataflow(

         // The loop body may not have been executed, so we should not conclude
         // that `l` points to `val`.
         EXPECT_NE(&LVal->getPointeeLoc(),
                   OuterEnv.getStorageLocation(*ValDecl));
       });
 }

+TEST(TransferTest, LoopDereferencingChangingPointerConverges) {
+  std::string Code = R"cc(
+    bool some_condition();
+
+    void target(int i1, int i2) {
+      int *p = &i1;
+      while (true) {
+        (void)*p;
+        if (some_condition())
+          p = &i1;
+        else
+          p = &i2;
+      }
+    }
+  )cc";
+  // The key property that we are verifying is implicit in `runDataflow` --
+  // namely, that the analysis succeeds, rather than hitting the maximum number
+  // of iterations.
+  runDataflow(
+      Code,
+      [](const llvm::StringMap<DataflowAnalysisState<NoopLattice>> &Results,
+         ASTContext &ASTCtx) {});
+}
+
+TEST(TransferTest, LoopDereferencingChangingRecordPointerConverges) {
+  std::string Code = R"cc(
+    struct Lookup {
+      int x;
+    };
+
+    bool some_condition();
+
+    void target(Lookup l1, Lookup l2) {
+      Lookup *l = &l1;
+      while (true) {
+        (void)l->x;
+        if (some_condition())
+          l = &l1;
+        else
+          l = &l2;
+      }
+    }
+  )cc";
+  // The key property that we are verifying is implicit in `runDataflow` --
+  // namely, that the analysis succeeds, rather than hitting the maximum number
+  // of iterations.
+  runDataflow(
+      Code,
+      [](const llvm::StringMap<DataflowAnalysisState<NoopLattice>> &Results,
+         ASTContext &ASTCtx) {});
+}
+
 TEST(TransferTest, DoesNotCrashOnUnionThisExpr) {
   std::string Code = R"(
     union Union {
       int A;
       float B;
     };

     void foo() {
