This is an archive of the discontinued LLVM Phabricator instance.

Differential D122838

[clang][dataflow] Add support for correlation of boolean (tracked) values
Closed, Public

Authored by ymandel on Mar 31 2022, 10:57 AM.


Details

Reviewers

xazax.hun
sgatev

Commits

rG01db10365e93: [clang][dataflow] Add support for correlation of boolean (tracked) values

Summary

This patch extends the join logic for environments to explicitly handle
boolean values. It creates the disjunction of both source values, guarded by the
respective flow conditions from each input environment. This change allows the
framework to reason about boolean correlations across multiple branches (and
subsequent joins).
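The construction described in the summary can be sketched outside the framework. The following is a minimal, self-contained illustration, not the patch itself: `Formula`, `Arena`, `guard`, `joinBool`, and `print` are hypothetical names invented for this sketch, standing in for the framework's boolean values and environment helpers.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for the framework's boolean values.
struct Formula {
  enum class Kind { Atom, And, Or };
  Kind K;
  std::string Name;   // Only meaningful for atoms.
  const Formula *LHS; // Null for atoms.
  const Formula *RHS; // Null for atoms.
};

// Owns every formula so that references stay valid while formulas are built.
class Arena {
  std::vector<std::unique_ptr<Formula>> Storage;

  const Formula &make(Formula F) {
    Storage.push_back(std::make_unique<Formula>(std::move(F)));
    return *Storage.back();
  }

public:
  const Formula &atom(std::string Name) {
    return make(Formula{Formula::Kind::Atom, std::move(Name), nullptr, nullptr});
  }
  const Formula &conj(const Formula &L, const Formula &R) {
    return make(Formula{Formula::Kind::And, "", &L, &R});
  }
  const Formula &disj(const Formula &L, const Formula &R) {
    return make(Formula{Formula::Kind::Or, "", &L, &R});
  }
};

// Conjoins `Val` with every constraint of one branch's flow condition.
const Formula &guard(Arena &A, const Formula &Val,
                     const std::vector<const Formula *> &Constraints) {
  const Formula *Result = &Val;
  for (const Formula *C : Constraints)
    Result = &A.conj(*Result, *C);
  return *Result;
}

// Joins two boolean values as the disjunction of the guarded inputs.
const Formula &joinBool(Arena &A, const Formula &Val1,
                        const std::vector<const Formula *> &FC1,
                        const Formula &Val2,
                        const std::vector<const Formula *> &FC2) {
  return A.disj(guard(A, Val1, FC1), guard(A, Val2, FC2));
}

// Renders a formula as a fully parenthesized string.
std::string print(const Formula &F) {
  if (F.K == Formula::Kind::Atom)
    return F.Name;
  const char *Op = F.K == Formula::Kind::And ? " && " : " || ";
  return "(" + print(*F.LHS) + Op + print(*F.RHS) + ")";
}
```

Joining `V1` under constraint `C1` with `V2` under constraints `C2` and `C3` yields a formula that prints as `((V1 && C1) || ((V2 && C2) && C3))`, so each input value remains tied to the branch it came from.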

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ymandel created this revision. Mar 31 2022, 10:57 AM

Herald added a project: Restricted Project. Mar 31 2022, 10:57 AM

Herald added subscribers: tschuett, steakhal, rnkovacs.

ymandel requested review of this revision. Mar 31 2022, 10:57 AM

Herald added a project: Restricted Project. Mar 31 2022, 10:57 AM

xazax.hun added inline comments. Mar 31 2022, 1:26 PM

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
79	Hmm, interesting. I think we view every boolean formula at a certain program point implicitly as `FlowConditionAtThatPoint && Formula`. And the flow condition at a program point should already be a disjunction of its predecessors. So it would be interpreted as: `(FlowConditionPred1 || FlowConditionPred2) && (FormulaAtPred1 || FormulaAtPred2)`. While this is great, this is not the strongest condition we could derive. `(FlowConditionPred1 && FormulaAtPred1) || (FormulaAtPred2 && FlowConditionPred2)` created by this code snippet is stronger which is great. My main concern is whether we would end up seeing an exponential explosion in the size of these formulas in the number of branches following each other in a sequence.
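The strength claim in this comment can be checked by brute force over all truth assignments. The sketch below is only an illustration of that propositional argument; `StrengthReport` and `compareJoinEncodings` are hypothetical names, and the four booleans stand for the two flow conditions and the two formulas named in the comment.

```cpp
// Records which direction of implication holds between the two encodings.
struct StrengthReport {
  bool GuardedImpliesProduct; // (FC1 && F1) || (FC2 && F2)  =>  product form
  bool ProductImpliesGuarded; // product form  =>  (FC1 && F1) || (FC2 && F2)
};

// Enumerates all 16 assignments of FC1, F1, FC2, F2 and compares the
// guarded disjunction with the product of the two disjunctions.
StrengthReport compareJoinEncodings() {
  StrengthReport R{true, true};
  for (unsigned Bits = 0; Bits < 16; ++Bits) {
    bool FC1 = (Bits & 1) != 0;
    bool F1 = (Bits & 2) != 0;
    bool FC2 = (Bits & 4) != 0;
    bool F2 = (Bits & 8) != 0;

    bool Guarded = (FC1 && F1) || (FC2 && F2);
    bool Product = (FC1 || FC2) && (F1 || F2);

    if (Guarded && !Product)
      R.GuardedImpliesProduct = false;
    if (Product && !Guarded)
      R.ProductImpliesGuarded = false;
  }
  return R;
}
```

The guarded form implies the product form for every assignment, but not the other way around: with `FC1` and `F2` true and `F1` and `FC2` false, the product form holds while the guarded form does not. That is the sense in which the guarded form is strictly stronger.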

Harbormaster completed remote builds in B157229: Diff 419518. Mar 31 2022, 3:22 PM

sgatev accepted this revision. Apr 1 2022, 12:55 AM

sgatev added inline comments.

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
79	Yeap, I agree this is suboptimal and I believe I'm the one to blame for introducing it downstream. I wonder if we can represent the flow condition of each environment using a bool atom and have a mapping of bi-conditionals between flow condition atoms and flow condition constraints. Something like:

    FC1 <=> C1 ^ C2
    FC2 <=> C2 ^ C3 ^ C4
    FC3 <=> (FC1 v FC2) ^ C5
    ...

We can use that to simplify the formulas here and in `joinConstraints`. The mapping can be stored in `DataflowAnalysisContext`. We can track dependencies between flow conditions (e.g. above `FC3` depends on `FC1` and `FC2`) and modify `flowConditionImplies` to construct a formula that includes the bi-conditionals for all flow condition atoms in the transitive set before invoking the solver.

I suggest putting the optimization in its own patch. I'd love to look into it right after this patch is submitted if both of you think it makes sense on a high level.
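The bookkeeping proposed here can be sketched roughly as follows. This is an assumption-laden illustration rather than the eventual design: `FlowConditionDef`, `FlowConditionMap`, `transitiveAtoms`, and `biconditionalsFor` are hypothetical names, and definitions are kept as plain strings instead of real boolean values.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical record: the constraint text a flow condition atom is
// equivalent to, plus the other flow condition atoms that text mentions.
struct FlowConditionDef {
  std::string Definition;
  std::vector<std::string> Deps;
};

using FlowConditionMap = std::map<std::string, FlowConditionDef>;

// Collects `Atom` and every flow condition atom it transitively depends on.
std::set<std::string> transitiveAtoms(const FlowConditionMap &Defs,
                                      const std::string &Atom) {
  std::set<std::string> Seen;
  std::vector<std::string> Worklist{Atom};
  while (!Worklist.empty()) {
    std::string Cur = Worklist.back();
    Worklist.pop_back();
    if (!Seen.insert(Cur).second)
      continue; // Already visited.
    auto It = Defs.find(Cur);
    if (It == Defs.end())
      continue;
    for (const std::string &Dep : It->second.Deps)
      Worklist.push_back(Dep);
  }
  return Seen;
}

// Lists the bi-conditionals that a solver query about `Atom` would include.
std::vector<std::string> biconditionalsFor(const FlowConditionMap &Defs,
                                           const std::string &Atom) {
  std::vector<std::string> Out;
  for (const std::string &Name : transitiveAtoms(Defs, Atom)) {
    auto It = Defs.find(Name);
    if (It != Defs.end())
      Out.push_back(Name + " <=> " + It->second.Definition);
  }
  return Out;
}
```

With the three definitions from the comment, a query about `FC3` pulls in the bi-conditionals for `FC1`, `FC2`, and `FC3`, while a query about `FC1` needs only its own. Formulas can then mention a single atom per flow condition instead of repeating its constraints.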

This revision is now accepted and ready to land. Apr 1 2022, 12:55 AM

ymandel added inline comments. Apr 1 2022, 4:28 AM

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
79	This sounds good to me. That said, I'm not sure how often we'd expect this to be an issue in practice, since, IIUC, this specialized merge only occurs when the value is handled differently in the two branches. So, a series of branches alone won't trigger the exponential behavior.

xazax.hun accepted this revision. Apr 1 2022, 8:00 AM

xazax.hun added inline comments.

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
79	Fair point. It might never cause a problem in practice. Since we already have one proposal how to try to fix this if a problem ever surfaces, I'd love to have a comment about this in the code. But I think this is something that probably does not need to be addressed in the near future.

Add comment about future optimization potential.

ymandel marked an inline comment as done. Apr 1 2022, 8:14 AM

ymandel added inline comments.

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
79	Good idea -- I reformatted essentially sgatev's response above as a comment in the code.

Harbormaster completed remote builds in B157422: Diff 419764. Apr 1 2022, 8:45 AM

ymandel mentioned this in D122830: [clang][dataflow] Add support for (built-in) (in)equality operators. Apr 1 2022, 10:12 AM

Closed by commit rG01db10365e93: [clang][dataflow] Add support for correlation of boolean (tracked) values (authored by ymandel). Apr 1 2022, 10:30 AM

This revision was automatically updated to reflect the committed changes.

ymandel marked an inline comment as done.

ymandel added a commit: rG01db10365e93: [clang][dataflow] Add support for correlation of boolean (tracked) values.

Revision Contents

Path                                                               Size
clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h  4 lines
clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp          53 lines
clang/unittests/Analysis/FlowSensitive/TransferTest.cpp           55 lines

Diff 419797

clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h

@@ 287 lines hidden @@ public:
   /// result. If the given boolean values represent the same value, the result
   /// will be a value that represents the true boolean literal.
   BoolValue &makeIff(BoolValue &LHS, BoolValue &RHS) const {
     return &LHS == &RHS
                ? getBoolLiteralValue(true)
                : makeAnd(makeImplication(LHS, RHS), makeImplication(RHS, LHS));
   }

+  const llvm::DenseSet<BoolValue *> &getFlowConditionConstraints() const {
+    return FlowConditionConstraints;
+  }
+
   /// Adds `Val` to the set of clauses that constitute the flow condition.
   void addToFlowCondition(BoolValue &Val);

   /// Returns true if and only if the clauses that constitute the flow condition
   /// imply that `Val` is true.
   bool flowConditionImplies(BoolValue &Val) const;

 private:
@@ 43 lines hidden @@

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp

@@ 59 lines hidden @@ if (auto *IndVal1 = dyn_cast<IndirectionValue>(Val1)) {
     auto *IndVal2 = cast<IndirectionValue>(Val2);
     assert(IndVal1->getKind() == IndVal2->getKind());
     return &IndVal1->getPointeeLoc() == &IndVal2->getPointeeLoc();
   }

   return Model.compareEquivalent(Type, *Val1, Env1, *Val2, Env2);
 }

+/// Attempts to merge distinct values `Val1` and `Val2` in `Env1` and `Env2`,
+/// respectively, of the same type `Type`. Merging generally produces a single
+/// value that (soundly) approximates the two inputs, although the actual
+/// meaning depends on `Model`.
+static Value *mergeDistinctValues(QualType Type, Value *Val1, Environment &Env1,
+                                  Value *Val2, const Environment &Env2,
+                                  Environment::ValueModel &Model) {
+  // Join distinct boolean values preserving information about the constraints
+  // in the respective path conditions. Note: this construction can, in
+  // principle, result in exponential growth in the size of boolean values.
+  // Potential optimizations may be worth considering. For example, represent
+  // the flow condition of each environment using a bool atom and store, in
+  // `DataflowAnalysisContext`, a mapping of bi-conditionals between flow
+  // condition atoms and flow condition constraints. Something like:
+  // \code
+  //   FC1 <=> C1 ^ C2
+  //   FC2 <=> C2 ^ C3 ^ C4
+  //   FC3 <=> (FC1 v FC2) ^ C5
+  // \endcode
+  // Then, we can track dependencies between flow conditions (e.g. above `FC3`
+  // depends on `FC1` and `FC2`) and modify `flowConditionImplies` to construct
+  // a formula that includes the bi-conditionals for all flow condition atoms in
+  // the transitive set, before invoking the solver.
+  if (auto *Expr1 = dyn_cast<BoolValue>(Val1)) {
+    for (BoolValue *Constraint : Env1.getFlowConditionConstraints()) {
+      Expr1 = &Env1.makeAnd(*Expr1, *Constraint);
+    }
+    auto *Expr2 = cast<BoolValue>(Val2);
+    for (BoolValue *Constraint : Env2.getFlowConditionConstraints()) {
+      Expr2 = &Env1.makeAnd(*Expr2, *Constraint);
+    }
+    return &Env1.makeOr(*Expr1, *Expr2);
+  }
+
+  // FIXME: Consider destroying `MergedValue` immediately if `ValueModel::merge`
+  // returns false to avoid storing unneeded values in `DACtx`.
+  if (Value *MergedVal = Env1.createValue(Type))
+    if (Model.merge(Type, *Val1, Env1, *Val2, Env2, *MergedVal, Env1))
+      return MergedVal;
+
+  return nullptr;
+}
+
 /// Initializes a global storage value.
 static void initGlobalVar(const VarDecl &D, Environment &Env) {
   if (!D.hasGlobalStorage() ||
       Env.getStorageLocation(D, SkipPast::None) != nullptr)
     return;

   auto &Loc = Env.createStorageLocation(D);
   Env.setStorageLocation(D, Loc);
@@ 228 lines hidden @@ for (auto &Entry : OldLocToVal) {
     assert(It->second != nullptr);

     if (equivalentValues(Loc->getType(), Val, *this, It->second, Other,
                          Model)) {
       LocToVal.insert({Loc, Val});
       continue;
     }

-    // FIXME: Consider destroying `MergedValue` immediately if
-    // `ValueModel::merge` returns false to avoid storing unneeded values in
-    // `DACtx`.
-    if (Value *MergedVal = createValue(Loc->getType()))
-      if (Model.merge(Loc->getType(), *Val, *this, *It->second, Other,
-                      *MergedVal, *this))
-        LocToVal.insert({Loc, MergedVal});
+    if (Value *MergedVal = mergeDistinctValues(Loc->getType(), Val, *this,
+                                               It->second, Other, Model))
+      LocToVal.insert({Loc, MergedVal});
   }
   if (OldLocToVal.size() != LocToVal.size())
     Effect = LatticeJoinEffect::Changed;

   FlowConditionConstraints = joinConstraints(DACtx, FlowConditionConstraints,
                                              Other.FlowConditionConstraints);

   return Effect;
@@ 257 lines hidden @@

clang/unittests/Analysis/FlowSensitive/TransferTest.cpp

@@ 29 lines hidden @@
 using namespace clang;
 using namespace dataflow;
 using namespace test;
 using ::testing::_;
 using ::testing::ElementsAre;
 using ::testing::IsNull;
 using ::testing::NotNull;
 using ::testing::Pair;
+using ::testing::SizeIs;

 class TransferTest : public ::testing::Test {
 protected:
   template <typename Matcher>
   void runDataflow(llvm::StringRef Code, Matcher Match,
                    LangStandard::Kind Std = LangStandard::lang_cxx17,
                    bool ApplyBuiltinTransfer = true) {
     ASSERT_THAT_ERROR(
@@ 2,597 lines hidden @@ runDataflow(
                 EXPECT_FALSE(EnvThen.flowConditionImplies(BarValThen));

                 auto &BarValElse =
                     *cast<BoolValue>(EnvElse.getValue(*BarDecl, SkipPast::None));
                 EXPECT_TRUE(EnvElse.flowConditionImplies(BarValElse));
               });
 }

+TEST_F(TransferTest, CorrelatedBranches) {
+  std::string Code = R"(
+    void target(bool B, bool C) {
+      if (B) {
+        return;
+      }
+      (void)0;
+      /*[[p0]]*/
+      if (C) {
+        B = true;
+        /*[[p1]]*/
+      }
+      if (B) {
+        (void)0;
+        /*[[p2]]*/
+      }
+    }
+  )";
+  runDataflow(
+      Code, [](llvm::ArrayRef<
+                   std::pair<std::string, DataflowAnalysisState<NoopLattice>>>
+                   Results,
+               ASTContext &ASTCtx) {
+        ASSERT_THAT(Results, SizeIs(3));
+
+        const ValueDecl *CDecl = findValueDecl(ASTCtx, "C");
+        ASSERT_THAT(CDecl, NotNull());
+
+        {
+          ASSERT_THAT(Results[2], Pair("p0", _));
+          const Environment &Env = Results[2].second.Env;
+          const ValueDecl *BDecl = findValueDecl(ASTCtx, "B");
+          ASSERT_THAT(BDecl, NotNull());
+          auto &BVal = *cast<BoolValue>(Env.getValue(*BDecl, SkipPast::None));
+
+          EXPECT_TRUE(Env.flowConditionImplies(Env.makeNot(BVal)));
+        }
+
+        {
+          ASSERT_THAT(Results[1], Pair("p1", _));
+          const Environment &Env = Results[1].second.Env;
+          auto &CVal = *cast<BoolValue>(Env.getValue(*CDecl, SkipPast::None));
+          EXPECT_TRUE(Env.flowConditionImplies(CVal));
+        }
+
+        {
+          ASSERT_THAT(Results[0], Pair("p2", _));
+          const Environment &Env = Results[0].second.Env;
+          auto &CVal = *cast<BoolValue>(Env.getValue(*CDecl, SkipPast::None));
+          EXPECT_TRUE(Env.flowConditionImplies(CVal));
+        }
+      });
+}
+
 } // namespace
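Why `C` is expected to hold at `p2` in `CorrelatedBranches` can be reproduced with a small truth-table check, independent of the framework. This is a sketch under stated assumptions: `Correlation` and `checkP2ImpliesC` are hypothetical names, `B0` stands for the value of `B` on entry, and a join that loses the correlation is modeled, as an assumption, by an unconstrained atom `Fresh`.

```cpp
// Records whether reaching p2 forces C to be true under each join strategy.
struct Correlation {
  bool WithGuardedJoin; // B joined as (FCThen && true) || (FCElse && B0)
  bool WithFreshAtom;   // B joined as an unconstrained atom (assumption)
};

Correlation checkP2ImpliesC() {
  Correlation R{true, true};
  for (unsigned Bits = 0; Bits < 8; ++Bits) {
    bool B0 = (Bits & 1) != 0;    // Value of B on entry to target().
    bool C = (Bits & 2) != 0;     // Value of the parameter C.
    bool Fresh = (Bits & 4) != 0; // Stand-in for a correlation-losing join.

    bool FCp0 = !B0;          // The early return excludes paths where B holds.
    bool FCThen = FCp0 && C;  // Took `if (C)`; B was set to true.
    bool FCElse = FCp0 && !C; // Skipped `if (C)`; B still has value B0.
    bool FCJoin = FCThen || FCElse;

    // Guarded disjunction of the two branch values of B.
    bool BGuarded = (FCThen && true) || (FCElse && B0);

    // p2 is reached when the joined flow condition holds and B is true.
    if (FCJoin && BGuarded && !C)
      R.WithGuardedJoin = false;
    if (FCJoin && Fresh && !C)
      R.WithFreshAtom = false;
  }
  return R;
}
```

With the guarded join, every assignment that reaches `p2` has `C` true, because the else-branch disjunct requires both `!B0` and `B0` and so can never hold. With an unconstrained atom in place of `B`, the assignment where `B0` and `C` are false and `Fresh` is true reaches `p2` with `C` false, so the implication is lost.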
