This is an archive of the discontinued LLVM Phabricator instance.

[clang][dataflow] Support limits on the SAT solver to force timeouts.
ClosedPublic

Authored by ymandel on Jun 12 2023, 10:45 AM.

Details

Summary

This patch allows the client of a WatchedLiteralsSolver to specify a
computation limit on the use of the solver. After the limit is exhausted, the
SAT solver times out.

Fixes issue #60265.

Diff Detail

Event Timeline

ymandel created this revision. · Jun 12 2023, 10:45 AM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: martong. · View Herald Transcript
ymandel requested review of this revision. · Jun 12 2023, 10:45 AM
Herald added a project: Restricted Project. · View Herald Transcript · Jun 12 2023, 10:45 AM
xazax.hun accepted this revision. · Jun 12 2023, 10:55 AM

Huge +1, I think most solvers need to have some resource limits in place, as the runtime can explode. I am just not 100% sure what the best approach is here: putting a global limit on the solver instance vs. having a limit per query. Any thoughts on that?

This revision is now accepted and ready to land. · Jun 12 2023, 10:55 AM

Huge +1, I think most solvers need to have some resource limits in place, as the runtime can explode. I am just not 100% sure what the best approach is here: putting a global limit on the solver instance vs. having a limit per query. Any thoughts on that?

Excellent question. Ultimately what matters for a user is the global limit. So, in that sense, a global limit makes sense. But, it also makes it harder (in principle) to pinpoint the problem, because you could have it time out on a small query right after a large query that was actually responsible for consuming the resources. That said, I'm guessing in practice it's binary, because of the exponential: either a single call exhausts all resources or it barely uses them. I suspect we'll ~never hit the case I described. So, I'm inclined to start global (to keep it simple) and then refine if necessary. As you probably noticed, this patch actually has both -- the user specifies the global limit, but the implementation is local. So, changing this would be easy.

That said, I should note that it's not global for a TU - just for a function, at least given the way we currently implement our clang-tidy checks. So, that seems a reasonable compromise.

WDYT?

kinu added a comment. · Jun 12 2023, 11:09 AM

LGTM as well. I was initially thinking about having a local limit per query (which could make it easier to pinpoint the particular query that explodes), but a per-solver-instance limit could make sense as a starter too.

Ultimately what matters for a user is the global limit.

I am not 100% sure about that. While it is true that the user cares about the process not hanging, global vs local limits can have observable effects on the analysis results. With a global limit, after a query has exhausted all the budget, for all intents and purposes we continue the analysis without a solver for the rest of the function, and all queries would just time out, even the simple ones. With a local limit, the solver might time out for a couple of queries, but we keep the precision for the simple queries. That being said, it is possible that the scenario where we have a few big queries that blow the solver up while the rest of them are simple just does not happen that much. Also, a local timeout produces less reliable worst-case runtime results. This makes me think we might want both, but this decision is probably better made when we have some evidence that we actually need both. So, I am ok with committing this as is for now.

Ultimately what matters for a user is the global limit.

I am not 100% sure about that. While it is true that the user cares about the process not hanging, global vs local limits can have observable effects on the analysis results. With a global limit, after a query has exhausted all the budget, for all intents and purposes we continue the analysis without a solver for the rest of the function, and all queries would just time out, even the simple ones. With a local limit, the solver might time out for a couple of queries, but we keep the precision for the simple queries. That being said, it is possible that the scenario where we have a few big queries that blow the solver up while the rest of them are simple just does not happen that much. Also, a local timeout produces less reliable worst-case runtime results. This makes me think we might want both, but this decision is probably better made when we have some evidence that we actually need both. So, I am ok with committing this as is for now.

Great! Yes, I think you're right that having both is probably the ideal solution. Let's start here, but that will be an easy step if and when we need it.

This revision was landed with ongoing or failed builds. · Jun 12 2023, 11:35 AM
This revision was automatically updated to reflect the committed changes.
gribozavr2 added inline comments.
clang/include/clang/Analysis/FlowSensitive/WatchedLiteralsSolver.h
37

Consider renaming to RemainingWorkUnits

42
clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp
462

Why not add a separate getter for the remaining work amount?

Huge +1, I think most solvers need to have some resource limits in place, as the runtime can explode. I am just not 100% sure what the best approach is here: putting a global limit on the solver instance vs. having a limit per query. Any thoughts on that?

Excellent question. Ultimately what matters for a user is the global limit. So, in that sense, a global limit makes sense. But, it also makes it harder (in principle) to pinpoint the problem, because you could have it time out on a small query right after a large query that was actually responsible for consuming the resources. That said, I'm guessing in practice it's binary, because of the exponential: either a single call exhausts all resources or it barely uses them. I suspect we'll ~never hit the case I described. So, I'm inclined to start global (to keep it simple) and then refine if necessary. As you probably noticed, this patch actually has both -- the user specifies the global limit, but the implementation is local. So, changing this would be easy.

I'm a bit late to this discussion but still wanted to chime in.

I would actually argue that a local limit more accurately reflects what we want to limit. The functions we analyze will be distributed across a fairly broad range of size and complexity. It seems reasonable to allow more resources to be used to analyze a large and complex function than a small and simple function, and I think this is aligned with users' expectations. So I think it would be reasonable to allow the analysis to use an amount of resources that's proportional to the number of solve() invocations; we just want to limit the amount of resources consumed in a given solve() invocation.

I do follow your argument that "local versus global" likely won't make much of a difference in practice -- the number of solve() invocations is polynomial in the size of the function (I believe?), and that pales against the exponential blowup that can potentially occur inside solve().

However, I don't follow the argument that you want to "start global (to keep it simple)". I think a "local" limit would be simpler: WatchedLiteralsSolverImpl::solve() wouldn't need to return the final value of its parameter MaxIterations, and WatchedLiteralsSolver::solve() wouldn't need to write that back into its member variable MaxIterations (which would instead be const).

I don't think, in any case, that we should have a global _and_ a local limit -- that would really be overcomplicating things.

clang/unittests/Analysis/FlowSensitive/SolverTest.cpp
378

Do we really need such a complex formula to test this? Couldn't we make the formula simpler (can we get as simple as "a && !a"?) and reduce the maximum number of iterations accordingly so we still get a timeout?

ymandel marked 3 inline comments as done. · Jun 12 2023, 2:44 PM

Huge +1, I think most solvers need to have some resource limits in place, as the runtime can explode. I am just not 100% sure what the best approach is here: putting a global limit on the solver instance vs. having a limit per query. Any thoughts on that?

Excellent question. Ultimately what matters for a user is the global limit. So, in that sense, a global limit makes sense. But, it also makes it harder (in principle) to pinpoint the problem, because you could have it time out on a small query right after a large query that was actually responsible for consuming the resources. That said, I'm guessing in practice it's binary, because of the exponential: either a single call exhausts all resources or it barely uses them. I suspect we'll ~never hit the case I described. So, I'm inclined to start global (to keep it simple) and then refine if necessary. As you probably noticed, this patch actually has both -- the user specifies the global limit, but the implementation is local. So, changing this would be easy.

I'm a bit late to this discussion but still wanted to chime in.

At a high level, I really don't care much -- I just want one mechanism that works well enough. With that said, here are my (weak) arguments in favor of global vs local. If you feel strongly though, feel free to push back and I'll change it (I have some small things to fix anyhow based on Dmitri's comments) or even just send a patch.

I would actually argue that a local limit more accurately reflects what we want to limit. The functions we analyze will be distributed across a fairly broad range of size and complexity. It seems reasonable to allow more resources to be used to analyze a large and complex function than a small and simple function, and I think this is aligned with users' expectations. So I think it would be reasonable to allow the analysis to use an amount of resources that's proportional to the number of solve() invocations; we just want to limit the amount of resources consumed in a given solve() invocation.

I agree about resource usage, but this is about a ceiling. Like the timeouts that we place on our clang-tidy invocations and elsewhere, we're looking for a cap. This lets you cap the total, regardless of size. If you want to account for larger functions, just set the cap higher. I'd say that a local mechanism, then, is basically a way to be more restrictive on smaller functions, which raises the question: what is the particular benefit?

I do follow your argument that "local versus global" likely won't make much of a difference in practice -- the number of solve() invocations is polynomial in the size of the function (I believe?), and that pales against the exponential blowup that can potentially occur inside solve().

Yeah, that's the key and drives the decision here.

However, I don't follow the argument that you want to "start global (to keep it simple)". I think a "local" limit would be simpler: WatchedLiteralsSolverImpl::solve() wouldn't need to return the final value of its parameter MaxIterations, and WatchedLiteralsSolver::solve() wouldn't need to write that back into its member variable MaxIterations (which would instead be const).

True, the code would be simpler, and maybe we should just go with this. What I meant is that it is simpler in terms of tuning, since the user is setting the cap for the function. I find the predictability of a total cap simpler to reason about.

I don't think, in any case, that we should have a global _and_ a local limit -- that would really be overcomplicating things.

I actually think you just made the case for both: these accomplish different things. So, if it turns out we want both global caps (for catastrophes) and function-proportional limits, then two limits make sense. It doesn't make sense at the outset, for sure -- just if we need it, at which point by definition we need it. :)

clang/include/clang/Analysis/FlowSensitive/WatchedLiteralsSolver.h
37

I like that. Will do in followup.

42

ack (for followup patch).

clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp
462

It's a different object -- WatchedLiteralsSolverImpl (here) vs WatchedLiteralsSolver (where the field MaxIterations is located).

clang/unittests/Analysis/FlowSensitive/SolverTest.cpp
378

No, I simply copied this from elsewhere. I just wanted an example that had a nontrivial number of variables. I think it's worth commenting that it's an arbitrary choice, but I don't see particular value in trying to fine-tune this for simplicity. But if you have some argument for it, by all means.