This is an archive of the discontinued LLVM Phabricator instance.

[clang][dataflow] Support limits on the SAT solver to force timeouts.
ClosedPublic

Authored by ymandel on Jun 12 2023, 10:45 AM.

Details

Summary

This patch allows the client of a WatchedLiteralsSolver to specify a
computation limit on the use of the solver. After the limit is exhausted, the
SAT solver times out.

Fixes issue #60265.

Diff Detail

Event Timeline

ymandel created this revision. · Jun 12 2023, 10:45 AM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: martong. · View Herald Transcript
ymandel requested review of this revision. · Jun 12 2023, 10:45 AM
Herald added a project: Restricted Project. · View Herald Transcript · Jun 12 2023, 10:45 AM
xazax.hun accepted this revision. · Jun 12 2023, 10:55 AM

Huge +1, I think most solvers need to have some resource limits in place, as the runtime can explode. I am just not 100% sure what the best approach is here: putting a global limit on the solver instance vs. having a limit per query. Any thoughts on that?

This revision is now accepted and ready to land. · Jun 12 2023, 10:55 AM

Huge +1, I think most solvers need to have some resource limits in place, as the runtime can explode. I am just not 100% sure what the best approach is here: putting a global limit on the solver instance vs. having a limit per query. Any thoughts on that?

Excellent question. Ultimately what matters for a user is the global limit. So, in that sense, a global limit makes sense. But, it also makes it harder (in principle) to pinpoint the problem, because you could have it time out on a small query right after a large query that was actually responsible for consuming the resources. That said, I'm guessing in practice it's binary, because of the exponential: either a single call exhausts all resources or it barely uses them. I suspect we'll ~never hit the case I described. So, I'm inclined to start global (to keep it simple) and then refine if necessary. As you probably noticed, this patch actually has both -- the user specifies the global limit, but the implementation is local. So, changing this would be easy.

That said, I should note that it's not global for a TU - just for a function, at least given the way we currently implement our clang-tidy checks. So, that seems a reasonable compromise.

WDYT?

kinu added a comment. · Jun 12 2023, 11:09 AM

LGTM as well. I was initially thinking about having a local limit per query (which could make it easier to pinpoint the particular query that explodes), but a per-solver-instance limit could make sense as a starter too.

Ultimately what matters for a user is the global limit.

I am not 100% sure about that. While it is true that the user cares about the process not hanging, global vs local limits can have observable effects on the analysis results. With a global limit, after a query has exhausted all the budget, for all intents and purposes we continue the analysis without a solver for the rest of the function, and all queries would just time out, even the simple ones. With a local limit, the solver might time out for a couple of queries, but we keep the precision for the simple queries. That being said, it is possible that the scenario where we have a few big queries that blow the solver up while the rest of them are simple just does not happen that much. Also, a local timeout produces less reliable worst-case runtime results. This makes me think we might want both, but this decision is probably better made when we have some evidence that we actually need both. So, I am ok with committing this as is for now.

Ultimately what matters for a user is the global limit.

I am not 100% sure about that. While it is true that the user cares about the process not hanging, global vs local limits can have observable effects on the analysis results. With a global limit, after a query has exhausted all the budget, for all intents and purposes we continue the analysis without a solver for the rest of the function, and all queries would just time out, even the simple ones. With a local limit, the solver might time out for a couple of queries, but we keep the precision for the simple queries. That being said, it is possible that the scenario where we have a few big queries that blow the solver up while the rest of them are simple just does not happen that much. Also, a local timeout produces less reliable worst-case runtime results. This makes me think we might want both, but this decision is probably better made when we have some evidence that we actually need both. So, I am ok with committing this as is for now.

Great! Yes, I think you're right that having both is probably the ideal solution. Let's start here, but that will be an easy step if and when we need it.

This revision was landed with ongoing or failed builds. · Jun 12 2023, 11:35 AM
This revision was automatically updated to reflect the committed changes.
gribozavr2 added inline comments.
clang/include/clang/Analysis/FlowSensitive/WatchedLiteralsSolver.h
37

Consider renaming to RemainingWorkUnits

42
clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp
462

Why not add a separate getter for the remaining work amount?

Huge +1, I think most solvers need to have some resource limits in place, as the runtime can explode. I am just not 100% sure what the best approach is here: putting a global limit on the solver instance vs. having a limit per query. Any thoughts on that?

Excellent question. Ultimately what matters for a user is the global limit. So, in that sense, a global limit makes sense. But, it also makes it harder (in principle) to pinpoint the problem, because you could have it time out on a small query right after a large query that was actually responsible for consuming the resources. That said, I'm guessing in practice it's binary, because of the exponential: either a single call exhausts all resources or it barely uses them. I suspect we'll ~never hit the case I described. So, I'm inclined to start global (to keep it simple) and then refine if necessary. As you probably noticed, this patch actually has both -- the user specifies the global limit, but the implementation is local. So, changing this would be easy.

I'm a bit late to this discussion but still wanted to chime in.

I would actually argue that a local limit more accurately reflects what we want to limit. The functions we analyze will be distributed across a fairly broad range of size and complexity. It seems reasonable to allow more resources to be used to analyze a large and complex function than a small and simple function, and I think this is aligned with users' expectations. So I think it would be reasonable to allow the analysis to use an amount of resources that's proportional to the number of solve() invocations; we just want to limit the amount of resources consumed in a given solve() invocation.

I do follow your argument that "local versus global" likely won't make much of a difference in practice -- the number of solve() invocations is polynomial in the size of the function (I believe?), and that pales against the exponential blowup that can potentially occur inside solve().

However, I don't follow the argument that you want to "start global (to keep it simple)". I think a "local" limit would be simpler: WatchedLiteralsSolverImpl::solve() wouldn't need to return the final value of its parameter MaxIterations, and WatchedLiteralsSolver::solve() wouldn't need to write that back into its member variable MaxIterations (which would instead be const).

I don't think, in any case, that we should have a global _and_ a local limit -- that would really be overcomplicating things.

clang/unittests/Analysis/FlowSensitive/SolverTest.cpp
378

Do we really need such a complex formula to test this? Couldn't we make the formula simpler (can we get as simple as "a && !a"?) and reduce the maximum number of iterations accordingly so we still get a timeout?

ymandel marked 3 inline comments as done. · Jun 12 2023, 2:44 PM

Huge +1, I think most solvers need to have some resource limits in place, as the runtime can explode. I am just not 100% sure what the best approach is here: putting a global limit on the solver instance vs. having a limit per query. Any thoughts on that?

Excellent question. Ultimately what matters for a user is the global limit. So, in that sense, a global limit makes sense. But, it also makes it harder (in principle) to pinpoint the problem, because you could have it time out on a small query right after a large query that was actually responsible for consuming the resources. That said, I'm guessing in practice it's binary, because of the exponential: either a single call exhausts all resources or it barely uses them. I suspect we'll ~never hit the case I described. So, I'm inclined to start global (to keep it simple) and then refine if necessary. As you probably noticed, this patch actually has both -- the user specifies the global limit, but the implementation is local. So, changing this would be easy.

I'm a bit late to this discussion but still wanted to chime in.

At a high level, I really don't care much -- I just want one mechanism that works well enough. With that said, here are my (weak) arguments in favor of global vs local. If you feel strongly though, feel free to push back and I'll change it (I have some small things to fix anyhow based on Dmitri's comments) or even just send a patch.

I would actually argue that a local limit more accurately reflects what we want to limit. The functions we analyze will be distributed across a fairly broad range of size and complexity. It seems reasonable to allow more resources to be used to analyze a large and complex function than a small and simple function, and I think this is aligned with users' expectations. So I think it would be reasonable to allow the analysis to use an amount of resources that's proportional to the number of solve() invocations; we just want to limit the amount of resources consumed in a given solve() invocation.

I agree about resource usage, but this is about a ceiling. Like the timeouts that we place on our clang-tidy invocations and elsewhere, we're looking for a cap. This lets you cap the total, regardless of size. If you want to account for larger functions, just set the cap higher. I'd say that a local mechanism, then, is basically a way to be more restrictive on smaller functions, which raises the question: what is the particular benefit?

I do follow your argument that "local versus global" likely won't make much of a difference in practice -- the number of solve() invocations is polynomial in the size of the function (I believe?), and that pales against the exponential blowup that can potentially occur inside solve().

Yeah, that's the key and drives the decision here.

However, I don't follow the argument that you want to "start global (to keep it simple)". I think a "local" limit would be simpler: WatchedLiteralsSolverImpl::solve() wouldn't need to return the final value of its parameter MaxIterations, and WatchedLiteralsSolver::solve() wouldn't need to write that back into its member variable MaxIterations (which would instead be const).

True, the code would be simpler, and maybe we should just go with this. What I meant is that it is simpler in terms of tuning, since the user is setting the cap for the function. I find the predictability of a total cap simpler to reason about.

I don't think, in any case, that we should have a global _and_ a local limit -- that would really be overcomplicating things.

I actually think you just made the case for both: these accomplish different things. So, if it turns out we want both global caps (for catastrophes) and function-proportional limits, then two limits make sense. It doesn't make sense at the outset, for sure -- just if we need it, at which point by definition we need it. :)

clang/include/clang/Analysis/FlowSensitive/WatchedLiteralsSolver.h
37

I like that. Will do in followup.

42

ack (for followup patch).

clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp
462

It's a different object -- WatchedLiteralsSolverImpl (here) vs WatchedLiteralsSolver (where the field MaxIterations is located).

clang/unittests/Analysis/FlowSensitive/SolverTest.cpp
378

No, I simply copied this from elsewhere. I just wanted an example that had a nontrivial number of variables. I think it's worth commenting that it's an arbitrary choice, but I don't see particular value in trying to fine-tune this for simplicity. But if you have some argument for it, by all means.