This is an archive of the discontinued LLVM Phabricator instance.

[clang][dataflow] Add SAT solver interface and implementation
ClosedPublic

Authored by sgatev on Feb 21 2022, 3:52 PM.

Details

Summary

This is part of the implementation of the dataflow analysis framework.
See "[RFC] A dataflow analysis framework for Clang AST" on cfe-dev.

The solver will be used in a follow-up patch to check validity of flow
conditions.

Diff Detail

Event Timeline

sgatev created this revision.Feb 21 2022, 3:52 PM
sgatev requested review of this revision.Feb 21 2022, 3:52 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 21 2022, 3:52 PM
sgatev edited the summary of this revision. (Show Details)Feb 21 2022, 3:59 PM
ymandel accepted this revision.Feb 21 2022, 6:28 PM
ymandel added inline comments.
clang/include/clang/Analysis/FlowSensitive/Solver.h
39

Which Result is expected if the Solver gives up, assuming that's allowed in this API.

clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp
193
373

Why this constraint?

456

nit: ++I, for consistency?

529
594
This revision is now accepted and ready to land.Feb 21 2022, 6:28 PM
sgatev updated this revision to Diff 410460.Feb 21 2022, 11:55 PM
sgatev marked 6 inline comments as done.

Address reviewers' comments.

clang/include/clang/Analysis/FlowSensitive/Solver.h
39

Added a TimedOut result for such cases.

clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp
373

That's because solve must be called at most once for WatchedLiteralsSolverImpl - it initializes its data structures in the constructor and then mutates them in solve.

Hi!

Just a quick high level question. Did you look into reusing existing SAT solvers like miniSAT? What was the main reason for rolling our own instead of picking something up off the selves?

Also, LLVM already has a SMT API (https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Support/SMTAPI.h). I wonder if it would make sense to have a SAT base class for the SMT API and reuse that here? Specifically, because solvers could be useful for all of the frontends and also optimizations in the backends, it would be great to make sure everyone can benefit from these features. What do you think?

Note that, there are already precedents for Clang using solver tech from LLVM. E.g., the clang static analyzer's Z3-based refutation support is using the SMT API in LLVM, and there are experiments integrating yet another solver with the SA: https://reviews.llvm.org/D110125

Did you look into reusing existing SAT solvers like miniSAT? What was the main reason for rolling our own instead of picking something up off the shelves?

Mainly to provide a lightweight out-of-the-box alternative. The solver interface is simple so one should be able to roll out an implementation for their favorite solver with little effort. One concrete use case that I have in mind for this solver implementation is unit tests, but we've also used it successfully to analyze a large codebase.

I wonder if it would make sense to have a SAT base class for the SMT API and reuse that here?

I think that on a high level we can either 1) integrate the SMT API types deeply into the dataflow framework and use that solver interface directly or 2) have a layer that translates from the dataflow BoolValue types to the SMT API types. At this point I'm not convinced that we should go for 1). For 2) we can simply provide an adapter that implements the dataflow solver API using the SMT API. This way the dataflow framework can be used with SMT API-compatible solvers. What do you think?

xazax.hun added a comment.EditedFeb 22 2022, 11:25 AM

I wonder if it would make sense to have a SAT base class for the SMT API and reuse that here?

I think that on a high level we can either 1) integrate the SMT API types deeply into the dataflow framework and use that solver interface directly or 2) have a layer that translates from the dataflow BoolValue types to the SMT API types. At this point I'm not convinced that we should go for 1).

Could you elaborate on what would be the main disadvantage of using 1)? (The advantage would be that we could very easily switch to Z3 any time.)

I wonder if it would make sense to have a SAT base class for the SMT API and reuse that here?

I think that on a high level we can either 1) integrate the SMT API types deeply into the dataflow framework and use that solver interface directly or 2) have a layer that translates from the dataflow BoolValue types to the SMT API types. At this point I'm not convinced that we should go for 1).

Could you elaborate on what would be the main disadvantage of using 1)? (The advantage would be that we could very easily switch to Z3 any time.)

Let me clarify 1) because I see two options there as well:

1.a) We replace the BoolValue hierarchy in the dataflow framework with SMTExprRef.
This feels too committal. The dataflow interfaces are still not sufficiently developed and there isn't a clear benefit (e.g. performance) with this approach so I'd rather not couple them for now. There might also be benefits to having a structured representation (as opposed to using opaque values such as SMTExprRef) that we need to consider. I think this is a viable long-term option, though.

1.b) We add an SMTExprRef member to each boolean value (atom, conjunction, disjunction, and negation) in the dataflow framework.
I like this better than 1.a). The main disadvantage I see is that SMTSolver is a huge interface and on the surface it seems tailored to the Z3 API. From that perspective it also feels more committal than necessary, given that we can make do with a simple one-method interface for now. I think this is a good long-term option if we think the SMT API is the most appropriate solver interface.

I find 2) least committal as it keeps the dataflow framework and the solver separate. For a simple integration with Z3 I can provide an implementation of clang::dataflow::Solver using the SMT API. That should boil down to translating dataflow types to SMT types.

What do you think about this? Any other options we should consider?

it seems tailored to the Z3 API.

As far as I understand, there were downstream patches that used the same API with other SMT solvers. The authors did not upstream it because they did not see any major improvement on Z3 for any of the use cases.

What do you think about this? Any other options we should consider?

After reading your analysis, I'm fine with the current approach.

xazax.hun accepted this revision.Feb 24 2022, 12:18 PM

Overall, looks good to me. Some nits inline.

Are there plans to get a model/assignment of the variables from the solver? That could be helpful for generating warning messages in the future :)

clang/include/clang/Analysis/FlowSensitive/Solver.h
40

= default? To make this trivially destructible.

clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp
160

Shouldn't we make this function variadic (or taking a container) instead of having multiple overloads?

214

Switch on the kind instead? That way DisjunctionValue and ConjunctionValue could be handled by the same case.

242

While I get that this should be empty before this loop due to the loop condition at the previous use, but I find such code confusing. I'd prefer to either use a separate queue or have at least an assert to remind us that this one should be a clean container before the loop.

356

Shouldn't we call the right ctor directly in the initialization list instead of resizing a default constructed vector?

sgatev updated this revision to Diff 411400.Feb 25 2022, 6:18 AM
sgatev marked 5 inline comments as done.

Address reviewers' comments.

Are there plans to get a model/assignment of the variables from the solver? That could be helpful for generating warning messages in the future :)

Absolutely! That was only discussed briefly so far. One challenge would be distilling this model to present only relevant information to the user. That could also be best effort, of course. It's certainly something I think we should provide at some point.

clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp
160

I'd like to avoid containers here to minimize allocations. I'd love to use a fold expression, but that's C++ 17. I opted for setting default values for the last two arguments.

214

Unfortunately, I don't think we can handle them in the same case because they don't share a parent that represents a binary operation. Perhaps an improvement worth considering, as we discussed in one of the previous patches.

242

I agree that this would be an improvement. I opted for narrowing the scope of the first container and adding a second container for this operation.

356

Makes complete sense. Updated.

This revision was landed with ongoing or failed builds.Feb 25 2022, 6:48 AM
This revision was automatically updated to reflect the committed changes.