In the dataflow analysis SAT solver: simplify formulas during CNF construction and short-circuit
solving when the formula has already been recognized as contradictory.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This simplification makes sense, but if we're adding the layer, we're missing an opportunity to apply it completely.
(Intuitively, I don't see any reason that two passes is "enough")
To do this, we'd want to simplify existing formulas when we learn something new.
Curious whether you think this is worth doing, and why. I don't think I'm opposed to the current version if it solves the practical problem, but I'm concerned we might be adding simplification layers that solve some test cases but are defeated by relatively simple changes like reordering inputs.
For a complete version, I think the algorithm we'd want is something like:
```
clauses_so_far = {}
clauses_to_add = [...]
for clause in clauses_to_add:
    clause = simplify(clause)
    if clause is trivially true or clause in clauses_so_far:
        continue  # adds nothing
    if clause is {lit}:
        # reprocess clauses containing the variable we just resolved
        affected_clauses = [c for c in clauses_so_far
                            if lit in c or negate(lit) in c]
        clauses_so_far -= affected_clauses
        clauses_to_add += affected_clauses
    clauses_so_far.add(clause)
    if clause is trivially false:
        break

def simplify(clause):
    if clause.any(lit => {lit} in clauses_so_far):
        return trivially true
    return clause.filter(lit => not {negate(lit)} in clauses_so_far)
```
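As a sanity check, here is a runnable Python sketch of that fixed-point algorithm. This is only an illustration of the idea, not the patch's actual C++ implementation; the encoding (clauses as `frozenset`s of nonzero ints, with `-x` as the negation of `x`) and the function name are my own.

```python
def simplify_to_fixpoint(clauses):
    """Simplify a clause set to a fixed point.

    Clauses are frozensets of nonzero ints; -x is the negation of x.
    Returns (clauses_so_far, contradictory).
    """
    clauses_so_far = set()
    to_add = [frozenset(c) for c in clauses]
    while to_add:
        clause = to_add.pop()
        # simplify() against the unit clauses known so far:
        if any(frozenset([lit]) in clauses_so_far for lit in clause):
            continue  # trivially true, adds nothing
        clause = frozenset(lit for lit in clause
                           if frozenset([-lit]) not in clauses_so_far)
        if not clause:
            return clauses_so_far, True  # trivially false: contradiction
        if clause in clauses_so_far:
            continue  # adds nothing
        if len(clause) == 1:
            (lit,) = clause
            # reprocess clauses containing the variable we just resolved
            affected = {c for c in clauses_so_far if lit in c or -lit in c}
            clauses_so_far -= affected
            to_add.extend(affected)
        clauses_so_far.add(clause)
    return clauses_so_far, False
```

On the chain D, Av!B, Bv!C, Cv!D (encoded as A=1 .. D=4), this reduces everything to the units A, B, C, D in one run, regardless of input order, because unit discoveries re-enqueue the affected clauses.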
So the builder data structure (clauses_so_far) needs to support:
- test for clause presence
- remove and return clauses containing a literal
I can't come up with anything particularly clever, but throwing some hashtables at it always works...
(A more powerful version would recognize that AvB can simplify AvBvC and so do removal by subset rather than a single literal, but that seems even further beyond scope)
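For concreteness, the subset-based removal mentioned above is clause subsumption: C1 subsumes C2 iff C1 is a subset of C2, in which case C2 is redundant. A minimal (and deliberately naive, quadratic) sketch using my own frozenset encoding:

```python
def subsumed(clauses):
    """Return the clauses made redundant by some smaller clause.

    A clause c1 subsumes c2 iff c1 is a proper subset (or a distinct
    subset) of c2; c2 can then be dropped without changing satisfiability.
    """
    cs = [frozenset(c) for c in clauses]
    return {c2 for c2 in cs
            for c1 in cs
            if c1 != c2 and c1 <= c2}
```

The all-pairs subset test is what makes this "even further beyond scope": done naively it is O(n^2) in the number of clauses, versus the single-literal lookups the builder otherwise needs.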
clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp

- Line 179: It would be useful to briefly describe what kinds of simplifications (this drives, e.g., the fact that we use it twice). AIUI: it tracks variables that must be true/false, based on trivial clauses seen so far. Such variables can be dropped from subsequently added clauses, which may render those clauses trivial and give us more known variables.
- Line 195: You're already testing this in the higher-level loop, so checking on every call to addClause doesn't seem to actually save anything significant. IMO it makes responsibilities less clear, so I'd prefer to drop it here and in addClauseLiterals.
- Line 203: nit: capitalize variable names here & elsewhere.
- Line 426: The issue is that info only propagates forward (earlier to later clauses), right? So by running this again, and sorting units first, we allow simplifications that propagate info backwards *once*, but we still don't have all simplifications:

  ```
  D  Av!B  Bv!C  Cv!D
  // first simplification pass
  Av!B  Bv!C  C         // hoist new unit
  // second simplification pass
  Av!B  B               // hoist new unit
  // third simplification pass
  A
  ```

  I think this is worth being explicit about: we're going to find some more simplifications, but we won't find them all, because running this to a fixed point is too inefficient. Is 2 experimentally determined to be the right number of passes? A guess? Or am I misunderstanding :-)
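The pass-count question above can be simulated in a few lines of Python. This is a toy model, not the patch's code: one "pass" sorts unit clauses to the front and drops literals whose negation is a known unit, and the encoding (A=1 .. D=4, `-x` as negation) is my own. The D, Av!B, Bv!C, Cv!D chain indeed needs three passes before A becomes a unit:

```python
def one_pass(clauses):
    """One simplification pass: units first, then drop falsified literals."""
    out, units = [], set()
    for c in sorted(clauses, key=len):  # process unit clauses first
        c = frozenset(l for l in c if -l not in units)
        if len(c) == 1:
            units |= c  # newly hoisted unit, visible later in this pass
        out.append(c)
    return out

# D, Av!B, Bv!C, Cv!D with A=1, B=2, C=3, D=4:
clauses = [frozenset(c) for c in ({4}, {1, -2}, {2, -3}, {3, -4})]
passes = 0
while not all(len(c) == 1 for c in clauses):
    clauses = one_pass(clauses)
    passes += 1
# passes == 3: information flows backwards only one step per pass.
```

Each pass resolves exactly one link of the chain, which is the reviewer's point: a fixed number of passes bounds the work but also bounds how far information can travel backwards.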
Thanks for the review. PTAL.
clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp

- Line 426: You are right that one could do more work, but it is better to leave that to the solver algorithm. We know empirically that there will be a few unit clauses, so we might as well spend *linear time* (in the number of unit clauses) to save some work. This won't be enough to determine whether all formulas are satisfiable, but it catches a few obvious contradictions. Doing this twice (as opposed to once) catches more formulas that are obvious contradictions in our unit tests and some real sources. I picked two simply because when we obtain unit clauses "later", we had no opportunity to apply them to earlier clauses. Doing full-blown mutations seems more complicated, especially given that the Clauses data structure was written for the actual solver algorithm. I think your concern about optimizing for a certain pattern of input formulas, which may well change in the future, is valid; therefore one should leave the "real" solving work to the solver algorithm, which systematically explores all cases.
clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp

- Lines 153–155: lits => Lits
- Line 184: I'm confused about this comment: the preprocessing looks like O(N): we process every clause once, and addClauseLiterals() runs in constant time. What am I missing?
- Line 202: If you're going to form an ArrayRef in any case, might as well skip this indirection and have the call sites pass addClause({L1, L2}) or so?
- Line 221: literal => L or so
- Line 259: IsKnownContradictory => isKnownContradictory
Applied reviewer comments.
clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp

- Line 184: I am sorry for the misleading text: I only talked about the addClause complexity, and I estimated the lookup at a conservative O(log K) worst-case complexity. That is simply wrong, since we can expect lookup to be on a hash table: O(1) on average and O(K) in the worst case. Changed to O(N).
- Line 202: Thanks, this looks much nicer. I chose to preserve the assert condition checking that there are <= 3 literals. I believe the solver may work with clauses that have more literals, but I don't know whether any trade-offs were made to focus on 3-SAT in the solver.