This is an archive of the discontinued LLVM Phabricator instance.

[clang][dataflow] #llvm #flow-analysis Simplify formula at CNF construction time, and short-cut solving of known contradictory formulas.
ClosedPublic

Authored by burakemir on Aug 21 2023, 4:15 AM.

Details

Summary

In dataflow analysis, SAT solver: simplify formula during CNF construction and short-cut
solving when the formula has been recognized as contradictory.

Diff Detail

Event Timeline

burakemir created this revision.Aug 21 2023, 4:15 AM
Herald added a project: Restricted Project. · View Herald Transcript
burakemir requested review of this revision.Aug 21 2023, 4:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 21 2023, 4:15 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
burakemir edited reviewers, added: sammccall, xazax.hun; removed: NoQ.Aug 21 2023, 4:19 AM

This simplification makes sense, but if we're adding the layer, we're missing an opportunity to apply it completely.
(Intuitively, I don't see any reason that two passes is "enough")
To do this, we'd want to simplify existing formulas when we learn something new.

Curious whether you think this is worth doing and why - I don't think I'm opposed to the current version if it solves the practical problem, but I am concerned we might be adding simplification layers that solve some test cases but are defeated by relatively simple changes like reordering inputs.


For a complete version, I think the algorithm we'd want is something like:

clauses_so_far = {}
clauses_to_add = [...]

for clause in clauses_to_add:
  clause = simplify(clause)
  if clause is trivially true or clause in clauses_so_far:
    continue # adds nothing
  if clause is {lit}:
    # reprocess clauses containing the variable we just resolved
    affected_clauses = [c in clauses_so_far if lit in c or negate(lit) in c]
    clauses_so_far -= affected_clauses
    clauses_to_add += affected_clauses
  clauses_so_far.add(clause)
  if clause is trivially false:
    break

def simplify(clause):
  if clause.any(lit => {lit} in clauses_so_far):
    return trivially true 
  return clause.filter(lit => not {negate(lit)} in clauses_so_far)

So the builder data structure (clauses_so_far) needs to support:

  • test for clause presence
  • remove and return clauses containing a literal

I can't come up with anything particularly clever, but throwing some hashtables at it always works...

(A more powerful version would recognize that AvB can simplify AvBvC and so do removal by subset rather than a single literal, but that seems even further beyond scope)

clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp
179

It would be useful to briefly describe what kind of simplifications. (this drives e.g. the fact we use it twice).

AIUI:

It tracks variables that must be true/false, based on trivial clauses seen so far.
Such variables can be dropped from subsequently-added clauses, which
may render such clauses trivial and give us more known variables.
195

you're already testing this in the higher-level loop, so checking on every call to addClause doesn't seem to actually save anything significant.

IMO it makes responsibilities less clear, so I'd prefer to drop it here and addClauseLiterals

203

nit: capitalize variable names here & elsewhere

426

the issue is that info only propagates forward (earlier to later clauses, right?)

so by running this again, and sorting units first, we allow simplifications that propagate info backwards *once*, but we still don't have all simplifications.

D
Av!B
Bv!C
Cv!D

// first simplification pass

Av!B
Bv!C
C // hoist new unit

// second simplification pass

Av!B
B // hoist new unit

// third simplification pass
A

I think this is worth being explicit about: we're going to find some more simplifications, but we won't find them all, because running this to fixed point is too inefficient.

Is 2 experimentally determined to be the right number of passes? a guess? or am I misunderstanding :-)

burakemir retitled this revision from #llvm #flow-analysis Simplify formula at CNF construction time, and short-cut solving of known contradictory formulas. to [clang][dataflow] #llvm #flow-analysis Simplify formula at CNF construction time, and short-cut solving of known contradictory formulas..Aug 28 2023, 1:18 AM
burakemir updated this revision to Diff 554040.Aug 28 2023, 1:43 PM
burakemir marked 2 inline comments as done.

Addressing reviewer comments.

Thanks for the review. PTAL.

clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp
426

You are right that one could do more work but it is better to leave this to the solver algorithm.

We know empirically that there will be a few unit clauses, so might as well spend *linear time* (in number of unit clauses) to save some work.

This won't be enough to determine whether all formulas are satisfiable, but it catches a few obvious contradictions.

Doing this twice (as opposed to once) catches more formulas that are obvious contradictions in our unit tests and some real sources. I picked two simply because when we obtain unit clauses "later", we had no opportunity to apply them to earlier clauses.

Doing full-blow mutations seems more complicated, esp. given that the Clauses data structure has been written for the actual solver algorithm. I think your concern on optimizing for a certain pattern of input formulas, which may well change in the future, is valid; therefore one should leave the "real" solving work to the solver algorithm, which systematically explores all cases.

sammccall accepted this revision.Aug 31 2023, 2:01 PM
sammccall added inline comments.
clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp
153–155

lits => Lits

184

I'm confused about this comment: the preprocessing looks like O(N): we process every clause once, and addClauseLiterals() runs in constant time. What am I missing?

202

if you're going to form an ArrayRef in any case, might as well skip this indirection and have the callsites pass addClause({L1, L2}) or so?

221

literal => L or so

259

IsKnownContradictory => isKnownContradictory

This revision is now accepted and ready to land.Aug 31 2023, 2:01 PM
burakemir updated this revision to Diff 555363.Sep 1 2023, 6:57 AM
burakemir marked 4 inline comments as done.

Applied reviewer comments.

clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp
184

I am sorry for the misleading text: I only talked about addClause complexity.

And I estimated the lookup at a conservative O(log(K)) worst case complexity. And this is just completely wrong, since we can expect lookup to be on a hash-table. It would be O(1) average and worst case O(K).
I guess my mind was wandering off thinking about improving worst case lookup complexity.

Changed to O(N).

202

Thanks, this looks much nicer.

I chose to preserve the assert condition to check that there are <= 3 literals.

I believe the solver way work with clauses that have more literals but I don't know whether any trade-offs were made to focus on 3SAT in the solver.

This revision was landed with ongoing or failed builds.Sep 4 2023, 11:23 PM
This revision was automatically updated to reflect the committed changes.