This is an archive of the discontinued LLVM Phabricator instance.

[clang][dataflow] Add support for a Top value in boolean formulas.
ClosedPublic

Authored by ymandel on Oct 6 2022, 1:44 PM.

Details

Summary

Currently, our boolean formulas (BoolValue) don't form a lattice, since they
have no Top element. This patch adds such an element, thereby "completing" the
built-in model of bools to be a proper semi-lattice. It still has infinite
height, which is its own problem, but that can be solved separately, through
widening and the like.

Patch 1 for Issue #56931.

Diff Detail

Event Timeline

ymandel created this revision.Oct 6 2022, 1:44 PM
Herald added a project: Restricted Project. · View Herald Transcript
ymandel requested review of this revision.Oct 6 2022, 1:44 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 6 2022, 1:44 PM
gribozavr2 added inline comments.Oct 6 2022, 3:09 PM
clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
157

Should we be creating a new top every time, or should it be a singleton like true and false?

It seems like we explicitly don't care about its identity (or else it would be isomorphic to AtomicBool).

clang/include/clang/Analysis/FlowSensitive/Value.h
98

Here and throughout the patch: since we might have a top for other kinds of values, WDYT about renaming this type to TopBoolValue? Similarly createTop -> createTopBool ?

103

Since TopValue is a BoolValue, can we form say a ConjunctionValue where LHS or RHS is Top?

What if we create such a conjunction twice? It seems like such conjunctions would incorrectly compare equal, even though each Top will be replaced with a unique fresh variable.

Would it make sense to change our factory functions for boolean formulas to eagerly open Top?

Then we wouldn't need a recursive walk in unpackValue().

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
376

isa<Top>

clang/lib/Analysis/FlowSensitive/Transfer.cpp
77

llvm_unreachable(); because this can't happen, V is a BoolValue.

124–125
clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp
237–238
clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
472

isa<Top>(...)? and isa<AtomicBool>(...) below?

620

EXPECT_TRUE(isa<Top>(...)) ?

1192
1219–1220
1291
1301–1310

Similarly in tests below.

if (false) theoretically could be handled in a special way in future and could be folded away during CFG construction.

1435–1439

Zab1Decl, Zab2Decl are unused apart from the NotNull check (which does not seem interesting to me).

1453

Could you add a variant of this test?

void target(bool Cond) {
  bool Foo = makeTop();
  // Force a new block.
  if (false) return;
  (void)0;
  /*[[p1]]*/

  bool Zab;
  if (Cond) {
    Zab = Foo;
  } else {
    Zab = Foo;
  }
  (void)0;
  if (Zab == Foo) { return; }
  /*[[p2]]*/
}

Show the loss of precision by checking that the flow condition for p2 is satisfiable.

xazax.hun accepted this revision.Oct 6 2022, 3:16 PM
xazax.hun added inline comments.
clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
525–526

I feel like this logic is repeated multiple times. I wonder if we should define an operator== for const BoolValue*.

1191

Nit: I usually prefer marking whole classes final rather than individual virtual methods, but feel free to leave as is.

This revision is now accepted and ready to land.Oct 6 2022, 3:16 PM
xazax.hun added inline comments.Oct 6 2022, 3:18 PM
clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
157

Good point, a singleton Top might actually simplify some parts of the code.

Thanks for the thorough reviews! Agreed on all points, and will address.

clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
157

Good question. That was my initial implementation, but it ran into the problem that the solver (in buildBooleanFormula) maps each unique (by pointer) subexpression into a fresh SAT variable in . If we use a singleton Top, we need to modify that algorithm appropriately. I'm open to suggestions, though, because of the advantages of a singleton Top.

If we assume that a given Top is never actually shared in a boolean formula (unlike other subexpressions, which may be shared), then we can use a singleton and special case it in buildBooleanFormula. I think that's a good assumption, but am not sure. Thoughts?

clang/include/clang/Analysis/FlowSensitive/Value.h
103

They would only compare equal if we're using a singleton Top. In fact, that's not quite right because we have no deep equality on formulas. But, the caching mechanism would generate the same formula, so the point holds. In that case, we would need to eagerly open the Top, because we would lose any way to distinguish them. But, with different Top instances, this wouldn't be necessary. Overall, it seems like "open top as soon as possible without killing convergence" is probably our best heuristic, so this seems like a win. But, I'm pretty sure there are places where it will prevent convergence (on its own -- widen will still handle it fine I believe).

Avoiding the recursive walk would be nice.

ymandel updated this revision to Diff 466124.Oct 7 2022, 11:05 AM
ymandel marked an inline comment as done.

Address (most) comments.

ymandel marked 11 inline comments as done.Oct 7 2022, 11:17 AM
ymandel added inline comments.
clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp
237–238

regarding the latter clause -- its freedom *does* affect satisfiability. e.g. A || Top is trivially satisfiable. I'm just going to drop the "and ...".

clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
525–526

Agreed. I want to wait until we settle on the representation and then we can consider this operator. But, if we end up with a singleton Top then I think we can hold off.

1191

Good point. I was following what's done elsewhere in the file -- I think we should update all or nothing. that said, if you mark the class final, then what do you do with each method? nothing or override?

1301–1310

Sure. I went with a different name since it's playing a very specific role that's not related to the test in the way that Cond is.

1453

Added, but there's no loss of precision and so the test demonstrates that. My initial instinct was that there would be loss, but the guard of Cond actually preserves the precision. The cost is complexity -- Zab is a fresh atomic and the flow condition encoded enough information to recover its equivalence to Foo. But, it is not _equal_ to Foo, which would be nice.

The case where we lose precision is a loop, where the incoming edge doesn't carry a condition.

gribozavr2 accepted this revision.Oct 7 2022, 11:19 AM
gribozavr2 added inline comments.
clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
157

I see. Could you add some direct tests for the SAT solver with formulas that include Top to check the behavior we care about?

sgatev added inline comments.Oct 10 2022, 4:13 AM
clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
156

Please add a comment.

clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
291

Please add a comment.

clang/include/clang/Analysis/FlowSensitive/Value.h
100

explicit seems unnecessary.

clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp
237–238

Where? Does that mean that TopBool never reaches the solver? I think it'd be good to clarify.

clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
1223

Why do we need to check two code points here and in the test below? It's not obvious what the difference between p1 and p2 is.

xazax.hun added inline comments.Oct 11 2022, 9:31 AM
clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
157

Given the discussion so far, it looks like having a singleton top has its own problems. I'm fine with the current solution for now, let's see if the overall approach works and we will see if this needs any further optimization later. +1 to Dmitry, some unit tests like Top && Top generated multiple times will not yield the same expression could be useful. If someone in the future tries to introduce singleton Top, they will see those tests fail and see why we did not have that in the first place and what problems need to be solved to introduce it.

ymandel updated this revision to Diff 467510.Oct 13 2022, 9:38 AM
ymandel marked 6 inline comments as done.

Address comments.

ymandel edited the summary of this revision. (Show Details)Oct 13 2022, 9:38 AM
ymandel updated this revision to Diff 467513.Oct 13 2022, 9:52 AM

Add explicit test for creation of multiple tops.

ymandel marked 5 inline comments as done.Oct 13 2022, 1:36 PM
ymandel added inline comments.
clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
157

I've added a test that direct creation of two Tops results in distinct values (by pointer) and inequivalent (per the solver), and some tests relating to the solver which will fail if you switch to a singleton Top. I didn't see much room for a Top && Top test, since that seems redundant with the simpler tests. I think the inequivalence tests satisfies Dmitri's concern with conjunctions involving Top, but, happy to add more if you disagree.

I should note, though: Top iff Top is true when both Tops are identical values (pointer equality) but not when they are distinct values. This doesn't seem right to me -- I think we should have one answer for Top iff Top and I think it should be false. So, in some sense, even the current scheme suffers from the some of the drawbacks of the singleton scheme. I think this is ok for now, but it does seem to encourage improvement in further patches.

clang/include/clang/Analysis/FlowSensitive/Value.h
100

agreed. looks like a copy-paste error on my part.

clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
1223

added comments to both tests to explain motivation. The second one is actually a regression test since my original version had a bug in its handling of lvalues *not* in rvalue position.

ymandel updated this revision to Diff 467589.Oct 13 2022, 1:38 PM
ymandel marked an inline comment as done.

added another test

xazax.hun added inline comments.Oct 13 2022, 1:53 PM
clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
157

I think we should have one answer for Top iff Top and I think it should be false.

I'd love to see this as a TODO comment.

clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
525–526

We ended up not going with a singleton Top, let's reconsider the overloaded operator.

xazax.hun added inline comments.Oct 13 2022, 1:54 PM
clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
1191

I prefer override on the methods in that scenario. But I agree that refactoring changes like this might be better done in a separate NFC commit.

ymandel updated this revision to Diff 467609.Oct 13 2022, 2:42 PM

Added FIXMEs for noted issues.

ymandel updated this revision to Diff 467611.Oct 13 2022, 2:49 PM

fix typos

ymandel marked 6 inline comments as done.Oct 13 2022, 2:50 PM
ymandel added inline comments.
clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp
525–526

added FIXME.

xazax.hun accepted this revision.Oct 13 2022, 2:50 PM
gribozavr2 accepted this revision.Oct 14 2022, 4:33 AM
gribozavr2 added inline comments.
clang/unittests/Analysis/FlowSensitive/SolverTest.cpp
69

Why delete this test?

ymandel updated this revision to Diff 467746.Oct 14 2022, 5:13 AM
ymandel marked an inline comment as done.

Adding back accidentally deleted test

ymandel marked an inline comment as done.Oct 14 2022, 5:16 AM
ymandel added inline comments.
clang/unittests/Analysis/FlowSensitive/SolverTest.cpp
69

That was a mistake. I've added it back

gribozavr2 accepted this revision.Oct 14 2022, 6:42 AM
ymandel updated this revision to Diff 467822.Oct 14 2022, 10:18 AM
ymandel marked an inline comment as done.

rebase onto a recent HEAD

This revision was landed with ongoing or failed builds.Oct 14 2022, 10:42 AM
This revision was automatically updated to reflect the committed changes.