This is an archive of the discontinued LLVM Phabricator instance.

[dataflow] Add dedicated representation of boolean formulas
ClosedPublic

Authored by sammccall on Jun 20 2023, 9:16 AM.

Details

Summary

This is the first step in untangling the two current jobs of BoolValue.

Desired end-state:

  • BoolValue will model C++ booleans e.g. held in StorageLocations. this includes describing uncertainty (e.g. "top" is a Value concern)
  • Formula describes analysis-level assertions in terms of SAT atoms.

These can still be linked together: a BoolValue may have a corresponding
SAT atom which is constrained by formulas.

Done in this patch:

BoolValue is left intact, Formula is just the input type to the
SAT solver, and we build formulas as needed to invoke the solver.

Incidental changes to debug string printing:

  • variables renamed from B0 etc to V0 etc B0 collides with the names of basic blocks, which is confusing when debugging flow conditions.
  • debug printing of formulas (Formula and Atom) uses operator<< rather than debugString(), so works with gtest. Therefore moved out of DebugSupport.h
  • Did the same to Solver::Result, and some helper changes to SolverTest, so that we get useful messages on unit test failures
  • formulas are now printed as infix expressions on one line, rather than wrapped/indented S-exprs. My experience is that this is easier to scan FCs for small examples, and large ones are unreadable either way.
  • most of the several debugString() functions for constraints/results are unused, so removed them rather than updating tests. Inlined the one that was actually used into its callsite.

Diff Detail

Event Timeline

sammccall created this revision.Jun 20 2023, 9:16 AM
Herald added a project: Restricted Project. · View Herald Transcript
sammccall requested review of this revision.Jun 20 2023, 9:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 20 2023, 9:16 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

update tests

sammccall updated this revision to Diff 533025.Jun 20 2023, 1:02 PM
sammccall edited the summary of this revision. (Show Details)

updating description

sammccall updated this revision to Diff 533027.Jun 20 2023, 1:13 PM
sammccall edited the summary of this revision. (Show Details)

more tests

sammccall updated this revision to Diff 533028.Jun 20 2023, 1:15 PM

remove dead comment

kinu added a subscriber: kinu.Jun 21 2023, 3:15 AM

Splitting out Formula from BoolValue is a pretty big change... I've tried to keep this patch a manageable size by really just using Formula for interacting with the SAT solver and some printing, but this means we're maintaining BoolValues and Formulas in parallel for now. Best to take a look at Formula.h, then look ahead to D153469 to see how we can eliminate the BoolValue subclasses, stop using Values as flow conditions in DataflowAnalysisContext, etc.

Would be very happy to hear high-level feedback/concerns first.
(I do know there's a bunch of places where words in comments/variable names should likely change - I thought making all these up front may make this very noisy to review)

High-level comment: I like the separation of concerns that this introduces.

Would you like me to review https://reviews.llvm.org/D153469 or should we wait for this patch to converge first?

clang/include/clang/Analysis/FlowSensitive/Formula.h
23

Put this right above the class definition below? (This is just needed for the alignas(Formula *), right?)

39

Is this important to state? (It's an implementation detail of Atom, essentially?)

48

Do you need the alignas because the pointers to operands are stored as trailing objects? Can you add a short comment to that effect?

72

Is this technically legal? (I've taken a look, but the definition of similar types makes my eyes glaze over.)

Why not use TrailingObjects instead? (I'm not really familiar with TrailingObjects, so there may be good reasons against it -- just asking because it seems like an obvious existing mechanism that could be used here.)

83

This looks like an "internal" API -- it's really only intended for use by Arena, right?

Maybe add a comment indicating that it's not intended to be used directly?

105

Why not give this type Atom?

clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp
219

Now that we've added Formula, it's pretty confusing that we also have BooleanFormula.

I assume this is one of the renamings that you want to get to later? I guess BooleanFormula becomes something like CnfFormula (would like to do 3CnfFormula, but you can't start with a digit... :( )

Would you like me to review https://reviews.llvm.org/D153469 or should we wait for this patch to converge first?

Similarly it'd be great to get high-level feedback there: whether the design of FormulaBoolValue is right, whether the changes to use Atom/Formula more pervasively seem sensible etc.
For the details, probably better to wait until this one is stable to avoid churn...

Thanks!

clang/include/clang/Analysis/FlowSensitive/Formula.h
72

Is this technically legal?

Yeah, it's OK: similar types etc is not relevant because these pointers are not pointing to objects of different types: there's an object of type Formula at one address that has a lifetime that begins at the constructor (but it's trivially destructible and never destroyed), and then objects of type Formula* at higher addresses - these don't overlap, and are both part of the chunks of memory provided by malloc.

Why not use TrailingObjects instead?

TrailingObjects is complicated and ugly - it *can* work for this, but I don't think it's actually easier to understand. (Do a code search for "this + 1" in LLVM - this is often done "by hand" instead of with TrailingObjects for simple cases like this one).

e.g. here we have a little magic in create() and a little in operands(). TrailingObjects equally needs ~2 pieces of magic, they're *slightly* less scary, but they also don't relate directly to the memory layout, and I don't think you can have a meaningful understanding of what this is for without also thinking about the memory layout.

TrailingObjects shines when the pointer arithmetic gets complicated, though...

105

I don't think it should always be Atom, even when it's used

see D153485 where I add a "Literal" formula and use value to distinguish true/false

clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp
219

Agree this is an unfortunate conflict and we should probably rename this local one.
(Just because it's more important that the public type gets the good name)

CNFFormula or just CNF SGTM.
I'll leave this open and do it as a followup if that's OK, the patch is noisy as is.

Would you like me to review https://reviews.llvm.org/D153469 or should we wait for this patch to converge first?

Similarly it'd be great to get high-level feedback there: whether the design of FormulaBoolValue is right, whether the changes to use Atom/Formula more pervasively seem sensible etc.
For the details, probably better to wait until this one is stable to avoid churn...

Thanks!

Will do!

clang/include/clang/Analysis/FlowSensitive/Formula.h
72

Is this technically legal?

Yeah, it's OK: similar types etc is not relevant because these pointers are not pointing to objects of different types: there's an object of type Formula at one address that has a lifetime that begins at the constructor (but it's trivially destructible and never destroyed), and then objects of type Formula* at higher addresses - these don't overlap, and are both part of the chunks of memory provided by malloc.

Thanks for the explanation!

Is it worth putting a (condensed) version of this in a comment, or is this well-known enough that it's not worth it?

Why not use TrailingObjects instead?

TrailingObjects is complicated and ugly - it *can* work for this, but I don't think it's actually easier to understand. (Do a code search for "this + 1" in LLVM - this is often done "by hand" instead of with TrailingObjects for simple cases like this one).

e.g. here we have a little magic in create() and a little in operands(). TrailingObjects equally needs ~2 pieces of magic, they're *slightly* less scary, but they also don't relate directly to the memory layout, and I don't think you can have a meaningful understanding of what this is for without also thinking about the memory layout.

TrailingObjects shines when the pointer arithmetic gets complicated, though...

Again, thanks for the explanation. I don't have any experience with TrailingObjects except for using classes that are implemented using it, so thanks for filling me in.

105

Ah, I understand now.

Maybe add a comment somewhere to explain this distinction? (If it had me scratching my head, it might do so for others too.)

clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp
219

Yes, please do leave it open. This should happen later, just wanted to clarify that that's the plan.

sammccall updated this revision to Diff 534117.Jun 23 2023, 4:27 PM
sammccall marked 10 inline comments as done.

address Martin's review comments

BTW, since this is a substantial API/concept change, I'll wait on approval from all of @mboehme, @xazax.hun, @gribozavr2 - if you don't plan to review just LMK.
(Not in a hurry, just wanted to mention so we don't deadlock :-)

clang/include/clang/Analysis/FlowSensitive/Formula.h
72

Is it worth putting a (condensed) version of this in a comment, or is this well-known enough that it's not worth it?

I've added a few lines inside create() - putting this in the class comment is just distracting I think.

83

Done. We could friend Arena, but it's not actually a really specialized or dangerous API, so I think a comment is cleaner.

clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp
219

In fact there are only a handful of references to this class name, and it's pretty confusing, so I've done the rename to CNFFormula. I think this actually makes the purpose of this class much clearer!

(I did also update some comments here were "value" was being used in a way that's now misleading)

rebase, primarily on the SAT-inputs-are-ordered change (SetVector etc)

clarify that Formula::print output is supposed to be reliably stable.
This is useful for testing downstream analyses: having a representation of
formulas that can outlive the DataAnalysisContext makes tests more ergonomic.

Change Formula::print back to using a map of atom names instead of a delegate
print function. The extra flexibility is not useful, and it's awkward to use.

gribozavr2 accepted this revision.Jun 30 2023, 10:50 AM
gribozavr2 added inline comments.
clang/include/clang/Analysis/FlowSensitive/Formula.h
28
34
47

For consistency with the printed representation of the formula.

70

Should it be called getAsAtom() or castAsAtom() ? For uniformity with other names in Clang and LLVM.

130–131

'inline' should be unnecessary here.

clang/include/clang/Analysis/FlowSensitive/Solver.h
73–74
clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
137
clang/unittests/Analysis/FlowSensitive/SolverTest.cpp
92

While you're refactoring this test, could you add 'using' decls for AssignedTrue and AssignedFalse at the top of the file?

clang/unittests/Analysis/FlowSensitive/TestingSupport.h
475

I'd say either "creates a fresh..." or "returns a reference to..." (although pedantically, it returns a pointer)

485–487
499–500

Similarly, drop "value" from the rest of the comments above.

This revision is now accepted and ready to land.Jun 30 2023, 10:50 AM
sammccall marked 11 inline comments as done.Jul 4 2023, 3:19 AM
sammccall added inline comments.
clang/include/clang/Analysis/FlowSensitive/Formula.h
70

This isn't quite a cast - the distinction between a variable's identity and a reference to it is real.
renamed to getAtom()

This revision was landed with ongoing or failed builds.Jul 4 2023, 3:20 AM
This revision was automatically updated to reflect the committed changes.
sammccall marked an inline comment as done.
uabelho added a subscriber: uabelho.Jul 4 2023, 5:01 AM
Failed Tests (6):
  Clang-Unit :: Analysis/FlowSensitive/./ClangAnalysisFlowSensitiveTests/BoolValueDebugStringTest/ComplexBooleanWithSomeNames
  Clang-Unit :: Analysis/FlowSensitive/./ClangAnalysisFlowSensitiveTests/BoolValueDebugStringTest/Conjunction
  Clang-Unit :: Analysis/FlowSensitive/./ClangAnalysisFlowSensitiveTests/BoolValueDebugStringTest/Disjunction
  Clang-Unit :: Analysis/FlowSensitive/./ClangAnalysisFlowSensitiveTests/BoolValueDebugStringTest/Iff
  Clang-Unit :: Analysis/FlowSensitive/./ClangAnalysisFlowSensitiveTests/BoolValueDebugStringTest/Implication
  Clang-Unit :: Analysis/FlowSensitive/./ClangAnalysisFlowSensitiveTests/BoolValueDebugStringTest/NestedBoolean

https://lab.llvm.org/buildbot/#/builders/139/builds/44269

thakis added a subscriber: thakis.Jul 4 2023, 5:15 AM

Looks like this breaks tests on windows: http://45.33.8.238/win/80815/step_7.txt

Please take a look etc

TWeaver added a subscriber: TWeaver.Jul 4 2023, 5:41 AM

Hello and good afternoon from the UK,

I believe this change has introduced test failures on the following buildbot:

https://lab.llvm.org/buildbot/#/builders/139/builds/44269

are you able to take a look see?

Much appreciated,
Tom W

My apologies but I've had to revert this change for now until the author can address the buildbot failures.

My apologies but I've had to revert this change for now until the author can address the buildbot failures.

Thanks for the revert, and sorry for the disruption - I expected to be around to keep an eye on this, but was away.

My apologies but I've had to revert this change for now until the author can address the buildbot failures.

Thanks for the revert, and sorry for the disruption - I expected to be around to keep an eye on this, but was away.

No worries Sam, thanks for understanding! Good luck on the fix, let me know if you need any help 👍