This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Add a Fourier-Motzkin elimination solver backend.
Needs ReviewPublic

Authored by NoQ on Sep 20 2021, 9:47 PM.

Details

Summary

LLVM contains an implementation of Fourier-Motzkin elimination by @fhahn. It lives in llvm/Analysis/ConstraintSystem.h and allows solving systems of linear inequalities over real-world (non-wrapping) numbers. Even though machine integers are typically wrapping, Fourier-Motzkin elimination can still be used for refuting inequalities that are impossible *even* in real-world numbers - as long as we guarantee lack of overflows in our own linear operations. And, well, we can still take type size constraints into account.

In this patch I'm (partially) implementing conversion from static analyzer symbolic expressions to linear inequalities and adding Fourier-Motzkin elimination as a refutation backend (similar to our Z3 backend). It's currently off by default but I plan to enable it by default as soon as I double-check that the math is correct and make sure there are no performance regressions. I'd also consider making it a part of the default constraint manager so that to refute infeasible execution paths during analysis but I suspect it may have a much larger performance impact. No matter how this turns out, I see this work as orthogonal to other improvements in RangeConstraintManager; these can still keep coming.

Currently the conversion is very limited. It can already refute some simple constraints such as x < y, y < z, z > x but otherwise it treats most symbols as opaque and doesn't really see linear inequalities when you feed them to it. See the large FIXME for more details. It already did refute some real-world false positives for me, around 1% of total warnings, which is much weaker than Z3 but still very promising. Every false positive refuted this way was also refuted by Z3 which suggests (but of course doesn't prove) correctness of the solution.

Diff Detail

Event Timeline

NoQ created this revision.Sep 20 2021, 9:47 PM
NoQ requested review of this revision.Sep 20 2021, 9:47 PM
NoQ added inline comments.Sep 20 2021, 9:56 PM
clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
42

My implementation doesn't make use of abstractions from SMTConv.h but introduces its own abstractions instead. This is because the conversion process is too significantly different: while a generic SMT solver can consume arbitrary expressions which suggests a recursive rewrite, Fourier-Motzkin elimination only consumes expressions of fixed shape, which makes conversion very non-recursive and context-dependent. Once the conversion gets more advanced, it will probably also have to rely on other constraint information to avoid overflows which is redundant for a generic SMT solver.

NoQ updated this revision to Diff 373781.Sep 20 2021, 10:25 PM

Fix modules/shared libs build.

Ooooh, delicious! Diving in!

martong added inline comments.Sep 21 2021, 10:01 AM
clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
3193–3194

This is a good check!

3227

Why don't we need the location context anymore?

3253–3254

This if block and the one at line 3259 seems to be redundant (?). We have an assertion in the constructor that either ShouldCrosscheckWithFME or ShouldCrosscheckWithZ3 is true.

clang/lib/StaticAnalyzer/Core/FMEConv.cpp
39–42

llvm::find(Vars, Sym)

fhahn added a comment.Sep 21 2021, 2:28 PM

Great to see this coming along :)

On naming, I'd not reference FME, as this is an implementation detail which could change, e.g. if we chose implement a different algorithm later on.

Nice work!

clang/lib/StaticAnalyzer/Core/FMEConv.cpp
59

So, in one row each variable's multiplier is initialized to 0 and later we set the Sym's multiplier to 1. This took me some time to realize ... perhaps a comment could be helpful here.

85

Perhaps assert(Op != BO_NE) ?

100
156

I think we could do better than the default behavior in case of SymIntExpr and IntSymExpr. We could call directly the add(SymbolRef Sym, BinaryOperator::Opcode Op,const APSInt &Val) overload. This would be superior especially in case of ranges with one element, e.g. x == 0.

martong added inline comments.Sep 22 2021, 4:30 AM
clang/lib/StaticAnalyzer/Core/FMEConv.cpp
156

Umm, please disregard my previous comment. I suppose your comment above with the FIXME (line 94) refers to the handling SymInt and IntSym expressions ? Let's suppose we have x+1 in the range [0, 0]. We could transform this to x <= -1 and x >= 1. What is missing to do the transformation?

According to the coverage of the test I can propose to test for some edge cases:

  • _ExtInt(64+)
  • uint64_t(INT64_MAX), uint64_t(INT64_MAX + 1)
  • etc...

I'm attaching the related gcov HTMLs about the two interesting files.

clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
601

Why not const?

607

We should probably mark this explicit.

clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
3249–3250

I think the comment is misleading.
If I comprehend this right, it suggests to me that even if FMA is disabled but Z3 is enabled it would still try to crosscheck with FMA. On the other hand, the behavior of the code reflects a different behavior.

3253–3254

I don't think it's redundant, but I agree that it has a code smell.
If this is the behavior we want to implement we could achieve by something like this:

ShouldCrosscheckWithFME ? crosscheckWithFME(BR, BRC) : crosscheckWithZ3(BR, BRC);

3259–3260

I'm not exactly sure why we check and return here. We would return at the end of the function regadless.

clang/lib/StaticAnalyzer/Core/FMEConv.cpp
53

You probably meant > here.
Otherwise, the condition is always false at L63 is dead.

144

I would rather prefer getValue(), it looks weird now - at least for me.

martong added inline comments.Sep 22 2021, 6:00 AM
clang/lib/StaticAnalyzer/Core/FMEConv.cpp
144

Even better would be to have an early return if the optional is not set, just to avoid this deep nesting of if blocks. But, you could do that early return only if the whole if (const auto *CompSym = dyn_cast<SymSymExpr>(Sym)) { ...} was in a function (or in a lambda).

NoQ added a comment.Sep 22 2021, 2:04 PM

On naming, I'd not reference FME, as this is an implementation detail which could change, e.g. if we chose implement a different algorithm later on.

Got it, will fix! Do I understand correctly that the input format ("a system of linear inequalities over real numbers") probably won't change? So I'll still have to do the same conversion and the same overflow-checking?

clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
3227

It never made sense in the first place. The second parameter isn't actually used meaningfully by the bug report, it only contributes to the identity of the invalidation stamp, so that this stamp could be removed later if necessary. Even if we ever wanted to remove this particular invalidation stamp, the location context is a really weird choice for its identity. I think this was a copy-paste from some other invalidation.

3253–3254

I think it makes sense to allow cross-checking with both.

3259–3260

For symmetry. No other reason.

clang/lib/StaticAnalyzer/Core/FMEConv.cpp
53

Nice catch! I'll fix.

85

This is already covered by an unreachable below but I agree that it's better to put this on top as a pre-condition.

156

Let's suppose we have x+1 in the range [0, 0].

We won't actually ever have such range. RangeConstraintManager would simplify it into x in range [-1, -1] before recording in the ProgramState.

But if we did have such range then we'd also have overflow problems. Imagine x is also unsigned. Then your original range is SAT in the original program but UNSAT in FME. We'll need overflow checks. They're not hard but we do need some of them.

That's a great beginning! I'd like to join!

clang/include/clang/StaticAnalyzer/Core/PathSensitive/FMEConv.h
26

I see you frequently use predefined size of 8 in SmallVector. Would it be better to make an alias for that, since the structures refer to each other?

SmallVector8<SymbolRef> Vars;
SmallVector8<std::string> Names;
SmallVector8<int64_t> Rows;
clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h
292–293
clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
3259–3260

I prefer symmetry as it make code readable twice quicklier.

clang/lib/StaticAnalyzer/Core/FMEConv.cpp
48

Can we get a commented description in a few words about what's going inside the function? (and so in the rest)

50

I'd avoid magic numbers.

105–106
129

Is it supposed any logic for EQ, NE subsequently?

158

Could you write a test case for this(with/without holes)?

clang/test/Analysis/fme-crosscheck.c
27–30

I'd like to see tests with MAX, MIN as well.

Ping. Is this alive?