This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Refactor range inference for symbolic expressions
ClosedPublic

Authored by vsavchenko on May 1 2020, 2:04 AM.

Details

Summary

This change introduces a new component to unite all of the reasoning
we have about operations on ranges in the analyzer's solver.
In many cases, we might conclude that the range for a symbolic operation
is much more narrow than the type implies. While reasoning about
runtime conditions (especially in loops), we need to support more and
more of those little pieces of logic. The new component mostly plays
a role of an organizer for those, and allows us to focus on the actual
reasoning about ranges and not dispatching manually on the types of the
nested symbolic expressions.

Diff Detail

Event Timeline

vsavchenko created this revision.May 1 2020, 2:04 AM
NoQ added a subscriber: steakhal.May 1 2020, 11:16 PM

Very nice, i like this architecture.

@baloghadamsoftware @steakhal @ASDenysPetrov will you be able to plug D49074/D50256/D77792/D77802/D78933 into this so that to separate algebra from pattern-matching and ensure no performance regressions? Or is something still missing?

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
271–281

Can we replace these three with a single VisitBinarySymExpr()? Or is there too much template duck typing involved?

340–345

I think this is a must-have, at least in some form. We've been exploding like this before on real-world code, well, probably not with bitwise ops but i'm still worried.

vsavchenko marked 3 inline comments as done.May 2 2020, 3:00 AM

Very nice, i like this architecture.

Aww, thanks 😊

@baloghadamsoftware @steakhal @ASDenysPetrov will you be able to plug D49074/D50256/D77792/D77802/D78933 into this so that to separate algebra from pattern-matching and ensure no performance regressions? Or is something still missing?

Yeah, I'll be happy to hear what will be good to have for all the different cases we have and might have in the future.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
271–281

Unfortunately no, we need to know more derived types in order to use getLHS and getRHS methods. And that's why VisitBinaryOperator function is a template.

340–345

It will be pretty easy to introduce a limit on how deep we go into a tree of the given symbolic expression. That can also be a solution.

In D79232#2016065, @NoQ wrote:

@baloghadamsoftware @steakhal @ASDenysPetrov will you be able to plug D49074/D50256/D77792/D77802/D78933 into this so that to separate algebra from pattern-matching and ensure no performance regressions? Or is something still missing?

Sure. I'll plug my changes D77802/D78933 as soon as this patch is available in the master.

NoQ added inline comments.May 3 2020, 2:10 PM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
340–345

I mean, doing something super trivial, like defining a map from symexprs to ranges in SymbolicRangeInferrer itself and find-or-inserting into it, will probably not be harder than counting depth(?)

vsavchenko updated this revision to Diff 261833.EditedMay 4 2020, 8:42 AM
vsavchenko marked an inline comment as done.

Now getRange is more likely to return unfeasible range. Calling Intersect and pin methods from such ranges might cause a crash.
Check for unfeasible ranges.

NoQ added inline comments.May 4 2020, 9:21 AM
clang/test/Analysis/constant-folding.c
127–128

How can both of these be false? o.o

xazax.hun added inline comments.May 4 2020, 9:49 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
340–345

I am a bit ignorant of this topic, but I wonder what a good caching mechanism would look like.
A simple symexpr -> range mapping does not feel right as the same symexpr might have a different range in a different program state (e.g., we might learn new ranges for our symbols). But having a separate map for each state state might do relatively little caching?

xazax.hun added inline comments.May 4 2020, 10:00 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
380

I always get surprised when I read code like the one above seeing that only RHS is tested for being a concerte value. Later on, I vaguely start to remember that we only produce SymIntExprs (is that correct?). I wonder if we should add an assert so this code blows up when someone is trying to add IntSymExprs, so she will know what code needs modification.

Add clarification on infeasible ranges in bitwise tests

vsavchenko marked an inline comment as done.May 5 2020, 12:38 PM
vsavchenko added inline comments.
clang/test/Analysis/constant-folding.c
127–128

Yeah :) I realized how weird it is.
Anything is possible in the land of infeasible ranges.

I changed a comment there to address this

NoQ added inline comments.May 5 2020, 12:57 PM
clang/test/Analysis/constant-folding.c
127–128

I mean, this pretty much never happened before. How are you not tripping on this assert? (probably it's simply been disabled in normal debug builds now that it's under "expensive checks")

The correct thing to do is to detect the paradox earlier and mark the path as infeasible. What prevents us from doing it right away here?

vsavchenko marked an inline comment as done.May 5 2020, 2:48 PM
vsavchenko added inline comments.
clang/test/Analysis/constant-folding.c
127–128

Before we didn't really care about constraints on the operands and I changed it :)
So, now Intersect (which is logically not a correct way to do what is meant) can cause this type of behaviour

NoQ added inline comments.May 6 2020, 7:55 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
340–345

Even a simple symexpr -> range mapping with lifetime of SymbolicRangeInferrer should be able to improve algorithmic complexity dramatically. And it doesn't need to consider different states because it only lives for the duration of a single assume() operation.

clang/test/Analysis/constant-folding.c
127–128

[visible confusion]

Could you elaborate? I see that only constraint so far is $a: [11; UINT_MAX]. I don't see any infeasible ranges here. (a & 1) <= 1 is clearly true. If we were previously thinking that it's unknown and now we think that it's false, then it's a regression.

vsavchenko marked an inline comment as done.May 6 2020, 8:25 AM
vsavchenko added inline comments.
clang/test/Analysis/constant-folding.c
127–128

a is indeed [11, UINT_MAX].
Current implementation checks a constant (i.e. 1) and intersects the range for LHS [11, UINT_MAX] with [UINT_MIN, 1], which produces empty range set (aka infeasible).

This is why I'm saying that intersection is a bad choice, it's even plain wrong.
Before this patch we ignored constraints for a and considered it to be [UINT_MIN, UINT_MAX]. In that setting, intersection does indeed work (which doesn't make it correct).

Yes, it is a regression. I'm changing this implementation in the child revisions.

NoQ accepted this revision.May 9 2020, 7:11 PM
NoQ added inline comments.
clang/test/Analysis/constant-folding.c
127–128

Yes, it is a regression. I'm changing this implementation in the child revisions.

Oh, right, got it :D

Ok, let's land 'em together then!

This revision is now accepted and ready to land.May 9 2020, 7:11 PM

Fix a problem with doubles sneaking into integer symbolic expressions

This revision was automatically updated to reflect the committed changes.