This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Introduce small improvements to the solver infra
ClosedPublic

Authored by vsavchenko on Jun 23 2020, 7:19 AM.

Details

Summary
  • Add a new function to delete points from range sets.
  • Introduce an internal generic interface for range set intersections.
  • Remove unnecessary bits from a couple of solver functions.
  • Add in-code sections.

Diff Detail

Event Timeline

vsavchenko created this revision.Jun 23 2020, 7:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2020, 7:19 AM

Remove hunk that is meant for the next commmit

Fix comment

Add forgotten nodiscards

NoQ added inline comments.Jun 23 2020, 7:51 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
556–566

I'm a bit confused, why aren't these invocations of getRangeForInvertedSub() and getRangeForComparisonSymbol() implemented simply as steps in Visit()? What's so different about them, other then our desire not to go into infinite recursion (which should probably be addressed via memoization)?

vsavchenko marked an inline comment as done.Jun 23 2020, 8:09 AM
vsavchenko added inline comments.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
556–566

I even had a patch that moves these two functions inside of the Visit, but then I reverted that change.
Right now the logic for each symbol is like this: let's try to find anything on this symbol in the existing constraints and if we did find something - return it. And only if that didn't work, we try to figure out something from the sub-tree on its own.

So what does that mean in here: that we will need to make a performance impacting [probably nothing serious, but still a functional] change - to intersect the range assigned to the symbol and whatever we can infer from its subtrees. It would be good to have when we support binary operators +/-/==/<=/etc. because it can help narrowing it down.

As of now, it will give no information, only possible overhead of traversing a tree. For this reason, I suggest to move those separately and AFTER we support more operators, because that will also have a couple of good motivation examples.

Harbormaster completed remote builds in B61398: Diff 272722.
NoQ added inline comments.Jun 24 2020, 2:03 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
467–470

Why do we have these representations in the first place?

It sounds like you're treating null pointers / empty optionals equivalently to full ranges (i.e., intersecting with them has no effect). How hard would it be to simply construct an actual full range in all the places in which we currently have empty optionals?

xazax.hun added inline comments.Jun 24 2020, 4:00 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
373

Sorry if my question is dumb, but I do not really have a mental model at this point about where do we actually handle types and overflows. Will this method work when we delete the last or the first element of the full range of a type?

I think having unit tests would be a great way to make this clear. I always felt that the solver is actually something that should be really easy to test separately and those tests would also help a lot to understand how the solver is actually working.

467–470

+1

I also find this confusing. Should I interpret a None as a full or empty range? Does this depend on the context or a general rule? Is there any reason we need to handle nullable pointers or could we actually make call sites more uniform to get rid of some of the complexity here?

vsavchenko marked an inline comment as done.Jun 24 2020, 4:26 AM
vsavchenko added inline comments.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
467–470

It sounds like you're treating null pointers / empty optionals equivalently to full ranges (i.e., intersecting with them has no effect)

It is not actually true. Empty optionals is the information that "this range information is not available for this symbol". It is true that intersecting with them has no effect, but they are semantically different in other aspects.

Currently solver RELIES on the information that whether the range is available or not (see my previous comment), and we simply couldn't get rid of this behavior as part of NFC.

How hard would it be to simply construct an actual full range in all the places in which we currently have empty optionals?

It is not hard architecturally, but it WILL make the change functional and possibly impact the performance.

Should I interpret a None as a full or empty range?

Neither of those. That is the problem right now, that we rely on different sources of information for the ranges and behave differently depending on whether one piece of information is available or not.

Does this depend on the context or a general rule?

Semantics are always the same - this info is not available.

Is there any reason we need to handle nullable pointers?

State->get<XYZ>(abc) returns a nullable pointer meaning optional object, it is hard to avoid it especially because we currently behave very differently when this information is not available.

...could we actually make call sites more uniform to get rid of some of the complexity here?

Yes we can, but later on. See the explanations above and in my other comment.

vsavchenko marked an inline comment as done.Jun 24 2020, 4:41 AM
vsavchenko added inline comments.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
373

Sorry if my question is dumb...

No problem at all, I also had this question when I looked at that.

where do we actually handle types and overflows

APInt class basically does it all for us in terms of overflows, number of bits, and signedness.

I think having unit tests would be a great way to make this clear

I am onboard! I will add a framework for testing the solver in later patches.

Hi @vsavchenko , sorry for the late review.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
367–378

Useful function. But I'd better rename it to subtract as we are working with sets (as a mathimatical collection). We should have a such one for the Ranges not only for Points.
We have intersect, delete aka subtract. And we also need to have functions union and symmetricDifference to cover full palette of common operations on sets.

733

As for me, I'd call this like getRangeForNegatedSymSymExpr, since you do Negate operation inside.

840–843

Don't you think this is too complicated for such a simple getter?
Maybe we can just construct the range using smth about RangeSet(RangeFactory, ++Zero, --Zero); ?

vsavchenko marked 3 inline comments as done.Jul 8 2020, 5:56 AM

Hi @ASDenysPetrov no problem at all! Thanks for taking your time and checking it.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
367–378

I agree that we should have a full set of functions. I don't agree however, that this function is a subtract. Subtract is an operation on two sets and here we have a set and a point. One might argue that a point is just a very simple set, that's true, but real subtract would be more complex in its implementation.

Naming it delete, on the other hand, I was coming from a notion of deleting points or neighbourhoods (https://en.wikipedia.org/wiki/Neighbourhood_(mathematics)#Deleted_neighbourhood).

733

I'm not super happy about my name either, but I feel like it describes it better than the previous name and your version. That function doesn't work for any SymSymExpr and it doesn't simply negate whatever we gave it. It works specifically for symbolic subtractions and this is the information I want to be reflected in the name.

840–843

It is more complex than a false range but there is a reason for it.

First of all, RangeSet can't have ranges where the end is greater than its start. Only Intersect can handle such ranges correctly. Another thing is that ranges like that mean [MIN, --Zero], [++Zero, MAX] and without a type we can't really say what MIN and MAX are, so such constructor for RangeSet simply cannot exist.

Another point is that we do need to have [MIN, -1], [+1, MAX] as opposed to [-1, -1], [+1, +1] because of C language (it doesn't have boolean type), and because of the cases like a - b where we know that a != b.

I hope that answers the question.

@vsavchenko
Thank you.
Despite of all of my nits, LGTM!

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
367–378

One might argue that a point is just a very simple set

That's actually what I mean :) and for this particular case you may leave the implementation as is. And for me it still does what subtract does. But I'm OK, I don't insist.

733

Oh, I just assumed ...Sub at the end as a subexpression but you mean subtraction. What I'm trying to say is that we can rename it like getRangeFor...the expression which this function can handle. E.g. getRangeForNegatedSubtractionSymSymExpr. My point is in a speaking name.

I think invertion is not something appropriate in terms of applying minus operator. I think invertion of zero should be something opposite but not a zero. Because when you would like to implement the function which turns [A, B] into [MIN, A)U(B, MAX], what would be the name of it? I think this is an invertion.

But this is not a big deal, it's just my thoughts.

840–843

I just want this function has low complexity and be more lightweight as getFalseRange. And if we have any chance to simplify it (and you have all the data to get MIN and MAX), it'd be cool.

vsavchenko marked 2 inline comments as done.Jul 8 2020, 8:57 AM
vsavchenko added inline comments.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
733

My thought process here was that we are trying to get range for A - B and there is also information on B - A, so we can get something for A - B based on that. So, it doesn't really matter what it does under the hood with ranges, it matters what its semantics are. Here I called B - A an inverted subtraction.
I don't really know what would be the best name, but I thought that this one makes more sense.

840–843

infer(QualType T) does just that ☺️ So it is a pretty low complexity.

ASDenysPetrov accepted this revision.Jul 8 2020, 9:19 AM
This revision is now accepted and ready to land.Jul 8 2020, 9:19 AM
NoQ accepted this revision.Jul 8 2020, 9:32 AM
NoQ added inline comments.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
467–470

it WILL make the change functional and possibly impact the performance

Mmm, yeah, i guess the necessity to construct a full range and then do the possibly O(N) intersection when it'll do nothing most of the time is indeed daunting.

Maybe we should implement full ranges as a special case in the RangeSet class? Like, don't actually construct the single range, just remember the type; special-case all operations on such sets to perform trivially in O(1), etc.?

733

Because when you would like to implement the function which turns [A, B] into [MIN, A)U(B, MAX], what would be the name of it? I think this is an invertion.

https://en.wikipedia.org/wiki/Complement_(set_theory)
https://en.wikipedia.org/wiki/Inverse_function

"Negated subtraction" is definitely my favorite so far. "Mirrored" might be a good layman term as well.

ASDenysPetrov added inline comments.Jul 15 2020, 3:10 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
733

Rename function

This revision was automatically updated to reflect the committed changes.