This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer] Constraint Manager - Calculate Effective Range for Differences
ClosedPublic

Authored by baloghadamsoftware on Nov 28 2018, 7:48 AM.

Details

Summary

Since rL335814, if the constraint manager cannot find a range set for A - B (where A and B are symbols) it looks for a range for B - A and returns it negated if it exists. However, if a range set for both A - B and B - A is stored then it only returns the first one. If we both use A - B and B - A, these expressions behave as two totally unrelated symbols. This way we miss some useful deductions which may lead to false negatives or false positives.

This tiny patch changes this behavior: if the symbolic expression the constraint manager is looking for is a difference A - B, it tries to retrieve the range for both A - B and B - A and if both exists it returns the intersection of range A - B and the negated range of B - A. This way every time a checker applies new constraints to the symbolic difference or to its negated it always affects both the original difference and its negated.

Diff Detail

Repository
rL LLVM

Event Timeline

NoQ added a comment.Nov 30 2018, 5:44 PM

Is it an option to canonicalize the expression so that B - A was never stored in the first place? I.e., do this range intersection at the moment of writing the range, not at the moment of reading the range.

This could be implemented by, say, comparing symbol IDs for A and B and making sure that in every stored SymSymExpr the first symbol's ID is greater than the second symbol's ID.

My original idea was that once we ony store either A - B or B - A. Thus if we already have A - B stored then do not store range for B - A but negate both the difference and the range. I can think on two ways to implement this:

  1. Create a separate function e.g. setRange() to store the range. This function checks whether the symbol is a difference and whether we already have a range for its negated. If so, then negate the difference and the range as well. ( We do not need to intersect them because the caller already did it.) However, in this case we negate twice: once in getRange() then once in setRange().
  1. Move the negation out of getRange() and call check for a stored negated difference before calling it. If it exist then call the appropriate assume function for the negated difference (for == and != it is the same function, but reverse the operator for the rest).

Your idea (store either A - B or B - A based on their symbol ID is also feasible but then we also face the same question. So 1) or 2)?

NoQ added a comment.Dec 6 2018, 4:54 PM

I just generally wish for a function that would return the most specific known range for the given arbitrary symbol. This is something we can put, for example, into visitor path notes.

Eg., imagine "Assuming x is greater than 5" -> "Tainted array index constrained to [5, 6] U {8} U [10, 12]" -> "Potential buffer overflow: accessing array of 11 elements with a tainted index".

getRange() is a great name for such function. For now i think it kinda works for atomic symbols with a few extra goodies on top of that. If we generally move towards that goal, i think it would be great.

MTC added a subscriber: MTC.Dec 6 2018, 6:19 PM
In D55007#1322335, @NoQ wrote:

I just generally wish for a function that would return the most specific known range for the given arbitrary symbol. This is something we can put, for example, into visitor path notes.

Eg., imagine "Assuming x is greater than 5" -> "Tainted array index constrained to [5, 6] U {8} U [10, 12]" -> "Potential buffer overflow: accessing array of 11 elements with a tainted index".

getRange() is a great name for such function. For now i think it kinda works for atomic symbols with a few extra goodies on top of that. If we generally move towards that goal, i think it would be great.

I think my patch is exactly about this for symbol differences: it returns the most specific known range for the difference by intersecting the range stored for the differences with the range stored for the negated difference.

It may be better not to store the difference for the negated range at all, but it is a bigger change in the code. (See my options 1 and 2.)

NoQ accepted this revision.Mar 27 2019, 3:34 PM

What i was trying to say with my last comment is that i guess i'd rather go for option (1) because with that getRange() remains the single source of truth, which is comfy.

I agree this shouldn't really be blocking the patch - sorry for stalling! - i'm hopefully slowly getting better at not stalling.

Generally i would have went for saving some memory and expensive ImmutableMap lookups by canonicalizing the key as much as possible.

Do we want to add the opposite test

void effective_range_2(int m, int n) {
  assert(m - n <= 0);
  assert(n - m <= 0);
  clang_analyzer_eval(m - n == 0); // expected-warning{{TRUE}} expected-warning{{FALSE}}
  clang_analyzer_eval(n - m == 0); // expected-warning{{TRUE}} expected-warning{{FALSE}}
}

...where the FALSE case corresponds to m - n == INT_MIN?

lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
483 ↗(On Diff #175693)

sotred -> stored :)

This revision is now accepted and ready to land.Mar 27 2019, 3:34 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMar 28 2019, 6:04 AM