This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Fix modeling arithmetic
ClosedPublic

Authored by alexander-shaposhnikov on Aug 24 2017, 2:43 PM.

Details

Summary

This diff attempts to fix modeling of arithmetic expressions
where pointers are treated as integers (i.e. via C-style / reinterpret casts).
In particular, it resolves https://bugs.llvm.org/show_bug.cgi?id=34309

P.S. Before this change the analyzer incorrectly called
evalBinOpLL(state, op, lhsL, makeLoc(i), resultTy).
Because of passing lhsL we were loosing the information about the previous cast to an integer type
and were treating this as "pointers arithmetic" rather than integers arithmetic.

Test plan: make check-all

Diff Detail

Repository
rL LLVM

Event Timeline

alexander-shaposhnikov edited the summary of this revision. (Show Details)Aug 24 2017, 2:49 PM

Thank you for the prompt fix!

NoQ edited edge metadata.Aug 25 2017, 4:27 AM

I guess we'd need more assertions to catch invalid symbols, will have a look.

lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
371–373 ↗(On Diff #112614)

For now this code would return UnknownVal in most cases (pointer is not tainted, or not symbolic, or contains an offset), and still construct an invalid symbol in the rest of the cases (makeSymExprValNN would add the number to the pointer symbol, instead of modelling an offset within the pointed-to region).

Once D35450 finally lands (sorry for the delays...), it'd return UnknownVal less often and crash more often.

375–377 ↗(On Diff #112614)

I guess you're trying to address this FIXME i recently added. This is probably the way to go: just take the Loc behind the LocAsInteger, cast it to char * (because pointer type shouldn't affect how much bytes of offset are added, anymore), add your integer to it, pack back into LocAsInteger.

lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
371–373 ↗(On Diff #112614)

@NoQ - many thanks for the code review!

i assume i might be missing smth, anyway, (replying to this comment and another one below)

>This is probably the way to go: just take the Loc behind the LocAsInteger, cast it to char * (because 
>pointer type shouldn't affect how much bytes of offset are added, anymore), add your integer to it, 
>pack back into LocAsInteger

i'm not sure i understand how that would work here and somehow don't like it.
For example, op can be MultiplicativeOp (i.e. long y = ((long)p) * 13;) extracting the Loc and doing smth with the Loc - doesn't seem to be the right thing (if i understood your suggestion correctly)

>For now this code would return UnknownVal in 
>most cases (pointer is not tainted, or not symbolic, 
>or contains an offset)

that was my understanding what this code should do (return UnknownVal in most cases unless we can reason about the actual (integer) values of LHS,RHS)

lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
363 ↗(On Diff #112614)

@NoQ , @dcoughlin
while we are looking at this code - just to double check - is this line (363) actually correct ?
Let's take a look at the following example:

bool f(long x, double *p1, double *p2) {
   long y = (long)p1 - (long) p2; 
   // or,alternatively (long)p1 * (long)p2  or (long)p1 / (long)p2
   return y == x;
 }

it looks like again the analyzer will try to use evalBinOpLL and evaluate this as an operation over pointers, while (if my understanding is correct) we should be working with integers here (and yes, in most cases it should return UnknownVal)

NoQ added a comment.Aug 26 2017, 2:15 AM

Yeah, LocAsInteger is supported very poorly in most places. We can only get away with it because people rarely cast pointers to integers.

Your reasoning about how LocAsInteger works poorly with eg. multiplication or bitwise operations (consider some sanitizer computing shadow memory) is one of the reasons why i think LocAsInteger should be turned into a special symbol class ("SymbolRegionAddress"), to be able to participate in symbolic expressions. Even if our constraint manager is unable to reason about such expressions, having correct opaque expressions is better than having UnknownVals because at least we can carry constraints on exact same expressions, and also better than conjured symbols because we are sure to assign the same symbol if such expression is computed more than once. Such change would allow us to remove special handling of LocAsInteger in some cases (just treat it as any other symbol), while still performing simplifications where we want. We'd probably be able to collapse addresses of symbolic pointers into SymbolCasts, eg. $addr<element{SymRegion{$x}, 2, char} into (uintptr_t)$x + 2. But all of that is a bigger task, not sure if you want to dig into this much, as it's not necessary to fix the crashes.

lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
363 ↗(On Diff #112614)

Yeah, even pointer substraction would be handled incorrectly, because pointer types would be incorrectly taken into account (if only we actually had pointer substraction).

371–373 ↗(On Diff #112614)

Yeah, we should still return UnknownVal for now in most cases. I just wanted to point out that addition and substraction can be handled this way. And also that makeSymExprValNN() seems to be unable to handle LocAsInteger arguments correctly, so manual intervention would be necessary anyway. So i think it'd be better to just return UnknownVal directly.

Return UnknownVal for non-comparison operations.
Add FIXME (improve modeling of "pointers as integers").

alexander-shaposhnikov marked an inline comment as done.Aug 28 2017, 2:01 AM

updated the patch per suggestion by @NoQ

NoQ accepted this revision.Aug 28 2017, 7:32 AM

Thanks, LGTM!

This revision is now accepted and ready to land.Aug 28 2017, 7:32 AM
This revision was automatically updated to reflect the committed changes.