This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Handle pointer difference of ElementRegion and SymbolicRegion
Needs ReviewPublic

Authored by steakhal on Jul 28 2020, 2:23 AM.

Details

Summary

Currently, if the analyzer evaluates an expression like to - from, it only computes reasonable answer if both are representing ElementRegions.
This patch aims to extends the analyzer to be able to evaluate certain pointer arithmetic operations.

Formally, for all memory region X and for all NonLoc n, m:
Evaluate:

  • Element{X,n} - Element{X,m} => n-m
  • Element{X,n} - X => n
  • X - Element{X,n} => -n
  • Element{X,n} == X => true/falsedepending on n is zero or not.
  • etc.

Diff Detail

Event Timeline

steakhal created this revision.Jul 28 2020, 2:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 28 2020, 2:23 AM
Harbormaster returned this revision to the author for changes because remote builds failed.Jul 28 2020, 3:02 AM
Harbormaster failed remote builds in B65978: Diff 281152!
steakhal requested review of this revision.Jul 28 2020, 7:42 AM

Manually force request review, to start a discussion about this.

Although I managed to implement this in a way to pass the tests, but the logic became somewhat messy, that's why I stick to this dummy patch.
I'm gonna clean that up and update the diff, if you think that such handing of memregions are valid.
Sorry for the cfe-dev spam, I did not know that I can manually request review.

I think adding code snippets that are affected by this patch would make it much easier to evaluate the changes and whether this is a good idea or not.

NoQ added a comment.Jul 28 2020, 12:52 PM

I think the overall idea is correct.

Yup, the patch will need the new positive tests that started passing. If you can't demonstrate it with integration tests you should do unit tests because the contract you're trying to achieve is fairly clear and obviously correct and it makes sense to test it separately.

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1092

Symbolic regions aren't special. Any other parent region should work similarly.

1095–1096

You'll have to multiply by the size of a single element - it's an index, not an offset.

1103–1105

Yeah, sounds like we'll have to stick to UnknownVal for now when RightIndex is symbolic. You can easily compute unary minus for a concrete index though, and that'll probably be your most important case anyway.

steakhal planned changes to this revision.Jul 28 2020, 2:05 PM

Thank you all for the comments.

In D84736#2179828, @NoQ wrote:

I think the overall idea is correct.

So I will keep going in this direction, you can expect the more complex version of this.
I plan to reuse this revision.

So we can formalize this logic like this:
for all MemReg, and Y: &Element{MemReg,Y,char} - &MemReg => Y

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1095–1096

Nice catch.

1103–1105

I managed to get around this by turning the unary minus into a binary substitution from zero (using ArrayIndexTy type).
Although I should have improved the SimpleSValBuilder::minus function to handle other svals besides concrete ints instead.

steakhal updated this revision to Diff 281556.Jul 29 2020, 6:52 AM
steakhal marked 3 inline comments as done.

Reworked ElementRegion handling.
Previously it handled only element regions in the form of: Element{X,n} OP Element{X,m}
Now supports Element{X,n} OP X and X OP Element{X,n} as well.
Still misses nested ElementRegions though like: Element{X,n} OP Element{Element{X,k},m}.

No tests yet.
Regression test vs unit test - which should I prefer?

Great job, I'm on board with this change!

I think that our code is a bit under commented when it comes to the logic inside of functions, so I would really appreciate if we can add a good chunk of comments here!

Also, I believe integration tests would be pretty much straightforward with clang_analyzer_eval.

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1035–1038

I would prefer having a comment for this line with a very simple formula of the thing we are calculating. I think it can increase the readability of the code because when there is a call accepting a whole bunch of arguments it is never obvious and it always takes time to parse through.

1041

This comment is a bit deceiving IMO. It returns a concrete value even when HasSameMemRegions is false, but from the comment it seems like we evaluate the operator ONLY when HasSameMemRegions is true.

1068

I think it's better to add comments for each case describing what is the situation we handle here in a simplified form.
And for each return within the case - a short comment with the reason.

steakhal planned changes to this revision.Aug 5 2020, 3:40 AM
steakhal marked an inline comment as done.
steakhal added inline comments.
clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1035–1038

Sure, I will add something like this:

// T buf[n];   Element{buf,2} --> 2 * sizeof(T)
1041

No, the comment is up to date.
This lambda handles both equality and inequality operators, and this is why its called EvaluateEqualityOperators.

EQResult will be true only if the two memory regions are the same and the index is zero.
The result of the inequality is just the negated version of the result of equality.

1068

I'm not sure if it's necessary.
I have already described these 3 cases at the beginning.
Quote:

// Handle: \forall MemRegion X, \forall NonLoc n, m:
//  - Element{X,n} OP Element{X,m}
//  - Element{X,n} OP X
//  -            X OP Element{X,n}
// We don't handle here nested ElementRegions like in the this expression:
// Element{Element{x,3,int [10]},5,int}  ==  Element{x,35,int}

If you still insist, I will defenately add more comments though.

vsavchenko added inline comments.Aug 5 2020, 6:15 AM
clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1035–1038

Yeah, something like this would be good. I would prefer more explicit version of it simply because Stmt; Smth can be a bit confusing for notation.

1041

If both have the same and the index has a concrete value, we can evaluate equality operators.

In this setting, I assume that we talk about "if and only if", and in any case other than "both have the same and the index has a concrete value" we can't evaluate equality operators.
Also, I didn't say that it is necessarily incorrect, I said that it is deceiving because if it confused me, it can confuse other people as well.

I would prefer something more like this:

// For a situation, a OP b + index, where OP is equality operator, 
// we can infer the result for known `index` values.
// We are able to do this because ...
1068

In that comment, you don't say anything about OP and how we handle those.

Also I want to add that you probably want to add an arbitrary MemRegion Y as well because you handle not only cases when MemRegions are same.

steakhal updated this revision to Diff 286318.Aug 18 2020, 10:08 AM
steakhal marked 4 inline comments as done.
steakhal retitled this revision from [analyzer][RFC] Handle pointer difference of ElementRegion and SymbolicRegion to [analyzer] Handle pointer difference of ElementRegion and SymbolicRegion.
steakhal edited the summary of this revision. (Show Details)
  • Refined documentation comments as noted.
  • Added tests.
  • Removed the complicated ByteOffsetOfElement lambda.
  • Rename revision.

Before this patch, only these reported Unknown instead of the currently expected value.
Besides that all tests passed as-is on master:

clang_analyzer_dump_int(p - p0);       // expected-warning {{0 S32b}}
clang_analyzer_dump_int(p - p1);       // expected-warning {{-1 S32b}}
clang_analyzer_dump_int(p - pn);       // expected-warning-re {{0 - (reg_${{[0-9]+}}<int n>)}}
clang_analyzer_dump_int((p + 1) - p);  // expected-warning {{1 S32b}}

// Swapped operands:
clang_analyzer_dump_int(p0 - p);       // expected-warning {{0 S32b}}
clang_analyzer_dump_int(p1 - p);       // expected-warning {{1 S32b}}
clang_analyzer_dump_int(pn - p);       // expected-warning-re {{reg_${{[0-9]+}}<int n>}}
clang_analyzer_dump_int(p - (p + 1));  // expected-warning {{-1 S32b}}

Further notes:
Element{X, n, Ty1} and Element{X, m, Ty2} should compare equal if and only if the n * sizeof(Ty1) equals to n * sizeof(Ty2).
However, previously it did not take the size of the types into account (there is the corresponding FIXIT).
I'm not fixing this either for now.

The analyzer returns Unknown for this call:
clang_analyzer_dump_int((p + 1) - q);
However, IMO it should hold the expression 'p+1+q' instead - regardless of p alias (or not) the same memory region of p
There is a FIXME in the testcode for this.

steakhal added inline comments.Aug 18 2020, 10:12 AM
clang/test/Analysis/pointer-arithmetic.c
88

The suggested expression should be p+1-q.

100

The suggested expression should be q-p+1.

steakhal added inline comments.Aug 18 2020, 10:19 AM
clang/test/Analysis/pointer-arithmetic.c
88

xD more like (p+1)-q

100

q-(p+1)