This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][RFC] Complete rewrite of ArrayBoundCheckerV2
Needs ReviewPublic

Authored by steakhal on Sep 26 2020, 7:39 AM.

Details

Summary

The main principle remained the same as the original ArrayBoundCheckerV2.
However, now during the simplification of 0 <= subscript-expr < extent we will assume that no overflow/underflow could have happened during the evaluation of the outer-most operator of subscript-expr. This will aggregate such assumptions into the State.

The comments in the code should make the algorithm clear enough. Besides that, I will answer any questions that might arise.


Plans

  • Add more tests, reorganize current ones.
  • Substitute the template machinery of the out-of-bounds-v2-false-positive-hunter.cpp test file to make it run significantly faster [1].
  • Improve the readability of the bug report using NoteTags or BRVisitors - yet to be decided which and how to implement that.
  • Update docs to reflect the current implementation.
  • Document that this checker uses a heuristic.

NoteTags/BRVisitor
I'm planning to come up with something to make the reports more readable. It could be a visitor or NoteTags, I haven't decided yet.
However, which makes this quite heroic is showing why a given expression is thought to be non-overflowing/under-flowing.
I don't have any source location beside the access itself. Imagine a pointer, which is modified line-by-line to point to the final destination which happens to be out-of-bound.
We check when it is dereferenced, but I can not put a NoteTag to the origin of that modification saying that 'assuming that this expression doesn't overflow/underflow'.
How can I implement meaningful diagnostics accomplishing this?

Limitations
If the extent is symbolic, but the access is concrete, then we will not constrain the symbolic extent to draw the access valid.
One might implement this later though.

This patch supersedes D86873 and D86874, but implements the very same logic. I will abandon them if you feel this approach is better.


[1] I will rewrite the templated test-cases since the current implementation takes too much time to run.
The analyzer would have to simulate all the template magic just to get to the interesting memory access, which is an unacceptable slowdown IMO.
Currently, that single test file takes about 2.7 seconds to run.
Also, if any test-case fails, pretty hard to find the one that failed since warning locations shared across test-cases xD
Just to tackle the latter, I came up with a new debug expression checker that dumps the value of PRETTY_FUNCTION if a bug report with a given description is on the flight.
If you are interested, I can upstream that as well.

Diff Detail

Event Timeline

steakhal created this revision.Sep 26 2020, 7:39 AM
steakhal requested review of this revision.Sep 26 2020, 7:39 AM
NoQ added a comment.Sep 27 2020, 1:43 PM

So, ArrayBoundCheckerV3 then?

We already have a similar simplification facility in SValBuilder created to solve the similar problem with iterator checkers. It fires up when it knows that the values it works with are limited to roughly 1/4 of their type's range and therefore none of the individual operations over them can potentially overflow (cf. D35109). It's currently off by default because performance was not properly evaluated and none of the on-by-default checkers rely on it. This is currently the intended approach to such issues. It was decided that constructing a custom solver for "non-overflowing but still bounded" arithmetic was not the right thing to do, mostly because it absolutely doesn't correspond to the actual run-time behavior of the program that we're supposed to be modeling.

Separately, I'd like to once again bring up that the problem you're solving with this patch is relatively minor compared to bigger problems with this checker. One bigger problem is that this checker amplifies our lack of loop widening (i highly doubt that the existing alpha loop widening feature solves any of these, though i didn't try; it has to be really good loop widening in order to be effective). The checker has massive false positives because of just that. Like, it only deals with small concrete values, not much solving, but even then it's all wrong, just because the loop isn't executed the right number of times.

steakhal added a comment.EditedSep 28 2020, 1:37 AM
In D88359#2297074, @NoQ wrote:

So, ArrayBoundCheckerV3 then?

:D

We already have a similar simplification facility in SValBuilder created to solve the similar problem with iterator checkers. It fires up when it knows that the values it works with are limited to roughly 1/4 of their type's range and therefore none of the individual operations over them can potentially overflow (cf. D35109).

I know, I have seen that patch, even proved the correctness of the main idea in Z3.
Essentially I'm trying to do the same, but with an ad-hoc calculated upper and lower bound. Constraining any subexpressions of the subscript expression to be in a made-up range like 1/4 (or anything else) would result in unsound constraints and assumptions. So I still think my approach is the only sound one [*].

It's currently off by default because performance was not properly evaluated and none of the on-by-default checkers rely on it. This is currently the intended approach to such issues. It was decided that constructing a custom solver for "non-overflowing but still bounded" arithmetic was not the right thing to do, mostly because it absolutely doesn't correspond to the actual run-time behavior of the program that we're supposed to be modeling.

What if we don't want to model the langue but rearrange an inequality to have a different form?
This checker did always this rearrange, expressing x in 0 <= x * 2 + 3 < 8 via transforming into -1 <= x < 3. Using this the checker could have infered that x must be in range [-1, 2] to make the dereference valid.
However, we can not make such a transformation without having the appropriate constraints. Such as that no subexpression can overflow/underflow during the transformation AND we must evaluate the constant folding in a mathematical sense (signed, no-wrapping).

Separately, I'd like to once again bring up that the problem you're solving with this patch is relatively minor compared to bigger problems with this checker. One bigger problem is that this checker amplifies our lack of loop widening (i highly doubt that the existing alpha loop widening feature solves any of these, though i didn't try; it has to be really good loop widening in order to be effective). The checker has massive false positives because of just that. Like, it only deals with small concrete values, not much solving, but even then it's all wrong, just because the loop isn't executed the right number of times.

I haven't touched loop-widening, I will have a look.
By the same token, I also wanted to evaluate some reports.

[*]: Assuming that no overflow/underflow could happen in any subexpression (as a heuristic) might introduce false assumptions especially if -fwrapv compiler option made signed wrapping well defined or the expression is of unsigned type which is by definition could wrap.

balazske added inline comments.Sep 29 2020, 8:27 AM
clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
61

Should not the ExtendedSigned be in the true branch?

87

I do not like these function names. getCommonSignedTypeToFit, isRepresentableBySigned, getAbsWithoutWrapping is better to have the common "start verbs" in the names.

241

std::move is not needed here

269

Function name should start with lower case.

285

The moves here are not needed.

Thank you for your time @balazske!
Your catches were really valuable.

clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
61

You are right. Thanks.

87

You are probably right about that.

241

But definitely not hurt :D
I quite like to use move on by-value parameters.
This way the implementation could omit the increment decrement even if the implementation currently doesn't exploit this.

269

Thanks.

285

I'm not convinced about that.
I think without move we would do a copy there - which could be expensive in the case of APSInts.
Here is an example demonstrating this: https://godbolt.org/z/E5dqWb
I know that currently APSInt does not support move operations, but I already plan adding that.