This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Fix wrong calculation of offset in ArrayBoundsV2
AbandonedPublic

Authored by danielmarjamaki on Oct 18 2017, 6:03 AM.

Details

Summary

Example code:

void test3_simplified_offset(int x, unsigned long long y) {
  int buf[100];
  if (x < 0)
    x = 0;
  for (int i = y - x; i > 0 && i < 100; i++)
    buf[i] = 0; // no-warning
}

Without this patch Clang will wrongly report this FP:

File out-of-bounds.c Line 144: Out of bound memory access (accessed memory precedes memory block)

There is some bug in the getSimplifiedOffsets() calculations. I removed the wrong calculations and this does not break any existing tests so either no tests were written in the first place or these calculations got redundant sometime. If somebody wants to readd the calculations that I remove.. I am not against that if some tests are added and it does not break my test.

Diff Detail

Repository
rL LLVM

Event Timeline

xazax.hun edited edge metadata.Oct 19 2017, 3:25 AM

I checked what happens:

The checker would like to solve the following (I inspect the branch when x == 0 ):
((reg_$1<unsigned long long y>) + 1) * 4 <= 0
The getSimplifiedOffsets function kicks in and simplifies the expression above to the following:
(reg_$1<unsigned long long y>) <= -1

The analyzer also know that the value of y is within [1,98].

The source of the problem is that the simplified expression is evaluated after the right-hand side is converted to an unsigned value which will be greater than the max value of y.

I think we did not regress after omitting some of the computation because the analyzer's default constraint manager handles the case when there is a constant addition/subtraction next to the symbol. So if we lose something with this modification, we could probably observe that using multidimensional arrays.

Do you mind writing some tests with multidimensional arrays to check what do we lose if we remove that code?
Also, how hard would it be to correct the calculation for unsigned values?

MTC added a subscriber: MTC.Oct 23 2017, 8:05 PM

Do you mind writing some tests with multidimensional arrays to check what do we lose if we remove that code?

I have spent a few hours trying to write a test case that shows there is false negatives caused by this change. And fail.

I see lots of false negatives for multidimensional arrays with or without this code.

For instance:

void f(int x) {
  int buf[2][3];
  if (x < 4 || x>10)
    return;
  buf[1][x] = 0;
}
NoQ edited edge metadata.Oct 30 2017, 3:37 AM
// TODO: once the constraint manager is smart enough to handle non simplified
// symbolic expressions remove this function. Note that this can not be used in
// the constraint manager as is, since this does not handle overflows. It is
// safe to assume, however, that memory offsets will not overflow.

Wasn't safe enough, i guess. This is fairly similar to D35109, so someone would have to eventually do some convincing math to either prove that some sort of "forget about overflows" approach is indeed safe, or avoid overflows properly, or handle overflows properly. I feel that it's already clear that quick intuition-based solutions don't quite cut it when there are a lot of different types, signednesses, promotion rules, signed/unsigned overflows, and signed/unsigned extensions involved.

In D39049#910482, @NoQ wrote:
// TODO: once the constraint manager is smart enough to handle non simplified
// symbolic expressions remove this function. Note that this can not be used in
// the constraint manager as is, since this does not handle overflows. It is
// safe to assume, however, that memory offsets will not overflow.

Wasn't safe enough, i guess.

I agree, we should remove/alter this part of the comment since it is not true.

@danielmarjamaki
Could you do a similar analysis that I did above to check why does this not work for the multidimensional case? (I.e.: checking what constraints are generated and what the analyzer does with them.)

Could you do a similar analysis that I did above to check why does this not work for the multidimensional case? (I.e.: checking what constraints are generated and what the analyzer does with them.)

the "location.dump()" will just say "x".

the ProgramState is:

Expressions:
(0x2acd12a78a0,0x2acd1478b80) buf : &buf
(0x2acd12a78a0,0x2acd1478bf8) buf : &element{buf,0 S64b,int [3]}
(0x2acd12a78a0,0x2acd1478c10) buf[1] : &element{buf,1 S64b,int [3]}
(0x2acd12a78a0,0x2acd1478c38) x : &x
(0x2acd12a78a0,0x2acd1478c88) buf[1] : &element{element{buf,1 S64b,int [3]},0 S64b,int}
Ranges of symbol values:
reg_$0<int x> : { [4, 10] }

rawOffsetVal => 0

extentBegin => 0

For getSimplifiedOffset() , the offset is not a SymIntExpr it will just return 0.

xazax.hun added a comment.EditedNov 16 2017, 2:18 AM

Could you do a similar analysis that I did above to check why does this not work for the multidimensional case? (I.e.: checking what constraints are generated and what the analyzer does with them.)

Expressions:
rawOffsetVal => 0

extentBegin => 0

For getSimplifiedOffset() , the offset is not a SymIntExpr it will just return 0.

So what are the arguments that are passed to getSimplifiedOffset() in that case? 0? That does not seem to be correct.

> So what are the arguments that are passed to getSimplifiedOffset() in that case? 0? That does not seem to be correct.

yes.

so the conclusion is:

  • this code does not work
  • this code is untested
  • this code is not even used in the use cases it was intended for because of bugs elsewhere

therefore it should be removed.

xazax.hun requested changes to this revision.Nov 21 2017, 4:14 AM

> So what are the arguments that are passed to getSimplifiedOffset() in that case? 0? That does not seem to be correct.

yes.

Maybe I am missing something but this does not seem to be correct for me. We should either have the correct symbolic expression or UnknownVal if the analyzer cannot reason about something (or UndefVal if the result is undefined). But never the wrong value.
If you are sure that this happens, this needs to be investigated.

so the conclusion is:

  • this code does not work
  • this code is untested
  • this code is not even used in the use cases it was intended for because of bugs elsewhere

therefore it should be removed.

I am ok with removing that code AND altering the comment to reflect the changes. Please update this patch accordingly.
Also, I think maybe an easier test case could be added to show what is wrong with the current approach.

This revision now requires changes to proceed.Nov 21 2017, 4:14 AM
danielmarjamaki abandoned this revision.Jan 15 2018, 12:33 AM

I will not continue working on this. Feel free to take over the patch or write a new patch.