This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Produce symbolic values for C-array elements
Needs ReviewPublic

Authored by ASDenysPetrov on Jun 5 2020, 5:00 AM.

Details

Summary

Problem:
The issue is that UnknownVal is produced for an array element when it is used in expressions with unknown bounds and unknown index. Thus it doesn't bind in the list of Expressions and never be used twice then.

Solution:
Produce symbolic values for array elements instead of UnknownVal. This also enables to bind these values and use them later in the next expressions.

This fixes https://bugs.llvm.org/show_bug.cgi?id=9289

Diff Detail

Event Timeline

ASDenysPetrov created this revision.Jun 5 2020, 5:00 AM
NoQ added inline comments.Jun 6 2020, 11:48 AM
clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1709

As soon as contents of the array change during analysis, this becomes incorrect: we cannot keep denoting a new value with the same old symbol.

Added more tests changing an array element.

ASDenysPetrov marked an inline comment as done.Jun 9 2020, 3:48 AM
ASDenysPetrov added inline comments.
clang/lib/StaticAnalyzer/Core/RegionStore.cpp
1709

I've updated the patch, adding more tests with the case you mentioned. Please, look at them, especially at functions last four functions. Is this your case? If so, then I tested them and they passed successfully. And if I'm wrong, please, explain what I missed more detaily.

NoQ added a comment.Jun 18 2020, 7:04 AM

Here are some examples:

void test1(int *a, int index1, int index2) {
  int x = a[index1];
  a[index2] = 0;
  int y = a[index1];

  // In reality we don't know but after your patch
  // we're confident that this is "TRUE".
  clang_analyzer_eval(x == y);
}

void foo(int *a); // Potentially modifies elements of 'a'.

void test2(int *a, int index) {
  int x = a[index];
  foo(a);
  int y = a[index];

  // In reality we don't know but after your patch
  // we're confident that this is "TRUE".
  clang_analyzer_eval(x == y);
}

@NoQ, thanks for the examples.

I didn't get the first one. How do you get to the "In reality we don't know", if we don't touch a[index1]:

void test1(int *a, int index1, int index2) {
  int x = a[index1];
  a[index2] = 0;
  int y = a[index1];

  // In reality we don't know but after your patch
  // we're confident that this is "TRUE".
  clang_analyzer_eval(x == y);
}

I worked on the second case. I found some possible way to resovle it:

void foo(int *a); // Potentially modifies elements of 'a'.
void fooRef(int *&a); // Potentially modifies elements of 'a'.

void test2(int *a) {
  // a - &SymRegion{reg_$6<int * a>}
  foo(a); // after this, CSA doesn't change a.
  // a - &SymRegion{reg_$6<int * a>}
  fooRef(a); // after this, CSA changes a.
  // a - &SymRegion{conj_$10{int *, LC1, S925, #1}}
}

Here we need to make foo behave the same as fooRef. What do you think?

alexfh removed a reviewer: alexfh.Aug 10 2020, 8:48 AM

@NoQ, thanks for the examples.

I didn't get the first one. How do you get to the "In reality we don't know", if we don't touch a[index1]:

If index1 and index2 has the same value, we should not be confident that the x == y holds.

void foo(int *a); // Potentially modifies elements of 'a'.
void fooRef(int *&a); // Potentially modifies elements of 'a'.

void test2(int *a) {
  // a - &SymRegion{reg_$6<int * a>}
  foo(a); // after this, CSA doesn't change a.
  // a - &SymRegion{reg_$6<int * a>}
  fooRef(a); // after this, CSA changes a.
  // a - &SymRegion{conj_$10{int *, LC1, S925, #1}}
}

I don't think this is the right way expressing this.
AFAIK this means that we don't know where the resulting pointer points to - which is not true.
We know where it points to (aka. the pointer's value is conjured), but we don't know what the value if there.

Unfortunately, I don't know the solution to your problem, but I'm pretty confident that this is something else.
However, I'm really interested in where this discussion goes.


I feel like this problem should be handled by some sort of invalidation mechanism.
Take my comments on these with a pinch of salt though - I'm really not experienced in this :|

clang/test/Analysis/PR9289.cpp
13–21

Why do we test this?
Here we can reason about the index.
This patch preserves the current behavior relating to this.

36–40

I'm not sure why do you overwrite the index if you could simply index with index2 instead in the following expression. That would make no difference, only make the test more readable IMO.
Same comment for all the`*_change` test cases.

@steakhal

If index1 and index2 has the same value, we should not be confident that the x == y holds.

Thanks! Now I see. Shame on me =)

We know where it points to (aka. the pointer's value is conjured), but we don't know what the value if there.

Absolutely right. I looked at this patch after a while and don't currently see a proper solution.

I feel like this problem should be handled by some sort of invalidation mechanism.

Definitely it should be some criteria/mark/flag about the region invalidation. Maybe someone more confident could prompt us how to.

clang/test/Analysis/PR9289.cpp
13–21

This is a root case of the bug which this patch came from. I decided to vary it by using an explicit index.

36–40

My intention is to retain index inside brackets. Just to check that index changed indeed.

Hey, folk, welcome to https://reviews.llvm.org/D85984
I've moved the logic of this checker in PthreadLockChecker

Should this revision be closed or rejected somehow?

We know where it points to (aka. the pointer's value is conjured), but we don't know what the value if there.

Absolutely right. I looked at this patch after a while and don't currently see a proper solution.

I feel like this problem should be handled by some sort of invalidation mechanism.

Definitely it should be some criteria/mark/flag about the region invalidation. Maybe someone more confident could prompt us how to.

How about using the mistical SymbolDerived?
The clang-analyzer-guide says:

Another atomic symbol, closely related to SymbolRegionValue, is SymbolDerived. It represents a value of a region after another symbol was written into a direct or indirect super-region. SymbolDerived contains a reference to both the parent symbol and the parent region. This symbol is mostly a technical hack. Usually SymbolDerived appears after invalidation: the whole structure of a certain type gets smashed with a single SymbolConjured, and then values of its fields become represented with the help of SymbolDerived of that conjured symbol and the region of the field. In any case, SymbolDerived is similar to SymbolRegionValue, just refers to a value after a certain event during analysis rather than at start of analysis.

Could we use that here @NoQ?


Should this revision be closed or rejected somehow?

I'm not sure about that.
I think it still has some value. Especially these two test-cases:

int index_sym(const int *a, int index) {
  int var;
  int ret = 0;
  if (a[index] < 2)
    var = 1;

  // Since we did not write through a pointer, we can be sure that the values bound to 'a' are still valid.
  // (We should also take strict-aliasing into account if -fno-strict-aliasing is not enabled.)
  //
  // We should take this branch if it was taken previously.
  if (a[index] < 2) {
    // FIXME: We should not warn here.
    ret = var; // expected-warning{{Assigned value is garbage or undefined [core.uninitialized.Assign]}}
  }
  return ret;
}

int index_sym_invalidated(const int *a, int index, int index2) {
  int var;
  int ret = 0;
  if (a[index] < 2)
    var = 1;

  a[index2] = 42; // Store a value to //somewhere// in 'a'.
  // A store happens through a pointer which //may// alias with the region 'a'.
  // So this store might overwrite the value of 'a[index]'
  // Thus, we should invalidate all the bindings to the region 'a'.
  // Except the binding to 'a[index2]' which is known to be 42.
  if (a[index] < 2) {
    // This warning is reasonable.
    ret = var; // expected-warning{{Assigned value is garbage or undefined [core.uninitialized.Assign]}}
  }
  return ret;
}