This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer] Bug Reporter Visitor to Display Values of Variables - PRELIMINARY!
AbandonedPublic

Authored by baloghadamsoftware on Jun 22 2017, 5:10 AM.

Details

Summary

This patch is made upon user request. The first example is the following:

int f(int n) {
  return n;
}

int main() {
  int v[3], i;
  
  v[0] = 0;
  v[2] = 2;

  for(i = 0; i<3; ++i) {
    f(v[i]);
  }

  return 0;
}

Here we get warning that parameter v[i] is uninitialized when calling f(). However, it is not clear in the bug path for which i is this true. In the example v[1] is uninitialized. Our patch adds a note to the bug path: Assuming i == 1.

Another user complained about false positive here:

static unsigned size = 32;

int init(int *s);
void assert(int);

static void test(void) {
  int val;
  if (size>10) {
    for (unsigned j = 0; j<size+1; j++)
      init(&val);
    assert((int) (val == 1));
  }
}

At the assert() statement we get warning for garbage value. This looks impossible for the first site, however, with our patch we get a note at the beginning of the loop: Assuming j == 0, size == 4294967295. Now it is clear that val is indeed uninitialized in case of an integer overflow in size.

This patch is still preliminary, only part of the tests is adjusted. First we try to reduce the unnecessary noise, by e.g. removing duplicate messages.

Questions to consider:

  1. Name of the Visitor. Currently it is VariableValuesBRVIsitor, maybe a better one could be found.
  2. Maybe we shoud use is insted of == to make it compatible with ConditionBRVisitor.
  3. Currently we add the note to the last appearance of the variable before the bug. Maybe we should change it to the first appearance where the variable takes its final value.

Diff Detail

Event Timeline

NoQ edited edge metadata.Jun 26 2017, 12:55 PM

In a perfect world, the analyzer's report would work like a debugger: jumping through stack frames, or even through diagnostic pieces, and printing symbolic values of variables at every piece would be really great.

I'm not entirely understanding the behavior you intend to have in this patch. Are you trying to print out values of all variables as diagnostic pieces? This might be a bit of an overkill, because many variables would be irrelevant to the bug. It could probably be better if a new kind of diagnostic pieces is introduced, instead of PathDiagnosticEventPiece, that would be handled by the consumers (text, html, plist) to somehow provide values on demand and avoid clamping up the screen.

Currently, we already highlight the last assignments for the "interesting" variables, which is implemented through, for example, bugreporter::trackNullOrUndefValue() - see how various checkers use it. This is, of course, far from perfect as well, because it's very hard to figure out which variables are of interest.


In the first example, the user sees the "Entering loop body" control flow piece twice, from which it is clear that i is equal to 1. I disagree that an "assuming..." piece should be added here, because there is no assumption being made. The analyzer knows that i is first equal to 0, then it becomes 1; he doesn't need to assume it. A quick discussion on this subject happened in D23300.

That said, i agree that it'd be good to improve the existing "Entering loop body..." diagnostic to contain information about the value of i. The report is understandable without it, but it'd make a nice addition.

In the second example, j is still not being assumed, however upon meeting the condition j < size + 1, we should definitely mention that we assume that size is equal to UINT_MAX - because this is indeed an assumption, and we're failing to display it. It's a bug that definitely needs to be fixed, and the aforementioned D23300 was fixing similar bugs.


To summarize:

  • It's great to improve diagnostics, and you have pointed out some buggy or missing diagnostic pieces that make reports confusing.
  • I'm not sure if we can, or should, afford the overkill of dumping all variables, though in some situations it may indeed be useful.
  • I'm not seeing examples that aren't supposed to be covered by existing diagnostic mechanisms - fixing bugs in them might be equally useful and much easier and safer.
  • I remember seeing such examples before, so, generally, i suspect that your approach may work, especially with a better UI.
  • This is likely to require a more open discussion.
zaks.anna added a subscriber: dcoughlin.
zaks.anna edited edge metadata.Jun 28 2017, 6:51 PM

I agree that we should not print the values of all variables. The users will be overwhelmed with the huge amount of information. It is very valuable to highlight just the right information. (I believe even the current diagnostics often produce too much info and highlighting the more important notes would be very valuable.) However, the examples you presented are very compelling. Are there ways we could highlight the same information without printing values of all variables?

For example, for the overflow case, we could experiment with printing notes along the path at the locations a variable overflows. This would be very valuable for the array overflow alpha checker, which often flags the bugs that only occur if an integer value along the path overflows. I am not sure how noisy this approach will be. If it is too noisy, we could refine this further.

For the uninitialized variable example, we could have some pattern-matching logic that would check if the expression is an array element and if so print the value of the index. (Another problem with variables that are failed to be initialized in a function is that we do not display the path on which the variable is not initialized, making it hard to understand the reports. Though, that is a separate problem.)

whisperity edited the summary of this revision. (Show Details)Jun 29 2017, 2:54 AM
In D34508#791048, @NoQ wrote:

Currently, we already highlight the last assignments for the "interesting" variables, which is implemented through, for example, bugreporter::trackNullOrUndefValue() - see how various checkers use it. This is, of course, far from perfect as well, because it's very hard to figure out which variables are of interest.

I think concentrating on "interesting" things would be a great addition to this patch. So in case an elementRegion is marked as interesting, we could also mark the index as interesting automatically and later print out the relevant information only for interesting symbols, regions.

I am considering to restrict the assumptions to nodes marked as interesting and to the location of the bug. However, I have difficulties with the latter, it seems that the bug location itself is not part of the bug path.

test/Analysis/bug_hash_test.m