This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] NoStoreFuncVisitor: Suppress bug reports with no-store in system headers.
ClosedPublic

Authored by NoQ on Apr 1 2019, 5:35 PM.

Details

Summary

Hmm, i literally patched the same line of code a few days ago in D59901. That's fairly accidental.

NoStoreFuncVisitor is mostly attached to uninitialized value reports and is responsible for adding path notes within (a.k.a. disabling pruning of) inlined calls that could have initialized the memory region but didn't end up doing it. It emits notes that say Returning without writing to ... at the respective return sites of such calls.

George decided to suppress the note for calls into system headers. I guess the reason was that it was otherwise too loud. After all, it's an un-pruning effort - it can cause massive bring-ins of inlined calls into reports.

However, when the note is suppressed, the very original issue that caused us to write this visitor bites us again: the report becomes incomprehensible. Here's a specific example i'm looking at:

#include <iostream>

void use(char);

void foo() {
  char A;
  std::cin >> A;
  use(A); // Use of uninitialized variable?!
}

This is a "true" positive: there are a bunch of failure modes in std::cin that may lead to not initializing the variable and the developer has to check for them before using the variable. However,

  • You'll never be able to understand that from the report;
  • Even if it's true, the user would most likely still not bother fixing it unless it's a security-critical application.

What are our options here?

  1. We can model operator>>(). We can either model it as something that always initializes the value to an unknown (and possibly tainted) value, or force a state split with a specific note about the contract of the operator, such as Value A may remain uninitialized when B is called (ρ=0.56), given C, assuming D and under E conditions (we'll have to also make sure that these preconditions are compatible with the current state). That's the ideal solution because it gives us the perfect modeling that we want and gives us ultimate flexibility.
  2. We can disable inlining of operator>>() and maybe other system header functions that take out-parameters.
  3. We can suppress bug reports that would have caused the no-store visitor to emit its notes in system headers.

In this patch i'm implementing solution 3.

Solution 1 is not incompatible with solutions 2 and 3; it can be incrementally added on top of either solution 2 or solution 3 as a more man-hour-expensive incremental improvement (suppress inlining/reporting but also unsuppress by modeling a few known functions explicitly).

Solution 2 is similar to our container inlining heuristic: just don't bother inlining 'cause we'll never understand what's going on anyway. I ended up hating that heuristic and dreaming of carefully removing it and replacing it with visitor-based suppressions such as the solution 3. By suppressing inlining, we have all of the downsides of the conservative evaluation: we end up exploring obviously infeasible execution paths because we're losing information about the program. Solution 3 is more targeted: it only suppresses reports of a specific checker for which the problem has actually manifested rather than all checkers, it doesn't cause arbitrary unpredictable coverage skew, and it doesn't destroy any valid information that we managed to obtain during inlining.

The patch is trivial and mostly consists of inevitable renaming functions and variables. There's one interesting gotcha though: if the function has no branches whatsoever, disable the suppression. Like, if the function unconditionally fails to initialize anything, the developer probably knows about that. I think we should do more of such un-suppressions. This was inspired by a test case in test/Analysis/new.cpp that otherwise regressed:

200 int testNoInitializationPlacement() {
201   int n;
202   new (&n) int; // Doesn't initialize 'n'!
203
204   if (n) { // expected-warning{{Branch condition evaluates to a garbage value}}
205     return 0;
206   }
207   return 1;
208 }

Diff Detail

Repository
rL LLVM

Event Timeline

NoQ created this revision.Apr 1 2019, 5:35 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 1 2019, 5:35 PM
NoQ edited the summary of this revision. (Show Details)Apr 1 2019, 5:35 PM

(whoops, forgot Alexey)

Charusso accepted this revision.Apr 4 2019, 11:32 AM

It is very good to try one improvement in another similar function.

clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
364 ↗(On Diff #193211)

R was cool, that is our unspoken naming convention. What about CallR/CallRegion and possibly CallSVal? If I am right usually the BugReport object is named BR because of our regions.

547 ↗(On Diff #193211)

Update that comment?

552 ↗(On Diff #193211)

What about tryToEmitNote()? This 'or' condition is very uncommon for a function name. Also there is another R what could be BR.

567 ↗(On Diff #193211)

The Phabricator comment is better because it has a cool example. What about merging that into this one?

clang/test/Analysis/no-store-suppression.cpp
3 ↗(On Diff #193211)

Could you inject a link for the diff or copy the information for further improvements why no diagnostic happen?

This revision is now accepted and ready to land.Apr 4 2019, 11:32 AM
NoQ updated this revision to Diff 193828.Apr 4 2019, 7:39 PM
NoQ marked 6 inline comments as done.

Address comments. Thanks!

clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
364 ↗(On Diff #193211)

My unspoken (well, not anymore, i guess) naming convention usually goes like this:

  • B for BinaryOperator (emphasis on binary-ness, U for unary so they don't conflict)
  • R for BugReport (emphasis on being a report)
  • BR for BugReporter (for the lack of better name),
  • MR for MemRegion (specific classes of regions go like VR, FR, TR, SR, or, well, BR)
  • BRC for BugReporterContext (emphasis on being a context)

Also V for SVal (emphasis on value) because S for Stmt.

clang/test/Analysis/no-store-suppression.cpp
3 ↗(On Diff #193211)

Links to diffs are rarely included in the source code because they're usually two clicks away anyway: just do git blame and see the bottom of the commit message.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 5 2019, 1:17 PM