This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] mark returns of functions where the region passed as parameter was not initialized
ClosedPublic

Authored by george.karpenkov on Jan 8 2018, 6:50 PM.

Details

Summary

In the wild, many cases of null pointer dereference, or uninitialized value read occur because the value was meant to be initialized by the inlined function, but did not, most often due to error condition in the inlined function.
This change highlights the return branch taken by the inlined function, in order to help user understand the error report and see why the value was uninitialized.

rdar://36287652

Diff Detail

Repository
rL LLVM

Event Timeline

The case which I don't know how to handle for know is "void" functions without a return statement. Where the note should be attached in those cases?

NoQ added a comment.EditedJan 8 2018, 7:00 PM

Where the note should be attached in those cases?

At the closing brace, i guess. You can actually match for the CallExitBegin program point - regardless of the return statement.

This is great!

I think you'll also want to handle the case where the region of interest is a field and the containing struct/class is passed to the function (same for when the region of interest is an array element). Subregion::isSubRegionOf() will probably be helpful there.

You'll also probably want to handle when the region of interest is a sub region of 'this'/'self'.

If we find that too many notes are being emitted (once you handle 'self' I think that will be a real possibility) you may want to add a heuristic that only displays the notes when a function has control flow or when some other path may write to the region of interest. (This last could be done with a simple, separate, flow-insensitive syntactic analysis).

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
166 ↗(On Diff #129022)

It would be good to extend the comment to say that it emits the diagnostic note when the region of interested is a *passed to a function* but the function doesn't store into it.

187 ↗(On Diff #129022)

Do you want to only add this note once per path? Or could there be multiple function calls per path that we would want to show to the user?

194 ↗(On Diff #129022)

Note that ObjCMethodDecl isn't a subtype of FunctionDecl, so your later logic won't kick in for ObjC methods.

You can use the CallEvent interface to abstract over this. See visitNodeMaybeUnsuppress() for an example.

200 ↗(On Diff #129022)

Note that the parent context of the current context may not always be context of the caller. It could be a block invocation context or (in the near future) any other lexical scope of interest.

You should get the parent of SCtx instead (or, really, just use CallEvent instead).

205 ↗(On Diff #129022)

I think it would be good to not emit the note for const parameters since it is unlikely that the programmer expects those to update the region.

227 ↗(On Diff #129022)

The convention is to start the note with a capital letter.

Also, we typically want the note to describe something that happens at that particular program point (in contrast with something that didn't happen previously along the path).

I would suggest something like "Returning without writing to '*p'".

A nice QoI improvement would be to specialize this for the uninitialized case: "Returning without initializing "'*p'", but that should probably wait until later.

228 ↗(On Diff #129022)

Note the difference between writing to 'p' and writing to '*p'. You'll probably want to pass in the region of interest as well as the parameter to construct a human readable description -- although in some cases you probably won't be able to unambiguously come up with one.

test/Analysis/no-store-func-path-notes.m
1 ↗(On Diff #129022)

I think it would be fine to collapse these tests all into a single .mm file, but that's up to you.

I like this patch, only found a few nits after the other's review.

I have one high-level comment though regarding how could the analyzer be more user-friendly (and only slightly related to this patch). It is really good to have additional notes to aid the understanding of the user but having redundant notes, on the other hand, can make it harder to understand the bugs. My main concern is that we use UndefVal for both uninitialized values and values that are undefined due to an operation like shift by a negative value. I wonder if we would like to use the same set of bug path visitors in both cases.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
173 ↗(On Diff #129022)

Nit: extra space before public.

190 ↗(On Diff #129022)

Asterisk on the wrong side.

202 ↗(On Diff #129022)

arg should start with an uppercase letter.

NoQ added a comment.Jan 9 2018, 12:04 PM

I have one high-level comment though regarding how could the analyzer be more user-friendly (and only slightly related to this patch). It is really good to have additional notes to aid the understanding of the user but having redundant notes, on the other hand, can make it harder to understand the bugs. My main concern is that we use UndefVal for both uninitialized values and values that are undefined due to an operation like shift by a negative value. I wonder if we would like to use the same set of bug path visitors in both cases.

Btw, do you have any thoughts on treating UndefinedVal as a sub-kind of taint? Like, replace it with a normal symbol (eg. SymbolRegionValue for uninitialized locals, or SymIntExpr for $x << 32) and then attaching a taint to it? It should help tracking the values, understanding their origins, and it matches the idea that undefined values, like user inputs, may indeed be equal to any value within their respective ranges (unlike normal values which are only known to be possibly equal to at least one value within the range). However, a lot of this seems redundant because we are trying to immediately catch undefs (there's really no point in propagating them or allowing the user to remove the taint by constraining the symbol to a constant), while taint analysis is not yet even enabled by default, and also we don't have a good value class to represent many undefined values (eg., how do we represent a result of pointer dereference? - we may still want to make some sort of NullPointerRegion for the different purpose of computing null pointer offsets correctly, so then we'd be able to take SymbolRegionValue from it and taint it, but, again, we're not anywhere near there yet).

george.karpenkov marked 6 inline comments as done.
george.karpenkov added inline comments.
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
187 ↗(On Diff #129022)

From what I've seen, the situation you describe is very rare: the most common idiom is along the lines of

S s;
int out = initialize(&s);
s->doStuff(); // optionally forgetting to check the exit code

but then it probably does not hurt to add notes to all functions touching the region.

228 ↗(On Diff #129022)

Pretty-printing MemRegion * results in a name of the argument at the function call site, which is not very understandable in the context of a called function.
@dcoughlin would it make sense to do something like

"Returning without writing to 'p' (x not populated)"

? I think not.

test/Analysis/no-store-func-path-notes.m
1 ↗(On Diff #129022)

Technically these languages have different semantics, but on the other hand it's good not to pollute test/Analysis further.

george.karpenkov marked 3 inline comments as done.Jan 9 2018, 4:16 PM

You'll also probably want to handle when the region of interest is a sub region of 'this'/'self'.
If we find that too many notes are being emitted (once you handle 'self' I think that will be a real possibility) you may want to add a heuristic that only displays the notes when a function has control flow or when some other path may write to the region of interest. (This last could be done with a simple, separate, flow-insensitive syntactic analysis).

I don't think that's a good idea. In fact, I think that's extending this patch to C++/Obj-CPP is already a stretch.

The pattern declare-but-not-initialize-and-pass-address-to-a-func occurs almost exclusively in C,
as in C++ a default constructor is usually invoked (and structs in general are populated using a constructor).

I think you'll also want to handle the case where the region of interest is a field and the containing struct/class is passed to the function (same for when the region of interest is an array element). Subregion::isSubRegionOf() will probably be helpful there.

done

You'll also probably want to handle when the region of interest is a sub region of 'this'/'self'.
If we find that too many notes are being emitted (once you handle 'self' I think that will be a real possibility) you may want to add a heuristic that only displays the notes when a function has control flow or when some other path may write to the region of interest. (This last could be done with a simple, separate, flow-insensitive syntactic analysis).

I don't think that's a good idea. In fact, I think that's extending this patch to C++/Obj-CPP is already a stretch.

The pattern declare-but-not-initialize-and-pass-address-to-a-func occurs almost exclusively in C,
as in C++ a default constructor is usually invoked (and structs in general are populated using a constructor).

The "pass off the address of an uninitialized local" pattern is quite common in Objective-C. It is also quite common to forget to initialize a field in a C++ constructor.

Note that this visitor will be called when the uninitialized value comes from malloc()/new and not just from an uninitialized local whose address is taken.

Also note that this visitor is called for tracking both uninitialized values and also nil/NULL values.

In the spirit of incremental development, I'm fine with holding off on fully supporting structs and C++/Objective-C classes until a later patch. But these are important cases that you'll need to support.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
199 ↗(On Diff #129182)

Should this use 'getArgSVal()' on CallEvent?

229 ↗(On Diff #129182)

Some cases to consider:

void foo(int *p) {
  p = NULL;

  return; // "Returning without writing to 'p'" is not correct since p was written to
}

Also:

struct S {
  int f;
  int g;
};
void foo (struct S *p) {
  p->f = 5;
  // Forget to initialize field 'g'

  return; // "Returning without writing to 'p'" is also misleading.
}
187 ↗(On Diff #129022)

Do you mean the base region or the actual region of interest? It probably doesn't make sense to emit a note for (for example) a setter method that only touches an other field than the field for the region of interest.

228 ↗(On Diff #129022)

I was imagining you would use the MemRegion to construct the series of lvalue projections to get from the base region to the subregion and display that in terms of the parameter.
For example, p->foo.bar[2].blam. Here you would construct ->foo.bar[2].blam from MemRegion and p from the parameter name.

NoQ added a comment.Jan 10 2018, 3:36 PM

This looks good so far, i only have minor nits to add to Devin's review. As for checking that the value wasn't written to (in the null value case), i guess you could scan all program states within the call and see if getSVal(R) returns the same value on all of them. Because i always forget in which direction does the visitor scan the nodes, it'd either be done easily when visiting previous nodes, or would require another loop through nodes, but in any case it shouldn't be too complicated.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
239–240 ↗(On Diff #129189)

I believe that you should (because you can) assert that you're only dealing with SubRegions (or even type-check it). It doesn't make sense to track null or undefined values for memory spaces, only for [their] sub-regions.

test/Analysis/diagnostics/undef-value-param.c
15 ↗(On Diff #129189)

These warnings seem to be out of sync with the patch.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
245 ↗(On Diff #130053)

Any objections on adding this helper to ExplodedNode ? We have it in CheckerContext already, but it's not available for visitors.

NoQ added a comment.Jan 17 2018, 10:42 AM

More nits!~

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
228–229 ↗(On Diff #130081)

I'm for having a TODO for making an additional diagnostic for this situation, i.e. "Assignment does not change the value of p".

231 ↗(On Diff #130081)

Should this also be isSubRegionOf()?

246–247 ↗(On Diff #130081)

I suspect that you might want to take the canonical type first (i.e. unwrap typedefs).

george.karpenkov marked 2 inline comments as done.Jan 17 2018, 5:08 PM
george.karpenkov added inline comments.
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
228–229 ↗(On Diff #130081)

Makes sense, but I'm not really in favor of mixing two completely different tasks in one visitor.

Now with support for constructors.

Handling signatures with no param names correctly.

Update: I have examined the diagnostics on XNU source, there were 9 messages and 7 of them were helpful.
I guess less then we could have hoped for, but helpful still.

Minor refactoring.

Replace quadratic algorithm with linear.

NoQ added a comment.EditedJan 22 2018, 3:40 PM

Replace quadratic algorithm with linear.

Yay, nice. I think it's also much easier to understand now.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
178 ↗(On Diff #130927)

SmallPtrSet? (because it's cute)

230–231 ↗(On Diff #130927)

Please add a comment explaining why is this pre-computation so good. Like, every time we reach CallExit, we'd need information from CallEnter (pointed-to values), which we didn't reach yet, so it's easier to just do this once for the whole path than to do this for every call, and cheaper on memory than adding every call site's store to the program state as a special trait.

369 ↗(On Diff #130927)

I suggest a simple "Not implemented" assertion, because there is a clear intent to support more regions once they actually start showing up.

1346–1348 ↗(On Diff #130927)

cast<SubRegion>(R) would combine the assertion and the cast.

Actually, i'm surprised we don't have MemRegion::castAs<>().

george.karpenkov marked 3 inline comments as done.
george.karpenkov added inline comments.
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
369 ↗(On Diff #130927)

The branch is actually unreachable: otherwise the while loop above would have return false beforehand.

NoQ added inline comments.Jan 23 2018, 12:01 PM
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
369 ↗(On Diff #130927)

Whoops i see.

TODO: for a single parameter, when passed as a pointer, display the star (the function is not writing into "p", the function is not writing into "*p")
TODO2: update the backup text, writing "inside the".

TODO3: heuristics for handling arrays?

I've realized that we don't handle the case where the function parameter is not directly pointing to a region-of-interested, but needs to be dereferenced a couple of times before that.

@dcoughlin I think this is ready to go. The "stars" are aligned correctly, however the default case is still the same. Basically the default case is for unforeseen cases which I currently don't know exist --- normally, it would be an assertion, but it seems to much to crash an analyzer over that.
If the current output is undesirable I think that another good option is just not printing the note in that case at all.

george.karpenkov updated this revision to Diff 134763.
NoQ accepted this revision.Feb 23 2018, 12:47 PM

I think this is ready to go. The "stars" are aligned correctly

LGTM. Hope they're still aligned =)

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
356 ↗(On Diff #134765)

Please add parentheses because nobody remembers operation priority which makes the code hard to understand.

This revision is now accepted and ready to land.Feb 23 2018, 12:47 PM
This revision was automatically updated to reflect the committed changes.