This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Fix autodetection of getSVal()'s type argument.
ClosedPublic

Authored by NoQ on Sep 28 2017, 7:20 AM.

Details

Summary

In ProgramState::getSVal(Loc, QualType), the QualType parameter, which represents the type of the expected value, is optional. If it is not supplied, it is auto-detected by looking at the memory region in Loc. For typed-value regions such autodetection is trivial. For other kinds of regions (most importantly, for symbolic regions) it apparently never worked correctly: it detected the location type (pointer type), not the value type in this location (pointee type).

Our tests contain numerous cases when such autodetection was working incorrectly, but for existing tests it never mattered. There are three notable places where the type was regularly auto-detected incorrectly:

  1. ExprEngine::performTrivialCopy(). Trivial copy from symbolic references never worked - test case attached.
  2. CallAndMessageChecker::uninitRefOrPointer(). Here the code only cares if the value is Undefined or not, so autodetected type didn't matter.
  3. GTestChecker::modelAssertionResultBoolConstructor(). This is how the issue was found in https://bugs.llvm.org/show_bug.cgi?id=34305 - test case attached.

I added a few more sanity checks - we'd now also fail with an assertion if we are unable to autodetect the pointer type, warning the author of the checker to pass the type explicitly.

It is sometimes indeed handy to put a void-pointer-typed Loc into getSVal(Loc), as demonstrated in the exercise-ps.c's f2() test through CallAndMessageChecker (case '2.' above). I handled this on the API side in order to simplify checker API, implicitly turning void into char, though i don't mind modifying CallAndMessageChecker to pass CharTy explicitly instead.

Diff Detail

Event Timeline

NoQ updated this revision to Diff 116986.Sep 28 2017, 7:20 AM
NoQ created this revision.

Add a forgotten comment.

NoQ updated this revision to Diff 116992.Sep 28 2017, 8:01 AM
NoQ added a subscriber: alexfh.

Add @alexfh's small reproducer test case. It was so small i never noticed it until now!

Thank you for the fix! It seems to be the largest source of SA crashes on our code at the moment.

dcoughlin edited edge metadata.Sep 28 2017, 4:57 PM

This is such a nasty bug! It is great to see a fix. I have two comments inline, one of which is just a nit.

lib/StaticAnalyzer/Core/RegionStore.cpp
1404

Nit: It seems a bit odd to read the first byte here since (unless I'm misunderstanding) this would never be triggered by actual C semantics, only by a checker. Did you consider just returning UnknownVal() in this case?

1410

I think you missed handling the AllocaRegion case from the old version in your new version. This means the assert will fire on the following when core.alpha is enabled:

void foo(void *dest) {
  void *src = __builtin_alloca(5);
  memcpy(dest, src, 1);
}
NoQ updated this revision to Diff 117360.Oct 2 2017, 8:58 AM

Yeah, nice catch. So we need to either always tell the checkers to specify their CharTy when they are dealing with void pointers, or to do our substitution consistently, not only for SymbolicRegion but also for AllocaRegion (did that in this diff).

dcoughlin accepted this revision.Oct 2 2017, 10:23 AM

Looks great to me.

I do wonder if long-term we should consider removing the auto-deduction when loading from the store. On the one hand it is really nice to avoid having to specify that at the call to getSVal(). On the other hand, this can lead to some really pathological weirdness and a bunch of corner-case code. For loads that result from actual program semantics the type of the loaded-from storage (from the loader's perspective) should always be easily available at the load site. There would still be cases where a checker might want to inspect what the store thinks is in memory, but I think that is a different function and in my opinion deserves a separate API.

This revision is now accepted and ready to land.Oct 2 2017, 10:23 AM
NoQ added a comment.Oct 4 2017, 8:59 AM

Whoops forgot to submit inline comments.

lib/StaticAnalyzer/Core/RegionStore.cpp
1404

UnknownVal seems to be quite similar to assertion failure: in both cases we'd have to force the checker to specify CharTy explicitly. But assertions are better because the author of the checker would instantly know that he needs to specify the type, instead of never noticing the problem, or spending a lot of time in the debugger trying to understand why he gets an UnknownVal.

1410

Nice one!

Always been that way though. In the old code, if AllocaRegion is supplied and no type is explicitly specified, cast<SymbolicRegion>(MR) would fail. Also ElementRegion is created in both old and new code if the type was specified.

This revision was automatically updated to reflect the committed changes.
dcoughlin added inline comments.Oct 4 2017, 10:14 AM
lib/StaticAnalyzer/Core/RegionStore.cpp
1404

I think having an assertion with a helpful message indicating how the checker author could fix the problem would be great! But you're not triggering an assertion failure here since you're changing T to be CharTy. Instead, you're papering over the problem by making up a fake value that doesn't correspond to the program semantics. If you're going to paper over the the problem and return a value, I think it is far preferable to use UnknownVal. This would signal "we don't know what is going here" rather than just making up a value that is likely wrong.

NoQ added inline comments.Oct 4 2017, 10:48 AM
lib/StaticAnalyzer/Core/RegionStore.cpp
1404

If you're going to paper over the the problem and return a value, I think it is far preferable to use UnknownVal.

If we return an UnknownVal, the user would never figure out what's wrong by looking at the value, i.e. what limitation of the analyzer he's stepping on (in fact he isn't, he's just using the APIs incorrectly, while we know very well what's going on). Additionally, returning the first byte makes API simpler in cases the type actually doesn't matter (which is why it's not provided), unlike returning UnknownVal. This makes me think that returning UnknownVal is worse than both returning first byte and stepping on assertion. However, yeah, if we try to model an API that manipulates objects of well-defined types through void pointers, returning bytes might cause issues when the user doesn't realize he needs to specify his types, especially when RegionStore would often ignore the type and only take the offset, unless it has no bindings and needs to return region value or derived symbol (so the user may easily fail to test this case).

I guess i'd follow up with a patch to remove the defensive behavior and bring back the assertion and modify checkers to provide types when necessary, and later see if it's worth it to remove auto-detection completely.