This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][UninitializedObjectChecker] Fixed dereferencing
ClosedPublic

Authored by Szelethus on Aug 21 2018, 12:41 PM.

Details

Summary

This patch aims to fix derefencing, which has been debated for months now.

Instead of working with SVals, the function now relies on TypedValueRegion.

Since this has been discussed since the inception of the checker, I'd very much prefer finding a permanent solution for this before moving forward.

Diff Detail

Repository
rL LLVM

Event Timeline

Szelethus created this revision.Aug 21 2018, 12:41 PM

@NoQ, is this how you imagined derefercing to work? I'm still a little uncertain that I implemented this in a way that addresses all your concerns.

NoQ added a comment.Aug 21 2018, 1:21 PM

Yup, this looks great and that's exactly how i imagined it.

lib/StaticAnalyzer/Checkers/UninitializedObject/UninitializedObject.h
261–265 ↗(On Diff #161796)

We have a fancy static Loc::isLocType().

lib/StaticAnalyzer/Checkers/UninitializedObject/UninitializedPointee.cpp
126–127 ↗(On Diff #161796)

Good catch.

It might still be possible to initialize a reference with an already-undefined pointer if core checkers are turned off, but we don't support turning them off, so i guess it's fine.

177 ↗(On Diff #161796)

Just SVal. And you should be able to pass R directly into getSVal.

210 ↗(On Diff #161796)

We usually just do .getAsRegion().

And then later dyn_cast it.

212–216 ↗(On Diff #161796)

Ok, i have a new concern that i didn't think about before.

There's nothing special about symbolic regions. You can make a linked list entirely of concrete regions:

struct List {
  List *next;
  List(): next(this) {}
};

void foo() {
  List list;
}

In this case the finite-ness of the type system won't save us.

I guess we could also memoize base regions that we visit :/ This is guaranteed to terminate because all of our concrete regions are based either on AST nodes (eg. global variables) or on certain events that happened previously during analysis (eg. local variables or temporaries, as they depend on the stack frame which must have been previously entered). I don't immediately see a better solution.

223 ↗(On Diff #161796)

We might have eliminated symbolic regions but we may still have concrete function pointers (which are TypedRegions that aren't TypedValueRegions, it's pretty weird), and i guess we might even encounter an AllocaRegion (which is completely untyped). I guess we should not try to dereference them either.

240–244 ↗(On Diff #161796)

This code seems to be duplicated with the "0th iteration" before the loop. I guess you can put everything into the loop.

Szelethus updated this revision to Diff 162198.EditedAug 23 2018, 9:17 AM

Addressed the inline comments.

  • Added a llvm::SmallSet to further ensure the termination of the loop.
  • Changed isLocType to isDereferencableType, block pointers are regarded as primitive objects.
  • Added new tests.
  • No longer using getDynamicTypeInfo, dynamic type is acquired from TypedValueRegion::getLocationType.
Szelethus marked 5 inline comments as done.Aug 23 2018, 9:27 AM
Szelethus added inline comments.
lib/StaticAnalyzer/Checkers/UninitializedObject/UninitializedObject.h
259 ↗(On Diff #162198)

I'm not sure this is correct -- do block pointers belong here? Since their region is not TypedValueRegion, I though they better fit here.

261–265 ↗(On Diff #161796)

Oh, good to know! However, it also returns true for nullptr_t, which also happens to be a BuiltinType. I'd like to keep isPrimitiveType and (the now renamed) isDereferencableType categories disjunctive. Primitive types require no further analysis other then checking whether they are initialized or not, which is true for nullptr_t objects.

lib/StaticAnalyzer/Checkers/UninitializedObject/UninitializedPointee.cpp
240–244 ↗(On Diff #161796)

I moved some code into the loop, but I think that I really need a 0th iteration to make the code readable.

Szelethus added a comment.EditedAug 29 2018, 12:23 PM

I managed to cause a bunch of crashes with this patch, so consider it as a work in progress for now :/

Szelethus updated this revision to Diff 163185.Aug 29 2018, 1:57 PM

Fixed two crashes. Interesting how many much better this approach works, as I've never encountered FunctionTyped objects or only forward declared typed pointers after dereferencing.

Szelethus updated this revision to Diff 163499.Aug 31 2018, 4:24 AM

Reuploaded with -U99999. Oops.

Szelethus updated this revision to Diff 163992.EditedSep 5 2018, 2:44 AM

Fixed an assertation failure where a lambda field's this capture was a part of the fieldchain.

Szelethus added inline comments.Sep 5 2018, 2:53 AM
lib/StaticAnalyzer/Checkers/UninitializedObject/UninitializedPointee.cpp
126–127 ↗(On Diff #161796)

I removed it, because it did crash couple times on LLVM. Note that the assert checked whether the reference for undefined, not uninitialized :/.

It's no longer in the code, but this was it:

assert(!FR->getDecl()->getType()->isReferenceType() &&
       "References must be initialized!");
test/Analysis/cxx-uninitialized-object.cpp
879–902 ↗(On Diff #163992)

I'm 99% sure this is a FP, but it doesn't originate from the checker. Shouldn't *ptr be undef after the end of the code block as lambda's lifetime ends?

Nevertheless, it did cause a crash, so here's a quick fix for it.

NoQ added inline comments.Sep 6 2018, 6:18 PM
lib/StaticAnalyzer/Checkers/UninitializedObject/UninitializedPointee.cpp
126–127 ↗(On Diff #161796)

Hmm, indeed, it seems that we don't have a checker for garbage references, only for null references(?).

I guess it's a good idea for a checker.

test/Analysis/cxx-uninitialized-object.cpp
879–902 ↗(On Diff #163992)

I'm pretty sure that all sorts of contents of lambda aka *ptr are undefined once it goes out of scope. Moreover, ptr is now a dangling pointer, and reading from it would cause undefined behavior. I'm not sure if the analyzer actually models this though.

But on the other hand, even if it didn't go out of scope, i don't really see where field a was initialized here.

Soo what makes you think it's a false positive?

Szelethus added inline comments.Sep 7 2018, 3:56 AM
test/Analysis/cxx-uninitialized-object.cpp
879–902 ↗(On Diff #163992)

Soo what makes you think it's a false positive?

Poor choice of words I guess. Its not a false positive (as the entire region of that lambda is undefined), but rather a false negative, as the analyzer doesn't pick up that *ptr is a dangling pointer.

I ran the checker on LLVM/Clang and grpc, and saw no crashes at all! Quite confident about the current diff. However, I'm not 100% sure it doesn't break on Objective C++ codes. Can you recommend a project like that?

NoQ accepted this revision.Sep 13 2018, 6:56 PM

Looks good i guess. I'm glad this is sorted out^^

This revision is now accepted and ready to land.Sep 13 2018, 6:56 PM
This revision was automatically updated to reflect the committed changes.
isuckatcs added inline comments.
cfe/trunk/lib/StaticAnalyzer/Checkers/UninitializedObject/UninitializedPointee.cpp
222

@Szelethus @NoQ @xazax.hun I ran into an issue related to this line.

Consider this test case:

struct CyclicPointerTest1 {
  int *ptr; // expected-note{{object references itself 'this->ptr'}}
  int dontGetFilteredByNonPedanticMode = 0;

  CyclicPointerTest1() : ptr(reinterpret_cast<int *>(&ptr)) {} // expected-warning{{1 uninitialized field}}
};

At this point, with the example above, the store looks like this:

{"kind": "Direct", "offset": 0, "extent": 64, value: &Element{temp_object{CyclicPointerTest1, S915}.ptr,0 S64b,int}}
{"kind": "Direct", "offset": 64, "extent": 32, value: 0 S32b}

which means the values are the following:

V = &Element{temp_object{CyclicPointerTest1, S915}.ptr,0 S64b,int}
R = Element{temp_object{CyclicPointerTest1, S915}.ptr,0 S64b,int}
DynT = PointerType ... 'int *'

when State->getSVal(R, DynT) is called, eventually RegionStoreManager::getBinding() will also be invoked, where because R is an ElementRegion we reach these lines:

if (const ElementRegion *ER = dyn_cast<ElementRegion>(R)) {
  // FIXME: Here we actually perform an implicit conversion from the loaded
  // value to the element type.  Eventually we want to compose these values
  // more intelligently.  For example, an 'element' can encompass multiple
  // bound regions (e.g., several bound bytes), or could be a subset of
  // a larger value.
  return svalBuilder.evalCast(getBindingForElement(B, ER), T, QualType{});
}

What happens here is that we call getBindingForElement(B, ER) and whatever value it returns, we cast to T, which is int * in this example. The type of the element inside the ER however is an int, so the following lookup will be performed:

Binding being looked up: 
  {"kind": "Direct", "offset": 0, "extent": 32}
------------------------------------------------------------------------------------------------------------------------
Store:
  {"kind": "Direct", "offset": 0, "extent": 64 value: &Element{temp_object{CyclicPointerTest1, S915}.ptr,0 S64b,int}}
  {"kind": "Direct", "offset": 64, "extent": 32 value: 0 S32b}

Since extents are ignored by default, the returned value will be &Element{temp_object{CyclicPointerTest1, S915}.ptr,0 S64b,int}}, which is indeed a pointer, however I'm not sure that the lookup here is actually correct.

As far as I understand, we want to lookup an int element based on it's region and we receive a pointer value. Is it because I misunderstand something, or it's an actual issue that is hid by how region store works right now?

Herald added a project: Restricted Project. · View Herald TranscriptMay 20 2023, 5:03 PM