This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][UninitializedObjectChecker] Fixed a false negative by no longer filtering out certain constructor calls
ClosedPublic

Authored by Szelethus on Jun 21 2018, 8:10 AM.

Details

Summary

As of now, all constructor calls are ignored that are being called by a constructor. The point of this was not to analyze the fields of an object, so an uninitialized field wouldn't be reported multiple times.

This however introduced false negatives when the two constructors were in no relation to one another -- see the test file for a neat example for this with singletons.

This patch aims so fix this issue.

Diff Detail

Repository
rC Clang

Event Timeline

Szelethus created this revision.Jun 21 2018, 8:10 AM

Moved LC = LC ->getParent() to the while statement's argument to avoid a potential infinite loop. Whoops :)

george.karpenkov accepted this revision.Jun 26 2018, 5:46 PM

LGTM with a nit.

lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
676

nit: could we have while (LC) followed by LC = LC->getParent() ? Do you intentionally skip the first location context?

This revision is now accepted and ready to land.Jun 26 2018, 5:46 PM
NoQ added a comment.Jun 26 2018, 6:07 PM

I think we need to finish our dialog on who's responsible for initialization and why do we need to filter constructors at all, cause it's kinda hanging (i.e. D45532#inline-422673).

NoQ requested changes to this revision.Jun 26 2018, 6:07 PM
This revision now requires changes to proceed.Jun 26 2018, 6:07 PM
Szelethus added a comment.EditedJun 27 2018, 3:05 AM
In D48436#1144380, @NoQ wrote:

I think we need to finish our dialog on who's responsible for initialization and why do we need to filter constructors at all, cause it's kinda hanging (i.e. D45532#inline-422673).

Agreed.

I have some code in the works, and I planned to upload it as a separate patch. Note that I added TODO's around base class handling to avoid forgetting this.

Here's my current thinking of the topic: I am experimenting with no longer ignoring base initializations, instead of manually going over them, and and I really like where it's going. This has a number of benefits:

  • This immediately solves the issue where there are multiple copies of the same base is in an object (non-virtual "diamond" inheritance), a wrong note message was emitted
  • It would make more sense, as each constructor would be responsible to handle direct fields
  • It would limit the number of notes emitted per object

The solution is based upon your idea (D45532#inline-415396).

However, while this new approach is more towards making each constructor take care of their own fields, I think a derived class should be responsible for calling the correct base constructor. This means that even if that was compiler-generated, a call to it as a base initialization should trigger analysis from the checker.
There a number of heuristics that could be added to this, like ignore privately inherited bases.

The reason why I decided not to upload it alongside this patch is that it changes a lot of code.
For example:

struct Base {
  int a;
};

struct Derived {
  int b;
  Derived() : b(123) {
    a = 456;
  }
};

A call to Derived::Derived() previously emitted no warnings. However, with these changes, a warning is emitted for Base::a. Since cxx-uninitialized-object-inheritance.cpp is full of code like the example seen above, the whole file needs to be rewritten.

I'm also stopped by a bug where a warning is not emitted in the text output, but is emitted in the plist output. I'm already neck-deep in this issue, but I'll probably need some more time.

Note that you also mentioned that it isn't obvious enough what rules does this checker follows -- this should be resolved with a documentation file of some sort, and I plan to fix it in a different patch.

I'd love to hear your thoughts on this :)

[...]why do we need to filter constructors at all[...]

Else you'd get get warning for the same uninitialized fields multiple times. For example (although it's not a part of this function) compiler-generated constructors are often provided so a type can be stored in a container that required a default constructor.

Polite ping ^-^

NoQ added a comment.Jul 12 2018, 12:43 PM

A call to Derived::Derived() previously emitted no warnings. However, with these changes, a warning is emitted for Base::a.

Yep, a heuristic to skip implicit constructor declarations during analysis and make the first explicit constructor responsible for initializing its bases/fields that have been constructed via autogenerated constructors seems reasonable. With that i guess you'll still have to deep-scan fields and bases, which prevents us from simplifying our code. I'll think about that a bit more; it might be worth it to track such deferred subregions in a state trait and drain it whenever we pop back to an explicit constructor.

lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
683–693

It seems safer to look at CXXConstructExpr::getConstructionKind().

Taking a LazyCompoundVal and converting it back to a region is definitely a bad idea because the region within LazyCompoundVal is completely immaterial and carries no meaning for the value represented by this SVal; it's not necessarily the region you're looking for.

NoQ added inline comments.Jul 12 2018, 12:45 PM
lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
676

I guess the predicate we're checking is trivially true for the current location context.

I'll think about that a bit more; it might be worth it to track such deferred subregions in a state trait and drain it whenever we pop back to an explicit constructor.

There are so many things to consider, like the following case:

struct DynTBase {};
struct DynTDerived : DynTBase {
  // TODO: we'd expect the note: {{uninitialized field 'this->x'}}
  int x; // no-note
};

struct DynamicTypeTest {
  DynTBase *bptr;
  int i = 0;

  // TODO: we'd expect the warning: {{1 uninitialized field}}
  DynamicTypeTest(DynTBase *bptr) : bptr(bptr) {} // no-warning
};

void f() {
  DynTDerived d;
  DynamicTypeTest t(&d);
};

As of now, the checker misses this one.
We talked about two approaches so far, the one already in the checker and the one I proposed now. I think the current implementation solves this issue a lot more naturally -- isNonUnionUninit has to be changed so that the fields dynamic type is checked rather then its static type.
The proposed method would have a tough time dealing with this -- we don't want an error for DynTDerived::DynTDerived(), since it's a compiler generated ctor, but ideally we wouldn't like to have a warning for an object (t) and a separate warning for it's field (t.bptr.x), but I'm afraid this new approach would make this a necessity.

So, bottom line, there are a bunch of TODOs in the code to make sure this issue is not forgotten, I strongly believe that we need a separate discussion for this, if you'd agree :)

lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
676

Completely missed this inline, sorry!

As @NoQ said, since this checker only fires after a constructor call, the first location context will surely be that, and I'm only checking whether the current ctor was called by another.

Szelethus added inline comments.Jul 16 2018, 10:16 AM
lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
676

I'm actually not sure how you imagined that. The loop condition is LC, but if I assign it to its parent right after that, I won't be sure that it's non-null. The reason why I placed it there is that early exits could restart the loop at "random" places.

683–693

Looking at the singleton test case, I think I need to get that region somehow, and I'm not too sure what you meant under using CXXConstructExpr::getConstructionKind(). One thing I could do, is similar to how getObjectVal works:

Loc Tmp = Context.getSValBuilder().getCXXThis(OtherCtor->getParent(),
                                              Context.getStackFrame());

auto OtherThis = Context.getState()->getSVal(Tmp).castAs<loc::MemRegionVal>();

OtherThis.getRegion(); // Hooray!

I have tested it, and it works wonders. Is this a "safe-to-use" region?

Polite ping :)

NoQ added a comment.Jul 25 2018, 6:03 PM

Sorry, i'm still lagging with my reviews because there are a lot of them and i have to balance it with doing actual work. I'll hopefully get to this soon.

lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
683–693

You can directly ask CXXConstructExpr if it's a field or a base constructor. I could describe getConstructionKind() as some sort of very limited ConstructionContext that's available in the AST: it explains to you what sort of object is being constructed. I suspect it's enough for your purposes.

Szelethus added inline comments.Jul 30 2018, 8:51 AM
lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
683–693

I think I'm aware of how it works, but my issue is that it's very limited:

enum ConstructionKind { CK_Complete, CK_NonVirtualBase, CK_VirtualBase, CK_Delegating }

Sadly, whether the CXXConstructExpr was a field construction or a "top level" (non-field) construction can't be retrieved from getConstructionKind(), as I understand it.

If I were to return true (say that the constructed object will be analyzed later, thus ignoring it for now) if getConstructionKind() == CK_Complete, the singleton testcase will fail (I have tested it). I'm very confident that I need to obtain the constructed object's MemRegion.

NoQ added inline comments.Jul 30 2018, 5:03 PM
lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
683–693

Whoops, yeah, i was wrong. I mis-remembered that there's a CK_ for field.

Then, yeah, the code looks good, let me recall where we were in terms of why do we need it><

Szelethus added inline comments.Jul 31 2018, 8:40 AM
lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
683–693

"it" as if?

george.karpenkov resigned from this revision.Aug 2 2018, 2:37 PM

Polite ping :)

NoQ accepted this revision.Aug 7 2018, 1:42 PM

Ok, let's commit this and see how to fix it later. I still think it's more important to come up with clear rules of who is responsible for initializing fields than making sure our warnings are properly grouped together.

ideally we wouldn't like to have a warning for an object (t) and a separate warning for it's field (t.bptr.x)

I don't quite understand this. How would the first warning look like? What would it warn about?

lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
671–682

All uses of getObjectVal so far are followed by retrieving the parent region from the LazyCompoundVal. Why do you need to obtain the LazyCompoundVal in the first place, i.e. do the second getSVal "for '*this'" in getObjectVal? Why not just operate over the this-region on the current Store? I think there isn't even a guarantee that these two regions are the same. Like, in this case they probably will be the same, but we shouldn't rely on that.

This revision is now accepted and ready to land.Aug 7 2018, 1:42 PM
In D48436#1191458, @NoQ wrote:

Ok, let's commit this and see how to fix it later.

Thanks! ^-^

I still think it's more important to come up with clear rules of who is responsible for initializing fields than making sure our warnings are properly grouped together.

I'll admit that the main reason why I only added a TODO about this issue and delayed it for later is that I'm a little unsure about some details myself. I spent serious effort on some ideas, which I later realized didn't work out too well, so I'm looking for a solid solution on some issues (like base class handling) that is both efficient to a degree and makes sense from a user standpoint before writing a doc file of some sort.

Currently, I'm working on refactoring the checker (which is why I didn't update some of my patches, as those changes would become obsolete), which will greatly increase readability and reliability (let's face it, the code about pointer chasing is a spaghetti).

[...]ideally we wouldn't like to have a warning for an object (t) and a separate warning for it's field (t.bptr.x)[...]

I don't quite understand this. How would the first warning look like? What would it warn about?

I'm afraid I explained my thoughts very poorly. I have a much better grasp on this issue now, and I have a very good solution in testing. I'll upload a new diff during the week that will clarify this.
Here's what I meant with the correct code:

struct DynTBase {
  // This is the line I meant but forgot to add.
  int x; // expected-note{{uninitialized field 'this->bptr->x'}}
};
struct DynTDerived : DynTBase {
  // TODO: we'd expect the note: {{uninitialized field 'this->bptr->y'}}
  int y; // no-note
};

struct DynamicTypeTest {
  DynTBase *bptr;
  
  int i = 0;

  DynamicTypeTest(DynTBase *bptr) : bptr(bptr) {} // expected-warning{{1 uninitialized field}}
};

void f() {
  DynTDerived d;
  DynamicTypeTest t(&d);
};

In this one, note that the constructed object t has two uninitialized fields, t.bptr->x and t.bptr->y. Currently, my checker misses the second one. The first one gets picked up rather easily, but then the problem arises about how do we deal with t.bptr->y. I proposed two ideas:

  1. In isNonUnionUninit, obtain the dynamic type of the object we're analyzing. This is the easier solution, as the current implementation explicitly checks each base. This would result in one warning with a note for each field.
  2. Change willObjectBeAnalyzedLater so that bases are analyzed on their own, and eliminate checking bases explicitly (D45532#inline-415396). The checker won't run when d is constructed, because it has a default ctor, but will run after t. How do we catch t.bptr->y? If we don't analyze bases explicitly, we can't just obtain the dynamic type in isNonUnionUninit, because then we'd miss t.bptr->x. If we decide that for fields we will explicitly check bases, that would be inconsistent, because in ts case we would analyze base classes on their own (if it had any), but we wouldn't for its fields. As you said,

With that i guess you'll still have to deep-scan fields and bases, which prevents us from simplifying our code.

I really just wanted to emphasize the point that I need more time to figure this out.

lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
671–682

Fair point, that part was written when I knew very little about how these things worked. I'll add a TODO to get that fixed before commiting.

Szelethus updated this revision to Diff 159687.Aug 8 2018, 5:15 AM

Added a TODO.

This revision was automatically updated to reflect the committed changes.