[StaticAnalyzer] Checker to find uninitialized fields after a constructor call
Needs ReviewPublic

Authored by Szelethus on Wed, Apr 11, 11:05 AM.

Details

Summary

This checker checks at the end of a constructor call whether all fields of the object were initialized.

Note that a significant portion of the code is for handling unions, for which there is very little support in the CSA. In most cases, all fields of a union is regarded as unknown. I checked these cases by regarding unknown fields as uninitialized.

The test files also contain checks for heap allocated objects and unions, but heap regions are mostly conjured in the CSA and unions have little to no support, so the tests for them and unions should work theoretically, but its not 100%. so the tests are not yet functional (used no-warning at places where a warning is expected).

Diff Detail

Szelethus created this revision.Wed, Apr 11, 11:05 AM

Please always upload all patches with full context (-U99999)

Reuploaded the diff with full context.

NoQ added a comment.Wed, Apr 11, 11:44 AM

That's a lot of code, so it'll take me some time to understand what's going on here. You might be able to help me by the large patch in smaller logical chunks.

The idea looks reasonable. We're having specific warnings for, eg., assigning an uninitialized value, which fire much more often now that we have proper constructor support (eg. they'll fire in the copy constructor when the value was not initialized in the "actual" constructor). While some bugs can be found this way, i agree that it's not the best way to find them.

That said, your checker finds non-bugs, which is always risky. Sometimes the user should be allowed to initialize the field after construction but before the first use as a potentially important performance optimization. Did you find any such intentionally uninitialized fields with your checker? Were there many of those? If a user wants to have his codebase "analyzer-clean", would he be able to suppress the warning without changing the behavior of the program?

If union support is causing problems and you have a lot of workaround in the checker for these problems, i'll definitely suggest removing these workarounds from the initial commit and replacing them with FIXME comments (even if it assumes completely suppressing all warnings on classes that mention unions anywhere in the chain) because i'd much rather move towards better modeling in RegionStore than having checkers work around that. You might also be interested in D45241.

In most cases, all fields of a union is regarded as unknown. I checked these cases by regarding unknown fields as uninitialized.

The whole point of making them unknown, as far as i understand, was to suppress uninitialized field warnings. So i feel what we're doing here is a bit upside down.

test/Analysis/ctor-uninitialized-member.cpp
684

I managed to find the thread, but this link doesn't work for me.

870

Hmm, shouldn't it say "expected"? Do these tests actually work?

Szelethus added inline comments.Wed, Apr 11, 11:48 AM
test/Analysis/ctor-uninitialized-member.cpp
870

Since unions are not yet supported by the CSA, this is only what would be expected from this checker to find in the future. I purposely wrote 'xpected' so tests wouldn't break.

In D45532#1064652, @NoQ wrote:

In most cases, all fields of a union is regarded as unknown. I checked these cases by regarding unknown fields as uninitialized.

The whole point of making them unknown, as far as i understand, was to suppress uninitialized field warnings. So i feel what we're doing here is a bit upside down.

I wasn't too clear on this one: unknown fields are regarded as uninitialized, what I wrote was a temporary change in the code so I could check whether it would work.

I agree with you regarding unions, and will remove the code that relies on non existing implementation.

NoQ added a comment.Wed, Apr 11, 11:57 AM

Ok, thanks, looking forward to it!

test/Analysis/ctor-uninitialized-member.cpp
870

Aha, okay, got it. The tradition is to say "// no-warning" and then put a FIXME nearby saying that it should warn, how it should warn, probably why it should warn, if it's non-obvious.

I wasn't too clear on this one: unknown fields are regarded as uninitialized, what I wrote was a temporary change in the code so I could check whether it would work.

Woops, I meant that unknown fields are regarded az initialized.

NoQ added a comment.Wed, Apr 11, 1:19 PM

I wasn't too clear on this one: unknown fields are regarded as uninitialized, what I wrote was a temporary change in the code so I could check whether it would work.

Woops, I meant that unknown fields are regarded az initialized.

Aha, yep, that makes a lot more sense =)

@NoQ Do you reckon these tests files are too long? Perhaps the one about this inheritance, that inheritance, diamond inheritance, etc. could be split into multiple files.

include/clang/StaticAnalyzer/Checkers/Checkers.td
321

This always pokes me in the eye, but shouldn't this file be sorted alphabetically?

lib/StaticAnalyzer/Checkers/CtorUninitializedMemberChecker.cpp
104

I believe fundamental object should be rephrased as of fundamental type (as in: object that is of fundamental type), as the standard talks about fundamental types.

188

I think somewhere there should be a bit of reasoning why exactly these cases are ignored by the checker.

219

Maybe one could use a Twine or a string builder to avoid all these unneccessary string instantiations? Or std::string::append()?

317

(See, you use fundamental type here.)

341

Please put : after TODO.

343

What does it refer to in this sentence?

422

Please insert a : after TODO here too.

476

Maybe this could be made more explicit (as opposed to a comment) by using StringRef::rtrim(StringRef), like this:

return OS.str().rtrim('.').rtrim("->");

How does this code behave if the Chain is empty and thus OS contains no string whatsoever? drop_back asserts if you want to drop more elements than the length of the string.

499

Comment isn't formatted as full sentence.

502

Neither here.

test/Analysis/ctor-uninitialized-member.cpp
684

Confirmed, this link is a 404.

whisperity requested changes to this revision.Thu, Apr 12, 2:29 AM
whisperity added subscribers: gsd, dkrupp, o.gyorgy.

Sorry, one comment has gone missing meanwhile, I'm still getting used to this interface and hit Submit early.

test/Analysis/ctor-uninitialized-member-inheritance.cpp
25

The literal 420 is repeated everywhere in this file. I think this (the same value appearing over and over again) will make debugging bad if something goes haywire and one has to look at memory dumps, control-flow-graphs, etc.

This revision now requires changes to proceed.Thu, Apr 12, 2:29 AM
MTC added a subscriber: MTC.Thu, Apr 12, 5:10 AM
Szelethus added a comment.EditedThu, Apr 12, 8:27 AM

Thank you for all your comments so far! I'll probably only be able to update the diff tomorrow (with me being in the GMT + 1 timezone).

That's a lot of code, so it'll take me some time to understand what's going on here. You might be able to help me by the large patch in smaller logical chunks.

I know, and I tried to look for ways to split this checker into smaller logical chunks, but I didn't find a satisfying solution. I'll be sure to review the comments I have in the code so it's as easy as possible to understand what I did and why I did it. Also, I'd be delighted to help any way I can to guide you through it! :)

That said, your checker finds non-bugs, which is always risky. Sometimes the user should be allowed to initialize the field after construction but before the first use as a potentially important performance optimization. [...] If a user wants to have his codebase "analyzer-clean", would he be able to suppress the warning without changing the behavior of the program?

I didn't implement anything explicitly to suppress warnings, so if there is nothing in the CSA by default for it, I'm afraid this isn't possible (just yet).

My checker is mainly a tool to enforce the rule that every field should be initialized in a C++ object. While I see that this approach could result reduced performance, I think it's fine to say that those users who find this important (by 'this' I mean not initializing every field) should not enable this checker.

Nevertheless, I'd be happy to know what you think of this.

Did you find any such intentionally uninitialized fields with your checker? Were there many of those?

I ran the checker on some projects, but it's little difficult to analyze larger ones as this checker emits very important information in notes, and those are not yet part of the plist output. But it's coming: https://reviews.llvm.org/D45407! I'll be sure to answer these questions as soon as I can.

lib/StaticAnalyzer/Checkers/CtorUninitializedMemberChecker.cpp
188

At this function's declaration I have some comments describing what this does and why it does it. Did you find it insufficient?

test/Analysis/ctor-uninitialized-member-inheritance.cpp
25

Would you say that I should rather use a different number for each test case?

Szelethus added inline comments.Thu, Apr 12, 8:29 AM
test/Analysis/ctor-uninitialized-member-inheritance.cpp
25

I mean, a different number for each variable within a test case.

xazax.hun added inline comments.Thu, Apr 12, 8:50 AM
lib/StaticAnalyzer/Checkers/CtorUninitializedMemberChecker.cpp
32

As far as I understand, for every operation, the only relevant contribution of the non-last elements in the chain is the diagnostic message.
I wonder if you would build a string instead of a FieldRegion chain and only store the last FieldRegion would make this more explicit in the code.

188

I am a bit surprised why these errors are not deduplicated by the analyzer. Maybe it would worth some investigation?

273

I find the documentation and the name of the function below misleading. isNonUnionUninit tells me this function is not supposed to be called for unions but the documentation suggests otherwise.

284

Why do you copy here explicitly instead of taking the Chain argument by value?

293

Don't you need to pop somewhere?

Szelethus added inline comments.Thu, Apr 12, 10:11 AM
lib/StaticAnalyzer/Checkers/CtorUninitializedMemberChecker.cpp
188

Copied from BugReports constructor documentation for uniqueing:

The reports that have the same report location, description, bug type, and ranges are uniqued - only one of the equivalent reports will be presented to the user. [...]

For this code snippet:

struct A {
   struct B {
      int x, y;
      
      B() {}
   } b;
   int w;

   A() {
      b = B();
   }
};

the warning message after a call to A::A() would be "3 uninitialized fields[...]", and for B::B() inside As constructor would be "2 uninitialized fields[...]", so it wouldn't be filtered out.

293

Nice catch! This isn't covered by tests, and would most probably cause an incorrect note message. I'll be sure to fix it.

MTC added inline comments.Fri, Apr 13, 4:46 AM
lib/StaticAnalyzer/Checkers/CtorUninitializedMemberChecker.cpp
11

typo :) cunstructors -> constructors

55

When I apply this patch, hit a compiler warning. Maybe friend struct is better.

warning: 'FieldChainInfoComparator' defined as a struct here but previously declared as a class [-Wmismatched-tags]
xazax.hun added inline comments.Fri, Apr 13, 5:32 AM
lib/StaticAnalyzer/Checkers/CtorUninitializedMemberChecker.cpp
36

I was wondering, do you need the chain at all? I think a field region might be sufficient. The enclosing object of the field should be accessible by querying the super region of the field region.

Szelethus marked 6 inline comments as done.Sun, Apr 15, 10:48 AM

I'm about to update the diff, I changed a quite a lot of stuff, so I'm not sure that I'd be able to respond to these inline comments.

lib/StaticAnalyzer/Checkers/CtorUninitializedMemberChecker.cpp
32

I experimented with a number of implementations along this idea, but I just couldn't get it to work. Here are my findings:

  • Twine-s aren't meant to be copied, so using them didn't work out
  • StringRefs for some reason get invalidated by the time it'd make a call to FieldChainInfo::getAsString()
  • I didn't like the idea of storing the string because it'd not only impact performance greatly, but it'd also make the code a lot harder to understand, as opposed to making it more explicit

I did however try to make the implementation more easy to understand.

36

I tried it, but that approach made it impossible to include pointers and references in the fieldchain.

219

Doesn't move semantics take care of that? I'm not too sure on this one.

Szelethus marked 2 inline comments as done.
Szelethus edited the summary of this revision. (Show Details)

Among many other things:

  • The checker class is now on top of the file.
  • Reviewed all comments, fixed typos, tried to make the general idea more understandable.
  • Removed all (at least, all I could find) unnecessary functions and function arguments.
  • Removed support for unions entirely.
Szelethus marked 24 inline comments as done.Sun, Apr 15, 11:21 AM

Note that there was a comment made about the test files being too long. I still haven't split them, as I didn't find a good "splitting point". Is this okay, or shall I try to split these into numerous smaller ones?

Szelethus marked an inline comment as done.Sun, Apr 15, 11:22 AM

Would be interesting to extend this checker (maybe in an upcoming patch) to report on uninitialized members not only in constructors, but also copy constructors and move constructors.

See related https://bugs.llvm.org/show_bug.cgi?id=37086

This bug report also mentions assignment operator. But for that a warning may be not so useful. In that case the members of the assigned to object should have some initialized value already which the programmer may not want to overwrite in the assignment operator.

Szelethus updated this revision to Diff 142624.EditedMon, Apr 16, 6:23 AM

Would be interesting to extend this checker (maybe in an upcoming patch) to report on uninitialized members not only in constructors, but also copy constructors and move constructors.

Added 3 new test cases to cover them. Interestingly, move constructors don't emit any warnings - the core can only assert that the fields after a move construction are valid (returns true for SVal::isValid()).
Came to think of it, I'm not 100% confident in the checkers name. It could be misleading, as this checker doesn't check constructors, but rather objects after construction. The end of a constructor call is only the point at which we know that analysis can be done.

This bug report also mentions assignment operator. But for that a warning may be not so useful. In that case the members of the assigned to object should have some initialized value already which the programmer may not want to overwrite in the assignment operator.

I believe there's a checker for that already, but I'm really not sure whether UndefinedAssignmentChecker covers all such cases.

Also, I managed to cause a crash with the class linked_ptr_internal from google's boringssl when I analyzed the grpc project. I'll look deeper into this, but I have a strong suspicion that the error lies within the CSA core.

Szelethus marked 2 inline comments as done.Mon, Apr 16, 6:29 AM

This bug report also mentions assignment operator. But for that a warning may be not so useful. In that case the members of the assigned to object should have some initialized value already which the programmer may not want to overwrite in the assignment operator.

I believe there's a checker for that already, but I'm really not sure whether UndefinedAssignmentChecker covers all such cases.

Indeed, there are two different targets for analysis here:

  • The rhs of the assignment contains an uninitialised value, that is copied/moved into the this of the assignment. --> This should be checked by UndefinedAssignmentChecker.
  • Not every field is initialised by the assignment operator. This one, I believe, is hard to check and prove, and also feels superfluous. As @NoQ wrote earlier, the amount of reports this checker emits in itself is, at least expected to be, huge, and while it can be a valid programming approach that a ctor must initialise all, tracking in the CSA on an operator= that the this-side had something uninitialised but it was also not initialised by the assignment? Hard, and also not very useful, as far as I see.

In D45532#1064652, @NoQ wrote:

That said, your checker finds non-bugs, which is always risky. Sometimes the user should be allowed to initialize the field after construction but before the first use as a potentially important performance optimization. Did you find any such intentionally uninitialized fields with your checker? Were there many of those?

Where I can vision the usefulness of this checker the most is code that is evolving. There are many communities and codebases where coding standards and practices aren't laid out in a well-formed manner like LLVM has. There are also projects which are traditionally C code that has evolved into C++-ish code. When moving into C++, people start to realise they can write things like constructors, so they write them, but then leave out some members, and we can only guess (and perhaps make use of other checkers related to dereference or read of undefineds) what needs to be initialised, what is used without initialisation.

This checker is more close to something like a bugprone- Tidy matcher from the user's perspective, but proper analysis requires it to be executed in the SA.

A valid constriction of what this checker can find could be members that can "seemingly" be only be set in the constructor, there are no write operations or public exposure on them.

I have reviewed some findings of the checker, consider the following very trivial example:

#define HEAD_ELEMENT_SPECIFIER 0xDEADF00D // some magic yadda

struct linked_list;

struct linked_list
{
    int elem;
    linked_list* next;
};

class some_information_chain_class
{
  public:
    some_information_chain_class() : iList(new linked_list())
    {
      iList->elem = HEAD_ELEMENT_SPECIFIER;
    }

  private:
    linked_list* iList;
};

In this case, the "next" is in many cases remain as a garbage value. Of course, the checker cannot know, if, for example, there is also a count field and we don't rely on next == nullptr to signify the end of the list. But this can very well be a mistake from the programmer's end.

If a user wants to have his codebase "analyzer-clean", would he be able to suppress the warning without changing the behavior of the program?

I'm not entirely sure what you mean by "changing the behaviour of the program". As far as I can see, this checker only reads information from the SA, so it should not mess up any preconception set or state potentially used by other checkers.


I didn't implement anything explicitly to suppress warnings, so if there is nothing in the CSA by default for it, I'm afraid this isn't possible (just yet).

So far the only thing I can think of is coming up with new command-line arguments that can fine-tune the behaviour of the checker - but in this case, you can just as well just toggle the whole thing to be disabled.

An improvement heuristic (implemented in a later patch, toggleable with a command-line option perhaps?), as discussed above, would be checking only for members for whom there isn't any "setter" capability anywhere in the type.

whisperity requested changes to this revision.EditedTue, Apr 17, 3:19 AM

There is something that came up in my mind:

Consider a construct like this:

class A
{
  A()
  {
    memset(X, 0, 10 * sizeof(int));
  }

  int X[10];
};

I think it's worthy for a test case if this X is considered unitialised at the end of the constructor or not. (And if not, a ticket, or a fix for SA in a subpatch.)


Also, there are some debug prints left behind in the current patch.

This revision now requires changes to proceed.Tue, Apr 17, 3:19 AM
MTC added a comment.Tue, Apr 17, 4:11 AM

There is something that came up in my mind:

Consider a construct like this:

class A
{
  A()
  {
    memset(X, 0, 10 * sizeof(int));
  }

  int X[10];
};

I think it's worthy for a test case if this X is considered unitialised at the end of the constructor or not. (And if not, a ticket, or a fix for SA in a subpatch.)

Just for a reminding, there is a very primitive patch to model memset(), see D44934. And if we think that memset is a way of initialization, we may need to distinguish between POD objects and non-POD Objects.

Szelethus updated this revision to Diff 142760.Tue, Apr 17, 4:44 AM

Also, I managed to cause a crash with the class linked_ptr_internal from google's boringssl when I analyzed the grpc project. I'll look deeper into this, but I have a strong suspicion that the error lies within the CSA core.

I was very much wrong on that, the checker didn't handle cyclic references. I fixed that and some other crashes I encountered when checking the LLVM project.

To summarize, in this diff I

  • added some more test cases,
  • fixed crashes,
  • removed unnecessary headers,
  • fixed some typos in the documentation.