This is an archive of the discontinued LLVM Phabricator instance.

[StaticAnalyzer] Checker to find uninitialized fields after a constructor call
ClosedPublic

Authored by Szelethus on Apr 11 2018, 11:05 AM.

Details

Summary

This checker checks at the end of a constructor call whether all fields of the object were initialized.

Note that a significant portion of the code is for handling unions, for which there is very little support in the CSA. In most cases, all fields of a union is regarded as unknown. I checked these cases by regarding unknown fields as uninitialized.

The test files also contain checks for heap allocated objects and unions, but heap regions are mostly conjured in the CSA and unions have little to no support, so the tests for them and unions should work theoretically, but its not 100%. so the tests are not yet functional (used no-warning at places where a warning is expected).

Diff Detail

Repository
rC Clang

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Reuploaded the diff with full context.

NoQ added a comment.Apr 11 2018, 11:44 AM

That's a lot of code, so it'll take me some time to understand what's going on here. You might be able to help me by the large patch in smaller logical chunks.

The idea looks reasonable. We're having specific warnings for, eg., assigning an uninitialized value, which fire much more often now that we have proper constructor support (eg. they'll fire in the copy constructor when the value was not initialized in the "actual" constructor). While some bugs can be found this way, i agree that it's not the best way to find them.

That said, your checker finds non-bugs, which is always risky. Sometimes the user should be allowed to initialize the field after construction but before the first use as a potentially important performance optimization. Did you find any such intentionally uninitialized fields with your checker? Were there many of those? If a user wants to have his codebase "analyzer-clean", would he be able to suppress the warning without changing the behavior of the program?

If union support is causing problems and you have a lot of workaround in the checker for these problems, i'll definitely suggest removing these workarounds from the initial commit and replacing them with FIXME comments (even if it assumes completely suppressing all warnings on classes that mention unions anywhere in the chain) because i'd much rather move towards better modeling in RegionStore than having checkers work around that. You might also be interested in D45241.

In most cases, all fields of a union is regarded as unknown. I checked these cases by regarding unknown fields as uninitialized.

The whole point of making them unknown, as far as i understand, was to suppress uninitialized field warnings. So i feel what we're doing here is a bit upside down.

test/Analysis/ctor-uninitialized-member.cpp
683 ↗(On Diff #142053)

I managed to find the thread, but this link doesn't work for me.

869 ↗(On Diff #142053)

Hmm, shouldn't it say "expected"? Do these tests actually work?

Szelethus added inline comments.Apr 11 2018, 11:48 AM
test/Analysis/ctor-uninitialized-member.cpp
869 ↗(On Diff #142053)

Since unions are not yet supported by the CSA, this is only what would be expected from this checker to find in the future. I purposely wrote 'xpected' so tests wouldn't break.

In D45532#1064652, @NoQ wrote:

In most cases, all fields of a union is regarded as unknown. I checked these cases by regarding unknown fields as uninitialized.

The whole point of making them unknown, as far as i understand, was to suppress uninitialized field warnings. So i feel what we're doing here is a bit upside down.

I wasn't too clear on this one: unknown fields are regarded as uninitialized, what I wrote was a temporary change in the code so I could check whether it would work.

I agree with you regarding unions, and will remove the code that relies on non existing implementation.

NoQ added a comment.Apr 11 2018, 11:57 AM

Ok, thanks, looking forward to it!

test/Analysis/ctor-uninitialized-member.cpp
869 ↗(On Diff #142053)

Aha, okay, got it. The tradition is to say "// no-warning" and then put a FIXME nearby saying that it should warn, how it should warn, probably why it should warn, if it's non-obvious.

I wasn't too clear on this one: unknown fields are regarded as uninitialized, what I wrote was a temporary change in the code so I could check whether it would work.

Woops, I meant that unknown fields are regarded az initialized.

NoQ added a comment.Apr 11 2018, 1:19 PM

I wasn't too clear on this one: unknown fields are regarded as uninitialized, what I wrote was a temporary change in the code so I could check whether it would work.

Woops, I meant that unknown fields are regarded az initialized.

Aha, yep, that makes a lot more sense =)

@NoQ Do you reckon these tests files are too long? Perhaps the one about this inheritance, that inheritance, diamond inheritance, etc. could be split into multiple files.

include/clang/StaticAnalyzer/Checkers/Checkers.td
322

This always pokes me in the eye, but shouldn't this file be sorted alphabetically?

lib/StaticAnalyzer/Checkers/CtorUninitializedMemberChecker.cpp
103 ↗(On Diff #142053)

I believe fundamental object should be rephrased as of fundamental type (as in: object that is of fundamental type), as the standard talks about fundamental types.

187 ↗(On Diff #142053)

I think somewhere there should be a bit of reasoning why exactly these cases are ignored by the checker.

218 ↗(On Diff #142053)

Maybe one could use a Twine or a string builder to avoid all these unneccessary string instantiations? Or std::string::append()?

316 ↗(On Diff #142053)

(See, you use fundamental type here.)

340 ↗(On Diff #142053)

Please put : after TODO.

342 ↗(On Diff #142053)

What does it refer to in this sentence?

421 ↗(On Diff #142053)

Please insert a : after TODO here too.

475 ↗(On Diff #142053)

Maybe this could be made more explicit (as opposed to a comment) by using [[http://llvm.org/doxygen/classllvm_1_1StringRef.html#a3f45a78650a72626cdf9210de0c6d701|StringRef::rtrim(StringRef)]], like this:

return OS.str().rtrim('.').rtrim("->");

How does this code behave if the Chain is empty and thus OS contains no string whatsoever? drop_back asserts if you want to drop more elements than the length of the string.

498 ↗(On Diff #142053)

Comment isn't formatted as full sentence.

501 ↗(On Diff #142053)

Neither here.

test/Analysis/ctor-uninitialized-member.cpp
683 ↗(On Diff #142053)

Confirmed, this link is a 404.

whisperity requested changes to this revision.Apr 12 2018, 2:29 AM
whisperity added subscribers: gsd, dkrupp, o.gyorgy.

Sorry, one comment has gone missing meanwhile, I'm still getting used to this interface and hit Submit early.

test/Analysis/ctor-uninitialized-member-inheritance.cpp
24 ↗(On Diff #142053)

The literal 420 is repeated everywhere in this file. I think this (the same value appearing over and over again) will make debugging bad if something goes haywire and one has to look at memory dumps, control-flow-graphs, etc.

This revision now requires changes to proceed.Apr 12 2018, 2:29 AM
MTC added a subscriber: MTC.Apr 12 2018, 5:10 AM
Szelethus added a comment.EditedApr 12 2018, 8:27 AM

Thank you for all your comments so far! I'll probably only be able to update the diff tomorrow (with me being in the GMT + 1 timezone).

That's a lot of code, so it'll take me some time to understand what's going on here. You might be able to help me by the large patch in smaller logical chunks.

I know, and I tried to look for ways to split this checker into smaller logical chunks, but I didn't find a satisfying solution. I'll be sure to review the comments I have in the code so it's as easy as possible to understand what I did and why I did it. Also, I'd be delighted to help any way I can to guide you through it! :)

That said, your checker finds non-bugs, which is always risky. Sometimes the user should be allowed to initialize the field after construction but before the first use as a potentially important performance optimization. [...] If a user wants to have his codebase "analyzer-clean", would he be able to suppress the warning without changing the behavior of the program?

I didn't implement anything explicitly to suppress warnings, so if there is nothing in the CSA by default for it, I'm afraid this isn't possible (just yet).

My checker is mainly a tool to enforce the rule that every field should be initialized in a C++ object. While I see that this approach could result reduced performance, I think it's fine to say that those users who find this important (by 'this' I mean not initializing every field) should not enable this checker.

Nevertheless, I'd be happy to know what you think of this.

Did you find any such intentionally uninitialized fields with your checker? Were there many of those?

I ran the checker on some projects, but it's little difficult to analyze larger ones as this checker emits very important information in notes, and those are not yet part of the plist output. But it's coming: https://reviews.llvm.org/D45407! I'll be sure to answer these questions as soon as I can.

lib/StaticAnalyzer/Checkers/CtorUninitializedMemberChecker.cpp
187 ↗(On Diff #142053)

At this function's declaration I have some comments describing what this does and why it does it. Did you find it insufficient?

test/Analysis/ctor-uninitialized-member-inheritance.cpp
24 ↗(On Diff #142053)

Would you say that I should rather use a different number for each test case?

Szelethus added inline comments.Apr 12 2018, 8:29 AM
test/Analysis/ctor-uninitialized-member-inheritance.cpp
24 ↗(On Diff #142053)

I mean, a different number for each variable within a test case.

xazax.hun added inline comments.Apr 12 2018, 8:50 AM
lib/StaticAnalyzer/Checkers/CtorUninitializedMemberChecker.cpp
31 ↗(On Diff #142053)

As far as I understand, for every operation, the only relevant contribution of the non-last elements in the chain is the diagnostic message.
I wonder if you would build a string instead of a FieldRegion chain and only store the last FieldRegion would make this more explicit in the code.

187 ↗(On Diff #142053)

I am a bit surprised why these errors are not deduplicated by the analyzer. Maybe it would worth some investigation?

272 ↗(On Diff #142053)

I find the documentation and the name of the function below misleading. isNonUnionUninit tells me this function is not supposed to be called for unions but the documentation suggests otherwise.

283 ↗(On Diff #142053)

Why do you copy here explicitly instead of taking the Chain argument by value?

292 ↗(On Diff #142053)

Don't you need to pop somewhere?

Szelethus added inline comments.Apr 12 2018, 10:11 AM
lib/StaticAnalyzer/Checkers/CtorUninitializedMemberChecker.cpp
187 ↗(On Diff #142053)

Copied from BugReports constructor documentation for uniqueing:

The reports that have the same report location, description, bug type, and ranges are uniqued - only one of the equivalent reports will be presented to the user. [...]

For this code snippet:

struct A {
   struct B {
      int x, y;
      
      B() {}
   } b;
   int w;

   A() {
      b = B();
   }
};

the warning message after a call to A::A() would be "3 uninitialized fields[...]", and for B::B() inside As constructor would be "2 uninitialized fields[...]", so it wouldn't be filtered out.

292 ↗(On Diff #142053)

Nice catch! This isn't covered by tests, and would most probably cause an incorrect note message. I'll be sure to fix it.

MTC added inline comments.Apr 13 2018, 4:46 AM
lib/StaticAnalyzer/Checkers/CtorUninitializedMemberChecker.cpp
10 ↗(On Diff #142053)

typo :) cunstructors -> constructors

54 ↗(On Diff #142053)

When I apply this patch, hit a compiler warning. Maybe friend struct is better.

warning: 'FieldChainInfoComparator' defined as a struct here but previously declared as a class [-Wmismatched-tags]
xazax.hun added inline comments.Apr 13 2018, 5:32 AM
lib/StaticAnalyzer/Checkers/CtorUninitializedMemberChecker.cpp
35 ↗(On Diff #142053)

I was wondering, do you need the chain at all? I think a field region might be sufficient. The enclosing object of the field should be accessible by querying the super region of the field region.

Szelethus marked 6 inline comments as done.Apr 15 2018, 10:48 AM

I'm about to update the diff, I changed a quite a lot of stuff, so I'm not sure that I'd be able to respond to these inline comments.

lib/StaticAnalyzer/Checkers/CtorUninitializedMemberChecker.cpp
31 ↗(On Diff #142053)

I experimented with a number of implementations along this idea, but I just couldn't get it to work. Here are my findings:

  • Twine-s aren't meant to be copied, so using them didn't work out
  • StringRefs for some reason get invalidated by the time it'd make a call to FieldChainInfo::getAsString()
  • I didn't like the idea of storing the string because it'd not only impact performance greatly, but it'd also make the code a lot harder to understand, as opposed to making it more explicit

I did however try to make the implementation more easy to understand.

35 ↗(On Diff #142053)

I tried it, but that approach made it impossible to include pointers and references in the fieldchain.

218 ↗(On Diff #142053)

Doesn't move semantics take care of that? I'm not too sure on this one.

Szelethus marked 2 inline comments as done.
Szelethus edited the summary of this revision. (Show Details)

Among many other things:

  • The checker class is now on top of the file.
  • Reviewed all comments, fixed typos, tried to make the general idea more understandable.
  • Removed all (at least, all I could find) unnecessary functions and function arguments.
  • Removed support for unions entirely.
Szelethus marked 24 inline comments as done.Apr 15 2018, 11:21 AM

Note that there was a comment made about the test files being too long. I still haven't split them, as I didn't find a good "splitting point". Is this okay, or shall I try to split these into numerous smaller ones?

Szelethus marked an inline comment as done.Apr 15 2018, 11:22 AM

Would be interesting to extend this checker (maybe in an upcoming patch) to report on uninitialized members not only in constructors, but also copy constructors and move constructors.

See related https://bugs.llvm.org/show_bug.cgi?id=37086

This bug report also mentions assignment operator. But for that a warning may be not so useful. In that case the members of the assigned to object should have some initialized value already which the programmer may not want to overwrite in the assignment operator.

Szelethus updated this revision to Diff 142624.EditedApr 16 2018, 6:23 AM

Would be interesting to extend this checker (maybe in an upcoming patch) to report on uninitialized members not only in constructors, but also copy constructors and move constructors.

Added 3 new test cases to cover them. Interestingly, move constructors don't emit any warnings - the core can only assert that the fields after a move construction are valid (returns true for SVal::isValid()).
Came to think of it, I'm not 100% confident in the checkers name. It could be misleading, as this checker doesn't check constructors, but rather objects after construction. The end of a constructor call is only the point at which we know that analysis can be done.

This bug report also mentions assignment operator. But for that a warning may be not so useful. In that case the members of the assigned to object should have some initialized value already which the programmer may not want to overwrite in the assignment operator.

I believe there's a checker for that already, but I'm really not sure whether UndefinedAssignmentChecker covers all such cases.

Also, I managed to cause a crash with the class linked_ptr_internal from google's boringssl when I analyzed the grpc project. I'll look deeper into this, but I have a strong suspicion that the error lies within the CSA core.

Szelethus marked 2 inline comments as done.Apr 16 2018, 6:29 AM

This bug report also mentions assignment operator. But for that a warning may be not so useful. In that case the members of the assigned to object should have some initialized value already which the programmer may not want to overwrite in the assignment operator.

I believe there's a checker for that already, but I'm really not sure whether UndefinedAssignmentChecker covers all such cases.

Indeed, there are two different targets for analysis here:

  • The rhs of the assignment contains an uninitialised value, that is copied/moved into the this of the assignment. --> This should be checked by UndefinedAssignmentChecker.
  • Not every field is initialised by the assignment operator. This one, I believe, is hard to check and prove, and also feels superfluous. As @NoQ wrote earlier, the amount of reports this checker emits in itself is, at least expected to be, huge, and while it can be a valid programming approach that a ctor must initialise all, tracking in the CSA on an operator= that the this-side had something uninitialised but it was also not initialised by the assignment? Hard, and also not very useful, as far as I see.

In D45532#1064652, @NoQ wrote:

That said, your checker finds non-bugs, which is always risky. Sometimes the user should be allowed to initialize the field after construction but before the first use as a potentially important performance optimization. Did you find any such intentionally uninitialized fields with your checker? Were there many of those?

Where I can vision the usefulness of this checker the most is code that is evolving. There are many communities and codebases where coding standards and practices aren't laid out in a well-formed manner like LLVM has. There are also projects which are traditionally C code that has evolved into C++-ish code. When moving into C++, people start to realise they can write things like constructors, so they write them, but then leave out some members, and we can only guess (and perhaps make use of other checkers related to dereference or read of undefineds) what needs to be initialised, what is used without initialisation.

This checker is more close to something like a bugprone- Tidy matcher from the user's perspective, but proper analysis requires it to be executed in the SA.

A valid constriction of what this checker can find could be members that can "seemingly" be only be set in the constructor, there are no write operations or public exposure on them.

I have reviewed some findings of the checker, consider the following very trivial example:

#define HEAD_ELEMENT_SPECIFIER 0xDEADF00D // some magic yadda

struct linked_list;

struct linked_list
{
    int elem;
    linked_list* next;
};

class some_information_chain_class
{
  public:
    some_information_chain_class() : iList(new linked_list())
    {
      iList->elem = HEAD_ELEMENT_SPECIFIER;
    }

  private:
    linked_list* iList;
};

In this case, the "next" is in many cases remain as a garbage value. Of course, the checker cannot know, if, for example, there is also a count field and we don't rely on next == nullptr to signify the end of the list. But this can very well be a mistake from the programmer's end.

If a user wants to have his codebase "analyzer-clean", would he be able to suppress the warning without changing the behavior of the program?

I'm not entirely sure what you mean by "changing the behaviour of the program". As far as I can see, this checker only reads information from the SA, so it should not mess up any preconception set or state potentially used by other checkers.


I didn't implement anything explicitly to suppress warnings, so if there is nothing in the CSA by default for it, I'm afraid this isn't possible (just yet).

So far the only thing I can think of is coming up with new command-line arguments that can fine-tune the behaviour of the checker - but in this case, you can just as well just toggle the whole thing to be disabled.

An improvement heuristic (implemented in a later patch, toggleable with a command-line option perhaps?), as discussed above, would be checking only for members for whom there isn't any "setter" capability anywhere in the type.

whisperity requested changes to this revision.EditedApr 17 2018, 3:19 AM

There is something that came up in my mind:

Consider a construct like this:

class A
{
  A()
  {
    memset(X, 0, 10 * sizeof(int));
  }

  int X[10];
};

I think it's worthy for a test case if this X is considered unitialised at the end of the constructor or not. (And if not, a ticket, or a fix for SA in a subpatch.)


Also, there are some debug prints left behind in the current patch.

This revision now requires changes to proceed.Apr 17 2018, 3:19 AM
MTC added a comment.Apr 17 2018, 4:11 AM

There is something that came up in my mind:

Consider a construct like this:

class A
{
  A()
  {
    memset(X, 0, 10 * sizeof(int));
  }

  int X[10];
};

I think it's worthy for a test case if this X is considered unitialised at the end of the constructor or not. (And if not, a ticket, or a fix for SA in a subpatch.)

Just for a reminding, there is a very primitive patch to model memset(), see D44934. And if we think that memset is a way of initialization, we may need to distinguish between POD objects and non-POD Objects.

Szelethus updated this revision to Diff 142760.Apr 17 2018, 4:44 AM

Also, I managed to cause a crash with the class linked_ptr_internal from google's boringssl when I analyzed the grpc project. I'll look deeper into this, but I have a strong suspicion that the error lies within the CSA core.

I was very much wrong on that, the checker didn't handle cyclic references. I fixed that and some other crashes I encountered when checking the LLVM project.

To summarize, in this diff I

  • added some more test cases,
  • fixed crashes,
  • removed unnecessary headers,
  • fixed some typos in the documentation.

The code looks good apart from a few minor nits. I think I would prefer a new category created for this checker instead of using alpha; let's see what @NoQ has to say.

include/clang/StaticAnalyzer/Checkers/Checkers.td
305

alpha might not be a right place for this checker.
The meaning of alpha seems to be "this checker is currently too immature to be used by default because it generates too many FPs".

This is not the case for the uninitialized-fields checker, as it finds non-bugs by design, and each project might want to decide whether using such checker is a right fit.
I would suggest introducing a new category here, e.g. bugprone (other suggestions: lint?)
@NoQ any objections?

lib/StaticAnalyzer/Checkers/CtorUninitializedMemberChecker.cpp
162 ↗(On Diff #142760)

"returns"

212 ↗(On Diff #142760)

nitpicking: llvm::Twine is normally used for such constructs

448 ↗(On Diff #142760)

Might be a naive question, but what about a chain intermixing references and pointers?
Wouldn't it be simpler to write

while (T)
  if (T->isVoidPointerType())
    return true;
  T = T->getPointeeType();

?

test/Analysis/ctor-uninitialized-member.cpp
817 ↗(On Diff #142053)

What happened to e? Here and in all notes below.

antipatterns or security could be another potential category name.

@george.karpenkov @NoQ bugprone. as a category sounds nice. It also nicely corresponds to the Clang-Tidy bugprone- category. It would not be nice to further fragment the "top levels" of checker categories.

I can say with confidence that CodeChecker does not break if the same category name is used by two different analyzers. Does the same stand for XCode / Scan-Build? In this case, introducing the bugprone category with the same principle that is behind Tidy's one is a good step.

lib/StaticAnalyzer/Checkers/CtorUninitializedMemberChecker.cpp
162 ↗(On Diff #142760)

Actually, this explanation is superfluous. I believe

Returns if FD can be (transitively) dereferenced to a void* (void**, ...). The type of the region behind a void pointer isn't known, and thus FD can not be analyzed.

should suffice?

212 ↗(On Diff #142760)

I have also suggested the usage of Twine for this line (it's just the diff that got out of sync with the line numbers!), but I don't recall what @Szelethus' concern was about them. In case we have move semantics enabled, this line will compile into using the moving concatenator.

NoQ added a comment.EditedApr 23 2018, 11:56 AM

Guys, what do you think about a checker that warns on uninitialized fields only when at least one field is initialized? I'd be much more confident about turning such check on by default. We can still keep a pedantic version.

I can say with confidence that CodeChecker does not break if the same category name is used by two different analyzers. Does the same stand for XCode / Scan-Build?

Xcode and scan-build only support one analyzer, so it won't be a problem (at least not until more analyzers get supported).

But in general if we do overlapping names, we must be super sure that they mean the same thing in both analyzers, so that we didn't end up with a user interface in which one of the relatively common task would be to enable clang-tidy "bugprone" checkers and disable static analyzer "bugprone" checkers but the user would have to list all checkers by name in order to accomplish that because they all stay in the same package.

Szelethus added a comment.EditedApr 24 2018, 10:08 AM

I'd also like to point out that as I mentioned before, the checker's name itself is misleading (it is a leftover from an earlier implementation of this checker). Here are just some ideas I came up with:

  • UninitializedObjectChecker
  • UninitializedFieldsChecker
  • UninitializedFieldsAfterConstructionChecker
  • UninitializedMembersChecker
  • UninitializedMembersAfterConstructionChecker

Of these I like the first the most, but I'm open for anything, if you have an idea for it.

In D45532#1075789, @NoQ wrote:

Guys, what do you think about a checker that warns on uninitialized fields only when at least one field is initialized? I'd be much more confident about turning such check on by default. We can still keep a pedantic version.

Sounds good! I just finished implementing it along with a few minor (like some TODOs and fixes according to inline comments) and not-so-minor (like ignoring fields from system headers) changes. I'll update the diff and post results on it once I finish rechecking the LLVM/Clang project. I feel very confident about the upcoming version.

Szelethus updated this revision to Diff 143875.Apr 25 2018, 2:35 AM

In this diff I

  • added a Pedantic flag that is set to false by default to filter out results from objects that don't have a single field initialized,
  • made it so that fields that are declared in system headers are now ignored,
  • refactored isFullyInitialized to hasUnintializedFields (it returned true when there were in fact uninit fields),
  • fixed everything mentioned in inline comments aside from the naming and the category,
  • added TODOs for FieldChainInfo::toString, I decided to fix those in a later patch to keep the diff just a little bit smaller,
  • added many more test cases, including tests for the Pedantic flag
  • added support for arrays. Granted, they worked wonderfully with the checker before, but there was nothing mentioned about them in the code.

If you like how I implemented the Pedantic flag, then I think only the naming and choosing the correct category is left.

I also rechecked the entire LLVM/Clang project before the system header fix and after the fix with Pedantic set to true and set to false. Here are my findings:

How many reports did the checker emit?

  • Without fields in system headers being ignored: 208 (only functional had some fields uninitialized)
  • With fields in system headers being ignored and Pedantic set to true: 181
  • With fields in system headers being ignored and Pedantic set to false: 150

Most of these are intentional, as a very large portion of the project is performance critical. I did however find some constructors with the checker that would benefit from having the rest of their fields initialized.
I also found some constructors that didn't use = default for no good reason.

Szelethus marked 6 inline comments as done.Apr 25 2018, 2:37 AM
a.sidorin added inline comments.Apr 25 2018, 11:15 AM
lib/StaticAnalyzer/Checkers/CtorUninitializedMemberChecker.cpp
72 ↗(On Diff #143875)

This can be rewritten like `void toString(SmallVectorImpl<char> &Buf) const;

306 ↗(On Diff #143875)

What will happen if we analyze a member which is a pointer to a structure type and the structure is of incomplete type?

534 ↗(On Diff #143875)

while (!T.isNull())

test/Analysis/ctor-uninitialized-member.cpp
554 ↗(On Diff #143875)

Could you please explain what is the logic of test naming?

Szelethus updated this revision to Diff 144119.Apr 26 2018, 7:41 AM

Fixes according to inline comments.

Szelethus marked 3 inline comments as done.Apr 26 2018, 7:57 AM

By the way, thank you all for taking the time to review my code!

lib/StaticAnalyzer/Checkers/CtorUninitializedMemberChecker.cpp
306 ↗(On Diff #143875)

For that field isPointerOrReferenceUninit will be called. If it happens not to be null, unknown or undef, the CSA core will construct a symbolic region for it, and the checker always assumes that symbolic regions are initialized, meaning that we'll never call isNonUnionUninit for a field of incomplete type.

However, I did not cover this with tests, thanks for pointing it out!

test/Analysis/ctor-uninitialized-member.cpp
554 ↗(On Diff #143875)

The test files follow the strict structure that for each test case we'll define a structure type and then call its constructor(s) in a function right after it (functions are used to avoid zero initializations).

To be honest, there is no real logic behind the naming of the functions, it is only to keep the ODR. I used numbers in an increasing order, however I later added there cases, so in between f23 and f24 I used f23p5 and so on.

I figured that the strict structure of the test files would avoid confusion. Do you find it distracting?

Szelethus updated this revision to Diff 144439.EditedApr 28 2018, 2:31 AM
Szelethus marked an inline comment as done.

Renamed the checker to cplusplus.uninitialized.UninitializedObject.

I've read your comments regarding the category this should be in thoroughly, and should the clang-tidy category naming not be an issue, cplusplus.bugprone would've probably been the best option. However, since it is a concern, I did find many checkers in the core in the uninitialized subcategory, so I thought I'll place it in cplusplus.uninitialized for now. This could be a little too specific though, as I don't think many other checkers could be implemented that would reside in there.

What do you guys think?

Szelethus marked an inline comment as done.Apr 28 2018, 2:34 AM
Szelethus added inline comments.
test/Analysis/cxx-uninitialized-object.cpp
527

I haven't marked @a.sidorin's comment as done, but it disappeared because I renamed the file, so here it is:

Could you please explain what is the logic of test naming?

To which I replied:

The test files follow the strict structure that for each test case we'll define a structure type and then call its constructor(s) in a function right after it (functions are used to avoid zero initializations).

To be honest, there is no real logic behind the naming of the functions, it is only to keep the ODR. I used numbers in an increasing order, however I later added there cases, so in between f23 and f24 I used f23p5 and so on.

I figured that the strict structure of the test files would avoid confusion. Do you find it distracting?

Szelethus updated this revision to Diff 144440.Apr 28 2018, 2:42 AM

Forgot to rename the system header simulator file. Sorry about the spam :)

Two minor comments.

With regards to the function naming, the problem with incremental counts is that they get out of sync, like they did with fXpY and such, and if someone wants to keep the count incremental down the file, adding any new test will result in an unnecessarily large patch. Perhaps you could use void T_() for the test that calls T::T()?

test/Analysis/cxx-uninitialized-object.cpp
661

A newline between #endif and the next token is missing here.

1413

What is this comment symbolising? Is this actually something the test infrastructure picks up?

NoQ added a comment.Apr 30 2018, 7:47 PM

Had a look at the actual code, came up with some comments, most are fairly minor, so good job! You did a good job explaining the overall idea, but a lot of specifics were difficult to understand, so i made a few comments on how to make the code a bit easier to read.

I know, and I tried to look for ways to split this checker into smaller logical chunks, but I didn't find a satisfying solution.

What i would have done:

  1. A checker that straightforwardly iterates through immediate fields of the current object and reports uninitialized fields.
  2. Add base classes support.
  3. Heuristic for skipping sub-constructors could use a separate review.
  4. Start skipping arrays and unions.
  5. Add simple pointer dereference support - introduce chains of fields.
  6. More heuristics - the one for symbolic regions, the one for system header fields, etc.

etc.

I'm asking for this because it's, like, actually difficult and painful to understand formal information in large chunks. One-solution-at-a-time would have been so much easier. It is really a bad idea to start by presenting a working solution; the great idea is to present a small non-working solution with a reasonable idea behind it, and then extend it "in place" as much as it seems necessary. It is very comfortable when parts of the patch with different "status" (main ideas, corner-case fixes, heuristics, refactoring) stay in separate patches. Alpha checkers are essentially branches: because our svn doesn't have branches, we have alpha checkers and off-by-default flags. Even before step 1, you should be totally fine with committing an empty checker with a single empty callback and a header comment - even on such "step 0." we would be able to discuss if your approach to the checker seems right or might have inevitable issues, and it could have saved a lot of time.

Essentially every part of the solution that you implement, if you think it might deserve a separate discussion, also deserves a separate patch. It is reasonable to split the work into logical chunks before you start it. It's very unlikely that you have a single idea that takes 500 non-trivial lines of code to express. Splitting up ideas so that they were easy to grasp is an invaluable effort. I had very positive experience with that recently, both as a coder and as a reviewer, so i encourage that a lot.

lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
48

At a glance it looks like a great use case for an immutable list. It's easy to use the immutable list as a stack, and it's also O(1) to copy/move and take snapshots of it.

Note that you copy 10 pointers every time you copy or move a SmallVector. Small vectors don't make much sense as fields of actively used data structures - their main purpose is to live on the stack as local variables.

52–54

Please move this doc to the getter/setter function. That's where people expect to find it when they're reading code.

59–60

Do we really need the other field (IsDereferenced) to be mutable? Or maybe we can keep the whole object immutable? I.e., instead of dereference() we'll get a constructor that constructs a FieldChainInfo with the same chain but with the dereferenced flag set.

87–90

All of these can be replaced with a single ProgramStateRef object.

94

This field is worth documenting.

157–158

Please explain what these methods do in a more direct manner. Something like "This method returns true if an uninitialized field was found within the region. It should only be used with regions of union-type objects".

171–172

Something's missing before with.

177–178

The second sentence would look great at the call site.

182–183

Type of an object pointed to by a void pointer is something that our path-sensitive engine is sometimes able to provide. It is not uncommon that a void pointer points to a concrete variable on the current path, and we may know it in the analyzer. I'm not sure this sort of check is necessary (i.e. require more information).

You may also want to have a look at getDynamicTypeInfo() which is sometimes capable of retrieving the dynamic type of the object pointed to by a pointer or a reference - i.e. even if it's different from the pointee type of the pointer.

186

We're traditionally writing "end anonymous namespace" or "end of anonymous namespace".

224–226

I suspect that a fatal error is better here. We don't want the user to receive duplicate report from other checkers that catch uninitialized values; just one report is enough.

234–237

I believe that you only need the first Twine; then all operator+ calls will be resolved to operator+(Twine, ...) automatically. Or just use a string and a stream like pretty much all other checkers.

245

This is the third approach to concatenating strings in the last 10 lines of code. I think you should make a method FieldChainInfo::print(llvm::raw_ostream &) and use streams everywhere.

259–261

Aha, ok, got it, so you're putting the warning at the end of the constructor and squeezing all uninitialized fields into a single warning by presenting them as notes.

Because for now notes aren't supported everywhere (and aren't always supported really well), i guess it'd be necessary to at least provide an option to make note-less reports before the checker is enabled by default everywhere. In this case notes contain critical pieces of information and as such cannot be really discarded.

I'm not sure that notes are providing a better user experience here than simply saying that "field x.y->z is never initialized". With notes, the user will have to scroll around (at least in scan-build) to find which field is uninitialized.

Notes will probably also not decrease the number of reports too much because in most cases there will be just one uninitialized field. Though i agree that when there is more than one report, the user will be likely to deal with all fields at once rather than separately.

So it's a bit wonky.

We might want to enable one mode or another in the checker depending on the analyzer output flag. Probably in the driver (RenderAnalyzerOptions()).

It'd be a good idea to land the checker into alpha first and fix this in a follow-up patch because this review is already overweight.

293

Why is checking done lazily in the first place? Are there many situations in which we want to construct the object but never do the actual scan?

If it was actually necessary, i would have split out the side effect of this getter into a separate private method.

347

Please check the type explicitly, and then assert that the value is a Loc. There are a lot of nuances that you don't want to be dealing with here (eg. ObjCObjectPointerType, which can be present in C++ as well because Objective-C++ is a thing, is not a PointerType or a ReferenceType, but it is modeled with a Loc).

353

Please formalize what you mean by a fundamental type and assert that. You probably mean something like "an integer or a floating-point number or maybe an enum or, well, bool".

I suspect that many cases here are missed (complex types, vector types, ...).

360–373

Are there many warnings that will be caused by this but won't be caused by conducting the check for the constructor-within-constructor that's currently disabled? Not sure - is it even syntactically possible to not initialize the base class?

385

Please assert here that FR is a pointer/reference-type field.

387

As i mentioned above, you can simply store a ProgramStateRef within FindUninitializedFields. In this case this code will be simplified to State->getSVal(FR).

406–407

Is this loop guaranteed to terminate? Is something like void *v; v = &v; possible?

I think this loop should unwrap the AST type on every iteration and dereference the pointer accordingly.

In general, i believe that every symbolic pointer dereference should be made with awareness of the type of the object we're trying to retrieve. Which is an optional argument for State->getSVal(R, ...), but it seems that it should be made mandatory because in a lot of cases omitting it causes problems.

418

CSA is not a commonly used acronym around the codebase.

418

I'd like to change the word "conjure" to something else because in most cases it won't be an actual SymbolConjured, but it'd be a SymbolRegionValue or SymbolDerived. Conjuring occurs in different circumstances.

431

Add llvm_unreachable("") to find out :)

433

Needs a comment on why the IsDereferenced flag on this path.

481

Can we at least detect these situations and suppress the warning? It should be relatively easy for lambdas (captured variable mapping should be stored somewhere, as far as i remember), not sure how easy it'd be for double inheritance (there are fancy methods for scanning this stuff, but that's equivalent to actually implementing the correct warning message).

Weird warning messages are pretty scary to show to the users.

It'd be a good idea to land the checker into alpha first and fix this in a follow-up patch because this review is already overweight.

560–561

This doesn't check that the constructor on the parent stack frame is anyhow related to the current constructor, so it may introduce false negatives. For instance, a class may lazy-initialize a singleton but never store a reference to it within itself (because, well, it's a singleton, you can always obtain the reference to it). In this case we'll never find the bug in the constructor of the singleton.

test/Analysis/cxx-uninitialized-object.cpp
932

CSA is not a commonly used acronym around the codebase.

Szelethus added a comment.EditedMay 2 2018, 8:37 AM

Thanks @NoQ for taking the time to review my code!

I'm sorry that the patch became this long. It was an error on my part, and I completely get that the checker now takes an uncomfortably long time to read and understand. This is my first "bigger" project in the CSA, I learned a lot so far, and I'll definitely review how I should've handled developing and uploading it here better, so next time I will do my best to make my contributions a lot more reviewer-friendly.

edit: Not just review-wise, but development-wise. This could've easily turned out wrong as a whole, and if it was, it could've been avoided if I shared my thoughts here.

It's very unlikely that you have a single idea that takes 500 non-trivial lines of code to express.

That's true. Since this was the first time I wrote anything the StaticAnalyzer project, I often found myself almost completely rewriting everything. Looking back, I still could've uploaded the code in parts, so notes taken, will do better next time!

It'll be some time before I sort everything out you commented on, just wanted to leave this here.

lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
259–261

[...]i guess it'd be necessary to at least provide an option to make note-less reports before the checker is enabled by default everywhere. In this case notes contain critical pieces of information and as such cannot be really discarded.

This can already be achieved with -analyzer-config notes-as-events=true. There no good reason however not to make an additional flag that would emit warnings instead of notes.

I'm not sure that notes are providing a better user experience here than simply saying that "field x.y->z is never initialized". With notes, the user will have to scroll around (at least in scan-build) to find which field is uninitialized.

This checker had a previous implementation that emitted a warning for each uninitialized field, instead of squeezing them in a single warning per constructor call. One of the major reasons behind rewriting it was that it emitted so many of them that it was difficult to differentiate which warning belonged to which constructor call. Also, in the case of the LLVM/Clang project, the warnings from this checker alone would be in the thousands.

Notes will probably also not decrease the number of reports too much because in most cases there will be just one uninitialized field.

While this is true to some extent, it's not uncommon that a single constructor call leaves 7 uninitialized -- in the LLVM/Clang project I found one that had over 30!
Since its practically impossible to determine whether this was a performance enhancement or just poor programming, the few cases where it is intentional, an object would emit many more warnings would make make majority of the findings (where an object truly had only 1 or 2 fields uninit) a lot harder to spot in my opinion.

In general, I think one warning per constructor call makes the best user experience, but a temporary solution should be implemented until there's better support for notes.

I agree, this should be a topic of a follow-up patch.

test/Analysis/cxx-uninitialized-object.cpp
1413

// no-crash is something I found in many other test files, in this case it symbolizes that the point of this test isn't to check whether the checker correctly finds or ignores an uninit field, but that it doesn't crash.

NoQ added inline comments.May 3 2018, 6:31 PM
lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
259–261

This can already be achieved with -analyzer-config notes-as-events=true.

Yep. But the resulting user experience seems pretty bad to me. It was a good quick proof-of-concept trick to test things, but the fact that notes aren't path pieces is way too apparent in case of this checker. So i guess this flag was a bad idea.

Also, in the case of the LLVM/Clang project, the warnings from this checker alone would be in the thousands.

This makes me a bit worried, i wonder what's the reason why does the checker shout so loudly on a codebase that doesn't seem to have actual uninitialized value bugs all over the place.

Are any of these duplicates found in the same constructor for the same field in different translation units? Suppose we discard the duplicates (if any) and warn exactly once (across the whole LLVM) for every uninitialized field (which is probably more than once per constructor). Then I wonder:

  1. How many uninitialize fields would we be able to find this way?
  2. How many of such fields were intentionally left uninitialized?
  3. How many were found by deep inspection of the heap through field chains involving pointer dereference "links"?
NoQ added inline comments.May 4 2018, 2:30 PM
lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
259–261

(when i'm asking this sort of stuff, i don't mean you should look through thousands of positives, a uniform random sample of ~50 should be just fine)
(but we really do need this info before enabling stuff by default)

Szelethus added a comment.EditedMay 15 2018, 8:13 AM

I'm afraid it'll be even more time before I post a new diff. There are some things that ProgramState is lacking, so I'll get that fixed first in a different pull request.

Quick edit: When I said "non-unique warnings", I meant that warnings before uniqueing.

lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
259–261

This can already be achieved with -analyzer-config notes-as-events=true.

Yep. But the resulting user experience seems pretty bad to me. It was a good quick proof-of-concept trick to test things, but the fact that notes aren't path pieces is way too apparent in case of this checker. So i guess this flag was a bad idea.

I totally agree, I meant that a note-less solution should only be a (arguably better) workaround just as notes-as-events=true.

Are any of these duplicates found in the same constructor for the same field in different translation units? Suppose we discard the duplicates (if any) and warn exactly once (across the whole LLVM) for every uninitialized field (which is probably more than once per constructor).

Sure, having the same field field reported from the same constructor call in different TUs happens quite a lot, but they can easily be uniqued.

In the checkers current state, meaning that one warning per ctor call, the LLVM/Clang project in pedantic mode resulted in 409 non-unique warnings, and using CodeChecker I found that 181 of these is unique.

How many uninitialize fields would we be able to find this way?

In its current state, in pedantic mode, after uniqueing ~581. Now that's quite a bit off from what I remembered "the warnings from this checker alone would be in the thousands", but having one warning per uninit field would still result in a ~320% increase in reports.

How many of such fields were intentionally left uninitialized?

(I checked around 95 reports a couple weeks back before uploading this diff)
Almost all of them. Note that this is a very performance-critical project. From what I saw, many times several fields are left uninitialized, but these fields are inaccessible due to a kind field being asserted at their getter functions.
Also, I found cases where the constructor wasn't defaulted with = default;.

In fact, I struggled to find a case where a field was 100% left out by accident. Maybe in llvm/lib/IR/ConstantsContext.h, calling ConstantExprKeyType(ArrayRef<Constant *> Operands, const ConstantExpr *CE).

This is not the case however in other projects I checked. In Xerces for example every find was a true positive.

How many were found by deep inspection of the heap through field chains involving pointer dereference "links"?

This is somewhat harder to measure. The best I could come up with is uniqueing the fieldchains from the plist files with lexicographical sorting. Your question can be answered by measuring the length of the fieldchains. I found

  • 3 fieldchains with the length of 5
  • 18 fieldchains with the length of 4
  • 62 fieldchains with the lenfth of 3
  • 108 fieldchains with the length of 2
  • 137 fieldchains with the length of 1

(that is a total of 328 unique fieldchains lexicographically)
Because heap objects are not modeled, only 3 of the 328 fieldchains contains pointer dereferences. However, reference objects are very common from what I saw, but sadly I can't give you an exact number on those.

I hope these answers were satisfying, but I'll follow up with more information if needed.

360–373

I'm not a 100% sure what you mean. Can you clarify?

Not sure - is it even syntactically possible to not initialize the base class?

If I understand the question correctly, no, as far as I know.

431

It does! :)

Szelethus updated this revision to Diff 147485.EditedMay 18 2018, 6:05 AM

In this diff, I preferred placing TODOs in places where the logic of the checker would have changed too much. While I didn't have a strict rule for this in mind, I tried changing as little as possible in order to implement the fixes in smaller followup patches.

Now with that being said, I still did make numerous changes to address the inline comments, so here is a complete list of them:

  • I placed the checker back in alpha.cplusplus
  • While the main logic of the checker is unchanged, I did make some small additions: it turned out (thanks to @NoQ) that objects I wrongly referred to as fundamental were in fact of BuiltinType or EnumeralType. In the checker I defined these types as "primitive", as neither clang or the C++ standard defines a type called "primitive"
  • A new node is added to the directed tree: objects of MemberPointerType. I updated the documentation, and added a function to FindUninitializedFields but left the implementation to be done in a followup patch
  • String concatenations are all done with streams
  • FieldChainInfo now uses llvm::ImmutableList
  • FieldChainInfo is now completely immutable
  • FindUninitializedFields no longer checks lazily, and has only a single getUninitFields public non-special member function.
  • FindUninitializedFields now has a ProgramStateRef field instead of a bunch of managers (this added a dependency D46891)
  • Added asserts and llvm_unreachable at numerous locations
  • More comments for easier reading

For the test files:

  • Changed all non-member function names to eliminate numbering
  • Split cxx-uninitialized-object.cpp up, and placed pointer/reference tests in a different file
  • Added tests for
    • loc::ConcreteInt
    • Member pointers
    • BuiltinType and EnumeralType
    • Constructorcall within a constructor, without the two constructors being in relation, to highlight a false negative
    • C++11 member initializers.

I'll be the first to admit that I'm not 100% sure I implemented the change to llvm::ImmutableList perfectly, so I guess maybe it's worth taking a deeper look at (but it certainly works!). Shall I tie its lifetime to the checker object's lifetime?
Edit: I mean, the Factory's lifetime.
Of course, I rechecked the entire LLVM/Clang project and I'm happy to report that no crashes occured, and it emitted almost the same amount of warning (almost, because I used a newer version of clang).

Szelethus marked 28 inline comments as done.May 18 2018, 6:25 AM
Szelethus added inline comments.
lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
182–183

I took a look at it, but I found that it'd change the implementation of isPointerOrReferenceUninit so much, I'd be a lot more comfortable if I could implement it in a smaller followup patch. I placed some TODOs in the code about this.

224–226

I think that would be a bad idea. For example, this checker shouts so loudly while checking the LLVM project, it would practically halt the analysis of the code, reducing the coverage, which means that checkers other then uninit value checkers would "suffer" from it.

However, I also think that having multiple uninit reports for the same object might be good, especially with this checker, as it would be very easy to see where the problem originated from.

What do you think?

406–407

Is this loop guaranteed to terminate? Is something like void *v; v = &v; possible?

I looked this up, and I am confident that it is not possible for a pointer to point to itself. Only a void** object may point to a void*. The loop terminates even if you do something evil like v = reinterpret_cast<void*>(&v); (I have tried this with ints).

I think this loop should unwrap the AST type on every iteration and dereference the pointer accordingly.

In general, i believe that every symbolic pointer dereference should be made with awareness of the type of the object we're trying to retrieve. Which is an optional argument for State->getSVal(R, ...), but it seems that it should be made mandatory because in a lot of cases omitting it causes problems.

I invested some serious effort into getDynamicType, but I think it'll require more time and some serious refactoring of this (isPointerOrReferenceUninit) function, so I'd post it separately.

433

Added a comment at the line where I dereference the field that after that point we'll check the pointee.

560–561

Added a TODO and a test case to highlight this issue, and I'm planning to fix it in a followup patch.

Szelethus marked 4 inline comments as done.May 18 2018, 6:29 AM
NoQ added a comment.May 25 2018, 2:15 PM

Ok! Putting any remaining work into follow-up patches would be great indeed. Let's land this into alpha.cplusplus for now because i still haven't made up my mind for how to organize the proposed bugprone category (sorry!).

lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
224–226

Well, i guess that's the reason to not use the checker on LLVM. Regardless of fatal/nonfatal warnings, enabling this checker on LLVM regularly would be a bad idea because it's unlikely that anybody will be able to fix all the false positives to make it usable. And for other projects that don't demonstrate many false positives, this shouldn't be a problem.

In order to indicate where the problem originates from, we have our bug reporter visitors that try their best to add such info directly to the report. In particular, @george.karpenkov's recent NoStoreFuncVisitor highlights functions in which a variable was not initialized but was probably expected to be. Not sure if it highlights constructors in its current shape, but that's definitely a better way to give this piece of information to the user because it doesn't make the user look for a different report to understand the current report.

259–261

Thanks for the detailed stats! Yeah, that's a lot of "immediately" uninitialized fields. My concerns are mostly clarified. Yeah i was curious about references as well, but it's clear that chains of length 1 contain no references.

360–373

You have a check for isCalledByConstructor() in order to skip base class constructors but then you check them manually. That's a lot of code, but it seems that the same result could have been achieved by simply skipping the descent into the base class.

Same question for class-type fields, actually.

406–407

Only a void** object may point to a void*.

That's definitely not true. The whole point of void * is that it can point to anything, including another void *. In fact, even void *v = &v; compiles in both C and C++.

I think it's better to have a stronger guarantee that this loop terminates.

test/Analysis/cxx-uninitialized-object.cpp
1413

Yup, // no-crash is indeed a good thing to mention.

Note: // no-warning is also not picked up by any infrastructure. It is only for the reader.

Szelethus marked 9 inline comments as done.May 27 2018, 12:38 PM

Thanks again for taking a look :)

lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
360–373

Same question for class-type fields, actually.

My problem with that approach is that the same uninitialized field could've been emitted multiple times in different warnings. From an earlier conversation (assume that its run in pedantic mode):

For this code snippet:

struct A {
   struct B {
      int x, y;
      
      B() {}
   } b;
   int w;

   A() {
      b = B();
   }
};

the warning message after a call to A::A() would be "3 uninitialized fields[...]", and for B::B() inside As constructor would be "2 uninitialized fields[...]", so it wouldn't be filtered out.

In my opinion, if A contains a field of type B, it should be the responsibility of A's constructors to initialize those fields properly.

You have a check for isCalledByConstructor() in order to skip base class constructors but then you check them manually. That's a lot of code, but it seems that the same result could have been achieved by simply skipping the descent into the base class.

I also believe that a constructor "should be held responsible" for the initialization of the inherited data members. However, in the case of base classes, uniqueing isn't an issue, as in this case:

struct B {
  int a;
  B() {}
};

struct D : public B {
  int b;
  D() {}
};

a call to D::D() would not check the inherited member (a), if I didn't implement it explicitly. That also means that if isCalledByConstructor was refactored to not filter out base classes, we would get two warning each with a single note, reporting b and a respectively. (I haven't actually checked it, but its a reasonable assumption)

Actually, I'm not 100% sure which approach is better: a warning for each base, or one warning for the object, bases included. I personally think one warning per object results in the best user experience.

I like both ideas, but I think if we'd come to the conclusion that a warning per base should be the way to go, it should definitely be implemented in a followup patch, as isCalledByConstructor is already in line for some major refactoring.

What do you think? :)

406–407

Oh, right. Yea. Silly me. Well in any case, non-void pointers can't point to themselves, as I understand it. And since void pointers are skipped for now, I think (and have found after some serious testing) that loop will terminate just fine.

I think it's better to have a stronger guarantee that this loop terminates.

Agreed. I see now however why did you find parts of this function so fishy.

Szelethus marked 4 inline comments as done.May 28 2018, 3:36 AM

I decided to mark the inline comments in isPointerOrReferenceUninit regarding the dereferencing loop termination done for now. I left several TODO's in the function to be fixed later, with many of them depending on one another, so fixing them all or many at once would probably be the best solution.

I left two conversations "not done" (fatal/nonfatal errors and base classes), but if I may, do you have any other objections?

lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
224–226

LLVM is a special project in the fact that almost every part of it is so performance critical, that leaving many fields uninit matters. However, I would believe that in most projects, only a smaller portion of the code would be like that.

Suppose that we have a project that also defines a set of ADTs, like an std::list-like container. If that container had a field that would be left uninit after a ctor call, analysis on every execution path would be halted which would use an object like that.

My point is, as long as there is no way to tell the analyzer (or the checker) to ignore certain constructor calls, I think it would be best not to generate a fatal error.

Regardless of fatal/nonfatal warnings, enabling this checker on LLVM regularly would be a bad idea because it's unlikely that anybody will be able to fix all the false positives to make it usable. And for other projects that don't demonstrate many false positives, this shouldn't be a problem.

I wouldn't necessarily call them false positives. This checker doesn't look for bugs, and all reports I checked were correct in the fact that those fields really were left uninit. They just don't cause any trouble (just yet!).

360–373

[...]but it seems that the same result could have been achieved by simply skipping the descent into the base class.

I can't edit inline comments sadly, but I'd just like to point out that it wouldn't achieve the same thing, as highlighted above. But, both solution would be okay.

@Szelethus I personally really like this idea, and I think it would be a great checker.
Could you perform more evaluation on other projects?

george.karpenkov resigned from this revision.May 30 2018, 4:36 PM
george.karpenkov added a subscriber: george.karpenkov.
Szelethus updated this revision to Diff 149290.May 31 2018, 7:44 AM
  • Rebased to 1cefbc5593d2f017ae56a853b0723a31865aa602 (revision 333276)
  • Fixed some typos
  • Added tests CyclicPointerTest and CyclicVoidPointerTest to highlight an issue to be fixed in a later patch
xazax.hun added inline comments.Jun 1 2018, 8:37 AM
lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
224–226

I think of this check as a tool to support a specific programming model where every field needs to be initialized by the constructor. This programming model might be followed by some parts of the projects while a 3rd party library in the same project or some other files might not follow this model. Right now there is no easy way to turn off some checks for a set of files, and there is no way to turn off a set of checks for some headers. For this reason, I think it is not a good idea to make these errors fatal, as 3rd party headers might reduce the coverage of the analysis on a project in a way that the user cannot control.

If we are afraid of having multiple reports from this check we could turn off the check for that particular path, for example, we could have a bool stored in the GDM for each path whether this check is already reported an error or not and we can check that before emitting warnings.

Szelethus marked 5 inline comments as done.Jun 4 2018, 3:45 AM

@Szelethus I personally really like this idea, and I think it would be a great checker.
Could you perform more evaluation on other projects?

Thanks for the encouraging words!

I have checked some other projects:

  • Xerces: I only found one uninit field (shockingly), and I found that it was probably unintentional.
  • grpc: This project depends on numerous third-party libraries, so it has a variety of coding and implementation styles. The checker emitted 5 warnings, and I think it's very likely that at least 3 of those were unintentional (4, if we count the missing =default;).
  • CppCheck: No warnings at all :)

The checker didn't crash during the analysis of any of the projects.

After some thinking, I decided to mark the comments on base classes done. I believe that the current implementation results in the best user experience, but even if we were come to the conclusion that it is not, we could resume the conversation on this when I submit a patch to fix related issues within isCalledByConstructor.

Szelethus updated this revision to Diff 150899.Jun 12 2018, 1:31 AM

Polite ping :)

This diff fixed a minor issue in FieldChainInfo's constructor.

NoQ accepted this revision.Jun 15 2018, 2:35 PM

Ok, let's land this one and see how it goes! I'm looking forward to seeing the follow-up patches.

lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
360–373

I still don't fully understand the reasoning behind who's responsible for initializing the field, i.e., the exact rule that you're enforcing seems to cause warnings to depend heavily on meta-information such as where does the analysis start, which is confusing. Because we're trying to enforce a coding guideline here, i believe that the guideline itself should be transparent and easy to explain. In particular, it should be obvious to the user which constructor is responsible for initializing a field, and by obvious i mean it should be obvious both from the guideline itself and from the analyzer's warning, and both of them should be the same "obvious".

whisperity accepted this revision.Jun 16 2018, 1:41 PM

Ah, and the function names in the test files have been made more logical.

Let's roll. 😉

This revision is now accepted and ready to land.Jun 16 2018, 1:41 PM
Szelethus updated this revision to Diff 151685.EditedJun 18 2018, 4:33 AM

Whoo! Thank you all for helping me with the reviews. Very excited about the upcoming fixes, I have neat ideas for some already.

  • Rebased to 8186139d6aa0619e2057ae617c074e486a7b2f2b (revision 334929),
  • Added a FIXME around base class handling as a reminder that this topic still needs to be discussed,
  • Re-run tests on some projects just to be sure.
This revision was automatically updated to reflect the committed changes.

If you are interested, I executed this checker on Firefox code. Results can be found here
http://sylvestre.ledru.info/reports/fx-scan-build/

Andi reported this bug https://bugs.llvm.org/show_bug.cgi?id=37965

@sylvestre.ledru Have you found any actual bugs using this checker?
@Szelethus Interesting, it seems that the pointer itself is initialized, but not what it's pointing to.
I think we should just check the fields directly, and do not attempt to traverse the pointer hierarchy.

If you are interested, I executed this checker on Firefox code. Results can be found here
http://sylvestre.ledru.info/reports/fx-scan-build/

Absolutely, thank you! :)

Andi reported this bug https://bugs.llvm.org/show_bug.cgi?id=37965

Well, that is intentional: not the pointer, but the pointee is uninitialized, as you can see from the note message. Now with that being said, I have had an other colleague of mine complain about a report, as he didn't see that the note message said "pointee" not "pointer", so maybe there's a point in trying to come up with a better message.

@sylvestre.ledru Have you found any actual bugs using this checker?
@Szelethus Interesting, it seems that the pointer itself is initialized, but not what it's pointing to.

Exactly.

I think we should just check the fields directly, and do not attempt to traverse the pointer hierarchy.

Hmm, that's one way of thinking, but I think it's more beneficial to check pointers too.

I'll take a look at some results and try to get back to you with some stats to support my view on this issue :)