[StaticAnalyzer] Checker to find uninitialized fields after a constructor call
ClosedPublic

Authored by Szelethus on Apr 11 2018, 11:05 AM.

Details

Summary

This checker checks at the end of a constructor call whether all fields of the object were initialized.

Note that a significant portion of the code is for handling unions, for which there is very little support in the CSA. In most cases, all fields of a union is regarded as unknown. I checked these cases by regarding unknown fields as uninitialized.

The test files also contain checks for heap allocated objects and unions, but heap regions are mostly conjured in the CSA and unions have little to no support, so the tests for them and unions should work theoretically, but its not 100%. so the tests are not yet functional (used no-warning at places where a warning is expected).

Diff Detail

Repository
rC Clang
There are a very large number of changes, so older changes are hidden. Show Older Changes
Szelethus marked an inline comment as done.EditedApr 28 2018, 2:31 AM

Renamed the checker to cplusplus.uninitialized.UninitializedObject.

I've read your comments regarding the category this should be in thoroughly, and should the clang-tidy category naming not be an issue, cplusplus.bugprone would've probably been the best option. However, since it is a concern, I did find many checkers in the core in the uninitialized subcategory, so I thought I'll place it in cplusplus.uninitialized for now. This could be a little too specific though, as I don't think many other checkers could be implemented that would reside in there.

What do you guys think?

Szelethus marked an inline comment as done.Apr 28 2018, 2:34 AM
Szelethus added inline comments.
test/Analysis/cxx-uninitialized-object.cpp
527

I haven't marked @a.sidorin's comment as done, but it disappeared because I renamed the file, so here it is:

Could you please explain what is the logic of test naming?

To which I replied:

The test files follow the strict structure that for each test case we'll define a structure type and then call its constructor(s) in a function right after it (functions are used to avoid zero initializations).

To be honest, there is no real logic behind the naming of the functions, it is only to keep the ODR. I used numbers in an increasing order, however I later added there cases, so in between f23 and f24 I used f23p5 and so on.

I figured that the strict structure of the test files would avoid confusion. Do you find it distracting?

Szelethus updated this revision to Diff 144440.Apr 28 2018, 2:42 AM

Forgot to rename the system header simulator file. Sorry about the spam :)

Two minor comments.

With regards to the function naming, the problem with incremental counts is that they get out of sync, like they did with fXpY and such, and if someone wants to keep the count incremental down the file, adding any new test will result in an unnecessarily large patch. Perhaps you could use void T_() for the test that calls T::T()?

test/Analysis/cxx-uninitialized-object.cpp
661

A newline between #endif and the next token is missing here.

1413

What is this comment symbolising? Is this actually something the test infrastructure picks up?

NoQ added a comment.Apr 30 2018, 7:47 PM

Had a look at the actual code, came up with some comments, most are fairly minor, so good job! You did a good job explaining the overall idea, but a lot of specifics were difficult to understand, so i made a few comments on how to make the code a bit easier to read.

I know, and I tried to look for ways to split this checker into smaller logical chunks, but I didn't find a satisfying solution.

What i would have done:

  1. A checker that straightforwardly iterates through immediate fields of the current object and reports uninitialized fields.
  2. Add base classes support.
  3. Heuristic for skipping sub-constructors could use a separate review.
  4. Start skipping arrays and unions.
  5. Add simple pointer dereference support - introduce chains of fields.
  6. More heuristics - the one for symbolic regions, the one for system header fields, etc.

etc.

I'm asking for this because it's, like, actually difficult and painful to understand formal information in large chunks. One-solution-at-a-time would have been so much easier. It is really a bad idea to start by presenting a working solution; the great idea is to present a small non-working solution with a reasonable idea behind it, and then extend it "in place" as much as it seems necessary. It is very comfortable when parts of the patch with different "status" (main ideas, corner-case fixes, heuristics, refactoring) stay in separate patches. Alpha checkers are essentially branches: because our svn doesn't have branches, we have alpha checkers and off-by-default flags. Even before step 1, you should be totally fine with committing an empty checker with a single empty callback and a header comment - even on such "step 0." we would be able to discuss if your approach to the checker seems right or might have inevitable issues, and it could have saved a lot of time.

Essentially every part of the solution that you implement, if you think it might deserve a separate discussion, also deserves a separate patch. It is reasonable to split the work into logical chunks before you start it. It's very unlikely that you have a single idea that takes 500 non-trivial lines of code to express. Splitting up ideas so that they were easy to grasp is an invaluable effort. I had very positive experience with that recently, both as a coder and as a reviewer, so i encourage that a lot.

lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
48

At a glance it looks like a great use case for an immutable list. It's easy to use the immutable list as a stack, and it's also O(1) to copy/move and take snapshots of it.

Note that you copy 10 pointers every time you copy or move a SmallVector. Small vectors don't make much sense as fields of actively used data structures - their main purpose is to live on the stack as local variables.

52–54

Please move this doc to the getter/setter function. That's where people expect to find it when they're reading code.

59–60

Do we really need the other field (IsDereferenced) to be mutable? Or maybe we can keep the whole object immutable? I.e., instead of dereference() we'll get a constructor that constructs a FieldChainInfo with the same chain but with the dereferenced flag set.

87–90

All of these can be replaced with a single ProgramStateRef object.

94

This field is worth documenting.

157–158

Please explain what these methods do in a more direct manner. Something like "This method returns true if an uninitialized field was found within the region. It should only be used with regions of union-type objects".

171–172

Something's missing before with.

177–178

The second sentence would look great at the call site.

182–183

Type of an object pointed to by a void pointer is something that our path-sensitive engine is sometimes able to provide. It is not uncommon that a void pointer points to a concrete variable on the current path, and we may know it in the analyzer. I'm not sure this sort of check is necessary (i.e. require more information).

You may also want to have a look at getDynamicTypeInfo() which is sometimes capable of retrieving the dynamic type of the object pointed to by a pointer or a reference - i.e. even if it's different from the pointee type of the pointer.

186

We're traditionally writing "end anonymous namespace" or "end of anonymous namespace".

224–226

I suspect that a fatal error is better here. We don't want the user to receive duplicate report from other checkers that catch uninitialized values; just one report is enough.

234–237

I believe that you only need the first Twine; then all operator+ calls will be resolved to operator+(Twine, ...) automatically. Or just use a string and a stream like pretty much all other checkers.

245

This is the third approach to concatenating strings in the last 10 lines of code. I think you should make a method FieldChainInfo::print(llvm::raw_ostream &) and use streams everywhere.

259–261

Aha, ok, got it, so you're putting the warning at the end of the constructor and squeezing all uninitialized fields into a single warning by presenting them as notes.

Because for now notes aren't supported everywhere (and aren't always supported really well), i guess it'd be necessary to at least provide an option to make note-less reports before the checker is enabled by default everywhere. In this case notes contain critical pieces of information and as such cannot be really discarded.

I'm not sure that notes are providing a better user experience here than simply saying that "field x.y->z is never initialized". With notes, the user will have to scroll around (at least in scan-build) to find which field is uninitialized.

Notes will probably also not decrease the number of reports too much because in most cases there will be just one uninitialized field. Though i agree that when there is more than one report, the user will be likely to deal with all fields at once rather than separately.

So it's a bit wonky.

We might want to enable one mode or another in the checker depending on the analyzer output flag. Probably in the driver (RenderAnalyzerOptions()).

It'd be a good idea to land the checker into alpha first and fix this in a follow-up patch because this review is already overweight.

293

Why is checking done lazily in the first place? Are there many situations in which we want to construct the object but never do the actual scan?

If it was actually necessary, i would have split out the side effect of this getter into a separate private method.

347

Please check the type explicitly, and then assert that the value is a Loc. There are a lot of nuances that you don't want to be dealing with here (eg. ObjCObjectPointerType, which can be present in C++ as well because Objective-C++ is a thing, is not a PointerType or a ReferenceType, but it is modeled with a Loc).

353

Please formalize what you mean by a fundamental type and assert that. You probably mean something like "an integer or a floating-point number or maybe an enum or, well, bool".

I suspect that many cases here are missed (complex types, vector types, ...).

360–373

Are there many warnings that will be caused by this but won't be caused by conducting the check for the constructor-within-constructor that's currently disabled? Not sure - is it even syntactically possible to not initialize the base class?

385

Please assert here that FR is a pointer/reference-type field.

387

As i mentioned above, you can simply store a ProgramStateRef within FindUninitializedFields. In this case this code will be simplified to State->getSVal(FR).

406–407

Is this loop guaranteed to terminate? Is something like void *v; v = &v; possible?

I think this loop should unwrap the AST type on every iteration and dereference the pointer accordingly.

In general, i believe that every symbolic pointer dereference should be made with awareness of the type of the object we're trying to retrieve. Which is an optional argument for State->getSVal(R, ...), but it seems that it should be made mandatory because in a lot of cases omitting it causes problems.

418

CSA is not a commonly used acronym around the codebase.

418

I'd like to change the word "conjure" to something else because in most cases it won't be an actual SymbolConjured, but it'd be a SymbolRegionValue or SymbolDerived. Conjuring occurs in different circumstances.

431

Add llvm_unreachable("") to find out :)

433

Needs a comment on why the IsDereferenced flag on this path.

481

Can we at least detect these situations and suppress the warning? It should be relatively easy for lambdas (captured variable mapping should be stored somewhere, as far as i remember), not sure how easy it'd be for double inheritance (there are fancy methods for scanning this stuff, but that's equivalent to actually implementing the correct warning message).

Weird warning messages are pretty scary to show to the users.

It'd be a good idea to land the checker into alpha first and fix this in a follow-up patch because this review is already overweight.

560–561

This doesn't check that the constructor on the parent stack frame is anyhow related to the current constructor, so it may introduce false negatives. For instance, a class may lazy-initialize a singleton but never store a reference to it within itself (because, well, it's a singleton, you can always obtain the reference to it). In this case we'll never find the bug in the constructor of the singleton.

test/Analysis/cxx-uninitialized-object.cpp
932

CSA is not a commonly used acronym around the codebase.

Szelethus added a comment.EditedMay 2 2018, 8:37 AM

Thanks @NoQ for taking the time to review my code!

I'm sorry that the patch became this long. It was an error on my part, and I completely get that the checker now takes an uncomfortably long time to read and understand. This is my first "bigger" project in the CSA, I learned a lot so far, and I'll definitely review how I should've handled developing and uploading it here better, so next time I will do my best to make my contributions a lot more reviewer-friendly.

edit: Not just review-wise, but development-wise. This could've easily turned out wrong as a whole, and if it was, it could've been avoided if I shared my thoughts here.

It's very unlikely that you have a single idea that takes 500 non-trivial lines of code to express.

That's true. Since this was the first time I wrote anything the StaticAnalyzer project, I often found myself almost completely rewriting everything. Looking back, I still could've uploaded the code in parts, so notes taken, will do better next time!

It'll be some time before I sort everything out you commented on, just wanted to leave this here.

lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
259–261

[...]i guess it'd be necessary to at least provide an option to make note-less reports before the checker is enabled by default everywhere. In this case notes contain critical pieces of information and as such cannot be really discarded.

This can already be achieved with -analyzer-config notes-as-events=true. There no good reason however not to make an additional flag that would emit warnings instead of notes.

I'm not sure that notes are providing a better user experience here than simply saying that "field x.y->z is never initialized". With notes, the user will have to scroll around (at least in scan-build) to find which field is uninitialized.

This checker had a previous implementation that emitted a warning for each uninitialized field, instead of squeezing them in a single warning per constructor call. One of the major reasons behind rewriting it was that it emitted so many of them that it was difficult to differentiate which warning belonged to which constructor call. Also, in the case of the LLVM/Clang project, the warnings from this checker alone would be in the thousands.

Notes will probably also not decrease the number of reports too much because in most cases there will be just one uninitialized field.

While this is true to some extent, it's not uncommon that a single constructor call leaves 7 uninitialized -- in the LLVM/Clang project I found one that had over 30!
Since its practically impossible to determine whether this was a performance enhancement or just poor programming, the few cases where it is intentional, an object would emit many more warnings would make make majority of the findings (where an object truly had only 1 or 2 fields uninit) a lot harder to spot in my opinion.

In general, I think one warning per constructor call makes the best user experience, but a temporary solution should be implemented until there's better support for notes.

I agree, this should be a topic of a follow-up patch.

test/Analysis/cxx-uninitialized-object.cpp
1413

// no-crash is something I found in many other test files, in this case it symbolizes that the point of this test isn't to check whether the checker correctly finds or ignores an uninit field, but that it doesn't crash.

NoQ added inline comments.May 3 2018, 6:31 PM
lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
259–261

This can already be achieved with -analyzer-config notes-as-events=true.

Yep. But the resulting user experience seems pretty bad to me. It was a good quick proof-of-concept trick to test things, but the fact that notes aren't path pieces is way too apparent in case of this checker. So i guess this flag was a bad idea.

Also, in the case of the LLVM/Clang project, the warnings from this checker alone would be in the thousands.

This makes me a bit worried, i wonder what's the reason why does the checker shout so loudly on a codebase that doesn't seem to have actual uninitialized value bugs all over the place.

Are any of these duplicates found in the same constructor for the same field in different translation units? Suppose we discard the duplicates (if any) and warn exactly once (across the whole LLVM) for every uninitialized field (which is probably more than once per constructor). Then I wonder:

  1. How many uninitialize fields would we be able to find this way?
  2. How many of such fields were intentionally left uninitialized?
  3. How many were found by deep inspection of the heap through field chains involving pointer dereference "links"?
NoQ added inline comments.May 4 2018, 2:30 PM
lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
259–261

(when i'm asking this sort of stuff, i don't mean you should look through thousands of positives, a uniform random sample of ~50 should be just fine)
(but we really do need this info before enabling stuff by default)

Szelethus added a comment.EditedMay 15 2018, 8:13 AM

I'm afraid it'll be even more time before I post a new diff. There are some things that ProgramState is lacking, so I'll get that fixed first in a different pull request.

Quick edit: When I said "non-unique warnings", I meant that warnings before uniqueing.

lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
259–261

This can already be achieved with -analyzer-config notes-as-events=true.

Yep. But the resulting user experience seems pretty bad to me. It was a good quick proof-of-concept trick to test things, but the fact that notes aren't path pieces is way too apparent in case of this checker. So i guess this flag was a bad idea.

I totally agree, I meant that a note-less solution should only be a (arguably better) workaround just as notes-as-events=true.

Are any of these duplicates found in the same constructor for the same field in different translation units? Suppose we discard the duplicates (if any) and warn exactly once (across the whole LLVM) for every uninitialized field (which is probably more than once per constructor).

Sure, having the same field field reported from the same constructor call in different TUs happens quite a lot, but they can easily be uniqued.

In the checkers current state, meaning that one warning per ctor call, the LLVM/Clang project in pedantic mode resulted in 409 non-unique warnings, and using CodeChecker I found that 181 of these is unique.

How many uninitialize fields would we be able to find this way?

In its current state, in pedantic mode, after uniqueing ~581. Now that's quite a bit off from what I remembered "the warnings from this checker alone would be in the thousands", but having one warning per uninit field would still result in a ~320% increase in reports.

How many of such fields were intentionally left uninitialized?

(I checked around 95 reports a couple weeks back before uploading this diff)
Almost all of them. Note that this is a very performance-critical project. From what I saw, many times several fields are left uninitialized, but these fields are inaccessible due to a kind field being asserted at their getter functions.
Also, I found cases where the constructor wasn't defaulted with = default;.

In fact, I struggled to find a case where a field was 100% left out by accident. Maybe in llvm/lib/IR/ConstantsContext.h, calling ConstantExprKeyType(ArrayRef<Constant *> Operands, const ConstantExpr *CE).

This is not the case however in other projects I checked. In Xerces for example every find was a true positive.

How many were found by deep inspection of the heap through field chains involving pointer dereference "links"?

This is somewhat harder to measure. The best I could come up with is uniqueing the fieldchains from the plist files with lexicographical sorting. Your question can be answered by measuring the length of the fieldchains. I found

  • 3 fieldchains with the length of 5
  • 18 fieldchains with the length of 4
  • 62 fieldchains with the lenfth of 3
  • 108 fieldchains with the length of 2
  • 137 fieldchains with the length of 1

(that is a total of 328 unique fieldchains lexicographically)
Because heap objects are not modeled, only 3 of the 328 fieldchains contains pointer dereferences. However, reference objects are very common from what I saw, but sadly I can't give you an exact number on those.

I hope these answers were satisfying, but I'll follow up with more information if needed.

360–373

I'm not a 100% sure what you mean. Can you clarify?

Not sure - is it even syntactically possible to not initialize the base class?

If I understand the question correctly, no, as far as I know.

431

It does! :)

Szelethus updated this revision to Diff 147485.EditedMay 18 2018, 6:05 AM

In this diff, I preferred placing TODOs in places where the logic of the checker would have changed too much. While I didn't have a strict rule for this in mind, I tried changing as little as possible in order to implement the fixes in smaller followup patches.

Now with that being said, I still did make numerous changes to address the inline comments, so here is a complete list of them:

  • I placed the checker back in alpha.cplusplus
  • While the main logic of the checker is unchanged, I did make some small additions: it turned out (thanks to @NoQ) that objects I wrongly referred to as fundamental were in fact of BuiltinType or EnumeralType. In the checker I defined these types as "primitive", as neither clang or the C++ standard defines a type called "primitive"
  • A new node is added to the directed tree: objects of MemberPointerType. I updated the documentation, and added a function to FindUninitializedFields but left the implementation to be done in a followup patch
  • String concatenations are all done with streams
  • FieldChainInfo now uses llvm::ImmutableList
  • FieldChainInfo is now completely immutable
  • FindUninitializedFields no longer checks lazily, and has only a single getUninitFields public non-special member function.
  • FindUninitializedFields now has a ProgramStateRef field instead of a bunch of managers (this added a dependency D46891)
  • Added asserts and llvm_unreachable at numerous locations
  • More comments for easier reading

For the test files:

  • Changed all non-member function names to eliminate numbering
  • Split cxx-uninitialized-object.cpp up, and placed pointer/reference tests in a different file
  • Added tests for
    • loc::ConcreteInt
    • Member pointers
    • BuiltinType and EnumeralType
    • Constructorcall within a constructor, without the two constructors being in relation, to highlight a false negative
    • C++11 member initializers.

I'll be the first to admit that I'm not 100% sure I implemented the change to llvm::ImmutableList perfectly, so I guess maybe it's worth taking a deeper look at (but it certainly works!). Shall I tie its lifetime to the checker object's lifetime?
Edit: I mean, the Factory's lifetime.
Of course, I rechecked the entire LLVM/Clang project and I'm happy to report that no crashes occured, and it emitted almost the same amount of warning (almost, because I used a newer version of clang).

Szelethus marked 28 inline comments as done.May 18 2018, 6:25 AM
Szelethus added inline comments.
lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
182–183

I took a look at it, but I found that it'd change the implementation of isPointerOrReferenceUninit so much, I'd be a lot more comfortable if I could implement it in a smaller followup patch. I placed some TODOs in the code about this.

224–226

I think that would be a bad idea. For example, this checker shouts so loudly while checking the LLVM project, it would practically halt the analysis of the code, reducing the coverage, which means that checkers other then uninit value checkers would "suffer" from it.

However, I also think that having multiple uninit reports for the same object might be good, especially with this checker, as it would be very easy to see where the problem originated from.

What do you think?

406–407

Is this loop guaranteed to terminate? Is something like void *v; v = &v; possible?

I looked this up, and I am confident that it is not possible for a pointer to point to itself. Only a void** object may point to a void*. The loop terminates even if you do something evil like v = reinterpret_cast<void*>(&v); (I have tried this with ints).

I think this loop should unwrap the AST type on every iteration and dereference the pointer accordingly.

In general, i believe that every symbolic pointer dereference should be made with awareness of the type of the object we're trying to retrieve. Which is an optional argument for State->getSVal(R, ...), but it seems that it should be made mandatory because in a lot of cases omitting it causes problems.

I invested some serious effort into getDynamicType, but I think it'll require more time and some serious refactoring of this (isPointerOrReferenceUninit) function, so I'd post it separately.

433

Added a comment at the line where I dereference the field that after that point we'll check the pointee.

560–561

Added a TODO and a test case to highlight this issue, and I'm planning to fix it in a followup patch.

Szelethus marked 4 inline comments as done.May 18 2018, 6:29 AM
NoQ added a comment.May 25 2018, 2:15 PM

Ok! Putting any remaining work into follow-up patches would be great indeed. Let's land this into alpha.cplusplus for now because i still haven't made up my mind for how to organize the proposed bugprone category (sorry!).

lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
224–226

Well, i guess that's the reason to not use the checker on LLVM. Regardless of fatal/nonfatal warnings, enabling this checker on LLVM regularly would be a bad idea because it's unlikely that anybody will be able to fix all the false positives to make it usable. And for other projects that don't demonstrate many false positives, this shouldn't be a problem.

In order to indicate where the problem originates from, we have our bug reporter visitors that try their best to add such info directly to the report. In particular, @george.karpenkov's recent NoStoreFuncVisitor highlights functions in which a variable was not initialized but was probably expected to be. Not sure if it highlights constructors in its current shape, but that's definitely a better way to give this piece of information to the user because it doesn't make the user look for a different report to understand the current report.

259–261

Thanks for the detailed stats! Yeah, that's a lot of "immediately" uninitialized fields. My concerns are mostly clarified. Yeah i was curious about references as well, but it's clear that chains of length 1 contain no references.

360–373

You have a check for isCalledByConstructor() in order to skip base class constructors but then you check them manually. That's a lot of code, but it seems that the same result could have been achieved by simply skipping the descent into the base class.

Same question for class-type fields, actually.

406–407

Only a void** object may point to a void*.

That's definitely not true. The whole point of void * is that it can point to anything, including another void *. In fact, even void *v = &v; compiles in both C and C++.

I think it's better to have a stronger guarantee that this loop terminates.

test/Analysis/cxx-uninitialized-object.cpp
1413

Yup, // no-crash is indeed a good thing to mention.

Note: // no-warning is also not picked up by any infrastructure. It is only for the reader.

Szelethus marked 9 inline comments as done.May 27 2018, 12:38 PM

Thanks again for taking a look :)

lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
360–373

Same question for class-type fields, actually.

My problem with that approach is that the same uninitialized field could've been emitted multiple times in different warnings. From an earlier conversation (assume that its run in pedantic mode):

For this code snippet:

struct A {
   struct B {
      int x, y;
      
      B() {}
   } b;
   int w;

   A() {
      b = B();
   }
};

the warning message after a call to A::A() would be "3 uninitialized fields[...]", and for B::B() inside As constructor would be "2 uninitialized fields[...]", so it wouldn't be filtered out.

In my opinion, if A contains a field of type B, it should be the responsibility of A's constructors to initialize those fields properly.

You have a check for isCalledByConstructor() in order to skip base class constructors but then you check them manually. That's a lot of code, but it seems that the same result could have been achieved by simply skipping the descent into the base class.

I also believe that a constructor "should be held responsible" for the initialization of the inherited data members. However, in the case of base classes, uniqueing isn't an issue, as in this case:

struct B {
  int a;
  B() {}
};

struct D : public B {
  int b;
  D() {}
};

a call to D::D() would not check the inherited member (a), if I didn't implement it explicitly. That also means that if isCalledByConstructor was refactored to not filter out base classes, we would get two warning each with a single note, reporting b and a respectively. (I haven't actually checked it, but its a reasonable assumption)

Actually, I'm not 100% sure which approach is better: a warning for each base, or one warning for the object, bases included. I personally think one warning per object results in the best user experience.

I like both ideas, but I think if we'd come to the conclusion that a warning per base should be the way to go, it should definitely be implemented in a followup patch, as isCalledByConstructor is already in line for some major refactoring.

What do you think? :)

406–407

Oh, right. Yea. Silly me. Well in any case, non-void pointers can't point to themselves, as I understand it. And since void pointers are skipped for now, I think (and have found after some serious testing) that loop will terminate just fine.

I think it's better to have a stronger guarantee that this loop terminates.

Agreed. I see now however why did you find parts of this function so fishy.

Szelethus marked 4 inline comments as done.May 28 2018, 3:36 AM

I decided to mark the inline comments in isPointerOrReferenceUninit regarding the dereferencing loop termination done for now. I left several TODO's in the function to be fixed later, with many of them depending on one another, so fixing them all or many at once would probably be the best solution.

I left two conversations "not done" (fatal/nonfatal errors and base classes), but if I may, do you have any other objections?

lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
224–226

LLVM is a special project in the fact that almost every part of it is so performance critical, that leaving many fields uninit matters. However, I would believe that in most projects, only a smaller portion of the code would be like that.

Suppose that we have a project that also defines a set of ADTs, like an std::list-like container. If that container had a field that would be left uninit after a ctor call, analysis on every execution path would be halted which would use an object like that.

My point is, as long as there is no way to tell the analyzer (or the checker) to ignore certain constructor calls, I think it would be best not to generate a fatal error.

Regardless of fatal/nonfatal warnings, enabling this checker on LLVM regularly would be a bad idea because it's unlikely that anybody will be able to fix all the false positives to make it usable. And for other projects that don't demonstrate many false positives, this shouldn't be a problem.

I wouldn't necessarily call them false positives. This checker doesn't look for bugs, and all reports I checked were correct in the fact that those fields really were left uninit. They just don't cause any trouble (just yet!).

360–373

[...]but it seems that the same result could have been achieved by simply skipping the descent into the base class.

I can't edit inline comments sadly, but I'd just like to point out that it wouldn't achieve the same thing, as highlighted above. But, both solution would be okay.

@Szelethus I personally really like this idea, and I think it would be a great checker.
Could you perform more evaluation on other projects?

george.karpenkov resigned from this revision.May 30 2018, 4:36 PM
george.karpenkov added a subscriber: george.karpenkov.
Szelethus updated this revision to Diff 149290.May 31 2018, 7:44 AM
  • Rebased to 1cefbc5593d2f017ae56a853b0723a31865aa602 (revision 333276)
  • Fixed some typos
  • Added tests CyclicPointerTest and CyclicVoidPointerTest to highlight an issue to be fixed in a later patch
xazax.hun added inline comments.Jun 1 2018, 8:37 AM
lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
224–226

I think of this check as a tool to support a specific programming model where every field needs to be initialized by the constructor. This programming model might be followed by some parts of the projects while a 3rd party library in the same project or some other files might not follow this model. Right now there is no easy way to turn off some checks for a set of files, and there is no way to turn off a set of checks for some headers. For this reason, I think it is not a good idea to make these errors fatal, as 3rd party headers might reduce the coverage of the analysis on a project in a way that the user cannot control.

If we are afraid of having multiple reports from this check we could turn off the check for that particular path, for example, we could have a bool stored in the GDM for each path whether this check is already reported an error or not and we can check that before emitting warnings.

Szelethus marked 5 inline comments as done.Jun 4 2018, 3:45 AM

@Szelethus I personally really like this idea, and I think it would be a great checker.
Could you perform more evaluation on other projects?

Thanks for the encouraging words!

I have checked some other projects:

  • Xerces: I only found one uninit field (shockingly), and I found that it was probably unintentional.
  • grpc: This project depends on numerous third-party libraries, so it has a variety of coding and implementation styles. The checker emitted 5 warnings, and I think it's very likely that at least 3 of those were unintentional (4, if we count the missing =default;).
  • CppCheck: No warnings at all :)

The checker didn't crash during the analysis of any of the projects.

After some thinking, I decided to mark the comments on base classes done. I believe that the current implementation results in the best user experience, but even if we were come to the conclusion that it is not, we could resume the conversation on this when I submit a patch to fix related issues within isCalledByConstructor.

Szelethus updated this revision to Diff 150899.Jun 12 2018, 1:31 AM

Polite ping :)

This diff fixed a minor issue in FieldChainInfo's constructor.

NoQ accepted this revision.Jun 15 2018, 2:35 PM

Ok, let's land this one and see how it goes! I'm looking forward to seeing the follow-up patches.

lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp
360–373

I still don't fully understand the reasoning behind who's responsible for initializing the field, i.e., the exact rule that you're enforcing seems to cause warnings to depend heavily on meta-information such as where does the analysis start, which is confusing. Because we're trying to enforce a coding guideline here, i believe that the guideline itself should be transparent and easy to explain. In particular, it should be obvious to the user which constructor is responsible for initializing a field, and by obvious i mean it should be obvious both from the guideline itself and from the analyzer's warning, and both of them should be the same "obvious".

whisperity accepted this revision.Jun 16 2018, 1:41 PM

Ah, and the function names in the test files have been made more logical.

Let's roll. 😉

This revision is now accepted and ready to land.Jun 16 2018, 1:41 PM
Szelethus updated this revision to Diff 151685.EditedJun 18 2018, 4:33 AM

Whoo! Thank you all for helping me with the reviews. Very excited about the upcoming fixes, I have neat ideas for some already.

  • Rebased to 8186139d6aa0619e2057ae617c074e486a7b2f2b (revision 334929),
  • Added a FIXME around base class handling as a reminder that this topic still needs to be discussed,
  • Re-run tests on some projects just to be sure.
This revision was automatically updated to reflect the committed changes.

If you are interested, I executed this checker on Firefox code. Results can be found here
http://sylvestre.ledru.info/reports/fx-scan-build/

Andi reported this bug https://bugs.llvm.org/show_bug.cgi?id=37965

@sylvestre.ledru Have you found any actual bugs using this checker?
@Szelethus Interesting, it seems that the pointer itself is initialized, but not what it's pointing to.
I think we should just check the fields directly, and do not attempt to traverse the pointer hierarchy.

If you are interested, I executed this checker on Firefox code. Results can be found here
http://sylvestre.ledru.info/reports/fx-scan-build/

Absolutely, thank you! :)

Andi reported this bug https://bugs.llvm.org/show_bug.cgi?id=37965

Well, that is intentional: not the pointer, but the pointee is uninitialized, as you can see from the note message. Now with that being said, I have had an other colleague of mine complain about a report, as he didn't see that the note message said "pointee" not "pointer", so maybe there's a point in trying to come up with a better message.

@sylvestre.ledru Have you found any actual bugs using this checker?
@Szelethus Interesting, it seems that the pointer itself is initialized, but not what it's pointing to.

Exactly.

I think we should just check the fields directly, and do not attempt to traverse the pointer hierarchy.

Hmm, that's one way of thinking, but I think it's more beneficial to check pointers too.

I'll take a look at some results and try to get back to you with some stats to support my view on this issue :)