This patch removes DiagnoseUninitializedFields from SemaDeclCXX, and modifies the UninitializedValues checker to detect the use of uninitialized fields not only in member initializer lists, but also in the constructor bodies.
Details
Diff Detail
Event Timeline
This looks really exciting, and a great direction for the -Wuninitialized warning. (I'm a little sad that we're losing the enabled-by-default warning here, but I can live with that...)
I think there are a few small pieces here that you could factor out and could be committed separately (handling of .* and ->*, for instance).
Longer-term, I think it'd be useful to extend this to model subobjects of fields and base classes (for instance, we should catch using a field of base class B2 from an initializer for base class B1), but this is a big step in the right direction, and I certainly don't want any scope creep on this change.
Have you run this over a significant quantity of code? Does it find many bugs, and were there any false positives?
lib/Analysis/UninitializedValues.cpp | ||
---|---|---|
53–54 | Micro-optimization: move the if (!cd) test to before the loop. | |
97 | The isTrackedField check here seems redundant. All fields you will see here are tracked. | |
304–310 | Why do you need the extra complexity of building a list and iterating backwards here, rather than just visiting the children inside the first loop? | |
306 | " &", not "& ", please. |
lib/Analysis/UninitializedValues.cpp | ||
---|---|---|
41 | You seem to have no problem dealing with fields of non-scalar, non-vector types. I expect you could do the same for variables. | |
55 | What if the field is of a class type with no data members? struct X {}; struct Y { X x, y; Y() : y(x) {} // should probably not warn, but presumably will }; | |
323 | if is redundant: fd can't be null here. Please combine this with the above check to use a dyn_cast: FieldDecl *fd = dyn_cast<FieldDecl>(me->getMemberDecl()); if (!fd) return 0; // ... | |
392 | I don't think it matters, but !D || !isTracked(D) makes more sense to me. | |
426 | I think this should be classified as Use. If we have: (x ?: y) = 0; x is used before it's initialized. Maybe this doesn't matter, because we'll separately have an lvalue-to-rvalue conversion for x? Can you change the above ConditionalOperator code to instead handle AbstractConditionalOperator and fold the two checks together? | |
469–470 | Why is this necessary? This would seem to fall under the general rule that "if we can't classify it, it's initialization". | |
493 | Did you mean to use IsPointerToConst here? | |
496–500 | Please add braces around this. | |
512 | This looks like it would do the wrong thing if we had a c-style cast that only performed an lvalue-to-rvalue conversion or decay. Push the return into the void check? | |
519 | This looks curious. Treating this case as Init seems appropriate to me: we don't track how the resulting pointer is used, and it might be used to initialize the object. Is the idea to ignore all array subobjects of class type in the analysis? That is: struct X { int a[3]; int b; }; struct Y { X x; Y() { x.a[0] = 1; // don't treat 'x' as initialized x.a[1] = x.b; // warning: 'x' used uninitialized } }; If so, this really deserves a comment. | |
535 | Seems this case should apply if any of the FieldDecls in the chain are of reference type, not just if the last one is. | |
552–557 | This looks suspicious. If we're visiting a CXXDefaultInitExpr for an object other than this, and that initializer references this, we'll think that we have a reference to one of our subobjects, but we don't. | |
566 | " *", not "* ". | |
567 | Do you really need a member for this? Seems trivial to reconstruct. | |
755–756 | I assume you're trying to cope with this[0].blah here? This doesn't look quite right: it'll also find this[-1].blah (which might sometimes be valid) and will do the wrong thing on 0[this].blah. Maybe just drop the array indexing support here -- it's a corner case and not worth the cost of getting all the details right. | |
765–769 | What about indirect fields from anonymous structs or unions? | |
803–817 | Handle this in VisitCallExpr instead. This approach will do the wrong thing for a call to an operator marked returns_twice or analyzer_noreturn, or for an explicit call to the assignment operator, like this->operator=(foo); | |
824–825 | if (constructor) ? | |
839–852 | Please either fold this into VisitCallExpr or call VisitCallExpr if you don't handle the call here. | |
884 | Is the cast here correct? findLeftmostMemberExpr might return a MemberExpr that references a static member (for an expression like this->static_member), and that's a VarDecl, not a FieldDecl. | |
898–899 | I'm unconvinced that this and the CXXThisExpr-as-function-argument checks are sufficient; CXXThisExpr could be used to indirectly initialize the object through various other forms of expression. I think you should make the classification code also classify all the CXXThisExprs, and treat all fields as initialized if we see a CXXThisExpr that is classified as Init. | |
949–956 | Again, I don't think this does the right thing if this appears in a default initializer for an object other than *this. Elsewhere we handle this by visiting the CXXDefaultInitExpr while holding an override for what this binds to. You could do something similar here. | |
986–987 | Do you need to pass these in? TransferFunctions can work this out for itself. | |
993 | " *", not "* ". | |
lib/Sema/AnalysisBasedWarnings.cpp | ||
1243 | " *" | |
lib/Sema/SemaDeclCXX.cpp | ||
7980–7984 | A comment explaining this would be useful. | |
test/SemaCXX/uninitialized.cpp | ||
444 | We should really warn on this. A FIXME for that would be nice. |
Thanks for the review.
Yes, I've run it over a big code base. I didn't spot false positives, but I still think there are too many false negatives.
lib/Analysis/UninitializedValues.cpp | ||
---|---|---|
41 | Maybe. I'd prefer to run a few test before. | |
53–54 | Done. | |
55 | It doesn't, x is default initialized. | |
97 | Done. | |
304–310 | That's the way statements are appended to the CFG in the base case. | |
306 | Done. | |
323 | Done. | |
392 | Done. | |
426 | Done. | |
469–470 | Done. It wasn't, indeed. | |
493 | Done. | |
496–500 | Done. | |
512 | Done. | |
519 | This is actually another case in which x is treated as initialized because the default constructor is called, hence x is immediately treated as initialized. This was intended for: void foo(int *a); struct X { int a[3]; int b; X() { foo(a); b = a[1]; // do not warn on 'a' here. } }; But actually it's not necessary because the current code is not classifying ArraySubscriptExprs. | |
535 | I'm not sure I understand: struct A { int &a; }; struct B { A &a; B(int) {} }; struct C { B &ref; B b; C() : b(ref.a.a) {} C(B &o) : b(o.a.a) {} }; The only thing we know here is that "ref" is uninitialized. | |
552–557 | isTrackedField verifies that the specified field has the same parent of the constructor. A CXXDefaultInitExpr appears either in a CtorInitializer or within the context of (c++1y) aggregate initialization, in which case the record of the current constructor and that of any MemberExpr## in the aggregate can't be the same. | |
566 | Done. | |
567 | Done. | |
755–756 | Actually no. This just detects things like struct X { int x, y; X(int f) { reinterpret_cast<int*>(this)[0] = f; y = x; } }; Warning on y = x is wrong. On the other hand, I'm simply doing as if this escaped analysis here, and yes, it won't work on 0[reinterpret_cast<int*>(this)] = f. Should I still drop this? | |
765–769 | Done. | |
803–817 | Done. | |
824–825 | Done. | |
839–852 | Done. | |
884 | Yes. classification.get(e) calls ClassifyRefs::isTracked, which will be false for such a VarDecl. | |
898–899 | I'll try to do it. I need a little more time. | |
949–956 | Please see my comment in ClassifyRefs::VisitCXXDefaultInitExpr. | |
986–987 | Done. | |
993 | Done. | |
lib/Sema/AnalysisBasedWarnings.cpp | ||
1243 | Done. | |
lib/Sema/SemaDeclCXX.cpp | ||
7980–7984 | Done. | |
test/SemaCXX/uninitialized.cpp | ||
444 | I don't think we should. J is a union, and one of its fields (a) has been initialized. |
lib/Analysis/UninitializedValues.cpp | ||
---|---|---|
304–310 | Hmm, does this do good things if there is control flow in the CXXDefaultInitExpr? struct S { enum Kind { Foo, Bar } K; int FooVal; int BarVal; int Value = K == Foo ? FooVal : BarVal; S() : K(Foo), FooVal(0) {} // warning about uninitialized use of BarVal? }; Maybe we should teach the CFG builder to inline CXXDefaultInitExprs when they appear as a CXXCtorInitializer? That should not break the CFG's invariants, and allow you to drop a lot (all?) of your special-case handling for them (since you don't actually want to walk into them when they occur within aggregate initialization). | |
519 | My preference would be to keep this patch as small as possible while adding the new constructor-handling functionality. I'd prefer for ArraySubscriptExpr support to be added separately if that's practical. | |
535 | Suppose we had: struct A { int n; }; struct B { A &a; }; struct C { int n; B b; C(A &a) : b{a}, n(b.a.n = 0) {} }; Here (because the n initializer runs before the b initializer), b is uninitialized in the b.a.n = 0 expression. Because b.a is a reference, that expression is a use of b, not an initialization of it. So we should diagnose this. | |
552–557 | I don't think that is the case. For instance: struct A { int a; int b; int c; struct B; A(); }; struct A::B : A { int d = (c = 0); }; A::A() : a(B{}.d), b(c) {} // B{} should not mark our `c` as initialized. As I alluded to in another comment, I think we could handle this correctly by not walking into a CXXDefaultInitExpr in aggregate initialization (perhaps only walk into them when they appear at the top level in a CtorInitializer, and do that in the CFG builder). | |
755–756 | Unless you have real-world examples of bugs the extra accuracy catches here, I'd prefer to drop this for simplicity's sake. | |
test/SemaCXX/uninitialized.cpp | ||
444 | Well, it's undefined behavior to write to field a then read from field b. The "uninitialized" warning isn't really the right one to issue, though, so I think your approach in this patch is fine. |
include/clang/Analysis/Analyses/UninitializedValues.h | ||
---|---|---|
51 | Done. | |
74 | Done. | |
lib/Analysis/UninitializedValues.cpp | ||
304–310 | The only difference I see is that, as there's no control flow information at all, what's usually reported as sometimes uninitialized will be reported as always uninitialized. | |
493 | I've changed this a bit to make sure that #IsPointerToConst() is only called if (*I)->isGLValue()## is false. This is to avoid a regression for cases like: int foo(const int *&a) { a = &some_global_int_; } void bar() { const int *j; foo(j); // don't classify the DeclRefExpr for j as Ignore here, just let it escape int a = *j; } | |
519 | Acknowledged. | |
535 | Thanks, you helped me diagnose a bigger problem here (record access wasn't being counted as use in general). struct A { int &a; int &b; int n; A() : a(b), n(b) {} }; the use of b in n(b) will be detected because of the lvalue-to-rvalue cast, but there's no such cast on references. | |
552–557 | In that case B{} is a CXXConstructExpr. You can't use aggregate initialization for B because it has a base class. | |
755–756 | Actually, it prevents a false warning, but it's still not common. I'll get rid of it. And I've realized that this function could get into an infinite loop. Fixed that. | |
test/SemaCXX/uninitialized.cpp | ||
444 | Acknowledged. |
This looks really great. Let's start getting bits of this checked in!
I've marked a few pieces of this patch that I'd like to be checked in as separate pieces (to avoid the big monolithic change here). In addition to those, can you factor out the CXXDefaultInitExpr handling code into a separate patch, too?
Additionally, now we've basically converged on the functional side, there's a couple of coding standard issues that should be sorted out prior to commit. Variable names should be capitalized (don't worry about pre-existing variables in this file, it's currently a big mess with regard to coding style issues), and you have a "* " in ClassifyRefs::VisitMemberExpr.
After getting the side-pieces committed, we can circle back to the bulk of this patch. There's only really one substantive comment that I have there (and I think it's something you said you're already working on): it's hard to tell if CXXThisExpr is being handled appropriately in all cases, and I'd like for it to be tracked with a ClassifyRefs-based approach too (so if a CXXThisExpr escapes we can conservatively-correctly mark all fields as initialized). I don't know what you think about this -- do you think the patch is already handling this sufficiently reliably that it'd be OK to commit without this, and fix it up later? I know you've tested this against a lot of code already, but cases like passing this through an ObjCMessageExpr or a CXXConstructExpr (which may not have come up in your testing) don't look like they're handled here.
lib/Analysis/UninitializedValues.cpp | ||
---|---|---|
391 | Do you need the isa check here? This should only be called on DeclRefExpr and MemberExpr, as far as I can tell, so we should assert if something else gets passed in (or, ideally, not compile). | |
393–394 | I'm concerned about this removal. The old model was:
The new model seems to be:
I think this may be subtly incorrect. If we have: int x; x = x + 1; ... and the CFG happens to emit the LHS and then the RHS, we won't find the uninitialized use, because we'll think that x is already initialized. | |
413–437 | The improvements here are great, but they're orthogonal to handling class members. Please factor them out into a separate patch. | |
476–480 | This comment could describe what's going on here better. Usually if an address is taken, we treat the value as initialized; the thing we should call out here is that the value is *not* treated as initialized if passed to a const pointer parameter. Maybe just change the first comment to "If a value is passed by const pointer or const reference to a function, ..."? | |
487–493 | This is orthogonal to field handling; please break this out into a separate patch. | |
495–500 | What happens here: int f(); struct S { int a, b; int init() { a = b = 0; return f(); } }; struct T { int k; S s; T() { s.init(); } // ok? warning? T(int) : k(s.init()) {} // ok? warning? }; I think both of these are fine: because T::s has trivial initialization, its lifetime begins when storage for it is obtained (per [basic.life]p1). | |
519–523 | Is this necessary? The VisitMemberExpr check seems like it should do the right thing here. | |
541–544 | This doesn't look quite right (though I think the issue is in findLeftmostMemberExpr). Consider: this->member.static_member.non_static_member When we visit this, I think we'll mark 'member' as Use. Suggestion: teach findLeftmostMemberExpr not to walk through MemberExprs for static data members. | |
547–550 | I think you should also handle the arguments the same way that VisitCallExpr does. struct A { A(const int&); }; struct B { int x, y; B() { A a(x); y = x; } // should warn on this }; I think we won't warn here, because our usual handling for conservatively marking the x in a(x) as Ignore won't fire. (Even if this is a copy or move constructor, it could still have additional arguments, so you should do this in that case too.) That said, this is also orthogonal to handling of fields, and fixing this in a separate patch would be preferred. | |
993 | This counting approach seems rather complex to me, and I think we can get the same effect with something much simpler: remove the 'ReportConstructor' code entirely, and when you come to issue a diagnostic for an uninitialized use of a field, check whether the source location of the use is within the source range of the constructor. If not, emit the additional note pointing out which constructor it was. | |
lib/Sema/AnalysisBasedWarnings.cpp | ||
1672 | I don't think you need this any more. |
The analyzer can probably be modified (later) to take advantage of this as well. Awesome.
lib/Analysis/CFG.cpp | ||
---|---|---|
769–777 | I'm assuming "the context is not ambiguous here" means "we can guarantee that this expression will only appear once in the CFG, unlike when it is used in list-initialization". That's why we didn't do this before. It would be nice to have a CFG validator that enforced this "appears-once" restriction, but that can come later. | |
lib/Analysis/UninitializedValues.cpp | ||
993 | I'm always iffy about source ranges because of macros. It doesn't seem like it'd be too awful if it misfired here, but at the same time you could notice that the field (a) has an in-class initializer, and (b) does not have an associated initializer. (And calculate the latter with a simple linear walk, since we've already decided to emit an error message.) |
lib/Analysis/CFG.cpp | ||
---|---|---|
769–777 | Exactly; we only expand default initializers when they're (implicitly) used from a constructor initializer list, and each such default initializer can only be used once in any given constructor. Maybe a more detailed comment here would help. | |
lib/Analysis/UninitializedValues.cpp | ||
993 | isBeforeInTranslationUnit does the right thing in the presence of macros (it compares the order of tokens after preprocessing). |
Please extend the comment here: something like "If the use happens in a default initializer for a field, this is the constructor in which the default initializer was used."