This is an archive of the discontinued LLVM Phabricator instance.

Make UninitializedValues detect the use of uninitialized fields in a constructor.
Needs ReviewPublic

Authored by epertoso on Nov 14 2013, 8:33 AM.

Details

Reviewers
rsmith
Summary

This patch removes DiagnoseUninitializedFields from SemaDeclCXX, and modifies the UninitializedValues checker to detect the use of uninitialized fields not only in member initializer lists, but also in the constructor bodies.

Diff Detail

Event Timeline

+richard as reviewer

This looks really exciting, and a great direction for the -Wuninitialized warning. (I'm a little sad that we're losing the enabled-by-default warning here, but I can live with that...)

I think there are a few small pieces here that you could factor out and could be committed separately (handling of .* and ->*, for instance).

Longer-term, I think it'd be useful to extend this to model subobjects of fields and base classes (for instance, we should catch using a field of base class B2 from an initializer for base class B1), but this is a big step in the right direction, and I certainly don't want any scope creep on this change.

Have you run this over a significant quantity of code? Does it find many bugs, and were there any false positives?

lib/Analysis/UninitializedValues.cpp
53–54

Micro-optimization: move the if (!cd) test to before the loop.

97

The isTrackedField check here seems redundant. All fields you will see here are tracked.

304–310

Why do you need the extra complexity of building a list and iterating backwards here, rather than just visiting the children inside the first loop?

306

" &", not "& ", please.

rsmith added inline comments.Nov 14 2013, 8:50 PM
lib/Analysis/UninitializedValues.cpp
41

You seem to have no problem dealing with fields of non-scalar, non-vector types. I expect you could do the same for variables.

55

What if the field is of a class type with no data members?

struct X {};
struct Y {
  X x, y;
  Y() : y(x) {} // should probably not warn, but presumably will
};
323

if is redundant: fd can't be null here. Please combine this with the above check to use a dyn_cast:

FieldDecl *fd = dyn_cast<FieldDecl>(me->getMemberDecl());
if (!fd)
  return 0;
// ...
392

I don't think it matters, but !D || !isTracked(D) makes more sense to me.

426

I think this should be classified as Use. If we have:

(x ?: y) = 0;

x is used before it's initialized. Maybe this doesn't matter, because we'll separately have an lvalue-to-rvalue conversion for x?

Can you change the above ConditionalOperator code to instead handle AbstractConditionalOperator and fold the two checks together?

469–470

Why is this necessary? This would seem to fall under the general rule that "if we can't classify it, it's initialization".

493

Did you mean to use IsPointerToConst here?

496–500

Please add braces around this.

512

This looks like it would do the wrong thing if we had a c-style cast that only performed an lvalue-to-rvalue conversion or decay. Push the return into the void check?

519

This looks curious. Treating this case as Init seems appropriate to me: we don't track how the resulting pointer is used, and it might be used to initialize the object. Is the idea to ignore all array subobjects of class type in the analysis? That is:

struct X {
  int a[3];
  int b;
};
struct Y {
  X x;
  Y() {
    x.a[0] = 1; // don't treat 'x' as initialized
    x.a[1] = x.b; // warning: 'x' used uninitialized
  }
};

If so, this really deserves a comment.

535

Seems this case should apply if any of the FieldDecls in the chain are of reference type, not just if the last one is.

552–557

This looks suspicious. If we're visiting a CXXDefaultInitExpr for an object other than this, and that initializer references this, we'll think that we have a reference to one of our subobjects, but we don't.

566

" *", not "* ".

567

Do you really need a member for this? Seems trivial to reconstruct.

755–756

I assume you're trying to cope with this[0].blah here? This doesn't look quite right: it'll also find this[-1].blah (which might sometimes be valid) and will do the wrong thing on 0[this].blah. Maybe just drop the array indexing support here -- it's a corner case and not worth the cost of getting all the details right.

765–769

What about indirect fields from anonymous structs or unions?

803–817

Handle this in VisitCallExpr instead. This approach will do the wrong thing for a call to an operator marked returns_twice or analyzer_noreturn, or for an explicit call to the assignment operator, like this->operator=(foo);

824–825

if (constructor) ?

839–852

Please either fold this into VisitCallExpr or call VisitCallExpr if you don't handle the call here.

884

Is the cast here correct? findLeftmostMemberExpr might return a MemberExpr that references a static member (for an expression like this->static_member), and that's a VarDecl, not a FieldDecl.

898–899

I'm unconvinced that this and the CXXThisExpr-as-function-argument checks are sufficient; CXXThisExpr could be used to indirectly initialize the object through various other forms of expression. I think you should make the classification code also classify all the CXXThisExprs, and treat all fields as initialized if we see a CXXThisExpr that is classified as Init.

949–956

Again, I don't think this does the right thing if this appears in a default initializer for an object other than *this. Elsewhere we handle this by visiting the CXXDefaultInitExpr while holding an override for what this binds to. You could do something similar here.

986–987

Do you need to pass these in? TransferFunctions can work this out for itself.

993

" *", not "* ".

lib/Sema/AnalysisBasedWarnings.cpp
1243

" *"

lib/Sema/SemaDeclCXX.cpp
7980–7984

A comment explaining this would be useful.

test/SemaCXX/uninitialized.cpp
444

We should really warn on this. A FIXME for that would be nice.

epertoso updated this revision to Unknown Object (????).Nov 18 2013, 11:17 AM

Thanks for the review.

Yes, I've run it over a big code base. I didn't spot false positives, but I still think there are too many false negatives.

lib/Analysis/UninitializedValues.cpp
41

Maybe. I'd prefer to run a few test before.

53–54

Done.

55

It doesn't, x is default initialized.

97

Done.

304–310

That's the way statements are appended to the CFG in the base case.
But it doesn't seem to cause problems if I don't do it here (classify is always called when there's enough context).

306

Done.

323

Done.

392

Done.

426

Done.
But we have to explicitly classify the common expr as Use, because an OpaqueValueExpr will appear between the lvalue-to-rvalue cast and x.

469–470

Done. It wasn't, indeed.

493

Done.

496–500

Done.

512

Done.

519

This is actually another case in which x is treated as initialized because the default constructor is called, hence x is immediately treated as initialized.
The analysis could potentially be made smarter for aggregates, but I'm not sure this patch is the best place to do it.

This was intended for:

void foo(int *a);
struct X {
  int a[3];
  int b;
  X() {
    foo(a);
    b = a[1]; // do not warn on 'a' here.
  }
};

But actually it's not necessary because the current code is not classifying ArraySubscriptExprs.
Should I try to deal with ArraySubscriptExpr it in this same patch?

535

I'm not sure I understand:

struct A {
  int &a;
};
struct B {
  A &a;
  B(int) {}
};
struct C {
  B &ref;
  B b;
  C() : b(ref.a.a) {}
  C(B &o) : b(o.a.a) {}
};

The only thing we know here is that "ref" is uninitialized.
Do you have anything specific in mind?

552–557

isTrackedField verifies that the specified field has the same parent of the constructor. A CXXDefaultInitExpr appears either in a CtorInitializer or within the context of (c++1y) aggregate initialization, in which case the record of the current constructor and that of any MemberExpr## in the aggregate can't be the same.

566

Done.

567

Done.

755–756

Actually no. This just detects things like

struct X {
  int x, y;
  X(int f) {
    reinterpret_cast<int*>(this)[0] = f;
    y = x;
  }
};

Warning on y = x is wrong. On the other hand, I'm simply doing as if this escaped analysis here, and yes, it won't work on 0[reinterpret_cast<int*>(this)] = f.
It's still seems to be a corner case, though, plus it's not warning on member access via this[0].

Should I still drop this?

765–769

Done.

803–817

Done.
An explicit call to the assignment operator would have been handled by VisitCXXMemberCallExpr, but I've also moved that code in VisitCallExpr.
I added the tests for both: *this = foo and this->operator=(foo) in uninitialized.cpp.

824–825

Done.

839–852

Done.

884

Yes. classification.get(e) calls ClassifyRefs::isTracked, which will be false for such a VarDecl.

898–899

I'll try to do it. I need a little more time.

949–956

Please see my comment in ClassifyRefs::VisitCXXDefaultInitExpr.

986–987

Done.

993

Done.

lib/Sema/AnalysisBasedWarnings.cpp
1243

Done.

lib/Sema/SemaDeclCXX.cpp
7980–7984

Done.

test/SemaCXX/uninitialized.cpp
444

I don't think we should. J is a union, and one of its fields (a) has been initialized.
If you're really sure that a warning should still be issued here, it's actually easier to do so rather than not.

rsmith added inline comments.Nov 18 2013, 1:55 PM
lib/Analysis/UninitializedValues.cpp
304–310

Hmm, does this do good things if there is control flow in the CXXDefaultInitExpr?

struct S {
  enum Kind { Foo, Bar } K;
  int FooVal;
  int BarVal;
  int Value = K == Foo ? FooVal : BarVal;
  S() : K(Foo), FooVal(0) {} // warning about uninitialized use of BarVal?
};

Maybe we should teach the CFG builder to inline CXXDefaultInitExprs when they appear as a CXXCtorInitializer? That should not break the CFG's invariants, and allow you to drop a lot (all?) of your special-case handling for them (since you don't actually want to walk into them when they occur within aggregate initialization).

519

My preference would be to keep this patch as small as possible while adding the new constructor-handling functionality. I'd prefer for ArraySubscriptExpr support to be added separately if that's practical.

535

Suppose we had:

struct A { int n; };
struct B { A &a; };
struct C {
  int n;
  B b;
  C(A &a) : b{a}, n(b.a.n = 0) {}
};

Here (because the n initializer runs before the b initializer), b is uninitialized in the b.a.n = 0 expression. Because b.a is a reference, that expression is a use of b, not an initialization of it. So we should diagnose this.

552–557

I don't think that is the case. For instance:

struct A {
  int a;
  int b;
  int c;
  struct B;
  A();
};
struct A::B : A {
  int d = (c = 0);
};
A::A() : a(B{}.d), b(c) {} // B{} should not mark our `c` as initialized.

As I alluded to in another comment, I think we could handle this correctly by not walking into a CXXDefaultInitExpr in aggregate initialization (perhaps only walk into them when they appear at the top level in a CtorInitializer, and do that in the CFG builder).

755–756

Unless you have real-world examples of bugs the extra accuracy catches here, I'd prefer to drop this for simplicity's sake.

test/SemaCXX/uninitialized.cpp
444

Well, it's undefined behavior to write to field a then read from field b. The "uninitialized" warning isn't really the right one to issue, though, so I think your approach in this patch is fine.

rsmith added inline comments.Nov 18 2013, 2:00 PM
include/clang/Analysis/Analyses/UninitializedValues.h
51

Please extend the comment here: something like "If the use happens in a default initializer for a field, this is the constructor in which the default initializer was used."

74

Maybe getConstructorForDefaultInit?

epertoso updated this revision to Unknown Object (????).Nov 21 2013, 12:31 PM
epertoso added inline comments.
include/clang/Analysis/Analyses/UninitializedValues.h
51

Done.

74

Done.

lib/Analysis/UninitializedValues.cpp
304–310

The only difference I see is that, as there's no control flow information at all, what's usually reported as sometimes uninitialized will be reported as always uninitialized.
But I'll have a look at the CFGBuilder.

493

I've changed this a bit to make sure that #IsPointerToConst() is only called if (*I)->isGLValue()## is false. This is to avoid a regression for cases like:

int foo(const int *&a) { a = &some_global_int_; }
void bar() {
  const int *j;
  foo(j); // don't classify the DeclRefExpr for j as Ignore here, just let it escape
  int a = *j;
}
519

Acknowledged.

535

Thanks, you helped me diagnose a bigger problem here (record access wasn't being counted as use in general).
I did this because when you have

struct A {
  int &a;
  int &b;
  int n;
  A() : a(b), n(b) {}
};

the use of b in n(b) will be detected because of the lvalue-to-rvalue cast, but there's no such cast on references.
Each MemberExpr is visited, so the code warns on your example. I've added a test case.

552–557

In that case B{} is a CXXConstructExpr. You can't use aggregate initialization for B because it has a base class.
Unless there's a change in the aggregate definition after n3653.

755–756

Actually, it prevents a false warning, but it's still not common. I'll get rid of it.

And I've realized that this function could get into an infinite loop. Fixed that.

test/SemaCXX/uninitialized.cpp
444

Acknowledged.

epertoso updated this revision to Unknown Object (????).Dec 3 2013, 8:26 AM

CFGBuilder now appends the expressions wrapped by a CXXDefaultInitExpr.

This looks really great. Let's start getting bits of this checked in!

I've marked a few pieces of this patch that I'd like to be checked in as separate pieces (to avoid the big monolithic change here). In addition to those, can you factor out the CXXDefaultInitExpr handling code into a separate patch, too?

Additionally, now we've basically converged on the functional side, there's a couple of coding standard issues that should be sorted out prior to commit. Variable names should be capitalized (don't worry about pre-existing variables in this file, it's currently a big mess with regard to coding style issues), and you have a "* " in ClassifyRefs::VisitMemberExpr.

After getting the side-pieces committed, we can circle back to the bulk of this patch. There's only really one substantive comment that I have there (and I think it's something you said you're already working on): it's hard to tell if CXXThisExpr is being handled appropriately in all cases, and I'd like for it to be tracked with a ClassifyRefs-based approach too (so if a CXXThisExpr escapes we can conservatively-correctly mark all fields as initialized). I don't know what you think about this -- do you think the patch is already handling this sufficiently reliably that it'd be OK to commit without this, and fix it up later? I know you've tested this against a lot of code already, but cases like passing this through an ObjCMessageExpr or a CXXConstructExpr (which may not have come up in your testing) don't look like they're handled here.

lib/Analysis/UninitializedValues.cpp
391

Do you need the isa check here? This should only be called on DeclRefExpr and MemberExpr, as far as I can tell, so we should assert if something else gets passed in (or, ideally, not compile).

393–394

I'm concerned about this removal. The old model was:

  • Visiting the DeclRefExpr on the LHS of an assignment does not initialize the variable
  • Visiting the assignment expression itself initializes the variable on its LHS

The new model seems to be:

  • Visiting the DeclRefExpr or MemberExpr on the LHS of an assignment initializes the variable or field
  • Visiting the assignment expression itself *also* initializes the variable

I think this may be subtly incorrect. If we have:

int x;
x = x + 1;

... and the CFG happens to emit the LHS and then the RHS, we won't find the uninitialized use, because we'll think that x is already initialized.

413–437

The improvements here are great, but they're orthogonal to handling class members. Please factor them out into a separate patch.

476–480

This comment could describe what's going on here better. Usually if an address is taken, we treat the value as initialized; the thing we should call out here is that the value is *not* treated as initialized if passed to a const pointer parameter. Maybe just change the first comment to "If a value is passed by const pointer or const reference to a function, ..."?

487–493

This is orthogonal to field handling; please break this out into a separate patch.

495–500

What happens here:

int f();
struct S {
  int a, b;
  int init() { a = b = 0; return f(); }
};
struct T {
  int k;
  S s;
  T() { s.init(); } // ok? warning?
  T(int) : k(s.init()) {} // ok? warning?
};

I think both of these are fine: because T::s has trivial initialization, its lifetime begins when storage for it is obtained (per [basic.life]p1).

519–523

Is this necessary? The VisitMemberExpr check seems like it should do the right thing here.

541–544

This doesn't look quite right (though I think the issue is in findLeftmostMemberExpr). Consider:

this->member.static_member.non_static_member

When we visit this, I think we'll mark 'member' as Use. Suggestion: teach findLeftmostMemberExpr not to walk through MemberExprs for static data members.

547–550

I think you should also handle the arguments the same way that VisitCallExpr does.

struct A { A(const int&); };
struct B {
  int x, y;
  B() { A a(x); y = x; } // should warn on this
};

I think we won't warn here, because our usual handling for conservatively marking the x in a(x) as Ignore won't fire.

(Even if this is a copy or move constructor, it could still have additional arguments, so you should do this in that case too.)

That said, this is also orthogonal to handling of fields, and fixing this in a separate patch would be preferred.

993

This counting approach seems rather complex to me, and I think we can get the same effect with something much simpler: remove the 'ReportConstructor' code entirely, and when you come to issue a diagnostic for an uninitialized use of a field, check whether the source location of the use is within the source range of the constructor. If not, emit the additional note pointing out which constructor it was.

lib/Sema/AnalysisBasedWarnings.cpp
1672

I don't think you need this any more.

The analyzer can probably be modified (later) to take advantage of this as well. Awesome.

lib/Analysis/CFG.cpp
769–777

I'm assuming "the context is not ambiguous here" means "we can guarantee that this expression will only appear once in the CFG, unlike when it is used in list-initialization". That's why we didn't do this before.

It would be nice to have a CFG validator that enforced this "appears-once" restriction, but that can come later.

lib/Analysis/UninitializedValues.cpp
993

I'm always iffy about source ranges because of macros. It doesn't seem like it'd be too awful if it misfired here, but at the same time you could notice that the field (a) has an in-class initializer, and (b) does not have an associated initializer. (And calculate the latter with a simple linear walk, since we've already decided to emit an error message.)

rsmith added inline comments.Dec 6 2013, 6:48 PM
lib/Analysis/CFG.cpp
769–777

Exactly; we only expand default initializers when they're (implicitly) used from a constructor initializer list, and each such default initializer can only be used once in any given constructor. Maybe a more detailed comment here would help.

lib/Analysis/UninitializedValues.cpp
993

isBeforeInTranslationUnit does the right thing in the presence of macros (it compares the order of tokens after preprocessing).

Remind me: what's the state of this?

klimek resigned from this revision.Jul 3 2015, 6:41 AM
klimek removed a reviewer: klimek.
Via Old World