This is an archive of the discontinued LLVM Phabricator instance.

[CFG] [analyzer] pr41300: Add a branch to skip virtual base initializers when they are handled by the superclass.
ClosedPublic

Authored by NoQ on May 10 2019, 8:14 PM.

Details

Summary

A copy-paste from https://bugs.llvm.org/show_bug.cgi?id=41300#c4:

A cleaner repro:

#include <iostream>

struct S {
  S() { std::cout << "  Calling S()" << std::endl; }
};

struct A {
  A() { std::cout << "  Calling A()" << std::endl; }
  A(S s) { std::cout << "  Calling A(S)" << std::endl; }
};

struct B : virtual A {
  B() : A(S()) { std::cout << "  Calling B()" << std::endl; }
};

struct C : B {
  C() { std::cout << "  Calling C()" << std::endl; }
};

int main() {
  std::cout << "Constructing 'b':" << std::endl;
  B b;

  std::cout << "Constructing 'c':" << std::endl;
  C c;
  return 0;
}

The output of this program if you run it:

Constructing 'b':

Calling S()
Calling A(S)
Calling B()

Constructing 'c':

Calling A()
Calling B()
Calling C()

So, like, when calling the constructor for class B from class C, we silently (even under -Weverything) ignore the initializer A(S()). That is, we don't even compute S(), we just ignore it the whole initializer. But when we invoke B() directly, we do compute S().

This behavior implemented by compiling two versions of B(): one with the initializer for A and one without it, as can be seen in https://godbolt.org/z/8Sc-Re

However in the AST there's only one constructor for B(), so we should probably implement a runtime branch while modeling a CXXCtorInitializer in an inlined constructor call.

This patch adds the run-time CFG branch that would skip initialization of virtual base classes depending on whether the constructor is called from a superclass constructor or not. Previously the Static Analyzer was already skipping virtual base-class initializers in such constructors, but it wasn't skipping its arguments and their potential side effects, which was causing a crash (and was generally incorrect). The previous skipping behavior is replaced with a hard assertion that we're not even getting there due to how our CFG works.

The new CFG element is under a CFG build option so that not to break other consumers of the CFG by this change. Static Analyzer support for this change is implemented.

I've no idea how to write CFGBuilder code correctly because it's confusing because everything depends on the global mutable state of Block and Succ in the CFGBuilder. But i hope that initializers-cfg-output.cpp, together with the newly added test, provides some basic test coverage for my code.

Note that a similar functionality is necessary for destructors of virtual bases, but it remains to be done for now. We should be able to re-use the same terminator kind.

Diff Detail

Repository
rL LLVM

Event Timeline

NoQ created this revision.May 10 2019, 8:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 10 2019, 8:14 PM
NoQ added a comment.May 13 2019, 5:54 PM

Note that a similar functionality is necessary for destructors of virtual bases, but it remains to be done for now. We should be able to re-use the same terminator kind.

It's not really that important because virtual base destructors don't have sub-expressions that might cause side effects. So we can simply skip them in ExprEngine.

NoQ updated this revision to Diff 199532.May 14 2019, 4:27 PM

Rebase.

Szelethus added a comment.EditedMay 15 2019, 6:47 AM

Exquisite detective work as always! I read a bit of the standard, your summary, the bugreport and the new test cases, and I feel confident that I have a good understanding of the problem and your solution, though I didn't dig through the details just yet. At a first glance this looks great, and I'll be sure to revisit this patch and accept it formally.

The expression-list or braced-init-list in a mem-initializer is used to initialize the designated subobject (or, in the case of a delegating constructor, the complete class object) according to the initialization rules of 11.6 for direct-initialization. [ Example:

struct B1 { B1(int) ; /* . . . */ };
struct B2 { B2(int) ; /* . . . */ };
struct D : B1, B2 {
  D(int) ;
  B1 b;
  const int c;
};
D::D(int a) : B2(a+1) , B1(a+2) , c(a+3) , b(a+4) { /* . . . */ }
D d(10) ;

— end example ] [ Note: The initialization performed by each mem-initializer constitutes a full-expression (4.6) . Any expression in a mem-initializer is evaluated as part of the full-expression that performs the initialization. — end note ] A mem-initializer where the mem-initializer-id denotes a virtual base class is ignored during execution of a constructor of any class that is not the most derived class.

I think this part of the standard isn't well known (due mostly the the scarcity of virtual bases), and I'd like to see it referenced, maybe even quoted.

clang/include/clang/Analysis/CFG.h
567 ↗(On Diff #199532)

In the context of this patch, I understand what you mean, but without that, this might not be a good description for a class this important.

How about

///     vbase inits   |  initialization handled by superclass;
///                   |  initialization not handled by superclass

?

Charusso accepted this revision.May 16 2019, 12:03 AM

I do not like to have a variable storing/mention one stuff named in plural. It is your decision as it is just my personal feeling.

clang/include/clang/Analysis/AnalysisDeclContext.h
462 ↗(On Diff #199532)

Bases -> Base

clang/include/clang/Analysis/CFG.h
510 ↗(On Diff #199532)

The most derived class instead. Bases -> Base

541 ↗(On Diff #199532)

Bases -> Base

1050 ↗(On Diff #199532)

Bases -> Base

clang/include/clang/StaticAnalyzer/Core/PathSensitive/CoreEngine.h
120 ↗(On Diff #199532)

Bases -> Base

clang/lib/Analysis/AnalysisDeclContext.cpp
73 ↗(On Diff #199532)

Bases -> Base

clang/lib/Analysis/CFG.cpp
1435 ↗(On Diff #199532)

The most derived class instead.

1794 ↗(On Diff #199532)

Bases -> Base

clang/lib/StaticAnalyzer/Core/AnalysisManager.cpp
40 ↗(On Diff #199532)

Bases -> Base

clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
447 ↗(On Diff #199532)

Bases -> Base

451 ↗(On Diff #199532)

May it is worth to handle the call outside to see whether it is non-null.

clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
435 ↗(On Diff #199532)

May it is worth to handle the call outside to see whether it is non-null.

440 ↗(On Diff #199532)

The most derived class?

This revision is now accepted and ready to land.May 16 2019, 12:03 AM
Szelethus added a comment.EditedMay 16 2019, 5:17 AM

Hmmm, how about this?

#include <iostream>

struct VBase1 {
  VBase1() { std::cout << "VBase1()\n"; }
  VBase1(int) { std::cout << "VBase1(int)\n"; }
};

struct VBase2 {
  VBase2() { std::cout << "VBase2()\n"; }
  VBase2(std::string) { std::cout << "VBase2(std::string)\n"; }
};

struct B : public virtual VBase1 {
  B() : VBase1(0) {}
};

struct C : public B, public virtual VBase2 {
  C() : VBase2("sajt") {}
};

int main() {
  C c;
}

Output:

VBase1()
VBase2(std::string)

You are right that all virtual bases as initialized before everything else, but not all ctor delegations are skipped. If not this specific one, can something similar screw us over?

edit:

Twin of the abomination above:

#include <iostream>

struct VBase1 {
  VBase1() { std::cout << "VBase1()\n"; }
  VBase1(int) { std::cout << "VBase1(int)\n"; }
};

struct VBase2 {
  VBase2() { std::cout << "VBase2()\n"; }
  VBase2(std::string) { std::cout << "VBase2(std::string)\n"; }
};

struct B : public virtual VBase1, public virtual VBase2 { // Note that VBase2 is here.
  B() : VBase1(0) {}
};

struct C : public B {
  C() : VBase2("sajt") {}
};

int main() {
  C c;
}

Output:

VBase1()
VBase2(std::string)

edit2: What if the order of inheritance changes?

clang/lib/Analysis/CFG.cpp
1434–1438 ↗(On Diff #199532)

I find this a little too vague. The standard states:

First, and only for the constructor of the most derived class (4.5) , virtual base classes are initialized in the order they appear on a depth-first left-to-right traversal of the directed acyclic graph of base classes, where “left-to-right” is the order of appearance of the base classes in the derived class base-specifier-list.

Then, direct base classes are initialized in declaration order as they appear in the (regardless of the order of the mem-initializer s).

Then, non-static data members are initialized in the order they were declared in the class definition (again regardless of the order of the mem-initializer s).

Finally, the base-specifier-list compound-statement of the constructor body is executed.

This would be an overkill here, but how does this sound like:

Constructor delegations are ignored to virtual bases unless the object is of the most derived class.

class VBase { VBase() = default; VBase(int) {} };
class A : virtual public VBase { A() : VBase(0) {} };
class B : public A {};

B b; // Constructor calls in order: VBase(), A(), B().

// VBase(int) is ignored, B isn't the most derived class that
// delegates to the virtual base.

This may result in the virtual base(s) being already initialized at this point, in which case we should jump right onto non-virtual bases and fields. To handle this, make a CFG branch. We only need to do this once, since the standard states that all virtual bases shall be initialized before non-virtual bases and direct data members.

Also, the comment of mine complementing this inline raises a concern about "doing this only once", can you specify?

Szelethus added inline comments.May 17 2019, 1:58 AM
clang/lib/Analysis/CFG.cpp
1434–1438 ↗(On Diff #199532)

Mind you, this is the order of initialization for non-delegating constructors!

Szelethus requested changes to this revision.May 18 2019, 5:34 AM

The more I think about this, the more I think it would be great to see this tested :)

This revision now requires changes to proceed.May 18 2019, 5:34 AM
NoQ updated this revision to Diff 201354.May 24 2019, 3:52 PM
NoQ marked 14 inline comments as done.

Hmmm, how about this?
...
You are right that all virtual bases as initialized before everything else, but not all ctor delegations are skipped. If not this specific one, can something similar screw us over?
edit: Twin of the abomination above:
...
edit2: What if the order of inheritance changes?

VBase1(int) is skipped because B is not the most derived class, VBase2(std::string) is not skipped because C is the most derived class. That's the whole point of the branch; i don't see a problem. AST handles the initialization order for us, all we need is to branch around it correctly. Added these tests^^

clang/include/clang/Analysis/CFG.h
567 ↗(On Diff #199532)

Because this is a doxygen table syntax, i'm trying to make it look reasonable both in comments and in doxygen. In this sense it's hard to make line breaks in cells.

Hmm, btw, doxygen also fails to handle || because it thinks that it's a table cell delimiter. Let me fix that as well:

clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
451 ↗(On Diff #199532)

Mmm, what do you mean?

Charusso added inline comments.May 24 2019, 3:56 PM
clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
451 ↗(On Diff #199532)

It was that which is null-proof:

if (const Stmt *Outer = LCtx->getStackFrame()->getCallSite()) {
  const CXXConstructExpr *OuterCtor = dyn_cast<CXXConstructExpr>(Outer);
  if (OuterCtor) {
    // ...
  }
}
NoQ marked an inline comment as done.May 24 2019, 3:58 PM
NoQ added inline comments.
clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
451 ↗(On Diff #199532)

That's exactly what _or_null stands for in dyn_cast_or_null.

Charusso accepted this revision.May 24 2019, 4:02 PM
Charusso added inline comments.
clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
451 ↗(On Diff #199532)

Hm, I believe I had problems with that nullability a year ago, but it was long ago. Thanks for the clarification.

Szelethus accepted this revision.May 24 2019, 4:17 PM

LGTM! Nicely done!

This revision is now accepted and ready to land.May 24 2019, 4:17 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMay 24 2019, 4:35 PM