This is an archive of the discontinued LLVM Phabricator instance.

[GlobalOpt] Optimize comdat dynamic initializers on Windows
Changes PlannedPublic

Authored by rnk on Nov 20 2018, 2:12 PM.

Details

Summary

The motivation is to be able to optimize dynamic initialization of
things with vague linkage, as in this C++ code:

struct my_id {
  my_id() : v_(0) {}
  int v_;
};
template <typename T> struct Foo { static my_id id; };
template <typename T> my_id Foo<T>::id;
int main() { Foo<char>::id.v_++; }

Here, we have a class with a non-trivial, non-constexpr default
constructor, and we instantiate a static data member of a class template
of that type.

Normally, globalopt refuses to modify initializers of globals with weak
linkage because it has no way of knowing if the linker will select the
modified or unmodified global, or if there will be some duplicate
initializer that performs the same effect twice.

However, when the initializer modifies the global used as its comdat key
in llvm.global_ctors, we know that, in the Microsoft C++ ABI, if
globalopt succeeds and the modified global prevails at link time, all of
the other initializers will be discarded.

The Itanium C++ ABI does not (currently) provide a guarantee that the
.init_array entry will be in the same comdat group as the global, so it
instead requires that the compiler emit a guard variable check to
prevent double initialization as described here:
https://itanium-cxx-abi.github.io/cxx-abi/abi.html#guards

I felt it was best to make globalopt check the triple, even though in
practice, the Itanium guard variable check blocks globalopt because the
guard variable is in its own comdat group and has weak linkage.

Fixes PR39712

Event Timeline

rnk created this revision.Nov 20 2018, 2:12 PM
efriedma added inline comments.
llvm/lib/Transforms/Utils/CtorUtils.cpp
116

If the IR doesn't have enough information to distinguish between the Microsoft and Itanium ABI cases, we should extend the IR to contain more information. Checking the triple means we're depending on semantic guarantees which are not documented in LangRef, and many not apply to non-C++ languages.

rjmccall added inline comments.Nov 25 2018, 11:53 PM
llvm/lib/Transforms/Utils/CtorUtils.cpp
116

I agree with Eli. This is clearly a frontend-originated guarantee, not a target-wide guarantee, and should be reflected in the IR.

rnk marked an inline comment as done.Nov 26 2018, 12:45 PM
rnk added inline comments.
llvm/lib/Transforms/Utils/CtorUtils.cpp
116

For platforms with comdats, I wanted to push this same idea through the Itanium C++ ABI. It completely eliminates the need for guard variables for initializers of variables with vague linkage, although only if they are hidden, so other DSOs don't try to initialize them.

What do you think of some kind of global named metadata node for this? This is such a complicated guarantee to express, I've tried and failed to write it down concisely three times. !llvm.assume_comdat_ctors? !llvm.comdat_initializers = !{i32 0, i32 1}? I like "comdat_initializers".

rjmccall added inline comments.Nov 26 2018, 1:13 PM
llvm/lib/Transforms/Utils/CtorUtils.cpp
116

For platforms with comdats, I wanted to push this same idea through the Itanium C++ ABI. It completely eliminates the need for guard variables for initializers of variables with vague linkage, although only if they are hidden, so other DSOs don't try to initialize them.

Okay, but it's still not a target-wide thing, it's a rule of a specific language ABI. That's a higher level of abstraction than LLVM IR and should be opt-in on a variable-by-variable basis.

This is such a complicated guarantee to express, I've tried and failed to write it down concisely three times.

"If this definition of the global variable is chosen by the linker, there is an associated global constructor function that will also be chosen by the linker, and it is undefined behavior if any other global constructor accesses the variable before the associated constructor runs." I'd call it comdat_global_ctor, but the "comdat" in the name is just suggestive, not a core aspect of the semantics.

Global metadata pointing to what, exactly? The global variables?

The comdat key in global_ctors provides some sort of guarantee that the global variable definitions in that comdat are "strong", in some sense: if the comdat were discarded, the ctor would also be discarded. Do we really need any additional guarantee from the frontend?

llvm/lib/Transforms/Utils/Evaluator.cpp
131

Actually, if we have a ComdatGV, isn't it illegal to update globals where hasUniqueInitializer() is true? The constructor could be discarded. For example, something like this:

void foo();
struct S { S() { foo(); } static S s; };
inline S S::s;
S *ss = &S::s;
int x;
void foo(){ x++; }
rnk planned changes to this revision.Nov 26 2018, 2:07 PM
rnk marked 2 inline comments as done.

It would be nice if globalopt could do a better job on these kinds of things, too:

inline int foo() { return 42; }
int bar();
static int block_globalopt = bar();
static int g1 = foo();
static int g2 = foo();
static int g3 = foo();
int *arr[] = {&block_globalopt, &g1, &g2, &g3};

C++ requires that these globals are initialized in order, but (I think?) it's UB if any of the globals are accessed before they are constructed. Right now, to provide the order guarantee, we lump all the non-comdat initializers into a _GLOBAL__sub_I_ function. As a consequence, globalopt is all or nothing: either everything can be optimized, or nothing can be. In this example, because it can't see through bar, it won't fire.

Maybe we should consider giving a stronger guarantee for the order in which global_ctors entries run, so that clang can instead emit these all separately.

llvm/lib/Transforms/Utils/CtorUtils.cpp
116

I wasn't going to go so far as to make it variable-by-variable, I was just going to make it a global named metadata tuple. I guess a metadata attachment on the global variable being initialized would be OK, since removing just blocks the optimization, which is semantics preserving. I don't like it that much, since I really feel like this optimization should be done as a matter of course. It sounds like you folks think that's the way to go, though?

llvm/lib/Transforms/Utils/Evaluator.cpp
131

You're right. Initially I was thinking this definition of x will always prevail so modifying it is safe, but the initializer may not prevail, and we could now double-increment x. I think it's fixable. Basically, if this is an initializer for something in a comdat, we can only touch globals in that comdat group, so they all go together.

In D54774#1308714, @rnk wrote:

It would be nice if globalopt could do a better job on these kinds of things, too:

inline int foo() { return 42; }
int bar();
static int block_globalopt = bar();
static int g1 = foo();
static int g2 = foo();
static int g3 = foo();
int *arr[] = {&block_globalopt, &g1, &g2, &g3};

C++ requires that these globals are initialized in order, but (I think?) it's UB if any of the globals are accessed before they are constructed.

As seen in the example at the end of [basic.start.static], no, this is not UB. If block_globalopt's initializer reads from any of these variables, it's unspecified what value it gets, but it's not actually UB. According to [class.cdtor], it's UB if the variable is "an object with a non-trivial constructor", but I think that has to be read as limited to class types at a minimum.

You're right that we're supposed to be able to emit the g variables in your example with constant initialization, although you have to check very specific conditions (no modifications of other variables, restricted dependencies on other variables' values) before you can.

Right now, to provide the order guarantee, we lump all the non-comdat initializers into a _GLOBAL__sub_I_ function. As a consequence, globalopt is all or nothing: either everything can be optimized, or nothing can be. In this example, because it can't see through bar, it won't fire.

Maybe we should consider giving a stronger guarantee for the order in which global_ctors entries run, so that clang can instead emit these all separately.

That could be useful.

llvm/lib/Transforms/Utils/CtorUtils.cpp
116

Right. I feel that this is a language-specific optimization that needs language-specific permission to enable.