This is an archive of the discontinued LLVM Phabricator instance.

[static initializers] Emit global_ctors and global_dtors in reverse order when .ctors/.dtors are used.
ClosedPublic

Authored by wolfgangp on Jun 1 2021, 4:55 PM.

Details

Summary

Commit c19f4f8069 caused some ordered dynamic initialization of static member variables to be treated like unordered initialization, i.e. put into global_ctors. This caused issues in some configurations, specifically when init_array is not used. This patch proposes to restrict this behaviour to the Microsoft ABI, where it is apparently necessary for correctness.

2 existing test cases are affected by this change.

This fixes PR50266.

Diff Detail

Event Timeline

wolfgangp requested review of this revision.Jun 1 2021, 4:55 PM
wolfgangp created this revision.

+@rsmith @hans @aeubanks

I think this would be an unfortunate code size and startup time regression for Itanium C++ inline variables. The existing code was written under the assumption that vague linkage (GVA_DiscardableODR) implies no initialization ordering, but I think you've found a valid counterexample.

specifically when init_array is not used

Can you elaborate on why that is? Here's what I remember, and what I guess is happening. ELF has two initializer schemes: .init_array and .ctors. They are essentially equivalent, they are both arrays of function pointers, but one is called in reverse order and the other is called in forward order. The compiler knows which scheme is in use and it is controlled by -fuse-init-array.

The LLVM IR langref says that llvm.global_ctors are called in ascending priority order, and the order between initializers is not defined. That's not very helpful and often doesn't reflect reality. I wonder if we could do two things, perhaps separately:

  1. emit llvm.global_ctors so that they are called in order of appearance in the IR array (is this not already true?)
  2. define the order of initialization in LangRef

With the first change in place, we can be confident that so long as all includers of the StaticsClass in the test emit both initializers in the correct order, no matter which initializers prevail, they will be called in the correct order. Of course, this could break users relying on the existing ordering, nevermind that it is explicitly documented as undefined.

The second change is only a documentation change, but I would like to find a way to justify what LLVM does in GlobalOpt. GlobalOpt can effectively fold a dynamic initializer to constant memory in certain cases. That can only ever have the effect of making an initializer run earlier. As long as no initializers depend on observing uninitialized globals, that should be safe. The guarantee that we want to provide is that, for each initializer, all initializers prior to it in the global_ctors array will have run when it runs. Initializers that appear later may run earlier. Hopefully that covers both the usual way that .init_array sections are linked together and the way globalopt optimizes dynamic initialization.

ychen added a subscriber: ychen.Jun 2 2021, 2:18 PM

The LLVM IR langref says that llvm.global_ctors are called in ascending priority order, and the order between initializers is not defined. That's not very helpful and often doesn't reflect reality. I wonder if we could do two things, perhaps separately:

  1. emit llvm.global_ctors so that they are called in order of appearance in the IR array (is this not already true?)
  2. define the order of initialization in LangRef

Is there anything preventing us from using the existing priority field to define the order instead of introducing the order among initializers with the same priority? If we go with 1., there would be no way to say that "the order does not matter" right?

rnk added a comment.Jun 2 2021, 2:22 PM

Is there anything preventing us from using the existing priority field to define the order instead of introducing the order among initializers with the same priority? If we go with 1., there would be no way to say that "the order does not matter" right?

The priority is used mainly for inter-object initialization order. It moves the initializer into a .init_array.N section, for some number N. The programmer may be using existing numbers via __attribute__((init_priority(N))), so it isn't safe for the compiler to use anything other than the default initialization priority.

ychen added a comment.Jun 2 2021, 2:47 PM

Is there anything preventing us from using the existing priority field to define the order instead of introducing the order among initializers with the same priority? If we go with 1., there would be no way to say that "the order does not matter" right?

The priority is used mainly for inter-object initialization order. It moves the initializer into a .init_array.N section, for some number N. The programmer may be using existing numbers via __attribute__((init_priority(N))), so it isn't safe for the compiler to use anything other than the default initialization priority.

I see. It makes sense to define the order at IR level then since the order is there in .init_array.N/.ctors section nevertheless.

+@rsmith @hans @aeubanks

specifically when init_array is not used

Can you elaborate on why that is? Here's what I remember, and what I guess is happening. ELF has two initializer schemes: .init_array and .ctors. They are essentially equivalent, they are both arrays of function pointers, but one is called in reverse order and the other is called in forward order. The compiler knows which scheme is in use and it is controlled by -fuse-init-array.

For PS4, we use the .ctors scheme, and so the initialization order was suddenly reversed, which was not noticed for a while until the user had a dependency on the previous initialization. And using init_array or not has currently no effect on the order of emission of global_ctors.

The LLVM IR langref says that llvm.global_ctors are called in ascending priority order, and the order between initializers is not defined. That's not very helpful and often doesn't reflect reality. I wonder if we could do two things, perhaps separately:

  1. emit llvm.global_ctors so that they are called in order of appearance in the IR array (is this not already true?)

AFAICT, this is already happening. After sorting by priority, the order remains intact at emission. So with -fno-use-init-array0we're going to emit global_ctors in reverse order, I suppose.

  1. define the order of initialization in LangRef

With the first change in place, we can be confident that so long as all includers of the StaticsClass in the test emit both initializers in the correct order, no matter which initializers prevail, they will be called in the correct order. Of course, this could break users relying on the existing ordering, nevermind that it is explicitly documented as undefined.

The second change is only a documentation change, but I would like to find a way to justify what LLVM does in GlobalOpt. GlobalOpt can effectively fold a dynamic initializer to constant memory in certain cases. That can only ever have the effect of making an initializer run earlier. As long as no initializers depend on observing uninitialized globals, that should be safe. The guarantee that we want to provide is that, for each initializer, all initializers prior to it in the global_ctors array will have run when it runs. Initializers that appear later may run earlier. Hopefully that covers both the usual way that .init_array sections are linked together and the way globalopt optimizes dynamic initialization.

rnk added a comment.Jun 2 2021, 3:46 PM

For PS4, we use the .ctors scheme, and so the initialization order was suddenly reversed, which was not noticed for a while until the user had a dependency on the previous initialization. And using init_array or not has currently no effect on the order of emission of global_ctors.

Exactly, that's my suggestion: if fno-use-init-array is set, emit global_ctors into the .ctors section in reverse order. That way -fno-use-init-array doesn't reverse the order of initialization *inside* a given LLVM module, even if it reverses the order in which *modules* are initialized.

wolfgangp updated this revision to Diff 350364.Jun 7 2021, 11:19 AM

Following the suggestion to define an order of initialization for the entries in llvm.global_ctors and llvm.global_dtors this is mainly a documentation change, as well as a simple reversed emission of global_ctors/dtors entries when InitArray is not used.
3 test cases are affected, 2 of them just by a reversed order of emission of priority-suffixed sections. I added a couple of entries to the third test case to verify in-order emission of entries with equal priority.

Herald added a project: Restricted Project. · View Herald TranscriptJun 7 2021, 11:19 AM

+ various .ctors stakeholders + @MaskRay + @mstorsjo

This change was my idea, so I want to make sure there is buy in from other folks who use -fno-init-array. This has the potential to break user programs that rely on the order of dynamic initialization *within* a TU, but there will be no change in order between TUs. This might also deserve a release note.

Overall this should be a positive change. In the past users have reported QoI issues where a template with an SDM instantiated earlier in the TU gets initialized after a template instantiated later in the TU. Technically, those initializers are not ordered, so this is not a correctness issue, but it would be nicer if initializers ran in source order.

We might need to add docs here about what happens when the initializer is in a comdat group, since the order guarantee in that case is very subtle.

Are there any existing optimizations that might be affected by this? In particular, I think GlobalOpt implicitly reorders functions in global_ctors.

Thanks for tagging us.

glibc has supported DT_INIT_ARRAY/DT_FINI_ARRAY since 1999. GCC 4.7 defaulted to .init_array/.fini_array on glibc >= 2.4 and most Linux distributions have thus made the switch for many years. Actually, most ELF operating systems have switched to .init_array/.fini_array (FreeBSD added the support in 2012, OpenBSD in 2016, NetBSD in 2018.

.ctors doesn't have aren't many users now, but my users are unfortunately still using .ctors:(
Our tests can pass with .init_array + ld.lld --shuffle-sections=.init_array=-1, so I think we don't have code relying on the buggy .ctors order
(i.e. we don't have problems due to the shuffled dynamic initialization order *within* a TU).

This patch improves the situation for other users and we should do it.

Can you check whether clang/lib/CodeGen/CGDeclCXX.cpp:507 needs any comment update?
The subject "Don't put ordered dynamic initializers of static variables into global_ctors" needs an update as well.

llvm/test/CodeGen/X86/constructor.ll
61

Worth a comment that we check f/i/j are in order.

Are there any existing optimizations that might be affected by this? In particular, I think GlobalOpt implicitly reorders functions in global_ctors.

Thanks for mentioning llvm::optimizeGlobalCtorsList. It keeps the order of the remaining items so I assume defining an order among constructors with the same priority should be fine.

Thanks for looping me in; I don't have any objection to this, as far as I can see, the reasoning seems sensible, and the most brittle testcase I have (from what I remember offhand) seems to run fine with this change.

rnk added a subscriber: asbirlea.Jun 8 2021, 9:21 AM

I think we do need to think about globalopt. Consider this example:

extern "C" {
int printf(const char *, ...);
extern int sharedState;
int sharedState = 0;
int init1() {
  asm volatile(""); // block globalopt
  return ++sharedState;
}
int init2() { return ++sharedState; }
int gv1 = init1();
int gv2 = init2();
}

int main() {
  printf("gv1: %d\n", gv1);
  printf("gv2: %d\n", gv2);
}

These initializers are ordered, and gv1 should be 1 and gv2 should be 2. Clang produces a single shared initializer @_GLOBAL__sub_I_${src} to ensure this ordering. However, if you manually split the initializers apart (https://reviews.llvm.org/P8266) and try to rely on the new langref ordering, globalopt will fold the second initializer and it appears as if init2 ran first. I have these two files in t.cpp and t.ll, and see the results:

$ clang -O2 t.cpp  && ./a.out
gv1: 1
gv2: 2

$ clang -O2 t.ll  && ./a.out
gv1: 2
gv2: 1

I've never been totally comfortable with the soundness of globalopt, so I wonder if that's the part that really needs to change. The value of globalopt is that it can fold simple constructors, and those don't access other mutable global variables. If we had some way of telling globalopt that it is only allowed to read from constants and the GV being initialized, that would solve this issue. Right now, there is no link in the IR between a global and its initializer unless the global is comdat.


As for this patch, I think we're not ready to add the LangRef guarantee. However, the code change is good: it ensures that -f[no-]use-init-array doesn't needlessly reverse the order of initialization in the TU.

Without the langref guarantee, clang is skating on thin ice: inline static data member initializers may not be run in source order. However, globalopt is currently completely unable to fold initializers for weak/comdat globals, so in practice, there is no risk to users.

I think we can defer the langref change until the next time someone makes a serious investment in globalopt. I think @asbirlea and @aeubanks might be interested in that sometime later this year.

wolfgangp updated this revision to Diff 350644.Jun 8 2021, 10:11 AM
wolfgangp retitled this revision from [static initializers] Don't put ordered dynamic initializers of static variables into global_ctors to [static initializers] Emit global_ctors and global_dtors in reverse order when init_array is not used..

Given the assessment on GlobalOpt, removed the LangRef changes and updated the subject line.

Can you check whether clang/lib/CodeGen/CGDeclCXX.cpp:507 needs any comment update?
The subject "Don't put ordered dynamic initializers of static variables into global_ctors" needs an update as well.

If we are not defining the initialization order, it seems the comment doesn't need any updating yet, although once we do, it definitely will.

efriedma accepted this revision.Jun 8 2021, 10:51 AM

LGTM. Making the order of constructors independent of UseInitArray seems obviously good in any case.

This revision is now accepted and ready to land.Jun 8 2021, 10:51 AM

Won't this change cause weird effects with LTO, when you're merging multiple TUs' global_ctors arrays before emitting? Won't this end up reversing the order of the files, as well as the order of the functions within a single file? That does not seem likely to be expected, right?

The compiler knows which scheme is in use and it is controlled by -fuse-init-array.

This isn't exactly true -- the linker (binutils ld and ld.gold) can translate ctors to init_array, so we also need to look at what they do. I can't actually find the code that does it in bfd, but in examination of its output, it looks like it does reverse the order of pointers within a ctors section when translating to init_array, in order to preserve the expected "backwards" order that ctors users "wanted". And gold does the same -- except it is buggy, because it reverses the order of the text itself, yet not the relocation entries associated with it. So depending on whether you're building a PIC binary or not, you get different behavior (sigh). Anyhow, modulo that gold bug, it looks like this is actually OK.

rnk added a subscriber: probinson.Jun 9 2021, 9:04 AM

Won't this change cause weird effects with LTO, when you're merging multiple TUs' global_ctors arrays before emitting? Won't this end up reversing the order of the files, as well as the order of the functions within a single file? That does not seem likely to be expected, right?

True, I think that is an issue, and that probably does affect Sony's actual customers. I hear games tend to use regular LTO, so this will in fact reverse the order of initialization between object files for those users. The only solution I can think of would be to bring the LTO library into the game. It must know whether .ctors of .init_array are in use, since it generates code and it has to pick the section. To ensure that initialization of objects happens in reverse command line order as expected of users of -fno-init-array, the linker would join global_ctors in reverse order. That's not elegant. It encodes this knowledge in two places instead of one. But it does move in the direction of making things more defined, more deterministic, and that seems like a good thing.

So, a question for @wolfgangp and @probinson , do we need to make adjustments to the LTO library, or is this OK as is?

I think for non-Sony users we can accept the change in initialization order.

So, a question for @wolfgangp and @probinson , do we need to make adjustments to the LTO library, or is this OK as is?

We never guaranteed our licensees any particular initialization order (between modules), so after checking with my team members I think this is OK as is.

If nobody has any more objections, I'll commit this change, then. Please let me know if you think otherwise.

rnk accepted this revision.Jun 10 2021, 12:09 PM

lgtm

MaskRay accepted this revision.Jun 10 2021, 12:48 PM
MaskRay retitled this revision from [static initializers] Emit global_ctors and global_dtors in reverse order when init_array is not used. to [static initializers] Emit global_ctors and global_dtors in reverse order when .ctors/.dtors are used..
This revision was landed with ongoing or failed builds.Jun 10 2021, 4:45 PM
This revision was automatically updated to reflect the committed changes.