Page MenuHomePhabricator

[C++20][Modules] Build module static initializers per P1874R1.
AcceptedPublic

Authored by iains on May 23 2022, 2:14 AM.

Details

Summary

Currently we only implement this for the Itanium ABI since the correct
mangling for the initializers in other ABIs is not yet known.

Intended result:

For a module interface [which includes partition interface and implementation
units] (instead of the generic CXX initializer) we emit a module init that:

  • wraps the contained initializations in a control variable to ensure that the inits only happen once, even if a module is imported many times by imports of the main unit.
  • calls module initialisers for imported modules first. Note that the order of module import is not significant, and therefore neither is the order of imported module initializers.
  • We then call initializers for the Global Module Fragment (if present)
  • We then call initializers for the current module.
  • We then call initializers for the Private Module Fragment (if present)

For a module implementation unit, or a non-module TU that imports at least one
module we emit a regular CXX init that:

  • Calls the initializers for any imported modules first.
  • Then proceeds as normal with remaining inits.

For all module unit kinds we include a global constructor entry, this allows
for the (in most cases unusual) possibility that a module object could be
included in a final binary without a specific call to its initializer.

Implementation:

  • We provide the module pointer in the AST Context so that CodeGen can act on it and its sub-modules.
  • We need to account for module build lines like this: clang -cc1 -std=c++20 Foo.pcm -emit-obj -o Foo.o or clang -cc1 -std=c++20 -xc++-module Foo.cpp -emit-obj -o Foo.o
  • in order to do this, we add to ParseAST to set the module pointer in the ASTContext, once we establish that this is a module build and we know the module pointer. To be able to do this, we make the query for current module public in Sema.
  • In CodeGen, we determine if the current build requires a CXX20-style module init and, if so, we defer any module initializers during the "Eagerly Emitted" phase.
  • We then walk the module initializers at the end of the TU but before emitting deferred inits (which adds any hidden and static ones, fixing https://github.com/llvm/llvm-project/issues/51873 ).
  • We then proceed to emit the deferred inits and continue to emit the CXX init function.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2022, 2:14 AM
iains added a subscriber: Restricted Project.
iains edited the summary of this revision. (Show Details)May 23 2022, 2:22 AM
iains edited the summary of this revision. (Show Details)
iains edited the summary of this revision. (Show Details)
iains edited the summary of this revision. (Show Details)May 23 2022, 2:24 AM
iains edited the summary of this revision. (Show Details)
iains updated this revision to Diff 431313.May 23 2022, 2:33 AM

just updated description

iains published this revision for review.May 23 2022, 5:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2022, 5:02 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
urnathan added inline comments.May 23 2022, 5:53 AM
clang/lib/CodeGen/CGDeclCXX.cpp
626–627

'at this time' is ambiguous. I think it means 'the current compiler implementation state', but it could also mean 'at this point in the compilation'. I don't think there's a general problem with such an optimization -- we could emit a flag into the BMI. It's just we don't have the smarts to do that just yet. right?

629

I'm with Morse about this, even before coming to live where I do. You've used both 'ise' and 'ize'. s/ise/ize/

ChuanqiXu added inline comments.May 26 2022, 12:15 AM
clang/include/clang/AST/ASTContext.h
476

The name PrimaryModule looks confusing. PrimaryModule looks like primary module interface unit to me. I think it refers to current module unit only. I think it is better to rename to CurrentModuleUnit.

Then why is it mutable? I don't find it is changed in a const member function.

1085
clang/lib/CodeGen/CGDeclCXX.cpp
621

Recommend to comment related part in your summary. It should be much helpful for other to read the codes.

645–646

mangleModuleInitializer is virtual function and we don't need to cast it.

671

I find the process here for PrioritizedCXXGlobalInits is much simpler than the one done in CodeGenModule::EmitCXXGlobalInitFunc(). It would generate more global init function. Doesn't it matter?

684

Given CodeGenModule is designed as ABI independent, I feel better to avoid see Itanium ABI in this function. (Although we don't have other ABI for modules now..)

692–693
699

Should the Guard be internal linkage? I image that it is possible to be manipulated by different TUs. So I feel like it might be better to be linkonce or linkonce_odr?

706–722

The codes look highly similar to CodeGenModule::EmitCXXGlobalInitFunc. I feel it is better to hoist a new function to avoid repeating the same logic twice.

861

This looks odd since it might be possible to not handling modules. But I understand it might be time-consuming if we want to use CXXGloablInits and we want ModuleInits to be front.

clang/lib/CodeGen/CodeGenModule.cpp
2487

Nit: I feel better to add methods like Module *Module::getGlobalModuleFragment(). I feel odd to use magic string here. But I am OK with it since there many magic string in CodeGen codes and we got in consensus before that we would need to do a lot of refactor work in the future.

2501

Nit: with above: Module *Module::getPrivateModuleFragment().

2931–2933
3205–3206

Is this change needed? Could you elaborate more on this?

clang/lib/CodeGen/CodeGenModule.h
1564

Let's keep consistent with the following. Given CodeGen part uses LLVM IR tensely, clang::Module looks better than Module*

clang/test/CodeGen/module-intializer.cpp
57

I think it is necessary to check the load for the guard and the check.

173

It looks like we lack a test for PrivateModuleFragment.

ChuanqiXu added inline comments.May 26 2022, 12:31 AM
clang/lib/CodeGen/CGDeclCXX.cpp
699

Oh, I realized I'm wrong. Don't remind this one.

iains updated this revision to Diff 432241.May 26 2022, 4:02 AM
iains marked 14 inline comments as done.

address review comments

iains added a comment.May 26 2022, 4:03 AM

not sure why the Debian clang-format is complaining - I am using llvm-14's clang-format with git clang-format + arc.

clang/include/clang/AST/ASTContext.h
476

The name PrimaryModule looks confusing. PrimaryModule looks like primary module interface unit to me. I think it refers to current module unit only. I think it is better to rename to CurrentModuleUnit.

I think we want to be clear that this is the Top Level Module (and not some dependent or sub-module) - so I have renamed to "TopLevelModule".

Then why is it mutable? I don't find it is changed in a const member function.

Ah well caught; the implementation went through several iterations and I missed to remove this.

clang/lib/CodeGen/CGDeclCXX.cpp
621

Hmm. I am not sure exactly what you mean here - there is a comment about the purpose of the method, where the method is declared.
The commit message describes what this method does in the first part. I'm happy to make things more clear, but not sure where you want to see some change,

626–627

yeah, I meant "as currently implemented" - adding flags to the BMI attracts concomitant churn in the serialisation, so probably better done separately. Amended the comment.

629

tools tools ... my editor wants "ise" unless I remember to set it to USian .. and sometimes I miss a change,, hopefully will catch them all as I go.

645–646

Actually, 'mangleModuleInitializer' is not a virtual function of the 'MangleContext' class, only of the 'ItaniumMangleContext'.
(see D122741).

This because we do not know the mangling for MS and, in principle, there might never be such a mangling (e.g. that implementation might deal with P1874 in some different way).

I did have a version of this where I amended the mangling to make the function virtual in the 'MangleContext' class, but then that means generating a dummy MS version that, as noted above, might never exist in reality.

671

In the general CXX initializer the prioritized inits are grouped into functions named in a way that allows them to be collated by other tools (e.g. the static linker) so that init priority can be honoured over multiple TUs.

This is not feasible for module initializers, there is no way to fuze them across TUs (it would have no meaning). So I think all we need to do here is to emit the calls in the correct order (in the end there are no more initializer functions, we just skip the intermediated grouping functions,

Of course, init priority is not mentioned in the standard, so that really what would matter is that compiler 'vendors' agree on a sensible approach to make sure code behaves in an expected manner for common toolchains.

684

Until we have an answer on MS's approach to this, I think it is better to be clear that this only applies to Itanium ABI uses (but if consensus is to remove this, I can do so).

692–693

we need the cast, see above.

706–722

actually, I had considered trying to factor this a bit, but it looks kind of artificial (will consider a second iteration).

861

Yes - otherwise we end up inserting a bunch of stuff, the code is not too long, and the pre-pending is described at the stage it is done.

clang/lib/CodeGen/CodeGenModule.cpp
2487

yeah, I was intending to do this actually - although it only moves the magic strings to a different place. It might be that we have enough bits in the module type enumerator (already streamed) so that we could use up two values with "GMF" and "PMF" module types. However IMO that should be a separate change.

3205–3206

the test is quite heavy, and is used twice when the assertions are enabled (I do not mind to revert this part if that does not seem worthwhile).

clang/test/CodeGen/module-intializer.cpp
57

can you say why?
the main purpose here is to check that we have the right mangled symbols (and some check that there is useful ordering). In what case would you think code-gen could emit the store and somehow not the other parts?

173

yeah, needs to be a second test since we cannot have PMF in a module with partitions.
added one.

ChuanqiXu added inline comments.Thu, May 26, 8:15 PM
clang/lib/CodeGen/CGDeclCXX.cpp
621

I mean the paragraph:

For a module (instead of the generic CXX initializer) we emit a module init
that:

- wraps the contained initializations in a control variable to ensure that the inits only happen once, even if a module is imported many times by imports of the main unit.
- calls module initialisers for imported modules first. Note that the order of module import is not significant, and therefore neither is the order of imported module initializers.
- We then call initializers for the Global Module Fragment (if present)
- We then call initializers for the current module.
- We then call initializers for the Private Module Fragment (if present)

I understand people might feel like this is wordy. But, personally, I prefer readability.

645–646

I was just afraid of people might blame us to bring ABI specific things to CodeGen* things. I found there similar uses in CGVTables.cpp and CGVTT.cpp. So this might be acceptable too.

671

Yeah, the prioritized inits are really odd fashion to me.. I only saw such things in assembly. So my understanding to this one is that:
(1) If we have prioritized inits in modules' code , it is not guaranteed to be initialized first. This is the vendors agreement.
(2) If we link a static library (e.g., libgcc.a), the prioritized inits in that would still be initialized first.

Do I understand right?

clang/lib/CodeGen/CodeGenModule.cpp
2487

We still need to use auto * instead of auto here. I remember this is recommended in clang's coding standards.

2501

Use auto* here.

3205–3206

I prefer to do such changes in a standalone NFC patch.

iains added inline comments.Fri, May 27, 3:35 AM
clang/lib/CodeGen/CGDeclCXX.cpp
621

so, to clarify - you would like me to add this description as a comment block on the new initializer method (I am OK with doing this)

ChuanqiXu added inline comments.Fri, May 27, 5:36 AM
clang/lib/CodeGen/CGDeclCXX.cpp
621

Yeah, I feel it is helpful.

iains updated this revision to Diff 433049.Tue, May 31, 4:05 AM
iains marked 7 inline comments as done.

rebased, addressed review comments.

clang/lib/CodeGen/CGDeclCXX.cpp
645–646

yes, let us see - if there are any other opinions - personally, I prefer not to generate a dummy function in the MS code.

671

We should avoid side-tracking this patch with too much discussion of the problems of C++ global initialisers. Behaviour of things like static libraries depends on the toolchain, binary container, platform ....

  1. prioritised initializers are not part of the standard
  2. they are optional (some toolchains do not handle them at all - e.g. macOS)
  3. where they are handled, usually the toolchain makes no guarantees about behaviour between TUs
  4. (for a regular ELF-style C++ global init) there is a convention to group initializers for a specific priority into a function with a specific mangled name that allows (optionally) external tools - like the linker -to group / order the inits _between_ TUs.

None of this is relevant to Module initializers, we just have to ensure that imported module inits are run before the current module's.

clang/lib/CodeGen/CodeGenModule.cpp
3205–3206

OK .. fair enough, reverted for now.

ChuanqiXu accepted this revision.Tue, May 31, 7:01 PM

LGTM, thanks!

This revision is now accepted and ready to land.Tue, May 31, 7:01 PM
iains updated this revision to Diff 434003.Fri, Jun 3, 5:26 AM
iains edited the summary of this revision. (Show Details)

rebased and updated

previously, this was not doing the right thing for module implementation units
which were being processed as if they were interfaces, so we've introduced a module
implemetation type, and that is now checked for in the emitting of the inits.

iains updated this revision to Diff 434781.Tue, Jun 7, 4:57 AM

rebased

iains updated this revision to Diff 436487.Mon, Jun 13, 11:20 AM

rebased. Amended to include a global CTOR entry for each module kind.

@urnathan - I believe that this now implements the same scheme as you have done for GCC (less any of the optimisations).
In particular, we now emit global CTOR entries for module initializers, even though these should really be called explicitly, but as was discussed on the core/sg15 reflectors it is possible (at least for unix-like systems) to construct cases where module objects are included in a final binary without their initializers being called.

please sed /initialiser/initializer/, I noticed a few had crept in.