This is an archive of the discontinued LLVM Phabricator instance.

[C++20][Modules] Elide unused guard variables in Itanium ABI module initializers.
ClosedPublic

Authored by iains on Sep 24 2022, 8:17 AM.

Details

Summary

For the Itanium ABI, we emit an initializer for each module. This is responsible
for handling initialization of global vars. Relates to P1874R1.

The initializer has a known mangling and is automatically called from any TU that
imports a module. Since, at present, the importer has no way to determine that an
imported module does not require an initializer, we generate the initializer for
all cases (even when it is empty).

Initializers must be run once, with the ordering guaranteed by the import graph
and this is ensured in the current code by addition of a guard variable.

In the case that a module has no requirement for global initializers, and also does
not import any other modules, we can elide the guard variable.

Diff Detail

Event Timeline

iains created this revision.Sep 24 2022, 8:17 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 24 2022, 8:17 AM
iains published this revision for review.Sep 24 2022, 8:20 AM
iains added a reviewer: urnathan.
iains added a subscriber: Restricted Project.
Herald added a project: Restricted Project. · View Herald TranscriptSep 24 2022, 8:20 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
dblaikie added inline comments.
clang/lib/CodeGen/CGDeclCXX.cpp
690–695

Usually I'd expect this to be sunk down closer to its usage - any particular reason to have it up here? (not super far away, but seems further away than I'd expect)

730

Could we avoid duplicating this call, by doing something like this:

ConstantAddress Guard = ConstantAddress::invalid()
if (!PrioritizedCXXGlobalInits.empty() || !CXXGlobalInits.empty())
  Guard = ConstantAddress(new GlobalVariable(...), Int8Ty, GuardAlign);
CodeGenFunction(*this).GenerateCXXGlobalInitFunc(Fn, ModuleInits, Guard);

?

clang/test/CodeGenCXX/module-initializer-guard-elision.cpp
23–44

If it's pretty stable/reliable/not too brittle, maybe worth CHECKing the whole function in these cases (CHECK+CHECK-NEXT etc) to demonstrate they have nothing in them other than the return, or the call+return?

iains updated this revision to Diff 462689.Sep 24 2022, 3:00 PM
iains marked 3 inline comments as done.
iains edited the summary of this revision. (Show Details)

address review comments, minor amendments to description.

iains added a comment.Sep 24 2022, 3:01 PM

thanks for the review.

clang/lib/CodeGen/CGDeclCXX.cpp
690–695

Agreed.
The reason was that we build an ordered vector of inits from several potential sources, and I'd been clearing the inputs as they were used. I've now sunk those clears to the end of the function and merged the test as you suggest below.

730

works for me.

clang/test/CodeGenCXX/module-initializer-guard-elision.cpp
23–44

The main thing we care about is that there is no guard variable - however it does no harm to be specific where we can, For the empty/nearly empty functions, that seems reasonable (done for those).

urnathan added a comment.EditedSep 26 2022, 7:28 AM

Richard & I discussed this, and decided it was a bad idea, as it means we could have an O(N^2) initialization walk, rather than O(N).

IIUC this is being motivated by NVPTX, which cannot support dynamically initialized global vars. But in that case we know all the global init fns must be NOP. However, I recommend not solving that problem piecemeal, but involve NVidia more directly. (There is an alternative design, relying on section ordering, but it is more brittle, and IIRC still requires idempotency vars in some cases.) We should not be pessimizing systems that have dynamic inits to cater for systems that lack them.

For GCC I (eventually) implemented optimizations where:
1 each CMI indicates whether it has a NOP global init fn.
2 the global init was always emitted, but if it is NOP, it's a completely empty fn (thus importers are not required to implement the below optimization)
3 Importers do not emit calls to known-NOP global inits.

Step #1 took a bit of moving things around, as one only knew there were no global inits at a later point than the CMI was being written out.

iains planned changes to this revision.Sep 26 2022, 8:15 AM

So, in light of these comments;

  • It seems that there is one case where I can elide the guard - where the init is completely empty (including of calls to imported module inits)
  • otherwise, I need to investigate what is needed for an importee to communicate that it has an empty init and thus does not need to be called.

I think the first of these could be useful in quite a few cases (it's not clear to me at this relatively early stage in modules adoption that we see very deep include graphs - but somewhat unknown).

The second will have to live on the TODO for now (at least from me).

iains updated this revision to Diff 464502.Oct 1 2022, 7:44 AM
iains edited the summary of this revision. (Show Details)

rebased and updated to elide the guard only for the case of no inits _and_ no imports.

urnathan accepted this revision.Oct 11 2022, 4:47 AM

This is ok, but see the note I added about object-file compatibility.

clang/lib/CodeGen/CGDeclCXX.cpp
646–647

It's not just 'current implementation' requirement, it's an ABI requirement. Remember, one could generate the interface object file from one compiler and then generate (and consume) the CMIs with a different compiler. this achieves object-file interoperability, but does not require CMI compatibility. We have no expectation any particular compiler implements the optimization you refer to.

Feel free to note this at your discretion.

This revision is now accepted and ready to land.Oct 11 2022, 4:47 AM

It occurs to me that for systems that lack dynamic initialization (like PTX), it is trivial to implement the 'don't call NOP module inits', because we know that there can be no non-nop inits. Any code emission that tries to create a global init fn with contents will fail to compile.

iains updated this revision to Diff 483770.Dec 17 2022, 11:36 AM

rebased, amended a comment as suggested