This is an archive of the discontinued LLVM Phabricator instance.

[C++20] [Modules] Remain dynamic initializing internal-linkage variables in module interface unit
AbandonedPublic

Authored by ChuanqiXu on Feb 9 2022, 10:51 PM.

Details

Reviewers
rsmith
iains
dblaikie
urnathan
Group Reviewers
Restricted Project
Summary

This fixes issue51873. The issue reports that we couldn't run hello world example in C++20 module. Then I found that the reason is that libstdc++ uses static variable to initialize resources for streams like std::cout. And it would be good if we use libc++.

So the key point here is that wether or not the definition of static variables should be remained in module interface. First I thought the reduction makes sense. Since static variable shouldn't be able to be referred outside the module. Then @rsmith pointed out that this one should be a bug.

The patch tries to remain the definition for dynamic initializing internal-linkage variable.

Diff Detail

Event Timeline

ChuanqiXu requested review of this revision.Feb 9 2022, 10:51 PM
ChuanqiXu created this revision.
urnathan added inline comments.Feb 11 2022, 12:30 PM
clang/test/CodeGenCXX/static-variable-in-module.cpp
3–9

rather than generate a foo.h file, why not (ab)use the preprocessor with internal line directives?

module;
# 3 __FILE__ 1 // use the next physical line number here (and below)
struct S { S(); };
static S s = S();
# 6 "" 2
export module m;
...
urnathan added inline comments.Feb 11 2022, 12:39 PM
clang/test/CXX/modules-ts/basic/basic.def.odr/p4/module.cpp
5 ↗(On Diff #407396)

I don;t think this is correct. That should still be a linkonce odr, otherwise you'll get conflicts with other module implementation units.

9 ↗(On Diff #407396)

Likewise.

clang/test/CXX/modules-ts/basic/basic.def.odr/p4/user.cpp
5 ↗(On Diff #407396)

Likewise.

ChuanqiXu updated this revision to Diff 408315.Feb 13 2022, 6:58 PM

Update test about inline variable in module

ChuanqiXu updated this revision to Diff 408317.Feb 13 2022, 7:32 PM

Handle named module explicitly for better readability.

ChuanqiXu added inline comments.Feb 13 2022, 7:35 PM
clang/test/CXX/modules-ts/basic/basic.def.odr/p4/module.cpp
5 ↗(On Diff #407396)

It is still linkonce_odr in the module it get defined. See the new added test case: inline-variable-in-module.cpp for example. The attribute available_externally is equivalent to external from the perspective of linker. See https://llvm.org/docs/LangRef.html#linkage-types. According to [dcl.inline]p7, inline variable attached to named module should be defined in that domain. Note that the variable attached to global module fragment and private module fragment shouldn't be accessed outside the module, so it implies that all the variable defined in the module could only be defined in the module unit itself.

clang/test/CodeGenCXX/static-variable-in-module.cpp
3–9

Yeah, the form is useful when we need to add expected-* diagnostic message to GMF. But I feel it is a little bit hacker. I guess the form mimics looks like user more wouldn't be worse personally.

ChuanqiXu added inline comments.Feb 14 2022, 1:35 AM
clang/test/CodeGenCXX/static-variable-in-module.cpp
3–9

Ok, I admit this is really helpful and time saving : ) (Now all my new test cases would be wrote in this form)

urnathan added inline comments.Feb 14 2022, 4:53 AM
clang/test/CXX/modules-ts/basic/basic.def.odr/p4/module.cpp
5 ↗(On Diff #407396)

There's a couple of issues with this. module.cppm is emitting a (linkonce) definition of inlne_var_exported, but only because it itself is ODR-using that variable. If you take out the ODR-use in noninline_exported, there is no longer a symbol emitted.

But, even if you forced inline vars to be emitted in their defining-module's interface unit, that would be an ABI change. inline vars are emitted whereever ODR-used. They have no fixed home TU. Now, we could alter the ABI and allow interface units to define a home location for inline vars and similar entities (eg, vtables for keyless classes). But we'd need buy-in from other compilers to do that.

FWIW such a discussion did come up early in implementing modules-ts, but we decided there was enough going on just getting the TS implemented. I'm fine with revisiting that, but it is a more significant change.

And it wouldn't apply to (eg) templated variables, which of course could be instantiated anywhere.

clang/test/CodeGenCXX/static-variable-in-module.cpp
3–9

The other thing you can do, which I've seen elsewhere, is an Input subdirectory and put the #include file there. I prefer the direct encoding, 'cos that's less indirection when trying to understand the testcase :)

ChuanqiXu updated this revision to Diff 408694.Feb 14 2022, 7:28 PM

Update tests.

ChuanqiXu added inline comments.Feb 14 2022, 7:33 PM
clang/test/CXX/modules-ts/basic/basic.def.odr/p4/module.cpp
5 ↗(On Diff #407396)

Oh, now the key point here is what the correct behavior is instead of the patch. Let's discuss it first.

According to [[ http://eel.is/c++draft/dcl.inline#7 | [dcl.inline]p7 ]],

If an inline function or variable that is attached to a named module is declared in a definition domain, it shall be defined in that domain.

I think the intention of the sentence is to define inline variable in the module interface. So if it is required by the standard, I think other compiler need to follow up. As I described in the summary, it might be a difference between C++20 module and ModuleTS. Do you think it is necessary to send the question to WG21? (I get the behavior from reading the words. Maybe I misread or the word is not intentional).

Maybe the ABI standard group need to discuss what the linkage should be. Now it may be weak_odr or linkonce_odr. It depends on how we compile the file. If we compile the .cppm file directly, it would be linkonce_odr. And if we compile it to *.pcm file first, it would be weak_odr. I have registered an issue for this: https://github.com/llvm/llvm-project/issues/53838.

urnathan added inline comments.Feb 15 2022, 4:21 AM
clang/test/CXX/modules-ts/basic/basic.def.odr/p4/module.cpp
5 ↗(On Diff #407396)

Oh, now the key point here is what the correct behavior is instead of the patch. Let's discuss it first.

According to [[ http://eel.is/c++draft/dcl.inline#7 | [dcl.inline]p7 ]],

If an inline function or variable that is attached to a named module is declared in a definition domain, it shall be defined in that domain.

I think the intention of the sentence is to define inline variable in the module interface. So if it is required by the standard, I think other compiler need to follow up. As I described in the summary, it might be a difference between C++20 module and ModuleTS. Do you think it is necessary to send the question to WG21? (I get the behavior from reading the words. Maybe I misread or the word is not intentional).

You are reading more into the std than it says. The std specifies what /source code/ is meaningful. It says nothing about how a computation system might represent the program in another form. Most of the latter, for ahead-of-time translation, is at the discretion of compiler implementors. Part of that is the domain of the ABI, which specifies an interface to which different compilers may target, and then have compatibility at the object-file boundary.

Maybe the ABI standard group need to discuss what the linkage should be.

Correct. And right now there is no consensus to do anything different with such entities.
The ABI (http://itanium-cxx-abi.github.io/cxx-abi/abi.html) 5.2 documents such vague-linkage entities. That section would need changes to bless what you are trying to do.

ChuanqiXu added inline comments.Feb 15 2022, 10:01 PM
clang/test/CXX/modules-ts/basic/basic.def.odr/p4/module.cpp
5 ↗(On Diff #407396)

You are reading more into the std than it says. The std specifies what /source code/ is meaningful. It says nothing about how a computation system might represent the program in another form. Most of the latter, for ahead-of-time translation, is at the discretion of compiler implementors. Part of that is the domain of the ABI, which specifies an interface to which different compilers may target, and then have compatibility at the object-file boundary.

OK, your words make sense. In fact, I don't care much about whether or not could we define inline variable in the module unit. The problem I tried to solve is about the definition static variable in module. We couldn't run a simple hello world example if we don't solve it.

What I care about is where should we define inline function. I want to define inline function in the module unit it get declared. And my theory comes from [dcl.inline]p7. And our experiments show that it is the key reason why module could speed up compilation. Our data shows that the compilation could speed up about 40% for the feature. Since most of the time consumed in compilation spent on the middle end, it is really not significant to save the time in frontend. So it matters a lot if we could avoid compiling same functions in middle end.

Originally, I thought I am doing right. But from your words, we couldn't do this until the ABI standard group get in consensus, right?

Finally, I feel it is odd about [dcl.inline]p7. Since if it is not for implementors, I feel it is meaningless for users.

ChuanqiXu added inline comments.Feb 15 2022, 10:35 PM
clang/test/CXX/modules-ts/basic/basic.def.odr/p4/module.cpp
5 ↗(On Diff #407396)

Or given that the CXXABI doesn't discuss the case about inline function in named module. Could we think it is a free space? Another question maybe where could we ask for opinion? WG21 or Itanium CXXABI group?

dblaikie added inline comments.Feb 15 2022, 11:14 PM
clang/test/CXX/modules-ts/basic/basic.def.odr/p4/module.cpp
5 ↗(On Diff #407396)

Finally, I feel it is odd about [dcl.inline]p7. Since if it is not for implementors, I feel it is meaningless for users.

Presumably that means that a user can't declare an inline function in a module, and define it somewhere else (like in a private implementation unit) - they must define it in the same definition domain it is declared. That's a concrete requirement for the user, irrespective of what object-file-level implementation strategy (where the definition gets emitted, what linkage is used, etc) is used.

ChuanqiXu added inline comments.Feb 15 2022, 11:26 PM
clang/test/CXX/modules-ts/basic/basic.def.odr/p4/module.cpp
5 ↗(On Diff #407396)

Presumably that means that a user can't declare an inline function in a module, and define it somewhere else (like in a private implementation unit) - they must define it in the same definition domain it is declared. That's a concrete requirement for the user, irrespective of what object-file-level implementation strategy (where the definition gets emitted, what linkage is used, etc) is used.

Oh, I get it. Thanks.

urnathan added inline comments.Feb 16 2022, 7:17 AM
clang/test/CXX/modules-ts/basic/basic.def.odr/p4/module.cpp
5 ↗(On Diff #407396)

yes, I understand the problem you are trying to solve (had the same in GCC). The issue is with internal-linkage entities in global module fragments. Let's consider 3 separate cases.
a) the GMF is a header-unit.

  1. We could either emit it in a header-unit-specific object file (if ODR-used when building that header unit). This would surprise users as now we have a thing that is morally a header-file, but comes with an object file. That is likely to break build flow and is not what GCC does. There is of course a trade off here, in that it's either emitted exactly once, or emitted into every TU that ODR uses it. But is that a significant extra burden? The only variable case I came across was _ioinit. (There are many static inline functions, due to C compatibility, but for those you want to inline them anyway.)
  1. Or we could emit it in every TU that directly or indirectly imports the header unit and ODR uses the entity. This is what GCC does. in the case of _ioinit that means making sure to call its dynamic constructor from the TU's initialization function.
  1. Note we do not clone the internal entity within a single TU, once for each header or module that we import that itself ODR uses the entity.

b) textual inclusion in the GMF of a module. If the module ODR-uses the entity, it needs to be emitted into the object file.

c) Both textual inclusion AND importing of a header-unit. We could emit 2 copies, or we could merge these two definitions. Merging is better (and what GCC does).

The std allows all the alternatives considered above. Does that help?

ChuanqiXu added inline comments.Feb 16 2022, 10:31 PM
clang/test/CXX/modules-ts/basic/basic.def.odr/p4/module.cpp
5 ↗(On Diff #407396)

Thanks for the explanation. It really helps. Since I didn't touch header-unit before (I mainly focus on named module), I don't have an opinion for (a) and (c). For the (b), it might not be the case. I think we need to emit it all the time. Here is the pattern of <iostream> in libstdc++:

extern istream cin;
// ...
static ios_base::Init __ioinit;

And after including, __ioinit is not used in the module unit, but we couldn't erase it. Otherwise the problem might met segfault. You could find the example in the issue I linked.


In fact, I care more about whether or not should we emit inline function body in module purview to the module unit. But given that this is not the intention of the patch, let's talk it in other places.

Handle static variable only. Don't touch inline variable now.

ChuanqiXu retitled this revision from [C++20] [Modules] Remain variable's definition in module interface to [C++20] [Modules] Remain internal-linkage variables in module interface unit.Feb 16 2022, 11:25 PM
ChuanqiXu edited the summary of this revision. (Show Details)
urnathan added inline comments.Feb 17 2022, 4:02 AM
clang/test/CXX/modules-ts/basic/basic.def.odr/p4/module.cpp
5 ↗(On Diff #407396)

Thanks for the explanation. It really helps. Since I didn't touch header-unit before (I mainly focus on named module), I don't have an opinion for (a) and (c). For the (b), it might not be the case. I think we need to emit it all the time. Here is the pattern of <iostream> in libstdc++:

extern istream cin;
// ...
static ios_base::Init __ioinit;

And after including, __ioinit is not used in the module unit, but we couldn't erase it. Otherwise the problem might met segfault. You could find the example in the issue I linked.

Hehe, I remember having to look this up. __ioinit /is/ used, because it has a dynamic initializer: [basic.start.dynamic]

there is no need for the compiler to emit unused internal-linkage variables with static initialization.

In fact, I care more about whether or not should we emit inline function body in module purview to the module unit. But given that this is not the intention of the patch, let's talk it in other places.

Again, that's ABI. While we could emit externally-visible out-of-line-bodies into the module's object file (and have importers of the module not emit such out-of-line bodies as needed), that would require the same ABI change as the variable case. I notice that the ABI http://itanium-cxx-abi.github.io/cxx-abi/abi.html#vague speficially coveres inline function bodies (5.2.1), but not inline variables, because it was never updated when those became a feature. But inline variables are just like keyless vtables, which it does cover, and that's what implementations do.)

ChuanqiXu retitled this revision from [C++20] [Modules] Remain internal-linkage variables in module interface unit to [C++20] [Modules] Remain dynamic initializing internal-linkage variables in module interface unit .
ChuanqiXu edited the summary of this revision. (Show Details)

Address comments.

ChuanqiXu added inline comments.Feb 18 2022, 12:56 AM
clang/test/CXX/modules-ts/basic/basic.def.odr/p4/module.cpp
5 ↗(On Diff #407396)

Hehe, I remember having to look this up. __ioinit /is/ used, because it has a dynamic initializer: [basic.start.dynamic]

there is no need for the compiler to emit unused internal-linkage variables with static initialization.

Thanks for the guide. This is addressed.

Again, that's ABI. While we could emit externally-visible out-of-line-bodies into the module's object file (and have importers of the module not emit such out-of-line bodies as needed), that would require the same ABI change as the variable case. I notice that the ABI http://itanium-cxx-abi.github.io/cxx-abi/abi.html#vague speficially coveres inline function bodies (5.2.1), but not inline variables, because it was never updated when those became a feature. But inline variables are just like keyless vtables, which it does cover, and that's what implementations do.)

Thanks for providing it. It really helps. The conclusion I get is:

  • We're not allowed to emit externally-visible out-of-line-bodies in the module's object file if we want to follow current ABI rule.
  • We're allowed to do so for inline function. (Do I misread it?)

For the first part, since ItaniumABI don't introduce module now, I feel it is not allowed not banned. It doesn't specify it. I think it worths to allow such cases. It would speed up the compilation very significantly. I would like to throw the idea for CXXABI to see it.
(But after all, this is beyond the scope of the current patch. Let's discuss it later.)

(maybe relevant: For what it's worth: I originally implemented inline function homing in modules codegen for Clang Header Modules - the results I got for object file size in an -O0 build were marginal - a /slight/ win in object file size, but not as much as we might've expected. Part of the reason might be that there can be inline functions that are never called, or at higher optimization levels, inline functions that always get inlined (via "available externally" definitions) - in that case, providing "homed" definitions creates inline function definitions that are unused during linking/a waste of space. It's possible the workload I was dealing with (common Google internal programs) skewed compared to broader C++ code - for instance heavy use of protobufs could be leading to a lot of generated code/inline functions that are mostly unused. I didn't iterate further to tweak/implement heuristics about which inline functions should be homed. I'm not sure if Richard Smith made a choice about not homing inline functions in C++20 modules because of these results, or for other reasons, or just as a consequence of the implementation - but given we had the logic in Clang to do inline function homing for Clang Header Modules, I'm guessing it was an intentional choice to not use that functionality in C++20 modules when they have to have an object file anyway)

That's interesting data. I guess one could emit the out-of-line bodies into their own sections, and then rely on linker section GC to elide them in the static link. But if you're building an SO, you presumably need to provide them unilaterally in the static link, and callers would use a PLT to get to them. One could continue emitting them COMDAT, and unilaterally emit them in the module interface's object file, but that will still leave the SO problem. I think.

I'm not sure if Richard Smith made a choice about not homing inline functions in C++20 modules because of these results, or for other reasons, or just as a consequence of the implementation -

Richard and I discussed taking advantage of this kind of new home location, certainly for key-less polymorphic classes. I was against it as it was more work :) Blame me.

That's interesting data. I guess one could emit the out-of-line bodies into their own sections, and then rely on linker section GC to elide them in the static link.

Yep, that's what I did - they were emitted weak_odr (weak linkage in ELF semantics) - but I was mostly interested in the object size reduction in fully statically linked binaries, so producing redundant or unused descriptions was hurting that goal. (& yeah, shared library issues are a whole other bucket I didn't even consider/evaluate)

Richard and I discussed taking advantage of this kind of new home location, certainly for key-less polymorphic classes. I was against it as it was more work :) Blame me.

Ah, possibly. ( https://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20170710/198206.html - seems I've had this question for a while (pinged this thread a few times over the years) & never got a clear answer from Richard at least... (@rsmith ?) - but I vaguely remember discussing it, maybe in some more ad-hoc conversation that didn't make it onto the mailing list... )

(maybe relevant: For what it's worth: I originally implemented inline function homing in modules codegen for Clang Header Modules - the results I got for object file size in an -O0 build were marginal - a /slight/ win in object file size, but not as much as we might've expected. Part of the reason might be that there can be inline functions that are never called, or at higher optimization levels, inline functions that always get inlined (via "available externally" definitions) - in that case, providing "homed" definitions creates inline function definitions that are unused during linking/a waste of space. It's possible the workload I was dealing with (common Google internal programs) skewed compared to broader C++ code - for instance heavy use of protobufs could be leading to a lot of generated code/inline functions that are mostly unused. I didn't iterate further to tweak/implement heuristics about which inline functions should be homed. I'm not sure if Richard Smith made a choice about not homing inline functions in C++20 modules because of these results, or for other reasons, or just as a consequence of the implementation - but given we had the logic in Clang to do inline function homing for Clang Header Modules, I'm guessing it was an intentional choice to not use that functionality in C++20 modules when they have to have an object file anyway)

Thanks for sharing this. I didn't consider code size before. I agree the result should depends on the pattern of the program. I guess the code size may increase or decrease between different projects.

Richard and I discussed taking advantage of this kind of new home location, certainly for key-less polymorphic classes. I was against it as it was more work :) Blame me.

From my experience, it depends on how well we want to implement. A plain implementation is relatively easy. It would speed up the compilation significantly in O0. But it doesn't work well with optimization turned on, since we need to do optimization and we must import all the function by available_externally to enable a complete optimization. In this case (with optimization), we could only save the time for Preprocessor, Parser, Semantic analysis and backend. But the big part of compilation takes on the middle end and we need to pay for the serialization and deserialization. My experiment shows that we could only get 5% improvement on compilation time with named module in optimization turned on. (We could offer an option to make it compile fast at On by not emitting functions in other module unit, but it would hurt the performance obviously).

A good implementation may attach optimized IR to the PCM files. Since the standard talks nothing about CMI/BMI (PCM), we are free to compile it during the middle end and attach the optimized IR to PCM files. And when we imports the optimized PCM, we could extract the optimized function on need. We could mark such functions with a special attribute (like 'optimized_available_externally'?) to tell the compiler not optimize it and delete it after middle end optimization gets done. Such functions is only available for inlining (or any other IPO). I think this wouldn't hurt performance and we could get a win for the compilation speed. But I agree this is not easy to implement.

Herald added a project: Restricted Project. · View Herald TranscriptMar 1 2022, 6:20 PM
urnathan resigned from this revision.Mar 9 2022, 10:54 AM
ChuanqiXu added a reviewer: Restricted Project.Mar 20 2022, 11:16 PM

(maybe relevant: For what it's worth: I originally implemented inline function homing in modules codegen for Clang Header Modules - the results I got for object file size in an -O0 build were marginal - a /slight/ win in object file size, but not as much as we might've expected. Part of the reason might be that there can be inline functions that are never called, or at higher optimization levels, inline functions that always get inlined (via "available externally" definitions) - in that case, providing "homed" definitions creates inline function definitions that are unused during linking/a waste of space. It's possible the workload I was dealing with (common Google internal programs) skewed compared to broader C++ code - for instance heavy use of protobufs could be leading to a lot of generated code/inline functions that are mostly unused. I didn't iterate further to tweak/implement heuristics about which inline functions should be homed. I'm not sure if Richard Smith made a choice about not homing inline functions in C++20 modules because of these results, or for other reasons, or just as a consequence of the implementation - but given we had the logic in Clang to do inline function homing for Clang Header Modules, I'm guessing it was an intentional choice to not use that functionality in C++20 modules when they have to have an object file anyway)

Thanks for sharing this. I didn't consider code size before. I agree the result should depends on the pattern of the program. I guess the code size may increase or decrease between different projects.

Richard and I discussed taking advantage of this kind of new home location, certainly for key-less polymorphic classes. I was against it as it was more work :) Blame me.

From my experience, it depends on how well we want to implement. A plain implementation is relatively easy. It would speed up the compilation significantly in O0.

Not even necessarily then - if you have code like protobufs (large amounts of autogenerated code, much of which you might never call) - or even just libraries where you only use a subset of their features - then you might have more code generated with an inline-function-homing

But it doesn't work well with optimization turned on, since we need to do optimization and we must import all the function by available_externally to enable a complete optimization. In this case (with optimization), we could only save the time for Preprocessor, Parser, Semantic analysis and backend. But the big part of compilation takes on the middle end and we need to pay for the serialization and deserialization. My experiment shows that we could only get 5% improvement on compilation time with named module in optimization turned on. (We could offer an option to make it compile fast at On by not emitting functions in other module unit, but it would hurt the performance obviously).

A good implementation may attach optimized IR to the PCM files. Since the standard talks nothing about CMI/BMI (PCM), we are free to compile it during the middle end and attach the optimized IR to PCM files. And when we imports the optimized PCM, we could extract the optimized function on need. We could mark such functions with a special attribute (like 'optimized_available_externally'?) to tell the compiler not optimize it and delete it after middle end optimization gets done. Such functions is only available for inlining (or any other IPO). I think this wouldn't hurt performance and we could get a win for the compilation speed. But I agree this is not easy to implement.

Possibly - there's still the possibility that even without any importing (& just homing into the module object file) that it'll cost more than it benefits due to inline functions that are never called.

Not even necessarily then - if you have code like protobufs (large amounts of autogenerated code, much of which you might never call) - or even just libraries where you only use a subset of their features - then you might have more code generated with an inline-function-homing

From the perspective of code size, it would be larger in inline-function-homing strategies in cases you described. But I guess it wouldn't hurt in compilation speed, is it? For the consideration of code size, I think we could mitigate the problem by thinLTO. (Although we couldn't solve the problem completely since thinLTO couldn't help for *.o, *.a and *.so files.)

Possibly - there's still the possibility that even without any importing (& just homing into the module object file) that it'll cost more than it benefits due to inline functions that are never called.

Yeah, I met the situation before. In a simple case, that the time spent on the deserialization looks more than the time we saved. By from my experience (my experience is not complete though), this kind of cases is relatively small.


@dblaikie given that the talk involves further optimization, which is beyond the scope of current patch. Would you like to accept this one?

Not even necessarily then - if you have code like protobufs (large amounts of autogenerated code, much of which you might never call) - or even just libraries where you only use a subset of their features - then you might have more code generated with an inline-function-homing

From the perspective of code size, it would be larger in inline-function-homing strategies in cases you described. But I guess it wouldn't hurt in compilation speed, is it?

Depending on optimizations, some amount of compilation time is proportional to code size.

For the consideration of code size, I think we could mitigate the problem by thinLTO. (Although we couldn't solve the problem completely since thinLTO couldn't help for *.o, *.a and *.so files.)

Yep. But with ThinLTO we wouldn't need to produce available_externally definitions anyway, since ThinLTO can cross-object import anyway.

Possibly - there's still the possibility that even without any importing (& just homing into the module object file) that it'll cost more than it benefits due to inline functions that are never called.

Yeah, I met the situation before. In a simple case, that the time spent on the deserialization looks more than the time we saved. By from my experience (my experience is not complete though), this kind of cases is relatively small.


@dblaikie given that the talk involves further optimization, which is beyond the scope of current patch. Would you like to accept this one?

I don't have enough context for the current state of this patch - might be useful to post a summary of the current state?

clang/test/CodeGenCXX/static-variable-in-module.cpp
3–9

Yeah, I'd be in favor of inline file directives rather than catting source code into other files - that source code may get unwieldy if it needs to be more complicated than a small struct declaration (already catting 3 lines is a bit awkward) and doesn't get syntax highlighting, etc.

Not even necessarily then - if you have code like protobufs (large amounts of autogenerated code, much of which you might never call) - or even just libraries where you only use a subset of their features - then you might have more code generated with an inline-function-homing

From the perspective of code size, it would be larger in inline-function-homing strategies in cases you described. But I guess it wouldn't hurt in compilation speed, is it?

Depending on optimizations, some amount of compilation time is proportional to code size.

Yeah, I am not strict enough. I mean the compilation time of the importee wouldn't be hurt. The compilation time of the module itself would be more of course. Given the module is imported in many places, it would be a win. But if it is not, then it might be a bad idea.
So the conclusion would be that inline-function-homing wouldn't be a pure win all the time. I just want to say it might be a win in a lot of cases.

For the consideration of code size, I think we could mitigate the problem by thinLTO. (Although we couldn't solve the problem completely since thinLTO couldn't help for *.o, *.a and *.so files.)

Yep. But with ThinLTO we wouldn't need to produce available_externally definitions anyway, since ThinLTO can cross-object import anyway.

Yeah. My point is that with the inline-function-homing strategy, we could not produce available_externally functions in the frontend so that we could get a big compilation speedup. In some of my experiments, the speed could be 5x than before : ) It is exciting.

Possibly - there's still the possibility that even without any importing (& just homing into the module object file) that it'll cost more than it benefits due to inline functions that are never called.

Yeah, I met the situation before. In a simple case, that the time spent on the deserialization looks more than the time we saved. By from my experience (my experience is not complete though), this kind of cases is relatively small.


@dblaikie given that the talk involves further optimization, which is beyond the scope of current patch. Would you like to accept this one?

I don't have enough context for the current state of this patch - might be useful to post a summary of the current state?

Yeah. Let me try to give a summary. The intention of the patch is fix the following problem:

// Hello.cppm
module;
#include <cstdio>
export module Hello;
export void SayHello()
{
    std::printf("Hello World.\n");

}
// main.cpp
import Hello;
int main() {
   SayHello();
   return 0;
}

The program compiled in this way would met a segment fault at run time if it uses libstdc++. The root cause is that libstdc++ uses the constructor of the following static variable to do resources initializations.

// For construction of filebuffers for cout, cin, cerr, clog et. al.
static ios_base::Init __ioinit;

And in the original implementation, the static variable (with internal linkage) is considered invisible to the outside so it wouldn't be generated in the PCM (the binary form of AST).

To solve the problem, the patch intends to remain the static variable (or say, variable with internal linkage). Then @urnathan pointed that we should only generate the static variables with dynamic initializer. Then this is the current status of patch.

Although the discuss is very long, almost of them talks about the inline-function-strategy. We talked a lot about that whether it is valid or whether it is useful. But all of them are beyond the scope of the patch : )

is this an issue for Clang Header Modules codegen too? (maybe the solution should be the same for both, then)
& maybe we should do this regardless of the presence/absence/type of initializer, just for consistency?

is this an issue for Clang Header Modules codegen too? (maybe the solution should be the same for both, then)

For my limited knowledge to clang header modules, .modulemap is a necessary condition to use clang header modules. And there is one .modulemap in libcxx so that user could use import <iostream> for libcxx. Then libcxx implements <iostream> in another fashion which wouldn't meet the problem. And according to my understanding to the clang/llvm ecosystem, libstdc++ is not included as a target user of clang header modules, is it? If we are talking about if it would be a problem that we use clang header modules for libstdc++'s <iostream>, (I don
't do a experiment!) I think we would meet the same problem and the solution should be the same as one.

& maybe we should do this regardless of the presence/absence/type of initializer, just for consistency?

Yeah, they should have internal linkage so it wouldn't be bad to emit them all. But if we choose consistency here, we would pay for the redundancy. My personal opinion would be that simplicity is more important here.

iains added a comment.Mar 26 2022, 1:12 PM

I think that this problem might well be a consequence of the bug which is fixed by D122413.

We have been generating code with module internal entities (always) given the special ModuleInternalLinkage (which means that, although the linkage is formally 'internal', the entities are made global when emitted. We should only be doing this for fmodules-ts, not for regular standard modules.

If you apply D122413 (which I hope to land soon), then I would expect that iostream should work as expected (with one internal instance of std::__ioinit in each TU that includes iostream).

IFF (after applying D122413 ) you add to the command line -fmodules-ts, then the patch here (D119409) would, presumably, be needed to work around multiple instances of the globalised std::__ioinit.

iains added a comment.Mar 26 2022, 1:15 PM

addendum: note we still have work to do on the module initialisers - those are not correct yet (so probably some nesting of modules might not work).

I think that this problem might well be a consequence of the bug which is fixed by D122413.

We have been generating code with module internal entities (always) given the special ModuleInternalLinkage (which means that, although the linkage is formally 'internal', the entities are made global when emitted. We should only be doing this for fmodules-ts, not for regular standard modules.

If you apply D122413 (which I hope to land soon), then I would expect that iostream should work as expected (with one internal instance of std::__ioinit in each TU that includes iostream).

IFF (after applying D122413 ) you add to the command line -fmodules-ts, then the patch here (D119409) would, presumably, be needed to work around multiple instances of the globalised std::__ioinit.

Sadly it wouldn't work after D122413 applied. Since the <iostream> is lived in GlobalModuleFragment, the calculated linkage wouldn't affect them. So I met the same segfault as before.

addendum: note we still have work to do on the module initialisers - those are not correct yet (so probably some nesting of modules might not work).

What does the nesting of modules mean?

iains added a comment.Mar 28 2022, 2:16 AM

I think that this problem might well be a consequence of the bug which is fixed by D122413.

We have been generating code with module internal entities (always) given the special ModuleInternalLinkage (which means that, although the linkage is formally 'internal', the entities are made global when emitted. We should only be doing this for fmodules-ts, not for regular standard modules.

If you apply D122413 (which I hope to land soon), then I would expect that iostream should work as expected (with one internal instance of std::__ioinit in each TU that includes iostream).

IFF (after applying D122413 ) you add to the command line -fmodules-ts, then the patch here (D119409) would, presumably, be needed to work around multiple instances of the globalised std::__ioinit.

Sadly it wouldn't work after D122413 applied. Since the <iostream> is lived in GlobalModuleFragment, the calculated linkage wouldn't affect them. So I met the same segfault as before.

Is this because we are not creating an initialiser for a static entity in the GMF ?

  • I did a quick test and that seemed to be the case.

(module initialisers need quite some work, it seems)

addendum: note we still have work to do on the module initialisers - those are not correct yet (so probably some nesting of modules might not work).

What does the nesting of modules mean?

If we have an import of a module that imports another - then we should be running the module initializers for the imported stack (in the correct order) .. at present, we do not do this.
As noted above, we have some work to do here.

I think that this problem might well be a consequence of the bug which is fixed by D122413.

We have been generating code with module internal entities (always) given the special ModuleInternalLinkage (which means that, although the linkage is formally 'internal', the entities are made global when emitted. We should only be doing this for fmodules-ts, not for regular standard modules.

If you apply D122413 (which I hope to land soon), then I would expect that iostream should work as expected (with one internal instance of std::__ioinit in each TU that includes iostream).

IFF (after applying D122413 ) you add to the command line -fmodules-ts, then the patch here (D119409) would, presumably, be needed to work around multiple instances of the globalised std::__ioinit.

Sadly it wouldn't work after D122413 applied. Since the <iostream> is lived in GlobalModuleFragment, the calculated linkage wouldn't affect them. So I met the same segfault as before.

Is this because we are not creating an initialiser for a static entity in the GMF ?

  • I did a quick test and that seemed to be the case.

I think we need this one finally, It would create the initialiser after the patch applied. And I think we couldn't do that without the patch. Since from the code we could see that the static variable wouldn't be generated in the current strategies.

(module initialisers need quite some work, it seems)

The initialiser above I said is the initialiser in that TU. What you mean module initializer ? Do you mean the one module could have only module initializer?

addendum: note we still have work to do on the module initialisers - those are not correct yet (so probably some nesting of modules might not work).

What does the nesting of modules mean?

If we have an import of a module that imports another - then we should be running the module initializers for the imported stack (in the correct order) .. at present, we do not do this.
As noted above, we have some work to do here.

I am not familiar with the history here. But I found http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1874r1.html#solution. It says clang already has a simple fix. So I am wondering if this one is already fixed or we are not talking about the same thing?

iains added a comment.Apr 6 2022, 3:45 AM

I think that this problem might well be a consequence of the bug which is fixed by D122413.

We have been generating code with module internal entities (always) given the special ModuleInternalLinkage (which means that, although the linkage is formally 'internal', the entities are made global when emitted. We should only be doing this for fmodules-ts, not for regular standard modules.

If you apply D122413 (which I hope to land soon), then I would expect that iostream should work as expected (with one internal instance of std::__ioinit in each TU that includes iostream).

IFF (after applying D122413 ) you add to the command line -fmodules-ts, then the patch here (D119409) would, presumably, be needed to work around multiple instances of the globalised std::__ioinit.

Sadly it wouldn't work after D122413 applied. Since the <iostream> is lived in GlobalModuleFragment, the calculated linkage wouldn't affect them. So I met the same segfault as before.

Is this because we are not creating an initialiser for a static entity in the GMF ?

  • I did a quick test and that seemed to be the case.

I think we need this one finally, It would create the initialiser after the patch applied. And I think we couldn't do that without the patch. Since from the code we could see that the static variable wouldn't be generated in the current strategies.

(module initialisers need quite some work, it seems)

The initialiser above I said is the initialiser in that TU. What you mean module initializer ? Do you mean the one module could have only module initializer?

addendum: note we still have work to do on the module initialisers - those are not correct yet (so probably some nesting of modules might not work).

What does the nesting of modules mean?

If we have an import of a module that imports another - then we should be running the module initializers for the imported stack (in the correct order) .. at present, we do not do this.
As noted above, we have some work to do here.

I am not familiar with the history here. But I found http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1874r1.html#solution. It says clang already has a simple fix. So I am wondering if this one is already fixed or we are not talking about the same thing?

Sorry for slow response - was hoping to have a patch to show by now
... I am currently finishing off an implementation for 1874 + elision of unused GMF decls. The "simple fix" in clang actually pulls all the initialisers into one list in the top-level module (which is slightly different from the intent of the paper, as implemented in GCC) - that implementation also does not fix the issue of a module interface where there is no explicit "import" for the sub-modules.

The patch in progress will build a per module initialiser (which will handle sub-module initialisers and calling module initialisers for direct imports). I think that will fix the iostreams problem (it is a motivating example from the paper).

I think that this problem might well be a consequence of the bug which is fixed by D122413.

We have been generating code with module internal entities (always) given the special ModuleInternalLinkage (which means that, although the linkage is formally 'internal', the entities are made global when emitted. We should only be doing this for fmodules-ts, not for regular standard modules.

If you apply D122413 (which I hope to land soon), then I would expect that iostream should work as expected (with one internal instance of std::__ioinit in each TU that includes iostream).

IFF (after applying D122413 ) you add to the command line -fmodules-ts, then the patch here (D119409) would, presumably, be needed to work around multiple instances of the globalised std::__ioinit.

Sadly it wouldn't work after D122413 applied. Since the <iostream> is lived in GlobalModuleFragment, the calculated linkage wouldn't affect them. So I met the same segfault as before.

Is this because we are not creating an initialiser for a static entity in the GMF ?

  • I did a quick test and that seemed to be the case.

I think we need this one finally, It would create the initialiser after the patch applied. And I think we couldn't do that without the patch. Since from the code we could see that the static variable wouldn't be generated in the current strategies.

(module initialisers need quite some work, it seems)

The initialiser above I said is the initialiser in that TU. What you mean module initializer ? Do you mean the one module could have only module initializer?

addendum: note we still have work to do on the module initialisers - those are not correct yet (so probably some nesting of modules might not work).

What does the nesting of modules mean?

If we have an import of a module that imports another - then we should be running the module initializers for the imported stack (in the correct order) .. at present, we do not do this.
As noted above, we have some work to do here.

I am not familiar with the history here. But I found http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1874r1.html#solution. It says clang already has a simple fix. So I am wondering if this one is already fixed or we are not talking about the same thing?

Sorry for slow response - was hoping to have a patch to show by now
... I am currently finishing off an implementation for 1874 + elision of unused GMF decls. The "simple fix" in clang actually pulls all the initialisers into one list in the top-level module (which is slightly different from the intent of the paper, as implemented in GCC) - that implementation also does not fix the issue of a module interface where there is no explicit "import" for the sub-modules.

The patch in progress will build a per module initialiser (which will handle sub-module initialisers and calling module initialisers for direct imports). I think that will fix the iostreams problem (it is a motivating example from the paper).

Cool. BTW, I want to be sure that it would solve the problem this patch shows. I mean the problem presented in this patch is about named module, while the motivating example from that paper is about header unit.