This is an archive of the discontinued LLVM Phabricator instance.

[libc++][modules] Removes the module partitions.
ClosedPublic

Authored by Mordante on Aug 2 2023, 8:10 AM.

Details

Summary

This patch is based on the suggestion by @ChuanqiXu on discourse
(https://discourse.llvm.org/t/alternatives-to-the-implementation-of-std-modules/71958)

Instead of making a module partition per header every header gets an inc
file which contains the exports per header. The std module then includes
all public headers and these inc files. The one file per header is
useful for testing purposes. The CI tests whether the exports of a
header's module partition matches the "public" named declarations in the
header. With one file per header this can still be done.

The patch improves compilation time of files using "import std;" and the
size of the std module.

A comparision of the compilation speed using a libc++ test

build/bin/llvm-lit -a -Dstd=c++23 -Denable_modules=std libcxx/test/std/modules/std.pass.cpp

Which boils down to

import std;

int main(int, char**) {
  std::println("Hello modular world");
  return 0;
}

and has -ftime-report enabled

Before

Clang front-end time report

-------------------------------------------------------------------------

Total Execution Time: 8.6585 seconds (8.6619 wall clock)

 ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
 4.5041 ( 57.2%)   0.4264 ( 54.4%)   4.9305 ( 56.9%)   4.9331 ( 57.0%)  Clang front-end timer
 3.2037 ( 40.7%)   0.2408 ( 30.7%)   3.4445 ( 39.8%)   3.4452 ( 39.8%)  Reading modules
 0.1665 (  2.1%)   0.1170 ( 14.9%)   0.2835 (  3.3%)   0.2837 (  3.3%)  Loading .../build/test/__config_module__/CMakeFiles/std.dir/std.pcm
 7.8744 (100.0%)   0.7842 (100.0%)   8.6585 (100.0%)   8.6619 (100.0%)  Total

After

Clang front-end time report

-------------------------------------------------------------------------

Total Execution Time: 1.2420 seconds (1.2423 wall clock)

 ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
 0.8892 ( 84.6%)   0.1698 ( 88.8%)   1.0590 ( 85.3%)   1.0590 ( 85.2%)  Clang front-end timer
 0.1533 ( 14.6%)   0.0168 (  8.8%)   0.1701 ( 13.7%)   0.1704 ( 13.7%)  Reading modules
 0.0082 (  0.8%)   0.0047 (  2.5%)   0.0129 (  1.0%)   0.0129 (  1.0%)  Loading .../build/test/__config_module__/CMakeFiles/std.dir/std.pcm
 1.0507 (100.0%)   0.1913 (100.0%)   1.2420 (100.0%)   1.2423 (100.0%)  Total

Using "include <print>" instead of "import module;"

Clang front-end time report

-------------------------------------------------------------------------

Total Execution Time: 2.1507 seconds (2.1517 wall clock)

 ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
 1.9714 (100.0%)   0.1793 (100.0%)   2.1507 (100.0%)   2.1517 (100.0%)  Clang front-end timer
 1.9714 (100.0%)   0.1793 (100.0%)   2.1507 (100.0%)   2.1517 (100.0%)  Total

It's possible to use the std module in external projects
(https://libcxx.llvm.org/Modules.html#using-in-external-projects)

Tested this with a private project to validate the size of the generated files:

Before
$ du -sch std-*
448M std-build
508K std-src
120K std-subbuild
449M total

After
$ du -sch std-*
29M std-build
1004K std-src
132K std-subbuild
30M total

Diff Detail

Event Timeline

Mordante created this revision.Aug 2 2023, 8:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 2 2023, 8:10 AM
Herald added a subscriber: arphaman. · View Herald Transcript
Mordante published this revision for review.Aug 2 2023, 10:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 2 2023, 10:45 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript

Thanks for your time! The numbers are very surprising and compelling to me too! It gives me more confidence for modules. And this patch LGTM and I'd like to leave the formal approval to @ldionne.

And for the topic whether or not to backport this patch to 17.x, my thought is definitely "yes, we should". On the one hand, the std modules are a very important feature at least to me and I don't want the C++ fans feel frustrated after they take some personal times to give a try and found that the std modules is slower 4.x times than the non-modules code in a hello world example. And now we have a chance to improve that. Then I can't understand why we don't do that. On the other hand, the std modules feature is still experimental, so it might not be a blocker to do some changes. Especially, we haven't released it yet. So it doesn't sound like a blocker to me too.

philnik added a subscriber: philnik.Aug 2 2023, 7:48 PM

I'm a bit concerned about the maintainability of this. AFAICT this change makes it basically impossible for tooling to parse the modules correctly. Would it be possible to have one partition that includes everything and have other modules do the actual exports (or something similar)? That should allow tooling to parse everything correctly and we don't have the problem that we have to parse a lot of code multiple times. Another option would be to just have a single huge file. I haven't looked at everything, but the total size would probably be < 5k lines, which wouldn't even make it the largest file we have in the library.

w.r.t. back-porting this it's a definite no from me, since this is an improvement of the library. Just like we shouldn't back-port other performance improvements without a really good reason. This feature is also still experimental. There is always a next release in a few months.

I'm a bit concerned about the maintainability of this. AFAICT this change makes it basically impossible for tooling to parse the modules correctly. Would it be possible to have one partition that includes everything and have other modules do the actual exports (or something similar)? That should allow tooling to parse everything correctly and we don't have the problem that we have to parse a lot of code multiple times. Another option would be to just have a single huge file. I haven't looked at everything, but the total size would probably be < 5k lines, which wouldn't even make it the largest file we have in the library.

While the overhead to introduce another partition should be small, I don't understand what's the problem for tools to parse the modules correctly now?

w.r.t. back-porting this it's a definite no from me, since this is an improvement of the library. Just like we shouldn't back-port other performance improvements without a really good reason. This feature is also still experimental. There is always a next release in a few months.

I think 8.x compilation speedup and 10.x size reduction may be a good reason.

I'm a bit concerned about the maintainability of this. AFAICT this change makes it basically impossible for tooling to parse the modules correctly. Would it be possible to have one partition that includes everything and have other modules do the actual exports (or something similar)? That should allow tooling to parse everything correctly and we don't have the problem that we have to parse a lot of code multiple times. Another option would be to just have a single huge file. I haven't looked at everything, but the total size would probably be < 5k lines, which wouldn't even make it the largest file we have in the library.

While the overhead to introduce another partition should be small, I don't understand what's the problem for tools to parse the modules correctly now?

The problem is that the files aren't self-contained, so a tool parsing only a single file doesn't know that other stuff happens to already be included. That's a huge problem for IDEs.

w.r.t. back-porting this it's a definite no from me, since this is an improvement of the library. Just like we shouldn't back-port other performance improvements without a really good reason. This feature is also still experimental. There is always a next release in a few months.

I think 8.x compilation speedup and 10.x size reduction may be a good reason.

I don*t think so. We have these kinds of speedups all the time. I have like a dozen patches locally that reach these kinds of speedups (some even 300x) for algorithms, but we shouldn't back-port them because they are improvements; they aren't fixing a regression.

I'm a bit concerned about the maintainability of this. AFAICT this change makes it basically impossible for tooling to parse the modules correctly. Would it be possible to have one partition that includes everything and have other modules do the actual exports (or something similar)? That should allow tooling to parse everything correctly and we don't have the problem that we have to parse a lot of code multiple times. Another option would be to just have a single huge file. I haven't looked at everything, but the total size would probably be < 5k lines, which wouldn't even make it the largest file we have in the library.

While the overhead to introduce another partition should be small, I don't understand what's the problem for tools to parse the modules correctly now?

The problem is that the files aren't self-contained, so a tool parsing only a single file doesn't know that other stuff happens to already be included. That's a huge problem for IDEs.

IIUC, you're saying when you're opening files like modules/std/vector.inc, the IDE/code intelligence can't tell you any thing. I am not sure if this is a problem. If yes, it may be a solution to create a huge file called std.cppm. And we can't put all the include in a partition without an export and export all of them in the primary module interface. Since the entities in the global module fragments are not visible to other module units even if they are in the same module.

w.r.t. back-porting this it's a definite no from me, since this is an improvement of the library. Just like we shouldn't back-port other performance improvements without a really good reason. This feature is also still experimental. There is always a next release in a few months.

I think 8.x compilation speedup and 10.x size reduction may be a good reason.

I don*t think so. We have these kinds of speedups all the time. I have like a dozen patches locally that reach these kinds of speedups (some even 300x) for algorithms, but we shouldn't back-port them because they are improvements; they aren't fixing a regression.

Of course it is up to the libcxx developers. I just want to express my feelings as a user or a c++ fans. And I feel it is different from the speedups for algorithms. Since the std modules are big whole thing. I mean, the big improvement on a small subsystem may be small for the overall system. Also many improvements depend on the input. It is usual that we made some improvements for a (kind of) specific input(s) but without affecting the overall system. However, for the use of std module, there is only one input. So I feel it is pretty important.

I like this change a lot. Seems like its strictly better than the previous variant. Are there any "downsides" to this change? I guess just the the single module file takes longer to build since its one gigantic file?

"Downstream-wise", the Bazel implementation will be simple to get working with this change and the improved buildtime and artifact sizes far outweigh the slight inconvenience of changing buildfiles.

No strong opinions regarding backporting. I'd expect users of such experimental technologies to live at head anyways 😅

Oh, wait a minute. I just realized that we didn't ship module files, right? So that the users who get libc++ by things like apt/yum install wouldn't see the std module. Then the people who can access the std module must be the people who are able to compile it from source. If all of these things are true, it is indeed not necessary to backport this since nearly no release users will get benefits from it.

I'm a bit concerned about the maintainability of this. AFAICT this change makes it basically impossible for tooling to parse the modules correctly. Would it be possible to have one partition that includes everything and have other modules do the actual exports (or something similar)? That should allow tooling to parse everything correctly and we don't have the problem that we have to parse a lot of code multiple times. Another option would be to just have a single huge file. I haven't looked at everything, but the total size would probably be < 5k lines, which wouldn't even make it the largest file we have in the library.

While the overhead to introduce another partition should be small, I don't understand what's the problem for tools to parse the modules correctly now?

The problem is that the files aren't self-contained, so a tool parsing only a single file doesn't know that other stuff happens to already be included. That's a huge problem for IDEs.

I think it is an issue, but not a huge issue. I added a README explaining the design. But basically the std module need to look like

module;

#include <all_libc++_public_headers>

export module std;

export namespace std {

  using std::all_public_named_declarations;

}

The split in the #include and using is the issue. In the future it might be possible to consolidate it again like

export module std;

#include <libc++_public_header_1>
export namespace std {
  using std::all_public_named_declarations_of_header_1;
}

#include <libc++_public_header_2>
export namespace std {
  using std::all_public_named_declarations_of_header_2;
}

But include after export is not allowed.

I like this change a lot. Seems like its strictly better than the previous variant. Are there any "downsides" to this change? I guess just the the single module file takes longer to build since its one gigantic file?

"Downstream-wise", the Bazel implementation will be simple to get working with this change and the improved buildtime and artifact sizes far outweigh the slight inconvenience of changing buildfiles.

It also removes the special case for __new for you :-)

No strong opinions regarding backporting. I'd expect users of such experimental technologies to live at head anyways 😅

Agreed.

Oh, wait a minute. I just realized that we didn't ship module files, right? So that the users who get libc++ by things like apt/yum install wouldn't see the std module. Then the people who can access the std module must be the people who are able to compile it from source. If all of these things are true, it is indeed not necessary to backport this since nearly no release users will get benefits from it.

Yes modules are not installed yet since they are still experimental and I didn't have enough time to work on this part before LLVM-17.

But include after export is not allowed.

Actually, what is not allowed is "include after import" and the import contains duplicate declarations with the #include. Also if we #include the declarations in the module purview, the linkage name will change unless we wrap them in a language linkage extern "C++". In another word, #include them in the module purview may produce ABI breakage.

Yes modules are not installed yet since they are still experimental and I didn't have enough time to work on this part before LLVM-17.

Yeah, it doesn't matter.

ldionne accepted this revision.Aug 8 2023, 10:09 AM

This LGTM with some minor comments.

I am not overly concerned with maintainability, it doesn't seem a lot worse than what we had before. Concretely, I think that the amount of interaction with these files by libc++ developers is going to be quite limited, and IMO working on improving how we report failure-to-have-updated the .inc files is what will yield the most value for our workflow.

I would tend not to backport this. First, we're not shipping C++20 modules to users, so backporting would have no effect. Also, this is experimental and it's technically not a bug (although that's arguable), so I feel like we wouldn't be following our usual policy for backports if we did.

libcxx/modules/README.md
5
14
20–24
libcxx/test/libcxx/module_std.gen.py
146–152

IMO the historical part of the comment is useful in this review, but not in the code.

158–160

That seems to explain why we're doing this a bit more explicitly?

This revision is now accepted and ready to land.Aug 8 2023, 10:09 AM
Mordante updated this revision to Diff 548627.Aug 9 2023, 8:05 AM

Rebased to test CI and addresses review comments.

Mordante marked 5 inline comments as done.Aug 9 2023, 8:05 AM
This revision was landed with ongoing or failed builds.Aug 9 2023, 10:39 AM
This revision was automatically updated to reflect the committed changes.
libcxx/modules/std/vector.inc