This is an alternative to https://reviews.llvm.org/D126694,
which was originally developed by Iain Sandoe <iain@sandoe.co.uk>.
This patch implements discarding of unreachable decls in the global
module fragment (GMF). The formal wording lives in
[[module.global.frag]](http://eel.is/c++draft/module.global.frag).
Informally, decls in the global module fragment that are not (transitively)
used or referenced can be discarded. This matters because many declarations
in headers are never used in practice, so discarding them saves space.
It also helps compilation speed, since we may avoid handling
redeclarations from different modules, which is a major performance
bottleneck for C++20 named modules.
To achieve this, the feature has to be implemented on the serialization side rather than the deserialization side.
Otherwise the discarded decls would still be written into the BMI, and the feature would help neither the size nor the speed.
Unlike https://reviews.llvm.org/D126694, this patch marks the discardable decls (in the GMF) while
writing the BMI, instead of marking them with yet another hand-written AST visitor. The reasons are:
- It is a natural fit for the ASTWriter. Whatever the ASTWriter does not write is naturally unreachable.
- From an engineering perspective, it is undesirable to duplicate the AST visiting logic, since the ASTWriter already contains a lot of code. A standalone traversal of the AST may also be slow.
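
To illustrate the first point, here is a minimal toy model. It is not Clang code: `ToyDecl`, `ToyWriter`, `discardedAfterWriting`, and the declaration names are all hypothetical. The writer emits declarations on demand, starting from the module purview. The GMF declarations it never emits are exactly the unreachable ones, so no separate marking pass is needed.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model, not Clang's real data structures.
struct ToyDecl {
  bool inGMF;                     // declared in the global module fragment?
  std::vector<std::string> refs;  // names of declarations this one references
};

class ToyWriter {
public:
  explicit ToyWriter(const std::map<std::string, ToyDecl> &decls)
      : decls_(decls) {}

  // Emit a declaration and, recursively, everything it references.
  void write(const std::string &name) {
    auto it = decls_.find(name);
    if (it == decls_.end())
      return;  // unknown reference; nothing to emit
    if (!written_.insert(name).second)
      return;  // already emitted
    for (const std::string &ref : it->second.refs)
      write(ref);
  }

  // GMF declarations the writer never emitted are the discarded ones.
  std::set<std::string> discarded() const {
    std::set<std::string> result;
    for (const auto &entry : decls_)
      if (entry.second.inGMF && written_.count(entry.first) == 0)
        result.insert(entry.first);
    return result;
  }

private:
  const std::map<std::string, ToyDecl> &decls_;
  std::set<std::string> written_;
};

// Write every declaration in the module purview, then report what was skipped.
std::set<std::string> discardedAfterWriting(
    const std::map<std::string, ToyDecl> &decls) {
  ToyWriter writer(decls);
  for (const auto &entry : decls)
    if (!entry.second.inGMF)
      writer.write(entry.first);
  return writer.discarded();
}
```

For example, if a purview declaration `use` references the GMF declaration `helper`, which in turn references `detail`, and the GMF also contains an unrelated `unused`, then `discardedAfterWriting` returns only `unused`.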
The patch now passes all the in-tree tests. However, I ran into some ODR violations after applying it to some real-world projects, so I need to resolve those first.
Performance
libcxx std module
Tested with the std module from libcxx: https://libcxx.llvm.org/Modules.html
(commit version: 7aebe4eaaa10155dc3c3619)

| | size | build time |
|---|---|---|
| Original | 432M | 3.8s |
| Discarding | 335M | 3.2s |

The numbers show that the feature saves about 22% of the size and 15% of the build time.
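
The percentages follow directly from the table. A small sketch of the arithmetic (the helper name `reductionPercent` is only for illustration):

```cpp
// Relative saving of `after` compared with `before`, in percent.
double reductionPercent(double before, double after) {
  return (before - after) / before * 100.0;
}
```

With the table's values, `reductionPercent(432, 335)` is roughly 22.5 for the size and `reductionPercent(3.8, 3.2)` is roughly 15.8 for the build time.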
use std module
For this example:

    import std;
    int main(int, char**) {
        std::vector<int> v;
        v.push_back(5);
        v.push_back(7);
        v.push_back(2);
        v.push_back(7);
        v.push_back(4);
        v.push_back(1);
        int t{3};
        std::optional<int> op{3};
        v.push_back((op <=> t) == std::strong_ordering::equal);
        std::sort(v.begin(), v.end());
        for (int i : v)
            std::cout << i << " ";
        std::cout << "\n";
        return 0;
    }

The compilation speed is:

| | build time |
|---|---|
| Original | 3.395s |
| Discarding | 2.455s |
| non-modular code | 1.861s |

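So the discarding approach gives a significant win: the build time drops from 3.395s to 2.455s (about 28%), although it is still slower than the non-modular build.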
(ps: the non-modular version used for testing is:

    // import std;
    #include <vector>
    #include <optional>
    #include <algorithm>
    #include <iostream>
    int main(int, char**) {
        std::vector<int> v;
        v.push_back(5);
        v.push_back(7);
        v.push_back(2);
        v.push_back(7);
        v.push_back(4);
        v.push_back(1);
        int t{3};
        std::optional<int> op{3};
        v.push_back((op <=> t) == std::strong_ordering::equal);
        std::sort(v.begin(), v.end());
        for (int i : v)
            std::cout << i << " ";
        std::cout << "\n";
        return 0;
    }

)
pr61447
This comes from https://github.com/llvm/llvm-project/issues/61447
The pattern of the issue is:

    /// vulkan-a.cppm ... vulkan-z.cppm
    module;
    #include <iostream>
    // ${part} comes from a..z
    export module vulkan:${part};
    export namespace ${part} {
        void hello() {
            std::cout << "hello\n";
        }
    }

    // vulkan.cppm
    export module vulkan;
    export import :${part};

    // test.cppm
    module;
    #include <iostream>
    export module test;
    import vulkan; // vulkan is a constructed big module.

In this case, module vulkan is a constructed big module, and it shows that we spend a lot of time handling redeclarations due to the requirements of the language. (This is not the topic of this patch.)

| | size | build time |
|---|---|---|
| Original | 164M | 5.2s |
| Discarding | 118M | 3.8s |

In this case, the feature saves 28% of the space and 26% of the build time.