Index: clang/docs/CPlusPlus20Modules.rst
===================================================================
--- /dev/null
+++ clang/docs/CPlusPlus20Modules.rst
@@ -0,0 +1,605 @@
+=============
+C++20 Modules
+=============
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+The term "modules" has several meanings. For users of the Clang compiler, it may
+refer to `Objective-C Modules`, `Clang C++ Modules` (or `Clang Header Modules`,
+etc.) or C++20 modules. The implementations of all these kinds of modules in
+Clang share a large part of their code. From the user's perspective, however,
+their semantics and command line interfaces are very different. This document
+therefore describes how to use C++20 modules specifically.
+
+There is already a detailed document about Clang modules, Modules_. Reading it
+is helpful if you want to know more about the general idea of modules. But
+since C++20 modules have very different semantics, a separate page is more
+convenient for users who care only about C++20 modules.
+
+Although the term `modules` has a unique meaning in the C++20 Language
+Specification, when people talk about C++20 modules they may also mean another
+C++20 feature: header units. So this document covers header units too.
+
+C++20 Modules
+=============
+
+This document is intended to be a manual rather than a language tutorial. Still,
+some language background is introduced here for readers who are not familiar
+with the new language feature. Only concepts related to the structure and
+building of a project are covered.
+
+Background and terminology
+--------------------------
+
+Modules
+~~~~~~~
+
+In this document, the term `Modules`/`modules` refers to the C++20 modules
+feature unless it is qualified by `clang`.
+
+Clang Modules
+~~~~~~~~~~~~~
+
+In this document, the term `Clang Modules`/`clang modules` refers to the Clang
+C++ modules extension.
+It is also known as `clang header modules`,
+`clang module map modules` or `clang c++ modules`.
+
+Module and module unit
+~~~~~~~~~~~~~~~~~~~~~~
+
+A module consists of one or more module units. A module unit is a special
+translation unit. Every module unit must have a module declaration. The syntax
+of the module declaration is:
+
+.. code-block:: c++
+
+  [export] module module_name[:partition_name];
+
+Parts enclosed in ``[]`` are optional. In regex form, both ``module_name`` and
+``partition_name`` match ``[a-zA-Z_][a-zA-Z_0-9.]*``. The dot ``.`` in a name
+has no special meaning.
+
+In this document, module units are classified into:
+
+* Primary module interface unit.
+
+* Module implementation unit.
+
+* Module partition interface unit.
+
+* Module partition implementation unit.
+
+A primary module interface unit is a module unit whose module declaration is
+``export module module_name;``. Here ``module_name`` denotes the name of the
+module. A module must have exactly one primary module interface unit.
+
+A module implementation unit is a module unit whose module declaration is
+``module module_name;``. A module may have multiple module implementation
+units with the same declaration.
+
+A module partition interface unit is a module unit whose module declaration is
+``export module module_name:partition_name;``. The ``partition_name`` must be
+unique within the module.
+
+A module partition implementation unit is a module unit whose module declaration
+is ``module module_name:partition_name;``. The ``partition_name`` must be
+unique within the module.
+
+In this document, a `primary module interface unit` or a
+`module partition interface unit` is called a `module interface unit`. A
+`module interface unit` or a `module partition implementation unit` is called an
+`importable module unit`. A `module partition interface unit` or a
+`module partition implementation unit` is called a `module partition unit`.
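The classification above can be sketched as a small C++ helper. This is only an illustration of the four declaration forms, not how Clang parses module declarations: the names `UnitKind`, `ModuleDecl`, `parseModuleDecl` and `isImportable` are hypothetical, and the pattern reuses the simplified name regex given in this document rather than the full C++ grammar.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical names for the four kinds of module unit listed above.
enum class UnitKind {
  PrimaryInterface,        // export module M;
  Implementation,          // module M;
  PartitionInterface,      // export module M:part;
  PartitionImplementation, // module M:part;
};

struct ModuleDecl {
  UnitKind kind;
  std::string module_name;
  std::string partition_name; // empty when the unit is not a partition
};

// Parses one module declaration, e.g. "export module M:part;".
// Returns std::nullopt when the text does not match the simplified syntax
// (for example the bare "module;" that opens a global module fragment).
inline std::optional<ModuleDecl> parseModuleDecl(const std::string &text) {
  static const std::regex pattern(
      R"(^\s*(export\s+)?module\s+([a-zA-Z_][a-zA-Z_0-9.]*)\s*(?::\s*([a-zA-Z_][a-zA-Z_0-9.]*))?\s*;\s*$)");
  std::smatch m;
  if (!std::regex_match(text, m, pattern))
    return std::nullopt;

  const bool exported = m[1].matched;
  const bool partition = m[3].matched;
  UnitKind kind;
  if (partition)
    kind = exported ? UnitKind::PartitionInterface
                    : UnitKind::PartitionImplementation;
  else
    kind = exported ? UnitKind::PrimaryInterface : UnitKind::Implementation;

  return ModuleDecl{kind, m[2].str(), partition ? m[3].str() : std::string()};
}

// Every kind except the plain module implementation unit is importable.
inline bool isImportable(UnitKind kind) {
  return kind != UnitKind::Implementation;
}
```

For instance, `parseModuleDecl("module M:impl_part;")` yields a partition implementation unit of module `M`, which `isImportable` reports as importable, while `parseModuleDecl("module M;")` yields a non-importable module implementation unit.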
+
+Module file
+~~~~~~~~~~~
+
+A module file is the precompiled result of an importable module unit.
+
+Global module fragment
+~~~~~~~~~~~~~~~~~~~~~~
+
+In a module unit, the section from ``module;`` to the module declaration is
+called the global module fragment.
+
+
+How to build projects using modules
+-----------------------------------
+
+Quick Start
+~~~~~~~~~~~
+
+Here is a hello world example that shows how to use modules.
+
+.. code-block:: c++
+
+  // M.cppm
+  export module M;
+  export import :interface_part;
+  import :impl_part;
+  export void Hello();
+
+  // interface_part.cppm
+  export module M:interface_part;
+  export void World();
+
+  // impl_part.cppm
+  module;
+  #include <iostream>
+  #include <string>
+  module M:impl_part;
+  import :interface_part;
+
+  std::string W = "World.";
+  void World() {
+    std::cout << W << std::endl;
+  }
+
+  // Impl.cpp
+  module;
+  #include <iostream>
+  module M;
+  void Hello() {
+    std::cout << "Hello ";
+  }
+
+  // User.cpp
+  import M;
+  int main() {
+    Hello();
+    World();
+    return 0;
+  }
+
+The example can then be compiled with the following commands:
+
+.. code-block:: console
+
+  # Precompiling the module
+  clang++ -std=c++20 interface_part.cppm --precompile -o M-interface_part.pcm
+  clang++ -std=c++20 impl_part.cppm --precompile -fprebuilt-module-path=. -o M-impl_part.pcm
+  clang++ -std=c++20 M.cppm --precompile -fprebuilt-module-path=. -o M.pcm
+  clang++ -std=c++20 Impl.cpp -fmodule-file=M.pcm -c -o Impl.o
+
+  # Compiling the user
+  clang++ -std=c++20 User.cpp -fprebuilt-module-path=. -c -o User.o
+
+  # Compiling the module and linking it together
+  clang++ -std=c++20 M-interface_part.pcm -c -o M-interface_part.o
+  clang++ -std=c++20 M-impl_part.pcm -c -o M-impl_part.o
+  clang++ -std=c++20 M.pcm -c -o M.o
+  clang++ User.o M-interface_part.o M-impl_part.o M.o Impl.o -o a.out
+
+The options are explained in the following sections.
+
+How to enable C++20 Modules
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Currently, C++20 Modules are enabled automatically if the language standard is
+``-std=c++20``. Unfortunately, unlike coroutines, which can be enabled under
+earlier standards with ``-fcoroutines-ts``, C++20 Modules cannot currently be
+enabled with earlier language standard versions due to implementation problems.
+The ``-fmodules-ts`` option is deprecated and is planned to be removed.
+
+How to produce a module file
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A module file for an importable module unit is generated with the
+``--precompile`` option.
+
+File name requirement
+~~~~~~~~~~~~~~~~~~~~~
+
+The file name of an `importable module unit` must end with ``.cppm``
+(or ``.ccm``, ``.cxxm``, etc.). The file name of a `module implementation unit`
+should end with ``.cpp`` (or ``.cc``, ``.cxx``, etc.).
+
+The file name of a module file should end with ``.pcm``.
+The module file of a `primary module interface unit` should be named
+``module_name.pcm``.
+The module file of a `module partition unit` should be named
+``module_name-partition_name.pcm``.
+
+How to specify the dependent module files
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``-fprebuilt-module-path`` option tells the compiler where to search for
+the dependent module files. Like ``-I``, ``-fprebuilt-module-path`` may occur
+multiple times.
+
+Another way to specify the dependent module files is to use ``-fmodule-file``.
+
+When compiling a `module implementation unit`, the module file of the
+corresponding `primary module interface unit` must be passed with
+``-fmodule-file``. The ``-fmodule-file`` option may occur multiple times. For
+example, the command line to compile ``M.cppm`` in the above example could be
+rewritten as:
+
+.. code-block:: console
+
+  clang++ -std=c++20 M.cppm --precompile -fmodule-file=M-interface_part.pcm -fmodule-file=M-impl_part.pcm -o M.pcm
+
+``-fprebuilt-module-path`` is more convenient, while ``-fmodule-file`` is
+faster since it saves the time spent on file lookup.
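The naming rule and the directory search described above can be sketched together in C++. This is a minimal illustration under stated assumptions, not Clang's implementation: `moduleFileName` and `findModuleFile` are hypothetical helpers, and the "first matching directory wins" order is an assumption that this document does not state.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Builds the module file name required by the naming rule above:
// "module_name.pcm" or "module_name-partition_name.pcm".
inline std::string moduleFileName(const std::string &module_name,
                                  const std::string &partition_name = "") {
  if (partition_name.empty())
    return module_name + ".pcm";
  return module_name + "-" + partition_name + ".pcm";
}

// Looks for `file_name` in each search directory in order and returns the
// first regular file found. The search order is an assumption made for this
// sketch only.
inline std::optional<fs::path>
findModuleFile(const std::vector<fs::path> &search_dirs,
               const std::string &file_name) {
  for (const fs::path &dir : search_dirs) {
    const fs::path candidate = dir / file_name;
    std::error_code ec; // swallow errors such as missing directories
    if (fs::is_regular_file(candidate, ec))
      return candidate;
  }
  return std::nullopt;
}
```

With the Quick Start layout, `findModuleFile({"."}, moduleFileName("M", "impl_part"))` would look for `./M-impl_part.pcm`. Passing an exact module file on the command line avoids this kind of search entirely, which is the trade-off mentioned above.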
+
+Remember to link module files
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It is easy to forget to link module files at first, since module interfaces may
+look like headers. They are not. Module units are translation units: they need
+to be compiled and linked, as the example shows.
+It is also fine to compile module units into a static or dynamic library.
+
+Consistency Requirement
+~~~~~~~~~~~~~~~~~~~~~~~
+
+If we think of modules as a cache that speeds up compilation, then, as with any
+other caching technique, it is important to keep the cache consistent.
+So **currently** Clang performs very strict consistency checks.
+
+Options consistency
+^^^^^^^^^^^^^^^^^^^
+
+The language options of module units and their non-module-unit users should be
+consistent. The following example is not allowed:
+
+.. code-block:: c++
+
+  // M.cppm
+  export module M;
+
+  // Use.cpp
+  import M;
+
+.. code-block:: console
+
+  clang++ -std=c++20 M.cppm --precompile -o M.pcm
+  clang++ -std=c++2b Use.cpp -fprebuilt-module-path=.
+
+The compiler rejects the example due to the inconsistent language options.
+Not all options are language options.
+For example, the following is allowed:
+
+.. code-block:: console
+
+  clang++ -std=c++20 M.cppm --precompile -o M.pcm
+  # Inconsistent optimization level.
+  clang++ -std=c++20 -O3 Use.cpp -fprebuilt-module-path=.
+  # Inconsistent debugging level.
+  clang++ -std=c++20 -g Use.cpp -fprebuilt-module-path=.
+
+Although the two commands use inconsistent optimization and debugging levels,
+both of them are accepted.
+
+Note that **currently** the compiler does not treat inconsistent macro
+definitions as a problem. For example:
+
+.. code-block:: console
+
+  clang++ -std=c++20 M.cppm --precompile -o M.pcm
+  # Inconsistent optimization level and macro definition.
+  clang++ -std=c++20 -O3 -DNDEBUG Use.cpp -fprebuilt-module-path=.
+
+Currently Clang accepts the above example.
+But it may produce surprising results if the debugging code in the module and
+in its users depends on each other.
+
+Source content consistency
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When the compiler reads a module file, it checks the consistency of the
+corresponding source files. For example:
+
+.. code-block:: c++
+
+  // M.cppm
+  export module M;
+  export template <class T>
+  T foo(T t) {
+    return t;
+  }
+
+  // Use.cpp
+  import M;
+  void bar() {
+    foo(5);
+  }
+
+.. code-block:: console
+
+  clang++ -std=c++20 M.cppm --precompile -o M.pcm
+  rm -f M.cppm
+  clang++ -std=c++20 Use.cpp -fmodule-file=M.pcm
+
+The compiler rejects the example because it fails to find the source file
+needed to check the consistency.
+The following example is rejected as well:
+
+.. code-block:: console
+
+  clang++ -std=c++20 M.cppm --precompile -o M.pcm
+  echo "int i=0;" >> M.cppm
+  clang++ -std=c++20 Use.cpp -fmodule-file=M.pcm
+
+Here the compiler detects that the source file was changed.
+
+But it is fine to move the module file as long as the source files remain:
+
+.. code-block:: console
+
+  clang++ -std=c++20 M.cppm --precompile -o M.pcm
+  mkdir -p tmp
+  mv M.pcm tmp/M.pcm
+  clang++ -std=c++20 Use.cpp -fmodule-file=tmp/M.pcm
+
+The above example is accepted.
+
+If users cannot follow the consistency requirement for some reason (e.g.,
+distributing module files), they could try ``-Xclang -fmodules-embed-all-files``
+when producing module files. For example:
+
+.. code-block:: console
+
+  clang++ -std=c++20 M.cppm --precompile -Xclang -fmodules-embed-all-files -o M.pcm
+  rm -f M.cppm
+  clang++ -std=c++20 Use.cpp -fmodule-file=M.pcm
+
+Now the compiler accepts the above example.
+Important note: ``-Xclang`` options are intended for internal use by the
+compiler, and their semantics are not guaranteed to be preserved in future
+versions.
+
+How modules speed up compilation
+--------------------------------
+
+A classic explanation for why modules speed up compilation is:
+if there are ``n`` headers and ``m`` source files and each header is included by
+each source file, then the complexity of the compilation is ``O(nm)``;
+but if there are ``n`` module interfaces and ``m`` source files, the complexity
+of the compilation is ``O(n+m)``. So modules win big at scale. Put simply,
+modules get rid of many redundant compilations.
+
+Roughly, this theory is correct, but it is too rough. What actually happens
+depends on the optimization level.
+
+First consider ``O0``. The compilation process is described in the following
+graph.
+
+.. code-block:: console
+
+  │                                       │                                       │                  │
+  ├────────────── frontend ───────────────┼───────────── middle end ──────────────┼──── backend ─────┤
+  │                                       │                                       │                  │
+  └─── parsing ──── sema ──── codegen ────┴───── transformations ──── codegen ────┴──── codegen ─────┘
+
+  ┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
+  │                                                                                                  │
+  │                                           source file                                            │
+  │                                                                                                  │
+  └──────────────────────────────────────────────────────────────────────────────────────────────────┘
+
+               ┌─────────┐
+               │         │
+               │ imported│
+               │         │
+               │  codes  │
+               │         │
+               └─────────┘
+
+Here the source file (which may be a non-module unit or a module unit) is
+processed by the whole pipeline.
+The imported code, however, is only involved in semantic analysis, which is
+mainly name lookup, overload resolution and template instantiation.
+All of these are fast relative to the whole compilation process.
+For the imported code, the frontend code generation, the whole middle end and
+the backend are skipped.
+So modules give a big win in compilation time at ``O0``.
+
+But with optimizations, things are different:
+
+..
+.. code-block:: console
+
+  │                                     │                                                           │                                │
+  ├───────────── frontend ──────────────┼───────────────────────── middle end ──────────────────────┼─────────── backend ────────────┤
+  │                                     │                                                           │                                │
+  └─── parsing ──── sema ──── codegen ──┴─── optimizations ─── IPO ─── optimizations ─── codegen ───┴─── optimizations ─── codegen ──┘
+
+  ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
+  │                                                                                                                                  │
+  │                                                           source file                                                            │
+  │                                                                                                                                  │
+  └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
+                 ┌───────────────────────────────────────────────────────────────┐
+                 │                                                               │
+                 │                                                               │
+                 │                        imported codes                         │
+                 │                                                               │
+                 │                                                               │
+                 └───────────────────────────────────────────────────────────────┘
+
+It would not be acceptable for programs to lose runtime performance after
+adopting modules. The main concern is that when compiling a source file, the
+compiler needs to see the function bodies of imported module units so that it
+can perform IPO (InterProcedural Optimization, primarily inlining in practice)
+on functions in the current source file using the information provided by the
+imported module units.
+In other words, the imported code is processed again and again by the
+optimizations (including IPO itself) in every unit that imports it.
+The optimizations before IPO and the IPO itself are the most time-consuming
+parts of the whole compilation process.
+So from this perspective, we might not get the improvements described by the
+theory.
+But we still save the time for the optimizations after IPO and for the whole
+backend.
+
+ABI Impacts
+-----------
+
+Declarations in a module unit that are not in the global module fragment get
+new linkage names.
+
+For example,
+
+.. code-block:: c++
+
+  export module M;
+  namespace NS {
+    export int foo();
+  }
+
+The linkage name of ``NS::foo()`` would be ``_ZN2NSW1M3fooEv``.
+This can't be demangled by older versions of debuggers or demanglers.
+Users can use ``llvm-cxxfilt`` (since 15.x) to demangle it:
+
+.. code-block:: console
+
+  llvm-cxxfilt _ZN2NSW1M3fooEv
+
+The result would be ``NS::foo@M()``, which reads as ``NS::foo()`` in module
+``M``.
+
+The ABI implies that we can't declare something in a module unit and define it
+in a non-module unit (or vice versa); doing so results in linking errors.
+
+Header Units
+============
+
+How to build projects using header units
+----------------------------------------
+
+Quick Start
+~~~~~~~~~~~
+
+For the following example,
+
+.. code-block:: c++
+
+  import <iostream>;
+  int main() {
+    std::cout << "Hello World.\n";
+  }
+
+we could compile it as
+
+.. code-block:: console
+
+  clang++ -std=c++20 -xc++-system-header --precompile iostream -o iostream.pcm
+  clang++ -std=c++20 -fmodule-file=iostream.pcm main.cpp
+
+How to produce module files
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As with modules, ``--precompile`` produces the module file.
+But the input file must be marked as a header with ``-xc++-system-header`` or
+``-xc++-user-header``.
+
+How to specify the dependent module files
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The module files are specified with ``-fmodule-file``, which may occur multiple
+times here too.
+
+Unlike with modules, ``-fprebuilt-module-path`` cannot be used to search for
+the module files of header units.
+
+(This may be an implementation defect. One could argue that, since header units
+have no name according to the spec, it is natural that they cannot be searched
+for. But this is not user friendly enough.)
+
+Don't compile the module file
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Another difference from modules is that the module file of a header unit can't
+be compiled.
+This makes sense because the semantics of header units are just like those of
+headers.
+
+Include translation
+~~~~~~~~~~~~~~~~~~~
+
+The C++ spec allows vendors to convert ``#include header-name`` to
+``import header-name;`` when possible.
+Currently, Clang does this translation for ``#include`` directives in the
+global module fragment.
+
+For example, the following example:
+
+.. code-block:: c++
+
+  module;
+  import <iostream>;
+  export module M;
+  export void Hello() {
+    std::cout << "Hello.\n";
+  }
+
+is the same as the following one:
+
+.. code-block:: c++
+
+  module;
+  #include <iostream>
+  export module M;
+  export void Hello() {
+    std::cout << "Hello.\n";
+  }
+
+.. code-block:: console
+
+  clang++ -std=c++20 -xc++-system-header --precompile iostream -o iostream.pcm
+  clang++ -std=c++20 -fmodule-file=iostream.pcm --precompile M.cppm -o M.pcm
+
+In the latter example, Clang can find the module file for ``<iostream>``,
+so it replaces ``#include <iostream>`` with ``import <iostream>;``
+automatically.
+
+
+Relationship with clang modules
+-------------------------------
+
+Header units have semantics quite similar to those of clang modules.
+The semantics of both are like those of headers.
+
+In fact, we could even mimic the style of header units with clang modules:
+
+.. code-block:: c++
+
+  module "iostream" {
+    export *
+    header "/path/to/libstdcxx/iostream"
+  }
+
+.. code-block:: console
+
+  clang++ -std=c++20 -fimplicit-modules -fmodule-map-file=.modulemap main.cpp
+
+It is simpler when using libc++:
+
+.. code-block:: console
+
+  clang++ -std=c++20 main.cpp -fimplicit-modules -fimplicit-module-maps
+
+since the libc++ sources already contain a module map.
+
+This raises an obvious question: why not implement header units with clang
+header modules? Here are the reasons.
+
+(1) Header units and clang modules handle macros slightly differently.
+
+For example:
+
+..
+.. code-block:: c++
+
+  // foo.h
+  #define D 45
+
+  // bar.h
+  #undef D
+
+  // use.cpp
+  #include <iostream>
+  import "foo.h";
+  import "bar.h";
+
+  int main() {
+  #ifdef D
+    std::cout << "Macro Value: " << D << "\n";
+  #else
+    std::cout << "Not defined.\n";
+  #endif
+  }
+
+  // .modulemap
+  module "foo" {
+    export *
+    header "foo.h"
+  }
+  module "bar" {
+    export *
+    header "bar.h"
+  }
+
+.. code-block:: console
+
+  # Using clang modules
+  rm -f foo.pcm bar.pcm
+  clang++ -std=c++20 -fimplicit-modules -fmodule-map-file=.modulemap use.cpp
+  ./a.out
+  # Result would be: "Macro Value: 45"
+
+But according to `[cpp.import] <http://eel.is/c++draft/cpp.import#5>`_,
+importing a header unit should export macro undefinitions too.
+So the expected result when using header units here is ``"Not defined."``.
+
+Although these semantics are not implemented correctly yet,
+it may be better to keep the interfaces of header units and clang modules
+separate, to avoid affecting the many existing users of clang modules.
+
+(2) Clang modules have more semantics than header units.
+
+Clang modules have additional semantics such as hierarchy and wrapping multiple
+headers together into one big module. None of these are part of C++20 header
+units.
+We are concerned that users might rely on these features while believing they
+are using standard C++20.
+
+Another reason is that there are proposals to introduce module mappers to the
+C++ standard (for example, https://wg21.link/p1184r2).
+If we decided to reuse Clang's modulemap, we could run into problems once
+another module mapper needs to be introduced.
+
+So the final answer to why the interface of clang modules is not reused for
+header units is that we have already seen some differences between header
+units and clang modules, and we expect these differences to grow until they
+become too large to reconcile.
+
+..
+.. _Modules: Modules.html
Index: clang/docs/index.rst
===================================================================
--- clang/docs/index.rst
+++ clang/docs/index.rst
@@ -40,6 +40,7 @@
    SafeStack
    ShadowCallStack
    SourceBasedCodeCoverage
+   CPlusPlus20Modules
    Modules
    MSVCCompatibility
    MisExpect