Index: clang/docs/CPlusPlus20Modules.rst =================================================================== --- /dev/null +++ clang/docs/CPlusPlus20Modules.rst @@ -0,0 +1,748 @@ +============= +C++20 Modules +============= + +.. contents:: + :local: + +Introduction +============ + +The term ``modules`` has a lot of meanings. For the users of Clang, modules may +refer to ``Objective-C Modules``, ``Clang C++ Modules`` (or ``Clang Header Modules``, +etc.) or C++20 modules. The implementation of all these kinds of modules in Clang +has a lot of shared code, but from the perspective of users, their semantics and +command line interfaces are very different. This document focuses on +an introduction of how to use C++20 modules in Clang. + +There is already a detailed document about `Clang modules `_; it +should be helpful to read `Clang modules `_ if you want to know +more about the general idea of modules. Since C++20 modules have different semantics +(and work flows) from `Clang modules`, this page describes the background and use of +Clang with C++20 modules. + +Although the term ``modules`` has a unique meaning in the C++20 Language Specification, +when people talk about C++20 modules, they may refer to another C++20 feature: +header units, which are also covered in this document. + +C++20 Modules +============= + +This document is intended to be a manual first and foremost; however, we consider it helpful to +introduce some language background here for readers who are not familiar with +the new language feature. This document is not intended to be a language +tutorial; it will only introduce necessary concepts about the +structure and building of the project. + +Background and terminology +-------------------------- + +Modules +~~~~~~~ + +In this document, the term ``Modules``/``modules`` refers to the C++20 modules +feature if it is not decorated by ``Clang``. 
+ +Clang Modules +~~~~~~~~~~~~~ + +In this document, the term ``Clang Modules``/``Clang modules`` refers to the Clang +C++ modules extension. These are also known as ``Clang header modules``, +``Clang module map modules`` or ``Clang c++ modules``. + +Module and module unit +~~~~~~~~~~~~~~~~~~~~~~ + +A module consists of one or more module units. A module unit is a special +translation unit. Every module unit must have a module declaration. The syntax +of the module declaration is: + +.. code-block:: c++ + + [export] module module_name[:partition_name]; + +Terms enclosed in ``[]`` are optional. The syntax of ``module_name`` and ``partition_name`` +in regex form corresponds to ``[a-zA-Z_][a-zA-Z_0-9\.]*``. In particular, a literal dot ``.`` +in the name has no semantic meaning (e.g. implying a hierarchy). + +In this document, module units are classified into: + +* Primary module interface unit. + +* Module implementation unit. + +* Module interface partition unit. + +* Internal module partition unit. + +A primary module interface unit is a module unit whose module declaration is +``export module module_name;``. The ``module_name`` here denotes the name of the +module. A module should have one and only one primary module interface unit. + +A module implementation unit is a module unit whose module declaration is +``module module_name;``. A module could have multiple module implementation +units with the same declaration. + +A module interface partition unit is a module unit whose module declaration is +``export module module_name:partition_name;``. The ``partition_name`` should be +unique to the module. + +An internal module partition unit is a module unit whose module declaration +is ``module module_name:partition_name;``. The ``partition_name`` should be +unique to the module. + +In this document, we use the following umbrella terms: + +* A ``module interface unit`` refers to either a ``primary module interface unit`` + or a ``module interface partition unit``. 
+ +* An ``importable module unit`` refers to either a ``module interface unit`` + or an ``internal module partition unit``. + +* A ``module partition unit`` refers to either a ``module interface partition unit`` + or an ``internal module partition unit``. + +Module file +~~~~~~~~~~~ + +A module file stands for the precompiled result of an importable module unit. +It is also generally called a ``Built Module Interface file``, or ``BMI`` for short. +The terms ``module file`` and ``BMI`` are used interchangeably in this document. + +Global module fragment +~~~~~~~~~~~~~~~~~~~~~~ + +In a module unit, the section from ``module;`` to the module declaration is called the global module fragment. + + +How to build projects using modules +----------------------------------- + +Quick Start +~~~~~~~~~~~ + +Let's see a "hello world" example that uses modules. + +.. code-block:: c++ + + // Hello.cppm + module; + #include <iostream> + export module Hello; + export void hello() { + std::cout << "Hello World!\n"; + } + + // use.cpp + import Hello; + int main() { + hello(); + return 0; + } + +Then we type: + +.. code-block:: console + + $ clang++ -std=c++20 Hello.cppm --precompile -o Hello.pcm + $ clang++ -std=c++20 use.cpp -fprebuilt-module-path=. Hello.pcm -o Hello.out + $ ./Hello.out + Hello World! + +In this example, we make and use a simple module ``Hello`` which contains only a +primary module interface unit ``Hello.cppm``. + +Now let's look at a slightly more complex "hello world" example that uses all four kinds of module units. + +.. 
code-block:: c++ + + // M.cppm + export module M; + export import :interface_part; + import :impl_part; + export void Hello(); + + // interface_part.cppm + export module M:interface_part; + export void World(); + + // impl_part.cppm + module; + #include <iostream> + #include <string> + module M:impl_part; + import :interface_part; + + std::string W = "World."; + void World() { + std::cout << W << std::endl; + } + + // Impl.cpp + module; + #include <iostream> + module M; + void Hello() { + std::cout << "Hello "; + } + + // User.cpp + import M; + int main() { + Hello(); + World(); + return 0; + } + +Then we are able to compile the example with the following commands: + +.. code-block:: console + + # Precompiling the module + $ clang++ -std=c++20 interface_part.cppm --precompile -o M-interface_part.pcm + $ clang++ -std=c++20 impl_part.cppm --precompile -fprebuilt-module-path=. -o M-impl_part.pcm + $ clang++ -std=c++20 M.cppm --precompile -fprebuilt-module-path=. -o M.pcm + $ clang++ -std=c++20 Impl.cpp -fmodule-file=M.pcm -c -o Impl.o + + # Compiling the user + $ clang++ -std=c++20 User.cpp -fprebuilt-module-path=. -c -o User.o + + # Compiling the module and linking it together + $ clang++ -std=c++20 M-interface_part.pcm -c -o M-interface_part.o + $ clang++ -std=c++20 M-impl_part.pcm -c -o M-impl_part.o + $ clang++ -std=c++20 M.pcm -c -o M.o + $ clang++ User.o M-interface_part.o M-impl_part.o M.o Impl.o -o a.out + +We explain the options in the following sections. + +How to enable C++20 Modules +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Currently, C++20 Modules are enabled automatically if the language standard is ``-std=c++20`` or newer. +The ``-fmodules-ts`` option is deprecated and is planned to be removed. + +How to produce a module file +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +It is possible to generate a module file for an importable module unit by specifying the ``--precompile`` option. 
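Since ``--precompile`` only applies to importable module units, it can help to tell the four kinds of module unit apart by their module declaration. The following is a minimal sketch of that classification, not anything Clang provides: ``UnitKind``, ``classify`` and ``isImportable`` are hypothetical names introduced here for illustration, and the declaration is assumed to be already split into its ``export`` keyword and its name.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper types, not part of Clang: the four kinds of module
// unit used in this document, keyed by the shape of the module declaration.
enum class UnitKind {
  PrimaryInterface,   // export module M;
  Implementation,     // module M;
  InterfacePartition, // export module M:P;
  InternalPartition   // module M:P;
};

// `exported` is true when the declaration starts with `export`;
// `name` is the text between `module` and `;`, e.g. "M" or "M:P".
UnitKind classify(bool exported, const std::string &name) {
  const bool partition = name.find(':') != std::string::npos;
  if (exported)
    return partition ? UnitKind::InterfacePartition
                     : UnitKind::PrimaryInterface;
  return partition ? UnitKind::InternalPartition : UnitKind::Implementation;
}

// Every kind except the module implementation unit is importable,
// following the umbrella terms defined in this document.
bool isImportable(UnitKind kind) { return kind != UnitKind::Implementation; }
```

For the example above, ``classify(false, "M:impl_part")`` yields ``InternalPartition``, which is importable, while ``Impl.cpp`` (``module M;``) is a module implementation unit and is compiled directly to an object file.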
+ +File name requirement +~~~~~~~~~~~~~~~~~~~~~ + +The file name of an ``importable module unit`` must end with ``.cppm`` +(or ``.ccm``, ``.cxxm``, ``.c++m``). The file name of a ``module implementation unit`` +should end with ``.cpp`` (or ``.cc``, ``.cxx``, ``.c++``). + +The file name of module files should end with ``.pcm``. +The file name of the module file of a ``primary module interface unit`` should be ``module_name.pcm``. +The file name of module files of ``module partition unit`` should be ``module_name-partition_name.pcm``. + +If the file names use different extensions, Clang may fail to build the module. +For example, if the filename of an ``importable module unit`` ends with ``.cpp`` instead of ``.cppm``, +then we can't generate a module file for the ``importable module unit`` with the ``--precompile`` option, +since ``--precompile`` would then only run the preprocessor, which is equivalent to ``-E``. +If we still want the filename of an ``importable module unit`` to end with ``.cpp`` instead of ``.cppm``, +we could put ``-x c++-module`` in front of the file. For example, + +.. code-block:: c++ + + // Hello.cpp + module; + #include <iostream> + export module Hello; + export void hello() { + std::cout << "Hello World!\n"; + } + + // use.cpp + import Hello; + int main() { + hello(); + return 0; + } + +Now that the filename of the ``module interface`` ends with ``.cpp`` instead of ``.cppm``, +we can't compile it with the original command lines. But we are still able to do it with: + +.. code-block:: console + + $ clang++ -std=c++20 -x c++-module Hello.cpp --precompile -o Hello.pcm + $ clang++ -std=c++20 use.cpp -fprebuilt-module-path=. Hello.pcm -o Hello.out + $ ./Hello.out + Hello World! + +How to specify the dependent module files +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The option ``-fprebuilt-module-path`` tells the compiler the path where to search for dependent module files. +It may be used multiple times just like ``-I`` for specifying paths for header files. 
The lookup rule here is: + +* (1) When we import module M, the compiler looks up M.pcm in the directories specified + by ``-fprebuilt-module-path``. +* (2) When we import partition module unit M:P, the compiler looks up M-P.pcm in the + directories specified by ``-fprebuilt-module-path``. + +Another way to specify the dependent module files is to use ``-fmodule-file``. The main difference +is that ``-fprebuilt-module-path`` takes a directory, whereas ``-fmodule-file`` requires a +specific file. + +When we compile a ``module implementation unit``, we must pass the module file of the corresponding +``primary module interface unit`` by ``-fmodule-file`` +since the language specification says a module implementation unit implicitly imports +the primary module interface unit. + + [module.unit]p8 + + A module-declaration that contains neither an export-keyword nor a module-partition implicitly + imports the primary module interface unit of the module as if by a module-import-declaration. + +Again, the option ``-fmodule-file`` may occur multiple times. +For example, the command line to compile ``M.cppm`` in +the above example could be rewritten into: + +.. code-block:: console + + $ clang++ -std=c++20 M.cppm --precompile -fmodule-file=M-interface_part.pcm -fmodule-file=M-impl_part.pcm -o M.pcm + +``-fprebuilt-module-path`` is more convenient and ``-fmodule-file`` is faster since +it saves time for file lookup. + +Remember to compile and link module files +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +It is easy to forget to link module files at first since we may envision module interfaces like headers. +However, this is not true. +Module units are translation units. We need to compile them and link them as the example shows. + +If we want to create libraries for the module files, we can't wrap these module files directly. +We must compile these module files (``*.pcm``) into object files (``*.o``) and wrap these object files. 
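The file-name lookup rule for prebuilt module search directories given earlier (``M`` maps to ``M.pcm`` and ``M:P`` maps to ``M-P.pcm``) can be sketched in a few lines. This is only an illustration of the naming rule, not Clang's implementation: ``moduleFileName`` and ``candidatePaths`` are hypothetical names, and trying the directories in the order they were given is an assumption based on the comparison with ``-I``.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: maps an imported name to the expected module file
// name, i.e. "M" -> "M.pcm" and "M:P" -> "M-P.pcm".
std::string moduleFileName(std::string name) {
  for (char &c : name)
    if (c == ':')
      c = '-';
  return name + ".pcm";
}

// Hypothetical helper: lists the paths that would be tried for `name`,
// one per search directory, in the order the directories were given
// (the ordering is an assumption, by analogy with -I).
std::vector<std::string>
candidatePaths(const std::string &name,
               const std::vector<std::string> &searchDirs) {
  std::vector<std::string> result;
  for (const std::string &dir : searchDirs)
    result.push_back(dir + "/" + moduleFileName(name));
  return result;
}
```

With this rule in mind it is easier to see why the output names in the earlier example, ``M.pcm``, ``M-interface_part.pcm`` and ``M-impl_part.pcm``, are not arbitrary: a directory-based lookup can only find files that follow the naming scheme.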
+ +Consistency Requirement +~~~~~~~~~~~~~~~~~~~~~~~ + +If we envision modules as a cache to speed up compilation, then - as with other caching techniques - +it is important to keep cache consistency. +So **currently** Clang performs very strict consistency checks. + +Options consistency +^^^^^^^^^^^^^^^^^^^ + +The language options of module units and their non-module-unit users should be consistent. +The following example is not allowed: + +.. code-block:: c++ + + // M.cppm + export module M; + + // Use.cpp + import M; + +.. code-block:: console + + $ clang++ -std=c++20 M.cppm --precompile -o M.pcm + $ clang++ -std=c++2b Use.cpp -fprebuilt-module-path=. + +The compiler would reject the example due to the inconsistent language options. +Not all options are language options. +For example, the following example is allowed: + +.. code-block:: console + + $ clang++ -std=c++20 M.cppm --precompile -o M.pcm + # Inconsistent optimization level. + $ clang++ -std=c++20 -O3 Use.cpp -fprebuilt-module-path=. + # Inconsistent debugging level. + $ clang++ -std=c++20 -g Use.cpp -fprebuilt-module-path=. + +Although the two examples have inconsistent optimization and debugging levels, both of them are accepted. + +Note that **currently** the compiler doesn't consider inconsistent macro definitions a problem. For example: + +.. code-block:: console + + $ clang++ -std=c++20 M.cppm --precompile -o M.pcm + # Inconsistent optimization level and macro definition. + $ clang++ -std=c++20 -O3 -DNDEBUG Use.cpp -fprebuilt-module-path=. + +Currently Clang would accept the above example. But it may produce surprising results if the +debugging code depends on consistent use of ``NDEBUG`` in other translation units as well. + +Source content consistency +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +When the compiler reads a module file, the compiler will check the consistency of the corresponding +source files. For example: + +.. 
code-block:: c++ + + // M.cppm + export module M; + export template <typename T> + T foo(T t) { + return t; + } + + // Use.cpp + import M; + void bar() { + foo(5); + } + +.. code-block:: console + + $ clang++ -std=c++20 M.cppm --precompile -o M.pcm + $ rm -f M.cppm + $ clang++ -std=c++20 Use.cpp -fmodule-file=M.pcm + +The compiler would reject the example since the compiler failed to find the source file to check the consistency. +So the following example would be rejected too. + +.. code-block:: console + + $ clang++ -std=c++20 M.cppm --precompile -o M.pcm + $ echo "int i=0;" >> M.cppm + $ clang++ -std=c++20 Use.cpp -fmodule-file=M.pcm + +The compiler would reject it too since the compiler detected the file was changed. + +But it is OK to move the module file as long as the source files remain: + +.. code-block:: console + + $ clang++ -std=c++20 M.cppm --precompile -o M.pcm + $ mkdir -p tmp + $ mv M.pcm tmp/M.pcm + $ clang++ -std=c++20 Use.cpp -fmodule-file=tmp/M.pcm + +The above example would be accepted. + +If the user doesn't want to follow the consistency requirement for some reason (e.g., distributing module files), +the user could try to use ``-Xclang -fmodules-embed-all-files`` when producing module files. For example: + +.. code-block:: console + + $ clang++ -std=c++20 M.cppm --precompile -Xclang -fmodules-embed-all-files -o M.pcm + $ rm -f M.cppm + $ clang++ -std=c++20 Use.cpp -fmodule-file=M.pcm + +Now the compiler would accept the above example. +Important note: ``-Xclang`` options are intended to be used by the compiler internally and their semantics +are not guaranteed to be preserved in future versions. 
+ +How modules speed up compilation +-------------------------------- + +A classic theory for the reason why modules speed up the compilation is: +if there are ``n`` headers and ``m`` source files and each header is included by each source file, +then the complexity of the compilation is ``O(n*m)``; +But if there are ``n`` module interfaces and ``m`` source files, the complexity of the compilation is +``O(n+m)``. So, using modules would be a big win when scaling. +Put simply, we could get rid of many redundant compilations by using modules. + +Roughly, this theory is correct. But the problem is that it is too rough. Let's see what actually happens. +For example, the behavior also depends on the optimization level, as we will illustrate below. + +First is ``O0``. The compilation process is described in the following graph. + +.. code-block:: none + + ├-------------frontend----------┼-------------middle end----------------┼----backend----┤ + │ │ │ │ + └---parsing----sema----codegen--┴----- transformations ---- codegen ----┴---- codegen --┘ + + ┌---------------------------------------------------------------------------------------┐ + | │ + | source file │ + | │ + └---------------------------------------------------------------------------------------┘ + + ┌--------┐ + │ │ + │imported│ + │ │ + │ code │ + │ │ + └--------┘ + +Here we can see that the source file (could be a non-module unit or a module unit) would get processed by the +whole pipeline. +But the imported code would only get involved in semantic analysis, which is mainly about name lookup, +overload resolution and template instantiation. +All of these processes are fast relative to the whole compilation process. +More importantly, the imported code only needs to be processed once in frontend code generation, +as well as the whole middle end and backend. +So we could get a big win for the compilation time in O0. 
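To make the classic ``O(n*m)`` versus ``O(n+m)`` argument above concrete, the sketch below counts units of frontend work under a deliberately idealized model. It assumes that every header or module interface costs one unit to process and that importing a prebuilt module file is free; both are simplifications, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Idealized model: each of the m source files textually includes each of
// the n headers, so every header is processed once per source file.
constexpr std::uint64_t headerWork(std::uint64_t n, std::uint64_t m) {
  return n * m;
}

// Idealized model: each of the n module interfaces is precompiled once,
// and each of the m source files is compiled once (imports assumed free).
constexpr std::uint64_t moduleWork(std::uint64_t n, std::uint64_t m) {
  return n + m;
}
```

For ``n = 100`` and ``m = 1000`` this model gives 100000 units of work with headers against 1100 with modules. Real builds fall somewhere in between, because importing is not free.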
+ +But with optimizations, things are different: + +(we omit ``code generation`` part for each end due to the limited space) + +.. code-block:: none + + ├-------- frontend ---------┼--------------- middle end --------------------┼------ backend ----┤ + │ │ │ │ + └--- parsing ---- sema -----┴--- optimizations --- IPO ---- optimizations---┴--- optimizations -┘ + + ┌-----------------------------------------------------------------------------------------------┐ + │ │ + │ source file │ + │ │ + └-----------------------------------------------------------------------------------------------┘ + ┌---------------------------------------┐ + │ │ + │ │ + │ imported code │ + │ │ + │ │ + └---------------------------------------┘ + +It would be very unfortunate if we end up with worse performance after using modules. +The main concern is that when we compile a source file, the compiler needs to see the function bodies +of imported module units so that it can perform IPO (InterProcedural Optimization, primarily inlining +in practice) to optimize functions in the current source file with the help of the information provided by +the imported module units. +In other words, the imported code would be processed again and again in importer units +by optimizations (including IPO itself). +The optimizations before IPO and the IPO itself are the most time-consuming part of the whole compilation process. +So from this perspective, we might not be able to get the improvements described in the theory. +But we could still save the time for optimizations after IPO and the whole backend. + +ABI Impacts +----------- + +The declarations in a module unit which are not in the global module fragment have new linkage names. + +For example, + +.. code-block:: c++ + + export module M; + namespace NS { + export int foo(); + } + +The linkage name of ``NS::foo()`` would be ``_ZN2NSW1M3fooEv``. +This couldn't be demangled by previous versions of the debugger or demangler. 
+As of LLVM 15.x, users can utilize ``llvm-cxxfilt`` to demangle this: + +.. code-block:: console + + $ llvm-cxxfilt _ZN2NSW1M3fooEv + +The result would be ``NS::foo@M()``, which reads as ``NS::foo()`` in module ``M``. + +The ABI implies that we can't declare something in a module unit and define it in a non-module unit (or vice-versa), +as this would result in linking errors. + +Known Problems +-------------- + +The following describes issues in the current implementation of modules. +Please see https://github.com/llvm/llvm-project/labels/clang%3Amodules for more issues +or file a new issue if you don't find an existing one. +If you're going to create a new issue for C++20 modules, please start the title with ``[C++20] [Modules]`` +and add the label ``clang:modules`` (if you have permissions for that). + +For higher level support for proposals, you could visit https://clang.llvm.org/cxx_status.html. + +Support for clang-scan-deps +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The support for clang-scan-deps may be the most urgent problem for modules now. +Without the support for clang-scan-deps, it's hard to involve build systems. +This means that users could only play with modules through makefiles or by writing a parser by hand. +This limits the use of modules, which in turn limits the defect reports and feature requests we receive. + +This is tracked in: https://github.com/llvm/llvm-project/issues/51792. + +Ambiguous deduction guide +~~~~~~~~~~~~~~~~~~~~~~~~~ + +Currently, when we call deduction guides in the global module fragment, +we may get an incorrect diagnostic message such as ``ambiguous deduction``. + +So if we're using a deduction guide from the global module fragment, we probably need to write: + +.. code-block:: c++ + + std::lock_guard<std::mutex> lk(mutex); + +instead of + +.. 
code-block:: c++ + + std::lock_guard lk(mutex); + +This is tracked in: https://github.com/llvm/llvm-project/issues/56916 + +Ignored PreferredName Attribute +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Due to a tricky problem, when Clang writes module files, Clang will ignore the ``preferred_name`` attribute, if any. +This implies that the ``preferred_name`` wouldn't show up in the debugger or in dumps. + +This is tracked in: https://github.com/llvm/llvm-project/issues/56490 + +Don't emit macros about module declaration +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This is covered by P1857R3. We mention it again here since users may abuse it before we implement it. + +Someone may want to write code which could be compiled both with and without modules. +A direct idea would be to use macros like: + +.. code-block:: c++ + + MODULE + IMPORT header_name + EXPORT_MODULE MODULE_NAME; + IMPORT header_name + EXPORT ... + +So this file could be treated as a module unit or a non-module unit depending on the definition +of some macros. +However, this kind of usage is forbidden by P1857R3, which we haven't implemented yet. +This means that it is possible to write illegal modules code now, and obviously this will stop working +once P1857R3 is implemented. +A simple suggestion would be "Don't play macro tricks with module declarations". + +This is tracked in: https://github.com/llvm/llvm-project/issues/56917 + +Header Units +============ + +How to build projects using header unit +--------------------------------------- + +Quick Start +~~~~~~~~~~~ + +For the following example, + +.. code-block:: c++ + + import <iostream>; + int main() { + std::cout << "Hello World.\n"; + } + +we could compile it as + +.. 
code-block:: console + + $ clang++ -std=c++20 -xc++-system-header --precompile iostream -o iostream.pcm + $ clang++ -std=c++20 -fmodule-file=iostream.pcm main.cpp + +How to produce module files +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Similar to modules, we could use ``--precompile`` to produce the module file. +But we need to specify that the input file is a header by ``-xc++-system-header`` or ``-xc++-user-header``. + +How to specify the dependent module files +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +We could use ``-fmodule-file`` to specify the module files, and this option may occur multiple times as well. + +With the existing implementation ``-fprebuilt-module-path`` cannot be used for header units +(since they are nominally anonymous). +For header units, use ``-fmodule-file`` to include the relevant PCM file for each header unit. + +This is expected to be solved in future editions of the compiler either by the tooling finding and specifying +the ``-fmodule-file`` or by the use of a module-mapper that understands how to map the header name to their PCMs. + +Don't compile the module file +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Another difference from modules is that we can't compile the module file from a header unit. +For example: + +.. code-block:: console + + $ clang++ -std=c++20 -xc++-system-header --precompile iostream -o iostream.pcm + # This is not allowed! + $ clang++ iostream.pcm -c -o iostream.o + +It makes sense due to the semantics of header units, which are just like headers. + +Include translation +~~~~~~~~~~~~~~~~~~~ + +The C++ spec allows the vendors to convert ``#include header-name`` to ``import header-name;`` when possible. +Currently, Clang would do this translation for the ``#include`` in the global module fragment. + +For example, the following two examples are the same: + +.. code-block:: c++ + + module; + import <iostream>; + export module M; + export void Hello() { + std::cout << "Hello.\n"; + } + +with the following one: + +.. 
code-block:: c++ + + module; + #include <iostream> + export module M; + export void Hello() { + std::cout << "Hello.\n"; + } + +.. code-block:: console + + $ clang++ -std=c++20 -xc++-system-header --precompile iostream -o iostream.pcm + $ clang++ -std=c++20 -fmodule-file=iostream.pcm --precompile M.cppm -o M.pcm + +In the latter example, Clang could find the module file for ``<iostream>``, +so it would try to replace the ``#include <iostream>`` with ``import <iostream>;`` automatically. + + +Relationships between Clang modules +----------------------------------- + +Header units have pretty similar semantics to Clang modules. +The semantics of both of them are like headers. + +In fact, we could even "mimic" the style of header units by Clang modules: + +.. code-block:: c++ + + module "iostream" { + export * + header "/path/to/libstdcxx/iostream" + } + +.. code-block:: console + + $ clang++ -std=c++20 -fimplicit-modules -fmodule-map-file=.modulemap main.cpp + +It would be simpler if we are using libcxx: + +.. code-block:: console + + $ clang++ -std=c++20 main.cpp -fimplicit-modules -fimplicit-module-maps + +Since there is already one +`module map `_ +in the source of libcxx. + +This immediately leads to the question: why don't we implement header units through Clang header modules? + +The main reason for this is that Clang modules have more semantics like hierarchy or +wrapping multiple headers together as a big module. +However, these things are not part of C++20 Header units, and we want to avoid the impression that these +additional semantics get interpreted as standard C++20 behavior. + +Another reason is that there are proposals to introduce module mappers to the C++ standard +(for example, https://wg21.link/p1184r2). +If we decide to reuse Clang's modulemap, we may get in trouble once we need to introduce another module mapper. 
+ +So the final answer for why we don't reuse the interface of Clang modules for header units is that +there are some differences between header units and Clang modules and that ignoring those +differences now would likely become a problem in the future. Index: clang/docs/index.rst =================================================================== --- clang/docs/index.rst +++ clang/docs/index.rst @@ -40,6 +40,7 @@ SafeStack ShadowCallStack SourceBasedCodeCoverage + CPlusPlus20Modules Modules MSVCCompatibility MisExpect