This is an archive of the discontinued LLVM Phabricator instance.

[C++20][Modules][3/8] Initial handling for module partitions.
ClosedPublic

Authored by iains on Jan 31 2022, 2:22 AM.

Details

Summary

This implements the parsing and recognition of module partition CMIs
and removes the FIXMEs in the parser.

Module partitions are recognised in the base computation of visibility,
however additional amendments to visibility follow in subsequent patches.

Add initial testcases for partition handling, derived from the examples in
Section 10 of the C++20 standard, which identifies what should be accepted
and/or rejected.

Diff Detail

Event Timeline

iains created this revision.Jan 31 2022, 2:22 AM
iains updated this revision to Diff 405588.Feb 3 2022, 5:09 AM

rebased onto the import state machine.

iains updated this revision to Diff 405590.Feb 3 2022, 5:24 AM

reapply rebase

iains updated this revision to Diff 408760.Feb 15 2022, 1:23 AM
iains added a comment.Feb 15 2022, 1:23 AM

Rebased onto other modules work.

Also had to bump DIAG_SIZE_PARSE, since the existing allocation was full.

iains updated this revision to Diff 408855.Feb 15 2022, 6:49 AM

update after fixing (most) lint suggestions.

The change to DiagnosticIDs.h is not going to be applicable - it would need the
whole file to be rewritten and that change is incidental to this patch series.

iains published this revision for review.Feb 15 2022, 8:30 AM

this is the third patch in an 8 patch series to implement basic module partition support.
The change to clang/include/clang/Basic/DiagnosticIDs.h is required for this to build (at present) but is otherwise incidental.

Herald added a project: Restricted Project. · View Herald TranscriptFeb 15 2022, 8:30 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Oh, I did similar thing in our downstream version. I planned to show it in next few weeks. Now it looks I could only give up on it...

iains updated this revision to Diff 409201.Feb 16 2022, 3:46 AM

Update to rebase on changes to parent patches and formatting checks.

while doing this made a check for module partitions more efficient.

iains retitled this revision from [C++20][Modules] Initial handling for module partitions. to [C++20][Modules][3/8] Initial handling for module partitions..Feb 16 2022, 3:46 AM
urnathan added inline comments.Feb 16 2022, 6:47 AM
clang/include/clang/Sema/Sema.h
2987–2988

Is NamePath really a better name? You've not consistently changed all Path's to this, and it doesn;t strike me as particularly mnemonic.

clang/lib/Parse/Parser.cpp
2393–2394

perhaps now we should just remove modules-ts references as drive-by cleanups?

I think we should add 2 more tests at least:
Import a partition a module purview but we failed to find one. And test reflects the successful case.

clang/include/clang/Sema/Sema.h
2987–2988

I use Path in my implementation.

clang/lib/Sema/SemaDecl.cpp
1611

We lack a comma here.

clang/lib/Sema/SemaModule.cpp
233–234

I prefer to add more words here to tell why we create an interface unit but the above condition is ModuleDeclKind::Implementation.

268

How about: Mod->Kind != Module::ModulePartitionImplementation.

353–365

I think we could change the signature of the lat 2 fields to

ModuleIdPath Path,
bool IsPartition)

I feel this is more simpler.

357

I prefer to remove the support for getLangOpts().ModulesTS on the fly.

366–367

I prefer to move the variable to the following block.

368

I prefer to add an assertion:

assert (ModuleScopes.back().Module.isModulePurview());
374–375

I would like to wrap this one as a method of class Module:

StringRef getPrimaryModuleName(Module *M) {
    if (not partition)
       return M->Name;
    return M->Name.split(":").first;
}
380

We could move this below the if statement.

441–459

I don't find the reason to refactor here. It looks NFC and I think the original form is simpler.

484–487

What does it assert? I don't get the point.

clang/test/Modules/cxx20-import-diagnostics-a.cpp
57 ↗(On Diff #409201)

What if we don't add this? I think the original is good. So we should add a new test if we feel needed.

ChuanqiXu added inline comments.Feb 17 2022, 1:23 AM
clang/lib/Sema/SemaModule.cpp
185

I chose '-' in my implementation since here ModuleName refers to the name in filesystem, right? And I remember '-' is the choice of GCC. So let's keep consistency.

ChuanqiXu added inline comments.Feb 17 2022, 1:26 AM
clang/test/Modules/cxx20-partition-diagnostics-a.cpp
4–6

We could use -fsyntax-only instead of -o /dev/null

ChuanqiXu added inline comments.Feb 17 2022, 1:32 AM
clang/lib/Sema/SemaModule.cpp
374–375

Oh, I found it made it in following patches. I think the series of patch is a little bit sparse.

iains updated this revision to Diff 409552.Feb 17 2022, 2:12 AM

address review comments, rebase onto parent patches.

iains marked 7 inline comments as done.Feb 17 2022, 2:27 AM

some comments remain to be addressed - this update addressed Nathan's primarily - but also cover some of Chuanqi's

clang/include/clang/Sema/Sema.h
2987–2988

I\ve reverted my changes - leaving the variable as "Path"

This is a problem with the code serving two masters ...

... for clang hierachical modules "Path" makes some sense
... for C++20 not really - it is not a pathname (confused me when first reading the code).

however "NamedModuleNameAsPath" is too long ;)

clang/lib/Parse/Parser.cpp
2393–2394

will do this generally now .. given consensus amongst active reviewers.

clang/lib/Sema/SemaModule.cpp
185

This is not my understanding.

ModuleName is the "user-facing" name of the module that is described in the source, and we use this in diagnostics etc.

The translation of that name to an on-disk name can be arbitrary and there are several mechanisms in clang already (e.g. -fmodule-file=A:B=foo.pcm) the module loader uses these to find the module on the basis of its user-facing name where required.

When we have P1184, it is even more important that the interface is supplied with the name that the user will put into the source code.

233–234

I am sorry, perhaps 4/8 should have been squashed with this patch - but this was getting quite large anyway - please see 4/8 for reorganisation of this code.

268

please see 4/8

353–365

please see 4/8

357

agreed in comments, but I think it is outside of my remit to remove the functionality?

380

please see 4/8 and later

484–487

that we should have created a module if the compilation task was to build one. IIRC this was an existing assert .. but I will check.

urnathan added inline comments.Feb 17 2022, 4:19 AM
clang/lib/Sema/SemaModule.cpp
185

I agree with Iain, we should use ':' is module names here. When mapping a named module to the file system some translation is needed because ':' is not permitted in file names on some filesystems (you know the ones!)

357

We can't just drop the -fmodules-ts option. I think its semantics should be[come] the same as CPlusPlusModules. That's not a change for this patch.

iains updated this revision to Diff 409614.Feb 17 2022, 5:46 AM
iains marked 9 inline comments as done.

address review comments

clang/lib/Sema/SemaModule.cpp
185

(just to expand a little more)

the on-disk name needs to be chosen suitably for the platform and by the user and/or the build system.

When the FE chooses a default filename (which is done in building jobs, not in the Parser of course) it chooses one based on the source filename. It would be most unfortunate if the Parser/Sema needed to understand platform filename rules.

When you do 'clang -module-file-info' (with the existing or updated version) you should see the module name as per the source code, the translation would only apply to the filename itself.

  • to answer a later comment:

when we do -fmodule-file=A_B.pcm and A_B.pcm contains a module named A:B the loader notices this pairing when it pre-loads the module, so that when we ask for "A:B" the loader already knows which on-disk file contains. it.

if we use the HeaderSearch mechanisms (when we want to figure out a module-name<=> filename pairing without loading the module) we can use -fmodule-name=A:B=A_B.pcm,

These mechanisms work today, but P1184 is a more flexible mechanism and avoids having massive command lines with many -fmodule-file= cases.

357

that was what I expected.

366–367

again the code is reorganised in 4/8

441–459

it is not NFC:

  1. we add the C++20 case (the name is a single identifier)
  2. we avoid repeating the check for an empty Path (the first loop would effectively check that too).
clang/test/Modules/cxx20-import-diagnostics-a.cpp
57 ↗(On Diff #409201)

OK, probably it should have been in from the start - but we will also check these things in the tests taken from the standard, so I don't think it is important here; removed.

clang/test/Modules/cxx20-partition-diagnostics-a.cpp
4–6

heh, I usually do this not sure why I did not here; changed.

urnathan accepted this revision.Feb 17 2022, 5:57 AM

thanks

This revision is now accepted and ready to land.Feb 17 2022, 5:57 AM
ChuanqiXu added inline comments.Feb 17 2022, 10:52 PM
clang/lib/Sema/SemaModule.cpp
185

But what if we need to import multiple modules? In our current workflow, we would like to compile importing module in following style:

clang++ -std=c++20 -fprebuilt-module-path=path1 -fprebuilt-module-path=path2 ... unit.cpp(m) ...

And unit.cpp(m) may look like:

export module M;
import :parta;
import :partb;
import :partc;

And our compiler would lookup M-parta.pcm, M-partb.pcm and M-partc.pcm in the path given in the command line. However, in current implementation, it would lookup M:parta.pcm, M:partb.pcm and M:partc.pcm in the path but it might fail. So my point here is that the current implementation doesn't work well with fprebuilt-module-path. And I don't think we should give up fprebuilt-module-path to support multiple fmodule-file. The fprebuilt-module-path is easier to understand and use for real projects. Even if we support multiple -fmodule-file, people might be frustrating to add many fmodule-file if they need to import many units.

So I really insist on this. Or at least let's add one option, something like -fmodule-partition-separator here.

iains added inline comments.Feb 18 2022, 1:47 AM
clang/lib/Sema/SemaModule.cpp
185

The semantics of clang hierarchical module names are different from C++20 module names.

C++20 does not specify any relationship between the module name and the filename - that task is for the user, build system (or in the case we have with clang where it can integrate some of that functionality, the lookup mechanism).

out of interest if the user puts:

export module A.B:C.D;

how do you represent that on disk in your implementation?


In my understanding, if the user does:

export module A.B:C.D;

clang -cc1 -std=c++20 -emit-module-interface foo.cpp -o xyzzy.pcm

then we should see :

clang -module-file-info xyzzy.pcm

Information for module file 'xyzzy.pcm':
  Module format: raw
  Generated by this Clang: (/repos/llvm-git.git 97fc6544d6a09f161d90417db05674a2b3d2a5fe)
  Module name: A.B:C.D

I think it would be wrong if that printed:
Module name: A.B-C.D

The primary module name is A.B (_not_ A) and there is no implication that we would see
A/B/...

What would happen if we ported clang to some (unknown, unlikely) platform where '-' was not a legal pathname character - we should not expect to have to change Sema for this, right?


In summary my understanding is:

  • The compiler (Parser/Sema) should know the C++20 module name as it is written by the user.
  • You can achieve your objective of using the prebuilt-module-path, but the translation from C++20 module name to on-disk name needs to happen in the lookup, not in artificially changing the module name internally to the compiler.

As a general point, there are already (too) many modules command line flags, and we will probably need to add a few more for C++20. I have tried in my implementation to keep separate those for C++20 and those for implicit / hierarchical and modules ts.

IMHO it would be better for the user if modules were modal in the driver - and the driver rejected options that do not apply to the mode in action - but that is not for this patch!

urnathan added inline comments.Feb 18 2022, 3:53 AM
clang/lib/Sema/SemaModule.cpp
185

Information should be presented to the user in the form the user understands -- M-P is not such a format. As Iain says, the association from module names to file names is arbitrary, and baking (a default?) into Sema is not the right thing.

In some cases the compiler may need to tell the user both. For instance

cannot find module 'M:P' (random/path/M-P.pcm)

Let's not derail the initial pieces of partitions with this complex mapping problem.

ChuanqiXu accepted this revision.Feb 20 2022, 6:09 PM
ChuanqiXu added inline comments.
clang/lib/Sema/SemaModule.cpp
185

I still think the compiler should have a -fmodule-partition-separator option to tell the partition seperator in filesystem to ease the use of C++20 modules. It looks like that the current implementation couldn't handle the case that a module unit imports multiple partitions.

But I agree with there already too many options about clang modules. And I agree with we could handle the problem in later patches. I have no objection to stop the patch landing.

I think it is helpful to add some tests from https://reviews.llvm.org/D113972

iains added a comment.Feb 21 2022, 4:30 AM

I think it is helpful to add some tests from https://reviews.llvm.org/D113972

Actually, in response to your concerns, this morning I wrote a multi-partition test case to add to this (which does pass).

Would you be able to split out the important test cases from D113972 as a separate patch ?

I think it is helpful to add some tests from https://reviews.llvm.org/D113972

Actually, in response to your concerns, this morning I wrote a multi-partition test case to add to this (which does pass).

Thanks. I think it would be helpful to add it here.

Would you be able to split out the important test cases from D113972 as a separate patch ?

I found the meaningful test case mainly comes from the example in spec. e.g., http://eel.is/c++draft/module.unit#4, http://eel.is/c++draft/module.unit#4, http://eel.is/c++draft/module.unit#4

iains added a comment.Feb 21 2022, 4:51 AM

I think it is helpful to add some tests from https://reviews.llvm.org/D113972

Actually, in response to your concerns, this morning I wrote a multi-partition test case to add to this (which does pass).

Thanks. I think it would be helpful to add it here.

Would you be able to split out the important test cases from D113972 as a separate patch ?

I found the meaningful test case mainly comes from the example in spec. e.g., http://eel.is/c++draft/module.unit#4, http://eel.is/c++draft/module.unit#4, http://eel.is/c++draft/module.unit#4

Right, actually it is my intention to have each of the examples in the std as a test case (this is what I have in my working branch) but, obviously, we can only introduce them when the features they show are implemented. So (for example) 4/8 adds 10.1 examples 1 and 2. 5/8 adds in 10.3 ex1 and ex2 .. and so on ..

However, maybe I should review if some of the others should be accepted already :)

iains updated this revision to Diff 410320.Feb 21 2022, 8:54 AM

rebased, added a multi-partition testcase.

(May off the patch)

BTW, I know you also work on GCC frontend. I want to consultant what's your opinion to uniform the command line options between Clang and GCC. Now it looks totally different. I feel it would be a problem for clang. Since the command line between clang and gcc are basically compatible so that user who decide to use clang could turn back to use gcc whenever he want. But I am worrying that it would be a harder decision for users if them are going to use clang. Since the cost to use GCC again would be higher.

(May off the patch)

I am not sure what you mean here.

BTW, I know you also work on GCC frontend. I want to consultant what's your opinion to uniform the command line options between Clang and GCC. Now it looks totally different. I feel it would be a problem for clang. Since the command line between clang and gcc are basically compatible so that user who decide to use clang could turn back to use gcc whenever he want. But I am worrying that it would be a harder decision for users if them are going to use clang. Since the cost to use GCC again would be higher.

We have the same objective: to keep user-facing options and functionality the same between clang and GCC (easier for both user and for build systems).
The content of this patch does not alter that. BTW GCC also keeps the module name as "Primary:Parition" internally and user-facing output. The module name in C++20 modules has no fixed mapping to the BMI/CMI filename on disk,.

(May off the patch)

I am not sure what you mean here.

I mean this comment is not related to the patch. So it doesn't mean to block the patch landing.

BTW, I know you also work on GCC frontend. I want to consultant what's your opinion to uniform the command line options between Clang and GCC. Now it looks totally different. I feel it would be a problem for clang. Since the command line between clang and gcc are basically compatible so that user who decide to use clang could turn back to use gcc whenever he want. But I am worrying that it would be a harder decision for users if them are going to use clang. Since the cost to use GCC again would be higher.

We have the same objective: to keep user-facing options and functionality the same between clang and GCC (easier for both user and for build systems).
The content of this patch does not alter that. BTW GCC also keeps the module name as "Primary:Parition" internally and user-facing output. The module name in C++20 modules has no fixed mapping to the BMI/CMI filename on disk,.

I just tested the behavior of GCC. When I compile a partition interface unit (let's name is mod:part), it would generate a file named mod-part,gcm under the directory gcm.cache. So I feel what GCC does here is more aggressive, it doesn't only choose the separator '-'. It would create an cache directory by default. (I am not blaming it. I like the style too. Otherwise, we need to sore CMI to a place by hand). What I want to say here is that GCC would replace '-' to ':' when maps to the filesystem. I am OK to not implement this in the series patch. But I hope we could get in consensus that we should implement this.

(May off the patch)

I am not sure what you mean here.

I mean this comment is not related to the patch. So it doesn't mean to block the patch landing.

BTW, I know you also work on GCC frontend. I want to consultant what's your opinion to uniform the command line options between Clang and GCC. Now it looks totally different. I feel it would be a problem for clang. Since the command line between clang and gcc are basically compatible so that user who decide to use clang could turn back to use gcc whenever he want. But I am worrying that it would be a harder decision for users if them are going to use clang. Since the cost to use GCC again would be higher.

We have the same objective: to keep user-facing options and functionality the same between clang and GCC (easier for both user and for build systems).
The content of this patch does not alter that. BTW GCC also keeps the module name as "Primary:Parition" internally and user-facing output. The module name in C++20 modules has no fixed mapping to the BMI/CMI filename on disk,.

I just tested the behavior of GCC. When I compile a partition interface unit (let's name is mod:part), it would generate a file named mod-part,gcm under the directory gcm.cache. So I feel what GCC does here is more aggressive, it doesn't only choose the separator '-'. It would create an cache directory by default. (I am not blaming it. I like the style too. Otherwise, we need to sore CMI to a place by hand). What I want to say here is that GCC would replace '-' to ':' when maps to the filesystem. I am OK to not implement this in the series patch. But I hope we could get in consensus that we should implement this.

I think I am not explaining well enough ...

Please execute "strings gcm.cache/mod-part,gcm" - you will see that the Module name is "mod:part" in the module file (GCC knows the module as mod:part internally, it is only the on-disk name that is changed).

We could choose to use "mod-part.pcm" in our example test cases here, but it would not make any difference to the requirement - the mapping between module name and filename has to be determined by the mechanism that interfaces with the filesystem (we cannot impose that mapping on the module names internally to the compiler - since the mapping is unknown in the general case).

So we would implement a basic interface (in the module loader, I suppose since that is available for jobs without a preprocessor) that would give pathNameForModuleName(ModuleName) ,... and probably moduleNameForPathName(PathName) ... those would interface to whatever means was used (e.g. ,module mapper, P1184 etc). You could arrange for the default implementation to do the translation you see in GCC (mod:part <=> mod-part.pcm).

(May off the patch)

I am not sure what you mean here.

I mean this comment is not related to the patch. So it doesn't mean to block the patch landing.

BTW, I know you also work on GCC frontend. I want to consultant what's your opinion to uniform the command line options between Clang and GCC. Now it looks totally different. I feel it would be a problem for clang. Since the command line between clang and gcc are basically compatible so that user who decide to use clang could turn back to use gcc whenever he want. But I am worrying that it would be a harder decision for users if them are going to use clang. Since the cost to use GCC again would be higher.

We have the same objective: to keep user-facing options and functionality the same between clang and GCC (easier for both user and for build systems).
The content of this patch does not alter that. BTW GCC also keeps the module name as "Primary:Parition" internally and user-facing output. The module name in C++20 modules has no fixed mapping to the BMI/CMI filename on disk,.

I just tested the behavior of GCC. When I compile a partition interface unit (let's name is mod:part), it would generate a file named mod-part,gcm under the directory gcm.cache. So I feel what GCC does here is more aggressive, it doesn't only choose the separator '-'. It would create an cache directory by default. (I am not blaming it. I like the style too. Otherwise, we need to sore CMI to a place by hand). What I want to say here is that GCC would replace '-' to ':' when maps to the filesystem. I am OK to not implement this in the series patch. But I hope we could get in consensus that we should implement this.

I think I am not explaining well enough ...

Please execute "strings gcm.cache/mod-part,gcm" - you will see that the Module name is "mod:part" in the module file (GCC knows the module as mod:part internally, it is only the on-disk name that is changed).

We could choose to use "mod-part.pcm" in our example test cases here, but it would not make any difference to the requirement - the mapping between module name and filename has to be determined by the mechanism that interfaces with the filesystem (we cannot impose that mapping on the module names internally to the compiler - since the mapping is unknown in the general case).

So we would implement a basic interface (in the module loader, I suppose since that is available for jobs without a preprocessor) that would give pathNameForModuleName(ModuleName) ,... and probably moduleNameForPathName(PathName) ... those would interface to whatever means was used (e.g. ,module mapper, P1184 etc). You could arrange for the default implementation to do the translation you see in GCC (mod:part <=> mod-part.pcm).

Thanks for the detailed explanation. I think we're in consensus now.

jansvoboda11 added inline comments.
clang/test/Modules/cxx20-multiple-partitions.cpp
49

Instead of splitting the file with preprocessor, you could use the split-file utility (introduced in D83834).

iains updated this revision to Diff 410498.Feb 22 2022, 3:26 AM

rebased, update testcases to use split-file.

iains marked an inline comment as done.Feb 22 2022, 3:27 AM
iains added inline comments.
clang/test/Modules/cxx20-multiple-partitions.cpp
49

thanks!
this is exactly what is needed here.
I've converted the added cases here (and will do for the follow-on patches).

This revision was landed with ongoing or failed builds.Feb 24 2022, 1:01 AM
This revision was automatically updated to reflect the committed changes.
iains marked an inline comment as done.