This is an archive of the discontinued LLVM Phabricator instance.

[C++20][Modules] Introduce an implementation module.
ClosedPublic

Authored by iains on Jun 3 2022, 5:15 AM.

Details

Summary

We need to be able to distinguish individual TUs from the same module in cases
where TU-local entities either need to be hidden (or, for some cases of ADL in
template instantiation, need to be detected as exposures).

This creates a module type for the implementation which implicitly imports its
primary module interface per C++20:
[module.unit/8] 'A module-declaration that contains neither an export-keyword
nor a module-partition implicitly imports the primary module interface unit of
the module as if by a module-import-declaration.

Implementation modules are never serialized (-emit-module-interface for an
implementation unit is diagnosed and rejected).

Diff Detail

Event Timeline

iains created this revision.Jun 3 2022, 5:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 3 2022, 5:15 AM
iains added a subscriber: Restricted Project.
iains published this revision for review.Jun 3 2022, 5:28 AM

we need to distinguish implementation and interface in code-gen.

Herald added a project: Restricted Project. · View Herald TranscriptJun 3 2022, 5:28 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Implementation modules are never serialized (-emit-module-interface for an implementation unit is diagnosed and rejected).

Never? Or just not via -emit-module-interface? I would expect to be able to serialize via -emit-ast.

iains added a comment.Jun 3 2022, 9:02 AM

Implementation modules are never serialized (-emit-module-interface for an implementation unit is diagnosed and rejected).

Never? Or just not via -emit-module-interface? I would expect to be able to serialize via -emit-ast.

My current intent is that this is a implementation detail (a mechanism for maintaining the module purview and type for Sema and CG).

As things stand, I do not think we are set up to handle (de-)serialization of two kinds of module with the same name, so I'd prefer to defer
possible extension to allow this for now (not against the principle, just trying avoid a rabbit hole that's tangential to the current purpose).

ChuanqiXu added inline comments.Jun 5 2022, 8:17 PM
clang/lib/Lex/ModuleMap.cpp
951

The implementation looks similar to createModuleForInterfaceUnit really. It looks better to refactor it to createModuleUnits (or createModuleForCXX20Modules) which containing an additional argument Module::ModuleKind. It would optimize the case for partitions too, which uses createModuleForInterfaceUnit now and it is a little bit odd.

clang/lib/Sema/SemaModule.cpp
356

I feel like the 3 comments are redundant.

393–397

We could move this logic to the place Import is created.

418–419

Is it necessary/helpful to return the import declaration? Although there is a FIXME, I think the current method looks a little bit confusing.

Implementation modules are never serialized (-emit-module-interface for an implementation unit is diagnosed and rejected).

Never? Or just not via -emit-module-interface? I would expect to be able to serialize via -emit-ast.

My current intent is that this is a implementation detail (a mechanism for maintaining the module purview and type for Sema and CG).

As things stand, I do not think we are set up to handle (de-)serialization of two kinds of module with the same name, so I'd prefer to defer
possible extension to allow this for now (not against the principle, just trying avoid a rabbit hole that's tangential to the current purpose).

I think Iain's "(de-)serialization" means "generating PCM files". And it makes no sense to block users to use emit-ast for implementation units. I don't think we have divergence here since I don't find codes which try to emit an error if we are attempting to generate a PCM file for an implementation unit.


I feel better if we could add some tests:

  • As mentioned above, emit an error when we trying to generate a PCM file for an implementation unit.
  • Currently, when we use implementation units, we need to specify -fmodule-file=/path/to/the/pcm/of/primary/interface in the command line. It looks like we could omit it after the patch (as long as the PCM file of primary interface lives in the path specified by -fprebuilt-module-path)
clang/lib/Frontend/FrontendActions.cpp
763

Since ModuleKindName() here is used as a helpful for printers, it looks odd to contain the should never be serialized part. I think we should check this when we try to generate PCM files for ModuleImplementationUnit.

iains planned changes to this revision.Jun 6 2022, 2:42 AM

thanks for the reviews, I need to figure out why the implementation passes clang tests but gives fails for clangd.

iains updated this revision to Diff 434774.Jun 7 2022, 4:33 AM

rebased and tidied (still changes planned).

urnathan resigned from this revision.Aug 16 2022, 12:21 PM
iains updated this revision to Diff 505759.Mar 16 2023, 3:36 AM

rebased, and reworked for changes in main.

iains added a comment.Mar 16 2023, 3:42 AM

This was originally created (and @ChuanqiXu approved) for the work on module initialisers. I did not apply it then, since it was possible to determine the correct state for the initialisers without it.

However, now we are trying to handle some of the details of different lookup requirements between same-module and same-TU, I think it is a foundation for that. This also means that implementation units will get their own TU decl and have a distinct decl context (rather than the current case that we share the decl context of the interface).

The units are never serialised (as noted above) but we still need to allocate an extra bit for the enum (since we now have 9 values); this is not a significant load (since it adds only one bit to each module header).

(I don't why I can't send emails in the original mail address. It told me that your email addressed is not recorded by MX, while I don't know what is MX. So I tried to reply your email here)

This FIXME was added 6 years ago for Modules TS. (https://github.com/llvm/llvm-project/commit/becb92dec85924969ac0c3b049e0a74def431453).
And I moved the test simply without modifying it actually.

And I guess it may refer to the problem we are discussing. The spec says the implementation unit will import the primary module interface implicitly.
But the current implementation doesn't import it actually but loading it. This is inconsistent with the wording and it is not consistent with the design intention in clang.
It resulted some problems we saw.

I don't remember clearly if we have a test for the initializer in the implementation unit. Ignore this if we already have one.

clang/include/clang/Basic/Module.h
140

We may be able to save 1 bit for ModuleKind by adding two field bits to Module: IsImplementation and IsPartition. Then we can remove ModulePartitionInterface and ModulePartitionImplementation. Then let's rename ModuleInterfaceUnit to NamedModuleUnit.
So we can judge module interface unit, implementation unit, module partition interface and module partition implementation by NamedModuleUnit and the two bits.

clang/lib/Sema/SemaDecl.cpp
1678–1683

I feel slightly better to add a field in Sema:

Module *PrimaryModuleInterface = nullptr; // Only valid if we're parsing a module unit.

Then we can avoid the string compare here. Also we can add it to Sema::isUsableModule. Now Sema::isUsableModule works because it falls to the string comparison.

clang/lib/Sema/SemaModule.cpp
305

Then we can avoid to declare this if we have Sema::PrimaryModuleInterface

409–410

ModuleScopes can't be empty here.

clang/test/CXX/module/basic/basic.def.odr/p4.cppm
155–156

nit: I am a little bit tired to add FIXME in the tests... let's not try to add more...

iains marked 10 inline comments as done.Mar 17 2023, 2:07 AM

General comments (at least my opinion).

  1. One intention of this patch is to make Implementation Module Units regular (i.e. they should behave the same as any other module unit). So I would tend to avoid changes that make this back into a "special case"
  1. The lookup problems might be considered to be quite a serious bug, and so we should consider possible back porting and try to avoid making large changes in theses patches (once that problem is solved, then we can refactor, I think).
  1. We do have tests for the initialiser in implementation units.

And I guess it may refer to the problem we are discussing. The spec says the implementation unit will import the primary module interface implicitly.
But the current implementation doesn't import it actually but loading it. This is inconsistent with the wording and it is not consistent with the design intention in clang.
It resulted some problems we saw.

This patch implements the remaining pieces of that (including "as if by an import).

clang/include/clang/Basic/Module.h
140

Yes I agree we could do this, but let's keep refactoring as a follow-on job.

clang/lib/Frontend/FrontendActions.cpp
763

OK. I can remove that

We do already check and refuse to generate a PCM for a module implementation (if you put -emit-module-interface, and the file contains module Foo; it will be rejected).

clang/lib/Lex/ModuleMap.cpp
951

OK I did think about refactoring that, but decided to keep the changes simple for now. I am happy to look at refactoring.

clang/lib/Sema/SemaDecl.cpp
1678–1683

I also thought about doing this, but decided that we should try to keep the interface regular - so that all modules are treated the same.

If we find that the string comparison for primary module name is a "hot spot" then we should address it a different way - by caching an identifier or similar (since we need to compare it in other places too).

clang/lib/Sema/SemaModule.cpp
305

see comment above.

356

OK..

393–397

We cannot move all of it - since some of the information is being added to the module that we create after the import.

409–410

OK.

418–419

The FIXME predates our work on this, at the moment I do not know what the idea was for that - I am happy to remove it now - the current implementation does not need a module decl.

clang/test/CXX/module/basic/basic.def.odr/p4.cppm
155–156

I agree that FIXMEs in tests are likely to be forgotten, so that they are not so much use.

However, we have a situation in which the test needs to be incorrect in order to pass at the present time; we do not have a good mechanism for noting this - since if we add the expected-error, then the test will fail - so we'd have to XFAIL it - which would mean that we would lose test coverage for the cases that do work.

If you prefer, I can remove the FIXME.

The lookup problems might be considered to be quite a serious bug, and so we should consider possible back porting and try to avoid making large changes in theses patches (once that problem is solved, then we can refactor, I think).

I agree it is important to fix the bug. But I don't think we should back port this one to 16. On the one hand, this is not a regression bug. On the other hand, the existing users (we already have a lot users now) didn't report such problems. So I prefer to target 17 for this. And I don't mind to do refactoring after landing the fixing patch. Since we're looking at the code actively.

clang/include/clang/Basic/Module.h
140

OK. I don't think this is a blocking issue too. Let's have one FIXME for it.

clang/lib/Sema/SemaDecl.cpp
1678–1683

It just smells bad. I did profiling for the performance for modules these days. And it looks like the string comparison can hard to be the hot spot. The bottleneck lives in other places. The string comparison may be in the long tail. So we might not be able to prove this one by giving numbers. But it indeed smells bad and we can solve it simply. So let's make it. I think it is important to improve our code quality.

clang/test/CXX/module/basic/basic.def.odr/p4.cppm
155–156

I don't feel bad to remain this one. Since it is a little bit special as you said.

iains updated this revision to Diff 507535.Mar 22 2023, 3:57 PM
iains marked 12 inline comments as done.
iains edited the summary of this revision. (Show Details)

rebased, addressed review comments, and amended the description.

the patch has been repurposed to provide a mechanism to track implementation
TUs as separate entities which allows us to deal with handling Tu-local
entities better.

iains marked an inline comment as done.Mar 22 2023, 4:01 PM
iains added inline comments.
clang/lib/Sema/SemaDecl.cpp
1678–1683

I have done this (for now) to make progress.

but, actually, in this case, I do not agree with the mechanism

.. instead of making this into a special case we should (as part of the following fixes to lookup) make efficient helpers in module / sema that allow us to see "same TU", "same named module" etc. That way we can improve the performance of any one of those if it becomes an issue.

LGTM. Thanks.

clang/include/clang/Sema/Sema.h
2278
clang/lib/Sema/SemaDecl.cpp
1678–1683

Yeah, I tried to use hash algorithm to avoid the string comparison. But there is a concern that the hash algorithm is not safe. And I found the string comparison is not the bottle neck then. So I think it may not be too late to perform it actually if there is related bug report.

ChuanqiXu accepted this revision.Mar 22 2023, 7:06 PM
This revision is now accepted and ready to land.Mar 22 2023, 7:06 PM
This revision was landed with ongoing or failed builds.Mar 23 2023, 5:48 AM
This revision was automatically updated to reflect the committed changes.
iains marked an inline comment as done.
bjope added a subscriber: bjope.Mar 26 2023, 1:36 PM

This seem to case problems when building with asan enabled (LLVM_USE_SANITIZER='Address'):

Failed Tests (24):
  Clang :: CXX/basic/basic.link/p2.cpp
  Clang :: CXX/basic/basic.lookup/basic.lookup.argdep/p5-ex2.cpp
  Clang :: CXX/basic/basic.scope/basic.scope.namespace/p2.cpp
  Clang :: CXX/module/basic/basic.def.odr/p4.cppm
  Clang :: CXX/module/basic/basic.def.odr/p6.cppm
  Clang :: CXX/module/basic/basic.link/module-declaration.cpp
  Clang :: CXX/module/basic/basic.link/p2.cppm
  Clang :: CXX/module/dcl.dcl/dcl.module/dcl.module.import/p1.cppm
  Clang :: CXX/module/dcl.dcl/dcl.module/dcl.module.interface/p1.cppm
  Clang :: CXX/module/dcl.dcl/dcl.module/p1.cpp
  Clang :: CXX/module/dcl.dcl/dcl.module/p5.cpp
  Clang :: CXX/module/module.interface/p1.cpp
  Clang :: CXX/module/module.interface/p2.cpp
  Clang :: CXX/module/module.unit/p8.cpp
  Clang :: CodeGenCXX/cxx20-module-impl-1a.cpp
  Clang :: CodeGenCXX/module-intializer.cpp
  Clang :: Modules/cxx20-10-1-ex1.cpp
  Clang :: Modules/cxx20-10-1-ex2.cpp
  Clang :: Modules/cxx20-10-2-ex5.cpp
  Clang :: Modules/cxx20-10-3-ex2.cpp
  Clang :: Modules/cxx20-impl-module-conditionally-load.cppm
  Clang :: Modules/cxx20-import-diagnostics-a.cpp
  Clang :: Modules/cxx20-partition-redeclarations.cpp
  Clang :: Modules/pr58532.cppm

Here is a typical log from Modules/cxx20-partition-redeclarations.cpp:

==80746==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 2080 byte(s) in 1 object(s) allocated from:
    #0 0x61ea6d in operator new(unsigned long) /repo/old//compiler-rt/lib/asan/asan_new_delete.cpp:99:3
    #1 0x11a9ae7a in clang::ModuleMap::createModuleUnitWithKind(clang::SourceLocation, llvm::StringRef, clang::Module::ModuleKind) /repo/clang/lib/Lex/ModuleMap.cpp:894:7
    #2 0x11a9bc57 in clang::ModuleMap::createModuleForImplementationUnit(clang::SourceLocation, llvm::StringRef) /repo/clang/lib/Lex/ModuleMap.cpp:933:7
    #3 0xe84a76d in clang::Sema::ActOnModuleDecl(clang::SourceLocation, clang::SourceLocation, clang::Sema::ModuleDeclKind, llvm::ArrayRef<std::__1::pair<clang::IdentifierInfo*, clang::SourceLocation>>, llvm::ArrayRef<std::__1::pair<clang::IdentifierInfo*, clang::SourceLocation>>, clang::Sema::ModuleImportState&) /repo/clang/lib/Sema/SemaModule.cpp:350:17
    #4 0xd347fc5 in clang::Parser::ParseModuleDecl(clang::Sema::ModuleImportState&) /repo/clang/lib/Parse/Parser.cpp:2478:18
    #5 0xd3451b4 in clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&, clang::Sema::ModuleImportState&) /repo/clang/lib/Parse/Parser.cpp:658:14
    #6 0xd343014 in clang::Parser::ParseFirstTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&, clang::Sema::ModuleImportState&) /repo/clang/lib/Parse/Parser.cpp:593:26
    #7 0xd32c624 in clang::ParseAST(clang::Sema&, bool, bool) /repo/clang/lib/Parse/ParseAST.cpp:161:25
    #8 0x9a0a494 in clang::FrontendAction::Execute() /repo/clang/lib/Frontend/FrontendAction.cpp:1058:8
    #9 0x98025a0 in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) /repo/clang/lib/Frontend/CompilerInstance.cpp:1048:33
    #10 0x9ca0f3d in clang::ExecuteCompilerInvocation(clang::CompilerInstance*) /repo/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:264:25
    #11 0x63ad58 in cc1_main(llvm::ArrayRef<char const*>, char const*, void*) /repo/clang/tools/driver/cc1_main.cpp:251:15
    #12 0x62d5cc in ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) /repo/clang/tools/driver/driver.cpp:366:12
    #13 0x628173 in clang_main(int, char**, llvm::ToolContext const&) /repo/clang/tools/driver/driver.cpp:407:12
    #14 0x65de41 in main /repo/tools/clang/tools/driver/clang-driver.cpp:15:10
    #15 0x7fd7c0406554 in __libc_start_main (/lib64/libc.so.6+0x22554)

SUMMARY: AddressSanitizer: 2080 byte(s) leaked in 1 allocation(s).

It is for example seen in this buildbot (sanitizer-x86_64-linux-bootstrap-asan): https://lab.llvm.org/buildbot/#/builders/168

iains added a comment.Mar 26 2023, 8:22 PM

This seem to case problems when building with asan enabled (LLVM_USE_SANITIZER='Address'):

investigating... do you need the patch reverted?

bjope added a comment.Mar 27 2023, 1:18 AM

This seem to case problems when building with asan enabled (LLVM_USE_SANITIZER='Address'):

investigating... do you need the patch reverted?

I've reverted it downstream right now, so personally I don't have a problem with this right now. Still, unless it will go quickly to find a fix it might help others people if reverting (it was not obvious to me which patch that caused the failures, and others might still be bisecting trying to find which commit that caused the trouble).

hctim added a subscriber: hctim.Mar 27 2023, 5:04 AM

Hi, yep, I've reverted upstream. If you could please also integrate https://reviews.llvm.org/rG8c7c1f11ffaacf762e612c65440fd2cbb58ee426 in the relanding, that would be great.

iains reopened this revision.Mar 27 2023, 7:54 AM

I had a hunch that the issue was the non-ownership of the implementation module.

This revision is now accepted and ready to land.Mar 27 2023, 7:54 AM
iains updated this revision to Diff 508662.Mar 27 2023, 7:55 AM

rebased, reworked to have the module owned.

The implementation module needs to be owned by the mpodul map so that
it is released when done. Reworked the comments and assert to check the
main file presence.

iains added a comment.Mar 27 2023, 7:58 AM

It took me a while to get my local macOS based devt. environment to reproduce the problem - but it was as expected; the implementation module was unowned and nothing was deleting it.

unless CI throws up anything untoward, I plan to re-land this tomorrow (since there is dependent work in my queue and this blocks or at least impedes progress there).

iains updated this revision to Diff 509014.Mar 28 2023, 7:52 AM

rebased, retested.

ChuanqiXu accepted this revision.Mar 28 2023, 6:49 PM
ChuanqiXu added inline comments.
clang/lib/Lex/ModuleMap.cpp
935

nit: It should more consistent to use <ImplementationUnit>.

iains marked an inline comment as done.Mar 28 2023, 8:29 PM
iains added inline comments.
clang/lib/Lex/ModuleMap.cpp
935

(I do not mind making this change later, if you like - but I did not want to repeat the test cycle today).

In this case I think that <> would not mean the same thing as it does in <global> and <private> modules fragments, These are never entered into the module map - but are always owned by a parent module.

In the case of the ImplementationUnit - the *name* of the module is unchanged (i.e. it would still be called M when the module line is module M;.

The .ImplementationUnit is not the name of the module - but rather it is a key into the module map (which needs to be different from the name of the interface), Since there can only be one Implementation Unit in a given session, it is safe to use a fixed key.

However, I do not mind changing the key to <ImplementationUnit> if you think that it would be more clear.

ChuanqiXu added inline comments.Mar 28 2023, 8:35 PM
clang/lib/Lex/ModuleMap.cpp
935

Thanks, I think it doesn't matter too.