This is an archive of the discontinued LLVM Phabricator instance.

Modules: add command line option to support loading prebuilt modules on demand, without parsing any module map
ClosedPublic

Authored by manmanren on Aug 3 2016, 10:43 AM.

Details

Summary

We add -fmodules-use-prebuilt-modules to support using prebuilt modules. In this mode, there is no need to load any module map and the programmer can simply use "@import" syntax to load the module directly from module cache without parsing any module map. We don't support rebuilding of the modules and ignore configuration mismatches.

Diff Detail

Event Timeline

manmanren updated this revision to Diff 66677.Aug 3 2016, 10:43 AM
manmanren retitled this revision from to Modules: add command line option to support loading prebuilt modules on demand, without parsing any module map.
manmanren updated this object.
manmanren added a reviewer: benlangmuir.
manmanren added a subscriber: cfe-commits.
benlangmuir edited edge metadata.Aug 3 2016, 11:09 AM

How about -fmodules-use-prebuilt-module-cache for the flag name? Saying "prebuilt-modules" is confusing to me, since -fmodule-file can also be used to load a prebuilt module, but doesn't use a cache.

lib/Driver/Tools.cpp
5416
if (IsPrebuilt) {
lib/Frontend/CompilerInstance.cpp
1491

Why do we assume this?

manmanren added inline comments.Aug 3 2016, 11:12 AM
lib/Driver/Tools.cpp
5416

Will do.

lib/Frontend/CompilerInstance.cpp
1491

Thanks for reviewing so quickly!

I was not sure about what value we should set IsSystem. Since we are mostly prebuilding system modules, I assumed "true". Do you have any suggestion here?

Sorry for the delay, I apparently forgot to hit submit. Replied inline.

lib/Frontend/CompilerInstance.cpp
1491

I guess treating it as system is the more forgiving option, so I'm okay with keeping it. It's probably worth adding a comment that this is not always the correct choice, since it can affect diagnostics, etc.

manmanren updated this revision to Diff 67431.Aug 9 2016, 4:07 PM
manmanren edited edge metadata.

Add comments for setting IsSystem to true.

benlangmuir accepted this revision.Aug 9 2016, 4:10 PM
benlangmuir edited edge metadata.

Still missing the change for if (IsPrebuilt) {, but otherwise LGTM.

This revision is now accepted and ready to land.Aug 9 2016, 4:10 PM

How about -fmodules-use-prebuilt-module-cache for the flag name? Saying "prebuilt-modules" is confusing to me, since -fmodule-file can also be used to load a prebuilt module, but doesn't use a cache.

Oops, forgot to address your other comments.
Yes, I will change the flag name as you suggested and use if (IsPrebuilt).

Cheers,
Manman

rsmith added a subscriber: rsmith.Aug 9 2016, 4:29 PM

This doesn't seem like quite the right user interface for this feature. The module cache is volatile, and it's not really reasonable for people to assume they know what is and isn't within it. Instead, it seems like what we want to provide is zero or more paths to directories containing prebuilt .pcm files whose file names are the same as their top-level module names, completely separate from the module cache.

Specifically, rather than spelling this as -fmodules-use-prebuilt-modules -fmodules-cache-path=X, I'd suggest spelling it as -fprebuilt-module-path=X or similar (which can be repeated to provide multiple search paths). That would then combine orthogonally with -fno-implicit-modules and -fno-implicit-module-maps, where implicit modules would still use the module cache. Does that make sense for your use case?

You're also missing updates to the modules documentation.

rsmith added inline comments.Aug 9 2016, 4:38 PM
lib/Frontend/CompilerInstance.cpp
1483

This looks wrong: you're asking ReadAST to not diagnose a configuration mismatch, and you also don't diagnose it yourself.

1499–1507

You requested that ReadAST produces diagnostics for the OutOfDate and Missing cases when using prebuilt modules, so this diagnostic is redundant.

lib/Serialization/ASTReader.cpp
495–498

This seems unnecessarily complicated. Perhaps you could make CompilerInstance::loadModule treat all prebuilt modules as MK_ExplicitModule, or add a third ModuleKind for them?

bruno added a subscriber: bruno.Aug 9 2016, 6:23 PM

This doesn't seem like quite the right user interface for this feature. The module cache is volatile, and it's not really reasonable for people to assume they know what is and isn't within it. Instead, it seems like what we want to provide is zero or more paths to directories containing prebuilt .pcm files whose file names are the same as their top-level module names, completely separate from the module cache.

Specifically, rather than spelling this as -fmodules-use-prebuilt-modules -fmodules-cache-path=X, I'd suggest spelling it as -fprebuilt-module-path=X or similar (which can be repeated to provide multiple search paths). That would then combine orthogonally with -fno-implicit-modules and -fno-implicit-module-maps, where implicit modules would still use the module cache. Does that make sense for your use case?

What we want is to achieve performance by prebuilding the modules and using @import to load the corresponding pcm without parsing the module maps. In ASTReader, we want to treat these modules in the same way as the explicit modules.

-fprebuilt-module-path sounds reasonable. Let me see how the implementation goes ...

You're also missing updates to the modules documentation.

Yes, I will update the document.

Cheers,
Manman

lib/Frontend/CompilerInstance.cpp
1499–1507

Yes, I can remove this diagnostic.

lib/Serialization/ASTReader.cpp
495–498

I tried adding a ModuleKind at some point, but somehow it didn't work out. I will take another look at that.

manmanren requested a review of this revision.Aug 10 2016, 11:15 AM
manmanren added a reviewer: rsmith.
manmanren updated this revision to Diff 67621.Aug 10 2016, 3:40 PM
manmanren edited edge metadata.

Addressing Richard's comments.

rsmith added inline comments.Aug 10 2016, 4:21 PM
include/clang/Lex/HeaderSearchOptions.h
96–97

It would seem preferable to allow multiple of these, to support mixing (for instance) a path to prebuilt system modules and a path to some per-project modules. I'd be fine handling that as a separate patch / discussion if you prefer.

lib/Frontend/CompilerInstance.cpp
1435–1439

It looks like the logic here is:

  • if we found the module in a module map, then look for it in the module cache
  • otherwise, if we have a prebuilt module path, then look for it there
  • otherwise, error

I think that may be surprising; if I explicitly load a module map, perhaps via -fmodule-map-file=, and supply a prebuilt form of that module in the prebuilt modules path, I'd expect my prebuilt form to still be used.

Perhaps instead we could first search the prebuilt module path for an existing .pcm file, and if that fails and we have a Module then look for it in the module cache?

1485–1493

The module file will have provided a Module instance, if it is in fact a module file for the requested module. Instead of inventing a Module here, can you check that one exists and produce an error if not?

lib/Serialization/ASTReader.cpp
2279–2280

You're allowing configuration mismatches that we don't consider to be "compatible configuration mismatches" here. That seems really scary -- I believe this allows using a C++ module from C and similar unreasonable things, that will cause clang to crash. Did you really mean to do that?

Thanks,
Manman

include/clang/Lex/HeaderSearchOptions.h
96–97

It sounds reasonable to provide prebuilt system modules and prebuilt project modules. I assume via the same command-line option -fprebuilt-module-path.

lib/Frontend/CompilerInstance.cpp
1435–1439

I was wondering about the priority of loading from prebuilt module path vs. loading from module cache. Your suggestion makes sense.

1485–1493

This is nice to know. I didn't realize we construct a Module when calling ReadAST. Can you be more specific on how to check if a Module exists?

lib/Serialization/ASTReader.cpp
2279–2280

I originally had this in my testing patch to be really relaxing on prebuilt modules. I will remove this :]

bruno added inline comments.Aug 10 2016, 5:34 PM
lib/Frontend/CompilerInstance.cpp
1435–1439

+1 to Richard's suggestion here.

I wonder about the workflow on generating such modules. Suppose I want to build some modules to re-use later with -fprebuilt-module-path=. RIght now, we get a module cache with hash in the names unless we disable hash computation via cc1 option -fdisable-module-hash. Should we assume that the user will have to copy out the cache dir and rename the modules OR that it will use -fdisable-module-hash instead? Should this option also try to read out modules with a hash in the name? Should we promote -fdisable-module-hash to a driver option as well?

@Manman, do you have any workflow in mind? @Richard, do you have any feedback in this respect?

In the crash reproducer we often have to rebuild the modules when running the script in a different machine because of the paths used to hash the modules, it would be nice (depending on the resulting design) if the crash reproducer could use -fprebuilt-module-path in order to try to re-use the pre-built modules resulting from the crash.

manmanren added inline comments.Aug 11 2016, 11:41 AM
lib/Frontend/CompilerInstance.cpp
1435–1439

If the modules are built implicitly with module hash, to load the module files as prebuilt modules, the user needs to copy out the cache dir and rename the modules. The user can choose to emit the modules explicitly or disable module hash when building the modules.

1485–1493

I assumed ReadAST only constructed Module instances for submodules, because it is inside ReadSubmoduleBlock. It turns out it also constructed Module instance for the top level module. I will update the patch.

manmanren updated this revision to Diff 67727.Aug 11 2016, 12:15 PM

Addressing Richard's comments.

Richard,

Are you okay with the patch now?

Thanks,
Manman

rsmith edited edge metadata.Aug 17 2016, 3:00 PM

Just a couple of things, but this basically LGTM.

docs/Modules.rst
217 ↗(On Diff #67727)

Please document that the option can be given multiple times.

lib/Frontend/CompilerInstance.cpp
1486

This should have a real diagnostic rather than an assertion. This would happen if the pcm file in the prebuilt module path had the wrong filename, for instance.

It'd also be good to check that Module->ASTFile actually refers to the file that we found in the prebuilt module path, and not (say) to one of its transitive dependencies that ReadAST also loaded.

manmanren updated this revision to Diff 68442.Aug 17 2016, 3:56 PM
manmanren edited edge metadata.

Thanks,

Manman

lib/Frontend/CompilerInstance.cpp
1486

Updated the patch to use error diagnostics instead of assertion.

<<<
It'd also be good to check that Module->ASTFile actually refers to the file that we found in the prebuilt module path, and not (say) to one of its transitive dependencies that ReadAST also loaded.

Did we already check this when creating the module in ASTReader?

if (!ParentModule) {
  if (const FileEntry *CurFile = CurrentModule->getASTFile()) {
    if (CurFile != F.File) {
      ...
    }
  }

  CurrentModule->setASTFile(F.File);
}
rsmith added inline comments.Aug 17 2016, 4:26 PM
lib/Frontend/CompilerInstance.cpp
1486

The AST reader has no idea which module we're trying to load a module file for, so it can't check this. The check you quoted is ensuring that we don't have two different module files providing the same module.

A testcase for this would be:

module A is in prebuilt-module-cache/x.pcm
module B is in prebuilt-module-cache/A.pcm and depends on x.pcm

When trying to load module A, ReadAST will load A.pcm, and we'll find that we now have a definition for module A, but we still didn't get the module from the expected file.

manmanren updated this revision to Diff 68456.Aug 17 2016, 4:53 PM

Cheers,

Manman

lib/Frontend/CompilerInstance.cpp
1486

Thanks for the explanation, it makes sense.

Updated the patch accordingly.

rsmith added inline comments.Aug 17 2016, 4:59 PM
lib/Frontend/CompilerInstance.cpp
1487

It'd be safer to check that FileMgr.getFile(ModuleFileName) == Module->getASTFile() in case the filename gets canonicalized by the file manager in some way.

rsmith accepted this revision.Aug 17 2016, 4:59 PM
rsmith edited edge metadata.
This revision is now accepted and ready to land.Aug 17 2016, 4:59 PM
manmanren added inline comments.Aug 17 2016, 5:18 PM
lib/Frontend/CompilerInstance.cpp
1487

Yes you are right. I will update the patch and commit the change.

Thanks for all the useful information!

This revision was automatically updated to reflect the committed changes.