This is an archive of the discontinued LLVM Phabricator instance.

[Modules] Add ability to specify module name to module file mapping
ClosedPublic

Authored by boris on Jul 5 2017, 10:17 AM.

Details

Summary

Extend the -fmodule-file option to support the [<name>=]<file> value format.
If the name is omitted, then the old semantics is preserved (the module file
is loaded whether needed or not). If the name is specified, then the mapping
is treated as just another prebuilt module search mechanism, similar to
-fprebuilt-module-path, and the module file is only loaded if actually
used (e.g., via import). With one exception: this mapping also overrides
module file references embedded in other modules (which can be useful if
module files are moved/renamed as often happens during remote compilation).

This override semantics requires some extra work: we now store the module
name in addition to the file name in the serialized AST representation as
well as keep track of already loaded prebuilt modules by-name in addition
to by-file.

Patch by Boris Kolpackov


Additional notes:

  1. Based on this mailing list discussion:

    http://lists.llvm.org/pipermail/cfe-dev/2017-June/054431.html
  1. The need to change the serialized AST representation was a bit more than what I hoped for my first patch but based on this FIXME comment I recon this is moving in right direction:

    ASTReader::ReadModuleOffsetMap():

    // FIXME: Looking up dependency modules by filename is horrible.

    So now, at least for prebuilt modules, we look up by the module name.
  1. Overloading -fmodule-file for this admittedly fairly different semantics might look like a bad idea and source of some unnecessary complexity. The problem with a separate option approach is the difficulty of finding a decent name that is not already used (e.g., -fmodule-map is out because of -fmodule-map-file; more details in the mailing list thread). But I am still open to changing this to a separate option if there are strong feelings (and good name suggestions ;-)).
  1. I plan to implement a companion option that will read this mapping from a file. I will submit it as a separate patch once the general validity of the approach is confirmed.

Diff Detail

Event Timeline

boris created this revision.Jul 5 2017, 10:17 AM
boris updated this revision to Diff 106041.Jul 11 2017, 9:00 AM

Rebase on latest HEAD.

boris updated this revision to Diff 107017.Jul 17 2017, 9:33 PM

Rebase on latest HEAD.

boris added a comment.Jul 17 2017, 9:38 PM

Ping.

I am holding off on proposing the same functionality to GCC because I want to make sure the command line interface is the same for both compilers (GCC has less baggage in this area, option-name-wise). So confirming that at least the naming/semantics are acceptable would be very helpful.

boris added a comment.Jul 25 2017, 4:46 AM

Ping.

FWIW, I went ahead and implemented this functionality in GCC. It has been merged into its c++-modules branch.

rsmith added inline comments.Aug 15 2017, 5:57 PM
docs/ClangCommandLineReference.rst
3–4

Please revert the change to this file; we can regenerate it after this lands (as a separate commit).

include/clang/Driver/Options.td
1105

How about "Specify the mapping of module name to precompiled module file, or load a module file if name is omitted."?

include/clang/Serialization/ModuleManager.h
52–53

llvm::StringMap would be preferable here.

lib/Driver/ToolChains/Clang.cpp
3610–3620

What's the purpose of splitting this into two separate loops? I see that you're passing on all -fmodule-file=name=value flags even if modules is not enabled. Is that useful?

lib/Frontend/CompilerInstance.cpp
1623–1630

This doesn't make sense for header modules (defined by a module map). For the case of a Modules TS module implementation unit, we should skip this CurrentModule check, but let's leave that for a separate patch.

lib/Frontend/CompilerInvocation.cpp
982

There should be some way to specify a module file that contains a = in its file name. Ideally, that way would be compatible with our currently-supported form of -fmodule-name=filename. I don't see a way to support that other than stating every value we're given and checking to see whether it exists before attempting to split on a =... thoughts?

One somewhat hackish alternative would be to only recognize an = if there is no preceding /. (So -fmodule-file=foo=bar.pcm specifies a mapping, and -fmodule-file=./foo=bar.pcm specifies the file ./foo=bar.pcm.)

(FWIW, we also support module names containing = characters.)

lib/Lex/HeaderSearch.cpp
131

I like the renaming you've done here. Should this one also be renamed to getCachedModuleFileName?

lib/Serialization/ASTReader.cpp
4145–4146

I'm worried by this: the Type can be different for different imports of the same .pcm file, and you'll only get here for the *first* import edge. So you can fail to add the module to the "prebuilt modules" map here, and then later attempt to look it up there (when processing the module offset map) and fail to find it.

Instead of keeping a separate lookup structure on the module manager, could you look up the Module, and then use its ASTFile to find the corresponding ModuleFile?

(If you go that way, the changes to module lookup when reading the module offset map could be generalized to apply to all MK_*Module types.)

boris marked 6 inline comments as done.Aug 19 2017, 1:41 AM

I've marked as "done" items that I have resolved in my local revision (not yet uploaded) and have added one comment for further feedback.

lib/Frontend/CompilerInvocation.cpp
982

A couple of thoughts:

  1. Using stat() is also not without issues: I may have an unrelated file with a name that matches the whole value -- this will be fun to debug.
  1. GCC seems to have given up on trying to support paths with '=' since AFAIK, none of their key=value options (e.g., -fdebug-prefix-map) support any kind of escaping. I think the implicit assumption is that if you are using paths with '=' in them, then you've asked for it.
  1. The '/' idea will work but will get a bit hairier if we also want to support Windows (will need to check for both slashes).
  1. Another option is to reserve empty module name as a way to escape '=': -fmodule-file==foo=bar.pcm. Not backwards compatible (but neither is ./) and looks a bit weird but is the simplest to implement.

My preference order is (2), (4), (3), (1). Let me know which way you want to go.

boris added inline comments.Aug 19 2017, 8:24 AM
lib/Serialization/ASTReader.cpp
4145–4146

Yes, I was not entirely happy with this register prebuilt business so thanks for the hint. I've implemented this and all tests pass (both Clang's and my own). Though I cannot say I fully understand all the possible scenarios (like when can ASTFile be NULL).

(If you go that way, the changes to module lookup when reading the module offset map could be generalized to apply to all MK_*Module types.)

Are you suggesting that we switch to storing module names instead of module files for all the module types in the offset map? If so, I think I've tried that at some point but it didn't go well (existing tests were failing if I remember correctly). I can try again if you think this should work.

boris updated this revision to Diff 111826.Aug 19 2017, 8:28 AM

New revision of the patch that I believe addresses all the issues except for the '=' escaping.

rsmith accepted this revision.Aug 25 2017, 5:10 PM

Only typographical nits and post-commit suggestions, so please go ahead. Thanks!

include/clang/Serialization/ModuleManager.h
66

&, not & , please.

lib/Frontend/CompilerInvocation.cpp
982

We've had a while to think about this, and haven't found a completely satisfactory answer, so let's go with (2) for now, without prejudice to future alternatives. It's fully GCC-compatible, and happens to be what this patch already does :)

983

Does a module file specified via -fmodule-file=foo=foo.pcm get included in the dep file if it is used? (If not, I'm happy for that to be fixed in a separate change, but it does need to be fixed for depfile-based build systems.)

lib/Serialization/ASTReader.cpp
4145–4146

Yes, that's what I was suggesting, but don't block this commit on it. Ultimately, we should probably be using something like an index into the IMPORTS list instead (but it's not quite as easy as that since we also have module offsets for imports of imports and so on).

10116

No space before ().

This revision is now accepted and ready to land.Aug 25 2017, 5:10 PM
boris marked 2 inline comments as done.Aug 30 2017, 1:13 AM
boris added inline comments.
lib/Frontend/CompilerInvocation.cpp
983

No, currently it does not and neither a module file that is discovered via -fprebuilt-module-path. I think if we are doing this, then we should do for both cases. What do you think?

This revision was automatically updated to reflect the committed changes.