This is an archive of the discontinued LLVM Phabricator instance.

[clang][lex] Fix search path usage remark with modules
AbandonedPublic

Authored by jansvoboda11 on Nov 11 2021, 7:15 AM.

Details

Summary

When looking for module, HeaderSearch::lookupModule iterates through search paths, parses modulemaps, eagerly creates Module instances and caches them in ModuleMap. When first called, it will correctly note usage of the search path that discovered the returned module. On subsequent calls, however, it can pull previously created Module from ModuleMap without noting the search path was used.

This patch fixes that. This requires HeaderSearch to be able to go from Module to the index of corresponding search path. This is achieved by adding a callback to the modulemap parser, intercepting creations of Module instances and pairing them with the current search path index.

The outlined solution doesn't handle all corner cases, though. It's possible for clients to use HeaderSearch to lookup module B, which could parse an unrelated modulemap and create instance of module A. The client can then look up module A directly in ModuleMap, avoiding the logic in HeaderSearch that makes -Rsearch-path-usage work.

Depends on D116750 & D116751.

Diff Detail

Event Timeline

jansvoboda11 requested review of this revision.Nov 11 2021, 7:15 AM
jansvoboda11 created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptNov 11 2021, 7:15 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
jansvoboda11 added inline comments.Nov 11 2021, 7:19 AM
clang/lib/Lex/HeaderSearch.cpp
264

This is the important change.

jansvoboda11 edited the summary of this revision. (Show Details)Nov 11 2021, 9:15 AM

Apply clang-format.

Add extra test case

LGTM overall. I’ve got a few questions/remarks inline.

clang/lib/Lex/HeaderSearch.cpp
94

When would the moduleMapModuleCreated be called while CurrentSearchPathIdx == ~0U? Could this be an assert instead?

264

Just a thought: Could we move noteModuleLookupUsage into findModule? IIUC that would guarantee that we’re catching all cases and we wouldn’t need to call noteModuleLookupUsage from both overloads of lookupModule.

clang/lib/Lex/ModuleMap.cpp
851

Maybe a stupid question but why don’t we need to call the callback e.g. here (or on the other new Module) calls in this file?

Thanks for the feedback, I'll try to clean this up a bit.

clang/lib/Lex/HeaderSearch.cpp
94

This happens whenever any of the ModuleMap member functions that create new Module instances are called outside of HeaderSearch.

The MMCallback callback is basically "global" (present for the whole lifetime of ModuleMap), so that we don't have to repeatedly register/deregister it in HeaderSearch::lookupModule.

264

That would clean up HeaderSearch nicely. I'll look into creating new HeaderSearch::findModule function that would wrap ModuleMap::findModule and note usage.

clang/lib/Lex/ModuleMap.cpp
851

This is related to C++20 modules, so it wasn't necessary for the test case I had in mind.

But I agree we should handle this as well. I think more robust solution would be to add factory function to ModuleMap through which all Module instances get created, and invoke the callback there.

ahoppen added inline comments.Nov 12 2021, 7:57 AM
clang/lib/Lex/HeaderSearch.cpp
94

Is there any reasonable case where module maps would be created without HeaderSearch triggering the creation?

jansvoboda11 added inline comments.Nov 12 2021, 8:15 AM
clang/lib/Lex/HeaderSearch.cpp
94

I think parsing of module maps (and therefore creation of contained modules) should be triggered through HeaderSearch.

However, there are also C++20 modules and explicit Clang modules that are not discovered by the header search mechanisms or modulemap parsing.

ahoppen added inline comments.Nov 17 2021, 4:35 AM
clang/lib/Lex/HeaderSearch.cpp
95

These indices will be out of date if the search paths are changed via HeaderSearch::SetSearchPaths or HeaderSearch::AddSearchPath after the first module map has been loaded. One could probably adjust the indices in AddSearchPath but maybe it’s easier to not use the index into SearchDirs as the key. An alternative suggestion would be to use std::shared_ptr<DirectoryLookup> instead, changing some of the data structures as follows:

- std::vector<DirectoryLookup> SearchDirs
+ std::vector<std::shared_ptr<DirectoryLookup>> SearchDirs

- std::vector<bool> SearchDirsUsage;
+ std::map<std::shared_ptr<DirectoryLookup>, bool> SearchDirsUsage; 

- llvm::DenseMap<Module *, unsigned> ModuleToSearchDirIdx;
+ llvm::DenseMap<Module *, std::shared_ptr<DirectoryLookup>> ModuleToSearchDirIdx;

- llvm::DenseMap<unsigned, unsigned> SearchDirToHSEntry;
+ llvm::DenseMap<std::shared_ptr<DirectoryLookup>, unsigned> SearchDirToHSEntry;

Still a WIP, but might be get some feedback on the direction.

I'm not 100% sure when to consider a search path that triggered the creation of Module as used. Doing so on Module creation is not useful, since we do that (somewhat) eagerly. The current approach marks search path as used whenever it's responsible for the creation of a Module that gets returned by HeaderSearch::lookupModule. That's not correct in situations when a Module is created via header search (but not actually returned) and then gets queried directly from ModuleMap instead of HeaderSearch.

Can we not consider a modulemap used when we load an AST that depends on that modulemap? It's possible this breaks if the module only includes textual headers though. It really feels like we should have a single place where we actually know if a module is used or not. Long term I would really like to separate modulemap parsing from Module creation, which would be a great place to actually do this.

For now we just need to make sure we report every modulemap search path that wouldn't change semantics if removed.

Can we not consider a modulemap used when we load an AST that depends on that modulemap? It's possible this breaks if the module only includes textual headers though.

I don't think that would be 100% correct. I think it's possible for Module instances (created by parsing a modulemap file) to have semantic meaning even without their AST being loaded. One example (that I just made up but sounds plausible enough to me) is that we could have a fixit for an @import of non-existent module that suggests importing another available module (discovered through header search) with the most similar name. We don't need the AST file for that, but removing search paths could change the diagnostic output.

It really feels like we should have a single place where we actually know if a module is used or not. Long term I would really like to separate modulemap parsing from Module creation, which would be a great place to actually do this.

Agreed. But I'm afraid that's a big undertaking. Do you have any other ideas on how to detect module usage with the current state of things?

Rebase on top of extracted patches.

jansvoboda11 retitled this revision from WIP: [clang][lex] Fix search path usage remark with modules to [clang][lex] Fix search path usage remark with modules.Jan 6 2022, 8:19 AM
jansvoboda11 edited the summary of this revision. (Show Details)
jansvoboda11 marked an inline comment as done.Jan 6 2022, 8:22 AM
jansvoboda11 added inline comments.
clang/lib/Lex/HeaderSearch.cpp
95

Fixed in D116750.

jansvoboda11 marked an inline comment as done.EditedJan 6 2022, 8:49 AM

I'm starting to think it could be fine to keep doing things as outlined by this patch: only marking search paths that discovered a module as used when its Module gets returned from HeaderSearch (and ignoring Modules returned ModuleMap). (Or adopt the AST-reading strategy @Bigcheese suggested.)

I think this could work okay for the purposes of dependency scanning.

If the scanning phase fails with an error, the diagnostics will be presented to the user and the actual compile won't be carried out at all - the pruning optimization won't kick in.
Other kinds of diagnostics (that might be discarded/changed by pruning search paths) will be still visible in the build log of the scanning phase - this might be good enough.
And the preprocessor seems to use HeaderSearch when handling include/import directives without touching ModuleMap, which should work.

If we document this properly in computeUserEntryUsage() I think that may be fine. Any thoughts?

jansvoboda11 marked an inline comment as done.Jan 19 2022, 4:41 AM
jansvoboda11 added inline comments.
clang/lib/Lex/HeaderSearch.cpp
264

Looking more into this, ModuleMap::findModule is used in a lot of different places. Since noteModuleLookupUsage takes SourceLocation of the #include, putting it into findModule that doesn't require SourceLocation ATM would be somewhat intrusive change.

I have cleaned up this patch so that noteModuleLookupUsage is called only in single HeaderSearch::lookupModule. I think this is good enough. WDYT?

clang/lib/Lex/ModuleMap.cpp
851

This is fixed in D116751.

jansvoboda11 abandoned this revision.Mar 16 2022, 4:22 AM

Abandoning this. We don't need to collect usage info from -fimplicit-module-maps because we don't use that feature during explicit builds: D120465. Support for usage collection for modules was removed entirely in D121295.

Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2022, 4:22 AM