This is an archive of the discontinued LLVM Phabricator instance.

[clang][modules][deps] Create more efficient API for visitation of `ModuleFile` inputs
ClosedPublic

Authored by jansvoboda11 on Aug 3 2023, 8:55 PM.

Details

Summary

The current ASTReader::visitInputFiles() function calls into FileManager to create FileEntryRef objects. This ends up being fairly costly in clang-scan-deps, where we mostly only care about file paths.

This patch introduces new ASTReader API that gives clients access to just the serialized paths. Since the scanner needs both the as-requested path and the on-disk one (and doesn't want to transform the former into the latter via FileManager), this patch starts serializing both of them into the PCM file if they differ.

This increases the size of scanning PCMs by 0.1% and speeds up scanning by 5%.

Diff Detail

Event Timeline

jansvoboda11 created this revision.Aug 3 2023, 8:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 3 2023, 8:55 PM
Herald added a subscriber: ributzka. · View Herald Transcript
jansvoboda11 requested review of this revision.Aug 3 2023, 8:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 3 2023, 8:55 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Remove leftover std::string constructor

CC @vsapsai since he was also making name vs. name-as-requested change in modules

clang/include/clang/Serialization/ModuleFile.h
72

Is there something using this new split? It seems like every user is using && for these

clang/lib/Serialization/ASTReader.cpp
2411

It's not clear to me why this one changed

9323

Can we rewrite visitInputFiles on top of this?

jansvoboda11 added inline comments.Aug 4 2023, 3:26 PM
clang/include/clang/Serialization/ModuleFile.h
72

You're right. I recall needing this on (now probably abandoned) patch, so I thought I might as well add the extra bit now, sine I'm changing the format anyways. I'm fine with removing this.

clang/lib/Serialization/ASTReader.cpp
2411

This actually maintains the same semantics - FI.Filename was previously the as-requested path, now it's the on-disk path. Without changing this line, FileManager::getFileRef() would get called with the on-disk path, meaning consumers of this function would get the incorrect path when calling FileEntryRef::getNameAsRequested() on the returned file. I recall one clang-scan-deps test failing because of it.

9323

I can try consolidating these in a follow-up if you're fine with that.

benlangmuir accepted this revision.Aug 4 2023, 3:33 PM
benlangmuir added inline comments.
clang/include/clang/Serialization/ModuleFile.h
72

It's fine, I was just making sure I didn't miss something.

This revision is now accepted and ready to land.Aug 4 2023, 3:33 PM
jansvoboda11 added inline comments.Aug 4 2023, 4:20 PM
clang/lib/Serialization/ASTReader.cpp
2411

Just remembered: in ModuleDepCollector.cpp, we call ModuleMap::getModuleMapFileForUniquing(). This calls SourceMgr.getFileEntryRefForID() based on Module::DefinitionLoc, which triggers deserialization of the associated source location entry and ends up calling this function right here. We'd end up with the on-disk module map path for modular dependencies.

vsapsai accepted this revision.Aug 8 2023, 7:58 PM

The existing FilenameAsRequested usage looks good to me. Unfortunately, don't know how to make sure all places where FilenameAsRequested is required instead of Filename are updated.

Landing with one more use of Filename converted to FilenameAsRequested (in call to Listener.visitInputFile()). The only remaining usages of Filename is now in the scanner (intentional) and in ASTReader when deciding whether an InputFileInfo has already been deserialized or not (default-initialized InputFileInfo has an empty Filename).

This revision was landed with ongoing or failed builds.Aug 9 2023, 10:19 AM
This revision was automatically updated to reflect the committed changes.