This is an archive of the discontinued LLVM Phabricator instance.

[clang][modules] Serialize VFS overlay paths into PCMs
ClosedPublic

Authored by jansvoboda11 on Oct 10 2022, 6:34 PM.

Details

Summary

With implicitly built modules, the importing CompilerInstance assumes PCMs were built in a "compatible way" (i.e. with similarly set up instance). Either because their context hash matches, or because this instance has just built them.

There are some use-cases, however, where this assumption doesn't hold, libclang/c-index-test being one of them. There, the importing instance (or ASTUnit) is being set up while the PCM file is being deserialized. Until now, we've assumed the serialized paths to input files are the actual on-disk files, meaning the default physical VFS was always able to resolve them. This won't be the case after D135636. Therefore, this patch makes sure ASTUnit is initialized with the same VFS as the PCM it's deserializing - by storing paths to the VFS overlay files into the PCM itself.

For the VFS overlay files to be adopted at the very start of PCM deserialization, they are stored in a new section in the unhashed control block, together with header search paths and system header prefixes. The move to the unhashed control block should be safe: if two modules were built with different header search paths and they produced different results, the hashed part of the PCM file will reflect that.

Diff Detail

Event Timeline

jansvoboda11 created this revision.Oct 10 2022, 6:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 10 2022, 6:34 PM
Herald added a subscriber: ributzka. · View Herald Transcript
jansvoboda11 requested review of this revision.Oct 10 2022, 6:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 10 2022, 6:34 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
jansvoboda11 edited the summary of this revision. (Show Details)Oct 10 2022, 6:42 PM

This seems fine to me but note that we no longer depend on the functionality that test/Index/index-module-with-vfs.m is testing (and not sure anyone else does), so if there is another change affecting it that is more complicated we could consider removing the test.

clang/include/clang/Frontend/CompilerInvocation.h
298

The idiomatic LLVM way is to use ArrayRef for the parameter.

akyrtzi added inline comments.Oct 11 2022, 11:15 AM
clang/lib/Frontend/CompilerInvocation.cpp
4741

A bit nitpick but I'd suggest changing to std::move(BaseFS)

This seems fine to me but note that we no longer depend on the functionality that test/Index/index-module-with-vfs.m is testing (and not sure anyone else does), so if there is another change affecting it that is more complicated we could consider removing the test.

Thanks, good to know. Other users of ASTUnit might be relying on that (e.g. replay AST), so still I think it would be nice to land this. Though it seems this requires D67010 ([Modules] Move search paths from control block to unhashed control block). That's because the unhashed control block is read before we start validating imports (and their input files). Without that patch, test/Index/index-module-with-vfs.m still starts to fail in D135636. Another solution would be to ensure the options block is serialized at the start of the control block.

Use ArrayRef, move VFS smart pointer

akyrtzi accepted this revision.Oct 11 2022, 3:47 PM
This revision is now accepted and ready to land.Oct 11 2022, 3:47 PM

I think we should deduplicate the vfs overlays if the same ivfsoverlay is specified in both the pcm and the command-line.

@bnbarham any concern about overlay vs chaining behaviour here? I remember you looking at that a while ago.

I think we should deduplicate the vfs overlays if the same ivfsoverlay is specified in both the pcm and the command-line.

@bnbarham any concern about overlay vs chaining behaviour here? I remember you looking at that a while ago.

It still chains right now since I never ended up finishing those patches. This does explicitly *chain* with the command-line specified FS, not sure if one makes sense over the other here. Should we be passing in the real FS to the overlay creation here and then creating an actual overlay FS with both the original FileManager VFS + the PCM VFS (or building the overlay from scratch with the original overlays as well, though that still chains so...)?

Do reproducers read PCMs at all or do they rebuild from the headers?

I think we should deduplicate the vfs overlays if the same ivfsoverlay is specified in both the pcm and the command-line.

My understanding is that ASTUnit never uses command-line and always adopts whatever is in the PCM.

Do reproducers read PCMs at all or do they rebuild from the headers?

I don't know. How are they related to this patch?

I think we should deduplicate the vfs overlays if the same ivfsoverlay is specified in both the pcm and the command-line.

My understanding is that ASTUnit never uses command-line and always adopts whatever is in the PCM.

It's now using the FS from FileManager as the base FS and I imagine was already just using the FileManager FS, so we are (and were) implicitly using the command-line overlays.

Do reproducers read PCMs at all or do they rebuild from the headers?

I don't know. How are they related to this patch?

It was more related to the chaining discussion. If reproducers do use PCMs then that would imply we *have* to chain, since we'd need to map from the path in the PCM -> original path (with PCM overlay) -> reproducer path (with reproducer overlay).

I suspect the answer is here is that they don't as it'd be easier to just include all the headers (I think?).

Ah, I now see that there are modes where ASTUnit gets configured with command-line arguments.

I'm not very familiar with its clients, but I'd be inclined to:

  1. If the original HeaderSearchOptions didn't have any VFS overlay files, adopt the PCM ones.
  2. If the original HeaderSearchOptions did have VFS overlay files of its own, error out if the PCM has different ones.

Any thoughts?

Move header search paths into unhashed control block.

Herald added a project: Restricted Project. · View Herald TranscriptDec 2 2022, 12:51 PM
jansvoboda11 edited the summary of this revision. (Show Details)Dec 2 2022, 1:00 PM

If the original HeaderSearchOptions didn't have any VFS overlay files, adopt the PCM ones.

IIUC this is what the patch does, right?

If the original HeaderSearchOptions did have VFS overlay files of its own, error out if the PCM has different ones.

Where did we land on this on?

clang/include/clang/Serialization/ASTReader.h
179

We should document what the expectations are for ReadHeaderSearchOptions and ReadHeaderSearchPaths callbacks: will paths be set when ReadHeaderSearchOptions is called? Will non-paths be set when ReadHeaderSearchPaths is called? I assume we want to say "no" so that we're not tied to callback order, but we should document it.

If the original HeaderSearchOptions didn't have any VFS overlay files, adopt the PCM ones.

IIUC this is what the patch does, right?

Yes, that's correct.

If the original HeaderSearchOptions did have VFS overlay files of its own, error out if the PCM has different ones.

Where did we land on this on?

We didn't. I looked into the current behavior once again and noticed that for language options and the target, we adopt whatever the PCM reports for the first time the ASTReader callbacks get called and ignore everything else. I didn't want to complicate things beyond that, since @akyrtzi mentioned this mode of ASTUnit (adopting stuff from PCMs) isn't being actively used anyways, so I did the same thing.

clang/include/clang/Serialization/ASTReader.h
179

Sounds good.

benlangmuir accepted this revision.Dec 2 2022, 2:52 PM

Thanks for the explanation, make sense. LGTM with the documentation tweak!

This revision was landed with ongoing or failed builds.Dec 2 2022, 4:12 PM
This revision was automatically updated to reflect the committed changes.