This is an archive of the discontinued LLVM Phabricator instance.

[modules] Fix error about the same module being defined in different .pcm files when using VFS overlays.
ClosedPublic

Authored by vsapsai on Jul 31 2023, 3:10 PM.

Details

Summary

Fix errors like

module 'MultiPath' is defined in both 'path/to/modules.cache/3JR48BPRU7BCG/MultiPath-1352QHUF8RNMU.pcm' and 'path/to/modules.cache/3JR48BPRU7BCG/MultiPath-20HNSLLIUDDV1.pcm'

To avoid building extra identical modules -ivfsoverlay option is not a
part of the hash like "/3JR48BPRU7BCG/". And it is build system's
responsibility to provide -ivfsoverlay options that don't cause
observable differences. We also need to make sure the hash like
"-1352QHUF8RNMU" is not affected by -ivfsoverlay. As this hash is
defined by the module map path, use the path prior to any VFS
remappings.

rdar://111921464

Diff Detail

Event Timeline

vsapsai created this revision.Jul 31 2023, 3:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 31 2023, 3:10 PM
Herald added a subscriber: ributzka. · View Herald Transcript
vsapsai requested review of this revision.Jul 31 2023, 3:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 31 2023, 3:10 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
ChuanqiXu accepted this revision.Jul 31 2023, 6:51 PM

Although there is a FIXME in the definition of getNameAsRequested(), it looks not sense to require you to fix that. It might not be an over burden for someone who will be intended to fix this later. So LGTM.

This revision is now accepted and ready to land.Jul 31 2023, 6:51 PM

Thanks for the quick review!

Although there is a FIXME in the definition of getNameAsRequested(), it looks not sense to require you to fix that. It might not be an over burden for someone who will be intended to fix this later. So LGTM.

I've discussed with @jansvoboda11 the future of FileEntryRef::getNameAsRequested and it is supposed to replace FileEntryRef::getName (eventually, not immediately), if I understand correctly. Given that, it is the move in the right direction regardless of FIXMEs. If I've misunderstood something, Jan can chime in and correct me.

And it is build system's responsibility to provide -ivfsoverlay options that don't cause observable differences.

I wasn't aware of that. Do we document this anywhere? It surprises me that we'd impose such restriction on the build system. This seems fairly easy to accidentally violate and end up in the same situation - Clang instances with different VFS overlays, identical context hashes and different canonical module map paths for the same module.

What are the performance implications of making VFS overlays part of the context hash?

Alternatively, we could keep VFS overlays out of the context hash but create <hash2> from the on-disk real path of the defining module map and make the whole PCM VFS-agnostic. Then it'd be correct to import that PCM regardless of the specific VFS overlay setup, as long as all VFS queries of the importer resolve the same way they resolved within the instance that built the PCM. Maybe we can force the importer to recompile the PCM if that's not the case, similar to what we do for diagnostic options.

CC @benlangmuir, since we've talked about this.

What are the performance implications of making VFS overlays part of the context hash?

I don't have specific measurements but for Xcode projects different targets have different VFS overlays. So adding VFS overlays to the context hash would prevent module sharing between targets. To use llvm+clang as an example it means different libraries will have different versions of the same module. For example, clangLex will have its version of llvmSupport and clangSerialization will have its version too. Actual impact depends on a project but for clang with 29 subdirectories in "clang/lib" an approximate estimate would be 10 times more modules to build. Note that clang build doesn't use VFS, so it won't be an actual impact on clang build but on projects with the size and structure similar to clang.

And it is build system's responsibility to provide -ivfsoverlay options that don't cause observable differences.

I wasn't aware of that. Do we document this anywhere? It surprises me that we'd impose such restriction on the build system. This seems fairly easy to accidentally violate and end up in the same situation - Clang instances with different VFS overlays, identical context hashes and different canonical module map paths for the same module.

There are no documents describing responsibilities of different components, as far as I know. I haven't researched the issue extensively but I think only Xcode build system uses VFS overlays. And if you have only one client, it is a questionable use of time to formalize the interface. And the absence of formal contract allows to evolve different sides faster.

Yes, if a build system provides bad data to a compiler, it can cause errors. And there are no ways to force build systems not to do that. I don't think it is the compiler's responsibility to verify build system behavior. For example, if for incremental build not all impacted files are recompiled (given the dependencies are correct) - the compiler shouldn't track it themselves and recompile the required files.

Alternatively, we could keep VFS overlays out of the context hash but create <hash2> from the on-disk real path of the defining module map and make the whole PCM VFS-agnostic. Then it'd be correct to import that PCM regardless of the specific VFS overlay setup, as long as all VFS queries of the importer resolve the same way they resolved within the instance that built the PCM. Maybe we can force the importer to recompile the PCM if that's not the case, similar to what we do for diagnostic options.

I'm not sure I understand your proposal. Before this change we were calculating hash from the on-disk real path of the defining module map. And due to different VFS/no-VFS options defining module map is at different locations on-disk.

Ping @jansvoboda11 , @benlangmuir in case you still have thoughts on this.

Alternatively, we could keep VFS overlays out of the context hash but create <hash2> from the on-disk real path of the defining module map and make the whole PCM VFS-agnostic. Then it'd be correct to import that PCM regardless of the specific VFS overlay setup, as long as all VFS queries of the importer resolve the same way they resolved within the instance that built the PCM. Maybe we can force the importer to recompile the PCM if that's not the case, similar to what we do for diagnostic options.

I'm not sure I understand your proposal. Before this change we were calculating hash from the on-disk real path of the defining module map. And due to different VFS/no-VFS options defining module map is at different locations on-disk.

My suggestion is to use the actual real on-disk path. Not FileEntryRef::getName(), but something that always behaves as if use-external-name was set to true. I believe this would handle your VFS/VFS-use-external-name-true/VFS-use-external-name-false problem. It would also handle another pitfall: two compilations with distinct VFS overlays that redirect two different as-requested module map paths into the same on-disk path.

My suggestion is to use the actual real on-disk path. Not FileEntryRef::getName(), but something that always behaves as if use-external-name was set to true. I believe this would handle your VFS/VFS-use-external-name-true/VFS-use-external-name-false problem. It would also handle another pitfall: two compilations with distinct VFS overlays that redirect two different as-requested module map paths into the same on-disk path.

Do you suggest doing it for the hashing or for ASTWriter or both? We are already doing some module map path canonicalization (added a comment in corresponding place) but it's not pure on-disk path, it takes into account VFS.

clang/lib/Lex/HeaderSearch.cpp
259–260

Sort of canonicalization and obtaining real on-disk path we are doing.

jansvoboda11 accepted this revision.Aug 7 2023, 8:20 AM

My suggestion is to use the actual real on-disk path. Not FileEntryRef::getName(), but something that always behaves as if use-external-name was set to true. I believe this would handle your VFS/VFS-use-external-name-true/VFS-use-external-name-false problem. It would also handle another pitfall: two compilations with distinct VFS overlays that redirect two different as-requested module map paths into the same on-disk path.

Do you suggest doing it for the hashing or for ASTWriter or both? We are already doing some module map path canonicalization (added a comment in corresponding place) but it's not pure on-disk path, it takes into account VFS.

I mainly meant in <hash2>. And then make the PCM files VFS-agnostic, which would probably require us to do the same thing for all the paths we serialize there. But I don't know how feasible that is. Besides, the scanner relies on the virtual paths in PCM files.

I think your solution is the most pragmatic. If you're confident this doesn't break anything internally, I say go for it. But I think it's good to be aware of the pitfall I mentioned, and make sure the build system doesn't trigger that.

clang/lib/Serialization/ASTWriter.cpp
1330–1331

Can we canonicalize this also? It'd be useful in the scanner.

I think your solution is the most pragmatic. If you're confident this doesn't break anything internally, I say go for it. But I think it's good to be aware of the pitfall I mentioned, and make sure the build system doesn't trigger that.

As far as I have tested, it isn't breaking anything.

clang/lib/Serialization/ASTWriter.cpp
1330–1331

I'm not sure about that. ASTReader has some complicated logic around reading this value https://github.com/llvm/llvm-project/blob/c1803d5366c794ecade4e4ccd0013690a1976d49/clang/lib/Serialization/ASTReader.cpp#L4005 So if we don't have a proven need for it with a test case, I wouldn't change it.

jansvoboda11 added inline comments.Aug 9 2023, 9:55 PM
clang/lib/Serialization/ASTWriter.cpp
1330–1331

Good point. I do have a use for it, but I think it'd be safer to do separately from this patch a qualify it in isolation.

This revision was landed with ongoing or failed builds.Aug 10 2023, 10:48 AM
This revision was automatically updated to reflect the committed changes.