This is an archive of the discontinued LLVM Phabricator instance.

[modules] Avoid module relocation error when building modules from VFS
AbandonedPublic

Authored by bruno on Apr 6 2016, 5:15 PM.

Details

Summary

The reproducer script (generated by the crash reproducer) invokes clang
and tries to rebuild modules using the headers in the .cache/vfs
directory. Depending on the order that modules are constructed it is
possible that there's a mismatch between the directory that a module was
constructed and the path that it was found:

fatal error: module '_Builtin_stddef_max_align_t' was built in directory
'/Users/bruno/Dev/install/clang/bin/../lib/clang/<ver>/include'
but now resides in directory
'/<path_to>.cache/vfs/Users/bruno/Dev/install/clang/lib/clang/<ver>/include'

This happens because getFile() can leak the original path depending on
the way it's accessed. This seems to be known issue and one that demands
a lot of work to get rid of, according to:

[cfe-dev] Module maps, __FILE__ and lazy stat'ing of header
http://lists.llvm.org/pipermail/cfe-dev/2014-July/038273.html

I tried a couple of fixes to the issue but this is the less invasive I
could come up with. This fixes the issue by first looking into modules
by using virtual dir paths instead of the real ones. Let me know if
there's a better solution or whether I'm missing something.

With this change on Darwin we are able to simulate a crash for a simple
application using "Foundation/Foundation.h" (which relies on a bunch of
different frameworks and headers) and successfully rebuild all the
modules by relying solely at the VFS overlay.

Diff Detail

Event Timeline

bruno updated this revision to Diff 52873.Apr 6 2016, 5:15 PM
bruno retitled this revision from to [modules] Avoid module relocation error when building modules from VFS.
bruno updated this object.
bruno added reviewers: rsmith, benlangmuir, klimek.
bruno added subscribers: aprantl, dexonsmith, cfe-commits.
benlangmuir edited edge metadata.Apr 11 2016, 1:11 PM

This doesn't appear to be a safe change. We can't assume that RootName wouldn't ever be legitimately found somewhere in a path (even without a VFS) and changing the path prefix could give a completely different location. I would also be concerned about the performance impact of performing two lookups, since finding a module for a header is not always a cheap operation.

I'm not sure off hand how else to fix this.

rsmith edited edge metadata.Apr 11 2016, 2:10 PM

Can you find who's poisoning the FileManager's idea of the name of the file with a full path to the VFS overlay and fix that?

lib/Lex/HeaderSearch.cpp
1163

remove_dot_dot=true is not a correct transformation to do to a path and can result in the wrong file being found.

bruno updated this revision to Diff 53363.Apr 11 2016, 9:32 PM
bruno edited edge metadata.

Hi Richard & Ben,

Thanks for the review!

I've attached a new patch and changed the approach: made findUsableModuleForHeader receive a FileName
instead of FileEntry. Answering the questions below:

FileManager::getFile returns the same filepath when a file is first requested and found. However,
for cached searches it returns the real path to the file. In the scenario of the testcase, the leak
happens when HeaderSearch::getFileAndSuggestModule calls getFileMgr().getFile for bar.h, prior to
calling findUsableModuleForHeader. Since bar.h is already cached, the real path is passed to findUsableModuleForHeader,
which triggers down the road the module redefinition error.

From what I understand from http://lists.llvm.org/pipermail/cfe-dev/2014-July/038273.html, changing
the behavior of the cached result has side-effects and demands a non trivial change, am I assuming
the right thing?

From what I understand from http://lists.llvm.org/pipermail/cfe-dev/2014-July/038273.html, changing the behavior of the cached result has side-effects and demands a non trivial change, am I assuming the right thing?

Yes.

Threading through the Filename seems fine to me. Specific comments below.

lib/Basic/FileManager.cpp
221–223 ↗(On Diff #53363)

The first time we lookup a file, we will return whatever name comes back from the VFS, which may or may not be the requested path. If this is a real file system, it will be the requested path. If it is a redirecting file system and "use-external-names" is true (which it is by default), then the VFS will return the external contents path, which is generally not the same as the requested path.

lib/Lex/HeaderSearch.cpp
493

What does this change do? FE->getDir()->getName() will be the virtual path if there is VFS, which is usually the one you want to operate on.

Hi Ben,

lib/Basic/FileManager.cpp
221–223 ↗(On Diff #53363)

The behaviour I'm seeing for VFS is a bit different: If it is a redirecting file system and "use-external-names" is true, then the FileManager returns the requested file path. Sub-sequential cached queries return the "use-external-names" / real-path instead. This is the rationale why I'm trying to thread the FileName instead of the FileEntry. Am I missing something?

lib/Lex/HeaderSearch.cpp
493

You're right! I assumed FE->getDir()->getName() would give me the real path directory for the file in question, but I double checked here and that doesn't happen.

benlangmuir added inline comments.Apr 12 2016, 12:47 PM
lib/Basic/FileManager.cpp
221–223 ↗(On Diff #53363)

That's not supposed to happen. You can see below that we always use the name from the StatCache, which in turn comes from the the VFS status() result. Have you traced through what's happening here? We should figure out how we got the requested path as the name.

bruno abandoned this revision.Apr 12 2016, 4:35 PM

Hi Ben,

lib/Basic/FileManager.cpp
221–223 ↗(On Diff #53363)

I double checked here and you're right, it actually always return the real path for the first request *and* the cached one when "use-external-names" == true. I was misguided by the fact that other places are using the virtual FileName instead of the one in FileEntry, this further lead to inconsistent paths. I'm abandoning this patch since it does not make sense for the crash reproducer, using "use-external-names" = false is the right thing there.