This is an archive of the discontinued LLVM Phabricator instance.

Use existing path sep style in clang::FileManager::FixupRelativePath
AbandonedPublic

Authored by ctetreau on Jun 2 2020, 3:36 PM.

Details

Summary

Modify clang::FileManager::FixupRelativePath to use the existing path
separator style, if one exists. Prior to this change, if a path is set
for the FileSystemOpts workingDir that does not match the native path
separator style, this can result in a path with mixed separators. This
was causing an assertion failure in
clang::test/ClangScanDeps/vfsoverlay.cpp on my machine.

Diff Detail

Event Timeline

ctetreau created this revision.Jun 2 2020, 3:36 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 2 2020, 3:36 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Please make sure all of the clang VFS tests still pass before submitted this. VFS paths are inherently problematic with llvm::sys::path because they can legitimately be in a hybrid of Windows and Posix styles. Guessing the style from the first separator you see can be misleading, but I don't know whether that's ever wrong in this path.

Exactly which assertion is firing? Is it possible it's the assertion that's wrong?

@amccarth

The assert that I am getting is at line 1701 of VirtualFileSystem.cpp:

assert(!isTraversalComponent(*Start) &&
         !isTraversalComponent(From->getName()) &&
         "Paths should not contain traversal components");

path is C:/[path]/[to]/[build]/tools/clang/test/ClangScanDeps/Output/vfsoverlay.cpp.tmp.dir\\. (notice the trailing windows path separator)

The path is canonicalized on line 1688 of VirtualFileSystem.cpp in lookupPath. The canonicalization fails to remove the trailing '.' because it detects (using the same method I am using in my patch) that the file path has posix separators and it sees "vfsoverlay.cpp.tmp.dir\\." as one path component. Perhaps the real fix is to make canonicalize handle mixed separators? I'm guessing that canonicalize is implemented as it is for performance reasons? This comment in the body of canonicalize "Explicitly specifying the path style prevents the direction of the slashes from changing" leads me to believe that it is desirable that the path separators not be changed, so a new implementation would need to walk the entire string and fail if mixed separators are encountered, or llvm::sys::path::remove_dots needs to also be changed to have a mixed-separators version.

The test clang::test/ClangScanDeps/vfsoverlay.cpp causes this on my machine. It is unfortunate that for reasons beyond my comprehensions, I often get test failures on my machine that are not caught by the builders, so I have no idea why the builders don't see this. Perhaps they aren't running this test?

Regardless, I think that using even a flawed method to detect what the default path separator should be might be better than just assuming native. Either it will be correct, or the path already had mixed separators, and it doesn't actually matter.

ctetreau abandoned this revision.Jun 3 2020, 3:34 PM

After some further investigation, I have come to believe that the root cause of the issue I am seeing is on line 783 of clang/lib/Lex/HeaderSearch.cpp. A path is constructed using string concatenation (dir + '/' + file), which is obviously not robust to the various issues in path construction. A fix had been committed and reverted back in 2015. Upon restoring the fix, I see that it causes several other test failures. Unfortunately, I do not have the bandwidth to fully resolve this issue myself, so I have opened a bug for it: https://bugs.llvm.org/show_bug.cgi?id=46187

While I still believe the issues I mentioned upthread should probably be addressed, they are much less pressing having discovered the root cause of my particular issue.

After some further investigation, I have come to believe that the root cause of the issue I am seeing is on line 783 of clang/lib/Lex/HeaderSearch.cpp. A path is constructed using string concatenation (dir + '/' + file), which is obviously not robust to the various issues in path construction. A fix had been committed and reverted back in 2015.

When I was fixing portability problems with VFS paths, I started out by trying to make paths canonical, and that always led to roadblocks. A clang developer told me that clang philosophically does not try to do any regularization of paths. It turns out that's actually key to making VFS paths viable. Since they can truly consist of a mix of styles, there is no "correct" canonical form. Once I took that approach, most of the VFS portability problems were simple to fix without inflicting collateral damage. So I'm not surprised that the 2015 "fix" causes problems.

I'm happy to look at future proposals, and I'll CC myself on that bug report. But since you've said you have other priorities now, I'll treat this patch as dormant.