This is an archive of the discontinued LLVM Phabricator instance.

Nativize filename in FileManager::getFile().
Needs ReviewPublic

Authored by klimek on Aug 11 2015, 7:15 AM.

Details

Summary

For virtual files to work correctly on Windows, we need to canonicalize
the paths before looking into the caches.

This is basically a request for comments - if we want to go this route, I will
make all FileManager methods canonicalize the paths.

Diff Detail

Event Timeline

klimek updated this revision to Diff 31813.Aug 11 2015, 7:15 AM
klimek retitled this revision from to Nativize filename in FileManager::getFile()..
klimek updated this object.
klimek added reviewers: rsmith, chapuni.
klimek added a subscriber: cfe-commits.

There is also the case insensitivity issue, see

https://llvm.org/bugs/show_bug.cgi?id=17993

The case sensitivity stuff is related, but a much larger problem - while the native paths are system dependent, the case sensitivity is file system dependent - I can have case insensitive file systems on any OS.

rsmith edited edge metadata.Aug 12 2015, 11:45 AM

In principle, normalizing slashes on Windows makes sense here. But we shouldn't use llvm::sys::path::native, because it's just too broken.

lib/Basic/FileManager.cpp
221–222

I have two concerns with this:

  1. It's needlessly inefficient on non-Win32 platforms (we'll make an unnecessary string copy)
  2. It does something bizarre and wrong on non-Win32 platforms (it converts \ to / unless the \ is followed by another \, making most files containing \ inaccessible)

In principle, normalizing slashes on Windows makes sense here. But we shouldn't use llvm::sys::path::native, because it's just too broken.

Ok, so if I understand you correctly it would make sense to create a normalize function for the file manager (it's all about the file manager's caching anyway).

The idea would be that on Windows we replace all / with \, otherwise we leave things as they are?

yaron.keren added inline comments.Aug 13 2015, 1:51 AM
lib/Basic/FileManager.cpp
221–222

2 is surprising. Any idea why native does this on non-Windows platforms?

chapuni edited edge metadata.Aug 13 2015, 3:20 AM

I wish, in clang/llvm, internal path separator would be slash.
It'd be better to use backslash where it'd be really required, for example interface to cmd.exe or MS toolchain.

For me, native-ization is nightmare.

I wish, in clang/llvm, internal path separator would be slash.
It'd be better to use backslash where it'd be really required, for example interface to cmd.exe or MS toolchain.

For me, native-ization is nightmare.

Are you saying that all MS functions we access support / as path separator on the platforms we support, or that we should canonicalize all filenames to / when we first get them, and convert them to \-based when we hit the MS file APIs?

I think she wishes your second option.

Note that windows API actually do accept slash as seperator and need no conversion.

https://en.wikipedia.org/wiki/Path_(computing)#MS-DOS.2FMicrosoft_Windows_style

The issues are with apps that interpret slash as switch, mainly the VC toolchain and with humans that may expect to see backward slashes. Not all apps do. Windows ports of the gnu utils accept slash just fine, so both

ls ../test
ls ..\test

work running from cmd.exe. So lit tests can continue to use slash unless they call the real VC toolchain (in which case they can't run on non-Windows anyhow).

As for the humans, while it may make sense to backward-slash paths when printing diagnostics, the current output is a mix of whatever-was-used-as-input, slashes and backslahses depending on the code path so any standardization - slash or backslash - would be better.

Overall, this makes sense and will simplify code and tests.

I would think most Windows users would be surprised if we showed them paths with /s instead of \s, but I'm fine with us using / internally if it doesn't leak out to the user.

klimek: if we want to nativize, I think the right thing to do is to remove the wrong "normalization" of non-Windows paths from llvm::sys::path::native, and add a more efficient overload to llvm::sys::path::native, such as StringRef native(StringRef Input, SmallVectorImpl<char> &Buffer).

alexr added a subscriber: alexr.Aug 17 2015, 6:57 PM

Just to remind everybody, different OSes handle Unicode differently as well. Think of it as case insensitivity for the different Unicode ways of encoding the same glyph. For example U+00DC (LATIN CAPITAL LETTER U WITH DIAERESIS) vs U+0055 (LATIN CAPITAL LETTER U) and U+0308 (COMBINING DIAERESIS). Apple's HFS+ file system always decomposes file names to ensure consistent matching. Other OSes, not so much.

+Benjamin to share another yak :D

bkramer edited edge metadata.Nov 2 2015, 3:29 PM

Can you try setting up the virtual files with the vfs::InMemoryFileSystem stuff? There are some examples how to set up with OverlayFileSystem in tree.

InMemoryFileSystem was written with windows path separators in mind, I just never tried to run it on windows :)