For virtual files to work correctly on Windows, we need to canonicalize
the paths before looking into the caches.
This is basically a request for comments - if we want to go this route, I will
make all FileManager methods canonicalize the paths.
The case sensitivity stuff is related, but a much larger problem - while the native paths are system dependent, the case sensitivity is file system dependent - I can have case insensitive file systems on any OS.
In principle, normalizing slashes on Windows makes sense here. But we shouldn't use llvm::sys::path::native, because it's just too broken.
I have two concerns with this:
Ok, so if I understand you correctly it would make sense to create a normalize function for the file manager (it's all about the file manager's caching anyway).
The idea would be that on Windows we replace all / with \, otherwise we leave things as they are?
I wish, in clang/llvm, internal path separator would be slash.
It'd be better to use backslash where it'd be really required, for example interface to cmd.exe or MS toolchain.
For me, native-ization is nightmare.
Are you saying that all MS functions we access support / as path separator on the platforms we support, or that we should canonicalize all filenames to / when we first get them, and convert them to \-based when we hit the MS file APIs?
I think she wishes your second option.
Note that windows API actually do accept slash as seperator and need no conversion.
The issues are with apps that interpret slash as switch, mainly the VC toolchain and with humans that may expect to see backward slashes. Not all apps do. Windows ports of the gnu utils accept slash just fine, so both
ls ../test ls ..\test
work running from cmd.exe. So lit tests can continue to use slash unless they call the real VC toolchain (in which case they can't run on non-Windows anyhow).
As for the humans, while it may make sense to backward-slash paths when printing diagnostics, the current output is a mix of whatever-was-used-as-input, slashes and backslahses depending on the code path so any standardization - slash or backslash - would be better.
Overall, this makes sense and will simplify code and tests.
I would think most Windows users would be surprised if we showed them paths with /s instead of \s, but I'm fine with us using / internally if it doesn't leak out to the user.
klimek: if we want to nativize, I think the right thing to do is to remove the wrong "normalization" of non-Windows paths from llvm::sys::path::native, and add a more efficient overload to llvm::sys::path::native, such as StringRef native(StringRef Input, SmallVectorImpl<char> &Buffer).
Just to remind everybody, different OSes handle Unicode differently as well. Think of it as case insensitivity for the different Unicode ways of encoding the same glyph. For example U+00DC (LATIN CAPITAL LETTER U WITH DIAERESIS) vs U+0055 (LATIN CAPITAL LETTER U) and U+0308 (COMBINING DIAERESIS). Apple's HFS+ file system always decomposes file names to ensure consistent matching. Other OSes, not so much.
Can you try setting up the virtual files with the vfs::InMemoryFileSystem stuff? There are some examples how to set up with OverlayFileSystem in tree.
InMemoryFileSystem was written with windows path separators in mind, I just never tried to run it on windows :)