For virtual files to work correctly on Windows, we need to canonicalize
the paths before looking into the caches.
This is basically a request for comments - if we want to go this route, I will
make all FileManager methods canonicalize the paths.
Differential D11944
Nativize filename in FileManager::getFile(). klimek on Aug 11 2015, 7:15 AM. Authored by
Details
Diff Detail Event TimelineComment Actions The case sensitivity stuff is related, but a much larger problem - while the native paths are system dependent, the case sensitivity is file system dependent - I can have case insensitive file systems on any OS. Comment Actions In principle, normalizing slashes on Windows makes sense here. But we shouldn't use llvm::sys::path::native, because it's just too broken.
Comment Actions Ok, so if I understand you correctly it would make sense to create a normalize function for the file manager (it's all about the file manager's caching anyway). The idea would be that on Windows we replace all / with \, otherwise we leave things as they are?
Comment Actions I wish, in clang/llvm, internal path separator would be slash. For me, native-ization is nightmare. Comment Actions Are you saying that all MS functions we access support / as path separator on the platforms we support, or that we should canonicalize all filenames to / when we first get them, and convert them to \-based when we hit the MS file APIs? Comment Actions I think she wishes your second option. Note that windows API actually do accept slash as seperator and need no conversion. https://en.wikipedia.org/wiki/Path_(computing)#MS-DOS.2FMicrosoft_Windows_style The issues are with apps that interpret slash as switch, mainly the VC toolchain and with humans that may expect to see backward slashes. Not all apps do. Windows ports of the gnu utils accept slash just fine, so both ls ../test ls ..\test work running from cmd.exe. So lit tests can continue to use slash unless they call the real VC toolchain (in which case they can't run on non-Windows anyhow). As for the humans, while it may make sense to backward-slash paths when printing diagnostics, the current output is a mix of whatever-was-used-as-input, slashes and backslahses depending on the code path so any standardization - slash or backslash - would be better. Overall, this makes sense and will simplify code and tests. Comment Actions I would think most Windows users would be surprised if we showed them paths with /s instead of \s, but I'm fine with us using / internally if it doesn't leak out to the user. klimek: if we want to nativize, I think the right thing to do is to remove the wrong "normalization" of non-Windows paths from llvm::sys::path::native, and add a more efficient overload to llvm::sys::path::native, such as StringRef native(StringRef Input, SmallVectorImpl<char> &Buffer). Comment Actions Just to remind everybody, different OSes handle Unicode differently as well. Think of it as case insensitivity for the different Unicode ways of encoding the same glyph. For example U+00DC (LATIN CAPITAL LETTER U WITH DIAERESIS) vs U+0055 (LATIN CAPITAL LETTER U) and U+0308 (COMBINING DIAERESIS). Apple's HFS+ file system always decomposes file names to ensure consistent matching. Other OSes, not so much. Comment Actions Can you try setting up the virtual files with the vfs::InMemoryFileSystem stuff? There are some examples how to set up with OverlayFileSystem in tree. InMemoryFileSystem was written with windows path separators in mind, I just never tried to run it on windows :) |
I have two concerns with this: