This is an archive of the discontinued LLVM Phabricator instance.

New option -fwindows-filesystem, affecting #include paths.
Needs ReviewPublic

Authored by simon_tatham on Jun 27 2018, 3:26 AM.
This revision needs review, but there are no reviewers specified.

Details

Reviewers
None
Summary

This option interposes a wrapper implementation of VirtualFileSystem
in front of the one in the CompilerInstance. The wrapper filesystem
differs from the standard one in that it tolerates backslashes as a
path separator even if the native path syntax doesn't, and also looks
up filenames case-insensitively even if the native file system is
case-sensitive.

This is intended to be useful to anyone compiling source code on Unix
whose authors had never prepared it to be compiled on anything but
Windows, so that it uses backslashes and inconsistent filename case in
its #include directives.

One particular example is if you're running clang-cl as a cross-
compiler, to build a Windows application on a Unix build host; another
is if you're developing a Unix port of a Windows code base in a
downstream clone of their repository. In either case, if you can't get
the upstream authors to take patches that make their #include
directives more portable, you might instead use this option to get
clang to tolerate the non-portable ones.

Diff Detail

Event Timeline

simon_tatham created this revision.Jun 27 2018, 3:26 AM

As I mentioned on llvm-dev, this is an unfinished draft. I'm interested in review comments, but I already know things like 'needs more comments / tests / careful error handling' :-)

Some 'known unknowns':

  • I'm caching the results of every directory scan, to save effort when more include files are needed from the same directory. That assumes that the lifetime of this filesystem object is short enough that we don't have to worry about new files appearing in directories we've already looked at, which seems like a reasonable assumption in an ordinary compile, but I don't know if the same filesystem abstraction might be used in completely different contexts elsewhere in the LLVM ecosystem and have a longer lifetime. Do I need to take a lot more care with my cache persistence?
  • I was wondering if it might be worth having a second command-line option, to turn on only the case-insensitivity and not the backslash-separator tolerance (e.g. for use on source code developed for Mac rather than Windows).
  • The option name is the first one that sprang to mind and I'm expecting someone will probably suggest a better one.
thakis added a subscriber: thakis.Jun 27 2018, 6:08 AM

There are 2 other patches out there for the case sensitivity. Neither landed, because the performance hit form this approach is pretty big, and it's not necessary: You can either put the Win SDK into a ciopfs mount (example: https://cs.chromium.org/chromium/src/build/vs_toolchain.py?type=cs&q=ciopfs&sq=package:chromium&g=0&l=377 , used by chrome's win/cross build https://chromium.googlesource.com/chromium/src/+/master/docs/win_cross.md) or use -ivfsoverlay (example: http://llvm-cs.pcc.me.uk/cmake/platforms/WinMsvc.cmake#104 , used by llvm's win/cross build).

Have you measured the perf hit from your approach?

dmajor added a subscriber: dmajor.Jun 27 2018, 6:20 AM

No, I haven't measured it. Partly because that was one of the (many) things I was going to leave until after I didn't get feedback on the first draft that discouraged me from the whole idea :-) and also because I've already thought of one thing I can do to speed it up that I haven't done yet.

(Namely, break the pathname down recursively from the RH end instead of iteratively from the LH, so that in the case where the whole pathname is already case-correct, the lookup can succeed immediately. Then any performance hit should be limited to only the cases that actually need fixing up.)

I can see that if you're on Linux specifically then a FUSE filesystem can solve the case insensitivity, albeit in a way that needs setup and teardown. But surely backslash path separators are still problematic in that situation? (Or does ciopfs have an option I didn't spot to fudge those too?)

The things I pointed at only do case-insensitivity. I haven't seen anything using backslashes in the builds I worked on. (Is that for -I flags?) I'd think that the backslash bits can probably be implemented with less overhead.

FWIW, I'm also interested in this. I wasn't aware of the other ways of achieving this used by chromium, but although neat, I'd prefer something that doesn't require that kind of setup.

As for performance - although I haven't studied how the code works - wouldn't it be possible to try the normal straightforward case sensitive open first? Then if that fails, we'd fall back on trying doing a case insensitive search (in either direction). That way, the performance hit only affects compilations that would otherwise fail, so it shouldn't be prohibitive for anything else than the specific headers that are problematic. Or perhaps that's already been taken into consideration?

Adding some people who are interested in Windows cross-compilation.

@thakis : no, the backslash support is needed for include directives in source files, not just on command lines, because in my experience it's not unusual for Windows-only code bases to be full of things like #include "Subdir\Header.h".

@mstorsjo : in this draft, I traverse the pathname from left to right, and at each level, if the normal case-sensitive lookup succeeds then I don't scan the whole directory looking for other options. As I mentioned above, though, I've since realised I should have done it the other way round, trying the case sensitive lookup on the whole pathname in one go first, and not even trying it one level at a time unless that fails.

Although, come to think of it, that's not good enough, because if you have multiple directories on your include path then you expect a lot of lookups to fail for reasons that have nothing to do with case. Say, if you compile with -Ifoo -Ibar, then every include of a file intended to be found in bar will force lots of pointless directory enumeration in foo when the initial lookup there doesn't find the file. Hmmm. That suggests to me that perhaps this ought not to be a global -ffudge-my-paths type option applying to all include directories, and instead perhaps it should be a new kind of -I option, so that only lookups relative to that particular include-path directory would get the unusual semantics. What do people think?

Although, come to think of it, that's not good enough, because if you have multiple directories on your include path then you expect a lot of lookups to fail for reasons that have nothing to do with case. Say, if you compile with -Ifoo -Ibar, then every include of a file intended to be found in bar will force lots of pointless directory enumeration in foo when the initial lookup there doesn't find the file. Hmmm. That suggests to me that perhaps this ought not to be a global -ffudge-my-paths type option applying to all include directories, and instead perhaps it should be a new kind of -I option, so that only lookups relative to that particular include-path directory would get the unusual semantics. What do people think?

The cases where I would have needed this (the windows sdk), I'm not specifying that path explicitly via -I directives, but implicitly via the INCLUDE env variable, but perhaps limiting this to the directories known to have problematic case could be helpful performance wise.

Otherwise I guess this would need to go on a higher level in the algorithm; first check all directories in a fully case sensitive manner, just like today. And if that fails (and we'd have a real failure that we'd otherwise return to the caller), redo it all with case insensitivity. I think that would be the best performance wise, but that makes for much messier integration...

first check all directories in a fully case sensitive manner, just like today. And if that fails (and we'd have a real failure that we'd otherwise return to the caller), redo it all with case insensitivity.

I agree that the integration would be a bigger headache doing it that way. It would also depart from the expected semantics – a correctly cased header file in an include directory late on the path would override an incorrectly cased one earlier, which isn't how the real Windows compiler would handle the same situation.

The cases where I would have needed this (the windows sdk), I'm not specifying that path explicitly via -I directives, but implicitly via the INCLUDE env variable

*blush* I ought to have thought of that, because I've noticed the same problem!

but perhaps limiting this to the directories known to have problematic case could be helpful performance wise.

Yes. OK, in that case I think my proposed replacement design is to have an option along the lines of -fwindows-paths=prefix (spelling still open to change), with the effect that only pathname lookups starting with that prefix will be subject to this transformation.