This is an archive of the discontinued LLVM Phabricator instance.

Add support for case-insensitive header lookup
Needs ReviewPublic

Authored by hans on Jun 7 2016, 6:12 PM.

Details

Reviewers
thakis
rnk
rsmith
Summary

This is useful when dealing with headers that are normally used on case-insensitive filesystems, such as the Windows SDK, when cross-compiling from a file-system that is case-sensitive (such as Linux). A prime example would be compiling "#include <windows.h>" (the file is really called Windows.h).

There has been a patch for this before (http://reviews.llvm.org/D2972), but this one is more general, as it works on the virtual filesystem layer, and also supports case-insensitive lookups of parent directories not just the filename.

I was initially worried about performance here. The most common case is trying to lookup a header file in the wrong include path, and instead of a simple stat, this patch will cause a full directory listing for each such attempt.

I tried a version of this patch that used a small Bloom filter for each directory to avoid unnecessary searches for non-existing files, but that broke a lot of modules tests. It seems those tests would end up writing to a module cache directory, but the vfs wasn't aware of those writes so subsequent attempts to open files in that directory would fail because of the filter (yay, cache invalidation). Maybe that could be fixed, and maybe such filtering would be useful for broadly too, but perhaps we could look at that separately.

I measured compile-time of a file in V8 on Linux with and without case-sensitive includes, and also with the filter:

Case sensitive: (default)   8.60s +- 2.90%
Case insensitive:           8.74s +- 1.18%
Case insensitive w/ filter: 8.46s +- 0.82%

Adding some folks who were on Saleem's patch.

Diff Detail

Event Timeline

hans updated this revision to Diff 59980.Jun 7 2016, 6:12 PM
hans retitled this revision from to Add support for case-insensitive header lookup.
hans updated this object.
hans added reviewers: rsmith, rnk, thakis.
hans updated this revision to Diff 59982.Jun 7 2016, 6:17 PM

Add --show-includes test.

This looks good to me. Thanks for picking this up! And thanks for the perf numbers!

lib/Basic/VirtualFileSystem.cpp
401

Probably can inline this in the header.

410

Thats slightly difficult to read (the E directory_iterator.

481

These two can probably be inlined into the header.

hans updated this revision to Diff 60054.Jun 8 2016, 9:55 AM
hans marked 2 inline comments as done.

Addressing Saleem's comments, and renaming the flag to -fcase-insensitive-paths, since this doesn't apply just to includes, but anything that goes through vfs, including the main source filename.

thakis edited edge metadata.Jun 8 2016, 10:15 AM

For performance: Can you check how build times for some large target in Chromium on Linux targeting Windows compares with this vs having the sdk in a fat mount? That would give us some data.

The discussion in the include case warning thread sounds like MS might update its headers to have #include lines matching actual file case, so hopefully this won't be needed at all eventually.

hans added a comment.Jun 8 2016, 11:29 AM

For performance: Can you check how build times for some large target in Chromium on Linux targeting Windows compares with this vs having the sdk in a fat mount? That would give us some data.

I measured compile times (perf stat -r5) of the same V8 file as above on Linux, but now targeting Windows:

SDK in vfat-mounted file:                       5.84s +- 1.91%
SDK on normal fs with -fcase-insensitive-paths: 7.18s +- 3.06%
As above, with directory bloom filters:         6.08s +- 0.76%

This makes my patch look slower, but it's a lot more convenient than setting up that file system.

The discussion in the include case warning thread sounds like MS might update its headers to have #include lines matching actual file case, so hopefully this won't be needed at all eventually.

That would be pretty cool!

lib/Basic/VirtualFileSystem.cpp
410

You're right. Moving the declaration of E to its own line further down.

Can you try building a few more files? Say, v8_base?

hans added a comment.Jun 8 2016, 12:54 PM

Can you try building a few more files? Say, v8_base?

Well, that was depressing:

Putting the sdk on a vfat fs:
real    2m26.077s
user    68m31.476s
sys     1m25.702s

Using the flag:
real    9m5.179s
user    69m3.417s
sys     212m47.136s  <--- !

Using the flag with bloom filters:
real    3m1.046s
user    69m5.328s
sys     19m4.368s

It seems the filesystem wasn't so happy about 32 processes doing a lot of readdir() at the same time.

Not sure if we want a flag that adds 50% overhead, no matter how convenient it might be :-/

hans added a comment.Jun 8 2016, 5:06 PM

Not sure if we want a flag that adds 50% overhead, no matter how convenient it might be :-/

I now have a version of the patch that caches directory contents that it lists:

real    2m31.461s
user    68m42.090s
sys     1m51.707s

While the sys time is a little bit higher, real time is on par with putting the SDK on FAT, so I think this is still worth pursuing. I need to clean up the patch and figure out how to invalidate the cache after module things are written, though.

hans updated this revision to Diff 60220.Jun 9 2016, 11:53 AM
hans edited edge metadata.

Adding the version that caches directory contents.

This has the problem the it doesn't play with modules, the rewriter, and possibly others, because they write to the file system without any way for the vfs to find out. I've tried to find a good way to fix that, but didn't come up with anything. One way would be to maintain a global counter like llvm::sys::fileSystemChangeCount which when changed would invalidate the maps, but I don't know if that would be acceptable.

sheu added a subscriber: sheu.Sep 9 2016, 5:18 PM

I'd be very interested in seeing this patch happen. What's the current status here?

Also, w.r.t. the cache invalidation problem -- would it be feasible / a good idea to move users of the FileSystem API to VirtualFileSystem, in general? This would have the nice effect of putting filesystem things all in once place. Then invalidation tracking etc. will be easier.

I'm waiting for this to land as well, it's crucial for coding windows code with libclang assistance on a linux machine.

pcc added a subscriber: pcc.Dec 19 2016, 7:23 PM
pcc added inline comments.
lib/Basic/VirtualFileSystem.cpp
483

I wonder whether it would help with performance to try the original path here first, then do the case insensitive stuff only if that failed.

hans added a comment.Jan 9 2017, 10:15 AM

I'm waiting for this to land as well, it's crucial for coding windows code with libclang assistance on a linux machine.

I'm not actively working on this, so it might take a while.

In the meantime, I recommend mounting a case-insensitive file system on your linux machine.

Godin added a subscriber: Godin.Dec 4 2018, 5:02 PM
orivej added a subscriber: orivej.Sep 4 2019, 7:33 PM
Yoav added a subscriber: Yoav.Jul 6 2021, 12:56 AM

Any progress with this? @hans its been stale for a while now and this feature is crucial

hans added a comment.Jul 6 2021, 3:21 AM

Any progress with this? @hans its been stale for a while now and this feature is crucial

No progress, I'm afraid. I'm not working on this, so someone would have to take over the patch to push it forward.

nitsky added a subscriber: nitsky.Dec 30 2021, 8:34 AM