This is an archive of the discontinued LLVM Phabricator instance.

[VFS] Add a VFS wrapper which avoids opening nonexistent files by reading dirs.
Needs ReviewPublic

Authored by sammccall on Sep 26 2018, 6:49 AM.

Details

Summary

This is intended to reduce the number of syscalls when building large
translation units that have long include paths.

This is a WIP for early feedback:

  • obviously it needs tests
  • it will not be on by default
  • the logging and extra instrumentation is temporary to help me tune it

Particularly interested in how to express this without hundreds of
nested switch statements.

Diff Detail

Event Timeline

sammccall created this revision.Sep 26 2018, 6:49 AM
sammccall updated this revision to Diff 167126.Sep 26 2018, 6:50 AM

remove commented code

As noted, this is really early stage, but would appreciate design feedback (I think both of you have spent some time fighting VFS by now...)

This looks like a great improvement. The design looks reasonable to me. The cache states were a bit confusing at first, but I think more documentation when polishing the prototype could help.

I'd be really interested in seeing some profiles before/after this optimization e.g. for clang or clangd.

lib/Basic/AvoidStatsVFS.cpp
45

LessStatFS sounds more accurate lol

47

The comment is very clear, but it's hard to tell how it relates to ReadDirThreshold = 3.

187

Is there any overhead for reading all parent directories, e.g. when directories are large? Or would they be read anyway somewhere else?

sammccall updated this revision to Diff 167435.Sep 28 2018, 1:44 AM

Tests, tweaks to fix edge cases.

sammccall marked 2 inline comments as done.Sep 28 2018, 1:48 AM

Though logs looked good, actual profiles and straces shows we're still stat()ing every file in listed directories.
D44960/rL329338 is the culprit, will make a new patch to fix that.

lib/Basic/AvoidStatsVFS.cpp
45

Haha, renamed to AvoidStatsVFS to match the filename.
(Happy if you prefer another name but they should be consistent I think)

187

There are typically fewer ancestors than children, so this is not too bad.
That said, for /a/long/path/that/doesnt/fork/file.{cc,h} this is suboptimal.

Tweaked the threshold logic so it properly "cascades": if you access both file.h and file.cc then it'll read .../fork/ but not the parent directory yet. (See the new logic around AllowIO)

sammccall planned changes to this revision.Oct 2 2018, 1:10 AM
sammccall marked an inline comment as done.

Hmm, after fixing the stat issue, this still doesn't seem to make a big difference on real FSes.
Putting this on hold for now.
We do have VFSes where I think this might be a win, but I'll let them try it out out-of-tree.

sammccall updated this revision to Diff 167888.Oct 2 2018, 1:15 AM

latest snapshot