This is an archive of the discontinued LLVM Phabricator instance.

Added Support for StatOnly Files in VFS.
Needs ReviewPublic

Authored by usaxena95 on Sep 27 2018, 9:52 AM.

Details

Summary

Some files are only Statted by Clang and not read. Clang mostly uses
them for checking the existence of some files and in rare cases uses the
value of the Status to proceed further (for example while loading
module files, it checks the sizes of some files).

This change adds support to represent files that are supposed to be only
statted by Clang.

Diff Detail

Event Timeline

usaxena95 created this revision.Sep 27 2018, 9:52 AM

Could you give some more detail about what this is for?

  • it sounds like this is for record-replay. In the replayed compilation, we assume the accessed files are the same, is it safe to assume we never read files we previously stat()ed? What are these files in practice?
  • it seems like this could also be achieved with an overlayFS adding a simple specialized FS that only provides the stat-only files. If this is a relatively niche feature, cramming it into InMemoryFileSystem may not be the best option. Any thoughts on the tradeoff here?
include/clang/Basic/VirtualFileSystem.h
414

Please add some motivation here, e.g. "such files are useful when replaying a recorded compilation, if the original compilation never read the file contents"

lib/Basic/VirtualFileSystem.cpp
495

can't you just represent this by a null buffer?

kbobyrev added inline comments.
lib/Basic/VirtualFileSystem.cpp
2097

This used llvm::sort(Container &&C, Compare Comp) before, I would think that the range-based API is the preferred one. There's also a cleanup patch migrating STL-like calls to idiomatic LLVM's STL Extension API: D52576.

it sounds like this is for record-replay.

Yes this is for record-replay purpose.

What are these files in practice?

During loading a modules the files that were required to make the module are stat()ed and their sizes are used to verify that the module is still valid. There are occurrences of more stat() only files (apart from the use in modules) which I am not aware of.

In the replayed compilation, we assume the accessed files are the same, is it safe to assume we never read files we previously stat()ed?

I would call it a requirement instead of an assumption. The replay must be exactly the same (even the file stats and reads). If Clang reads the file in replay which was only stat()ed during compilation, it indicates non-determinism or something wrong (in clang or FS).
We currently deal with such files by adding empty buffers for them based on this assumption/requirement only.

it seems like this could also be achieved with an overlayFS adding a simple specialized FS that only provides the stat-only files. If this is a relatively niche feature, cramming it into InMemoryFileSystem may not be the best option. Any thoughts on the tradeoff here?

We can reuse the InMemoryFS to make another FS which just serves stat-only files. A tradeoff I can think of is: we might need to store a random buffer of required size to make final status of the file correct.

usaxena95 added inline comments.Sep 28 2018, 5:48 AM
lib/Basic/VirtualFileSystem.cpp
2097

I think these are some very recent changes and does not belong to my patch. Will sync these. Thanks for pointing out.

I would call it a requirement instead of an assumption. The replay must be exactly the same (even the file stats and reads). If Clang reads the file in replay which was only stat()ed during compilation, it indicates non-determinism or something wrong (in clang or FS).
We currently deal with such files by adding empty buffers for them based on this assumption/requirement only.

Does adding a flag to clang to skip the module inputs checks seems feasible/reasonable? +1 to Sam's comments, the mode for FS that has the file sizes, but not the file contents seems too specific for the use-case at hand, does not fit well into the design.