This patch normalizes filenames in DependencyScanningWorkerFilesystem so that lookup of ignored files works correctly on Windows (where / and \ are equivalent).
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp | ||
---|---|---|
168–169 | I'm a bit nervous about the impact of modifying the input filename on Windows before passing it into other APIs. This could change behaviour of lower layers of the VFS (since they'll see a different filename than when DependencyScanningWOrkerFileSystem is NOT on top of them). Can we restrict this just to what's passed to IgnoredFiles? (Maybe add shouldIgnore() API, which returns false if the set is empty, and then locally converts to native and checks for membership...) It also seems wasteful to be calling sys::path::native and the memcpy all the time, when usually it has no effect. Have you checked whether this affects performance of scanning something big? | |
175–176 | Looking at this, makes me wonder if this is just fixing a specific instance of a more general problem. Maybe IgnoredFiles should be a set of FileEntrys instead of StringRefs... but that'd create a different performance bottleneck when the set is big, since creating the FileEntrys would be expensive. We'd want the FileEntry lookup to be globally cached / etc. -- and FileManager isn't quite safe to use globally. Do you think IgnoredFiles as-is will work well enough for where it'll be used for PCH? Or do we need to catch headers referenced in two different ways somehow? |
clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp | ||
---|---|---|
168–169 | Yeah, I can see that path changing between VFS layers can be problematic. I'm pretty sure we can get away with only converting Filename to its native form when interacting with IgnoredFiles. I haven't checked the performance impact. If it ends up being measurable, I could implement something like sys::path::is_native and avoid the copy most of the time on unix-like OSes. WDYT? | |
175–176 | I think we could use llvm::sys::fs::UniqueID instead of the filename to refer to files. Since the VFS layer resolves symlinks when stat-ing a file, that should be a canonical file identifier. I can tackle that in a follow up patch. |
clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp | ||
---|---|---|
168–169 | Probably it'll end up not being measurable, but if it is, something like is_native might help... that said, if this will eventually be replaced with logic relyin on fs::UniqueID it might not be worth optimizing. | |
175–176 | Yup, a unique ID should work for a file identifier. I'm concerned about the cost of looking up the unique ID — avoiding stat traffic was measured to be an important performance benefit in the dependency scanner model. To avoid a perf regression, I think you could use caches like:
Where "ids" and "originals" are read/cached in lock-step when accessing a filename, additionally computing "minimized" if not in the ignore-list. (Adding a file to the ignore-list would put content in "ids" and "originals".) The goal is to amortize the stat cost across the lifetime of the service while ensuring a consistent view of the file content. WDYT? ... regardless I think all of this is out of scope for the current patch, which is still useful for unblocking adding tests to the subsequent patches in the stack. |
With the call to llvm::sys::path::native scoped only to IgnoredFiles, would this patch LGTY?
clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp | ||
---|---|---|
168–169 | Agreed. | |
175–176 | Yes, this is the cache structure I had in mind. I agree that this should be tackled in a follow-up patch. I'm going to create a patch with xfailing test case that demonstrates how one file with two different names (e.g. symlink) can cause issues with the current approach. |
Yes, this LGTM once you update that.
clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp | ||
---|---|---|
175–176 | Might be nice to include that XFAIL'ed test in this patch, as well as a FIXME in the code, documenting the general problem. But if you'd rather land that separately/after it's fine with me. |
I'm a bit nervous about the impact of modifying the input filename on Windows before passing it into other APIs. This could change behaviour of lower layers of the VFS (since they'll see a different filename than when DependencyScanningWOrkerFileSystem is NOT on top of them).
Can we restrict this just to what's passed to IgnoredFiles? (Maybe add shouldIgnore() API, which returns false if the set is empty, and then locally converts to native and checks for membership...)
It also seems wasteful to be calling sys::path::native and the memcpy all the time, when usually it has no effect. Have you checked whether this affects performance of scanning something big?