Support: Have directory_iterator::status() return FindFirstFileEx/FindNextFile results on Windows.
ClosedPublic

Authored by pcc on Oct 9 2017, 9:36 PM.

Details

Summary

This allows clients to avoid an unnecessary fs::status() call on each
directory entry. Because the information returned by FindFirstFileEx
is a subset of the information returned by a regular status() call,
I needed to extract a base class from file_status that contains only
that information.

On my machine, this reduces the time required to enumerate a ThinLTO
cache directory containing 520k files from almost 4 minutes to less
than 2 seconds.

Diff Detail

Repository
rL LLVM
pcc created this revision.Oct 9 2017, 9:36 PM

Just a drive-by comment.

clang-tools-extra/modularize/CoverageChecker.cpp
250 ↗(On Diff #118314)

Just a thought...

Is there much value in explicitly declaring the return type, especially since it happens a lot in this patch? I know the LLVM Coding Standards don't want us to always use auto, but, here, the type is so garrulous that it kind of obscures what's going on. Consider:

const auto ErrorOrStatus = I->status();
if (!ErrorOrStatus)
  return false;
ss::fs::file_type Type = ErrorOrStatus->type();

Another thought is that ErrorOr<> might be overkill for methods that get the file or directory status. Perhaps any error in retrieving the status should be represented in the status object itself. The most likely error is probably that the file or directory doesn't exist. Given that there's an exists() function that takes a status, it seems status needs to be able to represent some error conditions anyway.

pcc added inline comments.Oct 10 2017, 1:04 PM
clang-tools-extra/modularize/CoverageChecker.cpp
250 ↗(On Diff #118314)
  1. Yeah, it's a little ugly in this file, but in most of the rest of the code we have using declarations which mitigate this somewhat. I don't feel so strongly about it that I'd want to deviate from the style guide.
  1. Maybe. The alternative solution, of course, would be to remove the ability to represent error conditions in file_status. It may be worth looking at which of those makes the code more readable, but it seems beyond the scope of this patch.

4 minutes to 2 seconds is pretty impressive!

clang-tools-extra/modularize/CoverageChecker.cpp
250 ↗(On Diff #118314)

Eh, the style guide says "use auto where it makes the code more readable and easier to maintain". Seems to fit the bill here. That said, it's also minor and like you I don't feel strongly enough to push one way or the other.

llvm/include/llvm/Support/FileSystem.h
169 ↗(On Diff #118314)

Should this be explicit?

235 ↗(On Diff #118314)

Should this be marked explicit?

413 ↗(On Diff #118314)

I just noticed all of these functions pass by value. I can't see a good reason for that, so can you make them pass by const reference (especially since now you'll be slicing otherwise).

llvm/lib/Support/CachePruning.cpp
186 ↗(On Diff #118314)

Unrelated, but I find it surprising this is using a std::set. Wouldn't you get better performance by using a StringMap<uint64_t>? or a DenseSet<std::pair<uint64_t, std::string>>?

pcc updated this revision to Diff 118477.Oct 10 2017, 2:13 PM
  • Address review comments
llvm/include/llvm/Support/FileSystem.h
169 ↗(On Diff #118314)

Makes sense, done.

235 ↗(On Diff #118314)

Done.

413 ↗(On Diff #118314)

Done.

llvm/lib/Support/CachePruning.cpp
186 ↗(On Diff #118314)

I think we need to keep the set ordered by size for determinism (below we end up deleting in reverse set order, i.e. in decreasing order of size), and neither of those data structures will keep that ordering, I believe. I've added a comment here.

zturner accepted this revision.Oct 10 2017, 2:37 PM
This revision is now accepted and ready to land.Oct 10 2017, 2:37 PM
This revision was automatically updated to reflect the committed changes.