This is an archive of the discontinued LLVM Phabricator instance.

[Support] Listing a directory containing dangling symlinks is not an error.
ClosedPublic

Authored by sammccall on Sep 28 2018, 7:16 AM.

Details

Diff Detail

Repository
rL LLVM

Event Timeline

sammccall created this revision.Sep 28 2018, 7:16 AM
vsk added a comment.Sep 28 2018, 9:00 AM

Thanks for the patch.

include/llvm/Support/FileSystem.h
1259 ↗(On Diff #167471)

I'm struggling to understand how work is saved here. Do you mind explaining in some more detail? Specifically, I'm not sure why determining file_type is cheaper than getting a basic_file_status. Don't both stat()?

tools/llvm-cov/CodeCoverage.cpp
218 ↗(On Diff #167471)

Do all clients of the directory iterator need to check status(), or is llvm-cov the weird one?

unittests/Support/Path.cpp
887 ↗(On Diff #167471)

Nice!

Dor1s added inline comments.Sep 28 2018, 9:24 AM
include/llvm/Support/FileSystem.h
1220–1221 ↗(On Diff #167471)

is this comment still relevant?

1263 ↗(On Diff #167471)

Can this status be different from the value obtained on line 1259?

Thanks for looking at this, good questions! I could have given more background.

Are there places you think better doc comments would help?

include/llvm/Support/FileSystem.h
1259 ↗(On Diff #167471)

since r342232 file_type is often better:

  • on linux/bsd/mac with most FSs, you get d_type as part of the struct dirent, so file_type is free when traversing, while basic_file_status costs a stat()
  • on solaris (and others?), d_type is not present. On some filesystems (e.g. certain configurations of XFS), d_type is "unknown". In these cases type() falls back to status(), so this change is a no-op.
  • on windows, both are free (you get basic_file_status as part of the directory listing)
tools/llvm-cov/CodeCoverage.cpp
218 ↗(On Diff #167471)

I don't think llvm-cov particularly needs to check or warn here, I'm just trying to preserve the existing behavior as I'm not so familiar with it.

The original bug ("llvm-cov hangs") was a problem in recursive_directory_iterator, not llvm-cov itself. When it hit a dangling symlink (or anything that failed to stat), increment() would become a no-op and return no error code.

D44960 fixed this by having the iterator report the stat() failure as an error_code, and continue to advance. At this point the natural loop works fine. I believe the changes to llvm-cov just add a warning, and prevent the dangling symlink itself from being added as a collected path. (The latter shouldn't matter for most programs, but I preserved it to be safe).

Reporting the stat() error code had a kind of logic: stat() produces ErrorOr<file_status> and directory_iterator yielded (basic_file_status, error_code) so it seems to fit. But having directory_iterator return something closer to (path, type) and not error on dangling links is a better match for most OSes.

I don't think this changed API is error-prone: for a dangling symlink you get a directory_entry with just a path, and to do anything you must open/stat it etc which can fail. So it gets handled in the same way the application handles files you can see but can't read for any other reason, which seems appropriate.

sammccall added inline comments.Sep 28 2018, 9:41 AM
include/llvm/Support/FileSystem.h
1220–1221 ↗(On Diff #167471)

In both directory_iterator and recursive_directory_iterator I can't tell if this comment refers to the above members or is a placeholder for future expansion. But it's been this way for at least 8 years. I'll remove them.

1263 ↗(On Diff #167471)

Line 1259 gets the type (not status) from the directory listing, which will be symlink_file if the thing in the dir is a symlink.

The stat() is a resolving stat on the symlink (notice here type is symlink_file and Follow is true), so the new type is of the symlink *target*. This allows us to traverse into it if it's a directory.

sammccall updated this revision to Diff 167502.Sep 28 2018, 9:43 AM

Zap unclear comments, clarify extra stat.

vsk accepted this revision.Sep 28 2018, 10:18 AM
vsk added inline comments.
include/llvm/Support/FileSystem.h
1259 ↗(On Diff #167471)

Thanks for the explanation, this change makes sense to me now.

1257 ↗(On Diff #167502)

Might be worth noting here that on most OSes, checking the type of a file doesn't cost a stat().

tools/llvm-cov/CodeCoverage.cpp
218 ↗(On Diff #167471)

I agree that the API change is safe. llvm-cov can handle situations in which it can't read a file. Not collecting paths to broken symlinks might actually be a problem, though, because it means summary statistics might not be emitted for files that can't (& don't need to) be read. I'll look into this as a follow-up -- for this patch, preserving the current behavior sgtm.

This revision is now accepted and ready to land.Sep 28 2018, 10:18 AM
Dor1s accepted this revision.Sep 28 2018, 3:52 PM
Dor1s added inline comments.
include/llvm/Support/FileSystem.h
1263 ↗(On Diff #167471)

Thanks for clarifying!

This revision was automatically updated to reflect the committed changes.