This is an archive of the discontinued LLVM Phabricator instance.

[Support/FileSystem] Add sub-second precision for atime/mtime of sys::fs::file_status on unix platforms
ClosedPublic

Authored by akyrtzi on Nov 21 2018, 11:45 PM.

Details

Summary

Adds nanosecond precision for the time reported for getLastAccessedTime() and getLastModificationTime() for unix platforms that support improved precision.
Previously those functions would only provide times in 1 second resolution, even when the underlying file system could provide more precise times than that.

Also add some comments to make sure people are aware that the resolution of times can vary across different file systems.

Diff Detail

Event Timeline

akyrtzi created this revision.Nov 21 2018, 11:45 PM
akyrtzi updated this revision to Diff 175062.Nov 22 2018, 10:32 AM

Moved the toTimePoint conversion function to llvm/Support/Chrono.h so it can be broadly available.

kristina requested changes to this revision.EditedNov 25 2018, 12:10 AM

Do we really need nanosecond precision with regards to this? The general changes to the support libraries are usually facilitated because of missing functionality that is required somewhere else and it being the most logical place to add it. If so please attach related revisions to this one (through "Edit Related Revisions") so it's more clear why this was added. However, if there's no consumer for such API (pending review) there is no point in adding it. The main spirit of the library is to only provide what LLVM and related projects require, I don't think nanosecond precision support falls within that category (unless what I said earlier is the case, in which case please link it to this revision).

This revision now requires changes to proceed.Nov 25 2018, 12:10 AM

The general changes to the support libraries are usually facilitated because of missing functionality that is required somewhere else and it being the most logical place to add it. If so please attach related revisions to this one (through "Edit Related Revisions") so it's more clear why this was added

There have been enhancements to file_status that added more info without the requirement you stated (attach related revisions to this one), like:
https://reviews.llvm.org/D31110
https://reviews.llvm.org/D18456

This patch makes no changes to the existing API so there's no specific place that I can point to, but it improves accuracy of *every* site that calls getLastModification() and uses its value for comparisons, in and out-of-tree. It addresses 'false-negatives' (mistakenly considering that the mod time did not change because the change happened within 1 second) and 'false-positives' (when considering same mod time as 'changed' so as to be conservative against sub-second changes).
For some examples:
https://github.com/apple/swift-clang/blob/95cdf7c9af0dc102a3fbda4ae1f6265026ff2e30/lib/Frontend/PrecompiledPreamble.cpp#L486
https://github.com/apple/swift-clang/blob/95cdf7c9af0dc102a3fbda4ae1f6265026ff2e30/lib/Driver/Driver.cpp#L1132
https://github.com/apple/swift-clang/blob/95cdf7c9af0dc102a3fbda4ae1f6265026ff2e30/lib/Driver/ToolChains/Clang.cpp#L2623
https://github.com/apple/swift-clang/blob/95cdf7c9af0dc102a3fbda4ae1f6265026ff2e30/lib/Serialization/ModuleManager.cpp#L156
https://github.com/apple/swift-clang/blob/95cdf7c9af0dc102a3fbda4ae1f6265026ff2e30/lib/Index/IndexUnitWriter.cpp#L252
https://github.com/apple/swift-clang/blob/95cdf7c9af0dc102a3fbda4ae1f6265026ff2e30/lib/DirectoryWatcher/DirectoryWatcher-linux.inc.h#L33

Apart from improving accuracy of mod time comparisons inside clang and its libraries themselves, mod times originating from clang libraries get reported up to our indexing system where we could take advantage of sub-second accuracy, for the same kind of benefits.

if there's no consumer for such API (pending review) there is no point in adding it.

To make sure it's clear, this is not a new API. getLastModificationTime() already reports a a timepoint in nanoseconds, but it is always in second intervals. This patch allows it to include the nanoseconds that are part of the underlying file status.

The general changes to the support libraries are usually facilitated because of missing functionality that is required somewhere else and it being the most logical place to add it. If so please attach related revisions to this one (through "Edit Related Revisions") so it's more clear why this was added

There have been enhancements to file_status that added more info without the requirement you stated (attach related revisions to this one), like:
https://reviews.llvm.org/D31110
https://reviews.llvm.org/D18456

This patch makes no changes to the existing API so there's no specific place that I can point to, but it improves accuracy of *every* site that calls getLastModification() and uses its value for comparisons, in and out-of-tree. It addresses 'false-negatives'

(mistakenly considering that the mod time did not change because the change happened within 1 second)

I think this is the most important bit of information here. It should have been in the description of the differential.
If the current precision was 1ms (or even 1us), then yes, going all the way down to 1ns may not be important/worthwhile.
But 1 sec precision is clearly laughably not great, and should be improved.

and 'false-positives' (when considering same mod time as 'changed' so as to be conservative against sub-second changes).
For some examples:
https://github.com/apple/swift-clang/blob/95cdf7c9af0dc102a3fbda4ae1f6265026ff2e30/lib/Frontend/PrecompiledPreamble.cpp#L486
https://github.com/apple/swift-clang/blob/95cdf7c9af0dc102a3fbda4ae1f6265026ff2e30/lib/Driver/Driver.cpp#L1132
https://github.com/apple/swift-clang/blob/95cdf7c9af0dc102a3fbda4ae1f6265026ff2e30/lib/Driver/ToolChains/Clang.cpp#L2623
https://github.com/apple/swift-clang/blob/95cdf7c9af0dc102a3fbda4ae1f6265026ff2e30/lib/Serialization/ModuleManager.cpp#L156
https://github.com/apple/swift-clang/blob/95cdf7c9af0dc102a3fbda4ae1f6265026ff2e30/lib/Index/IndexUnitWriter.cpp#L252
https://github.com/apple/swift-clang/blob/95cdf7c9af0dc102a3fbda4ae1f6265026ff2e30/lib/DirectoryWatcher/DirectoryWatcher-linux.inc.h#L33

Apart from improving accuracy of mod time comparisons inside clang and its libraries themselves, mod times originating from clang libraries get reported up to our indexing system where we could take advantage of sub-second accuracy, for the same kind of benefits.

On second look, if you are concerned with adding additional fields in basic_file_status, I can replace:

time_t fs_st_atime = 0;
time_t fs_st_mtime = 0;

with uint64_t's which are adequate to include nanosecond precision, while these time_t's are 64bits as well but waste space in the structure.

The general changes to the support libraries are usually facilitated because of missing functionality that is required somewhere else and it being the most logical place to add it. If so please attach related revisions to this one (through "Edit Related Revisions") so it's more clear why this was added

There have been enhancements to file_status that added more info without the requirement you stated (attach related revisions to this one), like:
https://reviews.llvm.org/D31110
https://reviews.llvm.org/D18456

This patch makes no changes to the existing API so there's no specific place that I can point to, but it improves accuracy of *every* site that calls getLastModification() and uses its value for comparisons, in and out-of-tree. It addresses 'false-negatives' (mistakenly considering that the mod time did not change because the change happened within 1 second) and 'false-positives' (when considering same mod time as 'changed' so as to be conservative against sub-second changes).

Is it going to be an issue that the Windows side of things has a more wild idea of file timestamp resolution? NTFS has a theoretical max precision of 100ns intervals, though according to MSDN, the access time on NTFS has a resolution of 1 hour, which is better than FAT file systems, where the resolution is 1 day. It seems odd to rely on ns resolution for access time, as that seems like it's going to be highly platform and filesystem dependent.

For some examples:
https://github.com/apple/swift-clang/blob/95cdf7c9af0dc102a3fbda4ae1f6265026ff2e30/lib/Frontend/PrecompiledPreamble.cpp#L486
https://github.com/apple/swift-clang/blob/95cdf7c9af0dc102a3fbda4ae1f6265026ff2e30/lib/Driver/Driver.cpp#L1132
https://github.com/apple/swift-clang/blob/95cdf7c9af0dc102a3fbda4ae1f6265026ff2e30/lib/Driver/ToolChains/Clang.cpp#L2623
https://github.com/apple/swift-clang/blob/95cdf7c9af0dc102a3fbda4ae1f6265026ff2e30/lib/Serialization/ModuleManager.cpp#L156
https://github.com/apple/swift-clang/blob/95cdf7c9af0dc102a3fbda4ae1f6265026ff2e30/lib/Index/IndexUnitWriter.cpp#L252
https://github.com/apple/swift-clang/blob/95cdf7c9af0dc102a3fbda4ae1f6265026ff2e30/lib/DirectoryWatcher/DirectoryWatcher-linux.inc.h#L33

Apart from improving accuracy of mod time comparisons inside clang and its libraries themselves, mod times originating from clang libraries get reported up to our indexing system where we could take advantage of sub-second accuracy, for the same kind of benefits.

I think this is the most important bit of information here. It should have been in the description of the differential.
If the current precision was 1ms (or even 1us), then yes, going all the way down to 1ns may not be important/worthwhile.
But 1 sec precision is clearly laughably not great, and should be improved.

Ah good point, I didn't consider that people are not broadly aware that current file_status has only 'up to second' precision.

kristina accepted this revision.EditedNov 25 2018, 12:17 PM

I think your original patch is fine, with better explanation of the reasoning behind it. I would comment on the platform differences with regards to modification time precision, it will make it easier to understand the rationale whenever someone is reading/reviewing related code in the future, IMO.

I'd also suggest elaborating on that point in the commit message since the actual purpose of the patch is vague from its description.

This revision is now accepted and ready to land.Nov 25 2018, 12:17 PM

Is it going to be an issue that the Windows side of things has a more wild idea of file timestamp resolution? NTFS has a theoretical max precision of 100ns intervals, though according to MSDN, the access time on NTFS has a resolution of 1 hour, which is better than FAT file systems, where the resolution is 1 day. It seems odd to rely on ns resolution for access time, as that seems like it's going to be highly platform and filesystem dependent.

I can't speak for the Windows side of things but what you are pointing out is a general question about applications taking into account that file properties may differ across platforms. This is orthogonal to file_status itself reporting file properties accurately.
If current file_status has 1 hour resolution on NTFS for access time, this patch is not going to make it any better or worse, nor affect behavior of applications that are querying files from NTFS.

The general changes to the support libraries are usually facilitated because of missing functionality that is required somewhere else and it being the most logical place to add it. If so please attach related revisions to this one (through "Edit Related Revisions") so it's more clear why this was added

There have been enhancements to file_status that added more info without the requirement you stated (attach related revisions to this one), like:
https://reviews.llvm.org/D31110
https://reviews.llvm.org/D18456

This patch makes no changes to the existing API so there's no specific place that I can point to, but it improves accuracy of *every* site that calls getLastModification() and uses its value for comparisons, in and out-of-tree. It addresses 'false-negatives' (mistakenly considering that the mod time did not change because the change happened within 1 second) and 'false-positives' (when considering same mod time as 'changed' so as to be conservative against sub-second changes).

Is it going to be an issue that the Windows side of things has a more wild idea of file timestamp resolution? NTFS has a theoretical max precision of 100ns intervals, though according to MSDN, the access time on NTFS has a resolution of 1 hour, which is better than FAT file systems, where the resolution is 1 day. It seems odd to rely on ns resolution for access time, as that seems like it's going to be highly platform and filesystem dependent.

Unfortunately there isn't a lot that can be done about that short of keeping a separate index of files and their timestamps, which seems to be slightly overkill, it's unlikely to affect anything unless you have a setup with different filesystems and are crossing mountpoints but I don't think this is significant enough to consider here. And with regards to NTFS, there really isn't much to be done unless you want to settle on microsecond precision instead as some sort of a universal thing, but that won't cover everything since there are just too many platforms and filesystems to consider besides NTFS, HFS+, APFS and Ext4, there are many others including some using network mounts or ramdisks.

Is it going to be an issue that the Windows side of things has a more wild idea of file timestamp resolution? NTFS has a theoretical max precision of 100ns intervals, though according to MSDN, the access time on NTFS has a resolution of 1 hour, which is better than FAT file systems, where the resolution is 1 day. It seems odd to rely on ns resolution for access time, as that seems like it's going to be highly platform and filesystem dependent.

I can't speak for the Windows side of things but what you are pointing out is a general question about applications taking into account that file properties may differ across platforms. This is orthogonal to file_status itself reporting file properties accurately.

It is and it isn't. This is a cross-platform API and users may expect the values to be somewhat similarly reported across platforms, especially given that there is no documentation on getLastAccessedTime() or getLastModificationTime().

If current file_status has 1 hour resolution on NTFS for access time, this patch is not going to make it any better or worse, nor affect behavior of applications that are querying files from NTFS.

Agreed, and I wasn't suggesting that it would. I was more wondering whether you had thoughts on people using this across platforms where they will get very different behavior from the same API. Personally, I think it's reasonable to report the values with whatever precision we have available to us. I think that should be documented on the public API part with a mention that the resolution is expected to differ depending on the file system and OS. WDYT?

Is it going to be an issue that the Windows side of things has a more wild idea of file timestamp resolution? NTFS has a theoretical max precision of 100ns intervals, though according to MSDN, the access time on NTFS has a resolution of 1 hour, which is better than FAT file systems, where the resolution is 1 day. It seems odd to rely on ns resolution for access time, as that seems like it's going to be highly platform and filesystem dependent.

I can't speak for the Windows side of things but what you are pointing out is a general question about applications taking into account that file properties may differ across platforms. This is orthogonal to file_status itself reporting file properties accurately.

It is and it isn't. This is a cross-platform API and users may expect the values to be somewhat similarly reported across platforms, especially given that there is no documentation on getLastAccessedTime() or getLastModificationTime().

If current file_status has 1 hour resolution on NTFS for access time, this patch is not going to make it any better or worse, nor affect behavior of applications that are querying files from NTFS.

Agreed, and I wasn't suggesting that it would. I was more wondering whether you had thoughts on people using this across platforms where they will get very different behavior from the same API. Personally, I think it's reasonable to report the values with whatever precision we have available to us. I think that should be documented on the public API part with a mention that the resolution is expected to differ depending on the file system and OS. WDYT?

Yes, I can update the documentation for LLVMSupport, definitely worth mentioning it, I think it's worth dropping a comment there as well, as it seems that if someone does hit issues with it, it may be a very annoying thing to debug.

With regards to UNIX-style access/modification times, the options are to find a uniform duration for all file-systems that is clamped to the one with lowest precision (like 1us), or just settle for this as-is. Though having this documented would be good, going on a hunt to determine the quirks of each possible file-system may be pretty difficult, especially considering some may allow tuning of those values.

aaron.ballman accepted this revision.Nov 25 2018, 1:01 PM

Aside from a minor consistency nit, I think this LGTM. I don't think there's a reasonable way to test this behavior, so it's fine to go in without a test.

include/llvm/Config/config.h.cmake
212

Is there a reason to not use #cmakedefine like the other uses?

lib/Support/Unix/Path.inc
601

...and switch these to using #ifdef or #if defined().

With regards to UNIX-style access/modification times, the options are to find a uniform duration for all file-systems that is clamped to the one with lowest precision (like 1us), or just settle for this as-is. Though having this documented would be good, going on a hunt to determine the quirks of each possible file-system may be pretty difficult, especially considering some may allow tuning of those values.

Yeah, I agree that some simple comments here should suffice. We don't need to document the behavior of each file system, just some words to remind people that file systems can differ should work fine.

I think that should be documented on the public API part with a mention that the resolution is expected to differ depending on the file system and OS. WDYT?

That is a good idea, the way I consider file_status is that it provides a cross-platform way to get at the file properties. It provides times in nanoseconds but leaves the precision as implementation detail of the underlying file system, which makes sense, people should not expect anything more than that.
I'll add some comments.

akyrtzi marked an inline comment as done.Nov 25 2018, 1:20 PM
akyrtzi added inline comments.
include/llvm/Config/config.h.cmake
212

This file is not consistent on what it uses, it has uses of #cmakedefine01 in some places but majority is with #cmakedefine.
I mainly liked to use #if HAVE_STRUCT_STAT.. directly since it reads more naturally, but if you prefer it the other way I will switch.

aaron.ballman added inline comments.Nov 25 2018, 1:39 PM
include/llvm/Config/config.h.cmake
212

I don't have a strong opinion; I just noticed everything around it in the patch context was using #cmakedefine. I have a weak preference for #cmakedefine because the resulting macro cannot produce a value (it can only be used in a conditional preprocessor test). The strength of that argument should tell you about how strong my preference is. ;-)

akyrtzi marked an inline comment as done.Nov 25 2018, 1:50 PM
akyrtzi added inline comments.
include/llvm/Config/config.h.cmake
212

I'm persuaded by the argument that the majority in the file uses #cmakedefine, I'll change it so consistency can slightly improve ;-)

akyrtzi updated this revision to Diff 175182.Nov 25 2018, 2:02 PM
akyrtzi edited the summary of this revision. (Show Details)
  • Change #cmakedefine01 to #cmakedefine for consistency.
  • Add comments to getLastAccessedTime()/ getLastModificationTime() APIs to make sure people are aware that time resolution can differ among file systems.
  • Improve commit summary.
This revision was automatically updated to reflect the committed changes.

Thanks for reviewing Kristina & Aaron, much appreciated!

thakis added a subscriber: thakis.Nov 26 2018, 10:02 PM
thakis added inline comments.
llvm/trunk/include/llvm/Config/config.h.cmake
215 ↗(On Diff #175189)

Putting this in the middle of the HAVE_SYS_ defines is a bit of a surprising place. Maybe put it above all the HAVE_SYS_ thingies so that it's at least locally alphabetical?

akyrtzi marked an inline comment as done.Nov 27 2018, 2:37 PM
akyrtzi added inline comments.
llvm/trunk/include/llvm/Config/config.h.cmake
215 ↗(On Diff #175189)

There's some debate in https://reviews.llvm.org/D54883 whether the CMake checks should be used or not, once this is settled I'll see about making such a change.