This is an archive of the discontinued LLVM Phabricator instance.

Support: Add mapped_file_region::sync(), equivalent to msync
ClosedPublic

Authored by dexonsmith on Jan 26 2021, 5:11 PM.

Details

Summary

Add mapped_file_region::sync(), equivalent to POSIX msync,
synchronizing written content to disk without unmapping the region.
Asserts if the mode is not mapped_file_region::readwrite.

Note that I don't have access to a Windows machine, so I can't
easily run those unit tests.

Diff Detail

Event Timeline

dexonsmith created this revision.Jan 26 2021, 5:11 PM
dexonsmith requested review of this revision.Jan 26 2021, 5:11 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 26 2021, 5:11 PM
rnk edited reviewers, added: amccarth; removed: rnk.Feb 1 2021, 12:08 PM
rnk added a subscriber: rnk.

@amccarth, can you confirm that these are the right APIs to use here? Do we need both FlushViewOfFile and FlushFileBuffers?

ping / rebase

In D95494#2534572, @rnk wrote:

@amccarth, can you confirm that these are the right APIs to use here? Do we need both FlushViewOfFile and FlushFileBuffers?

Yes, those are the right APIs. The first initiates the flush asynchronously. The second blocks until the flush is complete. That's necessary if it's important to match the behavior of the corresponding Posix implementation, which passes the MS_SYNC flag.

The only difference I see is that that Posix version will mark the last write time on the file to be updated, but the Windows version _may_ not. I doubt that's important, but, if it is, then the Windows version would also have to call SetFileTime. I don't know how often LLVM code will sync, so I would avoid an extra operation if it's not necessary.

On Windows, you rarely need to use FlushViewOfFile because (1) the system will keep multiple _views_ coherent across threads and processes and (2) everything will be flushed to disk when the view is closed. FlushViewOfFile is useful when you need to ensure the bits on disk are updated because you expect another thread or process to access the file through something other than another view into the same file mapping.

llvm/unittests/Support/Path.cpp
1472

I'm mildly concerned that there are assertions that will prevent the temporary file from being deleted if/when the test fails. Accumulation of orphaned temp files can slow down tests, eat up disk quotas, etc.

There is a fs::TempFile class that will delete the file when it goes out of scope. Or, perhaps most of these ASSERT_s could be EXPECT_s.

In D95494#2534572, @rnk wrote:

@amccarth, can you confirm that these are the right APIs to use here? Do we need both FlushViewOfFile and FlushFileBuffers?

Yes, those are the right APIs. The first initiates the flush asynchronously. The second blocks until the flush is complete. That's necessary if it's important to match the behavior of the corresponding Posix implementation, which passes the MS_SYNC flag.

The only difference I see is that that Posix version will mark the last write time on the file to be updated, but the Windows version _may_ not. I doubt that's important, but, if it is, then the Windows version would also have to call SetFileTime. I don't know how often LLVM code will sync, so I would avoid an extra operation if it's not necessary.

On Windows, you rarely need to use FlushViewOfFile because (1) the system will keep multiple _views_ coherent across threads and processes and (2) everything will be flushed to disk when the view is closed. FlushViewOfFile is useful when you need to ensure the bits on disk are updated because you expect another thread or process to access the file through something other than another view into the same file mapping.

Ah, interesting. The use case I have in mind is optionally keeping a just-written-to mapped_file_region alive as a file-backed memory buffer. Effectively, allowing a caller to use f2 instead of f1:

std::unique_ptr<MemoryBuffer> f1(int FD, SmallVectorImpl<char> &&Data) {
  raw_fd_ostream file(FD, ...);
  file << Data;
  return std::make_unique<SmallVectorMemoryBuffer>(std::move(Data));
}
std::unique_ptr<MemoryBuffer> f2(int FD, SmallVectorImpl<char> &&Data) {
  auto MFR = std::make_unique<mapped_file_region>(FD, ...);
  memcpy(MFR.data(), ...)
  SmallVector<char, 0>().swap(Data);
  return std::make_unique<MappedFileRegionMemoryBuffer>(std::move(MFR));
}

I think we'd just want it to sync as much say destruction a file stream or destruction a mapped_file_region.

What do you recommend? (I'll also document clearly what the guarantees are we end up with...)

llvm/unittests/Support/Path.cpp
1472

Ah, thanks; I was just copy/pasting from an adjacent test, but I can at least clean mine up.

I think we'd just want it to sync as much say destruction a file stream or destruction a mapped_file_region.

Then I think this change would be fine.

But I'm not sure the test is testing that.

On Windows, there are three kinds of objects at play: the file, the mapping, and the view. I don't know the Posix model well enough to know how this corresponds. I also don't know all the relevant details behind fs::mapped_file_region and MemoryBuffer.

On Windows, two views derived from the same file mapping will be coherent--even across processes--without user syncing. Furthermore, all views ultimately backed by the same (non-remote) file will also be coherent. In the test, the second "user" of the file is a MemoryBuffer. Since MemoryBuffer is also opening the file for memory mapping (I believe), it seems its views are going to be coherent regardless of whether you do a manual sync. Thus the test might always pass, regardless of whether your function is correct.

What's not necessarily coherent is regular file i/o with a file that's also using memory mapped i/o. That's the case where you would definitely need a manual sync. This is rare in my experience since Microsoft recommends opening a file in exclusive mode if you're going to use memory mapped i/o.

Although the docs say you can close views and mappings in any order, I've only seen it done in strictly reverse order. And I'm not sure you can close the file itself while mappings still exist. I assume that would just decrement a reference count and the file would remain open until the last mapping object goes away. Compounding this uncertainty is that the test calls close on the file descriptor rather than the file object itself.

So my recommendations are (1) that the test be written so that one of the accesses is via memory mapped i/o and the other is through file i/o, and (2) that the test be shown to fail without the msync call.

This revision was not accepted when it landed; it landed in state Needs Review.Sep 6 2022, 1:47 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptSep 6 2022, 1:47 PM

I have accidentally committed this change without review. I have reverted it. My apologies!
revert link:
https://github.com/llvm/llvm-project/commit/5d3cf8267f90c3715d77f9bf7f97077b65791cb3