This is an archive of the discontinued LLVM Phabricator instance.

Improve the robustness of mmap
ClosedPublic

Authored by zturner on Feb 15 2017, 3:05 PM.

Details

Summary

When we mmap a file, we don't take into consideration whether the file is on a remote share. mmaping such a file is problematic because if the network share goes away -- which can happen for reasons outside of the user's control -- anything trying to read from that file will have undefined behavior.

LLDB already has logic to check whether a file is local or remote, but it was not implemented on Windows.

Also, mmap on newer versions of OSX have some additional flags which LLDB requires and which I see no reason to ever NOT include even for general usage, so the low level call to mmap is udpated to use those flags when available.

Note that I do not have any non-Windows machine to test on, so I'm including a lot of people from various platforms here to look over this:

Getting the statvfs / statfs distiction correct was tricky, but hopefully I got this correct. FWIU,

NetBSD: statfs does not exist! Must use statvfs for everything. Reference
FreeBSD: statvfs::f_flag does not contain MNT_LOCAL. Must use statfs. Reference
OpenBSD: Both functions exist, but documentation does not specify whether the flags differ between the two.

Since FreeBSD cannot use statvfs to check for MNT_LOCAL, I'm adjusting the conditional in Path.inc so that FreeBSD and OpenBSD follow the same codepath. Nothing was broken before, it's only now that we need to use f_flag & MNT_LOCAL that FreeBSD requires a change to use statfs instead.

Diff Detail

Event Timeline

zturner created this revision.Feb 15 2017, 3:05 PM
amccarth edited edge metadata.Feb 15 2017, 3:53 PM

The Windows bits look right to me. Double check the clang-format to make sure those long lines are wrapped nicely.

zturner updated this revision to Diff 88619.Feb 15 2017, 3:55 PM

Had to manually force clang-format to get the .inc files, it ignores them by default.

rnk edited edge metadata.Feb 16 2017, 1:41 PM

Looks right to me. I'm not familiar with the stat syscall, though, so I'd look for a reviewer for that.

llvm/unittests/Support/Path.cpp
1158

I guess mounting a remote filesystem is out of scope for a unit test. :)

ruiu added a subscriber: ruiu.Feb 16 2017, 1:50 PM
ruiu added inline comments.
llvm/lib/Support/Unix/Path.inc
346

Does this function return other than std::error_code()?

zturner added inline comments.Feb 16 2017, 1:51 PM
llvm/lib/Support/Unix/Path.inc
346

No, it can't encounter an error. I guess I should update the function signature to reflect that.

zturner updated this revision to Diff 88785.Feb 16 2017, 2:21 PM

Fixed suggestion from ruiu. In fixing that I noticed that there was no break in between cases in the Linux preprocessor branch, so I fixed that too. Good thing you pointed that out!

krytarowski added inline comments.Feb 16 2017, 3:45 PM
llvm/lib/Support/Unix/Path.inc
68

Is this equivalent to defined(__linux__) || defined(__NetBSD__)? If so it might look easier to parse presented this in the way of what system is used for this.

zturner added inline comments.Feb 16 2017, 3:49 PM
llvm/lib/Support/Unix/Path.inc
68

If you scroll down the file about 30 lines you'll see this:

#if defined(__FreeBSD__) || defined (__NetBSD__) || defined(__Bitrig__) || \
    defined(__OpenBSD__) || defined(__minix) || defined(__FreeBSD_kernel__) || \
    defined(__linux__) || defined(__CYGWIN__) || defined(__DragonFly__) || \
    defined(_AIX)

I don't know the ins and outs of all these definitions, but I'm pretty sure at least some of them will fall into this #if branch. And there could even be other ones that are not listed here but which define other platforms. So I'm not entirely sure what the inverse of this conditional is.

krytarowski added inline comments.Feb 16 2017, 3:58 PM
llvm/lib/Support/Unix/Path.inc
68

Yes I saw this switch and I was thinking how many OS'es are really used there. Other than that NetBSD platform part looks fine. I will test build it.

ruiu added a comment.Feb 16 2017, 4:04 PM

I wonder what this patch is trying to fix on Unix machines. mmap()'ing a file on NFS is not a special thing, and if something goes wrong with NFS, a process will hang whether a file is being used through read(2)/write(2) or mmap. What am I missing?

In D30010#679198, @ruiu wrote:

I wonder what this patch is trying to fix on Unix machines. mmap()'ing a file on NFS is not a special thing, and if something goes wrong with NFS, a process will hang whether a file is being used through read(2)/write(2) or mmap. What am I missing?

If you just read() the entire contents, in order for the call to fail the NFS would have to go down during the very short period of time where you are reading the entire file contents into memory. If you mmap the contents and then keep the mapping open for, say several hours, then if the NFS has a hiccup at any point during those several hours, a call to read could crash. Any time you are dealing with a network file system you of course might encounter problems, I think this just minimizes the likelihood of those problems occuring in practice.

The motivation for this change came when I tried to replace LLDB's mmap code with LLVM's. But I was told that checking if the file is local is a requirement, because they've had bugs in the past where they crash all of Xcode because they try to mmap some debug info off of a network share and then some time later during debugging, the share goes down and read crashes all of Xcode.

ruiu added a comment.Feb 16 2017, 4:12 PM

Thank you for the reply. That's very helpful to understand the motivation of this patch. Please add that to somewhere in this patch as a comment.

krytarowski edited edge metadata.Feb 16 2017, 4:16 PM
In file included from /tmp/pkgsrc-tmp/wip/llvm-git/work/llvm/lib/Support/Path.cpp:1174:0:
/tmp/pkgsrc-tmp/wip/llvm-git/work/llvm/lib/Support/Unix/Path.inc: In function 'std::error_code llvm::sys::fs::is_local(int, bool&)':
/tmp/pkgsrc-tmp/wip/llvm-git/work/llvm/lib/Support/Unix/Path.inc:376:35: error: too many arguments to function 'bool llvm::sys::fs::is_local_impl(statvfs&)'
   return is_local_impl(Vfs, Result);
                                   ^
/tmp/pkgsrc-tmp/wip/llvm-git/work/llvm/lib/Support/Unix/Path.inc:346:13: note: declared here
 static bool is_local_impl(struct STATVFS &Vfs) {
             ^
/tmp/pkgsrc-tmp/wip/llvm-git/work/llvm/lib/Support/Unix/Path.inc:376:23: error: could not convert 'llvm::sys::fs::is_local_impl(Vfs)' from 'bool' to 'std::error_code'
   return is_local_impl(Vfs, Result);
                       ^
[ 26%] Building CXX object lib/Support/CMakeFiles/LLVMSupport.dir/SearchForAddressOfSpecialSymbol.cpp.o
[ 26%] Building CXX object lib/Support/CMakeFiles/LLVMSupport.dir/Signals.cpp.o
--- lib/Support/CMakeFiles/LLVMSupport.dir/Path.cpp.o ---
*** [lib/Support/CMakeFiles/LLVMSupport.dir/Path.cpp.o] Error code 1

make[2]: stopped in /tmp/pkgsrc-tmp/wip/llvm-git/work/build

My host: NetBSD 7.99.59 amd64

zturner updated this revision to Diff 88800.Feb 16 2017, 4:19 PM

Fixed careless error in last update of patch.

zturner updated this revision to Diff 88801.Feb 16 2017, 4:23 PM

Improve the comment explaining the rationale behind this change.

It builds now. (NetBSD 7.99.59 amd64)

labath edited edge metadata.Feb 17 2017, 4:22 AM
In D30010#679198, @ruiu wrote:

I wonder what this patch is trying to fix on Unix machines. mmap()'ing a file on NFS is not a special thing, and if something goes wrong with NFS, a process will hang whether a file is being used through read(2)/write(2) or mmap. What am I missing?

If you just read() the entire contents, in order for the call to fail the NFS would have to go down during the very short period of time where you are reading the entire file contents into memory. If you mmap the contents and then keep the mapping open for, say several hours, then if the NFS has a hiccup at any point during those several hours, a call to read could crash. Any time you are dealing with a network file system you of course might encounter problems, I think this just minimizes the likelihood of those problems occuring in practice.

The motivation for this change came when I tried to replace LLDB's mmap code with LLVM's. But I was told that checking if the file is local is a requirement, because they've had bugs in the past where they crash all of Xcode because they try to mmap some debug info off of a network share and then some time later during debugging, the share goes down and read crashes all of Xcode.

The thing here is that the failure modes for read() are much saner than for mmap()ed memory. If the NFS share goes down *after* the read(), nothing happens. If it goes down *during* the read(), the syscall will just return -1, and that's an error you can easily handle.

In the mmap case, if the NFS share goes down after you have mmap()ed the memory, that memory will be silently removed from your address space, and next time you try to access it you will get a SIGBUS. It is still possible to recover from this situation, but it is much more tricky.

That said, local files are also not immune to the SIGBUS issue - the same thing can happen if someone does an truncate(2) on the mmap()ed file, but I guess those kinds of issues are much more rare in normal circumstances.

llvm/lib/Support/Unix/Path.inc
68

I guess the proper solution here would be to do a check at cmake-time for the appropriate symbols, and then branch here based on that.

llvm/lib/Support/Unix/Path.inc
68

This does not compile on linux. struct statvfs does not have the f_type field. I guess you need to use the struct statfs branch on linux.

zturner updated this revision to Diff 88905.Feb 17 2017, 9:13 AM

Make linux codepath use statfs.

pcc added a subscriber: pcc.Feb 17 2017, 9:32 AM
In D30010#679198, @ruiu wrote:

I wonder what this patch is trying to fix on Unix machines. mmap()'ing a file on NFS is not a special thing, and if something goes wrong with NFS, a process will hang whether a file is being used through read(2)/write(2) or mmap. What am I missing?

If you just read() the entire contents, in order for the call to fail the NFS would have to go down during the very short period of time where you are reading the entire file contents into memory. If you mmap the contents and then keep the mapping open for, say several hours, then if the NFS has a hiccup at any point during those several hours, a call to read could crash. Any time you are dealing with a network file system you of course might encounter problems, I think this just minimizes the likelihood of those problems occuring in practice.

The motivation for this change came when I tried to replace LLDB's mmap code with LLVM's. But I was told that checking if the file is local is a requirement, because they've had bugs in the past where they crash all of Xcode because they try to mmap some debug info off of a network share and then some time later during debugging, the share goes down and read crashes all of Xcode.

This seems like an LLDB-specific issue though (or in general it seems more specific to interactive applications). Wouldn't it be better to put the decision in the hands of the application? For example, LLDB could call is_local by itself and pass the result as the IsVolatileSize argument (which should probably have a better name).

In D30010#680007, @pcc wrote:
In D30010#679198, @ruiu wrote:

I wonder what this patch is trying to fix on Unix machines. mmap()'ing a file on NFS is not a special thing, and if something goes wrong with NFS, a process will hang whether a file is being used through read(2)/write(2) or mmap. What am I missing?

If you just read() the entire contents, in order for the call to fail the NFS would have to go down during the very short period of time where you are reading the entire file contents into memory. If you mmap the contents and then keep the mapping open for, say several hours, then if the NFS has a hiccup at any point during those several hours, a call to read could crash. Any time you are dealing with a network file system you of course might encounter problems, I think this just minimizes the likelihood of those problems occuring in practice.

The motivation for this change came when I tried to replace LLDB's mmap code with LLVM's. But I was told that checking if the file is local is a requirement, because they've had bugs in the past where they crash all of Xcode because they try to mmap some debug info off of a network share and then some time later during debugging, the share goes down and read crashes all of Xcode.

This seems like an LLDB-specific issue though (or in general it seems more specific to interactive applications). Wouldn't it be better to put the decision in the hands of the application? For example, LLDB could call is_local by itself and pass the result as the IsVolatileSize argument (which should probably have a better name).

I actually considered doing something like that, and I'm not opposed to it if people think it's a good idea. One name I had come up with is bool Ephemeral, and it defaults to true (implying that by default mappings are short-lived, and only in the case that the mapping is long-lived would you want to "de-optimize" in this manner. Then LLDB could pass false. If we're worried that Ephemeral is a word people might not be familiar with, I could call it ShortLived or LongLived as well.

In D30010#680007, @pcc wrote:
In D30010#679198, @ruiu wrote:

I wonder what this patch is trying to fix on Unix machines. mmap()'ing a file on NFS is not a special thing, and if something goes wrong with NFS, a process will hang whether a file is being used through read(2)/write(2) or mmap. What am I missing?

If you just read() the entire contents, in order for the call to fail the NFS would have to go down during the very short period of time where you are reading the entire file contents into memory. If you mmap the contents and then keep the mapping open for, say several hours, then if the NFS has a hiccup at any point during those several hours, a call to read could crash. Any time you are dealing with a network file system you of course might encounter problems, I think this just minimizes the likelihood of those problems occuring in practice.

The motivation for this change came when I tried to replace LLDB's mmap code with LLVM's. But I was told that checking if the file is local is a requirement, because they've had bugs in the past where they crash all of Xcode because they try to mmap some debug info off of a network share and then some time later during debugging, the share goes down and read crashes all of Xcode.

This seems like an LLDB-specific issue though (or in general it seems more specific to interactive applications). Wouldn't it be better to put the decision in the hands of the application? For example, LLDB could call is_local by itself and pass the result as the IsVolatileSize argument (which should probably have a better name).

I actually considered doing something like that, and I'm not opposed to it if people think it's a good idea. One name I had come up with is bool Ephemeral, and it defaults to true (implying that by default mappings are short-lived, and only in the case that the mapping is long-lived would you want to "de-optimize" in this manner. Then LLDB could pass false. If we're worried that Ephemeral is a word people might not be familiar with, I could call it ShortLived or LongLived as well.

In case it's not clear btw, you were proposing re-using an existing argument whereas my idea was to add a new argument. The IsVolatileSize argument seems kind of orthogonal to what we're doing here, so it's hard to come up with an intuitive way to merge the two things into one variable.

Actually one way to do this while simultaneously reducing parameter list explosion might be to make a flags enum, and merge some of the existing ones in. Right now we already have RequiresNullTerminator and IsVolatileSize. We could have:

enum class MemBufferFlags {
  RequiresNullTerminator = 0x1,
  IsVolatileSize = 0x2,
  LongLived = 0x4
};

Then each function could just take a MemBufferFlags with appropriate default value instead of a list of bools.

pcc added a comment.Feb 17 2017, 10:06 AM
In D30010#680007, @pcc wrote:
In D30010#679198, @ruiu wrote:

I wonder what this patch is trying to fix on Unix machines. mmap()'ing a file on NFS is not a special thing, and if something goes wrong with NFS, a process will hang whether a file is being used through read(2)/write(2) or mmap. What am I missing?

If you just read() the entire contents, in order for the call to fail the NFS would have to go down during the very short period of time where you are reading the entire file contents into memory. If you mmap the contents and then keep the mapping open for, say several hours, then if the NFS has a hiccup at any point during those several hours, a call to read could crash. Any time you are dealing with a network file system you of course might encounter problems, I think this just minimizes the likelihood of those problems occuring in practice.

The motivation for this change came when I tried to replace LLDB's mmap code with LLVM's. But I was told that checking if the file is local is a requirement, because they've had bugs in the past where they crash all of Xcode because they try to mmap some debug info off of a network share and then some time later during debugging, the share goes down and read crashes all of Xcode.

This seems like an LLDB-specific issue though (or in general it seems more specific to interactive applications). Wouldn't it be better to put the decision in the hands of the application? For example, LLDB could call is_local by itself and pass the result as the IsVolatileSize argument (which should probably have a better name).

I actually considered doing something like that, and I'm not opposed to it if people think it's a good idea. One name I had come up with is bool Ephemeral, and it defaults to true (implying that by default mappings are short-lived, and only in the case that the mapping is long-lived would you want to "de-optimize" in this manner. Then LLDB could pass false. If we're worried that Ephemeral is a word people might not be familiar with, I could call it ShortLived or LongLived as well.

In case it's not clear btw, you were proposing re-using an existing argument whereas my idea was to add a new argument. The IsVolatileSize argument seems kind of orthogonal to what we're doing here, so it's hard to come up with an intuitive way to merge the two things into one variable.

I would rename the existing argument. I believe that the only thing that IsVolatileSize does is disable the use of mmap. Its original use case is pretty much the same I believe (libclang, which may be embedded in a text editor or some other interactive application, needed to arrange to prevent text editors from changing file contents beneath its feet -- NFS shares going away is just another instance of the same issue).

In D30010#680030, @pcc wrote:
In D30010#680007, @pcc wrote:
In D30010#679198, @ruiu wrote:

I wonder what this patch is trying to fix on Unix machines. mmap()'ing a file on NFS is not a special thing, and if something goes wrong with NFS, a process will hang whether a file is being used through read(2)/write(2) or mmap. What am I missing?

If you just read() the entire contents, in order for the call to fail the NFS would have to go down during the very short period of time where you are reading the entire file contents into memory. If you mmap the contents and then keep the mapping open for, say several hours, then if the NFS has a hiccup at any point during those several hours, a call to read could crash. Any time you are dealing with a network file system you of course might encounter problems, I think this just minimizes the likelihood of those problems occuring in practice.

The motivation for this change came when I tried to replace LLDB's mmap code with LLVM's. But I was told that checking if the file is local is a requirement, because they've had bugs in the past where they crash all of Xcode because they try to mmap some debug info off of a network share and then some time later during debugging, the share goes down and read crashes all of Xcode.

This seems like an LLDB-specific issue though (or in general it seems more specific to interactive applications). Wouldn't it be better to put the decision in the hands of the application? For example, LLDB could call is_local by itself and pass the result as the IsVolatileSize argument (which should probably have a better name).

I actually considered doing something like that, and I'm not opposed to it if people think it's a good idea. One name I had come up with is bool Ephemeral, and it defaults to true (implying that by default mappings are short-lived, and only in the case that the mapping is long-lived would you want to "de-optimize" in this manner. Then LLDB could pass false. If we're worried that Ephemeral is a word people might not be familiar with, I could call it ShortLived or LongLived as well.

In case it's not clear btw, you were proposing re-using an existing argument whereas my idea was to add a new argument. The IsVolatileSize argument seems kind of orthogonal to what we're doing here, so it's hard to come up with an intuitive way to merge the two things into one variable.

I would rename the existing argument. I believe that the only thing that IsVolatileSize does is disable the use of mmap. Its original use case is pretty much the same I believe (libclang, which may be embedded in a text editor or some other interactive application, needed to arrange to prevent text editors from changing file contents beneath its feet -- NFS shares going away is just another instance of the same issue).

The only concern I have is that it's easy for someone to answer the questions "Does the file have a volatile size?" and "Does the file reside locally?" without understanding the implications of specifying true or false. It's harder for them to answer the question "Should I disable the use of mmap on this file?" So I can call the argument AllowMmap = true, but I worry this might be a step backwards from an API standpoint.

pcc added a comment.Feb 17 2017, 10:25 AM
In D30010#680030, @pcc wrote:
In D30010#680007, @pcc wrote:
In D30010#679198, @ruiu wrote:

I wonder what this patch is trying to fix on Unix machines. mmap()'ing a file on NFS is not a special thing, and if something goes wrong with NFS, a process will hang whether a file is being used through read(2)/write(2) or mmap. What am I missing?

If you just read() the entire contents, in order for the call to fail the NFS would have to go down during the very short period of time where you are reading the entire file contents into memory. If you mmap the contents and then keep the mapping open for, say several hours, then if the NFS has a hiccup at any point during those several hours, a call to read could crash. Any time you are dealing with a network file system you of course might encounter problems, I think this just minimizes the likelihood of those problems occuring in practice.

The motivation for this change came when I tried to replace LLDB's mmap code with LLVM's. But I was told that checking if the file is local is a requirement, because they've had bugs in the past where they crash all of Xcode because they try to mmap some debug info off of a network share and then some time later during debugging, the share goes down and read crashes all of Xcode.

This seems like an LLDB-specific issue though (or in general it seems more specific to interactive applications). Wouldn't it be better to put the decision in the hands of the application? For example, LLDB could call is_local by itself and pass the result as the IsVolatileSize argument (which should probably have a better name).

I actually considered doing something like that, and I'm not opposed to it if people think it's a good idea. One name I had come up with is bool Ephemeral, and it defaults to true (implying that by default mappings are short-lived, and only in the case that the mapping is long-lived would you want to "de-optimize" in this manner. Then LLDB could pass false. If we're worried that Ephemeral is a word people might not be familiar with, I could call it ShortLived or LongLived as well.

In case it's not clear btw, you were proposing re-using an existing argument whereas my idea was to add a new argument. The IsVolatileSize argument seems kind of orthogonal to what we're doing here, so it's hard to come up with an intuitive way to merge the two things into one variable.

I would rename the existing argument. I believe that the only thing that IsVolatileSize does is disable the use of mmap. Its original use case is pretty much the same I believe (libclang, which may be embedded in a text editor or some other interactive application, needed to arrange to prevent text editors from changing file contents beneath its feet -- NFS shares going away is just another instance of the same issue).

The only concern I have is that it's easy for someone to answer the questions "Does the file have a volatile size?" and "Does the file reside locally?" without understanding the implications of specifying true or false. It's harder for them to answer the question "Should I disable the use of mmap on this file?" So I can call the argument AllowMmap = true, but I worry this might be a step backwards from an API standpoint.

The semantic effect of this argument is that the application requires that the MemoryBuffer contains a snapshot of the file contents at this point in time (which implies that it is immune to changes to file size, file contents, file system failures, etc.) If we document the flag in those terms and give it a suitable name (Snapshot maybe?), I feel that that would be sufficient from an API perspective.

Actually, now that I think about it, instead of calling the parameter IsVolatileSize, how about just IsVolatile? That includes the case that the entire file is volatile (and might disappear from a network share).

pcc added a comment.Feb 17 2017, 10:34 AM

Actually, now that I think about it, instead of calling the parameter IsVolatileSize, how about just IsVolatile? That includes the case that the entire file is volatile (and might disappear from a network share).

SGTM.

zturner updated this revision to Diff 88918.Feb 17 2017, 10:42 AM

Rename parameter from IsVolatileSize to IsVolatile, and remove the logic in shouldUseMmap to check if the file is local.

Now that this basically doesn't change any existing behavior, I assume this is good to go since we're leaving it up to the applications?

beanz edited edge metadata.Feb 17 2017, 3:36 PM

Don't really have much to add here, but I really like what this is doing, and I think it will be a huge win for LLDB. Thanks @zturner!

Seems like everything is addressed and people have OK'ed the correctness of the implementation on multiple platforms. So I'll try to put this in tonight if there are no other objections.

Seems to work for me after adding the magic.h include. I can't really guarantee that all bots will be happy with it though..

llvm/lib/Support/Unix/Path.inc
78

__ANDROID__ is a subset of __linux__. You can drop the former.

350

You need to #include <linux/magic.h> to get these constants.

This revision was automatically updated to reflect the committed changes.