This is an archive of the discontinued LLVM Phabricator instance.

[Host] handle short reads and writes
Accepted · Public

Authored by zturner on Oct 19 2016, 11:49 AM.

Details

Summary

The original motivation for this came from D25712, in which Eugene pointed out that File::Read() does not correctly handle short reads. However, I felt the fix was incomplete because it left the bug in other functions, and the code in general could have used some cleanup since there was a ton of duplication, which may be what led to this bug showing up in the first place.

Changes in this patch are:

  1. Have the normal Read() and Write() functions delegate to the versions that read and write with offset.
  2. Supply thread-safe versions of the Windows codepaths, which were previously incorrect in a multi-threaded environment.
  3. Delete a bunch of dead functions that are not used anywhere in LLDB.
  4. Remove the Apple-specific path (which existed because of MAX_READ_SIZE and MAX_WRITE_SIZE) and merge it with the standard non-Windows path.

The only really tricky part of this patch is that when you open a file in append mode, the old version of Write() with no offset would write at the end of the file, whereas pwrite() always writes at the offset you specify. To fix this, the version of Write() with no offset explicitly computes the offset of the end of the file and passes that to the offset-taking version of Write(), as sketched below.
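As a rough illustration of that delegation (a minimal sketch with made-up names, not the actual LLDB code), the no-offset overload has to ask the OS where to write before handing off to the offset-taking overload:

  // Illustrative sketch only; error handling and short-write retries omitted.
  #include <sys/types.h>
  #include <unistd.h>

  ssize_t WriteAtOffset(int fd, const void *buf, size_t len, off_t offset) {
    return ::pwrite(fd, buf, len, offset);
  }

  ssize_t WriteNoOffset(int fd, const void *buf, size_t len, bool append_mode) {
    // pwrite() always writes where it is told, so append mode has to be
    // emulated by explicitly computing the end-of-file offset first.
    off_t offset = ::lseek(fd, 0, append_mode ? SEEK_END : SEEK_CUR);
    if (offset < 0)
      return -1;
    return WriteAtOffset(fd, buf, len, offset);
  }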

Diff Detail

Event Timeline

zturner updated this revision to Diff 75181.Oct 19 2016, 11:49 AM
zturner retitled this revision from to [Host] handle short reads and writes.
zturner updated this object.
zturner added reviewers: clayborg, beanz, EugeneBi.
zturner added a subscriber: lldb-commits.
labath added a subscriber: labath.Oct 19 2016, 1:18 PM

I am not sure the "append" thing is actually a "fix". I consider it more like a feature of the append mode. It's also nice that it guarantees atomicity of writes even if two processes are writing to the same file (very useful for logging, although I'm not sure if that goes through these functions). I think we should keep the original behavior of append mode, as that is the behavior anyone familiar with standard file APIs will expect.

> I am not sure the "append" thing is actually a "fix". I consider it more like a feature of the append mode. It's also nice that it guarantees atomicity of writes even if two processes are writing to the same file (very useful for logging, although I'm not sure if that goes through these functions). I think we should keep the original behavior of append mode, as that is the behavior anyone familiar with standard file APIs will expect.

There are many other problems with this code if we want to deal with atomicity. For example, the whole point of this patch was to handle short reads and writes. Well, if you have a short read or a write, then reading and writing a subsequent chunk is not atomic.

That being said, I don't think it's hugely important here. The worst that will happen is two log messages will be printed out of order. I don't think anything will be corrupt or interwoven. For example, if two threads come in at the same time and both compute the same file size, they will both attempt to write at the same offset. One will win, the other will insert right before the message. So it's possible the two could wind up reversed, but that's about it.

The only way to deal with multi-threading correctly is to put all of this in a mutex.

Also, for the record, if you specify the threadsafe logging option, it already does put this in a mutex, so there should be no issue.

> There are many other problems with this code if we want to deal with atomicity. For example, the whole point of this patch was to handle short reads and writes. Well, if you have a short read or a write, then reading and writing a subsequent chunk is not atomic.

I am willing to abandon atomicity if someone tries to write more than 2GB of data - I think he has bigger problems than that.

> That being said, I don't think it's hugely important here. The worst that will happen is two log messages will be printed out of order. I don't think anything will be corrupt or interwoven. For example, if two threads come in at the same time and both compute the same file size, they will both attempt to write at the same offset. One will win, the other will insert right before the message. So it's possible the two could wind up reversed, but that's about it.

I am not sure what you mean by "insert right before the message". The second thread will just overwrite the thing that the first one wrote (plus you'll end up with a dangling fragment at the end if the second message was shorter).

> The only way to deal with multi-threading correctly is to put all of this in a mutex.

Actually, a mutex cannot handle this case if you have multiple file descriptors (possibly in multiple processes) referring to the same file. That is the reason why append mode exists - to let the kernel do the arbitration.
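For reference, a tiny sketch of the kernel-arbitrated append being described (path and message are made up): opening with O_APPEND makes the kernel pick the write position atomically for each write, so concurrent writers cannot clobber each other.

  #include <cstring>
  #include <fcntl.h>
  #include <unistd.h>

  int main() {
    int fd = ::open("/tmp/example.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
      return 1;
    const char msg[] = "one of many concurrent writers\n";
    // The kernel positions this write at the current end of file, atomically.
    ::write(fd, msg, std::strlen(msg));
    ::close(fd);
    return 0;
  }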

> There are many other problems with this code if we want to deal with atomicity. For example, the whole point of this patch was to handle short reads and writes. Well, if you have a short read or a write, then reading and writing a subsequent chunk is not atomic.

> I am willing to abandon atomicity if someone tries to write more than 2GB of data - I think he has bigger problems than that.

Writing more than 2GB of data is not the only reason to get a short read or write. You can be interrupted by a signal after some of the data has been written but not all. This can happen with any number of bytes and at any time. write, read, and all other related functions will return a non-negative value indicating the number of bytes successfully read/written, which will be less than the number requested.
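For illustration, a minimal retry loop of the kind this implies (the function name is made up; error handling is simplified):

  #include <cerrno>
  #include <unistd.h>

  ssize_t WriteAll(int fd, const char *buf, size_t len) {
    size_t written = 0;
    while (written < len) {
      ssize_t n = ::write(fd, buf + written, len - written);
      if (n < 0) {
        if (errno == EINTR)
          continue; // interrupted before anything was transferred; retry
        return -1;  // genuine error
      }
      written += static_cast<size_t>(n); // n may be smaller than requested
    }
    return static_cast<ssize_t>(written);
  }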

clayborg requested changes to this revision.Oct 19 2016, 1:52 PM
clayborg edited edge metadata.

Just a few questions on why we are calling lseek in the read and write functions. See the inline comments.

source/Host/common/File.cpp:405

Why are we calling lseek when we are passing the offset into the read below?

Shouldn't this just be:

off_t offset = 0;

source/Host/common/File.cpp:416

Why are we calling lseek here? We specify the offset to the Write below and that function should do the right thing with the offset.

Shouldn't this just be:

off_t offset = 0;
This revision now requires changes to proceed.Oct 19 2016, 1:52 PM

Also you are right that I misspoke about the append case. But still, I just think that if writing to the same file from multiple processes is something we care about, we should support it "for real" instead of just pretending to. That means some kind of cross-process synchronization such as a shared mutex.

zturner added inline comments.Oct 19 2016, 1:55 PM
source/Host/common/File.cpp:405

This lseek is to get the current file pointer. If someone calls Write() with no offset, they expect this to mean "write at the current file position". In order to do that with pwrite(), you need to know what the current file position actually is. If we just set offset=0, it will write at the beginning of the file, which is probably not the intention.
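In other words (a hypothetical sketch, not the patch itself), emulating "read at the current position" on top of pread() looks roughly like this; note that pread() does not advance the file offset, so a real implementation also has to move it forward afterwards:

  #include <unistd.h>

  ssize_t ReadAtCurrentPosition(int fd, void *buf, size_t len) {
    off_t offset = ::lseek(fd, 0, SEEK_CUR); // where the cursor is now, not 0
    if (offset < 0)
      return -1;
    ssize_t n = ::pread(fd, buf, len, offset);
    if (n > 0)
      ::lseek(fd, offset + n, SEEK_SET); // keep the file position consistent
    return n;
  }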

LMK if I've misunderstood.

clayborg accepted this revision.Oct 19 2016, 1:58 PM
clayborg edited edge metadata.

Ok, the lseek calls make sense then.

This revision is now accepted and ready to land.Oct 19 2016, 1:58 PM

> This can happen with any number of bytes and at any time. write, read, and all other related functions will return a non-negative value indicating the number of bytes successfully read/written, which will be less than the number requested.

Except if fd refers to a pipe, in which case writes of up to PIPE_BUF bytes will be atomic.

Also, I just noticed another problem. What if the fd does not refer to an actual file, but a non-seekable file system object (named pipe, domain socket, ...). Will the lseek work on that? I have no idea. But, I think you're implementing a broken API to save a couple of lines of code.

(btw, you may want to know that pwrite() on O_APPEND descriptors basically ignores the offset argument, and always does an append).

> This can happen with any number of bytes and at any time. write, read, and all other related functions will return a non-negative value indicating the number of bytes successfully read/written, which will be less than the number requested.

> Except if fd refers to a pipe, in which case writes of up to PIPE_BUF bytes will be atomic.

> Also, I just noticed another problem. What if the fd does not refer to an actual file, but a non-seekable file system object (named pipe, domain socket, ...). Will the lseek work on that? I have no idea. But, I think you're implementing a broken API to save a couple of lines of code.

In that case the user should be using Host/Pipe or a more suitable class; this class is already unsuitable just on the grounds that it exposes a method (the offset version) that requires a seekable device. I'd even be fine asserting in the constructor if the device is not seekable.

> (btw, you may want to know that pwrite() on O_APPEND descriptors basically ignores the offset argument, and always does an append).

Only on Linux, but this is apparently non-conformant behavior :(

https://linux.die.net/man/2/pwrite

> POSIX requires that opening a file with the O_APPEND flag should have no effect on the location at which pwrite() writes data. However, on Linux, if a file is opened with O_APPEND, pwrite() appends data to the end of the file, regardless of the value of offset.

IDK, I'm the Windows guy so this isn't really my call, but I think part of the reason why the test suite is so flaky and bugs are so hard to track down sometimes is because we don't make assumptions. If we find some code that is clearly broken without a given set of assumptions, I think making it break "even more" without those assumptions is not only fine, but even desirable so that problems become easier to track down.

Incidentally, this patch actually makes all platforms behave consistently when the Write overload with offset is used with O_APPEND, so there's probably some value in having that consistency.

EugeneBi accepted this revision.Oct 19 2016, 8:28 PM
EugeneBi edited edge metadata.

> This can happen with any number of bytes and at any time. write, read, and all other related functions will return a non-negative value indicating the number of bytes successfully read/written, which will be less than the number requested.

> Except if fd refers to a pipe, in which case writes of up to PIPE_BUF bytes will be atomic.

> Also, I just noticed another problem. What if the fd does not refer to an actual file, but a non-seekable file system object (named pipe, domain socket, ...). Will the lseek work on that? I have no idea. But, I think you're implementing a broken API to save a couple of lines of code.

> In that case the user should be using Host/Pipe or a more suitable class; this class is already unsuitable just on the grounds that it exposes a method (the offset version) that requires a seekable device. I'd even be fine asserting in the constructor if the device is not seekable.

If I have a string, the only way to know whether it refers to a seekable file is to open it and do an fstat() on the descriptor. And then I'm in the business of trying to decipher stat.st_mode to figure out whether to instantiate a Pipe, Socket, or a File object (do we want a CharacterDevice class?). Check the 'a' attribute of https://linux.die.net/man/1/chattr for a fun corner case. Files in the proc filesystem are also very amusing. They are perfectly well readable and writable and appear as regular files in most respects, but if you do an fstat() on them, you'll see their size as zero. I have no idea what would happen if I did a pwrite on a file in /proc -- it would probably work just fine as Linux seems to ignore the offsets when they don't make sense, but that's not something I want to rely on.
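For illustration only (the function name is made up), the st_mode deciphering being described would look roughly like this:

  #include <cstdio>
  #include <sys/stat.h>

  void DescribeDescriptor(int fd) {
    struct stat st;
    if (::fstat(fd, &st) != 0)
      return;
    if (S_ISREG(st.st_mode))
      std::printf("regular file (but /proc files still report st_size == 0)\n");
    else if (S_ISFIFO(st.st_mode))
      std::printf("pipe/FIFO: not seekable\n");
    else if (S_ISSOCK(st.st_mode))
      std::printf("socket: not seekable\n");
    else if (S_ISCHR(st.st_mode))
      std::printf("character device\n");
  }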

> IDK, I'm the Windows guy so this isn't really my call, but I think part of the reason why the test suite is so flaky and bugs are so hard to track down sometimes is because we don't make assumptions. If we find some code that is clearly broken without a given set of assumptions, I think making it break "even more" without those assumptions is not only fine, but even desirable so that problems become easier to track down.

I'm all for assertions, but I don't think this is a good example. In the UNIX world (and Linux is an extreme example of this) a great many things can be represented by file-like objects (hell, you can even receive signals over a file descriptor, although I don't think you can assign a *name* to that), and here we should make as few assumptions as possible. The safest route to me seems to be:

  • File::Write -> ::write
  • File::Write(offset) -> ::pwrite

if pwrite fails on some crazy file, then so be it (you have to expect it to fail for other reasons anyway). It will also be faster, as Write() will do only one syscall instead of three. With a bit of clever programming, I think we can make the code reasonably small as well.
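A minimal sketch of that mapping (the class and member names are made up, not the actual LLDB API; short-transfer handling is omitted):

  #include <sys/types.h>
  #include <unistd.h>

  class FileSketch {
    int m_fd;

  public:
    explicit FileSketch(int fd) : m_fd(fd) {}

    // No offset: use (and advance) the kernel-maintained file position; this
    // also preserves O_APPEND semantics and works on non-seekable objects.
    ssize_t Write(const void *buf, size_t len) {
      return ::write(m_fd, buf, len);
    }

    // Explicit offset: positional write; requires a seekable object and may
    // simply fail on pipes, sockets, and other special files.
    ssize_t Write(const void *buf, size_t len, off_t offset) {
      return ::pwrite(m_fd, buf, len, offset);
    }
  };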