This is an archive of the discontinued LLVM Phabricator instance.

[Host] Handle short reads and writes, take 3
AbandonedPublic

Authored by labath on Jan 4 2017, 10:51 AM.

Details

Reviewers
zturner
clayborg
Summary

This is a rework of D25783, which was blocked by my objections to using pwrite
wholesale. This version keeps the benefits of the code simplification in the
previous version, but keeps the distinction between write/pwrite system calls
and their windows equivalents. Besides allowing appends to be atomic, I believe
this actually makes the code simpler(*), as there is no need to do manual
lseeks() to get the file positions.

I had made the retry logic less aggressive than previously, as it was hanging
the test suite - the problem was parts of the codebase were depending on read
returning with a non-full buffer, for example when reading from the inferior
process pseudo-terminal (/dev/pts/ptmx). So, now, I only retry the read if the
user requested a read larger than READ_MAX, and the syscall returned READ_MAX
bytes, which I believe is the original problem we were trying to solve.

This does not make the API particularly nice, but I am not actually
changing that - it was always that. I believe a good cleanup here would be to
let the user specify whether he wants to read the full buffer, or whether he
will be satisfied with a partial read -- a behaviour we already needed in
AdbClient, which is currently rolling out its own implementation of that.

I also fixed some typos in the previous version, and added unit tests for the
modified functions.

(*) It turns out there are windows equivalents for the posix semantics of file

append - just pass FFFF... as file position. This is even automatic if the
file was opened with the appropriate CreateFile flags - a thing I could not do
immediately, as we are using a different function (_wsopen_s) to open files.

Event Timeline

labath updated this revision to Diff 83084.Jan 4 2017, 10:51 AM
labath retitled this revision from to [Host] Handle short reads and writes, take 3.
labath updated this object.
labath added reviewers: zturner, clayborg.
labath added a subscriber: lldb-commits.
clayborg requested changes to this revision.Jan 4 2017, 11:10 AM
clayborg edited edge metadata.

This change removes support for using a FILE* instead of a file descriptor. This needs to be fixed. The old Read function used to do this:

ssize_t bytes_read = -1;
if (DescriptorIsValid()) {
  do {
    bytes_read = ::read(m_descriptor, buf, num_bytes);
  } while (bytes_read < 0 && errno == EINTR);

  if (bytes_read == -1) {
    error.SetErrorToErrno();
    num_bytes = 0;
  } else
    num_bytes = bytes_read;
} else if (StreamIsValid()) {
  bytes_read = ::fread(buf, 1, num_bytes, m_stream);

  if (bytes_read == 0) {
    if (::feof(m_stream))
      error.SetErrorString("feof");
    else if (::ferror(m_stream))
      error.SetErrorString("ferror");
    num_bytes = 0;
  } else
    num_bytes = bytes_read;
} else {
  num_bytes = 0;
  error.SetErrorString("invalid file handle");
}
return error;

Note the "else if (StreamIsValid()) {" section.

This revision now requires changes to proceed.Jan 4 2017, 11:10 AM
labath added a comment.Jan 4 2017, 1:51 PM

The way this deals with it is by using GetDescriptor() which extracts the raw fd from FILE *, and then always deals with those. I guess this has the potential of bypassing any libc cache that FILE * may be using. I don't know whether we care about that, but if we do then this (and the previous) patch needs to be completely rethinked.

Note that the current state is not very consistent either File::Read() uses fread() if we have a FILE*, but the offset version of File::Read always uses pread (via GetDescriptor(), as I do in this patch always), as there is no offset version of fread. I think this is a much bigger inconsistency than the one we have here.

The way this deals with it is by using GetDescriptor() which extracts the raw fd from FILE *, and then always deals with those. I guess this has the potential of bypassing any libc cache that FILE * may be using. I don't know whether we care about that, but if we do then this (and the previous) patch needs to be completely rethinked.

If you read right from the fd, it will do the wrong thing as the cache may have read many bytes at once and reading from the fd will skip bytes.

Note that the current state is not very consistent either File::Read() uses fread() if we have a FILE*, but the offset version of File::Read always uses pread (via GetDescriptor(), as I do in this patch always), as there is no offset version of fread. I think this is a much bigger inconsistency than the one we have here.

Agreed that there are bugs. We should see if anyone is using the "FILE *" part and remove it if possible. Some people might be using "FILE *" since it provides locking of the fd during reads and writes to stop multiple threads from borking each other, so if anyone is still using it, you can't just switch them away without looking very carefully at the code that is using the File object. Check and see if anyone is using the "FILE *" functionality and let me know what you find.

labath added a comment.Jan 5 2017, 8:10 AM

I've done a bit of investigation, and there is still a lot of FILE* usage in random places. I believe most of them would be pretty easy to get rid of (D28356 is one of the more complicated cases).

However, the main problem I see is the SB API, which uses FILE* in a couple of places (SBDebugger, SBCommandReturnObject, and a couple others). While we can make sure that we use a single api consistently for all the FILE* that are passed to us, we cannot guarantee anything if the user expects to be able to access the FILE* objects while we are using them.

With that in mind I still think we should go ahead with the standardization, as we are not very consistent to begin with. This way we can be at least internally consistent in our usage, and the API boundary problem can be resolved in SBv2.

Let me know what you think.

labath added a comment.Jan 6 2017, 7:46 AM

Ok, this is going to be more complicated than I expected. libedit takes a FILE* in its interface, so it's going to be very hard to get rid of it.

It would be possible to still use file descriptors everywhere, and create a fake FILE* using functions like funopen for the purposes of libedit, but then this starts to get a bit messy...

labath abandoned this revision.Nov 26 2018, 2:59 AM

Any reason we are removing File::SeekFromCurrent and File::SeekFromEnd?

This is a stale patch which I am abandoning. I am just cleaning up my queue.