This is an archive of the discontinued LLVM Phabricator instance.

[libc] add buffering to FILE writes
ClosedPublic

Authored by michaelrj on Jun 1 2022, 3:01 PM.

Details

Summary

Previously all FILE objects were fully buffered, this patch adds line
buffering and unbuffered output, as well as applying them to stdout and
stderr.

Diff Detail

Event Timeline

michaelrj created this revision.Jun 1 2022, 3:01 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJun 1 2022, 3:01 PM
michaelrj requested review of this revision.Jun 1 2022, 3:01 PM
sivachandra added inline comments.Jun 2 2022, 1:08 AM
libc/src/__support/File/file.cpp
34

These additional conditionals are making the code even more complicated. Instead, we should do this: rename the current write_unlocked method to write_unlocked_fully_buffered and make it private. Next, add two new methods like this:

public:
size_t write_unlocked(const void *data, size_t len) {
  if (bufmode == _IOLBF) {
    return write_unlocked_line_buffered(data, len);
  } else if (bufmode == _IOFBF) {
    return write_unlocked_fully_buffered(data, len);
  } else {
    size_t written_size =platform_write(this, data, len);
    // Error handling if platform_write fails.

    platform_flush(this);

    // Error handling if platform_flush fails

    return len;
  }
}

private:
size_t write_unlocked_line_buffered(const void *data, size_t len) {
  // Instead of looking for the first new line char, look for the last new line char.
  char *last = last_newline(data, len);
  if (last == nullptr) {
   // If no new line char is found, just do normal buffering.
    return write_unlocked_fully_buffered(data, len);
  }

  size_t written_size;
  if (pos == 0) {
    // If there is no content in the buffer, write out the new data.
    written_size = platform_write(data, last - data /* size including the new line char */);
    // Perform error handling
  } else {
    // First write out the existing content, and then write out the new data.
    platform_write(buf, pos);
    // Perform error handling
    pos = 0;

    written_size = platform_write(data, last - data);
    // Perform error handling.
  }


  platform_flush();

  // Perform error handling

  if (written_size == len)
    return len;

  // Remaining data should be buffered.
  written_size = write_unlocked_fully_buffered(data + written_size, len - written_size);

  // Perform error handling

  return len;
}
libc/src/__support/File/file.h
62 ↗(On Diff #433562)

We shouldn't need this, and may be even not do this. Add the macros _IOFBF etc. to stdio.h and use them directly.

libc/src/__support/File/linux_file.cpp
162–163

Shouldn't the default buffering mode be _IOFBF?

172–174

We don't need these anymore - just make the size arg 0 and buffer arg nullptr in the constructor call below.

libc/test/src/__support/File/file_test.cpp
110

Why did you change this to calloc?

michaelrj updated this revision to Diff 434152.Jun 3 2022, 2:52 PM
michaelrj marked 4 inline comments as done.

rewrite to be closer to the original

libc/src/__support/File/file.cpp
34

I don't think splitting line buffered and fully buffered writes is the correct design. The reason being that their logic is almost identical, only varying in the spot where they flush the buffer. With this idea in mind, I've rewritten the original function to support both types of buffering in a manner closer to your original design. I've also kept unbuffered writes in this function, since it seems odd to split just that function off on its own if all the other write modes are together.

sivachandra added inline comments.Jun 5 2022, 11:39 PM
libc/src/__support/File/file.cpp
34

The concept of a flush_point is promising but in the current state of this patch, I am not yet convinced.

47

While the concept of flush_point seems promising, I think there is a mix up of a xfer_point and a flush_point in the code below. You want to transfer and clear the libc buffer when it is full. But, under _IOLBF, you want to actually perform the flush operation. May be I am missing it, but I don't see any call to platform_flush after this line.

125

It took me some time to understand why this return is OK. Which probably means we need to refactor/rename/comment.

126

If the if has braces, else should also have braces.

133–148

Should this be:

size_t remaining = len - write_size;
137

Should remaining be updated inside this if block? Couple of points:

  1. It seems like the condition is true only for a specific case of _IOLBF.
  2. This condition can be true even if there is no newline char in the input?
144

This condition can be true if flush_point - write_size is less than remaining?

158

Is an else missing here?

michaelrj updated this revision to Diff 434917.Jun 7 2022, 12:47 PM
michaelrj marked 9 inline comments as done.

address comments

libc/src/__support/File/file.cpp
47

I've added a call to flush at the end.

133–148

in this case remaining represents the number of characters left after the flush point. This is useful, since we want to know if the characters after the flush point will fit into the buffer, and if they will we want to write everything to output directly.

137

no, since it's only tracking the number of characters after the flush point, and from that the number of characters written should be logical after each step.

This isn't only true in a special case of line buffered mode, but it is related to that. This handles the case where the flush point is after the end of the buffer, as well as the cases where the remainder after the flush point doesn't fit into the buffer. The flush point can only be after the end of the buffer if there's a newline, because otherwise the flush point is equal to the end of the buffer.

It should be able to be true if there's no newline in the input, that's a good catch. I've added an extra condition.

158

no, but there used to be one here. If this were only fully buffered then writing remaining to the buffer and writing from data to the output directly would never happen together, but in line buffered mode in the case where the newline is after the end of the buffer, but the characters after the newline fit into the buffer, we might need to do both.

While this is probably OK logically, it is way harder to reason than before. I am not yet convinced that this idea of flush_point and mixing line buffering and fully buffering is better.

I gave it one pass and will need at least one more to actually understand everything. I will do it once more and get back to you.

michaelrj updated this revision to Diff 434996.Jun 7 2022, 4:53 PM

change to a simpler algorithm

The idea of "split point" makes it almost equivalent to the split approach I proposed. Taking a step back and looking at it objectively, consider that once a file is in a buffering mode, it is in that mode for its life time. Likewise, FBF is a mode by itself but LBF is FBF plus something else. To reflect these conceptual notions in code, we should use the approach of building LBF over FBF. That will keep FBF and LBF implementations simple and focused. Quantitatively also, the current state of this patch demonstrates that mixing makes it more complicated than it was previously.

I was wrong in expecting that concepts like flush point would make the implementation simple so I apologize for the misleading feedback I have given earlier. To move forward, we will do this:

  1. Use the split approach of building LBF over FBF as I have suggested earlier.
  2. Fix any bugs in FBF implementation separately.
  3. If and when some optimizations are required, we will optimize based on the data about what exactly the problem is.
michaelrj updated this revision to Diff 435261.Jun 8 2022, 11:13 AM

split the buffering modes into different functions, as well as cleaning up each implementation to be simpler in its specific case.

Good to go after the comments addressed.

libc/src/__support/File/file.cpp
167

We will keep the implementation simple for now like this:

fbf(primary.data(), primary.size());
// Error check
flush_unlocked(); // A call to flush and not to platform_flush()
fbf(remaining.data(), remaining.size());
// Error check

It should incur the same number of platform_* calls but might lead to a few additional copies. Iff required, we can optimize FBF to eliminate the unecessary copies. Infact, what we really want is to replace the first FBF operation by an NBF operation. We can do all that iff required.

I don't think we have flush_unlocked, so you will have to add it.

libc/test/src/__support/File/file_test.cpp
155

It will be cool to actually create an FBF file also and show the difference in pos comparisons (like those on line 170 and 174).

397

_IOFBF?

411

Use some buffering mode. Or, test for all modes.

michaelrj updated this revision to Diff 435678.Jun 9 2022, 2:05 PM
michaelrj marked 4 inline comments as done.

address comments

libc/src/__support/File/file.cpp
167

I did add NBF as well as flush_unlocked, and now we're down to three platform writes maximum (one for the buffer, one for the section up to the newline, and one for the remainder).

sivachandra accepted this revision.Jun 9 2022, 2:54 PM

LGTM modulo one comment about the NBF operation.

libc/src/__support/File/file.cpp
54

You should have an else which does platfrom_flush()?

151

You shouldn't need this as the NBF operation does platform flush anyway. My previous comment suggested this assuming FBF on line 163.

This revision is now accepted and ready to land.Jun 9 2022, 2:54 PM
michaelrj marked an inline comment as done.Jun 9 2022, 4:06 PM
michaelrj added inline comments.
libc/src/__support/File/file.cpp
54

I have the flush outside of the unbuffered write so that in the fully buffered function (line 63) we can use the unbuffered write when the data is too large for the buffer. This means that in the base write_unlocked function and in write_unlocked_lbf we have to flush separately, but I think that's okay.

sivachandra added inline comments.Jun 9 2022, 10:31 PM
libc/src/__support/File/file.cpp
54

Ah, I missed that!

This revision was automatically updated to reflect the committed changes.