This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objdump][NFC] Make the PrettyPrinter::printInst() output buffered
ClosedPublic

Authored by seiya on Jul 18 2019, 8:43 PM.

Details

Summary

Every time PrettyPrinter::printInst is called, stdout is flushed and it makes llvm-objdump slow. This patches adds a string
buffer to prevent stdout from being flushed.

Benchmark results (./llvm-objdump-master: without this patch, ./bin/llvm-objcopy: with this patch):

$ hyperfine --warmup 10 './llvm-objdump-master -d ./bin/llvm-objcopy' './bin/llvm-objdump -d ./bin/llvm-objcopy'
Benchmark #1: ./llvm-objdump-master -d ./bin/llvm-objcopy
  Time (mean ± σ):      2.230 s ±  0.050 s    [User: 1.533 s, System: 0.682 s]
  Range (min … max):    2.115 s …  2.278 s    10 runs

Benchmark #2: ./bin/llvm-objdump -d ./bin/llvm-objcopy
  Time (mean ± σ):     386.4 ms ±  13.0 ms    [User: 376.6 ms, System: 6.1 ms]
  Range (min … max):   366.1 ms … 407.0 ms    10 runs

Summary
  './bin/llvm-objdump -d ./bin/llvm-objcopy' ran
    5.77 ± 0.23 times faster than './llvm-objdump-master -d ./bin/llvm-objcopy'

Diff Detail

Repository
rL LLVM

Event Timeline

seiya created this revision.Jul 18 2019, 8:43 PM
seiya edited the summary of this revision. (Show Details)Jul 18 2019, 8:44 PM
seiya edited the summary of this revision. (Show Details)
jhenderson added a subscriber: jhenderson.

I've added a few more reviewers for visibility.

My initial thought was "why not just delete the "flush"" from printInst, but I see that it's there for a "good" reason, looking at PrettyPrinter. I can't help but feel that a better solution would be to fix formatted_raw_ostream so that the indentation only gets added after any already-written content. Assuming that's not viable (e.g. it's intentional behaviour), I think what you have written works. @seiya, what are your thoughts on the formatted_raw_ostream implementation? Can it be sensibly changed?

seiya added a comment.Jul 19 2019, 4:30 AM

I've added a few more reviewers for visibility.

My initial thought was "why not just delete the "flush"" from printInst, but I see that it's there for a "good" reason, looking at PrettyPrinter. I can't help but feel that a better solution would be to fix formatted_raw_ostream so that the indentation only gets added after any already-written content. Assuming that's not viable (e.g. it's intentional behaviour), I think what you have written works. @seiya, what are your thoughts on the formatted_raw_ostream implementation? Can it be sensibly changed?

First I thought that it is not straightforward to update the position (what FOS.getColumn() returns) in formatted_raw_stream without flushing because it's done in its write_impl.

However, now I realized that it's trivial: add new method formatted_raw_ostream::updatePositionByBufferedString() (I couldn't come up with a good name) method, which reads the buffered string and updates the position by doing ComputePosition(getBufferStart(), GetNumBytesInBuffer()); .

I haven't tried this, but with this method, we no longer need to add a string buffer like this current patch. What do you think on this approach?

I've added a few more reviewers for visibility.

My initial thought was "why not just delete the "flush"" from printInst, but I see that it's there for a "good" reason, looking at PrettyPrinter. I can't help but feel that a better solution would be to fix formatted_raw_ostream so that the indentation only gets added after any already-written content. Assuming that's not viable (e.g. it's intentional behaviour), I think what you have written works. @seiya, what are your thoughts on the formatted_raw_ostream implementation? Can it be sensibly changed?

First I thought that it is not straightforward to update the position (what FOS.getColumn() returns) in formatted_raw_stream without flushing because it's done in its write_impl.

However, now I realized that it's trivial: add new method formatted_raw_ostream::updatePositionByBufferedString() (I couldn't come up with a good name) method, which reads the buffered string and updates the position by doing ComputePosition(getBufferStart(), GetNumBytesInBuffer()); .

I haven't tried this, but with this method, we no longer need to add a string buffer like this current patch. What do you think on this approach?

It's hard to envisage this without seeing a patch with it in. Could you create a separate patch to this one, with your alternative proposal in, so that I and others can compare and contrast the two options, please?

seiya added a comment.EditedJul 21 2019, 10:45 PM

I've added a few more reviewers for visibility.

My initial thought was "why not just delete the "flush"" from printInst, but I see that it's there for a "good" reason, looking at PrettyPrinter. I can't help but feel that a better solution would be to fix formatted_raw_ostream so that the indentation only gets added after any already-written content. Assuming that's not viable (e.g. it's intentional behaviour), I think what you have written works. @seiya, what are your thoughts on the formatted_raw_ostream implementation? Can it be sensibly changed?

First I thought that it is not straightforward to update the position (what FOS.getColumn() returns) in formatted_raw_stream without flushing because it's done in its write_impl.

However, now I realized that it's trivial: add new method formatted_raw_ostream::updatePositionByBufferedString() (I couldn't come up with a good name) method, which reads the buffered string and updates the position by doing ComputePosition(getBufferStart(), GetNumBytesInBuffer()); .

I haven't tried this, but with this method, we no longer need to add a string buffer like this current patch. What do you think on this approach?

It's hard to envisage this without seeing a patch with it in. Could you create a separate patch to this one, with your alternative proposal in, so that I and others can compare and contrast the two options, please?

I had a look at formatted_raw_ostream, but the idea I proposed in a comment is not effective and this current patch would be better. Here's why:

formatted_raw_ostream takes a raw_ostream (outs() in this context) and tracks the current column/line position. Since formatted_raw_ostream itself is buffered, it flushes and makes the wrapped stream ( outs()) unbuffered in the constructor, and restore its buffering settings in the destructor [1].

To prevent outs() from being flushed, I tried making formatted_raw_ostream unbuffered and keeping outs() buffered, however, outs() is unbuffered if it's tty (line buffering is not implemented and it is stated that "it's' not worth the complexity" [2]). That is, what we need to do is implementing line buffering for stdout (this patch implements a sort of it by InstString).

I think this patch is a simple workaround for this, but working on making outs() line-buffered instead may be better. What do you think which way sounds better?

Personally I redirect the output of llvm-objdump to a pipe (e.g., llvm-objdump foo | less) so we can just ignore such case (stdout is tty), btw.

[1]: https://github.com/llvm/llvm-project/blob/2946cd701067404b99c39fb29dc9c74bd7193eb3/llvm/include/llvm/Support/FormattedStream.h#L83
[2]: https://github.com/llvm/llvm-project/blob/49a3ad21d6034eb20f99f228dbebcc5f65a748d8/llvm/lib/Support/raw_ostream.cpp#L773

MaskRay added a comment.EditedJul 21 2019, 11:06 PM

Nice finding!

[1]: https://github.com/llvm/llvm-project/blob/2946cd701067404b99c39fb29dc9c74bd7193eb3/llvm/include/llvm/Support/FormattedStream.h#L83

I think the 4x (line buffered vs unbuffered) slowdown can be explained by your first link. The formatted_raw_ostream ctor resets the underlying stream (outs()) to unbuffered. There are several solutions to the problem. Given formatted_raw_ostream FOS(OS); is only used once in PrettyPrinter and can be easily replaced, I suggest deleting it and tracking the column manually.

Nice finding!

[1]: https://github.com/llvm/llvm-project/blob/2946cd701067404b99c39fb29dc9c74bd7193eb3/llvm/include/llvm/Support/FormattedStream.h#L83

I think the 4x (line buffered vs unbuffered) slowdown can be explained by your first link. The formatted_raw_ostream ctor resets the underlying stream (outs()) to unbuffered. There are several solutions to the problem. Given formatted_raw_ostream FOS(OS); is only used once in PrettyPrinter and can be easily replaced, I suggest deleting it and tracking the column manually.

Tracking the column manually sounds good idea to me. I'll try that!

seiya updated this revision to Diff 211427.Jul 23 2019, 10:55 PM
  • Track the column manually to remove formatted_raw_ostream.

Simply removing formatted_raw_ostream makes llvm-objdump faster as I expected:

$ hyperfine --warmup 10 './llvm-objdump-master -d ./bin/llvm-objcopy' './bin/llvm-objdump -d ./bin/llvm-objcopy'
Benchmark #1: ./llvm-objdump-master -d ./bin/llvm-objcopy
  Time (mean ± σ):      2.230 s ±  0.050 s    [User: 1.533 s, System: 0.682 s]
  Range (min … max):    2.115 s …  2.278 s    10 runs

Benchmark #2: ./bin/llvm-objdump -d ./bin/llvm-objcopy
  Time (mean ± σ):     386.4 ms ±  13.0 ms    [User: 376.6 ms, System: 6.1 ms]
  Range (min … max):   366.1 ms … 407.0 ms    10 runs

Summary
  './bin/llvm-objdump -d ./bin/llvm-objcopy' ran
    5.77 ± 0.23 times faster than './llvm-objdump-master -d ./bin/llvm-objcopy'
MaskRay accepted this revision.Jul 23 2019, 11:09 PM

Apologize I added formatted_raw_ostream and I didn't read its ctor carefully...

Benchmark #2: ./bin/llvm-objdump -d ./bin/llvm-objcopy
Time (mean ± σ): 386.4 ms ± 13.0 ms [User: 376.6 ms, System: 6.1 ms]
Range (min … max): 366.1 ms … 407.0 ms 10 runs

Please remember updating the description before you commit. The benchmark has been updated.

This revision is now accepted and ready to land.Jul 23 2019, 11:09 PM

Apologize I added formatted_raw_ostream and I didn't read its ctor carefully...

Benchmark #2: ./bin/llvm-objdump -d ./bin/llvm-objcopy
Time (mean ± σ): 386.4 ms ± 13.0 ms [User: 376.6 ms, System: 6.1 ms]
Range (min … max): 366.1 ms … 407.0 ms 10 runs

Please remember updating the description before you commit. The benchmark has been updated.

Don't apologize! This was a good opportunity for me to work on the performance issue and was just interesting :D

seiya edited the summary of this revision. (Show Details)Jul 23 2019, 11:14 PM
jhenderson accepted this revision.Jul 24 2019, 4:48 AM

LGTM too.

This revision was automatically updated to reflect the committed changes.