
[lld] Buffer writes when composing a single diagnostic
Accepted · Public

Authored by MaskRay on Sep 7 2020, 11:30 PM.

Details

Summary

llvm::errs() is unbuffered. On a POSIX platform, composing a diagnostic
string may invoke the ::write syscall multiple times, which can be slow.
Buffer writes to a temporary SmallString when composing a single diagnostic to
reduce the number of ::write syscalls to one (also easier to read under
strace/truss).

For an invocation of ld.lld that emits 62000+ "ld.lld: warning: symbol ordering file: no such symbol: ..." warnings (D87121), the buffering decreases the write time from 1s to 0.4s (writing to /dev/tty) and from 0.4s to 0.1s (writing to a tmpfs file).
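
In outline, the idea is the following (a minimal sketch, not the committed code; the hypothetical emitWarning stands in for lld's ErrorHandler methods):

#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/Twine.h"
#include "llvm/Support/raw_ostream.h"

// Compose the whole diagnostic in a stack buffer first; the writes into
// raw_svector_ostream are plain memory appends, and only the final
// operator<< reaches the unbuffered stream, i.e. one ::write on POSIX.
static void emitWarning(llvm::StringRef location, const llvm::Twine &msg) {
  llvm::SmallString<256> buf;
  llvm::raw_svector_ostream os(buf);
  os << location << ": warning: " << msg << '\n';
  llvm::errs() << buf; // single syscall instead of one per << above
}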

Diff Detail

Event Timeline

MaskRay created this revision. Sep 7 2020, 11:30 PM
Herald added a project: Restricted Project. Sep 7 2020, 11:30 PM
MaskRay requested review of this revision. Sep 7 2020, 11:30 PM

> Buffer consecutive writes of one diagnostic

I found this comment confusing, especially given the example, as I thought initially you were buffering instances of the same diagnostic, rather than the writes for a single diagnostic report. Could you change this to "Buffer writes when composing a single diagnostic"? I think that better expresses the situation.

lld/Common/ErrorHandler.cpp
226

This code is very similar to the warning code. Can you de-duplicate it with a helper function?

grimar added inline comments.Sep 8 2020, 1:19 AM
llvm/include/llvm/Support/raw_ostream.h
317

Add const?

MaskRay retitled this revision from "[lld] Buffer lld::errs() writes" to "[lld] Buffer writes when composing a single diagnostic". Sep 8 2020, 8:53 AM
MaskRay edited the summary of this revision.
MaskRay updated this revision to Diff 290499. Sep 8 2020, 9:00 AM

Address comments

lld/Common/ErrorHandler.cpp
226

os is used after the common part and requires a SmallString, so it is a bit difficult to share code.

Is performance important in this case? (admittedly, some folks might have builds full of warnings they aren't paying attention to & so they still care about performance then?) Generally the idea with clang/etc is that anything that produces user output (other than the successfully compiled/linked file) doesn't need to be fast (in compile/link-time terms) because it'll take /way/ longer for a user to read and think about the error message than the slight increase in execution time.

> Is performance important in this case? […]

The nicer thing is that when running under strace/truss, there will be just one write for one diagnostic. And one can worry less about interleaved output from multiple threads. (I suspect I can remove std::lock_guard<std::mutex> lock(mu); but I don't want to take the risk in this patch)

> The nicer thing is that when running under strace/truss, there will be just one write for one diagnostic.

Is that true of many other tools? (I wouldn't've expected it'd be true of clang, for instance - or probably other LLVM utilities) - is it especially helpful?

> And one can worry less about interleaved output from multiple threads. […]

That's what I figured this might be originally about - but if that's already working, I'm not sure there's a need to change? But I don't feel super strongly if the alternative being proposed is especially tidier.

jhenderson added inline comments.Sep 9 2020, 1:44 AM
lld/Common/ErrorHandler.cpp
226

How about something like this:

void ErrorHandler::reportDiagnostic(Colors c, StringRef diagKind, StringRef location, const Twine &msg) {
  SmallString<256> buf;
  raw_svector_ostream os(buf);
  os << sep << location << ": ";
  if (lld::errs().colors_enabled()) {
    os.enable_colors(true);
    os << c << diagKind << ": " << Colors::RESET;
  } else {
    os << diagKind << ": ";
  }
  os << msg << '\n';
  std::lock_guard<std::mutex> lock(mu);
  lld::errs() << buf;
}

void ErrorHandler::warn(const Twine &msg) {
  ...
  reportDiagnostic(Colors::MAGENTA, "warning", getLocation(msg), msg);
  sep = getSeparator(msg);
}

void ErrorHandler::error(const Twine &msg) {
  ...
  if (errorLimit == 0 || errorCount <= errorLimit) {
    if (errorLimit != 0 && errorCount == errorLimit) {
      reportDiagnostic(Colors::RED, "error", getLocation(msg), errorLimitExceededMsg);
      exit = exitEarly;
    } else {
      reportDiagnostic(Colors::RED, "error", getLocation(msg), msg);
    }
    sep = getSeparator(msg);
    ++errorCount;
  }
}

MaskRay updated this revision to Diff 290759. Sep 9 2020, 9:41 AM

Adopt jhenderson's suggestion

MaskRay marked 3 inline comments as done. Sep 9 2020, 9:42 AM

> Is that true of many other tools? […] - is it especially helpful?

The motivation was to reduce unneeded ::write calls. Now with jhenderson's suggestion (thanks), this turned into a refactoring. The total number of lines has increased, but it looks clearer (the log/warn/error functions don't have to deal with lock_guard, sep, and the ordering), so I think the end result is not bad.

(The more focused syscall log lines under strace/truss are a very minor benefit.)

> The motivation was to reduce unneeded ::write calls.

Sounds like unnecessary/premature optimization, though?

> Now with jhenderson's suggestion (thanks), this turned into a refactoring. […]

Seems like the code could be simpler still by doing that refactoring without adding the buffering?

> Sounds like unnecessary/premature optimization, though?

I don't think it is premature optimization. When a --symbol-ordering-file has absent entries or a --call-graph-ordering-file has incorrect entries, the terminal output can be quite slow. This change makes that much faster while also simply being better behavior (one write syscall for one line of diagnostic).

You can save 2 or 3 lines by not making this change, but to me the "mental complexity" is larger: a lock is needed to prevent interleaving output from multiple threads.

> I don't think it is premature optimization. […]

How much faster is it if the user disables the warning they have chosen not to fix? (If they're fixing it, the time taken to fix it will be a lot longer than the link-time penalty.)

> You can save 2 or 3 lines by not making this change, but to me the "mental complexity" is larger […]

There's still locking though, so I'm not sure it's a huge difference in mental complexity - locking on the write, versus locking on a 10-line function that writes a few times?

MaskRay added a comment (edited). Sep 9 2020, 3:06 PM

> How much faster is it if the user disables the warning they have chosen not to fix? […]

See https://reviews.llvm.org/D87121#2260177 - I don't use it as a motivation for the patch. I created this patch because I think it is the right thing to do.

> There's still locking though, so I'm not sure it's a huge difference in mental complexity […]

I keep the lock because I don't want to take that risk in this change. The mental complexity is different, though.

Before: (a) the coder worries that raw_ostream::operator<< may access data that is concurrently being modified; (b) concurrent ErrorHandler::log/warn/error calls may interleave their output, so a lock is necessary.
Now: only (a) remains.

If lld::errs() can be made to work like std::osyncstream, the locking-related mental complexity can be removed here entirely. Even without introducing the std::osyncstream concept, the new behavior decreases mental complexity, because composing into local variables (as opposed to calling lld::errs() << multiple times) is obviously thread safe.
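
A sketch of the distinction (mu mirrors ErrorHandler's mutex; report is a hypothetical stand-in):

#include <mutex>
#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/Twine.h"
#include "llvm/Support/raw_ostream.h"

static std::mutex mu; // stands in for ErrorHandler's mutex

// Each thread composes into its own stack-local buffer, which is
// trivially thread safe; the lock now guards a single operator<<
// rather than a whole sequence of them (cf. C++20 std::osyncstream).
static void report(const llvm::Twine &msg) {
  llvm::SmallString<256> buf;
  llvm::raw_svector_ostream os(buf); // private to this thread
  os << "ld.lld: warning: " << msg << '\n';
  std::lock_guard<std::mutex> lock(mu);
  llvm::errs() << buf;
}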

jhenderson accepted this revision. Sep 10 2020, 12:09 AM

Looks good from my point of view, but best wait for the conversation with @dblaikie to be resolved.

This revision is now accepted and ready to land. Sep 10 2020, 12:09 AM
andrewng added inline comments.
lld/Common/ErrorHandler.cpp
203

IIRC, the update of sep needs to be within the same lock to avoid a race condition.

234

Same locking/race condition for sep as above.

MaskRay planned changes to this revision. Sep 10 2020, 9:42 AM
MaskRay added inline comments.
lld/Common/ErrorHandler.cpp
234

Yes. errorCount is also racy. This is a big problem. The lock may still be needed.
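
A sketch of one possible fix, reusing the names from the patch (warn() shown; error() would need the same treatment for errorCount):

// Sketch: update the shared state under the same lock that serializes
// the write, so concurrent warn() calls cannot race on sep.
void ErrorHandler::warn(const Twine &msg) {
  SmallString<256> buf;
  raw_svector_ostream os(buf);
  os << sep << getLocation(msg) << ": warning: " << msg << '\n';
  std::lock_guard<std::mutex> lock(mu);
  lld::errs() << buf;
  sep = getSeparator(msg); // now inside the critical section
}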

This revision is now accepted and ready to land. Thu, Sep 24, 11:18 PM

> See https://reviews.llvm.org/D87121#2260177 - I don't use it as a motivation for the patch. I created this patch because I think it is the right thing to do.

Not sure I follow - the patch description seems to be quite focussed on the performance implications of this change - it sounds like, reading the patch description/commit message that that is the motivation.

Oh, perhaps I see another reason that might be less nuanced: This will probably break color printing on Windows when using the normal Windows color API - because it's a stateful API, not something that can be encoded in the stream: https://github.com/llvm/llvm-project/blob/d3f6972abb9c0ac06ddabf61697754c3c6f5d11b/llvm/lib/Support/raw_ostream.cpp#L513
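
To make the contrast concrete (an illustrative sketch; the Windows calls are the standard console API, not lld code):

#include "llvm/Support/raw_ostream.h"

// ANSI terminals: color is in-band data, so it can sit in a byte buffer
// and be replayed later together with the text it wraps.
void ansiRed(llvm::raw_ostream &os) {
  os << "\x1b[31m" << "error" << "\x1b[0m" << '\n';
}

#ifdef _WIN32
#include <windows.h>
// Windows console API: color is out-of-band state on the console
// handle; the call takes effect immediately and cannot be captured in a
// buffer, which is why buffering the stream can break coloring here.
void winRed() {
  HANDLE h = GetStdHandle(STD_ERROR_HANDLE);
  SetConsoleTextAttribute(h, FOREGROUND_RED);
  DWORD written;
  WriteFile(h, "error\r\n", 7, &written, nullptr);
}
#endif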

> Not sure I follow - the patch description seems to be quite focussed on the performance implications […]

I found another reason, though it is probably still weak: I want to check a large binary with many out-of-range relocations. lld will emit a large number of warnings (with --noinhibit-exec). This patch can make that faster.

> I found another reason, though it is probably still weak: I want to check a large binary with many out-of-range relocations. […]

Could the uninteresting warnings be disabled? If all the warnings are interesting, I'd guess a bunch of processing would need to be done to make them actionable - or a lot of human time to read through them all; either way, that time probably dwarfs the extra time lld takes to print those errors.

In any case - I think colour can't be portably buffered (well, in addition to the existing buffering in the ostream itself) so far as I can see in the implementation. Does that understanding seem right? Is there a way that could be addressed?

MaskRay added a comment (edited). Sun, Oct 4, 11:46 PM

> Could the uninteresting warnings be disabled? […]

No. They were errors but have been downgraded to warnings. For the binaries I investigated, rodata+text+data is significantly larger than 2GiB, so many out-of-range relocations are expected.

> In any case - I think colour can't be portably buffered […] Is there a way that could be addressed?

The code makes messages line buffered, instead of fully buffered. (And buffered ANSI escape sequences work as well.) Currently the only downside of this patch is the introduction of reportDiagnostic (added complexity). There are a number of small advantages on the upside, though.

> No. They were errors but have been downgraded to warnings. […]

Clang has a concept of warnings that are default errors - so they can be downgraded to warnings again, and disabled individually. Perhaps that could be used here.

> The code makes messages line buffered, instead of fully buffered. […]

Hmm, I think we're still miscommunicating here. Not all platforms use ANSI escape sequences - Windows uses APIs to change colour on the terminal, so there's no way to buffer coloured output there. Well, it seems like LLVM has an option to enable ANSI escape codes on Windows - so perhaps on some terminals that's used? But generally it's done via API on the file descriptor (see: https://github.com/llvm/llvm-project/blob/6e25586990b93e2c9eaaa4f473b6720ccd646c46/llvm/lib/Support/Windows/Process.inc#L369 and related use ( https://github.com/llvm/llvm-project/blob/6e25586990b93e2c9eaaa4f473b6720ccd646c46/llvm/lib/Support/raw_ostream.cpp#L513 )). So my understanding is that portable color rendering cannot be buffered as you're doing here.

MaskRay added a subscriber: aganea. Mon, Oct 5, 2:03 PM

> Clang has a concept of warnings that are default errors - so they can be downgraded to warnings again, and disabled individually. Perhaps that could be used here.

Then we will need to communicate the -Wno- invention to binutils.

Such an opt-in option is still a bit inconvenient. For the inappropriate usage in D87121 and the out-of-range relocation warnings, if we take this patch, the performance is good enough that I don't need to care about an option at all. With an opt-in option, I'd need to add it to the link command line.

An alternative is to introduce a warning limit, but that will add new code, too (and it needs to be justified as well).

> Hmm, I think we're still miscommunicating here. Not all platforms use ANSI escape sequences - Windows uses APIs to change colour on the terminal […]

This is indeed a Windows problem. It is similar to https://reviews.llvm.org/D88348#2299650. Maybe @aganea has some thoughts on whether buffering can be made to work there.

> Then we will need to communicate the -Wno- invention to binutils.

If there are warnings people want to produce a lot of and ignore - it seems useful/good to have a way to disable those warnings.

> Such an opt-in option is still a bit inconvenient. […]

Part of the goal would be that it would make it more usable - by not producing a bunch of spurious warnings you need to filter out/ignore/etc?

(side note: raw_ostream already has buffering support - any idea why its performance is still significantly different than the extra layer of buffering on top of it? Might be a matter of enabling buffering in lld::errs() (lld::errs().SetBufferSize(N))? That'd still work with colored output without any troubles (it'd be flushed as needed on platforms where flushing is needed). I still think if there are use cases where the performance of errors matters to users - they probably should be addressed in other ways, though, such as being able to disable warnings the user isn't interested in)

MaskRay added a comment (edited). Mon, Oct 5, 2:27 PM

> (side note: raw_ostream already has buffering support - any idea why its performance is still significantly different than the extra layer of buffering on top of it? […])

(reply to side note: llvm::errs() << "..." does not use buffering. Every print is a write syscall on POSIX systems. The idea of the patch is to greatly reduce the number of write syscalls. Enabling buffering for llvm::errs() may have the side effect that an error can be reported far away from its call site. For certain warnings/errors (e.g. after all input has been read) this probably does not matter that much.)

raw_ostream does not have a line buffering mode.
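
For reference, what raw_ostream does offer (a sketch against the existing API, including the SetBufferSize call suggested in the side note):

#include "llvm/Support/raw_ostream.h"

void bufferingModes() {
  // llvm::errs() is unbuffered by default: each chunk that reaches the
  // stream becomes its own ::write syscall on POSIX.
  llvm::errs() << "a" << "b"; // up to two syscalls

  // Full buffering exists, but flushes at arbitrary points, so a
  // diagnostic could appear long after its call site:
  llvm::errs().SetBufferSize(4096);
  llvm::errs() << "c"; // sits in the buffer until it fills or flushes
  llvm::errs().flush();

  // Line buffering does not exist; the patch emulates it by composing
  // each diagnostic in a SmallString and writing it once.
}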

> raw_ostream does not have a line buffering mode.

Not line buffering, but it does have support for general-purpose buffering. If there are particular places where the output must be rendered (though I can't think of any in a linker or compiler - they don't accept interactive input, so they don't need to make sure some output reaches the user before waiting to read input, for instance), then it could/should be explicitly flushed at those locations.