This is an archive of the discontinued LLVM Phabricator instance.

[lld] Buffer writes when composing a single diagnostic
ClosedPublic

Authored by MaskRay on Sep 7 2020, 11:30 PM.

Details

Summary

llvm::errs() is unbuffered. On a POSIX platform, composing a diagnostic
string may invoke the ::write syscall multiple times, which can be slow.
Buffer writes to a temporary SmallString when composing a single diagnostic to
reduce the number of ::write syscalls to one (also easier to read under
strace/truss).

For an invocation of ld.lld with 62000+ lines of
ld.lld: warning: symbol ordering file: no such symbol: warnings (D87121),
the buffering decreases the write time from 1s to 0.4s (for /dev/tty) and
from 0.4s to 0.1s (for a tmpfs file).
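
A minimal sketch of the buffering pattern described above (illustrative only; the buffer size, message text, and the msg variable are assumptions, not the exact patch code):

  // Assumes llvm/ADT/SmallString.h and llvm/Support/raw_ostream.h.
  SmallString<256> buf;
  raw_svector_ostream os(buf);              // composes the diagnostic in memory
  os << "ld.lld: warning: " << msg << '\n';
  lld::errs() << buf;                       // emitted as a single ::write on POSIX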

Diff Detail

Event Timeline

MaskRay created this revision.Sep 7 2020, 11:30 PM
Herald added a project: Restricted Project.Sep 7 2020, 11:30 PM
MaskRay requested review of this revision.Sep 7 2020, 11:30 PM

Buffer consecutive writes of one diagnostic

I found this comment confusing, especially given the example, as I thought initially you were buffering instances of the same diagnostic, rather than the writes for a single diagnostic report. Could you change this to "Buffer writes when composing a single diagnostic"? I think that better expresses the situation.

lld/Common/ErrorHandler.cpp
215

This code is very similar to the warning code. Can you de-duplicate it with a helper function?

grimar added inline comments.Sep 8 2020, 1:19 AM
llvm/include/llvm/Support/raw_ostream.h
304

Add const?

MaskRay retitled this revision from [lld] Buffer lld::errs() writes to [lld] Buffer writes when composing a single diagnostic.Sep 8 2020, 8:53 AM
MaskRay edited the summary of this revision.
MaskRay updated this revision to Diff 290499.Sep 8 2020, 9:00 AM

Address comments

lld/Common/ErrorHandler.cpp
215

os is used after the common part and requires a SmallString, so it is a bit difficult to share code.

Is performance important in this case? (admittedly, some folks might have builds full of warnings they aren't paying attention to & so they still care about performance then?) Generally the idea with clang/etc is that anything that produces user output (other than the successfully compiled/linked file) doesn't need to be fast (in compile/link-time terms) because it'll take /way/ longer for a user to read and think about the error message than the slight increase in execution time.

Is performance important in this case? (admittedly, some folks might have builds full of warnings they aren't paying attention to & so they still care about performance then?) Generally the idea with clang/etc is that anything that produces user output (other than the successfully compiled/linked file) doesn't need to be fast (in compile/link-time terms) because it'll take /way/ longer for a user to read and think about the error message than the slight increase in execution time.

The nicer thing is that when running under strace/truss, there will be just one write for one diagnostic. And one can worry less about interleaved output from multiple threads. (I suspect I can remove std::lock_guard<std::mutex> lock(mu); but I don't want to take the risk in this patch)

Is performance important in this case? (admittedly, some folks might have builds full of warnings they aren't paying attention to & so they still care about performance then?) Generally the idea with clang/etc is that anything that produces user output (other than the successfully compiled/linked file) doesn't need to be fast (in compile/link-time terms) because it'll take /way/ longer for a user to read and think about the error message than the slight increase in execution time.

The nicer thing is that when running under strace/truss, there will be just one write for one diagnostic.

Is that true of many other tools? (I wouldn't've expected it'd be true of clang, for instance - or probably other LLVM utilities) - is it especially helpful?

And one can worry less about interleaved output from multiple threads. (I suspect I can remove std::lock_guard<std::mutex> lock(mu); but I don't want to take the risk in this patch)

That's what I figured this might be originally about - but if that's already working, I'm not sure there's a need to change? But I don't feel super strongly if the alternative being proposed is especially tidier.

jhenderson added inline comments.Sep 9 2020, 1:44 AM
lld/Common/ErrorHandler.cpp
215

How about something like this:

void ErrorHandler::reportDiagnostic(Colors c, StringRef diagKind, StringRef location, const Twine &msg) {
  SmallString<256> buf;
  raw_svector_ostream os(buf);
  os << sep << location << ": ";
  if (lld::errs().colors_enabled()) {
    os.enable_colors(true);
    os << c << diagKind << ": " << Colors::RESET;
  } else {
    os << diagKind << ": ";
  }
  os << msg << '\n';
  std::lock_guard<std::mutex> lock(mu);
  lld::errs() << buf;
}

void ErrorHandler::warn(const Twine &msg) {
  ...
  reportDiagnostic(Colors::MAGENTA, "warning", getLocation(msg), msg);
  sep = getSeparator(msg);
}

void ErrorHandler::error(const Twine &msg) {
  ...
  if (errorLimit == 0 || errorCount <= errorLimit) {
    if (errorLimit != 0 && errorCount == errorLimit) {
      reportDiagnostic(Colors::RED, "error", getLocation(msg), errorLimitExceededMsg);
      exit = exitEarly;
    } else {
      reportDiagnostic(Colors::RED, "error", getLocation(msg), msg);
    }
    sep = getSeparator(msg);
    ++errorCount;
  }
}
MaskRay updated this revision to Diff 290759.Sep 9 2020, 9:41 AM

Adopt jhenderson's suggestion

MaskRay marked 3 inline comments as done.Sep 9 2020, 9:42 AM

Is performance important in this case? (admittedly, some folks might have builds full of warnings they aren't paying attention to & so they still care about performance then?) Generally the idea with clang/etc is that anything that produces user output (other than the successfully compiled/linked file) doesn't need to be fast (in compile/link-time terms) because it'll take /way/ longer for a user to read and think about the error message than the slight increase in execution time.

The nicer thing is that when running under strace/truss, there will be just one write for one diagnostic.

Is that true of many other tools? (I wouldn't've expected it'd be true of clang, for instance - or probably other LLVM utilities) - is it especially helpful?

The motivation was to reduce unneeded ::write calls. Now with jhenderson's suggestion (thanks), this turned into a refactoring. The total number of lines has increased, but it looks clearer (the log/warn/error functions don't have to deal with lock_guard, sep, and the order), so I think the end result is not bad.

(The more focused syscall log lines under strace/truss are a very minor benefit.)

And one can worry less about interleaved output from multiple threads. (I suspect I can remove std::lock_guard<std::mutex> lock(mu); but I don't want to take the risk in this patch)

That's what I figured this might be originally about - but if that's already working, I'm not sure there's a need to change? But I don't feel super strongly if the alternative being proposed is especially tidier.

Is performance important in this case? (admittedly, some folks might have builds full of warnings they aren't paying attention to & so they still care about performance then?) Generally the idea with clang/etc is that anything that produces user output (other than the successfully compiled/linked file) doesn't need to be fast (in compile/link-time terms) because it'll take /way/ longer for a user to read and think about the error message than the slight increase in execution time.

The nicer thing is that when running under strace/truss, there will be just one write for one diagnostic.

Is that true of many other tools? (I wouldn't've expected it'd be true of clang, for instance - or probably other LLVM utilities) - is it especially helpful?

The motivation was to reduce unneeded ::write calls.

Sounds like unnecessary/premature optimization, though?

Now with jhenderson's suggestion (thanks), this turned into a refactoring. The total number of lines has increased, but it looks clearer (the log/warn/error functions don't have to deal with lock_guard, sep, and the order), so I think the end result is not bad.

Seems like the code could be simpler still by doing that refactoring without adding the buffering?

Is performance important in this case? (admittedly, some folks might have builds full of warnings they aren't paying attention to & so they still care about performance then?) Generally the idea with clang/etc is that anything that produces user output (other than the successfully compiled/linked file) doesn't need to be fast (in compile/link-time terms) because it'll take /way/ longer for a user to read and think about the error message than the slight increase in execution time.

The nicer thing is that when running under strace/truss, there will be just one write for one diagnostic.

Is that true of many other tools? (I wouldn't've expected it'd be true of clang, for instance - or probably other LLVM utilities) - is it especially helpful?

The motivation was to reduce unneeded ::write calls.

Sounds like unnecessary/premature optimization, though?

I don't think it is premature optimization. When a --symbol-ordering-file has absent entries or a --call-graph-ordering-file has incorrect entries, the terminal output can be quite slow. This change can make it much faster while also making things just better (one write syscall per line of diagnostic).

You can save 2 or 3 lines by not making this change, but to me the "mental complexity" is larger: a lock is needed to prevent interleaved output from multiple threads.

Now with jhenderson's suggestion (thanks), this turned into a refactoring. The total number of lines has increased, but it looks clearer (the log/warn/error functions don't have to deal with lock_guard, sep, and the order), so I think the end result is not bad.

Seems like the code could be simpler still by doing that refactoring without adding the buffering?

Is performance important in this case? (admittedly, some folks might have builds full of warnings they aren't paying attention to & so they still care about performance then?) Generally the idea with clang/etc is that anything that produces user output (other than the successfully compiled/linked file) doesn't need to be fast (in compile/link-time terms) because it'll take /way/ longer for a user to read and think about the error message than the slight increase in execution time.

The nicer thing is that when running under strace/truss, there will be just one write for one diagnostic.

Is that true of many other tools? (I wouldn't've expected it'd be true of clang, for instance - or probably other LLVM utilities) - is it especially helpful?

The motivation was to reduce unneeded ::write calls.

Sounds like unnecessary/premature optimization, though?

I don't think it is premature optimization. When a --symbol-ordering-file has absent entries or a --call-graph-ordering-file has incorrect entries, the terminal output can be quite slow. This change can make it much faster while also making things just better (one write syscall per line of diagnostic).

How much faster is the performance if the user disables the warning they have chosen not to fix? (If they're fixing it, the time taken to fix it will be a lot longer than the link-time penalty.)

You can save 2 or 3 lines by not making this change, but to me the "mental complexity" is larger: a lock is needed to prevent interleaved output from multiple threads.

There's still locking though, so I'm not sure it's a huge difference in mental complexity - locking on the write, versus locking on a 10-line function that writes a few times?

MaskRay added a comment.EditedSep 9 2020, 3:06 PM

Is performance important in this case? (admittedly, some folks might have builds full of warnings they aren't paying attention to & so they still care about performance then?) Generally the idea with clang/etc is that anything that produces user output (other than the successfully compiled/linked file) doesn't need to be fast (in compile/link-time terms) because it'll take /way/ longer for a user to read and think about the error message than the slight increase in execution time.

The nicer thing is that when running under strace/truss, there will be just one write for one diagnostic.

Is that true of many other tools? (I wouldn't've expected it'd be true of clang, for instance - or probably other LLVM utilities) - is it especially helpful?

The motivation was to reduce unneeded ::write calls.

Sounds like unnecessary/premature optimization, though?

I don't think it is premature optimization. When a --symbol-ordering-file has absent entries or a --call-graph-ordering-file has incorrect entries, the terminal output can be quite slow. This change can make it much faster while also making things just better (one write syscall per line of diagnostic).

How much faster is the performance if the user disables the warning they have chosen not to fix? (If they're fixing it, the time taken to fix it will be a lot longer than the link-time penalty.)

https://reviews.llvm.org/D87121#2260177 I don't use it as a motivation for the patch. I created this patch because I think it is the right thing to do.

You can save 2 or 3 lines by not making this change, but to me the "mental complexity" is larger: a lock is needed to prevent interleaved output from multiple threads.

There's still locking though, so I'm not sure it's a huge difference in mental complexity - locking on the write, versus locking on a 10-line function that writes a few times?

I keep the lock as I don't want to take the risk in this change. The mental complexity is different.

Before: (a) the coder worries that raw_ostream::operator<< may access data that is concurrently being modified; (b) concurrent ErrorHandler::log/warn/error calls may interleave their output, so a lock is necessary.
Now: only (a) remains.

If lld::errs() can be made like std::osyncstream, the mental complexity of the lock can be removed here entirely. Without introducing the std::osyncstream concept, the new behavior still decreases mental complexity, because composing into local variables (as opposed to calling lld::errs() << multiple times) is clearly thread-safe.
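
For comparison, a hedged sketch of the C++20 std::osyncstream behavior referenced above (a standalone example; lld does not use this):

  #include <iostream>
  #include <syncstream>

  void report(int id) {
    std::osyncstream out(std::cerr); // inserts accumulate in an internal buffer
    out << "warning " << id << ": ...\n";
  } // the buffer is transferred to std::cerr atomically when out is destroyed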

jhenderson accepted this revision.Sep 10 2020, 12:09 AM

Looks good from my point of view, but best wait for the conversation with @dblaikie to be resolved.

This revision is now accepted and ready to land.Sep 10 2020, 12:09 AM
andrewng added inline comments.
lld/Common/ErrorHandler.cpp
192

IIRC, the update of sep needs to be within the same lock to avoid a race condition.

235

Same locking/race condition for sep as above.

MaskRay planned changes to this revision.Sep 10 2020, 9:42 AM
MaskRay added inline comments.
lld/Common/ErrorHandler.cpp
235

Yes. errorCount is also racy. This is a big problem. The lock may still be needed.

This revision is now accepted and ready to land.Sep 24 2020, 11:18 PM

Is performance important in this case? (admittedly, some folks might have builds full of warnings they aren't paying attention to & so they still care about performance then?) Generally the idea with clang/etc is that anything that produces user output (other than the successfully compiled/linked file) doesn't need to be fast (in compile/link-time terms) because it'll take /way/ longer for a user to read and think about the error message than the slight increase in execution time.

The nicer thing is that when running under strace/truss, there will be just one write for one diagnostic.

Is that true of many other tools? (I wouldn't've expected it'd be true of clang, for instance - or probably other LLVM utilities) - is it especially helpful?

The motivation was to reduce unneeded ::write calls.

Sounds like unnecessary/premature optimization, though?

I don't think it is premature optimization. When a --symbol-ordering-file has absent entries or a --call-graph-ordering-file has incorrect entries, the terminal output can be quite slow. This change can make it much faster while also making things just better (one write syscall per line of diagnostic).

How much faster is the performance if the user disables the warning they have chosen not to fix? (If they're fixing it, the time taken to fix it will be a lot longer than the link-time penalty.)

https://reviews.llvm.org/D87121#2260177 I don't use it as a motivation for the patch. I created this patch because I think it is the right thing to do.

Not sure I follow - the patch description seems to be quite focussed on the performance implications of this change - it sounds like, reading the patch description/commit message that that is the motivation.

Oh, perhaps I see another reason that might be less nuanced: This will probably break color printing on Windows when using the normal Windows color API - because it's a stateful API, not something that can be encoded in the stream: https://github.com/llvm/llvm-project/blob/d3f6972abb9c0ac06ddabf61697754c3c6f5d11b/llvm/lib/Support/raw_ostream.cpp#L513

Not sure I follow - the patch description seems to be quite focussed on the performance implications of this change - it sounds like, reading the patch description/commit message that that is the motivation.

Oh, perhaps I see another reason that might be less nuanced: This will probably break color printing on Windows when using the normal Windows color API - because it's a stateful API, not something that can be encoded in the stream: https://github.com/llvm/llvm-project/blob/d3f6972abb9c0ac06ddabf61697754c3c6f5d11b/llvm/lib/Support/raw_ostream.cpp#L513

I find another reason, probably still weak: I want to check a large binary with many out of range relocations. lld will emit a large number of warnings (if --noinhibit-exec). This patch can make it faster.

Not sure I follow - the patch description seems to be quite focussed on the performance implications of this change - it sounds like, reading the patch description/commit message that that is the motivation.

Oh, perhaps I see another reason that might be less nuanced: This will probably break color printing on Windows when using the normal Windows color API - because it's a stateful API, not something that can be encoded in the stream: https://github.com/llvm/llvm-project/blob/d3f6972abb9c0ac06ddabf61697754c3c6f5d11b/llvm/lib/Support/raw_ostream.cpp#L513

I find another reason, probably still weak: I want to check a large binary with many out of range relocations. lld will emit a large number of warnings (if --noinhibit-exec). This patch can make it faster.

Could the uninteresting warnings be disabled? If all the warnings are interesting, I'd guess a bunch of processing would need to be done to make them actionable - or a lot of human time to read through them all; either way, that time probably dwarfs the extra time lld takes to print those errors.

In any case - I think colour can't be portably buffered (well, in addition to the existing buffering in the ostream itself) so far as I can see in the implementation. Does that understanding seem right? Is there a way that could be addressed?

MaskRay added a comment.EditedOct 4 2020, 11:46 PM

Not sure I follow - the patch description seems to be quite focussed on the performance implications of this change - it sounds like, reading the patch description/commit message that that is the motivation.

Oh, perhaps I see another reason that might be less nuanced: This will probably break color printing on Windows when using the normal Windows color API - because it's a stateful API, not something that can be encoded in the stream: https://github.com/llvm/llvm-project/blob/d3f6972abb9c0ac06ddabf61697754c3c6f5d11b/llvm/lib/Support/raw_ostream.cpp#L513

I find another reason, probably still weak: I want to check a large binary with many out of range relocations. lld will emit a large number of warnings (if --noinhibit-exec). This patch can make it faster.

Could the uninteresting warnings be disabled? If all the warnings are interesting, I'd guess a bunch of processing would need to be done to make them actionable - or a lot of human time to read through them all; either way, that time probably dwarfs the extra time lld takes to print those errors.

No. They were errors but were downgraded to warnings. For the binaries I investigated, the rodata+text+data are significantly larger than 2GiB, so many out-of-range relocations are expected.

In any case - I think colour can't be portably buffered (well, in addition to the existing buffering in the ostream itself) so far as I can see in the implementation. Does that understanding seem right? Is there a way that could be addressed?

The code makes messages line-buffered instead of fully buffered. (Not to say that buffered ANSI escape sequences work as well.) Currently the only downside of this patch is the introduction of reportDiagnostic (added complexity). There are a number of small advantages on the upside, though.

Not sure I follow - the patch description seems to be quite focussed on the performance implications of this change - it sounds like, reading the patch description/commit message that that is the motivation.

Oh, perhaps I see another reason that might be less nuanced: This will probably break color printing on Windows when using the normal Windows color API - because it's a stateful API, not something that can be encoded in the stream: https://github.com/llvm/llvm-project/blob/d3f6972abb9c0ac06ddabf61697754c3c6f5d11b/llvm/lib/Support/raw_ostream.cpp#L513

I find another reason, probably still weak: I want to check a large binary with many out of range relocations. lld will emit a large number of warnings (if --noinhibit-exec). This patch can make it faster.

Could the uninteresting warnings be disabled? If all the warnings are interesting, I'd guess a bunch of processing would need to be done to make them actionable - or a lot of human time to read through them all; either way, that time probably dwarfs the extra time lld takes to print those errors.

No. They were errors but were downgraded to warnings. For the binaries I investigated, the rodata+text+data are significantly larger than 2GiB, so many out-of-range relocations are expected.

Clang has a concept of warnings that are default errors - so they can be downgraded to warnings again, and disabled individually. Perhaps that could be used here.

In any case - I think colour can't be portably buffered (well, in addition to the existing buffering in the ostream itself) so far as I can see in the implementation. Does that understanding seem right? Is there a way that could be addressed?

The code makes messages line-buffered instead of fully buffered. (Not to say that buffered ANSI escape sequences work as well.) Currently the only downside of this patch is the introduction of reportDiagnostic (added complexity). There are a number of small advantages on the upside, though.

Hmm, I think we're still miscommunicating here. Not all platforms use ANSI escape sequences - Windows uses APIs to change colour on the terminal, so there's no way to buffer coloured output there. Well, it seems like LLVM has an option to enable ANSI escape codes on Windows - so perhaps on some terminals that's used? But generally it's done via API on the file descriptor (see: https://github.com/llvm/llvm-project/blob/6e25586990b93e2c9eaaa4f473b6720ccd646c46/llvm/lib/Support/Windows/Process.inc#L369 and related use ( https://github.com/llvm/llvm-project/blob/6e25586990b93e2c9eaaa4f473b6720ccd646c46/llvm/lib/Support/raw_ostream.cpp#L513 )). So my understanding is that portable color rendering cannot be buffered as you're doing here.

MaskRay added a subscriber: aganea.Oct 5 2020, 2:03 PM

Not sure I follow - the patch description seems to be quite focussed on the performance implications of this change - it sounds like, reading the patch description/commit message that that is the motivation.

Oh, perhaps I see another reason that might be less nuanced: This will probably break color printing on Windows when using the normal Windows color API - because it's a stateful API, not something that can be encoded in the stream: https://github.com/llvm/llvm-project/blob/d3f6972abb9c0ac06ddabf61697754c3c6f5d11b/llvm/lib/Support/raw_ostream.cpp#L513

I find another reason, probably still weak: I want to check a large binary with many out of range relocations. lld will emit a large number of warnings (if --noinhibit-exec). This patch can make it faster.

Could the uninteresting warnings be disabled? If all the warnings are interesting, I'd guess a bunch of processing would need to be done to make them actionable - or a lot of human time to read through them all; either way, that time probably dwarfs the extra time lld takes to print those errors.

No. They were errors but were downgraded to warnings. For the binaries I investigated, the rodata+text+data are significantly larger than 2GiB, so many out-of-range relocations are expected.

Clang has a concept of warnings that are default errors - so they can be downgraded to warnings again, and disabled individually. Perhaps that could be used here.

Then we will need to communicate the -Wno- invention to binutils.

Such an opt-in option is still a bit inconvenient. For the inappropriate usage of D87121 and the out-of-range relocation warnings, if we do this patch, the performance is good enough that I don't need to care about using an option. With the opt-in option, I'll need to add it to the link.

An alternative is to introduce a warning limit, but that will add new code, too (and it needs to be justified as well).

In any case - I think colour can't be portably buffered (well, in addition to the existing buffering in the ostream itself) so far as I can see in the implementation. Does that understanding seem right? Is there a way that could be addressed?

The code makes messages line-buffered instead of fully buffered. (Not to say that buffered ANSI escape sequences work as well.) Currently the only downside of this patch is the introduction of reportDiagnostic (added complexity). There are a number of small advantages on the upside, though.

Hmm, I think we're still miscommunicating here. Not all platforms use ANSI escape sequences - Windows uses APIs to change colour on the terminal, so there's no way to buffer coloured output there. Well, it seems like LLVM has an option to enable ANSI escape codes on Windows - so perhaps on some terminals that's used? But generally it's done via API on the file descriptor (see: https://github.com/llvm/llvm-project/blob/6e25586990b93e2c9eaaa4f473b6720ccd646c46/llvm/lib/Support/Windows/Process.inc#L369 and related use ( https://github.com/llvm/llvm-project/blob/6e25586990b93e2c9eaaa4f473b6720ccd646c46/llvm/lib/Support/raw_ostream.cpp#L513 )). So my understanding is that portable color rendering cannot be buffered as you're doing here.

This is indeed a Windows problem. It is similar to https://reviews.llvm.org/D88348#2299650. Maybe @aganea has some thoughts on whether buffering can be done.

Not sure I follow - the patch description seems to be quite focussed on the performance implications of this change - it sounds like, reading the patch description/commit message that that is the motivation.

Oh, perhaps I see another reason that might be less nuanced: This will probably break color printing on Windows when using the normal Windows color API - because it's a stateful API, not something that can be encoded in the stream: https://github.com/llvm/llvm-project/blob/d3f6972abb9c0ac06ddabf61697754c3c6f5d11b/llvm/lib/Support/raw_ostream.cpp#L513

I find another reason, probably still weak: I want to check a large binary with many out of range relocations. lld will emit a large number of warnings (if --noinhibit-exec). This patch can make it faster.

Could the uninteresting warnings be disabled? If all the warnings are interesting, I'd guess a bunch of processing would need to be done to make them actionable - or a lot of human time to read through them all; either way, that time probably dwarfs the extra time lld takes to print those errors.

No. They were errors but were downgraded to warnings. For the binaries I investigated, the rodata+text+data are significantly larger than 2GiB, so many out-of-range relocations are expected.

Clang has a concept of warnings that are default errors - so they can be downgraded to warnings again, and disabled individually. Perhaps that could be used here.

Then we will need to communicate the -Wno- invention to binutils.

If there are warnings people want to produce a lot of and ignore - it seems useful/good to have a way to disable those warnings.

Such an opt-in option is still a bit inconvenient. For the inappropriate usage of D87121 and the out-of-range relocation warnings, if we do this patch, the performance is good enough that I don't need to care about using an option. With the opt-in option, I'll need to add it to the link.

Part of the goal would be that it would make it more usable - by not producing a bunch of spurious warnings you need to filter out/ignore/etc?

An alternative is to introduce a warning limit, but that will add new code, too (and it needs to be justified as well).

In any case - I think colour can't be portably buffered (well, in addition to the existing buffering in the ostream itself) so far as I can see in the implementation. Does that understanding seem right? Is there a way that could be addressed?

The code makes messages line-buffered instead of fully buffered. (Not to say that buffered ANSI escape sequences work as well.) Currently the only downside of this patch is the introduction of reportDiagnostic (added complexity). There are a number of small advantages on the upside, though.

Hmm, I think we're still miscommunicating here. Not all platforms use ANSI escape sequences - Windows uses APIs to change colour on the terminal, so there's no way to buffer coloured output there. Well, it seems like LLVM has an option to enable ANSI escape codes on Windows - so perhaps on some terminals that's used? But generally it's done via API on the file descriptor (see: https://github.com/llvm/llvm-project/blob/6e25586990b93e2c9eaaa4f473b6720ccd646c46/llvm/lib/Support/Windows/Process.inc#L369 and related use ( https://github.com/llvm/llvm-project/blob/6e25586990b93e2c9eaaa4f473b6720ccd646c46/llvm/lib/Support/raw_ostream.cpp#L513 )). So my understanding is that portable color rendering cannot be buffered as you're doing here.

This is indeed a Windows problem. It is similar to https://reviews.llvm.org/D88348#2299650. Maybe @aganea has some thoughts on whether buffering can be done.

(side note: raw_ostream already has buffering support - any idea why its performance is still significantly different than the extra layer of buffering on top of it? Might be a matter of enabling buffering in lld::errs() (lld::errs().SetBufferSize(N))? That'd still work with colored output without any troubles (it'd be flushed as needed on platforms where flushing is needed). I still think if there are use cases where the performance of errors matters to users - they probably should be addressed in other ways, though, such as being able to disable warnings the user isn't interested in)
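
A sketch of the alternative suggested here, using raw_ostream's generic buffering API (the buffer size and flush point are illustrative assumptions):

  lld::errs().SetBufferSize(64 * 1024); // buffer writes in the stream itself
  // ... emit diagnostics as usual ...
  lld::errs().flush();                  // flush explicitly where ordering matters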

MaskRay added a comment.EditedOct 5 2020, 2:27 PM

(side note: raw_ostream already has buffering support - any idea why its performance is still significantly different than the extra layer of buffering on top of it? Might be a matter of enabling buffering in lld::errs() (lld::errs().SetBufferSize(N))? That'd still work with colored output without any troubles (it'd be flushed as needed on platforms where flushing is needed). I still think if there are use cases where the performance of errors matters to users - they probably should be addressed in other ways, though, such as being able to disable warnings the user isn't interested in)

(reply to side note: llvm::errs() << "..." does not use buffering. Every print is a write syscall on POSIX systems. The idea of the patch is to greatly reduce the number of write syscalls. Enabling buffering for llvm::errs() may have the side effect that an error can be reported far away from its call site. For certain warnings/errors (e.g. after all input has been read) this probably does not matter that much.)

raw_ostream does not have a line buffering mode.

(side note: raw_ostream already has buffering support - any idea why its performance is still significantly different than the extra layer of buffering on top of it? Might be a matter of enabling buffering in lld::errs() (lld::errs().SetBufferSize(N))? That'd still work with colored output without any troubles (it'd be flushed as needed on platforms where flushing is needed). I still think if there are use cases where the performance of errors matters to users - they probably should be addressed in other ways, though, such as being able to disable warnings the user isn't interested in)

(reply to side note: llvm::errs() << "..." does not use buffering. Every print is a write syscall on POSIX systems. The idea of the patch is to greatly reduce the number of write syscalls. Enabling buffering for llvm::errs() may have the side effect that an error can be reported far away from its call site. For certain warnings/errors (e.g. after all input has been read) this probably does not matter that much.)

raw_ostream does not have a line buffering mode.

Not line buffering, but it does have support for general purpose buffering - if there are particular places where the output must be generated (though I can't think of any in a linker or compiler - they don't accept interactive input, so they don't need to make sure some output is rendered to the user before waiting to read input, for instance) then it could/should be explicitly flushed at those locations.

MaskRay added a comment.EditedSep 9 2021, 9:14 AM

I'll push this, since it makes --noinhibit-exec --no-fatal-warnings much faster. I noticed another internal user complaining about a link timeout due to spending too much time dumping the messages.

While we could consider --warning-limit, it is useful to speed up printing regardless.

If llvm/Support/raw_ostream.h wants to improve in the future, we can revisit how lld uses it.

This revision was landed with ongoing or failed builds.Sep 9 2021, 9:27 AM
This revision was automatically updated to reflect the committed changes.

I still don't think this patch is appropriate, in part due to the colour diagnostic issue - coloured output can't, in general, be buffered (due to Windows having a stateful/API based colour changing, rather than ANSI based colour changing - at least in some cases)

If people are getting pages and pages of errors they're ignoring/want to perform quickly - that seems like something that deserves a different solution: turning off the warnings they are ignoring.

I still don't think this patch is appropriate, in part due to the colour diagnostic issue - coloured output can't, in general, be buffered (due to Windows having a stateful/API based colour changing, rather than ANSI based colour changing - at least in some cases)

sys::Process::UseANSIEscapeCodes(true); enables ANSI escape codes for Windows.

If people are getting pages and pages of errors they're ignoring/want to perform quickly - that seems like something that deserves a different solution: turning off the warnings they are ignoring.

This is separate: whether we need an option like --warn-limit=N.
Separate options turning off warnings seem less useful to me. There are some warnings which don't deserve an option (which would make things more complicated). Off-by-default options require users to rerun the link to turn the warning off.

This patch addresses another problem: I have seen interleaved diagnostics (something like msg_from_process_0 msg_from_process_1 msg_from_process_1 on one line) from either Clang or ld.lld running in multiple processes.
This patch will avoid such issues.

I still don't think this patch is appropriate, in part due to the colour diagnostic issue - coloured output can't, in general, be buffered (due to Windows having a stateful/API based colour changing, rather than ANSI based colour changing - at least in some cases)

sys::Process::UseANSIEscapeCodes(true); enables ANSI escape codes for Windows.

Presumably there's a reason that's not the only option/hardcoded, though?

If people are getting pages and pages of errors they're ignoring/want to perform quickly - that seems like something that deserves a different solution: turning off the warnings they are ignoring.

This is separate: whether we need an option like --warn-limit=N.

I don't think it is separate, though - this change seems to add extra complications to the code to improve performance for a case that doesn't seem like it should matter (generally I don't think error code paths are optimized like this/planned for performance of large numbers of errors being produced, the general design guidance is that once you have to show a message a human has to read and think about, the actual runtime it takes to produce that message is always going to be smaller than the human time required to understand and act on it)

Separate options turning off warnings seem less useful to me. There are some warnings which don't deserve an option (which would make things more complicated). Off-by-default options require users to rerun the link to turn the warning off.

But you've mentioned multiple cases (both in the patch description ("For an invocation of ld.lld with 62000+ lines of..."), and in the renewed motivation to commit this patch (" I noticed another internal user complaining about link timeout due to spending too much time dumping the messages.")).

This patch addresses another problem: I have seen interleaved diagnostics (something like msg_from_process_0 msg_from_process_1 msg_from_process_1 on one line) from either Clang or ld.lld running in multiple processes.
This patch will avoid such issues.

That motivation isn't mentioned in the commit message - the scalability of 10s of thousands of diagnostics is mentioned (& the convenience of strace - also doesn't seem like a huge motivation to me).

(I'm not sure hardcoding the ANSI escape code option to on in rGbcc34ab6c8ab7bdff6c520c824d6d71ddac7a896 is right either - clang doesn't do that for instance, FileCheck probably does because it doesn't have to be as portable and mostly its output goes through lit anyway, so it's the only way it can produce colour through the extra layer of buffering being done by lit anyway)

MaskRay added a comment.EditedSep 9 2021, 5:57 PM

I still don't think this patch is appropriate, in part due to the colour diagnostic issue - coloured output can't, in general, be buffered (due to Windows having a stateful/API based colour changing, rather than ANSI based colour changing - at least in some cases)

sys::Process::UseANSIEscapeCodes(true); enables ANSI escape codes for Windows.

Presumably there's a reason that's not the only option/hardcoded, though?

Unavailable for Windows<10?

If people are getting pages and pages of errors they're ignoring/want to perform quickly - that seems like something that deserves a different solution: turning off the warnings they are ignoring.

This is separate: whether we need an option like --warn-limit=N.

I don't think it is separate, though - this change seems to add extra complications to the code to improve performance for a case that doesn't seem like it should matter (generally I don't think error code paths are optimized like this/planned for performance of large numbers of errors being produced, the general design guidance is that once you have to show a message a human has to read and think about, the actual runtime it takes to produce that message is always going to be smaller than the human time required to understand and act on it)

The link can run in a distributed build farm. Making diagnostics 3x/4x faster can be sufficient to allow a build not to time out where it previously would.

Separate options turning off warnings seem less useful to me. There are some warnings which don't deserve an option (which would make things more complicated). Off-by-default options require users to rerun the link to turn the warning off.

But you've mentioned multiple cases (both in the patch description ("For an invocation of ld.lld with 62000+ lines of..."), and in the renewed motivation to commit this patch (" I noticed another internal user complaining about link timeout due to spending too much time dumping the messages.")).

The "symbol ordering file" warning motivation was somewhat weak. It was due to incorrect usage. I don't think it needs dedicated warning option.

The new motivation is relocation XXX out of range with --noinhibit-exec.
This is stronger. Faster output can be useful when analyzing large output.
I don't think it needs a dedicated warning option, though.
The slightly generic --warn-limit=N may have some marginal use case but I haven't made up my mind to add it.

This patch addresses another problem: I have seen interleaved diagnostics (something like msg_from_process_0 msg_from_process_1 msg_from_process_1 on one line) from either Clang or ld.lld running in multiple processes.
This patch will avoid such issues.

That motivation isn't mentioned in the commit message - the scalability of 10s of thousands of diagnostics is mentioned (& the convenience of strace - also doesn't seem like a huge motivation to me).

Sure, it is subjective sometimes. To me reportDiagnostic is better because with the previous form it was difficult to ensure the different kinds of diagnostics have the same style.
All the motivations add up, and the overall value is larger than the slight increase in the number of lines.
We even have the potential to remove the lock.

(I'm not sure hardcoding the ANSI escape code option to on in rGbcc34ab6c8ab7bdff6c520c824d6d71ddac7a896 is right either - clang doesn't do that for instance, FileCheck probably does because it doesn't have to be as portable and mostly its output goes through lit anyway, so it's the only way it can produce colour through the extra layer of buffering being done by lit anyway)

Presumably there's a reason that's not the only option/hardcoded, though?

Unavailable for Windows<10?

Perhaps that's it, I'm not sure.

If people are getting pages and pages of errors they're ignoring/want to perform quickly - that seems like something that deserves a different solution: turning off the warnings they are ignoring.

This is separate: whether we need an option like --warn-limit=N.

I don't think it is separate, though - this change seems to add extra complications to the code to improve performance for a case that doesn't seem like it should matter (generally I don't think error code paths are optimized like this/planned for performance of large numbers of errors being produced, the general design guidance is that once you have to show a message a human has to read and think about, the actual runtime it takes to produce that message is always going to be smaller than the human time required to understand and act on it)

The link can run in a distributed build farm. Making diagnostics 3x/4x faster can be sufficient to allow a build not to time out where it previously would.

It still doesn't seem like the right solution to me - we shouldn't be generating reams of output that the user won't use. And it sounds like that's the case here. Scrolling through pages of output doesn't seem like a beneficial user experience to me, so far as I can tell?

Separate options turning off warnings seem less useful to me. There are some warnings which don't deserve an option (which would make things more complicated). Off-by-default options require users to rerun the link to turn the warning off.

But you've mentioned multiple cases (both in the patch description ("For an invocation of ld.lld with 62000+ lines of..."), and in the renewed motivation to commit this patch (" I noticed another internal user complaining about link timeout due to spending too much time dumping the messages.")).

The "symbol ordering file" warning motivation was somewhat weak. It was due to incorrect usage. I don't think it needs dedicated warning option.

I'm not suggesting that lld must have a flag - just that if this issue is going to be addressed, I think that direction is more suitable than the one taken here. Doing neither is OK to me. But doing this rather than fixing the diagnostic experience in a way that's helpful to the user is something I don't think is the right direction.

The new motivation is relocation XXX out of range with --noinhibit-exec.
This is stronger. Faster output can be useful when analyzing large output.

What sort of analysis do you have in mind? It certainly doesn't seem like this would be useful to a normal user running their linker. If they have to write scripts to try to deduplicate, find the largest value, or do some other analysis over the output - I don't think the performance of lld's diagnostic printing is going to be the expensive part of that process.

I don't think it needs a dedicated warning option, though.
The slightly generic --warn-limit=N may have some marginal use case but I haven't made up my mind to add it.

This patch addresses another problem: I have seen interleaved diagnostics (something like msg_from_process_0 msg_from_process_1 msg_from_process_1 on one line) from either Clang or ld.lld running in multiple processes.
This patch will avoid such issues.

That motivation isn't mentioned in the commit message - the scalability of 10s of thousands of diagnostics is mentioned (& the convenience of strace - also doesn't seem like a huge motivation to me).

Sure, it is subjective sometimes. To me reportDiagnostic is better because with the previous form it was difficult to ensure the different kinds of diagnostics have the same style.

Having consistent error output is a good thing, yes - I don't object to having a common function - though I'd prefer it either be (or wrap/use) https://llvm.org/doxygen/classllvm_1_1WithColor.html#a59d59f7f8aa89b08f44ad6a87e8ebb1a for instance, which handles the colour changing and formatting in a common way across the whole LLVM project, for instance.
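
For reference, a minimal sketch of the WithColor helpers being suggested (the tool-name prefix and messages are placeholders):

  #include "llvm/Support/WithColor.h"

  llvm::WithColor::error(llvm::errs(), "ld.lld") << "undefined symbol: foo\n";
  llvm::WithColor::warning(llvm::errs(), "ld.lld") << "symbol ordering file: ...\n";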

MaskRay added a comment.EditedSep 9 2021, 6:36 PM

Presumably there's a reason that's not the only option/hardcoded, though?

Unavailable for Windows<10?

Perhaps that's it, I'm not sure.

If people are getting pages and pages of errors they're ignoring/want to perform quickly - that seems like something that deserves a different solution: turning off the warnings they are ignoring.

This is separate: whether we need an option like --warn-limit=N.

I don't think it is separate, though - this change seems to add extra complications to the code to improve performance for a case that doesn't seem like it should matter (generally I don't think error code paths are optimized like this/planned for performance of large numbers of errors being produced, the general design guidance is that once you have to show a message a human has to read and think about, the actual runtime it takes to produce that message is always going to be smaller than the human time required to understand and act on it)

The link can run in a distributed build farm. Making diagnostics 3x/4x faster can be sufficient to allow a build not to time out where it previously would.

It still doesn't seem like the right solution to me - we shouldn't be generating reams of output that the user won't use. And it sounds like that's the case here. Scrolling through pages of output doesn't seem like a beneficial user experience to me, so far as I can tell?

Users who analyze output with out-of-range relocations may need every relocation.
I am unsure whether the number of warnings needs a limit (like --error-limit).

Say, with a link of 400s, the build just timed out (due to the large amount of output) in a build farm.
At 100s, it may succeed with the same large amount of output.
The 3x/4x output speedup makes a large difference.

Separate options turning off warnings seem less useful to me. There are some warnings which don't deserve an option (which would make things more complicate). Off-by-default options need the users to rerun the link to turn off it.

But you've mentioned multiple cases (both in the patch description ("For an invocation of ld.lld with 62000+ lines of..."), and in the renewed motivation to commit this patch (" I noticed another internal user complaining about link timeout due to spending too much time dumping the messages.")).

The "symbol ordering file" warning motivation was somewhat weak. It was due to incorrect usage. I don't think it needs dedicated warning option.

I'm not suggesting that lld must have a flag - just that if this issue is going to be addressed, I think that direction is more suitable than the one taken here. Doing neither is OK to me. But doing this rather than fixing the diagnostic experience in a way that's helpful to the user is something I don't think is the right direction.

I have some idea about the frequency of various error/warning diagnostics.
I think it is overkill to introduce separate -W and -Wno- style options now, besides the features which already have a disabling mechanism.

The new motivation is relocation XXX out of range with --noinhibit-exec.
This is stronger. Faster output can be useful when analyzing large output.

What sort of analysis do you have in mind? It certainly doesn't seem like this would be useful to a normal user running their linker. If they have to write scripts to try to deduplicate, find the largest value, or do some other analysis over the output - I don't think the performance of lld's diagnostic printing is going to be the expensive part of that process.

See a paragraph above.
I am thinking of some analysis which can figure out output section ordering.
A 3x/4x speedup is quite significant for the user experience.

I don't think it needs a dedicated warning option, though.
The slightly generic --warn-limit=N may have some marginal use case but I haven't made up my mind to add it.

This patch addresses another problem: I have seen interleaved diagnostic (something like msg_from_process_0 msg_from_process_1 msg_from_process_1 on one line) from either Clang or ld.lld from multiple processes.
This patch will avoid such issue.

That motivation isn't mentioned in the commit message - the scalability of 10s of thousands of diagnostics is mentioned (& the convenience of strace - also doesn't seem like a huge motivation to me).

Sure, it is subjective sometimes. To me reportDiagnostic is better because with the previous form it was difficult to ensure the different kinds of diagnostics have the same style.

Having consistent error output is a good thing, yes - I don't object to having a common function - though I'd prefer it either be (or wrap/use) https://llvm.org/doxygen/classllvm_1_1WithColor.html#a59d59f7f8aa89b08f44ad6a87e8ebb1a for instance, which handles the colour changing and formatting in a common way across the whole LLVM project, for instance.

lld needs a more full-blown diagnostic builder, like clang::DiagnosticBuilder.
Currently WithColor lacks quite a few things. Perhaps making it support line-buffered diagnostics would allow lld to migrate to its interface.
There are other lld-specific needs, like lld::errs().
Overall I am not sure that sharing code would be the best solution across the whole LLVM project.

That said, making WithColor::error(errs(), ToolName) << ... << ... << ... line-buffered can help some utilities (llvm-objdump/llvm-readelf/etc) avoid interleaved output lines.

I think we're continuing to talk past one another. I'll leave this alone.