Page MenuHomePhabricator

[lld] Buffer writes when composing a single diagnostic
Changes PlannedPublic

Authored by MaskRay on Mon, Sep 7, 11:30 PM.

Details

Summary

llvm::errs() is unbuffered. On a POSIX platform, composing a diagnostic
string may invoke the ::write syscall multiple times, which can be slow.
Buffer writes to a temporary SmallString when composing a single diagnostic to
reduce the number of ::write syscalls to one (also easier to read under
strace/truss).

For an invocation of ld.lld with 62000+ lines of
ld.lld: warning: symbol ordering file: no such symbol: warnings (D87121),
the buffering decreases the write time from 1s to 0.4s (for /dev/tty) and
from 0.4s to 0.1s (for a tmpfs file).

Diff Detail

Event Timeline

MaskRay created this revision.Mon, Sep 7, 11:30 PM
Herald added a project: Restricted Project. · View Herald TranscriptMon, Sep 7, 11:30 PM
MaskRay requested review of this revision.Mon, Sep 7, 11:30 PM

Buffer consecutive writes of one diagnostic

I found this comment confusing, especially given the example, as I thought initially you were buffering instances of the same diagnostic, rather than the writes for a single diagnostic report. Could you change this to "Buffer writes when composing a single diagnostic"? I think that better expresses the situation.

lld/Common/ErrorHandler.cpp
213

This code is very similar to the warning code. Can you de-duplicate it with a helper function?

grimar added inline comments.Tue, Sep 8, 1:19 AM
llvm/include/llvm/Support/raw_ostream.h
304

Add const?

MaskRay retitled this revision from [lld] Buffer lld::errs() writes to [lld] Buffer writes when composing a single diagnostic.Tue, Sep 8, 8:53 AM
MaskRay edited the summary of this revision. (Show Details)
MaskRay updated this revision to Diff 290499.Tue, Sep 8, 9:00 AM

Address comments

lld/Common/ErrorHandler.cpp
213

os is used after the common part and requires a SmallString, so it is a bit difficult to share code.

Is performance important in this case? (admittedly, some folks might have builds full of warnings they aren't paying attention to & so they still care about performance then?) Generally the idea with clang/etc is that anything that produces user output (other than the successfully compiled/linked file) doesn't need to be fast (in compile/link-time terms) because it'll take /way/ longer for a user to read and think about the error message than the slight increase in execution time.

Is performance important in this case? (admittedly, some folks might have builds full of warnings they aren't paying attention to & so they still care about performance then?) Generally the idea with clang/etc is that anything that produces user output (other than the successfully compiled/linked file) doesn't need to be fast (in compile/link-time terms) because it'll take /way/ longer for a user to read and think about the error message than the slight increase in execution time.

The nicer thing is that when running under strace/truss, there will be just one write for one diagnostic. And one can worry less about interleaved output from multiple threads. (I suspect I can remove std::lock_guard<std::mutex> lock(mu); but I don't want to take the risk in this patch)

Is performance important in this case? (admittedly, some folks might have builds full of warnings they aren't paying attention to & so they still care about performance then?) Generally the idea with clang/etc is that anything that produces user output (other than the successfully compiled/linked file) doesn't need to be fast (in compile/link-time terms) because it'll take /way/ longer for a user to read and think about the error message than the slight increase in execution time.

The nicer thing is that when running under strace/truss, there will be just one write for one diagnostic.

Is that true of many other tools? (I wouldn't've expected it'd be true of clang, for instance - or probably other LLVM utilities) - is it especially helpful?

And one can worry less about interleaved output from multiple threads. (I suspect I can remove std::lock_guard<std::mutex> lock(mu); but I don't want to take the risk in this patch)

That's what I figured this might be originally about - but if that's already working, I'm not sure there's a need to change? But I don't feel super strongly if the alternative being proposed is especially tidier.

jhenderson added inline comments.Wed, Sep 9, 1:44 AM
lld/Common/ErrorHandler.cpp
213

How about something like this:

void ErrorHandler::reportDiagnostic(Colors c, StringRef diagKind, StringRef location, const Twine &msg) {
  SmallString<256> buf;
  raw_svector_ostream os(buf);
  os << sep << location << ": ";
  if (lld::errs().colors_enabled()) {
    os.enable_colors(true);
    os << c << diagKind << ": " << Colors::
  } else {
    os << diagKind << ": ";
  }
  os << msg << '\n';
  std::lock_guard<std::mutex> lock(mu);
  lld::errs() << buf;
}

void ErrorHandler::warn(const Twine &msg) {
  ...
  reportDiagnostic(Colors::MAGENTA, "warning", getLocation(msg), msg);
  sep = getSeparator(msg);
}

void ErrorHandler::error(const Twine &msg) {
  ...
  if (errorLimit == 0 || errorCount <= errorLimit) {
    if (errorLimit != 0 && errorCount == errorLimit) {
      reportDiagnostic(Colors::RED, "error", getLocation(msg), errorLimitExceededMsg);
      exit = exitEarly;
    } else {
      reportDiagnostic(Colors::RED, "error", getLocation(msg), msg);
    }
    sep = getSeparator(msg);
    ++errorCount;
  }
MaskRay updated this revision to Diff 290759.Wed, Sep 9, 9:41 AM

Adopt jhenderson's suggestion

MaskRay marked 3 inline comments as done.Wed, Sep 9, 9:42 AM

Is performance important in this case? (admittedly, some folks might have builds full of warnings they aren't paying attention to & so they still care about performance then?) Generally the idea with clang/etc is that anything that produces user output (other than the successfully compiled/linked file) doesn't need to be fast (in compile/link-time terms) because it'll take /way/ longer for a user to read and think about the error message than the slight increase in execution time.

The nicer thing is that when running under strace/truss, there will be just one write for one diagnostic.

Is that true of many other tools? (I wouldn't've expected it'd be true of clang, for instance - or probably other LLVM utilities) - is it especially helpful?

The motivation was to reduce unneeded ::write calls. Now with jhenderson's suggestion (thanks), this turned to a refactoring. The total number of lines has increased but it looks clearer (the log/warn/error functions don't have to deal with lock_guard, sep, and the order), so I think the end result is not bad.

(The more focused syscall log lines under strace/truss is a very minor benefit)

And one can worry less about interleaved output from multiple threads. (I suspect I can remove std::lock_guard<std::mutex> lock(mu); but I don't want to take the risk in this patch)

That's what I figured this might be originally about - but if that's already working, I'm not sure there's a need to change? But I don't feel super strongly if the alternative being proposed is especially tidier.

Is performance important in this case? (admittedly, some folks might have builds full of warnings they aren't paying attention to & so they still care about performance then?) Generally the idea with clang/etc is that anything that produces user output (other than the successfully compiled/linked file) doesn't need to be fast (in compile/link-time terms) because it'll take /way/ longer for a user to read and think about the error message than the slight increase in execution time.

The nicer thing is that when running under strace/truss, there will be just one write for one diagnostic.

Is that true of many other tools? (I wouldn't've expected it'd be true of clang, for instance - or probably other LLVM utilities) - is it especially helpful?

The motivation was to reduce unneeded ::write calls.

Sounds like unnecessary/premature optimization, though?

Now with jhenderson's suggestion (thanks), this turned to a refactoring. The total number of lines has increased but it looks clearer (the log/warn/error functions don't have to deal with lock_guard, sep, and the order), so I think the end result is not bad.

Seems like the code could be simpler still by doing that refactoring without adding the buffering?

Is performance important in this case? (admittedly, some folks might have builds full of warnings they aren't paying attention to & so they still care about performance then?) Generally the idea with clang/etc is that anything that produces user output (other than the successfully compiled/linked file) doesn't need to be fast (in compile/link-time terms) because it'll take /way/ longer for a user to read and think about the error message than the slight increase in execution time.

The nicer thing is that when running under strace/truss, there will be just one write for one diagnostic.

Is that true of many other tools? (I wouldn't've expected it'd be true of clang, for instance - or probably other LLVM utilities) - is it especially helpful?

The motivation was to reduce unneeded ::write calls.

Sounds like unnecessary/premature optimization, though?

I don't think it is premature optimization. When a --symbol-ordering-file has absent entries or when --call-graph-ordering-file has incorrect entries, the terminal output can be quite slow. This change can make it much faster by also making things just better (one write syscall for one line of diagnostic).

You can save 2 or 3 lines by not making this change, but to me the "mental complexity" is larger: a lock is needed to prevent interleaving output from multiple threads.

Now with jhenderson's suggestion (thanks), this turned to a refactoring. The total number of lines has increased but it looks clearer (the log/warn/error functions don't have to deal with lock_guard, sep, and the order), so I think the end result is not bad.

Seems like the code could be simpler still by doing that refactoring without adding the buffering?

Is performance important in this case? (admittedly, some folks might have builds full of warnings they aren't paying attention to & so they still care about performance then?) Generally the idea with clang/etc is that anything that produces user output (other than the successfully compiled/linked file) doesn't need to be fast (in compile/link-time terms) because it'll take /way/ longer for a user to read and think about the error message than the slight increase in execution time.

The nicer thing is that when running under strace/truss, there will be just one write for one diagnostic.

Is that true of many other tools? (I wouldn't've expected it'd be true of clang, for instance - or probably other LLVM utilities) - is it especially helpful?

The motivation was to reduce unneeded ::write calls.

Sounds like unnecessary/premature optimization, though?

I don't think it is premature optimization. When a --symbol-ordering-file has absent entries or when --call-graph-ordering-file has incorrect entries, the terminal output can be quite slow. This change can make it much faster by also making things just better (one write syscall for one line of diagnostic).

How much faster is the performance if you user disables the warning they have chosen not to fix? (if they're fixing it, the time taken to fix it will be a lot longer than the link time penalty)

You can save 2 or 3 lines by not making this change, but to me the "mental complexity" is larger: a lock is needed to prevent interleaving output from multiple threads.

There's still locking though, so I'm not sure it's a huge difference in mental complexity - locking on the write, to locking on a 10 line function that writes a few times?

MaskRay added a comment.EditedWed, Sep 9, 3:06 PM

Is performance important in this case? (admittedly, some folks might have builds full of warnings they aren't paying attention to & so they still care about performance then?) Generally the idea with clang/etc is that anything that produces user output (other than the successfully compiled/linked file) doesn't need to be fast (in compile/link-time terms) because it'll take /way/ longer for a user to read and think about the error message than the slight increase in execution time.

The nicer thing is that when running under strace/truss, there will be just one write for one diagnostic.

Is that true of many other tools? (I wouldn't've expected it'd be true of clang, for instance - or probably other LLVM utilities) - is it especially helpful?

The motivation was to reduce unneeded ::write calls.

Sounds like unnecessary/premature optimization, though?

I don't think it is premature optimization. When a --symbol-ordering-file has absent entries or when --call-graph-ordering-file has incorrect entries, the terminal output can be quite slow. This change can make it much faster by also making things just better (one write syscall for one line of diagnostic).

How much faster is the performance if you user disables the warning they have chosen not to fix? (if they're fixing it, the time taken to fix it will be a lot longer than the link time penalty)

https://reviews.llvm.org/D87121#2260177 I don't use it as a motivation for the patch. I created this patch because I think it is the right thing to do.

You can save 2 or 3 lines by not making this change, but to me the "mental complexity" is larger: a lock is needed to prevent interleaving output from multiple threads.

There's still locking though, so I'm not sure it's a huge difference in mental complexity - locking on the write, to locking on a 10 line function that writes a few times?

I keep the lock as I don't want to take the risk in this change. The mental complexity is different.

Before: (a) the coder worries that the raw_ostream::operator<< may access data potentially being concurrently modified; (b): concurrent ErrorHandler::log/warn/error may interleave their output so a lock is necessary
Now: only (a) exists

If lld::errs() can be made like std::osyncstream, the lock mental complexity can be entirely removed here. Without introducing the`std::osyncstream` concept, the new behavior can decrease mental complexity because composing local variables (as opposed to calling lld::errs() << multiple times) is apparently thread safe.

jhenderson accepted this revision.Thu, Sep 10, 12:09 AM

Looks good from my point of view, but best wait for the conversation with @dblaikie to be resolved.

This revision is now accepted and ready to land.Thu, Sep 10, 12:09 AM
andrewng added inline comments.
lld/Common/ErrorHandler.cpp
191

IIRC, the update of sep needs to be within the same lock to avoid a race condition.

220

Same locking/race condition for sep as above.

MaskRay planned changes to this revision.Thu, Sep 10, 9:42 AM
MaskRay added inline comments.
lld/Common/ErrorHandler.cpp
220

Yes. errorCount is also racy. This is a big problem. The lock may still be needed.