This is an archive of the discontinued LLVM Phabricator instance.

[NFC][clang] Return std::strings built from raw_string_ostreams more efficiently
ClosedPublic

Authored by logan-5 on Dec 8 2021, 12:26 PM.

Details

Summary

Returning OS.str() is guaranteed to copy the underlying string, whereas OS.flush(); return Underlying; makes Underlying a candidate for NRVO, or at worst, implicit move.

To keep this kind of inefficiency at bay in the future, the fast code should probably be made easier to type than the slow code (as it's currently the opposite). Perhaps this could be solved by:

  • making raw_string_ostream guarantee+document that it does not need to be flush()ed (similar to raw_svector_ostream), and/or
  • making raw_string_ostream::str() return a StringRef rather than std::string&, to discourage using it to initialize std::strings

Implementing those two ideas would make simply return Underlying; the natural, correct, and efficient choice. They are, however, out of scope for this patch.

Diff Detail

Event Timeline

logan-5 created this revision.Dec 8 2021, 12:26 PM
logan-5 requested review of this revision.Dec 8 2021, 12:26 PM
Quuxplusone added inline comments.
clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h
36

FWIW, it appears to me that in most (all?) of these cases, what's really wanted is not "a string and a stream" but rather "a stream that owns a string" (std::ostringstream or the LLVM-codebase equivalent thereof). Then the return can be return std::move(OS).str(); — for std::ostringstream, this Does The Right Thing since C++20, and if LLVM had its own stringstream it could make it Do The Right Thing today.
https://en.cppreference.com/w/cpp/io/basic_ostringstream/str

logan-5 added inline comments.Dec 8 2021, 1:46 PM
clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h
36

True enough. Although return std::move(OS).str(); is still much harder to type than the less efficient return OS.str();, and it requires at minimum a move of the underlying string--whereas return Str; is the easiest of all to type, and it opens things up for NRVO. If (as I said in the patch summary) raw_string_ostream were changed to be guaranteed to not need flushing, return Str; would IMHO be cemented as the clear winner.

That said, you're clearly right that all these cases are semantically trying to do "a stream that owns a string", and it's clunky to execute with the existing APIs.

I don't feel strongly, but IMO the code might be a bit harder to read/maintain with the explicit flush. I worry that it'd be easy to move the flush() away from the return. Not sure I'm right; could just be familiarity with str().

Have you considered other options? Two below.

One option:

std::string Result;
{
  llvm::raw_string_ostream OS(Str);
  OS << ...;
}
return Result;

which in branch-free cases could shorten to:

std::string Str;
llvm::raw_string_ostream(Str) << ...;
return Str;

I personally find the lifetime a bit more obvious than the explicit flush call.

Another option:

std::string Result;
llvm::raw_string_ostream OS(Str);
OS << ...;
return OS.take();

Where raw_string_ostream::take() is just return std::move(str()). It doesn't get NRVO, but I'm not sure that really matters in most of these places. Benefit is that it's a minimal change and the name is clear / matches other LLVM things.

clang/lib/AST/DeclarationName.cpp
236–240 ↗(On Diff #392864)

For trivial cases like this, maybe it'd be worth creating a helper function that gets reused. Maybe called llvm::raw_string_ostream::toString():

template <class T>
std::string raw_string_ostream::toString(const T &V) {
  std::string Result;
  raw_string_ostream(Result) << V;
  return Result;
}

and the code here would be:

return llvm::raw_string_ostream::toString(*this);
dexonsmith added inline comments.Dec 8 2021, 4:06 PM
clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h
36

If (as I said in the patch summary) raw_string_ostream were changed to be guaranteed to not need flushing

This sounds like a one-line patch; might be better to just do it rather than having to churn all these things twice.

I don't feel strongly, but IMO the code might be a bit harder to read/maintain with the explicit flush. I worry that it'd be easy to move the flush() away from the return. Not sure I'm right; could just be familiarity with str().

I definitely hear you. I don't really mind it personally, and I did it this way because I saw precedent in a couple spots (there's one on CompilerInvocation.cpp:653, several in clangd, etc.). I definitely see how it could be a little bit spooky though.

std::string Str;
llvm::raw_string_ostream(Str) << ...;
return Str;

I like this idea. I'd be happy to go back through and change the simple ones to this pattern.

Another option:

std::string Result;
llvm::raw_string_ostream OS(Str);
OS << ...;
return OS.take();

Where raw_string_ostream::take() is just return std::move(str()). It doesn't get NRVO, but I'm not sure that really matters in most of these places. Benefit is that it's a minimal change and the name is clear / matches other LLVM things.

I suppose the question then becomes whether to name it take() or str() && for symmetry with C++20's std::ostringstream. (Also for the record, I agree that NRVOing some std::strings isn't going to make a giant difference; my opinion is simply that if we can get it, we might as well.)

clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h
36

If (as I said in the patch summary) raw_string_ostream were changed to be guaranteed to not need flushing

This sounds like a one-line patch; might be better to just do it rather than having to churn all these things twice.

I guess this change kind of freaks me out. Currently you can call SetBuffered() on raw_string_ostream (though I don't know why you would...), which creates an intermediate buffer and then flush() syncs the buffer with the underlying std::string&. Removing that ability would be a breaking change, and I'm not sure how we could make it while being confident we're not breaking anything downstream.

(On the other hand, you can call SetBuffered() on raw_svector_ostream too, whose documentation more or less says it doesn't support buffering. If I'm reading right, you get an assert failure in ~raw_ostream() if you do.)

logan-5 updated this revision to Diff 392996.Dec 8 2021, 5:16 PM

Eliminate some explicit .flush()es by using temporary raw_string_ostreams where possible.

I don't feel strongly, but IMO the code might be a bit harder to read/maintain with the explicit flush. I worry that it'd be easy to move the flush() away from the return. Not sure I'm right; could just be familiarity with str().

I definitely hear you. I don't really mind it personally, and I did it this way because I saw precedent in a couple spots (there's one on CompilerInvocation.cpp:653, several in clangd, etc.). I definitely see how it could be a little bit spooky though.

I haven't done the git-blame, but I somewhat suspect str() was added specifically as a convenience to make the flush+access/return a one-liner.

I suppose the question then becomes whether to name it take() or str() && for symmetry with C++20's std::ostringstream. (Also for the record, I agree that NRVOing some std::strings isn't going to make a giant difference; my opinion is simply that if we can get it, we might as well.)

I think take() would be clearer for use in LLVM code, despite the symmetry. Especially since the two stream libraries have significant differences. (Also, maybe I'm mis-remembering, but I think I've seen places where OS.str() is passed to some function when it's partially written; seems dangerous to silent start selecting a different overload. Could be I'm wrong though.)

clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h
36

I don't think we need to be afraid. raw_ostream seems like a bit of a leaky abstraction, but IMO we need to trust that code doesn't arbitrarily call SetBuffered() on without full control. It's essentially never the right thing to do except from subclasses (where it could be protected) and maybe in unit tests for convenience. As you pointed, out this would also break svector_stream.

What I'd be more worried about is the micro-performance-penalty of std::string null-terminating with every write. But probably that's not particularly important either.

To be clear, it sounds like we should either add .take() for moving the string out of raw_string_ostream's string reference, or make raw_string_ostream not need to be flushed (after which there won't be as clear a use for .take(), since you can just use the underlying string directly).

I vote for the second option, making raw_string_ostream not need to be flushed, since it allows for simpler code at the call site (return Str;), permits NRVO at the call site, and avoids some possibly weird questions .take() would bring with it (like whether it would ever be surprising that it moves out of a reference that someone else might also have).

If that's the direction that sounds best, I can submit an updated patch sometime tomorrow.

To be clear, it sounds like we should either add .take() for moving the string out of raw_string_ostream's string reference, or make raw_string_ostream not need to be flushed (after which there won't be as clear a use for .take(), since you can just use the underlying string directly).

I vote for the second option, making raw_string_ostream not need to be flushed, since it allows for simpler code at the call site (return Str;), permits NRVO at the call site, and avoids some possibly weird questions .take() would bring with it (like whether it would ever be surprising that it moves out of a reference that someone else might also have).

If that's the direction that sounds best, I can submit an updated patch sometime tomorrow.

Yeah, probably one or the other. I'm leaning toward the no-flush approach as well, but I'd suggest making that a separate prep patch to reduce churn/dependencies in case there's some reason it needs to be reverted (e.g., compile time regression). Removing the no-longer-needed .str()s in a follow-up a few days later (rebasing this without the flushes) should be trivial.

Returning from https://reviews.llvm.org/D115421, it looks like that raw_string_ostream has unbuffered by default since 65b13610a5226b84889b923bae884ba395ad084d. Seems like all the new flush()s in this patch are no-ops. You can simplify this patch to just mechanically converting return OS.str() to return S.

logan-5 updated this revision to Diff 393262.Dec 9 2021, 12:59 PM

Removed .flush()es. Seems like this might be able to land without https://reviews.llvm.org/D115421?

dexonsmith accepted this revision.Dec 9 2021, 2:17 PM

Removed .flush()es. Seems like this might be able to land without https://reviews.llvm.org/D115421?

Yup! LGTM if you remove the cleanups that are no longer related to this patch, and/or split them into separate follow-up NFC commits. Might be nice to split even the main commit into a few pieces (e.g., by functional area) to push separately, just in case something needs to be reverted due to a call to SetBuffered() in an unexpected place.

Please make sure the commit message points at 65b13610a5226b84889b923bae884ba395ad084d, which made this correct.

clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h
72

Given there's no scope change needed, these cleanups are no longer related to the patch and are adding noise to the diff. I suggest committing them separately (or not at all).

This revision is now accepted and ready to land.Dec 9 2021, 2:17 PM