This is an archive of the discontinued LLVM Phabricator instance.

[Support] No longer require .flush()ing raw_string_ostream
ClosedPublic

Authored by logan-5 on Dec 8 2021, 8:25 PM.

Details

Summary

This solves some problems in https://reviews.llvm.org/D115374 by eliminating the need for .flush() calls before returning the underlying strings. It also brings raw_string_ostream closer into parity with raw_svector_ostream, which does not require (nor even allow, see below) flushing.

I tried void flush() = delete; inside raw_string_ostream for parity with raw_svector_ostream, but ended up with around 750 call sites that needed to be manually inspected/fixed. (And if that's any indication, I think we can be sure the =delete will break lots of downstream code.) A great many of them were call sites that followed the OS.flush(); return Underlying; pattern I originally put forth in D115374. @dexonsmith, you mentioned that you weren't sure if the manual .flush()es were even all that terrible; my inspection of maybe a quarter of the 750 call sites suggests very strong precedent for this pattern already throughout LLVM. With this, plus the potential (small) performance+flexibility loss about not being able to buffer to avoid std::string's eager null terminator writes, I admit I'm back on the fence about removing flush().

(On the other hand, we could just not =delete flush, i.e. still allow it but no longer require it, which is effectively what this diff accomplishes.)

Diff Detail

Event Timeline

logan-5 created this revision.Dec 8 2021, 8:25 PM
logan-5 requested review of this revision.Dec 8 2021, 8:25 PM
logan-5 updated this revision to Diff 393025.Dec 8 2021, 9:47 PM

Fixed some tests--had to add manual .flush()es back in to some tests that were testing general flushing behavior of raw_ostream, but using raw_string_ostream for convenience.

logan-5 updated this revision to Diff 393217.Dec 9 2021, 10:38 AM

Formatting.

dexonsmith accepted this revision.Dec 9 2021, 12:23 PM
dexonsmith added a reviewer: bkramer.
dexonsmith added a subscriber: bkramer.

LGTM, with a couple of comments inline (e.g., I might leave str() alone for now, delete all callers, and then delete it entirely, but I'm okay either way).

plus the potential (small) performance+flexibility loss about not being able to buffer to avoid std::string's eager null terminator writes

Users can still buffer for that case, they'd just need to manually flush. But most of the time, better to use a likely-big-enough SmallString and a raw_svector_ostream and then convert to a std::string at the end.

llvm/include/llvm/Support/raw_ostream.h
640

Interesting; I didn't realize this was already here. Looks like it's relatively recent, from 65b13610a5226b84889b923bae884ba395ad084d, but that means we've already analyzed and/or accepted any perf regressions.

@bkramer, any specific reason you left the flush()s around? Or just didn't have time to finish aligning with raw_svector_ostream?

642

I think you can just delete this line entirely now.

644–646

I wonder if this API should be deleted entirely? (not in this commit) -- in which case, could land just the change to the destructor here, then delete callers, then delete this.

llvm/unittests/Support/raw_ostream_test.cpp
23–30 ↗(On Diff #393217)

I'd suggest dropping the scope as well since it's not doing anything anymore.

146 ↗(On Diff #393217)

You should drop the .str() on this line as well.

This revision is now accepted and ready to land.Dec 9 2021, 12:23 PM
logan-5 updated this revision to Diff 393258.Dec 9 2021, 12:47 PM
logan-5 marked 3 inline comments as done.

Addressed comments

logan-5 added inline comments.Dec 9 2021, 12:49 PM
llvm/include/llvm/Support/raw_ostream.h
644–646

N.B. raw_svector_ostream has str() that returns StringRef, which appears to be useful in places. We could keep this for API parity, but change it to return StringRef to discourage using it to initialize std::strings.

dexonsmith added inline comments.Dec 9 2021, 2:21 PM
llvm/include/llvm/Support/raw_ostream.h
644–646

Sure, fine by me if there are places where the OS.str() actually helps. I kind of suspect there aren't any though, and this was just to make flush+ref a one-liner.

raw_svector_ostream::str() is more useful since SmallVectorImpl<char> doesn't have an implicit conversion to StringRef, whereas std::string does.

To be quite honest, in regards to the big .str() refactor (whether changing its type or deleting it), I feel I may have started to bite off more than I can chew here. There are at least a few thousand call sites of .str() throughout LLVM and its subprojects, and as just a very occasional contributor who will be a distant stranger to most of them, I don't feel equipped or qualified to tackle auditing and fixing them myself. (Not to mention that it's orders of magnitude more work than I had planned to contribute with my initial patch.) I just wanted to optimize a few returns, and now we're talking about making a huge number of changes across the entire repo. I'm happy to bring this current patch through to landing, but when it comes to the big .str() purge I might have to humbly step back.

To be quite honest, in regards to the big .str() refactor (whether changing its type or deleting it), I feel I may have started to bite off more than I can chew here. There are at least a few thousand call sites of .str() throughout LLVM and its subprojects,

(FWIW, I bet most of those are unrelated to raw_string_ostream. Removing the function would probably find you a much smaller list.)

and as just a very occasional contributor who will be a distant stranger to most of them, I don't feel equipped or qualified to tackle auditing and fixing them myself. (Not to mention that it's orders of magnitude more work than I had planned to contribute with my initial patch.) I just wanted to optimize a few returns, and now we're talking about making a huge number of changes across the entire repo. I'm happy to bring this current patch through to landing, but when it comes to the big .str() purge I might have to humbly step back.

Fair enough! In that case, I suggest just removing the flush() from str() (as you have here) and adding a TODO to consider deleting it or making it return StringRef. Then at least anyone who looks can see it's obviously uninteresting; and you and/or others can incrementally delete the uninteresting calls over time.

Note also: git-grep gives me only 10 hits across llvm-project SetBuffered(); the only two outside of raw_ostream and its tests are llvm::errs().SetBuffered() in clangd. In case you're still worried about whether it's safe to do.

logan-5 updated this revision to Diff 393523.Dec 10 2021, 9:23 AM

Updated comment and added TODO.

Thanks for the guidance, it's much appreciated.

you and/or others can incrementally delete the uninteresting calls over time.

Doing it incrementally sounds much more approachable. I may indeed submit cleanup patches like this as time permits.

This revision was landed with ongoing or failed builds.Jan 7 2022, 9:25 AM
This revision was automatically updated to reflect the committed changes.