This is an archive of the discontinued LLVM Phabricator instance.

[libc++][format][2/6] Adds a __output_iterator.
ClosedPublic

Authored by Mordante on Sep 26 2021, 5:49 AM.

Details

Reviewers
ldionne
vitaut
Group Reviewers
Restricted Project
Commits
rG555214cbcc79: [libc++][format][2/6] Adds a __output_iterator.
Summary

Instead of using a temporary string in __vformat_to_wrapped use a new
generic iterator. This aids to reduce the number of template instantions
and avoids using a string to buffer the entire formatted output.

This changes the type of format_context and wformat_context, this can
still be done since the code isn't ABI stable yet.

Several approaches have been evaluated:

  • Using a __output_buffer base class with:
    • a put function to store the buffer in its internal buffer
    • a virtual flush function to copy the internal buffer to the output
  • Using a function to forward the output operation to the output buffer, much like the next method.
  • Using a type erased function point to store the data in the buffer.

The last version resulted in the best performance. For some cases there's
still a loss of speed over the original method. This loss many becomes
apparent when large strings are copied to a pointer like iterator, before
the compiler optimized this using memcpy.

Diff Detail

Event Timeline

Mordante created this revision.Sep 26 2021, 5:49 AM
Mordante requested review of this revision.Sep 26 2021, 5:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 26 2021, 5:49 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
Quuxplusone added inline comments.
libcxx/include/__format/buffer.h
30

/that too/it is too/?

39

Nit: type-erased hyphenated.

46–47

Generally I find the javadoc-style comments unhelpful; but also in this case it's a bit misleading, since of course __obj can't be any old class object containing a put; it must be the specific object, of the specific type, that you instantiated __put with (or that you instantiated __output_iterator's ctor with, if you take my refactor below). Fortunately, if you take my refactor below, then you won't need _Fun in multiple places and so you won't need a distinct name for it and so you won't need a comment about what the name means. :)

73–77

The comment is ungrammatical and unhelpful, and T needs uglification; but why just not make this a private detail of __output_iterator? Instead of

_LIBCPP_HIDE_FROM_ABI explicit __output_iterator(_Fun __fun, void* __obj)
    : __fun_(__fun), __obj_(__obj) {}

do

_LIBCPP_HIDE_FROM_ABI explicit __output_iterator(_Tp *__t)
    : __fun_([](_CharT __c, void *__o){ static_cast<_Tp*>(__o)->put(__c); }), __obj_(__t) {}

and construct it on line 504 as

_VSTD::__format::__output_iterator<_CharT>(_VSTD::addressof(__buffer))

(notice addressof, not operator&, for ADL-proofing and hijacking-avoidance)

85–86

Btw, I see some inconsistency throughout on whether we're doing 2-space tabs in this file or 4-space (lines 57, 88). Doesn't matter, though.

libcxx/test/libcxx/utilities/format/format.formatter/format.context/types.compile.pass.cpp
126

Your commit message implies that you did some performance testing benchmarks between this implementation and some other implementation. What were those benchmarks, and can you check them into libcxx/benchmarks/?

Mordante marked 4 inline comments as done.Sep 26 2021, 9:07 AM
Mordante added inline comments.
libcxx/include/__format/buffer.h
30

I'll leave this as is since the same block is copied several times. Still hope we soon only need to support compilers with proper concepts support.

85–86

This is what clang-format makes of it. I noticed clang-format-13 seems to have some support for concepts and I still want to reformat all format related headers once we switch to clang-format-13 and the number of active format patches is reduced.

libcxx/test/libcxx/utilities/format/format.formatter/format.context/types.compile.pass.cpp
126

See D110501.

Mordante marked 5 inline comments as done.Sep 26 2021, 10:05 AM
Mordante added inline comments.
libcxx/include/__format/buffer.h
73–77

Your suggestion indeed looks better than the current code. Thanks!

Mordante updated this revision to Diff 375123.Sep 26 2021, 10:09 AM
Mordante marked an inline comment as done.

Addresses review comments.
@Quuxplusone's suggestion to use a lambda instead of a helper function means the other patches in this series need to be adjusted in a similar fashion.

Mordante updated this revision to Diff 378435.Oct 9 2021, 4:40 AM

Rebased; changes the generated files.

Mordante updated this revision to Diff 380172.Oct 16 2021, 5:08 AM

Rebased to test with wchar_t changes.

vitaut requested changes to this revision.Oct 17 2021, 7:30 AM

I'm very surprised that this is the solution you went with. Indirect function call to write each character individually will likely be slow in practice compared to a contiguous buffer and a bulk copy. Please post benchmark results.

This revision now requires changes to proceed.Oct 17 2021, 7:30 AM

Thanks for the review. I'll try your suggestion and benchmark both versions.

I've created a proof-of-concept here with benchmark results D112361.
That revision is just to show what I did. It's intended to be a basis for the changes, it *is not* to be used as-is.

vitaut added a comment.EditedOct 24 2021, 9:46 AM

I've created a proof-of-concept here with benchmark results D112361.
That revision is just to show what I did. It's intended to be a basis for the changes, it *is not* to be used as-is.

I think D112361 is a step in the right direction. You can do even better and make it represent a contiguous (not necessarily fixed-size) buffer with a single type-erased reallocation/flush function. Then it can be used to write to contiguous containers like vector and string which is a common case directly avoiding copies. This is basically what {fmt} does.

Mordante updated this revision to Diff 387079.Nov 14 2021, 6:38 AM

Rebased and make changes based on D112361.

vitaut added inline comments.Nov 25 2021, 8:11 AM
libcxx/include/__format/buffer.h
57–59

As discussed in D112361, it would be good to avoid extra buffering for the common case of appending to a contiguous container such as string or vector.

Mordante added inline comments.Nov 25 2021, 11:25 AM
libcxx/include/__format/buffer.h
57–59

What do you mean with "appending"?
std::back_inserter(std::string) or std::string::begin()

I think it makes sense to buffer the back_inserter. For example every push_back in a std::string NUL-terminates the string. The insert operation does one NUL-termination.

D110497 changes the behaviour for these containers to buffer in this object and then use insert in these containers.

But I can test with the benchmarks whether the extra buffering has a positive or negative effect.

vitaut added inline comments.Nov 26 2021, 7:59 AM
libcxx/include/__format/buffer.h
57–59

What do you mean with "appending"?

std::back_inserter(std::string)

For example every push_back in a std::string NUL-terminates the string.

You should avoid character-by-character insertion in the first place. Also adding buffering for std::string is easy but you cannot remove it so the current approach is inherently less efficient than needed.

Mordante marked 2 inline comments as done.Dec 1 2021, 10:02 AM
Mordante added inline comments.
libcxx/include/__format/buffer.h
57–59

I agree and I should improve this inefficiency. I added a todo on line 71 to that effect and have more todos for this in my personal notes.
For now my main focus is feature completeness, so I prefer to postpone these changes to a later time.
D110497 already has some improvements for containers, but for string and vector we can do better.

Mordante updated this revision to Diff 391073.Dec 1 2021, 10:04 AM
Mordante marked an inline comment as done.

Rebased and some minor improvements.

Mordante updated this revision to Diff 399750.Jan 13 2022, 12:12 PM

Rebased to trigger CI.

Mordante updated this revision to Diff 399900.Jan 13 2022, 10:53 PM

Removed related commits to fix CI patch application.

vitaut accepted this revision.Jan 16 2022, 9:06 AM

As commented, I'm very skeptical about this approach because it has extra buffering for common cases but I defer to libc++ maintainers the decision whether they want to land this as is or optimize buffering now.

Sorry, I hadn't seen this was waiting for some sort of resolution.

@Mordante do you have an already planned way of addressing the inefficiencies raised by @vitaut? If not, I'd like to have a solution for those before shipping this. Otherwise, there's a risk that we'll forget about it and eventually ship the library without these optimizations (which IIUC are not merely nice-to-haves, but really important basic guarantees). I especially want to avoid us being painted into a corner where these optimizations are difficult to achieve in the current design.

Sorry, I hadn't seen this was waiting for some sort of resolution.

No problem I was busy with other format parts which I wanted in LLVM 14. Otherwise I would have pinged you on Discord.

@Mordante do you have an already planned way of addressing the inefficiencies raised by @vitaut? If not, I'd like to have a solution for those before shipping this. Otherwise, there's a risk that we'll forget about it and eventually ship the library without these optimizations (which IIUC are not merely nice-to-haves, but really important basic guarantees). I especially want to avoid us being painted into a corner where these optimizations are difficult to achieve in the current design.

I've ideas how I want to address these inefficiencies and added TODOs in the code. Before we mark format as complete I will review those TODOs and see whether they are blocking for marking the implementation complete. For example after moving the floating-point formatter to the dylib the proper long double implementation won't be an ABI break. (This requires long double support in to_chars. I want to tackle that, but it isn't trivial.)

The current changes in this series are an improvement. Especially for formatted_size and format_to_n that now use a string as temporary buffer.

I prefer to land the current improvement and look at further optimizations later.

ldionne accepted this revision.Mar 23 2022, 1:04 PM

Looking at patches 4 and 5 in the series, I think this seems reasonable since the buffering issue is being addressed AFAIU.

This revision is now accepted and ready to land.Mar 23 2022, 1:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 23 2022, 1:04 PM
Mordante updated this revision to Diff 418390.Mar 26 2022, 5:06 AM

Rebased to test CI and require compilers with concepts.

As discussed on Discord more buffer optimization can be achieved, but that will be done in separate patches.

Mordante updated this revision to Diff 418394.Mar 26 2022, 7:26 AM

Fix Windows CI.

This revision was automatically updated to reflect the committed changes.