Page MenuHomePhabricator

[libc++][format][3/6] Adds a __container_buffer.
Needs ReviewPublic

Authored by Mordante on Sep 26 2021, 6:48 AM.

Details

Reviewers
ldionne
vitaut
Group Reviewers
Restricted Project
Summary

Instead of writing every character directly into the container by using
a back_insert_iterator the data is buffered in an array. This buffer
is then inserted to the container by calling its insert member function.

Since there's no guarantee every container's insert behaves properly
containers need to opt-in to this behaviour. The appropriate standard
containers opt-in to this behaviour.

This change improves the performance of the format functions that use a
back_insert_iterator.

Depends on D110495

Diff Detail

Event Timeline

Mordante created this revision.Sep 26 2021, 6:48 AM
Mordante requested review of this revision.Sep 26 2021, 6:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 26 2021, 6:48 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
Mordante updated this revision to Diff 375099.Sep 26 2021, 7:11 AM

Attempt to fix Clang-11 build issue.

Quuxplusone added inline comments.
libcxx/include/__format/buffer.h
97

The machinery to __enable_insertable in a container right now is like 16 lines. You could get it down to 1 line, and no helper headers, and C++11-compatible, if you changed this to

is_same_v<_Container, typename _Container::__enable_format_insertable>

Then the containers (vector, deque, string, list, etc.) would just have to say e.g.

using __enable_format_insertable = vector<_Tp, _Alloc>;

without any extra headers or ifdef-guards or anything.

100

add_pointer_t<typename _Container::value_type> is invariably typename _Container::value_type*, right?

105–127

Rather than lines 129-140, I suggest that you add a member function container_type *back_insert_iterator::__get_container() const to libc++'s back_insert_iterator.
I suppose you still need a partial-specialization-based approach equivalent to lines 117-127, in order to tell you whether the user's iterator type is exactly back_insert_iterator (in which case this optimization is safe) or merely something that slices to back_insert_iterator (in which case this optimization is dubious). But once you know it's exactly back_insert_iterator, you can just ask for its container_type member typedef, and use your new member function to get a pointer to the container; you don't have to use tricks for that part.

159

Why not _CharT __buffer_[__buffer_size];? (What does the std::array dependency buy you? and the zero-initialization?)

libcxx/include/module.modulemap
442 ↗(On Diff #375099)

/intertable/insertable/

Mordante planned changes to this revision.Sep 26 2021, 10:00 AM

Needs to be updated due to the removal of __put in D110495.

Mordante marked 5 inline comments as done.Oct 9 2021, 5:25 AM
Mordante added inline comments.
libcxx/include/__format/buffer.h
97

Thanks for the suggestion, this indeed is a lot less intrusive.

105–127

See D110573.

159

I prefer an array just for convenience. I need to know the last element and I like it better to use end().
The zero-initialization isn't needed.

libcxx/include/module.modulemap
442 ↗(On Diff #375099)

This is no longer needed due to your earlier suggestions.

Mordante updated this revision to Diff 378439.Oct 9 2021, 5:37 AM
Mordante marked 4 inline comments as done.

Addresses the review comments.
Temporary include the used part of D110573 to avoid possible Buildkite dependency issues.

I think it's up to @ldionne and/or @vitaut to say whether this PR is ultimately a good tradeoff, complexity-and-performance-wise. IMHO we've successfully reached the local optimum, code-wise. :)

libcxx/include/__format/buffer.h
148

Style nit: _CharT* __buffer_it_ = __buffer_.begin(); is easier to read, and equivalent. Could even say __buffer_.data() and then I wouldn't need to consciously avoid looking up on cppreference whether array::iterator is guaranteed to be _CharT*. ;)

libcxx/include/list
867

Style nit: _Tp, _Alloc with a space

libcxx/test/libcxx/utilities/format/enable_insertable.compile.pass.cpp
30–52

FWIW, I think it is not useful to test no_value_type, no_end, or no_insert, because we don't permit users to define such types and we don't expect to define any such types ourselves. Seeing a correct "specialization" of __enable_format_insertable should be enough to indicate that the type author intends the type to be insertable. If it's somehow not insertable, then that's a serious show-stopping bug in the type, not something that <format> should be silently working-around.

69

Also test

struct UserVector : public std::vector<char> {
    using std::vector<char>::vector;
};
static_assert(!std::__format::__insertable<UserVector>);

This is important to test because users are allowed to do stupid things like this. (They shouldn't; it's bad style and has many downsides; but they are allowed to.)

94

What do you expect to happen for std::vector<int> or std::vector<std::string>? On the one hand, int and string are not "char types," right? But on the other hand, vec.insert(vec.end(), charptr, charendptr) will work as expected, so we do want to do the optimization, right?

Mordante planned changes to this revision.Oct 10 2021, 10:27 AM
Mordante marked an inline comment as done.

Thanks for the feedback, I'll look at these issues when Buildkite is working properly again.

libcxx/test/libcxx/utilities/format/enable_insertable.compile.pass.cpp
94

The format specific functions require a char or wchar_t. So I expect them not to pass this trait. When we want to use this more generic, we can add an std::__insertable and rename the typedef in the containers. I didn't restrain the typedef in the classes; I looked at it but it became quite ugly quite fast. Instead it's constrained in the std::__format::__insertable concept. So it would be quite trivial to add std::__insertable, but I don't have a usecase for it.

Mordante updated this revision to Diff 378701.Oct 11 2021, 9:26 AM
Mordante marked 5 inline comments as done.

Addresses review comments.

libcxx/include/__format/buffer.h
148

I prefer the begin() since that conveys the intended usage better.

libcxx/include/list
867

Interesting clang-format didn't catch that for me.

libcxx/test/libcxx/utilities/format/enable_insertable.compile.pass.cpp
30–52

I not entirely convinced that removing them is a good idea. I'd like to hear @ldionne's opinion.

Mordante updated this revision to Diff 380179.Oct 16 2021, 6:20 AM

Rebased to test with wchar_t changes.

Mordante updated this revision to Diff 380181.Oct 16 2021, 7:11 AM

Fixes wide char build.

Since there's no guarantee every container's insert behaves properly

What does behaving properly mean? Could you give an example of a container's insert that doesn't behave properly?

libcxx/include/__format/buffer.h
92

replace -> replaced

94

opt-in to -> "opt in to" or "opt into"

vitaut requested changes to this revision.Oct 31 2021, 7:03 AM

Character-by-character output, especially type erased one introduced in D110494 doesn't seem efficient. As commented on the the previous diff the buffer and corresponding output iterator should provide access to a contiguous buffer and type erase the reallocation, not character output.

This revision now requires changes to proceed.Oct 31 2021, 7:03 AM

Character by character output doesn't seem efficient

Right, IIUC, this would end up rebased on top of D112361 or the like.

libcxx/include/__format/buffer.h
94

Since there's no guarantee every container's insert behaves properly

What does behaving properly mean? Could you give an example of a container's insert that doesn't behave properly?

User-defined containers can make their insert do arbitrary things, e.g.

struct Evil : std::string {
    void insert(std::string::iterator, char *, char *) { /*no-op*/ }
};
Evil e;
auto evil_it = std::back_inserter(e);  // uses e.push_back, which is fine

(Serendipitously, Charlie Barto, of the MSVC STL, also mentioned the need to guard against this possibility in his CppCon talk "Why does std::format do that?" on Thursday. https://cppcon.digital-medium.co.uk/thursday-recordings/ )

vitaut added inline comments.Oct 31 2021, 9:25 AM
libcxx/include/__format/buffer.h
94

Makes sense, thanks for clarifying.

Character-by-character output, especially type erased one introduced in D110494 doesn't seem efficient. As commented on the the previous diff the buffer and corresponding output iterator should provide access to a contiguous buffer and type erase the reallocation, not character output.

Yes I first want to get a good solution in D112361 and then update this patch with these changes.

libcxx/include/__format/buffer.h
94

Since there's no guarantee every container's insert behaves properly

What does behaving properly mean? Could you give an example of a container's insert that doesn't behave properly?

User-defined containers can make their insert do arbitrary things, e.g.

struct Evil : std::string {
    void insert(std::string::iterator, char *, char *) { /*no-op*/ }
};
Evil e;
auto evil_it = std::back_inserter(e);  // uses e.push_back, which is fine

(Serendipitously, Charlie Barto, of the MSVC STL, also mentioned the need to guard against this possibility in his CppCon talk "Why does std::format do that?" on Thursday. https://cppcon.digital-medium.co.uk/thursday-recordings/ )

Yes that's the reason, interesting to see Charlie did the same. I'm looking forward to watching this talk.

Mordante marked 4 inline comments as done.Sun, Nov 14, 6:48 AM
Mordante updated this revision to Diff 387083.Sun, Nov 14, 7:23 AM

Rebased and make changes based on D112361.
Addresses review comments.