This is an archive of the discontinued LLVM Phabricator instance.

[libc++][format][3/6] Adds a __container_buffer.
ClosedPublic

Authored by Mordante on Sep 26 2021, 6:48 AM.

Details

Reviewers
ldionne
vitaut
Group Reviewers
Restricted Project
Commits
rG889302292bf6: [libc++][format][3/6] Adds a __container_buffer.
Summary

Instead of writing every character directly into the container by using
a back_insert_iterator the data is buffered in an array. This buffer
is then inserted to the container by calling its insert member function.

Since there's no guarantee every container's insert behaves properly
containers need to opt-in to this behaviour. The appropriate standard
containers opt-in to this behaviour.

This change improves the performance of the format functions that use a
back_insert_iterator.

Depends on D110495

Diff Detail

Event Timeline

Mordante created this revision.Sep 26 2021, 6:48 AM
Mordante requested review of this revision.Sep 26 2021, 6:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 26 2021, 6:48 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
Mordante updated this revision to Diff 375099.Sep 26 2021, 7:11 AM

Attempt to fix Clang-11 build issue.

libcxx/include/__format/buffer.h
146

The machinery to __enable_insertable in a container right now is like 16 lines. You could get it down to 1 line, and no helper headers, and C++11-compatible, if you changed this to

is_same_v<_Container, typename _Container::__enable_format_insertable>

Then the containers (vector, deque, string, list, etc.) would just have to say e.g.

using __enable_format_insertable = vector<_Tp, _Alloc>;

without any extra headers or ifdef-guards or anything.

149

add_pointer_t<typename _Container::value_type> is invariably typename _Container::value_type*, right?

154–176

Rather than lines 129-140, I suggest that you add a member function container_type *back_insert_iterator::__get_container() const to libc++'s back_insert_iterator.
I suppose you still need a partial-specialization-based approach equivalent to lines 117-127, in order to tell you whether the user's iterator type is exactly back_insert_iterator (in which case this optimization is safe) or merely something that slices to back_insert_iterator (in which case this optimization is dubious). But once you know it's exactly back_insert_iterator, you can just ask for its container_type member typedef, and use your new member function to get a pointer to the container; you don't have to use tricks for that part.

208

Why not _CharT __buffer_[__buffer_size];? (What does the std::array dependency buy you? and the zero-initialization?)

libcxx/include/module.modulemap
442 ↗(On Diff #375099)

/intertable/insertable/

Mordante planned changes to this revision.Sep 26 2021, 10:00 AM

Needs to be updated due to the removal of __put in D110495.

Mordante marked 5 inline comments as done.Oct 9 2021, 5:25 AM
Mordante added inline comments.
libcxx/include/__format/buffer.h
146

Thanks for the suggestion, this indeed is a lot less intrusive.

154–176

See D110573.

208

I prefer an array just for convenience. I need to know the last element and I like it better to use end().
The zero-initialization isn't needed.

libcxx/include/module.modulemap
442 ↗(On Diff #375099)

This is no longer needed due to your earlier suggestions.

Mordante updated this revision to Diff 378439.Oct 9 2021, 5:37 AM
Mordante marked 4 inline comments as done.

Addresses the review comments.
Temporary include the used part of D110573 to avoid possible Buildkite dependency issues.

I think it's up to @ldionne and/or @vitaut to say whether this PR is ultimately a good tradeoff, complexity-and-performance-wise. IMHO we've successfully reached the local optimum, code-wise. :)

libcxx/include/__format/buffer.h
275

Style nit: _CharT* __buffer_it_ = __buffer_.begin(); is easier to read, and equivalent. Could even say __buffer_.data() and then I wouldn't need to consciously avoid looking up on cppreference whether array::iterator is guaranteed to be _CharT*. ;)

libcxx/include/list
853

Style nit: _Tp, _Alloc with a space

libcxx/test/libcxx/utilities/format/enable_insertable.compile.pass.cpp
30–52

FWIW, I think it is not useful to test no_value_type, no_end, or no_insert, because we don't permit users to define such types and we don't expect to define any such types ourselves. Seeing a correct "specialization" of __enable_format_insertable should be enough to indicate that the type author intends the type to be insertable. If it's somehow not insertable, then that's a serious show-stopping bug in the type, not something that <format> should be silently working-around.

69

Also test

struct UserVector : public std::vector<char> {
    using std::vector<char>::vector;
};
static_assert(!std::__format::__insertable<UserVector>);

This is important to test because users are allowed to do stupid things like this. (They shouldn't; it's bad style and has many downsides; but they are allowed to.)

94

What do you expect to happen for std::vector<int> or std::vector<std::string>? On the one hand, int and string are not "char types," right? But on the other hand, vec.insert(vec.end(), charptr, charendptr) will work as expected, so we do want to do the optimization, right?

Mordante planned changes to this revision.Oct 10 2021, 10:27 AM
Mordante marked an inline comment as done.

Thanks for the feedback, I'll look at these issues when Buildkite is working properly again.

libcxx/test/libcxx/utilities/format/enable_insertable.compile.pass.cpp
94

The format specific functions require a char or wchar_t. So I expect them not to pass this trait. When we want to use this more generic, we can add an std::__insertable and rename the typedef in the containers. I didn't restrain the typedef in the classes; I looked at it but it became quite ugly quite fast. Instead it's constrained in the std::__format::__insertable concept. So it would be quite trivial to add std::__insertable, but I don't have a usecase for it.

Mordante updated this revision to Diff 378701.Oct 11 2021, 9:26 AM
Mordante marked 5 inline comments as done.

Addresses review comments.

libcxx/include/__format/buffer.h
275

I prefer the begin() since that conveys the intended usage better.

libcxx/include/list
853

Interesting clang-format didn't catch that for me.

libcxx/test/libcxx/utilities/format/enable_insertable.compile.pass.cpp
30–52

I not entirely convinced that removing them is a good idea. I'd like to hear @ldionne's opinion.

Mordante updated this revision to Diff 380179.Oct 16 2021, 6:20 AM

Rebased to test with wchar_t changes.

Mordante updated this revision to Diff 380181.Oct 16 2021, 7:11 AM

Fixes wide char build.

Since there's no guarantee every container's insert behaves properly

What does behaving properly mean? Could you give an example of a container's insert that doesn't behave properly?

libcxx/include/__format/buffer.h
219

replace -> replaced

221

opt-in to -> "opt in to" or "opt into"

vitaut requested changes to this revision.Oct 31 2021, 7:03 AM

Character-by-character output, especially type erased one introduced in D110494 doesn't seem efficient. As commented on the the previous diff the buffer and corresponding output iterator should provide access to a contiguous buffer and type erase the reallocation, not character output.

This revision now requires changes to proceed.Oct 31 2021, 7:03 AM

Character by character output doesn't seem efficient

Right, IIUC, this would end up rebased on top of D112361 or the like.

libcxx/include/__format/buffer.h
221

Since there's no guarantee every container's insert behaves properly

What does behaving properly mean? Could you give an example of a container's insert that doesn't behave properly?

User-defined containers can make their insert do arbitrary things, e.g.

struct Evil : std::string {
    void insert(std::string::iterator, char *, char *) { /*no-op*/ }
};
Evil e;
auto evil_it = std::back_inserter(e);  // uses e.push_back, which is fine

(Serendipitously, Charlie Barto, of the MSVC STL, also mentioned the need to guard against this possibility in his CppCon talk "Why does std::format do that?" on Thursday. https://cppcon.digital-medium.co.uk/thursday-recordings/ )

vitaut added inline comments.Oct 31 2021, 9:25 AM
libcxx/include/__format/buffer.h
221

Makes sense, thanks for clarifying.

Character-by-character output, especially type erased one introduced in D110494 doesn't seem efficient. As commented on the the previous diff the buffer and corresponding output iterator should provide access to a contiguous buffer and type erase the reallocation, not character output.

Yes I first want to get a good solution in D112361 and then update this patch with these changes.

libcxx/include/__format/buffer.h
221

Since there's no guarantee every container's insert behaves properly

What does behaving properly mean? Could you give an example of a container's insert that doesn't behave properly?

User-defined containers can make their insert do arbitrary things, e.g.

struct Evil : std::string {
    void insert(std::string::iterator, char *, char *) { /*no-op*/ }
};
Evil e;
auto evil_it = std::back_inserter(e);  // uses e.push_back, which is fine

(Serendipitously, Charlie Barto, of the MSVC STL, also mentioned the need to guard against this possibility in his CppCon talk "Why does std::format do that?" on Thursday. https://cppcon.digital-medium.co.uk/thursday-recordings/ )

Yes that's the reason, interesting to see Charlie did the same. I'm looking forward to watching this talk.

Mordante marked 4 inline comments as done.Nov 14 2021, 6:48 AM
Mordante updated this revision to Diff 387083.Nov 14 2021, 7:23 AM

Rebased and make changes based on D112361.
Addresses review comments.

Mordante updated this revision to Diff 399949.Jan 14 2022, 3:32 AM

Rebased tp trigger CI.

vitaut accepted this revision.Jan 16 2022, 9:13 AM

Same as D110495:

As commented, I'm very skeptical about this approach because it has extra buffering for common cases but I defer to libc++ maintainers the decision whether they want to land this as is or optimize buffering now.

ldionne added inline comments.Mar 23 2022, 1:10 PM
libcxx/include/__format/buffer.h
162–169

Is there any reason why this isn't implemented as a bool variable template that containers specialize or something similar? IMO it would be nice to keep containers free of a __enable_format_insertable typedef used for this sole purpose.

Requiring a specialization also makes it slightly less likely that users are going to do it for their own containers, which IMO is a good thing (we really don't want users to start depending on this unless the Standard commits to providing this functionality).

[Reading some other comments, I get the feeling this might have been how it was implemented in previous diffs, please let me know if that's the case]

Herald added a project: Restricted Project. · View Herald TranscriptMar 23 2022, 1:10 PM
Mordante added inline comments.Mar 26 2022, 6:02 AM
libcxx/include/__format/buffer.h
162–169

That was my initial approach https://reviews.llvm.org/D110497?vs=on&id=375096#toc
when you too like that better I can switch back to that approach.

Mordante updated this revision to Diff 418398.Mar 26 2022, 8:50 AM

Rebase and remove concepts guards.

ldionne accepted this revision.Apr 7 2022, 12:49 PM

LGTM with a comment I'd like to see addressed, but I don't think I need to see this again. IIUC, @vitaut 's concern about character-by-character output has been resolved since this was rebased. We are still buffering too much to @vitaut 's liking, but @Mordante 's telling me that he'll investigate that in the future, before we ship this.

Also, this should be rebased onto the __get_container() patch.

libcxx/include/__format/buffer.h
162–169

Yeah, I think it would be better to have a less intrusive mechanism. We can specialize it for various containers in this header instead of having to touch vector & friends directly.

This revision is now accepted and ready to land.Apr 7 2022, 12:49 PM
Mordante added inline comments.Apr 8 2022, 9:29 AM
libcxx/include/__format/buffer.h
162–169

In that case format pulls in several unrelated headers. In the original approach these container headers pull in one small header. So I think that's the better solution.

Mordante updated this revision to Diff 421569.Apr 8 2022, 10:11 AM
  • Rebased
  • Updated to recent upstream cheanges
  • Address review comments
Mordante updated this revision to Diff 421573.Apr 8 2022, 10:13 AM

Forgot to format last patch.

Mordante marked 2 inline comments as done.Apr 8 2022, 10:13 AM
Mordante updated this revision to Diff 421588.Apr 8 2022, 10:52 AM

Fix a typo in the module map.

This revision was automatically updated to reflect the committed changes.