This is an archive of the discontinued LLVM Phabricator instance.

[libc++][format][5/6] Improve format_to_n.
ClosedPublic

Authored by Mordante on Sep 26 2021, 8:05 AM.

Details

Reviewers
ldionne
vitaut
Mordante
Group Reviewers
Restricted Project
Commits
rGf0c06c042040: [libc++][format][5/6] Improve format_to_n.
Summary

Use a specialized buffer wrapper to limit the number of insertions in the
buffer. After the limit has been reached the buffer only needs to count
the number of insertions to return the buffer size required to store the
entire output.

Depends on D110498

Diff Detail

Event Timeline

Mordante created this revision.Sep 26 2021, 8:05 AM
Mordante requested review of this revision.Sep 26 2021, 8:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 26 2021, 8:05 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
Quuxplusone added inline comments.
libcxx/include/__format/buffer.h
283–284
286

Given that you continue incrementing __size_ even after hitting the end of __buffer_, I think it's a bit dangerous to use any type here except size_t. I recommend s/_Size/size_t/g.
I don't see any particular usefulness in continuing to store __n_ as _Size, either. formatted_size() is specced to return size_t, not iter_difference_t<anything>, so I think size_t throughout would make plenty of sense.

Mordante marked 2 inline comments as done.Sep 26 2021, 9:15 AM
Mordante added inline comments.
libcxx/include/__format/buffer.h
283–284

Again what clang-format makes of the code.

286

The normal formatted_size indeed returns a size_t, but format_to_n returns

template<class Out> struct format_to_n_result {
  Out out;
  iter_difference_t<Out> size; // This is the formatted_size.
};

So I prefer to keep this the same type as the Standard specified.

Mordante planned changes to this revision.Sep 26 2021, 10:00 AM
Mordante marked 2 inline comments as done.

Needs to be updated due to the removal of __put in D110495.

Mordante updated this revision to Diff 378455.Oct 9 2021, 8:07 AM

Rebased and adjusted to changes in __put.

Mordante updated this revision to Diff 380190.Oct 16 2021, 8:48 AM

Rebased to test with wchar_t changes.

Mordante planned changes to this revision.Nov 2 2021, 10:57 AM
Mordante updated this revision to Diff 387097.Nov 14 2021, 8:46 AM

Rebased and make changes based on D112361.

vitaut added inline comments.Nov 25 2021, 8:33 AM
libcxx/include/format
380–381

Similarly to other diffs, let's introduce __vformat_to_n and force inline format_to_n for bloat prevention to actually work.

381–384

Not sure if this branch is worth adding. __format_to_n_buffer should perform reasonably well for this case.

Mordante marked an inline comment as done.Nov 25 2021, 11:36 AM
Mordante added inline comments.
libcxx/include/format
381–384

I need to guard against __n < 0 since the __n is used as an unsigned capacity in the buffer code. So I thought it was easiest to add the guard here.

vitaut added inline comments.Nov 26 2021, 8:13 AM
libcxx/include/format
381–384

I understand that you need a check, my comment was that special casing 0 (and negative size) is probably not worth it. I'd suggest keeping it simpler and go through __format_to_n_buffer in all cases.

Mordante updated this revision to Diff 390419.Nov 29 2021, 11:14 AM
Mordante marked an inline comment as done.

Addresses review comments.

Mordante marked 2 inline comments as done.Nov 29 2021, 11:17 AM
Mordante updated this revision to Diff 400057.Jan 14 2022, 9:57 AM

Rebased to trigger CI.

I investigated the CI issue and it seems to be unrelated to the changes.

Herald added a project: Restricted Project. · View Herald TranscriptMar 26 2022, 11:20 AM
vitaut added inline comments.Mar 27 2022, 7:47 AM
libcxx/include/__format/buffer.h
23

What is this for?

Mordante marked an inline comment as done.Mar 27 2022, 9:29 AM
Mordante added inline comments.
libcxx/include/__format/buffer.h
23

This is the libc++ internal header that (among other things) contains iter_difference_t.

ldionne added inline comments.Apr 7 2022, 12:56 PM
libcxx/include/__format/buffer.h
279

This formatting strikes me as really strange. I'd expect something like:

template <class _OutIt, __formatter::__char_type _CharT, bool>
  requires output_iterator<_OutIt, const _CharT&>
struct _LIBCPP_TEMPLATE_VIS __format_to_n_buffer_base {

The current formatting for this declaration is somewhat difficult to read IMO.

libcxx/include/format
374

You probably want to use std:: instead of _VSTD::

524–527

I would rather not use _LIBCPP_ALWAYS_INLINE I would let the compiler make the right call here -- _LIBCPP_ALWAYS_INLINE has a negative impact on various optimizations and on the debugging experience, so we should steer clear of it unless we have a specific reason to. A function being a one-liner isn't a sufficient reason -- the optimizer should be able to figure out whether it is worth actually generating a function call (most likely not).

Mordante planned changes to this revision.Apr 7 2022, 1:14 PM
Mordante marked 2 inline comments as done.

Thanks for the review! I'll address the comment shortly.

libcxx/include/__format/buffer.h
279

I tend to use clang-format and that's what clang-format made of it. I've filed some bug reports. I'm quite sure ToT clang-format will format it as you suggested.

libcxx/include/format
374

Actually, for consistency, I prefer to use _VSTD::. This patch predates the changes to use std::. On my TODO list I have an item to do s/_VSTD/std for all format related code.

524–527

This was suggested by @vitaut. I don't mind to remove it. I still want to look closer at the codegen after P2216 and P2418 are done. (I already identified code bloat in the parser code, which is on my list for P2216 and P2418.)

vitaut added inline comments.Apr 8 2022, 4:43 PM
libcxx/include/format
524–527

Functions like this should be inlined, otherwise you'll get a new instantiation for every call with different argument types which is a disaster. Compilers are terrible at dealing with this sort of things.

Mordante marked 4 inline comments as done.Apr 9 2022, 7:08 AM
Mordante added inline comments.
libcxx/include/format
524–527

Thanks for the additional information. I've added a TODO to investigate this and when needed file a bug report.

Mordante updated this revision to Diff 421729.Apr 9 2022, 7:19 AM
Mordante marked an inline comment as done.

Rebased and address review comments.

vitaut added inline comments.Apr 11 2022, 10:10 AM
libcxx/include/format
524–527

If you are referring to my comment then there is no bug to report. I was mostly explaining why this function must be forced inlined. A compiler doesn't have information to figure it out on its own.

Mordante marked an inline comment as done.Apr 11 2022, 10:48 AM
Mordante added inline comments.
libcxx/include/format
524–527

No it's a TODO for me. In general we dislike to add compiler work-arounds in our code. If we need it we often file a bug report, where it hopefully gets fixed.

vitaut added inline comments.Apr 11 2022, 11:40 AM
libcxx/include/format
524–527

I would argue that it's not a workaround but a part of the design of std::format and the whole reason why we have v* functions in the first place.

ldionne added inline comments.May 13 2022, 12:10 PM
libcxx/include/__format/buffer.h
295

It strikes me that perhaps __internal_storage::capacity() should be renamed to __internal_storage::size(). capacity() makes it look like it's not something we can write to (like std::vector::capacity()). No action for this patch, and this is just a suggestion.

315

Why do we unwrap_iter here? If the iterator is something that e.g. checks the bounds, it seems like it would be useful to keep it wrapped to get this checking?

libcxx/include/format
524–527

I'm swayed by @vitaut's explanation. _LIBCPP_ALWAYS_INLINE is definitely reasonable here then, IMO.

philnik added inline comments.
libcxx/include/format
524–527

Could you add a comment explaining the _LIBCPP_ALWAYS_INLINE? We've got a few of these around the code base and most of them look to me like they don't actually belong there. I think it would be nice if we could have the reasoning right in the code and not have to hunt it down (assuming there is one to find). I didn't feel like looking into the other _LIBCPP_ALWAYS_INLINE yet, but I want to do that at some point and potentially remove some of them.

Mordante marked 4 inline comments as done.May 14 2022, 8:16 AM

Thanks for the reviews!

libcxx/include/__format/buffer.h
295

I see your point, but to bike-shed I'm not too fond of size() since the storage doesn't really keep track of the number of elements stored. I think, maybe remove the function and make __buffer_size_ public instead. I'll look at that in a followup patch.

315

Unwrapping gives a raw pointer, which allows certain optimizations. For example when using std::string::iterator. This is only done when the checks are disabled. The concept on line 119 is used to select this __format_to_n_buffer_base specialization.

template <class _OutIt, class _CharT>
concept __enable_direct_output = __formatter::__char_type<_CharT> &&
    (same_as<_OutIt, _CharT*>
#if _LIBCPP_DEBUG_LEVEL < 2
     || same_as<_OutIt, __wrap_iter<_CharT*>>
#endif
    );

When this concept isn't true __out_it will not be unwrapped.

libcxx/include/format
524–527

I already added that to my todo list after I read Louis' comment yesterday ;-) However it seems I already added a comment :-)
I have moved it to the first occurrence (only once since I see no value in duplicating the comment).

Mordante updated this revision to Diff 429460.May 14 2022, 8:16 AM
Mordante marked 2 inline comments as done.

Rebased and improved some comment.

vitaut added inline comments.May 15 2022, 8:47 AM
libcxx/include/format
307–309

"to" looks out of place. Also I don't think it's clang-specific so opening bug report just for this compiler won't help.

LGTM, just one minor comment inline.

Mordante updated this revision to Diff 430086.May 17 2022, 8:49 AM

Address review comments.

Mordante marked an inline comment as done.May 17 2022, 8:53 AM
Mordante added inline comments.
libcxx/include/format
307–309

s/to/the.

I want to look whether I can file a nice bug report since we in general want to do that when we need to manually aid the compiler. Before removing the annotations we need to look at whether all supported compilers do the right thing. (I don't expect that will happen soon.)

Mordante accepted this revision.May 18 2022, 11:12 AM
Mordante marked an inline comment as done.
This revision is now accepted and ready to land.May 18 2022, 11:12 AM
This revision was automatically updated to reflect the committed changes.