This is an archive of the discontinued LLVM Phabricator instance.

[libcxx][format] Fix how we handle char traits in formatter<string> and formatter<string_view>
ClosedPublic

Authored by DanielMcIntosh-IBM on Oct 18 2021, 10:19 AM.

Details

Summary

Right now we drop the char_traits template argument, which presumes that
string<_CharT, _Traits> and string<_CharT> are interchangeable.

Diff Detail

Event Timeline

DanielMcIntosh-IBM requested review of this revision.Oct 18 2021, 10:19 AM
DanielMcIntosh-IBM created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptOct 18 2021, 10:19 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
vitaut added inline comments.Oct 18 2021, 10:49 AM
libcxx/include/__format/formatter.h
184–186 ↗(On Diff #380460)

I don't think traits are needed in this internal function. It might be better to convert basic_string_view<_CharT, _Traits> into basic_string_view<_CharT> when calling it and remove _Traits here to prevent template bloat.

libcxx/include/__format/formatter.h
184–186 ↗(On Diff #380460)

If my understanding of templates is correct, when _Traits != char_traits<_CharT>, then basic_string_view<_CharT, _Traits> is a totally different type from basic_string_view<_CharT> == basic_string_view<_CharT, char_traits<_CharT>>, and may not be convertible to basic_string_view<_CharT>.

vitaut added inline comments.Oct 18 2021, 5:09 PM
libcxx/include/__format/formatter.h
184–186 ↗(On Diff #380460)

Yes but string view is just a pointer and a size so you can do basic_string_view<_CharT>(__str.data(), __str.size()) at the __write_unicode call site.

Nice catch!

libcxx/include/__format/formatter_string.h
136–137

I think we can here directly convert to a VSTD::basic_string_view<_CharT> and drop the _Traits.

144

I even wonder whether we want to drop the _Traits here and have no _Traits for __formatter_string.
When a string(_view) is stored as format_arg the traits and allocator are stripped http://eel.is/c++draft/format#arg-9

vitaut added inline comments.Oct 19 2021, 12:32 PM
libcxx/include/__format/formatter_string.h
144

Yes, please.

Address review comments

DanielMcIntosh-IBM marked 5 inline comments as done.Oct 22 2021, 9:32 AM
Mordante accepted this revision as: Mordante.Nov 4 2021, 11:12 AM

Sorry I missed your update on this patch, but LGTM!

Quuxplusone accepted this revision.Nov 5 2021, 12:09 PM

LGTM, so I'll be the second approver here. But I'd like to see both of my comments on formatter<basic_string_view>::format addressed.

libcxx/include/__format/formatter_string.h
136–137

We can also drop the _VSTD:: (on non-functions). I wonder if we could make this self-documenting (i.e. eliminate the comment) by doing something like

using _StringView = basic_string_view<_CharT, char_traits<_CharT>>;
return _Base::format(_StringView(__str.data(), __str.size()), __ctx);

but meh, it seems about the same to me.

147

Style: Linebreak after auto, please.

Perf: Shouldn't basic_string_view<_CharT, _Traits> __str be passed by value, not by const-ref? Or is there a subtle reason we're passing by const-ref here in particular?

This revision is now accepted and ready to land.Nov 5 2021, 12:09 PM

Address review comments

Quuxplusone accepted this revision.Nov 8 2021, 10:35 AM
Quuxplusone added inline comments.
libcxx/include/__format/formatter_string.h
136–137

I like this!

libcxx/include/__format/formatter_string.h
136–137

Yeah, unfortunately, I don't see any good way of highlighting the change of _Traits. Since they're about the same I think it's probably better not to introduce a second source of truth by explicitly specifying the _Traits template argument to be char_traits<_CharT>. Especially since we rely on the default for the __str parameter in the definition of __formatter_string::format.

147

I believe formatter<basic_string_view<_CharT, _Traits>, _CharT>::format needs to meet the Formatter named requirements, which limits what the function signature can be.

Rebase to trigger CI

libcxx/include/__format/formatter_string.h
147

Never mind, I realized passing by value won't change what we can accept as arguments.

However, I'm not quite sure why passing by value would be a performance benefit. When _Traits == char_traits<_CharT>, does the basic_string_view<_CharT>(__str.data(), __str.size()) somehow get treated as a move constructor, allowing the re-use of __str as the argument to _Base::format if __str was passed by value? i.e. does using that specific constructor somehow still allow copy elision? I was under the impression it was only allowed if "the initializer expression is a [value] of the same class type". Or is this a case of "we know what the constructor does, so the optimizer can figure out that there are no side-effects, and remove it"?

Quuxplusone added inline comments.Nov 9 2021, 9:34 AM
libcxx/include/__format/formatter_string.h
147

Pass-by-value has basically three perf benefits, in the abstract:

  • Eliminate a pointer indirection in the callee, which means eliminate a load from memory. string_view can be passed in registers... but if you pass it by reference, it spills into memory. https://godbolt.org/z/zqr6TjEoG
  • Eliminate the need to spill the variable onto the stack in the caller, which means maybe eliminate the entire need for a stack frame in the caller. https://godbolt.org/z/G1Kj65n15
  • Give the callee its own copy of the string_view object, which means no aliasing is possible with anything else in the program, which means greater opportunities for optimization. https://godbolt.org/z/zddj8Yn4T

For string_view (and int and char* and other small trivial types), the rule is "pass by value"; and then for big or expensive types (like string or vector<int>) the rule becomes "pass by const reference, as an optimization of pass by value." But only because avoiding-that-one-copy is big enough to outweigh all three of the benefits above! (And because for non-trivially-copyable types such as string, pass-by-value secretly turns into pass-by-hidden-reference on most ABIs we care about. So really there's not even a competition in those non-trivial cases.)

Now, inliners are really good, and it's quite probable that passing by const reference in this specific case is harmless in practice. But since pass-string_view-by-value is the Right Thing To Do, here I would want to see a positive reason to deviate from the best practice. I'd rather keep our code as idiomatic as possible, so that when someone (me ;)) looks at it, they go "oh yeah, cool, that's what I expect to see" instead of "wait, is something subtle going on here? why does this look weird?" So "pass small types including string_view by value" is the default.

As per review comments, pass string_view by value

libcxx/include/__format/formatter_string.h
148

Aargh, I was gonna say "ship it" and then I noticed that... none of these format methods are marked const?! What's up with that?
https://en.cppreference.com/w/cpp/named_req/Formatter says:

formatter.format(arg, format_context) [should work with] formatter, a value of type Formatter

What does "value" mean in this context? It probably doesn't mean "lvalue" or it would have said "lvalue," right? I suspect it means "(possibly const) lvalue or rvalue," in which case format should always be const-qualified. But then how does any of this ever work, today, given that we're not const-qualifying these format methods? Does this suddenly need further investigation?

Ping @Mordante — clarification please? (If @Mordante says "yeah this is all working as intended," then you can land this AFAIC.)

Mordante accepted this revision.Nov 11 2021, 10:59 AM

LGTM!

libcxx/include/__format/formatter_string.h
148

I'm not entirely seeing what exactly the issue is. But there's no requirement the format member function is a const member function.
http://eel.is/c++draft/format#formatter.spec-6 contains a sample of a user defined formatter, which has a non-const member function.

When a formatter is used it:

  • first calls the parse member function, this function should store the state of the parsing, so shouldn't be const,
  • then calls format.
Quuxplusone accepted this revision.Nov 11 2021, 11:08 AM
Quuxplusone added inline comments.
libcxx/include/__format/formatter_string.h
148

http://eel.is/c++draft/format#formatter.spec-6 contains a sample of a user defined formatter, which has a non-const member function.

Okay, works for me!

FWIW: Sounds like the situation is analogous to how it works with the distributions from <random>: some of the <random> distributions are stateless and thus could physically be const-qualified, but that would actually be worse QoI because it would harm portability-to-other-vendors and/or refactorability-to-other-distribution-types-that-happen-to-be-stateful.

vitaut added inline comments.Nov 11 2021, 11:26 AM
libcxx/include/__format/formatter_string.h
148

I'd recommend marking format functions as const even though it's not required because they shouldn't mutate the formatter state and it will help with format string compilation in the future.

DanielMcIntosh-IBM marked 4 inline comments as done.Nov 11 2021, 11:44 AM
DanielMcIntosh-IBM added inline comments.
libcxx/include/__format/formatter_string.h
148

I'm going to land the changes for now.
Changing the format functions to const should probably be done across the board if it's going to be done at all, so that should probably go in a separate patch anyways.

vitaut added inline comments.Nov 11 2021, 11:44 AM
libcxx/include/__format/formatter_string.h
148

Maybe we should even open an LWG issue clarifying that all standard formatters have a const format function.

This revision was landed with ongoing or failed builds.Nov 11 2021, 11:49 AM
This revision was automatically updated to reflect the committed changes.
libcxx/include/__format/formatter_string.h
148

@vitaut wrote:

format functions [...] shouldn't mutate the formatter state

@Mordante wrote:

http://eel.is/c++draft/format#formatter.spec-6 contains a sample of a user-defined formatter, which has a non-const member function.

@vitaut wrote:

Maybe we should even open an LWG issue clarifying that all [...] formatters [must] have a const format function.

Yes, and not just the standard formatters — IIUC, it should be all formatters!
I've just now sent an email to lwgchair with subject line "Request new LWG issue: formatter<T>::format should be const-qualified".

vitaut added inline comments.Nov 11 2021, 2:57 PM
libcxx/include/__format/formatter_string.h
148

I've just now sent an email to lwgchair with subject line "Request new LWG issue: formatter<T>::format should be const-qualified".

Thank you!

Mordante added inline comments.Nov 12 2021, 8:18 AM
libcxx/include/__format/formatter_string.h
148

I'm going to land the changes for now.
Changing the format functions to const should probably be done across the board if it's going to be done at all, so that should probably go in a separate patch anyways.

Yes it should indeed be a separate patch when we want to make this change. Currently the format function can't be const. It updates the requested width of the format string when it's supplied as argument to the format function.

148

I'd recommend marking format functions as const even though it's not required because they shouldn't mutate the formatter state and it will help with format string compilation in the future.

Just curious will it be required that format strings are compiled in the future? AFAIK P2216 only requires compile-time validation but not compilation. (Obviously it would be a good idea to avoid parsing the format string again at runtime.)

vitaut added inline comments.Nov 12 2021, 8:50 AM
libcxx/include/__format/formatter_string.h
148

Just curious will it be required that format strings are compiled in the future?

No, it will be an opt-in because it increases binary size which is often undesirable.