This is an archive of the discontinued LLVM Phabricator instance.

[libc++][format] Improve format-arg-store.
ClosedPublic

Authored by Mordante on Mar 12 2022, 2:58 AM.

Details

Summary

This optimizes the __format_arg_store type to allow a more efficient
storage of the basic_format_args.

It stores the data in two arrays:

  • A struct with the tag of the exposition only variant's type and the offset of the element in the data array. Since this array only depends on the type information it's calculated at compile time and can be shared by different instances of this class.
  • The arguments converted to the types used in the exposition only variant of basic_format_arg. This means the packed data can be directly copied to an element of this variant.

The new code uses rvalue reference arguments in preparation for P2418.
The handle class also has some changes to prepare for P2418. The real
changed for P2418 will be done separately, but these parts make it
easier to implement that paper.

Some parts of existing test code are removed since they were no longer
valid after the changes, but new tests have been added.

Implements parts of:

  • P2418 Add support for std::generator-like types to std::format

Completes:

  • LWG3473 Normative encouragement in non-normative note

Depends on D121138

Diff Detail

Event Timeline

Mordante created this revision.Mar 12 2022, 2:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 12 2022, 2:58 AM
Herald added a subscriber: mgorny. · View Herald Transcript
Mordante requested review of this revision.Mar 12 2022, 2:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 12 2022, 2:58 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
EricWF requested changes to this revision.Mar 14 2022, 7:13 AM
EricWF added a subscriber: EricWF.

I've got a bunch of concerns about the memcpy magic happening here. How is it constexpr, how does it handle lifetimes, is it fully tested?

libcxx/include/__format/format_arg_store.h
158

These overloads are constrained in entirely different ways. Is it possible to use a single method to constrain each of them?

246

Why do some of these functions take output parameters and other return by value? An overload set should be cohesive in how you use it.

288

This comment doesn't make sense to me? If the constructors can possibly be non-noexcept, does that mean they can throw?

292

I'm very uncomfortable with this function. How does it handle alignment? non-trivial types? Is it opaque to sanitizers?
Does it leave us open to buffer overflow bugs/attacks?

306

This needs to static_assert(is_trivially_copyable<_Sp>); and do we really need to instantiate another type just to memcpy this thing?
What are the requirements on _Sp generally?

307

sizeof(__storage<_Sp>) may not always be equal to sizeof(_Sp).

libcxx/include/__format/format_args.h
35–44

Should this be explicit?

60

Is there a reason to pack the members in this order? It seems likely to generate padding for certain arg types

61–72

Have we shipped this in a release?
Because I'm not sure we can change the A-BI here then.

libcxx/include/format
191–192

This isn't specified as returning auto so you should spell out the return type. Making the compiler deduce it has costs and can potentially break SFINAE.

198

This isn't specified as returning auto so you should spell out the return type. Making the compiler deduce it has costs and can potentially break SFINAE.

libcxx/test/libcxx/utilities/format/format.arguments/format.arg/format_arg_store.pass.cpp
1 ↗(On Diff #414816)

We typically avoid testing internal details like this. Can you say more about why we really

89 ↗(On Diff #414816)

This is undefined behavior. Don't

EricWF added inline comments.Mar 14 2022, 7:13 AM
libcxx/include/__format/format_arg_store.h
265

Why not use a fold expression here rather than recursively instantiating all these function templates.
Please find a way to make the number of instantiations N, not N**2

301

How is any of this constexpr?
https://godbolt.org/z/esKTzK38b

If I'm right that this isn't, it means we need a lot more tests before this lands. Because it should have been caught by tests.

This revision now requires changes to proceed.Mar 14 2022, 7:13 AM
Mordante edited the summary of this revision. (Show Details)Mar 14 2022, 10:12 AM
Mordante marked 9 inline comments as done.Mar 14 2022, 2:36 PM

Thanks for your review comments! I'll look at the other comments at another time.

libcxx/include/__format/format_arg_store.h
265

I never realized you could use fold expressions on the template types.
Thanks for your suggestion!

288

They don't throw. The constructor explicit basic_format_arg(const char_type* s); isn't no except but it doesn't throw.

292

It doesn't handle alignment it stores all elements packed in a buffer as suggested in http://eel.is/c++draft/format#args-1

An instance of basic_­format_­args provides access to formatting arguments.
Implementations should optimize the representation of basic_­format_­args for a small number of formatting arguments.
[Note 1: For example, by storing indices of type alternatives separately from values and packing the former.  — end note]

The types that can be used are all trivial.
I haven't tested whether it works with sanitizers, but if it doesn't the CI will complain.
This should be safe against buffer overflow attacks. All sizes are determined compile-time based on the types.
If there are bugs in our implementation I expect them to be cause by the sanitizers in the CI.

301

It seems it indeed isn't. It seems this function shouldn't be a constexpr function. Which explains why the code works. There are no constexpr tests so that's the reason the tests didn't catch it either.

306

Yes we need that to convert certain types. For example all signed integrals where sizeof(_Tp) < sizeof(int) are converted to an int. The stored types match the types in the basic_format_arg value "variant". (This variant is exposition only and not used on the code, but these are the types stored in our implementation.)

307

I'm not convinced that's true. But it doesn't matter, the code copies __s.__value. __s.__value has the type _Sp.

libcxx/include/__format/format_args.h
35–44

Unfortunately no. The Standard specifies a non-explicit single argument constructor (http://eel.is/c++draft/format#args-3).

60

I'm not entirely sure what you mean. This always stores a size_t and two pointers regardless of the arg types.

I'm not aware of platforms where the sizes of pointers and size_t differ, is there a specific platform you have in mind?

61–72

We have shipped this, but vendors need to opt-in to it due to future API/ABI breaks.
We also advertise <format> as API/ABI-unstable.
There are more API/ABI breaking changes planned in the code. For example

During your absence we have identified this as an issue and we need to find a better solution to be able to implement large features while being able to ship stable code. (Ranges have the same issue. There the number of retroactively accepted papers is even larger.)

libcxx/test/libcxx/utilities/format/format.arguments/format.arg/format_arg_store.pass.cpp
1 ↗(On Diff #414816)

I usually also tend to avoid that. In this case I think the test if valuable. During development of the code it make it possible to quickly test whether things worked as expected. Here I test whether the buffers have the expected sizes. When these tests fail there are possible buffer bugs. The indirect tests don't give me the same confidence level.

Mordante marked 14 inline comments as done.Mar 15 2022, 10:23 AM
Mordante added inline comments.
libcxx/include/__format/format_arg_store.h
158

You mean like using requires in all of them?

246

Based on your remarks and the remark of the fold expression I've removed these overloads and I'm using fold expressions instead.

libcxx/include/format
191–192

Normally I agree, but in this case the return type is a bit awkward. The change implements parts of P2418 but not all. After P2418 is implemented it's easy to make it a concrete type as is. (The awkward part is the due to the helper function at line 185.)

I've added a TODO to remind me to change it back.

Same for the next comment.

libcxx/test/libcxx/utilities/format/format.arguments/format.arg/format_arg_store.pass.cpp
89 ↗(On Diff #414816)

What's exactly UB? The cast is valid, it becomes UB when the test tries to write to cstring.

Mordante marked 3 inline comments as done.Mar 15 2022, 10:23 AM
Mordante updated this revision to Diff 416700.Mar 19 2022, 8:15 AM

Rebased, addresses review comments, remove concepts guards.

Mordante added inline comments.Mar 19 2022, 11:38 AM
libcxx/test/std/utilities/format/format.formatter/format.formatter.spec/formatter.handle.pass.cpp
11

Since GCC-12 is expected to be released in about a month I prefer to first see whether it works there out of the box.

Mordante updated this revision to Diff 416790.Mar 20 2022, 10:00 AM

Attempt to fix CI.

Mordante updated this revision to Diff 416791.Mar 20 2022, 10:10 AM

Fix modular build.

Mordante updated this revision to Diff 416796.Mar 20 2022, 12:10 PM

Another attempt to fix the CI.

Mordante updated this revision to Diff 417007.Mar 21 2022, 10:33 AM

This should fix the clang-13 CI, test whether the other CIs are happy too.

Mordante updated this revision to Diff 417198.Mar 21 2022, 11:51 PM

The last build passed the CI, cleanup before review.

Mordante added inline comments.Mar 23 2022, 2:28 PM
libcxx/include/__format/format_arg_store.h
305

Originally __store_data_element was a lambda function. This however resulted in an ICE.
This looks like https://github.com/llvm/llvm-project/issues/22200

vitaut added a comment.EditedMar 27 2022, 8:46 AM

You could do much better by having a single array and storing the type alternatives for the common case of small(ish) number of arguments in an integer. This is essentially what {fmt} does. I think the current implementation is based on Microsoft's which I'd strongly discourage.

You could do much better by having a single array and storing the type alternatives for the common case of small(ish) number of arguments in an integer. This is essentially what {fmt} does. I think the current implementation is based on Microsoft's which I'd strongly discourage.

I've developed this approach independently, but when I watched Charlie's CppCon talk I noticed the similarities between Microsoft's an my solution. Can you explain why you dislike this approach?

vitaut added a comment.Apr 1 2022, 7:21 AM

Can you explain why you dislike this approach?

The codegen is strictly worse compared to {fmt}'s approach where the type alternatives are passed in a single register in a common case. See e.g. https://godbolt.org/z/sT54fcGET.

Mordante planned changes to this revision.Apr 2 2022, 6:22 AM

Can you explain why you dislike this approach?

The codegen is strictly worse compared to {fmt}'s approach where the type alternatives are passed in a single register in a common case. See e.g. https://godbolt.org/z/sT54fcGET.

Thanks for the explanation. I'll look into it.

Mordante updated this revision to Diff 419998.Apr 2 2022, 10:15 AM

Applies @vitaut's implementation approach.

Mordante updated this revision to Diff 419999.Apr 2 2022, 10:43 AM

Fixes CI. (Locally tested with D122534 already applied.)

Mordante updated this revision to Diff 421718.Apr 9 2022, 4:44 AM

Rebased, minor improvements, and updated to new test method.

vitaut requested changes to this revision.Apr 16 2022, 7:53 AM

Looks much better, thanks. A few comments inline.

libcxx/include/__format/format_arg.h
78–79

Do I miss something or you reverse the order of arguments i.e. the first argument is stored in higher bits? Why? This makes the constants bigger and logic slightly more awkward for no good reason.

libcxx/include/__format/format_arg_store.h
45

nit: else seems redundant here and everywhere in this function

80–84

What is this struct for? Why not use __arg_t and the "normalized" type directly instead of wrapping in a struct?

203

Looks like this is the only usage of __make_storage_type which seems weird because only __storage_type::type is used. This suggests that instead of returning __storage_type __make_storage_type better return the "mapped" type itself and possibly be renamed. Perhaps "normalize" is a better name because essentially what this function does is normalizes the type by stripping off traits and mapping numeric types into canonical form.

204–226

I think it would be cleaner to move this into the function that does normalization by making it take an argument instead of splitting type and value handling.

This revision now requires changes to proceed.Apr 16 2022, 7:53 AM
Mordante marked 3 inline comments as done.Apr 19 2022, 11:31 AM

Thanks for the feedback! I'll look into your suggestions.

libcxx/include/__format/format_arg.h
78–79

Good point about the size of the constants. I'll change this.

libcxx/include/__format/format_arg_store.h
45

All but the last are redundant, however I prefer to use else in a if constexpr since the else isn't always redundant.

80–84

The __arg indeed seems not needed. But the __make_storage is used in the next patch in the series. So it still has its merits.

204–226

I think I need both for the basic-format-string changes. Based on your feedback I'll look at some alternative approaches.

Mordante marked 5 inline comments as done.Apr 20 2022, 10:32 AM
Mordante added inline comments.
libcxx/include/__format/format_arg_store.h
204–226

It was easier to let __make_storage_type (renamed to __determine_arg_t) return the __arg_t and let this function do the normalization based on the __arg_t. I went with this direction since I use the __arg_t in the basic-format-string. In that code I've no need for normalization since that code only cares about the types.

Mordante updated this revision to Diff 423967.Apr 20 2022, 10:54 AM

Addresses review comments.

Mordante updated this revision to Diff 427686.May 6 2022, 11:20 AM

Rebased to trigger CI.

vitaut accepted this revision.May 8 2022, 6:51 AM
Mordante accepted this revision.May 18 2022, 11:07 AM
This revision was not accepted when it landed; it landed in state Needs Review.May 18 2022, 11:11 AM
This revision was automatically updated to reflect the committed changes.
libcxx/test/libcxx/utilities/format/format.arguments/format.arg/visit_format_arg.pass.cpp