This is an archive of the discontinued LLVM Phabricator instance.

[libc++][format] Adds formatter floating-point.
ClosedPublic

Authored by Mordante on Nov 16 2021, 7:11 AM.

Details

Reviewers
ldionne
vitaut
Group Reviewers
Restricted Project
Commits
rGdb2944e34b16: [libc++][format] Adds formatter floating-point.
Summary

This properly implements the formatter for floating-point types.

Completes:

  • P1652R1 Printf corner cases in std::format
  • LWG 3250 std::format: # (alternate form) for NaN and inf
  • LWG 3243 std::format and negative zeroes

Implements parts of:

  • P0645 Text Formatting

Diff Detail

Event Timeline

Mordante created this revision.Nov 16 2021, 7:11 AM
Mordante requested review of this revision.Nov 16 2021, 7:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 16 2021, 7:11 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
Mordante planned changes to this revision.Nov 16 2021, 7:12 AM
Mordante updated this revision to Diff 387989.Nov 17 2021, 10:40 AM

Rebased.
Should fix all CI issues.
Adds more tests.

Still not ready for review.

Mordante planned changes to this revision.Nov 17 2021, 10:40 AM
Mordante updated this revision to Diff 389600.Nov 24 2021, 12:49 PM

More improvements and tests.
Adds benchmarks.
Still WIP, but getting close to be reviewed.

Mordante planned changes to this revision.Nov 24 2021, 12:50 PM
Mordante updated this revision to Diff 390195.Nov 28 2021, 4:26 AM

Some more cleanups and attempt to fix the CI.

Mordante updated this revision to Diff 390197.Nov 28 2021, 5:05 AM

Guard against macro's named min or max.

Mordante updated this revision to Diff 391911.Dec 5 2021, 7:08 AM

Fixes and improves the benchmark.

Mordante added inline comments.Dec 5 2021, 7:12 AM
libcxx/benchmarks/formatter_float.bench.cpp
92

File a bug report about this issue when GitHub issues are activated.
cppreference https://en.cppreference.com/w/cpp/numeric/random/uniform_real_distribution
has a link to https://hal.archives-ouvertes.fr/hal-03282794/document
a better way to get the proper distribution.

Mordante retitled this revision from [libc++][format] WIP Adds formatter floating-point. to [libc++][format] Adds formatter floating-point..Dec 5 2021, 8:35 AM
Mordante edited the summary of this revision. (Show Details)
Mordante added reviewers: ldionne, vitaut.
Mordante updated this revision to Diff 395275.Dec 18 2021, 2:14 AM
  • Rebased.
  • Constrain the formatters so they won't be enabled for _CharT != char and _CharT != wchar_t. These constrains will be validated in an upcomming patch
ashermancinelli added inline comments.
libcxx/include/__format/formatter_floating_point.h
51
542
565
645

As commented inline, to_chars is not a great API for implementing this because you have to do extra work to reparse the output. But it's probably OK for the initial implementation.

libcxx/benchmarks/formatter_float.bench.cpp
92–93

I am not sure if uniform distribution over all FP numbers is very meaningful for benchmarking. The perf of FP formatting methods often depends on the number of significant digits which for these inputs will likely be leaning towards the maximum. A better way of generating inputs can be found in https://github.com/miloyip/dtoa-benchmark.

201

Note that 6 is the default for fixed, general and scientific so this test is effectively the same as None on these formats.

libcxx/include/__format/formatter.h
209

nit: I'd suggest renaming __trailing_zero to something reflecting that it's the number of trailing zeros, e.g. __num_trailing_zeros or at least __trailing_zeros. Right now it looks like a flag indicating whether to write trailing zeros.

libcxx/include/__format/formatter_floating_point.h
28

What is this for?

127

You can do better than that by generating significant digits only which gives maximum precision of 767: https://www.exploringbinary.com/maximum-number-of-decimal-digits-in-binary-floating-point-numbers/. This will require replacing to_chars with something more efficient/lower-level and can be done later. This will also avoid dynamic memory allocation.

145–146

There seems to be something missing grammatically in

At the moment that isn't important floats and doubles
with a "small" precision will always use a stack buffer.

239–244

Switching away from to_chars will also get rid of this reparsing overhead.

Mordante planned changes to this revision.Dec 27 2021, 10:33 AM
Mordante marked 10 inline comments as done.

Thanks both your reviews!

libcxx/benchmarks/formatter_float.bench.cpp
92–93

I'll have a look at this later.

201

Good point.

libcxx/include/__format/formatter.h
209

Good point, I'll use __num_trailing_zeros as suggested.

libcxx/include/__format/formatter_floating_point.h
28

It uses some parts in this header. I probably want to refactor this later.
(When making the formatter's format functions const members and address the fill character allowing a code unit or grapheme cluster I want to look at a slightly different approach for the parser. But it depends on the outcome of the fill character's LWG issue.)

51

I'll leave this one, this block is copy-pasted several times. I actually hope we soon only support compilers with proper concepts support so I can remove this guard from a lot of files.

127

I had a quick look at that article and it looks interesting, thanks! I'll have a closer look later. Even when I implement this method I still need dynamic allocation. The article doesn't mention long double only IEEE quad. But the number of digits is similar, which means there are over 10000 significant digits possible.
(I haven't implemented this since to_chars casts a long double to a double. So I could remove the code for a double, but then I've to re-add it for long double.

Since to_chars is in the dylib it's possible make modifications to its implementation. I've considered before to have a version returning more status information; like the location of the radix point and exponent. I may pursue that at a later time. For now I much rather focus on completing <format> and optimizing this header for size and speed.

239–244

I agree, but that will be quite a bit of effort.

Mordante marked 6 inline comments as done.Dec 27 2021, 10:33 AM
Mordante updated this revision to Diff 396400.Dec 28 2021, 8:54 AM

Addresses review comments.

vitaut added inline comments.Jan 1 2022, 8:01 AM
libcxx/include/__format/parser_std_format_spec.h
750

The separation of concerns between parse and __parse is unclear, they even have identical apidoc comments. I think __parse better be merged into parse.

Mordante marked an inline comment as done.Jan 2 2022, 8:02 AM
Mordante added inline comments.
libcxx/include/__format/parser_std_format_spec.h
750

__parse does the low parsing, and parse does the validation. I've used this methods for other parser. For now I like to keep it that way.

On a side note I noticed that since a formatter is-a parser and not has-a parser there are a lot of instantiations of the same parser code. I've already experimented with an "has-a" approach. But before moving that patch further I like to see what the resolution for LWG3576 "Clarifying fill character in std::format" will be.

That improvement will remove parts of the code duplication and probably improve the binary size.

Mordante marked 2 inline comments as done.Jan 3 2022, 10:37 AM
Mordante added inline comments.Jan 10 2022, 9:52 AM
libcxx/include/__format/parser_std_format_spec.h
750

I did some tests with this different approach and it indeed reduces the binary size. Testing my changes on main saves about 12 KB.

vitaut added inline comments.Jan 12 2022, 10:13 AM
libcxx/include/__format/parser_std_format_spec.h
750

Testing my changes on main saves about 12 KB.

Nice! Looking forward to the diff.

Mordante updated this revision to Diff 399695.Jan 13 2022, 9:12 AM

Rebased to trigger CI and fix some tests broken after rebasing.

Mordante updated this revision to Diff 399705.Jan 13 2022, 9:43 AM

Fixing the CI by sorting the headers.

Overall looks good, just a bunch of minor(ish) comments inline.

libcxx/include/__format/formatter_floating_point.h
60

Can to_chars fail for any reason other than small buffer size?

88

I'd add that many of these fractional digits are zeros and don't have to be generated as a potential future optimization.

97

I suggest renaming __max_precision to something like __max_fractional for consistency with __max_integral and since precision can refer to the number of integral and fractional digits (depending on the presentation format).

98

This is not an exponent but a character that introduces it. Maybe replace with e/E/p/P?

100

This looks like the number of digits in the minimum (subnormal) exponent so I suggest renaming accordingly.

Moreover, I don't think you can produce a number that contains both maximum fractional digits and an exponent because for the former you have to use the fixed format which doesn't have an exponent, e.g. (https://godbolt.org/z/jsrPqsEEb)

fmt::print("{:.1074f}", 4.940656458412e-324);

produces:

0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000004940656458412465441765687928682213723650598026143247644255856825006755072702087518652998363616359923797965646954457177309266567103559397963987747960107818781263007131903114045278458171678489821036887186360569987307230500063874091535649843873124733972731696151400317153853980741262385655911710266585566867681870395603106249319452715914924553293054565444011274801297099995419319894090804165633245247571478690147267801593552386115501348035264934720193790268107107491703332226844753335720832431936092382893458368060106011506169809753078342277318329247904982524730776375927247874656084778203734469699533647017972677717585125660551199131504891101451037862738167250955837389733598993664809941164205702637090279242767544565229087538682506419718265533447265625

(note that there is no exponent)

And if you use some other format, you'll get an exponent but no leading zeros so buffer requirements will be lower.

Therefore you can remove exponent, sign and __max_precision_value from these computations and simplify the trait.

100

Also you might want to add formatting of the minimum subnormals to tests because it's an interesting corner case and may reveal buffer issues if the test suite is run with sanitizers.

127

Leaving it as a future optimization makes sense but I suggest adding a note about not needing to generate zeros (suggested in another comment above).

162

nit: "from '0' to L'0'" -> "from narrow to wide" since the buffer will contain characters other than '0'.

192

__trailing_zero_ -> __num_trailing_zeros_ for consistency with other places and the accessor.

195

Do you need to store the allocator? Is it stateful?

301

Isn't exponent always produced in hex format?

393

zero's -> zeros

Also I don't understand this comment. What does it have to do with printf and what do you mean by not available? Do you mean that # is not yet implemented here?

394

nit: __remove_trailing_zero -> __remove_trailing_zeros since there can be more than one

542

positive -> nonnegative

553

You might want to outline the rest of this function because it doesn't depend on _Tp.

652

litter -> littered (although I'm not sure if this historical context about previous implementation is useful at all)

660–661

This and almost all other function template here don't need to be defined in a header because they are only instantiated for a few known types (float/double/long double). Moving them out of the header would be beneficial for build times (esp. since you can also move includes).

libcxx/include/__format/parser_std_format_spec.h
489–490

Why not set alignment unconditionally? Numeric alignment is just a variation of right alignment and you already have __zero_padding to check for it.

libcxx/test/libcxx/utilities/format/format.string/format.string.std/std_format_spec_floating_point.pass.cpp
187

The error message could be improved. First why "the" format-spec and second format-spec doesn't have a numeric value. Something along the lines of "Number (is) of out of range (in a format string)" would be better. Format string can probably be omitted too because it's redundant in format_error.

BTW is the error format consistent with the rest of libc++, particularly is the first word normally capitalized and do they use standardese terms?

191

This error message can be improved too because the fact that you are parsing arg-id is an implementation detail that doesn't help user. A more helpful message would say something about unmatched {, for example:

>>> "{".format(42)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Single '{' encountered in format string

I'd recommend making a pass over error messages in a separate diff and making them a bit more user-friendly, possibly looking how similar errors are reported in python which does a decent job.

libcxx/test/std/utilities/format/format.functions/locale-specific_form.pass.cpp
943

Do you test that hex digits are capitalized with A somewhere?

Mordante marked 16 inline comments as done.Jan 17 2022, 11:47 AM

Thanks for the review comments. I've answered the first batch. The others need a bit more investigation.

libcxx/include/__format/formatter_floating_point.h
60

No. According to the Standard the only error it can return is errc​::​value_­too_­large. http://eel.is/c++draft/charconv#to.chars-1

88

I've added a todo and a link to the article you shared earlier.

98

I've changed it to exponent character, thinks that's clearer than only the letters.

162

Actually this is only about zeros. This explains why the digits that are always zero at the end aren't stored in the buffer. Still I think changing the wording to avoid the value is a good suggestion.

192

Seems I missed updating them after changing the accessor :-(

195

Fair point, it's the default so it should be stateless.

652

I like to keep the historic comment, it explains why the current approach was chosen.

660–661

This and other code could be moved to the dylib. There's one big disadvantage of the dylib. As soon as the function is exported in the dylib its ABI is frozen. As long as the code is still under development I don't want to freeze the ABI. Especially this function will change once we have proper long double support. Moving code to the dylib is on my todo list, but I want to wait until the code has matured a bit further.

For example the this->__sign on the next line will change when the formatter no longer inherits from a parser. That will be impossible when I move it to the dylib now.

libcxx/include/__format/parser_std_format_spec.h
489–490

There are some catches with with alignment and zero padding. When the user selects zero padding and sets the alignment to zero padding is ignored. When parsing it's not certain what the default alignment will be, for example with a boolean it depends on the display type.

I agree this code isn't the cleanest and I'm not happy with it. I still want to look at another solution. I even wonder whether I want to implement the old '=' alignment internally. But before changing it I also want to look at how you implemented it.

libcxx/test/libcxx/utilities/format/format.string/format.string.std/std_format_spec_floating_point.pass.cpp
187

For now I'll keep this message, it's also used for all existing parsers where the width exceeds the maximum supported value.

191

I agree. I already had a talk with some of the other libc++ maintainers on Discord regarding this subject. I like your suggestion to look at python. But indeed that will be a separate diff.

vitaut added inline comments.Jan 17 2022, 12:30 PM
libcxx/include/__format/formatter_floating_point.h
660–661

Moving code to the dylib is on my todo list, but I want to wait until the code has matured a bit further.

Sounds reasonable. Thanks for clarifying.

ldionne added inline comments.Jan 18 2022, 8:08 AM
libcxx/include/__format/formatter.h
211

Why is this an important precondition? Let's write that instead of just telling the user what they already know (that it's a precondition failure).

212

This should be worded not to imply that the user is the one "using" that overload. If it ever shows up in a user's crash report, it would most likely be indicative of a libc++ bug, not something on their end. I think The overload not writing trailing zeros should have been used would be better.

libcxx/include/__format/formatter_floating_point.h
104

Should we file an issue on GitHub for this?

143

Would it be ABI breaking to change this after we've shipped it (I'm asking cause it would seem to change the layout of this type)? If so, and if it's actually likely that we'll change that in the future, we might want to allocate from the heap right away to avoid being stuck later on.

183–188

_LIBCPP_HIDE_FROM_ABI throughout.

195

Same question -- I would suggest not storing it. Or if we store it, at least use [[no_unique_address]].

225–228

_VSTD::find(__first, __last - 3, 'e')? I agree it feels a bit weird to use an algorithm for at most 3 steps, but I guess it makes the code a bit clearer. Not binding, just a suggestion if you agree it's clearer.

361

This needs to be qualified. I haven't paid attention elsewhere, but I suggest you take a quick look at all your function calls and qualify them when necessary (or all the time so that we don't even have to ask ourselves).

549

If we had a GitHub issue for long double support in to_chars, we could reference it here and in the other places. Then we'd only have to grep for the issue number to see everything we can fix once that is done.

660–661

Indeed, FWIW I support keeping everything in the headers and marked with _LIBCPP_HIDE_FROM_ABI for the time being, since that means we don't have to commit to keeping these implementation detail functions around forever (which is the case when we move them to the dylib). Once everything is stable, we can "outline" them to the dylib as a size optimization.

Mordante updated this revision to Diff 400897.Jan 18 2022, 10:02 AM
Mordante marked 27 inline comments as done.

Addresses review comments.

Thanks for the reviews!

libcxx/include/__format/formatter.h
211

It was copy pasted and not needed here. I'll remove it.

libcxx/include/__format/formatter_floating_point.h
100

I've already a test for it, I mentioned you in a comment at that place.

100

I want to optimize the storage size when looking at the optimization to not store unneeded zeros. Then I'll determine the proper buffer sizes.

104

I can do that, probably the same for some other of the TODO's. I'll have a look at it later.

143

The struct already can be used for local and heap storage, so I foresee no ABI break. I expect this code can be moved to the dylib in the future, which will solve all ABI implications.

195

I already removed it locally, but not yet uploaded since I had some more comments to address.

225–228

I had that originally but wasn't too happy with it. The code would be something like:

__first = _VSTD::find(__first, __last - 3, 'e');
if(__first != __last - 3)
  return __first;

When there's no match the function should return __last and not __last - 3.

In general I prefer to use the algorithms, but in this case I like the current code better.

301

Yes this seems to be a copy-paste issue. The _LIBCPP_ASSERT at line 312 will fail when this isn't the case.

393

The comment is outdated and can be removed.

549

That's a fair point. I think it would be good to think about how we want to organize missing features in GitHub first.

553

It depends on _Fp in the buffer. Since _Fp is only temporary that it doesn't depend on _Tp. So the benefit would only be temporary. So I prefer to keep it as is for now.

libcxx/test/std/utilities/format/format.formatter/format.context/format.formatter.spec/formatter.floating_point.pass.cpp
427

@vitaut The subnormal value is tested here.

libcxx/test/std/utilities/format/format.functions/locale-specific_form.pass.cpp
943

I see I don't added them.

Quuxplusone added inline comments.
libcxx/include/__format/formatter.h
211

_LIBCPP_ASSERT(__num_trailing_zeros > 0, "The overload not writing trailing zeros should have been used");
I haven't looked really closely at the logic around this, but it feels very weird that we have two different overloads — one taking __num_trailing_zeros but internally checking that it's never zero, and one not taking __num_trailing_zeros and internally assuming that it's always zero. Then, in the caller (new line 603), we have if (__num_trailing_zeros) /*call the first overload*/ else /*call the second overload*/. Couldn't we just have one overload, and no precondition, and do the if test inside the callee?

Also, this smells very much like "When Should You Give Two Things the Same Name?". Why do you need the two overloads to both be named __write, so that accidentally omitting an argument will do the wrong thing? Could you simply name them __write and __write_with_trailing_zeros instead?
If so, then even if you keep everything else the same, your precondition-failure message can use the greppable and unambiguous name "__write_with_trailing_zeros" in place of the verbose epithet "The overload not writing trailing zeros".

Mordante added inline comments.Jan 20 2022, 12:03 PM
libcxx/include/__format/formatter.h
211

I prefer to keep it as is, but I'll adjust the assertion message in the next revision/before landing.

(I'm looking at some size optimizations and I expect some of this code will be changed in the future.)

vitaut accepted this revision.Jan 23 2022, 6:42 AM

LGTM

ldionne accepted this revision.Jan 24 2022, 7:58 AM
This revision is now accepted and ready to land.Jan 24 2022, 7:58 AM
This revision was automatically updated to reflect the committed changes.