This is an archive of the discontinued LLVM Phabricator instance.

[libc++][format] Adds integer formatter.
ClosedPublic

Authored by Mordante on May 31 2021, 10:39 PM.

Details

Summary

Implements the formatter for all fundamental integer types
(except char, wchar_t, and bool).
[format.formatter.spec]/2.3
For each charT, for each cv-unqualified arithmetic type ArithmeticT other
than char, wchar_t, char8_t, char16_t, or char32_t, a specialization

template<> struct formatter<ArithmeticT, charT>;

This removes the stub implemented in D96664.

As an extension it adds partial support for 128-bit integer types.

Implements parts of:

  • P0645 Text Formatting
  • P1652 Printf corner cases in std::format

Completes:

  • LWG-3248 #b, #B, #o, #x, and #X presentation types misformat negative numbers

Diff Detail

Event Timeline

Mordante created this revision.May 31 2021, 10:39 PM
Mordante requested review of this revision.May 31 2021, 10:39 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 31 2021, 10:39 PM
Herald added a reviewer: Restricted Project. · View Herald TranscriptMay 31 2021, 10:39 PM
Mordante updated this revision to Diff 348884.May 31 2021, 11:01 PM

Adds a required inline, fixes the build issue.

Mordante updated this revision to Diff 348982.Jun 1 2021, 8:20 AM

Cast to the proper signedness that should fix the AArch and ARM builds.

Mordante updated this revision to Diff 349658.Jun 3 2021, 12:44 PM
  • Rebase
  • Fix octal buffer size off by 1.
  • Add test for maximum integral values.
  • Fixes Windows issue
  • Polishing
Mordante edited the summary of this revision. (Show Details)Jun 5 2021, 4:16 AM
Mordante added reviewers: curdeius, ldionne, miscco, vitaut.
Mordante updated this revision to Diff 355451.Jun 29 2021, 11:02 PM
Mordante edited the summary of this revision. (Show Details)

Rebased and updated:

  • Address some changes from previous patches
  • Minor cleanups and comment improvements
  • Added a module map
vitaut requested changes to this revision.Jul 17 2021, 8:48 AM
vitaut added inline comments.
libcxx/include/__format/formatter_integer.h
112–117

AFAICS to_chars supports 128-bit integers (https://godbolt.org/z/7GTPrcq88) or am I missing something?

libcxx/include/__format/formatter_integral.h
72

Did you mean "combining"?

89

Why not make these protected to keep the public API of formatter specializations clean?

137

Here and elsewhere the default alignment should depend on the presentation type: http://eel.is/c++draft/format#tab:format.align

i.e. format("{:10}", true) is left-aligned while format("{:10d}", true) is right-aligned.

232

#ifdef can be omitted because there is nothing that use std::locale in __determine_grouping. It will just be unused if locales are disabled.

305

Why store __negative instead of passing it as an argument to __format_unsigned_integral?

315

nit: I'd rename it to __format_as_char because __value is not a char but an integer which we convert and format as char.

316–317

Isn't the alignment supposed to be resolved at parse time by calling __handle_char / __handle_bool?

330

_CharT is already signed so make_signed_t doesn't do anything. Did you mean make_unsigned_t?

libcxx/test/std/utilities/format/format.functions/tests.inc
380–381

Per my earlier comment this should indeed be left-aligned.

This revision now requires changes to proceed.Jul 17 2021, 8:48 AM
vitaut added inline comments.Jul 18 2021, 6:13 AM
libcxx/include/__format/formatter_integral.h
549

In general formatters should only contain information derived from the format string, not from the value. Among other things this would inhibit format string compilation in the future. Please remove.

Mordante marked 11 inline comments as done.Jul 18 2021, 9:42 AM

Thanks for the review!

libcxx/include/__format/formatter_integer.h
112–117

libc++ accepts 128-bit integers, but silently truncates them to 64-bit. IMO this is a bug and it's on my TODO to fix this.
It's visible in the generated assembly, it contains a`std::1::itoa::__pow10_64` lookup table instead of a std::__1::__itoa::__pow10_128.

libcxx/include/__format/formatter_integral.h
89

Good catch!

137

This function is only called when a char is used for the char representation is selected.
Only the __handle_integer is used slightly different. That was basically done to use different formatting for
format("{:6c}", 42) and format("{:6}", '*'). But I'll adjust __handle_integer.

See my response to https://reviews.llvm.org/D103433/new/#inline-1010793 for more information.

305

Basically to avoid passing an argument to a forwarding function. IMO both are a valid design choice. But as you mention in another comment "Among other things this would inhibit format string compilation in the future. Please remove.". So I'll apply your suggestion.

316–317

Only partly, not for integers formatted as char. But I'll adjust this.

See my response to https://reviews.llvm.org/D103433/new/#inline-1010793 for more information.

330

Should indeed be make_unsigned_t. Odd the compiler doesn't complain about a useless cast.

libcxx/test/std/utilities/format/format.functions/tests.inc
380–381

I assumed that was intended. However my interpretation of the wording requires it to be right aligned:
This is the default for arithmetic types other than charT and bool or when an integer presentation type is specified.
My interpretation is that the supplied type in an aritmetic type, so this rule applies.
Do you agree with my interpretation?

I agree it's silly that the result of format("{:6c}", 42); differs from the result of format("{:6}", '*');. So I'll adjust it.

Mordante updated this revision to Diff 359633.Jul 18 2021, 9:44 AM
Mordante marked 7 inline comments as done.

Rebase
Addresses review comments
Contrain the formatter to the allowed character types
Use private module headers
Guard unit tests for implementation details

Quuxplusone added inline comments.
libcxx/include/__format/formatter_integer.h
112–117

That is definitely a bug; see https://godbolt.org/z/j9r7YG8aq where there's a discontinuity at base=10 (but I think all the other outputs are correct, probably?). It could be quickly patched like this:

diff --git a/libcxx/include/charconv b/libcxx/include/charconv
index 4a5c65df7c75..74b8993cf392 100644
--- a/libcxx/include/charconv
+++ b/libcxx/include/charconv
@@ -351,6 +351,9 @@ _LIBCPP_AVAILABILITY_TO_CHARS
 inline _LIBCPP_INLINE_VISIBILITY to_chars_result
 __to_chars_itoa(char* __first, char* __last, _Tp __value, false_type)
 {
+    if (sizeof(_Tp) > 8) {
+      return __to_chars_integral(__first, __last, __value, 10, false_type());
+
     using __tx = __itoa::__traits<_Tp>;
     auto __diff = __last - __first;
 
@@ -426,7 +429,7 @@ inline _LIBCPP_INLINE_VISIBILITY to_chars_result
 __to_chars_integral(char* __first, char* __last, _Tp __value, int __base,
                     false_type)
 {
-  if (__base == 10)
+  if (__base == 10 && sizeof(_Tp) <= 8)
     return __to_chars_itoa(__first, __last, __value, false_type());
 
   ptrdiff_t __cap = __last - __first;

But we'd have to add test coverage too.

Mordante marked an inline comment as done.Jul 19 2021, 10:59 AM
Mordante added inline comments.
libcxx/include/__format/formatter_integer.h
112–117

Since we've been shipping this implementation of std::to_chars for a while I didn't have a lot of urgency to fix it. I still want to address this issue later. I've identified more issues with <charconv> and I want to look at these later. For now I want to focus on the formatting library, but <charconv> is still high on my list. Of course if you want to implement this work around feel free to do so; I would be happy to review it.

Mordante marked an inline comment as done.Jul 20 2021, 10:14 AM
Mordante added inline comments.
libcxx/include/__format/formatter.h
138

s/_LIBCPP_INLINE_VISIBILITY/_LIBCPP_HIDE_FROM_ABI/ here and all other occurrences in this patch.

vitaut accepted this revision.Jul 25 2021, 8:27 AM

Looks good, thanks for addressing the comments.

libcxx/test/std/utilities/format/format.functions/tests.inc
380–381

Yeah, I think P1652 missed this part of the wording. The interpretation depends on whether you look at the original types or types after conversion per presentation specifiers. I suggest opening an LWG issue to clarify that we want the latter.

Mordante planned changes to this revision.Jul 30 2021, 4:22 AM
Mordante marked an inline comment as not done.

Needs updates due to changes in D103368.

Mordante updated this revision to Diff 364405.Aug 5 2021, 3:52 AM

Rebased and adjusted for changes in main
Retarget for LLVM 14
s/_LIBCPP_INLINE_VISIBILITY/_LIBCPP_HIDE_FROM_ABI/
Remove nodiscard
Remove obsolete unit test
Generate private module tests

vitaut added a comment.Aug 8 2021, 8:47 AM

LGTM

libcxx/include/__format/formatter_integral.h
78–79

I suggest removing the "For example the 128-bit integral formatters use the 64-bit version to do the formatting." because it's a workaround that will go away per TODO in format(__int128_t __value, auto& __ctx) rendering this example wrong. In general I don't think there is a good reason for one specialization to reuse another.

314–320

Why not use a conditional expression?

return __format_unsigned_integral(__array.begin(), __array.end(),
                                  __value, __negative, 8, __ctx, __value != 0 ? "0" : nullptr);
364–365

Maybe add a TODO to format upper-case hex in one pass? As a nice side effect you could probably also kill dependency on <algorithm> which is super bloated in 20.

Mordante added inline comments.Aug 8 2021, 12:50 PM
libcxx/include/__format/formatter_integral.h
78–79

Valid point and I think the 128-bit work-around is the only place it's used.

314–320

No specific reason not to use it. But in this case looks better with the conditional expression.

364–365

I'll add a TODO. I'm not sure I'll be able to remove <algorithm>, I use more parts of it. But I can have a look to use the new smaller <__algorithm/foo.h> headers. That's something that has been worked on after my initial version of this patch.

Mordante updated this revision to Diff 365527.Aug 10 2021, 10:06 AM
Mordante marked 4 inline comments as done.

Addresses review comments.

Mordante updated this revision to Diff 370735.Sep 4 2021, 5:52 AM

Rebased and adjusted for changes in main
s/_LIBCPP_INLINE_VISIBILITY/_LIBCPP_HIDE_FROM_ABI/

Mordante updated this revision to Diff 372760.Sep 15 2021, 11:26 AM

Rebased and updated to changes in earlier commits in the series.

ldionne accepted this revision.Oct 6 2021, 12:17 PM
ldionne added inline comments.
libcxx/include/__format/formatter_integer.h
112–117

Can we file a bug report to track this work item? I'm fearful we'll forget about it forever otherwise.

libcxx/include/__format/formatter_integral.h
490

This block is long enough that we can't easily see what #if the #endif relates to.

This revision is now accepted and ready to land.Oct 6 2021, 12:17 PM
Mordante marked an inline comment as done.Oct 6 2021, 12:31 PM
Mordante added inline comments.
libcxx/include/__format/formatter_integer.h
112–117

I've marked it as TODO's in the format related files, for example line 151 in this file
// TODO FMT Implement full 128 bit support.

So I don't fear we'll forget it. I don't consider format fully implemented before these TODO FMTs are resolved.

Mordante marked 2 inline comments as done.Oct 7 2021, 8:07 AM
This revision was automatically updated to reflect the committed changes.