Page MenuHomePhabricator

[libc++][format] Adds parser std-format-spec.

Authored by Mordante on May 29 2021, 1:09 PM.



This implements the generic std.format.spec framework for all types.

The Unicode support will be added in a separate patch.

Implements parts of:

  • P0645 Text Formatting


  • LWG-3242 std::format: missing rules for arg-id in width and precision
  • P1892 Extended locale-specific presentation specifiers for std::format

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Mordante added inline comments.Jun 27 2021, 6:42 AM

I personally like to report errors as early as possible. So for types where the sign makes no sense, for example strings, I like to report them as soon as they're found. For the other types I indeed still need post processing based on the type.


Removed. The if (__r.first == __end) validation in this function has also been removed.


I think that can lead to errors. When _Type is uint64_t containing UINT32_MAX + 1 it will return 1 without this test.


This was on "the 999.999.999 maximum"-will-never-be-used-in-practice assumption. I've adjusted the code to handle every non-negative int32_t.


true :-)


Yes that is now handled in the replacement field of the previous patch.

Mordante updated this revision to Diff 354745.Jun 27 2021, 7:12 AM
Mordante marked 8 inline comments as done.

Addresses review comments:

  • Replacement field shouldn't require a terminating '}'
  • Change error strings
  • Allow precision and width to use the whole positive range of int32_t
  • Update module map
  • Several minor improvements
Mordante updated this revision to Diff 356978.Jul 7 2021, 9:01 AM

Only test the exact exception messages when testing libc++, as noted in D96664.

Mordante updated this revision to Diff 357275.Jul 8 2021, 10:47 AM

Use new monostate header.

vitaut requested changes to this revision.Jul 11 2021, 9:01 AM
vitaut added inline comments.

Fixed should have lower and uppercase variants too (f and F).


should I file a defect report?

Please do open a LWG issue. The "character" term is ambiguous but if interpreted in the C++ sense the current wording is clearly wrong because you cannot even use box drawing characters (in the Unicode sense). We can discuss the details in the LWG issue.


In addition to being conceptually wrong having these checks penalizes many common cases. I strongly suggest making all __parser_no_* cases noops. The only reason the standard wording uses a single grammar for built-in and string types is simplicity but in reality these are separate per-type grammars and should be treated as such. Moreover, those unnecessary checks won't apply to chrono and UDTs potentially adding more confusion.


I suggest moving this check into __substitute_arg_id which will allow having one check instead of two for the common case of signed width.


So if the user passes a precision of INT32_MAX it will be treated as "no precision" (and asserted on in debug mode) which is somewhat arbitrary and may lead to the same problems I mentioned earlier. To avoid this you could use two values above INT32_MAX (e.g. INT32_MAX + 1 and INT32_MAX + 2 to denote "no precision" and __precision_as_arg respectively.

This revision now requires changes to proceed.Jul 11 2021, 9:01 AM
Mordante marked 2 inline comments as done.Jul 12 2021, 12:36 PM

Thanks for the review!


The upper and lower case are formatted the same. So the the type function maps both f and F to fixed.


I guess I need more comment for this code ;-)
When the format-spec has a value __precision_as_arg == 0.
When the format-spec has an arg-id all arg-id's except for INT32_MAX are a valid arg-id. If the arg-id gets resolved to INT32_MAX this value is considered a limit.

vitaut added inline comments.Jul 13 2021, 8:40 AM

Not quite, NaN and infinity are formatted differently.


In other words INT32_MAX is already handled correctly with the current approach or have I misunderstood your comment?

Mordante marked 6 inline comments as done.Jul 14 2021, 11:12 AM
Mordante added inline comments.

I forgot about these cases.


Thanks for the additional explanation. I now see what you meant before. For now I'll make them noops. I'll also add a TODO because with this new insight I might want to look at a cleaner solution later.


__substitute_arg_id is also used to substitute the precision. A precision of 0 is valid. I'll give it some thought, but at the moment I don't see a good alternative.


Yes INT32_MAX is currently handled correctly.

Mordante marked 4 inline comments as done.Jul 15 2021, 11:11 AM
Mordante updated this revision to Diff 359062.Jul 15 2021, 11:15 AM

Made module headers private
Addresses review comments
Removed unit tests no longer required due change to the parser_no changes
Marked other unit tests as libc++ specific

vitaut accepted this revision.Jul 17 2021, 6:31 AM

Looks great, thanks for addressing the comments!

vitaut added inline comments.Jul 17 2021, 7:00 AM

So have you or do you plan to open an LWG issue? Otherwise I can do it.

Mordante added inline comments.Jul 18 2021, 6:53 AM

I plan to file an LWG issue. But I first want to address all other review comments. It's on my TODO list.

Mordante added inline comments.Jul 20 2021, 10:13 AM

s/_LIBCPP_INLINE_VISIBILITY/_LIBCPP_HIDE_FROM_ABI/ here and all other occurrences in this patch.

Mordante updated this revision to Diff 362991.Jul 30 2021, 1:27 AM
  • Rebased and updated to work against main.
  • Remove [[nodiscard]]s.
  • Refactor the std-format-spec parser. Since the parser_no_ classes are a kind of useless dummy they can be removed. Due to this change several tests no longer that useful, they are removed.
  • Test with designated initializers to improve readability of the tests. When this passes the CI I want to improve more of the tests.
Mordante planned changes to this revision.Jul 30 2021, 1:30 AM

When the build passes I want to polish the unit tests a bit more.

Apologies for the AArch64 test timeout. There are some issues on our end that I'm currently working on.

Apologies for the AArch64 test timeout. There are some issues on our end that I'm currently working on.

Thanks for the update.

Mordante updated this revision to Diff 363029.Jul 30 2021, 4:07 AM

Retarget for llvm 14.
Improve unit tests by using designated initializers.

Apologies for the AArch64 test timeout. There are some issues on our end that I'm currently working on.

(as you can see from the build) This has now been resolved.

Mordante marked 2 inline comments as done.Aug 10 2021, 10:56 AM
Mordante added inline comments.
Mordante updated this revision to Diff 370723.Sep 4 2021, 3:00 AM

Rebased against main.
Adds missing includes

ldionne requested changes to this revision.Sep 9 2021, 3:56 PM

Nothing major, I'm leaving the correctness of the implementation to @vitaut, who already reviewed it.


You don't need this (nor the _LIBCPP_POP_MACROS at the end) if you don't use min() or max() in this file.


This assertion message is not very helpful, we should explain *why* it's unexpected that begin == end instead.


What do you mean by "freestanding" here?

Also, this header should include what it uses.


We don't want to print error messages like that in the tests, specifically because that doesn't work on some platforms where stderr & friends are not supported. I'd rather just remove this if constexpr. Same goes for below.

This revision now requires changes to proceed.Sep 9 2021, 3:56 PM
Mordante marked 6 inline comments as done.Sep 15 2021, 10:09 AM
Mordante added inline comments.

Good catch, this indeed looks weird.


I'm not too happy with this change, it's what clang-format did. I created D109835 containing an updated clang-format settings so this won't be undone when I rerun clang-format.


Freestanding means the header depends on the state of the translation unit before the header itself. Hence it requires no includes.
It depends on the parser type in the test using it. I changed the unit tests to supply that argument as template argument, so the header can be a normal header.


I expect stderr to be valid on all platforms what do not define _LIBCPP_HAS_NO_LOCALIZATION. AFAIK this macro should be the proper guard.
I don't mind removing them too much, so I'll remove them.

Mordante updated this revision to Diff 372747.Sep 15 2021, 10:36 AM
Mordante marked 4 inline comments as done.

Addresses review comments.
Ran clang-format with the settings added in D109835.
Adds a small comment.

ldionne accepted this revision.Sep 20 2021, 5:26 PM
This revision is now accepted and ready to land.Sep 20 2021, 5:26 PM
This revision was automatically updated to reflect the committed changes.