This is an archive of the discontinued LLVM Phabricator instance.

[libc++][format] Adds parser std-format-spec.
ClosedPublic

Authored by Mordante on May 29 2021, 1:09 PM.

Details

Summary

This implements the generic std.format.spec framework for all types.

The Unicode support will be added in a separate patch.

Implements parts of:

  • P0645 Text Formatting

Completes:

  • LWG-3242 std::format: missing rules for arg-id in width and precision
  • P1892 Extended locale-specific presentation specifiers for std::format

Diff Detail

Event Timeline

Mordante created this revision.May 29 2021, 1:09 PM
Mordante requested review of this revision.May 29 2021, 1:09 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 29 2021, 1:09 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
Mordante updated this revision to Diff 348655.May 29 2021, 1:16 PM

Reupload, forgot to set the dependencies so CI failed.

Mordante updated this revision to Diff 348680.May 30 2021, 3:13 AM

Updates the paper status.
Fixes no-except and no-localization unit tests.

Mordante updated this revision to Diff 349296.Jun 2 2021, 9:02 AM

Rebase.
Mark unit test requiring concepts.
Minor cleanup.

Mordante updated this revision to Diff 349300.Jun 2 2021, 9:25 AM

Tabs -> spaces.

Mordante edited the summary of this revision. (Show Details)Jun 5 2021, 4:09 AM
Mordante added reviewers: curdeius, ldionne, miscco, vitaut.
vitaut requested changes to this revision.Jun 17 2021, 9:02 AM
vitaut added inline comments.
libcxx/include/__format/parser_std_format_spec.h
133

This should really be a grapheme cluster but {fmt} uses a code point at the moment (grapheme clusterization will be added later). In any case this should be a range (e.g. string_view) and not a single code unit.

139

This will have to change for the same reason.

185–186

This is incorrect because format specs can consist of a single sign specifier. Same for other "End of input" errors.

197–202

I don't think you need to have this check because it will be caught elsewhere anyway. Format specifiers are type-specific and it generally makes more sense to say what is supported rather than what is not. Same for other __parser_no_* cases.

278–280

This is redundant and only adds an overhead without substantially improving diagnostics. The error message is not particularly correct either.

308–309

It might be better to just return a large value and let the error be reported elsewhere (when the ID is actually used). It's both more efficient and gives better diagnostic without referring to internal magic limits.

322

with -> width

374

Note that any nonnegative int32_t value is a valid precision so I'm not sure __number_max is OK here.

This revision now requires changes to proceed.Jun 17 2021, 9:02 AM
vitaut added inline comments.Jun 18 2021, 7:21 AM
libcxx/include/__format/parser_std_format_spec.h
422

Looks like a stray "without".

472

This is a wrong place to do such validation because format specs may not have }.

552

Did you mean std-format-spec?

557–559

It does now.

583

This isn't correct. std-format-spec doesn't have to be terminated by anything. formatter::parse can be used to parse standalone format specs. } is part of replacement-field and should be handled there.

Mordante marked 13 inline comments as done.Jun 27 2021, 6:42 AM
Mordante added inline comments.
libcxx/include/__format/parser_std_format_spec.h
133

My interpretation of http://eel.is/c++draft/format#string.std-2

The fill character can be any character other than { or }.

is that this is intended to be a single character and not a Unicode grapheme cluster. I can't find a paper or defect report that intends to change the current behaviour. Was it intended to be part of P1868R2 "🦄 width: clarifying units of width and precision in std::format" ?

Is there a paper or defect report, or should I file a defect report?

I don't mind changing it, but for now there are some unclarities how to implement this feature:
Are 2 column Unicode characters allowed?

  • if yes, how to do odd even formatting?
  • if no, then is it a precondition or should the implementation do an width estimate?

For now I'll leave it as is, there's a TODO FMT to address this issue later.

139

Likewise will be done later.

Mordante added inline comments.Jun 27 2021, 6:42 AM
libcxx/include/__format/parser_std_format_spec.h
197–202

I personally like to report errors as early as possible. So for types where the sign makes no sense, for example strings, I like to report them as soon as they're found. For the other types I indeed still need post processing based on the type.

278–280

Removed. The if (__r.first == __end) validation in this function has also been removed.

308–309

I think that can lead to errors. When _Type is uint64_t containing UINT32_MAX + 1 it will return 1 without this test.

374

This was on "the 999.999.999 maximum"-will-never-be-used-in-practice assumption. I've adjusted the code to handle every non-negative int32_t.

557–559

true :-)

583

Yes that is now handled in the replacement field of the previous patch.

Mordante updated this revision to Diff 354745.Jun 27 2021, 7:12 AM
Mordante marked 8 inline comments as done.

Addresses review comments:

  • Replacement field shouldn't require a terminating '}'
  • Change error strings
  • Allow precision and width to use the whole positive range of int32_t
  • Update module map
  • Several minor improvements
Mordante updated this revision to Diff 356978.Jul 7 2021, 9:01 AM

Only test the exact exception messages when testing libc++, as noted in D96664.

Mordante updated this revision to Diff 357275.Jul 8 2021, 10:47 AM

Rebase.
Use new monostate header.
Use LIBCPP_ASSERT.

vitaut requested changes to this revision.Jul 11 2021, 9:01 AM
vitaut added inline comments.
libcxx/include/__format/parser_std_format_spec.h
101

Fixed should have lower and uppercase variants too (f and F).

133

should I file a defect report?

Please do open a LWG issue. The "character" term is ambiguous but if interpreted in the C++ sense the current wording is clearly wrong because you cannot even use box drawing characters (in the Unicode sense). We can discuss the details in the LWG issue.

197–202

In addition to being conceptually wrong having these checks penalizes many common cases. I strongly suggest making all __parser_no_* cases noops. The only reason the standard wording uses a single grammar for built-in and string types is simplicity but in reality these are separate per-type grammars and should be treated as such. Moreover, those unnecessary checks won't apply to chrono and UDTs potentially adding more confusion.

361–363

I suggest moving this check into __substitute_arg_id which will allow having one check instead of two for the common case of signed width.

389

So if the user passes a precision of INT32_MAX it will be treated as "no precision" (and asserted on in debug mode) which is somewhat arbitrary and may lead to the same problems I mentioned earlier. To avoid this you could use two values above INT32_MAX (e.g. INT32_MAX + 1 and INT32_MAX + 2 to denote "no precision" and __precision_as_arg respectively.

This revision now requires changes to proceed.Jul 11 2021, 9:01 AM
Mordante marked 2 inline comments as done.Jul 12 2021, 12:36 PM

Thanks for the review!

libcxx/include/__format/parser_std_format_spec.h
101

The upper and lower case are formatted the same. So the the type function maps both f and F to fixed.

389

I guess I need more comment for this code ;-)
When the format-spec has a value __precision_as_arg == 0.
When the format-spec has an arg-id all arg-id's except for INT32_MAX are a valid arg-id. If the arg-id gets resolved to INT32_MAX this value is considered a limit.

vitaut added inline comments.Jul 13 2021, 8:40 AM
libcxx/include/__format/parser_std_format_spec.h
101

Not quite, NaN and infinity are formatted differently.

389

In other words INT32_MAX is already handled correctly with the current approach or have I misunderstood your comment?

Mordante marked 6 inline comments as done.Jul 14 2021, 11:12 AM
Mordante added inline comments.
libcxx/include/__format/parser_std_format_spec.h
101

I forgot about these cases.

197–202

Thanks for the additional explanation. I now see what you meant before. For now I'll make them noops. I'll also add a TODO because with this new insight I might want to look at a cleaner solution later.

361–363

__substitute_arg_id is also used to substitute the precision. A precision of 0 is valid. I'll give it some thought, but at the moment I don't see a good alternative.

389

Yes INT32_MAX is currently handled correctly.

Mordante marked 4 inline comments as done.Jul 15 2021, 11:11 AM
Mordante updated this revision to Diff 359062.Jul 15 2021, 11:15 AM

Rebased
Made module headers private
Addresses review comments
Removed unit tests no longer required due change to the parser_no changes
Marked other unit tests as libc++ specific

vitaut accepted this revision.Jul 17 2021, 6:31 AM

Looks great, thanks for addressing the comments!

vitaut added inline comments.Jul 17 2021, 7:00 AM
libcxx/include/__format/parser_std_format_spec.h
133

So have you or do you plan to open an LWG issue? Otherwise I can do it.

Mordante added inline comments.Jul 18 2021, 6:53 AM
libcxx/include/__format/parser_std_format_spec.h
133

I plan to file an LWG issue. But I first want to address all other review comments. It's on my TODO list.

Mordante added inline comments.Jul 20 2021, 10:13 AM
libcxx/include/__format/parser_std_format_spec.h
118

s/_LIBCPP_INLINE_VISIBILITY/_LIBCPP_HIDE_FROM_ABI/ here and all other occurrences in this patch.

Mordante updated this revision to Diff 362991.Jul 30 2021, 1:27 AM
  • Rebased and updated to work against main.
  • Remove [[nodiscard]]s.
  • s/_LIBCPP_TYPE_VIS/_LIBCPP_HIDE_FROM_ABI/
  • Refactor the std-format-spec parser. Since the parser_no_ classes are a kind of useless dummy they can be removed. Due to this change several tests no longer that useful, they are removed.
  • Test with designated initializers to improve readability of the tests. When this passes the CI I want to improve more of the tests.
Mordante planned changes to this revision.Jul 30 2021, 1:30 AM

When the build passes I want to polish the unit tests a bit more.

Apologies for the AArch64 test timeout. There are some issues on our end that I'm currently working on.

Apologies for the AArch64 test timeout. There are some issues on our end that I'm currently working on.

Thanks for the update.

Mordante updated this revision to Diff 363029.Jul 30 2021, 4:07 AM

Retarget for llvm 14.
Improve unit tests by using designated initializers.

Apologies for the AArch64 test timeout. There are some issues on our end that I'm currently working on.

(as you can see from the build) This has now been resolved.

Mordante marked 2 inline comments as done.Aug 10 2021, 10:56 AM
Mordante added inline comments.
libcxx/include/__format/parser_std_format_spec.h
133
Mordante updated this revision to Diff 370723.Sep 4 2021, 3:00 AM

Rebased against main.
Adds missing includes
s/_LIBCPP_NO_EXCEPTIONS/TEST_HAS_NO_EXCEPTIONS/

ldionne requested changes to this revision.Sep 9 2021, 3:56 PM

Nothing major, I'm leaving the correctness of the implementation to @vitaut, who already reviewed it.

libcxx/include/__format/parser_std_format_spec.h
27–28

You don't need this (nor the _LIBCPP_POP_MACROS at the end) if you don't use min() or max() in this file.

141

This assertion message is not very helpful, we should explain *why* it's unexpected that begin == end instead.

616
libcxx/test/libcxx/utilities/format/format.string/format.string.std/std_format_spec_integral.pass.cpp
24
libcxx/test/libcxx/utilities/format/format.string/format.string.std/test_exception.h
12

What do you mean by "freestanding" here?

Also, this header should include what it uses.

22–24

We don't want to print error messages like that in the tests, specifically because that doesn't work on some platforms where stderr & friends are not supported. I'd rather just remove this if constexpr. Same goes for below.

This revision now requires changes to proceed.Sep 9 2021, 3:56 PM
Mordante marked 6 inline comments as done.Sep 15 2021, 10:09 AM
Mordante added inline comments.
libcxx/include/__format/parser_std_format_spec.h
616

Good catch, this indeed looks weird.

libcxx/test/libcxx/utilities/format/format.string/format.string.std/std_format_spec_integral.pass.cpp
24

I'm not too happy with this change, it's what clang-format did. I created D109835 containing an updated clang-format settings so this won't be undone when I rerun clang-format.

libcxx/test/libcxx/utilities/format/format.string/format.string.std/test_exception.h
12

Freestanding means the header depends on the state of the translation unit before the header itself. Hence it requires no includes.
It depends on the parser type in the test using it. I changed the unit tests to supply that argument as template argument, so the header can be a normal header.

22–24

I expect stderr to be valid on all platforms what do not define _LIBCPP_HAS_NO_LOCALIZATION. AFAIK this macro should be the proper guard.
I don't mind removing them too much, so I'll remove them.

Mordante updated this revision to Diff 372747.Sep 15 2021, 10:36 AM
Mordante marked 4 inline comments as done.

Addresses review comments.
Ran clang-format with the settings added in D109835.
Adds a small comment.

ldionne accepted this revision.Sep 20 2021, 5:26 PM
This revision is now accepted and ready to land.Sep 20 2021, 5:26 PM
This revision was automatically updated to reflect the committed changes.