This is an archive of the discontinued LLVM Phabricator instance.

Differential D144742

[libc++][format] Improves fill character.
ClosedPublic

Authored by Mordante on Feb 24 2023, 9:03 AM.

Details

Reviewers

ldionne
tahonermann
vitaut

Group Reviewers

Restricted Project

Commits

rG5db033e204b2: [libc++][format] Improves fill character.

Summary

The main change is to allow a UCS scalar value as the fill character.
Especially for char-based formatting, this increases the number of valid
fill characters. Originally this was expected to be ABI breaking; however, the
current change does not seem to break the ABI.
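
As an illustration of the summary above, here is a minimal sketch of what "UCS scalar value" means and why the change matters most for `char`-based formatting. The helper names are hypothetical and not part of libc++; the ranges are those of the Unicode Standard.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: a scalar value is any code point up to U+10FFFF
// that is not in the surrogate range U+D800..U+DFFF.
constexpr bool is_scalar_value(std::uint32_t code_point) {
  return code_point <= 0x10FFFF && !(code_point >= 0xD800 && code_point <= 0xDFFF);
}

// Hypothetical helper: number of UTF-8 code units (char) needed to encode
// a scalar value. Assumes the argument passed is_scalar_value().
constexpr std::size_t utf8_code_units(std::uint32_t scalar_value) {
  if (scalar_value < 0x80)
    return 1;
  if (scalar_value < 0x800)
    return 2;
  if (scalar_value < 0x10000)
    return 3;
  return 4;
}

// Hypothetical helper: number of UTF-16 code units needed for a scalar value.
constexpr std::size_t utf16_code_units(std::uint32_t scalar_value) {
  return scalar_value < 0x10000 ? 1 : 2;
}
```

Under these definitions only code points below U+0080 fit in a single `char`, so a fill limited to one code unit restricts UTF-8 format strings to ASCII fills, while UTF-16 needs a second code unit only above U+FFFF.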

Implements

P2572 std::format() fill character allowances

Depends on D144499

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Mordante created this revision.Feb 24 2023, 9:03 AM

Herald added a project: Restricted Project. · Feb 24 2023, 9:03 AM

Harbormaster completed remote builds in B215786: Diff 500233.Feb 24 2023, 9:38 AM

CI fixes.

Herald added a subscriber: arichardson. · Feb 24 2023, 11:57 AM

Harbormaster completed remote builds in B215813: Diff 500269.Feb 24 2023, 12:26 PM

Rebased

Harbormaster completed remote builds in B216031: Diff 500537.Feb 26 2023, 3:49 AM

CI fixes.

Harbormaster completed remote builds in B216036: Diff 500544.Feb 26 2023, 4:37 AM

Rebased.

Harbormaster completed remote builds in B226949: Diff 515436.Apr 20 2023, 1:29 PM

CI fixes.

Harbormaster completed remote builds in B227449: Diff 516062.Apr 22 2023, 9:55 AM

CI Fixes.

Test CI

Herald added a subscriber: mstorsjo. · Apr 22 2023, 11:31 AM

Harbormaster completed remote builds in B227474: Diff 516090.Apr 22 2023, 11:55 AM

Try to determine error.

Harbormaster completed remote builds in B227477: Diff 516094.Apr 22 2023, 12:31 PM

Fixes UTF-16 bug.

Harbormaster completed remote builds in B227480: Diff 516098.Apr 22 2023, 1:09 PM

Tests CI.

Trigger CI.

Harbormaster completed remote builds in B228603: Diff 517630.Apr 27 2023, 10:35 AM

Trigger CI.

Harbormaster completed remote builds in B228610: Diff 517641.Apr 27 2023, 11:03 AM

Adds missing paren.

Harbormaster completed remote builds in B228625: Diff 517657.Apr 27 2023, 11:47 AM

Rebased and code polishing.

Harbormaster completed remote builds in B228860: Diff 517966.Apr 28 2023, 1:43 PM

Mordante published this revision for review.Apr 30 2023, 4:32 AM

Mordante added reviewers: ldionne, tahonermann, vitaut.

Mordante added inline comments.

libcxx/include/__format/parser_std_format_spec.h
432–433

Herald added a project: Restricted Project. · Apr 30 2023, 4:32 AM

Herald added a reviewer: Restricted Project.

Herald added a subscriber: libcxx-commits.

This looks great. I added one comment seeking clarification when an encoding error is present. If the code is as intended, it might help to add a comment to explain what is happening.

libcxx/include/__format/parser_std_format_spec.h
452–457	This seems a little odd to me. When consumption of the fill character fails (due to an encoding issue), an attempt is still made to parse the alignment at the new position before checking for the consumption error and then reporting a parse issue. I'm not sure why that attempt is made, since success is going to lead to reporting the fill character failure and failure is going to result in falling through to retry parsing the alignment at the beginning anyway. If the intent is to get to different error messages, that seems reasonable, but it seems this can fall back to trying to parse the alignment at the beginning anyway. Am I missing something?

JMazurkiewicz added a subscriber: JMazurkiewicz.May 12 2023, 4:39 PM

JMazurkiewicz added inline comments.

libcxx/test/std/utilities/format/format.functions/fill.unicode.pass.cpp
65

Thanks for the reviews!

libcxx/include/__format/parser_std_format_spec.h
452–457	I wrote it this way for historical reasons. A character like 0 is a fill character when followed by an alignment; otherwise it is zero-padding. Since valid elements of the format-spec are never invalid Unicode, this is not needed. I adjusted it, but reverted it again. For UTF-32 it makes sense to only test when it's a fill character, and I'd like to keep the same diagnostic regardless of the encoding used. So I added a comment to explain the design.
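
The fill-versus-zero-padding rule described in the comment above can be sketched as follows. This is a simplified illustration with hypothetical names, not the libc++ parser: it assumes a single-code-unit fill and ignores the rule that `{` and `}` cannot be used as fill.

```cpp
#include <string_view>

// Hypothetical classification of the start of a format-spec
// (the text after the ':' of a replacement field).
enum class leading_kind { fill_and_align, align_only, other };

constexpr bool is_align(char c) { return c == '<' || c == '^' || c == '>'; }

// A character is a fill only when an alignment follows it. Otherwise the
// first character is parsed as whatever it normally means, for example
// '0' as the zero-padding flag.
constexpr leading_kind classify(std::string_view spec) {
  if (spec.size() >= 2 && is_align(spec[1]))
    return leading_kind::fill_and_align;
  if (!spec.empty() && is_align(spec[0]))
    return leading_kind::align_only;
  return leading_kind::other;
}
```

With this sketch `"0>5"` is classified as fill `0` plus alignment, while `"05"` is not a fill-and-align at all, so the `0` is left for the zero-padding option.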

Rebased and addresses review comments.

Harbormaster completed remote builds in B231772: Diff 521889.May 13 2023, 5:39 AM

ldionne accepted this revision.May 16 2023, 9:23 AM

This revision is now accepted and ready to land.May 16 2023, 9:23 AM

ldionne added inline comments.May 16 2023, 9:24 AM

libcxx/include/__format/parser_std_format_spec.h
231	Maybe this could be renamed to `__codepoint`? That way you'd have `__codepoint<_CharT> __fill_;`.

Mordante added inline comments.May 17 2023, 11:20 AM

libcxx/include/__format/parser_std_format_spec.h
452–457	@tahonermann are you happy with this comment?

tahonermann added inline comments.May 17 2023, 12:48 PM

libcxx/include/__format/parser_std_format_spec.h
452–457	I agree with the goal of keeping the diagnostics aligned, but I don't think this change does that. The concern I have is that, when a code point isn't decoded by the call to `__view.__consume()`, the current location in the view will have been bumped by one code unit (based on my reading of the `__consume()` implementation for the `__code_point_view<char>` explicit specialization). There isn't much reason to expect a code point to be successfully decoded at that location. The likely result is that this block isn't entered (because the call to `__parse_alignment(__view.__position())` above fails to match an alignment character; note that it doesn't even attempt proper decoding) and execution falls through to the `__parse_alignment(__begin)` below, which will likely return false, thus leading to `__parse_fill_align()` returning with an indication that the fill-and-align option is not present. I haven't looked into what might happen next. The UTF-8 and UTF-16 cases are fundamentally different from UTF-32, since the former are variable-length encodings and the latter is a trivial fixed-length encoding. I think it is reasonable to throw a distinct error in this case since this problem cannot occur in UTF-32 (a failure to decode a code point at all is subtly different from decoding a code point that is not a UCS scalar value). It is trivially easy to step to the next code unit sequence in UTF-32, but not in UTF-8 or UTF-16.
libcxx/test/std/utilities/format/format.functions/fill.unicode.pass.cpp
85	Suggested tests for ill-formed UTF-8 code unit sequences:
check_exception("???", SV("{:\x80^}"), 42); // Trailing code unit with no leading one.
check_exception("???", SV("{:\xc0^}"), 42); // Missing trailing code unit.
check_exception("???", SV("{:\xe0\x80^}"), 42); // Missing trailing code unit.
check_exception("???", SV("{:\xf0\x80^}"), 42); // Missing two trailing code units.
check_exception("???", SV("{:\xf0\x80\x80^}"), 42); // Missing trailing code unit.
88–91	These all test lone surrogates. Here is a suggested test for reversed surrogates:
check_exception("???", std::wstring_view{L"{:\xdc00\xd800^}"}, 42);
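
To make the distinction above concrete (failing to decode a code point at all versus decoding one that is not a scalar value), here is a minimal UTF-8 sketch. The function name is hypothetical and this is not the libc++ `__code_point_view` implementation; it only checks the first code unit sequence of its input.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

constexpr bool is_continuation(unsigned char c) { return (c & 0xC0) == 0x80; }

// Returns the number of code units in the well-formed UTF-8 sequence at the
// start of `s`, or 0 when that sequence is ill-formed or truncated.
constexpr std::size_t utf8_prefix_length(std::string_view s) {
  if (s.empty())
    return 0;
  const unsigned char lead = static_cast<unsigned char>(s[0]);
  if (lead < 0x80)
    return 1; // ASCII, always a single code unit.

  std::size_t length = 0;
  std::uint32_t value = 0;
  std::uint32_t minimum = 0; // Smallest value that needs `length` code units.
  if ((lead & 0xE0) == 0xC0) {
    length = 2; value = lead & 0x1F; minimum = 0x80;
  } else if ((lead & 0xF0) == 0xE0) {
    length = 3; value = lead & 0x0F; minimum = 0x800;
  } else if ((lead & 0xF8) == 0xF0) {
    length = 4; value = lead & 0x07; minimum = 0x10000;
  } else {
    return 0; // Trailing code unit without a leading one, or invalid lead.
  }

  if (s.size() < length)
    return 0; // Truncated sequence.
  for (std::size_t i = 1; i < length; ++i) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    if (!is_continuation(c))
      return 0; // Missing trailing code unit.
    value = (value << 6) | (c & 0x3F);
  }
  if (value < minimum || value > 0x10FFFF)
    return 0; // Overlong encoding or out of range.
  if (value >= 0xD800 && value <= 0xDFFF)
    return 0; // Decodes, but to a surrogate, which is not a scalar value.
  return length;
}
```

All five suggested UTF-8 inputs fail in the decoding steps of this sketch, before any scalar-value check is reached. In a fixed-length encoding such as UTF-32 only the final range checks can fail.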

Addresses review comments.

Thanks for the reviews!

libcxx/include/__format/parser_std_format_spec.h
452–457	Good point, thanks! I tested with your additional suggested tests and the `fill-and-align` is indeed not properly detected for some of the new test cases. This results in a replacement-field that doesn't end with a '}', and in that case the error doesn't mention the fill either. So having a different error for UTF-32 and UTF-8/16 seems the way forward. (I don't think the extra effort of possibly detecting an alignment, just to improve an error message, is worth the code size.)
libcxx/test/std/utilities/format/format.functions/fill.unicode.pass.cpp
85	I tested this before changing the code in the header. The first two result in `The fill character contains an invalid value`, the third `The fill character contains an invalid value`. (I didn't test the fourth and fifth.)
88–91	This also results in `The format-spec should consume the input or end with a '}'`
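
The lone and reversed surrogate cases discussed for lines 88–91 can be illustrated with a small UTF-16 sketch. The names are hypothetical, and `char16_t` is used instead of `wchar_t` so the example does not depend on the platform's `wchar_t` size; it is not the libc++ implementation.

```cpp
#include <cstddef>
#include <string_view>

constexpr bool is_high_surrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
constexpr bool is_low_surrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Returns the number of code units in the well-formed UTF-16 sequence at the
// start of `s`, or 0 when the sequence is ill-formed.
constexpr std::size_t utf16_prefix_length(std::u16string_view s) {
  if (s.empty())
    return 0;
  if (is_high_surrogate(s[0])) // Must be followed by a low surrogate.
    return (s.size() >= 2 && is_low_surrogate(s[1])) ? 2 : 0;
  if (is_low_surrogate(s[0])) // Lone or reversed: low surrogate comes first.
    return 0;
  return 1;
}
```

In this sketch a reversed pair is rejected at its first code unit, exactly like a lone low surrogate, because a low surrogate can never start a sequence.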

Harbormaster completed remote builds in B232906: Diff 523427.May 18 2023, 12:54 PM

This looks good to me now. I added one comment with suggested alternatives for the new error message, but it's your call whether you like them better than what you have.

libcxx/include/__format/parser_std_format_spec.h
451	"... contains an ill-formed code unit sequence" seems more accurate to me, but that is probably too technical for the intended audience.

Thanks for the review!

libcxx/include/__format/parser_std_format_spec.h
451	> "... contains an ill-formed code unit sequence" seems more accurate to me, but that is probably too technical for the intended audience.
Yes, I tried to keep the target audience in mind when I wrote the message. If it were purely internal, I would have picked a more accurate message.

Closed by commit rG5db033e204b2: [libc++][format] Improves fill character. (authored by Mordante). · May 19 2023, 8:21 AM

This revision was automatically updated to reflect the committed changes.

Mordante marked an inline comment as done.

Mordante added a commit: rG5db033e204b2: [libc++][format] Improves fill character..

Revision Contents

Path	Size
libcxx/docs/ReleaseNotes.rst	1 line
libcxx/docs/Status/Cxx2bPapers.csv	2 lines
libcxx/docs/Status/FormatIssues.csv	2 lines
libcxx/include/__format/formatter_floating_point.h	4 lines
libcxx/include/__format/formatter_integral.h	2 lines
libcxx/include/__format/formatter_output.h	41 lines
libcxx/include/__format/parser_std_format_spec.h	201 lines
libcxx/test/std/utilities/format/format.functions/fill.unicode.pass.cpp	108 lines
libcxx/utils/ci/run-buildbot	1 line

Diff 516084

libcxx/docs/ReleaseNotes.rst

@@ 35 lines not shown @@
 ============================

 Implemented Papers
 ------------------
 - P2520R0 - ``move_iterator<T*>`` should be a random access iterator
 - P1328R1 - ``constexpr type_info::operator==()``
 - P1413R3 - Formatting ``thread::id`` (the ``stacktrace`` is not done yet)
 - P2675R1 - ``format``'s width estimation is too approximate and not forward compatible
+- P2572R1 - ``std::format`` fill character allowances

 Improvements and New Features
 -----------------------------
 - ``std::equal`` and ``std::ranges::equal`` are now forwarding to ``std::memcmp`` for integral types and pointers,
   which can lead up to 40x performance improvements.

 - ``std::string_view`` now provides iterators that check for out-of-bounds accesses when the safe
   libc++ mode is enabled.
@@ 69 lines not shown @@

libcxx/docs/Status/Cxx2bPapers.csv

@@ 106 lines not shown @@
 "","","","","","",""
 "`P0290R4 <https://wg21.link/P0290R4>`__","LWG", "``apply()`` for ``synchronized_value<T>``","February 2023","","","\|concurrency TS\|"
 "`P2770R0 <https://wg21.link/P2770R0>`__","LWG", "Stashing stashing ``iterators`` for proper flattening","February 2023","","","\|ranges\|"
 "`P2164R9 <https://wg21.link/P2164R9>`__","LWG", "``views::enumerate``","February 2023","","","\|ranges\|"
 "`P2711R1 <https://wg21.link/P2711R1>`__","LWG", "Making multi-param constructors of ``views`` ``explicit``","February 2023","\|Partial\| [#note-P2711R1]_","","\|ranges\|"
 "`P2609R3 <https://wg21.link/P2609R3>`__","LWG", "Relaxing Ranges Just A Smidge","February 2023","","","\|ranges\|"
 "`P2713R1 <https://wg21.link/P2713R1>`__","LWG", "Escaping improvements in ``std::format``","February 2023","","","\|format\|"
 "`P2675R1 <https://wg21.link/P2675R1>`__","LWG", "``format``'s width estimation is too approximate and not forward compatible","February 2023","\|Complete\|","17.0","\|format\|"
-"`P2572R1 <https://wg21.link/P2572R1>`__","LWG", "``std::format`` fill character allowances","February 2023","","","\|format\|"
+"`P2572R1 <https://wg21.link/P2572R1>`__","LWG", "``std::format`` fill character allowances","February 2023","\|Complete\|","17.0","\|format\|"
 "`P2693R1 <https://wg21.link/P2693R1>`__","LWG", "Formatting ``thread::id`` and ``stacktrace``","February 2023","\|Partial\| [#note-P2693R1]_","","\|format\|"
 "`P2679R2 <https://wg21.link/P2679R2>`__","LWG", "Fixing ``std::start_lifetime_as`` for arrays","February 2023","","",""
 "`P2674R1 <https://wg21.link/P2674R1>`__","LWG", "A trait for implicit lifetime types","February 2023","","",""
 "`P2655R3 <https://wg21.link/P2655R3>`__","LWG", "``common_reference_t`` of ``reference_wrapper`` Should Be a Reference Type","February 2023","","",""
 "`P2652R2 <https://wg21.link/P2652R2>`__","LWG", "Disallow User Specialization of ``allocator_traits``","February 2023","","",""
 "`P2787R1 <https://wg21.link/P2787R1>`__","LWG", "``pmr::generator`` - Promise Types are not Values","February 2023","","",""
 "`P2614R2 <https://wg21.link/P2614R2>`__","LWG", "Deprecate ``numeric_limits::has_denorm``","February 2023","","",""
 "`P2588R3 <https://wg21.link/P2588R3>`__","LWG", "``barrier``’s phase completion guarantees","February 2023","","",""
 "`P2763R1 <https://wg21.link/P2763R1>`__","LWG", "``layout_stride`` static extents default constructor fix","February 2023","","",""
 "`P2736R2 <https://wg21.link/P2736R2>`__","CWG","Referencing The Unicode Standard","February 2023","","","\|format\|"

libcxx/docs/Status/FormatIssues.csv

 Number,Name,Standard,Assignee,Status,First released version
 `P0645 <https://wg21.link/P0645>`_,"Text Formatting","C++20",Mark de Wever,\|Complete\|,Clang 14
 `P1652 <https://wg21.link/P1652>`_,"Printf corner cases in std::format","C++20",Mark de Wever,\|Complete\|,Clang 14
 `P1892 <https://wg21.link/P1892>`_,"Extended locale-specific presentation specifiers for std::format","C++20",Mark de Wever,\|Complete\|,Clang 14
 `P1868 <https://wg21.link/P1868>`_,"width: clarifying units of width and precision in std::format (Implements the unicode support.)","C++20",Mark de Wever,\|Complete\|,Clang 14
 `P2216 <https://wg21.link/P2216>`_,"std::format improvements","C++20",Mark de Wever,\|Complete\|,Clang 15
 `P2418 <https://wg21.link/P2418>`__,"Add support for ``std::generator``-like types to ``std::format``","C++20",Mark de Wever,\|Complete\|, Clang 15
 "`P2093R14 <https://wg21.link/P2093R14>`__","Formatted output","C++23",Mark de Wever,\|In Progress\|,
 "`P2286R8 <https://wg21.link/P2286R8>`__","Formatting Ranges","C++23","Mark de Wever","\|Complete\|",Clang 16
 "`P2508R1 <https://wg21.link/P2508R1>`__","Exposing ``std::basic-format-string``","C++23","Mark de Wever","\|Complete\|", Clang 15
 "`P2585R0 <https://wg21.link/P2585R0>`__","Improving default container formatting","C++23","Mark de Wever","\|Complete\|", Clang 17
 "`P2539R4 <https://wg21.link/P2539R4>`__","Should the output of ``std::print`` to a terminal be synchronized with the underlying stream?","C++23","Mark de Wever"
 "`P2713R1 <https://wg21.link/P2713R1>`__","Escaping improvements in ``std::format``","C++23","Mark de Wever",""
 "`P2675R1 <https://wg21.link/P2675R1>`__","``format``'s width estimation is too approximate and not forward compatible","C++23","Mark de Wever","\|Complete\|", Clang 17
-"`P2572R1 <https://wg21.link/P2572R1>`__","``std::format`` fill character allowances","C++23","Mark de Wever","\|In progress\|"
+"`P2572R1 <https://wg21.link/P2572R1>`__","``std::format`` fill character allowances","C++23","Mark de Wever","\|Complete\|", Clang 17
 "`P2693R1 <https://wg21.link/P2693R1>`__","Formatting ``thread::id`` and ``stacktrace``","C++23","Mark de Wever","\|In progress\|"
 `P1361 <https://wg21.link/P1361>`_,"Integration of chrono with text formatting","C++20",Mark de Wever,\|In Progress\|,
 `P2372 <https://wg21.link/P2372>`__,"Fixing locale handling in chrono formatters","C++20",Mark de Wever,\|In Progress\|,
 "`P2419R2 <https://wg21.link/P2419R2>`__","Clarify handling of encodings in localized formatting of chrono types","C++23",

libcxx/include/__format/formatter_floating_point.h

@@ 522 lines not shown @@ ptrdiff_t __size =
       __grouping.size() -  // Grouping contains one
       !__grouping.empty(); // additional character

   __formatter::__padding_size_result __padding = {0, 0};
   bool __zero_padding = __specs.__alignment_ == __format_spec::__alignment::__zero_padding;
   if (__size < __specs.__width_) {
     if (__zero_padding) {
       __specs.__alignment_ = __format_spec::__alignment::__right;
-      __specs.__fill_ = _CharT('0');
+      __specs.__fill_.__data[0] = _CharT('0');
     }

     __padding = __formatter::__padding_size(__size, __specs.__width_, __specs.__alignment_);
   }

   // sign and (zero padding or alignment)
   if (__zero_padding && __first != __buffer.begin())
     *__out_it++ = *__buffer.begin();
@@ 168 lines not shown @@ if (__specs.__alignment_ == __format_spec::__alignment ::__zero_padding) {
     // When there is a sign output it before the padding. Note the __size
     // doesn't need any adjustment, regardless whether the sign is written
     // here or in __formatter::__write.
     if (__first != __result.__integral)
       *__out_it++ = *__first++;
     // After the sign is written, zero padding is the same a right alignment
     // with '0'.
     __specs.__alignment_ = __format_spec::__alignment::__right;
-    __specs.__fill_ = _CharT('0');
+    __specs.__fill_.__data[0] = _CharT('0');
   }

   if (__num_trailing_zeros)
     return __formatter::__write_using_trailing_zeros(
         __first, __result.__last, _VSTD::move(__out_it), __specs, __size, __result.__exponent, __num_trailing_zeros);

   return __formatter::__write(__first, __result.__last, _VSTD::move(__out_it), __specs, __size);
 }
@@ 38 lines not shown @@

libcxx/include/__format/formatter_integral.h

@@ 245 lines not shown @@ # endif
   else {
     // __buf contains [sign][prefix]data
     //                              ^ location of __first
     // The zero padding is done like:
     // - Write [sign][prefix]
     // - Write data right aligned with '0' as fill character.
     __out_it = __formatter::__copy(__begin, __first, _VSTD::move(__out_it));
     __specs.__alignment_ = __format_spec::__alignment::__right;
-    __specs.__fill_ = _CharT('0');
+    __specs.__fill_.__data[0] = _CharT('0');
     int32_t __size = __first - __begin;

     __specs.__width_ -= _VSTD::min(__size, __specs.__width_);
   }

   if (__specs.__std_.__type_ != __format_spec::__type::__hexadecimal_upper_case) [[likely]]
     return __formatter::__write(__first, __last, __ctx.out(), __specs);

@@ 106 lines not shown @@

libcxx/include/__format/formatter_output.h

 // -*- C++ -*-
 //===----------------------------------------------------------------------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//

 #ifndef _LIBCPP___FORMAT_FORMATTER_OUTPUT_H
 #define _LIBCPP___FORMAT_FORMATTER_OUTPUT_H

 #include <__algorithm/ranges_copy.h>
 #include <__algorithm/ranges_fill_n.h>
 #include <__algorithm/ranges_for_each.h>
 #include <__algorithm/ranges_transform.h>
+#include <__bit/countl.h>
 #include <__charconv/to_chars_integral.h>
 #include <__charconv/to_chars_result.h>
 #include <__chrono/statically_widen.h>
 #include <__concepts/same_as.h>
 #include <__config>
 #include <__format/buffer.h>
 #include <__format/concepts.h>
 #include <__format/escaped_output_table.h>
@@ 136 lines not shown @@ _LIBCPP_HIDE_FROM_ABI _OutIt __fill(_OutIt __out_it, size_t __n, _CharT __value) {
   } else if constexpr (_VSTD::same_as<decltype(__out_it), typename __format::__retarget_buffer<_CharT>::__iterator>) {
     __out_it.__buffer_->__fill(__n, __value);
     return __out_it;
   } else {
     return std::ranges::fill_n(_VSTD::move(__out_it), __n, __value);
   }
 }

+# ifndef _LIBCPP_HAS_NO_UNICODE
+template <__fmt_char_type _CharT, output_iterator<const _CharT&> _OutIt>
+  requires(same_as<_CharT, char>)
+_LIBCPP_HIDE_FROM_ABI _OutIt __fill(_OutIt __out_it, size_t __n, __format_spec::__fill<_CharT> __value) {
+  std::size_t __bytes = std::countl_one(static_cast<unsigned char>(__value.__data[0]));
+  if (__bytes == 0)
+    return __formatter::__fill(std::move(__out_it), __n, __value.__data[0]);
+
+  for (size_t __i = 0; __i < __n; ++__i)
+    __out_it = __formatter::__copy(
+        std::addressof(__value.__data[0]), std::addressof(__value.__data[0]) + __bytes, std::move(__out_it));
+  return __out_it;
+}
+
+# ifndef _LIBCPP_HAS_NO_WIDE_CHARACTERS
+template <__fmt_char_type _CharT, output_iterator<const _CharT&> _OutIt>
+  requires(same_as<_CharT, wchar_t> && sizeof(wchar_t) == 2)
+_LIBCPP_HIDE_FROM_ABI _OutIt __fill(_OutIt __out_it, size_t __n, __format_spec::__fill<_CharT> __value) {
+  if (!__unicode::__is_high_surrogate(__value.__data[0]))
+    return __formatter::__fill(std::move(__out_it), __n, __value.__data[0]);
+
+  for (size_t __i = 0; __i < __n; ++__i)
+    __out_it = __formatter::__copy(
+        std::addressof(__value.__data[0]), std::addressof(__value.__data[0]) + 2, std::move(__out_it));
+  return __out_it;
+}
+
+template <__fmt_char_type _CharT, output_iterator<const _CharT&> _OutIt>
+  requires(same_as<_CharT, wchar_t> && sizeof(wchar_t) == 4)
+_LIBCPP_HIDE_FROM_ABI _OutIt __fill(_OutIt __out_it, size_t __n, __format_spec::__fill<_CharT> __value) {
+  return __formatter::__fill(std::move(__out_it), __n, __value.__data[0]);
+}
+# endif // _LIBCPP_HAS_NO_WIDE_CHARACTERS
+# else // _LIBCPP_HAS_NO_UNICODE
+template <__fmt_char_type _CharT, output_iterator<const _CharT&> _OutIt>
+_LIBCPP_HIDE_FROM_ABI _OutIt __fill(_OutIt __out_it, size_t __n, __format_spec::__fill<_CharT> __value) {
+  return __formatter::__fill(std::move(__out_it), __n, __value.__data[0]);
+}
+# endif // _LIBCPP_HAS_NO_UNICODE
+
 template <class _OutIt, class _CharT>
 _LIBCPP_HIDE_FROM_ABI _OutIt __write_using_decimal_separators(_OutIt __out_it, const char* __begin, const char* __first,
                                                               const char* __last, string&& __grouping, _CharT __sep,
                                                               __format_spec::__parsed_specifications<_CharT> __specs) {
   int __size = (__first - __begin) +    // [sign][prefix]
                (__last - __first) +     // data
                (__grouping.size() - 1); // number of separator characters

@@ 380 lines not shown @@
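
The `char` overload of `__fill` added in formatter_output.h derives the length of the fill's UTF-8 sequence from the leading 1 bits of its first code unit and then copies that sequence `__n` times. A standalone sketch of the same idea follows. The names are hypothetical, the fill is passed as a `std::string_view` rather than the `__format_spec::__fill` struct, and a hand-written loop stands in for `std::countl_one` so the sketch only needs C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Counts the leading 1 bits of a byte, which is what std::countl_one
// computes for an unsigned char in C++20.
constexpr std::size_t leading_ones(unsigned char byte) {
  std::size_t count = 0;
  for (unsigned mask = 0x80; mask != 0 && (byte & mask) != 0; mask >>= 1)
    ++count;
  return count;
}

// Appends `n` copies of the fill to `out`. Assumes `fill` starts with one
// already validated UTF-8 sequence of 1 to 4 code units.
inline void append_fill(std::string& out, std::size_t n, std::string_view fill) {
  assert(!fill.empty() && "the fill holds at least one code unit");
  const std::size_t bytes = leading_ones(static_cast<unsigned char>(fill[0]));
  if (bytes == 0) { // ASCII: a single code unit, use the plain fill.
    out.append(n, fill[0]);
    return;
  }
  assert(bytes <= fill.size() && "the fill was validated by the parser");
  for (std::size_t i = 0; i < n; ++i)
    out.append(fill.data(), bytes); // Copy the whole multi-byte sequence.
}
```

Relying on the lead byte alone is only safe because the parser has already rejected ill-formed fills. Without that guarantee the count could be 1 (a lone continuation byte) or exceed the stored data.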

libcxx/include/__format/parser_std_format_spec.h

Show All 10 Lines

#define _LIBCPP___FORMAT_PARSER_STD_FORMAT_SPEC_H #define _LIBCPP___FORMAT_PARSER_STD_FORMAT_SPEC_H

/// \file Contains the std-format-spec parser. /// \file Contains the std-format-spec parser.

/// ///

/// Most of the code can be reused in the chrono-format-spec. /// Most of the code can be reused in the chrono-format-spec.

/// This header has some support for the chrono-format-spec since it doesn't /// This header has some support for the chrono-format-spec since it doesn't

/// affect the std-format-spec. /// affect the std-format-spec.

#include <__algorithm/copy_n.h>

#include <__algorithm/find_if.h> #include <__algorithm/find_if.h>

#include <__algorithm/min.h> #include <__algorithm/min.h>

#include <__assert> #include <__assert>

#include <__bit/countl.h>

#include <__concepts/arithmetic.h> #include <__concepts/arithmetic.h>

#include <__concepts/same_as.h> #include <__concepts/same_as.h>

#include <__config> #include <__config>

#include <__debug> #include <__debug>

#include <__format/format_arg.h> #include <__format/format_arg.h>

#include <__format/format_error.h> #include <__format/format_error.h>

#include <__format/format_parse_context.h> #include <__format/format_parse_context.h>

#include <__format/format_string.h> #include <__format/format_string.h>

#include <__format/unicode.h> #include <__format/unicode.h>

#include <__format/width_estimation_table.h> #include <__format/width_estimation_table.h>

#include <__iterator/concepts.h> #include <__iterator/concepts.h>

#include <__iterator/readable_traits.h> // iter_value_t #include <__iterator/readable_traits.h> // iter_value_t

#include <__memory/addressof.h>

#include <__type_traits/common_type.h> #include <__type_traits/common_type.h>

#include <__type_traits/is_trivially_copyable.h> #include <__type_traits/is_trivially_copyable.h>

#include <__variant/monostate.h> #include <__variant/monostate.h>

#include <cstdint> #include <cstdint>

#include <string_view> #include <string_view>

#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER) #if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)

# pragma GCC system_header # pragma GCC system_header

[173 lines not shown]

struct __chrono {

bool __hour_ : 1; bool __hour_ : 1;

bool __weekday_name_ : 1; bool __weekday_name_ : 1;

bool __weekday_ : 1; bool __weekday_ : 1;

bool __day_of_year_ : 1; bool __day_of_year_ : 1;

bool __week_of_year_ : 1; bool __week_of_year_ : 1;

bool __month_name_ : 1; bool __month_name_ : 1;

}; };

// The fill UCS scalar value.

// This is always an array, with 1, 2, or 4 elements.

// The size of the data structure is always 32-bits.

template <class _CharT>

struct __fill;

ldionneUnsubmitted

Done

Maybe this could be renamed to __codepoint? That way you'd have __codepoint<_CharT> __fill_;.

template <>

struct __fill<char> {

char __data[4] = {' '};

};

# ifndef _LIBCPP_HAS_NO_WIDE_CHARACTERS

template <>

struct __fill<wchar_t> {

wchar_t __data[4 / sizeof(wchar_t)] = {L' '};

};

# endif
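The comment on `__fill` states that the storage is always 32 bits. A minimal sketch of that layout follows; `fill_storage` is a hypothetical stand-in for `__fill<_CharT>`, and the sketch assumes `wchar_t` is either 2 or 4 bytes wide.

```cpp
#include <cstddef>

// Hypothetical stand-in for the __fill<_CharT> storage added in this patch.
template <class CharT>
struct fill_storage;

template <>
struct fill_storage<char> {
  char data[4] = {' '}; // room for up to four UTF-8 code units
};

template <>
struct fill_storage<wchar_t> {
  // Two UTF-16 code units or one UTF-32 code unit, depending on the platform.
  wchar_t data[4 / sizeof(wchar_t)] = {L' '};
};

// Assumption of this sketch: wchar_t is 2 or 4 bytes wide.
static_assert(sizeof(wchar_t) == 2 || sizeof(wchar_t) == 4);
static_assert(sizeof(fill_storage<char>) == 4);
static_assert(sizeof(fill_storage<wchar_t>) == 4);
```

Because both specializations occupy four bytes, swapping a bare `_CharT` member for this type would not be expected to grow a struct that already reserved 32 bits for the fill.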

/// Contains the parsed formatting specifications. /// Contains the parsed formatting specifications.

/// ///

/// This contains information for both the std-format-spec and the /// This contains information for both the std-format-spec and the

/// chrono-format-spec. This results in some unused members for both /// chrono-format-spec. This results in some unused members for both

/// specifications. However these unused members don't increase the size /// specifications. However these unused members don't increase the size

/// of the structure. /// of the structure.

/// ///

/// This struct doesn't cross ABI boundaries so its layout doesn't need to be /// This struct doesn't cross ABI boundaries so its layout doesn't need to be

[19 lines not shown]

struct __parsed_specifications {

int32_t __width_; int32_t __width_;

/// The requested precision. /// The requested precision.

/// ///

/// When the format-spec used an arg-id for this field it has already been /// When the format-spec used an arg-id for this field it has already been

/// replaced with the value of that arg-id. /// replaced with the value of that arg-id.

int32_t __precision_; int32_t __precision_;

- _CharT __fill_;

+ __fill<_CharT> __fill_;

_LIBCPP_HIDE_FROM_ABI constexpr bool __has_width() const { return __width_ > 0; } _LIBCPP_HIDE_FROM_ABI constexpr bool __has_width() const { return __width_ > 0; }

_LIBCPP_HIDE_FROM_ABI constexpr bool __has_precision() const { return __precision_ >= 0; } _LIBCPP_HIDE_FROM_ABI constexpr bool __has_precision() const { return __precision_ >= 0; }

}; };

// Validate the struct is small and cheap to copy since the struct is passed by // Validate the struct is small and cheap to copy since the struct is passed by

// value in formatting functions. // value in formatting functions.

[114 lines not shown]

public:

bool __precision_as_arg_ : 1 {false}; bool __precision_as_arg_ : 1 {false};

/// The requested width, either the value or the arg-id. /// The requested width, either the value or the arg-id.

int32_t __width_{0}; int32_t __width_{0};

/// The requested precision, either the value or the arg-id. /// The requested precision, either the value or the arg-id.

int32_t __precision_{-1}; int32_t __precision_{-1};

- // LWG 3576 will probably change this to always accept a Unicode code point

- // To avoid changing the size with that change align the field so when it

- // becomes 32-bit its alignment will remain the same. That also means the

- // size will remain the same. (D2572 addresses the solution for LWG 3576.)

- _CharT __fill_{_CharT(' ')};

+ __fill<_CharT> __fill_{};

private: private:

_LIBCPP_HIDE_FROM_ABI constexpr bool __parse_alignment(_CharT __c) { _LIBCPP_HIDE_FROM_ABI constexpr bool __parse_alignment(_CharT __c) {

switch (__c) { switch (__c) {

case _CharT('<'): case _CharT('<'):

__alignment_ = __alignment::__left; __alignment_ = __alignment::__left;

return true; return true;

case _CharT('^'): case _CharT('^'):

__alignment_ = __alignment::__center; __alignment_ = __alignment::__center;

return true; return true;

case _CharT('>'): case _CharT('>'):

__alignment_ = __alignment::__right; __alignment_ = __alignment::__right;

return true; return true;

} }

return false; return false;

} }

_LIBCPP_HIDE_FROM_ABI constexpr void __validate_fill_character(_CharT __fill, bool __use_range_fill) {

// The forbidden fill characters all are 1-byte code points, thus the

// check can be omitted when more bytes are used.

MordanteAuthorUnsubmitted

Done

_LIBCPP_HIDE_FROM_ABI constexpr void __validate_fill_character(_CharT __fill, bool __use_range_fill) {

- // The forbidden fill characters all are 1-byte code points, thus the

- // check can be omitted when more bytes are used.

+ // The forbidden fill characters all code points formed from a single code unit, thus the

+ // check can be omitted when more code units are used.

if (__use_range_fill && (__fill == _CharT('{') || __fill == _CharT('}') || __fill == _CharT(':')))

if (__use_range_fill && (__fill == _CharT('{') || __fill == _CharT('}') || __fill == _CharT(':')))

std::__throw_format_error("The format-spec range-fill field contains an invalid character");

else if (__fill == _CharT('{') || __fill == _CharT('}'))

std::__throw_format_error("The format-spec fill field contains an invalid character");

}
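The rule enforced by `__validate_fill_character` can be restated as a small predicate. This is only a sketch: `is_forbidden_fill` is a hypothetical name, and it returns a flag where the patch throws `format_error`.

```cpp
// Hypothetical predicate mirroring the checks in __validate_fill_character.
// '{' and '}' are never allowed as fill; ':' is additionally rejected when
// parsing a range-fill (or tuple-fill).
constexpr bool is_forbidden_fill(char fill, bool use_range_fill) {
  if (fill == '{' || fill == '}')
    return true;
  return use_range_fill && fill == ':';
}
```

All three forbidden characters are single code units, which is why the callers only need this check when the fill consists of exactly one code unit.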

# ifndef _LIBCPP_HAS_NO_UNICODE

// range-fill and tuple-fill are identical // range-fill and tuple-fill are identical

template <contiguous_iterator _Iterator> template <contiguous_iterator _Iterator>

requires(same_as<_CharT, char>)

_LIBCPP_HIDE_FROM_ABI constexpr bool __parse_fill_align(_Iterator& __begin, _Iterator __end, bool __use_range_fill) {

_LIBCPP_ASSERT(__begin != __end,

"when called with an empty input the function will cause "

"undefined behavior by evaluating data not in the input");

// The number of bytes the are used for the UCS scalar value, can be

// determined by the number of leading bits with value 1 in the first byte.

std::size_t __bytes = std::countl_one(static_cast<unsigned char>(*__begin));

switch (__bytes) {

tahonermannUnsubmitted

Done

if (__consumed.__status != __unicode::__consume_result::__ok)

- std::__throw_format_error("The format-spec contains malformed Unicode");

+ std::__throw_format_error("The format-spec contains malformed Unicode characters");

if (__view.__position() < __end && __parse_alignment(*__view.__position())) {

"... contains an ill-formed code unit sequence" seems more accurate to me, but that is probably too technical for the intended audience.

MordanteAuthorUnsubmitted

Done

> "... contains an ill-formed code unit sequence" seems more accurate to me, but that is probably too technical for the intended audience.

Yes, I tried to keep the target audience in mind when I wrote the message. If it were purely internal, I would have picked a more accurate message.

case 0:

__bytes = 1;

break;

case 2:

case 3:

tahonermannUnsubmitted

Done

This seems a little odd to me. When consumption of the fill character fails (due to an encoding issue), an attempt is still made to parse the alignment at the new position before checking for the consumption error and then reporting a parse issue. I'm not sure why that attempt is made since success is going to lead to reporting the fill character failure and failure is going to result in falling through to retry parsing the alignment at the beginning anyway. If the intent is to get to different error messages, that seems reasonable, but it seems this can fall back to trying to parse the alignment at the beginning anyway. Am I missing something?

MordanteAuthorUnsubmitted

Done

I wrote it this way for historical reasons. A character like 0 is a fill-character when followed by an alignment. Else it's a zero-padding. Since valid elements of the format-spec are not invalid Unicode this is not needed. I adjusted it, but reverted it again. For UTF-32 it makes sense to only test when it's a fill character, and I like to keep the same diagnostic regardless of the encoding used. So I added a comment to explain the design.

MordanteAuthorUnsubmitted

Done

@tahonermann are you happy with this comment?

tahonermannUnsubmitted

Done

I agree with the goal of keeping the diagnostics aligned, but I don't think this change does that.

The concern I have is that, when a code point isn't decoded by the call to __view.__consume(), the current location in the view will have been bumped by one code unit (based on my reading of the __consume() implementation for the __code_point_view<char> explicit specialization). There isn't much reason to expect a code point to be successfully decoded at that location. The likely result is that this block isn't entered (because the call to __parse_alignment(*__view.__position()) above fails to match an alignment character; note that it doesn't even attempt proper decoding) and execution falls through to the __parse_alignment(*__begin) below, which will likely return false, thus leading to __parse_fill_align() returning with an indication that the fill-and-align option is not present. I haven't looked into what might happen next.

The UTF-8 and UTF-16 cases are fundamentally different compared to UTF-32 since they are variable length encodings and the latter is a trivial fixed length encoding. I think it is reasonable to throw a distinct error in this case since this problem cannot occur in UTF-32 (a failure to decode a code point at all is subtly different than decoding a code point that is not a UCS scalar value). It is trivially easy to step to the next code unit sequence in UTF-32, but not in UTF-8 or UTF-16.

MordanteAuthorUnsubmitted

Done

Good point, thanks! I tested with your additional suggested tests and the fill-and-align is indeed not properly detected for some of the new test cases. This results in a replacement-field that doesn't end with a '}'. There the error doesn't mention fill either. So having a different error for UTF-32 and UTF-8/16 seems the way forward. (I don't think the extra effort of detecting an alignment just for an error message is worth the code size.)

case 4:

break;

default:

std::__throw_format_error("Malformed Unicode fill character");

}

if (__begin + __bytes < __end) {

if (__parse_alignment(*(__begin + __bytes))) {

// Validates whether the input is indeed a valid UCS Scalar value.

__unicode::__code_point_view<char> __view{__begin, __begin + __bytes};

__unicode::__consume_result __consumed = __view.__consume();

if (__consumed.__status != __unicode::__consume_result::__ok)

std::__throw_format_error("The fill character contains an invalid value");

_LIBCPP_ASSERT(__view.__at_end(), "a valid fill character should have consumed the entire input");

if (__bytes == 1)

// The forbidden fill characters all are 1-byte code points, thus the

// check can be omitted when more bytes are used.

__validate_fill_character(*__begin, __use_range_fill);

std::copy_n(__begin, __bytes, std::addressof(__fill_.__data[0]));

__begin += __bytes + 1;

return true;

}

if (!__parse_alignment(*__begin))

return false;

++__begin;

return true;

}

# ifndef _LIBCPP_HAS_NO_WIDE_CHARACTERS

template <contiguous_iterator _Iterator>

requires(same_as<_CharT, wchar_t> && sizeof(wchar_t) == 2)

_LIBCPP_HIDE_FROM_ABI constexpr bool __parse_fill_align(_Iterator& __begin, _Iterator __end, bool __use_range_fill) { _LIBCPP_HIDE_FROM_ABI constexpr bool __parse_fill_align(_Iterator& __begin, _Iterator __end, bool __use_range_fill) {

_LIBCPP_ASSERT(__begin != __end, "when called with an empty input the function will cause " _LIBCPP_ASSERT(__begin != __end,

"when called with an empty input the function will cause "

"undefined behavior by evaluating data not in the input");

std::size_t __bytes = 1 + __unicode::__is_high_surrogate(*__begin);

if (__begin + __bytes < __end) {

if (__parse_alignment(*(__begin + __bytes))) {

// Validates whether the input is indeed a valid UCS Scalar value.

__unicode::__code_point_view<wchar_t> __view{__begin, __begin + __bytes};

__unicode::__consume_result __consumed = __view.__consume();

if (__consumed.__status != __unicode::__consume_result::__ok)

std::__throw_format_error("The fill character contains an invalid value");

_LIBCPP_ASSERT(__view.__at_end(), "a valid fill character should have consumed the entire input");

if (__bytes == 1)

// The forbidden fill characters all are 1-byte code points, thus the

// check can be omitted when more bytes are used.

__validate_fill_character(*__begin, __use_range_fill);

std::copy_n(__begin, __bytes, std::addressof(__fill_.__data[0])); // ranges and inout result to be used?

__begin += __bytes + 1;

return true;

}

if (!__parse_alignment(*__begin))

return false;

++__begin;

return true;

}
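For a 2-byte `wchar_t` the overload above computes the length as `1 + __unicode::__is_high_surrogate(*__begin)`. A sketch of the same idea with hypothetical names, using `char16_t` so that it does not depend on the platform's `wchar_t` size:

```cpp
#include <cstddef>

// Hypothetical helpers, not the libc++ internals.
constexpr bool is_high_surrogate(char16_t unit) {
  return unit >= 0xD800 && unit <= 0xDBFF;
}

// A high surrogate announces a two-unit sequence; everything else is treated
// as a single unit here. A lone low surrogate therefore yields 1 and has to
// be rejected later, when the sequence is decoded.
constexpr std::size_t utf16_sequence_length(char16_t first) {
  return is_high_surrogate(first) ? 2 : 1;
}
```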

template <contiguous_iterator _Iterator>

requires(same_as<_CharT, wchar_t> && sizeof(wchar_t) == 4)

_LIBCPP_HIDE_FROM_ABI constexpr bool __parse_fill_align(_Iterator& __begin, _Iterator __end, bool __use_range_fill) {

_LIBCPP_ASSERT(__begin != __end,

"when called with an empty input the function will cause "

"undefined behavior by evaluating data not in the input"); "undefined behavior by evaluating data not in the input");

if (__begin + 1 != __end) { if (__begin + 1 != __end) {

if (__parse_alignment(*(__begin + 1))) { if (__parse_alignment(*(__begin + 1))) {

- if (__use_range_fill && (*__begin == _CharT('{') || *__begin == _CharT('}') || *__begin == _CharT(':')))

- std::__throw_format_error("The format-spec range-fill field contains an invalid character");

- else if (*__begin == _CharT('{') || *__begin == _CharT('}'))

- std::__throw_format_error("The format-spec fill field contains an invalid character");

- __fill_ = *__begin;

+ if (!__unicode::__is_scalar_value(*__begin))

+ std::__throw_format_error("The fill character contains an invalid value");

+ __validate_fill_character(*__begin, __use_range_fill);

+ __fill_.__data[0] = *__begin;

__begin += 2; __begin += 2;

return true; return true;

} }

if (!__parse_alignment(*__begin)) if (!__parse_alignment(*__begin))

return false; return false;

++__begin; ++__begin;

return true; return true;

} }
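With a 4-byte `wchar_t` every code unit is a whole code point, so the only validation needed is the scalar-value test. A sketch of that test, following the Unicode definition of a scalar value (any code point except the surrogates); the function name is a hypothetical stand-in for `__unicode::__is_scalar_value`:

```cpp
// Hypothetical stand-in for __unicode::__is_scalar_value.
// Scalar values are U+0000..U+D7FF and U+E000..U+10FFFF.
constexpr bool is_scalar_value(char32_t value) {
  return value < 0xD800 || (value >= 0xE000 && value <= 0x10FFFF);
}
```

This matches the values the new test expects to be rejected on wide platforms: the surrogates and code points above U+10FFFF such as 0x110000.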

# endif // _LIBCPP_HAS_NO_WIDE_CHARACTERS

# else // _LIBCPP_HAS_NO_UNICODE

// range-fill and tuple-fill are identical

template <contiguous_iterator _Iterator>

_LIBCPP_HIDE_FROM_ABI constexpr bool __parse_fill_align(_Iterator& __begin, _Iterator __end, bool __use_range_fill) {

_LIBCPP_ASSERT(__begin != __end,

"when called with an empty input the function will cause "

"undefined behavior by evaluating data not in the input");

if (__begin + 1 != __end) {

if (__parse_alignment(*(__begin + 1))) {

__validate_fill_character(*__begin, __use_range_fill);

__fill_.__data[0] = *__begin;

__begin += 2;

return true;

}

if (!__parse_alignment(*__begin))

return false;

++__begin;

return true;

}

# endif // _LIBCPP_HAS_NO_UNICODE

template <contiguous_iterator _Iterator> template <contiguous_iterator _Iterator>

_LIBCPP_HIDE_FROM_ABI constexpr bool __parse_sign(_Iterator& __begin) { _LIBCPP_HIDE_FROM_ABI constexpr bool __parse_sign(_Iterator& __begin) {

switch (*__begin) { switch (*__begin) {

case _CharT('-'): case _CharT('-'):

__sign_ = __sign::__minus; __sign_ = __sign::__minus;

break; break;

case _CharT('+'): case _CharT('+'):

__sign_ = __sign::__plus; __sign_ = __sign::__plus;

[31 lines not shown]

# endif // _LIBCPP_HAS_NO_UNICODE

template <contiguous_iterator _Iterator> template <contiguous_iterator _Iterator>

_LIBCPP_HIDE_FROM_ABI constexpr bool __parse_width(_Iterator& __begin, _Iterator __end, auto& __parse_ctx) { _LIBCPP_HIDE_FROM_ABI constexpr bool __parse_width(_Iterator& __begin, _Iterator __end, auto& __parse_ctx) {

if (*__begin == _CharT('0')) if (*__begin == _CharT('0'))

std::__throw_format_error("A format-spec width field shouldn't have a leading zero"); std::__throw_format_error("A format-spec width field shouldn't have a leading zero");

if (*__begin == _CharT('{')) { if (*__begin == _CharT('{')) {

__format::__parse_number_result __r = __format_spec::__parse_arg_id(++__begin, __end, __parse_ctx); __format::__parse_number_result __r = __format_spec::__parse_arg_id(++__begin, __end, __parse_ctx);

__width_as_arg_ = true; __width_as_arg_ = true;

__width_ = __r.__value; __width_ = __r.__value;

__begin = __r.__last; __begin = __r.__last;

return true; return true;

} }

if (*__begin < _CharT('0') || *__begin > _CharT('9')) if (*__begin < _CharT('0') || *__begin > _CharT('9'))

return false; return false;

__format::__parse_number_result __r = __format::__parse_number(__begin, __end); __format::__parse_number_result __r = __format::__parse_number(__begin, __end);

__width_ = __r.__value; __width_ = __r.__value;

_LIBCPP_ASSERT(__width_ != 0, "A zero value isn't allowed and should be impossible, " _LIBCPP_ASSERT(__width_ != 0,

"A zero value isn't allowed and should be impossible, "

"due to validations in this function"); "due to validations in this function");

__begin = __r.__last; __begin = __r.__last;

return true; return true;

} }

template <contiguous_iterator _Iterator> template <contiguous_iterator _Iterator>

_LIBCPP_HIDE_FROM_ABI constexpr bool __parse_precision(_Iterator& __begin, _Iterator __end, auto& __parse_ctx) { _LIBCPP_HIDE_FROM_ABI constexpr bool __parse_precision(_Iterator& __begin, _Iterator __end, auto& __parse_ctx) {

if (*__begin != _CharT('.')) if (*__begin != _CharT('.'))

return false; return false;

++__begin; ++__begin;

if (__begin == __end) if (__begin == __end)

std::__throw_format_error("End of input while parsing format-spec precision"); std::__throw_format_error("End of input while parsing format-spec precision");

if (*__begin == _CharT('{')) { if (*__begin == _CharT('{')) {

__format::__parse_number_result __arg_id = __format_spec::__parse_arg_id(++__begin, __end, __parse_ctx); __format::__parse_number_result __arg_id = __format_spec::__parse_arg_id(++__begin, __end, __parse_ctx);

__precision_as_arg_ = true; __precision_as_arg_ = true;

__precision_ = __arg_id.__value; __precision_ = __arg_id.__value;

__begin = __arg_id.__last; __begin = __arg_id.__last;

return true; return true;

} }

if (*__begin < _CharT('0') || *__begin > _CharT('9')) if (*__begin < _CharT('0') || *__begin > _CharT('9'))

std::__throw_format_error("The format-spec precision field doesn't contain a value or arg-id"); std::__throw_format_error("The format-spec precision field doesn't contain a value or arg-id");

__format::__parse_number_result __r = __format::__parse_number(__begin, __end); __format::__parse_number_result __r = __format::__parse_number(__begin, __end);

__precision_ = __r.__value; __precision_ = __r.__value;

__precision_as_arg_ = false; __precision_as_arg_ = false;

__begin = __r.__last; __begin = __r.__last;

return true; return true;

} }

template <contiguous_iterator _Iterator> template <contiguous_iterator _Iterator>

_LIBCPP_HIDE_FROM_ABI constexpr bool __parse_locale_specific_form(_Iterator& __begin) { _LIBCPP_HIDE_FROM_ABI constexpr bool __parse_locale_specific_form(_Iterator& __begin) {

if (*__begin != _CharT('L')) if (*__begin != _CharT('L'))

return false; return false;

[67 lines not shown]

case '?':

break; break;

# endif # endif

default: default:

return; return;

} }

++__begin; ++__begin;

} }

_LIBCPP_HIDE_FROM_ABI _LIBCPP_HIDE_FROM_ABI int32_t __get_width(auto& __ctx) const {

int32_t __get_width(auto& __ctx) const {

if (!__width_as_arg_) if (!__width_as_arg_)

return __width_; return __width_;

return __format_spec::__substitute_arg_id(__ctx.arg(__width_)); return __format_spec::__substitute_arg_id(__ctx.arg(__width_));

} }

_LIBCPP_HIDE_FROM_ABI _LIBCPP_HIDE_FROM_ABI int32_t __get_precision(auto& __ctx) const {

int32_t __get_precision(auto& __ctx) const {

if (!__precision_as_arg_) if (!__precision_as_arg_)

return __precision_; return __precision_;

return __format_spec::__substitute_arg_id(__ctx.arg(__precision_)); return __format_spec::__substitute_arg_id(__ctx.arg(__precision_));

} }

}; };

// Validates whether the reserved bitfields don't change the size. // Validates whether the reserved bitfields don't change the size.

[304 lines not shown]

libcxx/test/std/utilities/format/format.functions/fill.unicode.pass.cpp

This file was added.

//===----------------------------------------------------------------------===//

// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.

// See https://llvm.org/LICENSE.txt for license information.

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

//===----------------------------------------------------------------------===//

// UNSUPPORTED: c++03, c++11, c++14, c++17

// UNSUPPORTED: libcpp-has-no-incomplete-format

// TODO FMT Evaluate gcc-12 status

// This version runs the test when the platform has Unicode support.

// UNSUPPORTED: libcpp-has-no-unicode

// XFAIL: availability-fp_to_chars-missing

// <format>

// The paper

// P2572R1 std::format fill character allowances

// adds support for Unicode Scalar Values as fill character.

#include <format>

#include "assert_macros.h"

#include "concat_macros.h"

#include "format.functions.common.h"

#include "make_string.h"

#include "string_literal.h"

#include "test_format_string.h"

#include "test_macros.h"

#define SV(S) MAKE_STRING_VIEW(CharT, S)

auto check = []<class CharT, class... Args>(

std::basic_string_view<CharT> expected, test_format_string<CharT, Args...> fmt, Args&&... args) {

std::basic_string<CharT> out = std::format(fmt, std::forward<Args>(args)...);

TEST_REQUIRE(out == expected,

TEST_WRITE_CONCATENATED(

"\nFormat string ", fmt.get(), "\nExpected output ", expected, "\nActual output ", out, '\n'));

};

auto check_exception =

[]<class CharT, class... Args>(

[[maybe_unused]] std::string_view what,

[[maybe_unused]] std::basic_string_view<CharT> fmt,

[[maybe_unused]] Args&&... args) {

TEST_VALIDATE_EXCEPTION(

std::format_error,

[&]([[maybe_unused]] const std::format_error& e) {

TEST_LIBCPP_REQUIRE(

e.what() == what,

TEST_WRITE_CONCATENATED(

"\nFormat string ", fmt, "\nExpected exception ", what, "\nActual exception ", e.what(), '\n'));

TEST_IGNORE_NODISCARD std::vformat(fmt, std::make_format_args<context_t<CharT>>(args...)));

};

template <class CharT>

void test() {

// 1, 2, 3, 4 code unit UFT-8 transitions

check(SV("\u000042\u0000"), SV("{:\u0000^4}"), 42);

check(SV("\u007f42\u007f"), SV("{:\u007f^4}"), 42);

check(SV("\u008042\u0080"), SV("{:\u0080^4}"), 42);

check(SV("\u07ff42\u07ff"), SV("{:\u07ff^4}"), 42);

JMazurkiewiczUnsubmitted

Done

void test() {

- // 1, 2, 3, 4 code unit UFT-8 transitions

+ // 1, 2, 3, 4 code unit UTF-8 transitions

std::cerr << __LINE__ << '\n';

check(SV("\u080042\u0800"), SV("{:\u0800^4}"), 42);

check(SV("\uffff42\uffff"), SV("{:\uffff^4}"), 42);

check(SV("\U0010000042\U00100000"), SV("{:\U00100000^4}"), 42);

check(SV("\U0010ffff42\U0010ffff"), SV("{:\U0010ffff^4}"), 42);

// Examples of P2572R1

check(SV("🤡🤡x🤡🤡🤡"), SV("{:🤡^6}"), SV("x"));

check(SV("🤡🤡🤡"), SV("{:*^6}"), SV("🤡🤡🤡"));

check(SV("12345678"), SV("{:*>6}"), SV("12345678"));

// Invalid Unicode Scalar Values

if constexpr (std::same_as<CharT, char>) {

check_exception("The fill character contains an invalid value", SV("{:\xed\xa0\x80^4}"), 42); // U+D800

check_exception("The fill character contains an invalid value", SV("{:\xed\xa0\xbf^4}"), 42); // U+DBFF

check_exception("The fill character contains an invalid value", SV("{:\xed\xbf\x80^4}"), 42); // U+DC00

check_exception("The fill character contains an invalid value", SV("{:\xed\xbf\xbf^4}"), 42); // U+DFFF

check_exception("The fill character contains an invalid value", SV("{:\xf4\x90\x80\x80^4}"), 42); // U+110000

check_exception("The fill character contains an invalid value", SV("{:\xf4\x90\xbf\xbf^4}"), 42); // U+11FFFF

#ifndef TEST_HAS_NO_WIDE_CHARACTERS

tahonermannUnsubmitted

Done

Suggested tests for ill-formed UTF-8 code unit sequences:

check_exception("???", SV("{:\x80^}"), 42);         // Trailing code unit with no leading one.
check_exception("???", SV("{:\xc0^}"), 42);         // Missing trailing code unit.
check_exception("???", SV("{:\xe0\x80^}"), 42);     // Missing trailing code unit.
check_exception("???", SV("{:\xf0\x80^}"), 42);     // Missing two trailing code units.
check_exception("???", SV("{:\xf0\x80\x80^}"), 42); // Missing trailing code unit.

MordanteAuthorUnsubmitted

Done

I tested this before changing the code in the header. The first two result in "The fill character contains an invalid value", and the third also results in "The fill character contains an invalid value". (I didn't test the fourth and fifth.)

} else {

check_exception("The fill character contains an invalid value", std::wstring_view{L"{:\xd800^4}"}, 42);

check_exception("The fill character contains an invalid value", std::wstring_view{L"{:\xdbff^4}"}, 42);

check_exception("The fill character contains an invalid value", std::wstring_view{L"{:\xdc00^4}"}, 42);

check_exception("The fill character contains an invalid value", std::wstring_view{L"{:\xddff^4}"}, 42);

tahonermannUnsubmitted

Done

These all test lone surrogates. Here is a suggested test for reversed surrogates:

check_exception("???", std::wstring_view{L"{:\xdc00\xd800^}"}, 42);

MordanteAuthorUnsubmitted

Done

This also results in "The format-spec should consume the input or end with a '}'".

# ifndef TEST_SHORT_WCHAR

check_exception("The fill character contains an invalid value", std::wstring_view{L"{:\x00110000^4}"}, 42);

check_exception("The fill character contains an invalid value", std::wstring_view{L"{:\x0011ffff^4}"}, 42);

# endif

#endif // TEST_HAS_NO_WIDE_CHARACTERS

}

int main(int, char**) {

test<char>();

#ifndef TEST_HAS_NO_WIDE_CHARACTERS

test<wchar_t>();

#endif

return 0;

}
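The P2572R1 examples in the test above imply two rules: each fill character counts as one column regardless of how wide it is displayed, and centering puts the extra column on the right. A sketch of that arithmetic for UTF-8 strings follows. `center_with_fill` is a hypothetical helper, not the libc++ implementation, and the caller supplies the estimated display width of the content.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper. `content_width` is the estimated display width of
// `content`; `fill` holds the code units of one fill character.
std::string center_with_fill(std::string_view content, std::size_t content_width,
                             std::size_t width, std::string_view fill) {
  if (content_width >= width)
    return std::string(content); // no padding, and the content is not truncated

  const std::size_t padding = width - content_width;
  const std::size_t left = padding / 2;      // rounded down
  const std::size_t right = padding - left;  // receives the extra column

  std::string out;
  for (std::size_t i = 0; i < left; ++i)
    out += fill;
  out += content;
  for (std::size_t i = 0; i < right; ++i)
    out += fill;
  return out;
}
```

With content `x` (width 1), a target width of 6, and a four-byte fill such as U+1F921, this produces two fill characters on the left and three on the right, which is the shape of the first P2572R1 example in the test.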

libcxx/utils/ci/run-buildbot

[223 lines not shown]

if [ -s ${BUILD_DIR}/generated_output.status ]; then
false		false
fi		fi

# Reject patches that introduce non-ASCII characters or hard tabs.		# Reject patches that introduce non-ASCII characters or hard tabs.
# Depends on LC_COLLATE set at the top of this script.		# Depends on LC_COLLATE set at the top of this script.
! grep -rn '[^ -~]' libcxx/include libcxx/src libcxx/test libcxx/benchmarks \		! grep -rn '[^ -~]' libcxx/include libcxx/src libcxx/test libcxx/benchmarks \
--exclude '*.dat' \		--exclude '*.dat' \
--exclude 'escaped_output.*.pass.cpp' \		--exclude 'escaped_output.*.pass.cpp' \
		--exclude 'fill.unicode.pass.cpp' \
--exclude 'format_tests.h' \		--exclude 'format_tests.h' \
--exclude 'format.functions.tests.h' \		--exclude 'format.functions.tests.h' \
--exclude 'formatter.*.pass.cpp' \		--exclude 'formatter.*.pass.cpp' \
--exclude 'grep.pass.cpp' \		--exclude 'grep.pass.cpp' \
--exclude 'locale-specific_form.pass.cpp' \		--exclude 'locale-specific_form.pass.cpp' \
--exclude 'ostream.pass.cpp' \		--exclude 'ostream.pass.cpp' \
--exclude 'std_format_spec_string_unicode.bench.cpp' \		--exclude 'std_format_spec_string_unicode.bench.cpp' \
--exclude 'underflow.pass.cpp' \		--exclude 'underflow.pass.cpp' \
[430 lines not shown]

Revision Contents

Diff 516084
