Page MenuHomePhabricator

Microsoft's floating-point to_chars powered by Ryu and Ryu Printf
Changes PlannedPublic

Authored by Mordante on Nov 22 2019, 8:21 PM.

Details

Reviewers
ldionne
mclow.lists
EricWF
zoecarver
ulfjack
jorgbrown
lichray
STL_MSFT
Group Reviewers
Restricted Project
Summary

Microsoft would like to contribute its implementation of floating-point to_chars to libc++. This uses the impossibly fast Ryu and Ryu Printf algorithms invented by Ulf Adams at Google. Upstream repos: https://github.com/microsoft/STL and https://github.com/ulfjack/ryu .

Licensing notes: MSVC's STL is available under the Apache License v2.0 with LLVM Exception, intentionally chosen to match libc++. We've used Ryu under the Boost Software License.

This patch contains minor changes from Jorg Brown at Google, to adapt the code to libc++. He verified that it works in Google's Linux-based environment, but then I applied more changes on top of his, so any compiler errors are my fault. (I haven't tried to build and test libc++ yet.) Please tell me if we need to do anything else in order to follow https://llvm.org/docs/DeveloperPolicy.html#attribution-of-changes .

Notes:

  • libc++'s integer charconv is unchanged (except for a small refactoring). MSVC's integer charconv hasn't been tuned for performance yet, so you're not missing anything.
  • Floating-point from_chars isn't part of this patch because Jorg found that MSVC's implementation (derived from our CRT's strtod) was slower than Abseil's. If you're unable to use Abseil or another implementation due to licensing or technical considerations, Microsoft would be delighted if you used MSVC's from_chars (and you can just take it, or ask us to provide a patch like this). Ulf is also working on a novel algorithm for from_chars.
  • This assumes that float is IEEE 32-bit, double is IEEE 64-bit, and long double is also IEEE 64-bit.
  • I have added MSVC's charconv tests (the whole thing: integer/floating from_chars/to_chars), but haven't adapted them to libcxx's harness at all. (These tests will be available in the microsoft/STL repo soon.)
  • Jorg added int128 codepaths. These were originally present in upstream Ryu, and I removed them from microsoft/STL purely for performance reasons (MSVC doesn't support int128; Clang on Windows does, but I found that x64 intrinsics were slightly faster).
  • The implementation is split into 3 headers. In MSVC's STL, charconv contains only Microsoft-written code. xcharconv_ryu.h contains code derived from Ryu (with significant modifications and additions). xcharconv_ryu_tables.h contains Ryu's large lookup tables (they were sufficiently large to make editing inconvenient, hence the separate file). The xmeow.h convention is MSVC's for internal headers; you may wish to rename them.
  • You should consider separately compiling the lookup tables (see https://github.com/microsoft/STL/issues/172 ) for compiler throughput and reduced object file size.
  • See https://github.com/StephanTLavavej/llvm-project/commits/charconv for fine-grained history. (If necessary, I can perform some rebase surgery to show you what Jorg changed relative to the microsoft/STL repo; currently that's all fused into the first commit.)

Questions:

  1. MSVC's STL conventionally uses _Single_uppercase_ugly identifiers, but xcharconv_ryu.h (and tables) also use __double_lowercase_ugly in Ryu-derived code, for ease of search-and-replacing (and comparing) with upstream Ryu. (That file still uses _Single_uppercase_ugly for additions; it looks weird but helped me keep track of what surgery I was performing.) I've compared these files against MSVC's and the diffs are very readable, so I'd like to keep the identifiers following the same convention so our codebases don't diverge too much. Does libc++ want to use __double_lowercase_ugly identifiers? (But apparently you use _Single_uppercase_ugly for template parameters?) If so, I can change the identifiers in this patch, and in microsoft/STL charconv upstream.
  2. I have included Microsoft's Apache+LLVM and Ulf's Boost banners, as they have appeared in the microsoft/STL repo. Would you like something different?

Diff Detail

Unit TestsFailed

TimeTest
3,290 mslibcxx CI Apple back-deployment macosx10.9 > libc++.libcxx::double_include.sh.cpp
Script: -- : 'RUN: at line 12'; /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang++ -c /private/tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/libcxx/test/libcxx/double_include.sh.cpp -o /private/tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/build/x86_64-apple-system-backdeployment-10.9/projects/libcxx/test/libcxx/Output/double_include.sh.cpp.dir/t.tmp.first.o -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk -v --target=x86_64-apple-macosx10.9 -include /tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/build/x86_64-apple-system-backdeployment-10.9/projects/libcxx/__config_site -include /tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/libcxx/test/support/nasty_macros.h -nostdinc++ -I/tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/libcxx/include -I/tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/build/x86_64-apple-system-backdeployment-10.9/projects/libcxx/include/c++build -I/tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/libcxx/test/support -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -std=c++2a -Werror -Wall -Wextra -Wshadow -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-c++11-extensions -Wno-user-defined-literals -Wno-noexcept-type -Wno-atomic-alignment -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -fcoroutines-ts -Werror=thread-safety -Wuser-defined-warnings -fmodules-cache-path=/private/tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/build/x86_64-apple-system-backdeployment-10.9/projects/libcxx/test/libcxx/Output/double_include.sh.cpp.dir/t.tmp/ModuleCache -Wno-macro-redefined -D_LIBCPP_HAS_THREAD_API_PTHREAD -Wno-macro-redefined -D_LIBCPP_ABI_VERSION=1
3,260 mslibcxx CI Apple back-deployment macosx10.9 > libc++.libcxx::min_max_macros.compile.pass.cpp
Script: -- : 'COMPILED WITH'; /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang++ /private/tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/libcxx/test/libcxx/min_max_macros.compile.pass.cpp -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk -v --target=x86_64-apple-macosx10.9 -include /tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/build/x86_64-apple-system-backdeployment-10.9/projects/libcxx/__config_site -include /tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/libcxx/test/support/nasty_macros.h -nostdinc++ -I/tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/libcxx/include -I/tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/build/x86_64-apple-system-backdeployment-10.9/projects/libcxx/include/c++build -I/tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/libcxx/test/support -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -std=c++2a -Werror -Wall -Wextra -Wshadow -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-c++11-extensions -Wno-user-defined-literals -Wno-noexcept-type -Wno-atomic-alignment -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -fcoroutines-ts -Werror=thread-safety -Wuser-defined-warnings -fmodules-cache-path=/private/tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/build/x86_64-apple-system-backdeployment-10.9/projects/libcxx/test/libcxx/Output/min_max_macros.compile.pass.cpp.dir/t.tmp/ModuleCache -Wno-macro-redefined -D_LIBCPP_HAS_THREAD_API_PTHREAD -Wno-macro-redefined -D_LIBCPP_ABI_VERSION=1 -fsyntax-only
3,200 mslibcxx CI Apple back-deployment macosx10.9 > libc++.libcxx::no_assert_include.compile.pass.cpp
Script: -- : 'COMPILED WITH'; /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang++ /private/tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/libcxx/test/libcxx/no_assert_include.compile.pass.cpp -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk -v --target=x86_64-apple-macosx10.9 -include /tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/build/x86_64-apple-system-backdeployment-10.9/projects/libcxx/__config_site -include /tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/libcxx/test/support/nasty_macros.h -nostdinc++ -I/tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/libcxx/include -I/tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/build/x86_64-apple-system-backdeployment-10.9/projects/libcxx/include/c++build -I/tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/libcxx/test/support -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -std=c++2a -Werror -Wall -Wextra -Wshadow -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-c++11-extensions -Wno-user-defined-literals -Wno-noexcept-type -Wno-atomic-alignment -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -fcoroutines-ts -Werror=thread-safety -Wuser-defined-warnings -fmodules-cache-path=/private/tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/build/x86_64-apple-system-backdeployment-10.9/projects/libcxx/test/libcxx/Output/no_assert_include.compile.pass.cpp.dir/t.tmp/ModuleCache -Wno-macro-redefined -D_LIBCPP_HAS_THREAD_API_PTHREAD -Wno-macro-redefined -D_LIBCPP_ABI_VERSION=1 -fsyntax-only
16,320 mslibcxx CI Apple back-deployment macosx10.9 > libc++.libcxx/modules::stds_include.sh.cpp
Script: -- : 'RUN: at line 29'; /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang++ -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk -v --target=x86_64-apple-macosx10.9 -include /tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/build/x86_64-apple-system-backdeployment-10.9/projects/libcxx/__config_site -include /tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/libcxx/test/support/nasty_macros.h -nostdinc++ -I/tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/libcxx/include -I/tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/build/x86_64-apple-system-backdeployment-10.9/projects/libcxx/include/c++build -I/tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/libcxx/test/support -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -std=c++2a -Werror -Wall -Wextra -Wshadow -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-c++11-extensions -Wno-user-defined-literals -Wno-noexcept-type -Wno-atomic-alignment -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -fcoroutines-ts -Werror=thread-safety -Wuser-defined-warnings -fmodules-cache-path=/private/tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/build/x86_64-apple-system-backdeployment-10.9/projects/libcxx/test/libcxx/modules/Output/stds_include.sh.cpp.dir/t.tmp/ModuleCache -Wno-macro-redefined -D_LIBCPP_HAS_THREAD_API_PTHREAD -Wno-macro-redefined -D_LIBCPP_ABI_VERSION=1 -fmodules -fcxx-modules -fsyntax-only -std=c++03 -DINVALIDATE_CACHE_CXX03 /private/tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/libcxx/test/libcxx/modules/stds_include.sh.cpp
1,400 mslibcxx CI Apple back-deployment macosx10.9 > libc++.libcxx/utilities/charconv/charconv_to_chars::availability.fail.cpp
Script: -- : 'COMPILED WITH'; /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang++ /private/tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/libcxx/test/libcxx/utilities/charconv/charconv.to.chars/availability.fail.cpp -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk -v --target=x86_64-apple-macosx10.9 -include /tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/build/x86_64-apple-system-backdeployment-10.9/projects/libcxx/__config_site -include /tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/libcxx/test/support/nasty_macros.h -nostdinc++ -I/tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/libcxx/include -I/tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/build/x86_64-apple-system-backdeployment-10.9/projects/libcxx/include/c++build -I/tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/libcxx/test/support -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -std=c++2a -Werror -Wall -Wextra -Wshadow -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-c++11-extensions -Wno-user-defined-literals -Wno-noexcept-type -Wno-atomic-alignment -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -fcoroutines-ts -Werror=thread-safety -Wuser-defined-warnings -fmodules-cache-path=/private/tmp/buildkite-builds/mbp-2-local-1/llvm-project/libcxx-ci/build/x86_64-apple-system-backdeployment-10.9/projects/libcxx/test/libcxx/utilities/charconv/charconv.to.chars/Output/availability.fail.cpp.dir/t.tmp/ModuleCache -Wno-macro-redefined -D_LIBCPP_HAS_THREAD_API_PTHREAD -Wno-macro-redefined -D_LIBCPP_ABI_VERSION=1 -fsyntax-only -Wno-error -Xclang -verify -Xclang -verify-ignore-unexpected=note -ferror-limit=0
View Full Test Results (9 Failed)

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
STL_MSFT added inline comments.Nov 25 2019, 6:01 PM
libcxx/CREDITS.TXT
16

My apologies for the incomplete credit! I've added that, using your phrasing, and avoiding wrapping to the next line.

Regarding the comments - in microsoft/STL I generally preserved your comments whenever possible, adjusting only identifier names. In a few places, I chopped out both code and comments simultaneously (e.g. the RYU_OPTIMIZE_SIZE codepath). I erred on the side of preserving too many comments, instead of too few. I can definitely drop/alter comments that don't make sense here, just point them out.

libcxx/include/charconv
632

In MSVC's STL, std::min is defined in <algorithm>, a heavyweight header that isn't otherwise needed by <charconv>. This appears to be the case for libcxx as well.

If you want, I can include <algorithm> and use std::min, although I recommend against doing so due to the throughput cost, if <algorithm> isn't already being dragged in.

In the general precision codepath, I use std::lower_bound() and std::find_if(), as in MSVC's STL they're defined in a central internal header. In libcxx, they appear to be defined in <algorithm>, so I'm not sure how this compiled for Jorg.

637

It requires chars_format and to_chars_result to be defined. (In MSVC's STL, a fourth header xcharconv.h defines just those types, so both charconv and xcharconv_ryu.h can include it, but Jorg eliminated that complexity as you already had definitions of these types in charconv.)

If you want, I can move this almost to the top of the file, after the enum and return type definitions.

644

Thanks! I'll mark your comment as Done; please let me know if your patch lands before to_chars does, and I can eliminate _Bit_cast accordingly. (I assume that __libcpp_bitcast will be the internal, callable-from-any-Standard-mode counterpart of C++20 bit_cast; microsoft/STL has similar conventions.)

716

Done; I changed all occurrences in charconv and xcharconv_ryu.h. (MSVC's compiler just directly recognizes is_same_v for improved throughput.)

libcxx/include/xcharconv_ryu.h
75

Dropped // clang-format off, // clang-format on in both xcharconv_ryu.h and xcharconv_ryu_tables.h. (We clang-format all of microsoft/STL and it is an amazing time-saver. We usually suppress it only when clang-format really damages the code. In this case, we suppressed it to avoid adding yet another massive set of diffs with upstream Ryu.)

194

Additional notes: the C++ Standard recently made this well-defined for signed destinations, but it has always been well-defined for unsigned destinations. There's an MSVC compiler option (/RTCc) that asserts when such conversions are value-modifying, even when performed with casts, but because this breaks Standard-conformant code, MSVC's STL explicitly does not support that compiler option.

225

I suspect that _WIN64 isn't the right macro to test, though. For microsoft/STL, _WIN64 is the macro defined for all 64-bit architectures (i.e. x64 and ARM64).

974

Good catch; your Java implementation isn't present here, so this is a dangling reference. I'll simply drop this comment; anyone who needs to delve into the tables should be looking at your codebase (which they can find given the URL and DERIVED FROM f2s.c comments).

976

Ah, excellent! I think I found the relevant commit. I haven't taken an update from Ryu upstream since 2019-04-02, so I'll file a microsoft/STL issue to do that. It's a labor-intensive process and I haven't figured out a better way to do it yet (all of the PRs you've accepted have been enormously helpful, but the need to uglify literally every identifier makes rebasing difficult). Keeping libcxx in sync will be additional work, but hopefully not too much, if we can keep the divergence between microsoft/STL and libcxx small.

Would libcxx prefer to go ahead with porting the current implementation (and figure out how to port enhancements later), or should we prioritize updating microsoft/STL and then resume this port? As the float lookup tables are small, I believe that it should be OK to ship them like MSVC has been doing, for the near future.

1026

Changed to // __builtin_ctz doesn't appear to be faster here. and similarly for 64-bit. I'll submit this to ulfjack/ryu too.

STL_MSFT updated this revision to Diff 230993.Nov 25 2019, 6:08 PM
STL_MSFT marked 7 inline comments as done.
STL_MSFT added reviewers: ulfjack, jorgbrown.

First round of changes addressing code review comments.

expnkx added a subscriber: expnkx.EditedNov 27 2019, 8:50 AM

PLEASE DON'T ACCEPT THIS STUPID pull request.

assuming long double is 64 bits is stupid. Also why I do not use VC. 64 bits long double and no any kind of assembly allowed is a joke.
They could not prove the algorithm is right as well since Ryu only provides a small part of algorithm.

charconv is a mistake. It does not worth the effort for implementing this. It does not make your code faster since most problems are iostreams and stdio.h's problem. There are NO algorithms on the world to make it correct including Ryu (Ryu is not panacea although STL keeps spreading false information). LET IT DIE PLEASE.

assuming long double is 64 bits is stupid.

I agree, actually. We'll fix this shortly afterward.

Also why I do not use VC. 64 bits long double and no any kind of assembly allowed is a joke.
They could not prove the algorithm is right as well since Ryu only provides a small part of algorithm.

Can you prove the algorithm wrong? I was on the way to my own charconv implementation, and forced to give it up because the implementation in VC was both faster and more correct. (Specifically, there were rare cases where VC was producing one less digit.)

charconv is a mistake. It does not worth the effort for implementing this. It does not make your code faster since most problems are iostreams and stdio.h's problem.

But both iostream and stdio.h are much, much slower than this code. A factor of 10 or more.

There are NO algorithms on the world to make it correct including Ryu (Ryu is not panacea although STL keeps spreading false information).

You sound very convinced of this, so I'm hoping you have some examples where this implementation produces incorrect results? Certainly, we could use the test cases.

Certainly, we could use the test cases.

Test cases beyond the 5 MB here are very welcome.

Aside from superficial bugs which I was easily able to fix (like printing "4.") and a bug that I found in Ulf's earliest Ryu Printf code which he fixed promptly, I have been unable to find any bugs in either Ryu or Ryu Printf despite months of intensive testing. The test suite in this review, which is derived from Ulf's, contains hardcoded test cases exercising all of the tricky scenarios that I'm aware of. It also contains randomized coverage, verifying that scientific shortest (i.e. Ryu) round-trips, although it doesn't directly verify the shortest property. (That should now be possible, as I have an explicit list of anomalous powers of 2 thanks to Rick Regan's Exploring Binary blog; those powers are tested.) The randomized coverage also compares fixed shortest integers (i.e. Ryu with a long division or Ryu Printf fallback) against sprintf(), verifying that all digits are printed correctly. Those are in test_floating_prefix(). In test_floating_precision_prefix(), randomized coverage verifies that fixed, scientific, and general precision emit the same output as printf(). (This doesn't test rounding, primarily because Microsoft's UCRT contained a rounding bug that Ryu Printf is immune to, but the hardcoded test cases carefully verify rounding.) Finally, during my original testing, I took the randomized round-tripping code and modified it to run tests in parallel, forever, and validated trillions of values on a many-core machine for days, with zero failures. Upstream, Ulf's readme explains how he exhaustively checked all 32-bit floats for shortest round-trip (validating both properties).

The code that's most likely to be incorrect is the code that I've layered on top of Ryu and Ryu Printf; e.g. the _Can_use_ryu switching criterion is something that I had to derive from first principles, and similarly for the general precision code that avoids trial formatting with a "triangular table" and then formats into a local buffer before trimming zeroes. That said, I wrote all of that code with maximum attention to detail, and I believe I got it right (verified by tests).

Aside from randomized testing directly verifying the shortest property, I believe that the other interesting area for testing would be fuzz testing; I experimented a bit with this (finding no issues).

ldionne requested changes to this revision.Dec 10 2019, 2:26 PM

@STL_MSFT @jorgbrown @ulfjack

Thanks a lot for your work (and anyone else that's participated). I've left a few comments, but obviously this patch is huge. @STL_MSFT How do you want to proceed with this? We can either back-and-forth on this review until we're happy and we can ship it (my preference), or if you prefer someone from libc++ can take over the patch from here. My preference is clearly the former, but I want to avoid annoying you in case you don't have time for this.

If you want to stay responsible for the patch, do you need help with running the libc++ test suite under various language versions? Right now I see several language conformance errors popping up (like using inline variables prior to C++17), and those are going to be easy to fix if you can just compile locally. Otherwise I can do that and point it to you, but it's not very efficient.

PLEASE DON'T ACCEPT THIS STUPID pull request.

Please refrain from commenting on libc++/libc++abi reviews until you can do so without being rude. Making technical points about a specification or an implementation thereof is OK, calling other people's work stupid isn't.

libcxx/include/xcharconv_ryu.h
94

As terrible as it seems, we would technically need to sprinkle _LIBCPP_INLINE_VISIBILITY on top of everything that we don't want to expose in the ABI of libc++. This will give those symbols hidden visibility and internal linkage, basically, allowing us to modify the implementation of these functions

@STL_MSFT Actually, how do you guys handle ABI stability in your STL nowadays? Do you avoid changing the preconditions/postconditions of even implementation-detail functions? I think that is what libstdc++ does.

libcxx/include/xcharconv_ryu_tables.h
57

We need to make this code (and all other code in headers too) conditional on C++17. libc++ needs to work in C++14 and older language standards too, and

  1. we don't want to provide charconv prior to C++17, and
  2. we can't use inline variables before C++17, etc.

Right now, when I compile with your patch applied in C++14 mode, I get a bunch of warnings/errors about inline variables being used but not supported.

libcxx/test/std/utilities/charconv/charconv.msvc/double_fixed_precision_to_chars_test_cases_1.hpp
2

Any reason why this is in charconv/charconv.msvc instead of just charconv/? Is it because these tests are not adapted to libc++'s style?

libcxx/test/std/utilities/charconv/charconv.msvc/double_scientific_precision_to_chars_test_cases_1.hpp
40

We normally use regular include guards, it would be great to use that instead.

This revision now requires changes to proceed.Dec 10 2019, 2:26 PM
STL_MSFT marked 4 inline comments as done.Dec 11 2019, 11:47 PM

I've left a few comments, but obviously this patch is huge.

Yeah (and I'm regretting adding the 5 MB of tests to this patch, since I can barely type into Phabricator; I suspect it's making the web interface very slow). It'll be bigger if you want from_chars()! :-)

@STL_MSFT How do you want to proceed with this? We can either back-and-forth on this review until we're happy and we can ship it (my preference),
or if you prefer someone from libc++ can take over the patch from here. My preference is clearly the former, but I want to avoid annoying you in case you don't have time for this.

I humbly request the latter (take over the patch) as I'm currently time-limited due to getting the microsoft/STL repo up and running for community contributions. As I'm not familiar with working in the libc++ environment, iterating with me in the loop will be extremely time-consuming for all involved. My hope is that, given that Jorg has gotten the code working on at least one libc++ platform, you can adapt the tests, and use them to verify that further changes for other platforms don't disrupt the code, which should make iterating on the code much faster.

I am, however, available to answer questions, review diffs to the code, or change microsoft/STL in ways that will make your life easier.

libcxx/include/xcharconv_ryu.h
94

Yeah, we're careful about precondition/postcondition changes. If we need to do that to implementation-detail functions, we ensure that their mangled names change (often by numbering them with 2, 3, etc. and marking the old names as "zombies" in a special test).

I'm not sure how much you want to avoid mechanical changes like these visibility macros, in addition to things like our differing assert macros, std qualification macros, etc. If aligning with microsoft/STL sources is very important, we could decide on a common layer of macros, so that we could keep the diffs to a minimum. AFAIK, no STLs have done this before (sharing significant code) so we're in somewhat new territory. The alternative is to just make the mechanical changes in libcxx, and if any interesting optimizations appear in upstream Ryu or microsoft/STL, port them by hand.

libcxx/include/xcharconv_ryu_tables.h
57

Makes sense - this is all guarded for C++17 mode in microsoft/STL, but I didn't notice the lack of guards here. If I understand the issue correctly, it's pre-existing in libc++ charconv (which immediately begins providing machinery instead of checking the standard mode), but it becomes a problem due to these inline variables.

libcxx/test/std/utilities/charconv/charconv.msvc/double_fixed_precision_to_chars_test_cases_1.hpp
2

Correct - it's an exact copy of our tests, with no attempt made to adapt them to libc++'s style yet, or to compile them etc. The intention is for them to be properly adapted and moved, instead of committed like this.

libcxx/test/std/utilities/charconv/charconv.msvc/double_scientific_precision_to_chars_test_cases_1.hpp
40

Agreed. We could even make that change in microsoft/STL's tests if you want to reduce divergence.

expnkx added a comment.EditedDec 11 2019, 11:57 PM

assuming long double is 64 bits is stupid.

I agree, actually. We'll fix this shortly afterward.

Also why I do not use VC. 64 bits long double and no any kind of assembly allowed is a joke.
They could not prove the algorithm is right as well since Ryu only provides a small part of algorithm.

Can you prove the algorithm wrong? I was on the way to my own charconv implementation, and forced to give it up because the implementation in VC was both faster and more correct. (Specifically, there were rare cases where VC was producing one less digit.)

charconv is a mistake. It does not worth the effort for implementing this. It does not make your code faster since most problems are iostreams and stdio.h's problem.

But both iostream and stdio.h are much, much slower than this code. A factor of 10 or more.

There are NO algorithms on the world to make it correct including Ryu (Ryu is not panacea although STL keeps spreading false information).

You sound very convinced of this, so I'm hoping you have some examples where this implementation produces incorrect results? Certainly, we could use the test cases.

The problem is that you won't just avoid iostream/stdio.h for your outputting. Even speed up 10x on parsing strings/double make no sense in real world because you still have to use stdio and iostream for doing the IO part which basically eliminates all the gain from charconv.

You cannot avoid clocale. You cannot avoid iostates. You cannot avoid virtual function calls. You cannot avoid all the bad stuff iostream and stdio.h provide. You cannot avoid a lot of some uses std::endl. You cannot avoid exception safety issues of iostates. You cannot avoid the error ignoring part of stdio.h. stdio.h (libc) in general is the number 1 violator of zero-overhead principle, despite Herb Sutter spreads misinformation of exceptions and RTTI. You cannot avoid incorrect type for operator<< overloading bloats error message for the entire page.

I would like to just remove iostream and stdio.h in the future instead of just fixing one part. The entire IO libraries in C and C++ need to be removed.

Not mentioning return-to-libc vulnerabilities and accepts this as implementation of CRT would make it even worse.

Here are my benchmarks to show why charconv is a mistake since it reduces your performance from the optimal implementation for at least 4 times.
https://bitbucket.org/ejsvifq_mabmip/fast_io/src/default/

g++ (cqwrteur) 10.0.0 20191031 (experimental)

Copyright (C) 2019 Free Software Foundation, Inc.

This is free software; see the source for copying conditions. There is NO

warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

cqwrteur@DESKTOP-7H7UHQ9:~/fast_io/examples/build$ ./output_10M_size_t

std::FILE*: 0.56558740s

std::ofstream: 0.57254780s

std::ofstream with tricks: 0.37952570s

std::to_chars + ofstream rdbuf tricks: 0.16530360s

std::to_chars + obuf: 0.12705310s

obuf: 0.07508470s

Charconv does not save you in any case.

charconv is a mistake. It has a terrible API design. It encourages buffer overflow. It does not use exceptions as error reporting mechanism and the error code can easily be ignored. There is no way you can allocate the correct size of your buffer to make it portable. It does not accept contiguous_iterator which is silly. The base stuff is not constexpr which means the library cannot generate base table for each base which kills your performance for any other base rather than base 10. (base3,base36 for example).

It is funny how WG21 accepts stuff like charconv in 20 minutes even there are no algorithms in the world to correctly implement this and no implementations are available where things like concepts, modules, coroutines, herbceptions, reflections, metaclasses have to be delayed for 20 years.

charconv is a perfect example of why C++ keeps bleeding users because the decision-makers in WG21 do not understand the basic theories of computer science.

What's wrong with adding unchecked_push_back to std::vector??? No, they don't since they never seriously take an architecture course in the university and do not understand what is branch prediction. Branch prediction is to reduce the penalty of a branch, NOT to eliminate the branch instruction itself which means branch will always take a cycle.

BTW, I implement full of Ryu algorithm and what I know is that Ryu is not a complete algorithm. It is still in its early phase and far from actual usable, no long double, for example, no full implementation string to double parses.

https://queue.acm.org/detail.cfm?id=3212479

De facto, neither C++ nor C cares about performance. Today's C++ abstraction machine does not understand page tables, multi processes, virtual memory space, GPU which are basically all the modern hardware implements. It has been reported std::vector can be 70 times slower than just abusing page tables by niall douglas.

ivafanas added inline comments.
libcxx/include/charconv
739

As I remember if constexpr is available since C++17:
https://en.cppreference.com/w/cpp/language/if

And charconv header was made usable with C++11 in libc++:
https://reviews.llvm.org/D59598

May be it is a good idea to save header compatibility with C++11.

STL_MSFT marked an inline comment as done.Mar 4 2020, 7:01 PM
STL_MSFT added inline comments.
libcxx/include/charconv
739

MSVC and Clang actually support if constexpr in earlier language versions (I am not sure about C++11 since MSVC supports only C++14), with a warning that can be suppressed.

I don't understand why someone would want to use an old language mode with a new library mode, but this shouldn't be a blocking issue for that.

ivafanas added inline comments.Mar 5 2020, 7:40 AM
libcxx/include/charconv
739

Hi,

MSVC and Clang actually support if constexpr in earlier language versions

Official documentation says that GCC 5.0+ is supported but @EricWF noted here about GCC 5.1+. According to cppreference if constexpr is supported since GCC 7+:
https://en.cppreference.com/w/cpp/compiler_support

If things haven't changed too much there might be GCC 5.* or GCC 6.* with C++11 but without if constexpr.

I don't understand why someone would want to use an old language mode with a new library mode

There is an explanation why libc++ needs charconv backport for C++11:
https://reviews.llvm.org/D59598

this shouldn't be a blocking issue for that

I'm not an expert in libc++ and do not have strong opinion.
Suppose that libc++ developers should have deep knowledge on that.

ldionne added a subscriber: lichray.

@lichray You expressed interest in this patch in Prague -- would you be interested in helping review it?

ldionne added inline comments.Mar 5 2020, 7:43 AM
libcxx/include/charconv
739

@ivafanas

I don't think it's reasonable or useful to backport this huge patch to C++11. We'll just disable the feature pre C++17.

ivafanas added inline comments.Mar 5 2020, 8:01 AM
libcxx/include/charconv
739

Understood.
Thank you!

lichray requested changes to this revision.Mar 5 2020, 9:13 AM

I suggest moving the core parts of the implementation to one or two translation units. The integer portion of <charconv> isn't header-only either. One part of the issue is that, if users build their project with debug, they probably don't want the whole <charconv> facilities to be debugable.

libcxx/include/charconv
132

The integer portion needs to work in C++11.
We may want to use char_format::fixed default precision of to_chars in implementing std::to_string as well, but in that case, we can create some shims.

971

It was reported some buggy gcc versions only accept errc(). Worth a fix?

1122

Maybe move it to some compilable utils that are controlled by cmake?

libcxx/include/xcharconv_ryu.h
58

__traits_base::__width also want this.
But I prefer to give it a different names because if we name these routines like this, I can foresee users pull in <charconv> to use MS intrinsics.

120

Do they need to be initialized to 0?

190

IIUC Clang supports _umul128 https://reviews.llvm.org/D25353

986

You can use UINT64_C to be warning-free.

1473

Can code like this reuse integer to_chars?

dnoor added a subscriber: dnoor.Jul 1 2020, 2:06 PM

@STL_MSFT How do you want to proceed with this? We can either back-and-forth on this review until we're happy and we can ship it (my preference),
or if you prefer someone from libc++ can take over the patch from here. My preference is clearly the former, but I want to avoid annoying you in case you don't have time for this.

I humbly request the latter (take over the patch) as I'm currently time-limited due to getting the microsoft/STL repo up and running for community contributions. As I'm not familiar with working in the libc++ environment, iterating with me in the loop will be extremely time-consuming for all involved. My hope is that, given that Jorg has gotten the code working on at least one libc++ platform, you can adapt the tests, and use them to verify that further changes for other platforms don't disrupt the code, which should make iterating on the code much faster.

I am, however, available to answer questions, review diffs to the code, or change microsoft/STL in ways that will make your life easier.

@STL_MSFT I'm busy implementing the format header so I'll need float to string conversion code at some point. Are you fine by me commandeering this patch? Do you know whether there are important changes at Microsoft's side that aren't in this patch?

Are you fine by me commandeering this patch?

Please go ahead!

Do you know whether there are important changes at Microsoft's side that aren't in this patch?

We've made a few changes upstream, fixing a couple of bugs in extreme corner cases and optimizing things slightly. https://github.com/microsoft/STL/commits/main/stl/inc/charconv shows what's changed.

There have been a couple of algorithmic developments very recently - for parsing (from_chars), the Eisel-Lemire algorithm is much faster than the technique used here. I would have used it if it were available back then! For formatting, Dragonbox is reportedly slightly faster, but I believe it would be a significant undertaking to replace Ryu/Ryu Printf in this code. (There's the core algorithms, the changes to implement bounds checking, and then the layer that handles fixed/general notation.)

Are you fine by me commandeering this patch?

Please go ahead!

Thanks!

Do you know whether there are important changes at Microsoft's side that aren't in this patch?

We've made a few changes upstream, fixing a couple of bugs in extreme corner cases and optimizing things slightly. https://github.com/microsoft/STL/commits/main/stl/inc/charconv shows what's changed.

Thanks.

There have been a couple of algorithmic developments very recently - for parsing (from_chars), the Eisel-Lemire algorithm is much faster than the technique used here. I would have used it if it were available back then! For formatting, Dragonbox is reportedly slightly faster, but I believe it would be a significant undertaking to replace Ryu/Ryu Printf in this code. (There's the core algorithms, the changes to implement bounds checking, and then the layer that handles fixed/general notation.)

I wasn't aware of the Eisel-Lemire algorithm it looks interesting, thanks. I had a look at Dragonbox before. I was considering whether to implement Dragonbox or Ryu until Louis pointed me at this patch.

Mordante commandeered this revision.Feb 13 2021, 6:05 AM
Mordante added a reviewer: STL_MSFT.
Mordante updated this revision to Diff 323554.Feb 13 2021, 6:14 AM

Rebase against main.

Herald added a reviewer: Restricted Project. · View Herald TranscriptFeb 13 2021, 6:14 AM
Mordante planned changes to this revision.Feb 13 2021, 6:15 AM
Mordante updated this revision to Diff 323556.Feb 13 2021, 7:30 AM

Add the new headers to CMake and the module map.

Mordante planned changes to this revision.Feb 13 2021, 7:33 AM
Mordante added inline comments.
libcxx/include/xcharconv_ryu.h
2

This file needs to be renamed to a proper libc++ name.

libcxx/include/xcharconv_ryu_tables.h
2

This file needs to be renamed to a proper libc++ name.

Mordante updated this revision to Diff 325010.Feb 19 2021, 9:22 AM
  • Update the header tests.
  • Apply Microsoft's changes of e4bc00e70cbb539f90b803a64a31f0259e21f28
  • Include the unit test with a hack, just to avoid code changes.
  • Add several TODO's for thing that need to be fixed before the final approval.
  • Disable and modify some code.
    • Parts of these changes would nice to be upstreamed. I'll create a PR for MSFT's STL
    • Parts are work-around for unimplemented features in libc++.
Mordante planned changes to this revision.Feb 19 2021, 9:23 AM
Mordante updated this revision to Diff 325165.Feb 20 2021, 12:04 AM
  • add guards for C++17. This fixes several failing unit tests.
  • Removed _Charconv_digits[] from C++03.
Mordante planned changes to this revision.Feb 20 2021, 12:05 AM
Mordante updated this revision to Diff 325189.Feb 20 2021, 4:30 AM
  • Re-add an colon after UNSUPPORTED, seems I fat fingered the delete after testing, properly disables the nodes prior to C++17.
  • Disable for the following builds:
    • Single-treaded
    • No random device
    • No locale This should be looked at later, but for now just want to get all builds in the green.
Mordante planned changes to this revision.Feb 20 2021, 4:41 AM
Mordante updated this revision to Diff 325209.Feb 20 2021, 8:38 AM

Mark the functions not available on older Mac versions. This should fix
Apple back-deployment macosx10.9, the last build failing to pass.

Mordante planned changes to this revision.Feb 20 2021, 8:38 AM
Mordante updated this revision to Diff 325324.Feb 21 2021, 11:03 AM

_LIBCPP_NODISCARD_AFTER_CXX17 -> [[nodiscard]] compilers should support it by now, also the AFTER_CXX17 version seems not to be required
Add _LIBCPP_INLINE_VISIBILITY to the functions

Mordante planned changes to this revision.Feb 21 2021, 11:05 AM

@ldionne should we enabling floating point support by a CMake configuration option which is ON by default. The tables aren't small and it could add bloat for embedded devices.