This is an archive of the discontinued LLVM Phabricator instance.

[libc] add atof, strtof and strtod
ClosedPublic

Authored by michaelrj on Sep 3 2021, 2:30 PM.

Details

Summary

Add the string to floating point conversion functions.
Long doubles aren't supported yet, but floats and doubles are. The
primary algorithm used is the Eisel-Lemire ParseNumberF64 algorithm,
with the Simple Decimal Conversion algorithm as backup.

Links for more information on the algorithms:

Number Parsing at a Gigabyte per Second, Software: Practice and
Experience 51 (8), 2021 (https://arxiv.org/abs/2101.11408)
https://nigeltao.github.io/blog/2020/eisel-lemire.html
https://nigeltao.github.io/blog/2020/parse-number-f64-simple.html

Diff Detail

Event Timeline

michaelrj created this revision.Sep 3 2021, 2:30 PM
michaelrj requested review of this revision.Sep 3 2021, 2:30 PM
michaelrj updated this revision to Diff 370679.Sep 3 2021, 3:56 PM

Fixed the issue where mantissa was never initialized.

michaelrj updated this revision to Diff 371215.Sep 7 2021, 4:27 PM

added a ton more tests for the hexadecimal path, and ended up fixing a lot of bugs along the way. Mainly issues related to how rounding was handled. Now the rounding is correct for all of the tests in WorkingExamples.

michaelrj updated this revision to Diff 373099.Sep 16 2021, 4:14 PM

add support for decimal conversion

Just did a quick pass. Will do a detailed review when it is final.

libc/src/__support/detailed_powers_of_ten.h
19

We should ideally have the script which produced these constants checked in. I encourage you to look at https://www.sollya.org/ using which one can trivially produce these constants. We can check-in sollya scripts. For example: https://github.com/llvm/llvm-project/blob/main/libc/utils/mathtools/expm1f.sollya

libc/src/__support/str_conv_utils.h
242 ↗(On Diff #373099)

Using fputil:UInt is probably an overkill here if all you want is a 128-bit integer. You should probably use __uint128_t. For the access to the individual 32-bits words, you can use a MutableArrayRef: https://github.com/llvm/llvm-project/blob/main/libc/utils/CPP/ArrayRef.h#L123.

So, something like this:

__uint128_t finalApprox = mantissa;
MutableArrayRef<uint32_t> finalApproxWords(reinterpret_cast<uint32_t *>(&finalApprox, 4);

To reduce verbosity, you can may be define:

using UInt128Words = MutableArrayRef<uint32_t>;
static inline UInt128Words makeUInt128Words(__uint128_t *val) {
  return UInt128Words(reinterpret_cast<uint32_t *>(val), 4);
}

That said, I will leave it up to you to decide which is more readable.

michaelrj updated this revision to Diff 373384.Sep 17 2021, 7:18 PM
michaelrj marked an inline comment as done.

change to using __uint128_t instead of UInt<128>

michaelrj updated this revision to Diff 373708.Sep 20 2021, 2:19 PM

rewrite strtof tests to be more readable

michaelrj updated this revision to Diff 376385.Sep 30 2021, 3:30 PM

Finish fallback algorithm and add unit tests.

michaelrj retitled this revision from [libc][WIP] add atof, strtof, strtod, and strtold to [libc] add atof, strtof and strtod.Sep 30 2021, 3:30 PM
michaelrj edited the summary of this revision. (Show Details)
michaelrj updated this revision to Diff 376391.Sep 30 2021, 3:48 PM

add atof, since I forgot it previously

michaelrj updated this revision to Diff 377046.Oct 4 2021, 3:55 PM

fix a comment

lntue added a subscriber: lntue.Oct 5 2021, 6:23 AM
michaelrj updated this revision to Diff 377372.Oct 5 2021, 3:37 PM

fix bugs uncovered by the new tester.

lntue added inline comments.Oct 5 2021, 4:34 PM
libc/test/src/__support/string_to_float_comparison_test.cpp
73

Can these test re-use the testing infra from FPBits and ULP functions?

michaelrj edited the summary of this revision. (Show Details)Oct 6 2021, 4:16 PM
michaelrj updated this revision to Diff 377720.Oct 6 2021, 4:20 PM

add a link to the original paper for Eisel-Lemire

update the large scale tester to give more useful feedback, and fix some problems discovered using it.

Current results from running the tester on the full "parse number fxx test data" set:
Total failed conversions: 0
Total conversions off by the least significant bit: 9112
5242 float low
3804 float high
59 double low
7 double high
Total lines: 5268191

runtime: ~10s when built in debug mode.

michaelrj added inline comments.Oct 6 2021, 4:41 PM
libc/test/src/__support/string_to_float_comparison_test.cpp
73

I'll look into that for the future when I'm cleaning this test up for running on the bot, but for the moment this is more of a quick and dirty thing that I'm using as a manual test.

michaelrj updated this revision to Diff 378041.Oct 7 2021, 4:33 PM

finalize the decimal to float algorithm. Total failed conversions of any type is down to 0. Next task is cleaning up the CL.

michaelrj updated this revision to Diff 378379.Oct 8 2021, 4:33 PM

Clean up a lot of the mess made while trying to complete the algorithm.

michaelrj updated this revision to Diff 380094.Oct 15 2021, 1:25 PM

add fuzzing test and fix some bugs discovered by it.

michaelrj updated this revision to Diff 380521.Oct 18 2021, 2:31 PM

added another test found via fuzzing and cleaned up a few typos

michaelrj updated this revision to Diff 380529.Oct 18 2021, 3:07 PM

rename the hex conversion functions in the comparison test to be more clear

michaelrj updated this revision to Diff 380536.Oct 18 2021, 3:53 PM

add a todo for the followup patch so that I can land this.

This revision was not accepted when it landed; it landed in state Needs Review.Oct 18 2021, 4:10 PM
This revision was automatically updated to reflect the committed changes.