This is an archive of the discontinued LLVM Phabricator instance.

[libc] refactor atof string parsing
ClosedPublic

Authored by michaelrj on Nov 2 2021, 11:16 AM.

Details

Summary

Split the code for parsing hexadecimal floating point numbers from the
code for parsing the decimal floating point numbers so that the parsing
can be faster for both of them.

This decreases the time for the benchmark in release mode by about 15%,
which noticeably beats GLibc.

Old version: 2.299s
New version: 1.893s
GLibc: 2.133s

Tests run by running the following command 10 times for each version:
time ~/llvm-project/build/bin/libc_str_to_float_comparison_test ~/parse-number-fxx-test-data/data/*

the parse-number-fxx-test-data-repository is here:
https://github.com/nigeltao/parse-number-fxx-test-data/tree/fe94de252c691900982050c8e7c503d1efd1299a

It's important to build llvm-libc in Release mode for accurate
performance comparisons against glibc (set -DCMAKE_BUILD_TYPE=Release in
your cmake).
You also have to build the libc_str_to_float_comparison_test target.

Diff Detail

Event Timeline

michaelrj created this revision.Nov 2 2021, 11:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 2 2021, 11:16 AM
michaelrj requested review of this revision.Nov 2 2021, 11:16 AM
michaelrj updated this revision to Diff 384810.Nov 4 2021, 10:46 AM

add __builtin_clz

lntue accepted this revision.Nov 4 2021, 11:04 AM

Overall the patch looks good for me. I can see where the possible rounding errors come from and also we can definitely improve to support parsing longer strings than fitting into the integer types. But we can leave them for the followups.

libc/src/__support/str_to_float.h
468

BITS_IN_BITSTYPE is only used in this expression, and this part is also constexpr. So maybe merge this into the BITS_IN_BITSTYPE definition and change its name appropriately?

598–605

What do you think about restructuring this part to:

if (*src == DECIMAL_POINT) {
  if (afterDecimal) {
    // comments
    break;
  }
  afterDecimal = true;
  ++src;
  continue;
}
627

Is this the same as src = tempStrEnd?

673–680

Similar structure as the above comment for decimal?

685

Does it mean that we cannot parse the string longer than 16 hexadecimal digits?

This revision is now accepted and ready to land.Nov 4 2021, 11:04 AM
sivachandra added inline comments.Nov 4 2021, 11:58 PM
libc/src/__support/str_to_float.h
459

This function is mostly NormalFloat::operator T. It would be nice if we can extend it as required and avoid the mostly duplicated logic. AFAICT, setting of errno is the difference? Input exponent needs to be corrected of course like on line 480, which can be done as a preprocessing step. You can do it as a cleanup later if it can be done.

500

Can the shift and round operation here lead to an overflow? See this for why it can happen: https://github.com/llvm/llvm-project/blob/main/libc/src/__support/FPUtil/NormalFloat.h#L124

michaelrj updated this revision to Diff 385207.Nov 5 2021, 3:33 PM
michaelrj marked 6 inline comments as done.

clean up code in response to the comments, some of which improved speed significantly.

michaelrj edited the summary of this revision. (Show Details)Nov 5 2021, 3:34 PM
michaelrj added inline comments.
libc/src/__support/str_to_float.h
459

I think it's possible. What I'd probably want to do for that is add a way to initialize a NormalFloat with a mantissa that uses all the bits of a UIntType. Then it would be a relatively trivial process.

500

ah, yes it can. This is now fixed here and above, and a test has been added.

598–605

Doing the change you suggested was a noticeable speed improvement.

627

ah, I think I misunderstood how restrict worked. This is fixed now.

685

correct, but that's because that's just how many bits there are space for in a double. The mantissa actually stores a little less, the extra bits are so that the rounding is more accurate.

lntue accepted this revision.Nov 8 2021, 4:23 PM
This revision was automatically updated to reflect the committed changes.