This is an archive of the discontinued LLVM Phabricator instance.

[libc][math] fmod/fmodf implementation.
ClosedPublic

Authored by orex on Jun 4 2022, 4:23 AM.

Details

Summary

This is a implementation of find remainder fmod function from standard libm.
The underline algorithm is developed by myself, but probably it was first
invented before.
Some features of the implementation:

  1. The code is written on more-or-less modern C++.
  2. One general implementation for both float and double precision numbers.
  3. Spitted platform/architecture dependent and independent code and tests.
  4. Tests covers 100% of the code for both float and double numbers. Tests cases with NaN/Inf etc is copied from glibc.
  5. The new implementation in general 2-4 times faster for “regular” x,y values. It can be 20 times faster for x/y huge value, but can also be 2 times slower for double denormalized range (according to perf tests provided).
  6. Two different implementation of division loop are provided. In some platforms division can be very time consuming operation. Depend on platform it can be 3-10 times slower than multiplication.

Performance tests:

The test is based on core-math project (https://gitlab.inria.fr/core-math/core-math). By Tue Ly suggestion I took hypot function and use it as template for fmod. Preserving all test cases.

./check.sh <--special|--worst> fmodf passed.
CORE_MATH_PERF_MODE=rdtsc ./perf.sh fmodf results are

GNU libc version: 2.35
GNU libc release: stable
21.166 <-- FPU
51.031 <-- current glibc
37.659 <-- this fmod version.

Diff Detail

Event Timeline

orex created this revision.Jun 4 2022, 4:23 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJun 4 2022, 4:23 AM
orex published this revision for review.Jun 4 2022, 4:28 AM
orex edited the summary of this revision. (Show Details)

Thanks for adding support for fmod functions with improved performance on regular range! Let's discuss a bit more to see if we can also improve or maintain the performance for denormal inputs.

libc/src/__support/FPUtil/FPBits.h
30

clz are also used in sqrt functions, which would also be used again by FMA function. Would you mind factoring clz functions to another library similar to https://reviews.llvm.org/D124495 ? You can overwrite what I did over there, as this change should be landed before that one. Thanks!

libc/src/__support/FPUtil/generic/CMakeLists.txt
19

Please fix.

libc/src/__support/FPUtil/generic/FMod.h
17

I'm not sure that we can include that many C++ standard headers in here, as it might introduce circular dependency among the libraries. @sivachandra should have known more about these than me.

83

I don't think you would want to use std::isnan or std::isfinite in here. As you can imagine, FPUtil functions would be the ones to provide the *backbone* of libc, and hence std:: functions, so if FPUtil functions depending on std:: math functions, it would likely create circular dependency when building or linking. So it's best to reuse or reimplement those simple math functions in the FPUtil / FPBits themselves.

113

This function seems to better be in FPBits class.

162

This function seems to be generic enough to be a static function in FPBits class.

230

We might not be able to use std::min here.

257

You might have to reimplement std::optional in __support or __support/CPP to prevent circular dependency.

libc/src/math/generic/CMakeLists.txt
1119

Please fix.

libc/test/src/math/differential_testing/CMakeLists.txt
513

Please fix.

libc/test/src/math/exhaustive/FMod_test.cpp
14 ↗(On Diff #434258)

Is this still needed?

43 ↗(On Diff #434258)

by * 2^iy is better implemented with ldexp functions https://en.cppreference.com/w/cpp/numeric/math/ldexp

orex updated this revision to Diff 434343.Jun 5 2022, 12:10 PM

Cleanup code by Intue suggestions.

orex marked 5 inline comments as done.Jun 5 2022, 12:29 PM

Thank you, Intue for your useful comments. I implement all of them.
As for performance issues with denormalized values, I've checked the code of glibc e_fmod.c. It looks like (I assume, but I did not check assembler code) that glibc function comes to denormalaized values faster than in my implementation, where I check most common cases first. I don't think that we should do something with it, because the case is very rare. I can hardly imaging it's real practical usage.

Kirill.

libc/src/__support/FPUtil/FPBits.h
30

I create a new helper with wrapped bulitins. libc/src/__support/FPUtil/builtin_wrappers.h

libc/src/__support/FPUtil/generic/FMod.h
17

All std includes were removed except cerrno.

257

Use "old style" returning optional values. I don't think, that optional reimplementing is needed for this case.

orex updated this revision to Diff 434346.Jun 5 2022, 12:32 PM
orex marked 2 inline comments as done.

Cleanup of the code by Intue suggestions.

orex updated this revision to Diff 434348.Jun 5 2022, 12:36 PM

Cleanup of the code by Intue suggestions

orex updated this revision to Diff 434349.Jun 5 2022, 12:39 PM

Cleanup of the code by Intue suggestions.

Thanks a lot for the patch. It seems like it includes a bunch of cleanups / "better this way" items not related to the main goal of the patch. Can you please separate out those parts in to a different patch so that we can keep the review focused? Also, we cannot strictly use std C++ headers. So, no cerrno also. Include errno.h instead.

orex added a comment.Jun 6 2022, 4:46 AM

Thanks a lot for the patch. It seems like it includes a bunch of cleanups / "better this way" items not related to the main goal of the patch. Can you please separate out those parts in to a different patch so that we can keep the review focused? Also, we cannot strictly use std C++ headers. So, no cerrno also. Include errno.h instead.

Thank you for the comment. I've picked out 3 "better this way" commits:
https://reviews.llvm.org/D127088
https://reviews.llvm.org/D127091
https://reviews.llvm.org/D127097

I'll rebase this commit, when the changes above will be submitted.

orex updated this revision to Diff 435837.Jun 10 2022, 1:49 AM

Rebasing changes on last main.

orex updated this revision to Diff 435841.Jun 10 2022, 1:56 AM

cerrno fix.

orex marked 2 inline comments as done.Jun 10 2022, 2:01 AM
orex added inline comments.
libc/src/__support/FPUtil/generic/FMod.h
17

cerrno excluded.

83

Implemented using builtin functions.

orex updated this revision to Diff 436463.Jun 13 2022, 10:17 AM

C standard/Posix processing of special numbers.

lntue added inline comments.Jun 16 2022, 9:33 AM
libc/src/__support/FPUtil/FPBits.h
173

For safety, you might need to add a quick return for when number == 0:

if (unlikely(number == 0)) return zero ...
libc/src/__support/FPUtil/generic/FMod.h
26

nit: treated *separately*

71

There is only one public method for this class, you can replace class with struct and remove public:.

71

Maybe rename the class to FModErrorHandler and add to the comment: following C99 standard with a link to https://en.cppreference.com/w/c/numeric/math/fmod

82

From the C standard, fmod(0, NaN) and fmod(0, inf) should be 0

If x is ±0 and y is not zero, ±0 is returned
88

What's the main purpose of this evaluation?

96

This should be moved to above isnan(x) || isnan(y), I don't think it is hit here.

100

I don't think this line is hit, and all of the unlikely's above are not needed.

104

We don't need this anymore, just use C standard.

145

Instead of using enable_ifwith the boolean InverseMultiplication, it might be better to separate these 2 versions of division_loop into 2 separate structs and then pass them to the second template argument, similar to the way you do exceptional handling. Something like:

// Comments explains what these functions do and what the input parameters are.
template<typename T>
struct FModDivisionLoop {
  static T execute(int n, int hyltzeros, T hx, T hy) { ... }
};
template<typename T>
struct FModInverseMultiplicationLoop {
  static T execute( ... )
};

Then the main class will be something like:

template<typename T, class ErrorHandler = FModErrorHandler<T>, class MainLoop = InverseMultiplicationLoop<T>>
class FMod { ... };
266

FMod<T>::eval() or execute instead of make?

orex added inline comments.Jun 21 2022, 5:30 AM
libc/src/__support/FPUtil/FPBits.h
173

It was Siva suggestion to move the function here. There is another class called NormalFloat which can handle such things. I propose to move the function back to FMod and implement "full" functionality in that class. What do you think? if number == 0 is not only one case which we need to check. ep value after processing below should also be checked for overflow.

libc/src/__support/FPUtil/generic/FMod.h
26

Sure.

71

Sure.

71

It is not an error handler, but special numbers cases processor. Do you have another name in mind? If no, let's go for Error.

82

Thank you. This is a very good comment. Obviously C standard "F.10.7.1" is not full. It do not describe at all, for example, NaN NaN case. I also do not think, that fmod(0, NaN) should return zero. I prefer to go for better consistent standard which are described here
https://pubs.opengroup.org/onlinepubs/9699919799/functions/fmod.html
https://en.cppreference.com/w/c/numeric/math/fmod
What do you think about this?

88

Thank you for the comment. I don't know any other approach to raise floating-point exception. Please suggest me something better.

96

A question about standard. See above.

100

return false. Yes you are right, but I can't return nothing from the function. It will be a warning warning: non-void function does not return a value in all control paths I don't want my code to produce such things.
unlikely I would like to explicitly say compiler to make just one conditional jump in the function. It is not so big problem, of course, I can remove this.

104

See above.

145

Thank you. Sounds reasonable. I will do this.

266

Yes. Sounds better. Thank you.

lntue added inline comments.Jun 21 2022, 9:15 AM
libc/src/__support/FPUtil/FPBits.h
173

The main reason you need to check for number == 0 here is because fputil::clz is a wrapper around __built_in_clz and when the inputs are 0, the behavior/output is undefined.

libc/src/__support/FPUtil/generic/FMod.h
71

I think technically these are not errors per se. They actually handle exceptional inputs / outputs, and as a class name, it's more appropriate to use a noun. So I think FModExceptionalHandler or FModExceptionalInputHandler or Helper might be better.

88

You can use https://github.com/llvm/llvm-project/blob/main/libc/src/__support/FPUtil/FEnvImpl.h to set the floating point exceptions. It's a bit more verbose but much clearer. And these are exceptional cases, so we don't care much about the performance anyway.

100

You can remove the last if and put x == 0 in the comment.

sivachandra added inline comments.Jun 21 2022, 9:23 AM
libc/src/__support/FPUtil/FPBits.h
173

I am unable to locate the context of my suggestion. Can you help me find it?

orex edited the summary of this revision. (Show Details)Jun 21 2022, 12:31 PM
orex added inline comments.
libc/src/__support/FPUtil/FPBits.h
173

Sorry for bothering you. I mixed your suggestion to extract "FBits improvements" and Tue particular suggestion. After new push all old comments are moved to new places for me, so It was difficult for me to figure it out.

173

I never call ctz or clz with zero in fmod case. If the functions now in the separate module, probably we should check zero there? It can have a performance impact, of course, but may be compiler can optimized out. If you always want to check it, I think that it is better to check it there? We can also think about best way to check it. My proposition is to move the function make_value back to FMod and add x == 0 check in ctz/clz functions (probably separate commit).

lntue added inline comments.Jun 21 2022, 12:56 PM
libc/src/__support/FPUtil/FPBits.h
173

You can use https://reviews.llvm.org/DXXXXXX/new/ instead of https://reviews.llvm.org/DXXXXXX. That should separate old and new comments better.

173

Now that this function make_value is refactored to stay under FPBits class, which might be used beyond fmod. So you will need to either add some comments explicitly stating its non-zero input assumption, or adding an extra check to make sure that it won't silently return wrong outputs when being used by other functions.

orex added inline comments.Jun 21 2022, 12:59 PM
libc/src/__support/FPUtil/FPBits.h
173

Don't you think, that we need to mention this in clz/ctz builtin wrappers?

orex updated this revision to Diff 439064.Jun 22 2022, 9:39 AM

Some cosmetic changes.

sivachandra added inline comments.Jun 22 2022, 9:49 AM
libc/src/__support/FPUtil/FPBits.h
173

Don't you think, that we need to mention this in clz/ctz builtin wrappers?

Yes. I think handling the corner case as reasonable should be done within our wrappers. We should not ideally have to pepper user code with checks.

orex added inline comments.Jun 22 2022, 10:00 AM
libc/src/__support/FPUtil/FPBits.h
173

Thank you for support Siva. I've implemented it in a way it check the case by default.

Thanks for your patient and sorry it took a bit too long for reviewing! Just a few nits and it's good to go.

Also feel free to sync and add fmod to the math status page at libc/docs/math.rst.

libc/src/__support/FPUtil/generic/FMod.h
26

Nit: you don't need to wrap the word separately in *'s, I just use it to highlight my suggestion.

32

Nit: actually without extra conditions on hx, hy, ix, iy, the representation x = hx * 2^ix is ambiguous (i.e., not unique), just from your example. It is only unambiguous if hx, hy, ix, iy are integers and hx, hy are odd, i.e., not divisible by 2.

33–69

So here is my understanding of the algorithm and the math behind it, please correct me if I'm wrong:

The two main properties about the (integer) modulus operation, denoted by mod or %, that we will use are:
For any positive integers a, b, c:

1) a mod b = (a mod (b * c)) mod b
2) (a * c) mod (b * c) = (a mod b) * c

First, let write x = hx * 2^ix and y = hy * 2^iy with hx, hy, ix, iy are integers (assumed to be positive for simplification).
Then the naive implementation of the fmod function with a simple for/while loop:

while (ix > iy) {
  hx = (hx * 2) % hy;
  --ix;
}

is mathematically equivalent to:

x mod y = (hx * 2^ix) mod (hy * 2^iy)
        = ((hx * 2^(ix - iy)) mod hy) * 2^iy     (apply property 2)
        = (( ... (((hx * 2^(ix - iy)) mod (hy * 2^(ix - iy - 1))) mod (hy * 2^(ix - iy - 2)) ... ) mod hy) * 2^iy     (apply property 1 repeatedly)
        = (( ... (( (hx * 2) mod hy) * 2^(ix - iy - 1)) mod (hy * 2^(ix - iy - 2)) ... ) mod hy) * 2^iy     (apply property 2)
        = (( ... (( (hx * 2) mod hy) * 2 ) mod hy ) ... ) * 2) mod hy) * 2^iy     (apply property 2 repeatedly)

And the total number of iterations is ix - iy.

On the other hand, your algorithm exploits the fact that hx, hy are the mantissas of floating point numbers, which use less bits than the storage integers: 24 / 32 for floats and 53 / 64 for doubles, so if in each step of the iteration, we can left shift hx as many bits as the storage integer type can hold, the exponent reduction per step will be at least 32 - 24 = 8 for floats and 64 - 53 = 9 for doubles:

x mod y = (hx * 2^ix) mod (hy * 2^iy)
        = ((hx * 2^(ix - iy) mod hy) * 2^iy
        = (( ... (( (( hx * 2^r1) mod hy ) * 2^r2) mod hy ) ... ) * 2^rk) mod hy) * 2^iy

where r1 + r2 + ... + rk = ix - iy and ri >= sizeof(UInt) - hy length for i = 1..k.

And so the number of iterations is at most by: (ix - iy) / (sizeof(UInt) - mantissa_length).

Feel free to use this to update the comments if it makes sense.

117

exp_diff for n and max_shift for hyltzeroes or other more descriptive names.

179

Maybe using mx, ex or m_x, e_x which are closer to mantissa and exponent of x?

210

Use more descriptive names such as lead_zeros_hy, tail_zeros_hy?

215

max_shift / max_scale_factor maybe more descriptive?

libc/test/src/math/exhaustive/CMakeLists.txt
189

Use lower case for the target name fmod_test.

195

Use lower case for the test file.

lntue added inline comments.Jun 23 2022, 10:24 AM
libc/src/__support/FPUtil/generic/FMod.h
82

I agree that this makes more sense. We should add a link to https://man7.org/linux/man-pages/man3/fmod.3p.html to support our decision, and maybe send suggestion to cppreference to adjust the order of exceptional cases on their page.

lntue added inline comments.Jun 23 2022, 6:54 PM
libc/src/__support/FPUtil/generic/FMod.h
237–240

Does marking these 2 conditions unlikely improve throughput?

lntue added inline comments.Jun 23 2022, 7:43 PM
libc/src/__support/FPUtil/generic/FMod.h
139

Use NumericLimits template from libc/src/__support/CPP/Limits.h.

libc/test/src/math/FModTest.h
37

Use NumericLimits template from libc/src/__support/CPP/Limits.h.

orex updated this revision to Diff 439665.Jun 24 2022, 1:35 AM
orex marked 5 inline comments as done.

Cosmetic changes: veriable renaming, docs update etc.

orex added a comment.Jun 24 2022, 1:47 AM

Thank you Tue, for your comments. They was really useful. I'll really appreciate if you go through all my replies. It was a lot of them, so I'm afraid that I can miss something or my changes can be improved even more. I'll also appreciate if you check the performance for fmod in the same way, as other functions. I've attached the file

. You should simply unpack it inside core-math folder.

libc/src/__support/FPUtil/generic/FMod.h
32

Thank you! Just a mistake "unambiguous" -> "ambiguous"

33–69

Thank you! Your explanation is very good. I included some sentences to the description. Huge math inserts looks absolutely unreadable in text format, so I skip them.
Nit: Some preparations step requires for this loop to work. Also this loop can be improved, changing division by subtraction.

117

Done. Thank you!

215

Changed to sides_zeroes_count.

237–240

n == 0 is quite likely condition, I think. But for hx, yes. You are right.

libc/test/src/math/FModTest.h
37

Unfortunately the library is very primitive. It does not have things, I need.

lntue added inline comments.Jun 24 2022, 7:30 AM
libc/src/__support/FPUtil/FPBits.h
173

Sorry that I confused you and Siva! What I meant was that since the function make_value had undefined behavior when the input is 0, and it was put in the common library, it should either:

  • be documented
  • add extra checks for 0.

Since you already documented it in item #4, that's good enough for me, and adding extra checks is not needed anymore since there are legitimate usage of make_value and clz when inputs are guaranteed to be non-zero such as in this patch and in other places. You can definitely add extra comments to document the undefined behavior of clz when inputs are 0.

libc/src/__support/FPUtil/builtin_wrappers.h
52

In your use case, extra checks for 0 input are not needed, so commenting on the undefined behavior of builtin_clz when inputs are 0 is good enough. A safe variant with extra checks could be added, but it should use different name. One of the reason is that now if someone one to use builtin_clz without extra checks, they will have to specify the input type T:

clz<T, false>(...)

So when the type T is known, or inside a non-template function, users cannot use type deduction: clz<uint32_t, false>( ... ).
So in summary, my preference is that:

  • clz should just be a simple wrap around type matching for __builtin_clz*.
  • undefined behavior for inputs 0 can/should be documented.
  • safe versions can be added, but use different name so that type deduction can be applied.
libc/test/src/math/FModTest.h
22

You should be able to feed floating point type T directly to expected, and use EXPECT_FQ_EQ macro instead.

37

Look like you only use it for some special constants. With std::numeric_limits, you are actually using <limits> without directly including it which might cause error if some targets or configs that do not transitively include it.

It would be better to add those constants that you need to constant creating functions in FPBits class, and/or update the DECLARE_SPECIAL_CONSTANTS macro at https://github.com/llvm/llvm-project/blob/main/libc/utils/UnitTest/FPMatcher.h#L70 . It could a separate patch that this one can depend on.

orex updated this revision to Diff 439798.Jun 24 2022, 9:21 AM

Tests polishing.

orex added a comment.Jun 24 2022, 9:24 AM

Tue. I think, if we will not find agreement with ctz/clz syntax, we can ask Siva to solve the problem. Also, as long as you will use the function in the next changes, you can also try them and if it will not work good, improve it.

libc/src/__support/FPUtil/builtin_wrappers.h
52

I have a little bit different opinion for this case. If somebody would like to simply use clz/ctz he can use it without any complication and in a safe way. But, if somebody know, that he can improve the performance, skipping the test, it looks like OK to have such complications. For example. I don't see much difference between, adding new function have syntax like this ctz<decltype(i), false>(i). Moreover such syntax can help to avoid implicit type conversion. Another point to use such template is to pass check zero parameter above easily. For example, make_value function can looks like make_value<bool CheckZero>(.... Of course it can be done with if consexpr, but template will be more straightforward.

libc/test/src/math/FModTest.h
22

Thank you.

37

Can you explain your point, please. From my point of view, I just forget to #include <limits>. From another side, I have a feeling, from you comment, that I should not do this. Is it so? If yes, why. It is a test. Tests are "final" instances, so they can include whatever they want. Fre example, I've checked exhaustive_test.cpp. It includes "half of" STL.

lntue added inline comments.Jun 24 2022, 11:30 AM
libc/src/__support/FPUtil/builtin_wrappers.h
52

So from the readability standpoint of both implementers and reviewers, I think clz(x) and safe_clz(x) or clz(x) and unsafe_clz(x)' are better to understand than clz(x) and clz<delctype(x), false>(x)`.
I'm also not a big fan of using boolean in template parameter, unless the template name makes it very clear what true and false mean. If it's only used in one or a few tests, that's fine. But this is on a utility header that will be used everywhere. If we really want to pass these as template parameters, it's better to wrap them in functors, like function<SafeClz> vs function<true>. Hope it makes sense?

Also, by changing the default clz(x) in this patch, you would need to update all current usages of it like in sqrt to the default version.

libc/test/src/math/FModTest.h
37

For unit tests, we try to limits the use of std:: and C++ standard header. You can see other unit tests in libc/test/src/math. Exhaustive tests are a bit different. They are kind of integration tests, so we do not be so strict about that (yet). So for example, std::numeric_limit<>::quiet_NaN() (most likely) simply call __builtin_nan*, which technically what our library implements explicitly. You don't have to do it now because this test currently does not have that problem. But we should clean it up in the future if not now.

orex updated this revision to Diff 439860.Jun 24 2022, 12:41 PM

unsafe ctz/clz

orex added inline comments.Jun 24 2022, 12:44 PM
libc/src/__support/FPUtil/builtin_wrappers.h
52

OK. You won.)))) I've change all occurrence to unsafe_ctz/clz.

lntue accepted this revision.Jun 24 2022, 12:50 PM

Thanks for sticking with me until now! Please sync to head and feel free to land when the pre-merge checks turn green.

This revision is now accepted and ready to land.Jun 24 2022, 12:50 PM
orex updated this revision to Diff 439877.Jun 24 2022, 1:16 PM

Changed unsafe_clz to safe_clz in string_to_float.h

lntue accepted this revision.Jun 24 2022, 2:08 PM
orex closed this revision.Jun 27 2022, 10:21 AM