This is an implementation of the fmod remainder function from the standard libm.
I developed the underlying algorithm myself, although it has probably been
invented before.
Some features of the implementation:
- The code is written in more-or-less modern C++.
- One general implementation for both float and double precision numbers.
- Platform/architecture-dependent code and tests are split from the platform-independent ones.
- Tests cover 100% of the code for both float and double numbers. Test cases with NaN/Inf etc. are copied from glibc.
- In general the new implementation is 2-4 times faster for “regular” x, y values. It can be up to 20 times faster when x/y is huge, but it can also be 2 times slower in the double denormalized range (according to the perf tests provided).
- Two different implementations of the division loop are provided. On some platforms division can be a very time-consuming operation: depending on the platform, it can be 3-10 times slower than multiplication.
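For readers unfamiliar with how fmod can be computed without floating-point division, here is a minimal sketch of the textbook shift-and-subtract approach, written once as a template for both float and double. It is not the code from this patch and does not represent either of the patch's two division loops. The names `MantissaInt` and `sketch_fmod` are hypothetical, and the sketch relies on `std::frexp`/`std::ldexp` instead of direct bit manipulation for brevity.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: unsigned integer type wide enough to hold the
// mantissa of T plus one spare bit for the shift inside the loop.
template <typename T> struct MantissaInt;
template <> struct MantissaInt<float>  { using type = std::uint32_t; };
template <> struct MantissaInt<double> { using type = std::uint64_t; };

// Illustrative only: bit-by-bit remainder on integer mantissas.
template <typename T>
T sketch_fmod(T x, T y) {
    using U = typename MantissaInt<T>::type;
    constexpr int digits = std::numeric_limits<T>::digits;  // 24 or 53

    // Special cases as required for fmod.
    if (std::isnan(x) || std::isnan(y) || std::isinf(x) || y == T(0))
        return std::numeric_limits<T>::quiet_NaN();
    if (std::isinf(y) || x == T(0))
        return x;

    const T ax = std::fabs(x);
    const T ay = std::fabs(y);
    if (ax < ay)
        return x;  // |x| < |y|: x is already the remainder

    // Decompose so that ax == mx * 2^ex and ay == my * 2^ey exactly,
    // with both mx and my in [2^(digits-1), 2^digits).
    int ex = 0, ey = 0;
    U mx = static_cast<U>(std::ldexp(std::frexp(ax, &ex), digits));
    U my = static_cast<U>(std::ldexp(std::frexp(ay, &ey), digits));
    ex -= digits;
    ey -= digits;
    assert(ex >= ey);  // follows from ax >= ay

    // Long division, one quotient bit per iteration; only the remainder
    // is kept. Invariant: r < 2 * my before each conditional subtract.
    U r = mx;
    for (int n = ex - ey; n > 0; --n) {
        if (r >= my)
            r -= my;
        r <<= 1;
    }
    if (r >= my)
        r -= my;

    // r < 2^digits, so the conversion and the scaling are both exact.
    return std::copysign(std::ldexp(static_cast<T>(r), ey), x);
}
```

With this sketch, `sketch_fmod(5.5, 2.0)` yields 1.5. The loop runs once per bit of exponent difference, so it is slow for huge x/y; it only shows the exact integer arithmetic behind fmod.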
Performance tests:
The test is based on the core-math project (https://gitlab.inria.fr/core-math/core-math). At Tue Ly's suggestion, I took the hypot function and used it as a template for fmod, preserving all test cases.
./check.sh <--special|--worst> fmodf passed.
CORE_MATH_PERF_MODE=rdtsc ./perf.sh fmodf results are:
GNU libc version: 2.35
GNU libc release: stable
21.166 <-- FPU
51.031 <-- current glibc
37.659 <-- this fmod version
clz is also used in the sqrt functions, and it will be used again by the FMA function. Would you mind factoring the clz functions out into a separate library, similar to https://reviews.llvm.org/D124495 ? You can overwrite what I did over there, as this change should land before that one. Thanks!