- User Since
- Jul 4 2017, 7:47 AM (177 w, 5 d)
Wed, Nov 25
Tue, Nov 24
Mon, Nov 23
I am on an Ubuntu 18 machine, which has the finite-math header <bits/math-finite.h>.
This header ships with glibc 2.27 and contains the following declaration:
extern double log (double) __asm__ ("" "__log_finite") __attribute__ ((__nothrow__ ));
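To illustrate what such a declaration does, here is a minimal sketch of the GCC/Clang asm-label mechanism: a function is declared under one name but bound to a different linker symbol. The names demo_log, demo_log_finite and check_redirect are hypothetical and exist only for this example; the stand-in body is not a real logarithm. The empty string in the declaration above is, as far as I can tell, the expansion of __USER_LABEL_PREFIX__, which is empty on Linux, so the sketch builds the label the same way.

```c
#include <assert.h>

/* Stringify a macro after expanding it. */
#define DEMO_STR2(x) #x
#define DEMO_STR(x) DEMO_STR2(x)

/* Hypothetical "finite" implementation symbol. The body is a stand-in,
   not a real logarithm, so the redirect is easy to observe. */
double demo_log_finite(double x) { return 2.0 * x; }

/* Declared as demo_log, but every call is emitted against the symbol
   demo_log_finite. __USER_LABEL_PREFIX__ is predefined by GCC and Clang
   (empty on Linux, "_" on some other targets). */
extern double demo_log(double)
    __asm__(DEMO_STR(__USER_LABEL_PREFIX__) "demo_log_finite");

/* If the label took effect, both names reach the same code. */
void check_redirect(void) {
    assert(demo_log(1.5) == demo_log_finite(1.5));
}
```

No definition of demo_log exists anywhere, yet the program links, because the compiler never emits a reference to a symbol of that name.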
Oct 21 2020
Oct 20 2020
Removed an incorrect file that was attached to my earlier patch.
Added a test case for testing vector library calls for VF=2 and VF=8.
Oct 13 2020
Updated the patch as per review comments received.
Oct 12 2020
As per review comments from Sanjay, updated the test case to use metadata. Also autogenerated the checks in the test cases using llvm/utils/update_test_checks.py.
Oct 11 2020
Changed the library naming to LIBMVEC-X86 as per the comments; the library is now also selected based on the target triple in clang.
I am still working on auto-generating the FileCheck lines for the test cases.
Oct 6 2020
Pinging for review comments.
Oct 4 2020
Selection of the Glibc vector math library is enabled via the option -fveclib=libmvec.
Sep 29 2020
Sep 24 2020
Sep 23 2020
In the initial version, I supported the following vector functions (VF 2 and 4).
Sep 1 2020
Updated the patch as per comments received from Sanjay.
Aug 31 2020
Updated the patch with an "nsz" check.
Updated the patch as per the review comments given by Sanjay.
Aug 29 2020
Added comment in the test case.
As per the comment given by Sanjay, updated the coverage test for x*1/sqrt(x) pattern.
Aug 28 2020
Folding does not happen when the divide operand (say 1.0/something) has more than one use.
Aug 27 2020
Aug 23 2020
Updated patch as per the comments from Sanjay.
Updated the patch as per the comments given by Sanjay. Adjusted the test cases.
Aug 22 2020
Aug 13 2020
Aug 11 2020
Agreed, "FeatureFastScalarFSQRT" can be removed if the target considers scalar FSQRT costly. I see it is currently set in "SKXTuning" (Skylake).
Fixed test case to check for proper return value.
Apr 9 2019
Looks Ok to me.
Jan 30 2018
I ran the SPEC2017 C/C++ benchmarks on Ryzen with -O2 -fno-unroll-loops. There was no significant change in performance with the patch.
I agree that removing the lengthy (9-byte) instructions and reducing the size of the loop is good, but I need to run some tests on the performance side.