This is an archive of the discontinued LLVM Phabricator instance.

[libc] Use intrinsics for x86-64 fma and optimize PolyEval for x86-64 with degree 3 & 5 polynomials.
ClosedPublic

Authored by lntue on Dec 8 2021, 7:31 AM.

Details

Summary
  • Use intrinsics for x86-64 fma
  • Optimize PolyEval for x86-64 with degree 3 & 5 polynomials.
  • There might be a slight loss of accuracy compared to Horner's scheme due to usages of higher powers x^2 and x^3 in the computations.

Diff Detail

Event Timeline

lntue created this revision.Dec 8 2021, 7:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 8 2021, 7:31 AM
lntue requested review of this revision.Dec 8 2021, 7:31 AM

The subject line and description are confusing. It seems to me like you have added specializations for 3rd and 5th degree polynomials. But, the description has "3-6 polynomials".

libc/src/__support/FPUtil/PolyEval.h
36

Couple of things:

  1. We want individual header files to be self-contained. So, this include should be moved to the header files where it is required.
  2. Because of the same self-contained headers requirement, the entities defined in a header file should be in the __llvm_libc namespace explicitly as opposed to ending up in a namespace when included like this here. So, these includes should be moved outside of the namespace here.
libc/src/__support/FPUtil/x86_64/PolyEvalDouble.h
1 ↗(On Diff #392768)

Fix line length but not sure if the description is correct also.

libc/src/__support/FPUtil/x86_64/PolyEvalFloat.h
1 ↗(On Diff #392768)

Fix line length but not sure if the description is correct also.

libc/src/math/generic/CMakeLists.txt
503 ↗(On Diff #392768)

Can you add a comment explaining what in the implementation would be affected by this? AFAICT, we are either using the fma instructions directly, or are calling the fma related builtins. So, it is not clear to me as to why this should be required.

libc/test/src/math/expm1f_test.cpp
111

If there is a possibility of trading off between performance and accuracy, we should provide build time switches for users to pick one or the other based on their needs. By default we want to "err" on the side of providing more accurate implementations over providing faster but less accurate implementations.

sivachandra added inline comments.Dec 8 2021, 11:16 PM
libc/src/math/generic/CMakeLists.txt
503 ↗(On Diff #392768)

So I learn't that the fma related builtins require this option. The correct way to do this would be to do it this way:

  1. Make a separate helper library of x86 builtin calls.
  2. List -mfma as an INTERFACE compile option for that library.

Then targets depending on the helper library will automatically get the -mfma compile option.

lntue marked 3 inline comments as done.Dec 9 2021, 7:26 AM
lntue added inline comments.
libc/test/src/math/expm1f_test.cpp
111

We're going to have a correctly rounded version for expm1f soon so I'm not worried about this regression yet.

lntue updated this revision to Diff 393163.Dec 9 2021, 7:31 AM

[libc] Optimize PolyEval for x86-64 with degree 3-6 polynomials.

lntue updated this revision to Diff 393235.Dec 9 2021, 11:41 AM

[libc] Optimize PolyEval for x86-64 with degree 3-6 polynomials.

lntue updated this revision to Diff 393265.Dec 9 2021, 1:07 PM

[libc] Optimize PolyEval for x86-64 with degree 3-6 polynomials.

lntue updated this revision to Diff 393266.Dec 9 2021, 1:10 PM

[libc] Optimize PolyEval for x86-64 with degree 3-6 polynomials.

lntue updated this revision to Diff 393268.Dec 9 2021, 1:15 PM

[libc] Optimize PolyEval for x86-64 with degree 3-6 polynomials.

Is the subject line and description correct?

libc/src/__support/FPUtil/PolyEval.h
40

Can these be in just one file say, PolyEvalSpecializations.h?

libc/test/src/math/expm1f_test.cpp
111

Do we know why the accuracy dropped? If yes, can we add the reason to the commit description?

lntue updated this revision to Diff 393290.Dec 9 2021, 2:46 PM

[libc] Use intrinsic for x86-64 fma and optimize PolyEval for x86-64 with degree 3 & 5 polynomials.

lntue retitled this revision from [libc] Optimize PolyEval for x86-64 with degree 3-6 polynomials. to [libc] Use intrinsics for x86-64 fma and optimize PolyEval for x86-64 with degree 3 & 5 polynomials..Dec 9 2021, 2:49 PM
lntue edited the summary of this revision. (Show Details)
lntue marked an inline comment as done.
lntue added inline comments.
libc/test/src/math/expm1f_test.cpp
111

Added a possible reason to the patch's summary.

lntue updated this revision to Diff 393293.Dec 9 2021, 2:54 PM

[libc] Use intrinsics for x86-64 fma and optimize PolyEval for x86-64 with degree 3 & 5 polynomials.

sivachandra accepted this revision.Dec 9 2021, 3:25 PM
This revision is now accepted and ready to land.Dec 9 2021, 3:25 PM