This is an archive of the discontinued LLVM Phabricator instance.

[libc] Add math functions to AMD/NVPTX libm
ClosedPublic

Authored by elmcdonough on Jun 9 2023, 7:26 PM.

Details

Summary

Related to D152486. The following functions are included in this revision: acosf, acoshf, asinf, asinhf, atanf, atanhf, ceil, ceilf, copysign, copysignf, cos, cosf, cosh, coshf, exp10f, exp2f, expf, expm1f, fabs, fabsf, fdim, fdimf, floor, floorf, fma, fmaf, fmax, fmaxf, fmin, fminf, fmod, fmodf, frexp, frexpf, hypot, hypotf, ilogb, ilogbf, ldexp, ldexpf, llrint, llrintf, llround, llroundf, pow, and powf.

Diff Detail

Event Timeline

elmcdonough created this revision.Jun 9 2023, 7:26 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJun 9 2023, 7:26 PM
AntonRydahl accepted this revision.Jun 12 2023, 1:24 PM
This comment was removed by AntonRydahl.
This revision is now accepted and ready to land.Jun 12 2023, 1:24 PM
elmcdonough retitled this revision from [libc] Add F32 inverse trig functions to AMD/NVPTX libm to [libc] Add math functions to AMD/NVPTX libm.
elmcdonough edited the summary of this revision. (Show Details)
elmcdonough added a reviewer: sivachandra.

I have been able to confirm that this patch builds and that each builtin function collapses to an LLVM intrinsic.

That said, when I compile the individual cpp files with llc, I do get an issue related to cos in the 'AMDGPU DAG->DAG Pattern Instruction Selection' pass. Not sure if this is specific to my setup or if cos needs to be a vendor function.

sivachandra added inline comments.Jun 21 2023, 11:10 AM
libc/src/math/gpu/cos.cpp
14 ↗(On Diff #533188)

I am not sure the GPUs would have instructions to do trignometric functions.

libc/src/math/gpu/exp2f.cpp
14 ↗(On Diff #533188)

AMD gpus seem to be having exp2f. Can we assume other GPUs also have them? Even if they do have, what guarantees and promises they come with? The reason I am asking is to understand how we should set up tests for them.

jhuber6 added inline comments.
libc/src/math/gpu/cos.cpp
14 ↗(On Diff #533188)
libc/src/math/gpu/exp2f.cpp
14 ↗(On Diff #533188)

Seems NVPTX does not support it https://godbolt.org/z/a1jGzeo48. Maybe @arsenm can answer the later questions about the properties.

I'm still not sure how to stand up tests for this math on the GPU. We may need to map it against some host implementation and compare within some ULP. We could potentially have the GPU allocate a table like arr = (x, f(x)) for that math function and pass it in, letting the GPU scan it.

arsenm added inline comments.Jun 21 2023, 11:26 AM
libc/src/math/gpu/cos.cpp
14 ↗(On Diff #533188)

They do for f32. llvm.sin.f32 and llvm.cos.f32 expand fast-mathy for AMDGPU and I've been debating dropping support for them because it's just wrong

libc/src/math/gpu/exp2f.cpp
14 ↗(On Diff #533188)

IMO the math intrinsics should all be correctly lowered. We've traditionally expanded a lot of the f32 intrinsics in a fast math type of way. I'm fixing the ones it's reasonable to inline expand (e.g. see D153025, D153024, D153023, D153022). sqrt is also currently wrong, and I have partial patches I'm working on to fix. sqrt f32 and f64 are totally doable.

llvm.pow.f32, llvm.cos.f32 and llvm.sin.f32's correct expansions are so big I don't think it's practical to implement backend expansions for., and we should probably stop supporting the fast expansions we do for them now. The f64 versions for nearly every function (except sqrt) I think is also impractically large to handle without emitting a libcall.

Use vendor for math functions instead of builtins.

This revision now requires review to proceed.Jun 21 2023, 2:16 PM
arsenm added inline comments.Jun 21 2023, 2:32 PM
libc/src/math/gpu/frexp.cpp
19–28

See https://reviews.llvm.org/D149715, this should get a generic builtin

libc/src/math/gpu/frexpf.cpp
19

Same, https://reviews.llvm.org/D149715

Also, I think splitting the files up by FP type is pretty annoying. Many of the functions would benefit from templating

libc/src/math/gpu/vendor/amdgpu/amdgpu.h
28

Will be correctly lowered and should use the builtin after D153025

29

Will be correctly lowered after D153024 and should use the builtin

30

In the abstract sense covered by D153024 but there's currently no defined llvm.exp10. This is asymmetrical and annoying and should be fixed, then the builtin will work

38–39

Should use the new builtin_ldexp/builtin_ldexpf

40–41

__builtin_rint

42–43

__builtin_round

48

pow/powf?

Just realized the current machinery probably doesn't handle the case where AMDGPU has a builtin but NVPTX doesn't very well right now, since we only assume one of the gpu/ or gpu/vendor entrypoints. We could probably just put the __builtin version under gpu/vendor for now and try to make it common later.

A lot of the intrinsics being implemented by @arsenm seem to be porting ocml functions to the backend. I wonder if a better approach for libc specifically would be to port those implementations to C++ so that we can share it between NVPTX and AMDGPU.

A lot of the intrinsics being implemented by @arsenm seem to be porting ocml functions to the backend. I wonder if a better approach for libc specifically would be to port those implementations to C++ so that we can share it between NVPTX and AMDGPU.

If you can actually use the existing C++ implementations or extend/tweak the existing ones to suit GPU architecture, we would not have had to use the builtins and vendor implementations in the first place?

elmcdonough edited the summary of this revision. (Show Details)

Removed long double functions and replaced some vendor functions with in-progress intrinsics. pow, powf, and cosh were also added.

jhuber6 accepted this revision.Jul 11 2023, 10:53 AM
This revision is now accepted and ready to land.Jul 11 2023, 10:53 AM
arsenm added inline comments.Jul 11 2023, 10:57 AM
libc/src/math/gpu/vendor/amdgpu/declarations.h
26

You don't need exp_f32 or exp2_f32 anymore, the generic builtin is now correctly lowered (we're still asymmetrically missing an intrinsic for exp10). You do still need to wrap the f64 versions

35–36

No reason to wrap ldexp, just use the generic builtin

39–42

all rint and round can use the generic builtin

This revision was automatically updated to reflect the committed changes.
jplehr added a subscriber: jplehr.Jul 26 2023, 2:20 AM

Hi, I believe this may have broken the AMDGPU libc on GPU buildbot (that is unfortunately still in staging).
This change came in here (https://lab.llvm.org/staging/#/builders/247/builds/3734 which was broken because of another change), but the libc GPU tests are still failing (https://lab.llvm.org/staging/#/builders/247/builds/3736).

I'm probably a bit inexperienced in the whole libc infrastructure to judge this fully, but this patch would be my best guess.
Happy to help if anything needed.

Yeah this broke both buildbots, it's most likely because we're enabling some extra tests that weren't checked. Whenever a new entrypoint is added it may start building an existing test.