Related to D152486. The following functions are included in this revision: acosf, acoshf, asinf, asinhf, atanf, atanhf, ceil, ceilf, copysign, copysignf, cos, cosf, cosh, coshf, exp10f, exp2f, expf, expm1f, fabs, fabsf, fdim, fdimf, floor, floorf, fma, fmaf, fmax, fmaxf, fmin, fminf, fmod, fmodf, frexp, frexpf, hypot, hypotf, ilogb, ilogbf, ldexp, ldexpf, llrint, llrintf, llround, llroundf, pow, and powf.
Repository: rG LLVM Github Monorepo
Event Timeline
I have been able to confirm that this patch builds and that each builtin function collapses to an LLVM intrinsic.
That said, when I compile the individual cpp files with llc, I do hit an issue related to cos in the 'AMDGPU DAG->DAG Pattern Instruction Selection' pass. I am not sure whether this is specific to my setup or whether cos needs to be a vendor function.
**libc/src/math/gpu/cos.cpp:14** ↗ (On Diff #533188)
I am not sure the GPUs would have instructions to do trigonometric functions.
**libc/src/math/gpu/exp2f.cpp:14** ↗ (On Diff #533188)
AMD GPUs seem to have exp2f. Can we assume other GPUs also have it? Even if they do, what guarantees and promises does it come with? The reason I am asking is to understand how we should set up tests for these functions.
**libc/src/math/gpu/cos.cpp:14** ↗ (On Diff #533188)
They do not: https://godbolt.org/z/zTWY168a3.
**libc/src/math/gpu/exp2f.cpp:14** ↗ (On Diff #533188)
It seems NVPTX does not support it: https://godbolt.org/z/a1jGzeo48. Maybe @arsenm can answer the latter questions about the properties. I'm still not sure how to stand up tests for this math on the GPU. We may need to map it against some host implementation and compare within some ULP. We could potentially allocate a table like arr = (x, f(x)) for that math function and pass it in, letting the GPU scan it.
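The ULP comparison floated above could be sketched roughly as follows. This is a hedged illustration, not part of the libc test suite: the helper names (`to_ordered`, `ulp_distance`) are assumptions, and the real harness would compare a GPU result against a host libm result.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Map a float's bit pattern onto a monotonic integer line, so the
// distance between two floats counts the representable values between
// them (the "ULP distance").
static int64_t to_ordered(float x) {
  int32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  // Negative floats are sign-magnitude; reflect them below zero so
  // integer order matches floating-point order.
  return bits < 0 ? -static_cast<int64_t>(bits & 0x7fffffff) : bits;
}

static int64_t ulp_distance(float a, float b) {
  int64_t ia = to_ordered(a), ib = to_ordered(b);
  return ia > ib ? ia - ib : ib - ia;
}
```

A test would then accept a GPU value when `ulp_distance(gpu_result, host_result)` is within some documented tolerance for that function.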
**libc/src/math/gpu/cos.cpp:14** ↗ (On Diff #533188)
They do for f32. llvm.sin.f32 and llvm.cos.f32 expand in a fast-math way for AMDGPU, and I've been debating dropping support for them because it's just wrong.
**libc/src/math/gpu/exp2f.cpp:14** ↗ (On Diff #533188)
IMO the math intrinsics should all be correctly lowered. We've traditionally expanded a lot of the f32 intrinsics in a fast-math way. I'm fixing the ones that are reasonable to inline expand (e.g. see D153025, D153024, D153023, D153022). sqrt is also currently wrong, and I have partial patches in progress to fix it; sqrt f32 and f64 are totally doable. The correct expansions of llvm.pow.f32, llvm.cos.f32, and llvm.sin.f32 are so big that I don't think it's practical to implement backend expansions for them, and we should probably stop supporting the fast expansions we do for them now. The f64 versions of nearly every function (except sqrt) are, I think, also impractically large to handle without emitting a libcall.
**libc/src/math/gpu/frexp.cpp:19–28**
See https://reviews.llvm.org/D149715; this should get a generic builtin.
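Once the generic builtin referenced above lands, the entrypoint could presumably collapse to a single call. A minimal sketch, assuming `__builtin_frexp` keeps the usual frexp signature; the wrapper name is illustrative, and a real libc entrypoint would go through the `LLVM_LIBC_FUNCTION` macro instead:

```cpp
#include <cassert>

// Hypothetical one-line frexp wrapper over the generic builtin:
// returns the mantissa in [0.5, 1) and writes the binary exponent
// through the out-parameter, matching C's frexp.
static double generic_frexp(double x, int *exp) {
  return __builtin_frexp(x, exp);
}
```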
**libc/src/math/gpu/frexpf.cpp:19**
Same, https://reviews.llvm.org/D149715. Also, I think splitting the files up by FP type is pretty annoying; many of the functions would benefit from templating.
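The templating suggestion above could look roughly like this: one definition that dispatches to the per-type builtin, instead of one file per FP type. This is a sketch under assumed names (`generic_fmax` and the `if constexpr` dispatch are illustrative, not the libc structure):

```cpp
#include <cassert>
#include <type_traits>

// One template covering both float and double entrypoints by picking
// the matching builtin at compile time (requires C++17).
template <typename T> static T generic_fmax(T x, T y) {
  static_assert(std::is_floating_point_v<T>, "FP types only");
  if constexpr (std::is_same_v<T, float>)
    return __builtin_fmaxf(x, y);
  else
    return __builtin_fmax(x, y);
}
```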
**libc/src/math/gpu/vendor/amdgpu/amdgpu.h**
- Line 28: Will be correctly lowered and should use the builtin after D153025.
- Line 29: Will be correctly lowered after D153024 and should use the builtin.
- Line 30: In the abstract sense covered by D153024, but there's currently no defined llvm.exp10. This is asymmetrical and annoying and should be fixed; then the builtin will work.
- Lines 38–39: Should use the new __builtin_ldexp/__builtin_ldexpf.
- Lines 40–41: __builtin_rint
- Lines 42–43: __builtin_round
- Line 45: pow/powf?
Just realized the current machinery probably doesn't handle the case where AMDGPU has a builtin but NVPTX doesn't very well, since we only assume one of the gpu/ or gpu/vendor entrypoints. We could probably just put the __builtin version under gpu/vendor for now and try to make it common later.
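An interim gpu/vendor entrypoint of that shape might look roughly like this. Both the `__AMDGPU__` guard and the libm fallback are assumptions for illustration, not the actual libc file layout:

```cpp
#include <cassert>
#include <cmath>

// Illustrative per-target dispatch for a function where AMDGPU can use
// a correctly-lowered builtin but NVPTX cannot yet.
static float vendor_exp2f(float x) {
#if defined(__AMDGPU__)
  return __builtin_exp2f(x);  // lowers to native hardware on AMDGPU
#else
  return exp2f(x);  // generic fallback until a common builtin works everywhere
#endif
}
```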
A lot of the intrinsics being implemented by @arsenm seem to be porting ocml functions to the backend. I wonder if a better approach for libc specifically would be to port those implementations to C++ so that we can share it between NVPTX and AMDGPU.
If you can actually use the existing C++ implementations, or extend/tweak them to suit GPU architectures, wouldn't that mean we need not have used the builtins and vendor implementations in the first place?
Removed long double functions and replaced some vendor functions with in-progress intrinsics. pow, powf, and cosh were also added.
**libc/src/math/gpu/vendor/amdgpu/declarations.h**
- Line 26: You don't need exp_f32 or exp2_f32 anymore; the generic builtin is now correctly lowered (we're still asymmetrically missing an intrinsic for exp10). You do still need to wrap the f64 versions.
- Lines 35–36: No reason to wrap ldexp; just use the generic builtin.
- Lines 39–42: All rint and round variants can use the generic builtin.
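"Just use the generic builtin" for those rows amounts to one-line wrappers like the following sketch (the wrapper names are illustrative; real entrypoints would use the libc macros):

```cpp
#include <cassert>

// Each function forwards directly to the generic compiler builtin,
// which the backend can now lower correctly.
static double generic_ldexp(double x, int exp) { return __builtin_ldexp(x, exp); }
static double generic_rint(double x) { return __builtin_rint(x); }
static double generic_round(double x) { return __builtin_round(x); }
```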
Hi, I believe this may have broken the AMDGPU libc-on-GPU buildbot (which is unfortunately still in staging).
This change came in here (https://lab.llvm.org/staging/#/builders/247/builds/3734, which was broken because of another change), but the libc GPU tests are still failing (https://lab.llvm.org/staging/#/builders/247/builds/3736).
I'm probably a bit too inexperienced in the whole libc infrastructure to judge this fully, but this patch would be my best guess.
Happy to help if anything is needed.
Yeah, this broke both buildbots. It's most likely because we're enabling some extra tests that weren't previously checked: whenever a new entrypoint is added, it may start building an existing test.