This is an archive of the discontinued LLVM Phabricator instance.

[libc][libm][GPU] Populated libmgpu.a with both built-in, HIP Math, and CUDA Math functions
Needs ReviewPublic

Authored by AntonRydahl on Jul 25 2023, 2:26 PM.

Details

Summary

I made this patch to compare the generic implementations from LIBC with built-in functions, HIP Math, and CUDA math. I think this could potentially go into LIBC. Compared to the previous patches we have iterated over, I have added some very basic CMake logic to be able to prioritize including vendor wrappers, built-in wrappers, or generic functions in the archive. With the CMake flag LIBC_GPU_BUILTIN_MATH one can select whether built-in or generic functions have higher priority. This is a very large patch, so please let me know if it should be split into smaller patches, and if some things should be renamed, for instance the flag LIBC_GPU_BUILTIN_MATH.

We previously discussed whether long double functions should be included or not. I have not included them, but I have added some functions that return long long since I found that the CUDA
math library has support for those functions, for instance __nv_llrint and __nv_llround.

Ideally, I think we should still aim at being able to select between functions on an individual level based on performance and accuracy analysis, but this patch at least allows for collecting
measurements for making such an analysis.

Diff Detail

Event Timeline

AntonRydahl created this revision.Jul 25 2023, 2:26 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJul 25 2023, 2:26 PM
AntonRydahl requested review of this revision.Jul 25 2023, 2:26 PM
AntonRydahl edited the summary of this revision. (Show Details)Jul 25 2023, 2:34 PM
AntonRydahl edited the summary of this revision. (Show Details)
jdoerfert retitled this revision from Populated libmgpu.a with both built-in, HIP Math, and CUDA Math functions and changed the CMake logic to allow selecting different subsets of the math functions to be included in the archive. to [libc][libm][GPU] Populated libmgpu.a with both built-in, HIP Math, and CUDA Math functions.Jul 25 2023, 2:37 PM

After D152603 was upstreamed it was necessary to rebase this patch. Note that I have updated the vendor header files to always use the vendor functions compared to D152603. However, by adjusting the CMake flags, one can stil use
the built-ins.

Rebased the patch after D153395 was upstreamed and removed built-in functions that did not pass unit tests.

jhuber6 added a subscriber: arsenm.Aug 14 2023, 9:39 AM

@arsenm Do we support things like __builtin_atanf? IIRC all the builtins for things like sin and sinf were kind of broken and shouldn't be used.

@arsenm Do we support things like __builtin_atanf? IIRC all the builtins for things like sin and sinf were kind of broken and shouldn't be used.

No, there's no underlying llvm intrinsic for atan, it's essentially an alias for the libcall only. llvm.sin and llvm.cos do exist but are just codegened in a fast math way only

@arsenm Do we support things like __builtin_atanf? IIRC all the builtins for things like sin and sinf were kind of broken and shouldn't be used.

No, there's no underlying llvm intrinsic for atan, it's essentially an alias for the libcall only. llvm.sin and llvm.cos do exist but are just codegened in a fast math way only

In an ideal world we would define the platform and provide the libm functions through the builtins, but that wouldn't be provided by the backend

I made a mistake in the wrapper for __nv_atan2f which I have corrected in this commit.

AntonRydahl added a comment.EditedAug 14 2023, 10:13 AM

@arsenm Do we support things like __builtin_atanf? IIRC all the builtins for things like sin and sinf were kind of broken and shouldn't be used.

No, there's no underlying llvm intrinsic for atan, it's essentially an alias for the libcall only. llvm.sin and llvm.cos do exist but are just codegened in a fast math way only

In an ideal world we would define the platform and provide the libm functions through the builtins, but that wouldn't be provided by the backend

I think there are actually only 33 of the built-ins in this commit that produce correct results. But I had to add wrappers for them all to test which ones were correct.

AntonRydahl added a comment.EditedAug 14 2023, 10:32 AM

According to my simple tests, the following built-ins produce correct results:

  • __builtin_ceilf
  • __builtin_copysign
  • __builtin_copysignf
  • __builtin_fabs
  • __builtin_fabsf
  • __builtin_floor
  • __builtin_floorf
  • __builtin_fma
  • __builtin_fmaf
  • __builtin_fmax
  • __builtin_fmaxf
  • __builtin_fmin
  • __builtin_fminf
  • __builtin_frexp
  • __builtin_frexpf
  • __builtin_ilogb
  • __builtin_ilogbf
  • __builtin_ldexp
  • __builtin_ldexpf
  • __builtin_logb
  • __builtin_logbf
  • __builtin_nearbyint
  • __builtin_nearbyintf
  • __builtin_nextafter
  • __builtin_nextafterf
  • __builtin_rint
  • __builtin_rintf
  • __builtin_round
  • __builtin_roundf
  • __builtin_sincosf
  • __builtin_sqrt
  • __builtin_trunc
  • __builtin_truncf