I made this patch to compare LIBC's generic math implementations against built-in functions and the HIP and CUDA math libraries. I think it could potentially go into LIBC. Compared to the previous patches we have iterated on, this one adds some very basic CMake logic that prioritizes whether vendor wrappers, built-in wrappers, or generic functions are included in the archive. The CMake flag LIBC_GPU_BUILTIN_MATH selects whether built-in or generic functions have higher priority. This is a very large patch, so please let me know whether it should be split into smaller patches and whether anything should be renamed, for instance the flag LIBC_GPU_BUILTIN_MATH.
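To make the intended priority concrete, here is a minimal C++ model of the selection. It is a sketch only, not the CMake logic in the patch. The names `Impl`, `Available`, and `select_impl` are hypothetical, and so is the assumption that a vendor wrapper wins whenever one exists. The only behavior taken from the description above is that the flag swaps the relative priority of built-in and generic functions.

```cpp
// Hypothetical model of the per-function choice; not part of the patch.
enum class Impl { Vendor, Builtin, Generic, None };

// Which variants exist for one math function.
struct Available {
  bool vendor;
  bool builtin;
  bool generic;
};

// prefer_builtin models LIBC_GPU_BUILTIN_MATH being enabled.
// ASSUMPTION: a vendor wrapper, when present, always has top priority.
inline Impl select_impl(const Available &a, bool prefer_builtin) {
  if (a.vendor)
    return Impl::Vendor;
  if (prefer_builtin) {
    if (a.builtin)
      return Impl::Builtin;
    if (a.generic)
      return Impl::Generic;
  } else {
    if (a.generic)
      return Impl::Generic;
    if (a.builtin)
      return Impl::Builtin;
  }
  return Impl::None; // no variant available for this function
}
```

In this model, a function with both a built-in and a generic variant resolves to the built-in only when the flag is set, and a function with a single variant resolves to that variant regardless of the flag.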
We previously discussed whether long double functions should be included. I have not included them, but I have added some functions that return long long, since the CUDA
math library supports such functions, for instance __nv_llrint and __nv_llround.
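As a host-side illustration of what the long long variants compute, the sketch below wraps the compiler built-ins `__builtin_llrint` and `__builtin_llround` (available in Clang and GCC). It checks them against the C++ standard library, which here only stands in for a generic implementation. The namespaces `builtin_impl` and `reference_impl` and the helper `variants_agree` are hypothetical names. This is not the wrapper code from the patch, and it does not call `__nv_llrint` or `__nv_llround`.

```cpp
#include <cmath>

// Hypothetical namespaces, used only to keep the two variants apart.
namespace builtin_impl {
// Thin wrappers over compiler built-ins.
inline long long llrint(double x) { return __builtin_llrint(x); }
inline long long llround(double x) { return __builtin_llround(x); }
} // namespace builtin_impl

namespace reference_impl {
// Host standard library, standing in for a generic implementation.
inline long long llrint(double x) { return std::llrint(x); }
inline long long llround(double x) { return std::llround(x); }
} // namespace reference_impl

// Returns true when both variants produce the same results for x.
inline bool variants_agree(double x) {
  return builtin_impl::llrint(x) == reference_impl::llrint(x) &&
         builtin_impl::llround(x) == reference_impl::llround(x);
}
```

The two functions differ on halfway cases: `llrint` follows the current rounding mode (round-to-nearest-even by default), while `llround` rounds halfway cases away from zero. Inputs such as 2.5 are therefore useful when comparing implementations.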
Ideally, I think we should still aim to select between implementations on a per-function basis, based on performance and accuracy analysis, but this patch at least makes it possible to collect
the measurements needed for such an analysis.