Both nvptx and amdgcn will use the autoheader
openmp_wrappers/__clang_openmp_device_functions.h.
Use the clang headers for math functions. Still need a
libm bitcode file to build the FORTRAN device. The code
in __clang_hip_math.h can beu sed to build a libm
device library for FORTRAN. Add log2 and pow operation
with int arg.
Details
Diff Detail
Event Timeline
clang/lib/Headers/__clang_cuda_math.h | ||
---|---|---|
29 ↗ | (On Diff #298715) | I recognise this from the fortran libs build, but I'm not sure it's applicable to trunk as we don't have any amdgpu/openmp fortran support in trunk. |
clang/lib/Headers/__clang_hip_cmath.h | ||
98 | What are these _i suffix functions for? They aren't in libm. Writing log2(42) usually works without extra overloads | |
clang/lib/Headers/__clang_hip_libdevice_declares.h | ||
60 | I don't think the return type of this library function can depend on whether this header is compiled as c++/openmp or not. ocml is very macro themed, but it looks to me like the declaration there is an instance of extern __attribute__((const)) int OCML_MANGLE_F32(N)(float); which suggests 'int' was right. | |
clang/lib/Headers/__clang_hip_math.h | ||
273–277 | This is a bit weird. __tmp is a stack variable, which I thought would be in address_space(5) already. Declaration is extern float OCML_MANGLE_F32(frexp)(float, __private int *);. So I guess I can see why either of these branches would work, but don't understand why either of them don't. | |
442 | I don't recognise the symbol powii (or powi) from libm | |
clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h | ||
70 | Maybe use this define instead of attribute((address_space(5))) throughout the above? | |
clang/lib/Headers/openmp_wrappers/math.h | ||
30 | Why is <new> needed here? |
Updated based on review comments.
clang/lib/Headers/__clang_hip_cmath.h | ||
---|---|---|
98 | okay, I will remove these new functions, and a future patch can update these log2 and pow functions. | |
clang/lib/Headers/__clang_hip_math.h | ||
273–277 | I will update this to use the static version for HIP too. That path works correctly, and the HIP path was dropping the addrspace qualifier. | |
clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h | ||
70 | I will update the HIP and OpenMP to share the same usage of this address space (rather than the 2 paths above). Since HIP doesn't define __private, I may skip this change. |
clang/lib/Headers/__clang_hip_math.h | ||
---|---|---|
273–277 | Using static seems to cause a HIP error that is not observed on OpenMP. Should we change this patch to using #ifdef, or we can wait until this patch lands (it will fix this): |
Patch looks broadly reasonable to me. @jdoerfert does this align with what you want from these headers?
clang/lib/Headers/__clang_hip_math.h | ||
---|---|---|
273–277 | I guess we ifdef until D88345 lands. Have you opened a task for HIP dropping the addrspace qualifier? |
First off, this is the right direction, thanks for working on this!
Then, we need to split this. There are various things happening here and we can easily do them one by one.
Please reach out if you don't know where to split but basically try to make the patches "as small as possible" such that you write a test against them.
So for example, math and complex stuff are separate. New functions are separate, changing the attributes is separate, ...
I added a bunch of comments and tried to be exhaustive, though once we split it in the separate components I will do another review.
Finally, where are the tests? You only added the test/Inputs stuff.
clang/lib/Headers/CMakeLists.txt | ||
---|---|---|
163 | I'm not sold on the hip/hip... idea here but ok. Now that I write this, where is hip_runtime included? | |
clang/lib/Headers/__clang_cuda_complex_builtins.h | ||
103 | What is the difference to the block above? We should also revisit the need for this in the first place. We have template overload support through begin/end declare variant in clang now. Best case, none of this is needed and we simply call the C++ or C standard library functions here. That would be great because they have to resolve to the right nv and ocml functions anyway so why not call them ;) | |
298 | The max stuff you can commit ahead of time. LGTM on that NFC part. | |
clang/lib/Headers/__clang_hip_cmath.h | ||
31 | Nit: Hm, the comment talks about static but we use that either way. | |
clang/lib/Headers/__clang_hip_libdevice_declares.h | ||
308 | This I don't get, why is the signature different between C++ and C? That seems wrong. | |
clang/lib/Headers/__clang_hip_math.h | ||
39 | What is the reason you need constexpr in the first place? | |
51 | So I fought this problem long and hard and in the end realized that this way cannot work. I don't require you to change this (for this commit or in general) but I would urge you to consider it or we will most certainly get bug reports for HIP and OpenMP offload for AMD at some point. | |
65 | This should probably go to the declaration of the macros not their first use. | |
clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h | ||
48 | Neither the ifdef here nor the one above are needed. | |
51 | Please elaborate on this and clean it up if we *really* need it. Why is "optimized macro definitions" bad? | |
70 | I think this is problematic but let's talk about this once the stuff is split off. | |
77 | The extern C looks dubious to me. What in here needs it and why isn't it in the included file then? | |
clang/lib/Headers/openmp_wrappers/cmath | ||
22 | If hip cmath needs new, include it there. cmath can arguably rely on math.h but new seems to be hip specific. | |
79 | We don't need this ifdef. Please check also other ifdefs surrounding begin/end declare variant. | |
90 | Same as the ifdef above, not needed. match_any not needed here. | |
clang/lib/Headers/openmp_wrappers/complex | ||
22 | we forgot to undef CUDA here... | |
25 | This is the wrong spot to include libdevice_declares. | |
48 | This I do not understand. | |
clang/lib/Headers/openmp_wrappers/hip/hip_runtime.h | ||
24 | These pragmas are useless and, FWIW, you don't need the extension if you have a single arch here. undefine the OPENMP_AMDGCN or make sure it is always defined, see also below. | |
clang/lib/Headers/openmp_wrappers/math.h | ||
63 | The extension is not necessary here. It is dubious that we have to define things here without undefining them. |
Hi @jdoerfert , thank you for the review. I will discuss with the team, and start opening smaller patches one at a time.
I'm not sold on the hip/hip... idea here but ok.
I guess I would have put the stuff in __clang_openmp_device_functions instead.
Now that I write this, where is hip_runtime included?