With this patch, OpenMP on AMDGCN will use the math functions
provided by the ROCm ocml library. Linking device code against ocml will be
done in the next patch.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/lib/Headers/__clang_hip_cmath.h
29: __DEVICE__ should not imply constexpr. It should be added to each function separately.
clang/lib/Headers/__clang_hip_cmath.h
29: IIRC rocm does that with a macro called DEVICE_NOCE; perhaps we could go with DEVICE_CONSTEXPR. There's some interaction with overloading rules and different glibc versions, so it would be nice to tag exactly the same functions as constexpr on nvptx and amdgcn.
clang/lib/Headers/__clang_hip_math.h
29–30: Wonder if HIP would benefit from nothrow here.
35: I'd expect openmp to match the cplusplus/c distinction here, as openmp works on C source.
clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
113: I think this should be #define __device__.
clang/test/Headers/Inputs/include/cstdlib
13: I think I'd expect __builtin_labs et al. to work on amdgcn; are we missing lowering for them?
Those changes in OpenMP headers LGTM, except #define __device__.
clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
113: Right, because we already have declare variant.
Really looking forward to this! Thanks a lot!
I left some comments.
clang/lib/Headers/__clang_hip_math.h
35: ^ Agreed. Though we use a different trick, because unfortunately it's not always as straightforward and can be decided based on C vs C++.
clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
132: Can you make the declare variant scopes of nvptx and amdgpu smaller and put them next to each other?
    #ifdef __cplusplus
    extern "C" {
    #endif
    #declare variant
    #define ...
    ...
    #undef
    #end
    #declare variant
    ...
    #end
    #ifdef __cplusplus
    } // extern "C"
    #endif
clang/lib/Headers/openmp_wrappers/cmath
83: No match_any needed (here and elsewhere). Also, don't we want all but the includes to be the same for both GPUs? Maybe we have a device(kind(gpu)) variant, and inside the nvptx and amdgpu ones just the respective include?
clang/lib/Headers/openmp_wrappers/math.h
59: FWIW, this is what I think the begin/end regions should look like: small and next to each other.
clang/test/Headers/Inputs/include/cstdlib
13: Yeah, looks weird that we cannot compile this mock-up header.
Addressed review comments.
clang/lib/Headers/__clang_hip_math.h
29–30: Would like to keep HIP changes minimal in this patch.
35: This is somewhat tricky. Since __finite/__isnan/__isinff are declared with an int return type in the standard library (while the corresponding C++ methods appear to be isfinite, isnan and isinf with a bool return type), the compiler fails to resolve these functions when using bool. I don't know how HIP is working. The __RETURN_TYPE macro is only being used with the following methods:
and with the corresponding float versions.
clang/lib/Headers/openmp_wrappers/cmath
83: device(kind(gpu)) breaks nvptx and hip with lots of errors like the one below; maybe I am doing something wrong.
    __clang_cuda_device_functions.h:29:40: error: use of undeclared identifier '__nvvm_vote_all'
clang/test/Headers/Inputs/include/cstdlib
13: From what I understand, hip is defining fabs (to use ocml's version) in the std namespace, which was already defined in this header, so that's causing a multiple-declaration error. I will wrap only fabs in the ifdefs.
A few small comments, otherwise LGTM on the HIP header side.
clang/lib/Headers/__clang_hip_cmath.h
30: Does OpenMP not require the __device__ attribute here? I know constexpr defines __device__ on HIP; does OMP do the same?
32: I don't think this is the right place to define __constant__. It's unused in this header and may get forgotten. Would it be better to define it in the openmp wrapper, or does cmath define it in OpenMP?
clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
47: Would it be better to push and pop these macros, in case they were defined outside of here?
OpenMP side looks reasonable.
clang/lib/Headers/__clang_hip_cmath.h
101: ^ This is how OpenMP resolves the overload issue wrt. different return types.
clang/lib/Headers/__clang_hip_math.h
35: I marked the code above that actually overloads these functions in OpenMP (or rather the versions w/o underscores) such that the system can have either version and it should work fine.
clang/test/Headers/Inputs/include/cstdlib
25: That seems to be fundamentally broken then, but let's see; maybe it will somehow work anyway.
clang/test/Headers/Inputs/include/cstdlib
25: I thought fabs was in math, not stdlib. Not sure what this file is doing, but the functions above are inline and fabs isn't.
clang/lib/Headers/__clang_hip_cmath.h
43: I'm pretty sure it's UB (no diagnostic required) to call a non-constexpr function from a constexpr one. Nevertheless, it does presently seem to be working for nvptx clang, so it will probably work for amdgcn. Strictly, I think we're supposed to detect when we're called at compile time and call one implementation, and otherwise call the library one, where weirdly it's OK for them to return different answers. I think there's a built-in we can call to select between the two.
I recommend we ship this and fix up the rough edges as we run into them. Paired with ocml it passes the OvO libm tests, which seems to be a fairly high bar for 'does it work'. Therefore tagging green. Any objections?
clang/lib/Headers/__clang_hip_cmath.h
101: I tried the exact same way. The lit tests compile and run fine, but I could not get the runtime tests to compile without the errors. It might be that I am not using the match patterns correctly. I also tried some other combinations of the match selector, but none of them worked.
@ronlieb reports that this change means __CUDA__ is defined for openmp amdgcn compilation. I'm going to try to verify that.
It may be that the patch does not expose __CUDA__ directly.
Rather, the patch works so well that we finally see the pre-existing issue in complex.h:
./clang/lib/Headers/openmp_wrappers/complex.h
clang/lib/Headers/__clang_hip_cmath.h
101: Not sure what to say. If we want it to work in the wild, I doubt there is much we can do but make this work. Not sure what your errors were or why they were caused; I'd recommend determining that instead of punting and hoping nobody will run into this.
Good spot. I've been feeding the following to various toolchains:
// permute
//#include <math.h>
//#include <cmath>
//#include <complex.h>
#ifndef _OPENMP
#error "OpenMP should be defined"
#endif
#ifdef __CUDA__
#error "Cuda should not be defined"
#endif
#ifdef __HIP__
#error "HIP should not be defined"
#endif
int main() {}
Currently, trunk passes that with any of the headers uncommented. Rocm is OK for math and cmath, but defines __CUDA__ for complex.
Weird pre-existing stuff in cuda_complex_builtins: it has an #ifdef __AMDGCN__ in it, despite 'cuda' in the name. I note there is no corresponding 'hip' complex builtins header.
The ifdef logic for stubbing out some functions (which is done with macros...) isn't ideal; it's:
#if !defined(__OPENMP_NVPTX__)
  // use std:: functions from cmath.h, which isn't included, though math.h is included from openmp before it
#else
  #ifdef __AMDGCN__
    // use ocml functions
  #else
    // use nv functions
  #endif
#endif
None of this uses #define __CUDA__, so we could drop that from the openmp wrapper. Or, as far as I can tell, we could drop all the macro obfuscation in that header and just call the libm functions directly, which will already resolve to the appropriate platform-specific thing.
Instead of revising that as part of this patch, how about we wrap the openmp_wrappers/complex.h logic in #ifndef __AMDGCN__, which will cut it from the graph for openmp while leaving nvptx openmp untouched?
Tagged 'request changes' because I think we should ifdef around complex before (or while) landing this, as defining __CUDA__, even transiently, is a user-hostile thing to do from amdgpu openmp.
It is *really* ugly that we have cuda and hip implementations of cmath. Opening them in a diff, it looks very likely that the hip one was created by copying and pasting the cuda one and then hacking on it a bit. This means we have openmp-specific fixes already done in the cuda one and VS2019 workarounds in the hip one. It also means there are a bunch of differences that might be important or might be spurious, like whether a function calls ::scalbln or std::scalbln. This is particularly frustrating because we should be able to isolate essentially all the differences between nv and ocml functions in math.h.
clang/lib/Headers/openmp_wrappers/cmath
30: This declare variant will not match amdgcn.
43: Which means that amdgcn is not going to pick up any of these overloads, but that looks like it's actually OK because __clang_hip_cmath does define them (I think; there are a lot of macros involved).
clang/lib/Headers/openmp_wrappers/math.h
41: @jdoerfert do you know why we have match_any here? Wondering if the amdgcn variant below should have the same.
clang/lib/Headers/openmp_wrappers/math.h
41: Because nvptx and nvptx64 share the same implementation; they just emit different IR. If AMDGCN only has one architecture, it doesn't need to use match_any.
clang/lib/Headers/openmp_wrappers/math.h
41: Ah, right - thank you!
clang/test/Headers/Inputs/include/cstdlib
25: I am afraid this is just a workaround to get openmp_device_math_isnan.cpp to pass for AMDGCN. This stems from not having an #ifndef __OPENMP_AMDGCN__ in __clang_hip_cmath.h where 'using ::fabs' is present. Currently, OpenMP on AMDGCN uses all of the overloaded functions created by the HIP macros, where NVPTX does not use the CUDA overloads. This may be a new topic of discussion. https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/__clang_cuda_cmath.h#L191 By using this ifndef, it seems NVPTX loses quite a few overloaded functions. Are these meant to eventually be present in openmp_wrappers/cmath? Not sure what issues @jdoerfert ran into with D75788.
clang/test/Headers/Inputs/include/cstdlib
25: Can you provide an example that shows how we "lose" something? So, an input and command line that should work but doesn't, or that should be compiled to something else. That would help me a lot.
clang/test/Headers/Inputs/include/cstdlib
25: TL;DR: I think nvptx works here, but it's hard to be certain. I've put a few minutes into looking for something that doesn't work, then much longer trying to trace where the various functions come from, and have concluded that the hip cmath header diverging from the cuda cmath header is the root cause.
The list of functions near the top of __clang_cuda_cmath.h is a subset of libm, e.g.
    __DEVICE__ float acos(float __x) { return ::acosf(__x); }
but no acosh. Later on in the file are:
    __CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, acosh)
but these are guarded by #ifndef __OPENMP_NVPTX__, which suggests they are not included when using the header from openmp. However, openmp_wrappers/cmath does include
    __DEVICE__ float acosh(float __x) { return ::acoshf(__x); }
under the comment. Finally, there are some functions that are not in either list, such as fma(float, float, float), but which are nevertheless resolved, at a guess in a glibc header.
My current theory is that nvptx gets the set of functions right through a combination of cuda headers, clang cuda headers, clang openmp headers, and system headers. At least, the half dozen I've tried work, and IIRC it passes the OvO suite, which I believe calls all of them. -Wimplicit-float-conversion complains about a few, but that seems minor.
Further, I think hip does not get this right, because the hip cmath header has diverged from the cuda one, and the amdgpu openmp implementation that tries to use the hip headers does not pass the OvO suite without some hacks.
LG from my side
clang/test/Headers/Inputs/include/cstdlib
25: @estewart08 Feel free to provide me with something that doesn't work even as this goes in. It sounded like you had some ideas and I'd like to look into that.
clang/test/Headers/Inputs/include/cstdlib
25: At this point, all of the functions I have tried for nvptx did not show an error. It was unclear to me how the device versions of certain overloaded functions were being resolved. As Jon mentioned above, it is a mix of headers that range from clang, openmp, cuda, and system headers. For now, I will retract my statement, and if I run into any problems in the future I will point them out.
Landed on @pdhaliwal's behalf. My expectation is that this patch mostly works and the rough edges can be cleaned up once ocml is linked in and we can more easily run more applications through it.
My local build failed due to regression failures. clang/test/Headers/openmp_device_math_isnan.cpp failed with the following errors on undeclared fabs:
/home/michliao/working/llvm/llvm-project/clang/test/Headers/Inputs/include/cstdlib:29:31: error: use of undeclared identifier 'fabs'
float abs(float x) { return fabs(x); }
                            ^
/home/michliao/working/llvm/llvm-project/clang/test/Headers/Inputs/include/cstdlib:30:33: error: use of undeclared identifier 'fabs'
double abs(double x) { return fabs(x); }
                              ^
2 errors generated.
This breaks tests: http://45.33.8.238/linux/51733/step_7.txt
Please take a look and revert for now if it takes a while to fix.
Thanks! Will take a look. Feel free to revert; I'll do so shortly if no one beats me to it.
The cstdlib test header contains:
// amdgcn already provides definition of fabs
#ifndef __AMDGCN__
float fabs(float __x) { return __builtin_fabs(__x); }
#endif
If I delete or invert the ifndef:
$HOME/llvm-build/llvm/lib/clang/13.0.0/include/__clang_hip_cmath.h:660:9: error: target of using declaration conflicts with declaration already in scope
using ::fabs;
when included from openmp_wrappers/cmath.
If I delete the definition:
$HOME/llvm-project/clang/test/Headers/Inputs/include/cstdlib:29:31: error: use of undeclared identifier 'fabs'
when included from openmp_wrappers/__clang_openmp_device_functions.h.
The current conclusion is that we cannot work around the presence/absence of fabs in the cstdlib test file; we have to do something in the real headers such that the test file does the right thing.
We are working on some additions to this patch. The lit failure noted above has been fixed locally. I would expect an update here very soon.
Landing ocml side first seems reasonable as it's less likely to be broken and makes testing this more straightforward
It required some work to fix the failing lit test case, and many thanks to
@estewart08 for helping with that.
The current status is that we are now following the nvptx openmp strategy for
the openmp math headers very closely. In this version of the patch, there are a bunch
of HIP cmath overloads which are disabled for AMDGPU openmp, similar to nvptx.
This fixed the lit failure, but a large number of tests started failing in OvO.
The reason is that some overloads which were used in the suite had been
disabled. To fix them, we added definitions for the missing overloads in
openmp_wrappers/cmath. With these changes, OvO compiles 100% of the
mathematical_function test suite successfully. There are still 6/177 tests in
the suite which produce wrong results.
Now my suggestion is to land this patch as it is and fix the remaining 6 tests
in a later patch.
Unfortunately I hit an error on my ubuntu 20.04 system.
#include <complex>
int main() { }
~/opt/llvm-clang/build_mirror_offload_main/bin/clang++ -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 main.cpp -c
works fine
~/opt/llvm-clang/build_mirror_offload_main/bin/clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp
$ ~/opt/llvm-clang/build_mirror_offload_main/bin/clang++ -fopenmp -fopenmp-targets=nvptx64 --libomptarget-nvptx-bc-path=/home/yeluo/opt/llvm-clang/build_mirror_offload_main/runtimes/runtimes-bins/openmp/libomptarget/libomptarget-nvptx-sm_80.bc main.cpp -c --std=c++14
clang-14: warning: Unknown CUDA version. cuda.h: CUDA_VERSION=11040. Assuming the latest supported version 10.1 [-Wunknown-cuda-version]
In file included from main.cpp:2:
In file included from /home/yeluo/opt/llvm-clang/build_mirror_offload_main/lib/clang/14.0.0/include/openmp_wrappers/complex:26:
In file included from /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/complex:45:
In file included from /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/sstream:38:
In file included from /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/istream:38:
In file included from /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/ios:38:
In file included from /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/iosfwd:39:
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:73:30: error: no template named 'allocator'
       typename _Alloc = allocator<_CharT> >
                         ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:73:21: error: template parameter missing a default argument
       typename _Alloc = allocator<_CharT> >
                    ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:72:48: note: previous default template argument defined here
  template<typename _CharT, typename _Traits = char_traits<_CharT>,
                                               ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:79:11: error: too few template arguments for class template 'basic_string'
  typedef basic_string<char> string;
          ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:74:11: note: template is declared here
    class basic_string;
          ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:83:11: error: too few template arguments for class template 'basic_string'
  typedef basic_string<wchar_t> wstring;
          ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:74:11: note: template is declared here
    class basic_string;
          ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:93:11: error: too few template arguments for class template 'basic_string'
  typedef basic_string<char16_t> u16string;
          ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:74:11: note: template is declared here
    class basic_string;
          ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:96:11: error: too few template arguments for class template 'basic_string'
  typedef basic_string<char32_t> u32string;
          ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:74:11: note: template is declared here
    class basic_string;
          ^
Manually adding <memory> in main.cpp works around the issue, but this needs a fix.
This patch didn't change complex so I'm struggling to make sense of the backtrace. Something in libstdc++ needs memory but doesn't include it?
Given that ^ in fail.cpp and an invocation on a machine that doesn't have cuda or an nvidia card (it had built nvptx devicertl, probably doesn't matter for this)
$HOME/llvm-install/bin/clang++ -fopenmp -fopenmp-targets=nvptx64 fail.cpp -nocudalib
I also get a failure. This time on libstdc++ 10, various failures, starting from
In file included from /home/amd/llvm-install/lib/clang/14.0.0/include/openmp_wrappers/complex:26:
In file included from /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/complex:45:
In file included from /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/sstream:38:
In file included from /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/istream:38:
In file included from /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/ios:40:
/usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/char_traits.h:216:7: error: no member named 'copy' in namespace 'std'; did you mean simply 'copy'?
      std::copy(__s2, __s2 + __n, __s1);
      ^~~~~
Reverting replaces that error with the expected "ptxas" doesn't exist. Therefore reverting this change, will reapply once the unexpected change of behaviour on nvptx is understood and avoided.
@ye-luo and @JonChesterfield can you please test the latest version of this patch? It should work now.
clang/lib/Headers/openmp_wrappers/math.h
53: That's quite worrying. Declare variant match amdgcn is supposed to have the same effect here as the older style macro. I wonder if we have any test coverage for whether declare variant works for amdgcn.
Latest patch can only misfire on amdgpu, so let's go with it and try to work out variant vs ifdef subsequently.
(edit: Adding the ifdef around the declare variant, though I still think it should be a no-op, does indeed fix the above failure for nvptx)
clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
38: Given that declare variant didn't work elsewhere, it probably doesn't work here. Thus this may be the root cause of https://bugs.llvm.org/show_bug.cgi?id=51337
clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
38: Was able to reproduce this issue locally on an nvptx machine. And you are right, declare variant didn't work here as well. Wrapping it in #ifdef fixed the issue. I will create a fix.