Page MenuHomePhabricator

[OpenMP][AMDGCN] Initial math headers support
ClosedPublic

Authored by pdhaliwal on Jun 25 2021, 2:58 AM.

Details

Summary

With this patch, OpenMP on AMDGCN will use the math functions
provided by ROCm ocml library. Linking device code to the ocml will be
done in the next patch.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
pdhaliwal updated this revision to Diff 354913.Jun 28 2021, 8:37 AM
pdhaliwal marked 2 inline comments as done.
  • Move constant to openmp_wrappers/cmath
  • Using push/pop_macro to avoid redefinition
clang/lib/Headers/__clang_hip_cmath.h
30

It does not as these methods are inside declare variant.

32

It is being used. However, I have moved it to openmp_wrappers/cmath.

OpenMP side looks reasonable.

clang/lib/Headers/__clang_hip_cmath.h
101

^ This is how OpenMP resolves the overload issue wrt. different return types.

clang/lib/Headers/__clang_hip_math.h
35

I marked the code above that actually overloads these functions in OpenMP (or better the versions w/o underscores) such that the system can have either version and it should work fine.

clang/test/Headers/Inputs/include/cstdlib
25

That seems to be fundamentally broken then, but let's see, maybe it will somehow work anyway.

clang/test/Headers/Inputs/include/cstdlib
25

I thought fabs was in math, not stdlib. Not sure what this file is doing but the functions above are inline and fabs isn't

clang/lib/Headers/__clang_hip_cmath.h
43

I'm pretty sure it's UB, no diagnostic req to call a non-constexpr function from a constexpr one. Nevertheless, it does presently seem to be working for nvptx clang so will probably work for amdgcn.

Strictly I think we're supposed to detect when called at compile time and call one implementation and otherwise call the library one, where weirdly it's ok for them to return different answers. I think there's a built-in we can call to select between the two.

JonChesterfield accepted this revision.Jun 30 2021, 3:31 AM

I recommend we ship this and fix up the rough edges as we run into them. Paired with ocml it passes OVO libm tests which seems to be a fairly high bar for 'does it work'. Therefore tagging green. Any objections?

This revision is now accepted and ready to land.Jun 30 2021, 3:31 AM
pdhaliwal added inline comments.Jun 30 2021, 3:59 AM
clang/lib/Headers/__clang_hip_cmath.h
101

I tried the exact same way. The lit tests compile and run fine. I could not get the runtime tests compile without the errors. It might be that I am not using match patterns correctly. I also tried some other combinations of the match selector but none of them worked.

scchan accepted this revision.Jun 30 2021, 7:59 AM

@ronlieb reports that this change means CUDA is defined for openmp amdgcn compilation. I'm going to try to verify that

[AMD Official Use Only]

It maybe that the patch does not expose CUDA directly.
Rather the patch works so well we finally see the pre-existing issue in complex.h

./clang/lib/Headers/openmp_wrappers/complex.h

jdoerfert added inline comments.Jun 30 2021, 10:03 AM
clang/lib/Headers/__clang_hip_cmath.h
101

Not sure what to say. If we want it to work in the wild, I doubt there is much we can do but to make this work. Not sure what your errors were or why they were caused, I'd recommend to determine that instead of punting and hoping nobody will run into this.

Good spot. I've been feeding the following to various toolchains:

// permute
//#include <math.h>
//#include <cmath>
//#include <complex.h>

#ifndef _OPENMP
#error "OpenMP should be defined
#endif

#ifdef __CUDA__
#error "Cuda should not be defined"
#endif

#ifdef __HIP__
#error "HIP should not be defined"
#endif

int main() {}

Currently, trunk passes that with any of the headers uncommented. Rocm is ok for math and cmath, but defines CUDA for complex.

Weird pre-existing stuff in cuda_complex_builtins. It has an #ifdef AMDGCN macro in it, despite 'cuda' in the name. I note there is no corresponding 'hip' complex builtins.

The ifdef logic for stubbing out some functions (which is done with macros...) isn't ideal, it's:

#if !defined (_OPENMP_NVPTX)
// use std:: functions from cmath.h, which isn't included, though math.h is included from openmp before it
#else 
#ifdef __AMDGCN__
// use ocml functions
#else
// use nv functions
#end
#end

None of this uses #define __CUDA__ so we could drop that from the openmp wrapper. Or, as far as I can tell, we could drop all the macro obfuscation in that header and just call the libm functions directly, which will already resolve to the appropriate platform specific thing.

Instead of revising that as part of this patch, how about we wrap the openmp_wrappers/complex.h logic in #ifndef __AMDGCN__, which will cut it from the graph for openmp while leaving nvptx openmp untouched?

JonChesterfield requested changes to this revision.Jun 30 2021, 10:25 AM
This revision now requires changes to proceed.Jun 30 2021, 10:25 AM
JonChesterfield added a comment.EditedJun 30 2021, 10:39 AM

tagged request changes because I think we should ifdef around complex before (or while) landing this, as defining __CUDA__, even transiently, is a user hostile thing to do from amdgpu openmp

It is *really* ugly that we have cuda and hip implementations of cmath. Opening them in diff it looks very likely that the hip one was created by copying and pasting the cuda one then hacking on it a bit. This means we have openmp specific fixes already done in the cuda one and VS2019 workarounds in the hip one. It also means there are a bunch of differences that might be important or might be spurious, like whether a function calls ::scalbln or std::scalbln. This is particularly frustrating because we should be able isolate essentially all the differences between nv and ocml functions in math.h.

clang/lib/Headers/openmp_wrappers/cmath
30

this declare variant will not match amdgcn

43

which means that amdgcn is not going to pick up any of these overloads, but that looks like it's actually OK because clang_hip_cmath does define them (I think, there are a lot of macros involved)

clang/lib/Headers/openmp_wrappers/math.h
41

@jdoerfert do you know why we have match_any here? wondering if the amdgcn variant below should have the same

clang/lib/Headers/openmp_wrappers/math.h
41

Because nvptx and nvptx64 share the same implementation. They just emit different IRs. If AMDGCN only has one architecture, it doesn't need to use match_any.

clang/lib/Headers/openmp_wrappers/math.h
41

ah, right - thank you!

estewart08 added inline comments.Jul 9 2021, 12:10 PM
clang/test/Headers/Inputs/include/cstdlib
25

I am afraid this is just a workaround to get openmp_device_math_isnan.cpp to pass for AMDGCN. This stems from not having an #ifndef OPENMP_AMDGCN in __clang_hip_cmath.h where 'using ::fabs' is present. Currently, OPENMP_AMDGCN uses all of the overloaded functions created by the HIP macros where NVPTX does not use the CUDA overloads. This may be a new topic of discussion.

https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/__clang_cuda_cmath.h#L191

By using this ifndef, it seems NVPTX looses quite a few overloaded functions. Are these meant to eventually be present in openmp_wrappers/cmath? Not sure what issues @jdoerfert ran into with D75788.

jdoerfert added inline comments.Jul 9 2021, 3:37 PM
clang/test/Headers/Inputs/include/cstdlib
25

By using this ifndef, it seems NVPTX looses quite a few overloaded functions. Are these meant to eventually be present in openmp_wrappers/cmath? Not sure what issues @jdoerfert ran into with D75788.

Can you provide an example that shows how we "loose" something? So an input and command line that should work but doesn't, or that should be compiled to something else. That would help me a lot.

JonChesterfield added inline comments.Jul 9 2021, 4:34 PM
clang/test/Headers/Inputs/include/cstdlib
25

TLDR, I think nvptx works here, but it's hard to be certain. I've put a few minutes into looking for something that doesn't work, then much longer trying to trace where the various functions come from, and have concluded that the hip cmath header diverging from the cuda cmath header is the root cause.

The list of functions near the top of __clang_cuda_cmath.h is a subset of libm, e.g.

__DEVICE__ float acos(float __x) { return ::acosf(__x); }
but no acosh

Later on in the file are:

__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, acosh)

but these are guarded by #ifndef __OPENMP_NVPTX__, which suggests they are not included when using the header from openmp.

However, openmp_wrappers/cmath does include __DEVICE__ float acosh(float __x) { return ::acoshf(__x); } under the comment
// Overloads not provided by the CUDA wrappers by by the CUDA system headers

Finally there are some functions that are not in either list, such as fma(float,float,float), but which are nevertheless resolved, at a guess in a glibc header.

My current theory is that nvptx gets the set of functions right through a combination of cuda headers, clang cuda headers, clang openmp headers, system headers. At least, the half dozen I've tried work, and iirc it passes the OvO suite which I believe calls all of them. Wimplicit-float-conversion complains about a few but that seems minor.

Further, I think hip does not get this right, because the hip cmath header has diverged from the cuda one, and the amdgpu openmp implementation that tries to use the hip headers does not pass the OvO suite without some hacks.

Only revision I'm looking for here is to land D105221 or equivalent first

jdoerfert accepted this revision.Jul 16 2021, 1:04 PM

LG from my side

clang/test/Headers/Inputs/include/cstdlib
25
>  By using this ifndef, it seems NVPTX looses quite a few overloaded functions. Are these meant to eventually be present in openmp_wrappers/cmath? Not sure what issues @jdoerfert ran into with D75788.

Can you provide an example that shows how we "loose" something? So an input and command line that should work but doesn't, or that should be compiled to something else. That would help me a lot.

@estewart08 Feel free to provide me with something that doesn't work even as this goes in. It sounded you had some ideas and I'd like to look into that.

JonChesterfield accepted this revision.EditedJul 20 2021, 12:56 AM

D105221 landed so LGTM too

This revision is now accepted and ready to land.Jul 20 2021, 12:56 AM
estewart08 added inline comments.Jul 21 2021, 7:01 AM
clang/test/Headers/Inputs/include/cstdlib
25

TLDR, I think nvptx works here, but it's hard to be certain. I've put a few minutes into looking for something that doesn't work, then much longer trying to trace where the various functions come from, and have concluded that the hip cmath header diverging from the cuda cmath header is the root cause.

The list of functions near the top of __clang_cuda_cmath.h is a subset of libm, e.g.

__DEVICE__ float acos(float __x) { return ::acosf(__x); }
but no acosh

Later on in the file are:

__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, acosh)

but these are guarded by #ifndef __OPENMP_NVPTX__, which suggests they are not included when using the header from openmp.

However, openmp_wrappers/cmath does include __DEVICE__ float acosh(float __x) { return ::acoshf(__x); } under the comment
// Overloads not provided by the CUDA wrappers by by the CUDA system headers

Finally there are some functions that are not in either list, such as fma(float,float,float), but which are nevertheless resolved, at a guess in a glibc header.

My current theory is that nvptx gets the set of functions right through a combination of cuda headers, clang cuda headers, clang openmp headers, system headers. At least, the half dozen I've tried work, and iirc it passes the OvO suite which I believe calls all of them. Wimplicit-float-conversion complains about a few but that seems minor.

Further, I think hip does not get this right, because the hip cmath header has diverged from the cuda one, and the amdgpu openmp implementation that tries to use the hip headers does not pass the OvO suite without some hacks.

25

Feel free to provide me with something that doesn't work even as this goes in. It sounded you had some ideas and I'd like to look into that.

At this point all of the functions I have tried for nvptx did not show an error. It was unclear to me how the device versions of certain overloaded functions were being resolved. As Jon mentioned above, it is a mix of headers that range from clang, openmp, cuda, and system headers. For now, I will retract my statement and if I run into any problems in the future I will point them out.

JonChesterfield updated this revision to Diff 360447.EditedJul 21 2021, 7:22 AM
  • rebase on main, builds OK
This revision was landed with ongoing or failed builds.Jul 21 2021, 8:16 AM
This revision was automatically updated to reflect the committed changes.

Landed on @pdhaliwal's behalf. My expectation is that this patch mostly works and the rough edges can be cleaned up once ocml is linked in and we can more easily run more applications through it.

hliao added a subscriber: hliao.Jul 21 2021, 9:12 AM

My local build failed due to regression failures. clang/test/Headers/openmp_device_math_isnan.cpp failed with the following errors on undeclared fabs.

1950 /home/michliao/working/llvm/llvm-project/clang/test/Headers/Inputs/include/cstdlib:29:31: error: use of undeclared identifier 'fabs'
1951 float abs(float x) { return fabs(x); }
1952 ^
1953 /home/michliao/working/llvm/llvm-project/clang/test/Headers/Inputs/include/cstdlib:30:33: error: use of undeclared identifier 'fabs'
1954 double abs(double x) { return fabs(x); }
1955 ^
1956 2 errors generated.

This breaks tests: http://45.33.8.238/linux/51733/step_7.txt

Please take a look and revert for now if it takes a while to fix.

Thanks! Will take a look. Feel free to revert, I'll do so shortly if noone beats me to it

cstdlib test header contains

// amdgcn already provides definition of fabs
#ifndef __AMDGCN__
float fabs(float __x) { return __builtin_fabs(__x); }
#endif

If I delete or invert the ifndef

$HOME/llvm-build/llvm/lib/clang/13.0.0/include/__clang_hip_cmath.h:660:9: error: target of using declaration conflicts with declaration already in scope
using ::fabs;
when included from openmp_wrappers/cmath

If I delete the definition,

$HOME/llvm-project/clang/test/Headers/Inputs/include/cstdlib:29:31: error: use of undeclared identifier 'fabs'
when included from openmp_wrappers/__clang_openmp_device_functions.h

Current conclusion is that we cannot work around the presence/absence of fabs in the cstdlib test file, we have to do something in the real headers such that the test file does the right thing

how to get this moving?

how to get this moving?

We are working on some additions to this patch. The lit failure noted above has been fixed locally. I would expect an update here very soon.

JonChesterfield reopened this revision.Jul 29 2021, 10:48 AM

Landing ocml side first seems reasonable as it's less likely to be broken and makes testing this more straightforward

This revision is now accepted and ready to land.Jul 29 2021, 10:48 AM
pdhaliwal updated this revision to Diff 362997.Jul 30 2021, 2:10 AM

It required some work to fix the failing lit test case. And many thanks to
@estewart for helping in that.

The current status is that we are now following the nvptx openmp strategy for
openmp math headers very closely. In this version of patch, there are bunch
of HIP cmath overloads which are disabled for AMDGPU openmp similar to nvptx.
This fixed the lit failure, but a large number of tests started failing in OvO.,
Reason being that there were some overloads which were used in the suite but
were disabled earlier. In order to fix them, we had added definitions in the
openmp_wrappers/cmath for the missing overloads. With these changes, OvO compiles 100% of the
mathematical_function test suite successfully. There are still 6/177 tests in
the suite which are producing wrong result.

Now my suggestion is to land this patch as it is and fix the remaining 6 tests
in a later patch.

JonChesterfield accepted this revision.Jul 30 2021, 7:11 AM
JonChesterfield added inline comments.
clang/lib/Headers/openmp_wrappers/cmath
113

let's not add in commented out code

clang/test/Headers/Inputs/include/cstdlib
24

drop comment?

pdhaliwal updated this revision to Diff 363090.Jul 30 2021, 7:51 AM

Addressed review comments.

This revision was landed with ongoing or failed builds.Jul 30 2021, 7:53 AM
This revision was automatically updated to reflect the committed changes.
ye-luo added a comment.EditedJul 30 2021, 9:49 AM

Unforuantely I hit error on my ubuntu 20.04 system.

#include <complex>
int main()
{ }

~/opt/llvm-clang/build_mirror_offload_main/bin/clang++ -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 main.cpp -c
works fine

~/opt/llvm-clang/build_mirror_offload_main/bin/clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp

$ ~/opt/llvm-clang/build_mirror_offload_main/bin/clang++ -fopenmp -fopenmp-targets=nvptx64 --libomptarget-nvptx-bc-path=/home/yeluo/opt/llvm-clang/build_mirror_offload_main/runtimes/runtimes-bins/openmp/libomptarget/libomptarget-nvptx-sm_80.bc main.cpp -c --std=c++14
clang-14: warning: Unknown CUDA version. cuda.h: CUDA_VERSION=11040. Assuming the latest supported version 10.1 [-Wunknown-cuda-version]
In file included from main.cpp:2:
In file included from /home/yeluo/opt/llvm-clang/build_mirror_offload_main/lib/clang/14.0.0/include/openmp_wrappers/complex:26:
In file included from /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/complex:45:
In file included from /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/sstream:38:
In file included from /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/istream:38:
In file included from /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/ios:38:
In file included from /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/iosfwd:39:
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:73:30: error: no template named 'allocator'
           typename _Alloc = allocator<_CharT> >
                             ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:73:21: error: template parameter missing a default argument
           typename _Alloc = allocator<_CharT> >
                    ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:72:48: note: previous default template argument defined here
  template<typename _CharT, typename _Traits = char_traits<_CharT>,
                                               ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:79:11: error: too few template arguments for class template 'basic_string'
  typedef basic_string<char>    string;   
          ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:74:11: note: template is declared here
    class basic_string;
          ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:83:11: error: too few template arguments for class template 'basic_string'
  typedef basic_string<wchar_t> wstring;   
          ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:74:11: note: template is declared here
    class basic_string;
          ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:93:11: error: too few template arguments for class template 'basic_string'
  typedef basic_string<char16_t> u16string; 
          ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:74:11: note: template is declared here
    class basic_string;
          ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:96:11: error: too few template arguments for class template 'basic_string'
  typedef basic_string<char32_t> u32string; 
          ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:74:11: note: template is declared here
    class basic_string;
          ^

Manually add <memory> in main.cpp workaround the issue but need a fix.

This patch didn't change complex so I'm struggling to make sense of the backtrace. Something in libstdc++ needs memory but doesn't include it?

Unforuantely I hit error on my ubuntu 20.04 system.

#include <complex>
int main()
{ }

Given that ^ in fail.cpp and an invocation on a machine that doesn't have cuda or an nvidia card (it had built nvptx devicertl, probably doesn't matter for this)
$HOME/llvm-install/bin/clang++ -fopenmp -fopenmp-targets=nvptx64 fail.cpp -nocudalib

I also get a failure. This time on libstdc++ 10, various failures, starting from

In file included from /home/amd/llvm-install/lib/clang/14.0.0/include/openmp_wrappers/complex:26:
In file included from /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/complex:45:
In file included from /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/sstream:38:
In file included from /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/istream:38:
In file included from /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/ios:40:
/usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/char_traits.h:216:7: error: no member named 'copy' in namespace 'std'; did you mean
      simply 'copy'?
      std::copy(__s2, __s2 + __n, __s1);
      ^~~~~

Reverting replaces that error with the expected "ptxas" doesn't exist. Therefore reverting this change, will reapply once the unexpected change of behaviour on nvptx is understood and avoided.

JonChesterfield reopened this revision.Jul 30 2021, 2:07 PM
This revision is now accepted and ready to land.Jul 30 2021, 2:07 PM
pdhaliwal updated this revision to Diff 363387.Aug 2 2021, 12:20 AM

Fixed compilation error for nvptx headers. Tested on both cuda and non-cuda systems.

@ye-luo and @JonChesterfield can you please test the latest version of this patch? It should work now.

JonChesterfield added inline comments.Aug 2 2021, 7:15 AM
clang/lib/Headers/openmp_wrappers/math.h
53

That's quite worrying. Declare variant match amdgcn is supposed to have the same effect here as the older style macro. I wonder if we have any test coverage for whether declare variant works for amdgcn.

JonChesterfield accepted this revision.EditedAug 2 2021, 7:36 AM

Latest patch can only misfire on amdgpu so lets go with it and try to work out variant vs ifdef subsequently.

(edit: Adding the ifdef around the declare variant, though I still think it should be a no-op, does indeed fix the above failure for nvptx)

This revision was landed with ongoing or failed builds.Aug 2 2021, 7:39 AM
This revision was automatically updated to reflect the committed changes.
JonChesterfield added inline comments.Aug 4 2021, 7:22 AM
clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
38

Given that declare variant didn't work elsewhere, it probably doesn't work here. Thus this may be the root cause of https://bugs.llvm.org/show_bug.cgi?id=51337

pdhaliwal added inline comments.Aug 4 2021, 8:11 AM
clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
38

Was able to reproduce this issue locally on nvptx machine. And you are right, declare variant didn't work here as well. Wrapping it in #ifdef fixed the issue. I will create a fix.