This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
clang/
-
lib/
-
Driver/ToolChains/
-
ToolChains/
-
Clang.cpp
-
Headers/
4/10
__clang_hip_cmath.h
2/6
__clang_hip_math.h
-
openmp_wrappers/
4/6
__clang_openmp_device_functions.h
1/5
cmath
5
math.h
-
test/Headers/
-
Headers/
-
Inputs/include/
-
include/
-
algorithm
1/12
cstdlib
-
utility
-
amdgcn_openmp_device_math.c
-
openmp_device_math_isnan.cpp

Differential D104904

[OpenMP][AMDGCN] Initial math headers support
ClosedPublic

Authored by pdhaliwal on Jun 25 2021, 2:58 AM.

Download Raw Diff

Details

Reviewers

JonChesterfield
ye-luo
ronlieb
gregrodgers
jdoerfert
yaxunl
scchan
b-sumner
jhuber6
tianshilei1992

Commits

rG713a5d12cde5: [OpenMP][AMDGCN] Initial math headers support
rG12da97ea10a9: [OpenMP][AMDGCN] Initial math headers support
rG968899ad9cf1: [OpenMP][AMDGCN] Initial math headers support

Summary

With this patch, OpenMP on AMDGCN will use the math functions
provided by ROCm ocml library. Linking device code to the ocml will be
done in the next patch.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

pdhaliwal created this revision.Jun 25 2021, 2:58 AM

Herald added subscribers: guansong, yaxunl, jvesely. · View Herald TranscriptJun 25 2021, 2:58 AM

pdhaliwal requested review of this revision.Jun 25 2021, 2:58 AM

Herald added a project: Restricted Project. · View Herald TranscriptJun 25 2021, 2:58 AM

Herald added subscribers: cfe-commits, sstefan1. · View Herald Transcript

pdhaliwal added reviewers: yaxunl, scchan.Jun 25 2021, 3:00 AM

@estewart08 @ashi1 please review

Harbormaster completed remote builds in B110965: Diff 354456.Jun 25 2021, 3:33 AM

Fix format errors

scchan added a reviewer: b-sumner.Jun 25 2021, 5:41 AM

Harbormaster completed remote builds in B110975: Diff 354471.Jun 25 2021, 5:44 AM

LGTM for HIP header changes. Pls make sure it passes internal CI (ePSDB).

scchan added inline comments.Jun 25 2021, 6:50 AM

clang/lib/Headers/__clang_hip_cmath.h
29	`__DEVICE__` should not imply constexpr. It should be added to each function separately.

JonChesterfield added reviewers: jhuber6, tianshilei1992.Jun 25 2021, 11:03 AM

JonChesterfield added inline comments.Jun 25 2021, 11:12 AM

clang/lib/Headers/__clang_hip_cmath.h
29	iirc rocm does that with a macro called DEVICE_NOCE, perhaps we could go with DEVICE_CONSTEXPR. There's some interaction with overloading rules and different glibc versions, so it would be nice to tag exactly the same functions as constexpr on nvptx and amdgcn
clang/lib/Headers/__clang_hip_math.h
29–30	wonder if HIP would benefit from nothrow here
35	I'd expect openmp to match the cplusplus/c distinction here, as openmp works on C source
clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
113	i think this should be `#define __device__`
clang/test/Headers/Inputs/include/cstdlib
15	I think I'd expect builtin_labs et al to work on amdgcn, are we missing lowering for them?

Those changes in OpenMP headers LGTM, except #define __device__.

clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
113	Right because we already have `declare variant`.

Really looking forward to this! Thanks a lot!

I left some comments.

clang/lib/Headers/__clang_hip_math.h
35	^ Agreed. Though, we use a different trick because it's unfortunately not as straight forward always and can be decided based on the C vs C++.
clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
132	Can you make the declare variant scope of nvptx and amdgpu smaller and put them next to each other. #ifdef __cplusplus extern "C" { #endif #declare variant #define ... ... #undef #end #declare variant ... #end #ifdef __cplusplus } // extern "C"
clang/lib/Headers/openmp_wrappers/cmath
83	No match_any needed (here and elsewhere). Also, don't we want all but the includes to be the same for both GPUs. Maybe we have a device(kind(gpu)) variant and inside the nvptx and amdgpu just for the respective include?
clang/lib/Headers/openmp_wrappers/math.h
59	FWIW, This is what I think the begin/end regions should look like. Small and next to each other.
clang/test/Headers/Inputs/include/cstdlib
15	Yeah, looks weird that we cannot compile this mock-up header.

Addressed review comments.

clang/lib/Headers/__clang_hip_math.h
29–30	Would like to keep hip changes minimal in this patch.
35	This is somewhat tricky. Since declaration of `__finite/__isnan /__isinff` is with int return type in standard library (and the corresponding methods in C++ seems to be isfinite, isnan and isinf with bool return type), the compiler fails to resolve these functions when using bool. I don't know how HIP is working. __RETURN_TYPE macro is only being used with the following methods: __finite __isnan __isinf __signbit and with the corresponding float versions.
clang/lib/Headers/openmp_wrappers/cmath
83	device(kind(gpu)) breaks nvptx and hip with lots of errors like below, ... __clang_cuda_device_functions.h:29:40: error: use of undeclared identifier '__nvvm_vote_all' ... Maybe I am doing something wrong.
clang/test/Headers/Inputs/include/cstdlib
15	From what I understand, hip is defining fabs to use ocml's version into the std namespace, which was already defined in this header. So that's causing multiple declaration error. I will wrap only fabs in the ifdef's

Typo

Harbormaster completed remote builds in B111227: Diff 354802.Jun 28 2021, 2:09 AM

A few small comments, otherwise LGTM on the HIP header side.

clang/lib/Headers/__clang_hip_cmath.h
30	Does OpenMP not require `__device__` attribute here? I know constexpr defines `__device__` on HIP, does OMP do the same?
32	I don't think this is the right place to define `__constant__`? It's unused in this header, and may get forgotten. Would it be better to define it in the openmp wrapper or does cmath define it in OpenMP?
clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
47	Would it be better to push and pop these macros, in case it was defined outside of here?

Move constant to openmp_wrappers/cmath
Using push/pop_macro to avoid redefinition

clang/lib/Headers/__clang_hip_cmath.h
30	It does not as these methods are inside declare variant.
32	It is being used. However, I have moved it to openmp_wrappers/cmath.

Harbormaster completed remote builds in B111300: Diff 354913.Jun 28 2021, 9:22 AM

OpenMP side looks reasonable.

clang/lib/Headers/__clang_hip_cmath.h
96	^ This is how OpenMP resolves the overload issue wrt. different return types.
clang/lib/Headers/__clang_hip_math.h
35	I marked the code above that actually overloads these functions in OpenMP (or better the versions w/o underscores) such that the system can have either version and it should work fine.
clang/test/Headers/Inputs/include/cstdlib
29	That seems to be fundamentally broken then, but let's see, maybe it will somehow work anyway.

JonChesterfield added inline comments.Jun 30 2021, 1:50 AM

clang/test/Headers/Inputs/include/cstdlib
29	I thought fabs was in math, not stdlib. Not sure what this file is doing but the functions above are inline and fabs isn't

JonChesterfield added inline comments.Jun 30 2021, 1:59 AM

clang/lib/Headers/__clang_hip_cmath.h
43	I'm pretty sure it's UB, no diagnostic req to call a non-constexpr function from a constexpr one. Nevertheless, it does presently seem to be working for nvptx clang so will probably work for amdgcn. Strictly I think we're supposed to detect when called at compile time and call one implementation and otherwise call the library one, where weirdly it's ok for them to return different answers. I think there's a built-in we can call to select between the two.

I recommend we ship this and fix up the rough edges as we run into them. Paired with ocml it passes OVO libm tests which seems to be a fairly high bar for 'does it work'. Therefore tagging green. Any objections?

This revision is now accepted and ready to land.Jun 30 2021, 3:31 AM

pdhaliwal added inline comments.Jun 30 2021, 3:59 AM

clang/lib/Headers/__clang_hip_cmath.h
96	I tried the exact same way. The lit tests compile and run fine. I could not get the runtime tests compile without the errors. It might be that I am not using match patterns correctly. I also tried some other combinations of the match selector but none of them worked.

scchan accepted this revision.Jun 30 2021, 7:59 AM

@ronlieb reports that this change means CUDA is defined for openmp amdgcn compilation. I'm going to try to verify that

[AMD Official Use Only]

It maybe that the patch does not expose CUDA directly.
Rather the patch works so well we finally see the pre-existing issue in complex.h

./clang/lib/Headers/openmp_wrappers/complex.h

jdoerfert added inline comments.Jun 30 2021, 10:03 AM

clang/lib/Headers/__clang_hip_cmath.h
96	Not sure what to say. If we want it to work in the wild, I doubt there is much we can do but to make this work. Not sure what your errors were or why they were caused, I'd recommend to determine that instead of punting and hoping nobody will run into this.

Good spot. I've been feeding the following to various toolchains:

// permute
//#include <math.h>
//#include <cmath>
//#include <complex.h>

#ifndef _OPENMP
#error "OpenMP should be defined
#endif

#ifdef __CUDA__
#error "Cuda should not be defined"
#endif

#ifdef __HIP__
#error "HIP should not be defined"
#endif

int main() {}

Currently, trunk passes that with any of the headers uncommented. Rocm is ok for math and cmath, but defines CUDA for complex.

Weird pre-existing stuff in cuda_complex_builtins. It has an #ifdef AMDGCN macro in it, despite 'cuda' in the name. I note there is no corresponding 'hip' complex builtins.

The ifdef logic for stubbing out some functions (which is done with macros...) isn't ideal, it's:

#if !defined (_OPENMP_NVPTX)
// use std:: functions from cmath.h, which isn't included, though math.h is included from openmp before it
#else 
#ifdef __AMDGCN__
// use ocml functions
#else
// use nv functions
#end
#end

None of this uses #define __CUDA__ so we could drop that from the openmp wrapper. Or, as far as I can tell, we could drop all the macro obfuscation in that header and just call the libm functions directly, which will already resolve to the appropriate platform specific thing.

Instead of revising that as part of this patch, how about we wrap the openmp_wrappers/complex.h logic in #ifndef __AMDGCN__, which will cut it from the graph for openmp while leaving nvptx openmp untouched?

JonChesterfield requested changes to this revision.Jun 30 2021, 10:25 AM

This revision now requires changes to proceed.Jun 30 2021, 10:25 AM

tagged request changes because I think we should ifdef around complex before (or while) landing this, as defining __CUDA__, even transiently, is a user hostile thing to do from amdgpu openmp

It is *really* ugly that we have cuda and hip implementations of cmath. Opening them in diff it looks very likely that the hip one was created by copying and pasting the cuda one then hacking on it a bit. This means we have openmp specific fixes already done in the cuda one and VS2019 workarounds in the hip one. It also means there are a bunch of differences that might be important or might be spurious, like whether a function calls ::scalbln or std::scalbln. This is particularly frustrating because we should be able isolate essentially all the differences between nv and ocml functions in math.h.

clang/lib/Headers/openmp_wrappers/cmath
30	this declare variant will not match amdgcn
43	which means that amdgcn is not going to pick up any of these overloads, but that looks like it's actually OK because clang_hip_cmath does define them (I think, there are a lot of macros involved)
clang/lib/Headers/openmp_wrappers/math.h
41	@jdoerfert do you know why we have match_any here? wondering if the amdgcn variant below should have the same

tianshilei1992 added inline comments.Jun 30 2021, 10:50 AM

clang/lib/Headers/openmp_wrappers/math.h
41	Because `nvptx` and `nvptx64` share the same implementation. They just emit different IRs. If AMDGCN only has one architecture, it doesn't need to use `match_any`.

JonChesterfield added inline comments.Jun 30 2021, 11:22 AM

clang/lib/Headers/openmp_wrappers/math.h
41	ah, right - thank you!

JonChesterfield mentioned this in D105221: [openmp][nfc] Simplify macros guarding math complex headers.Jun 30 2021, 11:53 AM

estewart08 added inline comments.Jul 9 2021, 12:10 PM

clang/test/Headers/Inputs/include/cstdlib
29	I am afraid this is just a workaround to get openmp_device_math_isnan.cpp to pass for AMDGCN. This stems from not having an #ifndef OPENMP_AMDGCN in __clang_hip_cmath.h where 'using ::fabs' is present. Currently, OPENMP_AMDGCN uses all of the overloaded functions created by the HIP macros where NVPTX does not use the CUDA overloads. This may be a new topic of discussion. https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/__clang_cuda_cmath.h#L191 By using this ifndef, it seems NVPTX looses quite a few overloaded functions. Are these meant to eventually be present in openmp_wrappers/cmath? Not sure what issues @jdoerfert ran into with D75788.

jdoerfert added inline comments.Jul 9 2021, 3:37 PM

clang/test/Headers/Inputs/include/cstdlib
29	By using this ifndef, it seems NVPTX looses quite a few overloaded functions. Are these meant to eventually be present in openmp_wrappers/cmath? Not sure what issues @jdoerfert ran into with D75788. Can you provide an example that shows how we "loose" something? So an input and command line that should work but doesn't, or that should be compiled to something else. That would help me a lot.

JonChesterfield added inline comments.Jul 9 2021, 4:34 PM

clang/test/Headers/Inputs/include/cstdlib
29	TLDR, I think nvptx works here, but it's hard to be certain. I've put a few minutes into looking for something that doesn't work, then much longer trying to trace where the various functions come from, and have concluded that the hip cmath header diverging from the cuda cmath header is the root cause. The list of functions near the top of `__clang_cuda_cmath.h` is a subset of libm, e.g. __DEVICE__ float acos(float __x) { return ::acosf(__x); } but no acosh Later on in the file are: __CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, acosh) but these are guarded by `#ifndef __OPENMP_NVPTX__`, which suggests they are not included when using the header from openmp. However, openmp_wrappers/cmath does include `__DEVICE__ float acosh(float __x) { return ::acoshf(__x); }` under the comment `// Overloads not provided by the CUDA wrappers by by the CUDA system headers` Finally there are some functions that are not in either list, such as fma(float,float,float), but which are nevertheless resolved, at a guess in a glibc header. My current theory is that nvptx gets the set of functions right through a combination of cuda headers, clang cuda headers, clang openmp headers, system headers. At least, the half dozen I've tried work, and iirc it passes the OvO suite which I believe calls all of them. Wimplicit-float-conversion complains about a few but that seems minor. Further, I think hip does not get this right, because the hip cmath header has diverged from the cuda one, and the amdgpu openmp implementation that tries to use the hip headers does not pass the OvO suite without some hacks.

Only revision I'm looking for here is to land D105221 or equivalent first

LG from my side

clang/test/Headers/Inputs/include/cstdlib
29	> By using this ifndef, it seems NVPTX looses quite a few overloaded functions. Are these meant to eventually be present in openmp_wrappers/cmath? Not sure what issues @jdoerfert ran into with D75788. Can you provide an example that shows how we "loose" something? So an input and command line that should work but doesn't, or that should be compiled to something else. That would help me a lot. @estewart08 Feel free to provide me with something that doesn't work even as this goes in. It sounded you had some ideas and I'd like to look into that.

D105221 landed so LGTM too

This revision is now accepted and ready to land.Jul 20 2021, 12:56 AM

estewart08 added inline comments.Jul 21 2021, 7:01 AM

clang/test/Headers/Inputs/include/cstdlib
29	TLDR, I think nvptx works here, but it's hard to be certain. I've put a few minutes into looking for something that doesn't work, then much longer trying to trace where the various functions come from, and have concluded that the hip cmath header diverging from the cuda cmath header is the root cause. The list of functions near the top of `__clang_cuda_cmath.h` is a subset of libm, e.g. __DEVICE__ float acos(float __x) { return ::acosf(__x); } but no acosh Later on in the file are: __CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, acosh) but these are guarded by `#ifndef __OPENMP_NVPTX__`, which suggests they are not included when using the header from openmp. However, openmp_wrappers/cmath does include `__DEVICE__ float acosh(float __x) { return ::acoshf(__x); }` under the comment `// Overloads not provided by the CUDA wrappers by by the CUDA system headers` Finally there are some functions that are not in either list, such as fma(float,float,float), but which are nevertheless resolved, at a guess in a glibc header. My current theory is that nvptx gets the set of functions right through a combination of cuda headers, clang cuda headers, clang openmp headers, system headers. At least, the half dozen I've tried work, and iirc it passes the OvO suite which I believe calls all of them. Wimplicit-float-conversion complains about a few but that seems minor. Further, I think hip does not get this right, because the hip cmath header has diverged from the cuda one, and the amdgpu openmp implementation that tries to use the hip headers does not pass the OvO suite without some hacks.
29	Feel free to provide me with something that doesn't work even as this goes in. It sounded you had some ideas and I'd like to look into that. At this point all of the functions I have tried for nvptx did not show an error. It was unclear to me how the device versions of certain overloaded functions were being resolved. As Jon mentioned above, it is a mix of headers that range from clang, openmp, cuda, and system headers. For now, I will retract my statement and if I run into any problems in the future I will point them out.

rebase on main, builds OK

This revision was landed with ongoing or failed builds.Jul 21 2021, 8:16 AM

Closed by commit rG968899ad9cf1: [OpenMP][AMDGCN] Initial math headers support (authored by pdhaliwal, committed by JonChesterfield). · Explain Why

This revision was automatically updated to reflect the committed changes.

JonChesterfield added a commit: rG968899ad9cf1: [OpenMP][AMDGCN] Initial math headers support.

Landed on @pdhaliwal's behalf. My expectation is that this patch mostly works and the rough edges can be cleaned up once ocml is linked in and we can more easily run more applications through it.

My local build failed due to regression failures. clang/test/Headers/openmp_device_math_isnan.cpp failed with the following errors on undeclared fabs.

1950 /home/michliao/working/llvm/llvm-project/clang/test/Headers/Inputs/include/cstdlib:29:31: error: use of undeclared identifier 'fabs'
1951 float abs(float x) { return fabs(x); }
1952 ^
1953 /home/michliao/working/llvm/llvm-project/clang/test/Headers/Inputs/include/cstdlib:30:33: error: use of undeclared identifier 'fabs'
1954 double abs(double x) { return fabs(x); }
1955 ^
1956 2 errors generated.

This breaks tests: http://45.33.8.238/linux/51733/step_7.txt

Please take a look and revert for now if it takes a while to fix.

Harbormaster completed remote builds in B115309: Diff 360447.Jul 21 2021, 9:25 AM

Thanks! Will take a look. Feel free to revert, I'll do so shortly if noone beats me to it

JonChesterfield added a reverting change: rGd71062fbdab2: Revert "[OpenMP][AMDGCN] Initial math headers support".Jul 21 2021, 9:36 AM

cstdlib test header contains

// amdgcn already provides definition of fabs
#ifndef __AMDGCN__
float fabs(float __x) { return __builtin_fabs(__x); }
#endif

If I delete or invert the ifndef

$HOME/llvm-build/llvm/lib/clang/13.0.0/include/__clang_hip_cmath.h:660:9: error: target of using declaration conflicts with declaration already in scope
using ::fabs;
when included from openmp_wrappers/cmath

If I delete the definition,

$HOME/llvm-project/clang/test/Headers/Inputs/include/cstdlib:29:31: error: use of undeclared identifier 'fabs'
when included from openmp_wrappers/__clang_openmp_device_functions.h

Current conclusion is that we cannot work around the presence/absence of fabs in the cstdlib test file, we have to do something in the real headers such that the test file does the right thing

how to get this moving?

In D104904#2913983, @ye-luo wrote:

how to get this moving?

We are working on some additions to this patch. The lit failure noted above has been fixed locally. I would expect an update here very soon.

Landing ocml side first seems reasonable as it's less likely to be broken and makes testing this more straightforward

This revision is now accepted and ready to land.Jul 29 2021, 10:48 AM

It required some work to fix the failing lit test case. And many thanks to
@estewart for helping in that.

The current status is that we are now following the nvptx openmp strategy for
openmp math headers very closely. In this version of patch, there are bunch
of HIP cmath overloads which are disabled for AMDGPU openmp similar to nvptx.
This fixed the lit failure, but a large number of tests started failing in OvO.,
Reason being that there were some overloads which were used in the suite but
were disabled earlier. In order to fix them, we had added definitions in the
openmp_wrappers/cmath for the missing overloads. With these changes, OvO compiles 100% of the
mathematical_function test suite successfully. There are still 6/177 tests in
the suite which are producing wrong result.

Now my suggestion is to land this patch as it is and fix the remaining 6 tests
in a later patch.

Harbormaster completed remote builds in B117124: Diff 362997.Jul 30 2021, 3:10 AM

JonChesterfield accepted this revision.Jul 30 2021, 7:11 AM

JonChesterfield added inline comments.

clang/lib/Headers/openmp_wrappers/cmath
113	let's not add in commented out code
clang/test/Headers/Inputs/include/cstdlib
26	drop comment?

Addressed review comments.

This revision was landed with ongoing or failed builds.Jul 30 2021, 7:53 AM

Closed by commit rG12da97ea10a9: [OpenMP][AMDGCN] Initial math headers support (authored by pdhaliwal). · Explain Why

This revision was automatically updated to reflect the committed changes.

pdhaliwal added a commit: rG12da97ea10a9: [OpenMP][AMDGCN] Initial math headers support.

Harbormaster completed remote builds in B117186: Diff 363090.Jul 30 2021, 8:44 AM

Unforuantely I hit error on my ubuntu 20.04 system.

#include <complex>
int main()
{ }

~/opt/llvm-clang/build_mirror_offload_main/bin/clang++ -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 main.cpp -c
works fine

~/opt/llvm-clang/build_mirror_offload_main/bin/clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp

$ ~/opt/llvm-clang/build_mirror_offload_main/bin/clang++ -fopenmp -fopenmp-targets=nvptx64 --libomptarget-nvptx-bc-path=/home/yeluo/opt/llvm-clang/build_mirror_offload_main/runtimes/runtimes-bins/openmp/libomptarget/libomptarget-nvptx-sm_80.bc main.cpp -c --std=c++14
clang-14: warning: Unknown CUDA version. cuda.h: CUDA_VERSION=11040. Assuming the latest supported version 10.1 [-Wunknown-cuda-version]
In file included from main.cpp:2:
In file included from /home/yeluo/opt/llvm-clang/build_mirror_offload_main/lib/clang/14.0.0/include/openmp_wrappers/complex:26:
In file included from /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/complex:45:
In file included from /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/sstream:38:
In file included from /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/istream:38:
In file included from /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/ios:38:
In file included from /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/iosfwd:39:
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:73:30: error: no template named 'allocator'
           typename _Alloc = allocator<_CharT> >
                             ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:73:21: error: template parameter missing a default argument
           typename _Alloc = allocator<_CharT> >
                    ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:72:48: note: previous default template argument defined here
  template<typename _CharT, typename _Traits = char_traits<_CharT>,
                                               ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:79:11: error: too few template arguments for class template 'basic_string'
  typedef basic_string<char>    string;   
          ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:74:11: note: template is declared here
    class basic_string;
          ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:83:11: error: too few template arguments for class template 'basic_string'
  typedef basic_string<wchar_t> wstring;   
          ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:74:11: note: template is declared here
    class basic_string;
          ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:93:11: error: too few template arguments for class template 'basic_string'
  typedef basic_string<char16_t> u16string; 
          ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:74:11: note: template is declared here
    class basic_string;
          ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:96:11: error: too few template arguments for class template 'basic_string'
  typedef basic_string<char32_t> u32string; 
          ^
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stringfwd.h:74:11: note: template is declared here
    class basic_string;
          ^

Manually add <memory> in main.cpp workaround the issue but need a fix.

This patch didn't change complex so I'm struggling to make sense of the backtrace. Something in libstdc++ needs memory but doesn't include it?

In D104904#2916943, @ye-luo wrote:
Unforuantely I hit error on my ubuntu 20.04 system.
#include <complex>
int main()
{ }

Given that ^ in fail.cpp and an invocation on a machine that doesn't have cuda or an nvidia card (it had built nvptx devicertl, probably doesn't matter for this)
$HOME/llvm-install/bin/clang++ -fopenmp -fopenmp-targets=nvptx64 fail.cpp -nocudalib

I also get a failure. This time on libstdc++ 10, various failures, starting from

In file included from /home/amd/llvm-install/lib/clang/14.0.0/include/openmp_wrappers/complex:26:
In file included from /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/complex:45:
In file included from /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/sstream:38:
In file included from /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/istream:38:
In file included from /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/ios:40:
/usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/char_traits.h:216:7: error: no member named 'copy' in namespace 'std'; did you mean
      simply 'copy'?
      std::copy(__s2, __s2 + __n, __s1);
      ^~~~~

Reverting replaces that error with the expected "ptxas" doesn't exist. Therefore reverting this change, will reapply once the unexpected change of behaviour on nvptx is understood and avoided.

JonChesterfield reopened this revision.Jul 30 2021, 2:07 PM

This revision is now accepted and ready to land.Jul 30 2021, 2:07 PM

JonChesterfield added a reverting change: rG7f97ddaf8aa0: Revert "[OpenMP][AMDGCN] Initial math headers support".Jul 30 2021, 2:07 PM

Fixed compilation error for nvptx headers. Tested on both cuda and non-cuda systems.

@ye-luo and @JonChesterfield can you please test the latest version of this patch? It should work now.

Harbormaster completed remote builds in B117389: Diff 363387.Aug 2 2021, 1:02 AM

JonChesterfield added inline comments.Aug 2 2021, 7:15 AM

clang/lib/Headers/openmp_wrappers/math.h
53	That's quite worrying. Declare variant match amdgcn is supposed to have the same effect here as the older style macro. I wonder if we have any test coverage for whether declare variant works for amdgcn.

Latest patch can only misfire on amdgpu so lets go with it and try to work out variant vs ifdef subsequently.

(edit: Adding the ifdef around the declare variant, though I still think it should be a no-op, does indeed fix the above failure for nvptx)

This revision was landed with ongoing or failed builds.Aug 2 2021, 7:39 AM

Closed by commit rG713a5d12cde5: [OpenMP][AMDGCN] Initial math headers support (authored by pdhaliwal). · Explain Why

This revision was automatically updated to reflect the committed changes.

pdhaliwal added a commit: rG713a5d12cde5: [OpenMP][AMDGCN] Initial math headers support.

JonChesterfield added inline comments.Aug 4 2021, 7:22 AM

clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
38	Given that declare variant didn't work elsewhere, it probably doesn't work here. Thus this may be the root cause of https://bugs.llvm.org/show_bug.cgi?id=51337

pdhaliwal added inline comments.Aug 4 2021, 8:11 AM

clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
38	Was able to reproduce this issue locally on nvptx machine. And you are right, declare variant didn't work here as well. Wrapping it in #ifdef fixed the issue. I will create a fix.

Revision Contents

Path

Size

clang/

lib/

Driver/

ToolChains/

Clang.cpp

3 lines

Headers/

__clang_hip_cmath.h

179 lines

__clang_hip_math.h

17 lines

openmp_wrappers/

__clang_openmp_device_functions.h

30 lines

cmath

15 lines

math.h

8 lines

test/

Headers/

Inputs/

include/

algorithm

6 lines

cstdlib

4 lines

utility

2 lines

amdgcn_openmp_device_math.c

51 lines

openmp_device_math_isnan.cpp

8 lines

Diff 354913

clang/lib/Driver/ToolChains/Clang.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 1,243 Lines • ▼ Show 20 Lines	void Clang::AddPreprocessingOptions(Compilation &C, const JobAction &JA,
if (JA.isOffloading(Action::OFK_Cuda))		if (JA.isOffloading(Action::OFK_Cuda))
getToolChain().AddCudaIncludeArgs(Args, CmdArgs);		getToolChain().AddCudaIncludeArgs(Args, CmdArgs);
if (JA.isOffloading(Action::OFK_HIP))		if (JA.isOffloading(Action::OFK_HIP))
getToolChain().AddHIPIncludeArgs(Args, CmdArgs);		getToolChain().AddHIPIncludeArgs(Args, CmdArgs);

// If we are offloading to a target via OpenMP we need to include the		// If we are offloading to a target via OpenMP we need to include the
// openmp_wrappers folder which contains alternative system headers.		// openmp_wrappers folder which contains alternative system headers.
if (JA.isDeviceOffloading(Action::OFK_OpenMP) &&		if (JA.isDeviceOffloading(Action::OFK_OpenMP) &&
getToolChain().getTriple().isNVPTX()){		(getToolChain().getTriple().isNVPTX() \|\|
		getToolChain().getTriple().isAMDGCN())) {
if (!Args.hasArg(options::OPT_nobuiltininc)) {		if (!Args.hasArg(options::OPT_nobuiltininc)) {
// Add openmp_wrappers/* to our system include path. This lets us wrap		// Add openmp_wrappers/* to our system include path. This lets us wrap
// standard library headers.		// standard library headers.
SmallString<128> P(D.ResourceDir);		SmallString<128> P(D.ResourceDir);
llvm::sys::path::append(P, "include");		llvm::sys::path::append(P, "include");
llvm::sys::path::append(P, "openmp_wrappers");		llvm::sys::path::append(P, "openmp_wrappers");
CmdArgs.push_back("-internal-isystem");		CmdArgs.push_back("-internal-isystem");
CmdArgs.push_back(Args.MakeArgString(P));		CmdArgs.push_back(Args.MakeArgString(P));
▲ Show 20 Lines • Show All 6,493 Lines • Show Last 20 Lines

clang/lib/Headers/__clang_hip_cmath.h

	/*===---- __clang_hip_cmath.h - HIP cmath decls -----------------------------===			/*===---- __clang_hip_cmath.h - HIP cmath decls -----------------------------===
	*			*
	* Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			* Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	* See https://llvm.org/LICENSE.txt for license information.			* See https://llvm.org/LICENSE.txt for license information.
	* SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			* SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	*			*
	*===-----------------------------------------------------------------------===			*===-----------------------------------------------------------------------===
	*/			*/

	#ifndef __CLANG_HIP_CMATH_H__			#ifndef __CLANG_HIP_CMATH_H__
	#define __CLANG_HIP_CMATH_H__			#define __CLANG_HIP_CMATH_H__

	#if !defined(__HIP__)			#if !defined(__HIP__) && !defined(__OPENMP_AMDGCN__)
	#error "This file is for HIP and OpenMP AMDGCN device compilation only."			#error "This file is for HIP and OpenMP AMDGCN device compilation only."
	#endif			#endif

	#if !defined(__HIPCC_RTC__)			#if !defined(__HIPCC_RTC__)
	#if defined(__cplusplus)			#if defined(__cplusplus)
	#include <limits>			#include <limits>
	#include <type_traits>			#include <type_traits>
	#include <utility>			#include <utility>
	#endif			#endif
	#include <limits.h>			#include <limits.h>
	#include <stdint.h>			#include <stdint.h>
	#endif // !defined(__HIPCC_RTC__)			#endif // !defined(__HIPCC_RTC__)

	#pragma push_macro("__DEVICE__")			#pragma push_macro("__DEVICE__")
				#pragma push_macro("__CONSTEXPR__")
				#ifdef __OPENMP_AMDGCN__
				scchanUnsubmitted Not Done Reply Inline Actions `__DEVICE__` should not imply constexpr. It should be added to each function separately. scchan: `__DEVICE__` should not imply constexpr. It should be added to each function separately.
				JonChesterfieldUnsubmitted Not Done Reply Inline Actions iirc rocm does that with a macro called DEVICE_NOCE, perhaps we could go with DEVICE_CONSTEXPR. There's some interaction with overloading rules and different glibc versions, so it would be nice to tag exactly the same functions as constexpr on nvptx and amdgcn JonChesterfield: iirc rocm does that with a macro called __DEVICE_NOCE__, perhaps we could go with…
				#define __DEVICE__ static __attribute__((always_inline, nothrow))
				ashi1Unsubmitted Not Done Reply Inline Actions Does OpenMP not require `__device__` attribute here? I know constexpr defines `__device__` on HIP, does OMP do the same? ashi1: Does OpenMP not require `__device__` attribute here? I know constexpr defines `__device__` on…
				pdhaliwalAuthorUnsubmitted Done Reply Inline Actions It does not as these methods are inside declare variant. pdhaliwal: It does not as these methods are inside declare variant.
				#define __CONSTEXPR__ constexpr
				#else
				ashi1Unsubmitted Done Reply Inline Actions I don't think this is the right place to define `__constant__`? It's unused in this header, and may get forgotten. Would it be better to define it in the openmp wrapper or does cmath define it in OpenMP? ashi1: I don't think this is the right place to define `__constant__`? It's unused in this header, and…
				pdhaliwalAuthorUnsubmitted Done Reply Inline Actions It is being used. However, I have moved it to openmp_wrappers/cmath. pdhaliwal: It is being used. However, I have moved it to openmp_wrappers/cmath.
	#define __DEVICE__ static __device__ inline __attribute__((always_inline))			#define __DEVICE__ static __device__ inline __attribute__((always_inline))
				#define __CONSTEXPR__
				#endif // __OPENMP_AMDGCN__

	// Start with functions that cannot be defined by DEF macros below.			// Start with functions that cannot be defined by DEF macros below.
	#if defined(__cplusplus)			#if defined(__cplusplus)
	__DEVICE__ double abs(double __x) { return ::fabs(__x); }			__DEVICE__ __CONSTEXPR__ double abs(double __x) { return ::fabs(__x); }
	__DEVICE__ float abs(float __x) { return ::fabsf(__x); }			__DEVICE__ __CONSTEXPR__ float abs(float __x) { return ::fabsf(__x); }
	__DEVICE__ long long abs(long long __n) { return ::llabs(__n); }			__DEVICE__ __CONSTEXPR__ long long abs(long long __n) { return ::llabs(__n); }
	__DEVICE__ long abs(long __n) { return ::labs(__n); }			__DEVICE__ __CONSTEXPR__ long abs(long __n) { return ::labs(__n); }
	__DEVICE__ float fma(float __x, float __y, float __z) {			__DEVICE__ __CONSTEXPR__ float fma(float __x, float __y, float __z) {
				JonChesterfieldUnsubmitted Not Done Reply Inline Actions I'm pretty sure it's UB, no diagnostic req to call a non-constexpr function from a constexpr one. Nevertheless, it does presently seem to be working for nvptx clang so will probably work for amdgcn. Strictly I think we're supposed to detect when called at compile time and call one implementation and otherwise call the library one, where weirdly it's ok for them to return different answers. I think there's a built-in we can call to select between the two. JonChesterfield: I'm pretty sure it's UB, no diagnostic req to call a non-constexpr function from a constexpr…
	return ::fmaf(__x, __y, __z);			return ::fmaf(__x, __y, __z);
	}			}
	#if !defined(__HIPCC_RTC__)			#if !defined(__HIPCC_RTC__)
	// The value returned by fpclassify is platform dependent, therefore it is not			// The value returned by fpclassify is platform dependent, therefore it is not
	// supported by hipRTC.			// supported by hipRTC.
	__DEVICE__ int fpclassify(float __x) {			__DEVICE__ __CONSTEXPR__ int fpclassify(float __x) {
	return __builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL,			return __builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL,
	FP_ZERO, __x);			FP_ZERO, __x);
	}			}
	__DEVICE__ int fpclassify(double __x) {			__DEVICE__ __CONSTEXPR__ int fpclassify(double __x) {
	return __builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL,			return __builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL,
	FP_ZERO, __x);			FP_ZERO, __x);
	}			}
	#endif // !defined(__HIPCC_RTC__)			#endif // !defined(__HIPCC_RTC__)

	__DEVICE__ float frexp(float __arg, int *__exp) {			__DEVICE__ __CONSTEXPR__ float frexp(float __arg, int *__exp) {
	return ::frexpf(__arg, __exp);			return ::frexpf(__arg, __exp);
	}			}

	#if defined(__OPENMP_AMDGCN__)			#if defined(__OPENMP_AMDGCN__)
	// For OpenMP we work around some old system headers that have non-conforming			// For OpenMP we work around some old system headers that have non-conforming
	// `isinf(float)` and `isnan(float)` implementations that return an `int`. We do			// `isinf(float)` and `isnan(float)` implementations that return an `int`. We do
	// this by providing two versions of these functions, differing only in the			// this by providing two versions of these functions, differing only in the
	// return type. To avoid conflicting definitions we disable implicit base			// return type. To avoid conflicting definitions we disable implicit base
	// function generation. That means we will end up with two specializations, one			// function generation. That means we will end up with two specializations, one
	// per type, but only one has a base function defined by the system header.			// per type, but only one has a base function defined by the system header.
	#pragma omp begin declare variant match( \			#pragma omp begin declare variant match( \
	implementation = {extension(disable_implicit_base)})			implementation = {extension(disable_implicit_base)})

	// FIXME: We lack an extension to customize the mangling of the variants, e.g.,			// FIXME: We lack an extension to customize the mangling of the variants, e.g.,
	// add a suffix. This means we would clash with the names of the variants			// add a suffix. This means we would clash with the names of the variants
	// (note that we do not create implicit base functions here). To avoid			// (note that we do not create implicit base functions here). To avoid
	// this clash we add a new trait to some of them that is always true			// this clash we add a new trait to some of them that is always true
	// (this is LLVM after all ;)). It will only influence the mangled name			// (this is LLVM after all ;)). It will only influence the mangled name
	// of the variants inside the inner region and avoid the clash.			// of the variants inside the inner region and avoid the clash.
	#pragma omp begin declare variant match(implementation = {vendor(llvm)})			#pragma omp begin declare variant match(implementation = {vendor(llvm)})

	__DEVICE__ int isinf(float __x) { return ::__isinff(__x); }			__DEVICE__ __CONSTEXPR__ int isinf(float __x) { return ::__isinff(__x); }
	__DEVICE__ int isinf(double __x) { return ::__isinf(__x); }			__DEVICE__ __CONSTEXPR__ int isinf(double __x) { return ::__isinf(__x); }
	__DEVICE__ int isfinite(float __x) { return ::__finitef(__x); }			__DEVICE__ __CONSTEXPR__ int isfinite(float __x) { return ::__finitef(__x); }
	__DEVICE__ int isfinite(double __x) { return ::__finite(__x); }			__DEVICE__ __CONSTEXPR__ int isfinite(double __x) { return ::__finite(__x); }
	__DEVICE__ int isnan(float __x) { return ::__isnanf(__x); }			__DEVICE__ __CONSTEXPR__ int isnan(float __x) { return ::__isnanf(__x); }
	__DEVICE__ int isnan(double __x) { return ::__isnan(__x); }			__DEVICE__ __CONSTEXPR__ int isnan(double __x) { return ::__isnan(__x); }

	#pragma omp end declare variant			#pragma omp end declare variant
	#endif // defined(__OPENMP_AMDGCN__)			#endif // defined(__OPENMP_AMDGCN__)

	__DEVICE__ bool isinf(float __x) { return ::__isinff(__x); }			__DEVICE__ __CONSTEXPR__ bool isinf(float __x) { return ::__isinff(__x); }
	__DEVICE__ bool isinf(double __x) { return ::__isinf(__x); }			__DEVICE__ __CONSTEXPR__ bool isinf(double __x) { return ::__isinf(__x); }
	__DEVICE__ bool isfinite(float __x) { return ::__finitef(__x); }			__DEVICE__ __CONSTEXPR__ bool isfinite(float __x) { return ::__finitef(__x); }
	__DEVICE__ bool isfinite(double __x) { return ::__finite(__x); }			__DEVICE__ __CONSTEXPR__ bool isfinite(double __x) { return ::__finite(__x); }
	__DEVICE__ bool isnan(float __x) { return ::__isnanf(__x); }			__DEVICE__ __CONSTEXPR__ bool isnan(float __x) { return ::__isnanf(__x); }
	__DEVICE__ bool isnan(double __x) { return ::__isnan(__x); }			__DEVICE__ __CONSTEXPR__ bool isnan(double __x) { return ::__isnan(__x); }
				jdoerfertUnsubmitted Not Done Reply Inline Actions ^ This is how OpenMP resolves the overload issue wrt. different return types. jdoerfert: ^ This is how OpenMP resolves the overload issue wrt. different return types.
				pdhaliwalAuthorUnsubmitted Done Reply Inline Actions I tried the exact same way. The lit tests compile and run fine. I could not get the runtime tests compile without the errors. It might be that I am not using match patterns correctly. I also tried some other combinations of the match selector but none of them worked. pdhaliwal: I tried the exact same way. The lit tests compile and run fine. I could not get the runtime…
				jdoerfertUnsubmitted Not Done Reply Inline Actions Not sure what to say. If we want it to work in the wild, I doubt there is much we can do but to make this work. Not sure what your errors were or why they were caused, I'd recommend to determine that instead of punting and hoping nobody will run into this. jdoerfert: Not sure what to say. If we want it to work in the wild, I doubt there is much we can do but to…

	#if defined(__OPENMP_AMDGCN__)			#if defined(__OPENMP_AMDGCN__)
	#pragma omp end declare variant			#pragma omp end declare variant
	#endif // defined(__OPENMP_AMDGCN__)			#endif // defined(__OPENMP_AMDGCN__)

	__DEVICE__ bool isgreater(float __x, float __y) {			__DEVICE__ __CONSTEXPR__ bool isgreater(float __x, float __y) {
	return __builtin_isgreater(__x, __y);			return __builtin_isgreater(__x, __y);
	}			}
	__DEVICE__ bool isgreater(double __x, double __y) {			__DEVICE__ __CONSTEXPR__ bool isgreater(double __x, double __y) {
	return __builtin_isgreater(__x, __y);			return __builtin_isgreater(__x, __y);
	}			}
	__DEVICE__ bool isgreaterequal(float __x, float __y) {			__DEVICE__ __CONSTEXPR__ bool isgreaterequal(float __x, float __y) {
	return __builtin_isgreaterequal(__x, __y);			return __builtin_isgreaterequal(__x, __y);
	}			}
	__DEVICE__ bool isgreaterequal(double __x, double __y) {			__DEVICE__ __CONSTEXPR__ bool isgreaterequal(double __x, double __y) {
	return __builtin_isgreaterequal(__x, __y);			return __builtin_isgreaterequal(__x, __y);
	}			}
	__DEVICE__ bool isless(float __x, float __y) {			__DEVICE__ __CONSTEXPR__ bool isless(float __x, float __y) {
	return __builtin_isless(__x, __y);			return __builtin_isless(__x, __y);
	}			}
	__DEVICE__ bool isless(double __x, double __y) {			__DEVICE__ __CONSTEXPR__ bool isless(double __x, double __y) {
	return __builtin_isless(__x, __y);			return __builtin_isless(__x, __y);
	}			}
	__DEVICE__ bool islessequal(float __x, float __y) {			__DEVICE__ __CONSTEXPR__ bool islessequal(float __x, float __y) {
	return __builtin_islessequal(__x, __y);			return __builtin_islessequal(__x, __y);
	}			}
	__DEVICE__ bool islessequal(double __x, double __y) {			__DEVICE__ __CONSTEXPR__ bool islessequal(double __x, double __y) {
	return __builtin_islessequal(__x, __y);			return __builtin_islessequal(__x, __y);
	}			}
	__DEVICE__ bool islessgreater(float __x, float __y) {			__DEVICE__ __CONSTEXPR__ bool islessgreater(float __x, float __y) {
	return __builtin_islessgreater(__x, __y);			return __builtin_islessgreater(__x, __y);
	}			}
	__DEVICE__ bool islessgreater(double __x, double __y) {			__DEVICE__ __CONSTEXPR__ bool islessgreater(double __x, double __y) {
	return __builtin_islessgreater(__x, __y);			return __builtin_islessgreater(__x, __y);
	}			}
	__DEVICE__ bool isnormal(float __x) { return __builtin_isnormal(__x); }			__DEVICE__ __CONSTEXPR__ bool isnormal(float __x) {
	__DEVICE__ bool isnormal(double __x) { return __builtin_isnormal(__x); }			return __builtin_isnormal(__x);
	__DEVICE__ bool isunordered(float __x, float __y) {			}
				__DEVICE__ __CONSTEXPR__ bool isnormal(double __x) {
				return __builtin_isnormal(__x);
				}
				__DEVICE__ __CONSTEXPR__ bool isunordered(float __x, float __y) {
	return __builtin_isunordered(__x, __y);			return __builtin_isunordered(__x, __y);
	}			}
	__DEVICE__ bool isunordered(double __x, double __y) {			__DEVICE__ __CONSTEXPR__ bool isunordered(double __x, double __y) {
	return __builtin_isunordered(__x, __y);			return __builtin_isunordered(__x, __y);
	}			}
	__DEVICE__ float modf(float __x, float *__iptr) { return ::modff(__x, __iptr); }			__DEVICE__ __CONSTEXPR__ float modf(float __x, float *__iptr) {
	__DEVICE__ float pow(float __base, int __iexp) {			return ::modff(__x, __iptr);
				}
				__DEVICE__ __CONSTEXPR__ float pow(float __base, int __iexp) {
	return ::powif(__base, __iexp);			return ::powif(__base, __iexp);
	}			}
	__DEVICE__ double pow(double __base, int __iexp) {			__DEVICE__ __CONSTEXPR__ double pow(double __base, int __iexp) {
	return ::powi(__base, __iexp);			return ::powi(__base, __iexp);
	}			}
	__DEVICE__ float remquo(float __x, float __y, int *__quo) {			__DEVICE__ __CONSTEXPR__ float remquo(float __x, float __y, int *__quo) {
	return ::remquof(__x, __y, __quo);			return ::remquof(__x, __y, __quo);
	}			}
	__DEVICE__ float scalbln(float __x, long int __n) {			__DEVICE__ __CONSTEXPR__ float scalbln(float __x, long int __n) {
	return ::scalblnf(__x, __n);			return ::scalblnf(__x, __n);
	}			}
	__DEVICE__ bool signbit(float __x) { return ::__signbitf(__x); }			__DEVICE__ __CONSTEXPR__ bool signbit(float __x) { return ::__signbitf(__x); }
	__DEVICE__ bool signbit(double __x) { return ::__signbit(__x); }			__DEVICE__ __CONSTEXPR__ bool signbit(double __x) { return ::__signbit(__x); }

	// Notably missing above is nexttoward. We omit it because			// Notably missing above is nexttoward. We omit it because
	// ocml doesn't provide an implementation, and we don't want to be in the			// ocml doesn't provide an implementation, and we don't want to be in the
	// business of implementing tricky libm functions in this header.			// business of implementing tricky libm functions in this header.

	// Other functions.			// Other functions.
	__DEVICE__ _Float16 fma(_Float16 __x, _Float16 __y, _Float16 __z) {			__DEVICE__ __CONSTEXPR__ _Float16 fma(_Float16 __x, _Float16 __y,
				_Float16 __z) {
	return __ocml_fma_f16(__x, __y, __z);			return __ocml_fma_f16(__x, __y, __z);
	}			}
	__DEVICE__ _Float16 pow(_Float16 __base, int __iexp) {			__DEVICE__ __CONSTEXPR__ _Float16 pow(_Float16 __base, int __iexp) {
	return __ocml_pown_f16(__base, __iexp);			return __ocml_pown_f16(__base, __iexp);
	}			}

	// BEGIN DEF_FUN and HIP_OVERLOAD			// BEGIN DEF_FUN and HIP_OVERLOAD

	// BEGIN DEF_FUN			// BEGIN DEF_FUN

	#pragma push_macro("__DEF_FUN1")			#pragma push_macro("__DEF_FUN1")
	#pragma push_macro("__DEF_FUN2")			#pragma push_macro("__DEF_FUN2")
	#pragma push_macro("__DEF_FUN2_FI")			#pragma push_macro("__DEF_FUN2_FI")

	// Define cmath functions with float argument and returns __retty.			// Define cmath functions with float argument and returns __retty.
	#define __DEF_FUN1(__retty, __func) \			#define __DEF_FUN1(__retty, __func) \
	__DEVICE__ \			__DEVICE__ __CONSTEXPR__ __retty __func(float __x) { return __func##f(__x); }
	__retty __func(float __x) { return __func##f(__x); }

	// Define cmath functions with two float arguments and returns __retty.			// Define cmath functions with two float arguments and returns __retty.
	#define __DEF_FUN2(__retty, __func) \			#define __DEF_FUN2(__retty, __func) \
	__DEVICE__ \			__DEVICE__ __CONSTEXPR__ __retty __func(float __x, float __y) { \
	__retty __func(float __x, float __y) { return __func##f(__x, __y); }			return __func##f(__x, __y); \
				}

	// Define cmath functions with a float and an int argument and returns __retty.			// Define cmath functions with a float and an int argument and returns __retty.
	#define __DEF_FUN2_FI(__retty, __func) \			#define __DEF_FUN2_FI(__retty, __func) \
	__DEVICE__ \			__DEVICE__ __CONSTEXPR__ __retty __func(float __x, int __y) { \
	__retty __func(float __x, int __y) { return __func##f(__x, __y); }			return __func##f(__x, __y); \
				}

	__DEF_FUN1(float, acos)			__DEF_FUN1(float, acos)
	__DEF_FUN1(float, acosh)			__DEF_FUN1(float, acosh)
	__DEF_FUN1(float, asin)			__DEF_FUN1(float, asin)
	__DEF_FUN1(float, asinh)			__DEF_FUN1(float, asinh)
	__DEF_FUN1(float, atan)			__DEF_FUN1(float, atan)
	__DEF_FUN2(float, atan2)			__DEF_FUN2(float, atan2)
	__DEF_FUN1(float, atanh)			__DEF_FUN1(float, atanh)
	▲ Show 20 Lines • Show All 230 Lines • ▼ Show 20 Lines
	#endif //__cplusplus >= 201103L			#endif //__cplusplus >= 201103L
	} // namespace __hip			} // namespace __hip

	// __HIP_OVERLOAD1 is used to resolve function calls with integer argument to			// __HIP_OVERLOAD1 is used to resolve function calls with integer argument to
	// avoid compilation error due to ambibuity. e.g. floor(5) is resolved with			// avoid compilation error due to ambibuity. e.g. floor(5) is resolved with
	// floor(double).			// floor(double).
	#define __HIP_OVERLOAD1(__retty, __fn) \			#define __HIP_OVERLOAD1(__retty, __fn) \
	template <typename __T> \			template <typename __T> \
	__DEVICE__ \			__DEVICE__ __CONSTEXPR__ \
	typename __hip_enable_if<__hip::is_integral<__T>::value, __retty>::type \			typename __hip_enable_if<__hip::is_integral<__T>::value, __retty>::type \
	__fn(__T __x) { \			__fn(__T __x) { \
	return ::__fn((double)__x); \			return ::__fn((double)__x); \
	}			}

	// __HIP_OVERLOAD2 is used to resolve function calls with mixed float/double			// __HIP_OVERLOAD2 is used to resolve function calls with mixed float/double
	// or integer argument to avoid compilation error due to ambibuity. e.g.			// or integer argument to avoid compilation error due to ambibuity. e.g.
	// max(5.0f, 6.0) is resolved with max(double, double).			// max(5.0f, 6.0) is resolved with max(double, double).
	#if __cplusplus >= 201103L			#if __cplusplus >= 201103L
	#define __HIP_OVERLOAD2(__retty, __fn) \			#define __HIP_OVERLOAD2(__retty, __fn) \
	template <typename __T1, typename __T2> \			template <typename __T1, typename __T2> \
	__DEVICE__ typename __hip_enable_if< \			__DEVICE__ __CONSTEXPR__ typename __hip_enable_if< \
	__hip::is_arithmetic<__T1>::value && __hip::is_arithmetic<__T2>::value, \			__hip::is_arithmetic<__T1>::value && __hip::is_arithmetic<__T2>::value, \
	typename __hip::__promote<__T1, __T2>::type>::type \			typename __hip::__promote<__T1, __T2>::type>::type \
	__fn(__T1 __x, __T2 __y) { \			__fn(__T1 __x, __T2 __y) { \
	typedef typename __hip::__promote<__T1, __T2>::type __result_type; \			typedef typename __hip::__promote<__T1, __T2>::type __result_type; \
	return __fn((__result_type)__x, (__result_type)__y); \			return __fn((__result_type)__x, (__result_type)__y); \
	}			}
	#else			#else
	#define __HIP_OVERLOAD2(__retty, __fn) \			#define __HIP_OVERLOAD2(__retty, __fn) \
	template <typename __T1, typename __T2> \			template <typename __T1, typename __T2> \
	__DEVICE__ typename __hip_enable_if<__hip::is_arithmetic<__T1>::value && \			__DEVICE__ __CONSTEXPR__ \
				typename __hip_enable_if<__hip::is_arithmetic<__T1>::value && \
	__hip::is_arithmetic<__T2>::value, \			__hip::is_arithmetic<__T2>::value, \
	__retty>::type \			__retty>::type \
	__fn(__T1 __x, __T2 __y) { \			__fn(__T1 __x, __T2 __y) { \
	return __fn((double)__x, (double)__y); \			return __fn((double)__x, (double)__y); \
	}			}
	#endif			#endif

	__HIP_OVERLOAD1(double, acos)			__HIP_OVERLOAD1(double, acos)
	__HIP_OVERLOAD1(double, acosh)			__HIP_OVERLOAD1(double, acosh)
	__HIP_OVERLOAD1(double, asin)			__HIP_OVERLOAD1(double, asin)
	__HIP_OVERLOAD1(double, asinh)			__HIP_OVERLOAD1(double, asinh)
	▲ Show 20 Lines • Show All 58 Lines • ▼ Show 20 Lines

	// Overload these but don't add them to std, they are not part of cmath.			// Overload these but don't add them to std, they are not part of cmath.
	__HIP_OVERLOAD2(double, max)			__HIP_OVERLOAD2(double, max)
	__HIP_OVERLOAD2(double, min)			__HIP_OVERLOAD2(double, min)

	// Additional Overloads that don't quite match HIP_OVERLOAD.			// Additional Overloads that don't quite match HIP_OVERLOAD.
	#if __cplusplus >= 201103L			#if __cplusplus >= 201103L
	template <typename __T1, typename __T2, typename __T3>			template <typename __T1, typename __T2, typename __T3>
	__DEVICE__ typename __hip_enable_if<			__DEVICE__ __CONSTEXPR__ typename __hip_enable_if<
	__hip::is_arithmetic<__T1>::value && __hip::is_arithmetic<__T2>::value &&			__hip::is_arithmetic<__T1>::value && __hip::is_arithmetic<__T2>::value &&
	__hip::is_arithmetic<__T3>::value,			__hip::is_arithmetic<__T3>::value,
	typename __hip::__promote<__T1, __T2, __T3>::type>::type			typename __hip::__promote<__T1, __T2, __T3>::type>::type
	fma(__T1 __x, __T2 __y, __T3 __z) {			fma(__T1 __x, __T2 __y, __T3 __z) {
	typedef typename __hip::__promote<__T1, __T2, __T3>::type __result_type;			typedef typename __hip::__promote<__T1, __T2, __T3>::type __result_type;
	return ::fma((__result_type)__x, (__result_type)__y, (__result_type)__z);			return ::fma((__result_type)__x, (__result_type)__y, (__result_type)__z);
	}			}
	#else			#else
	template <typename __T1, typename __T2, typename __T3>			template <typename __T1, typename __T2, typename __T3>
	__DEVICE__ typename __hip_enable_if<__hip::is_arithmetic<__T1>::value &&			__DEVICE__ __CONSTEXPR__
				typename __hip_enable_if<__hip::is_arithmetic<__T1>::value &&
	__hip::is_arithmetic<__T2>::value &&			__hip::is_arithmetic<__T2>::value &&
	__hip::is_arithmetic<__T3>::value,			__hip::is_arithmetic<__T3>::value,
	double>::type			double>::type
	fma(__T1 __x, __T2 __y, __T3 __z) {			fma(__T1 __x, __T2 __y, __T3 __z) {
	return ::fma((double)__x, (double)__y, (double)__z);			return ::fma((double)__x, (double)__y, (double)__z);
	}			}
	#endif			#endif

	template <typename __T>			template <typename __T>
	__DEVICE__			__DEVICE__ __CONSTEXPR__
	typename __hip_enable_if<__hip::is_integral<__T>::value, double>::type			typename __hip_enable_if<__hip::is_integral<__T>::value, double>::type
	frexp(__T __x, int *__exp) {			frexp(__T __x, int *__exp) {
	return ::frexp((double)__x, __exp);			return ::frexp((double)__x, __exp);
	}			}

	template <typename __T>			template <typename __T>
	__DEVICE__			__DEVICE__ __CONSTEXPR__
	typename __hip_enable_if<__hip::is_integral<__T>::value, double>::type			typename __hip_enable_if<__hip::is_integral<__T>::value, double>::type
	ldexp(__T __x, int __exp) {			ldexp(__T __x, int __exp) {
	return ::ldexp((double)__x, __exp);			return ::ldexp((double)__x, __exp);
	}			}

	template <typename __T>			template <typename __T>
	__DEVICE__			__DEVICE__ __CONSTEXPR__
	typename __hip_enable_if<__hip::is_integral<__T>::value, double>::type			typename __hip_enable_if<__hip::is_integral<__T>::value, double>::type
	modf(__T __x, double *__exp) {			modf(__T __x, double *__exp) {
	return ::modf((double)__x, __exp);			return ::modf((double)__x, __exp);
	}			}

	#if __cplusplus >= 201103L			#if __cplusplus >= 201103L
	template <typename __T1, typename __T2>			template <typename __T1, typename __T2>
	__DEVICE__			__DEVICE__ __CONSTEXPR__
	typename __hip_enable_if<__hip::is_arithmetic<__T1>::value &&			typename __hip_enable_if<__hip::is_arithmetic<__T1>::value &&
	__hip::is_arithmetic<__T2>::value,			__hip::is_arithmetic<__T2>::value,
	typename __hip::__promote<__T1, __T2>::type>::type			typename __hip::__promote<__T1, __T2>::type>::type
	remquo(__T1 __x, __T2 __y, int *__quo) {			remquo(__T1 __x, __T2 __y, int *__quo) {
	typedef typename __hip::__promote<__T1, __T2>::type __result_type;			typedef typename __hip::__promote<__T1, __T2>::type __result_type;
	return ::remquo((__result_type)__x, (__result_type)__y, __quo);			return ::remquo((__result_type)__x, (__result_type)__y, __quo);
	}			}
	#else			#else
	template <typename __T1, typename __T2>			template <typename __T1, typename __T2>
	__DEVICE__ typename __hip_enable_if<__hip::is_arithmetic<__T1>::value &&			__DEVICE__ __CONSTEXPR__
				typename __hip_enable_if<__hip::is_arithmetic<__T1>::value &&
	__hip::is_arithmetic<__T2>::value,			__hip::is_arithmetic<__T2>::value,
	double>::type			double>::type
	remquo(__T1 __x, __T2 __y, int *__quo) {			remquo(__T1 __x, __T2 __y, int *__quo) {
	return ::remquo((double)__x, (double)__y, __quo);			return ::remquo((double)__x, (double)__y, __quo);
	}			}
	#endif			#endif

	template <typename __T>			template <typename __T>
	__DEVICE__			__DEVICE__ __CONSTEXPR__
	typename __hip_enable_if<__hip::is_integral<__T>::value, double>::type			typename __hip_enable_if<__hip::is_integral<__T>::value, double>::type
	scalbln(__T __x, long int __exp) {			scalbln(__T __x, long int __exp) {
	return ::scalbln((double)__x, __exp);			return ::scalbln((double)__x, __exp);
	}			}

	template <typename __T>			template <typename __T>
	__DEVICE__			__DEVICE__ __CONSTEXPR__
	typename __hip_enable_if<__hip::is_integral<__T>::value, double>::type			typename __hip_enable_if<__hip::is_integral<__T>::value, double>::type
	scalbn(__T __x, int __exp) {			scalbn(__T __x, int __exp) {
	return ::scalbn((double)__x, __exp);			return ::scalbn((double)__x, __exp);
	}			}

	#pragma pop_macro("__HIP_OVERLOAD1")			#pragma pop_macro("__HIP_OVERLOAD1")
	#pragma pop_macro("__HIP_OVERLOAD2")			#pragma pop_macro("__HIP_OVERLOAD2")

	▲ Show 20 Lines • Show All 170 Lines • ▼ Show 20 Lines
	// But, from VS2019, it's only included in `<complex>`. Need to include			// But, from VS2019, it's only included in `<complex>`. Need to include
	// `<ymath.h>` here to ensure C functions declared there won't be markded as			// `<ymath.h>` here to ensure C functions declared there won't be markded as
	// `__host__` and `__device__` through `<complex>` wrapper.			// `__host__` and `__device__` through `<complex>` wrapper.
	#include <ymath.h>			#include <ymath.h>

	#if defined(__cplusplus)			#if defined(__cplusplus)
	extern "C" {			extern "C" {
	#endif // defined(__cplusplus)			#endif // defined(__cplusplus)
	__DEVICE__ __attribute__((overloadable)) double _Cosh(double x, double y) {			__DEVICE__ __CONSTEXPR__ __attribute__((overloadable)) double _Cosh(double x,
				double y) {
	return cosh(x) * y;			return cosh(x) * y;
	}			}
	__DEVICE__ __attribute__((overloadable)) float _FCosh(float x, float y) {			__DEVICE__ __CONSTEXPR__ __attribute__((overloadable)) float _FCosh(float x,
				float y) {
	return coshf(x) * y;			return coshf(x) * y;
	}			}
	__DEVICE__ __attribute__((overloadable)) short _Dtest(double *p) {			__DEVICE__ __CONSTEXPR__ __attribute__((overloadable)) short _Dtest(double *p) {
	return fpclassify(*p);			return fpclassify(*p);
	}			}
	__DEVICE__ __attribute__((overloadable)) short _FDtest(float *p) {			__DEVICE__ __CONSTEXPR__ __attribute__((overloadable)) short _FDtest(float *p) {
	return fpclassify(*p);			return fpclassify(*p);
	}			}
	__DEVICE__ __attribute__((overloadable)) double _Sinh(double x, double y) {			__DEVICE__ __CONSTEXPR__ __attribute__((overloadable)) double _Sinh(double x,
				double y) {
	return sinh(x) * y;			return sinh(x) * y;
	}			}
	__DEVICE__ __attribute__((overloadable)) float _FSinh(float x, float y) {			__DEVICE__ __CONSTEXPR__ __attribute__((overloadable)) float _FSinh(float x,
				float y) {
	return sinhf(x) * y;			return sinhf(x) * y;
	}			}
	#if defined(__cplusplus)			#if defined(__cplusplus)
	}			}
	#endif // defined(__cplusplus)			#endif // defined(__cplusplus)
	#endif // defined(_MSC_VER)			#endif // defined(_MSC_VER)
	#endif // !defined(__HIPCC_RTC__)			#endif // !defined(__HIPCC_RTC__)

	#pragma pop_macro("__DEVICE__")			#pragma pop_macro("__DEVICE__")
				#pragma pop_macro("__CONSTEXPR__")

	#endif // __CLANG_HIP_CMATH_H__			#endif // __CLANG_HIP_CMATH_H__

clang/lib/Headers/__clang_hip_math.h

	/*===---- __clang_hip_math.h - Device-side HIP math support ----------------===			/*===---- __clang_hip_math.h - Device-side HIP math support ----------------===
	*			*
	* Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			* Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	* See https://llvm.org/LICENSE.txt for license information.			* See https://llvm.org/LICENSE.txt for license information.
	* SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			* SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	*			*
	*===-----------------------------------------------------------------------===			*===-----------------------------------------------------------------------===
	*/			*/
	#ifndef __CLANG_HIP_MATH_H__			#ifndef __CLANG_HIP_MATH_H__
	#define __CLANG_HIP_MATH_H__			#define __CLANG_HIP_MATH_H__

	#if !defined(__HIP__)			#if !defined(__HIP__) && !defined(__OPENMP_AMDGCN__)
	#error "This file is for HIP and OpenMP AMDGCN device compilation only."			#error "This file is for HIP and OpenMP AMDGCN device compilation only."
	#endif			#endif

	#if !defined(__HIPCC_RTC__)			#if !defined(__HIPCC_RTC__)
	#if defined(__cplusplus)			#if defined(__cplusplus)
	#include <algorithm>			#include <algorithm>
	#endif			#endif
	#include <limits.h>			#include <limits.h>
	#include <stdint.h>			#include <stdint.h>
	#endif // __HIPCC_RTC__			#endif // !defined(__HIPCC_RTC__)

	#pragma push_macro("__DEVICE__")			#pragma push_macro("__DEVICE__")

				#ifdef __OPENMP_AMDGCN__
				#define __DEVICE__ static inline __attribute__((always_inline, nothrow))
				#else
	#define __DEVICE__ static __device__ inline __attribute__((always_inline))			#define __DEVICE__ static __device__ inline __attribute__((always_inline))
				#endif
				JonChesterfieldUnsubmitted Not Done Reply Inline Actions wonder if HIP would benefit from nothrow here JonChesterfield: wonder if HIP would benefit from nothrow here
				pdhaliwalAuthorUnsubmitted Done Reply Inline Actions Would like to keep hip changes minimal in this patch. pdhaliwal: Would like to keep hip changes minimal in this patch.

	// A few functions return bool type starting only in C++11.			// A few functions return bool type starting only in C++11.
	#pragma push_macro("__RETURN_TYPE")			#pragma push_macro("__RETURN_TYPE")
				#ifdef __OPENMP_AMDGCN__
				#define __RETURN_TYPE int
				JonChesterfieldUnsubmitted Not Done Reply Inline Actions I'd expect openmp to match the cplusplus/c distinction here, as openmp works on C source JonChesterfield: I'd expect openmp to match the cplusplus/c distinction here, as openmp works on C source
				jdoerfertUnsubmitted Not Done Reply Inline Actions ^ Agreed. Though, we use a different trick because it's unfortunately not as straight forward always and can be decided based on the C vs C++. jdoerfert: ^ Agreed. Though, we use a different trick because it's unfortunately not as straight forward…
				pdhaliwalAuthorUnsubmitted Done Reply Inline Actions This is somewhat tricky. Since declaration of `__finite/__isnan /__isinff` is with int return type in standard library (and the corresponding methods in C++ seems to be isfinite, isnan and isinf with bool return type), the compiler fails to resolve these functions when using bool. I don't know how HIP is working. __RETURN_TYPE macro is only being used with the following methods: __finite __isnan __isinf __signbit and with the corresponding float versions. pdhaliwal: This is somewhat tricky. Since declaration of `__finite/__isnan /__isinff` is with int return…
				jdoerfertUnsubmitted Not Done Reply Inline Actions I marked the code above that actually overloads these functions in OpenMP (or better the versions w/o underscores) such that the system can have either version and it should work fine. jdoerfert: I marked the code above that actually overloads these functions in OpenMP (or better the…
				#else
	#if defined(__cplusplus)			#if defined(__cplusplus)
	#define __RETURN_TYPE bool			#define __RETURN_TYPE bool
	#else			#else
	#define __RETURN_TYPE int			#define __RETURN_TYPE int
	#endif			#endif
				#endif // __OPENMP_AMDGCN__

	#if defined (__cplusplus) && __cplusplus < 201103L			#if defined (__cplusplus) && __cplusplus < 201103L
	// emulate static_assert on type sizes			// emulate static_assert on type sizes
	template<bool>			template<bool>
	struct __compare_result{};			struct __compare_result{};
	template<>			template<>
	struct __compare_result<true> {			struct __compare_result<true> {
	static const __device__ bool valid;			static const __device__ bool valid;
	▲ Show 20 Lines • Show All 1,215 Lines • ▼ Show 20 Lines
	double max(double __x, double __y) { return fmax(__x, __y); }			double max(double __x, double __y) { return fmax(__x, __y); }

	__DEVICE__			__DEVICE__
	float min(float __x, float __y) { return fminf(__x, __y); }			float min(float __x, float __y) { return fminf(__x, __y); }

	__DEVICE__			__DEVICE__
	double min(double __x, double __y) { return fmin(__x, __y); }			double min(double __x, double __y) { return fmin(__x, __y); }

	#if !defined(__HIPCC_RTC__)			#if !defined(__HIPCC_RTC__) && !defined(__OPENMP_AMDGCN__)
	__host__ inline static int min(int __arg1, int __arg2) {			__host__ inline static int min(int __arg1, int __arg2) {
	return std::min(__arg1, __arg2);			return std::min(__arg1, __arg2);
	}			}

	__host__ inline static int max(int __arg1, int __arg2) {			__host__ inline static int max(int __arg1, int __arg2) {
	return std::max(__arg1, __arg2);			return std::max(__arg1, __arg2);
	}			}
	#endif // __HIPCC_RTC__			#endif // !defined(__HIPCC_RTC__) && !defined(__OPENMP_AMDGCN__)
	#endif			#endif

	#pragma pop_macro("__DEVICE__")			#pragma pop_macro("__DEVICE__")
	#pragma pop_macro("__RETURN_TYPE")			#pragma pop_macro("__RETURN_TYPE")

	#endif // __CLANG_HIP_MATH_H__			#endif // __CLANG_HIP_MATH_H__

clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h

	/*===- __clang_openmp_device_functions.h - OpenMP device function declares -===			/*===- __clang_openmp_device_functions.h - OpenMP device function declares -===
	*			*
	* Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			* Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	* See https://llvm.org/LICENSE.txt for license information.			* See https://llvm.org/LICENSE.txt for license information.
	* SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			* SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	*			*
	*===-----------------------------------------------------------------------===			*===-----------------------------------------------------------------------===
	*/			*/

	#ifndef __CLANG_OPENMP_DEVICE_FUNCTIONS_H__			#ifndef __CLANG_OPENMP_DEVICE_FUNCTIONS_H__
	#define __CLANG_OPENMP_DEVICE_FUNCTIONS_H__			#define __CLANG_OPENMP_DEVICE_FUNCTIONS_H__

	#ifndef _OPENMP			#ifndef _OPENMP
	#error "This file is for OpenMP compilation only."			#error "This file is for OpenMP compilation only."
				Lint: Pre-merge checks Inline Actions clang-tidy: error: "This file is for OpenMP compilation only." [clang-diagnostic-error] not useful Lint: Pre-merge checks: clang-tidy: error: "This file is for OpenMP compilation only." [clang-diagnostic-error] [[https…
	#endif			#endif

	#pragma omp begin declare variant match( \
	device = {arch(nvptx, nvptx64)}, implementation = {extension(match_any)})

	#ifdef __cplusplus			#ifdef __cplusplus
	extern "C" {			extern "C" {
	#endif			#endif

				#pragma omp begin declare variant match( \
				device = {arch(nvptx, nvptx64)}, implementation = {extension(match_any)})

	#define __CUDA__			#define __CUDA__
	#define __OPENMP_NVPTX__			#define __OPENMP_NVPTX__

	/// Include declarations for libdevice functions.			/// Include declarations for libdevice functions.
	#include <__clang_cuda_libdevice_declares.h>			#include <__clang_cuda_libdevice_declares.h>

	/// Provide definitions for these functions.			/// Provide definitions for these functions.
	#include <__clang_cuda_device_functions.h>			#include <__clang_cuda_device_functions.h>

	#undef __OPENMP_NVPTX__			#undef __OPENMP_NVPTX__
	#undef __CUDA__			#undef __CUDA__

	#ifdef __cplusplus			#pragma omp end declare variant
	} // extern "C"
				#pragma omp begin declare variant match(device = {arch(amdgcn)})
				JonChesterfieldUnsubmitted Not Done Reply Inline Actions Given that declare variant didn't work elsewhere, it probably doesn't work here. Thus this may be the root cause of https://bugs.llvm.org/show_bug.cgi?id=51337 JonChesterfield: Given that declare variant didn't work elsewhere, it probably doesn't work here. Thus this may…
				pdhaliwalAuthorUnsubmitted Done Reply Inline Actions Was able to reproduce this issue locally on nvptx machine. And you are right, declare variant didn't work here as well. Wrapping it in #ifdef fixed the issue. I will create a fix. pdhaliwal: Was able to reproduce this issue locally on nvptx machine. And you are right, declare variant…

				// Import types which will be used by __clang_hip_libdevice_declares.h
				#ifndef __cplusplus
				#include <stdbool.h>
				#include <stdint.h>
	#endif			#endif

				#define __OPENMP_AMDGCN__
				#pragma push_macro("__device__")
				ashi1Unsubmitted Done Reply Inline Actions Would it be better to push and pop these macros, in case it was defined outside of here? ashi1: Would it be better to push and pop these macros, in case it was defined outside of here?
				#define __device__

				/// Include declarations for libdevice functions.
				#include <__clang_hip_libdevice_declares.h>

				#pragma pop_macro("__device__")
				#undef __OPENMP_AMDGCN__

	#pragma omp end declare variant			#pragma omp end declare variant

				#ifdef __cplusplus
				} // extern "C"
				#endif

	// Ensure we make `_ZdlPv`, aka. `operator delete(void*)` available without the			// Ensure we make `_ZdlPv`, aka. `operator delete(void*)` available without the
	// need to `include <new>` in C++ mode.			// need to `include <new>` in C++ mode.
	#ifdef __cplusplus			#ifdef __cplusplus

	// We require malloc/free.			// We require malloc/free.
	#include <cstdlib>			#include <cstdlib>

	#pragma push_macro("OPENMP_NOEXCEPT")			#pragma push_macro("OPENMP_NOEXCEPT")
	Show All 26 Lines
	inline void operator delete[](void *ptr, __SIZE_TYPE__ size) OPENMP_NOEXCEPT {			inline void operator delete[](void *ptr, __SIZE_TYPE__ size) OPENMP_NOEXCEPT {
	::operator delete(ptr);			::operator delete(ptr);
	}			}
	#endif			#endif

	#pragma pop_macro("OPENMP_NOEXCEPT")			#pragma pop_macro("OPENMP_NOEXCEPT")
	#endif			#endif

	#endif			#endif
				JonChesterfieldUnsubmitted Done Reply Inline Actions i think this should be `#define __device__` JonChesterfield: i think this should be `#define __device__`
				jdoerfertUnsubmitted Done Reply Inline Actions Can you make the declare variant scope of nvptx and amdgpu smaller and put them next to each other. #ifdef __cplusplus extern "C" { #endif #declare variant #define ... ... #undef #end #declare variant ... #end #ifdef __cplusplus } // extern "C" jdoerfert: Can you make the declare variant scope of nvptx and amdgpu smaller and put them next to each…
				tianshilei1992Unsubmitted Not Done Reply Inline Actions Right because we already have `declare variant`. tianshilei1992: Right because we already have `declare variant`.

clang/lib/Headers/openmp_wrappers/cmath

	Show All 21 Lines

	// We (might) need cstdlib because __clang_cuda_cmath.h below declares `abs`			// We (might) need cstdlib because __clang_cuda_cmath.h below declares `abs`
	// which might live in cstdlib.			// which might live in cstdlib.
	#include <cstdlib>			#include <cstdlib>

	// We need limits because __clang_cuda_cmath.h below uses `std::numeric_limit`.			// We need limits because __clang_cuda_cmath.h below uses `std::numeric_limit`.
	#include <limits>			#include <limits>

	#pragma omp begin declare variant match( \			#pragma omp begin declare variant match( \
				JonChesterfieldUnsubmitted Not Done Reply Inline Actions this declare variant will not match amdgcn JonChesterfield: this declare variant will not match amdgcn
	device = {arch(nvptx, nvptx64)}, implementation = {extension(match_any, allow_templates)})			device = {arch(nvptx, nvptx64)}, implementation = {extension(match_any, allow_templates)})

	#define __CUDA__			#define __CUDA__
	#define __OPENMP_NVPTX__			#define __OPENMP_NVPTX__
	#include <__clang_cuda_cmath.h>			#include <__clang_cuda_cmath.h>
	#undef __OPENMP_NVPTX__			#undef __OPENMP_NVPTX__
	#undef __CUDA__			#undef __CUDA__

	// Overloads not provided by the CUDA wrappers but by the CUDA system headers.			// Overloads not provided by the CUDA wrappers but by the CUDA system headers.
	// Since we do not include the latter we define them ourselves.			// Since we do not include the latter we define them ourselves.
	#define __DEVICE__ static constexpr __attribute__((always_inline, nothrow))			#define __DEVICE__ static constexpr __attribute__((always_inline, nothrow))

	__DEVICE__ float acosh(float __x) { return ::acoshf(__x); }			__DEVICE__ float acosh(float __x) { return ::acoshf(__x); }
				JonChesterfieldUnsubmitted Not Done Reply Inline Actions which means that amdgcn is not going to pick up any of these overloads, but that looks like it's actually OK because clang_hip_cmath does define them (I think, there are a lot of macros involved) JonChesterfield: which means that amdgcn is not going to pick up any of these overloads, but that looks like…
	__DEVICE__ float asinh(float __x) { return ::asinhf(__x); }			__DEVICE__ float asinh(float __x) { return ::asinhf(__x); }
	__DEVICE__ float atanh(float __x) { return ::atanhf(__x); }			__DEVICE__ float atanh(float __x) { return ::atanhf(__x); }
	__DEVICE__ float cbrt(float __x) { return ::cbrtf(__x); }			__DEVICE__ float cbrt(float __x) { return ::cbrtf(__x); }
	__DEVICE__ float erf(float __x) { return ::erff(__x); }			__DEVICE__ float erf(float __x) { return ::erff(__x); }
	__DEVICE__ float erfc(float __x) { return ::erfcf(__x); }			__DEVICE__ float erfc(float __x) { return ::erfcf(__x); }
	__DEVICE__ float exp2(float __x) { return ::exp2f(__x); }			__DEVICE__ float exp2(float __x) { return ::exp2f(__x); }
	__DEVICE__ float expm1(float __x) { return ::expm1f(__x); }			__DEVICE__ float expm1(float __x) { return ::expm1f(__x); }
	__DEVICE__ float fdim(float __x, float __y) { return ::fdimf(__x, __y); }			__DEVICE__ float fdim(float __x, float __y) { return ::fdimf(__x, __y); }
	Show All 18 Lines
	}			}
	__DEVICE__ float scalbn(float __x, int __y) { return ::scalbnf(__x, __y); }			__DEVICE__ float scalbn(float __x, int __y) { return ::scalbnf(__x, __y); }
	__DEVICE__ float tgamma(float __x) { return ::tgammaf(__x); }			__DEVICE__ float tgamma(float __x) { return ::tgammaf(__x); }

	#undef __DEVICE__			#undef __DEVICE__

	#pragma omp end declare variant			#pragma omp end declare variant

				#ifdef __AMDGCN__
				#pragma omp begin declare variant match(device = {arch(amdgcn)})

				#pragma push_macro("__constant__")
				#define __constant__ __attribute__((constant))
				#define __OPENMP_AMDGCN__
				jdoerfertUnsubmitted Not Done Reply Inline Actions No match_any needed (here and elsewhere). Also, don't we want all but the includes to be the same for both GPUs. Maybe we have a device(kind(gpu)) variant and inside the nvptx and amdgpu just for the respective include? jdoerfert: No match_any needed (here and elsewhere). Also, don't we want all but the includes to be the…
				pdhaliwalAuthorUnsubmitted Done Reply Inline Actions device(kind(gpu)) breaks nvptx and hip with lots of errors like below, ... __clang_cuda_device_functions.h:29:40: error: use of undeclared identifier '__nvvm_vote_all' ... Maybe I am doing something wrong. pdhaliwal: device(kind(gpu)) breaks nvptx and hip with lots of errors like below, ``` ...

				#include <__clang_hip_cmath.h>

				#pragma pop_macro("__constant__")
				#undef __OPENMP_AMDGCN__

				#pragma omp end declare variant
				#endif // __AMDGCN__

	#endif			#endif
				JonChesterfieldUnsubmitted Not Done Reply Inline Actions let's not add in commented out code JonChesterfield: let's not add in commented out code

clang/lib/Headers/openmp_wrappers/math.h

	Show All 18 Lines
	#ifdef __cplusplus			#ifdef __cplusplus
	#include <cmath>			#include <cmath>
	#endif			#endif

	#ifndef __CLANG_OPENMP_MATH_H__			#ifndef __CLANG_OPENMP_MATH_H__
	#define __CLANG_OPENMP_MATH_H__			#define __CLANG_OPENMP_MATH_H__

	#ifndef _OPENMP			#ifndef _OPENMP
	#error "This file is for OpenMP compilation only."			#error "This file is for OpenMP compilation only."
				Lint: Pre-merge checks Inline Actions clang-tidy: error: "This file is for OpenMP compilation only." [clang-diagnostic-error] not useful Lint: Pre-merge checks: clang-tidy: error: "This file is for OpenMP compilation only." [clang-diagnostic-error] [[https…
	#endif			#endif

	#include_next <math.h>			#include_next <math.h>

	// We need limits.h for __clang_cuda_math.h below and because it should not hurt			// We need limits.h for __clang_cuda_math.h below and because it should not hurt
	// we include it eagerly here.			// we include it eagerly here.
	#include <limits.h>			#include <limits.h>

	// We need stdlib.h because (for now) __clang_cuda_math.h below declares `abs`			// We need stdlib.h because (for now) __clang_cuda_math.h below declares `abs`
	// which should live in stdlib.h.			// which should live in stdlib.h.
	#include <stdlib.h>			#include <stdlib.h>

	#pragma omp begin declare variant match( \			#pragma omp begin declare variant match( \
	device = {arch(nvptx, nvptx64)}, implementation = {extension(match_any)})			device = {arch(nvptx, nvptx64)}, implementation = {extension(match_any)})
				JonChesterfieldUnsubmitted Not Done Reply Inline Actions @jdoerfert do you know why we have match_any here? wondering if the amdgcn variant below should have the same JonChesterfield: @jdoerfert do you know why we have match_any here? wondering if the amdgcn variant below should…
				tianshilei1992Unsubmitted Not Done Reply Inline Actions Because `nvptx` and `nvptx64` share the same implementation. They just emit different IRs. If AMDGCN only has one architecture, it doesn't need to use `match_any`. tianshilei1992: Because `nvptx` and `nvptx64` share the same implementation. They just emit different IRs. If…
				JonChesterfieldUnsubmitted Not Done Reply Inline Actions ah, right - thank you! JonChesterfield: ah, right - thank you!

	#define __CUDA__			#define __CUDA__
	#define __OPENMP_NVPTX__			#define __OPENMP_NVPTX__
	#include <__clang_cuda_math.h>			#include <__clang_cuda_math.h>
	#undef __OPENMP_NVPTX__			#undef __OPENMP_NVPTX__
	#undef __CUDA__			#undef __CUDA__

	#pragma omp end declare variant			#pragma omp end declare variant

				#pragma omp begin declare variant match(device = {arch(amdgcn)})

				#define __OPENMP_AMDGCN__
				JonChesterfieldUnsubmitted Not Done Reply Inline Actions That's quite worrying. Declare variant match amdgcn is supposed to have the same effect here as the older style macro. I wonder if we have any test coverage for whether declare variant works for amdgcn. JonChesterfield: That's quite worrying. Declare variant match amdgcn is supposed to have the same effect here as…
				#include <__clang_hip_math.h>
				#undef __OPENMP_AMDGCN__

				#pragma omp end declare variant

	#endif			#endif
				jdoerfertUnsubmitted Not Done Reply Inline Actions FWIW, This is what I think the begin/end regions should look like. Small and next to each other. jdoerfert: FWIW, This is what I think the begin/end regions should look like. Small and next to each other.

clang/test/Headers/Inputs/include/algorithm

This file was added.

				#pragma once

				namespace std {
				template<class T> constexpr const T& min(const T& a, const T& b);
				template<class T> constexpr const T& max(const T& a, const T& b);
				}
				No newline at end of file

clang/test/Headers/Inputs/include/cstdlib

	#pragma once			#pragma once

	#include <stdlib.h>			#include <stdlib.h>

	#if __cplusplus >= 201703L			#if __cplusplus >= 201703L
	extern int abs (int __x) throw() __attribute__ ((__const__)) ;			extern int abs (int __x) throw() __attribute__ ((__const__)) ;
	extern long int labs (long int __x) throw() __attribute__ ((__const__)) ;			extern long int labs (long int __x) throw() __attribute__ ((__const__)) ;
	extern float fabs (float __x) throw() __attribute__ ((__const__)) ;			extern float fabs (float __x) throw() __attribute__ ((__const__)) ;
	#else			#else
	extern int abs (int __x) __attribute__ ((__const__)) ;			extern int abs (int __x) __attribute__ ((__const__)) ;
	extern long int labs (long int __x) __attribute__ ((__const__)) ;			extern long int labs (long int __x) __attribute__ ((__const__)) ;
	extern float fabs (float __x) __attribute__ ((__const__)) ;			extern float fabs (float __x) __attribute__ ((__const__)) ;
	#endif			#endif

	namespace std			namespace std
				JonChesterfieldUnsubmitted Not Done Reply Inline Actions I think I'd expect builtin_labs et al to work on amdgcn, are we missing lowering for them? JonChesterfield: I think I'd expect builtin_labs et al to work on amdgcn, are we missing lowering for them?
				jdoerfertUnsubmitted Not Done Reply Inline Actions Yeah, looks weird that we cannot compile this mock-up header. jdoerfert: Yeah, looks weird that we cannot compile this mock-up header.
				pdhaliwalAuthorUnsubmitted Done Reply Inline Actions From what I understand, hip is defining fabs to use ocml's version into the std namespace, which was already defined in this header. So that's causing multiple declaration error. I will wrap only fabs in the ifdef's pdhaliwal: From what I understand, hip is defining fabs to use ocml's version into the std namespace…
	{			{

	using ::abs;			using ::abs;

	inline long			inline long
	abs(long __i) { return __builtin_labs(__i); }			abs(long __i) { return __builtin_labs(__i); }

	inline long long			inline long long
	abs(long long __x) { return __builtin_llabs (__x); }			abs(long long __x) { return __builtin_llabs (__x); }

				// amdgcn already provides definition of fabs
				JonChesterfieldUnsubmitted Not Done Reply Inline Actions drop comment? JonChesterfield: drop comment?
				#ifndef __AMDGCN__
	float fabs(float __x) { return __builtin_fabs(__x); }			float fabs(float __x) { return __builtin_fabs(__x); }
				#endif
				jdoerfertUnsubmitted Not Done Reply Inline Actions That seems to be fundamentally broken then, but let's see, maybe it will somehow work anyway. jdoerfert: That seems to be fundamentally broken then, but let's see, maybe it will somehow work anyway.
				JonChesterfieldUnsubmitted Not Done Reply Inline Actions I thought fabs was in math, not stdlib. Not sure what this file is doing but the functions above are inline and fabs isn't JonChesterfield: I thought fabs was in math, not stdlib. Not sure what this file is doing but the functions…
				estewart08Unsubmitted Not Done Reply Inline Actions I am afraid this is just a workaround to get openmp_device_math_isnan.cpp to pass for AMDGCN. This stems from not having an #ifndef OPENMP_AMDGCN in __clang_hip_cmath.h where 'using ::fabs' is present. Currently, OPENMP_AMDGCN uses all of the overloaded functions created by the HIP macros where NVPTX does not use the CUDA overloads. This may be a new topic of discussion. https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/__clang_cuda_cmath.h#L191 By using this ifndef, it seems NVPTX looses quite a few overloaded functions. Are these meant to eventually be present in openmp_wrappers/cmath? Not sure what issues @jdoerfert ran into with D75788. estewart08: I am afraid this is just a workaround to get openmp_device_math_isnan.cpp to pass for AMDGCN.
				jdoerfertUnsubmitted Not Done Reply Inline Actions By using this ifndef, it seems NVPTX looses quite a few overloaded functions. Are these meant to eventually be present in openmp_wrappers/cmath? Not sure what issues @jdoerfert ran into with D75788. Can you provide an example that shows how we "loose" something? So an input and command line that should work but doesn't, or that should be compiled to something else. That would help me a lot. jdoerfert: > By using this ifndef, it seems NVPTX looses quite a few overloaded functions. Are these meant…
				JonChesterfieldUnsubmitted Not Done Reply Inline Actions TLDR, I think nvptx works here, but it's hard to be certain. I've put a few minutes into looking for something that doesn't work, then much longer trying to trace where the various functions come from, and have concluded that the hip cmath header diverging from the cuda cmath header is the root cause. The list of functions near the top of `__clang_cuda_cmath.h` is a subset of libm, e.g. __DEVICE__ float acos(float __x) { return ::acosf(__x); } but no acosh Later on in the file are: __CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, acosh) but these are guarded by `#ifndef __OPENMP_NVPTX__`, which suggests they are not included when using the header from openmp. However, openmp_wrappers/cmath does include `__DEVICE__ float acosh(float __x) { return ::acoshf(__x); }` under the comment `// Overloads not provided by the CUDA wrappers by by the CUDA system headers` Finally there are some functions that are not in either list, such as fma(float,float,float), but which are nevertheless resolved, at a guess in a glibc header. My current theory is that nvptx gets the set of functions right through a combination of cuda headers, clang cuda headers, clang openmp headers, system headers. At least, the half dozen I've tried work, and iirc it passes the OvO suite which I believe calls all of them. Wimplicit-float-conversion complains about a few but that seems minor. Further, I think hip does not get this right, because the hip cmath header has diverged from the cuda one, and the amdgpu openmp implementation that tries to use the hip headers does not pass the OvO suite without some hacks. JonChesterfield: TLDR, I think nvptx works here, but it's hard to be certain. I've put a few minutes into…
				estewart08Unsubmitted Not Done Reply Inline Actions TLDR, I think nvptx works here, but it's hard to be certain. I've put a few minutes into looking for something that doesn't work, then much longer trying to trace where the various functions come from, and have concluded that the hip cmath header diverging from the cuda cmath header is the root cause. The list of functions near the top of `__clang_cuda_cmath.h` is a subset of libm, e.g. __DEVICE__ float acos(float __x) { return ::acosf(__x); } but no acosh Later on in the file are: __CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, acosh) but these are guarded by `#ifndef __OPENMP_NVPTX__`, which suggests they are not included when using the header from openmp. However, openmp_wrappers/cmath does include `__DEVICE__ float acosh(float __x) { return ::acoshf(__x); }` under the comment `// Overloads not provided by the CUDA wrappers by by the CUDA system headers` Finally there are some functions that are not in either list, such as fma(float,float,float), but which are nevertheless resolved, at a guess in a glibc header. My current theory is that nvptx gets the set of functions right through a combination of cuda headers, clang cuda headers, clang openmp headers, system headers. At least, the half dozen I've tried work, and iirc it passes the OvO suite which I believe calls all of them. Wimplicit-float-conversion complains about a few but that seems minor. Further, I think hip does not get this right, because the hip cmath header has diverged from the cuda one, and the amdgpu openmp implementation that tries to use the hip headers does not pass the OvO suite without some hacks. estewart08: > TLDR, I think nvptx works here, but it's hard to be certain. I've put a few minutes into…
				jdoerfertUnsubmitted Not Done Reply Inline Actions > By using this ifndef, it seems NVPTX looses quite a few overloaded functions. Are these meant to eventually be present in openmp_wrappers/cmath? Not sure what issues @jdoerfert ran into with D75788. Can you provide an example that shows how we "loose" something? So an input and command line that should work but doesn't, or that should be compiled to something else. That would help me a lot. @estewart08 Feel free to provide me with something that doesn't work even as this goes in. It sounded you had some ideas and I'd like to look into that. jdoerfert: > > By using this ifndef, it seems NVPTX looses quite a few overloaded functions. Are these…
				estewart08Unsubmitted Not Done Reply Inline Actions Feel free to provide me with something that doesn't work even as this goes in. It sounded you had some ideas and I'd like to look into that. At this point all of the functions I have tried for nvptx did not show an error. It was unclear to me how the device versions of certain overloaded functions were being resolved. As Jon mentioned above, it is a mix of headers that range from clang, openmp, cuda, and system headers. For now, I will retract my statement and if I run into any problems in the future I will point them out. estewart08: >Feel free to provide me with something that doesn't work even as this goes in. It sounded you…

	float abs(float __x) { return fabs(__x); }			float abs(float __x) { return fabs(__x); }
	double abs(double __x) { return fabs(__x); }			double abs(double __x) { return fabs(__x); }

	}			}

clang/test/Headers/Inputs/include/utility

This file was added.

				#pragma once

clang/test/Headers/amdgcn_openmp_device_math.c

This file was added.

				// RUN: %clang_cc1 -internal-isystem %S/Inputs/include -x c -fopenmp -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-host.bc
				// RUN: %clang_cc1 -internal-isystem %S/../../lib/Headers/openmp_wrappers -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -x c -fopenmp -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-host.bc -o - \| FileCheck %s --check-prefixes=CHECK-C,CHECK
				// RUN: %clang_cc1 -internal-isystem %S/Inputs/include -x c++ -fopenmp -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-host.bc
				// RUN: %clang_cc1 -internal-isystem %S/../../lib/Headers/openmp_wrappers -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -x c++ -fopenmp -triple amdgcn-amd-amdhsa -aux-triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-host.bc -o - \| FileCheck %s --check-prefixes=CHECK-CPP,CHECK

				#ifdef __cplusplus
				#include <cmath>
				#else
				#include <math.h>
				#endif

				void test_math_f64(double x) {
				// CHECK-LABEL: define {{.*}}test_math_f64
				#pragma omp target
				{
				// CHECK: call double @__ocml_sin_f64
				double l1 = sin(x);
				// CHECK: call double @__ocml_cos_f64
				double l2 = cos(x);
				// CHECK: call double @__ocml_fabs_f64
				double l3 = fabs(x);
				}
				}

				void test_math_f32(float x) {
				// CHECK-LABEL: define {{.*}}test_math_f32
				#pragma omp target
				{
				// CHECK-C: call double @__ocml_sin_f64
				// CHECK-CPP: call float @__ocml_sin_f32
				float l1 = sin(x);
				// CHECK-C: call double @__ocml_cos_f64
				// CHECK-CPP: call float @__ocml_cos_f32
				float l2 = cos(x);
				// CHECK-C: call double @__ocml_fabs_f64
				// CHECK-CPP: call float @__ocml_fabs_f32
				float l3 = fabs(x);
				}
				}
				void test_math_f32_suffix(float x) {
				// CHECK-LABEL: define {{.*}}test_math_f32_suffix
				#pragma omp target
				{
				// CHECK: call float @__ocml_sin_f32
				float l1 = sinf(x);
				// CHECK: call float @__ocml_cos_f32
				float l2 = cosf(x);
				// CHECK: call float @__ocml_fabs_f32
				float l3 = fabsf(x);
				}
				}

clang/test/Headers/openmp_device_math_isnan.cpp

	Show All 15 Lines
	// RUN: %clang_cc1 -x c++ -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -fopenmp -triple amdgcn-amd-amdhsa -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -ffast-math -ffp-contract=fast -DUSE_ISNAN_WITH_INT_RETURN \| FileCheck %s --check-prefix=AMD_INT_RETURN			// RUN: %clang_cc1 -x c++ -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -fopenmp -triple amdgcn-amd-amdhsa -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -ffast-math -ffp-contract=fast -DUSE_ISNAN_WITH_INT_RETURN \| FileCheck %s --check-prefix=AMD_INT_RETURN
	// expected-no-diagnostics			// expected-no-diagnostics

	#include <cmath>			#include <cmath>

	double math(float f, double d) {			double math(float f, double d) {
	double r = 0;			double r = 0;
	// INT_RETURN: call i32 @__nv_isnanf(float			// INT_RETURN: call i32 @__nv_isnanf(float
	// AMD_INT_RETURN: call i32 @_{{.*}}isnanf(float			// AMD_INT_RETURN: call i32 @__ocml_isnan_f32(float
	// BOOL_RETURN: call i32 @__nv_isnanf(float			// BOOL_RETURN: call i32 @__nv_isnanf(float
	// AMD_BOOL_RETURN: call zeroext i1 @_{{.*}}isnanf(float			// AMD_BOOL_RETURN: call i32 @__ocml_isnan_f32(float
	r += std::isnan(f);			r += std::isnan(f);
	// INT_RETURN: call i32 @__nv_isnand(double			// INT_RETURN: call i32 @__nv_isnand(double
	// AMD_INT_RETURN: call i32 @_{{.*}}isnand(double			// AMD_INT_RETURN: call i32 @__ocml_isnan_f64(double
	// BOOL_RETURN: call i32 @__nv_isnand(double			// BOOL_RETURN: call i32 @__nv_isnand(double
	// AMD_BOOL_RETURN: call zeroext i1 @_{{.*}}isnand(double			// AMD_BOOL_RETURN: call i32 @__ocml_isnan_f64(double
	r += std::isnan(d);			r += std::isnan(d);
	return r;			return r;
	}			}

	long double foo(float f, double d, long double ld) {			long double foo(float f, double d, long double ld) {
	double r = ld;			double r = ld;
	r += math(f, d);			r += math(f, d);
	#pragma omp target map(r)			#pragma omp target map(r)
	{ r += math(f, d); }			{ r += math(f, d); }
	return r;			return r;
	}			}

This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][AMDGCN] Initial math headers supportClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 354913

clang/lib/Driver/ToolChains/Clang.cpp

clang/lib/Headers/__clang_hip_cmath.h

clang/lib/Headers/__clang_hip_math.h

clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h

clang/lib/Headers/openmp_wrappers/cmath

clang/lib/Headers/openmp_wrappers/math.h

clang/test/Headers/Inputs/include/algorithm

clang/test/Headers/Inputs/include/cstdlib

clang/test/Headers/Inputs/include/utility

clang/test/Headers/amdgcn_openmp_device_math.c

clang/test/Headers/openmp_device_math_isnan.cpp

[OpenMP][AMDGCN] Initial math headers support
ClosedPublic