This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][OPENMP] OpenMP AMDGCN Header Support
AbandonedPublic

Authored by ashi1 on Oct 16 2020, 12:16 PM.

Details

Summary

Both nvptx and amdgcn will use the autoheader
openmp_wrappers/__clang_openmp_device_functions.h.
Use the clang headers for math functions. Still need a
libm bitcode file to build the FORTRAN device. The code
in __clang_hip_math.h can beu sed to build a libm
device library for FORTRAN. Add log2 and pow operation
with int arg.

Diff Detail

Event Timeline

ashi1 created this revision.Oct 16 2020, 12:16 PM
ashi1 requested review of this revision.Oct 16 2020, 12:16 PM
clang/lib/Headers/__clang_cuda_math.h
29 ↗(On Diff #298715)

I recognise this from the fortran libs build, but I'm not sure it's applicable to trunk as we don't have any amdgpu/openmp fortran support in trunk.

clang/lib/Headers/__clang_hip_cmath.h
98

What are these _i suffix functions for? They aren't in libm. Writing log2(42) usually works without extra overloads

clang/lib/Headers/__clang_hip_libdevice_declares.h
60

I don't think the return type of this library function can depend on whether this header is compiled as c++/openmp or not. ocml is very macro themed, but it looks to me like the declaration there is an instance of extern __attribute__((const)) int OCML_MANGLE_F32(N)(float); which suggests 'int' was right.

clang/lib/Headers/__clang_hip_math.h
273–277

This is a bit weird. __tmp is a stack variable, which I thought would be in address_space(5) already. Declaration is extern float OCML_MANGLE_F32(frexp)(float, __private int *);.

So I guess I can see why either of these branches would work, but don't understand why either of them don't.

442

I don't recognise the symbol powii (or powi) from libm

clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
70

Maybe use this define instead of attribute((address_space(5))) throughout the above?

clang/lib/Headers/openmp_wrappers/math.h
30

Why is <new> needed here?

ashi1 updated this revision to Diff 299191.Oct 19 2020, 3:17 PM
ashi1 marked 7 inline comments as done.

Updated based on review comments.

clang/lib/Headers/__clang_hip_cmath.h
98

okay, I will remove these new functions, and a future patch can update these log2 and pow functions.

clang/lib/Headers/__clang_hip_math.h
273–277

I will update this to use the static version for HIP too. That path works correctly, and the HIP path was dropping the addrspace qualifier.

clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
70

I will update the HIP and OpenMP to share the same usage of this address space (rather than the 2 paths above). Since HIP doesn't define __private, I may skip this change.

ashi1 added inline comments.Oct 20 2020, 2:04 PM
clang/lib/Headers/__clang_hip_math.h
273–277

Using static seems to cause a HIP error that is not observed on OpenMP.
[2020-10-20T16:18:06.080Z] /opt/rocm-3.9.0-3703/llvm/lib/clang/12.0.0/include/__clang_hip_math.h:280:48: error: within a device function, only shared variables or const variables without device memory qualifier may be marked 'static'
[2020-10-20T16:18:06.080Z] static attribute((address_space(5))) int __tmp;
[2020-10-20T16:18:06.080Z] ^

Should we change this patch to using #ifdef, or we can wait until this patch lands (it will fix this):
https://reviews.llvm.org/D88345

Patch looks broadly reasonable to me. @jdoerfert does this align with what you want from these headers?

clang/lib/Headers/__clang_hip_math.h
273–277

I guess we ifdef until D88345 lands. Have you opened a task for HIP dropping the addrspace qualifier?

ashi1 updated this revision to Diff 299687.Oct 21 2020, 7:44 AM

Added back the #ifdef for few functions that need static with private for OpenMP.

jdoerfert requested changes to this revision.Oct 21 2020, 3:32 PM

First off, this is the right direction, thanks for working on this!

Then, we need to split this. There are various things happening here and we can easily do them one by one.
Please reach out if you don't know where to split but basically try to make the patches "as small as possible" such that you write a test against them.
So for example, math and complex stuff are separate. New functions are separate, changing the attributes is separate, ...

I added a bunch of comments and tried to be exhaustive, though once we split it in the separate components I will do another review.

Finally, where are the tests? You only added the test/Inputs stuff.

clang/lib/Headers/CMakeLists.txt
163

I'm not sold on the hip/hip... idea here but ok.
I guess I would have put the stuff in __clang_openmp_device_functions instead.

Now that I write this, where is hip_runtime included?

clang/lib/Headers/__clang_cuda_complex_builtins.h
103

What is the difference to the block above? We should also revisit the need for this in the first place. We have template overload support through begin/end declare variant in clang now. Best case, none of this is needed and we simply call the C++ or C standard library functions here. That would be great because they have to resolve to the right nv and ocml functions anyway so why not call them ;)

298

The max stuff you can commit ahead of time. LGTM on that NFC part.

clang/lib/Headers/__clang_hip_cmath.h
31

Nit: Hm, the comment talks about static but we use that either way.

clang/lib/Headers/__clang_hip_libdevice_declares.h
308

This I don't get, why is the signature different between C++ and C? That seems wrong.

clang/lib/Headers/__clang_hip_math.h
39

What is the reason you need constexpr in the first place?

51

So I fought this problem long and hard and in the end realized that this way cannot work.
If you fix the type you will always break some systems. The OpenMP declare variant provides therefore an extension that allows to "overload" on the return type. Take a look at
clang/lib/Headers/__clang_cuda_cmath.h:72 which was introduced D85879.

I don't require you to change this (for this commit or in general) but I would urge you to consider it or we will most certainly get bug reports for HIP and OpenMP offload for AMD at some point.

65

This should probably go to the declaration of the macros not their first use.

clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
48

Neither the ifdef here nor the one above are needed.

51

Please elaborate on this and clean it up if we *really* need it. Why is "optimized macro definitions" bad?

70

I think this is problematic but let's talk about this once the stuff is split off.

77

The extern C looks dubious to me. What in here needs it and why isn't it in the included file then?

clang/lib/Headers/openmp_wrappers/cmath
22

If hip cmath needs new, include it there. cmath can arguably rely on math.h but new seems to be hip specific.

79

We don't need this ifdef. Please check also other ifdefs surrounding begin/end declare variant.

90

Same as the ifdef above, not needed. match_any not needed here.

clang/lib/Headers/openmp_wrappers/complex
22

we forgot to undef CUDA here...

25

This is the wrong spot to include libdevice_declares.

48

This I do not understand.

clang/lib/Headers/openmp_wrappers/hip/hip_runtime.h
24

These pragmas are useless and, FWIW, you don't need the extension if you have a single arch here.

undefine the OPENMP_AMDGCN or make sure it is always defined, see also below.

clang/lib/Headers/openmp_wrappers/math.h
63

The extension is not necessary here.

It is dubious that we have to define things here without undefining them.
Please check if these things are set before or not. If not, define and undefine them as needed, I don't want to clutter the user code differently when math is included vs. when it's not.

This revision now requires changes to proceed.Oct 21 2020, 3:32 PM
ashi1 added a comment.Oct 26 2020, 1:15 PM

First off, this is the right direction, thanks for working on this!

Then, we need to split this. There are various things happening here and we can easily do them one by one.
Please reach out if you don't know where to split but basically try to make the patches "as small as possible" such that you write a test against them.
So for example, math and complex stuff are separate. New functions are separate, changing the attributes is separate, ...

I added a bunch of comments and tried to be exhaustive, though once we split it in the separate components I will do another review.

Finally, where are the tests? You only added the test/Inputs stuff.

Hi @jdoerfert , thank you for the review. I will discuss with the team, and start opening smaller patches one at a time.

ashi1 abandoned this revision.Oct 26 2020, 1:23 PM

Dropping this patch in favour of smaller patches.