This is an archive of the discontinued LLVM Phabricator instance.

[Clang][OpenMP] Include header for CUDA builtin vars into OpenMP wrapper header
AbandonedPublic

Authored by tianshilei1992 on Jan 17 2021, 11:50 AM.

Details

Reviewers
jdoerfert
tra
Summary

Current OpenMP wrapper header __clang_openmp_device_functions.h
doesn't include the header for CUDA builtin vars, so variable like threadIdx
cannot be used in OpenMP code, even within declare target.

This patch includes the header. One thing left is, is it fine that we still use
the name __clang_openmp_device_functions.h? Those builtin vars seems like not
part of "device functions".

Diff Detail

Event Timeline

tianshilei1992 requested review of this revision.Jan 17 2021, 11:50 AM
Herald added a project: Restricted Project. · View Herald Transcript

I can see that we want this but I guess the errors show the problem, replace __attribute((device)) with DEVICE and define it based on CUDA vs OpenMP properly.

clang/lib/Headers/__clang_cuda_builtin_vars.h
119

Do we need to do something wrt to the attribute here and above?

Use macro DEVICE to control the attribute

C is still broken as the header is written in C++ where a struct can have member functions, either static or not. Need to fix it.

Fixed compilation in C mode

clang/lib/Headers/__clang_cuda_builtin_vars.h
64–65

How can we deal with the conversion in C?

jdoerfert added inline comments.Jan 18 2021, 2:28 PM
clang/lib/Headers/__clang_cuda_builtin_vars.h
45

Why do we need these fetch_builtins (for C). They have a different name than in the C++ variant anyway. In C we could just not define the fetch stuff but only the Field. So gridDim.x will work fine.

64–65

We don't. Given that CUDA is C++ we can just limit us to the C subset.

Removed code for C that are still not supported

tianshilei1992 marked 3 inline comments as done.Jan 18 2021, 7:31 PM
tianshilei1992 added inline comments.
clang/lib/Headers/__clang_cuda_builtin_vars.h
45

__declspec(property) doesn't support C so that code were removed.

tianshilei1992 marked an inline comment as done.Jan 18 2021, 7:31 PM
tra added inline comments.Jan 19 2021, 10:54 AM
clang/lib/Headers/__clang_cuda_builtin_vars.h
30

Perhaps we should move all C++-related code under #ifdef __cplusplus intead of cherry-picking them all one by one and let the compilation fail if some C code references builtin vars.

43

Can we generate a sensible error instead?
I'd rather fail in an obvious way during compilation than compile successfully into somethings that will not do what the user expected.

Refined the patch to make it only work in C++ mode. In C mode everything is unchanged.

tianshilei1992 marked 2 inline comments as done.Jan 19 2021, 12:14 PM
tianshilei1992 added inline comments.
clang/lib/Headers/__clang_cuda_builtin_vars.h
30

Finally I decided to only include the header in C++ mode. For C mode, everything is unchanged, which means compilation error will still be emitted because the variables are not defined.

tianshilei1992 marked an inline comment as done.Jan 19 2021, 12:14 PM
tra added inline comments.Jan 19 2021, 12:43 PM
clang/lib/Headers/__clang_cuda_builtin_vars.h
31

You should use __ prefix to avoid unintentional clashes with user-defined macros.
__DEVICE__ ?

124–125

You need to #undef the DEVICE macro here.

Renamed the macro and undef it after use

tianshilei1992 marked 2 inline comments as done.Jan 19 2021, 12:51 PM
tra accepted this revision.Jan 19 2021, 1:24 PM

LGTM for __clang_cuda_builtin_vars.h.

This revision is now accepted and ready to land.Jan 19 2021, 1:24 PM

I don't think introducing everything from the cuda namespace into openmp nvptx offloading is a feature. Inevitably people will call threadIdx.x instead of the openmp or clang equivalent, and this will mask missing functionality in openmp.

I won't object too strongly, as ultimately I don't care about cuda, but I view intertwining the two implementations as technical debt.

Note that a cuda-free openmp devicertl compilation doesn't require this patch, or any other pieces of cuda headers.

Note that a cuda-free openmp devicertl compilation doesn't require this patch, or any other pieces of cuda headers.

Yes, we can write all code with CUDA built-in or others in LLVM intrinsics. Then the question is, do we want to use existing wrapper headers or all LLVM instrinsics?

tra added a comment.Jan 19 2021, 2:15 PM

I won't object too strongly, as ultimately I don't care about cuda, but I view intertwining the two implementations as technical debt.

+1 A lot of CUDA-releated headers are not intended to be reusable nor often the best way to implement particular functionality.
While it may be tempting to reuse existing CUDA bits, having a clean implementation not tied to CUDA SDK is usually better.

That said, the builtin vars header is relatively independent from the CUDA SDK (unlike the runtime wrapper and various math-related headers) and is OK to use for other purposes, if needed.

Whether OpenMP should make threadIdx & friends available -- I have no opinion on.

tianshilei1992 abandoned this revision.Jan 19 2021, 6:15 PM

Let's use LLVM intrinsics for now.