Page MenuHomePhabricator

[Clang][AMDGCN] Universal device offloading macros header
Needs ReviewPublic

Authored by saiislam on Jul 28 2020, 4:17 AM.

Details

Summary

This header creates macros _DEVICE_ARCH and _DEVICE_GPU with values. This
header exists because compiler macros are inconsistent in specifying if a
compiliation is a device pass or a host pass. There is also inconsistency in
how the device architecture and type are specified during a device pass. The
inconsistencies are between OpenMP, CUDA, HIP, and OpenCL. The macro logic
in this header is aware of these inconsistencies and sets useful values for
_DEVICE_ARCH and _DEVICE_GPU during a device compilation. The macros will
not be defined during a host compilation pass. So "#ifndef _DEVICE_ARCH" can
be used by users to imply a host compilation. This header must remain a
preprocessing header only because it is intended to be used by different
languages.

Originally authored by Greg Rodgers (@gregrodgers).

Diff Detail

Event Timeline

saiislam created this revision.Jul 28 2020, 4:17 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 28 2020, 4:17 AM
Harbormaster returned this revision to the author for changes because remote builds failed.Jul 28 2020, 4:46 AM
Harbormaster failed remote builds in B65994: Diff 281183!
saiislam updated this revision to Diff 281189.Jul 28 2020, 4:53 AM

Fixed clang-tidy warnings.

saiislam requested review of this revision.Jul 28 2020, 5:23 AM
jdoerfert added subscribers: MaskRay, tra.

I like this. I hope this is the start of splitting the __cuda headers into generic and specific code, right? @tra @MaskRay any objections on the direction?

clang/lib/Headers/offload_macros.h
23

Nit duplicate

70

I guess the pattern

#if defined(__AMDGCN__)
#define _DEVICE_ARCH amdgcn
// _DEVICE_GPU set below
#endif
#if defined(__NVPTX__)
#define _DEVICE_ARCH nvptx64
#define _DEVICE_GPU __CUDA_ARCH__
#endif

is repeating here but it might make sense to lists all the cases one by one instead of a single conditional, e.g., ifdef OPENMP || SYCL || OPENCL || ...

tra added a comment.Jul 28 2020, 11:11 AM

I'm not sure it's particularly useful, to be honest. CUDA code still needs to be compatible with NVCC so it can't be used in portable code like TF or other currently used CUDA libraries.
It could be useful internally, though, so I'm fine with it for that purpose.

clang/lib/Headers/offload_macros.h
29

The naming seems to conflict with the current notation that GPU arch is the specific GPU variant. E.g. gfx900 or sm_60.
Perhaps we should use a higher level term. kind, vendor?

34

I'd just split it into separate if sections for AMDGCN and NVPTX. One less nesting level for preprocessor conditionals would be easier to follow.

36

What exactly is amdgcn and how can it be used in practice? I.e. I can't use it in preprocessor conditionals.

I think you need to have numberic constants defined for the different ARCH variants.

38

Please add a comment tracking which conditional this else is for. E.g. // __AMDGCN__

39

Nit -- there's techically 32-bit nvptx, even though it's getting obsolete.

72

This does not work, does it? https://godbolt.org/z/Kn3r4x

In D84743#2179441, @tra wrote:

I'm not sure it's particularly useful, to be honest. CUDA code still needs to be compatible with NVCC so it can't be used in portable code like TF or other currently used CUDA libraries.
It could be useful internally, though, so I'm fine with it for that purpose.

FWIW, I was only thinking about clang/lib/Header usage. *Potentially* documented for user of clang.

In D84743#2179441, @tra wrote:

I'm not sure it's particularly useful, to be honest. CUDA code still needs to be compatible with NVCC so it can't be used in portable code like TF or other currently used CUDA libraries.
It could be useful internally, though, so I'm fine with it for that purpose.

FWIW, I was only thinking about clang/lib/Header usage. *Potentially* documented for user of clang.

Honestly I am a bit uneasy with the new clang/lib/Header file. It will be part of the clang resource directory and users on every target will be able to #include <offload_macros.h> it.
This is also a namespace pollution - used incorrectly, people can trip over it if they have files of the same name.

I think there really should be a good justification for it being being part of the resource directory and not a library, and there needs to be a specification.

In D84743#2179441, @tra wrote:

I'm not sure it's particularly useful, to be honest. CUDA code still needs to be compatible with NVCC so it can't be used in portable code like TF or other currently used CUDA libraries.
It could be useful internally, though, so I'm fine with it for that purpose.

FWIW, I was only thinking about clang/lib/Header usage. *Potentially* documented for user of clang.

Honestly I am a bit uneasy with the new clang/lib/Header file. It will be part of the clang resource directory and users on every target will be able to #include <offload_macros.h> it.
This is also a namespace pollution - used incorrectly, people can trip over it if they have files of the same name.

There are two levels to it though. 1) clang/lib/Header, and 2) clang/lib/Header/XXXXXX. We do not expose XXXXX to every user on every target, so when we do we really want the headers to be first. So for 2) the names should match existing system headers we want to wrap. For 1) the header names should be "in the compiler namespace". That said, the file above is not and I didn't notice before. The existing CUDA overloads that live in 1) start with __cuda, which should be sufficient for users not to trip over them. I mean, they could trip over a lot of things that starts with __. I imagine we have a __gpu_... set of header soon to avoid duplicating or polluting the __cuda ones (more). Now that I finished all this, also the XXXXXX above needs to be renamed into __XXXXX, but that seems easy enough to do.

jdoerfert added inline comments.Jul 28 2020, 11:23 PM
clang/lib/Headers/offload_macros.h
2

After @MaskRay noticed this, I think this should be __offload_macros.h to make it clear this is an internal header.

We probably do want a macro to indicate 'compiling for amdgcn as the device half of a combined host+device language'. I'm having a tough time with the control flow in this header so we probably want tests to check the overall behaviour is as intended. E.g. static assert + various language modes.

The header should be obviously implemention only so we can change it later. Maybe also provide an unset header and keep them out of application scope entirely to begin with. That's the advantage over the otherwise simpler design of clang always setting them.

This is all excellent feedback. Thank you.
I don't understand what I see on the godbolt link. So far, we have only tested with clang. We will test with gcc to understand the fail.
I will make the change to use numeric values for _DEVICE_ARCH and change "UNKNOWN" to some integer (maybe -1). The value _DEVICE_GPU is intended to be generational within a specific _DEVICE_ARCH.
To be clear, this is primarily for users or library writers to implement device specific or host-only code. This is not something that should be automatic. Users or library authors will opt-in with their own include. So maybe it does not belong in clang/lib/Headers.
As noted in the header comment, I expect the community to help keep this up to date as offloading technology evolves.

tra added a comment.Jul 29 2020, 10:12 AM

Also, do we need the header at all?
It would be much easier to just get clang itself to add normalized macros without trying to reconstruct them from the existing macros.