This is an archive of the discontinued LLVM Phabricator instance.

Differential D84743

[Clang][AMDGCN] Universal device offloading macros header
Needs Review · Public

Authored by saiislam on Jul 28 2020, 4:17 AM.

Details

Reviewers

jdoerfert
ABataev
JonChesterfield
tra
MaskRay
jhuber6
arsenm

Summary

This header creates macros _DEVICE_ARCH and _DEVICE_GPU with values. This
header exists because compiler macros are inconsistent in specifying if a
compilation is a device pass or a host pass. There is also inconsistency in
how the device architecture and type are specified during a device pass. The
inconsistencies are between OpenMP, CUDA, HIP, and OpenCL. The macro logic
in this header is aware of these inconsistencies and sets useful values for
_DEVICE_ARCH and _DEVICE_GPU during a device compilation. The macros will
not be defined during a host compilation pass. So "#ifndef _DEVICE_ARCH" can
be used by users to imply a host compilation. This header must remain a
preprocessing header only because it is intended to be used by different
languages.
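
As a rough illustration of the intended usage described in the summary, the following sketch selects host-only or device-only code with `#ifndef _DEVICE_ARCH`. It assumes the header from this revision is reachable as `<offload_macros.h>` and guards the include so the example still compiles where it is not installed; `run_location()` is a hypothetical helper, not part of the patch.

```c
/* Sketch only: assumes the header proposed in this revision is installed as
 * <offload_macros.h>. The include is guarded so the example also builds
 * on toolchains that do not ship the header. */
#if defined(__has_include)
#if __has_include(<offload_macros.h>)
#include <offload_macros.h>
#endif
#endif

/* Hypothetical helper: reports which kind of compilation pass built it. */
static const char *run_location(void) {
#ifndef _DEVICE_ARCH
  /* Per the summary, the macro is undefined during a host pass. */
  return "host";
#else
  /* Defined only during a device pass. */
  return "device";
#endif
}
```

On an ordinary host-only build the function returns "host", because neither macro is defined.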

Originally authored by Greg Rodgers (@gregrodgers).

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

saiislam created this revision. Jul 28 2020, 4:17 AM

Herald added a project: Restricted Project. Jul 28 2020, 4:17 AM

Herald added subscribers: cfe-commits, Anastasia.

Harbormaster returned this revision to the author for changes because remote builds failed. Jul 28 2020, 4:46 AM

Harbormaster failed remote builds in B65994: Diff 281183!

Fixed clang-tidy warnings.

Harbormaster completed remote builds in B65998: Diff 281189. Jul 28 2020, 5:23 AM

saiislam requested review of this revision. Jul 28 2020, 5:23 AM

Herald added a subscriber: sstefan1. Jul 28 2020, 5:23 AM

Herald added a subscriber: wdng. Jul 28 2020, 5:32 AM

I like this. I hope this is the start of splitting the __cuda headers into generic and specific code, right? @tra @MaskRay any objections on the direction?

clang/lib/Headers/offload_macros.h
23	Nit duplicate
70	I guess the pattern
	#if defined(__AMDGCN__)
	#define _DEVICE_ARCH amdgcn
	// _DEVICE_GPU set below
	#endif
	#if defined(__NVPTX__)
	#define _DEVICE_ARCH nvptx64
	#define _DEVICE_GPU __CUDA_ARCH__
	#endif
	is repeating here, but it might make sense to list all the cases one by one instead of a single conditional, e.g., `ifdef OPENMP || SYCL || OPENCL || ...`
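
One possible way to avoid repeating that pattern, sketched here under assumptions rather than taken from the patch: first decide whether this is a device pass, then pick the architecture once in flat, un-nested sections. The `DEMO_*` names and the numeric values 1 and 2 are hypothetical, and the sketch is not behaviour-identical to the diff (for example, it does not default SYCL to NVPTX when `__AMDGCN__` is absent).

```c
/* Step 1: is this a device pass? The per-language quirks live only here. */
#if defined(_OPENMP)
/* OpenMP: architecture macros are only set on the device pass. */
#if defined(__AMDGCN__) || defined(__NVPTX__)
#define DEMO_DEVICE_PASS 1
#endif
#elif defined(__CUDA_ARCH__) || defined(__HIP_DEVICE_COMPILE__) || \
    defined(__SYCL_DEVICE_ONLY__)
#define DEMO_DEVICE_PASS 1
#elif defined(__OPENCL_C_VERSION__) || defined(__OPENCL_CPP_VERSION__)
#define DEMO_DEVICE_PASS 1
#endif

/* Step 2: pick the architecture once, in separate flat sections. */
#if defined(DEMO_DEVICE_PASS) && defined(__AMDGCN__)
#define DEMO_DEVICE_KIND 1 /* hypothetical numeric id for amdgcn */
#endif
#if defined(DEMO_DEVICE_PASS) && defined(__NVPTX__)
#define DEMO_DEVICE_KIND 2 /* hypothetical numeric id for nvptx */
#endif

/* Hypothetical helper: 0 means "host pass or unrecognised device". */
static int demo_device_kind(void) {
#ifdef DEMO_DEVICE_KIND
  return DEMO_DEVICE_KIND;
#else
  return 0;
#endif
}
```

Keeping detection and selection apart means a new language mode touches only step 1 and a new architecture touches only step 2.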

I'm not sure it's particularly useful, to be honest. CUDA code still needs to be compatible with NVCC so it can't be used in portable code like TF or other currently used CUDA libraries.
It could be useful internally, though, so I'm fine with it for that purpose.

clang/lib/Headers/offload_macros.h
29	The naming seems to conflict with the current notation that GPU `arch` is the specific GPU variant. E.g. `gfx900` or `sm_60`. Perhaps we should use a higher level term. `kind`, `vendor`?
34	I'd just split it into separate `if` sections for AMDGCN and NVPTX. One less nesting level for preprocessor conditionals would be easier to follow.
36	What exactly is `amdgcn` and how can it be used in practice? I.e. I can't use it in preprocessor conditionals. I think you need to have numeric constants defined for the different `ARCH` variants.
38	Please add a comment tracking which conditional this `else` is for. E.g. `// __AMDGCN__`
39	Nit -- there's technically 32-bit nvptx, even though it's getting obsolete.
72	This does not work, does it? https://godbolt.org/z/Kn3r4x
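
A likely reason the check on line 72 misbehaves, stated as an assumption rather than a reading of the godbolt output: inside `#if`, any identifier that is not a macro is replaced by 0, so comparing bare tokens such as `amdgcn` and `nvptx64` compares 0 with 0. The sketch below demonstrates this with hypothetical `DEMO_*` names and contrasts it with the numeric constants suggested for line 36.

```c
/* Hypothetical numeric identifiers; names and values are assumptions. */
#define DEMO_ARCH_AMDGCN 1
#define DEMO_ARCH_NVPTX 2

/* Pretend a device pass selected NVPTX using a bare token, as in the diff. */
#define DEMO_TOKEN_ARCH nvptx64
/* The same selection expressed with a numeric constant. */
#define DEMO_NUMERIC_ARCH DEMO_ARCH_NVPTX

/* Neither nvptx64 nor amdgcn is a macro, so both become 0 in #if and
 * the comparison is 0 == 0: the AMDGCN branch is taken for NVPTX. */
#if DEMO_TOKEN_ARCH == amdgcn
#define DEMO_TOKEN_MATCHES_AMDGCN 1
#else
#define DEMO_TOKEN_MATCHES_AMDGCN 0
#endif

/* With numeric constants the comparison is 2 == 1, which is false. */
#if DEMO_NUMERIC_ARCH == DEMO_ARCH_AMDGCN
#define DEMO_NUMERIC_MATCHES_AMDGCN 1
#else
#define DEMO_NUMERIC_MATCHES_AMDGCN 0
#endif
```

Compilers can flag the first form with a warning such as `-Wundef`, which may be worth enabling in any test for the header.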

In D84743#2179441, @tra wrote:

I'm not sure it's particularly useful, to be honest. CUDA code still needs to be compatible with NVCC so it can't be used in portable code like TF or other currently used CUDA libraries.
It could be useful internally, though, so I'm fine with it for that purpose.

FWIW, I was only thinking about clang/lib/Header usage. *Potentially* documented for user of clang.

In D84743#2181031, @jdoerfert wrote:

In D84743#2179441, @tra wrote:

I'm not sure it's particularly useful, to be honest. CUDA code still needs to be compatible with NVCC so it can't be used in portable code like TF or other currently used CUDA libraries.
It could be useful internally, though, so I'm fine with it for that purpose.

FWIW, I was only thinking about clang/lib/Header usage. *Potentially* documented for user of clang.

Honestly I am a bit uneasy with the new clang/lib/Header file. It will be part of the clang resource directory and users on every target will be able to #include <offload_macros.h>.
This is also a namespace pollution - used incorrectly, people can trip over it if they have files of the same name.

I think there really should be a good justification for it being part of the resource directory and not a library, and there needs to be a specification.

In D84743#2181044, @MaskRay wrote:

In D84743#2181031, @jdoerfert wrote:

In D84743#2179441, @tra wrote:

I'm not sure it's particularly useful, to be honest. CUDA code still needs to be compatible with NVCC so it can't be used in portable code like TF or other currently used CUDA libraries.
It could be useful internally, though, so I'm fine with it for that purpose.

FWIW, I was only thinking about clang/lib/Header usage. *Potentially* documented for user of clang.

Honestly I am a bit uneasy with the new clang/lib/Header file. It will be part of the clang resource directory and users on every target will be able to #include <offload_macros.h>.
This is also a namespace pollution - used incorrectly, people can trip over it if they have files of the same name.

There are two levels to it though: 1) clang/lib/Header, and 2) clang/lib/Header/XXXXXX. We do not expose XXXXXX to every user on every target, so when we do, we really want the headers to be first. So for 2) the names should match existing system headers we want to wrap. For 1) the header names should be "in the compiler namespace". That said, the file above is not, and I didn't notice before. The existing CUDA overloads that live in 1) start with __cuda, which should be sufficient for users not to trip over them. I mean, they could trip over a lot of things that start with __. I imagine we will have a __gpu_... set of headers soon to avoid duplicating or polluting the __cuda ones (more). Now that I finished all this, also the XXXXXX above needs to be renamed into __XXXXXX, but that seems easy enough to do.

jdoerfert added inline comments. Jul 28 2020, 11:23 PM

clang/lib/Headers/offload_macros.h
2	After @MaskRay noticed this, I think this should be `__offload_macros.h` to make it clear this is an internal header.

We probably do want a macro to indicate 'compiling for amdgcn as the device half of a combined host+device language'. I'm having a tough time with the control flow in this header so we probably want tests to check the overall behaviour is as intended. E.g. static assert + various language modes.

The header should be obviously implementation-only so we can change it later. Maybe also provide an unset header and keep them out of application scope entirely to begin with. That's the advantage over the otherwise simpler design of clang always setting them.
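
A test along the lines suggested above might look like the following sketch, compiled once per language mode. The invariants it checks (both macros defined together, a positive integer `_DEVICE_GPU` on a device pass) are assumptions about the intended behaviour, not a specification, and `DEMO_IS_DEVICE_PASS` is a hypothetical name.

```c
/* Sketch of a test translation unit. Assumes the header is reachable as
 * <offload_macros.h>; the guard keeps the file compiling without it. */
#if defined(__has_include)
#if __has_include(<offload_macros.h>)
#include <offload_macros.h>
#endif
#endif

/* Assumed invariant: the two macros are defined together or not at all. */
#if defined(_DEVICE_ARCH) && !defined(_DEVICE_GPU)
#error "_DEVICE_ARCH is defined but _DEVICE_GPU is not"
#endif
#if !defined(_DEVICE_ARCH) && defined(_DEVICE_GPU)
#error "_DEVICE_GPU is defined but _DEVICE_ARCH is not"
#endif

#if defined(_DEVICE_ARCH)
#define DEMO_IS_DEVICE_PASS 1
/* C11 static assertion; a non-numeric value such as UNKNOWN fails to
 * compile here, which is the point of the check. */
_Static_assert(_DEVICE_GPU > 0, "_DEVICE_GPU should be a positive integer");
#else
#define DEMO_IS_DEVICE_PASS 0
#endif
```

An "unset header" in the same spirit would presumably contain little more than `#undef _DEVICE_ARCH` and `#undef _DEVICE_GPU`, included at the end of any internal header that used the macros.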

This is all excellent feedback. Thank you.
I don't understand what I see on the godbolt link. So far, we have only tested with clang. We will test with gcc to understand the failure.
I will make the change to use numeric values for _DEVICE_ARCH and change "UNKNOWN" to some integer (maybe -1). The value _DEVICE_GPU is intended to be generational within a specific _DEVICE_ARCH.
To be clear, this is primarily for users or library writers to implement device specific or host-only code. This is not something that should be automatic. Users or library authors will opt-in with their own include. So maybe it does not belong in clang/lib/Headers.
As noted in the header comment, I expect the community to help keep this up to date as offloading technology evolves.
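
To illustrate what numeric, generational values would allow, here is a small sketch. The `DEMO_*` names and the architecture ids are hypothetical; 9060 is the value the diff assigns to gfx906 and -1 is the "unknown" value floated above.

```c
/* Hypothetical numeric architecture ids (names and values are assumptions). */
#define DEMO_DEVICE_ARCH_AMDGCN 1
#define DEMO_DEVICE_ARCH_NVPTX 2
#define DEMO_DEVICE_GPU_UNKNOWN (-1)

/* Simulate what a device pass for gfx906 might define. */
#define DEMO_DEVICE_ARCH DEMO_DEVICE_ARCH_AMDGCN
#define DEMO_DEVICE_GPU 9060

/* Generational check: take this path on gfx900 and anything newer. */
#if DEMO_DEVICE_ARCH == DEMO_DEVICE_ARCH_AMDGCN && DEMO_DEVICE_GPU >= 9000
#define DEMO_USE_GFX9_PATH 1
#else
#define DEMO_USE_GFX9_PATH 0
#endif

/* A negative sentinel never satisfies a ">= generation" test by accident. */
#if DEMO_DEVICE_GPU == DEMO_DEVICE_GPU_UNKNOWN
#define DEMO_GPU_IS_KNOWN 0
#else
#define DEMO_GPU_IS_KNOWN 1
#endif
```

Because both macros expand to integers, the comparisons work in `#if` directives as well as in ordinary expressions.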

Also, do we need the header at all?
It would be much easier to just get clang itself to add normalized macros without trying to reconstruct them from the existing macros.

Is this still relevant?

Herald added a project: Restricted Project. Nov 11 2022, 3:45 PM

Herald added subscribers: kosarev, StephenFan.

arsenm added a reviewer: jhuber6. Nov 16 2022, 3:47 PM

This might be useful in the context of generating multi-architecture libraries when we start writing libc and libc++ functionality, although I can't name any use cases for certain right now. However, shouldn't we just be able to define these in clang, similar to __NVPTX__ and __AMDGPU__? We already have __CUDA_ARCH__, don't we?

arsenm resigned from this revision. Nov 18 2022, 4:07 PM

Revision Contents

Path: clang/lib/Headers/offload_macros.h
Size: 118 lines

Diff 281183

clang/lib/Headers/offload_macros.h

This file was added.

				//===--- offload_macros.h - Universal _DEVICE Offloading Macros Header ---===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===-----------------------------------------------------------------------===
				//
				// This header creates macros _DEVICE_ARCH and _DEVICE_GPU with values. This
				// header exists because compiler macros are inconsistent in specifying if a
				// compilation is a device pass or a host pass. There is also inconsistency in
				// how the device architecture and type are specified during a device pass. The
				// inconsistencies are between OpenMP, CUDA, HIP, and OpenCL. The macro logic
				// in this header is aware of these inconsistencies and sets useful values for
				// _DEVICE_ARCH and _DEVICE_GPU during a device compilation. The macros will
				// not be defined during a host compilation pass. So "#ifndef _DEVICE_ARCH" can
				// be used by users to imply a host compilation. This header must remain a
				// preprocessing header only because it is intended to be used by different
				// languages.
				//
				//===----------------------------------------------------------------------===//
				//===----------------------------------------------------------------------===//

				#ifndef __OFFLOAD_MACROS_H__
				#define __OFFLOAD_MACROS_H__

				#undef _DEVICE_GPU
				#undef _DEVICE_ARCH

				#if defined(_OPENMP)
				// OpenMP does not set architecture macros on host pass.
				// So if either set, this is an OpenMP device pass.
				#if defined(__AMDGCN__) || defined(__NVPTX__)
				#if defined(__AMDGCN__)
				#define _DEVICE_ARCH amdgcn
				// _DEVICE_GPU set below
				#else
				#define _DEVICE_ARCH nvptx64
				#define _DEVICE_GPU __CUDA_ARCH__
				#endif
				#endif
				#elif defined(__CUDA_ARCH__)
				// CUDA sets macros __NVPTX__ on host pass. So use __CUDA_ARCH__
				// to determine if this is device pass.
				#define _DEVICE_ARCH nvptx64
				#define _DEVICE_GPU __CUDA_ARCH__
				#elif defined(__HIP_DEVICE_COMPILE__)
				// HIP sets macros __AMDGCN__ on host pass. So use __HIP_DEVICE_COMPILE__
				// to determine if this is device pass.
				#define _DEVICE_ARCH amdgcn
				// _DEVICE_GPU set below
				#elif defined(__SYCL_DEVICE_ONLY__)
				#if defined(__AMDGCN__)
				#define _DEVICE_ARCH amdgcn
				// _DEVICE_GPU set below
				#else
				#define _DEVICE_ARCH nvptx64
				#define _DEVICE_GPU __CUDA_ARCH__
				#endif
				#elif defined(__OPENCL_C_VERSION__) || defined(__OPENCL_CPP_VERSION__)
				#if defined(__AMDGCN__)
				#define _DEVICE_ARCH amdgcn
				// _DEVICE_GPU set below
				#endif
				#if defined(__NVPTX__)
				#define _DEVICE_ARCH nvptx64
				#define _DEVICE_GPU __CUDA_ARCH__
				#endif
				#endif

				#if defined(_DEVICE_ARCH) && (_DEVICE_ARCH == amdgcn)
				// AMD uses binary macros only, so create a value for _DEVICE_GPU
				#if defined(__gfx906__)
				#define _DEVICE_GPU 9060
				#elif defined(__gfx900__)
				#define _DEVICE_GPU 9000
				#elif defined(__gfx601__)
				#define _DEVICE_GPU 6010
				#elif defined(__gfx700__)
				#define _DEVICE_GPU 7000
				#elif defined(__gfx701__)
				#define _DEVICE_GPU 7010
				#elif defined(__gfx702__)
				#define _DEVICE_GPU 7020
				#elif defined(__gfx703__)
				#define _DEVICE_GPU 7030
				#elif defined(__gfx801__)
				#define _DEVICE_GPU 8010
				#elif defined(__gfx802__)
				#define _DEVICE_GPU 8020
				#elif defined(__gfx803__)
				#define _DEVICE_GPU 8030
				#elif defined(__gfx810__)
				#define _DEVICE_GPU 8100
				#elif defined(__gfx900__)
				#define _DEVICE_GPU 9000
				#elif defined(__gfx902__)
				#define _DEVICE_GPU 9020
				#elif defined(__gfx904__)
				#define _DEVICE_GPU 9040
				#elif defined(__gfx906__)
				#define _DEVICE_GPU 9060
				#elif defined(__gfx909__)
				#define _DEVICE_GPU 9090
				#elif defined(__gfx1010__)
				#define _DEVICE_GPU 10100
				#elif defined(__gfx1011__)
				#define _DEVICE_GPU 10110
				#elif defined(__gfx1012__)
				#define _DEVICE_GPU 10120
				#elif defined(__gfx1030__)
				#define _DEVICE_GPU 10300
				#else
				#define _DEVICE_GPU UNKNOWN
				#endif
				#endif

				#endif // __OFFLOAD_MACROS_H__