This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][CUDA] Fix std::complex in GPU regions
ClosedPublic

Authored by jdoerfert on Jul 10 2020, 3:05 PM.

Details

Summary

The old way worked to some degree in C++ mode, but in C mode we actually
tried to introduce variants of macros (e.g., isinf). To make both modes
work reliably, we get rid of those extra variants and directly use NVIDIA
intrinsics in the complex implementation. While this will have to be
revisited as we add other GPU targets that want to reuse the code, it
should be fine for now.

Diff Detail

Event Timeline

jdoerfert created this revision. Jul 10 2020, 3:05 PM
Herald added a project: Restricted Project. Jul 10 2020, 3:05 PM
JonChesterfield accepted this revision. Jul 10 2020, 4:27 PM

Fine by me. Let's get nvptx working properly in tree now and work out how to wire up amdgcn subsequently. I'm sure a reasonable abstraction will present itself.

This revision is now accepted and ready to land. Jul 10 2020, 4:27 PM
tra added a comment. Jul 10 2020, 4:47 PM

Fine by me. Let's get nvptx working properly in tree now and work out how to wire up amdgcn subsequently. I'm sure a reasonable abstraction will present itself.

I'm missing something -- what was wrong with the changes in D80897 ?
AMD's HIP compilation already piggy-backs on using clang's C++ wrappers, so this change will likely break them now and I'll be the first in line to revert the change.

@yaxunl -- Sam, does this change affect HIP compilation? If it does, perhaps we should keep C++-based macro definitions around.

In D83591#2145411, @tra wrote:

Fine by me. Let's get nvptx working properly in tree now and work out how to wire up amdgcn subsequently. I'm sure a reasonable abstraction will present itself.

I'm missing something -- what was wrong with the changes in D80897 ?

It doesn't work for OpenMP. The problem is that we overload some of the math functions fine, e.g., sin(float), but not the template ones. So when the code below calls copysign(int, double) (or something similar), the OpenMP target variant overload is missing. I have template overload support locally, but it needs tests and there is one issue I've seen. This was supposed to be a stopgap, as it unblocks the OpenMP mode.

AMD's HIP compilation already piggy-backs on using clang's C++ wrappers, so this change will likely break them now and I'll be the first in line to revert the change.

I did not know they are using the __clang_cuda headers. (Side note: we should rename them then.)

@yaxunl -- Sam, does this change affect HIP compilation? If it does, perhaps we should keep C++-based macro definitions around.

Sure, I can do this only in OpenMP mode and keep the proper C++ std functions in C++ mode. Does that sound good?

jdoerfert updated this revision to Diff 277173. Jul 10 2020, 4:58 PM

Keep the std:: functions in non-OpenMP mode

I did not know they are using the __clang_cuda headers. (Side note: we should rename them then.)

I also did not know that. I am repeatedly caught out by things named 'cuda', 'nvptx' or '__nv' being used by amdgpu.

Perhaps we should refactor the __clang_cuda_* headers to make the distinctions between cuda, hip, openmp-nvptx, openmp-amdgcn clear(er).

tra added a comment. Jul 10 2020, 6:11 PM

I did not know they are using __clang_cuda headers. (Site note, we should rename them then.)

I also did not know that. I am repeatedly caught out by things named 'cuda', 'nvptx' or '__nv' being used by amdgpu.

It's complicated. :-)

Originally, clang's headers were written to tactically fill in the gaps in the CUDA SDK headers that clang could not deal with.
OpenMP grew NVPTX back-end support and wanted to use a subset of those headers that happened to be conveniently close to math.h.
AMD's HIP shares a C++ front end with CUDA and wants to benefit from the standard library glue we've implemented for CUDA. It also uses a lot of things internally that were originally targeting CUDA but are now reused for HIP as well. Hence there are a number of places where 'cuda' things do double duty during HIP compilation. So do some of the CUDA-related headers.

Perhaps we should refactor the __clang_cuda_* headers to make the distinctions between cuda, hip, openmp-nvptx, openmp-amdgcn clear(er).

Agreed.

Balancing OpenMP and CUDA constraints was interesting. With HIP in the picture, it will be even more so. TBH at the moment I do not see a clean way to satisfy all users of GPU-related things in clang. That said, now may be a good time to deal with this. AMD has made a lot of progress making clang work for targeting AMD GPUs and it will likely see a lot more use relatively soon. We do want to keep things working for all parties involved.

tra accepted this revision. Jul 10 2020, 6:12 PM

LGTM.

yaxunl accepted this revision. Jul 10 2020, 9:13 PM

LGTM. This fixes the regression caused by the previous change. Thanks.

Thx for the reviews!

FWIW, OpenMP should eventually be able to use the C/C++ standard functions/macros for this. Getting the overloads right without type-system support is tricky, though, and I need more time...

On a separate note, we should pool our resources to get more "GPU-compatible" generic headers in. At the end of the day, what we need for CUDA/HIP/OpenMP on NVIDIA/AMD/... hardware should be very similar, assuming we make some design choices right ;)

This revision was automatically updated to reflect the committed changes.