This is an archive of the discontinued LLVM Phabricator instance.


Differential D85879

[OpenMP] Overload `std::isnan` and friends multiple times for the GPU
Closed · Public

Authored by jdoerfert on Aug 12 2020, 11:31 PM.

Details

Reviewers

JonChesterfield
jhuber6
ABataev
MaskRay
tra

Commits

rG97652202d1e6: [OpenMP] Overload `std::isnan` and friends multiple times for the GPU

Summary

std::isnan and friends can be found in two variants in the wild: one
returns bool, as the standard defines it, and one returns int, as the C
macros do. So far we hoped the system versions of these functions
would work for people, i.e., that they are definitions that can be compiled
for the target. We know that is not always the case, so we leverage the
disable_implicit_base OpenMP context extension to specialize both
versions of these functions without causing an invalid redeclaration.
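
For readers unfamiliar with the directive family the summary relies on, here is a minimal sketch of the standard, non-extension form of `declare variant`. The function names `which_impl`, `which_impl_gpu`, and `call_on_host` are hypothetical and do not come from the patch, and the patch itself uses the `begin/end declare variant` form with a Clang extension rather than this form. The sketch only shows the basic idea: a direct call to the base function is redirected to a variant when the context selector matches.

```cpp
// Variant implementation, intended for GPU devices (hypothetical name).
int which_impl_gpu() { return 1; }

// Base function. A direct call to which_impl() may be replaced by a call to
// which_impl_gpu(), but only in a context where the selector matches, i.e.
// when the call is compiled for a GPU device.
#pragma omp declare variant(which_impl_gpu) match(device = {kind(gpu)})
int which_impl() { return 0; }

// Called from ordinary host code: the selector does not match (or the pragma
// is ignored when OpenMP is disabled), so the base function runs.
int call_on_host() { return which_impl(); }
```

On the host, `call_on_host()` reaches the base function whether or not OpenMP is enabled, so host behavior is unchanged by the presence of the variant.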

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jdoerfert created this revision. · Aug 12 2020, 11:31 PM

Herald added a project: Restricted Project. · Aug 12 2020, 11:31 PM

Herald added subscribers: guansong, bollu, yaxunl.

jdoerfert requested review of this revision. · Aug 12 2020, 11:31 PM

Herald added a subscriber: sstefan1. · Aug 12 2020, 11:31 PM

Harbormaster completed remote builds in B68215: Diff 285263. · Aug 13 2020, 1:15 AM

jdoerfert added a parent revision: D85878: [OpenMP] Context selector extensions for return value overloading. · Aug 13 2020, 6:14 AM

Fix problem, add test

tra added a subscriber: tra. · Aug 13 2020, 9:31 AM

tra added inline comments.

clang/lib/Headers/__clang_cuda_cmath.h
85	If you just want to disable some existing declarations that get in the way, one way to do it would be to redeclare them with an `__attribute__((enable_if(false)))`. Having overloads with different return types will be observable.

Harbormaster completed remote builds in B68275: Diff 285385. · Aug 13 2020, 9:42 AM

jdoerfert added inline comments. · Aug 13 2020, 12:10 PM

clang/lib/Headers/__clang_cuda_cmath.h
85	We need both overloads as we don't know what return type the system uses. I modeled the test below this way, that is we don't know if `isnan` has a `bool` or `int` return type.

> Having overloads with different return types will be observable.

Unsure what observable effect you expect, the variants are there, yes, but they have different names (wrt the base function and the other variant function). The variant without a base function is simply an unused internal function. Could you elaborate what problem you expect?

(I think the tests are failing due to missing parent revisions in the build, these have to go in in-order.)

tra added inline comments. · Aug 13 2020, 1:31 PM

clang/lib/Headers/__clang_cuda_cmath.h
85	What will be the result of `sizeof(isinf(1.0f))`? I would expect it to be the same on host and on the device. I'm not quite sure what the pragma would do, so it's possible I'm barking up the wrong tree here.

jdoerfert added inline comments. · Aug 13 2020, 4:01 PM

clang/lib/Headers/__clang_cuda_cmath.h

So, I actually had to run this to verify what I suspected would happen:

`sizeof(isinf(1.0f))` usually looks like this in the AST:

|         `-UnaryExprOrTypeTraitExpr 0x5586a27bd590 <col:14, col:37> 'unsigned long' sizeof
|           `-ParenExpr 0x5586a27bd570 <col:20, col:37> 'bool'
|             `-CallExpr 0x5586a27bd548 <col:21, col:36> 'bool'
|               |-ImplicitCastExpr 0x5586a27bd530 <col:21, col:26> 'bool (*)(float)' <FunctionToPointerDecay>
|               | `-DeclRefExpr 0x5586a27bd500 <col:21, col:26> 'bool (float)' lvalue Function 0x5586a276f9f0 'isinf' 'bool (float)' non_odr_use_unevaluated
|               `-FloatingLiteral 0x5586a27bd290 <col:32> 'float' 1.000000e+00

If `isinf` has an applicable variant, it will be picked up:

|         `-UnaryExprOrTypeTraitExpr 0x55f9ac949a20 <col:14, col:37> 'unsigned long' sizeof
|           `-ParenExpr 0x55f9ac949a00 <col:20, col:37> 'bool'
|             `-PseudoObjectExpr 0x55f9ac9499e0 <col:21, col:36> 'bool'
|               |-CallExpr 0x55f9ac949978 <col:21, col:36> 'bool'
|               | |-ImplicitCastExpr 0x55f9ac949960 <col:21, col:26> 'bool (*)(float)' <FunctionToPointerDecay>
|               | | `-DeclRefExpr 0x55f9ac949930 <col:21, col:26> 'bool (float)' lvalue Function 0x55f9ac7cd1d0 'isinf' 'bool (float)' non_odr_use_unevaluated
|               | `-FloatingLiteral 0x55f9ac9496c0 <col:32> 'float' 1.000000e+00
|               `-CallExpr 0x55f9ac9499b8 </data/build/llvm-project/lib/clang/12.0.0/include/__clang_cuda_cmath.h:36:20, //data/src/llvm-project/clang/test/Headers/openmp_device_math_isnan.cpp:21:36> 'bool'
|                 |-ImplicitCastExpr 0x55f9ac9499a0 </data/build/llvm-project/lib/clang/12.0.0/include/__clang_cuda_cmath.h:36:20> 'bool (*)(float) __attribute__((nothrow))' <FunctionToPointerDecay>
|                 | `-DeclRefExpr 0x55f9ac939790 <col:20> 'bool (float) __attribute__((nothrow))' Function 0x55f9ac939690 'isinf[implementation={extension(disable_implicit_base, match_any, allow_templates)}, device={arch(nvptx, nvptx64)}]' 'bool (float@
|                 `-FloatingLiteral 0x55f9ac9496c0 <//data/src/llvm-project/clang/test/Headers/openmp_device_math_isnan.cpp:21:32> 'float' 1.000000e+00

That is the behavior I expected, as it happens for any base function call with an applicable variant.

This patch doesn't change any of this. We have two specializations that differ only in their return type, but each will only be a variant of a base function with that return type. In any context, when we have a call to the original base function, we try to specialize. Since only the `bool` return *or* the `int` return specializations are variants of the base function, we might replace the base call with a call to a variant, but consistently on host and device. I hope this makes some sense; I don't think I did a good job explaining.
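
The `sizeof(isinf(1.0f))` question can be illustrated in plain C++ without OpenMP. The sketch below is only an illustration: the namespaces `conforming_header` and `legacy_header` are hypothetical stand-ins for the two kinds of system headers discussed in this thread and do not exist in any real library. It shows that the type, and therefore the size, of the call expression is fixed by whichever declaration the header provides.

```cpp
#include <cmath>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a conforming header: returns bool.
namespace conforming_header {
inline bool isinf(float x) { return std::isinf(x); }
} // namespace conforming_header

// Hypothetical stand-in for an old, non-conforming header: returns int.
namespace legacy_header {
inline int isinf(float x) { return std::isinf(x) ? 1 : 0; }
} // namespace legacy_header

// The type of the call expression follows the declaration that is visible.
static_assert(
    std::is_same<decltype(conforming_header::isinf(1.0f)), bool>::value,
    "conforming declaration yields bool");
static_assert(std::is_same<decltype(legacy_header::isinf(1.0f)), int>::value,
              "legacy declaration yields int");

// sizeof is evaluated on the unevaluated call expression, as in the AST dumps.
constexpr std::size_t kConformingSize = sizeof(conforming_header::isinf(1.0f));
constexpr std::size_t kLegacySize = sizeof(legacy_header::isinf(1.0f));
```

Because a device-side variant is only attached to a base function with the same return type, the expression type seen by `sizeof` stays the one dictated by the system header.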

tra added inline comments. · Aug 13 2020, 4:33 PM

clang/lib/Headers/__clang_cuda_cmath.h
85	It sounds like openmp's 'variant' is more of an 'overlay' rather than a CUDA-style target overload that I was thinking of (and overloads don't allow different return types at all). If I understand you correctly, the code below allows (literally?) matching host-side function signatures. Because the functions returning bool and functions returning int can't coexist on the host, there will be no conflicts on device side either. Is that in the ballpark of what's happening? If I'm still off, could you point me to more info about how "pragma omp declare variant" works?

jdoerfert added inline comments. · Aug 13 2020, 6:03 PM

clang/lib/Headers/__clang_cuda_cmath.h
85	> It sounds like openmp's 'variant' is more of an 'overlay' rather than a CUDA-style target overload that I was thinking of (and overloads don't allow different return types at all). [...] Is that in the ballpark of what's happening?

Yep. Basically, you can provide N specializations for a function, and calls to that function are replaced by calls to a matching specialization. We also only do this for direct calls, that is, `&foo` will always give you the address of the base version, which may or may not be desirable but is certainly different from overloading. I also had to completely give up on my overloading-based implementation of declare variant :(, but the new one works really well ;)

> If I understand you correctly, the code below allows (literally?) matching host-side function signatures. Because the functions returning bool and functions returning int can't coexist on the host, there will be no conflicts on device side either.

Exactly, with the caveat mentioned here in the TODO: We mangle the variants to avoid conflicts with the base function. Since this mangling is only based on the context selector and the function name, two variants that only differ in their return type would clash. To avoid this I added a "no-op" context selector trait here that will ensure the names are different in the "overlay/variant" space.

> how "pragma omp declare variant" works?

So, this is an extension to the context selector as allowed by the standard. The latest public version is https://www.openmp.org/wp-content/uploads/openmp-TR8.pdf; `declare variant` is on page 56, Section 2.3.5. OpenMP 5.1 (Nov 2020) will have various clarifications but the principles are the same. Note that there is `declare variant` and the `begin/end` version, which behave slightly differently. I implemented all of math and complex support with the begin/end version and I believe it to be far superior anyway ;)
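
The name clash described in this reply can be modeled with a toy example. This is a sketch under loud assumptions: the `Variant` struct, the `mangle` function, and the `$variant$` separator are all invented for illustration and are not Clang's actual mangling scheme. The only property taken from the discussion is that the variant name depends on the function name and the context selector, not on the return type.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy description of a variant (hypothetical; not a Clang data structure).
struct Variant {
  std::string base_name;   // e.g. "isnan"
  std::string return_type; // deliberately ignored by the toy mangler
  std::string selector;    // textual context selector
};

// Toy mangler: the name is derived from the base name and the selector only.
std::string mangle(const Variant &v) {
  return v.base_name + "$variant$" + v.selector;
}

// Returns true if two variants in the list end up with the same mangled name.
bool has_clash(const std::vector<Variant> &variants) {
  std::set<std::string> seen;
  for (const Variant &v : variants) {
    if (!seen.insert(mangle(v)).second)
      return true;
  }
  return false;
}
```

In this model, a `bool` and an `int` variant of `isnan` under the same selector collide, while nesting one of them in an extra always-true selector such as `vendor(llvm)` changes its selector text and removes the collision.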

I think this is reasonable. It's unfortunate to have isnan return bool or int depending on the system headers, but considering we have that in a language that doesn't mangle the return type into the name the workaround seems OK.

edit: removed concerns about macro implementations of isnan as this is cmath, not math

JonChesterfield accepted this revision. · Aug 13 2020, 8:31 PM

This revision is now accepted and ready to land. · Aug 13 2020, 8:31 PM

tra accepted this revision. · Aug 14 2020, 10:38 AM

tra added inline comments.

clang/lib/Headers/__clang_cuda_cmath.h
85	Thank you for the details.

This revision was landed with ongoing or failed builds. · Sep 16 2020, 11:40 AM

Closed by commit rG97652202d1e6: [OpenMP] Overload `std::isnan` and friends multiple times for the GPU (authored by jdoerfert).

This revision was automatically updated to reflect the committed changes.

jdoerfert added a commit: rG97652202d1e6: [OpenMP] Overload `std::isnan` and friends multiple times for the GPU.

jdoerfert mentioned this in D89584: [AMDGPU][OPENMP] OpenMP AMDGCN Header Support. · Oct 21 2020, 3:32 PM

estewart08 mentioned this in D104677: [OpenMP][AMDGCN] Apply fix for isnan, isinf and isfinite for amdgcn. · Jun 21 2021, 6:22 PM

JonChesterfield mentioned this in rG5dfdc1812d9b: [OpenMP][AMDGCN] Apply fix for isnan, isinf and isfinite for amdgcn. · Jun 23 2021, 7:26 AM

Revision Contents

Path                                               Size
clang/lib/Headers/__clang_cuda_cmath.h             41 lines
clang/test/Headers/Inputs/include/cmath            5 lines
clang/test/Headers/openmp_device_math_isnan.cpp    30 lines

Diff 292292

clang/lib/Headers/__clang_cuda_cmath.h

[... 60 lines not shown ...]
 __DEVICE__ int fpclassify(double __x) {
   return __builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL,
                               FP_ZERO, __x);
 }
 __DEVICE__ float frexp(float __arg, int *__exp) {
   return ::frexpf(__arg, __exp);
 }

 // For inscrutable reasons, the CUDA headers define these functions for us on
-// Windows. For OpenMP we omit these as some old system headers have
-// non-conforming `isinf(float)` and `isnan(float)` implementations that return
-// an `int`. The system versions of these functions should be fine anyway.
-#if !defined(_MSC_VER) && !defined(__OPENMP_NVPTX__)
+// Windows.
+#if !defined(_MSC_VER) || defined(__OPENMP_NVPTX__)
+
+// For OpenMP we work around some old system headers that have non-conforming
+// `isinf(float)` and `isnan(float)` implementations that return an `int`. We do
+// this by providing two versions of these functions, differing only in the
+// return type. To avoid conflicting definitions we disable implicit base
+// function generation. That means we will end up with two specializations, one
+// per type, but only one has a base function defined by the system header.
+#if defined(__OPENMP_NVPTX__)
+#pragma omp begin declare variant match( \
+    implementation = {extension(disable_implicit_base)})
+
+// FIXME: We lack an extension to customize the mangling of the variants, e.g.,
+// add a suffix. This means we would clash with the names of the variants
+// (note that we do not create implicit base functions here). To avoid
+// this clash we add a new trait to some of them that is always true
+// (this is LLVM after all ;)). It will only influence the mangled name
+// of the variants inside the inner region and avoid the clash.
+#pragma omp begin declare variant match(implementation = {vendor(llvm)})
+
+__DEVICE__ int isinf(float __x) { return ::__isinff(__x); }
+__DEVICE__ int isinf(double __x) { return ::__isinf(__x); }
+__DEVICE__ int isfinite(float __x) { return ::__finitef(__x); }
+__DEVICE__ int isfinite(double __x) { return ::__isfinited(__x); }
+__DEVICE__ int isnan(float __x) { return ::__isnanf(__x); }
+__DEVICE__ int isnan(double __x) { return ::__isnan(__x); }
+
+#pragma omp end declare variant
+
+#endif
+
 __DEVICE__ bool isinf(float __x) { return ::__isinff(__x); }
 __DEVICE__ bool isinf(double __x) { return ::__isinf(__x); }
 __DEVICE__ bool isfinite(float __x) { return ::__finitef(__x); }
 // For inscrutable reasons, __finite(), the double-precision version of
 // __finitef, does not exist when compiling for MacOS. __isfinited is available
 // everywhere and is just as good.
 __DEVICE__ bool isfinite(double __x) { return ::__isfinited(__x); }
 __DEVICE__ bool isnan(float __x) { return ::__isnanf(__x); }
 __DEVICE__ bool isnan(double __x) { return ::__isnan(__x); }

+#if defined(__OPENMP_NVPTX__)
+#pragma omp end declare variant
+#endif
+
 #endif

 __DEVICE__ bool isgreater(float __x, float __y) {
   return __builtin_isgreater(__x, __y);
 }
 __DEVICE__ bool isgreater(double __x, double __y) {
   return __builtin_isgreater(__x, __y);
 }
[... 381 lines not shown ...]

clang/test/Headers/Inputs/include/cmath

[... 76 lines not shown ...]
 bool isinf(float);
 bool isless(double, double);
 bool islessequal(double, double);
 bool islessequal(float, float);
 bool isless(float, float);
 bool islessgreater(double, double);
 bool islessgreater(float, float);
 bool isnan(long double);
+#ifdef USE_ISNAN_WITH_INT_RETURN
+int isnan(double);
+int isnan(float);
+#else
 bool isnan(double);
 bool isnan(float);
+#endif
 bool isnormal(double);
 bool isnormal(float);
 bool isunordered(double, double);
 bool isunordered(float, float);
 double ldexp(double, int);
 float ldexp(float, int);
 double lgamma(double);
 float lgamma(float);
[... 137 lines not shown ...]

clang/test/Headers/openmp_device_math_isnan.cpp

This file was added.

// RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc
// RUN: %clang_cc1 -x c++ -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -fopenmp -triple nvptx64-nvidia-cuda -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s --check-prefix=BOOL_RETURN
// RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc -ffast-math -ffp-contract=fast
// RUN: %clang_cc1 -x c++ -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -fopenmp -triple nvptx64-nvidia-cuda -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -ffast-math -ffp-contract=fast | FileCheck %s --check-prefix=BOOL_RETURN
// RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc -DUSE_ISNAN_WITH_INT_RETURN
// RUN: %clang_cc1 -x c++ -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -fopenmp -triple nvptx64-nvidia-cuda -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -DUSE_ISNAN_WITH_INT_RETURN | FileCheck %s --check-prefix=INT_RETURN
// RUN: %clang_cc1 -x c++ -internal-isystem %S/Inputs/include -fopenmp -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc -ffast-math -ffp-contract=fast -DUSE_ISNAN_WITH_INT_RETURN
// RUN: %clang_cc1 -x c++ -include __clang_openmp_device_functions.h -internal-isystem %S/../../lib/Headers/openmp_wrappers -internal-isystem %S/Inputs/include -fopenmp -triple nvptx64-nvidia-cuda -aux-triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -ffast-math -ffp-contract=fast -DUSE_ISNAN_WITH_INT_RETURN | FileCheck %s --check-prefix=INT_RETURN
// expected-no-diagnostics

#include <cmath>

double math(float f, double d) {
  double r = 0;
  // INT_RETURN: call i32 @__nv_isnanf(float
  // BOOL_RETURN: call i32 @__nv_isnanf(float
  r += std::isnan(f);
  // INT_RETURN: call i32 @__nv_isnand(double
  // BOOL_RETURN: call i32 @__nv_isnand(double
  r += std::isnan(d);
  return r;
}

long double foo(float f, double d, long double ld) {
  double r = ld;
  r += math(f, d);
#pragma omp target map(r)
  { r += math(f, d); }
  return r;
}
