This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
clang/
-
lib/Headers/
-
Headers/
-
CMakeLists.txt
4/8
__clang_cuda_complex_builtins.h
-
openmp_wrappers/
-
complex
-
test/Headers/
-
Headers/
-
Inputs/include/
-
include/
1
complex
-
cstdlib
-
math.h
-
nvptx_device_math_complex.cpp

Differential D80897

[OpenMP] Initial support for std::complex in target regions
ClosedPublic

Authored by jdoerfert on May 31 2020, 1:08 PM.

Download Raw Diff

Details

Reviewers

tra
hfinkel
ABataev
JonChesterfield

Commits

rGd999cbc98832: [OpenMP] Initial support for std::complex in target regions

Summary

This simply follows the scheme we have for other wrappers. It resolves
the current link problem, e.g., __muldc3 not found, when std::complex
operations are used on a device.

In "CUDA mode" this should allow simple complex operations to work in
target regions. Normal mode doesn't work because the globalization in
the std::complex operators is somehow broken. This will most likely not
allow complex make math function calls to work properly, e.g., sin, but
that is more complex (pan intended) anyway.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit TestsFailed

	Time	Test
	30 ms	Clang.Headers::Unknown Unit Message ("")
	50 ms	Clang.Headers::Unknown Unit Message ("")
	30 ms	Clang.Headers::Unknown Unit Message ("")
	60 ms	Clang.Headers::Unknown Unit Message ("")
	70 ms	Clang.Headers::Unknown Unit Message ("")
		View Full Test Results (8 Failed)

Event Timeline

jdoerfert created this revision.May 31 2020, 1:08 PM

Herald added a project: Restricted Project. · View Herald TranscriptMay 31 2020, 1:08 PM

Herald added subscribers: sstefan1, guansong, bollu and 2 others. · View Herald Transcript

Harbormaster failed remote builds in B58576: Diff 267521!May 31 2020, 1:34 PM

Fix tests, add C support

Harbormaster failed remote builds in B58580: Diff 267531!May 31 2020, 4:30 PM

Hmm. I'm pretty sure tensorflow is using std::complex for various types. I'm surprised that we haven't seen these functions missing.
Plain CUDA (e.g. https://godbolt.org/z/Us6oXC) code appears to have no references to __mul* or __div*, at least for optimized builds, but they do popup in unoptimized ones. Curiously enough, unoptimized code compiled with -stdlib=libc++ --std=c++11 does not need the soft-float functions. That would explain why we don't see the build breaks.

These differences suggest that these changes may need to be more nuanced with regard to the standard c++ library version and, possibly, the C++ standard used.
If possible, I would prefer to limit interference with the standard libraries only to the cases where it's necessary.

clang/lib/Headers/__clang_cuda_complex_builtins.h
29	Nit: this creates impression that we fall back on `double` variant of the function, while in reality we'll end up using `std::isnan<float>`. Perhaps it would be better to use fully specialized function template name in all these macros. It would also avoid potential issues if someone/somewhere adds other overloads. E.g. we may end up facing `std::complex<half>` which may overload resolution ambiguous in some cases.
63	Soft-float library has bunch of other functions. https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html I wonder why only the complex variants of the soft-float support functions are missing. Does it mean that x86 code also does rely on the library to do complex multiplication? If x86 can do complex ops, why can't nvptx? If x86 can't, would make sense to teach it?

In D80897#2066723, @tra wrote:

Hmm. I'm pretty sure tensorflow is using std::complex for various types. I'm surprised that we haven't seen these functions missing.

Which functions and missing from where? In CUDA-mode we did provide __XXXXc3 already.

Plain CUDA (e.g. https://godbolt.org/z/Us6oXC) code appears to have no references to __mul* or __div*, at least for optimized builds, but they do popup in unoptimized ones. Curiously enough, unoptimized code compiled with -stdlib=libc++ --std=c++11 does not need the soft-float functions. That would explain why we don't see the build breaks.

Its not that simple, and tbh, I don't have the full picture yet. Plain (clang) CUDA uses these functions (https://godbolt.org/z/dp_FY2), they just disappear after inlining because of the linkage. If you however enable -fast-math they are not used (https://godbolt.org/z/_N-STh). I couldn't run with stdlib=libc++ locally and godbold cuts of the output so I'm not sure if they are used and inlined or not used.

These differences suggest that these changes may need to be more nuanced with regard to the standard c++ library version and, possibly, the C++ standard used.
If possible, I would prefer to limit interference with the standard libraries only to the cases where it's necessary.

The way I understand this is that we can always provide correct weak versions of __XXXXc3 without any correctness issues. They will be stripped if they are not needed anyway. That said, this patch should not modify the CUDA behavior (except minor float vs double corrections in the __XXXXc3 methods). Could you elaborate what interference you expect?

clang/lib/Headers/__clang_cuda_complex_builtins.h
29	No problem. I'll just use std::NAME for all of them.
63	I wonder why only the complex variants of the soft-float support functions are missing. I would guess others are conceptually missing too, the question is if we need them. I did grep the clang source for 7 non-complex soft-float support functions from the different categories listed in the gcc docs, none was found. Does it mean that x86 code also does rely on the library to do complex multiplication? I think so, yes. Some system library will provide the implementation of `__muldc3` for the slow path of a complex multiplication. If x86 can do complex ops, why can't nvptx? If x86 can't, would make sense to teach it? I think I don't understand this (and maybe the question above). What we do in CUDA right now, and with this patch in OpenMP, is to provide the `__XXXXc3` functions on the device. Usually they are in some system library that we just not have on the device so we have to add them somehow.

arsenm added a subscriber: arsenm.Jun 1 2020, 12:10 PM

arsenm added inline comments.

clang/lib/Headers/__clang_cuda_complex_builtins.h
109–110	Why does this try to preserve the sign of a nan? They are meaningless

In D80897#2066952, @jdoerfert wrote:

In D80897#2066723, @tra wrote:

Hmm. I'm pretty sure tensorflow is using std::complex for various types. I'm surprised that we haven't seen these functions missing.

Which functions and missing from where? In CUDA-mode we did provide __XXXXc3 already.

I mean the __XXXXc3 functions added by the patch. I've tried with clang as it is now, before your patch.

Plain CUDA (e.g. https://godbolt.org/z/Us6oXC) code appears to have no references to __mul* or __div*, at least for optimized builds, but they do popup in unoptimized ones. Curiously enough, unoptimized code compiled with -stdlib=libc++ --std=c++11 does not need the soft-float functions. That would explain why we don't see the build breaks.

Its not that simple, and tbh, I don't have the full picture yet. Plain (clang) CUDA uses these functions (https://godbolt.org/z/dp_FY2), they just disappear after inlining because of the linkage. If you however enable -fast-math they are not used (https://godbolt.org/z/_N-STh). I couldn't run with stdlib=libc++ locally and godbold cuts of the output so I'm not sure if they are used and inlined or not used.

I've checked it locally and verified that adding --stdlib=libc++ -std=c++11 to your first example shows that __*c3 functions do not appear in IR regardless of inlining or opt level.
I wonder what is that that libstdc++ does that makes those functions show up in IR. AFAICT, it's not invoked directly by the library, so it must be something clang has generated. Perhaps something should be fixed there.

These differences suggest that these changes may need to be more nuanced with regard to the standard c++ library version and, possibly, the C++ standard used.
If possible, I would prefer to limit interference with the standard libraries only to the cases where it's necessary.

The way I understand this is that we can always provide correct weak versions of __XXXXc3 without any correctness issues. They will be stripped if they are not needed anyway. That said, this patch should not modify the CUDA behavior (except minor float vs double corrections in the __XXXXc3 methods). Could you elaborate what interference you expect?

One example would be if/when we grow a better libm support for GPUs. Granted, it's just few functions and we could just remove these instances then.
I agree that adding these functions now will probably not interfere with anything we have now -- they are device-side overloads and nobody calls them directly.
The suggestion was based on a general principle of minimizing the changes that overlap with the standard libraries -- there are quite a few versions out there and I can't predict what quirks of theirs I'm not aware of. I've been burned too many times by that to be wary.

clang/lib/Headers/__clang_cuda_complex_builtins.h
63	I'm OK with providing device-side equivalents of the host standard library. What' I'm trying to figure out if why we don't need to do it in some cases. In case whe we do rely on these functions, but don't have them, we have at least two choices -- provide the missing functions (this patch) or ensure we never need these functions (what I'm trying to figure out). If there's a way to reliably ensure that we don't need these functions, I'd prefer that. Right now the observation is that libc++ somehow avoids it. If we can improve clang that libstdc++ would also work without falling back on the `__*c3` functions, that may be a better fix for this. That said, I don't understand yet why/how the standard c++ libraries end up with different code in this case.

I tried to determine why we don't emit such calls for c++11 and stdc++ but I was not successful :( Tracking back from the emission lead to the generic expression codegen without any (obvious) check of the runtime library or std versions.

clang/lib/Headers/__clang_cuda_complex_builtins.h
109–110	Idk [I only work here... ;)] I guess the algorithm was once copied from libc++, unclear if the one in there is still the same, we could check.

saiislam added a subscriber: saiislam.Jun 3 2020, 8:05 AM

jdoerfert marked an inline comment as done.Jun 3 2020, 12:47 PM

jdoerfert added inline comments.

clang/lib/Headers/__clang_cuda_complex_builtins.h
42	This will actually not work right now as we do not overload isinf/isnan/isfinite properly in C++ mode. I first have to find a solution for that mess.

@tra After chatting with @hfinkel I know now why we don't see the calls in the libc++ case. libc++ implements std::complex without _Complex types, stdlib++ does. If the user uses _Complex directly we need these functions for sure as the standard defines them (I think): https://godbolt.org/z/jcXgnH

So we need them and I would like to reuse them in the OpenMP offload path :)

@JonChesterfield @hfinkel @tra ping

I would really like to land this before the release branches off to allow people to use complex in target regions.

I think this change is good. The library story is a bit difficult, but fundamentally openmp needs a shim of some sort to map target math functions onto the libm of the underlying device.

For nvptx, that's the cuda library. Amdgcn has math functions and may need another shim to map them to libm.

include_next is nasty, but that's the existing pattern for some library headers.

clang/test/Headers/Inputs/include/complex
11	Can we #include from libc++ instead? Needs some cmake to skip the test if the library is unavailable but spares duplicating this class

This revision is now accepted and ready to land.Jul 2 2020, 5:17 PM

ye-luo added a subscriber: ye-luo.Jul 7 2020, 6:25 AM

Addressed comments

LGTM.

Harbormaster failed remote builds in B63206: Diff 276053!Jul 7 2020, 9:53 AM

Closed by commit rGd999cbc98832: [OpenMP] Initial support for std::complex in target regions (authored by jdoerfert). · Explain WhyJul 8 2020, 3:36 PM

This revision was automatically updated to reflect the committed changes.

tra mentioned this in D83591: [OpenMP][CUDA] Fix std::complex in GPU regions.Jul 10 2020, 4:47 PM

Revision Contents

Path

Size

clang/

lib/

Headers/

CMakeLists.txt

1 line

__clang_cuda_complex_builtins.h

48 lines

openmp_wrappers/

complex

30 lines

test/

Headers/

Inputs/

include/

complex

301 lines

cstdlib

4 lines

math.h

4 lines

nvptx_device_math_complex.cpp

25 lines

Diff 267521

clang/lib/Headers/CMakeLists.txt

Show First 20 Lines • Show All 141 Lines • ▼ Show 20 Lines	set(ppc_wrapper_files
ppc_wrappers/pmmintrin.h		ppc_wrappers/pmmintrin.h
ppc_wrappers/tmmintrin.h		ppc_wrappers/tmmintrin.h
ppc_wrappers/smmintrin.h		ppc_wrappers/smmintrin.h
)		)

set(openmp_wrapper_files		set(openmp_wrapper_files
openmp_wrappers/math.h		openmp_wrappers/math.h
openmp_wrappers/cmath		openmp_wrappers/cmath
		openmp_wrappers/complex
openmp_wrappers/__clang_openmp_device_functions.h		openmp_wrappers/__clang_openmp_device_functions.h
openmp_wrappers/new		openmp_wrappers/new
)		)

set(output_dir ${LLVM_LIBRARY_OUTPUT_INTDIR}/clang/${CLANG_VERSION}/include)		set(output_dir ${LLVM_LIBRARY_OUTPUT_INTDIR}/clang/${CLANG_VERSION}/include)
set(out_files)		set(out_files)
set(generated_files)		set(generated_files)

▲ Show 20 Lines • Show All 72 Lines • Show Last 20 Lines

clang/lib/Headers/__clang_cuda_complex_builtins.h

Show All 9 Lines
#ifndef __CLANG_CUDA_COMPLEX_BUILTINS		#ifndef __CLANG_CUDA_COMPLEX_BUILTINS
#define __CLANG_CUDA_COMPLEX_BUILTINS		#define __CLANG_CUDA_COMPLEX_BUILTINS

// This header defines __muldc3, __mulsc3, __divdc3, and __divsc3. These are		// This header defines __muldc3, __mulsc3, __divdc3, and __divsc3. These are
// libgcc functions that clang assumes are available when compiling c99 complex		// libgcc functions that clang assumes are available when compiling c99 complex
// operations. (These implementations come from libc++, and have been modified		// operations. (These implementations come from libc++, and have been modified
// to work with CUDA.)		// to work with CUDA.)

extern "C" inline __device__ double _Complex __muldc3(double __a, double __b,		#pragma push_macro("__DEVICE__")
double __c, double __d) {		#ifdef _OPENMP
		#pragma omp declare target
		#define __DEVICE__ __attribute__((nothrow))
		#else
		#define __DEVICE__ __device__ inline
		#endif

		#if defined(__cplusplus)
		extern "C" {
		#endif

		traUnsubmitted Not Done Reply Inline Actions Nit: this creates impression that we fall back on `double` variant of the function, while in reality we'll end up using `std::isnan<float>`. Perhaps it would be better to use fully specialized function template name in all these macros. It would also avoid potential issues if someone/somewhere adds other overloads. E.g. we may end up facing `std::complex<half>` which may overload resolution ambiguous in some cases. tra: Nit: this creates impression that we fall back on `double` variant of the function, while in…
		jdoerfertAuthorUnsubmitted Done Reply Inline Actions No problem. I'll just use std::NAME for all of them. jdoerfert: No problem. I'll just use std::NAME for all of them.
		__DEVICE__ double _Complex __muldc3(double __a, double __b, double __c,
		double __d) {
double __ac = __a * __c;		double __ac = __a * __c;
double __bd = __b * __d;		double __bd = __b * __d;
double __ad = __a * __d;		double __ad = __a * __d;
double __bc = __b * __c;		double __bc = __b * __c;
double _Complex z;		double _Complex z;
__real__(z) = __ac - __bd;		__real__(z) = __ac - __bd;
__imag__(z) = __ad + __bc;		__imag__(z) = __ad + __bc;
if (std::isnan(__real__(z)) && std::isnan(__imag__(z))) {		if (std::isnan(__real__(z)) && std::isnan(__imag__(z))) {
int __recalc = 0;		int __recalc = 0;
if (std::isinf(__a) \|\| std::isinf(__b)) {		if (std::isinf(__a) \|\| std::isinf(__b)) {
__a = std::copysign(std::isinf(__a) ? 1 : 0, __a);		__a = std::copysign(std::isinf(__a) ? 1 : 0, __a);
		jdoerfertAuthorUnsubmitted Done Reply Inline Actions This will actually not work right now as we do not overload isinf/isnan/isfinite properly in C++ mode. I first have to find a solution for that mess. jdoerfert: This will actually not work right now as we do not overload isinf/isnan/isfinite properly in…
__b = std::copysign(std::isinf(__b) ? 1 : 0, __b);		__b = std::copysign(std::isinf(__b) ? 1 : 0, __b);
if (std::isnan(__c))		if (std::isnan(__c))
__c = std::copysign(0, __c);		__c = std::copysign(0, __c);
if (std::isnan(__d))		if (std::isnan(__d))
__d = std::copysign(0, __d);		__d = std::copysign(0, __d);
__recalc = 1;		__recalc = 1;
}		}
if (std::isinf(__c) \|\| std::isinf(__d)) {		if (std::isinf(__c) \|\| std::isinf(__d)) {
__c = std::copysign(std::isinf(__c) ? 1 : 0, __c);		__c = std::copysign(std::isinf(__c) ? 1 : 0, __c);
__d = std::copysign(std::isinf(__d) ? 1 : 0, __d);		__d = std::copysign(std::isinf(__d) ? 1 : 0, __d);
if (std::isnan(__a))		if (std::isnan(__a))
__a = std::copysign(0, __a);		__a = std::copysign(0, __a);
if (std::isnan(__b))		if (std::isnan(__b))
__b = std::copysign(0, __b);		__b = std::copysign(0, __b);
__recalc = 1;		__recalc = 1;
}		}
if (!__recalc && (std::isinf(__ac) \|\| std::isinf(__bd) \|\|		if (!__recalc && (std::isinf(__ac) \|\| std::isinf(__bd) \|\|
std::isinf(__ad) \|\| std::isinf(__bc))) {		std::isinf(__ad) \|\| std::isinf(__bc))) {
if (std::isnan(__a))		if (std::isnan(__a))
__a = std::copysign(0, __a);		__a = std::copysign(0, __a);
if (std::isnan(__b))		if (std::isnan(__b))
		traUnsubmitted Not Done Reply Inline Actions Soft-float library has bunch of other functions. https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html I wonder why only the complex variants of the soft-float support functions are missing. Does it mean that x86 code also does rely on the library to do complex multiplication? If x86 can do complex ops, why can't nvptx? If x86 can't, would make sense to teach it? tra: Soft-float library has bunch of other functions. https://gcc.gnu.org/onlinedocs/gccint/Soft…
		jdoerfertAuthorUnsubmitted Done Reply Inline Actions I wonder why only the complex variants of the soft-float support functions are missing. I would guess others are conceptually missing too, the question is if we need them. I did grep the clang source for 7 non-complex soft-float support functions from the different categories listed in the gcc docs, none was found. Does it mean that x86 code also does rely on the library to do complex multiplication? I think so, yes. Some system library will provide the implementation of `__muldc3` for the slow path of a complex multiplication. If x86 can do complex ops, why can't nvptx? If x86 can't, would make sense to teach it? I think I don't understand this (and maybe the question above). What we do in CUDA right now, and with this patch in OpenMP, is to provide the `__XXXXc3` functions on the device. Usually they are in some system library that we just not have on the device so we have to add them somehow. jdoerfert: > I wonder why only the complex variants of the soft-float support functions are missing. I…
		traUnsubmitted Not Done Reply Inline Actions I'm OK with providing device-side equivalents of the host standard library. What' I'm trying to figure out if why we don't need to do it in some cases. In case whe we do rely on these functions, but don't have them, we have at least two choices -- provide the missing functions (this patch) or ensure we never need these functions (what I'm trying to figure out). If there's a way to reliably ensure that we don't need these functions, I'd prefer that. Right now the observation is that libc++ somehow avoids it. If we can improve clang that libstdc++ would also work without falling back on the `__c3` functions, that may be a better fix for this. That said, I don't understand yet why/how the standard c++ libraries end up with different code in this case. tra:* I'm OK with providing device-side equivalents of the host standard library. What' I'm trying…
__b = std::copysign(0, __b);		__b = std::copysign(0, __b);
if (std::isnan(__c))		if (std::isnan(__c))
__c = std::copysign(0, __c);		__c = std::copysign(0, __c);
if (std::isnan(__d))		if (std::isnan(__d))
__d = std::copysign(0, __d);		__d = std::copysign(0, __d);
__recalc = 1;		__recalc = 1;
}		}
if (__recalc) {		if (__recalc) {
// Can't use std::numeric_limits<double>::infinity() -- that doesn't have		// Can't use std::numeric_limits<double>::infinity() -- that doesn't have
// a device overload (and isn't constexpr before C++11, naturally).		// a device overload (and isn't constexpr before C++11, naturally).
__real__(z) = __builtin_huge_valf() * (__a * __c - __b * __d);		__real__(z) = __builtin_huge_val() * (__a * __c - __b * __d);
__imag__(z) = __builtin_huge_valf() * (__a * __d + __b * __c);		__imag__(z) = __builtin_huge_val() * (__a * __d + __b * __c);
}		}
}		}
return z;		return z;
}		}

extern "C" inline __device__ float _Complex __mulsc3(float __a, float __b,		__DEVICE__ float _Complex __mulsc3(float __a, float __b, float __c, float __d) {
float __c, float __d) {
float __ac = __a * __c;		float __ac = __a * __c;
float __bd = __b * __d;		float __bd = __b * __d;
float __ad = __a * __d;		float __ad = __a * __d;
float __bc = __b * __c;		float __bc = __b * __c;
float _Complex z;		float _Complex z;
__real__(z) = __ac - __bd;		__real__(z) = __ac - __bd;
__imag__(z) = __ad + __bc;		__imag__(z) = __ad + __bc;
if (std::isnan(__real__(z)) && std::isnan(__imag__(z))) {		if (std::isnan(__real__(z)) && std::isnan(__imag__(z))) {
Show All 11 Lines	if (std::isinf(__c) \|\| std::isinf(__d)) {
__c = std::copysign(std::isinf(__c) ? 1 : 0, __c);		__c = std::copysign(std::isinf(__c) ? 1 : 0, __c);
__d = std::copysign(std::isinf(__d) ? 1 : 0, __d);		__d = std::copysign(std::isinf(__d) ? 1 : 0, __d);
if (std::isnan(__a))		if (std::isnan(__a))
__a = std::copysign(0, __a);		__a = std::copysign(0, __a);
if (std::isnan(__b))		if (std::isnan(__b))
__b = std::copysign(0, __b);		__b = std::copysign(0, __b);
__recalc = 1;		__recalc = 1;
}		}
if (!__recalc && (std::isinf(__ac) \|\| std::isinf(__bd) \|\|		if (!__recalc && (std::isinf(__ac) \|\| std::isinf(__bd) \|\|
std::isinf(__ad) \|\| std::isinf(__bc))) {		std::isinf(__ad) \|\| std::isinf(__bc))) {
		arsenmUnsubmitted Not Done Reply Inline Actions Why does this try to preserve the sign of a nan? They are meaningless arsenm: Why does this try to preserve the sign of a nan? They are meaningless
		jdoerfertAuthorUnsubmitted Done Reply Inline Actions Idk [I only work here... ;)] I guess the algorithm was once copied from libc++, unclear if the one in there is still the same, we could check. jdoerfert: Idk [I only work here... ;)] I guess the algorithm was once copied from libc++, unclear if the…
if (std::isnan(__a))		if (std::isnan(__a))
__a = std::copysign(0, __a);		__a = std::copysign(0, __a);
if (std::isnan(__b))		if (std::isnan(__b))
__b = std::copysign(0, __b);		__b = std::copysign(0, __b);
if (std::isnan(__c))		if (std::isnan(__c))
__c = std::copysign(0, __c);		__c = std::copysign(0, __c);
if (std::isnan(__d))		if (std::isnan(__d))
__d = std::copysign(0, __d);		__d = std::copysign(0, __d);
__recalc = 1;		__recalc = 1;
}		}
if (__recalc) {		if (__recalc) {
__real__(z) = __builtin_huge_valf() * (__a * __c - __b * __d);		__real__(z) = __builtin_huge_valf() * (__a * __c - __b * __d);
__imag__(z) = __builtin_huge_valf() * (__a * __d + __b * __c);		__imag__(z) = __builtin_huge_valf() * (__a * __d + __b * __c);
}		}
}		}
return z;		return z;
}		}

extern "C" inline __device__ double _Complex __divdc3(double __a, double __b,		__DEVICE__ double _Complex __divdc3(double __a, double __b, double __c,
double __c, double __d) {		double __d) {
int __ilogbw = 0;		int __ilogbw = 0;
// Can't use std::max, because that's defined in <algorithm>, and we don't		// Can't use std::max, because that's defined in <algorithm>, and we don't
// want to pull that in for every compile. The CUDA headers define		// want to pull that in for every compile. The CUDA headers define
// ::max(float, float) and ::max(double, double), which is sufficient for us.		// ::max(float, float) and ::max(double, double), which is sufficient for us.
double __logbw = std::logb(max(std::abs(__c), std::abs(__d)));		double __logbw = std::logb(max(std::abs(__c), std::abs(__d)));
if (std::isfinite(__logbw)) {		if (std::isfinite(__logbw)) {
__ilogbw = (int)__logbw;		__ilogbw = (int)__logbw;
__c = std::scalbn(__c, -__ilogbw);		__c = std::scalbn(__c, -__ilogbw);
__d = std::scalbn(__d, -__ilogbw);		__d = std::scalbn(__d, -__ilogbw);
}		}
double __denom = __c * __c + __d * __d;		double __denom = __c * __c + __d * __d;
double _Complex z;		double _Complex z;
__real__(z) = std::scalbn((__a * __c + __b * __d) / __denom, -__ilogbw);		__real__(z) = std::scalbn((__a * __c + __b * __d) / __denom, -__ilogbw);
__imag__(z) = std::scalbn((__b * __c - __a * __d) / __denom, -__ilogbw);		__imag__(z) = std::scalbn((__b * __c - __a * __d) / __denom, -__ilogbw);
if (std::isnan(__real__(z)) && std::isnan(__imag__(z))) {		if (std::isnan(__real__(z)) && std::isnan(__imag__(z))) {
if ((__denom == 0.0) && (!std::isnan(__a) \|\| !std::isnan(__b))) {		if ((__denom == 0.0) && (!std::isnan(__a) \|\| !std::isnan(__b))) {
__real__(z) = std::copysign(__builtin_huge_valf(), __c) * __a;		__real__(z) = std::copysign(__builtin_huge_val(), __c) * __a;
__imag__(z) = std::copysign(__builtin_huge_valf(), __c) * __b;		__imag__(z) = std::copysign(__builtin_huge_val(), __c) * __b;
} else if ((std::isinf(__a) \|\| std::isinf(__b)) && std::isfinite(__c) &&		} else if ((std::isinf(__a) \|\| std::isinf(__b)) && std::isfinite(__c) &&
std::isfinite(__d)) {		std::isfinite(__d)) {
__a = std::copysign(std::isinf(__a) ? 1.0 : 0.0, __a);		__a = std::copysign(std::isinf(__a) ? 1.0 : 0.0, __a);
__b = std::copysign(std::isinf(__b) ? 1.0 : 0.0, __b);		__b = std::copysign(std::isinf(__b) ? 1.0 : 0.0, __b);
__real__(z) = __builtin_huge_valf() * (__a * __c + __b * __d);		__real__(z) = __builtin_huge_val() * (__a * __c + __b * __d);
__imag__(z) = __builtin_huge_valf() * (__b * __c - __a * __d);		__imag__(z) = __builtin_huge_val() * (__b * __c - __a * __d);
} else if (std::isinf(__logbw) && __logbw > 0.0 && std::isfinite(__a) &&		} else if (std::isinf(__logbw) && __logbw > 0.0 && std::isfinite(__a) &&
std::isfinite(__b)) {		std::isfinite(__b)) {
__c = std::copysign(std::isinf(__c) ? 1.0 : 0.0, __c);		__c = std::copysign(std::isinf(__c) ? 1.0 : 0.0, __c);
__d = std::copysign(std::isinf(__d) ? 1.0 : 0.0, __d);		__d = std::copysign(std::isinf(__d) ? 1.0 : 0.0, __d);
__real__(z) = 0.0 * (__a * __c + __b * __d);		__real__(z) = 0.0 * (__a * __c + __b * __d);
__imag__(z) = 0.0 * (__b * __c - __a * __d);		__imag__(z) = 0.0 * (__b * __c - __a * __d);
}		}
}		}
return z;		return z;
}		}

extern "C" inline __device__ float _Complex __divsc3(float __a, float __b,		__DEVICE__ float _Complex __divsc3(float __a, float __b, float __c, float __d) {
float __c, float __d) {
int __ilogbw = 0;		int __ilogbw = 0;
float __logbw = std::logb(max(std::abs(__c), std::abs(__d)));		float __logbw = std::logb(max(std::abs(__c), std::abs(__d)));
if (std::isfinite(__logbw)) {		if (std::isfinite(__logbw)) {
__ilogbw = (int)__logbw;		__ilogbw = (int)__logbw;
__c = std::scalbn(__c, -__ilogbw);		__c = std::scalbn(__c, -__ilogbw);
__d = std::scalbn(__d, -__ilogbw);		__d = std::scalbn(__d, -__ilogbw);
}		}
float __denom = __c * __c + __d * __d;		float __denom = __c * __c + __d * __d;
Show All 16 Lines	if ((__denom == 0) && (!std::isnan(__a) \|\| !std::isnan(__b))) {
__d = std::copysign(std::isinf(__d) ? 1 : 0, __d);		__d = std::copysign(std::isinf(__d) ? 1 : 0, __d);
__real__(z) = 0 * (__a * __c + __b * __d);		__real__(z) = 0 * (__a * __c + __b * __d);
__imag__(z) = 0 * (__b * __c - __a * __d);		__imag__(z) = 0 * (__b * __c - __a * __d);
}		}
}		}
return z;		return z;
}		}

		#if defined(__cplusplus)
		} // extern "C"
		#endif

		#ifdef _OPENMP
		#pragma omp end declare target
		#endif

		#pragma pop_macro("__DEVICE__")

#endif // __CLANG_CUDA_COMPLEX_BUILTINS		#endif // __CLANG_CUDA_COMPLEX_BUILTINS

clang/lib/Headers/openmp_wrappers/complex

This file was added.

				/*===-- complex --- OpenMP complex wrapper for target regions --------- c++ -===
				*
				* Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				* See https://llvm.org/LICENSE.txt for license information.
				* SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				*
				*===-----------------------------------------------------------------------===
				*/

				#ifndef __CLANG_OPENMP_COMPLEX_H__
				#define __CLANG_OPENMP_COMPLEX_H__

				#ifndef _OPENMP
				#error "This file is for OpenMP compilation only."
				#endif

				// We require std::math functions in the complex builtins below.
				#include <cmath>

				#pragma omp begin declare variant match( \
				device = {arch(nvptx, nvptx64)}, implementation = {extension(match_any)})

				#define __CUDA__
				#include <__clang_cuda_complex_builtins.h>
				#endif

				#pragma omp end declare variant

				// Grab the host header too.
				#include_next <complex>

clang/test/Headers/Inputs/include/complex

This file was added.

				#pragma once

				#include <cmath>

				#define INFINITY (__builtin_inff())

				namespace std {

				// Taken from libc++
				template <class _Tp>
				class complex {
				JonChesterfieldUnsubmitted Not Done Reply Inline Actions Can we #include from libc++ instead? Needs some cmake to skip the test if the library is unavailable but spares duplicating this class JonChesterfield: Can we #include from libc++ instead? Needs some cmake to skip the test if the library is…
				public:
				typedef _Tp value_type;

				private:
				value_type __re_;
				value_type __im_;

				public:
				complex(const value_type &__re = value_type(), const value_type &__im = value_type())
				: __re_(__re), __im_(__im) {}
				template <class _Xp>
				complex(const complex<_Xp> &__c)
				: __re_(__c.real()), __im_(__c.imag()) {}

				value_type real() const { return __re_; }
				value_type imag() const { return __im_; }

				void real(value_type __re) { __re_ = __re; }
				void imag(value_type __im) { __im_ = __im; }

				complex &operator=(const value_type &__re) {
				__re_ = __re;
				__im_ = value_type();
				return *this;
				}
				complex &operator+=(const value_type &__re) {
				__re_ += __re;
				return *this;
				}
				complex &operator-=(const value_type &__re) {
				__re_ -= __re;
				return *this;
				}
				complex &operator*=(const value_type &__re) {
				__re_ *= __re;
				__im_ *= __re;
				return *this;
				}
				complex &operator/=(const value_type &__re) {
				__re_ /= __re;
				__im_ /= __re;
				return *this;
				}

				template <class _Xp>
				complex &operator=(const complex<_Xp> &__c) {
				__re_ = __c.real();
				__im_ = __c.imag();
				return *this;
				}
				template <class _Xp>
				complex &operator+=(const complex<_Xp> &__c) {
				__re_ += __c.real();
				__im_ += __c.imag();
				return *this;
				}
				template <class _Xp>
				complex &operator-=(const complex<_Xp> &__c) {
				__re_ -= __c.real();
				__im_ -= __c.imag();
				return *this;
				}
				template <class _Xp>
				complex &operator*=(const complex<_Xp> &__c) {
				this = this * complex(__c.real(), __c.imag());
				return *this;
				}
				template <class _Xp>
				complex &operator/=(const complex<_Xp> &__c) {
				this = this / complex(__c.real(), __c.imag());
				return *this;
				}
				};

				template <class _Tp>
				inline complex<_Tp>
				operator+(const complex<_Tp> &__x, const complex<_Tp> &__y) {
				complex<_Tp> __t(__x);
				__t += __y;
				return __t;
				}

				template <class _Tp>
				inline complex<_Tp>
				operator+(const complex<_Tp> &__x, const _Tp &__y) {
				complex<_Tp> __t(__x);
				__t += __y;
				return __t;
				}

				template <class _Tp>
				inline complex<_Tp>
				operator+(const _Tp &__x, const complex<_Tp> &__y) {
				complex<_Tp> __t(__y);
				__t += __x;
				return __t;
				}

				template <class _Tp>
				inline complex<_Tp>
				operator-(const complex<_Tp> &__x, const complex<_Tp> &__y) {
				complex<_Tp> __t(__x);
				__t -= __y;
				return __t;
				}

				template <class _Tp>
				inline complex<_Tp>
				operator-(const complex<_Tp> &__x, const _Tp &__y) {
				complex<_Tp> __t(__x);
				__t -= __y;
				return __t;
				}

				template <class _Tp>
				inline complex<_Tp>
				operator-(const _Tp &__x, const complex<_Tp> &__y) {
				complex<_Tp> __t(-__y);
				__t += __x;
				return __t;
				}

				template <class _Tp>
				complex<_Tp>
				operator*(const complex<_Tp> &__z, const complex<_Tp> &__w) {
				_Tp __a = __z.real();
				_Tp __b = __z.imag();
				_Tp __c = __w.real();
				_Tp __d = __w.imag();
				_Tp __ac = __a * __c;
				_Tp __bd = __b * __d;
				_Tp __ad = __a * __d;
				_Tp __bc = __b * __c;
				_Tp __x = __ac - __bd;
				_Tp __y = __ad + __bc;
				if (std::isnan(__x) && std::isnan(__y)) {
				bool __recalc = false;
				if (std::isinf(__a) \|\| std::isinf(__b)) {
				__a = copysign(std::isinf(__a) ? _Tp(1) : _Tp(0), __a);
				__b = copysign(std::isinf(__b) ? _Tp(1) : _Tp(0), __b);
				if (std::isnan(__c))
				__c = copysign(_Tp(0), __c);
				if (std::isnan(__d))
				__d = copysign(_Tp(0), __d);
				__recalc = true;
				}
				if (std::isinf(__c) \|\| std::isinf(__d)) {
				__c = copysign(std::isinf(__c) ? _Tp(1) : _Tp(0), __c);
				__d = copysign(std::isinf(__d) ? _Tp(1) : _Tp(0), __d);
				if (std::isnan(__a))
				__a = copysign(_Tp(0), __a);
				if (std::isnan(__b))
				__b = copysign(_Tp(0), __b);
				__recalc = true;
				}
				if (!__recalc && (std::isinf(__ac) \|\| std::isinf(__bd) \|\|
				std::isinf(__ad) \|\| std::isinf(__bc))) {
				if (std::isnan(__a))
				__a = copysign(_Tp(0), __a);
				if (std::isnan(__b))
				__b = copysign(_Tp(0), __b);
				if (std::isnan(__c))
				__c = copysign(_Tp(0), __c);
				if (std::isnan(__d))
				__d = copysign(_Tp(0), __d);
				__recalc = true;
				}
				if (__recalc) {
				__x = _Tp(INFINITY) * (__a * __c - __b * __d);
				__y = _Tp(INFINITY) * (__a * __d + __b * __c);
				}
				}
				return complex<_Tp>(__x, __y);
				}

				template <class _Tp>
				inline complex<_Tp>
				operator*(const complex<_Tp> &__x, const _Tp &__y) {
				complex<_Tp> __t(__x);
				__t *= __y;
				return __t;
				}

				template <class _Tp>
				inline complex<_Tp>
				operator*(const _Tp &__x, const complex<_Tp> &__y) {
				complex<_Tp> __t(__y);
				__t *= __x;
				return __t;
				}

				template <class _Tp>
				complex<_Tp>
				operator/(const complex<_Tp> &__z, const complex<_Tp> &__w) {
				int __ilogbw = 0;
				_Tp __a = __z.real();
				_Tp __b = __z.imag();
				_Tp __c = __w.real();
				_Tp __d = __w.imag();
				_Tp __logbw = logb(fmax(fabs(__c), fabs(__d)));
				if (std::isfinite(__logbw)) {
				__ilogbw = static_cast<int>(__logbw);
				__c = scalbn(__c, -__ilogbw);
				__d = scalbn(__d, -__ilogbw);
				}
				_Tp __denom = __c * __c + __d * __d;
				_Tp __x = scalbn((__a * __c + __b * __d) / __denom, -__ilogbw);
				_Tp __y = scalbn((__b * __c - __a * __d) / __denom, -__ilogbw);
				if (std::isnan(__x) && std::isnan(__y)) {
				if ((__denom == _Tp(0)) && (!std::isnan(__a) \|\| !std::isnan(__b))) {
				__x = copysign(_Tp(INFINITY), __c) * __a;
				__y = copysign(_Tp(INFINITY), __c) * __b;
				} else if ((std::isinf(__a) \|\| std::isinf(__b)) && std::isfinite(__c) && std::isfinite(__d)) {
				__a = copysign(std::isinf(__a) ? _Tp(1) : _Tp(0), __a);
				__b = copysign(std::isinf(__b) ? _Tp(1) : _Tp(0), __b);
				__x = _Tp(INFINITY) * (__a * __c + __b * __d);
				__y = _Tp(INFINITY) * (__b * __c - __a * __d);
				} else if (std::isinf(__logbw) && __logbw > _Tp(0) && std::isfinite(__a) && std::isfinite(__b)) {
				__c = copysign(std::isinf(__c) ? _Tp(1) : _Tp(0), __c);
				__d = copysign(std::isinf(__d) ? _Tp(1) : _Tp(0), __d);
				__x = _Tp(0) * (__a * __c + __b * __d);
				__y = _Tp(0) * (__b * __c - __a * __d);
				}
				}
				return complex<_Tp>(__x, __y);
				}

				template <class _Tp>
				inline complex<_Tp>
				operator/(const complex<_Tp> &__x, const _Tp &__y) {
				return complex<_Tp>(__x.real() / __y, __x.imag() / __y);
				}

				template <class _Tp>
				inline complex<_Tp>
				operator/(const _Tp &__x, const complex<_Tp> &__y) {
				complex<_Tp> __t(__x);
				__t /= __y;
				return __t;
				}

				template <class _Tp>
				inline complex<_Tp>
				operator+(const complex<_Tp> &__x) {
				return __x;
				}

				template <class _Tp>
				inline complex<_Tp>
				operator-(const complex<_Tp> &__x) {
				return complex<_Tp>(-__x.real(), -__x.imag());
				}

				template <class _Tp>
				inline bool
				operator==(const complex<_Tp> &__x, const complex<_Tp> &__y) {
				return __x.real() == __y.real() && __x.imag() == __y.imag();
				}

				template <class _Tp>
				inline bool
				operator==(const complex<_Tp> &__x, const _Tp &__y) {
				return __x.real() == __y && __x.imag() == 0;
				}

				template <class _Tp>
				inline bool
				operator==(const _Tp &__x, const complex<_Tp> &__y) {
				return __x == __y.real() && 0 == __y.imag();
				}

				template <class _Tp>
				inline bool
				operator!=(const complex<_Tp> &__x, const complex<_Tp> &__y) {
				return !(__x == __y);
				}

				template <class _Tp>
				inline bool
				operator!=(const complex<_Tp> &__x, const _Tp &__y) {
				return !(__x == __y);
				}

				template <class _Tp>
				inline bool
				operator!=(const _Tp &__x, const complex<_Tp> &__y) {
				return !(__x == __y);
				}

				} // namespace std

clang/test/Headers/Inputs/include/cstdlib

	Show All 18 Lines

	inline long			inline long
	abs(long __i) { return __builtin_labs(__i); }			abs(long __i) { return __builtin_labs(__i); }

	inline long long			inline long long
	abs(long long __x) { return __builtin_llabs (__x); }			abs(long long __x) { return __builtin_llabs (__x); }

	float fabs(float __x) { return __builtin_fabs(__x); }			float fabs(float __x) { return __builtin_fabs(__x); }

				float abs(float __x) { return fabs(__x); }
				double abs(double __x) { return fabs(__x); }

	}			}

clang/test/Headers/Inputs/include/math.h

	Show First 20 Lines • Show All 101 Lines • ▼ Show 20 Lines
	float logbf(float __a);			float logbf(float __a);
	float logf(float __a);			float logf(float __a);
	long lrint(double __a);			long lrint(double __a);
	long lrintf(float __a);			long lrintf(float __a);
	long lround(double __a);			long lround(double __a);
	long lroundf(float __a);			long lroundf(float __a);
	int max(int __a, int __b);			int max(int __a, int __b);
	int min(int __a, int __b);			int min(int __a, int __b);
				float max(float __a, float __b);
				float min(float __a, float __b);
				double max(double __a, double __b);
				double min(double __a, double __b);
	double modf(double __a, double *__b);			double modf(double __a, double *__b);
	float modff(float __a, float *__b);			float modff(float __a, float *__b);
	double nearbyint(double __a);			double nearbyint(double __a);
	float nearbyintf(float __a);			float nearbyintf(float __a);
	double nextafter(double __a, double __b);			double nextafter(double __a, double __b);
	float nextafterf(float __a, float __b);			float nextafterf(float __a, float __b);
	double norm(int __dim, const double *__t);			double norm(int __dim, const double *__t);
	double norm3d(double __a, double __b, double __c);			double norm3d(double __a, double __b, double __c);
	▲ Show 20 Lines • Show All 86 Lines • Show Last 20 Lines

clang/test/Headers/nvptx_device_math_complex.cpp

This file was added.

				// REQUIRES: nvptx-registered-target
				// RUN: %clang_cc1 -verify -internal-isystem %S/Inputs/include -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc
				// RUN: %clang_cc1 -verify -internal-isystem %S/../../lib/Headers/openmp_wrappers -include __clang_openmp_device_functions.h -internal-isystem %S/Inputs/include -fopenmp -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -aux-triple powerpc64le-unknown-unknown -o - \| FileCheck %s
				// expected-no-diagnostics

				#include <complex>

				// CHECK-DAG: define {{.*}} @__mulsc3
				// CHECK-DAG: define {{.*}} @__muldc3
				// CHECK-DAG: define {{.*}} @__divsc3
				// CHECK-DAG: define {{.*}} @__divdc3

				void test_scmplx(std::complex<float> a) {
				#pragma omp target
				{
				(void)(a * (a / a));
				}
				}

				void test_dcmplx(std::complex<double> a) {
				#pragma omp target
				{
				(void)(a * (a / a));
				}
				}