
[OpenMP] Initial support for std::complex in target regions
Accepted · Public

Authored by jdoerfert on May 31 2020, 1:08 PM.

Details

Summary

This simply follows the scheme we have for other wrappers. It resolves
the current link problem, e.g., __muldc3 not found, when std::complex
operations are used on a device.

In "CUDA mode" this should allow simple complex operations to work in
target regions. Normal mode doesn't work because the globalization in
the std::complex operators is somehow broken. This will most likely not
make calls to complex math functions, e.g., sin, work properly, but
that is more complex (pun intended) anyway.

Diff Detail

Event Timeline

jdoerfert created this revision.May 31 2020, 1:08 PM
Herald added a project: Restricted Project. · View Herald Transcript · May 31 2020, 1:08 PM
jdoerfert updated this revision to Diff 267531.May 31 2020, 3:57 PM

Fix tests, add C support

tra added a comment.Jun 1 2020, 10:17 AM

Hmm. I'm pretty sure tensorflow is using std::complex for various types. I'm surprised that we haven't seen these functions missing.
Plain CUDA (e.g. https://godbolt.org/z/Us6oXC) code appears to have no references to __mul* or __div*, at least for optimized builds, but they do pop up in unoptimized ones. Curiously enough, unoptimized code compiled with -stdlib=libc++ --std=c++11 does not need the soft-float functions. That would explain why we don't see the build breaks.

These differences suggest that these changes may need to be more nuanced with regard to the standard c++ library version and, possibly, the C++ standard used.
If possible, I would prefer to limit interference with the standard libraries only to the cases where it's necessary.

clang/lib/Headers/__clang_cuda_complex_builtins.h
29

Nit: this creates the impression that we fall back on the double variant of the function, while in reality we'll end up using std::isnan<float>.
Perhaps it would be better to use the fully specialized function template name in all these macros. It would also avoid potential issues if someone, somewhere, adds other overloads. E.g., we may end up facing std::complex<half>, which may make overload resolution ambiguous in some cases.

63

The soft-float library has a bunch of other functions: https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html

I wonder why only the complex variants of the soft-float support functions are missing.
Does it mean that x86 code also relies on the library to do complex multiplication?
If x86 can do complex ops, why can't nvptx?
If x86 can't, would it make sense to teach it?

jdoerfert marked 2 inline comments as done.Jun 1 2020, 11:44 AM
In D80897#2066723, @tra wrote:

Hmm. I'm pretty sure tensorflow is using std::complex for various types. I'm surprised that we haven't seen these functions missing.

Which functions and missing from where? In CUDA-mode we did provide __XXXXc3 already.

Plain CUDA (e.g. https://godbolt.org/z/Us6oXC) code appears to have no references to __mul* or __div*, at least for optimized builds, but they do pop up in unoptimized ones. Curiously enough, unoptimized code compiled with -stdlib=libc++ --std=c++11 does not need the soft-float functions. That would explain why we don't see the build breaks.

It's not that simple and, tbh, I don't have the full picture yet. Plain (clang) CUDA uses these functions (https://godbolt.org/z/dp_FY2); they just disappear after inlining because of the linkage. If you enable -ffast-math, however, they are not used (https://godbolt.org/z/_N-STh). I couldn't run with -stdlib=libc++ locally, and godbolt cuts off the output, so I'm not sure whether they are used and inlined or not used at all.

These differences suggest that these changes may need to be more nuanced with regard to the standard c++ library version and, possibly, the C++ standard used.
If possible, I would prefer to limit interference with the standard libraries only to the cases where it's necessary.

The way I understand this is that we can always provide correct weak versions of __XXXXc3 without any correctness issues. They will be stripped if they are not needed anyway. That said, this patch should not modify the CUDA behavior (except minor float vs. double corrections in the __XXXXc3 methods). Could you elaborate on what interference you expect?

clang/lib/Headers/__clang_cuda_complex_builtins.h
29

No problem. I'll just use std::NAME for all of them.

63

I wonder why only the complex variants of the soft-float support functions are missing.

I would guess others are conceptually missing too, the question is if we need them. I did grep the clang source for 7 non-complex soft-float support functions from the different categories listed in the gcc docs, none was found.

Does it mean that x86 code also relies on the library to do complex multiplication?

I think so, yes. Some system library will provide the implementation of __muldc3 for the slow path of a complex multiplication.

If x86 can do complex ops, why can't nvptx?
If x86 can't, would make sense to teach it?

I think I don't understand this (and maybe the question above). What we do in CUDA right now, and with this patch in OpenMP, is to provide the __XXXXc3 functions on the device. Usually they are in some system library that we just don't have on the device, so we have to provide them somehow.

arsenm added a subscriber: arsenm.Jun 1 2020, 12:10 PM
arsenm added inline comments.
clang/lib/Headers/__clang_cuda_complex_builtins.h
136–137

Why does this try to preserve the sign of a NaN? They are meaningless.

tra added a comment.Jun 1 2020, 12:27 PM
In D80897#2066723, @tra wrote:

Hmm. I'm pretty sure tensorflow is using std::complex for various types. I'm surprised that we haven't seen these functions missing.

Which functions and missing from where? In CUDA-mode we did provide __XXXXc3 already.

I mean the __XXXXc3 functions added by the patch. I've tried with clang as it is now, before your patch.

Plain CUDA (e.g. https://godbolt.org/z/Us6oXC) code appears to have no references to __mul* or __div*, at least for optimized builds, but they do pop up in unoptimized ones. Curiously enough, unoptimized code compiled with -stdlib=libc++ --std=c++11 does not need the soft-float functions. That would explain why we don't see the build breaks.

It's not that simple and, tbh, I don't have the full picture yet. Plain (clang) CUDA uses these functions (https://godbolt.org/z/dp_FY2); they just disappear after inlining because of the linkage. If you enable -ffast-math, however, they are not used (https://godbolt.org/z/_N-STh). I couldn't run with -stdlib=libc++ locally, and godbolt cuts off the output, so I'm not sure whether they are used and inlined or not used at all.

I've checked it locally and verified that adding --stdlib=libc++ -std=c++11 to your first example shows that the __*c3 functions do not appear in the IR regardless of inlining or opt level.
I wonder what it is that libstdc++ does that makes those functions show up in the IR. AFAICT, they are not invoked directly by the library, so it must be something clang has generated. Perhaps something should be fixed there.

These differences suggest that these changes may need to be more nuanced with regard to the standard c++ library version and, possibly, the C++ standard used.
If possible, I would prefer to limit interference with the standard libraries only to the cases where it's necessary.

The way I understand this is that we can always provide correct weak versions of __XXXXc3 without any correctness issues. They will be stripped if they are not needed anyway. That said, this patch should not modify the CUDA behavior (except minor float vs. double corrections in the __XXXXc3 methods). Could you elaborate on what interference you expect?

One example would be if/when we grow better libm support for GPUs. Granted, it's just a few functions and we could just remove these instances then.
I agree that adding these functions now will probably not interfere with anything we have now -- they are device-side overloads and nobody calls them directly.
The suggestion was based on a general principle of minimizing the changes that overlap with the standard libraries -- there are quite a few versions out there and I can't predict what quirks of theirs I'm not aware of. I've been burned by that too many times not to be wary.

clang/lib/Headers/__clang_cuda_complex_builtins.h
63

I'm OK with providing device-side equivalents of the host standard library.

What I'm trying to figure out is why we don't need to do it in some cases.
In the case where we do rely on these functions but don't have them, we have at least two choices -- provide the missing functions (this patch) or ensure we never need these functions (what I'm trying to figure out). If there's a way to reliably ensure that we don't need these functions, I'd prefer that.

Right now the observation is that libc++ somehow avoids it. If we can improve clang so that libstdc++ also works without falling back on the __*c3 functions, that may be a better fix for this. That said, I don't understand yet why/how the standard C++ libraries end up with different code in this case.

jdoerfert marked an inline comment as done.Jun 2 2020, 3:06 PM

I tried to determine why we don't emit such calls for C++11 and libstdc++, but I was not successful :( Tracking back from the emission led to the generic expression codegen without any (obvious) check of the runtime library or std versions.

clang/lib/Headers/__clang_cuda_complex_builtins.h
136–137

Idk [I only work here... ;)]

I guess the algorithm was once copied from libc++; unclear if the one in there is still the same. We could check.

jdoerfert marked an inline comment as done.Jun 3 2020, 12:47 PM
jdoerfert added inline comments.
clang/lib/Headers/__clang_cuda_complex_builtins.h
42

This will actually not work right now as we do not overload isinf/isnan/isfinite properly in C++ mode. I first have to find a solution for that mess.

@tra After chatting with @hfinkel I know now why we don't see the calls in the libc++ case: libc++ implements std::complex without _Complex types, while libstdc++ uses them. If the user uses _Complex directly, we need these functions for sure, as the standard defines them (I think): https://godbolt.org/z/jcXgnH

So we need them and I would like to reuse them in the OpenMP offload path :)

@JonChesterfield @hfinkel @tra ping

I would really like to land this before the release branches off to allow people to use complex in target regions.

JonChesterfield accepted this revision.Thu, Jul 2, 5:17 PM

I think this change is good. The library story is a bit difficult, but fundamentally OpenMP needs a shim of some sort to map target math functions onto the libm of the underlying device.

For nvptx, that's the CUDA library. AMDGCN has math functions and may need another shim to map them to libm.

include_next is nasty, but that's the existing pattern for some library headers.
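The wrapper pattern referred to here looks roughly like this (a simplified sketch of the existing clang wrapper-header scheme, not the exact contents of the patch):

```cpp
// Simplified sketch of a clang wrapper header for <complex>: mark the
// device-side pieces, then forward to the real standard library header via
// #include_next, which continues the header search in the remaining
// include paths.
#pragma omp declare target
// ... device-side definitions of the __XXXXc3 helpers would go here ...
#pragma omp end declare target

#include_next <complex>
```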

clang/test/Headers/Inputs/include/complex
10

Can we #include from libc++ instead? Needs some cmake to skip the test if the library is unavailable, but it spares duplicating this class.

This revision is now accepted and ready to land.Thu, Jul 2, 5:17 PM