This is an archive of the discontinued LLVM Phabricator instance.

Differential D23627

[CUDA] Improve handling of math functions.
ClosedPublic

Authored by jlebar on Aug 17 2016, 2:44 PM.

Details

Reviewers

tra

Commits

rGcb20a09f54ef: [CUDA] Improve handling of math functions.
rC279140: [CUDA] Improve handling of math functions.
rL279140: [CUDA] Improve handling of math functions.

Summary

A bunch of related changes here to our CUDA math headers.

The second arg to nexttoward is a double (well, technically, long double, but we don't have that), not a float.

Add a forward-declare of llround(float), which is defined in the CUDA headers. We need this for the same reason we need most of the other forward-declares: To prevent a constexpr function in our standard library from becoming host+device.

Add nexttowardf implementation.

Pull "foobarf" functions defined by the CUDA headers in the global namespace into namespace std. This lets you do e.g. std::sinf.

Add overloads for math functions accepting integer types. This lets you do e.g. std::sin(0) without having an ambiguity between the overload that takes a float and the one that takes a double.

With these changes, we pass testcases derived from libc++ for cmath and
math.h. We can check these testcases in to the test-suite once support
for CUDA lands there.
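
The integer-overload item above can be illustrated with a small host-only sketch. It is not the code from this patch: the namespace `demo`, the function `my_sin`, and the local `enable_if` are hypothetical stand-ins, and `__device__` is left out so the snippet compiles with an ordinary C++ compiler. With only the `float` and `double` overloads, `my_sin(0)` would be ambiguous, because `int` converts equally well to either type. The guarded template is an exact match for integer arguments, so it wins overload resolution and forwards to the `double` overload.

```cpp
#include <cmath>
#include <limits>

namespace demo {

// Hand-rolled enable_if (hypothetical name) so the sketch also builds as C++98.
template <bool B, class T = void> struct enable_if {};
template <class T> struct enable_if<true, T> { typedef T type; };

// Stand-ins for the two overloads that already exist.
inline float my_sin(float x) { return std::sin(x); }
inline double my_sin(double x) { return std::sin(x); }

// Integer overload: participates only when T is an integer type,
// then converts the argument to double and forwards.
template <typename T>
typename enable_if<std::numeric_limits<T>::is_integer, double>::type
my_sin(T x) {
  return my_sin(static_cast<double>(x));
}

} // namespace demo
```

Calling `demo::my_sin(0)` now resolves to the template and returns a `double`, while `demo::my_sin(1.0f)` still picks the `float` overload because the template is removed from the candidate set for non-integer types.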

Diff Detail

Repository: rL LLVM

Event Timeline

jlebar updated this revision to Diff 68427 on Aug 17 2016, 2:44 PM.

jlebar retitled this revision to "[CUDA] Improve handling of math functions."

jlebar updated this object.

jlebar added a reviewer: tra.

jlebar added a subscriber: cfe-commits.

tra added inline comments on Aug 17 2016, 3:21 PM.

clang/lib/Headers/__clang_cuda_cmath.h
Lines 125–133 (on Diff #68427): You've got two identical `nexttoward(float, double)` now. Perhaps the first one was supposed to remain `nexttoward(float, float)`?
Lines 184–197 (on Diff #68427): `is_specialized` will be true for `long double` args and we'll instantiate the function. Can we/should we produce an error instead?

jlebar added inline comments on Aug 17 2016, 4:27 PM.

clang/lib/Headers/__clang_cuda_cmath.h
Lines 125–133 (on Diff #68427): It's hard to see, but one is nexttowardf.
Lines 184–197 (on Diff #68427): I think it's OK. Or at least, long double is kind of screwed up at the moment. Sometimes we pick `__host__` overloads, sometimes we pick `__device__` overloads; I made no effort to make it correct. I'm much more bullish on making use of long double a compile error as a way to solve these problems.
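
For readers following this exchange, here is a host-only sketch of the behavior under discussion. It is an illustration, not the patch itself: `demo_ld`, `my_fmax`, and the local `enable_if` are hypothetical names, and `__device__` is omitted. Because `std::numeric_limits<long double>::is_specialized` is true, a guard based on `is_specialized` accepts `long double` arguments and the function body narrows them to `double` without any diagnostic.

```cpp
#include <limits>

namespace demo_ld {

// Hand-rolled enable_if (hypothetical name), usable before C++11.
template <bool B, class T = void> struct enable_if {};
template <class T> struct enable_if<true, T> { typedef T type; };

// Stand-in for an existing double overload.
inline double my_fmax(double x, double y) { return x < y ? y : x; }

// Two-argument overload guarded by is_specialized, which is true for
// every arithmetic type, long double included.
template <typename T1, typename T2>
typename enable_if<std::numeric_limits<T1>::is_specialized &&
                       std::numeric_limits<T2>::is_specialized,
                   double>::type
my_fmax(T1 x, T2 y) {
  // A long double argument is narrowed to double here.
  return my_fmax(static_cast<double>(x), static_cast<double>(y));
}

} // namespace demo_ld
```

A call such as `demo_ld::my_fmax(1.0L, 2)` compiles and returns a `double`. Excluding `long double` from the guard alone would not turn that call into an error, since the plain `double` overload would still accept it through implicit conversion.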

LGTM, but we may want someone familiar with math library to take a look.

clang/lib/Headers/__clang_cuda_cmath.h
Lines 125–133 (on Diff #68427): Indeed, I've missed that.

This revision is now accepted and ready to land. (Aug 17 2016, 5:04 PM)

These changes have always been kind of scary. tra tested this against Thrust with all combinations of CUDA 7.0/7.5, c++98/11, libc++/libstdc++{4.8.5/4.9.3,5.3.0}. So we should be good here. I hope.

Closed by commit rL279140: [CUDA] Improve handling of math functions. (authored by jlebar) on Aug 18 2016, 1:51 PM.

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
cfe/trunk/lib/Headers/__clang_cuda_cmath.h	283 lines
cfe/trunk/lib/Headers/__clang_cuda_math_forward_declares.h	5 lines

Diff 68600

cfe/trunk/lib/Headers/__clang_cuda_cmath.h

(20 lines not shown.)
*===-----------------------------------------------------------------------===		*===-----------------------------------------------------------------------===
*/		*/
#ifndef __CLANG_CUDA_CMATH_H__		#ifndef __CLANG_CUDA_CMATH_H__
#define __CLANG_CUDA_CMATH_H__		#define __CLANG_CUDA_CMATH_H__
#ifndef __CUDA__		#ifndef __CUDA__
#error "This file is for CUDA compilation only."		#error "This file is for CUDA compilation only."
#endif		#endif

		#include <limits>

// CUDA lets us use various std math functions on the device side. This file		// CUDA lets us use various std math functions on the device side. This file
// works in concert with __clang_cuda_math_forward_declares.h to make this work.		// works in concert with __clang_cuda_math_forward_declares.h to make this work.
//		//
// Specifically, the forward-declares header declares __device__ overloads for		// Specifically, the forward-declares header declares __device__ overloads for
// these functions in the global namespace, then pulls them into namespace std		// these functions in the global namespace, then pulls them into namespace std
// with 'using' statements. Then this file implements those functions, after		// with 'using' statements. Then this file implements those functions, after
// the implementations have been pulled in.		// their implementations have been pulled in.
//		//
// It's important that we declare the functions in the global namespace and pull		// It's important that we declare the functions in the global namespace and pull
// them into namespace std with using statements, as opposed to simply declaring		// them into namespace std with using statements, as opposed to simply declaring
// these functions in namespace std, because our device functions need to		// these functions in namespace std, because our device functions need to
// overload the standard library functions, which may be declared in the global		// overload the standard library functions, which may be declared in the global
// namespace or in std, depending on the degree of conformance of the stdlib		// namespace or in std, depending on the degree of conformance of the stdlib
// implementation. Declaring in the global namespace and pulling into namespace		// implementation. Declaring in the global namespace and pulling into namespace
// std covers all of the known knowns.		// std covers all of the known knowns.
(71 lines not shown.)
__DEVICE__ bool isunordered(double __x, double __y) {		__DEVICE__ bool isunordered(double __x, double __y) {
return __builtin_isunordered(__x, __y);		return __builtin_isunordered(__x, __y);
}		}
__DEVICE__ float ldexp(float __arg, int __exp) {		__DEVICE__ float ldexp(float __arg, int __exp) {
return ::ldexpf(__arg, __exp);		return ::ldexpf(__arg, __exp);
}		}
__DEVICE__ float log(float __x) { return ::logf(__x); }		__DEVICE__ float log(float __x) { return ::logf(__x); }
__DEVICE__ float log10(float __x) { return ::log10f(__x); }		__DEVICE__ float log10(float __x) { return ::log10f(__x); }
__DEVICE__ float modf(float __x, float *__iptr) { return ::modff(__x, __iptr); }		__DEVICE__ float modf(float __x, float *__iptr) { return ::modff(__x, __iptr); }
__DEVICE__ float nexttoward(float __from, float __to) {		__DEVICE__ float nexttoward(float __from, double __to) {
return __builtin_nexttowardf(__from, __to);		return __builtin_nexttowardf(__from, __to);
}		}
__DEVICE__ double nexttoward(double __from, double __to) {		__DEVICE__ double nexttoward(double __from, double __to) {
return __builtin_nexttoward(__from, __to);		return __builtin_nexttoward(__from, __to);
}		}
		__DEVICE__ float nexttowardf(float __from, double __to) {
		return __builtin_nexttowardf(__from, __to);
		}
__DEVICE__ float pow(float __base, float __exp) {		__DEVICE__ float pow(float __base, float __exp) {
return ::powf(__base, __exp);		return ::powf(__base, __exp);
}		}
__DEVICE__ float pow(float __base, int __iexp) {		__DEVICE__ float pow(float __base, int __iexp) {
return ::powif(__base, __iexp);		return ::powif(__base, __iexp);
}		}
__DEVICE__ double pow(double __base, int __iexp) {		__DEVICE__ double pow(double __base, int __iexp) {
return ::powi(__base, __iexp);		return ::powi(__base, __iexp);
}		}
__DEVICE__ bool signbit(float __x) { return ::__signbitf(__x); }		__DEVICE__ bool signbit(float __x) { return ::__signbitf(__x); }
__DEVICE__ bool signbit(double __x) { return ::__signbit(__x); }		__DEVICE__ bool signbit(double __x) { return ::__signbit(__x); }
__DEVICE__ float sin(float __x) { return ::sinf(__x); }		__DEVICE__ float sin(float __x) { return ::sinf(__x); }
__DEVICE__ float sinh(float __x) { return ::sinhf(__x); }		__DEVICE__ float sinh(float __x) { return ::sinhf(__x); }
__DEVICE__ float sqrt(float __x) { return ::sqrtf(__x); }		__DEVICE__ float sqrt(float __x) { return ::sqrtf(__x); }
__DEVICE__ float tan(float __x) { return ::tanf(__x); }		__DEVICE__ float tan(float __x) { return ::tanf(__x); }
__DEVICE__ float tanh(float __x) { return ::tanhf(__x); }		__DEVICE__ float tanh(float __x) { return ::tanhf(__x); }

		// Now we've defined everything we promised we'd define in
		// __clang_cuda_math_forward_declares.h. We need to do two additional things to
		// fix up our math functions.
		//
		// 1) Define __device__ overloads for e.g. sin(int). The CUDA headers define
		// only sin(float) and sin(double), which means that e.g. sin(0) is
		// ambiguous.
		//
		// 2) Pull the __device__ overloads of "foobarf" math functions into namespace
		// std. These are defined in the CUDA headers in the global namespace,
		// independent of everything else we've done here.

		// We can't use std::enable_if, because we want to be pre-C++11 compatible. But
		// we go ahead and unconditionally define functions that are only available when
		// compiling for C++11 to match the behavior of the CUDA headers.
		template<bool __B, class __T = void>
		struct __clang_cuda_enable_if {};

		template <class __T> struct __clang_cuda_enable_if<true, __T> {
		typedef __T type;
		};

		// Defines an overload of __fn that accepts one integral argument, calls
		// __fn((double)x), and returns __retty.
		#define __CUDA_CLANG_FN_INTEGER_OVERLOAD_1(__retty, __fn) \
		template <typename __T> \
		__DEVICE__ \
		typename __clang_cuda_enable_if<std::numeric_limits<__T>::is_integer, \
		__retty>::type \
		__fn(__T __x) { \
		return ::__fn((double)__x); \
		}

		// Defines an overload of __fn that accepts two arithmetic arguments, calls
		// __fn((double)x, (double)y), and returns a double.
		//
		// Note this is different from OVERLOAD_1, which generates an overload that
		// accepts only integral arguments.
		#define __CUDA_CLANG_FN_INTEGER_OVERLOAD_2(__retty, __fn) \
		template <typename __T1, typename __T2> \
		__DEVICE__ typename __clang_cuda_enable_if< \
		std::numeric_limits<__T1>::is_specialized && \
		std::numeric_limits<__T2>::is_specialized, \
		__retty>::type \
		__fn(__T1 __x, __T2 __y) { \
		return __fn((double)__x, (double)__y); \
		}

		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, acos)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, acosh)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, asin)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, asinh)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, atan)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_2(double, atan2);
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, atanh)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, cbrt)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, ceil)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_2(double, copysign);
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, cos)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, cosh)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, erf)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, erfc)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, exp)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, exp2)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, expm1)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, fabs)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_2(double, fdim);
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, floor)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_2(double, fmax);
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_2(double, fmin);
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_2(double, fmod);
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(int, fpclassify)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_2(double, hypot);
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(int, ilogb)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(bool, isfinite)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_2(bool, isgreater);
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_2(bool, isgreaterequal);
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(bool, isinf);
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_2(bool, isless);
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_2(bool, islessequal);
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_2(bool, islessgreater);
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(bool, isnan);
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(bool, isnormal)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_2(bool, isunordered);
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, lgamma)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, log)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, log10)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, log1p)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, log2)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, logb)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(long long, llrint)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(long long, llround)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(long, lrint)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(long, lround)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, nearbyint);
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_2(double, nextafter);
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_2(double, pow);
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_2(double, remainder);
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, rint);
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, round);
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(bool, signbit)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, sin)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, sinh)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, sqrt)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, tan)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, tanh)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, tgamma)
		__CUDA_CLANG_FN_INTEGER_OVERLOAD_1(double, trunc);

		#undef __CUDA_CLANG_FN_INTEGER_OVERLOAD_1
		#undef __CUDA_CLANG_FN_INTEGER_OVERLOAD_2

		// Overloads for functions that don't match the patterns expected by
		// __CUDA_CLANG_FN_INTEGER_OVERLOAD_{1,2}.
		template <typename __T1, typename __T2, typename __T3>
		__DEVICE__ typename __clang_cuda_enable_if<
		std::numeric_limits<__T1>::is_specialized &&
		std::numeric_limits<__T2>::is_specialized &&
		std::numeric_limits<__T3>::is_specialized,
		double>::type
		fma(__T1 __x, __T2 __y, __T3 __z) {
		return std::fma((double)__x, (double)__y, (double)__z);
		}

		template <typename __T>
		__DEVICE__ typename __clang_cuda_enable_if<std::numeric_limits<__T>::is_integer,
		double>::type
		frexp(__T __x, int *__exp) {
		return std::frexp((double)__x, __exp);
		}

		template <typename __T>
		__DEVICE__ typename __clang_cuda_enable_if<std::numeric_limits<__T>::is_integer,
		double>::type
		ldexp(__T __x, int __exp) {
		return std::ldexp((double)__x, __exp);
		}

		template <typename __T>
		__DEVICE__ typename __clang_cuda_enable_if<std::numeric_limits<__T>::is_integer,
		double>::type
		nexttoward(__T __from, double __to) {
		return std::nexttoward((double)__from, __to);
		}

		template <typename __T1, typename __T2>
		__DEVICE__ typename __clang_cuda_enable_if<
		std::numeric_limits<__T1>::is_specialized &&
		std::numeric_limits<__T2>::is_specialized,
		double>::type
		remquo(__T1 __x, __T2 __y, int *__quo) {
		return std::remquo((double)__x, (double)__y, __quo);
		}

		template <typename __T>
		__DEVICE__ typename __clang_cuda_enable_if<std::numeric_limits<__T>::is_integer,
		double>::type
		scalbln(__T __x, long __exp) {
		return std::scalbln((double)__x, __exp);
		}

		template <typename __T>
		__DEVICE__ typename __clang_cuda_enable_if<std::numeric_limits<__T>::is_integer,
		double>::type
		scalbn(__T __x, int __exp) {
		return std::scalbn((double)__x, __exp);
		}

		namespace std {
		// Pull the new overloads we defined above into namespace std.
		using ::acos;
		using ::acosh;
		using ::asin;
		using ::asinh;
		using ::atan;
		using ::atan2;
		using ::atanh;
		using ::cbrt;
		using ::ceil;
		using ::cos;
		using ::cosh;
		using ::erf;
		using ::erfc;
		using ::exp;
		using ::exp2;
		using ::expm1;
		using ::fabs;
		using ::floor;
		using ::frexp;
		using ::ilogb;
		using ::ldexp;
		using ::lgamma;
		using ::llrint;
		using ::llround;
		using ::log;
		using ::log10;
		using ::log1p;
		using ::log2;
		using ::logb;
		using ::lrint;
		using ::lround;
		using ::nexttoward;
		using ::pow;
		using ::remquo;
		using ::scalbln;
		using ::scalbn;
		using ::sin;
		using ::sinh;
		using ::sqrt;
		using ::tan;
		using ::tanh;
		using ::tgamma;

		// Finally, pull the "foobarf" functions that CUDA defines in its headers into
		// namespace std.
		using ::acosf;
		using ::acoshf;
		using ::asinf;
		using ::asinhf;
		using ::atan2f;
		using ::atanf;
		using ::atanhf;
		using ::cbrtf;
		using ::ceilf;
		using ::copysignf;
		using ::cosf;
		using ::coshf;
		using ::erfcf;
		using ::erff;
		using ::exp2f;
		using ::expf;
		using ::expm1f;
		using ::fabsf;
		using ::fdimf;
		using ::floorf;
		using ::fmaf;
		using ::fmaxf;
		using ::fminf;
		using ::fmodf;
		using ::frexpf;
		using ::hypotf;
		using ::ilogbf;
		using ::ldexpf;
		using ::lgammaf;
		using ::llrintf;
		using ::llroundf;
		using ::log10f;
		using ::log1pf;
		using ::log2f;
		using ::logbf;
		using ::logf;
		using ::lrintf;
		using ::lroundf;
		using ::modff;
		using ::nearbyintf;
		using ::nextafterf;
		using ::nexttowardf;
		using ::nexttowardf;
		using ::powf;
		using ::remainderf;
		using ::remquof;
		using ::rintf;
		using ::roundf;
		using ::scalblnf;
		using ::scalbnf;
		using ::sinf;
		using ::sinhf;
		using ::sqrtf;
		using ::tanf;
		using ::tanhf;
		using ::tgammaf;
		using ::truncf;
		}

#undef __DEVICE__		#undef __DEVICE__

#endif		#endif

cfe/trunk/lib/Headers/__clang_cuda_math_forward_declares.h

(134 lines not shown.)
	__DEVICE__ double logb(double);			__DEVICE__ double logb(double);
	__DEVICE__ float logb(float);			__DEVICE__ float logb(float);
	__DEVICE__ double log(double);			__DEVICE__ double log(double);
	__DEVICE__ float log(float);			__DEVICE__ float log(float);
	__DEVICE__ long lrint(double);			__DEVICE__ long lrint(double);
	__DEVICE__ long lrint(float);			__DEVICE__ long lrint(float);
	__DEVICE__ long lround(double);			__DEVICE__ long lround(double);
	__DEVICE__ long lround(float);			__DEVICE__ long lround(float);
				__DEVICE__ long long llround(float); // No llround(double).
	__DEVICE__ double modf(double, double *);			__DEVICE__ double modf(double, double *);
	__DEVICE__ float modf(float, float *);			__DEVICE__ float modf(float, float *);
	__DEVICE__ double nan(const char *);			__DEVICE__ double nan(const char *);
	__DEVICE__ float nanf(const char *);			__DEVICE__ float nanf(const char *);
	__DEVICE__ double nearbyint(double);			__DEVICE__ double nearbyint(double);
	__DEVICE__ float nearbyint(float);			__DEVICE__ float nearbyint(float);
	__DEVICE__ double nextafter(double, double);			__DEVICE__ double nextafter(double, double);
	__DEVICE__ float nextafter(float, float);			__DEVICE__ float nextafter(float, float);
	__DEVICE__ double nexttoward(double, double);			__DEVICE__ double nexttoward(double, double);
	__DEVICE__ float nexttoward(float, float);			__DEVICE__ float nexttoward(float, double);
				__DEVICE__ float nexttowardf(float, double);
	__DEVICE__ double pow(double, double);			__DEVICE__ double pow(double, double);
	__DEVICE__ double pow(double, int);			__DEVICE__ double pow(double, int);
	__DEVICE__ float pow(float, float);			__DEVICE__ float pow(float, float);
	__DEVICE__ float pow(float, int);			__DEVICE__ float pow(float, int);
	__DEVICE__ double remainder(double, double);			__DEVICE__ double remainder(double, double);
	__DEVICE__ float remainder(float, float);			__DEVICE__ float remainder(float, float);
	__DEVICE__ double remquo(double, double, int *);			__DEVICE__ double remquo(double, double, int *);
	__DEVICE__ float remquo(float, float, int *);			__DEVICE__ float remquo(float, float, int *);
(69 lines not shown.)
	using ::llrint;			using ::llrint;
	using ::log;			using ::log;
	using ::log10;			using ::log10;
	using ::log1p;			using ::log1p;
	using ::log2;			using ::log2;
	using ::logb;			using ::logb;
	using ::lrint;			using ::lrint;
	using ::lround;			using ::lround;
				using ::llround;
	using ::modf;			using ::modf;
	using ::nan;			using ::nan;
	using ::nanf;			using ::nanf;
	using ::nearbyint;			using ::nearbyint;
	using ::nextafter;			using ::nextafter;
	using ::nexttoward;			using ::nexttoward;
	using ::pow;			using ::pow;
	using ::remainder;			using ::remainder;
(18 lines not shown.)