This is an archive of the discontinued LLVM Phabricator instance.


Differential D90409

[HIP] Math Headers to use type promotion
Closed, Public

Authored by ashi1 on Oct 29 2020, 10:23 AM.

Details

Reviewers

yaxunl
tra

Commits

rGca5b31502c82: [HIP] Math Headers to use type promotion

Summary

Similar to libcxx implementation of cmath function
overloads, use type promotion templates to determine
return types of multi-argument math functions.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ashi1 requested review of this revision. Oct 29 2020, 10:23 AM

ashi1 created this revision.

I'm not quite sure what is the problem this patch is intended to solve. Could you give me more details?

clang/lib/Headers/__clang_hip_cmath.h
Line 291: Will the automatically derived return type ever be different from the `__retty` we're explicitly specifying now? If the answer is `no`, then what does the patch buy us here? It looks like a more complicated way to do what we're already doing. If `yes`, this will potentially create observable differences in code behavior before/after -std=c++11.

In D90409#2362554, @tra wrote:

I'm not quite sure what is the problem this patch is intended to solve. Could you give me more details?

@tra, a problem arose with the fma function. When given fma(float, float, char), it was returning a double type. Instead, we want to be more similar to C++ and return the promoted type which is float in this case.
This patch tries to fix a few failures I introduced with my recent HIP header refactoring patch.

Also, HOLD on this patch. I just found another bug. Will update soon.

clang/lib/Headers/__clang_hip_cmath.h
Line 291: Yes, sometimes the return type should be float when called by float mixed with char/int/short args.

In D90409#2362897, @ashi1 wrote:

@tra, a problem arose with the fma function. When given fma(float, float, char), it was returning a double type. Instead, we want to be more similar to C++ and return the promoted type which is float in this case.
This patch tries to fix a few failures I introduced with my recent HIP header refactoring patch.

That is odd. char should've been promoted to float, fma(float, float, float) should've been called, and this patch should not have been necessary. https://cppinsights.io/s/7cdd71b7
If that's the case, then this patch may not do the right thing either -- it would force the arguments to the derived result type, but if fma(double) is the only choice, the arguments will be implicitly converted to double and back which is probably not what you want.

Perhaps the problem is that fma(float, float, float) is not visible at the point where the overload resolution happens.
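
To illustrate the overload-resolution argument above, here is a minimal host-side sketch. It assumes that only two plain overloads are visible; `my_fma` is a hypothetical stand-in, not the HIP or libcxx function. With no template overload in play, a (float, float, char) call selects the float overload, because the first two arguments match it exactly. The sketch does not model any promoting template that a standard library may add on top.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the float and double fma overloads.
// They are plain functions, not templates, and are not the real fma.
float my_fma(float x, float y, float z) { return x * y + z; }
double my_fma(double x, double y, double z) { return x * y + z; }

// (float, float, char): both candidates need a conversion for the char,
// but only the float overload matches the first two arguments exactly,
// so it is the better candidate and the call returns float.
static_assert(
    std::is_same<decltype(my_fma(1.0f, 2.0f, 'c')), float>::value,
    "plain overload resolution picks the float overload");

// (double, float, char): the first argument now matches the double
// overload exactly, while the second matches the float overload exactly,
// so neither candidate is better and such a call would be ambiguous.
// It is left out of the sketch for that reason.
```

If the float overload were not declared at the point of the call, the same expression would fall back to the double overload, which is the visibility concern raised above.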

Also, HOLD on this patch. I just found another bug. Will update soon.

In D90409#2363044, @tra wrote:

That is odd. char should've been promoted to float, fma(float, float, float) should've been called, and this patch should not have been necessary. https://cppinsights.io/s/7cdd71b7
If that's the case, then this patch may not do the right thing either -- it would force the arguments to the derived result type, but if fma(double) is the only choice, the arguments will be implicitly converted to double and back which is probably not what you want.

Perhaps the problem is that fma(float, float, float) is not visible at the point where the overload resolution happens.

Sorry, the original HIP test may be invalid. It compared the result of (float, float, X) with the float result, and the result of (double, double, X) with the double result, so it expected (float, float, char) to equal (float, float, float). However, looking at the libcxx implementation, the return type of (float, float, char) should instead be promoted to double. I think HIP wants to follow the libcxx implementation of type promotion closely:
https://github.com/llvm/llvm-project/blob/master/libcxx/include/math.h#L1200

libcxx changes the return type to double if any argument has one of the types char/int/unsigned/long/ulong/longlong/ulonglong/double:

static void __test(...);
static float __test(float);
static double __test(char);
static double __test(int);
static double __test(unsigned);
static double __test(long);
static double __test(unsigned long);
static double __test(long long);
static double __test(unsigned long long);
static double __test(double);

I tried the previous HIP FMA test, and it looks like libcxx's cmath is expecting fma(float, float, char) to be promoted to (double, double, double) and return type double:
https://cppinsights.io/s/ee45a5ca
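
The overload-set trick quoted above can be tried in isolation. The following is a minimal sketch, not the libcxx code: `numeric_type_sketch` and `promote3_sketch` are hypothetical names, and the overload list is copied from the excerpt above (so it has no long double entry). Each argument type is first mapped through the overload set, then the three mapped types are combined with `+` inside `decltype`.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical re-creation of the quoted overload set. The return type of
// the overload chosen for T is the "numeric type" of T.
template <class T> struct numeric_type_sketch {
  static void test(...);
  static float test(float);
  static double test(char);
  static double test(int);
  static double test(unsigned);
  static double test(long);
  static double test(unsigned long);
  static double test(long long);
  static double test(unsigned long long);
  static double test(double);

  typedef decltype(test(std::declval<T>())) type;
};

// Hypothetical three-argument promotion: map each argument, then let the
// usual arithmetic conversions of operator+ choose the common type.
template <class A, class B, class C> struct promote3_sketch {
  typedef typename numeric_type_sketch<A>::type type1;
  typedef typename numeric_type_sketch<B>::type type2;
  typedef typename numeric_type_sketch<C>::type type3;
  typedef decltype(type1() + type2() + type3()) type;
};

static_assert(std::is_same<numeric_type_sketch<float>::type, float>::value,
              "float stays float");
static_assert(std::is_same<numeric_type_sketch<char>::type, double>::value,
              "char maps to double");
static_assert(std::is_same<numeric_type_sketch<void *>::type, void>::value,
              "non-arithmetic types fall through to the (...) overload");
static_assert(
    std::is_same<promote3_sketch<float, float, float>::type, float>::value,
    "all-float arguments give float");
static_assert(
    std::is_same<promote3_sketch<float, float, char>::type, double>::value,
    "one integral argument promotes the result to double");
```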

Revised the patch to match libcxx, fixed a bug in return type resolution, and ran clang-format on this patch.

I was confused about type conversion. Apparently the standard library says:

If any argument has integral type, it is cast to double
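
This rule can be checked on the host with the standard `<cmath>` overloads. The snippet below is a small sketch that assumes a hosted C++11 (or later) standard library; `fma_mixed` is a hypothetical helper introduced only for the example.

```cpp
#include <cmath>
#include <type_traits>

// All-float arguments select the float overload.
static_assert(
    std::is_same<decltype(std::fma(1.0f, 2.0f, 3.0f)), float>::value,
    "fma(float, float, float) returns float");

// An integral argument is treated as double, so the result is double.
static_assert(
    std::is_same<decltype(std::fma(1.0f, 2.0f, 'c')), double>::value,
    "fma(float, float, char) returns double");
static_assert(std::is_same<decltype(std::fma(1.0f, 2.0f, 3)), double>::value,
              "fma(float, float, int) returns double");

// Hypothetical helper: computes 2 * 3 + 4 through the promoting overload.
double fma_mixed() { return std::fma(2.0f, 3.0f, 4); }
```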

LGTM. I think the change would make sense for CUDA, too. @jlebar - WDYT?

This revision is now accepted and ready to land. Nov 3 2020, 9:42 AM

Closed by commit rGca5b31502c82: [HIP] Math Headers to use type promotion (authored by ashi1). Nov 3 2020, 10:41 AM

This revision was automatically updated to reflect the committed changes.

ashi1 added a commit: rGca5b31502c82: [HIP] Math Headers to use type promotion.

Herald added a project: Restricted Project. Nov 3 2020, 10:41 AM

Herald added a subscriber: cfe-commits.

LGTM. I think the change would make sense for CUDA, too. @jlebar - WDYT?

I agree that the C and C++ standard libraries should behave the same in CUDA mode and host mode!

But if doing so would make our behavior different than nvcc's, maybe we could emit a warning or something? Like, "this code you wrote maybe for nvcc is going to do something different with clang."

nvcc does not support fma(float,float,char)

https://godbolt.org/z/zxbMhP

clang's behavior was different from nvcc already.

In D90409#2371679, @jlebar wrote:

I agree that the C and C++ standard libraries should behave the same in CUDA mode and host mode!

But if doing so would make our behavior different than nvcc's, maybe we could emit a warning or something? Like, "this code you wrote maybe for nvcc is going to do something different with clang."

Interestingly enough CUDA 10.1+ already promotes integer fma() arguments to double:
https://godbolt.org/z/crbqTe

I wonder what makes HIP different to require this change.

In D90409#2371969, @yaxunl wrote:

nvcc does not support fma(float,float,char)

It does, it just needs an explicit flag to match clang's treatment of constexpr functions as HD.

In D90409#2371987, @tra wrote:

It does, it just needs an explicit flag to match clang's treatment of constexpr functions as HD.

In D90409#2371972, @tra wrote:

Interestingly enough CUDA 10.1+ already promotes integer fma() arguments to double:
https://godbolt.org/z/crbqTe

I wonder what makes HIP different to require this change.

Practically the behavior is the same since they all promote integer types to double. This matches the C++ behavior. However the HIP change will make it conform to C++ for a target supporting long double whereas the previous header did not.

In D90409#2372023, @yaxunl wrote:

Practically the behavior is the same since they all promote integer types to double. This matches the C++ behavior. However the HIP change will make it conform to C++ for a target supporting long double whereas the previous header did not.

Sorry I mean the change can make the header extendable to long double easily although it does not yet. Another thing is that it allows resolution of mixed argument types with _Float16.
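
The extensibility point can be sketched on the host. The example below is hypothetical and reduced: `numeric_type_ld_sketch` and `promote2_ld_sketch` are illustrative names, the overload set is cut down to a few entries, and the long double overload is an assumed extension that, per the comment above, the header does not have yet. It only shows that adding one overload is enough for the `decltype`-based promotion to start yielding long double.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical, reduced overload set with one extra long double entry.
template <class T> struct numeric_type_ld_sketch {
  static void test(...);
  static float test(float);
  static double test(int);
  static double test(double);
  static long double test(long double); // assumed extension, one added line

  typedef decltype(test(std::declval<T>())) type;
};

// Hypothetical two-argument promotion using operator+ inside decltype.
template <class A, class B> struct promote2_ld_sketch {
  typedef typename numeric_type_ld_sketch<A>::type type1;
  typedef typename numeric_type_ld_sketch<B>::type type2;
  typedef decltype(type1() + type2()) type;
};

static_assert(
    std::is_same<promote2_ld_sketch<float, long double>::type,
                 long double>::value,
    "mixing in long double promotes the result to long double");
static_assert(
    std::is_same<promote2_ld_sketch<int, float>::type, double>::value,
    "integral arguments still promote to double");
static_assert(
    std::is_same<promote2_ld_sketch<float, float>::type, float>::value,
    "all-float arguments stay float");
```

A fixed `double` return type, as in the pre-C++11 branch of the header, could not express the first case without a separate overload.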

In D90409#2372042, @yaxunl wrote:

Practically the behavior is the same since they all promote integer types to double. This matches the C++ behavior. However the HIP change will make it conform to C++ for a target supporting long double whereas the previous header did not.

Sorry I mean the change can make the header extendable to long double easily although it does not yet. Another thing is that it allows resolution of mixed argument types with _Float16.

OK. This makes more sense now. Thank you for the explanation.

While this does solve one particular instance of the issue, we can't just copy/paste bits of the standard library forever. We need something more robust.
NVIDIA now has their own fork of the standard library https://github.com/NVIDIA/libcudacxx and that may be a good starting point.
I think at some point we (HIP & CUDA owners) need to talk to libc++ maintainers and see if we can find a better way to extend the standard library to CUDA/HIP.

In D90409#2372183, @tra wrote:

OK. This makes more sense now. Thank you for the explanation.

While this does solve one particular instance of the issue, we can't just copy/paste bits of the standard library forever. We need something more robust.
NVIDIA now has their own fork of the standard library https://github.com/NVIDIA/libcudacxx and that may be a good starting point.
I think at some point we (HIP & CUDA owners) need to talk to libc++ maintainers and see if we can find a better way to extend the standard library to CUDA/HIP.

Agreed. Seamless native libc++ support for CUDA/HIP would be very attractive, even if only partial, with at least the math functions to start with.

Revision Contents

Path: clang/lib/Headers/__clang_hip_cmath.h
Size: 106 lines

Diff 302619

clang/lib/Headers/__clang_hip_cmath.h

[10 lines not shown]
 #define __CLANG_HIP_CMATH_H__

 #if !defined(__HIP__)
 #error "This file is for HIP and OpenMP AMDGCN device compilation only."
 #endif

 #if defined(__cplusplus)
 #include <limits>
+#include <type_traits>
+#include <utility>
 #endif
 #include <limits.h>
 #include <stdint.h>

 #pragma push_macro("__DEVICE__")
 #define __DEVICE__ static __device__ inline __attribute__((always_inline))

 // Start with functions that cannot be defined by DEF macros below.
[173 lines not shown]
 #pragma push_macro("__HIP_OVERLOAD1")
 #pragma push_macro("__HIP_OVERLOAD2")

 // __hip_enable_if::type is a type function which returns __T if __B is true.
 template <bool __B, class __T = void> struct __hip_enable_if {};

 template <class __T> struct __hip_enable_if<true, __T> { typedef __T type; };

+// decltype is only available in C++11 and above.
+#if __cplusplus >= 201103L
+// __hip_promote
+namespace __hip {
+
+template <class _Tp> struct __numeric_type {
+  static void __test(...);
+  static _Float16 __test(_Float16);
+  static float __test(float);
+  static double __test(char);
+  static double __test(int);
+  static double __test(unsigned);
+  static double __test(long);
+  static double __test(unsigned long);
+  static double __test(long long);
+  static double __test(unsigned long long);
+  static double __test(double);
+
+  typedef decltype(__test(std::declval<_Tp>())) type;
+  static const bool value = !std::is_same<type, void>::value;
+};
+
+template <> struct __numeric_type<void> { static const bool value = true; };
+
+template <class _A1, class _A2 = void, class _A3 = void,
+          bool = __numeric_type<_A1>::value &&__numeric_type<_A2>::value
+              &&__numeric_type<_A3>::value>
+class __promote_imp {
+public:
+  static const bool value = false;
+};
+
+template <class _A1, class _A2, class _A3>
+class __promote_imp<_A1, _A2, _A3, true> {
+private:
+  typedef typename __promote_imp<_A1>::type __type1;
+  typedef typename __promote_imp<_A2>::type __type2;
+  typedef typename __promote_imp<_A3>::type __type3;
+
+public:
+  typedef decltype(__type1() + __type2() + __type3()) type;
+  static const bool value = true;
+};
+
+template <class _A1, class _A2> class __promote_imp<_A1, _A2, void, true> {
+private:
+  typedef typename __promote_imp<_A1>::type __type1;
+  typedef typename __promote_imp<_A2>::type __type2;
+
+public:
+  typedef decltype(__type1() + __type2()) type;
+  static const bool value = true;
+};
+
+template <class _A1> class __promote_imp<_A1, void, void, true> {
+public:
+  typedef typename __numeric_type<_A1>::type type;
+  static const bool value = true;
+};
+
+template <class _A1, class _A2 = void, class _A3 = void>
+class __promote : public __promote_imp<_A1, _A2, _A3> {};
+
+} // namespace __hip
+#endif //__cplusplus >= 201103L

 // __HIP_OVERLOAD1 is used to resolve function calls with integer argument to
 // avoid compilation error due to ambibuity. e.g. floor(5) is resolved with
 // floor(double).
 #define __HIP_OVERLOAD1(__retty, __fn) \
   template <typename __T> \
   __DEVICE__ typename __hip_enable_if<std::numeric_limits<__T>::is_integer, \
                                       __retty>::type \
   __fn(__T __x) { \
     return ::__fn((double)__x); \
   }

 // __HIP_OVERLOAD2 is used to resolve function calls with mixed float/double
 // or integer argument to avoid compilation error due to ambibuity. e.g.
 // max(5.0f, 6.0) is resolved with max(double, double).
+#if __cplusplus >= 201103L
+#define __HIP_OVERLOAD2(__retty, __fn) \
+  template <typename __T1, typename __T2> \
+  __DEVICE__ typename __hip_enable_if< \
+      std::numeric_limits<__T1>::is_specialized && \
+          std::numeric_limits<__T2>::is_specialized, \
+      typename __hip::__promote<__T1, __T2>::type>::type \
+  __fn(__T1 __x, __T2 __y) { \
+    typedef typename __hip::__promote<__T1, __T2>::type __result_type; \
+    return __fn((__result_type)__x, (__result_type)__y); \
+  }
+#else
 #define __HIP_OVERLOAD2(__retty, __fn) \
   template <typename __T1, typename __T2> \
   __DEVICE__ \
       typename __hip_enable_if<std::numeric_limits<__T1>::is_specialized && \
                                    std::numeric_limits<__T2>::is_specialized, \
                                __retty>::type \
       __fn(__T1 __x, __T2 __y) { \
     return __fn((double)__x, (double)__y); \
   }
+#endif

 __HIP_OVERLOAD1(double, abs)
 __HIP_OVERLOAD1(double, acos)
 __HIP_OVERLOAD1(double, acosh)
 __HIP_OVERLOAD1(double, asin)
 __HIP_OVERLOAD1(double, asinh)
 __HIP_OVERLOAD1(double, atan)
 __HIP_OVERLOAD2(double, atan2)
[52 lines not shown]
 __HIP_OVERLOAD1(double, tgamma)
 __HIP_OVERLOAD1(double, trunc)

 // Overload these but don't add them to std, they are not part of cmath.
 __HIP_OVERLOAD2(double, max)
 __HIP_OVERLOAD2(double, min)

 // Additional Overloads that don't quite match HIP_OVERLOAD.
+#if __cplusplus >= 201103L
+template <typename __T1, typename __T2, typename __T3>
+__DEVICE__ typename __hip_enable_if<
+    std::numeric_limits<__T1>::is_specialized &&
+        std::numeric_limits<__T2>::is_specialized &&
+        std::numeric_limits<__T3>::is_specialized,
+    typename __hip::__promote<__T1, __T2, __T3>::type>::type
+fma(__T1 __x, __T2 __y, __T3 __z) {
+  typedef typename __hip::__promote<__T1, __T2, __T3>::type __result_type;
+  return ::fma((__result_type)__x, (__result_type)__y, (__result_type)__z);
+}
+#else
 template <typename __T1, typename __T2, typename __T3>
 __DEVICE__
     typename __hip_enable_if<std::numeric_limits<__T1>::is_specialized &&
                                  std::numeric_limits<__T2>::is_specialized &&
                                  std::numeric_limits<__T3>::is_specialized,
                              double>::type
     fma(__T1 __x, __T2 __y, __T3 __z) {
   return ::fma((double)__x, (double)__y, (double)__z);
 }
+#endif

 template <typename __T>
 __DEVICE__
     typename __hip_enable_if<std::numeric_limits<__T>::is_integer, double>::type
     frexp(__T __x, int *__exp) {
   return ::frexp((double)__x, __exp);
 }

 template <typename __T>
 __DEVICE__
     typename __hip_enable_if<std::numeric_limits<__T>::is_integer, double>::type
     ldexp(__T __x, int __exp) {
   return ::ldexp((double)__x, __exp);
 }

 template <typename __T>
 __DEVICE__
     typename __hip_enable_if<std::numeric_limits<__T>::is_integer, double>::type
     modf(__T __x, double *__exp) {
   return ::modf((double)__x, __exp);
 }

+#if __cplusplus >= 201103L
+template <typename __T1, typename __T2>
+__DEVICE__
+    typename __hip_enable_if<std::numeric_limits<__T1>::is_specialized &&
+                                 std::numeric_limits<__T2>::is_specialized,
+                             typename __hip::__promote<__T1, __T2>::type>::type
+    remquo(__T1 __x, __T2 __y, int *__quo) {
+  typedef typename __hip::__promote<__T1, __T2>::type __result_type;
+  return ::remquo((__result_type)__x, (__result_type)__y, __quo);
+}
+#else
 template <typename __T1, typename __T2>
 __DEVICE__
     typename __hip_enable_if<std::numeric_limits<__T1>::is_specialized &&
                                  std::numeric_limits<__T2>::is_specialized,
                              double>::type
     remquo(__T1 __x, __T2 __y, int *__quo) {
   return ::remquo((double)__x, (double)__y, __quo);
 }
+#endif

 template <typename __T>
 __DEVICE__
     typename __hip_enable_if<std::numeric_limits<__T>::is_integer, double>::type
     scalbln(__T __x, long int __exp) {
   return ::scalbln((double)__x, __exp);
 }

[176 lines not shown]