This is an archive of the discontinued LLVM Phabricator instance.

[HIP] Support <functional> in device code
Needs Review · Public

Authored by yaxunl on May 14 2021, 9:10 AM.

Details

Reviewers
tra
Summary

This patch adds wrapper headers for <functional>
and a few other headers that are required to support
<functional>.

The basic idea is to use pragmas to make the template
functions defined in these headers __host__ __device__.

Since this only works for libc++, the code is conditioned
on libc++ only. For libstdc++ this patch is NFC.
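
A rough sketch of how such a wrapper header could work (the file layout and guard name below are illustrative, not the literal contents of this patch, which additionally restricts the pragmas to libc++):

  // Hypothetical wrapper for <functional>, found before the real header via
  // clang's resource include directory.
  #ifndef __CLANG_HIP_WRAPPER_FUNCTIONAL
  #define __CLANG_HIP_WRAPPER_FUNCTIONAL

  #if defined(__HIP__)
  // Treat the functions defined by the wrapped header as implicitly
  // __host__ __device__.
  #pragma clang force_cuda_host_device begin
  #endif

  #include_next <functional>

  #if defined(__HIP__)
  #pragma clang force_cuda_host_device end
  #endif

  #endif // __CLANG_HIP_WRAPPER_FUNCTIONAL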

A test is added to llvm-test-suite for testing this:
https://reviews.llvm.org/D102508

Diff Detail

Event Timeline

yaxunl created this revision. May 14 2021, 9:10 AM
yaxunl requested review of this revision. May 14 2021, 9:10 AM
yaxunl edited the summary of this revision. May 14 2021, 9:22 AM
tra added a subscriber: rsmith. May 14 2021, 9:54 AM

In effect this patch applies __host__ __device__ to a subset of the standard library headers and whatever headers *they* happen to include. While it may happen to work, I'm not at all confident that it does not create interesting issues.

Considering that the patch only works with libc++ anyways, perhaps it's time to make (parts) of libc++ itself usable from CUDA/HIP, instead of hacking around it in the wrappers?

@rsmith Richard, who would be the right person to discuss the standard library changes we may need?

In effect this patch applies __host__ __device__ to a subset of the standard library headers and whatever headers *they* happen to include. While it may happen to work, I'm not at all confident that it does not create interesting issues.

Considering that the patch only works with libc++ anyways, perhaps it's time to make (parts) of libc++ itself usable from CUDA/HIP, instead of hacking around it in the wrappers?

@rsmith Richard, who would be the right person to discuss the standard library changes we may need?

ping.

If we are allowed to make changes to libc++, we may have a cleaner implementation for supporting libc++ in HIP device functions.

Currently, libc++ functions are host functions by default, except for constexpr functions. As a result, non-constexpr libc++ host functions cannot be called from HIP device functions. Our goal is to make libc++ functions __host__ __device__ functions so that they can be called from HIP device functions. We may not be able to support all libc++ functions, e.g. file I/O and threads, but at least we should be able to support some of them, e.g. type_traits, functional, and containers. We do this by supporting the underlying functions, e.g. malloc/free, on the device.

The change will be NFC for other languages.

rsmith added a subscriber: ldionne. Jun 1 2021, 1:35 PM
rsmith added a comment. Jun 1 2021, 1:40 PM

@ldionne How should we go about establishing whether libc++ would be prepared to officially support CUDA? Right now, Clang's CUDA support is patching in attributes onto libc++ functions from the outside, which doesn't seem like a sustainable model.

@ldionne How should we go about establishing whether libc++ would be prepared to officially support CUDA? Right now, Clang's CUDA support is patching in attributes onto libc++ functions from the outside, which doesn't seem like a sustainable model.

ping

@ldionne How should we go about establishing whether libc++ would be prepared to officially support CUDA? Right now, Clang's CUDA support is patching in attributes onto libc++ functions from the outside, which doesn't seem like a sustainable model.

ping

If the current approach is to patch libc++ from the outside, then yeah, that's most definitely not a great design IMO. It's going to be very brittle. I think it *may* be reasonable to support this in libc++, but I'd like to see some sort of basic explanation of what the changes would be so we can have a discussion and make our mind up about whether we can support this, and what's the best way of doing it.

yaxunl added a comment (edited). Jun 24 2021, 8:59 AM

@ldionne How should we go about establishing whether libc++ would be prepared to officially support CUDA? Right now, Clang's CUDA support is patching in attributes onto libc++ functions from the outside, which doesn't seem like a sustainable model.

ping

If the current approach is to patch libc++ from the outside, then yeah, that's most definitely not a great design IMO. It's going to be very brittle. I think it *may* be reasonable to support this in libc++, but I'd like to see some sort of basic explanation of what the changes would be so we can have a discussion and make our mind up about whether we can support this, and what's the best way of doing it.

Thanks Louis. Please allow me to give a brief explanation of our plan to support libc++ for HIP device compilation.

HIP functions can have __device__, __host__, or __device__ __host__ attributes, indicating the target of a function. A __device__ function can only be executed on the device (GPU). A __host__ function can only be executed on the host. A __device__ __host__ function can be executed on both device and host. By default (without explicit device/host attributes) a non-constexpr function is a host function, and a constexpr function is a __device__ __host__ function. This also applies to member functions of a class. Clang is able to resolve overloaded functions that differ only in their device/host attributes.
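
For illustration, a minimal HIP snippet showing these rules (the functions are hypothetical, not code from the patch):

  #include <hip/hip_runtime.h>

  __device__ int device_only() { return 1; }          // callable from device code only
  __host__ int host_only() { return 2; }              // callable from host code only
  __host__ __device__ int both_sides() { return 3; }  // callable from either side

  int implicit_host() { return 4; }                   // no attribute: __host__ by default
  constexpr int implicit_both() { return 5; }         // constexpr: __host__ __device__ by default

  // Overloads may differ only in their target attributes; clang resolves the
  // call to the variant that matches the side being compiled.
  __host__ int answer() { return 42; }
  __device__ int answer() { return 43; }

  __global__ void kernel(int *out) {
    *out = device_only() + both_sides() + implicit_both() + answer(); // device answer()
  }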

Currently, libc++ functions are host functions by default, except for constexpr functions. As such, the non-constexpr libc++ functions can only be called by host functions in HIP programs, just as in ordinary C++ programs.

By supporting libc++ in HIP device compilation we mean "allowing libc++ functions to be executed on the device in HIP programs". To achieve this we can take three approaches (approaches 2 and 3 are sketched after the list):

  1. Many libc++ functions are generic regarding device or host, i.e., their code is common for device and host. For such functions we can make them __device__ __host__ functions.
  2. Some libc++ functions are mostly common for device and host with minor differences. For such functions, we can make them __device__ __host__ and use #if __HIP_DEVICE_COMPILE__ (indicating device compilation) for the minor differences in the function body.
  3. Some libc++ functions have different implementations for device and host. We can leave these host functions as they are and add overloaded __device__ functions.
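
A hypothetical sketch of approaches 2 and 3 (the functions below are invented for illustration, not actual libc++ code):

  #include <hip/hip_runtime.h>
  #include <cstddef>
  #include <cstdlib>
  #include <stdexcept>

  // Approach 2: one __host__ __device__ body with a small per-target difference.
  __host__ __device__ inline void fail(const char *msg) {
  #if __HIP_DEVICE_COMPILE__
    (void)msg;
    __builtin_trap();                       // no C++ exceptions in device code
  #else
    throw std::runtime_error(msg);
  #endif
  }

  // Approach 3: keep the host implementation as-is and add a __device__
  // overload (device-side malloc is provided by the HIP runtime).
  inline void *allocate(std::size_t n) { return std::malloc(n); }
  __device__ inline void *allocate(std::size_t n) { return malloc(n); }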

There are two ways to mark libc++ functions as __device__ __host__ (a sketch of both follows the list):

  1. Define a macro which expands to empty for non-HIP programs and expands to __device__ __host__ for HIP and add it to each libc++ function which is to be marked as __device__ __host__.
  2. Define macros which expand to empty for non-HIP programs and expand to #pragma clang force_cuda_host_device begin/end for HIP and put them at the beginning and end of a file where all the functions are to be marked as __device__ __host__.
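
A sketch of what the two marking mechanisms could look like (the macro names are invented for illustration; libc++ would choose its own spellings):

  // Mechanism 1: a per-function annotation macro.
  #if defined(__HIP__)
  #define _LIBCPP_HIP_HOST_DEVICE __host__ __device__
  #else
  #define _LIBCPP_HIP_HOST_DEVICE
  #endif

  template <class T>
  _LIBCPP_HIP_HOST_DEVICE const T &my_min(const T &a, const T &b) {
    return b < a ? b : a;
  }

  // Mechanism 2: per-file begin/end markers that expand to the clang pragma.
  #if defined(__HIP__)
  #define _LIBCPP_HIP_HD_BEGIN _Pragma("clang force_cuda_host_device begin")
  #define _LIBCPP_HIP_HD_END _Pragma("clang force_cuda_host_device end")
  #else
  #define _LIBCPP_HIP_HD_BEGIN
  #define _LIBCPP_HIP_HD_END
  #endif

  _LIBCPP_HIP_HD_BEGIN
  template <class T> const T &my_max(const T &a, const T &b) { return a < b ? b : a; }
  _LIBCPP_HIP_HD_END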

We plan to implement libc++ support in HIP device compilation progressively, header by header, and document the supported libc++ headers. We will prioritize which libc++ headers to support based on 1) user requests, 2) whether a header is already supported through clang wrapper headers (patching), 3) usefulness for device execution, and 4) availability of lower-level support in the HIP runtime.

tra added a comment. Jun 24 2021, 10:29 AM

The key difference between C++ and CUDA/HIP, as implemented in clang, is that __host__ and __device__ attributes are considered during function overloading in CUDA and HIP, so __host__ void foo(), __device__ void foo() and __host__ __device__ void foo() are three different functions and not redeclarations of the same function. Details of the original proposal are here: https://goo.gl/EXnymm.

  2. Some libc++ functions are mostly common for device and host with minor differences. For such functions, we can make them __device__ __host__ and use #if __HIP_DEVICE_COMPILE__ (indicating device compilation) for the minor differences in the function body.

I think we should rely on target overloading when possible, instead of the preprocessor. Minimizing the differences between the code seen by the compiler during host- and device-side compilation will minimize potential issues.
Which approach we'll end up using is an implementation detail.
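
For example, a per-target #if inside a single __host__ __device__ body can instead be written as two target overloads with the same signature (a hypothetical function, not libc++ code):

  #include <stdexcept>

  // Both compilations see both declarations; each side uses the matching body.
  __device__ inline void fail(const char *msg) { (void)msg; __builtin_trap(); }
  __host__ inline void fail(const char *msg) { throw std::runtime_error(msg); }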

  3. Some libc++ functions have different implementations for device and host. We can leave these host functions as they are and add overloaded __device__ functions.

There are two ways to mark libc++ functions as __device__ __host__:

  1. Define a macro which expands to empty for non-HIP programs and expands to __device__ __host__ for HIP and add it to each libc++ function which is to be marked as __device__ __host__.

One caveat of overloading based on target attributes is that we can't re-declare an existing function with __device__ __host__ added, as the compiler will see the attempted redeclaration as an overload of the function without attributes (implicitly __host__).

  2. Define macros which expand to empty for non-HIP programs and expand to #pragma clang force_cuda_host_device begin/end for HIP and put them at the beginning and end of a file where all the functions are to be marked as __device__ __host__.

We plan to implement libc++ support in HIP device compilation progressively, header by header, and document the supported libc++ headers. We will prioritize which libc++ headers to support based on 1) user requests, 2) whether a header is already supported through clang wrapper headers (patching), 3) usefulness for device execution, and 4) availability of lower-level support in the HIP runtime.

All of the above applies to CUDA, modulo the macro names and some differences in the builtins and the functions provided (or not) by the runtime on the GPU side.

The key difference between C++ and CUDA/HIP, as implemented in clang, is that __host__ and __device__ attributes are considered during function overloading in CUDA and HIP, so __host__ void foo(), __device__ void foo() and __host__ __device__ void foo() are three different functions and not redeclarations of the same function. Details of the original proposal are here: https://goo.gl/EXnymm.

  2. Some libc++ functions are mostly common for device and host with minor differences. For such functions, we can make them __device__ __host__ and use #if __HIP_DEVICE_COMPILE__ (indicating device compilation) for the minor differences in the function body.

I think we should rely on target overloading when possible, instead of the preprocessor. Minimizing the differences between the code seen by the compiler during host- and device-side compilation will minimize potential issues.
Which approach we'll end up using is an implementation detail.

Agree.

  3. Some libc++ functions have different implementations for device and host. We can leave these host functions as they are and add overloaded __device__ functions.

There are two ways to mark libc++ functions as __device__ __host__:

  1. Define a macro which expands to empty for non-HIP programs and expands to __device__ __host__ for HIP and add it to each libc++ function which is to be marked as __device__ __host__.

One caveat of overloading based on target attributes is that we can't re-declare an existing function with __device__ __host__ added, as the compiler will see the attempted redeclaration as an overload of the function without attributes (implicitly __host__).

If we keep all the declarations consistent we should be fine.
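
For instance, as long as a declaration and its definition carry the same attributes, no re-declaration with a different target is attempted (hypothetical function for illustration):

  __host__ __device__ int clamp_nonneg(int x);                           // declaration
  __host__ __device__ int clamp_nonneg(int x) { return x < 0 ? 0 : x; }  // definition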

  2. Define macros which expand to empty for non-HIP programs and expand to #pragma clang force_cuda_host_device begin/end for HIP and put them at the beginning and end of a file where all the functions are to be marked as __device__ __host__.

We plan to implement libc++ support in HIP device compilation progressively, header by header, and document the supported libc++ headers. We will prioritize which libc++ headers to support based on 1) user requests, 2) whether a header is already supported through clang wrapper headers (patching), 3) usefulness for device execution, and 4) availability of lower-level support in the HIP runtime.

All of the above applies to CUDA, modulo the macro names and some differences in the builtins and the functions provided (or not) by the runtime on the GPU side.