This is an archive of the discontinued LLVM Phabricator instance.

[HIP] Support <functional> in device code
Needs Review · Public

Authored by yaxunl on May 14 2021, 9:10 AM.

Details

Reviewers
tra
Summary

This patch adds wrapper headers for <functional>
and a few other headers that are required to support
<functional>.

The basic idea is to use pragmas to make the template
functions defined in these headers __host__ __device__.

Since this only works for libc++, the code is conditioned
on libc++ only. For libstdc++ this patch is NFC.
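
A rough sketch of how such a wrapper header could work (the file layout and guard name below are illustrative, not the literal contents of this patch, which additionally restricts the pragmas to libc++):

  // Hypothetical wrapper for <functional>, found before the real header via
  // clang's resource include directory.
  #ifndef __CLANG_HIP_WRAPPER_FUNCTIONAL
  #define __CLANG_HIP_WRAPPER_FUNCTIONAL

  #if defined(__HIP__)
  // Treat the functions defined by the wrapped header as implicitly
  // __host__ __device__.
  #pragma clang force_cuda_host_device begin
  #endif

  #include_next <functional>

  #if defined(__HIP__)
  #pragma clang force_cuda_host_device end
  #endif

  #endif // __CLANG_HIP_WRAPPER_FUNCTIONAL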

A test is added to llvm-test-suite for testing this:
https://reviews.llvm.org/D102508

Diff Detail

Event Timeline

yaxunl created this revision. May 14 2021, 9:10 AM
yaxunl requested review of this revision. May 14 2021, 9:10 AM
yaxunl edited the summary of this revision. May 14 2021, 9:22 AM
tra added a subscriber: rsmith. May 14 2021, 9:54 AM

In effect this patch applies __host__ __device__ to a subset of the standard library headers and whatever headers *they* happen to include. While it may happen to work, I'm not at all confident that it does not create interesting issues.

Considering that the patch only works with libc++ anyways, perhaps it's time to make (parts) of libc++ itself usable from CUDA/HIP, instead of hacking around it in the wrappers?

@rsmith Richard, who would be the right person to discuss the standard library changes we may need?

In effect this patch applies __host__ __device__ to a subset of the standard library headers and whatever headers *they* happen to include. While it may happen to work, I'm not at all confident that it does not create interesting issues.

Considering that the patch only works with libc++ anyways, perhaps it's time to make (parts) of libc++ itself usable from CUDA/HIP, instead of hacking around it in the wrappers?

@rsmith Richard, who would be the right person to discuss the standard library changes we may need?

ping.

If we are allowed to make changes to libc++, we may have a cleaner implementation for supporting libc++ in HIP device functions.

Currently, libc++ functions are host functions by default, except for constexpr functions. As a result, non-constexpr libc++ host functions cannot be called from HIP device functions. Our goal is to make libc++ functions __host__ __device__ functions so that they can be called from HIP device functions. We may not be able to support all libc++ functions, e.g. file I/O and threads, but at least we should be able to support some of them, e.g. type_traits, functional, and containers. We do this by supporting the underlying functions, e.g. malloc/free, on the device.

The change will be NFC for other languages.

rsmith added a subscriber: ldionne. Jun 1 2021, 1:35 PM
rsmith added a comment. Jun 1 2021, 1:40 PM

@ldionne How should we go about establishing whether libc++ would be prepared to officially support CUDA? Right now, Clang's CUDA support is patching in attributes onto libc++ functions from the outside, which doesn't seem like a sustainable model.

@ldionne How should we go about establishing whether libc++ would be prepared to officially support CUDA? Right now, Clang's CUDA support is patching in attributes onto libc++ functions from the outside, which doesn't seem like a sustainable model.

ping

@ldionne How should we go about establishing whether libc++ would be prepared to officially support CUDA? Right now, Clang's CUDA support is patching in attributes onto libc++ functions from the outside, which doesn't seem like a sustainable model.

ping

If the current approach is to patch libc++ from the outside, then yeah, that's most definitely not a great design IMO. It's going to be very brittle. I think it *may* be reasonable to support this in libc++, but I'd like to see some sort of basic explanation of what the changes would be so we can have a discussion and make our mind up about whether we can support this, and what's the best way of doing it.

yaxunl added a comment (edited). Jun 24 2021, 8:59 AM

@ldionne How should we go about establishing whether libc++ would be prepared to officially support CUDA? Right now, Clang's CUDA support is patching in attributes onto libc++ functions from the outside, which doesn't seem like a sustainable model.

ping

If the current approach is to patch libc++ from the outside, then yeah, that's most definitely not a great design IMO. It's going to be very brittle. I think it *may* be reasonable to support this in libc++, but I'd like to see some sort of basic explanation of what the changes would be so we can have a discussion and make our mind up about whether we can support this, and what's the best way of doing it.

Thanks Louis. Please allow me to give a brief explanation of our plan to support libc++ for HIP device compilation.

HIP functions can have __device__, __host__, or __device__ __host__ attributes, indicating the target of a function. A __device__ function can only be executed on the device (GPU). A __host__ function can only be executed on the host. A __device__ __host__ function can be executed on both device and host. By default (without explicit device/host attributes) a non-constexpr function is a host function, and a constexpr function is a __device__ __host__ function. This also applies to member functions of a class. Clang is able to resolve overloaded functions that differ only in their device/host attributes.
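
For illustration, a minimal HIP snippet showing these rules (the functions are hypothetical, not code from the patch):

  #include <hip/hip_runtime.h>

  __device__ int device_only() { return 1; }          // callable from device code only
  __host__ int host_only() { return 2; }              // callable from host code only
  __host__ __device__ int both_sides() { return 3; }  // callable from either side

  int implicit_host() { return 4; }                   // no attribute: __host__ by default
  constexpr int implicit_both() { return 5; }         // constexpr: __host__ __device__ by default

  // Overloads may differ only in their target attributes; clang resolves the
  // call to the variant that matches the side being compiled.
  __host__ int answer() { return 42; }
  __device__ int answer() { return 43; }

  __global__ void kernel(int *out) {
    *out = device_only() + both_sides() + implicit_both() + answer(); // device answer()
  }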

Currently, libc++ functions are host functions by default, except for constexpr functions. As such, the non-constexpr libc++ functions can only be called by host functions in HIP programs, just as in ordinary C++ programs.

By supporting libc++ in HIP device compilation we mean "allowing libc++ functions to be executed on the device in HIP programs". To achieve this we can take three approaches (approaches 2 and 3 are sketched after the list):

  1. Many libc++ functions are generic regarding device or host, i.e., their code is common for device and host. For such functions we can make them __device__ __host__ functions.
  2. Some libc++ functions are mostly common for device and host with minor differences. For such functions, we can make them __device__ __host__ and use #if __HIP_DEVICE_COMPILE__ (indicating device compilation) for the minor differences in the function body.
  3. Some libc++ functions have different implementations for device and host. We can leave these host functions as they are and add overloaded __device__ functions.
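
A hypothetical sketch of approaches 2 and 3 (the functions below are invented for illustration, not actual libc++ code):

  #include <hip/hip_runtime.h>
  #include <cstddef>
  #include <cstdlib>
  #include <stdexcept>

  // Approach 2: one __host__ __device__ body with a small per-target difference.
  __host__ __device__ inline void fail(const char *msg) {
  #if __HIP_DEVICE_COMPILE__
    (void)msg;
    __builtin_trap();                       // no C++ exceptions in device code
  #else
    throw std::runtime_error(msg);
  #endif
  }

  // Approach 3: keep the host implementation as-is and add a __device__
  // overload (device-side malloc is provided by the HIP runtime).
  inline void *allocate(std::size_t n) { return std::malloc(n); }
  __device__ inline void *allocate(std::size_t n) { return malloc(n); }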

There are two ways to mark libc++ functions as __device__ __host__ (a sketch of both follows the list):

  1. Define a macro which expands to empty for non-HIP programs and expands to __device__ __host__ for HIP and add it to each libc++ function which is to be marked as __device__ __host__.
  2. Define macros which expand to empty for non-HIP programs and expand to #pragma clang force_cuda_host_device begin/end for HIP and put them at the beginning and end of a file where all the functions are to be marked as __device__ __host__.
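
A sketch of what the two marking mechanisms could look like (the macro names are invented for illustration; libc++ would choose its own spellings):

  // Mechanism 1: a per-function annotation macro.
  #if defined(__HIP__)
  #define _LIBCPP_HIP_HOST_DEVICE __host__ __device__
  #else
  #define _LIBCPP_HIP_HOST_DEVICE
  #endif

  template <class T>
  _LIBCPP_HIP_HOST_DEVICE const T &my_min(const T &a, const T &b) {
    return b < a ? b : a;
  }

  // Mechanism 2: per-file begin/end markers that expand to the clang pragma.
  #if defined(__HIP__)
  #define _LIBCPP_HIP_HD_BEGIN _Pragma("clang force_cuda_host_device begin")
  #define _LIBCPP_HIP_HD_END _Pragma("clang force_cuda_host_device end")
  #else
  #define _LIBCPP_HIP_HD_BEGIN
  #define _LIBCPP_HIP_HD_END
  #endif

  _LIBCPP_HIP_HD_BEGIN
  template <class T> const T &my_max(const T &a, const T &b) { return a < b ? b : a; }
  _LIBCPP_HIP_HD_END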

We plan to implement libc++ support in HIP device compilation progressively, header by header, and document the supported libc++ headers. We will prioritize which libc++ headers to support based on 1) user requests, 2) whether a header is already supported through clang wrapper headers (patching), 3) usefulness for device execution, and 4) availability of lower-level support in the HIP runtime.

tra added a comment. Jun 24 2021, 10:29 AM

The key difference between C++ and CUDA/HIP, as implemented in clang, is that __host__ and __device__ attributes are considered during function overloading in CUDA and HIP, so __host__ void foo(), __device__ void foo() and __host__ __device__ void foo() are three different functions and not redeclarations of the same function. Details of the original proposal are here: https://goo.gl/EXnymm.

  2. Some libc++ functions are mostly common for device and host with minor differences. For such functions, we can make them __device__ __host__ and use #if __HIP_DEVICE_COMPILE__ (indicating device compilation) for the minor differences in the function body.

I think we should rely on target overloading when possible, instead of the preprocessor. Minimizing the differences between the code seen by the compiler during host- and device-side compilation will minimize potential issues.
Which approach we'll end up using is an implementation detail.
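
For example, a per-target #if inside a single __host__ __device__ body can instead be written as two target overloads with the same signature (a hypothetical function, not libc++ code):

  #include <stdexcept>

  // Both compilations see both declarations; each side uses the matching body.
  __device__ inline void fail(const char *msg) { (void)msg; __builtin_trap(); }
  __host__ inline void fail(const char *msg) { throw std::runtime_error(msg); }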

  3. Some libc++ functions have different implementations for device and host. We can leave these host functions as they are and add overloaded __device__ functions.

There are two ways to mark libc++ functions as __device__ __host__:

  1. Define a macro which expands to empty for non-HIP programs and expands to __device__ __host__ for HIP and add it to each libc++ function which is to be marked as __device__ __host__.

One caveat of overloading based on target attributes is that we can't re-declare an existing function with __device__ __host__ added, as the compiler will see the attempted redeclaration as an overload of the function without attributes (implicitly __host__).

  2. Define macros which expand to empty for non-HIP programs and expand to #pragma clang force_cuda_host_device begin/end for HIP and put them at the beginning and end of a file where all the functions are to be marked as __device__ __host__.

We plan to implement libc++ support in HIP device compilation progressively, header by header, and document the supported libc++ headers. We will prioritize which libc++ headers to support based on 1) user requests, 2) whether a header is already supported through clang wrapper headers (patching), 3) usefulness for device execution, and 4) availability of lower-level support in the HIP runtime.

All of the above applies to CUDA, modulo the macro names and some differences in the builtins and the functions provided (or not) by the runtime on the GPU side.

The key difference between C++ and CUDA/HIP, as implemented in clang, is that __host__ and __device__ attributes are considered during function overloading in CUDA and HIP, so __host__ void foo(), __device__ void foo() and __host__ __device__ void foo() are three different functions and not redeclarations of the same function. Details of the original proposal are here: https://goo.gl/EXnymm.

  2. Some libc++ functions are mostly common for device and host with minor differences. For such functions, we can make them __device__ __host__ and use #if __HIP_DEVICE_COMPILE__ (indicating device compilation) for the minor differences in the function body.

I think we should rely on target overloading when possible, instead of the preprocessor. Minimizing the differences between the code seen by the compiler during host- and device-side compilation will minimize potential issues.
Which approach we'll end up using is an implementation detail.

Agree.

  3. Some libc++ functions have different implementations for device and host. We can leave these host functions as they are and add overloaded __device__ functions.

There are two ways to mark libc++ functions as __device__ __host__:

  1. Define a macro which expands to empty for non-HIP programs and expands to __device__ __host__ for HIP and add it to each libc++ function which is to be marked as __device__ __host__.

One caveat of overloading based on target attributes is that we can't re-declare an existing function with __device__ __host__ added, as the compiler will see the attempted redeclaration as an overload of the function without attributes (implicitly __host__).

If we keep all the declarations consistent we should be fine.
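
For instance, as long as a declaration and its definition carry the same attributes, no re-declaration with a different target is attempted (hypothetical function for illustration):

  __host__ __device__ int clamp_nonneg(int x);                           // declaration
  __host__ __device__ int clamp_nonneg(int x) { return x < 0 ? 0 : x; }  // definition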

  2. Define macros which expand to empty for non-HIP programs and expand to #pragma clang force_cuda_host_device begin/end for HIP and put them at the beginning and end of a file where all the functions are to be marked as __device__ __host__.

We plan to implement libc++ support in HIP device compilation progressively, header by header, and document the supported libc++ headers. We will prioritize which libc++ headers to support based on 1) user requests, 2) whether a header is already supported through clang wrapper headers (patching), 3) usefulness for device execution, and 4) availability of lower-level support in the HIP runtime.

All of the above applies to CUDA, modulo the macro names and some differences in the builtins and the functions provided (or not) by the runtime on the GPU side.