This is an archive of the discontinued LLVM Phabricator instance.

[hip] Enable HIP compilation with `<complex`> on MSVC.
ClosedPublic

Authored by hliao on Dec 21 2020, 7:59 AM.

Details

Summary
  • MSVC has different <complex> implementation which calls into functions declared in <ymath.h>. Instead of through builtins or inline functions, <ymath.h> functions are provided through external libraries. To get <complex> compiled with HIP on MSVC, we need to
    • Wrap <ymath.h> to force its functions to be __host__ __device__.
    • Provide its function definitions for the device compilation.

Diff Detail

Event Timeline

hliao requested review of this revision.Dec 21 2020, 7:59 AM
hliao created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptDec 21 2020, 7:59 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
hliao added a comment.Dec 21 2020, 8:27 AM

Beyond the enabling of the compilation with <complex> on Windows, I really have the concern on the current approach supporting <complex> compilation in the device compilation. The device compilation should not relies on the host STL implementation. That results in inconsistent compilation results across various platforms, especially Linux vs. Windows.
BTW, the use of <complex> in CUDA cannot be compiled with NVCC directly even with --expt-relaxed-constexpr, c.f. https://godbolt.org/z/3f79co

hliao updated this revision to Diff 313116.Dec 21 2020, 8:28 AM

Fix typo.

jdoerfert requested changes to this revision.Dec 21 2020, 8:36 AM
jdoerfert added a subscriber: jdoerfert.

Disclaimer: I request changes because of the next sentence, other than that I have no objection but also cannot review this.
All cuda_wrapper headers say something about complex in the first row, copy & paste error. All have the wrong license text (I think).


The device compilation should not relies on the host STL implementation.

The OpenMP one doesn't, at least not as much as the CUDA one. Unsure how that works on windows though.

This revision now requires changes to proceed.Dec 21 2020, 8:36 AM
hliao updated this revision to Diff 313136.Dec 21 2020, 9:53 AM

Fix typo again.

hliao updated this revision to Diff 313138.Dec 21 2020, 9:58 AM

Fix license.

hliao added a comment.Dec 21 2020, 9:59 AM

Disclaimer: I request changes because of the next sentence, other than that I have no objection but also cannot review this.
All cuda_wrapper headers say something about complex in the first row, copy & paste error. All have the wrong license text (I think).

Fixed.


The device compilation should not relies on the host STL implementation.

The OpenMP one doesn't, at least not as much as the CUDA one. Unsure how that works on windows though.

This patch doesn't apply to OpenMP.

jdoerfert resigned from this revision.Dec 21 2020, 11:24 AM

The device compilation should not relies on the host STL implementation.

The OpenMP one doesn't, at least not as much as the CUDA one. Unsure how that works on windows though.

This patch doesn't apply to OpenMP.

I'm aware.

hliao edited the summary of this revision. (Show Details)Dec 21 2020, 2:10 PM
hliao updated this revision to Diff 313199.Dec 21 2020, 2:16 PM

These functions are pure C functions.

hliao updated this revision to Diff 313211.Dec 21 2020, 3:33 PM

Fix the cmake to distribute that header wrapper.

yaxunl added inline comments.Jan 5 2021, 12:36 PM
clang/lib/Headers/cuda_wrappers/ymath.h
17

I am wondering whether we only want to do this for windows, since ymath.h may be an ordinary users' header file on linux.

tra added inline comments.Jan 5 2021, 1:01 PM
clang/lib/Headers/__clang_hip_runtime_wrapper.h
73–74

I don't think we want to provide a HD implementation.
This will potentially change the meaning of these functions on the host side vs what they do in a C++ compilation.
It should probably be just __device__.

Next question is -- do we want to provide the definitions, or would just declarations be sufficient?
In other words -- are these functions needed for some standard C++ library functionality that we expect to work on the GPU?
If it's just to aid with overload resolution, declarations should do.

hliao updated this revision to Diff 314943.Jan 6 2021, 10:55 AM

Only mark HD attributes in ymath.h wrapper header when compiled with MSVC.

hliao marked an inline comment as done.Jan 6 2021, 10:55 AM
hliao added inline comments.
clang/lib/Headers/__clang_hip_runtime_wrapper.h
73–74

These functions are declared in ymath.h and, in the host compilation, are resolved by linking MSVC RT libraries. For the device function, as we already mark all prototypes in ymath.h as HD, we have to implement them as HD to match the declaration. But, those definitions should be only available in the device compilation to avoid conflicts in the host compilation.

tra added inline comments.Jan 6 2021, 11:09 AM
clang/lib/Headers/__clang_hip_runtime_wrapper.h
73–74

You don't need the definitions to avoid conflicting declarations.

I'm still not convinced that HD is needed.
Did you try just making these declarations __device__ and remove the ymath.h wrapper?
Basically I'm trying to find the bare minimum we need to do to make it work.

hliao added inline comments.Jan 6 2021, 1:59 PM
clang/lib/Headers/__clang_hip_runtime_wrapper.h
73–74

We don't call these functions directly. They are called in MSVC's <complex>. As functions in <complex> are marked as HD, we need to mark these functions in ymath.h as HD to pass the compilation.

tra added inline comments.Jan 6 2021, 2:34 PM
clang/lib/Headers/__clang_hip_runtime_wrapper.h
73–74

I assume that we're dealing with this file:
https://github.com/microsoft/STL/blob/master/stl/inc/ymath.h

I don't think we need a wrapper for ymath.
It may be sufficient to define or declare __device__ _Cosh() and other functions and let overload resolution pick the right function.
I think it would be a better approach than providing an inline __host__ __device__ definition for those functions and effectively replacing MSVC-provided host-side implementation of those functions.

hliao added inline comments.Jan 6 2021, 2:50 PM
clang/lib/Headers/__clang_hip_runtime_wrapper.h
73–74

ymath.h could be included before <complex>. That implies _Cosh could be declared as H only and results in the compilation failure.
BTW, I don't think replacing host-side implementation is a good idea as the host compilation should be kept consistent with the host compiler as much as possible.

tra added inline comments.Jan 6 2021, 4:03 PM
clang/lib/Headers/__clang_hip_runtime_wrapper.h
73–74

How? Isn't clang_hip_runtime_wrapper.h included before anything in the user source file is processed? If the clang_hip_runtime_wrapper.h is not included, first, then the ymath.h wrapper will not work either as it needs __device__ macros.

replacing host-side implementation is a good idea

While consistency between host/device is good, we should not introduce a possible inconsistency between host-side TUs.
Considering vastly larger amounts of host-side code compiled as C++ (e.g. TF has way more C++ code than HIP/CUDA) and correspondingly more reliance on every possible detail of the implementation, I would err on the side of not changing host-side behavior in any way at all, if possible.

It's reasonably safe to add an overload (it may still be observable, but it's usually possible to add it in a way that does not affect the host). Replacing host-side things is more risky, as it greatly expands the opportunities for things to go wrong.

hliao added inline comments.Jan 6 2021, 5:32 PM
clang/lib/Headers/__clang_hip_runtime_wrapper.h
73–74

<ymath.h> is also included in other headers, which is not wrapped. If we don't wrap <ymath.h>, there's a chance that it's included as it is. That's why we have to wrap <ymath.h> to ensure all functions marked with HD. Do I miss anything?

hliao added inline comments.Jan 6 2021, 5:54 PM
clang/lib/Headers/__clang_hip_runtime_wrapper.h
73–74

I am wondering whether we could assume <ymath.h> is an internal header *only*.

hliao added inline comments.Jan 7 2021, 1:46 AM
clang/lib/Headers/__clang_hip_runtime_wrapper.h
73–74

It's turned out that <ymath.h> (an internal header) is included in other headers, which is not wrapped like <complex>. The sequence including <ymath.h> using MSVC 2017 is from <algorithm>, <xmemory>, <xmemory0>, <limits>, and then <ymath.h>. As <algorithm> is included before <complex>, without wrapping <ymath.h>, we cannot overload _Cosh (pure C function.).

hliao updated this revision to Diff 315073.Jan 7 2021, 2:28 AM

Forget that C function could be overloaded on Clang with overloadable
extension. With that, we don't need to mark functions from <ymath.h> as HD.
Instead, we could provide their device-side implementation directly.

tra accepted this revision.Jan 7 2021, 10:20 AM

Forget that C function could be overloaded on Clang with overloadable
extension. With that, we don't need to mark functions from <ymath.h> as HD.
Instead, we could provide their device-side implementation directly.

Ah. It was the "C" part that I was missing. I was only thinking of C++/HIP/CUDA. I think we're on the same page now. LGTM.

This revision is now accepted and ready to land.Jan 7 2021, 10:20 AM
yaxunl accepted this revision.Jan 7 2021, 10:51 AM

LGTM. Thanks.

This revision was landed with ongoing or failed builds.Jan 7 2021, 2:41 PM
This revision was automatically updated to reflect the committed changes.