- MSVC has different <complex> implementation which calls into functions declared in <ymath.h>. Instead of through builtins or inline functions, <ymath.h> functions are provided through external libraries. To get <complex> compiled with HIP on MSVC, we need to
- Wrap <ymath.h> to force its functions to be __host__ __device__.
- Provide its function definitions for the device compilation.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Beyond the enabling of the compilation with <complex> on Windows, I really have the concern on the current approach supporting <complex> compilation in the device compilation. The device compilation should not relies on the host STL implementation. That results in inconsistent compilation results across various platforms, especially Linux vs. Windows.
BTW, the use of <complex> in CUDA cannot be compiled with NVCC directly even with --expt-relaxed-constexpr, c.f. https://godbolt.org/z/3f79co
Disclaimer: I request changes because of the next sentence, other than that I have no objection but also cannot review this.
All cuda_wrapper headers say something about complex in the first row, copy & paste error. All have the wrong license text (I think).
The device compilation should not relies on the host STL implementation.
The OpenMP one doesn't, at least not as much as the CUDA one. Unsure how that works on windows though.
Fixed.
The device compilation should not relies on the host STL implementation.
The OpenMP one doesn't, at least not as much as the CUDA one. Unsure how that works on windows though.
This patch doesn't apply to OpenMP.
clang/lib/Headers/cuda_wrappers/ymath.h | ||
---|---|---|
16 ↗ | (On Diff #313211) | I am wondering whether we only want to do this for windows, since ymath.h may be an ordinary users' header file on linux. |
clang/lib/Headers/__clang_hip_runtime_wrapper.h | ||
---|---|---|
73–74 ↗ | (On Diff #313211) | I don't think we want to provide a HD implementation. Next question is -- do we want to provide the definitions, or would just declarations be sufficient? |
clang/lib/Headers/__clang_hip_runtime_wrapper.h | ||
---|---|---|
73–74 ↗ | (On Diff #313211) | These functions are declared in ymath.h and, in the host compilation, are resolved by linking MSVC RT libraries. For the device function, as we already mark all prototypes in ymath.h as HD, we have to implement them as HD to match the declaration. But, those definitions should be only available in the device compilation to avoid conflicts in the host compilation. |
clang/lib/Headers/__clang_hip_runtime_wrapper.h | ||
---|---|---|
73–74 ↗ | (On Diff #313211) | You don't need the definitions to avoid conflicting declarations. I'm still not convinced that HD is needed. |
clang/lib/Headers/__clang_hip_runtime_wrapper.h | ||
---|---|---|
73–74 ↗ | (On Diff #313211) | We don't call these functions directly. They are called in MSVC's <complex>. As functions in <complex> are marked as HD, we need to mark these functions in ymath.h as HD to pass the compilation. |
clang/lib/Headers/__clang_hip_runtime_wrapper.h | ||
---|---|---|
73–74 ↗ | (On Diff #313211) | I assume that we're dealing with this file: I don't think we need a wrapper for ymath. |
clang/lib/Headers/__clang_hip_runtime_wrapper.h | ||
---|---|---|
73–74 ↗ | (On Diff #313211) | ymath.h could be included before <complex>. That implies _Cosh could be declared as H only and results in the compilation failure. |
clang/lib/Headers/__clang_hip_runtime_wrapper.h | ||
---|---|---|
73–74 ↗ | (On Diff #313211) | How? Isn't clang_hip_runtime_wrapper.h included before anything in the user source file is processed? If the clang_hip_runtime_wrapper.h is not included, first, then the ymath.h wrapper will not work either as it needs __device__ macros.
While consistency between host/device is good, we should not introduce a possible inconsistency between host-side TUs. It's reasonably safe to add an overload (it may still be observable, but it's usually possible to add it in a way that does not affect the host). Replacing host-side things is more risky, as it greatly expands the opportunities for things to go wrong. |
clang/lib/Headers/__clang_hip_runtime_wrapper.h | ||
---|---|---|
73–74 ↗ | (On Diff #313211) | <ymath.h> is also included in other headers, which is not wrapped. If we don't wrap <ymath.h>, there's a chance that it's included as it is. That's why we have to wrap <ymath.h> to ensure all functions marked with HD. Do I miss anything? |
clang/lib/Headers/__clang_hip_runtime_wrapper.h | ||
---|---|---|
73–74 ↗ | (On Diff #313211) | I am wondering whether we could assume <ymath.h> is an internal header *only*. |
clang/lib/Headers/__clang_hip_runtime_wrapper.h | ||
---|---|---|
73–74 ↗ | (On Diff #313211) | It's turned out that <ymath.h> (an internal header) is included in other headers, which is not wrapped like <complex>. The sequence including <ymath.h> using MSVC 2017 is from <algorithm>, <xmemory>, <xmemory0>, <limits>, and then <ymath.h>. As <algorithm> is included before <complex>, without wrapping <ymath.h>, we cannot overload _Cosh (pure C function.). |
Forget that C function could be overloaded on Clang with overloadable
extension. With that, we don't need to mark functions from <ymath.h> as HD.
Instead, we could provide their device-side implementation directly.
Ah. It was the "C" part that I was missing. I was only thinking of C++/HIP/CUDA. I think we're on the same page now. LGTM.