The old way worked to some degree for C++-mode but in C mode we actually
tried to introduce variants of macros (e.g., isinf). To make both modes
work reliably we get rid of those extra variants and directly use NVIDIA
intrinsics in the complex implementation. While this has to be revisited
as we add other GPU targets which want to reuse the code, it should be
fine for now.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Fine by me. Let's get nvptx working properly in tree now and work out how to wire up amdgcn subsequently. I'm sure a reasonable abstraction will present itself.
I'm missing something -- what was wrong with the changes in D80897 ?
AMD's HIP compilation already piggy-backs on using clang's C++ wrappers, so this change will likely break them now and I'll be the first in line to revert the change.
@yaxunl -- Sam, does this change affect HIP compilation? If it does, perhaps we should keep C++-based macro definitions around.
It doesn't work for OpenMP. The problem is that we overload some of the math functions fine, e.g., sin(float) but not the template ones. So when the code below calls copysign(int, double) (or something similar), the OpenMP target variant overload is missing. I have template overload support locally but it needs tests and there is one issue I've seen. This was supposed to be a stopgap as it unblocks the OpenMP mode.
AMD's HIP compilation already piggy-backs on using clang's C++ wrappers, so this change will likely break them now and I'll be the first in line to revert the change.
I did not know they are using __clang_cuda headers. (Site note, we should rename them then.)
@yaxunl -- Sam, does this change affect HIP compilation? If it does, perhaps we should keep C++-based macro definitions around.
Sure, I can do this only in OpenMP mode and keep the proper C++ std functions in C++ mode. Does that sound good?
I also did not know that. I am repeatedly caught out by things named 'cuda', 'nvptx' or '__nv' being used by amdgpu.
Perhaps we should refactor the __clang_cuda_* headers to make the distinctions between cuda, hip, openmp-nvptx, openmp-amdgcn clear(er).
It's complicated. :-)
Originally clang's headers were written to tactically fill in the gaps in the CUDA SDK headers that clang could not deal with.
OpenMP grew NVPTX back-end support and wanted to use a subset of those headers that happened to be conveniently close to math.h
AMD's HIP shares C++ front-end with CUDA and wants to benefit from the standard library glue we've implemented for CUDA. It also uses a lot of things internally that were originally targeting CUDA, but are now reused for HIP as well. Hence there are number of places where 'cuda' things do the double duty during HIP compilation. So do some of the CUDA-related headers.
Perhaps we should refactor the __clang_cuda_* headers to make the distinctions between cuda, hip, openmp-nvptx, openmp-amdgcn clear(er).
Agreed.
Balancing OpenMP and CUDA constraints was interesting. With HIP in the picture, it will be even more so. TBH at the moment I do not see a clean way to satisfy all users of GPU-related things in clang. That said, now may be a good time to deal with this. AMD has made a lot of progress making clang work for targeting AMD GPUs and it will likely see a lot more use relatively soon. We do want to keep things working for all parties involved.
Thx for the reviews!
FWIW, OpenMP should be able to use the C/C++ standard functions/macros for this eventually. Getting the overloads right if you don't have type system support is tricky though and I need more time...
On a separate note, we should bundle the resources to get more "GPU-compatible" generic headers in. At the end of the day, what we need for CUDA/HIP/OPENMP on NVIDIA/AMD/... hardware should be very similar, assuming we can make some design choices right ;)