Currently the CUDA toolchain only supports the nvptx target.
This patch lets the CUDA toolchain support the amdgpu target. It can also serve as an example for supporting CUDA on other targets.
Patch by Greg Rodgers.
Lit test added by Yaxun Liu.
Differential D42800
Let CUDA toolchain support amdgpu target
Abandoned · Public · Authored by gregrodgers on Feb 1 2018, 8:20 AM.
Details
Herald added subscribers: tpr, dstuttard, nhaehnle and 3 others. · Feb 1 2018, 8:20 AM

Only commenting on the parts that I'm a bit familiar with. In general, does it make sense to split this patch? Are there different "stages" of support, like 1) being able to compile an empty file, 2) generating optimized code, 3) allowing the use of math functions?
Good suggestion. Actually, this patch is mainly to let the toolchain recognise the amdgpu implementation of CUDA and create the proper stages. I can try to create a test for compiling an empty file.

I don't have enough knowledge about compute on AMD's GPUs and would appreciate it if you could share your thoughts on how you think CUDA on AMD should work. Is there a good document describing how compute currently works on AMD GPUs (e.g., how do I launch a kernel using a rough equivalent of NVIDIA's driver API)?
Thanks to everyone for the reviews. I hope I replied to all inline comments. Since I sent this to Sam to post, we discovered a major shortcoming. As tra points out, there are a lot of CUDA headers in the CUDA SDK that are processed. We are able to override asm() expansions with #undef and redefine them as equivalent amdgpu components, so the compiler never sees the asm(). I am sure we will need to add more redefines as we broaden our testing. But that is not the big problem.

We would like to be able to run cudaclang for AMD GPUs without an install of CUDA. Of course, you must always install CUDA if any of your targeted GPUs are NVIDIA GPUs. To run cudaclang without CUDA when only non-NVIDIA GPUs are specified, we need an open set of headers, and we must replace the fatbin tools used in the toolchain. The latter can be addressed by using the libomptarget methods for embedding multiple target GPU objects. The former is going to take a lot of work. I am going to be sending an updated patch that has the stubs for the open headers noted in clang_cuda_runtime_wrapper.h. They will be included with the CC1 flag -DUSE_OPEN_HEADERS__. This flag will be generated by the CUDA driver when it finds no CUDA installation and all target GPUs are non-NVIDIA.

This revision now requires changes to proceed. · Feb 1 2018, 6:40 PM

Sorry, all my great inline comments got lost somehow. I am a newbie to Phabricator. I will try to reconstruct my comments.
Here are my replies to the inline comments. Everything should be fixed in the next revision.
Let's start with the obvious points:
Additionally I don't see responses to all points that @tra raised. Their answers will probably explain the approach you chose, so I think they should also be added to the summary.
Revision Contents
Diff 134107
include/clang/Basic/Cuda.h
include/clang/Driver/ToolChain.h
lib/Basic/Cuda.cpp
lib/Basic/Targets/AMDGPU.h
lib/Basic/Targets/AMDGPU.cpp
lib/Basic/Targets/NVPTX.cpp
lib/Driver/Driver.cpp
lib/Driver/SanitizerArgs.cpp
lib/Driver/ToolChain.cpp
lib/Driver/ToolChains/Clang.cpp
lib/Driver/ToolChains/Cuda.h
lib/Driver/ToolChains/Cuda.cpp
test/Driver/cuda-phases.cu
Should the complete list of processors for the amdgcn architecture be included? See https://llvm.org/docs/AMDGPUUsage.html#processors.