This patch is separated from https://reviews.llvm.org/D45212.
Patch by Greg Rodgers.
Revised and lit tests added by Yaxun Liu.
Differential D45277
[CUDA] Add amdgpu sub archs
Closed. Public. Authored by yaxunl on Apr 4 2018, 11:41 AM.
Event Timeline
Herald added subscribers: t-tye, tpr, dstuttard and 4 others. (Apr 4 2018, 11:41 AM)
This revision is now accepted and ready to land. (Apr 4 2018, 11:49 AM)
Closed by commit rC329232: [CUDA] Add amdgpu sub archs (authored by yaxunl). (Apr 4 2018, 2:22 PM)
This revision was automatically updated to reflect the committed changes.

I didn't get a chance to review the patch before it got committed.
Revision Contents
Diff 141056
include/clang/Basic/Cuda.h
lib/Basic/Cuda.cpp
lib/Basic/Targets.h
lib/Basic/Targets.cpp
lib/Basic/Targets/AMDGPU.h
lib/Basic/Targets/AMDGPU.cpp
lib/Basic/Targets/NVPTX.cpp
test/Driver/cuda-arch-translation.cu
Unless you're planning to guarantee a 1:1 match to the functionality provided by NVIDIA's sm_32, it would be prudent to use some other value for the macro so that source code has a way to tell these GPUs apart.
Another issue with this approach is that the typical use pattern for __CUDA_ARCH__ is #if __CUDA_ARCH__ >= XXX. I don't expect that we'll always be able to maintain a consistent order across NVIDIA and AMD GPU architectures. Perhaps for HIP compilation it would make more sense to define __CUDA_ARCH__ as 1 (this should serve as a legacy indication of device-side compilation) and define __HIP_ARCH__ to indicate which AMD GPU we're compiling for, without accidentally enabling something that was intended for NVIDIA's GPUs only.
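To make the suggestion concrete, here is a minimal sketch of how source code could tell the two vendors apart under that scheme. It is an illustration only, not what the patch implements: the __HIP_ARCH__ macro, its value 803, and the helper names are hypothetical, and the macros are defined by hand so the snippet also builds with a plain host compiler.

```cpp
// Sketch only. In a real device-side pass the compiler predefines these macros;
// they are set by hand here so the snippet also builds with a plain host compiler.
// __HIP_ARCH__ and the value 803 are hypothetical, not an existing clang macro.
#ifndef __CUDA_ARCH__
#define __CUDA_ARCH__ 1   // legacy marker: "this is device-side compilation"
#endif
#ifndef __HIP_ARCH__
#define __HIP_ARCH__ 803  // hypothetical encoding of the AMD GPU being targeted
#endif

enum class CodePath { AmdGpu, NvidiaSm32Plus, GenericDevice };

// Reports which branch the preprocessor selected for this translation unit.
constexpr CodePath selected_path() {
#if defined(__HIP_ARCH__)
  return CodePath::AmdGpu;          // AMD-specific code, keyed on its own macro
#elif __CUDA_ARCH__ >= 320
  return CodePath::NvidiaSm32Plus;  // code written against NVIDIA sm_32 and newer
#else
  return CodePath::GenericDevice;
#endif
}

// The common idiom from existing CUDA sources: true only for NVIDIA sm_32+.
constexpr bool nvidia_sm32_feature_enabled() { return __CUDA_ARCH__ >= 320; }
```

With __CUDA_ARCH__ set to 1, existing `#if __CUDA_ARCH__ >= 320` guards stay off for AMD targets, while a plain `#ifdef __CUDA_ARCH__` check still sees a device-side compilation.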