CUDA 11.2 and CUDA 11.1 are both available now.
Repository: rG LLVM Github Monorepo
LGTM. Thanks. We will rebase this to 12 and come up with a less convoluted way for 13.
Use PTX 71 for CUDA 11.2, even though CUDA 11.2 supports PTX 72, because Clang does not seem to support PTX 72 very well.
libomptarget/deviceRTLs/nvptx/target_impl.cu-cuda_112-sm_75.bc
cd /nvm/0/shiltian/build/openmp/debug/libomptarget/deviceRTLs/nvptx && /home/shiltian/Documents/deploy/llvm/release/bin/clang -S -x c++ -O1 -std=c++14 -target nvptx64 -Xclang -emit-llvm-bc -Xclang -aux-triple -Xclang x86_64-unknown-linux-gnu -fopenmp -fopenmp-cuda-mode -Xclang -fopenmp-is-device -D__CUDACC__ -I/home/shiltian/Documents/vscode/llvm-project/openmp/libomptarget/deviceRTLs -I/home/shiltian/Documents/vscode/llvm-project/openmp/libomptarget/deviceRTLs/nvptx/src -DOMPTARGET_NVPTX_DEBUG=0 -Xclang -target-cpu -Xclang sm_75 -D__CUDA_ARCH__=750 -Xclang -target-feature -Xclang +ptx72 -DCUDA_VERSION=11200 /home/shiltian/Documents/vscode/llvm-project/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu -o target_impl.cu-cuda_112-sm_75.bc
/home/shiltian/Documents/vscode/llvm-project/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:71:10: error: '__nvvm_shfl_sync_idx_i32' needs target feature ptx60|ptx61|ptx63|ptx64|ptx65|ptx70|ptx71
  return __nvvm_shfl_sync_idx_i32(Mask, Var, SrcLane, 0x1f);
With this patch, building release started to fail with the following error:
FAILED: openmp/libomptarget/deviceRTLs/nvptx/target_impl.cu-cuda_111-sm_60.bc
cd build-release/runtimes/runtimes-bins/openmp/libomptarget/deviceRTLs/nvptx && build-release/./bin/clang -S -x c++ -O1 -std=c++14 -target nvptx64 -Xclang -emit-llvm-bc -Xclang -aux-triple -Xclang x86_64-unknown-linux-gnu -fopenmp -fopenmp-cuda-mode -Xclang -fopenmp-is-device -D__CUDACC__ -Illvm-project/openmp/libomptarget/deviceRTLs -Illvm-project/openmp/libomptarget/deviceRTLs/nvptx/src -DOMPTARGET_NVPTX_DEBUG=0 -Xclang -target-cpu -Xclang sm_60 -D__CUDA_ARCH__=600 -Xclang -target-feature -Xclang +ptx71 -DCUDA_VERSION=11100 llvm-project/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu -o target_impl.cu-cuda_111-sm_60.bc
llvm-project/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:71:10: error: '__nvvm_shfl_sync_idx_i32' needs target feature ptx60|ptx61|ptx63|ptx64|ptx65|ptx70
  return __nvvm_shfl_sync_idx_i32(Mask, Var, SrcLane, 0x1f);
         ^
llvm-project/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:82:10: error: '__nvvm_shfl_sync_down_i32' needs target feature ptx60|ptx61|ptx63|ptx64|ptx65|ptx70
  return __nvvm_shfl_sync_down_i32(Mask, Var, Delta, T);
         ^
llvm-project/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:92:3: error: '__nvvm_bar_warp_sync' needs target feature ptx60|ptx61|ptx63|ptx64|ptx65|ptx70
  __nvvm_bar_warp_sync(Mask);
  ^
My CMake configuration contains these libomptarget-related options (we have several GPU generations in our cluster):
-DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_70 \
-DLIBOMPTARGET_ENABLE_DEBUG=on \
-DLIBOMPTARGET_NVPTX_ENABLE_BCLIB=true \
-DLIBOMPTARGET_NVPTX_AUTODETECT_COMPUTE_CAPABILITY=OFF \
-DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES="35;60;70" \
That looks very similar to the above problem with ptx72.
While I sympathise with the desire to run on the latest CUDA, building OpenMP for PTX versions that Clang doesn't support seems over-ambitious. I suggest we dial this list back to only those PTX/CUDA combinations that Clang generates code for, until someone (possibly from the OpenMP effort) updates Clang.
Eventually we’re not going to support various CUDA versions. In my local patch, which has not yet been submitted for review, ptx60 is used.
I can confirm that with the following, I can build release successfully:
set(cuda_version_list 112 111 110 102 101 100 92 91 90 80)
set(ptx_feature_list 70 70 70 65 64 63 61 61 60 42)
Release still has
set(ptx_feature_list 71 71 70 65 64 63 61 61 60 42)
and compilation still fails.