[OpenMP][NVPTX] Add the support for CUDA 11.2 and CUDA 11.1
ClosedPublic

Authored by tianshilei1992 on Feb 18 2021, 3:25 PM.

Details

Summary

CUDA 11.2 and CUDA 11.1 are both available now.

Event Timeline

tianshilei1992 created this revision. Feb 18 2021, 3:25 PM
tianshilei1992 requested review of this revision. Feb 18 2021, 3:25 PM
Herald added a project: Restricted Project. Feb 18 2021, 3:25 PM
jdoerfert accepted this revision. Feb 18 2021, 3:26 PM

LGTM. Thanks. We will rebase this to 12 and come up with a less convoluted way for 13.

This revision is now accepted and ready to land. Feb 18 2021, 3:26 PM

Use PTX 71 for CUDA 11.2, although CUDA 11.2 supports PTX 72, because it seems
that Clang does not support PTX 72 very well yet.

libomptarget/deviceRTLs/nvptx/target_impl.cu-cuda_112-sm_75.bc
cd /nvm/0/shiltian/build/openmp/debug/libomptarget/deviceRTLs/nvptx && /home/shiltian/Documents/deploy/llvm/release/bin/clang -S -x c++ -O1 -std=c++14 -target nvptx64 -Xclang -emit-llvm-bc -Xclang -aux-triple -Xclang x86_64-unknown-linux-gnu -fopenmp -fopenmp-cuda-mode -Xclang -fopenmp-is-device -D__CUDACC__ -I/home/shiltian/Documents/vscode/llvm-project/openmp/libomptarget/deviceRTLs -I/home/shiltian/Documents/vscode/llvm-project/openmp/libomptarget/deviceRTLs/nvptx/src -DOMPTARGET_NVPTX_DEBUG=0 -Xclang -target-cpu -Xclang sm_75 -D__CUDA_ARCH__=750 -Xclang -target-feature -Xclang +ptx72 -DCUDA_VERSION=11200 /home/shiltian/Documents/vscode/llvm-project/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu -o target_impl.cu-cuda_112-sm_75.bc
/home/shiltian/Documents/vscode/llvm-project/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:71:10: error: '__nvvm_shfl_sync_idx_i32' needs target feature ptx60|ptx61|ptx63|ptx64|ptx65|ptx70|ptx71
  return __nvvm_shfl_sync_idx_i32(Mask, Var, SrcLane, 0x1f);
Harbormaster completed remote builds in B89814: Diff 324793.

If clang doesn't do ptx72, I don't think openmp on clang should attempt to.

With this patch, building release started to fail with the following error:

FAILED: openmp/libomptarget/deviceRTLs/nvptx/target_impl.cu-cuda_111-sm_60.bc 
cd build-release/runtimes/runtimes-bins/openmp/libomptarget/deviceRTLs/nvptx && build-release/./bin/clang -S -x c++ -O1 -std=c++14 -target nvptx64 -Xclang -emit-llvm-bc -Xclang -aux-triple -Xclang x86_64-unknown-linux-gnu -fopenmp -fopenmp-cuda-mode -Xclang -fopenmp-is-device -D__CUDACC__ -Illvm-project/openmp/libomptarget/deviceRTLs -Illvm-project/openmp/libomptarget/deviceRTLs/nvptx/src -DOMPTARGET_NVPTX_DEBUG=0 -Xclang -target-cpu -Xclang sm_60 -D__CUDA_ARCH__=600 -Xclang -target-feature -Xclang +ptx71 -DCUDA_VERSION=11100 llvm-project/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu -o target_impl.cu-cuda_111-sm_60.bc
llvm-project/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:71:10: error: '__nvvm_shfl_sync_idx_i32' needs target feature ptx60|ptx61|ptx63|ptx64|ptx65|ptx70
  return __nvvm_shfl_sync_idx_i32(Mask, Var, SrcLane, 0x1f);
         ^
llvm-project/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:82:10: error: '__nvvm_shfl_sync_down_i32' needs target feature ptx60|ptx61|ptx63|ptx64|ptx65|ptx70
  return __nvvm_shfl_sync_down_i32(Mask, Var, Delta, T);
         ^
llvm-project/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:92:3: error: '__nvvm_bar_warp_sync' needs target feature ptx60|ptx61|ptx63|ptx64|ptx65|ptx70
  __nvvm_bar_warp_sync(Mask);
  ^

My cmake configuration contains these libomptarget-related options (because we have various GPU generations in our cluster):

-DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_70 \
-DLIBOMPTARGET_ENABLE_DEBUG=on \
-DLIBOMPTARGET_NVPTX_ENABLE_BCLIB=true \
-DLIBOMPTARGET_NVPTX_AUTODETECT_COMPUTE_CAPABILITY=OFF \
-DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES="35;60;70" \

That looks very similar to the above problem with ptx72.

While I sympathise with a desire to run on the latest cuda, openmp building for ptx versions that clang doesn't support seems over-ambitious. Suggest we dial this list back to only those ptx/cuda combinations that clang generates code for, until such point someone (possibly from the openmp effort) updates clang.
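The mismatch described above can be checked mechanically. The following is a minimal sketch, not part of the patch: it parses the feature list out of the clang diagnostic quoted in the failing build and reports which requested PTX features are missing from it. The function names `accepted_features` and `unsupported` are hypothetical, and the accepted set is only what that one error message lists, not an authoritative list of what clang supports.

```python
import re

# Diagnostic text copied from the failing release build quoted in this thread.
ERROR = ("error: '__nvvm_shfl_sync_idx_i32' needs target feature "
         "ptx60|ptx61|ptx63|ptx64|ptx65|ptx70")


def accepted_features(error_line):
    """Return the set of feature names listed after 'needs target feature'."""
    match = re.search(r"needs target feature (\S+)", error_line)
    if match is None:
        return set()
    return set(match.group(1).split("|"))


def unsupported(requested, error_line):
    """Return the requested features that the diagnostic does not list."""
    accepted = accepted_features(error_line)
    return [feature for feature in requested if feature not in accepted]


if __name__ == "__main__":
    # ptx71 is what the failing command passed via -target-feature.
    print(unsupported(["ptx71", "ptx70", "ptx65"], ERROR))
```

Only `ptx71` is reported as missing here, which matches the `+ptx71` flag in the failing command line.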

The error is because PTX71 support is not in the release. I’ll hot fix it today.

Eventually we are not going to support multiple CUDA versions. In my local patch, which has not been submitted for review yet, ptx60 is used.

I can confirm that with the following I can build release successfully:

set(cuda_version_list 112 111 110 102 101 100 92 91 90 80)
set(ptx_feature_list 70 70 70 65 64 63 61 61 60 42)
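To make the pairing of the two lists explicit, here is a small sketch that assumes they are matched position by position (an assumption, though it is consistent with the build logs, where `cuda_111` is compiled with `+ptx71` under the original list). The helper `pair_versions` is hypothetical and only illustrates the mapping; it is not how the build system is implemented.

```python
# Values copied from the two set() lines above.
cuda_version_list = "112 111 110 102 101 100 92 91 90 80".split()
ptx_feature_list = "70 70 70 65 64 63 61 61 60 42".split()


def pair_versions(cuda_versions, ptx_features):
    """Map each CUDA version to a '+ptxNN' target feature, by position."""
    if len(cuda_versions) != len(ptx_features):
        raise ValueError("lists must have the same length")
    return {cuda: "+ptx" + ptx for cuda, ptx in zip(cuda_versions, ptx_features)}


mapping = pair_versions(cuda_version_list, ptx_feature_list)

if __name__ == "__main__":
    for cuda, feature in mapping.items():
        print(f"cuda_{cuda} -> {feature}")
```

With these values, the three CUDA 11.x entries all map to `+ptx70`, which is the highest feature named in the diagnostic from the failing build.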

Thanks. The fix has been cherry picked into the release branch.

Release still has

set(ptx_feature_list 71 71 70 65 64 63 61 61 60 42)

and compilation still fails.