This is an archive of the discontinued LLVM Phabricator instance.

libomptarget/deviceRTLs/nvptx/target_impl.cu-cuda_112-sm_75.bc
cd /nvm/0/shiltian/build/openmp/debug/libomptarget/deviceRTLs/nvptx && /home/shiltian/Documents/deploy/llvm/release/bin/clang -S -x c++ -O1 -std=c++14 -target nvptx64 -Xclang -emit-llvm-bc -Xclang -aux-triple -Xclang x86_64-unknown-linux-gnu -fopenmp -fopenmp-cuda-mode -Xclang -fopenmp-is-device -D__CUDACC__ -I/home/shiltian/Documents/vscode/llvm-project/openmp/libomptarget/deviceRTLs -I/home/shiltian/Documents/vscode/llvm-project/openmp/libomptarget/deviceRTLs/nvptx/src -DOMPTARGET_NVPTX_DEBUG=0 -Xclang -target-cpu -Xclang sm_75 -D__CUDA_ARCH__=750 -Xclang -target-feature -Xclang +ptx72 -DCUDA_VERSION=11200 /home/shiltian/Documents/vscode/llvm-project/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu -o target_impl.cu-cuda_112-sm_75.bc
/home/shiltian/Documents/vscode/llvm-project/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:71:10: error: '__nvvm_shfl_sync_idx_i32' needs target feature ptx60|ptx61|ptx63|ptx64|ptx65|ptx70|ptx71
  return __nvvm_shfl_sync_idx_i32(Mask, Var, SrcLane, 0x1f);

Harbormaster completed remote builds in B89816: Diff 324797. Feb 18 2021, 5:48 PM

Harbormaster completed remote builds in B89814: Diff 324793.

Closed by commit rG89827fd404f9: [OpenMP][NVPTX] Add the support for CUDA 11.2 and CUDA 11.1 (authored by tianshilei1992). Feb 18 2021, 6:04 PM

This revision was automatically updated to reflect the committed changes.

tianshilei1992 added a commit: rG89827fd404f9: [OpenMP][NVPTX] Add the support for CUDA 11.2 and CUDA 11.1.

If clang doesn't do ptx72, I don't think openmp on clang should attempt to.

With this patch, building release started to fail with the following error:

FAILED: openmp/libomptarget/deviceRTLs/nvptx/target_impl.cu-cuda_111-sm_60.bc 
cd build-release/runtimes/runtimes-bins/openmp/libomptarget/deviceRTLs/nvptx && build-release/./bin/clang -S -x c++ -O1 -std=c++14 -target nvptx64 -Xclang -emit-llvm-bc -Xclang -aux-triple -Xclang x86_64-unknown-linux-gnu -fopenmp -fopenmp-cuda-mode -Xclang -fopenmp-is-device -D__CUDACC__ -Illvm-project/openmp/libomptarget/deviceRTLs -Illvm-project/openmp/libomptarget/deviceRTLs/nvptx/src -DOMPTARGET_NVPTX_DEBUG=0 -Xclang -target-cpu -Xclang sm_60 -D__CUDA_ARCH__=600 -Xclang -target-feature -Xclang +ptx71 -DCUDA_VERSION=11100 llvm-project/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu -o target_impl.cu-cuda_111-sm_60.bc
llvm-project/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:71:10: error: '__nvvm_shfl_sync_idx_i32' needs target feature ptx60|ptx61|ptx63|ptx64|ptx65|ptx70
  return __nvvm_shfl_sync_idx_i32(Mask, Var, SrcLane, 0x1f);
         ^
llvm-project/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:82:10: error: '__nvvm_shfl_sync_down_i32' needs target feature ptx60|ptx61|ptx63|ptx64|ptx65|ptx70
  return __nvvm_shfl_sync_down_i32(Mask, Var, Delta, T);
         ^
llvm-project/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:92:3: error: '__nvvm_bar_warp_sync' needs target feature ptx60|ptx61|ptx63|ptx64|ptx65|ptx70
  __nvvm_bar_warp_sync(Mask);
  ^

My CMake configuration contains these libomptarget-related options (because we have various GPU generations in our cluster):

-DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_70 \
-DLIBOMPTARGET_ENABLE_DEBUG=on \
-DLIBOMPTARGET_NVPTX_ENABLE_BCLIB=true \
-DLIBOMPTARGET_NVPTX_AUTODETECT_COMPUTE_CAPABILITY=OFF \
-DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES="35;60;70" \

That looks very similar to the above problem with ptx72.

While I sympathise with a desire to run on the latest cuda, openmp building for ptx versions that clang doesn't support seems over-ambitious. Suggest we dial this list back to only those ptx/cuda combinations that clang generates code for, until such point someone (possibly from the openmp effort) updates clang.
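One way to read this suggestion is as a cap on the PTX feature list. The following is a minimal sketch, not the fix that was committed: `max_supported_ptx` and `clamped_ptx_feature_list` are hypothetical names, and the cap of 70 is an assumption taken from the error message above, which lists ptx70 as the highest feature the release clang accepts for these builtins.

```cmake
# Sketch only; run in script mode, e.g. `cmake -P clamp_ptx.cmake`.
cmake_minimum_required(VERSION 3.10)

# The lists as introduced by this patch. Entries pair up by position:
# CUDA 11.2 -> ptx71, CUDA 11.1 -> ptx71, CUDA 11.0 -> ptx70, and so on.
set(cuda_version_list 112 111 110 102 101 100 92 91 90 80)
set(ptx_feature_list 71 71 70 65 64 63 61 61 60 42)

# Hypothetical knob (not in the original CMakeLists.txt): the highest PTX
# feature that the clang used for the build accepts. 70 is an assumption
# based on the error message quoted above.
set(max_supported_ptx 70)

# Replace every PTX feature above the cap with the cap itself.
set(clamped_ptx_feature_list "")
foreach(ptx ${ptx_feature_list})
  if(ptx GREATER max_supported_ptx)
    list(APPEND clamped_ptx_feature_list ${max_supported_ptx})
  else()
    list(APPEND clamped_ptx_feature_list ${ptx})
  endif()
endforeach()

message(STATUS "clamped ptx_feature_list: ${clamped_ptx_feature_list}")
```

With a cap of 70, the two `71` entries become `70` and every other entry is unchanged, so newer CUDA versions stay in the list but are built with a PTX feature the compiler accepts.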

The error is because PTX71 support is not in the release. I’ll hot fix it today.

In D97004#2578589, @JonChesterfield wrote:

That looks very similar to the above problem with ptx72.

While I sympathise with a desire to run on the latest cuda, openmp building for ptx versions that clang doesn't support seems over-ambitious. Suggest we dial this list back to only those ptx/cuda combinations that clang generates code for, until such point someone (possibly from the openmp effort) updates clang.

Eventually we're not going to support multiple CUDA versions. In my local patch, which has not yet been submitted for review, ptx60 is used.

I can confirm that, with the following, I can build release successfully:

set(cuda_version_list 112 111 110 102 101 100 92 91 90 80)
set(ptx_feature_list 70 70 70 65 64 63 61 61 60 42)

tianshilei1992 mentioned this in D97195: [OpenMP][NVPTX] Fixed a compilation error in deviceRTLs caused by unsupported feature in release verion of LLVM. Feb 22 2021, 7:52 AM

tianshilei1992 mentioned this in rGf6c2984a090e: [OpenMP][NVPTX] Fixed a compilation error in deviceRTLs caused by unsupported…. Feb 23 2021, 10:20 AM

In D97004#2578786, @protze.joachim wrote:
I can confirm that, with the following, I can build release successfully:
set(cuda_version_list 112 111 110 102 101 100 92 91 90 80)
set(ptx_feature_list 70 70 70 65 64 63 61 61 60 42)

Thanks. The fix has been cherry-picked into the release branch.

Release still has

set(ptx_feature_list 71 71 70 65 64 63 61 61 60 42)

and compilation still fails.

Revision Contents

Path: openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt
Size: 4 lines

Diff 324833

openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt

[146 lines not shown]
 else()
 list(APPEND bc_flags -DOMPTARGET_NVPTX_DEBUG=0)
 endif()

 # Create target to build all Bitcode libraries.
 add_custom_target(omptarget-nvptx-bc)

 # This map is from clang/lib/Driver/ToolChains/Cuda.cpp.
 # The last element is the default case.
-set(cuda_version_list 110 102 101 100 92 91 90 80)
-set(ptx_feature_list 70 65 64 63 61 61 60 42)
+set(cuda_version_list 112 111 110 102 101 100 92 91 90 80)
+set(ptx_feature_list 71 71 70 65 64 63 61 61 60 42)
 # The following two lines of ugly code is not needed when the minimal CMake
 # version requirement is 3.17+.
 list(LENGTH cuda_version_list num_version_supported)
 math(EXPR loop_range "${num_version_supported} - 1")

 # Generate a Bitcode library for all the compute capabilities the user
 # requested and all PTX version we know for now.
 foreach(sm ${nvptx_sm_list})
[61 lines not shown]
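The diff cuts off before the loop that consumes `loop_range`, so here is a minimal sketch of how two parallel lists like these can be walked by index. It is an illustration under assumptions, not the elided code from CMakeLists.txt: the loop variable `itr`, the length check, and the printed format are made up for the example. The "3.17+" remark in the comment most likely refers to `foreach(... IN ZIP_LISTS ...)`, which CMake added in 3.17 and which would remove the need for the `list(LENGTH)` and `math(EXPR)` lines.

```cmake
# Sketch only; run in script mode, e.g. `cmake -P pair_lists.cmake`.
cmake_minimum_required(VERSION 3.10)

set(cuda_version_list 112 111 110 102 101 100 92 91 90 80)
set(ptx_feature_list 71 71 70 65 64 63 61 61 60 42)

# Sanity check added for this sketch (not in the original): the two lists
# are paired by position, so they must have the same length.
list(LENGTH cuda_version_list num_version_supported)
list(LENGTH ptx_feature_list num_ptx_features)
if(NOT num_version_supported EQUAL num_ptx_features)
  message(FATAL_ERROR "cuda_version_list and ptx_feature_list differ in length")
endif()

# foreach(RANGE n) iterates 0..n inclusive, hence the "- 1".
math(EXPR loop_range "${num_version_supported} - 1")
foreach(itr RANGE ${loop_range})
  list(GET cuda_version_list ${itr} cuda_version)
  list(GET ptx_feature_list ${itr} ptx_feature)
  # Illustrative output only; the real build passes the pair on to clang
  # as -DCUDA_VERSION=... and +ptx..., as seen in the build logs.
  message(STATUS "cuda_${cuda_version} -> +ptx${ptx_feature}")
endforeach()
```

Because the pairing is purely positional, adding a CUDA version to one list without adding its PTX feature to the other silently shifts every later pair, which is why the sketch checks the lengths first.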

[OpenMP][NVPTX] Add the support for CUDA 11.2 and CUDA 11.1 (Closed, Public)
