
[Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX cp.async…

Authored by nyalloc on May 17 2021, 9:28 AM.

Description

[Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX cp.async instructions

Adds NVPTX builtins and intrinsics for the CUDA PTX cp.async instructions,
available on the sm_80 architecture and newer.

PTX ISA documentation for cp.async and cp.async.mbarrier.arrive:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-asynchronous-copy
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-cp-async-mbarrier-arrive
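The linked PTX ISA pages define the per-thread group semantics that these builtins expose. The following is a rough illustration only: a host-side C++ model of that bookkeeping. It is not GPU code and not the Clang/LLVM API, and the class `CpAsyncThreadModel` and its methods are hypothetical names. The model assumes the PTX rules as documented:

- each cp.async copies 4, 8, or 16 bytes (`.cg` supports only 16);
- `cp.async.commit_group` batches all uncommitted copies into a new, possibly empty, group;
- `cp.async.wait_group N` waits until at most the N most recent groups are still pending;
- `cp.async.wait_all` behaves like `commit_group` followed by `wait_group 0`.

On hardware, a copy may land at any point before the wait. The model is deliberately conservative and defers each copy until its group is retired.

```cpp
#include <cstddef>
#include <cstring>
#include <deque>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical host-side model of ONE thread's cp.async state.
// Not GPU code and not an LLVM/Clang API: copies are deferred and only
// "land" (via memcpy) when a wait retires the group they belong to.
class CpAsyncThreadModel {
 public:
  // Models cp.async.{ca,cg}.shared.global [dst], [src], size.
  // PTX allows cp-size 4, 8, or 16 bytes; .cg only allows 16.
  void cp_async(void* dst, const void* src, std::size_t size, bool cg = false) {
    if (size != 4 && size != 8 && size != 16)
      throw std::invalid_argument("cp-size must be 4, 8, or 16");
    if (cg && size != 16)
      throw std::invalid_argument(".cg requires cp-size 16");
    uncommitted_.push_back({dst, src, size});
  }

  // Models cp.async.commit_group: move every uncommitted copy into a new
  // group (which may be empty) at the back of the pending queue.
  void commit_group() {
    pending_.push_back(std::move(uncommitted_));
    uncommitted_.clear();  // moved-from vector: reset to a known empty state
  }

  // Models cp.async.wait_group n: retire the oldest groups until at most
  // n of the most recently committed groups remain pending.
  void wait_group(std::size_t n) {
    while (pending_.size() > n) {
      for (const Copy& c : pending_.front()) std::memcpy(c.dst, c.src, c.size);
      pending_.pop_front();
    }
  }

  // Models cp.async.wait_all, defined in PTX as commit_group; wait_group 0.
  void wait_all() {
    commit_group();
    wait_group(0);
  }

  std::size_t pending_groups() const { return pending_.size(); }

 private:
  struct Copy {
    void* dst;
    const void* src;
    std::size_t size;
  };
  std::vector<Copy> uncommitted_;          // issued but not yet committed
  std::deque<std::vector<Copy>> pending_;  // committed groups, oldest first
};
```

In a double-buffered loop, a thread would typically issue and commit the copies for the next tile, then call `wait_group(1)`. That guarantees only the current tile's group is complete while the next one stays in flight. Because the model tracks a single thread, it says nothing about cross-thread visibility, which real kernels still establish with a barrier such as `__syncthreads()`.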

Authored-by: Stuart Adams <stuart.adams@codeplay.com>
Co-Authored-by: Alexander Johnston <alexander@codeplay.com>

Differential Revision: https://reviews.llvm.org/D100394