Adds NVPTX intrinsics for the CUDA PTX ldmatrix.sync.aligned instructions introduced in PTX ISA 6.5.
PTX ISA description of ldmatrix.sync.aligned: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-ldmatrix
Authored-by: Steffen Larsen <steffen.larsen@codeplay.com>
Nit: "|" is redundant here, IMO.
I think "[something]" already reads as "something is optional".