Adds NVPTX intrinsics and builtins for the CUDA PTX cvt instructions on sm_80 architectures and above. Requires PTX 7.0.
PTX ISA description of the cvt instructions: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cvt
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Nit: ff2v2bf is a bit hard to parse. I initially tried to interpret it as "convert ff2v to bf" and was confused about what exactly the 2v part means, since we already have ff to denote two floats.
Perhaps ff2bf16x2 would be a bit easier to read and understand. It would also work consistently for f16 and tf32 variants below.
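To make the suggested reading concrete ("two floats in, one packed bf16x2 out"), here is a rough host-side sketch of what such a conversion computes. It is only an illustration under assumptions: the function names are hypothetical and are not the builtins added by this patch, the rounding shown is round-to-nearest-even only, NaN handling is simplified to a single canonical value, and the packing order (first operand in the upper half) follows my reading of the PTX ISA description of cvt.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of the patch): convert one float to bf16
// bits using round-to-nearest-even. NaN handling is simplified.
static uint16_t float_to_bf16_rn(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));   // reinterpret the float's bits
  if ((bits & 0x7fffffffu) > 0x7f800000u) // any NaN input
    return 0x7fffu;                       // assumed canonical bf16 NaN
  uint32_t lsb = (bits >> 16) & 1u;       // lowest bit that survives
  bits += 0x7fffu + lsb;                  // ties round to even
  return static_cast<uint16_t>(bits >> 16);
}

// Hypothetical name mirroring the suggested "ff2bf16x2" scheme: two floats
// in, one 32-bit value holding two bf16 halves out. Assumes `a` lands in
// the upper 16 bits and `b` in the lower 16 bits.
static uint32_t ff2bf16x2_rn_emulated(float a, float b) {
  return (static_cast<uint32_t>(float_to_bf16_rn(a)) << 16) |
         static_cast<uint32_t>(float_to_bf16_rn(b));
}
```

With that picture in mind, the "x2" suffix names the packed result type directly, which is why it also carries over to the f16 variants.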