[WIP] Move part of nvptx devicertl under clang
Example of moving the devicertl functions that depend on the CUDA
version under clang, so that they can be injected at application
build time.
The original idea was to use the intrinsic definitions from
__clang_cuda_intrinsics, but that header needs a lot of CUDA-specific
setup to compile and includes part of the CUDA SDK.
It is therefore difficult to compile as OpenMP.
This implements the code in headers, which works for C++ with
OpenMP but not necessarily for C, where the inline functions may not
be emitted. It will also be a problem for Fortran OpenMP.
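As a rough illustration of the header-based approach described above, here is a minimal host-compilable sketch. It is not the code in this patch. SKETCH_CUDA_VERSION, the sketch namespace, WarpApi and selected_warp_api are hypothetical names, and the 9000 threshold is an assumption based on the *_sync warp primitives having arrived with CUDA 9.

```cpp
#include <cstdint>

// SKETCH_CUDA_VERSION is a hypothetical stand-in for the CUDA_VERSION macro
// provided by the CUDA SDK headers. It is defined here only so that the
// sketch compiles on a host without the SDK.
#ifndef SKETCH_CUDA_VERSION
#define SKETCH_CUDA_VERSION 9000
#endif

namespace sketch {

// Hypothetical tag recording which implementation the header selected.
enum class WarpApi : std::uint8_t { Legacy, Sync };

#if SKETCH_CUDA_VERSION >= 9000
// Assumed: with CUDA 9 or later, the *_sync warp primitives would be used.
inline WarpApi selected_warp_api() { return WarpApi::Sync; }
#else
// Assumed: older CUDA versions fall back to the pre-9.0 primitives.
inline WarpApi selected_warp_api() { return WarpApi::Legacy; }
#endif

} // namespace sketch
```

Because the choice is made by the preprocessor in a header, it is resolved when the application is compiled rather than when the runtime library is built. A C++ inline function like this is emitted in any translation unit that uses it, which is not guaranteed for a plain inline definition in C.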
I'm inclined to do something broadly equivalent to this, but in
the library. That would mean clang needs to link against devicertl.bc
and against a small, CUDA-version-specific devicertl_tbd.bc.