In current implementation of deviceRTLs, we're using some functions
that are CUDA version dependent (if CUDA_VERSION < 9, it is one; otheriwse, it
is another one). As a result, we have to compile one bitcode library for each
CUDA version supported. A worse problem is forward compatibility. If a new CUDA
version is released, we have to update CMake file as well.
CUDA 9.2 has been released for three years. Instead of using various weird tricks
to make deviceRTLs work with different CUDA versions and still have forward
compatibility, we can simply drop support for CUDA 9.1 or lower version. It has at
least two benifits:
- We don't need to generate bitcode libraries for each CUDA version;
- Clang driver doesn't need to search for the bitcode lib based on CUDA version.
We can claim that starting from LLVM 12, OpenMP offloading on NVPTX target requires
CUDA 9.2+.