With this patch (plus some already-landed patches), deviceRTLs is treated as a regular OpenMP program that consists only of declare target regions. Ideally, this means deviceRTLs can be written directly in OpenMP, with no CUDA and no HIP. (AMD is still working on getting this to work; for now AMDGCN is still compiled the original way.) Some target-specific functions are still required, but they are no longer written in a target-specific language. For example, the CUDA parts have all been refined by replacing CUDA intrinsics and builtins with LLVM/Clang/NVVM intrinsics.
Here is a list of the changes in this patch.
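As a rough illustration of what "a regular OpenMP program with just declare target regions" means, here is a minimal sketch. The function name clampThreadId is hypothetical and does not come from deviceRTLs; it only shows the shape of such code.

```cpp
// Minimal sketch, not actual deviceRTLs code: clampThreadId is a
// hypothetical helper used only to show a declare target region.
#pragma omp declare target
// Everything between the two pragmas is also compiled for the device.
int clampThreadId(int tid, int numThreads) {
  if (tid < 0)
    return 0;
  if (tid >= numThreads)
    return numThreads - 1;
  return tid;
}
#pragma omp end declare target
```

Because the function is plain C++, it builds with any host compiler as well; the pragmas only take effect when OpenMP offloading is enabled.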
- For NVPTX, DEVICE is defined as empty so that the common parts still work with AMDGCN. Once AMDGCN is also supported, we will remove DEVICE completely, and probably some other macros as well.
- Shared variables are implemented with the OpenMP allocator, which is defined in allocator.h. Again, this feature is not available on AMDGCN, so two macros are redefined accordingly.
- The CUDA header cuda.h is dropped from the source code. To deal with code differences between CUDA versions, we build one bitcode library for each supported CUDA version. For each CUDA version, the highest PTX version it supports is used, just as we currently do for CUDA compilation.
- Correspondingly, the compiler driver is updated to support the CUDA version encoded in the name of the bitcode library. The bitcode library for NVPTX is now named libomptarget-nvptx-cuda_[cuda_version]-sm_[sm_number].bc, for example libomptarget-nvptx-cuda_80-sm_20.bc.
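The naming scheme in the last item can be sketched as a small helper. This is only an illustration: nvptxBitcodeLibName is a hypothetical name, and the real driver logic in Clang may look quite different.

```cpp
#include <string>

// Hypothetical helper (not the actual Clang driver code): builds the
// bitcode library name from a CUDA version and an SM number, following
// libomptarget-nvptx-cuda_[cuda_version]-sm_[sm_number].bc.
std::string nvptxBitcodeLibName(unsigned cudaVersion, unsigned smNumber) {
  return "libomptarget-nvptx-cuda_" + std::to_string(cudaVersion) +
         "-sm_" + std::to_string(smNumber) + ".bc";
}
```

For instance, nvptxBitcodeLibName(80, 20) yields the example name given above.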
With this change, several more features can be expected in the near future:
- CUDA will be dropped completely when compiling OpenMP. By then, we will also build bitcode libraries for every supported SM, multiplied by every supported CUDA version.
- Atomic operations used in deviceRTLs can be replaced by omp atomic once the OpenMP 5.1 feature is fully supported. For now, the IR generated is totally wrong.
- Target-specific parts will be wrapped in declare variant with an isa selector, provided it works properly. No target-specific macros will be needed anymore.
- (Maybe more...)
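To give an idea of the omp atomic item, here is a minimal sketch of a fetch-and-add written with an OpenMP atomic construct instead of a target-specific intrinsic. The name atomicAddSketch is hypothetical, and the sketch uses the long-standing capture form only; which OpenMP 5.1 atomic forms deviceRTLs would actually need is not stated here.

```cpp
// Minimal sketch, not actual deviceRTLs code: atomicAddSketch is a
// hypothetical stand-in for a target-specific atomic add intrinsic.
// It adds val to *addr and returns the value *addr held before.
unsigned atomicAddSketch(unsigned *addr, unsigned val) {
  unsigned old;
#pragma omp atomic capture
  {
    old = *addr;  // capture the previous value
    *addr += val; // update the shared location
  }
  return old;
}
```

Without OpenMP enabled the pragma is ignored and the function is an ordinary, non-atomic read-modify-write, so the sketch is only meaningful when compiled with OpenMP support.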