These are initial changes to experiment with building the Fortran runtime
as a CUDA or OpenMP target offload library.
The initial patch defines a set of macros that have to be used consistently
in Flang runtime source code so that it can be built for different
offload devices using different programming models (CUDA, HIP, OpenMP target
offload). Currently supported modes are:
- CUDA: Flang runtime may be built as a fatlib for the host and a set of CUDA architectures specified during the build. The packaging of the device code is done by the CUDA toolchain and may differ from toolchan to toolchain.
- OpenMP offload:
- host_device mode: Flang runtime may be built as a fatlib for the host and a set of OpenMP offload architectures. The packaging of the device code is done by the OpenMP offload compiler and may differ from compiler to compiler.
OpenMP offload 'nohost' mode is a TODO to match the build setup
of libomptarget/DeviceRTL. Flang runtime will be built as LLVM Bitcode
library using Clang/LLVM toolchain. The host part of the library
will be "empty", so there will be two distributable object: the host
Flang runtime and dummy host library with device Flang runtime pieces
packaged using clang-offload-packager and clang.
In all supported modes, enabling parts of Flang runtime for the device
compilation can be done iteratively to make the patches observable.
Note that at any point in time the resulting library may have unresolved
references to not yet enabled parts of Flang runtime.
Example cmake/make commands for building with Clang for NVPTX target:
cmake \
-DFLANG_EXPERIMENTAL_CUDA_RUNTIME=ON \
-DCMAKE_CUDA_ARCHITECTURES=80 \
-DCMAKE_C_COMPILER=/clang_nvptx/bin/clang \
-DCMAKE_CXX_COMPILER=/clang_nvptx/bin/clang++ \
-DCMAKE_CUDA_COMPILER=/clang_nvptx/bin/clang \
/llvm-project/flang/runtime/
make -j FortranRuntime
Example cmake/make commands for building with Clang OpenMP offload:
cmake \
-DFLANG_EXPERIMENTAL_OMP_OFFLOAD_BUILD="host_device" \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DFLANG_OMP_DEVICE_ARCHITECTURES="sm_80" \
../flang/runtime/
make -j FortranRuntime
The fact that we now have some but not all members available on the device is not ideal, IMHO. I would have hoped the opposing approach would at least be tried; compile the entire thing for the device, opt-out where necessary.