So far, the clang-offload-bundler has been the default tool for bundling together the various file types produced by the different OpenMP offloading toolchains supported by Clang. It does a great job for file types such as .bc, .ll, .ii, .ast. It is also used for bundling object files. Object files are a special case, in particular object files which contain sections meant to be executed on devices other than the host (as is the case for the OpenMP NVPTX toolchain). The bundling of object files prevents:
- STATIC LINKING: These bundled object files can be part of static libraries which means that the object file requires an unbundling step. If an object file in a static library requires "unbundling" then we need to know the whereabouts of that library and of the files before the actual link step which makes it impossible to do static linking using the "-L/path/to/lib/folder -labc" flag.
- INTEROPERABILITY WITH OTHER COMPILERS: These bundled object files can end up being passed between Clang and other compilers, which may lead to incompatibilities: passing a bundled file from Clang to another compiler would lead to that compiler not being able to unbundle it. Conversely, when an unbundled object file is passed to Clang, Clang does not know that it doesn't need to unbundle it.
Goal:
Disable the use of the clang-offload-bundler for bundling/unbundling object files which contain OpenMP NVPTX device offloaded code. This applies to the case where the following set of flags is passed to Clang:
-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda
When the above condition is not met the compiler works as it does today by invoking the clang-offload-bundler for bundling/unbundling object files (at the cost of static linking and interoperability).
The clang-offload-bundler usage on files other than object files is not affected by this patch.
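The condition above can be sketched as a small predicate. This is only an illustrative model in Python, not the driver code from the patch: the names `FileType`, `parse_openmp_targets` and `uses_offload_bundler` are hypothetical, and treating a mixed target list as "condition not met" is an assumption, since the description only covers the case where nvptx64-nvidia-cuda is the sole target.

```python
from enum import Enum


class FileType(Enum):
    """File types mentioned in this description (hypothetical model)."""
    OBJECT = "o"
    LLVM_BC = "bc"
    LLVM_IR = "ll"
    PREPROCESSED = "ii"
    AST = "ast"


NVPTX_TRIPLE = "nvptx64-nvidia-cuda"


def parse_openmp_targets(args):
    """Collect the triples passed via -fopenmp-targets=<t1>,<t2>,..."""
    targets = []
    for arg in args:
        if arg.startswith("-fopenmp-targets="):
            value = arg.split("=", 1)[1]
            targets.extend(t for t in value.split(",") if t)
    return targets


def uses_offload_bundler(args, file_type):
    """Return True if clang-offload-bundler would handle this file type."""
    targets = parse_openmp_targets(args)
    if "-fopenmp" not in args or not targets:
        return False  # no OpenMP offloading requested, nothing to bundle
    # Assumption: the bundler is skipped only for object files when NVPTX
    # is the sole offload target; every other case behaves as before.
    if file_type is FileType.OBJECT and targets == [NVPTX_TRIPLE]:
        return False
    return True
```

For example, with `-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda` the predicate is false for `FileType.OBJECT` but stays true for `FileType.LLVM_BC`, matching the statement that non-object file types are unaffected.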
Extensibility
Although this patch disables bundling/unbundling of object files via the clang-offload-bundler for the OpenMP NVPTX device offloading toolchain ONLY, this functionality can be extended to other platforms/systems where:
- the device toolchain can produce a host-compatible object AND
- partial linking of host objects is supported.
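As a rough sketch, the two requirements above amount to a conjunction of toolchain capabilities. The `DeviceToolchain` record and its field names below are hypothetical and exist only for illustration; they do not correspond to any class in Clang.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceToolchain:
    """Hypothetical description of an offloading toolchain's capabilities."""
    name: str
    emits_host_compatible_object: bool
    host_supports_partial_linking: bool


def can_skip_object_bundling(toolchain):
    """Both conditions must hold before object bundling can be dropped."""
    return (toolchain.emits_host_compatible_object
            and toolchain.host_supports_partial_linking)


# The OpenMP NVPTX toolchain after this patch: FATBINARY + CLANG++ produce
# a host object, and ld -r provides partial linking on the host.
nvptx = DeviceToolchain("openmp-nvptx", True, True)
```

A toolchain that fails either condition would keep using the clang-offload-bundler for object files.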
Current situation in trunk
In the current trunk the OpenMP device offloading toolchain performs the following steps depending on the input. Note: the clang-offload-bundler calls are part of the host toolchain but are shown here for clarity.
- SCENARIO 1 (-c -o)
INPUT TO CLANG: -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda input.cpp -c -o input.o
RELEVANT COMPILATION STEPS: PTXAS --------[.cubin]-------> clang-offload-bundler --bundle
- SCENARIO 2 (input object file):
INPUT TO CLANG: -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda input.o
RELEVANT COMPILATION STEPS: clang-offload-bundler --unbundle --------[.cubin]-------> NVLINK
- SCENARIO 3 (static linking):
INPUT TO CLANG: -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -L/path/to/lib -lstatic input.cpp
RELEVANT COMPILATION STEPS: PTXAS --------[.cubin]-------> NVLINK [STATIC LINKING FLAGS ARE IGNORED]
- SCENARIO 4 (C/C++ compilation):
INPUT TO CLANG: -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda input.cpp
RELEVANT COMPILATION STEPS: PTXAS --------[.cubin]-------> NVLINK
In the current trunk, the object on which the device toolchain operates is always a pure device object, i.e. a cubin. This only works when NVLINK can operate on the cubins directly. When these cubins are part of a static library or are bundled, NVLINK can no longer detect them.
The solution:
The solution to this problem involves several changes:
A. Make the device object file detectable by NVLINK in all situations (even when it is part of a static library).
To do this we need to add 2 steps to the OpenMP NVPTX device offloading toolchain for the case when an object file is created:
SCENARIO 1 changes to:
PTXAS --------[.cubin AND .s]-------> FATBINARY --------[.c]-------> CLANG++ --------[.o]-------> clang-offload-bundler --bundle
B. Since the new object we create in SCENARIO 1 is a host object we no longer need a custom "bundling" scheme [FIXES INTEROPERABILITY]
SCENARIO 1 changes to:
PTXAS --------[.cubin AND .s]-------> FATBINARY --------[.c]-------> CLANG++ --------[.o]-------> ld -r
IMPORTANT: ld -r is a host step shown here for completeness; it replaces the clang-offload-bundler call.
C. With changes A & B we don't need to perform unbundling because the object files can now be passed directly to the OpenMP NVPTX device offloading toolchain. NVLINK detects the device part in each file automatically (no need for a special unbundling step here).
SCENARIO 2 changes to:
--------[input.o]-------> NVLINK
D. Enable static linking by passing the input flags directly to the existing NVLINK step. NVLINK can now detect device objects even when they are packed in a static library (since they were created using FATBINARY + CLANG++). [FIXES STATIC LINKING]
SCENARIO 3 changes to:
PTXAS --------[.cubin]-------> NVLINK -L/path/to/lib -lstatic
Note that SCENARIO 4 remains unchanged.
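Change D, forwarding the static linking flags to NVLINK, can be sketched as follows. This is a simplified model and not the code in the patch: the helper names are hypothetical, only the joined `-L<dir>` and `-l<name>` spellings are handled, and other options a real NVLINK invocation needs (such as the target architecture) are omitted.

```python
def forward_static_link_flags(driver_args):
    """Pick out the -L<dir> and -l<name> flags from the driver arguments."""
    return [arg for arg in driver_args
            if arg.startswith(("-L", "-l")) and len(arg) > 2]


def nvlink_command(device_inputs, driver_args, output):
    """Build a simplified NVLINK command line (hypothetical helper)."""
    return ["nvlink", "-o", output,
            *forward_static_link_flags(driver_args),
            *device_inputs]


# SCENARIO 3 from above: the library flags now reach NVLINK.
command = nvlink_command(
    ["input.cubin"],
    ["-fopenmp", "-fopenmp-targets=nvptx64-nvidia-cuda",
     "-L/path/to/lib", "-lstatic", "input.cpp"],
    "out.cubin",
)
```

In trunk the equivalent of `forward_static_link_flags` would effectively return an empty list, which is why static libraries are ignored on the device side.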
This patch implements changes A, B, C and D in one go.
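To summarize the four scenarios, the tables below restate the pipelines before and after the patch as plain data, taken directly from the step lists in this description. The dictionary and helper names are hypothetical and only serve to compare the two states.

```python
# Steps per scenario in the current trunk.
TRUNK = {
    1: ["PTXAS", "clang-offload-bundler --bundle"],
    2: ["clang-offload-bundler --unbundle", "NVLINK"],
    3: ["PTXAS", "NVLINK"],  # static linking flags are ignored
    4: ["PTXAS", "NVLINK"],
}

# Steps per scenario with changes A, B, C and D applied.
PATCHED = {
    1: ["PTXAS", "FATBINARY", "CLANG++", "ld -r"],
    2: ["NVLINK"],
    3: ["PTXAS", "NVLINK -L/path/to/lib -lstatic"],
    4: ["PTXAS", "NVLINK"],
}


def changed_scenarios():
    """Scenario numbers whose step list differs between trunk and patch."""
    return sorted(s for s in TRUNK if TRUNK[s] != PATCHED[s])


def uses_bundler(steps):
    """True if any step in the pipeline calls clang-offload-bundler."""
    return any(step.startswith("clang-offload-bundler") for step in steps)
```

With these tables, `changed_scenarios()` yields scenarios 1, 2 and 3, and no patched pipeline contains a clang-offload-bundler step.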
Use CamelCase for class local variables.