The current libcgpu.a is actually an archive of fatbinaries. The host
file contains nothing but a section called LLVM_OFFLOADING that
contains embedded device code. This used to be handled implicitly by
borrowing the OpenMP toolchain, which did this packaging internally.
However, passing the OpenMP flags causes problems as we try to move to
testing. This patch pulls this logic out into the CMake and handles it
manually. The patch is a lot of noise, but it fundamentally comes down
to the following changes.
- Build the source for every GPU architecture (GPU architectures are
  generally not backwards compatible).
- Combine all of these files into a single binary blob.
- Embed that binary blob into a host file.
- Package these host files into a .a archive.
- The device code will be extracted and managed by the offloading linker.
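As a rough illustration of the steps above, here is a sketch that only assembles the command lines such a build might run for one source file. It is not the CMake logic from this patch. The architecture list, the file names, the empty stub.cpp host source, and kind=openmp are all assumptions made for the example, and a real build would pass more flags than shown.

```python
# Sketch only: builds the command lines as lists and never runs them.
# ARCHS, the file names, stub.cpp, and kind=openmp are illustrative assumptions.
ARCHS = [
    ("amdgcn-amd-amdhsa", "gfx90a"),
    ("nvptx64-nvidia-cuda", "sm_70"),
]


def device_compile(src, triple, arch):
    """Compile one source for one GPU architecture."""
    # clang selects an AMDGPU processor with -mcpu and an NVPTX one with -march.
    cpu_flag = "-mcpu" if triple.startswith("amdgcn") else "-march"
    obj = f"{src}.{arch}.o"
    cmd = ["clang", f"--target={triple}", f"{cpu_flag}={arch}", "-c", src, "-o", obj]
    return obj, cmd


def fatbinary_commands(src, archs=ARCHS):
    """Return (host_obj, commands) covering the first three steps for one source."""
    commands, images = [], []
    for triple, arch in archs:
        obj, cmd = device_compile(src, triple, arch)
        commands.append(cmd)
        images.append(f"--image=file={obj},triple={triple},arch={arch},kind=openmp")
    # Combine every per-architecture object into a single binary blob.
    blob = f"{src}.gpubin"
    commands.append(["clang-offload-packager", "-o", blob, *images])
    # Embed the blob into an otherwise empty host object.
    host_obj = f"{src}.host.o"
    commands.append(["clang", "-c", "stub.cpp", "-Xclang",
                     f"-fembed-offload-object={blob}", "-o", host_obj])
    return host_obj, commands


def archive_command(host_objs, archive="libcgpu.a"):
    """Package the host objects into a static archive."""
    return ["llvm-ar", "rcs", archive, *host_objs]


if __name__ == "__main__":
    host_obj, commands = fatbinary_commands("strlen.cpp")
    for cmd in commands + [archive_command([host_obj])]:
        print(" ".join(cmd))
```

Passing a one-element list as archs gives the single-architecture variant of the same pipeline.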
Another important point: right now we maintain a distinction between two
versions of the GPU build. The exported library is built for many GPU
architectures, whereas the internal version is built for only a single
GPU architecture, one that was found on the user's system. The internal
version is intended for internal testing, very similar to the current
path where libc is compiled for a single target triple.
gpu can potentially be a directory name, so can we use a suffix like .__gpu__ instead?