Provide control over clang job / action creation. This feature provides the phase pipeline for an upcoming COMGR action: AMD_COMGR_ACTION_COMPILE_SOURCE_WITH_DEVICE_LIBS_TO_RELOCATABLE
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Inline comment on clang/test/Driver/hip-phases.hip:
Line | Comment |
---|---|
259 | Probably just use // RELOC-NOT: linker. Same for the checks below. Also, we need a test for the -fgpu-rdc case. |
Fix tests + add tests. Add phase test for -fgpu-rdc --no-gpu-link-output (these are not intended to be used together)
Hi @jhuber6
The commit is poorly named; the main purpose is to introduce -no-gpu-link-output.
We want a way to produce a relocatable object from source. In terms of the Driver, this means building actions and jobs only for phases up to and including phases::Assemble. -no-gpu-link-output does this by overriding BuildActions to stop after phases::Assemble (similar to -no-gpu-bundle-output). -gpu-link-output is NFCI. COMGR would be the client of this, and it would be up to COMGR to handle linking of the relocatable.
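To illustrate the "stop after phases::Assemble" idea, here is a minimal toy sketch. It is not clang's Driver code: Phase, PipelineOptions, and buildDevicePhases are hypothetical names made up for this example, and the real BuildActions logic works on Action objects rather than a plain list of phases.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only. These names echo the discussion but are hypothetical;
// they are not clang's real phases:: enum or Driver classes.
enum class Phase { Preprocess, Compile, Backend, Assemble, Link };

struct PipelineOptions {
  // Hypothetical stand-in for -gpu-link-output (true) / -no-gpu-link-output (false).
  bool GPULinkOutput = true;
};

// Human-readable name for a phase, loosely in the style of -ccc-print-phases output.
inline std::string phaseName(Phase P) {
  switch (P) {
  case Phase::Preprocess: return "preprocessor";
  case Phase::Compile:    return "compiler";
  case Phase::Backend:    return "backend";
  case Phase::Assemble:   return "assembler";
  case Phase::Link:       return "linker";
  }
  return "unknown";
}

// Collect the device-side phases in order. When linking is disabled, the
// list ends at Assemble, so no linker step would ever be scheduled.
inline std::vector<Phase> buildDevicePhases(const PipelineOptions &Opts) {
  const Phase All[] = {Phase::Preprocess, Phase::Compile, Phase::Backend,
                       Phase::Assemble, Phase::Link};
  std::vector<Phase> Result;
  for (Phase P : All) {
    if (P == Phase::Link && !Opts.GPULinkOutput)
      break; // stop after Assemble: the relocatable is the final output
    Result.push_back(P);
  }
  return Result;
}
```

With GPULinkOutput set to false the last phase returned is Assemble, which corresponds to handing the relocatable to a client such as COMGR; with the default value the list still ends in Link, matching the claim that the positive flag changes nothing.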
AFAICT, -hip-link allows for linking of offload-bundles / linking through the HIPAMD toolchain, so it is conceptually different. We can get (somewhat) close to what we want with -emit-llvm -hip-link, but that is probably more due to -emit-llvm. -hip-link by itself produces linker actions / jobs, which is what we are trying to avoid here.
So, you run the backend and obtain a relocatable ELF, but do not link it via lld? If I'm understanding this correctly, that would be the difference between -flto and -fno-lto, or -foffload-lto and -fno-offload-lto, AMDGPU always having -flto on currently. Also I recall AMDGPU / HIP completely disabling the backend step at some point, so it only emits LLVM-IR.
For the -fno-gpu-rdc case we do not use LTO. Since -fno-gpu-rdc has only one TU, we use the non-LTO backend to get a relocatable object, and then use lld to turn the relocatable into a shared object. This patch allows us to stop at the relocatable object, since comgr needs that.
I am thinking we probably want to rename the option as -fhip-emit-relocatable and limit it to be used with -cuda-device-only and -fno-gpu-rdc only.
The whole point of this work is to give hiprtc a way to compile-to-bitcode and optimize sources in a single step, to make (user-passed) flag handling less weird. Since the intent of LTO is to defer this optimization step, I would assume any way we try to use it here would not be correct.
Inline comment on clang/lib/Driver/Driver.cpp:
Line | Comment |
---|---|
3334–3336 | Probably needs to be moved to the ctor of CudaActionBuilderBase since they are needed by both the Cuda and HIP action builders. |
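The suggestion above amounts to hoisting shared initialization into the common base class. The sketch below shows the general pattern only; the thread does not say what lines 3334–3336 contain, so the two booleans, DriverArgs, and the class names here are hypothetical stand-ins and not the real CudaActionBuilderBase interface.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for parsed driver arguments; the real builders
// receive clang's Compilation and argument-list objects instead.
struct DriverArgs {
  bool EmitLLVM = false;
  bool GPULinkOutput = true;
};

// Base class: state needed by every derived builder is initialized once
// here, so it cannot be forgotten in one of the derived constructors.
class ActionBuilderBase {
protected:
  bool EmitLLVM;
  bool GPULinkOutput;

  explicit ActionBuilderBase(const DriverArgs &Args)
      : EmitLLVM(Args.EmitLLVM), GPULinkOutput(Args.GPULinkOutput) {}

public:
  virtual ~ActionBuilderBase() = default;
  virtual std::string kind() const = 0;
  bool shouldLink() const { return GPULinkOutput; }
  bool emitsLLVM() const { return EmitLLVM; }
};

// Both derived builders forward to the base constructor and therefore
// see identical values for the shared flags.
class CudaBuilder : public ActionBuilderBase {
public:
  explicit CudaBuilder(const DriverArgs &Args) : ActionBuilderBase(Args) {}
  std::string kind() const override { return "cuda"; }
};

class HipBuilder : public ActionBuilderBase {
public:
  explicit HipBuilder(const DriverArgs &Args) : ActionBuilderBase(Args) {}
  std::string kind() const override { return "hip"; }
};
```

Initializing in the base constructor keeps the two builders from drifting apart if one of them is later edited without the other.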