This is an archive of the discontinued LLVM Phabricator instance.

[HIP]: Add -fhip-emit-relocatable to override link job creation for -fno-gpu-rdc
ClosedPublic

Authored by jrbyrnes on Jun 23 2023, 3:00 PM.

Details

Summary

Provide control over clang job / action creation. This feature provides the phase pipeline for an upcoming COMGR action : AMD_COMGR_ACTION_COMPILE_SOURCE_WITH_DEVICE_LIBS_TO_RELOCATABLE

Diff Detail

Event Timeline

jrbyrnes created this revision.Jun 23 2023, 3:00 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2023, 3:00 PM
jrbyrnes requested review of this revision.Jun 23 2023, 3:00 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2023, 3:00 PM
jrbyrnes edited the summary of this revision. (Show Details)Jun 23 2023, 3:01 PM
jrbyrnes updated this revision to Diff 534086.Jun 23 2023, 3:03 PM

Formatting

yaxunl added inline comments.Jun 23 2023, 4:02 PM
clang/test/Driver/hip-phases.hip
259

probably just use

// RELOC-NOT: linker

same for below

also, we need a test for -fgpu-rdc case

jrbyrnes updated this revision to Diff 534725.Jun 26 2023, 1:11 PM
jrbyrnes marked an inline comment as done.

Fix tests + add tests. Add phase test for -fgpu-rdc --no-gpu-link-output (these are not intended to be used together)

What's the difference here between this and the existing --hip-link?

jrbyrnes added a comment.EditedJun 26 2023, 4:02 PM

What's the difference here between this and the existing --hip-link?

Hi @jhuber6

The commit is poorly named, the main purpose is to introduce -no-gpu-link-output.

We want a way to produce relocatable from source. In terms of the Driver, this means building actions and jobs for phases up to phases::Assemble. -no-gpu-link-output does this by overriding BuildActions to stop after phases::Assemble (similar to -no-gpu-bundle-output). -gpu-link-output is NFCI. COMGR would be the client of this, and it would be up to COMGR to handle linking of the relocatable.

AFAICT, -hip-link allows for linking of offload-bundles / linking through HIPAMD toolchain, so it is conceptually different. We can get (somewhat) close to what we with -emit-llvm -hip-link, but that is probably more due to -emit-llvm. -hip-link by itself produces linker actions / jobs which what we are trying to avoid here.

What's the difference here between this and the existing --hip-link?

Hi @jhuber6

The commit is poorly named, the main purpose is to introduce -no-gpu-link-output.

We want a way to produce relocatable from source. In terms of the Driver, this means building actions and jobs for phases up to phases::Assemble. -no- gpu-link-output does this by overriding BuildActions to stop after phases::Assemble (similar to -no-gpu-bundle-output). -gpu-link-output is NFCI. COMGR would be the client of this, and it would be up to COMGR to handle linking of the relocatable.

AFAICT, -hip-link allows for linking of offload-bundles, so it is conceptually different. We can get (somewhat) close to what we with -emit-llvm -hip-link, but that is probably more due to -emit-llvm. -hip-link by itself produces linker actions / jobs which what we are trying to avoid here.

So, you run the backend and obtain a relocatable ELF, but do not link it via lld? If I'm understanding this correctly, that would be the difference between -flto and -fno-lto, or -foffload-lto and -fno-offload-lto, AMDGPU always having -flto on currently. Also I recall AMDGPU / HIP completely disabling the backend step at some point, so it only emits LLVM-IR.

What's the difference here between this and the existing --hip-link?

Hi @jhuber6

The commit is poorly named, the main purpose is to introduce -no-gpu-link-output.

We want a way to produce relocatable from source. In terms of the Driver, this means building actions and jobs for phases up to phases::Assemble. -no- gpu-link-output does this by overriding BuildActions to stop after phases::Assemble (similar to -no-gpu-bundle-output). -gpu-link-output is NFCI. COMGR would be the client of this, and it would be up to COMGR to handle linking of the relocatable.

AFAICT, -hip-link allows for linking of offload-bundles, so it is conceptually different. We can get (somewhat) close to what we with -emit-llvm -hip-link, but that is probably more due to -emit-llvm. -hip-link by itself produces linker actions / jobs which what we are trying to avoid here.

So, you run the backend and obtain a relocatable ELF, but do not link it via lld? If I'm understanding this correctly, that would be the difference between -flto and -fno-lto, or -foffload-lto and -fno-offload-lto, AMDGPU always having -flto on currently. Also I recall AMDGPU / HIP completely disabling the backend step at some point, so it only emits LLVM-IR.

For -fno-gpu-rdc case we do not use lto. Since -fno-gpu-rdc has one TU only, we use the non-lto backend to get relocatable object, and use lld for relocatable to shared object. This patch allows us to stop at the relocatable object since comgr needs that.

For -fno-gpu-rdc case we do not use lto. Since -fno-gpu-rdc has one TU only, we use the non-lto backend to get relocatable object, and use lld for relocatable to shared object. This patch allows us to stop at the relocatable object since comgr needs that.

I see, so conceptually this is like -Xarch-device -c (if such a thing worked)?

I am thinking we probably want to rename the option as -fhip-emit-relocatable and limit it to be used with -cuda-device-only and -fno-gpu-rdc only.

For -fno-gpu-rdc case we do not use lto. Since -fno-gpu-rdc has one TU only, we use the non-lto backend to get relocatable object, and use lld for relocatable to shared object. This patch allows us to stop at the relocatable object since comgr needs that.

I see, so conceptually this is like -Xarch-device -c (if such a thing worked)?

It is like that

What's the difference here between this and the existing --hip-link?

Hi @jhuber6

The commit is poorly named, the main purpose is to introduce -no-gpu-link-output.

We want a way to produce relocatable from source. In terms of the Driver, this means building actions and jobs for phases up to phases::Assemble. -no- gpu-link-output does this by overriding BuildActions to stop after phases::Assemble (similar to -no-gpu-bundle-output). -gpu-link-output is NFCI. COMGR would be the client of this, and it would be up to COMGR to handle linking of the relocatable.

AFAICT, -hip-link allows for linking of offload-bundles, so it is conceptually different. We can get (somewhat) close to what we with -emit-llvm -hip-link, but that is probably more due to -emit-llvm. -hip-link by itself produces linker actions / jobs which what we are trying to avoid here.

So, you run the backend and obtain a relocatable ELF, but do not link it via lld? If I'm understanding this correctly, that would be the difference between -flto and -fno-lto, or -foffload-lto and -fno-offload-lto, AMDGPU always having -flto on currently. Also I recall AMDGPU / HIP completely disabling the backend step at some point, so it only emits LLVM-IR.

The whole point of this work is to give hiprtc a way to compile-to-bitcode and optimize sources in a single step, to make (user-passed) flag handling less weird. Since the intent of LTO is to defer this optimization step, I would assume any way we try to use it here would not be correct.

jrbyrnes updated this revision to Diff 535456.Jun 28 2023, 10:08 AM

Naming + -cuda-device-only and -fno-gpu-rdc only

jrbyrnes retitled this revision from [HIP]: Add gpu-link-output to control link job creation to [HIP]: Add -fhip-emit-relocatable to override link job creation for -fno-gpu-rdc.Jun 28 2023, 10:09 AM
yaxunl added inline comments.Jun 28 2023, 10:57 AM
clang/lib/Driver/Driver.cpp
3367–3369

There are data members Relocatable and CompileDeviceOnly in the base class. You can use them instead of using local variables.

3374

need to emit a diag diag::err_opt_not_valid_with_opt for fgpu-rdc
and diag::err_opt_not_valid_without_opt if it is not device only.

jrbyrnes updated this revision to Diff 535484.Jun 28 2023, 11:54 AM

Use member variabls + add diagnostic + tests

yaxunl added inline comments.Jun 28 2023, 1:19 PM
clang/lib/Driver/Driver.cpp
3364–3366

probably needs to be moved to ctor of CudaActionBuilderBase since they are needed by both Cuda and HIP action builders.

jrbyrnes updated this revision to Diff 535519.Jun 28 2023, 2:02 PM
jrbyrnes marked 3 inline comments as done.

Address Comment

clang/lib/Driver/Driver.cpp
3364–3366

Thanks

yaxunl accepted this revision.Jun 28 2023, 6:58 PM

LGTM. Thanks

This revision is now accepted and ready to land.Jun 28 2023, 6:58 PM