This is an archive of the discontinued LLVM Phabricator instance.

[HIP] Support early finalization of device code for -fno-gpu-rdc
Closed, Public

Authored by yaxunl on Sep 21 2018, 1:27 PM.

Details

Summary

This patch renames -f{no-}cuda-rdc to -f{no-}gpu-rdc and keeps the original
options as aliases. When -fgpu-rdc is off, clang assumes that the device code
in each translation unit does not call external functions other than those in
the device library. It is therefore possible to compile the device code in each
translation unit into self-contained kernels and embed them in the host object,
so that the host object behaves like an ordinary host object that can be linked
by lld.

The benefits of this feature are: 1. it allows users to create static libraries
that can be linked by the host linker; 2. device code linking time is amortized
across translation units.

This patch modifies the HIP action builder to insert actions for linking device
code and generating the HIP fatbin, and passes the HIP fatbin to the host backend
action. It extracts the code that constructs the command for generating the HIP
fatbin into a function, so that it can be reused by early finalization. It also
modifies the codegen of HIP host constructor functions to embed the device
fatbin when it is available.
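To make the two modes concrete, here is a minimal Python sketch that models the per-TU pipeline described above. It is not clang's driver code: the `TranslationUnit` class, the `plan_actions` function, the step names, and the symbol names are all hypothetical. It only illustrates that under -fno-gpu-rdc each TU's device code is linked (against nothing but the device library), turned into a HIP fatbin, and embedded in the host object, while under -fgpu-rdc device linking is deferred.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class TranslationUnit:
    """Hypothetical model of one HIP source file."""
    name: str
    # Device-side functions this TU calls but does not define itself.
    external_device_calls: FrozenSet[str] = field(default_factory=frozenset)


def plan_actions(tu: TranslationUnit, gpu_rdc: bool,
                 device_library: FrozenSet[str]) -> List[str]:
    """Return an illustrative, ordered list of build steps for one TU.

    The step names are invented for this sketch; they are not the names of
    clang's internal driver actions.
    """
    if gpu_rdc:
        # Relocatable device code: device linking is deferred to the final
        # link step, so calls into other TUs are allowed here.
        return ["compile-device-to-relocatable-object",
                "compile-host",
                "package-device-object-with-host-object"]

    # -fno-gpu-rdc (early finalization): the device code is linked per TU,
    # so only the device library may satisfy external calls.
    unresolved = tu.external_device_calls - device_library
    if unresolved:
        # In this model the per-TU device link fails.
        raise ValueError(
            f"{tu.name}: device symbols {sorted(unresolved)} are not in the "
            "device library; compile with -fgpu-rdc instead")
    return ["compile-device",
            "link-device-code",
            "generate-hip-fatbin",
            "compile-host-embedding-fatbin"]


if __name__ == "__main__":
    device_lib = frozenset({"lib_sin"})  # hypothetical device-library symbol
    a = TranslationUnit("a.hip", frozenset({"lib_sin"}))
    b = TranslationUnit("b.hip", frozenset({"helper_defined_in_c_hip"}))
    print(plan_actions(a, gpu_rdc=False, device_library=device_lib))
    print(plan_actions(b, gpu_rdc=True, device_library=device_lib))
```

In this model, `b.hip` only builds with `gpu_rdc=True`, because its device code calls a function defined in another TU; that is exactly the case the -fno-gpu-rdc assumption rules out.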


Event Timeline

yaxunl created this revision. Sep 21 2018, 1:27 PM
yaxunl updated this revision to Diff 166556. Sep 21 2018, 1:50 PM

Fix comments.

tra added a comment. Sep 21 2018, 2:02 PM

Overall, the patch looks OK. I'll take a closer look on Monday.

Which mode do you expect will be most commonly used for HIP by default? With this patch we'll have two different ways to do similar things in HIP vs. CUDA.
E.g. by default CUDA compiles the GPU code in each TU into a complete executable and requires -fcuda-rdc to compile to a GPU object file.
HIP defaults to object-file compilation and requires --hip-early-finalize to match CUDA's default behavior.

I wonder if it would make sense to provide a single way to control this behavior. E.g. -fgpu-rdc (an alias for -fcuda-rdc, perhaps?) would default to on in HIP but off in CUDA. -fno-gpu-rdc would force 'whole GPU executable per TU' mode.

yaxunl added a comment. Oct 1 2018, 8:26 AM
Replying to @tra's comment (D52377#1242547) above:

Agreed that -fgpu-rdc and -fno-gpu-rdc are better names for the options. I will update the patch to use them.

As the default, we will use -fno-gpu-rdc, to be consistent with clang's CUDA support.

yaxunl updated this revision to Diff 167774. Oct 1 2018, 10:52 AM
yaxunl retitled this revision from "[HIP] Support early finalization of device code" to "[HIP] Support early finalization of device code for -fno-gpu-rdc".
yaxunl edited the summary of this revision.

Uses -fno-gpu-rdc for early finalization.

tra added inline comments. Oct 1 2018, 4:03 PM
include/clang/Driver/Options.td, lines 587–589

Considering that -f[no-]cuda-rdc has been around for a while, we should keep it as an alias for -f[no-]gpu-rdc.

yaxunl updated this revision to Diff 167952. Oct 2 2018, 7:58 AM
yaxunl edited the summary of this revision.

Added -f{no-}cuda-rdc as an alias for -f{no-}gpu-rdc.
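The option handling settled on in this review (-fgpu-rdc/-fno-gpu-rdc as the canonical names, -fcuda-rdc/-fno-cuda-rdc kept as aliases, and -fno-gpu-rdc as the default) can be modeled with a small Python sketch. The `ALIASES` table and the `gpu_rdc_enabled` function are hypothetical and are not clang's option-parsing code. The sketch also assumes the usual convention for clang's paired boolean flags, namely that the last occurrence on the command line wins.

```python
from typing import Iterable

# Hypothetical alias table: the old CUDA spellings map onto the new names.
ALIASES = {
    "-fcuda-rdc": "-fgpu-rdc",
    "-fno-cuda-rdc": "-fno-gpu-rdc",
}


def gpu_rdc_enabled(args: Iterable[str]) -> bool:
    """Return True if relocatable device code is requested (model only)."""
    enabled = False  # default agreed in this review: -fno-gpu-rdc
    for arg in args:
        canonical = ALIASES.get(arg, arg)  # resolve old spellings first
        if canonical == "-fgpu-rdc":
            enabled = True
        elif canonical == "-fno-gpu-rdc":
            enabled = False
        # Any other argument does not affect this setting.
    return enabled
```

For example, `gpu_rdc_enabled(["-fgpu-rdc", "-fno-cuda-rdc"])` is false, because the alias is treated exactly like the flag it stands for and it appears last.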

tra accepted this revision. Oct 2 2018, 10:20 AM

LGTM.

This revision is now accepted and ready to land. Oct 2 2018, 10:20 AM
This revision was automatically updated to reflect the committed changes.