This is an archive of the discontinued LLVM Phabricator instance.

[LTO] Fix -fwhole-program-vtables handling after HIP ThinLTO patch
ClosedPublic

Authored by tejohnson on Jun 2 2021, 5:33 PM.

Details

Summary

A recent change (D99683) to support ThinLTO for HIP caused a regression
when compiling cuda code with -flto=thin -fwhole-program-vtables.
Specifically, we now get an error:
error: invalid argument '-fwhole-program-vtables' only allowed with '-flto'

This error is coming from the device offload cc1 action being set up for
the cuda compile, for which -flto=thin doesn't apply and gets dropped.
This is a regression, but points to a potential issue that was silently
occurring before the patch, details below.

Before D99683, the check for fwhole-program-vtables in the driver looked
like:

if (WholeProgramVTables) {
  if (!D.isUsingLTO())
    D.Diag(diag::err_drv_argument_only_allowed_with)
        << "-fwhole-program-vtables"
        << "-flto";
  CmdArgs.push_back("-fwhole-program-vtables");
}

And D.isUsingLTO() returned true since we have -flto=thin. However,
because the cuda cc1 compile is doing device offloading, which didn't
support any LTO, there was other code that suppressed -flto* options
from being passed to the cc1 invocation. So the cc1 invocation silently
had -fwhole-program-vtables without any -flto*. This seems potentially
problematic, since if we had any virtual calls we would get type test
assume sequences without the corresponding LTO pass that handles them.

However, with the patch, which adds support for device offloading LTO
option -foffload-lto=thin, the code has changed so that we set a bool
IsUsingLTO based on either -flto* or -foffload-lto*, depending on
whether this is the device offloading action. For the device offload
action in our compile, since we don't have -foffload-lto, IsUsingLTO is
false, and the check for LTO with -fwhole-program-vtables now fails.

What we should do is only pass through -fwhole-program-vtables to the
cc1 invocation that has LTO enabled (either the device offload action
with -foffload-lto, or the non-device offload action with -flto), and
otherwise drop the -fwhole-program-vtables for the non-LTO action.
Then we should error only if we have -fwhole-program-vtables without any
-f*lto* options.

Diff Detail

Event Timeline

tejohnson created this revision.Jun 2 2021, 5:33 PM
tejohnson requested review of this revision.Jun 2 2021, 5:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 2 2021, 5:33 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
tejohnson added inline comments.Jun 2 2021, 5:36 PM
clang/test/Driver/hip-options.hip
72

These are essentially the same checks as before, but I renamed this tag to HIPTHINLTO, added a check to ensure we don't get the error with -fwhole-program-vtables, ensure -fwhole-program-vtables passed through for the offload cc1 command, and added a duplicate of the HIPTHINLTO-NOT check (since in my local build the non-offload cc1 is emitted after the offload cc1, not sure if there is some variation).

tra added inline comments.Jun 3 2021, 10:42 AM
clang/test/Driver/hip-options.hip
63–68

caused a regression when compiling cuda code with -flto=thin -fwhole-program-vtables.

We should add a CUDA test for that. This test only covers HIP compilation.

tejohnson added inline comments.Jun 3 2021, 10:51 AM
clang/test/Driver/hip-options.hip
63–68

AFAICT there are no existing Cuda lto tests in clang/test/Driver that I could add -fwhole-program-vtables to.

However, for the purposes of this bug fix, I think adding the below testing here should be sufficient - it triggers exactly the same way as when I saw this in an internal cuda build.

tra added inline comments.Jun 3 2021, 11:44 AM
clang/test/Driver/hip-options.hip
63–68

AFAICT there are no existing Cuda lto tests in clang/test/Driver that I could add -fwhole-program-vtables to.

cuda-options.cu would be the right place.

However, for the purposes of this bug fix, I think adding the below testing here should be sufficient - it triggers exactly the same way as when I saw this in an internal cuda build.

The key difference is that HIP does support LTO on the GPU side, but CUDA's does not, which suggests that their handling of lto-related flags is likely different and worth testing that we do see the expected behavior. E.g. HIP compilation with -foffload-lto=thin does propagate LTO flags to device compilation, but CUDA compilation should not (but should still compile with LTO enabled on the host.

While this patch does fix the issue for both CUDA and HIP, it's good to have a test which demonstrates how we expect HIP/CUDA to behave. Right now we only have that for HIP.

tejohnson updated this revision to Diff 349647.Jun 3 2021, 12:33 PM

Add Cuda test

tejohnson marked an inline comment as done.Jun 3 2021, 12:33 PM
tejohnson added inline comments.
clang/test/Driver/hip-options.hip
63–68

Added the cuda test, and confirmed it fails with the error without my patch.

tra accepted this revision.Jun 3 2021, 1:11 PM

LGTM for CUDA.

@yaxunl Sam, does the change make sense for HIP?

clang/test/Driver/cuda-options.cu
190–192

Nit: This could be combined into --check-prefixes=DEVICE,DEVICE-NOSAVE,....

clang/test/Driver/hip-options.hip
63–68

Thank you.

This revision is now accepted and ready to land.Jun 3 2021, 1:11 PM
yaxunl accepted this revision.Jun 3 2021, 1:21 PM

LGTM. Thanks!

tejohnson updated this revision to Diff 349690.Jun 3 2021, 2:24 PM
tejohnson marked an inline comment as done.

Combine check prefixes as suggested

tejohnson marked an inline comment as done.Jun 3 2021, 2:24 PM
This revision was landed with ongoing or failed builds.Jun 3 2021, 2:25 PM
This revision was automatically updated to reflect the committed changes.