This is an archive of the discontinued LLVM Phabricator instance.

Differential D103579

[LTO] Fix -fwhole-program-vtables handling after HIP ThinLTO patch
ClosedPublic

Authored by tejohnson on Jun 2 2021, 5:33 PM.

Details

Reviewers

yaxunl
tra

Commits

rGd0ee8b64ecf3: [LTO] Fix -fwhole-program-vtables handling after HIP ThinLTO patch

Summary

A recent change (D99683) to support ThinLTO for HIP caused a regression
when compiling CUDA code with -flto=thin -fwhole-program-vtables.
Specifically, we now get an error:

  error: invalid argument '-fwhole-program-vtables' only allowed with '-flto'

This error comes from the device offload cc1 action being set up for
the CUDA compile, for which -flto=thin doesn't apply and gets dropped.
This is a regression, but it also points to a potential issue that was
silently occurring before the patch (details below).

Before D99683, the check for -fwhole-program-vtables in the driver looked
like:

if (WholeProgramVTables) {
  if (!D.isUsingLTO())
    D.Diag(diag::err_drv_argument_only_allowed_with)
        << "-fwhole-program-vtables"
        << "-flto";
  CmdArgs.push_back("-fwhole-program-vtables");
}

And D.isUsingLTO() returned true since we have -flto=thin. However,
because the cuda cc1 compile is doing device offloading, which didn't
support any LTO, there was other code that suppressed -flto* options
from being passed to the cc1 invocation. So the cc1 invocation silently
had -fwhole-program-vtables without any -flto*. This seems potentially
problematic, since if we had any virtual calls we would get type test
assume sequences without the corresponding LTO pass that handles them.

However, D99683, which adds support for the device offloading LTO
option -foffload-lto=thin, changed the code so that we set a bool
IsUsingLTO based on either -flto* or -foffload-lto*, depending on
whether this is the device offloading action. For the device offload
action in our compile, since we don't have -foffload-lto, IsUsingLTO is
false, and the check for LTO with -fwhole-program-vtables now fails.
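
The selection described in the paragraph above can be sketched with a small, self-contained C++ model. This is a toy, not the driver's code: `DriverModel`, `LTOKind`, and the field names are invented for illustration, and only the idea of choosing between the -flto* and -foffload-lto* settings according to the kind of action is taken from the summary.

```cpp
// Toy model of the per-action LTO query. None of these types are the real
// clang::driver classes; the names are assumptions made for this sketch.
enum class LTOKind { None, Full, Thin };

struct DriverModel {
  LTOKind HostLTO;    // what -flto / -flto=thin requested
  LTOKind OffloadLTO; // what -foffload-lto / -foffload-lto=thin requested

  // Pick the setting that applies to the action being built: the offload
  // setting for a device offload action, the host setting otherwise.
  constexpr bool isUsingLTO(bool IsOffload) const {
    return (IsOffload ? OffloadLTO : HostLTO) != LTOKind::None;
  }
};

// Models: clang -flto=thin -fwhole-program-vtables foo.cu
constexpr DriverModel CudaThinLTO{LTOKind::Thin, LTOKind::None};

// The host action sees LTO, while the device offload action does not,
// which is why an unconditional "requires -flto" check fires there.
static_assert(CudaThinLTO.isUsingLTO(/*IsOffload=*/false),
              "host action should see LTO");
static_assert(!CudaThinLTO.isUsingLTO(/*IsOffload=*/true),
              "device offload action should not see LTO");
```

Under this model, the regression is simply the second `static_assert` case reaching a check that assumed the first.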

What we should do is only pass through -fwhole-program-vtables to the
cc1 invocation that has LTO enabled (either the device offload action
with -foffload-lto, or the non-device offload action with -flto), and
otherwise drop the -fwhole-program-vtables for the non-LTO action.
Then we should error only if we have -fwhole-program-vtables without any
-f*lto* options.
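
The three outcomes proposed above (pass through, drop, error) can be written as a small decision function. The following is a hedged sketch with hypothetical names (`WPVAction`, `handleWholeProgramVTables`); it follows the branch structure described in the summary but does not reproduce the driver's actual interfaces.

```cpp
// Hypothetical names throughout; only the branch structure follows the
// behaviour described in the summary.
enum class WPVAction { PassThrough, Drop, Error };

// UsingLTOHere:  LTO applies to the cc1 action currently being built.
// UsingLTOOther: LTO was requested for the other side (host vs. device),
//                so the LTO options were only suppressed for this action.
constexpr WPVAction handleWholeProgramVTables(bool UsingLTOHere,
                                              bool UsingLTOOther) {
  return UsingLTOHere    ? WPVAction::PassThrough
         : UsingLTOOther ? WPVAction::Drop
                         : WPVAction::Error;
}

// CUDA with -flto=thin: the host action keeps the flag...
static_assert(handleWholeProgramVTables(true, false) == WPVAction::PassThrough,
              "LTO action passes -fwhole-program-vtables through");
// ...and the device action silently drops it instead of erroring.
static_assert(handleWholeProgramVTables(false, true) == WPVAction::Drop,
              "non-LTO action drops the flag when the other side uses LTO");
// No -f*lto* option at all: this is the only case that should diagnose.
static_assert(handleWholeProgramVTables(false, false) == WPVAction::Error,
              "no LTO anywhere is an error");
```

Expressing the result as an enum rather than emitting the diagnostic inline keeps the sketch testable at compile time; the real driver code pushes the cc1 argument or issues the diagnostic directly.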

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tejohnson created this revision. Jun 2 2021, 5:33 PM

Herald added a subscriber: inglorion. Jun 2 2021, 5:33 PM

tejohnson requested review of this revision. Jun 2 2021, 5:33 PM

Herald added a project: Restricted Project. Jun 2 2021, 5:33 PM

Herald added a subscriber: cfe-commits.

tejohnson added inline comments. Jun 2 2021, 5:36 PM

clang/test/Driver/hip-options.hip
72	These are essentially the same checks as before, but I renamed this tag to HIPTHINLTO, added a check to ensure we don't get the error with -fwhole-program-vtables, added a check that -fwhole-program-vtables is passed through for the offload cc1 command, and added a duplicate of the HIPTHINLTO-NOT check (in my local build the non-offload cc1 is emitted after the offload cc1; I'm not sure whether the order can vary).

Harbormaster completed remote builds in B107363: Diff 349431. Jun 2 2021, 6:29 PM

tra added inline comments. Jun 3 2021, 10:42 AM

clang/test/Driver/hip-options.hip
63–68	> caused a regression when compiling cuda code with -flto=thin -fwhole-program-vtables.

We should add a CUDA test for that. This test only covers HIP compilation.

tejohnson added inline comments. Jun 3 2021, 10:51 AM

clang/test/Driver/hip-options.hip
63–68	AFAICT there are no existing Cuda lto tests in clang/test/Driver that I could add -fwhole-program-vtables to. However, for the purposes of this bug fix, I think adding the below testing here should be sufficient - it triggers exactly the same way as when I saw this in an internal cuda build.

tra added inline comments. Jun 3 2021, 11:44 AM

clang/test/Driver/hip-options.hip
63–68	> AFAICT there are no existing Cuda lto tests in clang/test/Driver that I could add -fwhole-program-vtables to.

`cuda-options.cu` would be the right place.

> However, for the purposes of this bug fix, I think adding the below testing here should be sufficient - it triggers exactly the same way as when I saw this in an internal cuda build.

The key difference is that HIP does support LTO on the GPU side, but CUDA does not, which suggests that their handling of LTO-related flags is likely different, so it is worth testing that we do see the expected behavior. E.g. HIP compilation with `-foffload-lto=thin` does propagate LTO flags to device compilation, but CUDA compilation should not (but should still compile with LTO enabled on the host). While this patch does fix the issue for both CUDA and HIP, it's good to have a test which demonstrates how we expect HIP/CUDA to behave. Right now we only have that for HIP.

Add Cuda test

tejohnson marked an inline comment as done. Jun 3 2021, 12:33 PM

tejohnson added inline comments.

clang/test/Driver/hip-options.hip
63–68	Added the cuda test, and confirmed it fails with the error without my patch.

LGTM for CUDA.

@yaxunl Sam, does the change make sense for HIP?

clang/test/Driver/cuda-options.cu
190–192	Nit: This could be combined into `--check-prefixes=DEVICE,DEVICE-NOSAVE,....`

clang/test/Driver/hip-options.hip
63–68	Thank you.

This revision is now accepted and ready to land. Jun 3 2021, 1:11 PM

LGTM. Thanks!

Harbormaster completed remote builds in B107525: Diff 349647. Jun 3 2021, 1:42 PM

Combine check prefixes as suggested

tejohnson marked an inline comment as done. Jun 3 2021, 2:24 PM

This revision was landed with ongoing or failed builds. Jun 3 2021, 2:25 PM

Closed by commit rGd0ee8b64ecf3: [LTO] Fix -fwhole-program-vtables handling after HIP ThinLTO patch (authored by tejohnson).

This revision was automatically updated to reflect the committed changes.

tejohnson added a commit: rGd0ee8b64ecf3: [LTO] Fix -fwhole-program-vtables handling after HIP ThinLTO patch.

Harbormaster completed remote builds in B107557: Diff 349690. Jun 3 2021, 4:03 PM

Revision Contents

Path                                     Size
clang/lib/Driver/ToolChains/Clang.cpp    10 lines
clang/test/Driver/cuda-options.cu        10 lines
clang/test/Driver/hip-options.hip        29 lines

Diff 349691

clang/lib/Driver/ToolChains/Clang.cpp

[6,641 lines not shown]
   bool WholeProgramVTables = Args.hasFlag(
       options::OPT_fno_whole_program_vtables, VirtualFunctionElimination);
   if (VirtualFunctionElimination && !WholeProgramVTables) {
     D.Diag(diag::err_drv_argument_not_allowed_with)
         << "-fno-whole-program-vtables"
         << "-fvirtual-function-elimination";
   }

   if (WholeProgramVTables) {
-    if (!IsUsingLTO)
+    // Propagate -fwhole-program-vtables if this is an LTO compile.
+    if (IsUsingLTO)
+      CmdArgs.push_back("-fwhole-program-vtables");
+    // Check if we passed LTO options but they were suppressed because this is a
+    // device offloading action, or we passed device offload LTO options which
+    // were suppressed because this is not the device offload action.
+    // Otherwise, issue an error.
+    else if (!D.isUsingLTO(!IsDeviceOffloadAction))
       D.Diag(diag::err_drv_argument_only_allowed_with)
           << "-fwhole-program-vtables"
           << "-flto";
-    CmdArgs.push_back("-fwhole-program-vtables");
   }

   bool DefaultsSplitLTOUnit =
       (WholeProgramVTables || Sanitize.needsLTO()) &&
       (LTOMode == LTOK_Full || TC.canSplitThinLTOUnit());
   bool SplitLTOUnit =
       Args.hasFlag(options::OPT_fsplit_lto_unit,
                    options::OPT_fno_split_lto_unit, DefaultsSplitLTOUnit);
[1,085 lines not shown]

clang/test/Driver/cuda-options.cu

[177 lines not shown]

 // e) --cuda-include-ptx=all overrides preceding --no-cuda-include-ptx=sm_XX
 // RUN: %clang -### -target x86_64-linux-gnu \
 // RUN: --cuda-gpu-arch=sm_35 --cuda-gpu-arch=sm_30 \
 // RUN: --no-cuda-include-ptx=sm_30 --cuda-include-ptx=all \
 // RUN: -c %s 2>&1 \
 // RUN: | FileCheck -check-prefixes FATBIN-COMMON,PTX-SM35,PTX-SM30 %s

+// Verify -flto=thin -fwhole-program-vtables handling. This should result in
+// both options being passed to the host compilation, with neither passed to
+// the device compilation.
+// RUN: %clang -### -target x86_64-linux-gnu -c -flto=thin -fwhole-program-vtables %s 2>&1 \
+// RUN: | FileCheck -check-prefixes DEVICE,DEVICE-NOSAVE,HOST,INCLUDES-DEVICE,NOLINK,THINLTOWPD %s
+// THINLTOWPD-NOT: error: invalid argument '-fwhole-program-vtables' only allowed with '-flto'
+
 // ARCH-SM20: "-cc1"{{.*}}"-target-cpu" "sm_20"
 // NOARCH-SM20-NOT: "-cc1"{{.*}}"-target-cpu" "sm_20"
 // ARCH-SM30: "-cc1"{{.*}}"-target-cpu" "sm_30"
 // NOARCH-SM30-NOT: "-cc1"{{.*}}"-target-cpu" "sm_30"
 // ARCH-SM35: "-cc1"{{.*}}"-target-cpu" "sm_35"
 // NOARCH-SM35-NOT: "-cc1"{{.*}}"-target-cpu" "sm_35"
 // ARCHALLERROR: error: Unsupported CUDA gpu architecture: all

 // Match device-side preprocessor and compiler phases with -save-temps.
 // DEVICE-SAVE: "-cc1" "-triple" "nvptx64-nvidia-cuda"
 // DEVICE-SAVE-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
 // DEVICE-SAVE-SAME: "-fcuda-is-device"
 // DEVICE-SAVE-SAME: "-x" "cuda"

 // DEVICE-SAVE: "-cc1" "-triple" "nvptx64-nvidia-cuda"
 // DEVICE-SAVE-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
 // DEVICE-SAVE-SAME: "-fcuda-is-device"
 // DEVICE-SAVE-SAME: "-x" "cuda-cpp-output"

 // Match the job that produces PTX assembly.
 // DEVICE: "-cc1" "-triple" "nvptx64-nvidia-cuda"
 // DEVICE-NOSAVE-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
+// THINLTOWPD-NOT: "-flto=thin"
 // DEVICE-SAME: "-fcuda-is-device"
 // DEVICE-SM30-SAME: "-target-cpu" "sm_30"
+// THINLTOWPD-NOT: "-fwhole-program-vtables"
 // DEVICE-SAME: "-o" "[[PTXFILE:[^"]*]]"
 // DEVICE-NOSAVE-SAME: "-x" "cuda"
 // DEVICE-SAVE-SAME: "-x" "ir"

 // Match the call to ptxas (which assembles PTX to SASS).
 // DEVICE:ptxas
 // DEVICE-SM30-DAG: "--gpu-name" "sm_30"
 // DEVICE-DAG: "--output-file" "[[CUBINFILE:[^"]*]]"

[28 lines not shown]

 // HOST-SAVE: "-cc1" "-triple" "x86_64-unknown-linux-gnu"
 // HOST-SAVE-SAME: "-aux-triple" "nvptx64-nvidia-cuda"
 // HOST-SAVE-NOT: "-fcuda-is-device"
 // HOST-SAVE-SAME: "-x" "cuda"

 // Match host-side compilation.
 // HOST: "-cc1" "-triple" "x86_64-unknown-linux-gnu"
 // HOST-SAME: "-aux-triple" "nvptx64-nvidia-cuda"
+// THINLTOWPD-SAME: "-flto=thin"
 // HOST-NOT: "-fcuda-is-device"
 // There is only one GPU binary after combining it with fatbinary!
 // INCLUDES-DEVICE2-NOT: "-fcuda-include-gpubinary"
 // INCLUDES-DEVICE-SAME: "-fcuda-include-gpubinary" "[[FATBINARY]]"
 // There is only one GPU binary after combining it with fatbinary.
 // INCLUDES-DEVICE2-NOT: "-fcuda-include-gpubinary"
+// THINLTOWPD-SAME: "-fwhole-program-vtables"
 // HOST-SAME: "-o" "[[HOSTOUTPUT:[^"]*]]"
 // HOST-NOSAVE-SAME: "-x" "cuda"
 // HOST-SAVE-SAME: "-x" "cuda-cpp-output"

 // Match external assembler that uses compilation output.
 // HOST-AS: "-o" "{{.*}}.o" "[[HOSTOUTPUT]]"

 // Match no GPU code inclusion.

[21 lines not shown]

clang/test/Driver/hip-options.hip

[54 lines not shown]

 // RUN: %clang -### -target x86_64-unknown-linux-gnu -nogpuinc -nogpulib \
 // RUN: --offload-arch=gfx906 -fgpu-inline-threshold=1000 %s 2>&1 | FileCheck -check-prefix=THRESH %s
 // THRESH: clang{{.*}} "-triple" "amdgcn-amd-amdhsa" {{.*}} "-mllvm" "-inline-threshold=1000"
 // THRESH-NOT: clang{{.*}} "-triple" "x86_64-unknown-linux-gnu" {{.*}} "-inline-threshold=1000"

 // Check -foffload-lto=thin translated correctly.

 // RUN: %clang -### -target x86_64-unknown-linux-gnu -nogpuinc -nogpulib \
-// RUN: --cuda-gpu-arch=gfx906 -foffload-lto=thin %s 2>&1 \
-// RUN: | FileCheck -check-prefix=THINLTO %s
+// RUN: --cuda-gpu-arch=gfx906 -foffload-lto=thin -fwhole-program-vtables %s 2>&1 \
+// RUN: | FileCheck -check-prefix=HIPTHINLTO %s

+// RUN: %clang -### -target x86_64-unknown-linux-gnu -nogpuinc -nogpulib \
+// RUN: --cuda-gpu-arch=gfx906 -fgpu-rdc -foffload-lto=thin -fwhole-program-vtables %s 2>&1 \
+// RUN: | FileCheck -check-prefix=HIPTHINLTO %s
+
+// Ensure we don't error about -fwhole-program-vtables for the non-device offload compile.
+// HIPTHINLTO-NOT: error: invalid argument '-fwhole-program-vtables' only allowed with '-flto'
+// HIPTHINLTO-NOT: clang{{.*}} "-triple" "x86_64-unknown-linux-gnu" {{.*}} "-flto-unit"
+// HIPTHINLTO: clang{{.*}} "-triple" "amdgcn-amd-amdhsa" {{.*}} "-flto=thin" "-flto-unit" {{.*}} "-fwhole-program-vtables"
+// HIPTHINLTO-NOT: clang{{.*}} "-triple" "x86_64-unknown-linux-gnu" {{.*}} "-flto-unit"
+// HIPTHINLTO: lld{{.*}}"-plugin-opt=mcpu=gfx906" "-plugin-opt=thinlto" "-plugin-opt=-force-import-all"
+
+// Check that -flto=thin is handled correctly, particularly with -fwhole-program-vtables.
+//
 // RUN: %clang -### -target x86_64-unknown-linux-gnu -nogpuinc -nogpulib \
-// RUN: --cuda-gpu-arch=gfx906 -fgpu-rdc -foffload-lto=thin %s 2>&1 \
+// RUN: --cuda-gpu-arch=gfx906 -flto=thin -fwhole-program-vtables %s 2>&1 \
 // RUN: | FileCheck -check-prefix=THINLTO %s

-// THINLTO-NOT: clang{{.*}} "-triple" "x86_64-unknown-linux-gnu" {{.*}} "-flto-unit"
-// THINLTO: clang{{.*}} "-triple" "amdgcn-amd-amdhsa" {{.*}} "-flto=thin" "-flto-unit"
-// THINLTO: lld{{.*}}"-plugin-opt=mcpu=gfx906" "-plugin-opt=thinlto" "-plugin-opt=-force-import-all"
+// Ensure we don't error about -fwhole-program-vtables for the device offload compile. We should
+// drop -fwhole-program-vtables for the device offload compile and pass it through for the
+// non-device offload compile along with -flto=thin.
+// THINLTO-NOT: error: invalid argument '-fwhole-program-vtables' only allowed with '-flto'
+// THINLTO-NOT: clang{{.*}}" "-triple" "amdgcn-amd-amdhsa" {{.*}} "-fwhole-program-vtables"
+// THINLTO: clang{{.*}}" "-triple" "x86_64-unknown-linux-gnu" {{.*}} "-flto=thin" {{.*}} "-fwhole-program-vtables"
+// THINLTO-NOT: clang{{.*}}" "-triple" "amdgcn-amd-amdhsa" {{.*}} "-fwhole-program-vtables"