This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Disable LTO for device-side compilations.
ClosedPublic

Authored by tra on Mar 20 2018, 9:59 AM.

Details

Reviewers

tejohnson
jlebar

Commits

rGecb178bb356f: [CUDA] Disable LTO for device-side compilations.
rC328161: [CUDA] Disable LTO for device-side compilations.
rL328161: [CUDA] Disable LTO for device-side compilations.

Summary

This fixes host-side LTO during CUDA compilation. Previously, LTO
pipeline construction clashed with CUDA pipeline construction.

At the moment there is no point in doing LTO on the device side, as each
device-side TU is a complete program. We will need to figure out
compilation pipeline construction for device-side LTO once we
have working support for multi-TU device-side CUDA compilation.
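
The decision this patch introduces in the Backend phase can be sketched as a small standalone model. This is a simplified illustration, not the actual clang Driver code: `OffloadKind`, `OutputType`, and `backendOutput` are hypothetical stand-ins, and the final fall-through to assembly output is an assumption based on the `backend, {5}, assembler, (device-cuda, sm_20)` line expected by the tests in this revision.

```cpp
// Simplified model of the Backend-phase choice made by this patch.
// All names are illustrative stand-ins, not the real clang Driver types.
enum class OffloadKind { None, Host, Cuda };
enum class OutputType { LTO_IR, LTO_BC, LLVM_IR, LLVM_BC, Asm };

// Pick the backend output type. LTO output is selected only when the action
// is not built for an offload device (Kind == None); device-side actions
// fall through to the regular (non-LTO) choices.
constexpr OutputType backendOutput(bool UsingLTO, bool EmitLLVM, bool HasDashS,
                                   OffloadKind Kind) {
  return (UsingLTO && Kind == OffloadKind::None)
             ? (HasDashS ? OutputType::LTO_IR : OutputType::LTO_BC)
         : EmitLLVM ? (HasDashS ? OutputType::LLVM_IR : OutputType::LLVM_BC)
                    : OutputType::Asm; // assumed default, see lead-in
}

// Host-side action with -flto: LTO bitcode.
static_assert(backendOutput(true, false, false, OffloadKind::None) ==
                  OutputType::LTO_BC,
              "host keeps LTO");
// CUDA device-side action with -flto: plain assembly, no LTO.
static_assert(backendOutput(true, false, false, OffloadKind::Cuda) ==
                  OutputType::Asm,
              "device skips LTO");
```

In this model the host-side call passes `OffloadKind::None`, which corresponds to the default argument added in Driver.h, while the CUDA device path passes `OffloadKind::Cuda` explicitly.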

Diff Detail

Repository: rL LLVM

Event Timeline

tra created this revision. Mar 20 2018, 9:59 AM

Herald added subscribers: eraman, inglorion, mehdi_amini, sanjoy. Mar 20 2018, 9:59 AM

jlebar accepted this revision. Mar 20 2018, 10:21 AM

This revision is now accepted and ready to land. Mar 20 2018, 10:21 AM

Closed by commit rL328161: [CUDA] Disable LTO for device-side compilations. (authored by tra). Mar 21 2018, 3:25 PM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: llvm-commits. Mar 21 2018, 3:25 PM

Hi Artem,

The test failure should be fixed by r328213.

Revision Contents

Path                                        Size
cfe/trunk/include/clang/Driver/Driver.h     6 lines
cfe/trunk/lib/Driver/Driver.cpp             9 lines
cfe/trunk/lib/Driver/ToolChains/Clang.cpp   6 lines
cfe/trunk/test/Driver/lto.cu                76 lines
cfe/trunk/test/Driver/thinlto.cu            50 lines

Diff 139381

cfe/trunk/include/clang/Driver/Driver.h

[450 unchanged lines not shown; enclosing context: public:]
   ///
   /// \return Whether any compilation should be built for this
   /// invocation.
   bool HandleImmediateArgs(const Compilation &C);

   /// ConstructAction - Construct the appropriate action to do for
   /// \p Phase on the \p Input, taking in to account arguments
   /// like -fsyntax-only or --analyze.
-  Action *ConstructPhaseAction(Compilation &C, const llvm::opt::ArgList &Args,
-                               phases::ID Phase, Action *Input) const;
+  Action *ConstructPhaseAction(
+      Compilation &C, const llvm::opt::ArgList &Args, phases::ID Phase,
+      Action *Input,
+      Action::OffloadKind TargetDeviceOffloadKind = Action::OFK_None) const;

   /// BuildJobsForAction - Construct the jobs to perform for the action \p A and
   /// return an InputInfo for the result of running \p A. Will only construct
   /// jobs for a given (Action, ToolChain, BoundArch, DeviceKind) tuple once.
   InputInfo
   BuildJobsForAction(Compilation &C, const Action *A, const ToolChain *TC,
                      StringRef BoundArch, bool AtTopLevel, bool MultipleArchs,
                      const char *LinkingOutput,
[118 unchanged lines not shown]
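
Because the new `TargetDeviceOffloadKind` parameter is declared with a default of `Action::OFK_None`, call sites that do not pass it keep compiling and keep the host-side behaviour. A minimal sketch of that pattern, using hypothetical names (`DeviceKind`, `DriverModel`, `buildsLTOBackend`) rather than the real Driver interface:

```cpp
// Illustrative stand-in for Action::OffloadKind; not the clang enum.
enum class DeviceKind { None, Cuda };

struct DriverModel {
  // The trailing parameter defaults to "no offload device", mirroring the
  // shape of the new ConstructPhaseAction declaration.
  constexpr bool buildsLTOBackend(bool UsingLTO,
                                  DeviceKind Kind = DeviceKind::None) const {
    return UsingLTO && Kind == DeviceKind::None;
  }
};

// An existing-style call (no kind passed) still gets LTO on the host side.
static_assert(DriverModel{}.buildsLTOBackend(true), "default means host");
// The CUDA device path opts out by passing its kind explicitly.
static_assert(!DriverModel{}.buildsLTOBackend(true, DeviceKind::Cuda),
              "device opts out");
```

The design choice here is that only the CUDA device-side caller needs to change; every other caller relies on the default.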

cfe/trunk/lib/Driver/Driver.cpp

[2,165 unchanged lines not shown; enclosing context: getDeviceDependences(OffloadAction::DeviceDependences &DA,]
       // Skip the phases that were already dealt with.
       if (Ph < CurPhase)
         continue;
       // We have to be consistent with the host final phase.
       if (Ph > FinalPhase)
         break;

       CudaDeviceActions[I] = C.getDriver().ConstructPhaseAction(
-          C, Args, Ph, CudaDeviceActions[I]);
+          C, Args, Ph, CudaDeviceActions[I], Action::OFK_Cuda);

       if (Ph == phases::Assemble)
         break;
     }

     // If we didn't reach the assemble phase, we can't generate the fat
     // binary. We don't need to generate the fat binary if we are not in
     // device-only mode.
[823 unchanged lines not shown; enclosing context: void Driver::BuildActions(Compilation &C, DerivedArgList &Args,]
   Args.ClaimAllArgs(options::OPT_cl_ignored_Group);

   // Claim --cuda-host-only and --cuda-compile-host-device, which may be passed
   // to non-CUDA compilations and should not trigger warnings there.
   Args.ClaimAllArgs(options::OPT_cuda_host_only);
   Args.ClaimAllArgs(options::OPT_cuda_compile_host_device);
 }

-Action *Driver::ConstructPhaseAction(Compilation &C, const ArgList &Args,
-                                     phases::ID Phase, Action *Input) const {
+Action *Driver::ConstructPhaseAction(
+    Compilation &C, const ArgList &Args, phases::ID Phase, Action *Input,
+    Action::OffloadKind TargetDeviceOffloadKind) const {
   llvm::PrettyStackTraceString CrashInfo("Constructing phase actions");

   // Some types skip the assembler phase (e.g., llvm-bc), but we can't
   // encode this in the steps because the intermediate type depends on
   // arguments. Just special case here.
   if (Phase == phases::Assemble && Input->getType() != types::TY_PP_Asm)
     return Input;

[45 unchanged lines not shown; enclosing context: if (Args.hasArg(options::OPT_emit_ast))]
       return C.MakeAction<CompileJobAction>(Input, types::TY_AST);
     if (Args.hasArg(options::OPT_module_file_info))
       return C.MakeAction<CompileJobAction>(Input, types::TY_ModuleFile);
     if (Args.hasArg(options::OPT_verify_pch))
       return C.MakeAction<VerifyPCHJobAction>(Input, types::TY_Nothing);
     return C.MakeAction<CompileJobAction>(Input, types::TY_LLVM_BC);
   }
   case phases::Backend: {
-    if (isUsingLTO()) {
+    if (isUsingLTO() && TargetDeviceOffloadKind == Action::OFK_None) {
       types::ID Output =
           Args.hasArg(options::OPT_S) ? types::TY_LTO_IR : types::TY_LTO_BC;
       return C.MakeAction<BackendJobAction>(Input, Output);
     }
     if (Args.hasArg(options::OPT_emit_llvm)) {
       types::ID Output =
           Args.hasArg(options::OPT_S) ? types::TY_LLVM_IR : types::TY_LLVM_BC;
       return C.MakeAction<BackendJobAction>(Input, Output);
[1,250 unchanged lines not shown]

cfe/trunk/lib/Driver/ToolChains/Clang.cpp

[3,243 unchanged lines not shown; enclosing context: if (isa<AnalyzeJobAction>(JA)) {]

   // Preserve use-list order by default when emitting bitcode, so that
   // loading the bitcode up in 'opt' or 'llc' and running passes gives the
   // same result as running passes here. For LTO, we don't need to preserve
   // the use-list order, since serialization to bitcode is part of the flow.
   if (JA.getType() == types::TY_LLVM_BC)
     CmdArgs.push_back("-emit-llvm-uselists");

-  if (D.isUsingLTO()) {
+  // Device-side jobs do not support LTO.
+  bool isDeviceOffloadAction = !(JA.isDeviceOffloading(Action::OFK_None) ||
+                                 JA.isDeviceOffloading(Action::OFK_Host));
+
+  if (D.isUsingLTO() && !isDeviceOffloadAction) {
     Args.AddLastArg(CmdArgs, options::OPT_flto, options::OPT_flto_EQ);

     // The Darwin and PS4 linkers currently use the legacy LTO API, which
     // does not support LTO unit features (CFI, whole program vtable opt)
     // under ThinLTO.
     if (!(RawTriple.isOSDarwin() || RawTriple.isPS4()) ||
         D.getLTOMode() == LTOK_Full)
       CmdArgs.push_back("-flto-unit");
[2,357 unchanged lines not shown]
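
The `isDeviceOffloadAction` check above treats a job as device-side when its offload kind is neither "none" nor "host". A minimal standalone sketch of that predicate follows. `OffloadTarget`, `JobModel`, and `passesLTOFlags` are hypothetical names, and the one-line `isDeviceOffloading` here is an assumed simplification of the real `JobAction` method, not its actual implementation.

```cpp
// Illustrative stand-in for Action::OffloadKind; not the clang enum.
enum class OffloadTarget { None, Host, Cuda };

struct JobModel {
  OffloadTarget DeviceKind; // which device, if any, this job is built for

  // Assumed simplification: a job "is offloading to K" when its kind equals K.
  constexpr bool isDeviceOffloading(OffloadTarget K) const {
    return DeviceKind == K;
  }
};

// Same shape as the expression added to Clang.cpp: anything that is neither
// a plain job (None) nor a host job counts as device-side.
constexpr bool isDeviceOffloadAction(JobModel JA) {
  return !(JA.isDeviceOffloading(OffloadTarget::None) ||
           JA.isDeviceOffloading(OffloadTarget::Host));
}

// LTO flags are forwarded only for non-device jobs.
constexpr bool passesLTOFlags(bool UsingLTO, JobModel JA) {
  return UsingLTO && !isDeviceOffloadAction(JA);
}

static_assert(passesLTOFlags(true, JobModel{OffloadTarget::Host}),
              "host job keeps -flto");
static_assert(!passesLTOFlags(true, JobModel{OffloadTarget::Cuda}),
              "device job drops -flto");
```

Writing the condition as "not none and not host" means any device kind is excluded, not only CUDA.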

cfe/trunk/test/Driver/lto.cu

+// -flto causes a switch to llvm-bc object files.
+// RUN: %clangxx -nocudainc -nocudalib -ccc-print-phases -c %s -flto 2> %t
+// RUN: FileCheck -check-prefix=CHECK-COMPILE-ACTIONS < %t %s
+//
+// CHECK-COMPILE-ACTIONS: 2: compiler, {1}, ir, (host-cuda)
+// CHECK-COMPILE-ACTIONS-NOT: lto-bc
+// CHECK-COMPILE-ACTIONS: 12: backend, {11}, lto-bc, (host-cuda)
+
+// RUN: %clangxx -nocudainc -nocudalib -ccc-print-phases %s -flto 2> %t
+// RUN: FileCheck -check-prefix=CHECK-COMPILELINK-ACTIONS < %t %s
+//
+// CHECK-COMPILELINK-ACTIONS: 0: input, "{{.*}}lto.cu", cuda, (host-cuda)
+// CHECK-COMPILELINK-ACTIONS: 1: preprocessor, {0}, cuda-cpp-output
+// CHECK-COMPILELINK-ACTIONS: 2: compiler, {1}, ir, (host-cuda)
+// CHECK-COMPILELINK-ACTIONS: 3: input, "{{.*}}lto.cu", cuda, (device-cuda, sm_20)
+// CHECK-COMPILELINK-ACTIONS: 4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_20)
+// CHECK-COMPILELINK-ACTIONS: 5: compiler, {4}, ir, (device-cuda, sm_20)
+// CHECK-COMPILELINK-ACTIONS: 6: backend, {5}, assembler, (device-cuda, sm_20)
+// CHECK-COMPILELINK-ACTIONS: 7: assembler, {6}, object, (device-cuda, sm_20)
+// CHECK-COMPILELINK-ACTIONS: 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_20)" {7}, object
+// CHECK-COMPILELINK-ACTIONS: 9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_20)" {6}, assembler
+// CHECK-COMPILELINK-ACTIONS: 10: linker, {8, 9}, cuda-fatbin, (device-cuda)
+// CHECK-COMPILELINK-ACTIONS: 11: offload, "host-cuda {{.*}}" {2}, "device-cuda{{.*}}" {10}, ir
+// CHECK-COMPILELINK-ACTIONS: 12: backend, {11}, lto-bc, (host-cuda)
+// CHECK-COMPILELINK-ACTIONS: 13: linker, {12}, image, (host-cuda)
+
+// llvm-bc and llvm-ll outputs need to match regular suffixes
+// (unfortunately).
+// RUN: %clangxx %s -nocudainc -nocudalib -flto -save-temps -### 2> %t
+// RUN: FileCheck -check-prefix=CHECK-COMPILELINK-SUFFIXES < %t %s
+//
+// CHECK-COMPILELINK-SUFFIXES: "-o" "[[CPP:.*lto-host.*\.cui]]" "-x" "cuda" "{{.*}}lto.cu"
+// CHECK-COMPILELINK-SUFFIXES: "-o" "[[BC:.*lto-host.*\.bc]]" {{.*}}[[CPP]]"
+// CHECK-COMPILELINK-SUFFIXES: "-o" "[[OBJ:.*lto-host.*\.o]]" {{.*}}[[BC]]"
+// CHECK-COMPILELINK-SUFFIXES: "{{.*}}a.{{(out|exe)}}" {{.*}}[[OBJ]]"
+
+// RUN: %clangxx %s -nocudainc -nocudalib -flto -S -### 2> %t
+// RUN: FileCheck -check-prefix=CHECK-COMPILE-SUFFIXES < %t %s
+//
+// CHECK-COMPILE-SUFFIXES: "-o" "{{.*}}lto.s" "-x" "cuda" "{{.*}}lto.cu"
+
+// RUN: not %clangxx -nocudainc -nocudalib %s -emit-llvm 2>&1 \
+// RUN: | FileCheck --check-prefix=LLVM-LINK %s
+// LLVM-LINK: -emit-llvm cannot be used when linking
+
+// -flto should cause link using gold plugin
+// RUN: %clangxx -nocudainc -nocudalib \
+// RUN: -target x86_64-unknown-linux -### %s -flto 2> %t
+// RUN: FileCheck -check-prefix=CHECK-LINK-LTO-ACTION < %t %s
+//
+// CHECK-LINK-LTO-ACTION: "-plugin" "{{.*}}{{[/\\]}}LLVMgold.{{dll|dylib|so}}"
+
+// -flto=full should cause link using gold plugin
+// RUN: %clangxx -nocudainc -nocudalib \
+// RUN: -target x86_64-unknown-linux -### %s -flto=full 2> %t
+// RUN: FileCheck -check-prefix=CHECK-LINK-FULL-ACTION < %t %s
+//
+// CHECK-LINK-FULL-ACTION: "-plugin" "{{.*}}{{[/\\]}}LLVMgold.{{dll|dylib|so}}"
+
+// Check that subsequent -fno-lto takes precedence
+// RUN: %clangxx -nocudainc -nocudalib \
+// RUN: -target x86_64-unknown-linux -### %s -flto=full -fno-lto 2> %t
+// RUN: FileCheck -check-prefix=CHECK-LINK-NOLTO-ACTION < %t %s
+//
+// CHECK-LINK-NOLTO-ACTION-NOT: "-plugin" "{{.*}}{{[/\\]}}LLVMgold.{{dll|dylib|so}}"
+
+// -flto passes along an explicit debugger tuning argument.
+// RUN: %clangxx -nocudainc -nocudalib \
+// RUN: -target x86_64-unknown-linux -### %s -flto -glldb 2> %t
+// RUN: FileCheck -check-prefix=CHECK-TUNING-LLDB < %t %s
+// RUN: %clangxx -nocudainc -nocudalib \
+// RUN: -target x86_64-unknown-linux -### %s -flto -g 2> %t
+// RUN: FileCheck -check-prefix=CHECK-NO-TUNING < %t %s
+//
+// CHECK-TUNING-LLDB: "-plugin-opt=-debugger-tune=lldb"
+// CHECK-NO-TUNING-NOT: "-plugin-opt=-debugger-tune

cfe/trunk/test/Driver/thinlto.cu

+// -flto=thin causes a switch to llvm-bc object files.
+// RUN: %clangxx -ccc-print-phases -nocudainc -nocudalib -c %s -flto=thin 2> %t
+// RUN: FileCheck -check-prefix=CHECK-COMPILE-ACTIONS < %t %s
+//
+// CHECK-COMPILE-ACTIONS: 2: compiler, {1}, ir, (host-cuda)
+// CHECK-COMPILE-ACTIONS-NOT: lto-bc
+// CHECK-COMPILE-ACTIONS: 12: backend, {11}, lto-bc, (host-cuda)
+
+// RUN: %clangxx -ccc-print-phases -nocudainc -nocudalib %s -flto=thin 2> %t
+// RUN: FileCheck -check-prefix=CHECK-COMPILELINK-ACTIONS < %t %s
+//
+// CHECK-COMPILELINK-ACTIONS: 0: input, "{{.*}}thinlto.cu", cuda, (host-cuda)
+// CHECK-COMPILELINK-ACTIONS: 1: preprocessor, {0}, cuda-cpp-output
+// CHECK-COMPILELINK-ACTIONS: 2: compiler, {1}, ir, (host-cuda)
+// CHECK-COMPILELINK-ACTIONS: 3: input, "{{.*}}thinlto.cu", cuda, (device-cuda, sm_20)
+// CHECK-COMPILELINK-ACTIONS: 4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_20)
+// CHECK-COMPILELINK-ACTIONS: 5: compiler, {4}, ir, (device-cuda, sm_20)
+// CHECK-COMPILELINK-ACTIONS: 6: backend, {5}, assembler, (device-cuda, sm_20)
+// CHECK-COMPILELINK-ACTIONS: 7: assembler, {6}, object, (device-cuda, sm_20)
+// CHECK-COMPILELINK-ACTIONS: 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_20)" {7}, object
+// CHECK-COMPILELINK-ACTIONS: 9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_20)" {6}, assembler
+// CHECK-COMPILELINK-ACTIONS: 10: linker, {8, 9}, cuda-fatbin, (device-cuda)
+// CHECK-COMPILELINK-ACTIONS: 11: offload, "host-cuda {{.*}}" {2}, "device-cuda{{.*}}" {10}, ir
+// CHECK-COMPILELINK-ACTIONS: 12: backend, {11}, lto-bc, (host-cuda)
+// CHECK-COMPILELINK-ACTIONS: 13: linker, {12}, image, (host-cuda)
+
+// -flto=thin should cause link using gold plugin with thinlto option,
+// also confirm that it takes precedence over earlier -fno-lto and -flto=full.
+// RUN: %clangxx -nocudainc -nocudalib \
+// RUN: -target x86_64-unknown-linux -### %s -flto=full -fno-lto -flto=thin 2> %t
+// RUN: FileCheck -check-prefix=CHECK-LINK-THIN-ACTION < %t %s
+//
+// CHECK-LINK-THIN-ACTION: "-plugin" "{{.*}}{{[/\\]}}LLVMgold.{{dll|dylib|so}}"
+// CHECK-LINK-THIN-ACTION: "-plugin-opt=thinlto"
+
+// Check that subsequent -flto=full takes precedence
+// RUN: %clangxx -nocudainc -nocudalib \
+// RUN: -target x86_64-unknown-linux -### %s -flto=thin -flto=full 2> %t
+// RUN: FileCheck -check-prefix=CHECK-LINK-FULL-ACTION < %t %s
+//
+// CHECK-LINK-FULL-ACTION: "-plugin" "{{.*}}{{[/\\]}}LLVMgold.{{dll|dylib|so}}"
+// CHECK-LINK-FULL-ACTION-NOT: "-plugin-opt=thinlto"
+
+// Check that subsequent -fno-lto takes precedence
+// RUN: %clangxx -nocudainc -nocudalib \
+// RUN: -target x86_64-unknown-linux -### %s -flto=thin -fno-lto 2> %t
+// RUN: FileCheck -check-prefix=CHECK-LINK-NOLTO-ACTION < %t %s
+//
+// CHECK-LINK-NOLTO-ACTION-NOT: "-plugin" "{{.*}}{{[/\\]}}LLVMgold.{{dll|dylib|so}}"
+// CHECK-LINK-NOLTO-ACTION-NOT: "-plugin-opt=thinlto"
