This is an archive of the discontinued LLVM Phabricator instance.

Differential D24826

[LTO] Add -flto-jobs=N to control backend parallelism
Closed, Public

Authored by tejohnson on Sep 22 2016, 6:57 AM.

Details

Reviewers

mehdi_amini

Commits

rG12286d22b796: [LTO] Add -flto-jobs=N to control backend parallelism
rC282291: [LTO] Add -flto-jobs=N to control backend parallelism
rL282291: [LTO] Add -flto-jobs=N to control backend parallelism

Summary

Currently, a linker option must be used to control the backend
parallelism of ThinLTO or full LTO (where it only applies to
parallel code gen). The linker option varies depending on the
linker (e.g. gold vs ld64). Add a new clang option -flto-jobs=N
to control this.

I've added in the wiring to pass this to the gold plugin. I also
added in the logic to pass this down in the form I understand that
ld64 uses on MacOS, for the darwin target.
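
To make the mapping described above concrete, here is a minimal standalone sketch (not the patch itself) of how one job count turns into the two linker-specific spellings. `LinkerKind` and `ltoJobsLinkerArgs` are hypothetical names used only for illustration; the argument strings are the ones checked by the test added in this revision.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the two linker paths discussed in this review.
enum class LinkerKind { GoldPlugin, Ld64 };

// Build the extra linker arguments for a given -flto-jobs=N value.
// A value of 0 is treated as "not specified": nothing is added and the
// linker keeps its own default.
std::vector<std::string> ltoJobsLinkerArgs(LinkerKind Kind, unsigned Jobs) {
  std::vector<std::string> Args;
  if (Jobs == 0)
    return Args;

  const std::string N = std::to_string(Jobs);
  switch (Kind) {
  case LinkerKind::GoldPlugin:
    // gold takes the value as a single plugin option.
    Args.push_back("-plugin-opt=jobs=" + N);
    break;
  case LinkerKind::Ld64:
    // ld64 forwards "-mllvm <opt>" pairs to libLTO.
    Args.push_back("-mllvm");
    Args.push_back("-threads=" + N);
    break;
  }
  return Args;
}
```

With `Jobs == 5`, the gold path yields one argument and the ld64 path yields two, which mirrors the two CHECK lines in test/Driver/lto-jobs.c.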

Event Timeline

tejohnson updated this revision to Diff 72164. Sep 22 2016, 6:57 AM

tejohnson retitled this revision from to [LTO] Add -flto-jobs=N to control backend parallelism.

tejohnson updated this object.

tejohnson added a reviewer: mehdi_amini.

tejohnson added a subscriber: cfe-commits.

Herald added a subscriber: mehdi_amini. Sep 22 2016, 6:57 AM

The Gold path looks fine.
On OSX, we would have the clang driver relying on an LLVM cl::opt, for which I don't think there is any precedent. CC Duncan for advice.

Also, I don't think the same option should be used for parallel LTO codegen: it does not generate the same binary, so it deserves a dedicated opt-in. (What if I mix ThinLTO and full LTO, and I don't want parallel codegen?)

In D24826#549788, @mehdi_amini wrote:

> The Gold path looks fine.
> On OSX, we would have the clang driver relying on an LLVM cl::opt, for which I don't think there is any precedent. CC Duncan for advice.

I do see other uses of -mllvm in lib/Driver/Tools.cpp, but are you talking about something else?

> Also, I don't think the same option should be used for parallel LTO codegen: it does not generate the same binary, so it deserves a dedicated opt-in. (What if I mix ThinLTO and full LTO, and I don't want parallel codegen?)

Ok, good point. I can change this to -fthinlto_jobs. However, while the two parallel settings are separate in the LTO API, the gold-plugin jobs option currently controls both, so I will need to do a preparatory gold-plugin patch to split this into thinlto_jobs and lto_jobs. On the libLTO/ld64 path, it looks like the current -mllvm -threads only affects ThinLTO, so there is no work to do there.

tejohnson mentioned this in D24873: [gold] Split plugin options controlling ThinLTO and codegen parallelism. Sep 23 2016, 9:37 AM

Update option description as per decision to split from parallel code gen.

> I do see other uses of -mllvm in lib/Driver/Tools.cpp, but are you talking about something else?

I think this is okay, since clang is talking to the same version of libLTO.dylib. I feel like there might be another case where clang talks to libLTO.dylib through ld64 using -mllvm... perhaps, -O0?

Let's ask around though to be sure.

Ok, let me know what you find out.

> Ok, good point. I can change this to -fthinlto_jobs. However, while the two parallel settings are separate in the LTO API, the gold-plugin jobs option currently controls both, so I will need to do a preparatory gold-plugin patch to split this into thinlto_jobs and lto_jobs. On the libLTO/ld64 path, it looks like the current -mllvm -threads only affects ThinLTO, so there is no work to do there.

I actually like -flto-jobs=N better for this. I expect "jobs" not to affect output at all.

I think the current parallel FullLTO CodeGen (where it *does* affect output) should have a special name that calls this out, perhaps -flto-partitions=N? -flto-slices=N? -flto-random-partitions=N? Is it urgent to add that flag now though?

Note that I imagine someone will parallelize FullLTO the hard way in the future, which won't affect output. That implementation should use -flto-jobs=N.

Ok, sure that seems reasonable. I changed the option documentation to note that this is currently just for ThinLTO. See also D24873 where I split the gold-plugin options.

tejohnson added a subscriber: pcc. Sep 23 2016, 9:42 AM

mehdi_amini added inline comments. Sep 23 2016, 10:31 AM

include/clang/Driver/Options.td
Line 818: `std::thread::hardware_concurrency` is a bit implementation-specific; can we find a better formulation? Something like `(default of 0 means the number of threads will be derived from the number of CPUs detected)`?
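
For readers unfamiliar with the convention being discussed, the suggested wording ("0 means derive the thread count from the detected CPUs") can be sketched roughly as follows. This is an illustration only: `resolveLTOJobs` is a hypothetical name, and the real resolution happens inside the LTO backend rather than in this diff. The fallback to 1 is an assumption added here because `std::thread::hardware_concurrency()` is allowed to return 0 when the value cannot be determined.

```cpp
#include <thread>

// Hypothetical helper: turn the user-facing job count into an actual
// number of worker threads.
unsigned resolveLTOJobs(unsigned Requested) {
  // An explicit, non-zero request is used as-is.
  if (Requested != 0)
    return Requested;

  // 0 means "pick for me": ask the runtime how many hardware threads exist.
  unsigned Detected = std::thread::hardware_concurrency();

  // The standard permits a result of 0 here, so fall back to a single
  // thread in that case (an assumption of this sketch).
  return Detected != 0 ? Detected : 1;
}
```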

Update option help message per Mehdi's suggestion

tejohnson added a parent revision: D24873: [gold] Split plugin options controlling ThinLTO and codegen parallelism. Sep 23 2016, 11:55 AM

LGTM, Thanks!

This revision is now accepted and ready to land. Sep 23 2016, 12:50 PM

tejohnson mentioned this in rL282290: [gold] Split plugin options controlling ThinLTO and codegen parallelism. Sep 23 2016, 1:44 PM

Closed by commit rL282291: [LTO] Add -flto-jobs=N to control backend parallelism (authored by tejohnson). Sep 23 2016, 1:47 PM

This revision was automatically updated to reflect the committed changes.

This is breaking our android lldb build, because it uses std::to_string. Looks like there is llvm::to_string, which should be preferred. Would someone mind changing it? I don't have commit access or I would myself. :)

Thanks in advance
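
For context, the usual workaround when a toolchain's standard library lacks `std::to_string` is to format the value through a stream instead. The sketch below shows that idea with only the standard library; `toStringPortable` is a hypothetical name, and this is not the code of `llvm::to_string` itself, only the same general approach as I understand it.

```cpp
#include <sstream>
#include <string>

// Hypothetical portable replacement for std::to_string: stream the value
// into a string buffer and return the result.
template <typename T> std::string toStringPortable(const T &Value) {
  std::ostringstream OS;
  OS << Value;
  return OS.str();
}
```

In the driver code this would replace the `std::to_string(Parallelism)` calls, for example `"-threads=" + toStringPortable(Parallelism)`.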

Revision Contents

Path                               Size
include/clang/Driver/Options.td    5 lines
lib/Driver/Tools.cpp               30 lines
test/Driver/lto-jobs.c             11 lines

Diff 72164

include/clang/Driver/Options.td

[... 806 unchanged lines not shown ...]
 def flax_vector_conversions : Flag<["-"], "flax-vector-conversions">, Group<f_Group>;
 def flimited_precision_EQ : Joined<["-"], "flimited-precision=">, Group<f_Group>;
 def flto_EQ : Joined<["-"], "flto=">, Flags<[CC1Option]>, Group<f_Group>,
   HelpText<"Set LTO mode to either 'full' or 'thin'">;
 def flto : Flag<["-"], "flto">, Flags<[CC1Option]>, Group<f_Group>,
   HelpText<"Enable LTO in 'full' mode">;
 def fno_lto : Flag<["-"], "fno-lto">, Group<f_Group>,
   HelpText<"Disable LTO mode (default)">;
+def flto_jobs_EQ : Joined<["-"], "flto-jobs=">,
+  Flags<[CC1Option]>, Group<f_Group>,
+  HelpText<"Controls the backend parallelism of -flto=thin (default "
+           "std::thread::hardware_concurrency) or the code generation "
+           "parallelism of -flto=full (default 1)">;
 def fthinlto_index_EQ : Joined<["-"], "fthinlto-index=">,
   Flags<[CC1Option]>, Group<f_Group>,
   HelpText<"Perform ThinLTO importing using provided function summary index">;
 def fmacro_backtrace_limit_EQ : Joined<["-"], "fmacro-backtrace-limit=">,
   Group<f_Group>, Flags<[DriverOption, CoreOption]>;
 def fmerge_all_constants : Flag<["-"], "fmerge-all-constants">, Group<f_Group>;
 def fmessage_length_EQ : Joined<["-"], "fmessage-length=">, Group<f_Group>;
 def fms_extensions : Flag<["-"], "fms-extensions">, Group<f_Group>, Flags<[CC1Option, CoreOption]>,
[... 1,480 unchanged lines not shown ...]

lib/Driver/Tools.cpp

[... 1,992 unchanged lines not shown; enclosing context: case llvm::Triple::amdgcn:]
     return getR600TargetGPU(Args);

   case llvm::Triple::wasm32:
   case llvm::Triple::wasm64:
     return getWebAssemblyTargetCPU(Args);
   }
 }

+static unsigned getLTOParallelism(const ArgList &Args, const Driver &D) {
+  unsigned Parallelism = 0;
+  Arg *LtoJobsArg = Args.getLastArg(options::OPT_flto_jobs_EQ);
+  if (LtoJobsArg &&
+      StringRef(LtoJobsArg->getValue()).getAsInteger(10, Parallelism))
+    D.Diag(diag::err_drv_invalid_int_value) << LtoJobsArg->getAsString(Args)
+                                            << LtoJobsArg->getValue();
+  return Parallelism;
+}
+
 static void AddGoldPlugin(const ToolChain &ToolChain, const ArgList &Args,
-                          ArgStringList &CmdArgs, bool IsThinLTO) {
+                          ArgStringList &CmdArgs, bool IsThinLTO,
+                          const Driver &D) {
   // Tell the linker to load the plugin. This has to come before AddLinkerInputs
   // as gold requires -plugin to come before any -plugin-opt that -Wl might
   // forward.
   CmdArgs.push_back("-plugin");
   std::string Plugin =
       ToolChain.getDriver().Dir + "/../lib" CLANG_LIBDIR_SUFFIX "/LLVMgold.so";
   CmdArgs.push_back(Args.MakeArgString(Plugin));

[... 16 unchanged lines not shown; enclosing context: else if (A->getOption().matches(options::OPT_O0))]
       OOpt = "0";
     if (!OOpt.empty())
       CmdArgs.push_back(Args.MakeArgString(Twine("-plugin-opt=O") + OOpt));
   }

   if (IsThinLTO)
     CmdArgs.push_back("-plugin-opt=thinlto");

+  if (unsigned Parallelism = getLTOParallelism(Args, D))
+    CmdArgs.push_back(Args.MakeArgString(Twine("-plugin-opt=jobs=") +
+                                         std::to_string(Parallelism)));
+
   // If an explicit debugger tuning argument appeared, pass it along.
   if (Arg *A = Args.getLastArg(options::OPT_gTune_Group,
                                options::OPT_ggdbN_Group)) {
     if (A->getOption().matches(options::OPT_glldb))
       CmdArgs.push_back("-plugin-opt=-debugger-tune=lldb");
     else if (A->getOption().matches(options::OPT_gsce))
       CmdArgs.push_back("-plugin-opt=-debugger-tune=sce");
     else
[... 5,591 unchanged lines not shown; enclosing context: void cloudabi::Linker::ConstructJob(Compilation &C, const JobAction &JA,]

   Args.AddAllArgs(CmdArgs, options::OPT_L);
   ToolChain.AddFilePathLibArgs(Args, CmdArgs);
   Args.AddAllArgs(CmdArgs,
                   {options::OPT_T_Group, options::OPT_e, options::OPT_s,
                    options::OPT_t, options::OPT_Z_Flag, options::OPT_r});

   if (D.isUsingLTO())
-    AddGoldPlugin(ToolChain, Args, CmdArgs, D.getLTOMode() == LTOK_Thin);
+    AddGoldPlugin(ToolChain, Args, CmdArgs, D.getLTOMode() == LTOK_Thin, D);

   AddLinkerInputs(ToolChain, Inputs, Args, CmdArgs);

   if (!Args.hasArg(options::OPT_nostdlib, options::OPT_nodefaultlibs)) {
     if (D.CCCIsCXX())
       ToolChain.AddCXXStdlibLibArgs(Args, CmdArgs);
     CmdArgs.push_back("-lc");
     CmdArgs.push_back("-lcompiler_rt");
[... 404 unchanged lines not shown; enclosing context: if (LinkingOutput) {]
     CmdArgs.push_back(LinkingOutput);
   }

   if (Args.hasArg(options::OPT_fnested_functions))
     CmdArgs.push_back("-allow_stack_execute");

   getMachOToolChain().addProfileRTLibs(Args, CmdArgs);

+  if (unsigned Parallelism =
+          getLTOParallelism(Args, getToolChain().getDriver())) {
+    CmdArgs.push_back("-mllvm");
+    CmdArgs.push_back(
+        Args.MakeArgString(Twine("-threads=") + std::to_string(Parallelism)));
+  }
+
   if (!Args.hasArg(options::OPT_nostdlib, options::OPT_nodefaultlibs)) {
     if (getToolChain().getDriver().CCCIsCXX())
       getToolChain().AddCXXStdlibLibArgs(Args, CmdArgs);

     // link_ssp spec is empty.

     // Let the tool chain choose which runtime library to link.
     getMachOToolChain().AddLinkRuntimeLibArgs(Args, CmdArgs);
[... 714 unchanged lines not shown; enclosing context: void freebsd::Linker::ConstructJob(Compilation &C, const JobAction &JA,]
   Args.AddAllArgs(CmdArgs, options::OPT_T_Group);
   Args.AddAllArgs(CmdArgs, options::OPT_e);
   Args.AddAllArgs(CmdArgs, options::OPT_s);
   Args.AddAllArgs(CmdArgs, options::OPT_t);
   Args.AddAllArgs(CmdArgs, options::OPT_Z_Flag);
   Args.AddAllArgs(CmdArgs, options::OPT_r);

   if (D.isUsingLTO())
-    AddGoldPlugin(ToolChain, Args, CmdArgs, D.getLTOMode() == LTOK_Thin);
+    AddGoldPlugin(ToolChain, Args, CmdArgs, D.getLTOMode() == LTOK_Thin, D);

   bool NeedsSanitizerDeps = addSanitizerRuntimes(ToolChain, Args, CmdArgs);
   AddLinkerInputs(ToolChain, Inputs, Args, CmdArgs);

   if (!Args.hasArg(options::OPT_nostdlib, options::OPT_nodefaultlibs)) {
     addOpenMPRuntime(CmdArgs, ToolChain, Args);
     if (D.CCCIsCXX()) {
       ToolChain.AddCXXStdlibLibArgs(Args, CmdArgs);
[... 816 unchanged lines not shown; enclosing context: void gnutools::Linker::ConstructJob(Compilation &C, const JobAction &JA,]
   }

   Args.AddAllArgs(CmdArgs, options::OPT_L);
   Args.AddAllArgs(CmdArgs, options::OPT_u);

   ToolChain.AddFilePathLibArgs(Args, CmdArgs);

   if (D.isUsingLTO())
-    AddGoldPlugin(ToolChain, Args, CmdArgs, D.getLTOMode() == LTOK_Thin);
+    AddGoldPlugin(ToolChain, Args, CmdArgs, D.getLTOMode() == LTOK_Thin, D);

   if (Args.hasArg(options::OPT_Z_Xlinker__no_demangle))
     CmdArgs.push_back("--no-demangle");

   bool NeedsSanitizerDeps = addSanitizerRuntimes(ToolChain, Args, CmdArgs);
   bool NeedsXRayDeps = addXRayRuntime(ToolChain, Args, CmdArgs);
   AddLinkerInputs(ToolChain, Inputs, Args, CmdArgs);
   // The profile runtime also needs access to system libraries.
[... 1,872 unchanged lines not shown ...]
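
The `getLTOParallelism` helper in this diff relies on `StringRef::getAsInteger`, which returns true on failure; that is why a true result leads to the diagnostic. As a rough standalone analogue of the same validation, assuming C++17 and using only the standard library, the parsing step could look like the sketch below. `parseJobs` is a hypothetical name, and returning `std::nullopt` stands in for emitting `err_drv_invalid_int_value`.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parse the N in -flto-jobs=N as a base-10 unsigned
// integer. Returns std::nullopt when the text is not a valid number.
std::optional<unsigned> parseJobs(std::string_view Text) {
  unsigned Value = 0;
  const char *Begin = Text.data();
  const char *End = Begin + Text.size();

  // from_chars reports an error for empty input, non-digits, a leading
  // sign (for unsigned targets), and out-of-range values.
  auto [Ptr, Ec] = std::from_chars(Begin, End, Value, 10);

  // Also reject trailing characters such as the "x" in "5x".
  if (Ec != std::errc() || Ptr != End)
    return std::nullopt;
  return Value;
}
```

Like the driver code, this accepts `0` as a valid value; the callers in the diff then treat 0 as "do not pass anything to the linker".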

test/Driver/lto-jobs.c

This file was added.

+// Confirm that -flto-jobs=N is passed to linker
+
+// RUN: %clang -target x86_64-unknown-linux -### %s -flto=thin -flto-jobs=5 2> %t
+// RUN: FileCheck -check-prefix=CHECK-LINK-THIN-JOBS-ACTION < %t %s
+//
+// CHECK-LINK-THIN-JOBS-ACTION: "-plugin-opt=jobs=5"
+
+// RUN: %clang -target x86_64-apple-darwin13.3.0 -### %s -flto=thin -flto-jobs=5 2> %t
+// RUN: FileCheck -check-prefix=CHECK-LINK-THIN-JOBS2-ACTION < %t %s
+//
+// CHECK-LINK-THIN-JOBS2-ACTION: "-mllvm" "-threads=5"
