This is an archive of the discontinued LLVM Phabricator instance.

Differential D25452

[LTO] Split the options for ThinLTO jobs and Regular LTO partitions
Closed, Public

Authored by davide on Oct 10 2016, 1:51 PM.

Details

Reviewers

ruiu
pcc

Commits

rGb6e6e4a07443: [LTO] Split the options for ThinLTO jobs and Regular LTO partitions
rLLD283817: [LTO] Split the options for ThinLTO jobs and Regular LTO partitions
rL283817: [LTO] Split the options for ThinLTO jobs and Regular LTO partitions

Event Timeline

davide updated this revision to Diff 74172. Oct 10 2016, 1:51 PM

davide retitled this revision to [LTO] Split the options for ThinLTO jobs and Regular LTO partitions.

davide updated this object.

davide added reviewers: ruiu, pcc.

davide added a subscriber: llvm-commits.

Herald added a subscriber: mehdi_amini. Oct 10 2016, 1:51 PM

ruiu added inline comments. Oct 10 2016, 1:56 PM

ELF/Driver.cpp
486	It is better to include the option name here. "--lto-partitions: number of thread must be >0"
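
A minimal sketch of what ruiu is asking for, assuming a hypothetical helper named `checkNonZero` (it is not part of lld): prefixing the diagnostic with the flag name tells the user which numeric option was rejected once more than one such option exists. The message text reuses the existing "number of threads must be > 0" string.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not lld's API. Returns an empty string when Value is
// acceptable, otherwise a diagnostic that starts with the option name.
std::string checkNonZero(const std::string &OptName, unsigned Value) {
  assert(!OptName.empty() && "the diagnostic needs an option name");
  if (Value == 0)
    return OptName + ": number of threads must be > 0";
  return std::string();
}
```

Under these assumptions, `checkNonZero("--lto-partitions", 0)` yields a message beginning with `--lto-partitions:`, while any non-zero value yields an empty string.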

davide updated this revision to Diff 74177. Oct 10 2016, 2:53 PM

pcc added inline comments. Oct 10 2016, 3:01 PM

ELF/Driver.cpp
489	I think this would result in a default parallelism level of 1. Can we avoid passing a ThinBackend if the user did not specify a parallelism level?

davide updated this revision to Diff 74182. Oct 10 2016, 3:39 PM

LGTM

ELF/Driver.cpp
489	Maybe `-1u` here and below?
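
The comments on line 489 fit together: an all-ones sentinel (`-1u`) can stand for "flag not given", and the client builds a ThinBackend only when the flag was given. The following is a rough sketch of that decision, not the committed code. `Unspecified`, `BackendRequest`, and `planBackend` are hypothetical names, and real code would call `lto::createInProcessThinBackend` rather than return a struct.

```cpp
#include <cassert>

// Sentinel for "the user did not pass --thinlto-jobs". The expression -1u
// wraps around to the largest value representable in unsigned.
constexpr unsigned Unspecified = -1u;

// Hypothetical stand-in for what the linker hands to the LTO library.
struct BackendRequest {
  bool ClientSuppliesBackend; // false: let the LTO library create its default
  unsigned Jobs;              // meaningful only when ClientSuppliesBackend
};

BackendRequest planBackend(unsigned ThinLtoJobs) {
  // Assumption of this sketch: a value of 0 was rejected during option parsing.
  assert(ThinLtoJobs != 0 && "zero jobs should have been diagnosed earlier");
  if (ThinLtoJobs == Unspecified)
    return {false, 0};
  return {true, ThinLtoJobs};
}
```

With this shape no default parallelism level lives in the client: leaving the flag out simply means no backend is passed.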

This revision is now accepted and ready to land. Oct 10 2016, 4:13 PM

mehdi_amini added inline comments. Oct 10 2016, 4:27 PM

ELF/Driver.cpp
489	Doesn't it mean that if the user does not specify `--thinlto-jobs` on the command line, ThinLTO won't work? I think it would even lead to the LTO library crashing if the bitcode has been generated with ThinLTO? Is there a test that shows what happens in this case? (I don't see any, but I didn't look closely) The default should be something like `std::hardware_concurrency`.

pcc added inline comments. Oct 10 2016, 4:39 PM

ELF/Driver.cpp
489	No, if the ThinBackend is not supplied, LTO will create one for you. See: http://llvm-cs.pcc.me.uk/lib/LTO/LTO.cpp#229
	I think we will eventually want to replace the `thread::hardware_concurrency()` default with something else that will lead to better utilisation of the hardware. In Chromium at least we've observed that `hardware_concurrency` leads to poor utilisation on machines with a large number of cores (https://bugs.chromium.org/p/chromium/issues/detail?id=645295#c20). So I'd prefer to keep any default out of the clients.
	You're right, we should have a test that does not pass a `--thinlto-jobs` flag. @davide can you please add one?

test/ELF/lto/thinlto.ll
2	I think you need to use `opt -module-summary` here.
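
To illustrate pcc's point about keeping the default out of the clients, here is a minimal sketch of a library-side fallback. The name `resolveJobs` and the convention that 0 means "no preference" are assumptions made for this example, not LLVM's actual interface. Note that `std::thread::hardware_concurrency()` may return 0 when the value cannot be determined, so the sketch falls back to one thread.

```cpp
#include <cassert>
#include <thread>

// Hypothetical library-side helper: the client passes 0 for "no preference",
// and the library alone decides what the default is.
unsigned resolveJobs(unsigned Requested) {
  if (Requested != 0)
    return Requested;
  // hardware_concurrency() is only a hint and may report 0.
  unsigned HW = std::thread::hardware_concurrency();
  unsigned Result = HW != 0 ? HW : 1;
  assert(Result >= 1);
  return Result;
}
```

If the library later switches to a different heuristic, only this one function changes and every client picks up the new default without modification.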

mehdi_amini added inline comments. Oct 10 2016, 5:00 PM

ELF/Driver.cpp
489	> No, if the ThinBackend is not supplied, LTO will create one for you. See: http://llvm-cs.pcc.me.uk/lib/LTO/LTO.cpp#229
	OK! Great :)
	> I think we will eventually want to replace the thread::hardware_concurrency() default with something else that will lead to better utilisation of the hardware. In Chromium at least we've observed that hardware_concurrency leads to poor utilisation on machines with a large number of cores (https://bugs.chromium.org/p/chromium/issues/detail?id=645295#c20). So I'd prefer to keep any default out of the clients.
	Yes, Apple's clang is using the number of physical cores because of that, but my rationale was to limit the memory consumption at a time when debug info was even more costly than now for ThinLTO. So indeed we plan to add support for this in LLVM.
	On the other hand, David and Teresa benchmarked building Chromium on a 16-core Xeon machine (32 hyper-threaded cores) and they reported 2m18s for the ThinLTO plugin with 32 threads vs 3m38s with 32 threads. So your link is puzzling...

Closed by commit rL283817: [LTO] Split the options for ThinLTO jobs and Regular LTO partitions (authored by davide). Oct 11 2016, 3:04 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
ELF/Config.h	3 lines
ELF/Driver.cpp	5 lines
ELF/LTO.cpp	7 lines
ELF/Options.td	4 lines
test/ELF/basic.s	2 lines
test/ELF/lto/parallel-internalize.ll	3 lines
test/ELF/lto/parallel.ll	2 lines
test/ELF/lto/thinlto.ll	4 lines

Diff 74172

ELF/Config.h

@@ 132 lines hidden @@ struct Configuration {
   BuildIdKind BuildId = BuildIdKind::None;
   ELFKind EKind = ELFNoneKind;
   uint16_t DefaultSymbolVersion = llvm::ELF::VER_NDX_GLOBAL;
   uint16_t EMachine = llvm::ELF::EM_NONE;
   uint64_t EntryAddr = 0;
   uint64_t ImageBase;
   uint64_t MaxPageSize;
   uint64_t ZStackSize;
-  unsigned LtoJobs;
+  unsigned LtoPartitions;
   unsigned LtoO;
   unsigned Optimize;
+  unsigned ThinLtoJobs;
 };

 // The only instance of Configuration struct.
 extern Configuration *Config;

 } // namespace elf
 } // namespace lld

 #endif

ELF/Driver.cpp

@@ 475 lines hidden @@ void LinkerDriver::readConfigs(opt::InputArgList &Args) {
   Config->OutputFile = getString(Args, OPT_o);
   Config->SoName = getString(Args, OPT_soname);
   Config->Sysroot = getString(Args, OPT_sysroot);

   Config->Optimize = getInteger(Args, OPT_O, 1);
   Config->LtoO = getInteger(Args, OPT_lto_O, 2);
   if (Config->LtoO > 3)
     error("invalid optimization level for LTO: " + getString(Args, OPT_lto_O));
-  Config->LtoJobs = getInteger(Args, OPT_lto_jobs, 1);
-  if (Config->LtoJobs == 0)
+  Config->LtoPartitions = getInteger(Args, OPT_lto_partitions, 1);
+  if (Config->LtoPartitions == 0)
     error("number of threads must be > 0");
+  Config->ThinLtoJobs = getInteger(Args, OPT_thinlto_jobs, 1);

   Config->ZCombreloc = !hasZOption(Args, "nocombreloc");
   Config->ZExecStack = hasZOption(Args, "execstack");
   Config->ZNodelete = hasZOption(Args, "nodelete");
   Config->ZNow = hasZOption(Args, "now");
   Config->ZOrigin = hasZOption(Args, "origin");
   Config->ZRelro = !hasZOption(Args, "norelro");

   if (!Config->Relocatable)
     Config->Strip = getStripOption(Args);
@@ 233 lines hidden @@
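
The option handling in this hunk can be approximated outside lld as follows. This is a simplified sketch under stated assumptions: `getIntegerArg`, `Settings`, and `readSettings` are hypothetical stand-ins, arguments are plain strings of the form `--name=N`, and the last occurrence of a flag is assumed to win. It shows the point of the patch, namely that the two settings are parsed from separate flags and that a missing flag falls back to the default of 1 used in this first diff.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for getInteger(): returns the value of the last
// "--<Name>=<N>" argument, or Default when the flag is absent.
// std::stoul throws if the text after '=' is not a number.
unsigned getIntegerArg(const std::vector<std::string> &Args,
                       const std::string &Name, unsigned Default) {
  assert(!Name.empty());
  const std::string Prefix = "--" + Name + "=";
  unsigned Result = Default;
  for (const std::string &A : Args)
    if (A.compare(0, Prefix.size(), Prefix) == 0)
      Result = static_cast<unsigned>(std::stoul(A.substr(Prefix.size())));
  return Result;
}

// Only the two fields this revision touches.
struct Settings {
  unsigned LtoPartitions; // regular LTO: number of codegen partitions
  unsigned ThinLtoJobs;   // ThinLTO: number of backend jobs
};

Settings readSettings(const std::vector<std::string> &Args) {
  Settings S;
  S.LtoPartitions = getIntegerArg(Args, "lto-partitions", 1);
  S.ThinLtoJobs = getIntegerArg(Args, "thinlto-jobs", 1);
  return S;
}
```

Passing only `--lto-partitions=2` leaves `ThinLtoJobs` at its default, which is the separation the old shared `--lto-jobs` flag could not express.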

ELF/LTO.cpp

@@ 62 lines hidden @@ static std::unique_ptr<lto::LTO> createLTO() {
   Conf.OptPipeline = Config->LtoNewPmPasses;
   Conf.AAPipeline = Config->LtoAAPipeline;

   if (Config->SaveTemps)
     checkError(Conf.addSaveTemps(std::string(Config->OutputFile) + ".",
                                  /*UseInputModulePath*/ true));

   lto::ThinBackend Backend;
-  if (Config->LtoJobs)
-    Backend = lto::createInProcessThinBackend(Config->LtoJobs);
-  return llvm::make_unique<lto::LTO>(std::move(Conf), Backend, Config->LtoJobs);
+  if (Config->ThinLtoJobs)
+    Backend = lto::createInProcessThinBackend(Config->ThinLtoJobs);
+  return llvm::make_unique<lto::LTO>(std::move(Conf), Backend,
+                                     Config->LtoPartitions);
 }

 BitcodeCompiler::BitcodeCompiler() : LtoObj(createLTO()) {}

 BitcodeCompiler::~BitcodeCompiler() {}

 static void undefine(Symbol *S) {
   replaceBody<Undefined>(S, S->body()->getName(), STV_DEFAULT, S->body()->Type,
@@ 62 lines hidden @@

ELF/Options.td

@@ 297 lines hidden @@
 // Aliases for ignored options
 def alias_define_common_d: Flag<["-"], "d">, Alias<define_common>;
 def alias_define_common_dc: F<"dc">, Alias<define_common>;
 def alias_define_common_dp: F<"dp">, Alias<define_common>;
 def alias_version_script_version_script: J<"version-script=">,
   Alias<version_script>;

 // LTO-related options.
-def lto_jobs: J<"lto-jobs=">, HelpText<"Number of threads to run codegen">;
 def lto_aa_pipeline: J<"lto-aa-pipeline=">,
   HelpText<"AA pipeline to run during LTO. Used in conjunction with -lto-newpm-passes">;
 def lto_newpm_passes: J<"lto-newpm-passes=">,
   HelpText<"Passes to run during LTO">;
+def lto_partitions: J<"lto-partitions=">,
+  HelpText<"Number of LTO codegen partitions">;
 def disable_verify: F<"disable-verify">;
 def mllvm: S<"mllvm">;
 def save_temps: F<"save-temps">;
+def thinlto_jobs: J<"thinlto-jobs=">, HelpText<"Number of ThinLTO jobs">;

test/ELF/basic.s

@@ 220 lines hidden @@

 # RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %s -o %t
 # RUN: not ld.lld %t %t -o %t2 2>&1 | FileCheck --check-prefix=DUP %s
 # DUP: duplicate symbol: _start in {{.*}} and {{.*}}

 # RUN: not ld.lld %t -o %t -m wrong_emul_fbsd 2>&1 | FileCheck --check-prefix=UNKNOWN_EMUL %s
 # UNKNOWN_EMUL: unknown emulation: wrong_emul_fbsd

-# RUN: not ld.lld %t --lto-jobs=0 2>&1 | FileCheck --check-prefix=NOTHREADS %s
+# RUN: not ld.lld %t --lto-partitions=0 2>&1 | FileCheck --check-prefix=NOTHREADS %s
 # NOTHREADS: number of threads must be > 0

test/ELF/lto/parallel-internalize.ll

 ; REQUIRES: x86
 ; RUN: llvm-as -o %t.bc %s
-; RUN: ld.lld -m elf_x86_64 --lto-jobs=2 -save-temps -o %t %t.bc -e foo --lto-O0
+; RUN: ld.lld -m elf_x86_64 --lto-partitions=2 -save-temps -o %t %t.bc \
+; RUN: -e foo --lto-O0
 ; RUN: llvm-readobj -t -dyn-symbols %t | FileCheck %s
 ; RUN: llvm-nm %t0.lto.o | FileCheck --check-prefix=CHECK0 %s
 ; RUN: llvm-nm %t1.lto.o | FileCheck --check-prefix=CHECK1 %s

 ; CHECK: Symbols [
 ; CHECK-NEXT: Symbol {
 ; CHECK-NEXT: Name: (0)
 ; CHECK-NEXT: Value: 0x0
@@ 46 lines hidden @@

test/ELF/lto/parallel.ll

 ; REQUIRES: x86
 ; RUN: llvm-as -o %t.bc %s
-; RUN: ld.lld -m elf_x86_64 --lto-jobs=2 -save-temps -o %t %t.bc -shared
+; RUN: ld.lld -m elf_x86_64 --lto-partitions=2 -save-temps -o %t %t.bc -shared
 ; RUN: llvm-nm %t0.lto.o | FileCheck --check-prefix=CHECK0 %s
 ; RUN: llvm-nm %t1.lto.o | FileCheck --check-prefix=CHECK1 %s

 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-unknown-linux-gnu"

 ; CHECK0-NOT: bar
 ; CHECK0: T foo
@@ 13 lines hidden @@

test/ELF/lto/thinlto.ll

 ; Basic ThinLTO tests.
 ; RUN: llvm-as %s -o %t.o
 ; RUN: llvm-as %p/Inputs/thinlto.ll -o %t2.o

 ; First force single-threaded mode
-; RUN: ld.lld -save-temps --lto-jobs=1 -shared %t.o %t2.o -o %t
+; RUN: ld.lld -save-temps --thinlto-jobs=1 -shared %t.o %t2.o -o %t
 ; RUN: llvm-nm %t.lto.o | FileCheck %s --check-prefix=NM

 ; NM: T f
 ; NM: T g

 ; Next force multi-threaded mode
-; RUN: ld.lld -save-temps --lto-jobs=2 -shared %t.o %t2.o -o %t2
+; RUN: ld.lld -save-temps --thinlto-jobs=2 -shared %t.o %t2.o -o %t2
 ; RUN: llvm-nm %t20.lto.o | FileCheck %s --check-prefix=NM1
 ; RUN: llvm-nm %t21.lto.o | FileCheck %s --check-prefix=NM2

 ; NM1: T g
 ; NM2: T f

 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-unknown-linux-gnu"

 declare void @g(...)

 define void @f() {
 entry:
   call void (...) @g()
   ret void
 }