This is an archive of the discontinued LLVM Phabricator instance.

Differential D75153

[ThinLTO] Allow usage of all SMT threads in the system
ClosedPublic

Authored by aganea on Feb 25 2020, 8:20 PM.

Details

Reviewers

tejohnson
thakis
rnk
mehdi_amini
dexonsmith
ruiu
• espindola
MaskRay

Commits

rG09158252f777: [ThinLTO] Allow usage of all hardware threads in the system

Summary

Before this patch, it wasn't possible to extend the ThinLTO threads to all SMT/CMT threads in the system. Only one thread per core was allowed, instructed by usage of llvm::heavyweight_hardware_concurrency() in the ThinLTO code. Any number passed to the LLD flag /opt:lldltojobs=..., or any other ThinLTO-specific flag, was previously interpreted in the context of llvm::heavyweight_hardware_concurrency(), which means SMT disabled.

After this patch, one can say in LLD:
/opt:lldltojobs=0 -- Use one std::thread / hardware core in the system (no SMT). Default value if flag not specified.
/opt:lldltojobs=N -- Limit usage to N threads, regardless of usage of heavyweight_hardware_concurrency().
/opt:lldltojobs=all -- Use all hardware threads in the system. Equivalent to /opt:lldltojobs=$(nproc) on Linux and /opt:lldltojobs=%NUMBER_OF_PROCESSORS% on Windows.

When N > number-of-hardware-threads-in-the-system, the std::threads will be dispatched equally on all CPU sockets (tested only on Windows).
When N <= number-of-hardware-threads-on-a-CPU-socket, the std::threads will remain on the CPU socket where the process started (only on Windows).
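The flag semantics above can be sketched as a small standalone function. This is only an illustration, not the code from the patch: resolveJobs is a hypothetical helper, and the physical-core and hardware-thread counts are passed in as parameters rather than queried from the system.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of LLVM): maps a jobs string to a thread
// count following the rules in the summary. Returns false on invalid input.
bool resolveJobs(const std::string &Jobs, unsigned PhysicalCores,
                 unsigned HardwareThreads, unsigned &Out) {
  assert(PhysicalCores > 0 && HardwareThreads >= PhysicalCores);
  if (Jobs == "all") { // every hardware thread, SMT included
    Out = HardwareThreads;
    return true;
  }
  if (Jobs.size() > 9) // keeps the decimal parse below from overflowing
    return false;
  unsigned N = 0;
  for (char C : Jobs) {
    if (C < '0' || C > '9')
      return false;
    N = N * 10 + static_cast<unsigned>(C - '0');
  }
  // An empty string (flag not given) and "0" both mean one thread per core.
  // Any other N is honored as-is, even above the hardware thread count.
  Out = (N == 0) ? PhysicalCores : N;
  return true;
}
```

On a hypothetical machine with 8 cores and 16 hardware threads, "0" resolves to 8, "all" to 16, and "64" stays 64.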

All cmd-line flags and code paths that lead to ThinLTO have been modified by this patch:
-flto-jobs=...
--thinlto-jobs=...
-thinlto-threads=...
--plugin-opt=jobs=...

This is a follow-up for: https://reviews.llvm.org/D71775#1891709 and: https://reviews.llvm.org/D71775#1892013

Diff Detail

Event Timeline

aganea created this revision. Feb 25 2020, 8:20 PM

Herald added a reviewer: • espindola. Feb 25 2020, 8:20 PM

Herald added projects: Restricted Project, Restricted Project.

Herald added subscribers: cfe-commits, dang, dexonsmith and 9 others.

aganea edited the summary of this revision. Feb 25 2020, 8:22 PM

ormris added a subscriber: ormris. Feb 26 2020, 1:38 PM

aganea edited the summary of this revision. Feb 27 2020, 6:24 AM

If a simpler (more yucky?) patch is needed to fix what @thakis was suggesting in https://reviews.llvm.org/D71775#1891709, and if we don't want this extra new flag, we can also check the CPU brand for "AMD Opteron", and keep the old behavior in that case.

Based upon the description, I think this patch is more applicable than just targeting a specific AMD proc family since it allows the end-user a choice for maximizing threading with both CMT and SMT on all supported platforms.

BTW, until if/when this patch lands, I just set a static value in source as a local workaround for now.

llvm/lib/Support/Host.cpp

#elif defined(_WIN32)
// Defined in llvm/lib/Support/Windows/Threading.inc
static int computeHostNumPhysicalCores() { return 32; }
#else

aganea added a reviewer: mehdi_amini. Mar 1 2020, 9:38 AM

/opt:lldltojobs=N -- limit usage to N threads, but constrained by usage of heavyweight_hardware_concurrency().

I really dislike this behavior: this seems user hostile to me. I would either:

honor the user request (possibly displaying a warning), which is in line with the behavior of other tools, such as ninja -j N.
reject the user request

If you want such a behavior, then it should be a separate flag whose name expresses it, like opt:lldltomaxnumjobs.

@mehdi_amini Agreed. In that case, I just need to slightly change this function to ensure threads are equally dispatched on all CPU sockets regardless of the thread strategy.
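As a rough illustration of what "equally dispatched" could mean, the sketch below assigns threads to sockets round-robin. It is an assumption made for clarity, not the logic of the patch; threadsPerSocket is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns how many of NumThreads land on each socket
// when thread I is assigned to socket I % NumSockets.
std::vector<unsigned> threadsPerSocket(unsigned NumThreads,
                                       unsigned NumSockets) {
  assert(NumSockets > 0);
  std::vector<unsigned> Counts(NumSockets, 0);
  for (unsigned I = 0; I < NumThreads; ++I)
    ++Counts[I % NumSockets];
  return Counts;
}
```

With this scheme the per-socket counts never differ by more than one, whatever thread count the user requests.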

Simplify. Revert to previous behavior, where any number of threads can be specified on the cmd-line, as suggested by @mehdi_amini. See the updated Summary for cmd-line usage.

+ @dexonsmith for the Darwin part.
+ @ruiu for the LLD part.

Herald added a reviewer: • espindola. Mar 4 2020, 11:24 AM

I'm busy and I haven't looked at the code in detail, but I'm OK with going back to the old way of doing things. I think honoring user requests to use more threads than cores is an important use case. We had plans at one point to add indirection points to allow us to distribute these backend actions to other machines. Essentially, this would be done by having LLD invoke $mydiscc clang -c foo.o -fuse-thin-lto-index=foo.thinlto.bc. We've gone a different direction, so that change is not likely to come soon, but it seems like a reasonable use case where one would want to pass -j LARGE and have it be honored.

This revision is now accepted and ready to land. Mar 4 2020, 1:09 PM

Thanks Reid! I will leave this open for a few days, in case there is any other feedback.

As for -fthinlto-index, we have someone looking at distributing it remotely with Fastbuild. I think it is reasonable on the short-term to let the build system handle that (I assume that's what you did). One adjacent question is how to address the size of the local cache folder (ie. lto.cache when compiling LLVM) which grows big very quickly. Pruning periodically is fine, but I wonder if we could keep it compressed on disk. Maybe do DeviceIoControl (...FSCTL_SET_COMPRESSION..) and let Windows handle that? I'll take a look in the weeks to come.

I think making the build system responsible for running the backend actions is the ideal solution, actually. The main reason we have all this threading logic in the linker is to make it easy for users of traditional build systems to use ThinLTO with a few flag flips. We haven't actually made build system changes for Chrome yet. Instead, we replaced the linker with a script that wraps the thin link + backend jobs + native link steps. https://crbug.com/877722 has a partial record of ideas we had and stuff we tried for Chrome.

Does taskset -c 0-3 lld -flavor ... restrict the number of cores?

cpu_set_t cpu;
sched_getaffinity(0, sizeof(cpu), &cpu);
CPU_COUNT(&cpu)

abrachet added a subscriber: abrachet. Mar 4 2020, 6:17 PM

abrachet added inline comments.

llvm/include/llvm/Support/Threading.h
201	Nit: Remove `inline` https://llvm.org/docs/CodingStandards.html#don-t-use-inline-when-defining-a-function-in-a-class-definition

aganea mentioned this in D18487: [ThinLTO] Add optional import message and statistics. Mar 24 2020, 1:25 PM

In D75153#1906580, @MaskRay wrote:
Does taskset -c 0-3 lld -flavor ... restrict the number of cores?
cpu_set_t cpu;
sched_getaffinity(0, sizeof(cpu), &cpu);
CPU_COUNT(&cpu)

Thanks for raising this! This does not seem to work (I currently only have WSL at hand, no real Linux machine). I don't think it worked before my patch. The current code in LLVM is written as follows (note the "if" statement):

#if defined(HAVE_SCHED_GETAFFINITY) && defined(HAVE_CPU_COUNT)
  cpu_set_t Set;
  if (sched_getaffinity(0, sizeof(Set), &Set))
    return CPU_COUNT(&Set);
#endif

The doc for sched_getaffinity says:

On success, sched_setaffinity() and sched_getaffinity() return 0. On error, -1 is returned, and errno is set appropriately.

So it would always fall back to std::thread::hardware_concurrency, which apparently does not always take affinity into account, according to @respindola (please see rG8c0ff9508da5f02e8ce6580a126a2018c9bf702a).

I'll write a follow-up patch to test affinity on Linux and Windows.
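For reference, a minimal sketch of the check with the return value tested against 0, as the man page describes. availableCpuCount is a hypothetical name and this is not the follow-up patch itself; on non-Linux hosts the sketch simply falls back to std::thread::hardware_concurrency.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // glibc exposes CPU_COUNT only with this defined
#endif
#include <cassert>
#include <thread>
#if defined(__linux__)
#include <sched.h>
#endif

// Hypothetical helper: number of CPUs the current process may run on.
int availableCpuCount() {
#if defined(__linux__)
  cpu_set_t Set;
  // sched_getaffinity returns 0 on success, so compare against 0 explicitly.
  if (sched_getaffinity(0, sizeof(Set), &Set) == 0) {
    int Count = CPU_COUNT(&Set);
    assert(Count > 0 && "a successful call reports at least one CPU");
    return Count;
  }
#endif
  // Fallback; hardware_concurrency may return 0 when the value is unknown.
  unsigned N = std::thread::hardware_concurrency();
  return N == 0 ? 1 : static_cast<int>(N);
}
```

Under taskset -c 0-3 on a real Linux machine, this version would be expected to report at most 4.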

llvm/include/llvm/Support/Threading.h
201	After discussing offline with @abrachet, I'll leave the `inline` for now. It makes the symbol weak; removing `inline` would otherwise fail linking. I can move the function(s) to the .cpp file after this patch to save on link time.

Closed by commit rG09158252f777: [ThinLTO] Allow usage of all hardware threads in the system (authored by aganea). Mar 27 2020, 7:37 AM

This revision was automatically updated to reflect the committed changes.

aganea marked an inline comment as done.

MaskRay added inline comments. Mar 27 2020, 12:01 PM

lld/test/ELF/basic.s
252–266	This change is not needed. lto/thinlto.ll has already tested the functionality. basic.s should also be split. I did this in 34bdddf9a13cfdbbb5506dc89cf8e781be53105f
253	-verbose is not needed because verbose just prints input filenames, which has nothing to do with --thinlto-jobs=0.

This patch seems to break when disabling threading (aka LLVM_ENABLE_THREADS == 0) as get_threadpool_strategy is undefined therefore creating linker failures in clang. (Tested on Windows)

In D75153#1947956, @zero9178 wrote:

This patch seems to break when disabling threading (aka LLVM_ENABLE_THREADS == 0) as get_threadpool_strategy is undefined therefore creating linker failures in clang. (Tested on Windows)

Taking a look now.

In D75153#1947956, @zero9178 wrote:

This patch seems to break when disabling threading (aka LLVM_ENABLE_THREADS == 0) as get_threadpool_strategy is undefined therefore creating linker failures in clang. (Tested on Windows)

Should be fixed after rG3ab3f3c5d5825476dc1be15992f7c964629de688.
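One common way to avoid this class of link failure is to define the function in both build configurations, so callers always have a symbol to link against. The sketch below only illustrates that pattern and makes no claim about how the referenced commit fixed it; SKETCH_ENABLE_THREADS and computeThreadCount are hypothetical names standing in for LLVM_ENABLE_THREADS and the real API.

```cpp
#include <cassert>
#include <thread>

#ifndef SKETCH_ENABLE_THREADS
#define SKETCH_ENABLE_THREADS 1 // stand-in for LLVM_ENABLE_THREADS
#endif

#if SKETCH_ENABLE_THREADS
// Threaded build: honor the request, or use the hardware count for 0.
unsigned computeThreadCount(unsigned Requested) {
  unsigned HW = std::thread::hardware_concurrency();
  if (HW == 0)
    HW = 1; // the value may be unknown
  unsigned Count = Requested ? Requested : HW;
  assert(Count > 0);
  return Count;
}
#else
// Single-threaded build: the symbol still exists, it just answers 1.
unsigned computeThreadCount(unsigned) { return 1; }
#endif
```

Either way, code that calls computeThreadCount compiles and links unchanged.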

We've started seeing llvm-cov fail on our Linux bots with this error:

terminating with uncaught exception of type std::__2::system_error: thread constructor failed: Resource temporarily unavailable

Specifically, we're running llvm-cov export, which uses heavyweight_hardware_concurrency (we use the default number of threads, i.e. 0): https://github.com/llvm/llvm-project/blob/master/llvm/tools/llvm-cov/CoverageExporterJson.cpp#L169

I'm not yet sure what's the problem, but bisecting is pointing at this change.

Herald added a reviewer: MaskRay. Apr 16 2020, 11:50 AM

In D75153#1987272, @phosek wrote:
We've started seeing llvm-cov on our Linux bots with this error:
terminating with uncaught exception of type std::__2::system_error: thread constructor failed: Resource temporarily unavailable
Specifically, we're running llvm export which uses heavyweight_hardware_concurrency (we use the default number of threads, i.e. 0): https://github.com/llvm/llvm-project/blob/master/llvm/tools/llvm-cov/CoverageExporterJson.cpp#L169

I'm not yet sure what's the problem, but bisecting is pointing at this change.

Also on runs that succeeded, we see the execution times more than doubled.

In D75153#1987320, @phosek wrote:
In D75153#1987272, @phosek wrote:
We've started seeing llvm-cov on our Linux bots with this error:
terminating with uncaught exception of type std::__2::system_error: thread constructor failed: Resource temporarily unavailable
Specifically, we're running llvm export which uses heavyweight_hardware_concurrency (we use the default number of threads, i.e. 0): https://github.com/llvm/llvm-project/blob/master/llvm/tools/llvm-cov/CoverageExporterJson.cpp#L169

I'm not yet sure what's the problem, but bisecting is pointing at this change.
Also on runs that succeeded, we see the execution times more than doubled.

This is caused by a mix of the previous change (rG8404aeb5, see https://github.com/llvm/llvm-project/commit/8404aeb56a73ab24f9b295111de3b37a37f0b841#diff-9c7f49c15e22d38241ccb9d294f880f4L950) which was restraining the ThreadPool to the number of hardware threads, and this patch which makes the number of threads in the ThreadPool limitless.

I will revert the behavior of llvm-cov to previous state, and will check if there are other cases like this.
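The difference between the two behaviors can be sketched as follows. This is a simplified illustration with hypothetical names (poolThreadCount and its Limit parameter), not the ThreadPool code itself.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: how many threads a pool would start.
// Limit == true models the earlier behavior (clamp to the hardware);
// Limit == false models the behavior after this patch (honor any request).
unsigned poolThreadCount(unsigned Requested, unsigned HardwareThreads,
                         bool Limit) {
  assert(HardwareThreads > 0);
  if (Requested == 0) // 0 means "use the default"
    return HardwareThreads;
  return Limit ? std::min(Requested, HardwareThreads) : Requested;
}
```

A caller that derives Requested from its workload size (for example, one thread per input file) is harmless with clamping but can exhaust system resources without it, which matches the "thread constructor failed" symptom reported above.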

In D75153#1942336, @aganea wrote:
In D75153#1906580, @MaskRay wrote:
Does taskset -c 0-3 lld -flavor ... restrict the number of cores?
cpu_set_t cpu;
sched_getaffinity(0, sizeof(cpu), &cpu);
CPU_COUNT(&cpu)
Thanks for raising this! This does not seem to work (I currently only have WSL at hand, no real Linux machine). I don't think it worked before my patch. The current code in LLVM is written such as: (note the "if" statement)
#if defined(HAVE_SCHED_GETAFFINITY) && defined(HAVE_CPU_COUNT)
  cpu_set_t Set;
  if (sched_getaffinity(0, sizeof(Set), &Set))
    return CPU_COUNT(&Set);
#endif
The doc for sched_getaffinity says:

On success, sched_setaffinity() and sched_getaffinity() return 0. On error, -1 is returned, and errno is set appropriately.

So it would always fall back to std::thread::hardware_concurrency, which apparently does not always take affinity into account, according to @respindola (please see rG8c0ff9508da5f02e8ce6580a126a2018c9bf702a).

I'll write a follow-up patch to test affinity on Linux and Windows.

Created D78324 to fix this pre-existing issue. I remember I saw a related bugs.llvm.org report yesterday but I can't find it now...

In D75153#1987538, @MaskRay wrote:

I remember I saw a related bugs.llvm.org report yesterday but I can't find it now...

This? https://bugs.llvm.org/show_bug.cgi?id=45556

https://clang.llvm.org/docs/ThinLTO.html#controlling-backend-parallelism should probably be updated to mention "all".

aganea mentioned this in D89309: [ThinLTO] In documentation, mention possible values for concurrency flags. Oct 13 2020, 5:21 AM

Revision Contents

Path    Size

clang/lib/Driver/ToolChains/CommonArgs.h    3 lines
clang/lib/Driver/ToolChains/CommonArgs.cpp    17 lines
clang/lib/Driver/ToolChains/Darwin.cpp    9 lines
lld/COFF/Config.h    2 lines
lld/COFF/Driver.cpp    4 lines
lld/COFF/LTO.cpp    5 lines
lld/ELF/Config.h    2 lines
lld/ELF/Driver.cpp    6 lines
lld/ELF/LTO.cpp    5 lines
lld/test/COFF/thinlto.ll    9 lines
lld/test/ELF/basic.s    18 lines
lld/test/ELF/lto/thinlto.ll    21 lines
lld/test/wasm/lto/thinlto.ll    22 lines
lld/wasm/Config.h    2 lines
lld/wasm/Driver.cpp    6 lines
lld/wasm/LTO.cpp    6 lines
llvm/include/llvm/LTO/LTO.h    2 lines
llvm/include/llvm/Support/Threading.h    24 lines
llvm/lib/LTO/LTO.cpp    16 lines
llvm/lib/Support/Threading.cpp    28 lines
llvm/lib/Support/Windows/Threading.inc    56 lines
llvm/test/Transforms/PGOProfile/thinlto_samplepgo_icp3.ll    10 lines
llvm/tools/gold/gold-plugin.cpp    18 lines
llvm/tools/llvm-lto2/llvm-lto2.cpp    10 lines

Diff 248254

clang/lib/Driver/ToolChains/CommonArgs.h

@@ 82 unchanged lines hidden @@ bool addOpenMPRuntime(llvm::opt::ArgStringList &CmdArgs, const ToolChain &TC,
                       bool ForceStaticHostRuntime = false,
                       bool IsOffloadingHost = false, bool GompNeedsRT = false);

 llvm::opt::Arg *getLastProfileUseArg(const llvm::opt::ArgList &Args);
 llvm::opt::Arg *getLastProfileSampleUseArg(const llvm::opt::ArgList &Args);

 bool isObjCAutoRefCount(const llvm::opt::ArgList &Args);

-unsigned getLTOParallelism(const llvm::opt::ArgList &Args, const Driver &D);
+llvm::StringRef getLTOParallelism(const llvm::opt::ArgList &Args,
+                                  const Driver &D);

 bool areOptimizationsEnabled(const llvm::opt::ArgList &Args);

 bool isUseSeparateSections(const llvm::Triple &Triple);

 /// \p EnvVar is split by system delimiter for environment variables.
 /// If \p ArgName is "-I", "-L", or an empty string, each entry from \p EnvVar
 /// is prefixed by \p ArgName then added to \p Args. Otherwise, for each
@@ 34 unchanged lines hidden @@

clang/lib/Driver/ToolChains/CommonArgs.cpp

@@ 44 unchanged lines hidden @@
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/Host.h"
 #include "llvm/Support/Path.h"
 #include "llvm/Support/Process.h"
 #include "llvm/Support/Program.h"
 #include "llvm/Support/ScopedPrinter.h"
 #include "llvm/Support/TargetParser.h"
+#include "llvm/Support/Threading.h"
 #include "llvm/Support/VirtualFileSystem.h"
 #include "llvm/Support/YAMLParser.h"

 using namespace clang::driver;
 using namespace clang::driver::tools;
 using namespace clang;
 using namespace llvm::opt;

@@ 272 unchanged lines hidden @@ case llvm::Triple::amdgcn:
     return getR600TargetGPU(Args);

   case llvm::Triple::wasm32:
   case llvm::Triple::wasm64:
     return std::string(getWebAssemblyTargetCPU(Args));
   }
 }

-unsigned tools::getLTOParallelism(const ArgList &Args, const Driver &D) {
+llvm::StringRef tools::getLTOParallelism(const ArgList &Args, const Driver &D) {
   unsigned Parallelism = 0;
   Arg *LtoJobsArg = Args.getLastArg(options::OPT_flto_jobs_EQ);
-  if (LtoJobsArg &&
-      StringRef(LtoJobsArg->getValue()).getAsInteger(10, Parallelism))
-    D.Diag(diag::err_drv_invalid_int_value) << LtoJobsArg->getAsString(Args)
-                                            << LtoJobsArg->getValue();
-  return Parallelism;
+  if (!LtoJobsArg)
+    return {};
+  if (!llvm::get_threadpool_strategy(LtoJobsArg->getValue()))
+    D.Diag(diag::err_drv_invalid_int_value)
+        << LtoJobsArg->getAsString(Args) << LtoJobsArg->getValue();
+  return LtoJobsArg->getValue();
 }

 // CloudABI uses -ffunction-sections and -fdata-sections by default.
 bool tools::isUseSeparateSections(const llvm::Triple &Triple) {
   return Triple.getOS() == llvm::Triple::CloudABI;
 }

 void tools::addLTOOptions(const ToolChain &ToolChain, const ArgList &Args,
@@ 48 unchanged lines hidden @@ if (Args.hasArg(options::OPT_gsplit_dwarf)) {
     CmdArgs.push_back(
         Args.MakeArgString(Twine("-plugin-opt=dwo_dir=") +
                            Output.getFilename() + "_dwo"));
   }

   if (IsThinLTO)
     CmdArgs.push_back("-plugin-opt=thinlto");

-  if (unsigned Parallelism = getLTOParallelism(Args, ToolChain.getDriver()))
+  StringRef Parallelism = getLTOParallelism(Args, ToolChain.getDriver());
+  if (!Parallelism.empty())
     CmdArgs.push_back(
         Args.MakeArgString("-plugin-opt=jobs=" + Twine(Parallelism)));

   // If an explicit debugger tuning argument appeared, pass it along.
   if (Arg *A = Args.getLastArg(options::OPT_gTune_Group,
                                options::OPT_ggdbN_Group)) {
     if (A->getOption().matches(options::OPT_glldb))
       CmdArgs.push_back("-plugin-opt=-debugger-tune=lldb");
@@ 996 unchanged lines hidden @@

clang/lib/Driver/ToolChains/Darwin.cpp

@@ 17 unchanged lines hidden @@
 #include "clang/Driver/Options.h"
 #include "clang/Driver/SanitizerArgs.h"
 #include "llvm/ADT/StringSwitch.h"
 #include "llvm/Option/ArgList.h"
 #include "llvm/ProfileData/InstrProf.h"
 #include "llvm/Support/Path.h"
 #include "llvm/Support/ScopedPrinter.h"
 #include "llvm/Support/TargetParser.h"
+#include "llvm/Support/Threading.h"
 #include "llvm/Support/VirtualFileSystem.h"
 #include <cstdlib> // ::getenv

 using namespace clang::driver;
 using namespace clang::driver::tools;
 using namespace clang::driver::toolchains;
 using namespace clang;
 using namespace llvm::opt;
@@ 566 unchanged lines hidden @@ if (LinkingOutput) {
     CmdArgs.push_back(LinkingOutput);
   }

   if (Args.hasArg(options::OPT_fnested_functions))
     CmdArgs.push_back("-allow_stack_execute");

   getMachOToolChain().addProfileRTLibs(Args, CmdArgs);

-  if (unsigned Parallelism =
-          getLTOParallelism(Args, getToolChain().getDriver())) {
+  StringRef Parallelism = getLTOParallelism(Args, getToolChain().getDriver());
+  if (!Parallelism.empty()) {
     CmdArgs.push_back("-mllvm");
-    CmdArgs.push_back(Args.MakeArgString("-threads=" + Twine(Parallelism)));
+    unsigned NumThreads =
+        llvm::get_threadpool_strategy(Parallelism)->compute_thread_count();
+    CmdArgs.push_back(Args.MakeArgString("-threads=" + Twine(NumThreads)));
   }

   if (getToolChain().ShouldLinkCXXStdlib(Args))
     getToolChain().AddCXXStdlibLibArgs(Args, CmdArgs);

   bool NoStdOrDefaultLibs =
       Args.hasArg(options::OPT_nostdlib, options::OPT_nodefaultlibs);
   bool ForceLinkBuiltins = Args.hasArg(options::OPT_fapple_link_rtlib);
@@ 2,074 unchanged lines hidden @@

lld/COFF/Config.h

@@ 138 unchanged lines hidden @@ struct Configuration {
   bool safeSEH = false;
   Symbol *sehTable = nullptr;
   Symbol *sehCount = nullptr;

   // Used for /opt:lldlto=N
   unsigned ltoo = 2;

   // Used for /opt:lldltojobs=N
-  unsigned thinLTOJobs = 0;
+  std::string thinLTOJobs;
   // Used for /opt:lldltopartitions=N
   unsigned ltoPartitions = 1;

   // Used for /opt:lldltocache=path
   StringRef ltoCache;
   // Used for /opt:lldltocachepolicy=policy
   llvm::CachePruningPolicy ltoCachePolicy;

@@ 85 unchanged lines hidden @@

lld/COFF/Driver.cpp

@@ 1,410 unchanged lines hidden @@ for (StringRef s : vec) {
         } else if (s == "nolldtailmerge") {
           tailMerge = 0;
         } else if (s.startswith("lldlto=")) {
           StringRef optLevel = s.substr(7);
           if (optLevel.getAsInteger(10, config->ltoo) || config->ltoo > 3)
             error("/opt:lldlto: invalid optimization level: " + optLevel);
         } else if (s.startswith("lldltojobs=")) {
           StringRef jobs = s.substr(11);
-          if (jobs.getAsInteger(10, config->thinLTOJobs) ||
-              config->thinLTOJobs == 0)
+          if (!get_threadpool_strategy(jobs))
             error("/opt:lldltojobs: invalid job count: " + jobs);
+          config->thinLTOJobs = jobs.str();
         } else if (s.startswith("lldltopartitions=")) {
           StringRef n = s.substr(17);
           if (n.getAsInteger(10, config->ltoPartitions) ||
               config->ltoPartitions == 0)
             error("/opt:lldltopartitions: invalid partition count: " + n);
         } else if (s != "lbr" && s != "nolbr")
           error("/opt: unknown option: " + s);
       }
@@ 583 unchanged lines hidden @@

lld/COFF/LTO.cpp

@@ 95 unchanged lines hidden @@ BitcodeCompiler::BitcodeCompiler() {
   // Initialize ltoObj.
   lto::ThinBackend backend;
   if (config->thinLTOIndexOnly) {
     auto OnIndexWrite = [&](StringRef S) { thinIndices.erase(S); };
     backend = lto::createWriteIndexesThinBackend(
         std::string(config->thinLTOPrefixReplace.first),
         std::string(config->thinLTOPrefixReplace.second),
         config->thinLTOEmitImportsFiles, indexFile.get(), OnIndexWrite);
-  } else if (config->thinLTOJobs != 0) {
-    backend = lto::createInProcessThinBackend(config->thinLTOJobs);
+  } else {
+    backend = lto::createInProcessThinBackend(
+        llvm::heavyweight_hardware_concurrency(config->thinLTOJobs));
   }

   ltoObj = std::make_unique<lto::LTO>(createConfig(), backend,
                                       config->ltoPartitions);
 }

 BitcodeCompiler::~BitcodeCompiler() = default;

@@ 95 unchanged lines hidden @@

lld/ELF/Config.h

@@ 238 unchanged lines hidden @@ struct Configuration {
   llvm::Optional<uint64_t> imageBase;
   uint64_t commonPageSize;
   uint64_t maxPageSize;
   uint64_t mipsGotSize;
   uint64_t zStackSize;
   unsigned ltoPartitions;
   unsigned ltoo;
   unsigned optimize;
-  unsigned thinLTOJobs;
+  StringRef thinLTOJobs;
   unsigned timeTraceGranularity;
   int32_t splitStackAdjustSize;

   // The following config options do not directly correspond to any
   // particular command line options.

   // True if we need to pass through relocations in input files to the
   // output file. Usually false because we consume relocations.
@@ 84 unchanged lines hidden @@

lld/ELF/Driver.cpp

@@ 971 unchanged lines hidden @@ static void readConfigs(opt::InputArgList &args) {
   config->thinLTOCacheDir = args.getLastArgValue(OPT_thinlto_cache_dir);
   config->thinLTOCachePolicy = CHECK(
       parseCachePruningPolicy(args.getLastArgValue(OPT_thinlto_cache_policy)),
       "--thinlto-cache-policy: invalid cache policy");
   config->thinLTOEmitImportsFiles = args.hasArg(OPT_thinlto_emit_imports_files);
   config->thinLTOIndexOnly = args.hasArg(OPT_thinlto_index_only) ||
                              args.hasArg(OPT_thinlto_index_only_eq);
   config->thinLTOIndexOnlyArg = args.getLastArgValue(OPT_thinlto_index_only_eq);
-  config->thinLTOJobs = args::getInteger(args, OPT_thinlto_jobs, -1u);
+  config->thinLTOJobs = args.getLastArgValue(OPT_thinlto_jobs);
   config->thinLTOObjectSuffixReplace =
       getOldNewOptions(args, OPT_thinlto_object_suffix_replace_eq);
   config->thinLTOPrefixReplace =
       getOldNewOptions(args, OPT_thinlto_prefix_replace_eq);
   config->timeTraceEnabled = args.hasArg(OPT_time_trace);
   config->timeTraceGranularity =
       args::getInteger(args, OPT_time_trace_granularity, 500);
   config->trace = args.hasArg(OPT_trace);
@@ 48 unchanged lines hidden @@ static void readConfigs(opt::InputArgList &args) {
   // Parse -mllvm options.
   for (auto *arg : args.filtered(OPT_mllvm))
     parseClangOption(arg->getValue(), arg->getSpelling());

   if (config->ltoo > 3)
     error("invalid optimization level for LTO: " + Twine(config->ltoo));
   if (config->ltoPartitions == 0)
     error("--lto-partitions: number of threads must be > 0");
-  if (config->thinLTOJobs == 0)
-    error("--thinlto-jobs: number of threads must be > 0");
+  if (!get_threadpool_strategy(config->thinLTOJobs))
+    error("--thinlto-jobs: invalid job count: " + config->thinLTOJobs);

   if (config->splitStackAdjustSize < 0)
     error("--split-stack-adjust-size: size must be >= 0");

   // The text segment is traditionally the first segment, whose address equals
   // the base address. However, lld places the R PT_LOAD first. -Ttext-segment
   // is an old-fashioned option that does not play well with lld's layout.
   // Suggest --image-base as a likely alternative.
@@ 1,007 unchanged lines hidden @@

lld/ELF/LTO.cpp

Show First 20 Lines • Show All 140 Lines • ▼ Show 20 Lines	BitcodeCompiler::BitcodeCompiler() {
// Initialize ltoObj.		// Initialize ltoObj.
lto::ThinBackend backend;		lto::ThinBackend backend;
if (config->thinLTOIndexOnly) {		if (config->thinLTOIndexOnly) {
auto onIndexWrite = [&](StringRef s) { thinIndices.erase(s); };		auto onIndexWrite = [&](StringRef s) { thinIndices.erase(s); };
backend = lto::createWriteIndexesThinBackend(		backend = lto::createWriteIndexesThinBackend(
std::string(config->thinLTOPrefixReplace.first),		std::string(config->thinLTOPrefixReplace.first),
std::string(config->thinLTOPrefixReplace.second),		std::string(config->thinLTOPrefixReplace.second),
config->thinLTOEmitImportsFiles, indexFile.get(), onIndexWrite);		config->thinLTOEmitImportsFiles, indexFile.get(), onIndexWrite);
} else if (config->thinLTOJobs != -1U) {		} else {
backend = lto::createInProcessThinBackend(config->thinLTOJobs);		backend = lto::createInProcessThinBackend(
		llvm::heavyweight_hardware_concurrency(config->thinLTOJobs));
}		}

ltoObj = std::make_unique<lto::LTO>(createConfig(), backend,		ltoObj = std::make_unique<lto::LTO>(createConfig(), backend,
config->ltoPartitions);		config->ltoPartitions);

// Initialize usedStartStop.		// Initialize usedStartStop.
for (Symbol *sym : symtab->symbols()) {		for (Symbol *sym : symtab->symbols()) {
StringRef s = sym->getName();		StringRef s = sym->getName();
...

lld/test/COFF/thinlto.ll

	; REQUIRES: x86			; REQUIRES: x86
	; RUN: rm -fr %T/thinlto			; RUN: rm -fr %T/thinlto
	; RUN: mkdir %T/thinlto			; RUN: mkdir %T/thinlto
	; RUN: opt -thinlto-bc -o %T/thinlto/main.obj %s			; RUN: opt -thinlto-bc -o %T/thinlto/main.obj %s
	; RUN: opt -thinlto-bc -o %T/thinlto/foo.obj %S/Inputs/lto-dep.ll			; RUN: opt -thinlto-bc -o %T/thinlto/foo.obj %S/Inputs/lto-dep.ll
	; RUN: lld-link /lldsavetemps /out:%T/thinlto/main.exe /entry:main /subsystem:console %T/thinlto/main.obj %T/thinlto/foo.obj			; RUN: lld-link /lldsavetemps /out:%T/thinlto/main.exe /entry:main /subsystem:console %T/thinlto/main.obj %T/thinlto/foo.obj
	; RUN: llvm-nm %T/thinlto/main.exe1.lto.obj | FileCheck %s			; RUN: llvm-nm %T/thinlto/main.exe1.lto.obj | FileCheck %s

				; RUN: lld-link /lldsavetemps /out:%T/thinlto/main.exe /entry:main /subsystem:console %T/thinlto/main.obj %T/thinlto/foo.obj /opt:lldltojobs=1
				; RUN: llvm-nm %T/thinlto/main.exe1.lto.obj | FileCheck %s
				; RUN: lld-link /lldsavetemps /out:%T/thinlto/main.exe /entry:main /subsystem:console %T/thinlto/main.obj %T/thinlto/foo.obj /opt:lldltojobs=all
				; RUN: llvm-nm %T/thinlto/main.exe1.lto.obj | FileCheck %s
				; RUN: lld-link /lldsavetemps /out:%T/thinlto/main.exe /entry:main /subsystem:console %T/thinlto/main.obj %T/thinlto/foo.obj /opt:lldltojobs=1000
				; RUN: llvm-nm %T/thinlto/main.exe1.lto.obj | FileCheck %s
				; RUN: not lld-link /lldsavetemps /out:%T/thinlto/main.exe /entry:main /subsystem:console %T/thinlto/main.obj %T/thinlto/foo.obj /opt:lldltojobs=foo 2>&1 | FileCheck %s --check-prefix=BAD-JOBS
				; BAD-JOBS: error: /opt:lldltojobs: invalid job count: foo

	; CHECK-NOT: U foo			; CHECK-NOT: U foo

	target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-pc-windows-msvc"			target triple = "x86_64-pc-windows-msvc"

	define i32 @main() {			define i32 @main() {
	call void @foo()			call void @foo()
	ret i32 0			ret i32 0
	}			}

	declare void @foo()			declare void @foo()

lld/test/ELF/basic.s

	...

	# RUN: not ld.lld %t -o /dev/null -m wrong_emul_fbsd 2>&1 | FileCheck --check-prefix=UNKNOWN_EMUL %s			# RUN: not ld.lld %t -o /dev/null -m wrong_emul_fbsd 2>&1 | FileCheck --check-prefix=UNKNOWN_EMUL %s
	# UNKNOWN_EMUL: unknown emulation: wrong_emul_fbsd			# UNKNOWN_EMUL: unknown emulation: wrong_emul_fbsd

	# RUN: not ld.lld %t --lto-partitions=0 2>&1 | FileCheck --check-prefix=NOTHREADS %s			# RUN: not ld.lld %t --lto-partitions=0 2>&1 | FileCheck --check-prefix=NOTHREADS %s
	# RUN: not ld.lld %t --plugin-opt=lto-partitions=0 2>&1 | FileCheck --check-prefix=NOTHREADS %s			# RUN: not ld.lld %t --plugin-opt=lto-partitions=0 2>&1 | FileCheck --check-prefix=NOTHREADS %s
	# NOTHREADS: --lto-partitions: number of threads must be > 0			# NOTHREADS: --lto-partitions: number of threads must be > 0

	# RUN: not ld.lld %t --thinlto-jobs=0 2>&1 | FileCheck --check-prefix=NOTHREADSTHIN %s			# RUN: ld.lld %t --thinlto-jobs=0 -verbose 2>&1 | FileCheck --check-prefix=THREADSTHIN %s
	# RUN: not ld.lld %t --plugin-opt=jobs=0 2>&1 | FileCheck --check-prefix=NOTHREADSTHIN %s			# RUN: ld.lld %t --thinlto-jobs=1 -verbose 2>&1 | FileCheck --check-prefix=THREADSTHIN %s
				MaskRay (Not Done): `-verbose` is not needed because it just prints input filenames, which has nothing to do with `--thinlto-jobs=0`.
	# NOTHREADSTHIN: --thinlto-jobs: number of threads must be > 0			# RUN: ld.lld %t --thinlto-jobs=2 -verbose 2>&1 | FileCheck --check-prefix=THREADSTHIN %s
				# RUN: ld.lld %t --thinlto-jobs=all -verbose 2>&1 | FileCheck --check-prefix=THREADSTHIN %s
				# RUN: ld.lld %t --thinlto-jobs=1000 -verbose 2>&1 | FileCheck --check-prefix=THREADSTHIN %s
				# THREADSTHIN: basic.s.tmp
				# RUN: not ld.lld %t --thinlto-jobs=foo -verbose 2>&1 | FileCheck --check-prefix=BADTHREADSTHIN %s
				# BADTHREADSTHIN: error: --thinlto-jobs: invalid job count: foo

				# RUN: ld.lld %t --plugin-opt=jobs=0 -verbose 2>&1 | FileCheck --check-prefix=THREADSTHIN %s
				# RUN: ld.lld %t --plugin-opt=jobs=1 -verbose 2>&1 | FileCheck --check-prefix=THREADSTHIN %s
				# RUN: ld.lld %t --plugin-opt=jobs=2 -verbose 2>&1 | FileCheck --check-prefix=THREADSTHIN %s
				# RUN: ld.lld %t --plugin-opt=jobs=all -verbose 2>&1 | FileCheck --check-prefix=THREADSTHIN %s
				# RUN: ld.lld %t --plugin-opt=jobs=1000 -verbose 2>&1 | FileCheck --check-prefix=THREADSTHIN %s
				# RUN: not ld.lld %t --plugin-opt=jobs=foo -verbose 2>&1 | FileCheck --check-prefix=BADTHREADSTHIN %s
				MaskRay (Not Done): This change is not needed. lto/thinlto.ll has already tested the functionality. basic.s should also be split. I did this in 34bdddf9a13cfdbbb5506dc89cf8e781be53105f.

	# RUN: not ld.lld %t -z ifunc-noplt -z text 2>&1 | FileCheck --check-prefix=NOIFUNCPLTNOTEXTREL %s			# RUN: not ld.lld %t -z ifunc-noplt -z text 2>&1 | FileCheck --check-prefix=NOIFUNCPLTNOTEXTREL %s
	# NOIFUNCPLTNOTEXTREL: -z text and -z ifunc-noplt may not be used together			# NOIFUNCPLTNOTEXTREL: -z text and -z ifunc-noplt may not be used together

lld/test/ELF/lto/thinlto.ll

	...
	; RUN: llvm-nm %t32.lto.o | FileCheck %s --check-prefix=NM2			; RUN: llvm-nm %t32.lto.o | FileCheck %s --check-prefix=NM2

	; Next force multi-threaded mode			; Next force multi-threaded mode
	; RUN: rm -f %t31.lto.o %t32.lto.o			; RUN: rm -f %t31.lto.o %t32.lto.o
	; RUN: ld.lld -save-temps --thinlto-jobs=2 -shared %t1.o %t2.o -o %t3			; RUN: ld.lld -save-temps --thinlto-jobs=2 -shared %t1.o %t2.o -o %t3
	; RUN: llvm-nm %t31.lto.o | FileCheck %s --check-prefix=NM1			; RUN: llvm-nm %t31.lto.o | FileCheck %s --check-prefix=NM1
	; RUN: llvm-nm %t32.lto.o | FileCheck %s --check-prefix=NM2			; RUN: llvm-nm %t32.lto.o | FileCheck %s --check-prefix=NM2

	; Then check without --thinlto-jobs (which currently default to hardware_concurrency)			; Test with all threads, on all cores, on all CPU sockets
	; RUN: ld.lld -shared %t1.o %t2.o -o %t3			; RUN: rm -f %t31.lto.o %t32.lto.o
				; RUN: ld.lld -save-temps --thinlto-jobs=all -shared %t1.o %t2.o -o %t3
				; RUN: llvm-nm %t31.lto.o | FileCheck %s --check-prefix=NM1
				; RUN: llvm-nm %t32.lto.o | FileCheck %s --check-prefix=NM2

				; Test with many more threads than the system has
				; RUN: rm -f %t31.lto.o %t32.lto.o
				; RUN: ld.lld -save-temps --thinlto-jobs=1000 -shared %t1.o %t2.o -o %t3
				; RUN: llvm-nm %t31.lto.o | FileCheck %s --check-prefix=NM1
				; RUN: llvm-nm %t32.lto.o | FileCheck %s --check-prefix=NM2

				; Test with a bad value
				; RUN: rm -f %t31.lto.o %t32.lto.o
				; RUN: not ld.lld -save-temps --thinlto-jobs=foo -shared %t1.o %t2.o -o %t3 2>&1 | FileCheck %s --check-prefix=BAD-JOBS
				; BAD-JOBS: error: --thinlto-jobs: invalid job count: foo

				; Then check without --thinlto-jobs (which currently defaults to heavyweight_hardware_concurrency, meaning one thread per hardware core -- not SMT)
				; RUN: ld.lld -shared -save-temps %t1.o %t2.o -o %t3
	; RUN: llvm-nm %t31.lto.o | FileCheck %s --check-prefix=NM1			; RUN: llvm-nm %t31.lto.o | FileCheck %s --check-prefix=NM1
	; RUN: llvm-nm %t32.lto.o | FileCheck %s --check-prefix=NM2			; RUN: llvm-nm %t32.lto.o | FileCheck %s --check-prefix=NM2

	; NM1: T f			; NM1: T f
	; NM2: T g			; NM2: T g

	target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	declare void @g(...)			declare void @g(...)

	define void @f() {			define void @f() {
	entry:			entry:
	call void (...) @g()			call void (...) @g()
	ret void			ret void
	}			}

lld/test/wasm/lto/thinlto.ll

	; Basic ThinLTO tests.			; Basic ThinLTO tests.
	; RUN: opt -module-summary %s -o %t1.o			; RUN: opt -module-summary %s -o %t1.o
	; RUN: opt -module-summary %p/Inputs/thinlto.ll -o %t2.o			; RUN: opt -module-summary %p/Inputs/thinlto.ll -o %t2.o

	; First force single-threaded mode			; First force single-threaded mode
	; RUN: rm -f %t31.lto.o %t32.lto.o			; RUN: rm -f %t31.lto.o %t32.lto.o
	; RUN: wasm-ld -r -save-temps --thinlto-jobs=1 %t1.o %t2.o -o %t3			; RUN: wasm-ld -r -save-temps --thinlto-jobs=1 %t1.o %t2.o -o %t3
	; RUN: llvm-nm %t31.lto.o | FileCheck %s --check-prefix=NM1			; RUN: llvm-nm %t31.lto.o | FileCheck %s --check-prefix=NM1
	; RUN: llvm-nm %t32.lto.o | FileCheck %s --check-prefix=NM2			; RUN: llvm-nm %t32.lto.o | FileCheck %s --check-prefix=NM2

	; Next force multi-threaded mode			; Next force multi-threaded mode
	; RUN: rm -f %t31.lto.o %t32.lto.o			; RUN: rm -f %t31.lto.o %t32.lto.o
	; RUN: wasm-ld -r -save-temps --thinlto-jobs=2 %t1.o %t2.o -o %t3			; RUN: wasm-ld -r -save-temps --thinlto-jobs=2 %t1.o %t2.o -o %t3
	; RUN: llvm-nm %t31.lto.o | FileCheck %s --check-prefix=NM1			; RUN: llvm-nm %t31.lto.o | FileCheck %s --check-prefix=NM1
	; RUN: llvm-nm %t32.lto.o | FileCheck %s --check-prefix=NM2			; RUN: llvm-nm %t32.lto.o | FileCheck %s --check-prefix=NM2

	; Check without --thinlto-jobs (which currently default to hardware_concurrency)			; Test with all threads, on all cores, on all CPU sockets
	; RUN: wasm-ld -r %t1.o %t2.o -o %t3			; RUN: rm -f %t31.lto.o %t32.lto.o
				; RUN: wasm-ld -r -save-temps --thinlto-jobs=all %t1.o %t2.o -o %t3
				; RUN: llvm-nm %t31.lto.o | FileCheck %s --check-prefix=NM1
				; RUN: llvm-nm %t32.lto.o | FileCheck %s --check-prefix=NM2

				; Test with many more threads than the system has
				; RUN: rm -f %t31.lto.o %t32.lto.o
				; RUN: wasm-ld -r -save-temps --thinlto-jobs=1000 %t1.o %t2.o -o %t3
				; RUN: llvm-nm %t31.lto.o | FileCheck %s --check-prefix=NM1
				; RUN: llvm-nm %t32.lto.o | FileCheck %s --check-prefix=NM2

				; Test with a bad value
				; RUN: rm -f %t31.lto.o %t32.lto.o
				; RUN: not wasm-ld -r -save-temps --thinlto-jobs=foo %t1.o %t2.o -o %t3 2>&1 | FileCheck %s --check-prefix=BAD-JOBS
				; BAD-JOBS: error: --thinlto-jobs: invalid job count: foo

				; Check without --thinlto-jobs (which currently defaults to heavyweight_hardware_concurrency, meaning one thread per hardware core -- not SMT)
				; RUN: rm -f %t31.lto.o %t32.lto.o
				; RUN: wasm-ld -r -save-temps %t1.o %t2.o -o %t3
	; RUN: llvm-nm %t31.lto.o | FileCheck %s --check-prefix=NM1			; RUN: llvm-nm %t31.lto.o | FileCheck %s --check-prefix=NM1
	; RUN: llvm-nm %t32.lto.o | FileCheck %s --check-prefix=NM2			; RUN: llvm-nm %t32.lto.o | FileCheck %s --check-prefix=NM2

	; NM1: T f			; NM1: T f
	; NM2: T g			; NM2: T g

	target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"			target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
	target triple = "wasm32-unknown-unknown"			target triple = "wasm32-unknown-unknown"

	declare void @g(...)			declare void @g(...)

	define void @f() {			define void @f() {
	entry:			entry:
	call void (...) @g()			call void (...) @g()
	ret void			ret void
	}			}

lld/wasm/Config.h

...	struct Configuration {
bool trace;		bool trace;
uint32_t globalBase;		uint32_t globalBase;
uint32_t initialMemory;		uint32_t initialMemory;
uint32_t maxMemory;		uint32_t maxMemory;
uint32_t zStackSize;		uint32_t zStackSize;
unsigned ltoPartitions;		unsigned ltoPartitions;
unsigned ltoo;		unsigned ltoo;
unsigned optimize;		unsigned optimize;
unsigned thinLTOJobs;		llvm::StringRef thinLTOJobs;

llvm::StringRef entry;		llvm::StringRef entry;
llvm::StringRef outputFile;		llvm::StringRef outputFile;
llvm::StringRef thinLTOCacheDir;		llvm::StringRef thinLTOCacheDir;

llvm::StringSet<> allowUndefinedSymbols;		llvm::StringSet<> allowUndefinedSymbols;
llvm::StringSet<> exportedSymbols;		llvm::StringSet<> exportedSymbols;
std::vector<llvm::StringRef> searchPaths;		std::vector<llvm::StringRef> searchPaths;
...

lld/wasm/Driver.cpp

...	static void readConfigs(opt::InputArgList &args) {
config->stripAll = args.hasArg(OPT_strip_all);		config->stripAll = args.hasArg(OPT_strip_all);
config->stripDebug = args.hasArg(OPT_strip_debug);		config->stripDebug = args.hasArg(OPT_strip_debug);
config->stackFirst = args.hasArg(OPT_stack_first);		config->stackFirst = args.hasArg(OPT_stack_first);
config->trace = args.hasArg(OPT_trace);		config->trace = args.hasArg(OPT_trace);
config->thinLTOCacheDir = args.getLastArgValue(OPT_thinlto_cache_dir);		config->thinLTOCacheDir = args.getLastArgValue(OPT_thinlto_cache_dir);
config->thinLTOCachePolicy = CHECK(		config->thinLTOCachePolicy = CHECK(
parseCachePruningPolicy(args.getLastArgValue(OPT_thinlto_cache_policy)),		parseCachePruningPolicy(args.getLastArgValue(OPT_thinlto_cache_policy)),
"--thinlto-cache-policy: invalid cache policy");		"--thinlto-cache-policy: invalid cache policy");
config->thinLTOJobs = args::getInteger(args, OPT_thinlto_jobs, -1u);		config->thinLTOJobs = args.getLastArgValue(OPT_thinlto_jobs);
errorHandler().verbose = args.hasArg(OPT_verbose);		errorHandler().verbose = args.hasArg(OPT_verbose);
LLVM_DEBUG(errorHandler().verbose = true);		LLVM_DEBUG(errorHandler().verbose = true);
threadsEnabled = args.hasFlag(OPT_threads, OPT_no_threads, true);		threadsEnabled = args.hasFlag(OPT_threads, OPT_no_threads, true);

config->initialMemory = args::getInteger(args, OPT_initial_memory, 0);		config->initialMemory = args::getInteger(args, OPT_initial_memory, 0);
config->globalBase = args::getInteger(args, OPT_global_base, 1024);		config->globalBase = args::getInteger(args, OPT_global_base, 1024);
config->maxMemory = args::getInteger(args, OPT_max_memory, 0);		config->maxMemory = args::getInteger(args, OPT_max_memory, 0);
config->zStackSize =		config->zStackSize =
...	static void checkOptions(opt::InputArgList &args) {
if (!config->stripDebug && !config->stripAll && config->compressRelocations)		if (!config->stripDebug && !config->stripAll && config->compressRelocations)
error("--compress-relocations is incompatible with output debug"		error("--compress-relocations is incompatible with output debug"
" information. Please pass --strip-debug or --strip-all");		" information. Please pass --strip-debug or --strip-all");

if (config->ltoo > 3)		if (config->ltoo > 3)
error("invalid optimization level for LTO: " + Twine(config->ltoo));		error("invalid optimization level for LTO: " + Twine(config->ltoo));
if (config->ltoPartitions == 0)		if (config->ltoPartitions == 0)
error("--lto-partitions: number of threads must be > 0");		error("--lto-partitions: number of threads must be > 0");
if (config->thinLTOJobs == 0)		if (!get_threadpool_strategy(config->thinLTOJobs))
error("--thinlto-jobs: number of threads must be > 0");		error("--thinlto-jobs: invalid job count: " + config->thinLTOJobs);

if (config->pie && config->shared)		if (config->pie && config->shared)
error("-shared and -pie may not be used together");		error("-shared and -pie may not be used together");

if (config->outputFile.empty())		if (config->outputFile.empty())
error("no output file specified");		error("no output file specified");

if (config->importTable && config->exportTable)		if (config->importTable && config->exportTable)
...

lld/wasm/LTO.cpp

...	static std::unique_ptr<lto::LTO> createLTO() {
else if (config->isPic)		else if (config->isPic)
c.RelocModel = Reloc::PIC_;		c.RelocModel = Reloc::PIC_;
else		else
c.RelocModel = Reloc::Static;		c.RelocModel = Reloc::Static;

if (config->saveTemps)		if (config->saveTemps)
checkError(c.addSaveTemps(config->outputFile.str() + ".",		checkError(c.addSaveTemps(config->outputFile.str() + ".",
/*UseInputModulePath*/ true));		/*UseInputModulePath*/ true));
		lto::ThinBackend backend = lto::createInProcessThinBackend(
lto::ThinBackend backend;		llvm::heavyweight_hardware_concurrency(config->thinLTOJobs));
if (config->thinLTOJobs != -1U)
backend = lto::createInProcessThinBackend(config->thinLTOJobs);
return std::make_unique<lto::LTO>(std::move(c), backend,		return std::make_unique<lto::LTO>(std::move(c), backend,
config->ltoPartitions);		config->ltoPartitions);
}		}

BitcodeCompiler::BitcodeCompiler() : ltoObj(createLTO()) {}		BitcodeCompiler::BitcodeCompiler() : ltoObj(createLTO()) {}

BitcodeCompiler::~BitcodeCompiler() = default;		BitcodeCompiler::~BitcodeCompiler() = default;

...

llvm/include/llvm/LTO/LTO.h

	...
	/// create a ThinBackend using one of the create*ThinBackend() functions below.			/// create a ThinBackend using one of the create*ThinBackend() functions below.
	using ThinBackend = std::function<std::unique_ptr<ThinBackendProc>(			using ThinBackend = std::function<std::unique_ptr<ThinBackendProc>(
	const Config &C, ModuleSummaryIndex &CombinedIndex,			const Config &C, ModuleSummaryIndex &CombinedIndex,
	StringMap<GVSummaryMapTy> &ModuleToDefinedGVSummaries,			StringMap<GVSummaryMapTy> &ModuleToDefinedGVSummaries,
	AddStreamFn AddStream, NativeObjectCache Cache)>;			AddStreamFn AddStream, NativeObjectCache Cache)>;

	/// This ThinBackend runs the individual backend jobs in-process.			/// This ThinBackend runs the individual backend jobs in-process.
	/// The default value means to use one job per hardware core (not hyper-thread).			/// The default value means to use one job per hardware core (not hyper-thread).
	ThinBackend createInProcessThinBackend(unsigned ParallelismLevel = 0);			ThinBackend createInProcessThinBackend(ThreadPoolStrategy Parallelism);

	/// This ThinBackend writes individual module indexes to files, instead of			/// This ThinBackend writes individual module indexes to files, instead of
	/// running the individual backend jobs. This backend is for distributed builds			/// running the individual backend jobs. This backend is for distributed builds
	/// where separate processes will invoke the real backends.			/// where separate processes will invoke the real backends.
	///			///
	/// To find the path to write the index to, the backend checks if the path has a			/// To find the path to write the index to, the backend checks if the path has a
	/// prefix of OldPrefix; if so, it replaces that prefix with NewPrefix. It then			/// prefix of OldPrefix; if so, it replaces that prefix with NewPrefix. It then
	/// appends ".thinlto.bc" and writes the index to that path. If			/// appends ".thinlto.bc" and writes the index to that path. If
	...

llvm/include/llvm/Support/Threading.h

...	public:
/// accounts for affinity masks and takes advantage of all CPU sockets.		/// accounts for affinity masks and takes advantage of all CPU sockets.
unsigned compute_thread_count() const;		unsigned compute_thread_count() const;

/// Assign the current thread to an ideal hardware CPU or NUMA node. In a		/// Assign the current thread to an ideal hardware CPU or NUMA node. In a
/// multi-socket system, this ensures threads are assigned to all CPU		/// multi-socket system, this ensures threads are assigned to all CPU
/// sockets. \p ThreadPoolNum represents a number bounded by [0,		/// sockets. \p ThreadPoolNum represents a number bounded by [0,
/// compute_thread_count()).		/// compute_thread_count()).
void apply_thread_strategy(unsigned ThreadPoolNum) const;		void apply_thread_strategy(unsigned ThreadPoolNum) const;

		/// Finds the CPU socket where a thread should go. Returns 'None' if the
		/// thread shall remain on the actual CPU socket.
		Optional<unsigned> compute_cpu_socket(unsigned ThreadPoolNum) const;
};		};

		/// Build a strategy from a number of threads as a string provided in \p Num.
		/// When Num is above the max number of threads specified by the \p Default
		/// strategy, we attempt to equally allocate the threads on all CPU sockets.
		/// "0" or an empty string will return the \p Default strategy.
		/// "all" for using all hardware threads.
		Optional<ThreadPoolStrategy>
		get_threadpool_strategy(StringRef Num, ThreadPoolStrategy Default = {});

/// Returns a thread strategy for tasks requiring significant memory or other		/// Returns a thread strategy for tasks requiring significant memory or other
/// resources. To be used for workloads where hardware_concurrency() proves to		/// resources. To be used for workloads where hardware_concurrency() proves to
/// be less efficient. Avoid this strategy if doing lots of I/O. Currently		/// be less efficient. Avoid this strategy if doing lots of I/O. Currently
/// based on physical cores, if available for the host system, otherwise falls		/// based on physical cores, if available for the host system, otherwise falls
/// back to hardware_concurrency(). Returns 1 when LLVM is configured with		/// back to hardware_concurrency(). Returns 1 when LLVM is configured with
/// LLVM_ENABLE_THREADS = OFF.		/// LLVM_ENABLE_THREADS = OFF.
inline ThreadPoolStrategy		inline ThreadPoolStrategy
heavyweight_hardware_concurrency(unsigned ThreadCount = 0) {		heavyweight_hardware_concurrency(unsigned ThreadCount = 0) {
ThreadPoolStrategy S;		ThreadPoolStrategy S;
S.UseHyperThreads = false;		S.UseHyperThreads = false;
S.ThreadsRequested = ThreadCount;		S.ThreadsRequested = ThreadCount;
return S;		return S;
}		}

		/// Like heavyweight_hardware_concurrency() above, but builds a strategy
		/// based on the rules described for get_threadpool_strategy().
		/// If \p Num is invalid, returns a default strategy where one thread per
		/// hardware core is used.
		inline ThreadPoolStrategy heavyweight_hardware_concurrency(StringRef Num) {
		abrachet (Done): Nit: Remove `inline` https://llvm.org/docs/CodingStandards.html#don-t-use-inline-when-defining-a-function-in-a-class-definition
		aganea (Author, Done): After discussing offline with @abrachet, I'll leave the `inline` for now. It makes the symbol weak; removing `inline` would otherwise fail linking. I can move the function(s) to the .cpp after this patch to save on link time.
		Optional<ThreadPoolStrategy> S =
		get_threadpool_strategy(Num, heavyweight_hardware_concurrency());
		if (S)
		return *S;
		return heavyweight_hardware_concurrency();
		}

/// Returns a default thread strategy where all available hardware resources		/// Returns a default thread strategy where all available hardware resources
/// are to be used, except for those initially excluded by an affinity mask.		/// are to be used, except for those initially excluded by an affinity mask.
/// This function takes affinity into consideration. Returns 1 when LLVM is		/// This function takes affinity into consideration. Returns 1 when LLVM is
/// configured with LLVM_ENABLE_THREADS=OFF.		/// configured with LLVM_ENABLE_THREADS=OFF.
inline ThreadPoolStrategy hardware_concurrency(unsigned ThreadCount = 0) {		inline ThreadPoolStrategy hardware_concurrency(unsigned ThreadCount = 0) {
ThreadPoolStrategy S;		ThreadPoolStrategy S;
S.ThreadsRequested = ThreadCount;		S.ThreadsRequested = ThreadCount;
return S;		return S;
...
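
The doc comment on get_threadpool_strategy() above lists four cases for the job-count string: "all", empty or "0", a positive number, and anything else. A minimal standalone sketch of those rules, written without LLVM headers, could look like this. StrategySketch and parseJobs are hypothetical names used only for illustration, and std::from_chars stands in for StringRef::getAsInteger, so edge cases may differ from the real function.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical stand-in for llvm::ThreadPoolStrategy; only the two fields
// touched by this patch are modelled.
struct StrategySketch {
  unsigned ThreadsRequested = 0; // 0 means "let the strategy decide"
  bool UseHyperThreads = true;   // false models heavyweight_hardware_concurrency()
};

// Models the documented rules: "all" selects every hardware thread, "" or "0"
// selects Default, a positive N requests exactly N threads, and anything
// else yields no strategy at all.
std::optional<StrategySketch> parseJobs(std::string_view Num,
                                        StrategySketch Default = {}) {
  if (Num == "all")
    return StrategySketch{};
  if (Num.empty())
    return Default;
  unsigned V = 0;
  const char *End = Num.data() + Num.size();
  auto [Ptr, Ec] = std::from_chars(Num.data(), End, V);
  if (Ec != std::errc() || Ptr != End)
    return std::nullopt; // malformed value such as "foo"
  if (V == 0)
    return Default;
  StrategySketch S; // an explicit count does not take Default into account
  S.ThreadsRequested = V;
  return S;
}
```

With a heavyweight default (UseHyperThreads set to false), parseJobs("8", Heavy) in this sketch returns a strategy with hyper-threads enabled and eight threads requested, which is the "explicit number overrides the default" behavior the patch describes.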

llvm/lib/LTO/LTO.cpp

...	LTO::RegularLTOState::RegularLTOState(unsigned ParallelCodeGenParallelismLevel,
const Config &Conf)		const Config &Conf)
: ParallelCodeGenParallelismLevel(ParallelCodeGenParallelismLevel),		: ParallelCodeGenParallelismLevel(ParallelCodeGenParallelismLevel),
Ctx(Conf), CombinedModule(std::make_unique<Module>("ld-temp.o", Ctx)),		Ctx(Conf), CombinedModule(std::make_unique<Module>("ld-temp.o", Ctx)),
Mover(std::make_unique<IRMover>(*CombinedModule)) {}		Mover(std::make_unique<IRMover>(*CombinedModule)) {}

LTO::ThinLTOState::ThinLTOState(ThinBackend Backend)		LTO::ThinLTOState::ThinLTOState(ThinBackend Backend)
: Backend(Backend), CombinedIndex(/*HaveGVs*/ false) {		: Backend(Backend), CombinedIndex(/*HaveGVs*/ false) {
if (!Backend)		if (!Backend)
this->Backend = createInProcessThinBackend();		this->Backend =
		createInProcessThinBackend(llvm::heavyweight_hardware_concurrency());
}		}

LTO::LTO(Config Conf, ThinBackend Backend,		LTO::LTO(Config Conf, ThinBackend Backend,
unsigned ParallelCodeGenParallelismLevel)		unsigned ParallelCodeGenParallelismLevel)
: Conf(std::move(Conf)),		: Conf(std::move(Conf)),
RegularLTO(ParallelCodeGenParallelismLevel, this->Conf),		RegularLTO(ParallelCodeGenParallelismLevel, this->Conf),
ThinLTO(std::move(Backend)) {}		ThinLTO(std::move(Backend)) {}

...	class InProcessThinBackend : public ThinBackendProc {
std::set<GlobalValue::GUID> CfiFunctionDecls;		std::set<GlobalValue::GUID> CfiFunctionDecls;

Optional<Error> Err;		Optional<Error> Err;
std::mutex ErrMu;		std::mutex ErrMu;

public:		public:
InProcessThinBackend(		InProcessThinBackend(
const Config &Conf, ModuleSummaryIndex &CombinedIndex,		const Config &Conf, ModuleSummaryIndex &CombinedIndex,
unsigned ThinLTOParallelismLevel,		ThreadPoolStrategy ThinLTOParallelism,
const StringMap<GVSummaryMapTy> &ModuleToDefinedGVSummaries,		const StringMap<GVSummaryMapTy> &ModuleToDefinedGVSummaries,
AddStreamFn AddStream, NativeObjectCache Cache)		AddStreamFn AddStream, NativeObjectCache Cache)
: ThinBackendProc(Conf, CombinedIndex, ModuleToDefinedGVSummaries),		: ThinBackendProc(Conf, CombinedIndex, ModuleToDefinedGVSummaries),
BackendThreadPool(		BackendThreadPool(ThinLTOParallelism), AddStream(std::move(AddStream)),
heavyweight_hardware_concurrency(ThinLTOParallelismLevel)),		Cache(std::move(Cache)) {
AddStream(std::move(AddStream)), Cache(std::move(Cache)) {
for (auto &Name : CombinedIndex.cfiFunctionDefs())		for (auto &Name : CombinedIndex.cfiFunctionDefs())
CfiFunctionDefs.insert(		CfiFunctionDefs.insert(
GlobalValue::getGUID(GlobalValue::dropLLVMManglingEscape(Name)));		GlobalValue::getGUID(GlobalValue::dropLLVMManglingEscape(Name)));
for (auto &Name : CombinedIndex.cfiFunctionDecls())		for (auto &Name : CombinedIndex.cfiFunctionDecls())
CfiFunctionDecls.insert(		CfiFunctionDecls.insert(
GlobalValue::getGUID(GlobalValue::dropLLVMManglingEscape(Name)));		GlobalValue::getGUID(GlobalValue::dropLLVMManglingEscape(Name)));
}		}

...	Error wait() override {
if (Err)		if (Err)
return std::move(*Err);		return std::move(*Err);
else		else
return Error::success();		return Error::success();
}		}
};		};
} // end anonymous namespace		} // end anonymous namespace

ThinBackend lto::createInProcessThinBackend(unsigned ParallelismLevel) {		ThinBackend lto::createInProcessThinBackend(ThreadPoolStrategy Parallelism) {
return [=](const Config &Conf, ModuleSummaryIndex &CombinedIndex,		return [=](const Config &Conf, ModuleSummaryIndex &CombinedIndex,
const StringMap<GVSummaryMapTy> &ModuleToDefinedGVSummaries,		const StringMap<GVSummaryMapTy> &ModuleToDefinedGVSummaries,
AddStreamFn AddStream, NativeObjectCache Cache) {		AddStreamFn AddStream, NativeObjectCache Cache) {
return std::make_unique<InProcessThinBackend>(		return std::make_unique<InProcessThinBackend>(
Conf, CombinedIndex, ParallelismLevel, ModuleToDefinedGVSummaries,		Conf, CombinedIndex, Parallelism, ModuleToDefinedGVSummaries, AddStream,
AddStream, Cache);		Cache);
};		};
}		}

// Given the original \p Path to an output file, replace any path		// Given the original \p Path to an output file, replace any path
// prefix matching \p OldPrefix with \p NewPrefix. Also, create the		// prefix matching \p OldPrefix with \p NewPrefix. Also, create the
// resulting directory if it does not yet exist.		// resulting directory if it does not yet exist.
std::string lto::getThinLTOOutputFile(const std::string &Path,		std::string lto::getThinLTOOutputFile(const std::string &Path,
const std::string &OldPrefix,		const std::string &OldPrefix,
...
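
createInProcessThinBackend() now takes a ThreadPoolStrategy and returns a callable that captures it by copy (`[=]`), so the strategy is fixed when the factory is created, not when the backend is instantiated. The following standalone sketch illustrates that capture pattern only; StrategySketch, BackendSketch, BackendFactory and createBackendFactory are hypothetical names, and the real ThinBackend signature takes several more parameters.

```cpp
#include <functional>
#include <memory>

// Hypothetical stand-in for llvm::ThreadPoolStrategy.
struct StrategySketch {
  unsigned ThreadsRequested = 0;
  bool UseHyperThreads = true;
};

// Hypothetical stand-in for InProcessThinBackend; it only records the
// strategy it was constructed with.
struct BackendSketch {
  StrategySketch Parallelism;
};

using BackendFactory = std::function<std::unique_ptr<BackendSketch>()>;

// The strategy is copied into the closure, mirroring the [=] capture in
// lto::createInProcessThinBackend().
BackendFactory createBackendFactory(StrategySketch Parallelism) {
  return [=] {
    return std::make_unique<BackendSketch>(BackendSketch{Parallelism});
  };
}
```

Because the closure owns its own copy, later changes to the caller's strategy object do not affect backends created through the factory.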

llvm/lib/Support/Threading.cpp

	...
	}			}
	#endif			#endif

	#else			#else

	int computeHostNumHardwareThreads();			int computeHostNumHardwareThreads();

	unsigned llvm::ThreadPoolStrategy::compute_thread_count() const {			unsigned llvm::ThreadPoolStrategy::compute_thread_count() const {
				if (ThreadsRequested > 0)
				return ThreadsRequested;

	int MaxThreadCount = UseHyperThreads ? computeHostNumHardwareThreads()			int MaxThreadCount = UseHyperThreads ? computeHostNumHardwareThreads()
	: sys::getHostNumPhysicalCores();			: sys::getHostNumPhysicalCores();
	if (MaxThreadCount <= 0)			if (MaxThreadCount <= 0)
	MaxThreadCount = 1;			MaxThreadCount = 1;

	// No need to create more threads than there are hardware threads, it would
	// uselessly induce more context-switching and cache eviction.
	if (!ThreadsRequested || ThreadsRequested > (unsigned)MaxThreadCount)
	return MaxThreadCount;			return MaxThreadCount;
	return ThreadsRequested;			}

				Optional<ThreadPoolStrategy>
				llvm::get_threadpool_strategy(StringRef Num, ThreadPoolStrategy Default) {
				if (Num == "all")
				return llvm::hardware_concurrency();
				if (Num.empty())
				return Default;
				unsigned V;
				if (Num.getAsInteger(10, V))
				return None; // malformed 'Num' value
				if (V == 0)
				return Default;

				// Do not take the Default into account. This effectively disables
				// heavyweight_hardware_concurrency() if the user asks for any number of
				// threads on the cmd-line.
				ThreadPoolStrategy S = llvm::hardware_concurrency();
				S.ThreadsRequested = V;
				return S;
	}			}

	namespace {			namespace {
	struct SyncThreadInfo {			struct SyncThreadInfo {
	void (*UserFn)(void *);			void (*UserFn)(void *);
	void *UserData;			void *UserData;
	};			};

	...
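
The change to compute_thread_count() in this file is easy to miss in the side-by-side view: before the patch, a requested count larger than the hardware maximum was clamped down; after it, any non-zero request is returned unchanged. A small standalone model of the patched logic, with the host queries replaced by plain parameters, might read as follows. The function name and its HardwareThreads and PhysicalCores parameters are assumptions made for illustration.

```cpp
// Hypothetical model of the patched ThreadPoolStrategy::compute_thread_count().
// HardwareThreads and PhysicalCores are passed in here; the real code queries
// them from the host.
unsigned computeThreadCountSketch(unsigned ThreadsRequested,
                                  bool UseHyperThreads, int HardwareThreads,
                                  int PhysicalCores) {
  // An explicit request wins and is no longer clamped to the hardware maximum.
  if (ThreadsRequested > 0)
    return ThreadsRequested;
  int MaxThreadCount = UseHyperThreads ? HardwareThreads : PhysicalCores;
  // A failed or unsupported host query falls back to a single thread.
  if (MaxThreadCount <= 0)
    MaxThreadCount = 1;
  return static_cast<unsigned>(MaxThreadCount);
}
```

On a model host with 16 hardware threads and 8 cores, a request of 1000 yields 1000 in this sketch, while a request of 0 yields 8 or 16 depending on UseHyperThreads. This matches the `--thinlto-jobs=1000` test cases added elsewhere in the patch.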

llvm/lib/Support/Windows/Threading.inc

	...
	}			}

	struct ProcessorGroup {			struct ProcessorGroup {
	unsigned ID;			unsigned ID;
	unsigned AllThreads;			unsigned AllThreads;
	unsigned UsableThreads;			unsigned UsableThreads;
	unsigned ThreadsPerCore;			unsigned ThreadsPerCore;
	uint64_t Affinity;			uint64_t Affinity;

				unsigned useableCores() const {
				return std::max(1U, UsableThreads / ThreadsPerCore);
				}
	};			};

	template <typename F>			template <typename F>
	static bool IterateProcInfo(LOGICAL_PROCESSOR_RELATIONSHIP Relationship, F Fn) {			static bool IterateProcInfo(LOGICAL_PROCESSOR_RELATIONSHIP Relationship, F Fn) {
	DWORD Len = 0;			DWORD Len = 0;
	BOOL R = ::GetLogicalProcessorInformationEx(Relationship, NULL, &Len);			BOOL R = ::GetLogicalProcessorInformationEx(Relationship, NULL, &Len);
	if (R || GetLastError() != ERROR_INSUFFICIENT_BUFFER) {			if (R || GetLastError() != ERROR_INSUFFICIENT_BUFFER) {
	return false;			return false;
	[... 85 lines not shown ...]

	int computeHostNumHardwareThreads() {			int computeHostNumHardwareThreads() {
	static unsigned Threads =			static unsigned Threads =
	aggregate(getProcessorGroups(),			aggregate(getProcessorGroups(),
	[](const ProcessorGroup &G) { return G.UsableThreads; });			[](const ProcessorGroup &G) { return G.UsableThreads; });
	return Threads;			return Threads;
	}			}

	// Assign the current thread to a more appropriate CPU socket or CPU group			// Finds the proper CPU socket where a thread number should go. Returns 'None'
	void llvm::ThreadPoolStrategy::apply_thread_strategy(			// if the thread shall remain on the actual CPU socket.
	unsigned ThreadPoolNum) const {			Optional<unsigned>
				llvm::ThreadPoolStrategy::compute_cpu_socket(unsigned ThreadPoolNum) const {
	ArrayRef<ProcessorGroup> Groups = getProcessorGroups();			ArrayRef<ProcessorGroup> Groups = getProcessorGroups();
				// Only one CPU socket in the system or process affinity was set, no need to
				// move the thread(s) to another CPU socket.
				if (Groups.size() <= 1)
				return None;

				// We ask for less threads than there are hardware threads per CPU socket, no
				// need to dispatch threads to other CPU sockets.
				unsigned MaxThreadsPerSocket =
				UseHyperThreads ? Groups[0].UsableThreads : Groups[0].useableCores();
				if (compute_thread_count() <= MaxThreadsPerSocket)
				return None;

	assert(ThreadPoolNum < compute_thread_count() &&			assert(ThreadPoolNum < compute_thread_count() &&
	"The thread index is not within thread strategy's range!");			"The thread index is not within thread strategy's range!");

	// In this mode, the ThreadNumber represents the core number, not the			// Assumes the same number of hardware threads per CPU socket.
	// hyper-thread number. Assumes all NUMA groups have the same amount of			return (ThreadPoolNum * Groups.size()) / compute_thread_count();
	// hyper-threads.			}
	if (!UseHyperThreads)
	ThreadPoolNum *= Groups[0].ThreadsPerCore;

	unsigned ThreadRangeStart = 0;
	for (unsigned I = 0; I < Groups.size(); ++I) {
	const ProcessorGroup &G = Groups[I];
	if (ThreadPoolNum >= ThreadRangeStart &&
	ThreadPoolNum < ThreadRangeStart + G.UsableThreads) {

				// Assign the current thread to a more appropriate CPU socket or CPU group
				void llvm::ThreadPoolStrategy::apply_thread_strategy(
				unsigned ThreadPoolNum) const {
				Optional<unsigned> Socket = compute_cpu_socket(ThreadPoolNum);
				if (!Socket)
				return;
				ArrayRef<ProcessorGroup> Groups = getProcessorGroups();
	GROUP_AFFINITY Affinity{};			GROUP_AFFINITY Affinity{};
	Affinity.Group = G.ID;			Affinity.Group = Groups[*Socket].ID;
	Affinity.Mask = G.Affinity;			Affinity.Mask = Groups[*Socket].Affinity;
	SetThreadGroupAffinity(GetCurrentThread(), &Affinity, nullptr);			SetThreadGroupAffinity(GetCurrentThread(), &Affinity, nullptr);
	}			}
	ThreadRangeStart += G.UsableThreads;
	}
	}

	llvm::BitVector llvm::get_thread_affinity_mask() {			llvm::BitVector llvm::get_thread_affinity_mask() {
	GROUP_AFFINITY Affinity{};			GROUP_AFFINITY Affinity{};
	GetThreadGroupAffinity(GetCurrentThread(), &Affinity);			GetThreadGroupAffinity(GetCurrentThread(), &Affinity);

	static unsigned All =			static unsigned All =
	aggregate(getProcessorGroups(),			aggregate(getProcessorGroups(),
	[](const ProcessorGroup &G) { return G.AllThreads; });			[](const ProcessorGroup &G) { return G.AllThreads; });
	[... 16 lines not shown ...]
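
The new `compute_cpu_socket()` above replaces the range-scanning loop with a single proportional formula. The sketch below reproduces that arithmetic in portable C++ so it can be checked without the Windows API. `Group` and `socketFor` are hypothetical names, the thread count is passed in rather than computed, and, like the diff, the sketch assumes every socket has the same number of hardware threads and that `ThreadsPerCore` is nonzero.

```cpp
#include <algorithm>
#include <optional>
#include <vector>

// Hypothetical, trimmed-down stand-in for the ProcessorGroup struct.
struct Group {
  unsigned UsableThreads;
  unsigned ThreadsPerCore; // assumed to be nonzero
  unsigned usableCores() const {
    return std::max(1U, UsableThreads / ThreadsPerCore);
  }
};

// Returns the index of the group a pool thread should be moved to, or
// nullopt if the thread can stay where it is.
// Precondition: ThreadIdx < ThreadCount.
std::optional<unsigned> socketFor(unsigned ThreadIdx, unsigned ThreadCount,
                                  bool UseHyperThreads,
                                  const std::vector<Group> &Groups) {
  // A single socket (or a restricted process affinity): nothing to do.
  if (Groups.size() <= 1)
    return std::nullopt;
  // The request fits on one socket: no need to spread threads out.
  unsigned MaxPerSocket =
      UseHyperThreads ? Groups[0].UsableThreads : Groups[0].usableCores();
  if (ThreadCount <= MaxPerSocket)
    return std::nullopt;
  // Proportional split: thread indices are divided evenly across sockets.
  return static_cast<unsigned>((ThreadIdx * Groups.size()) / ThreadCount);
}
```

On a hypothetical machine with two groups of 36 hardware threads each, 72 pool threads split so that indices 0 to 35 map to group 0 and indices 36 to 71 map to group 1. An oversubscribed pool of 144 threads still splits in half, which matches the "dispatched equally on all CPU sockets" behavior described in the summary.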

llvm/test/Transforms/PGOProfile/thinlto_samplepgo_icp3.ll

	; REQUIRES: x86-registered-target			; REQUIRES: x86-registered-target

	; Do setup work for all below tests: generate bitcode and combined index			; Do setup work for all below tests: generate bitcode and combined index
	; RUN: opt -module-summary %s -o %t.bc			; RUN: opt -module-summary %s -o %t.bc
	; RUN: opt -module-summary %p/Inputs/thinlto_samplepgo_icp3.ll -o %t2.bc			; RUN: opt -module-summary %p/Inputs/thinlto_samplepgo_icp3.ll -o %t2.bc

	; Test to make sure importing and dead stripping works in the			; Test to make sure importing and dead stripping works in the
	; case where the target is a local function that also indirectly calls itself.			; case where the target is a local function that also indirectly calls itself.
	; RUN: llvm-lto2 run -thinlto-threads=1 -save-temps -o %t3 %t.bc %t2.bc -r %t.bc,fptr,plx -r %t.bc,main,plx -r %t2.bc,_Z6updatei,pl -r %t2.bc,fptr,l -print-imports 2>&1 | FileCheck %s --check-prefix=IMPORTS			; RUN: llvm-lto2 run -thinlto-threads=1 -save-temps -o %t3 %t.bc %t2.bc -r %t.bc,fptr,plx -r %t.bc,main,plx -r %t2.bc,_Z6updatei,pl -r %t2.bc,fptr,l -print-imports 2>&1 | FileCheck %s --check-prefix=IMPORTS

				; Also test with all threads on
				; RUN: llvm-lto2 run -thinlto-threads=all -save-temps -o %t3 %t.bc %t2.bc -r %t.bc,fptr,plx -r %t.bc,main,plx -r %t2.bc,_Z6updatei,pl -r %t2.bc,fptr,l -print-imports 2>&1 | FileCheck %s --check-prefix=IMPORTS

				; Run with more threads than there are in the system
				; RUN: llvm-lto2 run -thinlto-threads=1000 -save-temps -o %t3 %t.bc %t2.bc -r %t.bc,fptr,plx -r %t.bc,main,plx -r %t2.bc,_Z6updatei,pl -r %t2.bc,fptr,l -print-imports 2>&1 | FileCheck %s --check-prefix=IMPORTS

				; Provide a wrong thread count argument
				; RUN: llvm-lto2 run -thinlto-threads=foo -save-temps -o %t3 %t.bc %t2.bc -r %t.bc,fptr,plx -r %t.bc,main,plx -r %t2.bc,_Z6updatei,pl -r %t2.bc,fptr,l -print-imports 2>&1 | FileCheck %s --check-prefix=IMPORTS

	; Make sure we import the promted indirectly called target			; Make sure we import the promted indirectly called target
	; IMPORTS: Import _ZL3foov.llvm.0			; IMPORTS: Import _ZL3foov.llvm.0

	target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	@fptr = local_unnamed_addr global void ()* null, align 8			@fptr = local_unnamed_addr global void ()* null, align 8

	[... 47 lines not shown ...]

llvm/tools/gold/gold-plugin.cpp

[... 22 lines not shown ...]
#include "llvm/Object/Error.h"		#include "llvm/Object/Error.h"
#include "llvm/Support/CachePruning.h"		#include "llvm/Support/CachePruning.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/FileSystem.h"		#include "llvm/Support/FileSystem.h"
#include "llvm/Support/ManagedStatic.h"		#include "llvm/Support/ManagedStatic.h"
#include "llvm/Support/MemoryBuffer.h"		#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/Path.h"		#include "llvm/Support/Path.h"
#include "llvm/Support/TargetSelect.h"		#include "llvm/Support/TargetSelect.h"
		#include "llvm/Support/Threading.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include <list>		#include <list>
#include <map>		#include <map>
#include <plugin-api.h>		#include <plugin-api.h>
#include <string>		#include <string>
#include <system_error>		#include <system_error>
#include <utility>		#include <utility>
#include <vector>		#include <vector>
[... 88 lines not shown ...]	enum OutputType {
OT_NORMAL,		OT_NORMAL,
OT_DISABLE,		OT_DISABLE,
OT_BC_ONLY,		OT_BC_ONLY,
OT_ASM_ONLY,		OT_ASM_ONLY,
OT_SAVE_TEMPS		OT_SAVE_TEMPS
};		};
static OutputType TheOutputType = OT_NORMAL;		static OutputType TheOutputType = OT_NORMAL;
static unsigned OptLevel = 2;		static unsigned OptLevel = 2;
// Default parallelism of 0 used to indicate that user did not specify.
// Actual parallelism default value depends on implementation.
// Currently only affects ThinLTO, where the default is the max cores in the		// Currently only affects ThinLTO, where the default is the max cores in the
// system.		// system. See llvm::get_threadpool_strategy() for acceptable values.
static unsigned Parallelism = 0;		static std::string Parallelism;
// Default regular LTO codegen parallelism (number of partitions).		// Default regular LTO codegen parallelism (number of partitions).
static unsigned ParallelCodeGenParallelismLevel = 1;		static unsigned ParallelCodeGenParallelismLevel = 1;
#ifdef NDEBUG		#ifdef NDEBUG
static bool DisableVerify = true;		static bool DisableVerify = true;
#else		#else
static bool DisableVerify = false;		static bool DisableVerify = false;
#endif		#endif
static std::string obj_path;		static std::string obj_path;
[... 117 lines not shown ...]	if (opt.startswith("mcpu=")) {
cache_dir = std::string(opt.substr(strlen("cache-dir=")));		cache_dir = std::string(opt.substr(strlen("cache-dir=")));
} else if (opt.startswith("cache-policy=")) {		} else if (opt.startswith("cache-policy=")) {
cache_policy = std::string(opt.substr(strlen("cache-policy=")));		cache_policy = std::string(opt.substr(strlen("cache-policy=")));
} else if (opt.size() == 2 && opt[0] == 'O') {		} else if (opt.size() == 2 && opt[0] == 'O') {
if (opt[1] < '0' || opt[1] > '3')		if (opt[1] < '0' || opt[1] > '3')
message(LDPL_FATAL, "Optimization level must be between 0 and 3");		message(LDPL_FATAL, "Optimization level must be between 0 and 3");
OptLevel = opt[1] - '0';		OptLevel = opt[1] - '0';
} else if (opt.startswith("jobs=")) {		} else if (opt.startswith("jobs=")) {
if (StringRef(opt_ + 5).getAsInteger(10, Parallelism))		StringRef Num(opt_ + 5);
message(LDPL_FATAL, "Invalid parallelism level: %s", opt_ + 5);		if (!get_threadpool_strategy(Num))
		message(LDPL_FATAL, "Invalid parallelism level: %s", Num.data());
		Parallelism = Num;
} else if (opt.startswith("lto-partitions=")) {		} else if (opt.startswith("lto-partitions=")) {
if (opt.substr(strlen("lto-partitions="))		if (opt.substr(strlen("lto-partitions="))
.getAsInteger(10, ParallelCodeGenParallelismLevel))		.getAsInteger(10, ParallelCodeGenParallelismLevel))
message(LDPL_FATAL, "Invalid codegen partition level: %s", opt_ + 5);		message(LDPL_FATAL, "Invalid codegen partition level: %s", opt_ + 5);
} else if (opt == "disable-verify") {		} else if (opt == "disable-verify") {
DisableVerify = true;		DisableVerify = true;
} else if (opt.startswith("sample-profile=")) {		} else if (opt.startswith("sample-profile=")) {
sample_profile = std::string(opt.substr(strlen("sample-profile=")));		sample_profile = std::string(opt.substr(strlen("sample-profile=")));
[... 587 lines not shown ...]	static std::unique_ptr<LTO> createLTO(IndexWriteCallback OnIndexWrite,
Conf.RelocModel = RelocationModel;		Conf.RelocModel = RelocationModel;
Conf.CodeModel = getCodeModel();		Conf.CodeModel = getCodeModel();
Conf.CGOptLevel = getCGOptLevel();		Conf.CGOptLevel = getCGOptLevel();
Conf.DisableVerify = options::DisableVerify;		Conf.DisableVerify = options::DisableVerify;
Conf.OptLevel = options::OptLevel;		Conf.OptLevel = options::OptLevel;
Conf.PTO.LoopVectorization = options::OptLevel > 1;		Conf.PTO.LoopVectorization = options::OptLevel > 1;
Conf.PTO.SLPVectorization = options::OptLevel > 1;		Conf.PTO.SLPVectorization = options::OptLevel > 1;

if (options::Parallelism)
Backend = createInProcessThinBackend(options::Parallelism);
if (options::thinlto_index_only) {		if (options::thinlto_index_only) {
std::string OldPrefix, NewPrefix;		std::string OldPrefix, NewPrefix;
getThinLTOOldAndNewPrefix(OldPrefix, NewPrefix);		getThinLTOOldAndNewPrefix(OldPrefix, NewPrefix);
Backend = createWriteIndexesThinBackend(OldPrefix, NewPrefix,		Backend = createWriteIndexesThinBackend(OldPrefix, NewPrefix,
options::thinlto_emit_imports_files,		options::thinlto_emit_imports_files,
LinkedObjectsFile, OnIndexWrite);		LinkedObjectsFile, OnIndexWrite);
		} else {
		Backend = createInProcessThinBackend(
		llvm::heavyweight_hardware_concurrency(options::Parallelism));
}		}

Conf.OverrideTriple = options::triple;		Conf.OverrideTriple = options::triple;
Conf.DefaultTriple = sys::getDefaultTargetTriple();		Conf.DefaultTriple = sys::getDefaultTargetTriple();

Conf.DiagHandler = diagnosticHandler;		Conf.DiagHandler = diagnosticHandler;

switch (options::TheOutputType) {		switch (options::TheOutputType) {
[... 269 lines not shown ...]

llvm/tools/llvm-lto2/llvm-lto2.cpp

[... 60 lines not shown ...]

static cl::opt<bool>		static cl::opt<bool>
ThinLTODistributedIndexes("thinlto-distributed-indexes", cl::init(false),		ThinLTODistributedIndexes("thinlto-distributed-indexes", cl::init(false),
cl::desc("Write out individual index and "		cl::desc("Write out individual index and "
"import files for the "		"import files for the "
"distributed backend case"));		"distributed backend case"));

// Default to using all available threads in the system, but using only one		// Default to using all available threads in the system, but using only one
// thread per core, as indicated by the usage of		// thread per core (no SMT).
// heavyweight_hardware_concurrency() in the InProcessThinBackend constructor.		// Use -thinlto-threads=all to use hardware_concurrency() instead, which means
static cl::opt<int> Threads("thinlto-threads", cl::init(0));		// to use all hardware threads or cores in the system.
		static cl::opt<std::string> Threads("thinlto-threads");

static cl::list<std::string> SymbolResolutions(		static cl::list<std::string> SymbolResolutions(
"r",		"r",
cl::desc("Specify a symbol resolution: filename,symbolname,resolution\n"		cl::desc("Specify a symbol resolution: filename,symbolname,resolution\n"
"where \"resolution\" is a sequence (which may be empty) of the\n"		"where \"resolution\" is a sequence (which may be empty) of the\n"
"following characters:\n"		"following characters:\n"
" p - prevailing: the linker has chosen this definition of the\n"		" p - prevailing: the linker has chosen this definition of the\n"
" symbol\n"		" symbol\n"
[... 199 lines not shown ...]	static int run(int argc, char **argv) {
ThinBackend Backend;		ThinBackend Backend;
if (ThinLTODistributedIndexes)		if (ThinLTODistributedIndexes)
Backend = createWriteIndexesThinBackend(/* OldPrefix */ "",		Backend = createWriteIndexesThinBackend(/* OldPrefix */ "",
/* NewPrefix */ "",		/* NewPrefix */ "",
/* ShouldEmitImportsFiles */ true,		/* ShouldEmitImportsFiles */ true,
/* LinkedObjectsFile */ nullptr,		/* LinkedObjectsFile */ nullptr,
/* OnWrite */ {});		/* OnWrite */ {});
else		else
Backend = createInProcessThinBackend(Threads);		Backend = createInProcessThinBackend(
		llvm::heavyweight_hardware_concurrency(Threads));
LTO Lto(std::move(Conf), std::move(Backend));		LTO Lto(std::move(Conf), std::move(Backend));

bool HasErrors = false;		bool HasErrors = false;
for (std::string F : InputFilenames) {		for (std::string F : InputFilenames) {
std::unique_ptr<MemoryBuffer> MB = check(MemoryBuffer::getFile(F), F);		std::unique_ptr<MemoryBuffer> MB = check(MemoryBuffer::getFile(F), F);
std::unique_ptr<InputFile> Input =		std::unique_ptr<InputFile> Input =
check(InputFile::create(MB->getMemBufferRef()), F);		check(InputFile::create(MB->getMemBufferRef()), F);

[... 163 lines not shown ...]

Revision Contents

Diff 248254
