
[ThreadPool] On Windows, extend usage to all CPU sockets and all NUMA groups
ClosedPublic

Authored by aganea on Dec 20 2019, 9:20 AM.

Details

Summary

TL;DR: This patch ensures that, on Windows, all CPU sockets and all NUMA nodes are used by the ThreadPool. The goal is to have LLD/ThinLTO use all hardware threads in the system, which currently isn't the case on multi-socket or large-CPU-count systems.

(this could possibly be split into a few patches, but I just wanted an overall opinion)

Background

Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU: a hyper-thread if SMT is active, a core otherwise. There was a limit of 32 processors on older 32-bit versions of Windows, later raised to 64 processors on 64-bit versions. This limit comes from the affinity mask, which historically was (and still is) pointer-sized (sizeof(void*)). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads.
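As a back-of-the-envelope check of the above (plain arithmetic, not a Windows API; processorGroupCount is a hypothetical helper invented for illustration), the group count follows directly from the 64-processor affinity-mask limit:

```cpp
// Hypothetical helper, not a Windows API: how many processor groups Windows
// needs for a machine with the given number of logical processors, given
// that one group holds at most 64 processors (the width of a 64-bit
// affinity mask).
constexpr unsigned GroupSize = 64;

constexpr unsigned processorGroupCount(unsigned LogicalProcessors) {
  return (LogicalProcessors + GroupSize - 1) / GroupSize; // ceiling division
}
```

For instance, a 128-hyper-thread machine such as the EPYC 7702P mentioned below maps to two groups, while a 64-thread machine still fits in one.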

By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to enable that programmatically, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries: one can only specify a "preferred" group on startup, but the application is free to allocate more groups if it wants to.
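A minimal sketch of that dispatch (illustrative only; groupForThread is a made-up helper, and real Windows code would then pin each thread to its group with SetThreadGroupAffinity):

```cpp
// Hypothetical helper: choose a processor group for the I-th pool thread by
// cycling round-robin through the available groups. On Windows the thread
// would then be bound to that group via SetThreadGroupAffinity.
unsigned groupForThread(unsigned ThreadIndex, unsigned GroupCount) {
  return GroupCount ? ThreadIndex % GroupCount : 0;
}
```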

This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation will only get worse in the years to come, as dies with more cores will appear on the market.

The changes in this patch

The problem is that the heavyweight_hardware_concurrency() API was introduced so that only one hardware thread per core would be used; once that API returns, that original intention is lost. Consider a situation, on Windows, where the system has 2 CPU sockets with 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std::thread::hardware_concurrency(), which returns only processors from the current "processor group".
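The arithmetic of that example can be spelled out as a sketch (the socket/core counts are the ones from the paragraph above; nothing here is LLVM code or queries a real machine):

```cpp
// 2 sockets x 18 cores x 2 SMT threads = 72 hardware threads total.
unsigned totalHardwareThreads(unsigned Sockets, unsigned CoresPerSocket,
                              unsigned ThreadsPerCore) {
  return Sockets * CoresPerSocket * ThreadsPerCore;
}

// Pre-patch, on Windows, both concurrency functions reported only the
// current processor group, i.e. half of this machine.
unsigned perGroup(unsigned TotalThreads, unsigned GroupCount) {
  return TotalThreads / GroupCount;
}
```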

What if we wanted to use all "processor groups"? Even if we implemented heavyweight_hardware_concurrency() properly, what should it then return? 18 or 36?
What if the user specified /opt:lldltojobs=36? Should we assign 36 threads to the current "processor group", or should we dispatch the extra threads to the second "processor group"?

To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO).
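A much-simplified sketch of that idea (the field names follow the review discussion below, but this is not the actual LLVM class):

```cpp
#include <algorithm>

// Sketch of a strategy object that captures the caller's intent and defers
// computing the final thread count until the pool actually spawns its
// std::threads.
struct ThreadPoolStrategy {
  unsigned ThreadsRequested = 0; // 0 means "use the default"
  bool UseHyperThreads = true;   // false: one thread per physical core

  // HardwareThreads would be queried at pool-creation time, across all
  // processor groups; it is passed in here to keep the sketch portable.
  unsigned computeThreadCount(unsigned HardwareThreads) const {
    unsigned Max = UseHyperThreads ? HardwareThreads : HardwareThreads / 2;
    if (ThreadsRequested == 0)
      return Max;
    return std::min(ThreadsRequested, Max); // capped, as in this patch
  }
};
```

On the 72-thread example above, the defaults give 72 threads, UseHyperThreads = false gives 36, and an explicit request is capped at the hardware limit (a design choice debated further down in this review).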

Discussion

Ideally, we should consider all "processors" (on Windows) or all "CPUs" (Linux) as equal, in which case heavyweight_hardware_concurrency() wouldn't be needed. I'm not sure how micro-managing threads, cores and NUMA nodes will scale in the years to come (probably not well). Will it make sense to say "I don't want hyper-threads"? Or to say /opt:lldltojobs=whatever when you have a thousand-core system? How would that work with NUMA affinity? For example, the Fujitsu A64FX has 4x "12-core tiles" on the same die, each tile being connected to an internal 8-GB HBM2 memory (each located internally on the CPU die). How would we dispatch threads in that case? The AMD EPYC uses the same concept of "tiles"; it doesn't have internal memory yet, but most likely the EPYC v3 will use the same architecture.

@tejohnson : Teresa, since you added heavyweight_hardware_concurrency(), do you have a benchmark which compares ThinLTO running with heavyweight_hardware_concurrency() versus hardware_concurrency()? (I haven't done that test yet)
It would make things a lot simpler if we didn't have that API and, in general, considered that we could use all hardware threads in the system, and that they all perform equally.

NOTE: For full effect, this patch needs rpmalloc (D71786)

Diff Detail

Event Timeline

aganea created this revision.Dec 20 2019, 9:20 AM
Herald added projects: Restricted Project, Restricted Project.
aganea edited the summary of this revision. (Show Details)Dec 20 2019, 9:28 AM
aganea added a reviewer: jlpeyton.

Unit tests: pass. 61057 tests passed, 0 failed and 728 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Will it make sense to say "I don't want hyper-threads" ?

Not sure I remember correctly, but I believe one motivation for avoiding "hyper-threads" and other virtual cores was that while they slightly improve performance, they also increase the peak memory requirements: using heavyweight_hardware_concurrency() seemed like a good default tradeoff for most end-users.

Also: using heavyweight_hardware_concurrency() in the linker while having multiple linker jobs scheduled by the build system was another reason (I think LLVM's CMake defaults to 2 parallel link jobs when using ThinLTO, for instance).

ychen added a subscriber: ychen.Dec 20 2019, 11:22 PM
aganea added a comment.EditedDec 21 2019, 8:24 AM

Will it make sense to say "I don't want hyper-threads" ?

Not sure I remember correctly, but I believe one motivation for avoiding "hyper-threads" and other virtual cores was that while they slightly improve performance, they also increase the peak memory requirements: using heavyweight_hardware_concurrency() seemed like a good default tradeoff for most end-users.

It all makes sense. After this patch, memory consumption is doubled when using both CPU sockets. Evidently there's also a question of memory bandwidth, which doesn't scale in my case when using both sockets (possibly it could be better on an AMD EPYC, because it has more memory channels). This is also why enabling the second socket only marginally decreases the timings.

In Ubisoft's case, time (both compute and human) is immensely more valuable than memory sticks. Historically, we didn't really use LTO on game productions because it was really slow and often introduced undesirable bugs or side-effects. The graphs in D71786 are for Rainbow 6: Siege, which is a "smaller" codebase. For larger games, LTO link time is more in the range of 1 h 20 min, both for MSVC and previous versions of Clang. If there's an LTO-specific bug in a final build, it is very hard to iterate with link times like that. In addition, there are hundreds of builds every day on the build system, and we want to keep all its cores busy. This is why both build and link times are important to us.

Also: using heavyweight_hardware_concurrency() in the linker while having multiple linker jobs scheduled by the build system was another reason (I think LLVM's CMake defaults to 2 parallel link jobs when using ThinLTO, for instance).

Understood. If one sets the CPU affinity when starting the application, i.e. start /affinity XXX lld-link.exe ..., then this patch disables dispatching on other "processor groups", even if they are available. However, there doesn't seem to be a way to restrict the application to one "processor group".

It remains to be seen whether we want to keep the current behavior, where only one socket is used, and add an option to enable the other CPU sockets; or whether we want the multi-socket behavior by default, as in this patch.

+@gbiv @inglorion, other ThinLTO users.

To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO).

That seems reasonable to me.

llvm/include/llvm/ADT/SmallBitVector.h
487 ↗(On Diff #234910)

I guess we don't need these changes, since it looks like you use BitVector only below.

500 ↗(On Diff #234910)

You return A - B but then compare for equality to -1 and 1. I guess it works because you are doing it bit by bit, but it's exciting.

llvm/include/llvm/Support/Threading.h
152

Let's name the fields in a way that indicates that these numbers are the requested number of threads, not the final number. So, ThreadsRequested or something like that.

157

This could be UseHyperThreads. The first time I read it, I guessed that it indicated if the system has hyperthreads.

llvm/lib/Support/Host.cpp
1277

Another -1 case.

1334

Note: the -1 case.

llvm/lib/Support/Threading.cpp
84–85

We already have a public API, getHostNumPhysicalCores. Can this use that?

87

I see these computeHostNum* methods return int, and some versions seem to return -1 to indicate errors. I think you'll want to use int here and check if MaxThreadCount <= 0. Otherwise below we may do a signed/unsigned comparison mismatch, and return ~0U for MaxThreadCount.
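The pitfall can be shown in a couple of lines (the helper and its fallback value are made up purely for illustration, not taken from the patch):

```cpp
// An int error code of -1, converted to unsigned, becomes ~0U (4294967295),
// so the sign must be checked while the value is still signed.
unsigned threadCountFrom(int ComputedCores) {
  if (ComputedCores <= 0) // still signed: catches the -1 error case
    return 1;             // hypothetical fallback, for illustration only
  return static_cast<unsigned>(ComputedCores);
}
```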

llvm/lib/Support/Unix/Threading.inc
273

I'm not sure it makes sense to say "physical threads". I think "physical cores" was meant to distinguish between hardware threads and cores. Describing hardware execution resources as physical or non-physical isn't very precise or meaningful in the first place, but I don't think it should apply to hyper threads.

llvm/lib/Support/Windows/Threading.inc
160

This is cached, so maybe getProcessorGroups to indicate that it is not intended to be expensive.

212

Seems like static can be used here.

llvm/unittests/Support/ThreadPool.cpp
31

"Temporary" >_>

180

I guess this is why you added comparison operators. In any case, let's remove the commented out code in the final version.

aganea updated this revision to Diff 237197.Jan 9 2020, 2:12 PM
aganea marked 13 inline comments as done.

Updated as suggested by @rnk.
I've also removed ThreadPoolStrategy::PinThreads because it wasn't conclusive. In the best case, pinning threads to a core/hyper-thread was similar in performance to using a full CPU-socket affinity; in the worst case, it degraded performance. The NT scheduler seems to be doing a pretty good job here, so we'll stick with it :-)

aganea added inline comments.Jan 9 2020, 2:12 PM
llvm/lib/Support/Unix/Threading.inc
273

computeHostNumHardwareThreads() ?

ormris added a subscriber: ormris.Jan 9 2020, 2:45 PM

Ping! Any further comments?

rnk accepted this revision.Feb 11 2020, 5:52 PM

lgtm

llvm/lib/Support/Unix/Threading.inc
273

lgtm

This revision is now accepted and ready to land.Feb 11 2020, 5:52 PM
This revision now requires review to proceed.Feb 11 2020, 5:52 PM
rriddle accepted this revision.Feb 13 2020, 11:28 AM

(Appeasing herald for MLIR changes).

This revision is now accepted and ready to land.Feb 13 2020, 11:28 AM
This revision was automatically updated to reflect the committed changes.

https://bugs.chromium.org/p/chromium/issues/detail?id=1051578#c12 :

"""
FYI for those building on AMD Bulldozer family of processors and its various iterations after this commit:

https://reviews.llvm.org/D71775

Building with ThinLTO on Bulldozer and similar appears to now be capped to how Windows reports cores versus logical processors, thus now halving the number of LTO threads available when building. Manually setting /opt:lldltojobs= for LLD does not override it, as that only sets an upper limit.

Found out as I locally build on a 32-core Opteron system. Windows treats it as 16 cores and 32 logical processors, but it is not an SMT setup like Intel HyperThreading. In particular:

"A module consists of a coupling of two "conventional" x86 out of order processing cores. The processing core shares the early pipeline stages (e.g. L1i, fetch, decode), the FPUs, and the L2 cache with the rest of the module."

https://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)

Naturally, build times have increased dramatically. YMMV.
"""

Sounds like this patch might have some drawbacks.

aganea added a comment.EditedFeb 25 2020, 12:14 PM

@thakis I think this is a side-effect of implementing computeHostNumPhysicalCores() for Windows: it previously returned -1, which in turn made llvm::heavyweight_hardware_concurrency() behave like llvm::hardware_concurrency() (on Windows only).
At first sight, Linux seems to report the same thing as Windows (2 cores per "module"). If I read this correctly, the behavior on Linux for an AMD Opteron is/was the same as it currently is on Windows after this patch: only one core out of two is used by llvm::heavyweight_hardware_concurrency().

I think there is a wider problem outside the scope of this patch: some users want to use 100% of the cores/hyper-threads when using ThinLTO, and currently there's no way to do that. An algorithm using llvm::heavyweight_hardware_concurrency() explicitly states "I only want one core out of two". It'd be nice if we had a cmd-line flag to override this, to say "I want to use all hyper-threads/cores/modules". /opt:lldltojobs=all? The impact is small on a modern Intel Skylake CPU, but it could matter more on other CPUs (AMD Bulldozer).

CORRECTION: The impact is small(er) with SMT enabled because only half of that link time is truly multi-threaded (in this example). The other half is essentially the falloff plus the regular .EXE section merging & debug-info merging (none of which is currently multi-threaded in the COFF driver).
Graph of the CPU usage when linking clang.exe (here with rpmalloc):

I too can see how SMT might not afford much performance difference for LTO codegen.

CMT appears to be more significant. I do not have exact numbers right now as my build box is busy, but the change added about an hour to locally building the Chromium browser with ThinLTO optimizations enabled, on Win10 running on a 32-core Opteron Piledriver (bdver2) system.

Definitely agree something like "/opt:lldltojobs=all" in a separate patch would be a good solution if possible for this particular (corner) case.

There is something puzzling to me in this patch, in the way the ThreadPool was modified: the client of the thread pool cannot request a number of threads anymore, only a *maximum* number of threads. I don't really understand why. Using the OS-reported concurrency was always just a "default"; isn't this change orthogonal/unrelated to the stated goal of this patch?

aganea marked an inline comment as done.Feb 26 2020, 8:29 AM
aganea added inline comments.
llvm/lib/Support/Threading.cpp
94

@mehdi_amini You mean this? Testing showed degraded performance if ThreadsRequested > MaxThreadCount, so I thought it'd be better to prevent that situation. More software threads than hardware threads means you pay for context switching, cache eviction and extra memory pressure (even more so if the allocator has per-thread pools).
Do you see cases where not having this check would be valuable? Providing --threads=50 and creating a 50-thread ThreadPool when your CPU only supports 12 hardware threads? The only case I can think of is the use of async operations in threads, but then your ThreadPool sounds more like a queue, and maybe it's not the right hammer for the nail. Should we support that case and explicitly tag the ThreadPool constructor in client code with something like ThreadPool(50, AcknowledgeDegradedPerformance)?
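The two policies under discussion can be contrasted in a small sketch (neither function is LLVM code; the names are made up for this comparison):

```cpp
#include <algorithm>

// Policy in this patch: an explicit request is capped at the hardware limit;
// a request of 0 means "use the default".
unsigned clampPolicy(unsigned Requested, unsigned MaxThreadCount) {
  return Requested ? std::min(Requested, MaxThreadCount) : MaxThreadCount;
}

// Policy argued for in the reply below: honor an explicit request even if it
// oversubscribes the machine; only the default falls back to the hardware
// limit.
unsigned honorPolicy(unsigned Requested, unsigned MaxThreadCount) {
  return Requested ? Requested : MaxThreadCount;
}
```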

mehdi_amini added inline comments.Feb 27 2020, 10:42 PM
llvm/lib/Support/Threading.cpp
94
Testing showed degraded performance

I don't understand why this is relevant to anything other than the default. This is a library, and if the caller overrides the default then IMO this layer should just let it be!

The current issue with Bulldozer is a perfect example of this: the user is *requesting* more threads, so why wouldn't you honor that request? If they want to shoot themselves in the foot, I'd let them (in most cases the user knows something you don't, as here).

I don't think we need the extra AcknowledgeDegradedPerformance flag: if the client overrides the default, they can be assumed to know what they're doing (or they shouldn't use such an API in the first place). This seems more in line with the rest of the system (and C++ APIs in general, I believe).

With this patch, the Threading.PhysicalConcurrency unit test fails when run with an affinity smaller than the number of physical CPUs. I've raised https://bugs.llvm.org/show_bug.cgi?id=45556.