This is an archive of the discontinued LLVM Phabricator instance.

Differential D78408

[llvm-cov] Prevent llvm-cov from using too many threads
Closed · Public

Authored by aganea on Apr 17 2020, 5:33 PM.

Details

Reviewers

phosek
MaskRay
JDevlieghere
friss

Commits

rG0e13a0331fb9: [llvm-cov] Prevent llvm-cov from using too many threads

Summary

As requested here: https://reviews.llvm.org/D75153#1987272

Before, each instance of llvm-cov created one thread per hardware core, which probably wasn't needed because the number of inputs was small. This was probably causing a thread rlimit issue on systems with a large core count.

After this patch, the behavior from before rG8404aeb5 is restored:

If --num-threads is not specified, we create one thread per input, up to num.cores.
When specified, --num-threads indicates any number of threads, with no upper limit.
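
The two rules above can be sketched as a small standalone function. This is only an illustration, not the code from the patch: `pickThreadCount` and its parameters are hypothetical names, the core count is passed in rather than queried from the system, and the "at least one thread" floor for an empty input list is an assumption.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper, not part of LLVM: chooses how many worker threads to
// create from the --num-threads value, the number of inputs, and the number
// of cores reported by the system.
unsigned pickThreadCount(unsigned NumThreadsFlag, std::size_t NumInputs,
                         unsigned NumCores) {
  // An explicit --num-threads value is honored as-is, with no upper limit.
  if (NumThreadsFlag > 0)
    return NumThreadsFlag;
  // Otherwise: one thread per input, capped at the core count.
  unsigned Wanted =
      static_cast<unsigned>(std::min<std::size_t>(NumInputs, NumCores));
  // Assumption: never return zero, even when there are no inputs.
  return std::max(1u, Wanted);
}
```

With 3 inputs on a 64-core machine this picks 3 threads instead of 64, which is the reduction the patch is after.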

Petr, could you please confirm that this patch fixes the issue?

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

aganea created this revision. Apr 17 2020, 5:33 PM

Herald added a project: Restricted Project. Apr 17 2020, 5:33 PM

Herald added subscribers: llvm-commits, hiraditya.

Harbormaster failed remote builds in B53803: Diff 258450! Apr 17 2020, 5:34 PM

I think we should respect ThreadsRequested if it is greater than 0.

For the particular llvm-cov issue, we can probably restore the previous behavior:

-    NumThreads =
-        std::max(1U, std::min(llvm::heavyweight_hardware_concurrency(),
-                              unsigned(SourceFiles.size())));
+    NumThreads = SourceFiles.size();

In D78408#1990302, @MaskRay wrote:

I think we should respect ThreadsRequested if it is greater than 0.

Nothing is changed in that regard. Only when .Limit is set.

In D78408#1990302, @MaskRay wrote:

For the particular llvm-cov issue, we can probably restore the previous behavior:

I cannot restore the code exactly how it was before because of the reasons mentioned in rG8404aeb5.
I rewrote it in a way which is simpler to understand. I also fixed the other places that suffered from the same issue.

A better fix could be to lazily create threads in the ThreadPool when jobs are pushed through async(). The issue here is that we are currently creating too many threads by default, whereas before we were limiting to one thread per input when the number of inputs was small.

Herald added a reviewer: JDevlieghere. Apr 20 2020, 9:16 AM

Typos.

MaskRay added inline comments. Apr 20 2020, 10:11 AM

llvm/lib/Support/Threading.cpp
88	Note that `sys::getHostNumPhysicalCores` can return -1 in some cases (on some OS/arch). In such cases we will return `1` if `ThreadsRequested == 0`. I still think that if the user asks for more threads than the CPU supports, we should respect that. For one thing, not every task can fully leverage 100% of the computing power of a core.

aganea marked 4 inline comments as done. Apr 20 2020, 1:23 PM

aganea added inline comments.

llvm/lib/Support/Threading.cpp
88	Sorry I'm a bit slow. I completely agree with "if the user asks for more threads than the CPU supports, we should respect that", but I don't understand in which case this isn't satisfied as it stands. `ThreadsRequested == 0` means 'use the default', which is `MaxThreadCount`. Should we return `std::thread::hardware_concurrency()` if `MaxThreadCount == 1`? If the user asks for anything > 0, we currently satisfy that requirement, except when we don't want to automatically spawn more than `MaxThreadCount` threads (thus the new `Limit` variable). Note, `Limit = false` by default.
llvm/tools/llvm-cov/CodeCoverage.cpp
950	I can also do `S.ThreadsRequested = std::min(SourceFiles.size(), heavyweight_hardware_concurrency().compute_thread_count());` here instead of the `.Limit` logic. But I find that the code then becomes more difficult to read.
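
The cases discussed in these inline comments can be checked against a simplified stand-in for the strategy. The branch structure below follows the `compute_thread_count()` body in this revision's diff, but `StrategySketch` itself is hypothetical, and the hardware count is passed in as a parameter (instead of being queried) so that the -1 failure case can be exercised.

```cpp
#include <algorithm>

// Hypothetical, simplified model of the strategy in this patch.
struct StrategySketch {
  unsigned ThreadsRequested = 0; // 0 means "use the default".
  bool Limit = false;            // If set, cap the request at the hardware.

  // HardwareCount stands in for the hardware thread or core query, which
  // may report -1 on some platforms.
  unsigned computeThreadCount(int HardwareCount) const {
    // Fall back to a single thread when the hardware query fails.
    int MaxThreadCount = HardwareCount <= 0 ? 1 : HardwareCount;
    if (ThreadsRequested == 0)
      return static_cast<unsigned>(MaxThreadCount);
    // Without Limit, an explicit request is always respected.
    if (!Limit)
      return ThreadsRequested;
    return std::min(static_cast<unsigned>(MaxThreadCount), ThreadsRequested);
  }
};
```

In this model, a request for 64 threads on a 16-thread machine yields 64 when `Limit` is false and 16 when it is true, while a request of 0 yields the hardware count, or 1 if the query failed.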

+@friss @aprantl @JDevlieghere for dsymutil changes

aprantl added a reviewer: friss. Apr 20 2020, 4:29 PM

The dsymutil part looks fine, I don't expect there to be many cases where we have more architectures than cores :-)

Ping! This seemed to be a blocker for @phosek.

This LGTM, but I haven't yet been able to verify it on our side. Getting a toolchain with a custom patch used by our coverage infrastructure is something we don't support right now and I still haven't managed to get that working (the issue only reproduces on our bots, I haven't managed to reproduce that issue locally on my workstation). I'd be fine with landing this and following up if there are any issues once we roll this in.

This revision is now accepted and ready to land. Apr 22 2020, 9:29 AM

In D78408#1997212, @phosek wrote:

[...] and I still haven't managed to get that working (the issue only reproduces on our bots, I haven't managed to reproduce that issue locally on my workstation)

The issue is clearly related to the number of cores in the system, the amount of instances of LLVM exes that are running at once, and rlimits. Does your workstation have the same hardware config as the bots?

LGTM. I was waiting for people's response on whether the addition of `bool Limit = false` is necessary (it increases the complexity of the interface a bit). It seems that people are fine with it.

In D78408#1997273, @aganea wrote:

In D78408#1997212, @phosek wrote:

[...] and I still haven't managed to get that working (the issue only reproduces on our bots, I haven't managed to reproduce that issue locally on my workstation)

The issue is clearly related to the number of cores in the system, the amount of instances of LLVM exes that are running at once, and rlimits. Does your workstation have the same hardware config as the bots?

I have finally managed to do a full run on our bots and I can confirm that this addresses the issue.

Can we land this?

Closed by commit rG0e13a0331fb9: [llvm-cov] Prevent llvm-cov from using too many threads (authored by aganea). Apr 24 2020, 1:00 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/include/llvm/Support/Threading.h	4 lines
llvm/lib/Support/Threading.cpp	9 lines
llvm/tools/dsymutil/dsymutil.cpp	14 lines
llvm/tools/llvm-cov/CodeCoverage.cpp	16 lines
llvm/tools/llvm-cov/CoverageExporterJson.cpp	12 lines
llvm/tools/llvm-cov/CoverageReport.cpp	13 lines

Diff 259970

llvm/include/llvm/Support/Threading.h

@@ ... @@ public:
   // a suggested high bound, the runtime might choose a lower value (not
   // higher).
   unsigned ThreadsRequested = 0;

   // If SMT is active, use hyper threads. If false, there will be only one
   // std::thread per core.
   bool UseHyperThreads = true;

+  // If set, will constrain 'ThreadsRequested' to the number of hardware
+  // threads, or hardware cores.
+  bool Limit = false;
+
   /// Retrieves the max available threads for the current strategy. This
   /// accounts for affinity masks and takes advantage of all CPU sockets.
   unsigned compute_thread_count() const;

   /// Assign the current thread to an ideal hardware CPU or NUMA node. In a
   /// multi-socket system, this ensures threads are assigned to all CPU
   /// sockets. \p ThreadPoolNum represents a number bounded by [0,
   /// compute_thread_count()).

llvm/lib/Support/Threading.cpp

@@ ... @@
 }
 #endif

 #else

 int computeHostNumHardwareThreads();

 unsigned llvm::ThreadPoolStrategy::compute_thread_count() const {
-  if (ThreadsRequested > 0)
-    return ThreadsRequested;
-
   int MaxThreadCount = UseHyperThreads ? computeHostNumHardwareThreads()
                                        : sys::getHostNumPhysicalCores();
   if (MaxThreadCount <= 0)
     MaxThreadCount = 1;
+  if (ThreadsRequested == 0)
   return MaxThreadCount;
+  if (!Limit)
+    return ThreadsRequested;
+  return std::min((unsigned)MaxThreadCount, ThreadsRequested);
 }

 namespace {
 struct SyncThreadInfo {
   void (*UserFn)(void *);
   void *UserData;
 };

llvm/tools/dsymutil/dsymutil.cpp

@@ ... @@ for (auto &InputFile : Options.InputFiles) {
     if (DebugMapPtrsOrErr->empty()) {
       WithColor::error() << "no architecture to link\n";
       return 1;
     }

     // Shared a single binary holder for all the link steps.
     BinaryHolder BinHolder;

-    unsigned ThreadCount = Options.LinkOpts.Threads;
-    if (!ThreadCount)
-      ThreadCount = DebugMapPtrsOrErr->size();
-    ThreadPool Threads(hardware_concurrency(ThreadCount));
+    ThreadPoolStrategy S = hardware_concurrency(Options.LinkOpts.Threads);
+    if (Options.LinkOpts.Threads == 0) {
+      // If NumThreads is not specified, create one thread for each input, up to
+      // the number of hardware threads.
+      S.ThreadsRequested = DebugMapPtrsOrErr->size();
+      S.Limit = true;
+    }
+    ThreadPool Threads(S);

     // If there is more than one link to execute, we need to generate
     // temporary files.
     const bool NeedsTempFiles =
         !Options.DumpDebugMap && (Options.OutputFile != "-") &&
         (DebugMapPtrsOrErr->size() != 1 || Options.LinkOpts.Update);
     const bool Verify = Options.Verify && !Options.LinkOpts.NoOutput;

@@ ... @@ for (auto &Map : *DebugMapPtrsOrErr) {
         if (Verify)
           AllOK.fetch_and(verify(OutputFile, Map->getTriple().getArchName(),
                                  Options.Verbose));
       };

       // FIXME: The DwarfLinker can have some very deep recursion that can max
       // out the (significantly smaller) stack when using threads. We don't
       // want this limitation when we only have a single thread.
-      if (ThreadCount == 1)
+      if (S.ThreadsRequested == 1)
         LinkLambda(OS, Options.LinkOpts);
       else
         Threads.async(LinkLambda, OS, Options.LinkOpts);
     }

     Threads.wait();

     if (!AllOK)

llvm/tools/llvm-cov/CodeCoverage.cpp

@@ ... @@ if (!Filters.empty()) {
     return 0;
   }

   // Show files
   bool ShowFilenames =
       (SourceFiles.size() != 1) || ViewOpts.hasOutputDirectory() ||
       (ViewOpts.Format == CoverageViewOptions::OutputFormat::HTML);

-  auto NumThreads = ViewOpts.NumThreads;
-
-  // If NumThreads is not specified, auto-detect a good default.
-  if (NumThreads == 0)
-    NumThreads = SourceFiles.size();
+  ThreadPoolStrategy S = hardware_concurrency(ViewOpts.NumThreads);
+  if (ViewOpts.NumThreads == 0) {
+    // If NumThreads is not specified, create one thread for each input, up to
+    // the number of hardware cores.
+    S = heavyweight_hardware_concurrency(SourceFiles.size());
+    S.Limit = true;
+  }

-  if (!ViewOpts.hasOutputDirectory() || NumThreads == 1) {
+  if (!ViewOpts.hasOutputDirectory() || S.ThreadsRequested == 1) {
     for (const std::string &SourceFile : SourceFiles)
       writeSourceFileView(SourceFile, Coverage.get(), Printer.get(),
                           ShowFilenames);
   } else {
     // In -output-dir mode, it's safe to use multiple threads to print files.
-    ThreadPool Pool(heavyweight_hardware_concurrency(NumThreads));
+    ThreadPool Pool(S);
     for (const std::string &SourceFile : SourceFiles)
       Pool.async(&CodeCoverageTool::writeSourceFileView, this, SourceFile,
                  Coverage.get(), Printer.get(), ShowFilenames);
     Pool.wait();
   }

   return 0;
 }

llvm/tools/llvm-cov/CoverageExporterJson.cpp

@@ ... @@ json::Object renderFile(const coverage::CoverageMapping &Coverage,
   File["summary"] = renderSummary(FileReport);
   return File;
 }

 json::Array renderFiles(const coverage::CoverageMapping &Coverage,
                         ArrayRef<std::string> SourceFiles,
                         ArrayRef<FileCoverageSummary> FileReports,
                         const CoverageViewOptions &Options) {
-  auto NumThreads = Options.NumThreads;
-  if (NumThreads == 0)
-    NumThreads = SourceFiles.size();
-  ThreadPool Pool(heavyweight_hardware_concurrency(NumThreads));
+  ThreadPoolStrategy S = hardware_concurrency(Options.NumThreads);
+  if (Options.NumThreads == 0) {
+    // If NumThreads is not specified, create one thread for each input, up to
+    // the number of hardware cores.
+    S = heavyweight_hardware_concurrency(SourceFiles.size());
+    S.Limit = true;
+  }
+  ThreadPool Pool(S);
   json::Array FileArray;
   std::mutex FileArrayMutex;

   for (unsigned I = 0, E = SourceFiles.size(); I < E; ++I) {
     auto &SourceFile = SourceFiles[I];
     auto &FileReport = FileReports[I];
     Pool.async([&] {
       auto File = renderFile(Coverage, SourceFile, FileReport, Options);

llvm/tools/llvm-cov/CoverageReport.cpp

@@ ... @@ void CoverageReport::prepareSingleFileReport(const StringRef Filename,
   }
 }

 std::vector<FileCoverageSummary> CoverageReport::prepareFileReports(
     const coverage::CoverageMapping &Coverage, FileCoverageSummary &Totals,
     ArrayRef<std::string> Files, const CoverageViewOptions &Options,
     const CoverageFilter &Filters) {
   unsigned LCP = getRedundantPrefixLen(Files);
-  auto NumThreads = Options.NumThreads;

-  // If NumThreads is not specified, auto-detect a good default.
-  if (NumThreads == 0)
-    NumThreads = Files.size();
-  ThreadPool Pool(heavyweight_hardware_concurrency(NumThreads));
+  ThreadPoolStrategy S = hardware_concurrency(Options.NumThreads);
+  if (Options.NumThreads == 0) {
+    // If NumThreads is not specified, create one thread for each input, up to
+    // the number of hardware cores.
+    S = heavyweight_hardware_concurrency(Files.size());
+    S.Limit = true;
+  }
+  ThreadPool Pool(S);

   std::vector<FileCoverageSummary> FileReports;
   FileReports.reserve(Files.size());

   for (StringRef Filename : Files) {
     FileReports.emplace_back(Filename.drop_front(LCP));
     Pool.async(&CoverageReport::prepareSingleFileReport, Filename,
                &Coverage, Options, LCP, &FileReports.back(), &Filters);