This patch simplifies the loop inside each worker by extracting index retrieval into a lambda function.
Diff Detail
Repository: rG LLVM Github Monorepo
clang/tools/clang-scan-deps/ClangScanDeps.cpp:804

Why are we second-guessing the ThreadPool at all? I would think we should do

    for (unsigned Index = 0, E = Inputs.size(); Index != E; ++Index) {
      Pool.async([&, Index] { ... });
    }

Then the thread pool is responsible for dispatching the tasks when it has available resources, instead of us manually looping inside the threads.
clang/tools/clang-scan-deps/ClangScanDeps.cpp:804

I guess it was originally done this way because we have a number of DependencyScanning{Tool,Worker} instances (the same number as there are threads in the pool) that we want to reuse to get the advantage of local caching in DependencyScanningWorkerFilesystem. I guess having one Worker instance "pinned" to one thread might get us better memory access patterns? (I'm just guessing here.)

We could implement your simplification and keep the "pinning" if the thread pool gave us the index of the thread the async task is running on, but it doesn't seem like we have that API. That means we'd probably need to implement some Worker queue that each async task would take from. I'm not sure that's better/simpler than a monotonically increasing input index. Maybe @Bigcheese can explain this in more detail?
> I guess it was originally done this way because we have a number of DependencyScanning{Tool,Worker} instances
Oh of course, this makes sense. Yeah we could maybe find a different way to do it if we care, but this is a sufficiently good reason to keep doing what we're doing and shouldn't hold you up.
Thanks for explaining; LGTM.