
[Support] ThreadPool: Don't spawn any threads when ThreadCount = 1
Needs Review · Public

Authored by vsk on Oct 19 2016, 11:55 AM.

Details

Summary

This should save clients the time/memory overhead of spawning a thread when ThreadCount (i.e. std::thread::hardware_concurrency()) is 1. IMO this is also a de facto simplification of the ThreadPool API, because clients no longer have to maintain slightly different code paths for the ThreadCount = 1 and ThreadCount > 1 cases.
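As a rough illustration of the intended behavior (a standalone sketch only, not LLVM's actual ThreadPool implementation — the class and member names here are invented): with ThreadCount <= 1 the pool spawns no workers and async() runs each task inline, so clients use the same async()/wait() calls whether or not threads exist.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical sketch: a pool that, with ThreadCount <= 1, runs each
// task inline in async() instead of spawning any worker thread.
class MiniPool {
public:
  explicit MiniPool(unsigned ThreadCount) {
    if (ThreadCount <= 1)
      return; // sequential mode: no workers spawned
    for (unsigned I = 0; I != ThreadCount; ++I)
      Workers.emplace_back([this] { workerLoop(); });
  }

  void async(std::function<void()> Task) {
    if (Workers.empty()) {
      Task(); // run inline; nothing left to wait for later
      return;
    }
    {
      std::lock_guard<std::mutex> L(M);
      Tasks.push(std::move(Task));
    }
    CV.notify_one();
  }

  void wait() {
    if (Workers.empty())
      return; // every task already ran inline in async()
    std::unique_lock<std::mutex> L(M);
    Done.wait(L, [this] { return Tasks.empty() && Active == 0; });
  }

  ~MiniPool() {
    {
      std::lock_guard<std::mutex> L(M);
      Stop = true;
    }
    CV.notify_all();
    for (auto &W : Workers)
      W.join();
  }

private:
  void workerLoop() {
    for (;;) {
      std::function<void()> Task;
      {
        std::unique_lock<std::mutex> L(M);
        CV.wait(L, [this] { return Stop || !Tasks.empty(); });
        if (Stop && Tasks.empty())
          return;
        Task = std::move(Tasks.front());
        Tasks.pop();
        ++Active; // counted so wait() knows work is still in flight
      }
      Task();
      {
        std::lock_guard<std::mutex> L(M);
        --Active;
      }
      Done.notify_all();
    }
  }

  std::vector<std::thread> Workers;
  std::queue<std::function<void()>> Tasks;
  std::mutex M;
  std::condition_variable CV, Done;
  unsigned Active = 0;
  bool Stop = false;
};
```

Note that client code is identical in both modes: construct, async() a batch of tasks, wait().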

Tested by running unittests/Support/SupportTests configured with -DLLVM_ENABLE_THREADS={On,Off}.


Event Timeline

vsk updated this revision to Diff 75185.Oct 19 2016, 11:55 AM
vsk retitled this revision from to [Support] ThreadPool: Don't spawn any threads when ThreadCount = 1.
vsk updated this object.
vsk added a subscriber: llvm-commits.

mehdi_amini added a comment.

Thinking a bit more about it, I'm not sure it is intuitive: the client could do something else after issuing a task to the pool. After this patch, this computation would no longer occur in parallel.

mehdi_amini removed a subscriber: mehdi_amini.
vsk added a comment.Oct 19 2016, 12:45 PM

> Thinking a bit more about it, I'm not sure it is intuitive: the client could do something else after issuing a task to the pool. After this patch, this computation would no longer occur in parallel.

Fair enough. Do you think the sequential behavior makes sense when ThreadCount = 0? I.e., should we make ThreadPool(0) behave the same way regardless of whether LLVM_ENABLE_THREADS=On|Off? Currently, the sequential behavior does not seem possible when LLVM_ENABLE_THREADS=On.

mehdi_amini edited edge metadata.Oct 19 2016, 12:50 PM

That seems fine and makes sense to me from an API point of view.
I'm not sure how to translate this to the user though: I don't expect someone to run -threads=0. And we're back to special casing in the client: ThreadPool Pool(ThreadCount == 1 ? 0 : ThreadCount); which isn't great either.

vsk added a comment.Oct 19 2016, 12:58 PM

> That seems fine and makes sense to me from an API point of view.
> I'm not sure how to translate this to the user though: I don't expect someone to run -threads=0. And we're back to special casing in the client: ThreadPool Pool(ThreadCount == 1 ? 0 : ThreadCount); which isn't great either.

Good point. Wdyt of adding 'bool PreferSequential = false' to the constructors? The meaning of 'PreferSequential' would be ThreadCount = (PreferSequential && MaxThreadCount == 1) ? 0 : MaxThreadCount. That way, users can opt in to executing all the tasks in wait().
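For concreteness, the proposed mapping can be sketched as a free function (the function name here is hypothetical; the patch would fold this into the constructor):

```cpp
#include <cassert>

// Hypothetical sketch of the proposed rule: 'PreferSequential' only takes
// effect when the pool would have had a single worker thread anyway, in
// which case the effective thread count drops to 0 (fully sequential).
unsigned effectiveThreadCount(unsigned MaxThreadCount, bool PreferSequential) {
  return (PreferSequential && MaxThreadCount == 1) ? 0 : MaxThreadCount;
}
```

With this rule, a client can always pass its hardware concurrency plus one flag, instead of special-casing the single-threaded configuration itself.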

mehdi_amini added a comment.

In D25784#574651, @vsk wrote:

>> That seems fine and makes sense to me from an API point of view.
>> I'm not sure how to translate this to the user though: I don't expect someone to run -threads=0. And we're back to special casing in the client: ThreadPool Pool(ThreadCount == 1 ? 0 : ThreadCount); which isn't great either.
>
> Good point. Wdyt of adding 'bool PreferSequential = false' to the constructors? The meaning of 'PreferSequential' would be ThreadCount = (PreferSequential && MaxThreadCount == 1) ? 0 : MaxThreadCount. That way, users can opt in to executing all the tasks in wait().

It looks weird to create a thread pool while saying at the same time that I "prefer sequential": ThreadPool Pool(MaxThreadCount, /* PreferSequential */ true); it's almost contradictory (though I see how it could simplify the client code somewhat).

This "optimization" seems like a difficult thing to fit into the API. I'm not very inspired right now, so I don't have any great suggestions.

By the way, did you have a particular client in mind? The pool isn't really designed for very lightweight tasks but rather "heavier" ones, where the cost of spawning a thread shouldn't be significant.

vsk added a comment.Oct 19 2016, 2:40 PM

>> Good point. Wdyt of adding 'bool PreferSequential = false' to the constructors? The meaning of 'PreferSequential' would be ThreadCount = (PreferSequential && MaxThreadCount == 1) ? 0 : MaxThreadCount. That way, users can opt in to executing all the tasks in wait().
>
> It looks weird to create a thread pool while saying at the same time that I "prefer sequential": ThreadPool Pool(MaxThreadCount, /* PreferSequential */ true); it's almost contradictory (though I see how it could simplify the client code somewhat).

Hm, yeah.

> This "optimization" seems like a difficult thing to fit into the API. I'm not very inspired right now, so I don't have any great suggestions.

> By the way, did you have a particular client in mind? The pool isn't really designed for very lightweight tasks but rather "heavier" ones, where the cost of spawning a thread shouldn't be significant.

I'll try and think of a more natural way to do this. Maybe adding support for the ThreadCount = 0 case is the simplest option.

The client I had in mind is llvm-cov, where each thread writes out a coverage report to disk. Spawning a thread is not particularly expensive in this context, but it would be nice to have just one code path for the threaded and non-threaded cases instead of two.

mehdi_amini added a comment.

Yeah, right now supporting ThreadCount == 0 seems like the most straightforward and least intrusive way to get this into the API.

> The client I had in mind is llvm-cov, where each thread writes out a coverage report to disk. Spawning a thread is not particularly expensive in this context, but it would be nice to have just one code path for the threaded and non-threaded cases instead of two.

If spawning a thread is not particularly expensive, why would you bother having a separate code path?
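One way to picture the ThreadCount == 0 mode under discussion (a hypothetical standalone sketch, not the actual patch — the class name is invented): async() only queues the work, and wait() drains the queue on the calling thread, so the client keeps a single async()/wait() code path either way.

```cpp
#include <cassert>
#include <functional>
#include <queue>

// Hypothetical sketch: a ThreadCount == 0 "pool" that defers every task.
// async() queues the callable; wait() runs the queued tasks sequentially,
// in submission order, on the calling thread.
class DeferredPool {
  std::queue<std::function<void()>> Tasks;

public:
  void async(std::function<void()> Task) { Tasks.push(std::move(Task)); }

  void wait() {
    while (!Tasks.empty()) {
      Tasks.front()(); // execute on the caller's thread
      Tasks.pop();
    }
  }
};
```

This matches the "execute all the tasks in wait()" idea from earlier in the thread: no task runs until wait(), which mirrors the threaded pool's semantics more closely than running tasks inline in async() would.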

vsk added a comment.Oct 19 2016, 2:53 PM

> If spawning a thread is not particularly expensive, why would you bother having a separate code path?

I got a report about thread-related flakiness in llvm-cov (PR30735), and thought that avoiding thread creation could mitigate it.

mehdi_amini added a comment.

I may be missing something, but I see this as "hiding a latent bug", which is not a great motivation (it means there might be a latent bug in the threading infrastructure and/or llvm-cov, possibly on a particular platform, and you'd be reducing the test coverage of that aspect).

vsk added a comment.Oct 19 2016, 5:03 PM

> I may be missing something, but I see this as "hiding a latent bug", which is not a great motivation (it means there might be a latent bug in the threading infrastructure and/or llvm-cov, possibly on a particular platform, and you'd be reducing the test coverage of that aspect).

In this particular case (PR30735), we still have tests that stress the multi-threaded path on the affected platform, so the coverage hasn't been removed altogether. I don't expect this kind of change to eradicate the bug, and am still searching for the root cause.

Separately, I still think it should be possible to avoid paying the thread creation overhead when using ThreadPool with LLVM_ENABLE_THREADS=On.

Maybe ThreadPool Pool(ThreadCount == 1 ? 0 : ThreadCount); is the best we can do for now?