- Merge QueueLock and CompletionLock.
- Avoid spurious CompletionCondition.notify_all() when ActiveThreads is greater than 0.
- Use default member initializers.
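For readers skimming the summary, here is a minimal, self-contained sketch of the merged-lock scheme the bullets describe. It is reconstructed from this review thread, not copied from llvm/lib/Support/ThreadPool.cpp; the class name `SketchThreadPool` and the exact loop structure are illustrative only.

```cpp
// Sketch only: one mutex (QueueLock) guards both the task queue and
// ActiveThreads, and CompletionCondition is notified only when the last
// active worker finishes and the queue is empty.
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class SketchThreadPool {
public:
  explicit SketchThreadPool(unsigned ThreadCount) {
    for (unsigned I = 0; I < ThreadCount; ++I)
      Threads.emplace_back([this] { work(); });
  }

  void async(std::function<void()> Task) {
    {
      std::lock_guard<std::mutex> Guard(QueueLock);
      Tasks.push(std::move(Task));
    }
    QueueCondition.notify_one();
  }

  // Block until the queue is drained and no worker is running a task.
  void wait() {
    std::unique_lock<std::mutex> Guard(QueueLock);
    CompletionCondition.wait(
        Guard, [this] { return !ActiveThreads && Tasks.empty(); });
  }

  ~SketchThreadPool() {
    {
      std::lock_guard<std::mutex> Guard(QueueLock);
      Stop = true;
    }
    QueueCondition.notify_all();
    for (std::thread &T : Threads)
      T.join();
  }

private:
  void work() {
    for (;;) {
      std::function<void()> Task;
      {
        std::unique_lock<std::mutex> Guard(QueueLock);
        QueueCondition.wait(Guard, [this] { return Stop || !Tasks.empty(); });
        if (Tasks.empty())
          return; // Stop was requested and no work is left.
        Task = std::move(Tasks.front());
        Tasks.pop();
        ++ActiveThreads; // Still under QueueLock, so wait() sees a consistent state.
      }
      Task(); // Run the task without holding any lock.
      bool Notify;
      {
        std::lock_guard<std::mutex> Guard(QueueLock);
        --ActiveThreads;
        Notify = !ActiveThreads && Tasks.empty();
      }
      // Only the last finishing worker of an empty queue notifies, which
      // avoids spurious wake-ups of threads blocked in wait().
      if (Notify)
        CompletionCondition.notify_all();
    }
  }

  std::mutex QueueLock;
  std::condition_variable QueueCondition;
  std::condition_variable CompletionCondition;
  std::queue<std::function<void()>> Tasks;
  std::vector<std::thread> Threads;
  unsigned ActiveThreads = 0; // Default member initializers, as in the patch.
  bool Stop = false;
};
```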
Event Timeline
Isn't this technically pessimizing some cases by sharing a lock (and so increasing potential contention here)?
llvm/lib/Support/ThreadPool.cpp | |
---|---|
57 | Can you expand this over different variables/statements? Or at a minimum add explicit parentheses? I am sure the compiler parses it, but not me :) |
59–62 | Update the comment here |
No. Only the thread(s) calling ThreadPool::wait() wait on CompletionCondition. When such a thread blocks, it will not be woken spuriously. Note that before, we took locks three times in the loop body; now it is two.
I believe this version is strictly better.
llvm/lib/Support/ThreadPool.cpp | |
---|---|
57 | Is it worth generalizing the notify condition between this and ThreadPool::wait() below, to ease future maintenance/comprehension? |
> No. Only the thread(s) calling ThreadPool::wait() wait on CompletionCondition. When such a thread blocks, it will not be woken spuriously. Note that before, we took locks three times in the loop body; now it is two.
I wasn't worried about contention with the waiting threads, but between working threads: they are taking the QueueLock more often now (it was taken once or twice per iteration of the loop body before and is now taken an extra time).
I am not sure we'd be able to measure any difference here, just curious how you thought about this tradeoff!
Originally I think I started writing all this code with ActiveThreads being an atomic so that the working threads could increment/decrement it without taking any lock; however, when adding wait() I couldn't see how to avoid taking the CompletionLock, because of how std::condition_variable is set up.
That said, since the waiting thread takes the QueueLock with your change, would leaving ActiveThreads atomic allow us to avoid taking the QueueLock on task completion?
I'm wondering about something like this:
```cpp
// Notify task completion if this is the last active thread, in case
// someone waits on ThreadPool::wait().
bool Notify = --ActiveThreads == 0;
if (Notify)
  CompletionCondition.notify_all();
```
llvm/lib/Support/ThreadPool.cpp | |
---|---|
57 | Can you add parentheses or split the expression? `Notify = (--ActiveThreads == 0) && Tasks.empty();` or `--ActiveThreads; Notify = !ActiveThreads && Tasks.empty();` The latter avoids the side effect on ActiveThreads in the expression, making the second line "pure" and easier to read. |
There are two stages, before and after the execution of a task. The original code sequence used different locks. I suppose this is what you meant by less contention: a worker in the first stage (holding QueueLock while not holding CompletionLock) cannot contend with a worker in the second stage (holding CompletionLock); a rough reconstruction of that pre-patch shape is sketched after the two notes below.
But note that:
- there is still a window when a worker in the first stage can hold CompletionLock. The window is small, but it still exists.
- the spurious signalling of a condition variable (CompletionCondition) can cause contention because a condition variable has an internal lock.
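A hedged reconstruction of the pre-patch shape described above, based on this thread rather than the actual old source; the struct and function names are invented for illustration. Note the small window in the first stage where CompletionLock is taken while QueueLock is still held.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>

// Illustrative only: member names mirror the discussion above.
struct TwoLockWorkerSketch {
  std::mutex QueueLock, CompletionLock;
  std::condition_variable QueueCondition, CompletionCondition;
  std::queue<std::function<void()>> Tasks;
  unsigned ActiveThreads = 0;

  void runOneTask() {
    std::function<void()> Task;
    { // First stage: pick a task under QueueLock.
      std::unique_lock<std::mutex> Guard(QueueLock);
      QueueCondition.wait(Guard, [this] { return !Tasks.empty(); });
      { // The small window: CompletionLock is held while QueueLock is held.
        std::lock_guard<std::mutex> CompletionGuard(CompletionLock);
        ++ActiveThreads;
      }
      Task = std::move(Tasks.front());
      Tasks.pop();
    }
    Task(); // Execute outside both locks.
    { // Second stage: completion accounting under CompletionLock only.
      std::lock_guard<std::mutex> CompletionGuard(CompletionLock);
      --ActiveThreads;
    }
    // The unconditional notify discussed above: spurious when other workers
    // are still active, and the condition variable has its own internal lock.
    CompletionCondition.notify_all();
  }
};
```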
> Originally I think I started writing all this code with ActiveThreads being an atomic so that the working threads could increment/decrement it without taking any lock; however, when adding wait() I couldn't see how to avoid taking the CompletionLock, because of how std::condition_variable is set up.
>
> That said, since the waiting thread takes the QueueLock with your change, would leaving ActiveThreads atomic allow us to avoid taking the QueueLock on task completion? I'm wondering about something like this:
>
> ```cpp
> // Notify task completion if this is the last active thread, in case
> // someone waits on ThreadPool::wait().
> bool Notify = --ActiveThreads == 0;
> if (Notify)
>   CompletionCondition.notify_all();
> ```
The suggested code sequence has the problem of lost signals: a worker thread can change the observed state in the window between the waiter's test and its block.
```
worker: pop one item from Tasks; Tasks is now empty
waiter: lock QueueLock
waiter: test that `!ActiveThreads && Tasks.empty()` is false and decide to unlock and block
worker: decrement ActiveThreads to zero
worker: signal CompletionCondition   # the signal is lost
waiter: unlock QueueLock and block
```
With an atomic ActiveThreads, an alternative code sequence is possible, but I'm not sure it is better:
```cpp
bool Notify = --ActiveThreads == 0;
{
  lock_guard guard(QueueLock);
  Notify &= Tasks.empty();
}
if (Notify)
  CompletionCondition.notify_all();
```
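To make the difference concrete, here is a hedged, self-contained rendering of the two completion sequences being compared. It is not from the actual patch; `CompletionSketch` and the function names are invented for illustration, and the waiter is assumed to test `!ActiveThreads && Tasks.empty()` under QueueLock as in the trace above.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>

struct CompletionSketch {
  std::mutex QueueLock;
  std::condition_variable CompletionCondition;
  std::queue<std::function<void()>> Tasks;
  std::atomic<unsigned> ActiveThreads{0};

  // Sequence suggested earlier in the thread: neither the decrement nor the
  // notify takes QueueLock, so the notify can fire while a waiter still holds
  // QueueLock after seeing a false predicate and before blocking; that
  // notification is lost (the trace above).
  void finishTaskLostSignal() {
    bool Notify = --ActiveThreads == 0;
    if (Notify)
      CompletionCondition.notify_all();
  }

  // Alternative just above: Tasks.empty() is checked under QueueLock, so the
  // notify cannot land in the waiter's test-to-block window; the worker
  // cannot take QueueLock until the waiter has atomically released it and
  // blocked, at which point the notify wakes it.
  void finishTaskChecked() {
    bool Notify = --ActiveThreads == 0;
    {
      std::lock_guard<std::mutex> Guard(QueueLock);
      Notify &= Tasks.empty();
    }
    if (Notify)
      CompletionCondition.notify_all();
  }
};
```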
llvm/lib/Support/ThreadPool.cpp | |
---|---|
57 | Sorry that I am not following this suggestion. Can you elaborate? |
llvm/lib/Support/ThreadPool.cpp | |
---|---|
57 | I meant a function, to be used by the code here and the code below in wait(), simply to signify (to users who read the code) that it should check exactly the same things? `private: bool workCompletedNoLock() { return !ActiveThreads && Tasks.empty(); }` |
Place `!ActiveThreads && Tasks.empty()` in a new member function
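For readers following along, a short sketch of how such a helper might be wired into both call sites; the surrounding class is hypothetical, and only the predicate and the member names come from the discussion above.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>

class HelperSketch {
  std::mutex QueueLock;
  std::condition_variable CompletionCondition;
  std::queue<std::function<void()>> Tasks;
  unsigned ActiveThreads = 0;

  // The suggested predicate; callers must already hold QueueLock.
  bool workCompletedNoLock() const { return !ActiveThreads && Tasks.empty(); }

public:
  // Call site 1: worker-side completion accounting.
  void onTaskFinished() {
    bool Notify;
    {
      std::lock_guard<std::mutex> Guard(QueueLock);
      --ActiveThreads;
      Notify = workCompletedNoLock();
    }
    if (Notify)
      CompletionCondition.notify_all();
  }

  // Call site 2: blocking until all queued work has completed, as in
  // ThreadPool::wait().
  void wait() {
    std::unique_lock<std::mutex> Guard(QueueLock);
    CompletionCondition.wait(Guard, [this] { return workCompletedNoLock(); });
  }
};
```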
llvm/lib/Support/ThreadPool.cpp | |
---|---|
57 | Gotcha. Good idea! |
Yes, this is what I meant: the "second stage" lock now contends with the first-stage queue manipulation, while it didn't before. It also contends with any thread trying to enqueue.
Overall, for a given execution there will be quite a lot more (almost doubled?) locking/unlocking of the QueueLock, right?
> - the spurious signalling of a condition variable (CompletionCondition) can cause contention because a condition variable has an internal lock.
We're getting into implementation- / platform-specific territory, but I thought that signaling was mostly going to map to one or two futex syscalls?
> The suggested code sequence has the problem of lost signals: a worker thread can change the observed state in the window between the waiter's test and its block.
> ...
Thanks! This is likely the sequence that led me to add this lock at the time... IIRC you can avoid this with low-level use of futex, but there isn't a portable solution.
Anyway, I'm fine with this change overall, I don't think this thread pool implementation is optimized for very small tasks anyway.
> Anyway, I'm fine with this change overall, I don't think this thread pool implementation is optimized for very small tasks anyway.
May I get an LGTM? :)