- Merge QueueLock and CompletionLock.
- Avoid spurious CompletionCondition.notify_all() when ActiveThreads is greater than 0.
- Use default member initializers.
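For readers skimming the summary, here is a minimal, self-contained sketch of the merged-lock scheme the bullets describe. It is reconstructed from this review thread, not copied from llvm/lib/Support/ThreadPool.cpp; the class name `SketchThreadPool` and the exact loop structure are illustrative only.

```cpp
// Sketch only: one mutex (QueueLock) guards both the task queue and
// ActiveThreads, and CompletionCondition is notified only when the last
// active worker finishes and the queue is empty.
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class SketchThreadPool {
public:
  explicit SketchThreadPool(unsigned ThreadCount) {
    for (unsigned I = 0; I < ThreadCount; ++I)
      Threads.emplace_back([this] { work(); });
  }

  void async(std::function<void()> Task) {
    {
      std::lock_guard<std::mutex> Guard(QueueLock);
      Tasks.push(std::move(Task));
    }
    QueueCondition.notify_one();
  }

  // Block until the queue is drained and no worker is running a task.
  void wait() {
    std::unique_lock<std::mutex> Guard(QueueLock);
    CompletionCondition.wait(
        Guard, [this] { return !ActiveThreads && Tasks.empty(); });
  }

  ~SketchThreadPool() {
    {
      std::lock_guard<std::mutex> Guard(QueueLock);
      Stop = true;
    }
    QueueCondition.notify_all();
    for (std::thread &T : Threads)
      T.join();
  }

private:
  void work() {
    for (;;) {
      std::function<void()> Task;
      {
        std::unique_lock<std::mutex> Guard(QueueLock);
        QueueCondition.wait(Guard, [this] { return Stop || !Tasks.empty(); });
        if (Tasks.empty())
          return; // Stop was requested and no work is left.
        Task = std::move(Tasks.front());
        Tasks.pop();
        ++ActiveThreads; // Still under QueueLock, so wait() sees a consistent state.
      }
      Task(); // Run the task without holding any lock.
      bool Notify;
      {
        std::lock_guard<std::mutex> Guard(QueueLock);
        --ActiveThreads;
        Notify = !ActiveThreads && Tasks.empty();
      }
      // Only the last finishing worker of an empty queue notifies, which
      // avoids spurious wake-ups of threads blocked in wait().
      if (Notify)
        CompletionCondition.notify_all();
    }
  }

  std::mutex QueueLock;
  std::condition_variable QueueCondition;
  std::condition_variable CompletionCondition;
  std::queue<std::function<void()>> Tasks;
  std::vector<std::thread> Threads;
  unsigned ActiveThreads = 0; // Default member initializers, as in the patch.
  bool Stop = false;
};
```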
Event Timeline
Isn't this technically pessimizing some cases by sharing a lock (and so increasing potential contention here)?
llvm/lib/Support/ThreadPool.cpp | |
---|---|
57 | Can you expand this over different variables/statements? Or at a minimum add explicit parentheses? I am sure the compiler parses it, but not me :) |
59–62 | Update the comment here |
No. Only the thread(s) calling ThreadPool::wait() wait on CompletionCondition. When such a thread blocks, it will not be woken spuriously. Note that before, we took locks three times in the loop body; now it is two.
I believe this version is strictly better.
llvm/lib/Support/ThreadPool.cpp | |
---|---|
57 | Is it worth generalizing the notify condition between this and ThreadPool::wait() below, to ease future maintenance/comprehension? |
> No. Only the thread(s) calling ThreadPool::wait() wait on CompletionCondition. When such a thread blocks, it will not be woken spuriously. Note that before, we took locks three times in the loop body; now it is two.
I wasn't worried about contention with the waiting threads, but between working threads: they are taking the QueueLock more often now (it was taken once or twice per iteration of the loop body before and is now taken an extra time).
I am not sure we'd be able to measure any difference here, just curious how you thought about this tradeoff!
Originally I think I started writing all this code with ActiveThreads being an atomic so that the working threads could increment/decrement it without taking any lock; however, when adding wait() I couldn't see how to avoid taking the CompletionLock, because of how std::condition_variable is set up.
That said, since the waiting thread takes the QueueLock with your change, would leaving ActiveThreads atomic allow us to avoid taking the QueueLock on task completion?
I'm wondering about something like this:
```cpp
// Notify task completion if this is the last active thread, in case
// someone waits on ThreadPool::wait().
bool Notify = --ActiveThreads == 0;
if (Notify)
  CompletionCondition.notify_all();
```
llvm/lib/Support/ThreadPool.cpp | |
---|---|
57 | Can you add parentheses or split the expression? `Notify = (--ActiveThreads == 0) && Tasks.empty();` or `--ActiveThreads; Notify = !ActiveThreads && Tasks.empty();` The latter avoids the side effect on ActiveThreads in the expression, making the second line "pure" and easier to read. |
There are two stages, before and after the execution of a task. The original code sequence used different locks. I suppose this is what you meant by less contention: a worker in the first stage (holding QueueLock while not holding CompletionLock) cannot contend with a worker in the second stage (holding CompletionLock); a rough reconstruction of that pre-patch shape is sketched after the two notes below.
But note that:
- there is still a window when a worker in the first stage can hold CompletionLock. The window is small, but it still exists.
- the spurious signalling of a condition variable (CompletionCondition) can cause contention because a condition variable has an internal lock.
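A hedged reconstruction of the pre-patch shape described above, based on this thread rather than the actual old source; the struct and function names are invented for illustration. Note the small window in the first stage where CompletionLock is taken while QueueLock is still held.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>

// Illustrative only: member names mirror the discussion above.
struct TwoLockWorkerSketch {
  std::mutex QueueLock, CompletionLock;
  std::condition_variable QueueCondition, CompletionCondition;
  std::queue<std::function<void()>> Tasks;
  unsigned ActiveThreads = 0;

  void runOneTask() {
    std::function<void()> Task;
    { // First stage: pick a task under QueueLock.
      std::unique_lock<std::mutex> Guard(QueueLock);
      QueueCondition.wait(Guard, [this] { return !Tasks.empty(); });
      { // The small window: CompletionLock is held while QueueLock is held.
        std::lock_guard<std::mutex> CompletionGuard(CompletionLock);
        ++ActiveThreads;
      }
      Task = std::move(Tasks.front());
      Tasks.pop();
    }
    Task(); // Execute outside both locks.
    { // Second stage: completion accounting under CompletionLock only.
      std::lock_guard<std::mutex> CompletionGuard(CompletionLock);
      --ActiveThreads;
    }
    // The unconditional notify discussed above: spurious when other workers
    // are still active, and the condition variable has its own internal lock.
    CompletionCondition.notify_all();
  }
};
```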
> Originally I think I started writing all this code with ActiveThreads being an atomic so that the working threads could increment/decrement it without taking any lock; however, when adding wait() I couldn't see how to avoid taking the CompletionLock, because of how std::condition_variable is set up.
>
> That said, since the waiting thread takes the QueueLock with your change, would leaving ActiveThreads atomic allow us to avoid taking the QueueLock on task completion? I'm wondering about something like this:
>
> ```cpp
> // Notify task completion if this is the last active thread, in case
> // someone waits on ThreadPool::wait().
> bool Notify = --ActiveThreads == 0;
> if (Notify)
>   CompletionCondition.notify_all();
> ```
The suggested code sequence has the problem of lost signals: a worker thread can change the observed state in the window between the waiter's test and its block.
```
worker: pop one item from Tasks; Tasks is now empty
waiter: lock QueueLock
waiter: test that `!ActiveThreads && Tasks.empty()` is false and decide to unlock and block
worker: decrement ActiveThreads to zero
worker: signal CompletionCondition   # the signal is lost
waiter: unlock QueueLock and block
```
With an atomic ActiveThreads, an alternative code sequence is possible, but I'm not sure it is better:
```cpp
bool Notify = --ActiveThreads == 0;
{
  lock_guard guard(QueueLock);
  Notify &= Tasks.empty();
}
if (Notify)
  CompletionCondition.notify_all();
```
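To make the difference concrete, here is a hedged, self-contained rendering of the two completion sequences being compared. It is not from the actual patch; `CompletionSketch` and the function names are invented for illustration, and the waiter is assumed to test `!ActiveThreads && Tasks.empty()` under QueueLock as in the trace above.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>

struct CompletionSketch {
  std::mutex QueueLock;
  std::condition_variable CompletionCondition;
  std::queue<std::function<void()>> Tasks;
  std::atomic<unsigned> ActiveThreads{0};

  // Sequence suggested earlier in the thread: neither the decrement nor the
  // notify takes QueueLock, so the notify can fire while a waiter still holds
  // QueueLock after seeing a false predicate and before blocking; that
  // notification is lost (the trace above).
  void finishTaskLostSignal() {
    bool Notify = --ActiveThreads == 0;
    if (Notify)
      CompletionCondition.notify_all();
  }

  // Alternative just above: Tasks.empty() is checked under QueueLock, so the
  // notify cannot land in the waiter's test-to-block window; the worker
  // cannot take QueueLock until the waiter has atomically released it and
  // blocked, at which point the notify wakes it.
  void finishTaskChecked() {
    bool Notify = --ActiveThreads == 0;
    {
      std::lock_guard<std::mutex> Guard(QueueLock);
      Notify &= Tasks.empty();
    }
    if (Notify)
      CompletionCondition.notify_all();
  }
};
```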
llvm/lib/Support/ThreadPool.cpp | |
---|---|
57 | Sorry that I am not following this suggestion. Can you elaborate? |
llvm/lib/Support/ThreadPool.cpp | |
---|---|
57 | I meant a function, to be used by the code here and the code below in wait(), simply to signify (to users who read the code) that it should check exactly the same things? `private: bool workCompletedNoLock() { return !ActiveThreads && Tasks.empty(); }` |
Place `!ActiveThreads && Tasks.empty()` in a new member function
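For readers following along, a short sketch of how such a helper might be wired into both call sites; the surrounding class is hypothetical, and only the predicate and the member names come from the discussion above.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>

class HelperSketch {
  std::mutex QueueLock;
  std::condition_variable CompletionCondition;
  std::queue<std::function<void()>> Tasks;
  unsigned ActiveThreads = 0;

  // The suggested predicate; callers must already hold QueueLock.
  bool workCompletedNoLock() const { return !ActiveThreads && Tasks.empty(); }

public:
  // Call site 1: worker-side completion accounting.
  void onTaskFinished() {
    bool Notify;
    {
      std::lock_guard<std::mutex> Guard(QueueLock);
      --ActiveThreads;
      Notify = workCompletedNoLock();
    }
    if (Notify)
      CompletionCondition.notify_all();
  }

  // Call site 2: blocking until all queued work has completed, as in
  // ThreadPool::wait().
  void wait() {
    std::unique_lock<std::mutex> Guard(QueueLock);
    CompletionCondition.wait(Guard, [this] { return workCompletedNoLock(); });
  }
};
```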
llvm/lib/Support/ThreadPool.cpp | |
---|---|
57 | Gotcha. Good idea! |
Yes, this is what I meant: the "second stage" lock now contends with the first-stage queue manipulation, while it didn't before. It also contends with any thread trying to enqueue.
Overall, for a given execution there will be quite a lot more (almost doubled?) locking/unlocking of the QueueLock, right?
> - the spurious signalling of a condition variable (CompletionCondition) can cause contention because a condition variable has an internal lock.
We're getting into implementation- / platform-specific territory, but I thought that signaling was mostly going to map to one or two futex syscalls?
> The suggested code sequence has the problem of lost signals: a worker thread can change the observed state in the window between the waiter's test and its block.
> ...
Thanks! This is likely the sequence that led me to add this lock at the time... IIRC you can avoid this with low-level use of futex, but there isn't a portable solution.
Anyway, I'm fine with this change overall, I don't think this thread pool implementation is optimized for very small tasks anyway.
> Anyway, I'm fine with this change overall, I don't think this thread pool implementation is optimized for very small tasks anyway.
May I get an LGTM? :)