This is an archive of the discontinued LLVM Phabricator instance.


Differential D123225

[ThreadPool] add ability to group tasks into separate groups
ClosedPublic

Authored by llunak on Apr 6 2022, 8:29 AM.


Details

Reviewers

chandlerc
mehdi_amini
MaskRay
aganea

Commits

rG8ef5710e6303: [ThreadPool] add ability to group tasks into separate groups

Summary

This is needed for parallelizing the loading of module symbols in LLDB (D122975). Currently LLDB can parallelize indexing symbols when loading a module, but modules themselves are loaded sequentially. If the LLDB index cache is enabled, this means that cache loading is not parallelized, even though it could be. However, doing that creates a thread-pool-within-a-thread-pool situation, so the number of threads would not be properly limited.

This change adds ThreadPoolTaskGroup as a tag type that can be used with ThreadPool calls to put tasks into groups that can be independently waited for (even recursively from within a task) but still run in the same thread pool.


Event Timeline

llunak created this revision. Apr 6 2022, 8:29 AM

Herald added a project: Restricted Project. Apr 6 2022, 8:29 AM

Herald added subscribers: dexonsmith, hiraditya.

llunak requested review of this revision. Apr 6 2022, 8:29 AM

Herald added a project: Restricted Project. Apr 6 2022, 8:29 AM

Herald added a subscriber: llvm-commits.

llunak mentioned this in D123226: [lldb] use one shared ThreadPool and task groups. Apr 6 2022, 8:32 AM

llunak mentioned this in D122975: parallelize calling of Module::PreloadSymbols().

llunak updated this revision to Diff 420890. Apr 6 2022, 8:35 AM

Harbormaster completed remote builds in B158252: Diff 420890. Apr 6 2022, 10:19 AM

Not sure @chandlerc will be able to review your patch; adding some people who have been working in this space recently.

I think this is an interesting change, but I'm a bit worried that it adds complexity to the thread task loop. I am wondering if this problem couldn't be solved by packaging the TaskGroup logic in a lambda. In essence, the call stack would be: the thread loop in ThreadPool::grow() calls Task(), which would call the TaskGroup logic lambda, which would call the user lambda. Regular non-TaskGroup tasks would not go through that logic.

llvm/include/llvm/Support/ThreadPool.h
52	Can you move this above the `ThreadPool`? It'll be easier for future code readers I think.
61	Do you think we could have all these `TaskGroup`-specific functions inside the `TaskGroup` class instead? As an alternative, given your current usage, the tasks could be even queued in a `std::vector` prior, and passed to the `TaskGroup` constructor.
llvm/tools/llvm-profdata/llvm-profdata.cpp
41	Is this related to the current patch?

Herald added a subscriber: StephenFan. Apr 6 2022, 1:05 PM

I like the overall direction of this patch. I do see value in aganea's comments about possibly adding more methods to TaskGroup.

llvm/include/llvm/Support/ThreadPool.h
61	I do like that idea, but if we do that it might be best to have TaskGroup take a "ThreadPool &" as a constructor argument so that it can ensure we always use the right ThreadPool if we ask the TaskGroup to wait. Do we need to lock down the task group once "wait()" is called with a TaskGroup or can users keep adding things to the TaskGroup even if work is currently going on? Should we freeze the contents of a TaskGroup once we start waiting on this?

That's a pretty nice improvement! :)

Reading the patch, I'm not sure I have a good grasp of all the interactions of this with the existing invariants of this class: concurrency is always complex...

In essence the call stack would be: the thread loop in ThreadPool::grow() calls Task() which would call the TaskGroup logic lambda, which would call the user lambda. Regular non-TaskGroup would not go through that logic.

I don't quite get what you mean? Can you elaborate a bit?
I agree that the complexity increase is worrisome, so any idea to manage it is welcome :)

llvm/include/llvm/Support/ThreadPool.h
89	Do we have a way to assert on this?
101	I think you need more documentation for the groups, at minimum: in particular, the ownership model / lifetime expectations for the `TaskGroup` objects. The existing APIs that don't take a group should likely be updated to be documented with respect to groups (is there a concept of a "default group" represented by nullptr?).
103	Why the change to `deque` in this patch?
131–133	Can you document it please?
llvm/lib/Support/ThreadPool.cpp
98	Seems like if `GroupOfTask` is non-null you're calling `workCompletedUnlocked` twice, why?
104	" notify also threads waiting" ?
105	What does "this function" means here?
118	Nit: it seems like the `find_if(...) == end()` could be replaced by `!llvm::any_of(...)`
llvm/tools/llvm-profdata/llvm-profdata.cpp
41	I suspect this file was depending on this header transitively and the include was removed from the ThreadPool.h header.
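To make the `std::deque` and `any_of` points above concrete, here is a simplified, hypothetical model of the pool's bookkeeping; it is not the patch's code, and `Group`, `QueueModel`, and `workCompleted` are illustrative names. Each queued entry is a `(task, group)` pair, `ActiveGroups` counts running tasks per group, and `nullptr` stands for "all tasks". A group-specific completion check has to look through the queued entries, which `std::queue` cannot do because it only exposes `front()` and `back()`.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <utility>

// Hypothetical stand-in for ThreadPoolTaskGroup; only its address is used.
struct Group {};

// Simplified model of the pool's bookkeeping (not the patch's code).
struct QueueModel {
  // Each queued task remembers its group; nullptr means "no group".
  std::deque<std::pair<std::function<void()>, Group *>> Tasks;
  // Number of tasks of each group that are currently executing.
  std::map<Group *, unsigned> ActiveGroups;
  // Number of tasks of any kind that are currently executing.
  unsigned ActiveThreads = 0;

  // G == nullptr asks about all tasks; otherwise only about tasks of G.
  bool workCompleted(Group *G) const {
    if (G == nullptr)
      return ActiveThreads == 0 && Tasks.empty();
    // Scanning the queued tasks needs iteration, which std::queue lacks.
    bool Queued = std::any_of(Tasks.begin(), Tasks.end(),
                              [G](const auto &T) { return T.second == G; });
    return !Queued && ActiveGroups.count(G) == 0;
  }
};
```

With this shape, a group's work is complete only when none of its tasks is queued and none is running, while the `nullptr` case keeps the old "everything is idle" meaning.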

In D123225#3435142, @mehdi_amini wrote:

In essence the call stack would be: the thread loop in ThreadPool::grow() calls Task() which would call the TaskGroup logic lambda, which would call the user lambda. Regular non-TaskGroup would not go through that logic.

I don't quite get what you mean? Can you elaborate a bit?
I agree that the complexity increase is worrisome, so any idea to manage it is welcome :)

I'm just saying that we shouldn't be modifying ThreadPool::grow and should instead implement the TaskGroup logic in an intermediate function, if that's possible. Instead of what we currently have:

template <typename Func>
auto async(TaskGroup &Group, Func &&F) -> std::shared_future<decltype(F())> {
  return asyncImpl(std::function<decltype(F())()>(std::forward<Func>(F)),
                   &Group);
}

instead we could have this:

template <typename Func>
auto TaskGroup::async(TaskGroup &Group, Func &&F) -> std::shared_future<decltype(F())> {
  return asyncImpl([F = std::forward<Func>(F)]() mutable {
     // ...TaskGroup logic...
     F();
     // ...TaskGroup logic... (for example, notify the local TaskGroup condition when all group tasks are done)
  }, &Group);
}

thus it makes sense to make async() and wait() members of TaskGroup and store the necessary state in the TaskGroup object itself (such as a task counter, currently stored in DenseMap<TaskGroup *, unsigned> ActiveGroups, a condition variable, etc.).
I feel the TaskGroup's logic is a subset of the generalized case we already have.
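aganea's alternative can be sketched roughly as follows, assuming a pool that only accepts plain `std::function<void()>` tasks. `WrappingGroup`, `Submitter`, and `runDemo()` are hypothetical names, and one thread per task stands in for the pool so the example is self-contained. The group keeps its own counter and condition variable, and the wrapper lambda updates them after the user's callable returns, so the pool's task loop stays untouched.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch: the group owns its bookkeeping and hands the pool a
// plain std::function<void()> that wraps the user's callable.
class WrappingGroup {
public:
  using Submitter = std::function<void(std::function<void()>)>;
  explicit WrappingGroup(Submitter S) : Submit(std::move(S)) {}

  void async(std::function<void()> F) {
    {
      std::lock_guard<std::mutex> Lock(M);
      ++Pending; // counted before the task can possibly start
    }
    Submit([this, F = std::move(F)] {
      F(); // the user's task
      std::lock_guard<std::mutex> Lock(M);
      if (--Pending == 0)
        AllDone.notify_all(); // last task of this group has finished
    });
  }

  // Blocks the calling thread until every task of this group has finished.
  void wait() {
    std::unique_lock<std::mutex> Lock(M);
    AllDone.wait(Lock, [this] { return Pending == 0; });
  }

private:
  Submitter Submit;
  std::mutex M;
  std::condition_variable AllDone;
  unsigned Pending = 0;
};

// Demo: one thread per task stands in for a pool.
int runDemo() {
  std::vector<std::thread> Threads;
  std::atomic<int> Count{0};
  WrappingGroup G([&Threads](std::function<void()> T) {
    Threads.emplace_back(std::move(T));
  });
  for (int I = 0; I < 8; ++I)
    G.async([&Count] { ++Count; });
  G.wait();
  int Result = Count.load();
  for (std::thread &T : Threads)
    T.join();
  return Result;
}
```

The limitation shows up in `wait()`: it simply blocks. Called from a worker thread of a bounded pool, that thread could not pick up the group's remaining tasks, which is the deadlock scenario raised in the reply to this proposal.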

llvm/include/llvm/Support/ThreadPool.h
61	@clayborg Right now the freezing is implicit through the use of a `condition_variable` & a `mutex`, see `llvm/lib/Support/ThreadPool.cpp`, L124. It is an interesting question: should we make it support dynamic additions on the fly, while waiting? I would be tempted to wait for a practical use case, but perhaps there's already one?
llvm/tools/llvm-profdata/llvm-profdata.cpp
41	Ah, good catch, thanks :-)

Let's first deal with the conceptual stuff; there's no point in dealing with the small code things as long as there's no agreement that this is the way to implement it.

In D123225#3434132, @aganea wrote:

Not sure @chandlerc will be able to review your patch, adding some people that have been working in this space recently.

I used what CODE_OWNERS.txt lists for 'Support'.

In D123225#3435887, @aganea wrote:

I'm just saying that we shouldn't be modifying ThreadPool::grow and instead implement the TaskGroup logic into a intermediate function, if that's possible.

I think that would be possible only with non-recursive use of ThreadPool. But D122975 requires running tasks that themselves create tasks and wait for them, which requires two things:

- Waiting just for a specific subset of tasks, otherwise a task could deadlock waiting for itself. This requires the queue-processing code to signal such a state.
- Not wasting thread pool slots on waiting, otherwise they could all end up waiting for tasks that would have no free slot to run in, and deadlock. My approach handles that by letting such threads process tasks while waiting for a group. Another approach could be launching additional temporary threads, but I don't see how that would make anything simpler or better.

I don't see how either of these could be done without altering the queue-processing code in ThreadPool itself. So unless you have a good idea there, I think the most I can move to TaskGroup is syntactic sugar.
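The "process tasks while waiting" approach described above can be illustrated with a deliberately small, hypothetical pool. It is not the LLVM implementation: `MiniPool`, `TaskGroup`, `CurrentPool`, and `nestedDemo()` are made-up names, there is a single mutex and condition variable, and only grouped tasks are supported. The key part is `wait()`: on a worker thread it re-enters the task loop with the group it is waiting for and returns once no task of that group is queued or running.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

struct TaskGroup {}; // hypothetical tag type; only its address matters
thread_local const void *CurrentPool = nullptr; // set on worker threads

class MiniPool {
public:
  explicit MiniPool(unsigned N) {
    for (unsigned I = 0; I < N; ++I)
      Workers.emplace_back([this] {
        CurrentPool = this;
        process(nullptr); // plain worker loop: run tasks until shutdown
      });
  }
  ~MiniPool() {
    std::unique_lock<std::mutex> L(M);
    Stop = true;
    L.unlock();
    CV.notify_all();
    for (std::thread &T : Workers)
      T.join();
  }
  void async(TaskGroup &G, std::function<void()> F) {
    std::unique_lock<std::mutex> L(M);
    Queue.emplace_back(std::move(F), &G);
    L.unlock();
    CV.notify_all();
  }
  // On a worker thread, keep running queued tasks instead of blocking.
  void wait(TaskGroup &G) {
    if (CurrentPool == this) {
      process(&G);
      return;
    }
    std::unique_lock<std::mutex> L(M);
    CV.wait(L, [&] { return groupDone(&G); });
  }

private:
  bool groupDone(TaskGroup *G) const { // caller must hold M
    bool Queued = std::any_of(Queue.begin(), Queue.end(),
                              [G](const auto &T) { return T.second == G; });
    return !Queued && Running.count(G) == 0;
  }
  void process(TaskGroup *WaitingFor) { // nullptr: plain worker loop
    for (;;) {
      std::function<void()> Task;
      TaskGroup *G = nullptr;
      {
        std::unique_lock<std::mutex> L(M);
        CV.wait(L, [&] {
          return !Queue.empty() || (WaitingFor ? groupDone(WaitingFor) : Stop);
        });
        if (WaitingFor ? groupDone(WaitingFor) : Queue.empty())
          return; // nested wait satisfied, or stopped with an empty queue
        Task = std::move(Queue.front().first);
        G = Queue.front().second;
        Queue.pop_front();
        ++Running[G];
      }
      Task();
      std::unique_lock<std::mutex> L(M);
      if (--Running[G] == 0)
        Running.erase(G);
      L.unlock();
      CV.notify_all(); // wakes both external and nested waiters
    }
  }
  std::mutex M;
  std::condition_variable CV;
  std::deque<std::pair<std::function<void()>, TaskGroup *>> Queue;
  std::map<TaskGroup *, unsigned> Running; // tasks currently running, per group
  std::vector<std::thread> Workers;
  bool Stop = false;
};

// Demo: one worker; a task of Outer submits a task of Inner and waits on it.
int nestedDemo() {
  int Result = 0;
  TaskGroup Outer, Inner;
  MiniPool Pool(1);
  Pool.async(Outer, [&] {
    Pool.async(Inner, [&] { Result = 42; });
    Pool.wait(Inner); // without helping, the only worker would block forever
  });
  Pool.wait(Outer);
  return Result;
}
```

With a single worker, `nestedDemo()` would deadlock if `wait(Inner)` simply blocked, because the only thread able to run the inner task would be the one waiting for it. Waiting on the group of the task that is currently running (for example `wait(Outer)` inside the outer task) still deadlocks in this sketch.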

clayborg added inline comments. Apr 7 2022, 10:06 AM

llvm/include/llvm/Support/ThreadPool.h
101	That would be good. Maybe add headerdoc before the TaskGroup class. If we end up making the TaskGroup constructor take a reference to the ThreadPool, we should mention that the ThreadPool must outlive the TaskGroup. More documentation or examples would be nice for: (1) making TaskGroup 1 and then, during work for TaskGroup 1, creating TaskGroup 2 and waiting on it within a worker thread; (2) whether you can add to a group while work for that group is ongoing.
103	This is the new code that is adding the ability to run work in the groups
192	Is this needed now that we have the TaskGroup objects? Or does this get signaled whenever all TaskGroups complete all of their work? Maybe update the documentation to state that this is used by ThreadPool::wait() only, for when all work is done?
llvm/lib/Support/ThreadPool.cpp
54	initialize to nullptr
98	Yeah, it seems this should just be: NotifyGroup = GroupOfTask != nullptr;
135–137	Can we detect recursive calls to wait on the same group here?

clayborg added inline comments. Apr 7 2022, 10:09 AM

llvm/include/llvm/Support/ThreadPool.h
103	Ignore my comment; I see that this type is used below, where a queue was used before.

dexonsmith removed a subscriber: dexonsmith. Apr 7 2022, 12:15 PM

llunak marked 4 inline comments as done. Apr 16 2022, 10:46 AM

llunak added inline comments.

llvm/include/llvm/Support/ThreadPool.h
61	> As an alternative, given your current usage, the tasks could be even queued in a `std::vector` prior, and passed to the `TaskGroup` constructor.
	Why? That seems unnecessary and inconsistent with how ThreadPool is used now.
61	> Do we need to lock down the task group once "wait()" is called with a TaskGroup or can users keep adding things to the TaskGroup even if work is currently going on? Should we freeze the contents of a TaskGroup once we start waiting on this?
	I see no reason to do it differently from ThreadPool, and ThreadPool currently says it's an error to add tasks while waiting but then does nothing about it (and AFAICT it's actually an unnecessary restriction).
89	Not without extra variables added just for detecting that. Is that something that would be done in LLVM code?
101	I'm confused about this part about documentation, because I think all of this is either documented or obvious. Did you miss those, or are they not as obvious as I find them to be? It seems to me that you expect this change to be more complex than it is: it's just the ability to tag tasks with a TaskGroup pointer and then wait on tasks with a specific tag. The only somewhat complex thing here is making sure that wait() called from within a task does not deadlock, and that's not an API thing.
	> in particular ownership model / lifetime expectation for the `TaskGroup` objects.
	The TaskGroup objects are passed by reference, so I don't see how there could be any ownership. And TaskGroup objects are groups, so the lifetime of TaskGroup objects and the lifetime of groups are the same thing.
	> the existing APIs that don't take group should like be updated to be documented with respect to groups (is there a concept of a "default group" represented by the nullptr?).
	Tasks without a group are tasks without a group. It really matters only for wait(), which already says that it waits for all threads.
	> Making a TaskGroup 1 and then during work for TaskGroup 1 creating a TaskGroup 2 and then waiting on that within a worker thread
	This is done by simply doing it; there's nothing special about it from the API point of view, and the wait() documentation says that this is possible. What more documentation would you need?
	> details on if you can add to a group while work is ongoing for that group
	The description of wait() already says no. What other details do you need?
103	std::queue has only front() and back(), which is insufficient for checking only tasks in a specific group.
192	Yes, it is still needed. TaskGroups are dumb IDs, so they change nothing about this. This gets signalled when either all tasks in a group get finished, or when all tasks get finished (I'll add this to the description).
llvm/lib/Support/ThreadPool.cpp
54	It gets set on all paths (there is only one) before it's used, and if it weren't, then initializing it here would suppress the -Wsometimes-uninitialized warning about the mistake.
98	Because 'Notify' is to be done if work is done for the group or for all tasks (nullptr), while 'NotifyGroup' is to be done if work is done for the group. But I can replace the second call with 'Notify', since it's the same value.
135–137	Not without extra debug variables (see the similar question for wait()). Unless you count a guaranteed deadlock as detection.
llvm/tools/llvm-profdata/llvm-profdata.cpp
41	Yes.

Small tweaks based on feedback.
Changed ThreadPool::TaskGroup to standalone ThreadPoolTaskGroup that has trivial calls forwarding to ThreadPool functions.

Harbormaster completed remote builds in B159935: Diff 423244. Apr 16 2022, 11:44 AM

I like the change to move some of the API onto the groups themselves! In particular, waiting in the destructor makes it "safer" to me (no dangling group pointers can be left in the pool's map).

llvm/include/llvm/Support/ThreadPool.h
38	Adding a mention of the concept of groups here would be valuable as well I think.
89	Yes, it is commonly done: we guard such code in headers with `#if LLVM_ENABLE_ABI_BREAKING_CHECKS`; you can grep the code base for examples (here is one: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Analysis/LoopInfo.h#L83-L87 )
101	> I'm confused about this part about documentation, because I think all of this is either documented or obvious. Did you miss those or are they not as obvious as I find them to be?
	I think there is always a natural tendency to find things "obvious" when we design something with a mental model and other assumptions in mind. But someone coming to it new, with fresh eyes, won't find things as obvious.
	> > in particular ownership model / lifetime expectation for the TaskGroup objects.
	> The TaskGroup objects are passed by reference, so I don't see how there could be any ownership. And TaskGroup objects are groups, so lifetime of TaskGroup objects and lifetime of groups are the same thing.
	Right, but that has implications for the lifetime: for example, you shouldn't destroy a group while there are still tasks in flight using this group. I think this particular point isn't a concern anymore now that you call wait() in the ThreadPoolTaskGroup destructor.
	> > Making a TaskGroup 1 and then during work for TaskGroup 1 creating a TaskGroup 2 and then waiting on that within a worker thread
	> This is done by simply doing it, there's nothing special about it from the API point of view, and wait() documentation says that this is possible. What more documentation would you need?
	Waiting within a worker thread wasn't allowed until now: it is again more a question of forming a mental model of the system.
Adding this concept of groups (which I welcome: it is solving a major limitation of the pool) is really making the existing model of a simple ThreadPool no longer as "simple", hence why I may seem overly cautious :) Some things can also be surprising for someone who does not really know or think about the implementation details, for example this:

  group1.async([&]() {
    auto f = group2.async([&]() { return 1; });
    f.wait(); // May deadlock
  });

but:

  group1.async([&]() {
    auto f = group2.async([&]() { return 1; });
    group2.wait(); // yield current thread to run tasks from group2 if needed
    f.wait(); // Does not deadlock
  });

Not that I have a great suggestion on how to express the system and its invariants concisely, but right now it isn't easy to figure all of this out without carefully reading the implementation. (And to be fair, even the "simple" pre-existing model isn't carefully documented in the API.)

llunak marked an inline comment as done. Apr 18 2022, 10:29 AM

llunak added inline comments.

llvm/include/llvm/Support/ThreadPool.h
89	I see. I've added an assert for self-wait, but I actually see no good reason to enforce that there can be only one wait() for a group, so I'll instead remove that comment.
101	I think the concept is still fairly simple, as long as one doesn't forget that waiting for oneself will deadlock. But I have added some comments saying this explicitly, I hope that's enough.

Added asserts that wait() will not deadlock waiting for itself.
Added more documentation about deadlocks and usage from within threadpool's threads.
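The self-wait assertions can be sketched with a per-thread stack of groups: a worker that waits re-enters the task loop, so one thread can be nested inside several tasks at once, and a single "current group" variable would miss the outer ones. This is a hypothetical illustration (`RunningGroups`, `RunningTaskScope`, and `wouldWaitOnSelf` are invented names); in LLVM code such debug-only state would typically be guarded so it only exists in assertion-enabled builds, as discussed above around `LLVM_ENABLE_ABI_BREAKING_CHECKS`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct TaskGroup {}; // hypothetical stand-in for ThreadPoolTaskGroup

// Groups of the tasks this thread is currently executing, innermost last.
thread_local std::vector<const TaskGroup *> RunningGroups;

// Pushed around each task invocation in the (hypothetical) task loop.
class RunningTaskScope {
public:
  explicit RunningTaskScope(const TaskGroup *G) { RunningGroups.push_back(G); }
  ~RunningTaskScope() { RunningGroups.pop_back(); }
  RunningTaskScope(const RunningTaskScope &) = delete;
  RunningTaskScope &operator=(const RunningTaskScope &) = delete;
};

// Waiting on G from inside a task of G, at any nesting depth, can never
// finish, because that task itself still counts as running.
bool wouldWaitOnSelf(const TaskGroup &G) {
  return std::find(RunningGroups.begin(), RunningGroups.end(), &G) !=
         RunningGroups.end();
}
```

A group-specific `wait()` could then begin with something like `assert(!wouldWaitOnSelf(Group) && "waiting on own group would deadlock");`.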

LGTM, but please wait for @aganea to have another look at it.

mehdi_amini accepted this revision. Apr 18 2022, 10:35 AM

This revision is now accepted and ready to land. Apr 18 2022, 10:35 AM

Harbormaster completed remote builds in B160080: Diff 423425. Apr 18 2022, 10:56 AM

Looks great, thanks for implementing many changes!

MaskRay added inline comments. Apr 18 2022, 9:28 PM

llvm/include/llvm/Support/ThreadPool.h
103	Prefer using
153	Use `emplace_back`
166	Use `emplace_back`
llvm/unittests/Support/ThreadPool.cpp
307	Task A runs in the first thread. It

MaskRay added inline comments. Apr 18 2022, 9:46 PM

llvm/include/llvm/Support/ThreadPool.h
61	The `inline` keyword can be removed.
llvm/lib/Support/ThreadPool.cpp
72	`workCompletedUnlocked(WaitingForGroup)` is slow when WaitingForGroup is not null. You may check the inverse of other conditions or cache the result of `workCompletedUnlocked(WaitingForGroup)`
llvm/unittests/Support/ThreadPool.cpp
327	What does this do? Use std::this_thread::sleep_for?

aganea added inline comments. Apr 19 2022, 12:13 PM

llvm/include/llvm/Support/ThreadPool.h
104	This is a bit confusing, we already have a `llvm::TaskQueue`, can we change this to something else? Just `Queue` maybe?
220	Would you mind please inserting a line break here, and after each other function below, just to match the style in this file?
llvm/lib/Support/ThreadPool.cpp
53	Shouldn't this be a stack, since we're re-entering `processTasks()`? The following might not assert:

  int main() {
    ThreadPool TP{hardware_concurrency(1)};
    ThreadPoolTaskGroup G1(TP);
    ThreadPoolTaskGroup G2(TP);
    G1.async([&] {
      G2.wait(); // commenting this line would assert below.
      G1.wait(); // will deadlock without assert if the line above is there.
    });
    G2.async([&] {
      // nop
    });
    return 0;
  }
142	I think there's a bug here, it was there before but now we're using `isWorkerThread()` a lot more. If the `ThreadPool` is shutting down, thus `ThreadLock` is locked in `ThreadPool::~ThreadPool()`, then if anyone calls `.wait()` in a managed thread, `isWorkerThread()` would deadlock. Probably better replacing with a `llvm::sys::RWMutex ThreadLock`, and use a `std::shared_lock<>` here instead.
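The reader/writer-lock suggestion above can be sketched with standard C++17 primitives. This is a hypothetical illustration: `std::shared_mutex` stands in for `llvm::sys::RWMutex`, and `WorkerSet`, `spawn`, `joinAll`, and `selfCheckDemo()` are made-up names. Shutdown joins the threads under a shared (reader) lock, so a worker that calls `isWorkerThread()` during shutdown is not blocked by it; only adding threads takes the exclusive lock. Thread IDs are kept in a separate vector so that lookups never touch a `std::thread` object that is being joined.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch: std::shared_mutex stands in for llvm::sys::RWMutex.
class WorkerSet {
public:
  // Adding a thread mutates shared state, so it takes the exclusive lock.
  template <typename Fn> void spawn(Fn F) {
    std::unique_lock<std::shared_mutex> Lock(ThreadsLock);
    Threads.emplace_back(std::move(F));
    Ids.push_back(Threads.back().get_id());
  }

  // Shutdown only reads the list of threads, so a shared lock suffices and
  // workers calling isWorkerThread() in the meantime are not blocked.
  void joinAll() {
    std::shared_lock<std::shared_mutex> Lock(ThreadsLock);
    for (std::thread &T : Threads)
      T.join();
  }

  bool isWorkerThread() const {
    std::shared_lock<std::shared_mutex> Lock(ThreadsLock);
    for (std::thread::id Id : Ids)
      if (Id == std::this_thread::get_id())
        return true;
    return false;
  }

private:
  mutable std::shared_mutex ThreadsLock;
  std::vector<std::thread> Threads;
  std::vector<std::thread::id> Ids; // read by isWorkerThread(), never joined
};

// Demo: workers ask isWorkerThread() while the main thread is joining them.
int selfCheckDemo() {
  std::atomic<int> Yes{0};
  WorkerSet Set;
  for (int I = 0; I < 3; ++I)
    Set.spawn([&] {
      std::this_thread::sleep_for(std::chrono::milliseconds(10));
      if (Set.isWorkerThread())
        ++Yes;
    });
  bool MainIsWorker = Set.isWorkerThread();
  Set.joinAll(); // with an exclusive lock here, the workers would deadlock
  return MainIsWorker ? -1 : Yes.load();
}
```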

MaskRay added inline comments. Apr 19 2022, 1:01 PM

llvm/include/llvm/Support/ThreadPool.h
104	We can even omit the type alias. I think this is only used once for the member variable.

Really excited for this! Thanks for taking it on.

aganea added inline comments. Apr 26 2022, 5:49 AM

llvm/lib/Support/ThreadPool.cpp
140	One more thing perhaps: tasks `.wait()`-ing will be "suspended" by re-entering `processTasks`. Any remaining tasks in the queue will be scheduled randomly over the waiting task(s), and this could easily create stack-overflows, since the scheduling is non-deterministic (so we could have several tasks waiting, piled on the top of each other). Depending on the stack size per platform, and how much stack each task eats up, this could potentially cause random crashes in production. Probably the less intrusive approach would be to use fibers for suspended tasks. I suppose we could do that as a secondary step after this patch.

llunak marked 10 inline comments as done. Apr 30 2022, 11:15 PM

llunak added inline comments.

llvm/lib/Support/ThreadPool.cpp
140	Yes, but this is going to cost some resource no matter what, so it's just a choice of what resource it will be. And note that the recursion will be limited to the number of groups, since waiting loops are not allowed.

Updated according to review comments.

Harbormaster completed remote builds in B162138: Diff 426268. May 1 2022, 12:02 AM

LGTM, thanks for all the changes @llunak!

llvm/lib/Support/ThreadPool.cpp
140	I think it is fine for now, let's see first if this is really a problem in practice. Worst case, it could be fixed by ensuring that the scope calling the `.wait` doesn't hold on too many stack variables.

MaskRay accepted this revision. May 2 2022, 2:08 PM

Looks great, really looking forward to seeing this go in!

The "Build Status" here lists a failure, but I cannot reproduce any test failure locally, and in the remote log I do not see any test failure; it looks like the test suite itself is failing. Is that something I should ignore, or am I missing something?

llvm-lit: /var/lib/buildkite-agent/builds/llvm-project/llvm/utils/lit/lit/discovery.py:247: warning: test suite 'lld-Unit' contained no tests
llvm-lit: /var/lib/buildkite-agent/builds/llvm-project/llvm/utils/lit/lit/llvm/config.py:438: note: using clang: /usr/bin/clang
llvm-lit: /var/lib/buildkite-agent/builds/llvm-project/llvm/utils/lit/lit/llvm/config.py:438: note: using ld.lld: /usr/bin/ld.lld
Traceback (most recent call last):
  File "/var/lib/buildkite-agent/builds/llvm-project/build/./bin/llvm-lit", line 59, in <module>
    main(builtin_parameters)
  File "/var/lib/buildkite-agent/builds/llvm-project/llvm/utils/lit/lit/main.py", line 111, in main
    selected_tests, discovered_tests = GoogleTest.post_process_shard_results(
  File "/var/lib/buildkite-agent/builds/llvm-project/llvm/utils/lit/lit/formats/googletest.py", line 231, in post_process_shard_results
    testsuites = json.load(f)['testsuites']
  File "/usr/lib/python3.9/json/__init__.py", line 293, in load
    return loads(fp.read(),

I'm not sure what's going on, maybe rebase and re-upload a patch?

It may also be safe to ignore; this patch is not what's responsible for it.

This revision was landed with ongoing or failed builds. May 3 2022, 9:19 PM

Closed by commit rG8ef5710e6303: [ThreadPool] add ability to group tasks into separate groups (authored by llunak).

This revision was automatically updated to reflect the committed changes.

llunak added a commit: rG8ef5710e6303: [ThreadPool] add ability to group tasks into separate groups.

Revision Contents

- llvm/include/llvm/Support/ThreadPool.h (78 lines)
- llvm/lib/Support/ThreadPool.cpp (138 lines)
- llvm/tools/llvm-profdata/llvm-profdata.cpp (1 line)
- llvm/unittests/Support/ThreadPool.cpp (167 lines)

Diff 423244

llvm/include/llvm/Support/ThreadPool.h

//===-- llvm/Support/ThreadPool.h - A ThreadPool implementation -- C++ --===//		//===-- llvm/Support/ThreadPool.h - A ThreadPool implementation -- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file defines a crude C++11 based thread pool.		// This file defines a crude C++11 based thread pool.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_SUPPORT_THREADPOOL_H		#ifndef LLVM_SUPPORT_THREADPOOL_H
#define LLVM_SUPPORT_THREADPOOL_H		#define LLVM_SUPPORT_THREADPOOL_H

		#include "llvm/ADT/DenseMap.h"
#include "llvm/Config/llvm-config.h"		#include "llvm/Config/llvm-config.h"
#include "llvm/Support/Threading.h"		#include "llvm/Support/Threading.h"
#include "llvm/Support/thread.h"		#include "llvm/Support/thread.h"

#include <future>		#include <future>

#include <condition_variable>		#include <condition_variable>
		#include <deque>
#include <functional>		#include <functional>
#include <memory>		#include <memory>
#include <mutex>		#include <mutex>
#include <queue>
#include <utility>		#include <utility>

namespace llvm {		namespace llvm {

		class ThreadPoolTaskGroup;

/// A ThreadPool for asynchronous parallel execution on a defined number of		/// A ThreadPool for asynchronous parallel execution on a defined number of
/// threads.		/// threads.
///		///
/// The pool keeps a vector of threads alive, waiting on a condition variable		/// The pool keeps a vector of threads alive, waiting on a condition variable
/// for some work to become available.		/// for some work to become available.
		mehdi_aminiUnsubmitted Done Reply Inline Actions Adding a mention of the concept of groups here would be valuable as well I think. mehdi_amini: Adding a mention of the concept of groups here would be valuable as well I think.
class ThreadPool {		class ThreadPool {
public:		public:
/// Construct a pool using the hardware strategy \p S for mapping hardware		/// Construct a pool using the hardware strategy \p S for mapping hardware
/// execution resources (threads, cores, CPUs)		/// execution resources (threads, cores, CPUs)
/// Defaults to using the maximum execution resources in the system, but		/// Defaults to using the maximum execution resources in the system, but
/// accounting for the affinity mask.		/// accounting for the affinity mask.
ThreadPool(ThreadPoolStrategy S = hardware_concurrency());		ThreadPool(ThreadPoolStrategy S = hardware_concurrency());

/// Blocking destructor: the pool will wait for all the threads to complete.		/// Blocking destructor: the pool will wait for all the threads to complete.
~ThreadPool();		~ThreadPool();

/// Asynchronous submission of a task to the pool. The returned future can be		/// Asynchronous submission of a task to the pool. The returned future can be
/// used to wait for the task to finish and is non-blocking on destruction.		/// used to wait for the task to finish and is non-blocking on destruction.
template <typename Function, typename... Args>		template <typename Function, typename... Args>
		aganeaUnsubmitted Done Reply Inline Actions Can you move this above the `ThreadPool`? It'll be easier for future code readers I think. aganea: Can you move this above the `ThreadPool`? It'll be easier for future code readers I think.
inline auto async(Function &&F, Args &&...ArgList) {		inline auto async(Function &&F, Args &&...ArgList) {
auto Task =		auto Task =
std::bind(std::forward<Function>(F), std::forward<Args>(ArgList)...);		std::bind(std::forward<Function>(F), std::forward<Args>(ArgList)...);
return async(std::move(Task));		return async(std::move(Task));
}		}

		/// Overload, task will be in the given task group.
		template <typename Function, typename... Args>
		inline auto async(ThreadPoolTaskGroup &Group, Function &&F,
		aganeaUnsubmitted Not Done Reply Inline Actions Do you think we could have all these `TaskGroup`-specific functions inside the `TaskGroup` class instead? As an alternative, given your current usage, the tasks could be even queued in a `std::vector` prior, and passed to the `TaskGroup` constructor. aganea: Do you think we could have all these `TaskGroup`-specific functions inside the `TaskGroup`…
		clayborgUnsubmitted Not Done Reply Inline Actions I do like that idea, but if we do that it might be best to have TaskGroup take a "ThreadPool &" as a constructor argument so that it can ensure we always use the right ThreadPool if we ask the TaskGroup to wait. Do we need to lock down the task group once "wait()" is called with a TaskGroup or can users keep adding things to the TaskGroup even if work is currently going on? Should we freeze the contents of a TaskGroup once we start waiting on this? clayborg: I do like that idea, but if we do that it might be best to have TaskGroup take a "ThreadPool &"…
		aganeaUnsubmitted Not Done Reply Inline Actions @clayborg Right now the freezing is implicit through the use of a `condition_variable` & a `mutex`, see `llvm/lib/Support/ThreadPool.cpp`, L124. It is a interesting question, should we make it support dynamic additions on the fly, while waiting? I would be tempted to wait for a practical use-case, but perhaps there's already one? aganea: @clayborg Right now the freezing is implicit through the use of a `condition_variable` & a…
		llunakAuthorUnsubmitted Not Done Reply Inline Actions Do we need to lock down the task group once "wait()" is called with a TaskGroup or can users keep adding things to the TaskGroup even if work is currently going on? Should we freeze the contents of a TaskGroup once we start waiting on this? I see no reason to do it differently from ThreadPool, and ThreadPool currently says it's an error to add tasks while waiting but then does nothing about it (and AFAICT it's actually an unnecessary restriction). llunak: > Do we need to lock down the task group once "wait()" is called with a TaskGroup or can users…
		llunakAuthorUnsubmitted Not Done Reply Inline Actions As an alternative, given your current usage, the tasks could be even queued in a `std::vector` prior, and passed to the `TaskGroup` constructor. Why? That seems unnecessary and inconsistent with how ThreadPool is used now. llunak: > As an alternative, given your current usage, the tasks could be even queued in a `std…
		MaskRayUnsubmitted Done Reply Inline Actions The `inline` keyword can be removed. MaskRay: The `inline` keyword can be removed.
		Args &&...ArgList) {
		auto Task =
		std::bind(std::forward<Function>(F), std::forward<Args>(ArgList)...);
		return async(Group, std::move(Task));
		}

/// Asynchronous submission of a task to the pool. The returned future can be		/// Asynchronous submission of a task to the pool. The returned future can be
/// used to wait for the task to finish and is non-blocking on destruction.		/// used to wait for the task to finish and is non-blocking on destruction.
template <typename Func>		template <typename Func>
auto async(Func &&F) -> std::shared_future<decltype(F())> {		auto async(Func &&F) -> std::shared_future<decltype(F())> {
return asyncImpl(std::function<decltype(F())()>(std::forward<Func>(F)));		return asyncImpl(std::function<decltype(F())()>(std::forward<Func>(F)),
		nullptr);
		}

		template <typename Func>
		auto async(ThreadPoolTaskGroup &Group, Func &&F)
		-> std::shared_future<decltype(F())> {
		return asyncImpl(std::function<decltype(F())()>(std::forward<Func>(F)),
		&Group);
}		}

/// Blocking wait for all the threads to complete and the queue to be empty.		/// Blocking wait for all the threads to complete and the queue to be empty.
/// It is an error to try to add new tasks while blocking on this call.		/// It is an error to try to add new tasks while blocking on this call.
void wait();		void wait();

		/// Blocking wait for only all the threads in the given group to complete.
		/// It is possible to recursively wait even inside a task, if the group is
		/// different. There may be only one active wait() call for a given group.
		mehdi_amini (Not Done): Do we have a way to assert on this?
		llunak (Author, Not Done): Not without extra variables added just for detecting that. Is that something that would be done in LLVM code?
		mehdi_amini (Not Done): Yes, it is commonly done. We guard such code in headers within `#if LLVM_ENABLE_ABI_BREAKING_CHECKS`; you can grep the code base for examples (here is one: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Analysis/LoopInfo.h#L83-L87 ).
		llunak (Author, Done): I see. I've added an assert for self-wait, but I actually see no good reason to enforce that there can be only one wait() for a group, so I'll remove that comment instead.
		/// It is possible to add new tasks while blocking on this call, if those
		/// tasks are from a different group.
		void wait(ThreadPoolTaskGroup &Group);

// TODO: misleading legacy name warning!		// TODO: misleading legacy name warning!
// Returns the maximum number of worker threads in the pool, not the current		// Returns the maximum number of worker threads in the pool, not the current
// number of threads!		// number of threads!
unsigned getThreadCount() const { return MaxThreadCount; }		unsigned getThreadCount() const { return MaxThreadCount; }

/// Returns true if the current thread is a worker thread of this thread pool.		/// Returns true if the current thread is a worker thread of this thread pool.
bool isWorkerThread() const;		bool isWorkerThread() const;

		mehdi_amini (Not Done): I think you need more documentation for the groups, at a minimum:
		- in particular ownership model / lifetime expectation for the `TaskGroup` objects.
		- the existing APIs that don't take a group should likely be updated to be documented with respect to groups (is there a concept of a "default group" represented by the nullptr?).
		clayborg (Not Done): That would be good. Maybe add headerdoc before the TaskGroup class. If we end up making the TaskGroup constructor take a reference to the ThreadPool, we should mention that the ThreadPool must outlive the TaskGroup. More documentation or examples would be nice for:
		- Making a TaskGroup 1 and then during work for TaskGroup 1 creating a TaskGroup 2 and then waiting on that within a worker thread
		- details on if you can add to a group while work is ongoing for that group
		llunak (Author, Not Done): I'm confused about this part about documentation, because I think all of this is either documented or obvious. Did you miss those, or are they not as obvious as I find them to be? It seems to me that you expect this change to be more complex than it is: it's just an ability to tag tasks with a TaskGroup pointer and then wait on tasks with a specific tag. The only somewhat complex thing here is making sure that wait() called from within a task doesn't deadlock, and that's not an API thing.
		> in particular ownership model / lifetime expectation for the `TaskGroup` objects.
		The TaskGroup objects are passed by reference, so I don't see how there could be any ownership. And TaskGroup objects are groups, so the lifetime of TaskGroup objects and the lifetime of groups are the same thing.
		> the existing APIs that don't take a group should likely be updated to be documented with respect to groups (is there a concept of a "default group" represented by the nullptr?).
		Tasks without a group are tasks without a group. It really matters only for wait(), which already says that it waits for all threads.
		> Making a TaskGroup 1 and then during work for TaskGroup 1 creating a TaskGroup 2 and then waiting on that within a worker thread
		This is done by simply doing it; there's nothing special about it from the API point of view, and the wait() documentation says that this is possible. What more documentation would you need?
		> details on if you can add to a group while work is ongoing for that group
		The description of wait() already says that you cannot. What other details do you need?
		mehdi_amini (Not Done):
		> I'm confused about this part about documentation, because I think all of this is either documented or obvious. Did you miss those, or are they not as obvious as I find them to be?
		I think there is always a natural tendency to find things "obvious" when we design something with a mental model and other assumptions in mind. But someone coming here new, with fresh eyes, won't find things as obvious.
		> > in particular ownership model / lifetime expectation for the TaskGroup objects.
		> The TaskGroup objects are passed by reference, so I don't see how there could be any ownership. And TaskGroup objects are groups, so the lifetime of TaskGroup objects and the lifetime of groups are the same thing.
		Right, but that has implications for the lifetime: for example, you shouldn't destroy a group while there are still tasks in flight using this group. I think this particular point isn't a concern anymore now that you call wait() in the ThreadPoolTaskGroup destructor.
		> > Making a TaskGroup 1 and then during work for TaskGroup 1 creating a TaskGroup 2 and then waiting on that within a worker thread
		> This is done by simply doing it; there's nothing special about it from the API point of view, and the wait() documentation says that this is possible. What more documentation would you need?
		Waiting within a worker thread wasn't allowed until now: again, it is more a question of forming a mental model of the system. Adding this concept of groups (which I welcome: it solves a major limitation of the pool) makes the existing model of a simple ThreadPool no longer as "simple", which is why I may seem overly cautious :)
		Some things can also be surprising for someone who does not really know or think about the implementation details, for example this:
		`group1.async([&]() { auto f = group2.async([&]() { return 1; }); f.wait(); /* May deadlock */ });`
		but:
		`group1.async([&]() { auto f = group2.async([&]() { return 1; }); group2.wait(); /* yield current thread to run tasks from group2 if needed */ f.wait(); /* Won't deadlock */ });`
		Not that I have a great suggestion on how to express the system and its invariants concisely, but right now it isn't easy to figure all of this out without carefully reading the implementation. (And to be fair, even the "simple" pre-existing model isn't carefully documented in the API.)
		llunak (Author, Not Done): I think the concept is still fairly simple, as long as one doesn't forget that waiting for oneself will deadlock. But I have added some comments saying this explicitly; I hope that's enough.
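To make the deadlock discussion above concrete, here is a minimal, self-contained sketch of the scheduling rule the patch relies on. It is not the LLVM implementation: `GroupedPool`, `run()`, `doneUnlocked()` and `CurrentPool` are hypothetical names, groups are plain `const void *` tags, and `nullptr` means "no group". The key point is `wait(Group)`: on one of the pool's own worker threads it keeps executing queued tasks instead of sleeping, so even a one-thread pool completes a nested group. Waiting for the group of a task that is currently running on the same thread still deadlocks, as llunak says.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

static thread_local const void *CurrentPool = nullptr; // pool this thread serves

// Hypothetical sketch, not llvm::ThreadPool: tasks carry an optional group tag.
class GroupedPool {
public:
  using Group = const void *; // nullptr == "no group"
  explicit GroupedPool(unsigned NumThreads) {
    for (unsigned I = 0; I < NumThreads; ++I)
      Workers.emplace_back([this] {
        CurrentPool = this;
        run(nullptr, /*Waiting=*/false);
      });
  }
  ~GroupedPool() {
    { std::lock_guard<std::mutex> Lock(M); Stop = true; }
    QueueCV.notify_all();
    for (std::thread &T : Workers)
      T.join(); // idle workers drain the queue before exiting
  }
  void async(Group G, std::function<void()> F) {
    { std::lock_guard<std::mutex> Lock(M); Tasks.emplace_back(std::move(F), G); }
    QueueCV.notify_one();
  }
  // Blocks until every task tagged G has finished. On one of our worker
  // threads it runs queued tasks (of any group) meanwhile instead of sleeping.
  void wait(Group G) {
    if (CurrentPool == this) {
      assert(G && "waiting for all tasks from inside a task would deadlock");
      return run(G, /*Waiting=*/true);
    }
    std::unique_lock<std::mutex> Lock(M);
    DoneCV.wait(Lock, [&] { return doneUnlocked(G); });
  }

private:
  // Caller must hold M. G == nullptr means "all tasks".
  bool doneUnlocked(Group G) const {
    if (!G)
      return Active == 0 && Tasks.empty();
    return ActiveGroups.count(G) == 0 &&
           std::none_of(Tasks.begin(), Tasks.end(),
                        [G](const auto &T) { return T.second == G; });
  }
  // Worker loop. With Waiting set, it returns as soon as group WaitFor is done.
  void run(Group WaitFor, bool Waiting) {
    while (true) {
      std::function<void()> Task;
      Group TaskGroup = nullptr;
      {
        std::unique_lock<std::mutex> Lock(M);
        QueueCV.wait(Lock, [&] {
          return (Waiting && doneUnlocked(WaitFor)) || !Tasks.empty() ||
                 (!Waiting && Stop);
        });
        if ((Waiting && doneUnlocked(WaitFor)) || Tasks.empty())
          return; // group finished, or shutdown with an empty queue
        Task = std::move(Tasks.front().first);
        TaskGroup = Tasks.front().second;
        Tasks.pop_front();
        ++Active;
        if (TaskGroup)
          ++ActiveGroups[TaskGroup];
      }
      Task();
      bool GroupDone;
      {
        std::lock_guard<std::mutex> Lock(M);
        --Active;
        if (TaskGroup && --ActiveGroups[TaskGroup] == 0)
          ActiveGroups.erase(TaskGroup);
        GroupDone = TaskGroup && doneUnlocked(TaskGroup);
      }
      DoneCV.notify_all(); // wake non-worker wait() callers
      if (GroupDone)
        QueueCV.notify_all(); // wake workers blocked in a nested wait()
    }
  }

  std::mutex M;
  std::condition_variable QueueCV, DoneCV;
  std::deque<std::pair<std::function<void()>, Group>> Tasks;
  std::unordered_map<Group, unsigned> ActiveGroups; // running tasks per group
  unsigned Active = 0;
  bool Stop = false;
  std::vector<std::thread> Workers; // last, so threads start after the rest
};
```

With a single worker, the outer task's `wait(&G2)` ends up running the G2 task itself. If `wait()` simply slept on a condition variable there, the only worker would block while the G2 task sat in the queue, which is the first case in mehdi_amini's comment.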
private:		private:
		typedef std::deque<std::pair<std::function<void()>, ThreadPoolTaskGroup *>>
		mehdi_amini (Not Done): Why the change to `deque` in this patch?
		clayborg (Not Done): This is the new code that is adding the ability to run work in the groups.
		clayborg (Not Done): Ignore my comment, I see that this type is used below, where a queue was being used.
		llunak (Author, Done): std::queue has only front() and back(), which is insufficient for checking only tasks in a specific group.
		MaskRay (Not Done): Prefer `using`.
		TaskQueue;
		aganea (Not Done): This is a bit confusing; we already have a `llvm::TaskQueue`. Can we change this to something else? Just `Queue` maybe?
		MaskRay (Done): We can even omit the type alias. I think this is only used once, for the member variable.

/// Helpers to create a promise and a callable wrapper of \p Task that sets		/// Helpers to create a promise and a callable wrapper of \p Task that sets
/// the result of the promise. Returns the callable and a future to access the		/// the result of the promise. Returns the callable and a future to access the
/// result.		/// result.
template <typename ResTy>		template <typename ResTy>
static std::pair<std::function<void()>, std::future<ResTy>>		static std::pair<std::function<void()>, std::future<ResTy>>
createTaskAndFuture(std::function<ResTy()> Task) {		createTaskAndFuture(std::function<ResTy()> Task) {
std::shared_ptr<std::promise<ResTy>> Promise =		std::shared_ptr<std::promise<ResTy>> Promise =
std::make_shared<std::promise<ResTy>>();		std::make_shared<std::promise<ResTy>>();
[9 context lines not shown]	createTaskAndFuture(std::function<void()> Task) {
auto F = Promise->get_future();		auto F = Promise->get_future();
return {[Promise = std::move(Promise), Task]() {		return {[Promise = std::move(Promise), Task]() {
Task();		Task();
Promise->set_value();		Promise->set_value();
},		},
std::move(F)};		std::move(F)};
}		}

bool workCompletedUnlocked() { return !ActiveThreads && Tasks.empty(); }		/// Returns true if all tasks in the given group have finished (nullptr means
		/// all tasks regardless of their group).
		bool workCompletedUnlocked(ThreadPoolTaskGroup *Group) const;
		mehdi_amini (Done): Can you document it, please?

/// Asynchronous submission of a task to the pool. The returned future can be		/// Asynchronous submission of a task to the pool. The returned future can be
/// used to wait for the task to finish and is non-blocking on destruction.		/// used to wait for the task to finish and is non-blocking on destruction.
template <typename ResTy>		template <typename ResTy>
std::shared_future<ResTy> asyncImpl(std::function<ResTy()> Task) {		std::shared_future<ResTy> asyncImpl(std::function<ResTy()> Task,
		ThreadPoolTaskGroup *Group) {

#if LLVM_ENABLE_THREADS		#if LLVM_ENABLE_THREADS
/// Wrap the Task in a std::function<void()> that sets the result of the		/// Wrap the Task in a std::function<void()> that sets the result of the
/// corresponding future.		/// corresponding future.
auto R = createTaskAndFuture(Task);		auto R = createTaskAndFuture(Task);

int requestedThreads;		int requestedThreads;
{		{
// Lock the queue and push the new task		// Lock the queue and push the new task
std::unique_lock<std::mutex> LockGuard(QueueLock);		std::unique_lock<std::mutex> LockGuard(QueueLock);

// Don't allow enqueueing after disabling the pool		// Don't allow enqueueing after disabling the pool
assert(EnableFlag && "Queuing a thread during ThreadPool destruction");		assert(EnableFlag && "Queuing a thread during ThreadPool destruction");
Tasks.push(std::move(R.first));		Tasks.push_back(std::make_pair(std::move(R.first), Group));
		MaskRay (Done): Use `emplace_back`.
requestedThreads = ActiveThreads + Tasks.size();		requestedThreads = ActiveThreads + Tasks.size();
}		}
QueueCondition.notify_one();		QueueCondition.notify_one();
grow(requestedThreads);		grow(requestedThreads);
return R.second.share();		return R.second.share();

#else // LLVM_ENABLE_THREADS Disabled		#else // LLVM_ENABLE_THREADS Disabled

// Get a Future with launch::deferred execution using std::async		// Get a Future with launch::deferred execution using std::async
auto Future = std::async(std::launch::deferred, std::move(Task)).share();		auto Future = std::async(std::launch::deferred, std::move(Task)).share();
// Wrap the future so that both ThreadPool::wait() can operate and the		// Wrap the future so that both ThreadPool::wait() can operate and the
// returned future can be sync'ed on.		// returned future can be sync'ed on.
Tasks.push([Future]() { Future.get(); });		Tasks.push_back(std::make_pair([Future]() { Future.get(); }, Group));
		MaskRay (Done): Use `emplace_back`.
return Future;		return Future;
#endif		#endif
}		}

#if LLVM_ENABLE_THREADS		#if LLVM_ENABLE_THREADS
// Grow to ensure that we have at least `requested` Threads, but do not go		// Grow to ensure that we have at least `requested` Threads, but do not go
// over MaxThreadCount.		// over MaxThreadCount.
void grow(int requested);		void grow(int requested);

		void processTasks(ThreadPoolTaskGroup *WaitingForGroup);
#endif		#endif

/// Threads in flight		/// Threads in flight
std::vector<llvm::thread> Threads;		std::vector<llvm::thread> Threads;
/// Lock protecting access to the Threads vector.		/// Lock protecting access to the Threads vector.
mutable std::mutex ThreadsLock;		mutable std::mutex ThreadsLock;

/// Tasks waiting for execution in the pool.		/// Tasks waiting for execution in the pool.
std::queue<std::function<void()>> Tasks;		TaskQueue Tasks;

/// Locking and signaling for accessing the Tasks queue.		/// Locking and signaling for accessing the Tasks queue.
std::mutex QueueLock;		std::mutex QueueLock;
std::condition_variable QueueCondition;		std::condition_variable QueueCondition;

/// Signaling for job completion		/// Signaling for job completion (all tasks or all tasks in a group).
std::condition_variable CompletionCondition;		std::condition_variable CompletionCondition;
		clayborg (Not Done): Is this needed now that we have the TaskGroup objects? Or does this get signaled whenever all TaskGroups complete all of their work? Maybe update the documentation to state that this is used only for ThreadPool::wait(), for when all work is done?
		llunak (Author, Done): Yes, it is still needed. TaskGroups are dumb IDs, so they change nothing about this. This gets signalled either when all tasks in a group are finished, or when all tasks are finished (I'll add this to the description).

/// Keep track of the number of thread actually busy		/// Keep track of the number of thread actually busy
unsigned ActiveThreads = 0;		unsigned ActiveThreads = 0;
		/// Number of threads active for tasks in the given group (only non-zero).
		DenseMap<ThreadPoolTaskGroup *, unsigned> ActiveGroups;

#if LLVM_ENABLE_THREADS // avoids warning for unused variable		#if LLVM_ENABLE_THREADS // avoids warning for unused variable
/// Signal for the destruction of the pool, asking thread to exit.		/// Signal for the destruction of the pool, asking thread to exit.
bool EnableFlag = true;		bool EnableFlag = true;
#endif		#endif

const ThreadPoolStrategy Strategy;		const ThreadPoolStrategy Strategy;

/// Maximum number of threads to potentially grow this pool to.		/// Maximum number of threads to potentially grow this pool to.
const unsigned MaxThreadCount;		const unsigned MaxThreadCount;
};		};

		/// A group of tasks to be run on a thread pool. Thread pool tasks in different
		/// groups can run on the same threadpool but can be waited for separately.
		class ThreadPoolTaskGroup {
		public:
		/// The ThreadPool argument is the thread pool to forward calls to.
		ThreadPoolTaskGroup(ThreadPool &Pool) : Pool(Pool) {}
		/// Blocking destructor: will wait for all the tasks in the group to complete
		/// by calling ThreadPool::wait().
		~ThreadPoolTaskGroup() { wait(); }
		/// Calls ThreadPool::async() for this group.
		template <typename Function, typename... Args>
		aganea (Done): Would you mind inserting a line break here, and after each of the other functions below, just to match the style in this file?
		inline auto async(Function &&F, Args &&...ArgList) {
		return Pool.async(*this, std::forward<Function>(F),
		std::forward<Args>(ArgList)...);
}		}
		/// Calls ThreadPool::wait() for this group.
		void wait() { Pool.wait(*this); }

		private:
		ThreadPool &Pool;
		};

		} // namespace llvm

#endif // LLVM_SUPPORT_THREADPOOL_H		#endif // LLVM_SUPPORT_THREADPOOL_H
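The lifetime questions raised in review (a group must not be destroyed while its tasks are in flight; the pool must outlive the group) come down to the RAII shape of `ThreadPoolTaskGroup` above. A small sketch of that pattern, generic over the pool type so it compiles without LLVM: `ScopedTaskGroup` and `InlinePool` are hypothetical names, not LLVM classes, and deleting the copy operations is this sketch's own design choice (a copy would live at a different address and would therefore tag a different group).

```cpp
#include <cassert>
#include <utility>

// Hypothetical sketch: a group handle whose identity is its own address.
// Pool is any type with async(Group &, F) and wait(Group &) members.
template <typename Pool> class ScopedTaskGroup {
public:
  explicit ScopedTaskGroup(Pool &Owner) : P(Owner) {}
  // Copying would create a second, unrelated group, so forbid it.
  ScopedTaskGroup(const ScopedTaskGroup &) = delete;
  ScopedTaskGroup &operator=(const ScopedTaskGroup &) = delete;
  // Blocking: no task of this group can outlive the object that tags it.
  ~ScopedTaskGroup() { wait(); }

  template <typename F> void async(F &&Fn) { P.async(*this, std::forward<F>(Fn)); }
  void wait() { P.wait(*this); }

private:
  Pool &P; // the pool must outlive the group
};

// Stand-in pool for illustration only: runs tasks inline and counts calls.
struct InlinePool {
  int Submitted = 0, Waits = 0;
  template <typename G, typename F> void async(G &, F &&Fn) { ++Submitted; Fn(); }
  template <typename G> void wait(G &) { ++Waits; }
};
```

Leaving the scope of a `ScopedTaskGroup` always calls the pool's `wait()` for that group exactly once, which is what makes "group lifetime == lifetime of its tasks" hold.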

llvm/lib/Support/ThreadPool.cpp

[18 context lines not shown]
#else		#else
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#endif		#endif

using namespace llvm;		using namespace llvm;

#if LLVM_ENABLE_THREADS		#if LLVM_ENABLE_THREADS

		// A note on thread groups: Tasks are by default in no group (represented
		// by nullptr ThreadPoolTaskGroup pointer in the Tasks queue) and functionality
		// here normally works on all tasks regardless of their group (functions
		// in that case receive nullptr ThreadPoolTaskGroup pointer as argument).
		// A task in a group has a pointer to that ThreadPoolTaskGroup in the Tasks queue,
		// and functions called to work only on tasks from one group take that pointer.

ThreadPool::ThreadPool(ThreadPoolStrategy S)		ThreadPool::ThreadPool(ThreadPoolStrategy S)
: Strategy(S), MaxThreadCount(S.compute_thread_count()) {}		: Strategy(S), MaxThreadCount(S.compute_thread_count()) {}

void ThreadPool::grow(int requested) {		void ThreadPool::grow(int requested) {
std::unique_lock<std::mutex> LockGuard(ThreadsLock);		std::unique_lock<std::mutex> LockGuard(ThreadsLock);
if (Threads.size() >= MaxThreadCount)		if (Threads.size() >= MaxThreadCount)
return; // Already hit the max thread pool size.		return; // Already hit the max thread pool size.
int newThreadCount = std::min<int>(requested, MaxThreadCount);		int newThreadCount = std::min<int>(requested, MaxThreadCount);
while (static_cast<int>(Threads.size()) < newThreadCount) {		while (static_cast<int>(Threads.size()) < newThreadCount) {
int ThreadID = Threads.size();		int ThreadID = Threads.size();
Threads.emplace_back([this, ThreadID] {		Threads.emplace_back([this, ThreadID] {
Strategy.apply_thread_strategy(ThreadID);		Strategy.apply_thread_strategy(ThreadID);
		processTasks(nullptr);
		});
		}
		}

		// WaitingForGroup == nullptr means all tasks regardless of their group.
		void ThreadPool::processTasks(ThreadPoolTaskGroup *WaitingForGroup) {
while (true) {		while (true) {
		aganea (Done): Shouldn't this be a stack, since we're re-entering `processTasks()`? The following might not assert:
		`int main() { ThreadPool TP{hardware_concurrency(1)}; ThreadPoolTaskGroup G1(TP); ThreadPoolTaskGroup G2(TP); G1.async([&] { G2.wait(); /* commenting this line would assert below */ G1.wait(); /* will deadlock without assert if the line above is there */ }); G2.async([] { /* nop */ }); return 0; }`
std::function<void()> Task;		std::function<void()> Task;
		clayborg (Not Done): Initialize to nullptr.
		llunak (Author, Not Done): It gets set on all paths (= only one) before it's used, and if it weren't, initializing it here would suppress the -Wsometimes-uninitialized warning about the mistake.
		ThreadPoolTaskGroup *GroupOfTask;
{		{
std::unique_lock<std::mutex> LockGuard(QueueLock);		std::unique_lock<std::mutex> LockGuard(QueueLock);
// Wait for tasks to be pushed in the queue		// Wait for tasks to be pushed in the queue
QueueCondition.wait(LockGuard,		QueueCondition.wait(LockGuard, [&] {
[&] { return !EnableFlag \|\| !Tasks.empty(); });		return !EnableFlag \|\| !Tasks.empty() \|\|
		(WaitingForGroup != nullptr &&
		workCompletedUnlocked(WaitingForGroup));
		});
// Exit condition		// Exit condition
if (!EnableFlag && Tasks.empty())		if (!EnableFlag && Tasks.empty())
return;		return;
		if (WaitingForGroup != nullptr && workCompletedUnlocked(WaitingForGroup))
		return;
// Yeah, we have a task, grab it and release the lock on the queue		// Yeah, we have a task, grab it and release the lock on the queue

// We first need to signal that we are active before popping the queue		// We first need to signal that we are active before popping the queue
// in order for wait() to properly detect that even if the queue is		// in order for wait() to properly detect that even if the queue is
		MaskRay (Done): `workCompletedUnlocked(WaitingForGroup)` is slow when WaitingForGroup is not null. You may check the inverse of the other conditions first, or cache the result of `workCompletedUnlocked(WaitingForGroup)`.
// empty, there is still a task in flight.		// empty, there is still a task in flight.
++ActiveThreads;		++ActiveThreads;
Task = std::move(Tasks.front());		Task = std::move(Tasks.front().first);
Tasks.pop();		GroupOfTask = Tasks.front().second;
		// Need to count active threads in each group separately, ActiveThreads
		// would never be 0 if waiting for another group inside a wait.
		if (GroupOfTask != nullptr)
		++ActiveGroups[GroupOfTask]; // Increment or set to 1 if new item
		Tasks.pop_front();
}		}
// Run the task we just grabbed		// Run the task we just grabbed
Task();		Task();

bool Notify;		bool Notify;
		bool NotifyGroup;
{		{
// Adjust `ActiveThreads`, in case someone waits on ThreadPool::wait()		// Adjust `ActiveThreads`, in case someone waits on ThreadPool::wait()
std::lock_guard<std::mutex> LockGuard(QueueLock);		std::lock_guard<std::mutex> LockGuard(QueueLock);
--ActiveThreads;		--ActiveThreads;
Notify = workCompletedUnlocked();		if (GroupOfTask != nullptr) {
		auto A = ActiveGroups.find(GroupOfTask);
		if (--(A->second) == 0)
		ActiveGroups.erase(A);
		}
		Notify = workCompletedUnlocked(GroupOfTask);
		NotifyGroup = GroupOfTask != nullptr && Notify;
		mehdi_amini (Not Done): Seems like if `GroupOfTask` is non-null you're calling `workCompletedUnlocked` twice. Why?
		clayborg (Not Done): Yeah, it seems this should just be: `NotifyGroup = GroupOfTask != nullptr;`
		llunak (Author, Done): Because 'Notify' is to be done if the work is done for the group or for all tasks (nullptr), while 'NotifyGroup' is to be done if the work is done for the group. But I can replace the second call with 'Notify', since it's the same value.
}		}
// Notify task completion if this is the last active thread, in case		// Notify task completion if this is the last active thread, in case
// someone waits on ThreadPool::wait().		// someone waits on ThreadPool::wait().
if (Notify)		if (Notify)
CompletionCondition.notify_all();		CompletionCondition.notify_all();
		// If this was a task in a group, notify also threads waiting for tasks
		mehdi_amini (Not Done): "notify also threads waiting"?
		// in this function on QueueCondition, to make a recursive wait() return
		mehdi_amini (Done): What does "this function" mean here?
		// after the group it's been waiting for has finished.
		if (NotifyGroup)
		QueueCondition.notify_all();
}		}
});
}		}
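aganea's question on `processTasks()` suggests tracking the running groups per thread as a stack. A minimal sketch of such a debug check, with hypothetical names (`runningGroups()`, `RunningTaskScope`, `wouldSelfWait()`); per mehdi_amini's earlier comment, LLVM would typically guard this kind of debug-only state with `LLVM_ENABLE_ABI_BREAKING_CHECKS`. A stack is needed because a G1 task that calls `wait(G2)` re-enters the task loop and may run a G2 task on top of itself; a single "current group" variable would be overwritten or reset by that nested task, and a later `G1.wait()` would go unnoticed.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Group = const void *; // hypothetical group tag; nullptr == "no group"

// Per-thread stack of the groups whose tasks this thread is executing.
inline std::vector<Group> &runningGroups() {
  thread_local std::vector<Group> Stack;
  return Stack;
}

// RAII guard a worker would create around each task it runs.
class RunningTaskScope {
public:
  explicit RunningTaskScope(Group G) { runningGroups().push_back(G); }
  ~RunningTaskScope() { runningGroups().pop_back(); }
  RunningTaskScope(const RunningTaskScope &) = delete;
  RunningTaskScope &operator=(const RunningTaskScope &) = delete;
};

// True if this thread is (possibly several levels down) inside a task of G.
// Such a task cannot finish until wait(G) returns, so waiting would deadlock;
// a debug build could assert(!wouldSelfWait(G)) at the top of wait(G).
inline bool wouldSelfWait(Group G) {
  const std::vector<Group> &S = runningGroups();
  return G != nullptr && std::find(S.begin(), S.end(), G) != S.end();
}
```

Checking the whole stack rather than only its top also catches a nested G2 task that tries to wait for G1, since the outer G1 task cannot complete until the nested one returns.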

		bool ThreadPool::workCompletedUnlocked(ThreadPoolTaskGroup *Group) const {
		if (Group == nullptr)
		return !ActiveThreads && Tasks.empty();
		return ActiveGroups.count(Group) == 0 &&
		!llvm::any_of(Tasks,
		[Group](const auto &T) { return T.second == Group; });
}		}
		mehdi_amini (Done): Nit: it seems the `find_if(...) == end()` could be replaced by `!llvm::any_of(...)`.
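The `ActiveGroups.count(Group) == 0` test in `workCompletedUnlocked()` depends on the increment-or-insert / erase-at-zero bookkeeping in `processTasks()`. The comment there explains why a per-group count is needed: `ActiveThreads` never reaches 0 while a worker is blocked in a nested `wait()`, because that worker is itself counted. A hypothetical `ActiveGroupCounter` showing the same pattern in isolation, with `std::unordered_map` standing in for `DenseMap`. It only tracks running tasks; a real completion check must still scan the queue, as `workCompletedUnlocked()` does.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical helper: number of running tasks per group. Ungrouped tasks
// (nullptr) are not tracked, and entries are erased when they drop to zero,
// so "no entry" means "no task of this group is running".
class ActiveGroupCounter {
public:
  using Group = const void *;

  void start(Group G) {
    if (G)
      ++Running[G]; // operator[] value-initializes a new entry to 0 first
  }

  // Returns true if this was the last running task of G.
  bool finish(Group G) {
    if (!G)
      return false;
    auto It = Running.find(G);
    assert(It != Running.end() && "finish() without matching start()");
    if (--It->second != 0)
      return false;
    Running.erase(It);
    return true;
  }

  bool idle(Group G) const { return Running.count(G) == 0; }

private:
  std::unordered_map<Group, unsigned> Running;
};
```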

void ThreadPool::wait() {		void ThreadPool::wait() {
// Wait for all threads to complete and the queue to be empty		// Wait for all threads to complete and the queue to be empty
std::unique_lock<std::mutex> LockGuard(QueueLock);		std::unique_lock<std::mutex> LockGuard(QueueLock);
CompletionCondition.wait(LockGuard, [&] { return workCompletedUnlocked(); });		CompletionCondition.wait(LockGuard,
		[&] { return workCompletedUnlocked(nullptr); });
		}

		void ThreadPool::wait(ThreadPoolTaskGroup &Group) {
		// Wait for all threads in the group to complete.
		if (!isWorkerThread()) {
		std::unique_lock<std::mutex> LockGuard(QueueLock);
		CompletionCondition.wait(LockGuard,
		[&] { return workCompletedUnlocked(&Group); });
		return;
		}
		// Handle the case of recursive call from another task in a different group,
		// in which case process tasks while waiting to keep the thread busy and avoid
		// possible deadlock.
		clayborg (Not Done): Can we detect recursive calls to wait on the same group here?
		llunak (Author, Done): Not without extra debug variables (see a similar question for wait()). Unless you count a guaranteed deadlock as detecting.
		processTasks(&Group);
}		}

		aganea (Not Done): One more thing perhaps: tasks calling `.wait()` will be "suspended" by re-entering `processTasks`. Any remaining tasks in the queue will be scheduled arbitrarily on top of the waiting task(s), and this could easily cause stack overflows, since the scheduling is non-deterministic (several waiting tasks could pile up on top of each other). Depending on the stack size per platform, and how much stack each task uses, this could potentially cause random crashes in production. Probably the least intrusive approach would be to use fibers for suspended tasks. I suppose we could do that as a follow-up step after this patch.
		llunak (Author, Done): Yes, but this is going to cost some resource no matter what, so it's just a choice of which resource it will be. And note that the recursion depth will be limited by the number of groups, since waiting loops are not allowed.
		aganea (Not Done): I think it is fine for now; let's first see whether this is really a problem in practice. Worst case, it could be fixed by ensuring that the scope calling `.wait()` doesn't hold on to too many stack variables.
bool ThreadPool::isWorkerThread() const {		bool ThreadPool::isWorkerThread() const {
std::unique_lock<std::mutex> LockGuard(ThreadsLock);		std::unique_lock<std::mutex> LockGuard(ThreadsLock);
		aganea (Done): I think there's a bug here. It was there before, but now we're using `isWorkerThread()` a lot more. If the `ThreadPool` is shutting down, and thus `ThreadsLock` is locked in `ThreadPool::~ThreadPool()`, then if anyone calls `.wait()` in a managed thread, `isWorkerThread()` would deadlock. Probably better to replace it with a `llvm::sys::RWMutex ThreadsLock` and use a `std::shared_lock<>` here instead.
llvm::thread::id CurrentThreadId = llvm::this_thread::get_id();		llvm::thread::id CurrentThreadId = llvm::this_thread::get_id();
for (const llvm::thread &Thread : Threads)		for (const llvm::thread &Thread : Threads)
if (CurrentThreadId == Thread.get_id())		if (CurrentThreadId == Thread.get_id())
return true;		return true;
return false;		return false;
}		}
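aganea's suggestion for `isWorkerThread()` can be sketched with standard C++17 types instead of `llvm::sys::RWMutex`. Adding threads takes the lock exclusively; lookups and joining only take it shared, so a worker that asks "am I a worker?" while the owner is joining is not blocked. `WorkerRegistry` is a hypothetical class, and keeping the thread IDs in a separate vector is an assumption of this sketch: it means `isWorker()` never reads a `std::thread` object that `join()` may be modifying at the same time.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical registry of worker threads guarded by a reader/writer lock.
class WorkerRegistry {
public:
  ~WorkerRegistry() { joinAll(); }

  void spawn(std::function<void()> Fn) {
    std::unique_lock<std::shared_mutex> Lock(Mutex); // writer: grows vectors
    Threads.emplace_back(std::move(Fn));
    Ids.push_back(Threads.back().get_id());
  }

  // Reader: only touches Ids, which never change once recorded.
  bool isWorker() const {
    std::shared_lock<std::shared_mutex> Lock(Mutex);
    const std::thread::id Self = std::this_thread::get_id();
    for (const std::thread::id &Id : Ids)
      if (Id == Self)
        return true;
    return false;
  }

  // Called by the owner only. Joining does not resize the vectors, so a
  // shared lock is enough and isWorker() callers are not blocked meanwhile.
  void joinAll() {
    std::shared_lock<std::shared_mutex> Lock(Mutex);
    for (std::thread &T : Threads)
      if (T.joinable())
        T.join();
  }

private:
  mutable std::shared_mutex Mutex;
  std::vector<std::thread> Threads;
  std::vector<std::thread::id> Ids;
};
```

If `joinAll()` took the lock exclusively instead, a worker calling `isWorker()` before finishing would wait for the lock while the owner waited in `join()` for that worker, which is the deadlock described in the comment.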

// The destructor joins all threads, waiting for completion.		// The destructor joins all threads, waiting for completion.
[17 context lines not shown]	if (ThreadCount != 1) {
errs() << "Warning: request a ThreadPool with " << ThreadCount		errs() << "Warning: request a ThreadPool with " << ThreadCount
<< " threads, but LLVM_ENABLE_THREADS has been turned off\n";		<< " threads, but LLVM_ENABLE_THREADS has been turned off\n";
}		}
}		}

void ThreadPool::wait() {		void ThreadPool::wait() {
// Sequential implementation running the tasks		// Sequential implementation running the tasks
while (!Tasks.empty()) {		while (!Tasks.empty()) {
auto Task = std::move(Tasks.front());		auto Task = std::move(Tasks.front().first);
Tasks.pop();		Tasks.pop_front();
Task();		Task();
}		}
}		}

		void ThreadPool::wait(ThreadPoolTaskGroup &) {
		// Simply wait for all, this works even if recursive (the running task
		// is already removed from the queue).
		wait();
		}
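The comment in the `LLVM_ENABLE_THREADS`-disabled `wait(ThreadPoolTaskGroup &)` relies on each task being removed from the queue before it runs, so a nested `wait()` called from inside a task only sees tasks queued after it. A self-contained sketch of that behaviour, using a hypothetical `SequentialPool` class rather than the LLVM code:

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical single-threaded fallback: tasks run on the thread that waits.
class SequentialPool {
public:
  using Group = const void *; // nullptr == "no group"

  void async(Group G, std::function<void()> F) {
    Tasks.emplace_back(std::move(F), G);
  }

  // Run queued tasks on the calling thread until the queue is empty.
  void wait() {
    while (!Tasks.empty()) {
      auto Task = std::move(Tasks.front().first);
      Tasks.pop_front(); // removed *before* running, so a nested wait()
      Task();            // never sees (and re-runs) the current task
    }
  }

  // With no other threads, waiting for one group just drains everything.
  void wait(Group) { wait(); }

private:
  std::deque<std::pair<std::function<void()>, Group>> Tasks;
};
```

A task that queues work for another group and then waits for it simply runs that work inline, so the recorded order is outer start, inner task, outer end.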

bool ThreadPool::isWorkerThread() const {		bool ThreadPool::isWorkerThread() const {
report_fatal_error("LLVM compiled without multithreading");		report_fatal_error("LLVM compiled without multithreading");
}		}

ThreadPool::~ThreadPool() { wait(); }		ThreadPool::~ThreadPool() { wait(); }

#endif		#endif

llvm/tools/llvm-profdata/llvm-profdata.cpp

	[32 context lines not shown]
	#include "llvm/Support/InitLLVM.h"			#include "llvm/Support/InitLLVM.h"
	#include "llvm/Support/MemoryBuffer.h"			#include "llvm/Support/MemoryBuffer.h"
	#include "llvm/Support/Path.h"			#include "llvm/Support/Path.h"
	#include "llvm/Support/ThreadPool.h"			#include "llvm/Support/ThreadPool.h"
	#include "llvm/Support/Threading.h"			#include "llvm/Support/Threading.h"
	#include "llvm/Support/WithColor.h"			#include "llvm/Support/WithColor.h"
	#include "llvm/Support/raw_ostream.h"			#include "llvm/Support/raw_ostream.h"
	#include <algorithm>			#include <algorithm>
				#include <queue>
				aganea (Not Done): Is this related to the current patch?
				mehdi_amini (Not Done): I suspect this file was depending on this header transitively, and the include was removed from the ThreadPool.h header.
				aganea (Not Done): Ah, good catch, thanks :-)
				llunak (Author, Done): Yes.

	using namespace llvm;			using namespace llvm;

	enum ProfileFormat {			enum ProfileFormat {
	PF_None = 0,			PF_None = 0,
	PF_Text,			PF_Text,
	PF_Compact_Binary,			PF_Compact_Binary,
	PF_Ext_Binary,			PF_Ext_Binary,
	(2,680 unchanged lines not shown)

llvm/unittests/Support/ThreadPool.cpp

(12 unchanged lines not shown)
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Triple.h"		#include "llvm/ADT/Triple.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Host.h"		#include "llvm/Support/Host.h"
#include "llvm/Support/Program.h"		#include "llvm/Support/Program.h"
#include "llvm/Support/TargetSelect.h"		#include "llvm/Support/TargetSelect.h"
#include "llvm/Support/Threading.h"		#include "llvm/Support/Threading.h"

		#include <chrono>

#include "gtest/gtest.h"		#include "gtest/gtest.h"

using namespace llvm;		using namespace llvm;

// Fixture for the unittests, allowing to temporarily disable the unittests		// Fixture for the unittests, allowing to temporarily disable the unittests
// on a particular platform		// on a particular platform
class ThreadPoolTest : public testing::Test {		class ThreadPoolTest : public testing::Test {
Triple Host;		Triple Host;
SmallVector<Triple::ArchType, 4> UnsupportedArchs;		SmallVector<Triple::ArchType, 4> UnsupportedArchs;
SmallVector<Triple::OSType, 4> UnsupportedOSs;		SmallVector<Triple::OSType, 4> UnsupportedOSs;
SmallVector<Triple::EnvironmentType, 1> UnsupportedEnvironments;		SmallVector<Triple::EnvironmentType, 1> UnsupportedEnvironments;

protected:		protected:
// This is intended for platform as a temporary "XFAIL"		// This is intended for platform as a temporary "XFAIL"
bool isUnsupportedOSOrEnvironment() {		bool isUnsupportedOSOrEnvironment() {
Triple Host(Triple::normalize(sys::getProcessTriple()));		Triple Host(Triple::normalize(sys::getProcessTriple()));

if (find(UnsupportedEnvironments, Host.getEnvironment()) !=		if (find(UnsupportedEnvironments, Host.getEnvironment()) !=
UnsupportedEnvironments.end())		UnsupportedEnvironments.end())
return true;		return true;
(12 unchanged lines not shown)	ThreadPoolTest() {
// UnsupportedArchs.push_back(Triple::x86_64);		// UnsupportedArchs.push_back(Triple::x86_64);

// See https://llvm.org/bugs/show_bug.cgi?id=25829		// See https://llvm.org/bugs/show_bug.cgi?id=25829
UnsupportedArchs.push_back(Triple::ppc64le);		UnsupportedArchs.push_back(Triple::ppc64le);
UnsupportedArchs.push_back(Triple::ppc64);		UnsupportedArchs.push_back(Triple::ppc64);
}		}

/// Make sure this thread does not progress faster than the main thread.		/// Make sure this thread does not progress faster than the main thread.
void waitForMainThread() {		void waitForMainThread() { waitForPhase(1); }
std::unique_lock<std::mutex> LockGuard(WaitMainThreadMutex);
WaitMainThread.wait(LockGuard, [&] { return MainThreadReady; });
}

/// Set the readiness of the main thread.		/// Set the readiness of the main thread.
void setMainThreadReady() {		void setMainThreadReady() { setPhase(1); }

		/// Wait until given phase is set using setPhase(); first "main" phase is 1.
		/// See also PhaseResetHelper below.
		void waitForPhase(int Phase) {
		std::unique_lock<std::mutex> LockGuard(CurrentPhaseMutex);
		CurrentPhaseCondition.wait(
		LockGuard, [&] { return CurrentPhase == Phase || CurrentPhase < 0; });
		}
		/// If a thread waits on another phase, the test could bail out on a failed
		/// assertion and ThreadPool destructor would wait() on all threads, which
		/// would deadlock on the task waiting. Create this helper to automatically
		/// reset the phase and unblock such threads.
		struct PhaseResetHelper {
		PhaseResetHelper(ThreadPoolTest *test) : test(test) {}
		~PhaseResetHelper() { test->setPhase(-1); }
		ThreadPoolTest *test;
		};

		/// Advance to the given phase.
		void setPhase(int Phase) {
{		{
std::unique_lock<std::mutex> LockGuard(WaitMainThreadMutex);		std::unique_lock<std::mutex> LockGuard(CurrentPhaseMutex);
MainThreadReady = true;		assert(Phase == CurrentPhase + 1 || Phase < 0);
		CurrentPhase = Phase;
}		}
WaitMainThread.notify_all();		CurrentPhaseCondition.notify_all();
}		}

void SetUp() override { MainThreadReady = false; }		void SetUp() override { CurrentPhase = 0; }

std::vector<llvm::BitVector> RunOnAllSockets(ThreadPoolStrategy S);		std::vector<llvm::BitVector> RunOnAllSockets(ThreadPoolStrategy S);

std::condition_variable WaitMainThread;		std::condition_variable CurrentPhaseCondition;
std::mutex WaitMainThreadMutex;		std::mutex CurrentPhaseMutex;
bool MainThreadReady = false;		int CurrentPhase; // -1 = error, 0 = setup, 1 = ready, 2+ = custom
};		};
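The fixture's phase helpers amount to a small condition-variable gate: waiters block until an exact phase is reached, and a negative phase releases everyone so a failed assertion cannot leave tasks blocked forever. A stand-alone sketch under that reading, using hypothetical names (`PhaseGate`, `PhaseReset`) that are not part of the test file:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-alone version of the fixture's phase mechanism.
class PhaseGate {
public:
  // Block until the given phase is reached, or until the gate is reset to
  // an error state (negative phase), so that no thread waits forever.
  void waitForPhase(int Phase) {
    std::unique_lock<std::mutex> Lock(Mutex);
    Cond.wait(Lock, [&] { return Current == Phase || Current < 0; });
  }

  // Advance to the next phase (or to -1 on error) and wake all waiters.
  void setPhase(int Phase) {
    {
      std::lock_guard<std::mutex> Lock(Mutex);
      assert(Phase == Current + 1 || Phase < 0);
      Current = Phase;
    }
    Cond.notify_all();
  }

private:
  std::mutex Mutex;
  std::condition_variable Cond;
  int Current = 0; // 0 = setup, 1 = ready, 2+ = custom, -1 = error
};

// RAII helper in the spirit of PhaseResetHelper: leaving the scope early
// (e.g. after a failed ASSERT_*) releases every waiter.
struct PhaseReset {
  PhaseGate &Gate;
  ~PhaseReset() { Gate.setPhase(-1); }
};

// Demo: thread B can only proceed after thread A has advanced the phase.
std::vector<int> runPhases() {
  PhaseGate Gate;
  std::mutex LogMutex;
  std::vector<int> Log;
  auto Record = [&](int V) {
    std::lock_guard<std::mutex> Lock(LogMutex);
    Log.push_back(V);
  };
  std::thread A([&] { Gate.waitForPhase(1); Record(1); Gate.setPhase(2); });
  std::thread B([&] { Gate.waitForPhase(2); Record(2); });
  Gate.setPhase(1);
  A.join();
  B.join();
  return Log;
}

// Demo: a waiter for a phase that never comes is released by the reset.
bool releasedByReset() {
  PhaseGate Gate;
  bool Released = false;
  std::thread Waiter([&] { Gate.waitForPhase(5); Released = true; });
  { PhaseReset Reset{Gate}; } // Scope exit sets the phase to -1.
  Waiter.join();
  return Released;
}
```

As in the fixture, the predicate tests `Current == Phase` rather than `>=`, so a waiter must not fall behind by more than one phase; the sketch inherits that constraint.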

#define CHECK_UNSUPPORTED() \		#define CHECK_UNSUPPORTED() \
do { \		do { \
if (isUnsupportedOSOrEnvironment()) \		if (isUnsupportedOSOrEnvironment()) \
GTEST_SKIP(); \		GTEST_SKIP(); \
} while (0);		} while (0);

(100 unchanged lines not shown)	for (size_t i = 0; i < 5; ++i) {
});		});
}		}
ASSERT_EQ(0, checked_in);		ASSERT_EQ(0, checked_in);
setMainThreadReady();		setMainThreadReady();
}		}
ASSERT_EQ(5, checked_in);		ASSERT_EQ(5, checked_in);
}		}

		// Check running tasks in different groups.
		TEST_F(ThreadPoolTest, Groups) {
		CHECK_UNSUPPORTED();
		// Need at least two threads, as the task in group2
		// might block a thread until all tasks in group1 finish.
		ThreadPoolStrategy S = hardware_concurrency(2);
		if (S.compute_thread_count() < 2)
		return;
		ThreadPool Pool(S);
		PhaseResetHelper Helper(this);
		ThreadPoolTaskGroup Group1(Pool);
		ThreadPoolTaskGroup Group2(Pool);

		// Check that waiting for an empty group is a no-op.
		Group1.wait();

		std::atomic_int checked_in1{0};
		std::atomic_int checked_in2{0};

		for (size_t i = 0; i < 5; ++i) {
		Group1.async([this, &checked_in1] {
		waitForMainThread();
		++checked_in1;
		});
		}
		Group2.async([this, &checked_in2] {
		waitForPhase(2);
		++checked_in2;
		});
		ASSERT_EQ(0, checked_in1);
		ASSERT_EQ(0, checked_in2);
		// Start first group and wait for it.
		setMainThreadReady();
		Group1.wait();
		ASSERT_EQ(5, checked_in1);
		// Second group has not yet finished, start it and wait for it.
		ASSERT_EQ(0, checked_in2);
		setPhase(2);
		Group2.wait();
		ASSERT_EQ(5, checked_in1);
		ASSERT_EQ(1, checked_in2);
		}
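The Groups test relies on two groups sharing one pool while being waited for separately. One way to get that behavior, sketched with only the standard library and hypothetical names (`TaskGroup`, `GroupedPool`), is to keep a per-group count of queued-or-running tasks next to the shared queue. This is a simplified illustration, not the ThreadPool implementation from the patch, and it deliberately does not support calling wait() from inside a task.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical group tag; in the patch the real type is ThreadPoolTaskGroup.
struct TaskGroup {};

// One queue and one set of workers shared by all groups, plus a per-group
// count of queued-or-running tasks. wait(G) blocks only until G's count is 0.
class GroupedPool {
public:
  explicit GroupedPool(unsigned N) {
    for (unsigned I = 0; I < N; ++I)
      Workers.emplace_back([this] { work(); });
  }
  ~GroupedPool() {
    {
      std::lock_guard<std::mutex> L(M);
      Stop = true;
    }
    TaskCV.notify_all();
    for (std::thread &T : Workers)
      T.join();
  }
  void async(TaskGroup &G, std::function<void()> F) {
    {
      std::lock_guard<std::mutex> L(M);
      ++Pending[&G];
      Queue.emplace_back(std::move(F), &G);
    }
    TaskCV.notify_one();
  }
  // Waiting for a group with no tasks returns immediately.
  void wait(TaskGroup &G) {
    std::unique_lock<std::mutex> L(M);
    DoneCV.wait(L, [&] { return Pending[&G] == 0; });
  }

private:
  void work() {
    std::unique_lock<std::mutex> L(M);
    while (true) {
      TaskCV.wait(L, [&] { return Stop || !Queue.empty(); });
      if (Queue.empty())
        return; // Stopping and nothing left to run.
      std::function<void()> F = std::move(Queue.front().first);
      TaskGroup *G = Queue.front().second;
      Queue.pop_front();
      L.unlock();
      F(); // Run the task without holding the lock.
      L.lock();
      --Pending[G];
      DoneCV.notify_all(); // Every wait() rechecks its own group's count.
    }
  }

  std::mutex M;
  std::condition_variable TaskCV, DoneCV;
  std::deque<std::pair<std::function<void()>, TaskGroup *>> Queue;
  std::map<TaskGroup *, int> Pending;
  std::vector<std::thread> Workers;
  bool Stop = false;
};

// Demo mirroring the Groups test: B's task spins until released, yet
// wait(A) still returns once A's five tasks are done.
bool groupsAreIndependent() {
  TaskGroup A, B;
  std::atomic<int> DoneA{0};
  std::atomic<bool> ReleaseB{false};
  GroupedPool Pool(2);
  for (int I = 0; I < 5; ++I)
    Pool.async(A, [&] { ++DoneA; });
  Pool.async(B, [&] {
    while (!ReleaseB)
      std::this_thread::yield();
  });
  Pool.wait(A);
  bool Ok = DoneA == 5;
  ReleaseB = true;
  Pool.wait(B);
  return Ok;
}
```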

		// Check recursive tasks.
		TEST_F(ThreadPoolTest, RecursiveGroups) {
		CHECK_UNSUPPORTED();
		ThreadPool Pool;
		ThreadPoolTaskGroup Group(Pool);

		std::atomic_int checked_in1{0};

		for (size_t i = 0; i < 5; ++i) {
		Group.async([this, &Pool, &checked_in1] {
		waitForMainThread();

		ThreadPoolTaskGroup LocalGroup(Pool);

		// Check that waiting for an empty group is a no-op.
		LocalGroup.wait();

		std::atomic_int checked_in2{0};
		for (size_t i = 0; i < 5; ++i) {
		LocalGroup.async([&checked_in2] { ++checked_in2; });
		}
		LocalGroup.wait();
		ASSERT_EQ(5, checked_in2);

		++checked_in1;
		});
		}
		ASSERT_EQ(0, checked_in1);
		setMainThreadReady();
		Group.wait();
		ASSERT_EQ(5, checked_in1);
		}
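RecursiveGroups waits for a group from inside a task. If that task simply blocked, a pool whose workers are all inside such tasks would have nobody left to run the children. A common remedy, sketched here under that assumption with hypothetical names (`TaskGroup`, `HelpingPool`), is for wait(G) to run queued tasks of G itself and sleep only when G's remaining tasks are already running elsewhere. The patch's own handling lives in ThreadPool::processTasks(), which this excerpt does not show; in this simplified sketch any thread calling wait(G) helps.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical group tag standing in for ThreadPoolTaskGroup.
struct TaskGroup {};

// One condition variable signals both new work and finished tasks.
class HelpingPool {
  using Entry = std::pair<std::function<void()>, TaskGroup *>;

public:
  explicit HelpingPool(unsigned N) {
    for (unsigned I = 0; I < N; ++I)
      Workers.emplace_back([this] { work(); });
  }
  ~HelpingPool() {
    {
      std::lock_guard<std::mutex> L(M);
      Stop = true;
    }
    CV.notify_all();
    for (std::thread &T : Workers)
      T.join();
  }
  void async(TaskGroup &G, std::function<void()> F) {
    {
      std::lock_guard<std::mutex> L(M);
      ++Pending[&G];
      Queue.emplace_back(std::move(F), &G);
    }
    CV.notify_all();
  }
  // While G has unfinished tasks, run queued tasks of G on this thread.
  void wait(TaskGroup &G) {
    std::unique_lock<std::mutex> L(M);
    while (Pending[&G] != 0) {
      auto It = std::find_if(Queue.begin(), Queue.end(),
                             [&](const Entry &E) { return E.second == &G; });
      if (It == Queue.end()) {
        CV.wait(L); // G's remaining tasks run elsewhere; wait for news.
        continue;
      }
      Entry E = std::move(*It);
      Queue.erase(It);
      run(L, E);
    }
  }

private:
  // Run a task unlocked, then record completion and wake all waiters.
  void run(std::unique_lock<std::mutex> &L, Entry &E) {
    L.unlock();
    E.first();
    L.lock();
    --Pending[E.second];
    CV.notify_all();
  }
  void work() {
    std::unique_lock<std::mutex> L(M);
    while (true) {
      CV.wait(L, [&] { return Stop || !Queue.empty(); });
      if (Queue.empty())
        return;
      Entry E = std::move(Queue.front());
      Queue.pop_front();
      run(L, E);
    }
  }

  std::mutex M;
  std::condition_variable CV;
  std::deque<Entry> Queue;
  std::map<TaskGroup *, int> Pending;
  std::vector<std::thread> Workers;
  bool Stop = false;
};

// Demo: with one worker, a task waiting on its child group would hang under
// a plain blocking wait; here the waiting thread runs the children itself.
int nestedOnOneWorker() {
  TaskGroup Outer;
  std::atomic<int> Children{0};
  HelpingPool Pool(1);
  Pool.async(Outer, [&] {
    TaskGroup Local;
    for (int I = 0; I < 5; ++I)
      Pool.async(Local, [&] { ++Children; });
    Pool.wait(Local);
  });
  Pool.wait(Outer);
  return Children;
}
```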

		TEST_F(ThreadPoolTest, RecursiveWaitDeadlock) {
		CHECK_UNSUPPORTED();
		ThreadPoolStrategy S = hardware_concurrency(2);
		if (S.compute_thread_count() < 2)
		return;
		ThreadPool Pool(S);
		PhaseResetHelper Helper(this);
		ThreadPoolTaskGroup Group(Pool);

		// Test that a thread which called wait() for a group, and is blocked waiting
		// for more tasks to process, returns when the group's last task finishes in
		// a different thread.

		// Task A is run in a first thread, it finishes and leaves
		MaskRay: Task A runs in the first thread. It
		// the background thread waiting for more tasks.
		Group.async([this] {
		waitForMainThread();
		setPhase(2);
		});
		// Task B is run in a second thread, it launches yet another
		// task C in a different group, which will be handled by the waiting
		// thread started above.
		Group.async([this, &Pool] {
		waitForPhase(2);
		ThreadPoolTaskGroup LocalGroup(Pool);
		LocalGroup.async([this] {
		waitForPhase(3);
		// Give the other thread enough time to check that there's no task
		// to process and suspend waiting for a notification. This is indeed racy,
		// but probably the best that can be done.
		auto start = std::chrono::system_clock::now();
		while (std::chrono::system_clock::now() - start <
		std::chrono::milliseconds(10))
		;
		MaskRay: What does this do? Use std::this_thread::sleep_for?
		});
		// And task B only now will wait for the tasks in the group (=task C)
		// to finish. This test checks that it does not deadlock. If the
		// `NotifyGroup` handling in ThreadPool::processTasks() didn't take place,
		// this task B would be stuck waiting for tasks to arrive.
		setPhase(3);
		LocalGroup.wait();
		});
		setMainThreadReady();
		Group.wait();
		}
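The deadlock this test guards against is a missed wakeup: a thread inside wait() finds nothing to run and goes to sleep, and then the group's last task finishes on another thread. The minimal sketch below (hypothetical `PoolState` and helper names, not the ThreadPool internals) shows the two ingredients that avoid it. The sleeping thread's predicate includes "my group is done", and the finishing thread updates the count under the lock and then notifies. For the delay it uses `std::this_thread::sleep_for`, as MaskRay suggested in place of the busy-wait loop.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical shared state of a pool, reduced to what matters here.
struct PoolState {
  std::mutex M;
  std::condition_variable CV;
  int QueuedTasks = 0;    // Tasks waiting to be picked up.
  int PendingInGroup = 1; // Queued or running tasks of the waited-for group.
};

// A thread blocked in wait(group) with nothing to process sleeps until new
// work arrives OR its group is finished. Without the second condition it
// would sleep forever once the last task completes on another thread.
void waitForGroupOrWork(PoolState &S) {
  std::unique_lock<std::mutex> L(S.M);
  S.CV.wait(L, [&] { return S.QueuedTasks > 0 || S.PendingInGroup == 0; });
}

// Another thread finishes the group's last task and wakes the waiter.
void finishLastTask(PoolState &S) {
  // Give the waiter time to block first, like the 10 ms delay in the test.
  // Correctness does not depend on it: the predicate is checked under the lock.
  std::this_thread::sleep_for(std::chrono::milliseconds(10));
  {
    std::lock_guard<std::mutex> L(S.M);
    --S.PendingInGroup;
  }
  S.CV.notify_all();
}

bool waiterWakesUp() {
  PoolState S;
  std::thread Finisher([&] { finishLastTask(S); });
  waitForGroupOrWork(S); // Returns once PendingInGroup reaches 0.
  Finisher.join();
  return S.PendingInGroup == 0;
}
```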

#if LLVM_ENABLE_THREADS == 1		#if LLVM_ENABLE_THREADS == 1

// FIXME: Skip some tests below on non-Windows because multi-socket systems		// FIXME: Skip some tests below on non-Windows because multi-socket systems
// were not fully tested on Unix yet, and llvm::get_thread_affinity_mask()		// were not fully tested on Unix yet, and llvm::get_thread_affinity_mask()
// isn't implemented for Unix (need AffinityMask in Support/Unix/Program.inc).		// isn't implemented for Unix (need AffinityMask in Support/Unix/Program.inc).
#ifdef _WIN32		#ifdef _WIN32

std::vector<llvm::BitVector>		std::vector<llvm::BitVector>
(95 unchanged lines not shown)


Diff 423244
