This is an archive of the discontinued LLVM Phabricator instance.

[ThreadPool] add ability to group tasks into separate groups
ClosedPublic

Authored by llunak on Apr 6 2022, 8:29 AM.

Details

Summary

This is needed for parallelizing the loading of module symbols in LLDB (D122975). Currently LLDB can parallelize indexing symbols when loading a module, but modules are loaded sequentially. If the LLDB index cache is enabled, this means that the cache loading is not parallelized, even though it could be. However, doing that creates a threadpool-within-threadpool situation, so the number of threads would not be properly limited.

This change adds ThreadPoolTaskGroup as a tag type that can be used with ThreadPool calls to put tasks into groups that can be independently waited for (even recursively from within a task) but still run in the same thread pool.
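For illustration, a minimal usage sketch of the API as described here and in the later revisions of the patch (the work done inside the tasks is just a placeholder):

#include "llvm/Support/ThreadPool.h"
#include "llvm/Support/Threading.h"

using namespace llvm;

int main() {
  ThreadPool Pool(hardware_concurrency());

  // Tasks tagged with a group can be waited for independently of the rest
  // of the pool's work, even from within another task running in the pool.
  ThreadPoolTaskGroup Group(Pool);
  for (int I = 0; I < 4; ++I)
    Group.async([I] { /* e.g. load one module's index cache */ (void)I; });

  Group.wait(); // waits only for the tasks queued in this group
  Pool.wait();  // waits for all tasks in the pool
}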

Diff Detail

Event Timeline

llunak created this revision.Apr 6 2022, 8:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 6 2022, 8:29 AM
llunak requested review of this revision.Apr 6 2022, 8:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 6 2022, 8:29 AM
llunak updated this revision to Diff 420890.Apr 6 2022, 8:35 AM
aganea added a subscriber: aganea.

Not sure @chandlerc will be able to review your patch, adding some people that have been working in this space recently.

I think this is an interesting change, but I'm a bit worried that it adds complexity to the thread task loop. I am wondering if this problem couldn't be solved by packaging the TaskGroup logic in a lambda. In essence the call stack would be: the thread loop in ThreadPool::grow() calls Task() which would call the TaskGroup logic lambda, which would call the user lambda. Regular non-TaskGroup would not go through that logic.

llvm/include/llvm/Support/ThreadPool.h
52

Can you move this above the ThreadPool? It'll be easier for future code readers I think.

61

Do you think we could have all these TaskGroup-specific functions inside the TaskGroup class instead? As an alternative, given your current usage, the tasks could even be queued in a std::vector beforehand and passed to the TaskGroup constructor.

llvm/tools/llvm-profdata/llvm-profdata.cpp
41

Is this related to the current patch?

I like the overall direction of this patch. I do see value in aganea's comments about possibly adding more methods to TaskGroup.

llvm/include/llvm/Support/ThreadPool.h
61

I do like that idea, but if we do that it might be best to have TaskGroup take a "ThreadPool &" as a constructor argument so that it can ensure we always use the right ThreadPool if we ask the TaskGroup to wait.

Do we need to lock down the task group once "wait()" is called with a TaskGroup or can users keep adding things to the TaskGroup even if work is currently going on? Should we freeze the contents of a TaskGroup once we start waiting on this?

That's a pretty nice improvement! :)

Reading the patch, I'm not sure I have a good grasp about all the interactions of this with the existing invariants of this class: concurrency is always complex...

In essence the call stack would be: the thread loop in ThreadPool::grow() calls Task() which would call the TaskGroup logic lambda, which would call the user lambda. Regular non-TaskGroup would not go through that logic.

I don't quite get what you mean? Can you elaborate a bit?
I agree that the complexity increase is worrisome, so any idea to manage it is welcome :)

llvm/include/llvm/Support/ThreadPool.h
89

Do we have a way to assert on this?

101

I think you need more documentation for the groups, at a minimum:

  • in particular ownership model / lifetime expectation for the TaskGroup objects.
  • the existing APIs that don't take a group should likely be updated to be documented with respect to groups (is there a concept of a "default group" represented by the nullptr?).
103

Why the change to deque in this patch?

131–133

Can you document it please?

llvm/lib/Support/ThreadPool.cpp
98

Seems like if GroupOfTask is non-null you're calling workCompletedUnlocked twice, why?

104

" notify also threads waiting" ?

105

What does "this function" means here?

118

Nit: seems like the find_if(...) == end() could be replaced by !llvm::any_of(...)
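For illustration, the rewritten check could look roughly like this (the container of (callable, group) pairs and the field names are assumptions based on the rest of this review, not the exact code in the diff; llvm::any_of comes from llvm/ADT/STLExtras.h):

bool NoTasksQueuedForGroup =
    !llvm::any_of(Tasks, [&](const auto &T) { return T.second == &Group; });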

llvm/tools/llvm-profdata/llvm-profdata.cpp
41

I suspect this file was depending on this header transitively and the include was removed from the ThreadPool.h header.

aganea added a comment.Apr 7 2022, 5:42 AM

In essence the call stack would be: the thread loop in ThreadPool::grow() calls Task() which would call the TaskGroup logic lambda, which would call the user lambda. Regular non-TaskGroup would not go through that logic.

I don't quite get what you mean? Can you elaborate a bit?
I agree that the complexity increase is worrisome, so any idea to manage it is welcome :)

I'm just saying that we shouldn't be modifying ThreadPool::grow and instead implement the TaskGroup logic in an intermediate function, if that's possible. Instead of this currently:

template <typename Func>
auto async(TaskGroup &Group, Func &&F) -> std::shared_future<decltype(F())> {
  return asyncImpl(std::function<decltype(F())()>(std::forward<Func>(F)),
                   &Group);
}

instead we could have this:

template <typename Func>
auto TaskGroup::async(TaskGroup &Group, Func &&F) -> std::shared_future<decltype(F())> {
  return asyncImpl([]() {
     // ...TaskGroup logic...
     F();
     // ...TaskGroup logic... (for example, notify the local TaskGroup condition when all group tasks are done)
  }, &Group);
}

thus it makes sense to make async() and wait() members of TaskGroup and store the necessary state in the TaskGroup object itself (such as a task counter -- currently stored in the DenseMap<TaskGroup *, unsigned> ActiveGroups, a condition variable, etc.).
I feel the TaskGroup's logic is a subset of the generalized case we already have.

llvm/include/llvm/Support/ThreadPool.h
61

@clayborg Right now the freezing is implicit through the use of a condition_variable & a mutex, see llvm/lib/Support/ThreadPool.cpp, L124. It is an interesting question: should we make it support dynamic additions on the fly, while waiting? I would be tempted to wait for a practical use-case, but perhaps there's already one?

llvm/tools/llvm-profdata/llvm-profdata.cpp
41

Ah, good catch, thanks :-)

llunak added a comment.Apr 7 2022, 9:13 AM

Let's first deal with the conceptual stuff; there's no point in dealing with the small code things as long as there's no agreement that this is the way to implement it.

Not sure @chandlerc will be able to review your patch, adding some people that have been working in this space recently.

I used what CODE_OWNERS.txt lists for 'Support'.

I'm just saying that we shouldn't be modifying ThreadPool::grow and instead implement the TaskGroup logic in an intermediate function, if that's possible.

I think that would be possible only with non-recursive use of ThreadPool. But D122975 requires running tasks that themselves run tasks and wait for them, which requires two things:

  • Waiting just for a specific subset of tasks, otherwise a task could deadlock waiting for itself. This requires the queue-processing code to signal such a state.
  • Not wasting thread pool slots on waiting, otherwise they could all end up waiting for tasks that wouldn't have slots to run and deadlock. My approach handles that by letting such threads process tasks while waiting for a group. Another approach could be launching additional temporary threads, but I don't see how that would make anything simpler or better.

I don't see how either of these could be done without altering the queue-processing code in ThreadPool itself. So unless you have a good idea there, I think the most I can move to TaskGroup is syntactic sugar.
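To make the second point concrete, here is a standalone sketch of the idea (illustrative code with invented names, not the patch itself; the pool's worker threads are omitted for brevity): a thread that waits for a group keeps executing queued tasks instead of blocking its slot.

#include <condition_variable>
#include <deque>
#include <functional>
#include <map>
#include <mutex>
#include <utility>

struct Group {};

class MiniPool {
  std::mutex Lock;
  std::condition_variable Cond;
  std::deque<std::pair<std::function<void()>, Group *>> Queue;
  std::map<Group *, unsigned> Pending; // not-yet-finished tasks per group

public:
  void async(Group &G, std::function<void()> F) {
    std::lock_guard<std::mutex> Guard(Lock);
    Queue.emplace_back(std::move(F), &G);
    ++Pending[&G];
    Cond.notify_all();
  }

  // Wait until all tasks of G have finished. May be called from inside a
  // task: instead of blocking the slot, the caller helps by running queued
  // tasks (from any group) until G has nothing pending.
  void wait(Group &G) {
    std::unique_lock<std::mutex> Guard(Lock);
    while (Pending[&G] != 0) {
      if (!Queue.empty()) {
        auto Task = std::move(Queue.front());
        Queue.pop_front();
        Guard.unlock();
        Task.first(); // run other queued work while "waiting"
        Guard.lock();
        if (--Pending[Task.second] == 0)
          Cond.notify_all(); // wake anyone waiting for that group
      } else {
        Cond.wait(Guard); // nothing runnable: block until the state changes
      }
    }
  }
};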

clayborg added inline comments.Apr 7 2022, 10:06 AM
llvm/include/llvm/Support/ThreadPool.h
101

That would be good. Maybe add headerdoc before the TaskGroup class. If we end up making the TaskGroup constructor take a reference to the ThreadPool, we should mention that the ThreadPool must outlive the TaskGroup as far as lifetime goes. More documentation or examples would be nice for:

  • Making a TaskGroup 1 and then during work for TaskGroup 1 creating a TaskGroup 2 and then waiting on that within a worker thread
  • details on if you can add to a group while work is ongoing for that group
103

This is the new code that is adding the ability to run work in the groups

192

Is this needed now that we have the TaskGroup objects? Or does this get signaled whenever all TaskGroups complete all of their work? Maybe update the documentation stating this is used for ThreadPool::wait() only for when all work is done?

llvm/lib/Support/ThreadPool.cpp
54

initialize to nullptr

98

Yeah, this seems like it should just be:

NotifyGroup = GroupOfTask != nullptr;
135–137

Can we detect recursive calls to wait on the same group here?

clayborg added inline comments.Apr 7 2022, 10:09 AM
llvm/include/llvm/Support/ThreadPool.h
103

Ignore my comment, I see that this type was used below where a queue was being used.

llunak marked 4 inline comments as done.Apr 16 2022, 10:46 AM
llunak added inline comments.
llvm/include/llvm/Support/ThreadPool.h
61

As an alternative, given your current usage, the tasks could even be queued in a std::vector beforehand and passed to the TaskGroup constructor.

Why? That seems unnecessary and inconsistent with how ThreadPool is used now.

61

Do we need to lock down the task group once "wait()" is called with a TaskGroup or can users keep adding things to the TaskGroup even if work is currently going on? Should we freeze the contents of a TaskGroup once we start waiting on this?

I see no reason to do it differently from ThreadPool, and ThreadPool currently says it's an error to add tasks while waiting but then does nothing about it (and AFAICT it's actually an unnecessary restriction).

89

Not without extra variables added just for detecting that. Is that something that would be done in LLVM code?

101

I'm confused about this part about documentation, because I think all of this is either documented or obvious. Did you miss those or are they not as obvious as I find them to be? It seems to me that you expect this change to be more complex than it is - it's just an ability to tag tasks with a TaskGroup pointer and then wait on tasks with a specific tag. The only somewhat complex thing here is making sure wait() called from within a task doesn't deadlock, and that's not an API thing.

  • in particular ownership model / lifetime expectation for the TaskGroup objects.

The TaskGroup objects are passed by reference, so I don't see how there could be any ownership. And TaskGroup objects are groups, so lifetime of TaskGroup objects and lifetime of groups are the same thing.

  • the existing APIs that don't take a group should likely be updated to be documented with respect to groups (is there a concept of a "default group" represented by the nullptr?).

Tasks without a group are tasks without a group. It really matters only for wait(), which already says that it waits for all threads.

  • Making a TaskGroup 1 and then during work for TaskGroup 1 creating a TaskGroup 2 and then waiting on that within a worker thread

This is done by simply doing it, there's nothing special about it from the API point of view, and wait() documentation says that this is possible. What more documentation would you need?

  • details on if you can add to a group while work is ongoing for that group

The description of wait() already says no. What other details do you need?

103

std::queue has only front() and back(), which is insufficient for checking only tasks in a specific group.

192

Yes, it is still needed. TaskGroups are dumb IDs, so they change nothing about this. This gets signalled either when all tasks in a group are finished, or when all tasks are finished (I'll add this to the description).

llvm/lib/Support/ThreadPool.cpp
54

It gets set on all paths (= only one) before it's used, and if it weren't, initializing it here would prevent -Wsometimes-uninitialized from warning about the mistake.

98

Because 'Notify' is to be done if work is done for the group or for all tasks (nullptr), while 'NotifyGroup' is to be done if the work is done for the group. But I can replace the second call with 'Notify' since it's the same value.
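In other words, the relationship being described is roughly (a paraphrase of the logic under discussion, not necessarily the literal lines of the diff):

// GroupOfTask is null for tasks queued without a group.
bool Notify = workCompletedUnlocked(GroupOfTask);    // group finished, or (for null) all work finished
bool NotifyGroup = GroupOfTask != nullptr && Notify; // only meaningful for grouped tasks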

135–137

Not without extra debug variables (see a similar question for wait()). Unless you count a guaranteed deadlock as detecting.

llvm/tools/llvm-profdata/llvm-profdata.cpp
41

Yes.

llunak updated this revision to Diff 423244.Apr 16 2022, 10:47 AM
llunak edited the summary of this revision. (Show Details)

Small tweaks based on feedback.
Changed ThreadPool::TaskGroup to standalone ThreadPoolTaskGroup that has trivial calls forwarding to ThreadPool functions.

I like the change to move some of the API onto the groups themselves! In particular, waiting in the destructor makes it "safer" to me (can't have dangling group pointers in the map in the pool).

llvm/include/llvm/Support/ThreadPool.h
38

Adding a mention of the concept of groups here would be valuable as well I think.

89

Yes, it is commonly done; we guard such code in headers within #if LLVM_ENABLE_ABI_BREAKING_CHECKS. You can grep the code base for examples (here is one: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Analysis/LoopInfo.h#L83-L87 )
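For reference, the pattern looks roughly like this (a hypothetical illustration; the class, member, and assert message below are invented, not taken from the patch):

#include "llvm/Config/abi-breaking.h"
#include <cassert>

class SomeGroup {
public:
  void beginWait() {
#if LLVM_ENABLE_ABI_BREAKING_CHECKS
    // Debug-only bookkeeping, compiled out (and excluded from the ABI)
    // when the checks are disabled.
    assert(!Waiting && "wait() re-entered on the same group");
    Waiting = true;
#endif
  }

private:
#if LLVM_ENABLE_ABI_BREAKING_CHECKS
  bool Waiting = false;
#endif
};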

101

I'm confused about this part about documentation, because I think all of this is either documented or obvious. Did you miss those or are they not as obvious as I find them to be?

I think there is always a natural tendency to find things "obvious" when we design something with a mental model and other assumptions in mind. But someone coming to it with fresh eyes won't find things as obvious.

in particular ownership model / lifetime expectation for the TaskGroup objects.
The TaskGroup objects are passed by reference, so I don't see how there could be any ownership. And TaskGroup objects are groups, so lifetime of TaskGroup objects and lifetime of groups are the same thing.

Right, but that has implications for the lifetime: for example, you shouldn't destroy a group while there are still tasks in flight using this group. I think this particular point isn't a concern anymore now that you call wait() in the ThreadPoolTaskGroup destructor.

Making a TaskGroup 1 and then during work for TaskGroup 1 creating a TaskGroup 2 and then waiting on that within a worker thread
This is done by simply doing it, there's nothing special about it from the API point of view, and wait() documentation says that this is possible. What more documentation would you need?

Waiting within a worker thread wasn't allowed until now: it is again more of a question of forming a mental model about the system.
Adding this concept of groups (which I welcome: it is solving a major limitation of the pool) is really making the existing model of a simple ThreadPool no longer as "simple", hence why I may seem overly cautious :)

Some things can be surprising as well for someone who does not really know or think about the implementation details, for example something like this:

group1.async([&]() {
  auto f = group2.async([&]() {
    return 1;
  });
  f.wait(); // May deadlock
});

but:

group1.async([&]() {
  auto f = group2.async([&]() {
    return 1;
  });
  group2.wait();  // yield current thread to run tasks from group2 if needed
  f.wait(); // Will not deadlock
});

Not that I have a great suggestion on how to express the system and its invariants concisely, but it isn't easy to figure all of this out right now without carefully reading the implementation.
(and to be fair, even the "simple" pre-existing model isn't carefully documented in the API)

llunak marked an inline comment as done.Apr 18 2022, 10:29 AM
llunak added inline comments.
llvm/include/llvm/Support/ThreadPool.h
89

I see. I've added an assert for self-wait, but I actually see no good reason to enforce that there can be only one wait() for a group, so I'll instead remove that comment.

101

I think the concept is still fairly simple, as long as one doesn't forget that waiting for oneself will deadlock. But I have added some comments saying this explicitly; I hope that's enough.

llunak updated this revision to Diff 423425.Apr 18 2022, 10:32 AM

Added asserts that wait() will not deadlock waiting for itself.
Added more documentation about deadlocks and usage from within threadpool's threads.

LGTM, but please wait for @aganea to have another look at it.

mehdi_amini accepted this revision.Apr 18 2022, 10:35 AM
This revision is now accepted and ready to land.Apr 18 2022, 10:35 AM

Looks great, thanks for implementing many changes!

MaskRay added inline comments.Apr 18 2022, 9:28 PM
llvm/include/llvm/Support/ThreadPool.h
103

Prefer using

153

Use emplace_back

166

Use emplace_back

llvm/unittests/Support/ThreadPool.cpp
307

Task A runs in the first thread. It

MaskRay added inline comments.Apr 18 2022, 9:46 PM
llvm/include/llvm/Support/ThreadPool.h
61

The inline keyword can be removed.

llvm/lib/Support/ThreadPool.cpp
72

workCompletedUnlocked(WaitingForGroup) is slow when WaitingForGroup is not null. You may check the inverse of the other conditions first, or cache the result of workCompletedUnlocked(WaitingForGroup).
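For illustration, the caching could look something like this (a standalone sketch with invented names, showing the shape of the predicate rather than the patch's actual members):

#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>

struct WaitState {
  std::mutex QueueLock;
  std::condition_variable QueueCondition;
  std::deque<std::function<void()>> Tasks;
  bool EnableFlag = true;
};

// Returns whether the waited-for group was seen as completed. The cheap
// checks short-circuit the expensive GroupDone() call, and its result is
// cached instead of being recomputed after the wait returns.
bool waitForWork(WaitState &S, bool HasGroup,
                 const std::function<bool()> &GroupDone) {
  std::unique_lock<std::mutex> LockGuard(S.QueueLock);
  bool GroupCompleted = false;
  S.QueueCondition.wait(LockGuard, [&] {
    return !S.EnableFlag || !S.Tasks.empty() ||
           (HasGroup && (GroupCompleted = GroupDone()));
  });
  return GroupCompleted;
}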

llvm/unittests/Support/ThreadPool.cpp
327

What does this do? Use std::this_thread::sleep_for?

aganea added inline comments.Apr 19 2022, 12:13 PM
llvm/include/llvm/Support/ThreadPool.h
104

This is a bit confusing; we already have an llvm::TaskQueue. Can we change this to something else? Just Queue maybe?

220

Would you mind please inserting a line break here, and after each other function below, just to match the style in this file?

llvm/lib/Support/ThreadPool.cpp
53

Shouldn't this be a stack since we're re-entering processTasks()? The following might not assert:

int main() {
  ThreadPool TP{hardware_concurrency(1)};
  ThreadPoolTaskGroup G1(TP);
  ThreadPoolTaskGroup G2(TP);
  G1.async([&]{
    G2.wait(); // commenting this line would assert below.
    G1.wait(); // will deadlock without assert if the line above is there.
  });
  G2.async([&]{
    // nop
  });
  return 0;
}
142

I think there's a bug here; it was there before, but now we're using isWorkerThread() a lot more. If the ThreadPool is shutting down, and thus ThreadLock is locked in ThreadPool::~ThreadPool(), then if anyone calls .wait() in a managed thread, isWorkerThread() would deadlock. It would probably be better to replace this with a llvm::sys::RWMutex ThreadLock and use a std::shared_lock<> here instead.
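A rough sketch of that suggestion (assuming llvm::sys::RWMutex exposes the standard lock_shared()/unlock_shared() interface; the free-standing variables and function below are simplified stand-ins for the real members):

#include "llvm/Support/RWMutex.h"
#include <shared_mutex>
#include <thread>
#include <vector>

// Readers take a shared lock and do not block each other; only code that
// mutates the thread list takes the exclusive lock.
llvm::sys::RWMutex ThreadsLock;
std::vector<std::thread::id> WorkerIds;

bool isWorkerThread() {
  std::shared_lock<llvm::sys::RWMutex> Lock(ThreadsLock);
  for (std::thread::id Id : WorkerIds)
    if (Id == std::this_thread::get_id())
      return true;
  return false;
}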

MaskRay added inline comments.Apr 19 2022, 1:01 PM
llvm/include/llvm/Support/ThreadPool.h
104

We can even omit the type alias. I think this is only used once for the member variable.

Really excited for this! Thanks for taking it on.

aganea added inline comments.Apr 26 2022, 5:49 AM
llvm/lib/Support/ThreadPool.cpp
140

One more thing perhaps: tasks that are .wait()-ing will be "suspended" by re-entering processTasks. Any remaining tasks in the queue will be scheduled randomly on top of the waiting task(s), and this could easily create stack overflows, since the scheduling is non-deterministic (so we could have several waiting tasks piled on top of each other). Depending on the stack size per platform, and how much stack each task eats up, this could potentially cause random crashes in production.

Probably the less intrusive approach would be to use fibers for suspended tasks. I suppose we could do that as a secondary step after this patch.

llunak marked 10 inline comments as done.Apr 30 2022, 11:15 PM
llunak added inline comments.
llvm/lib/Support/ThreadPool.cpp
140

Yes, but this is going to cost some resource no matter what, so it's just a choice of what resource it will be. And note that the recursion will be limited to the number of groups, since waiting loops are not allowed.

llunak updated this revision to Diff 426268.Apr 30 2022, 11:16 PM

Updated according to review comments.

aganea accepted this revision.May 2 2022, 5:22 AM

LGTM, thanks for all the changes @llunak!

llvm/lib/Support/ThreadPool.cpp
140

I think it is fine for now; let's first see if this is really a problem in practice. Worst case, it could be fixed by ensuring that the scope calling .wait() doesn't hold on to too many stack variables.

MaskRay accepted this revision.May 2 2022, 2:08 PM

Looks great, really looking forward to seeing this go in!

llunak added a comment.May 3 2022, 3:55 AM

The "Build Status" here lists a failure, but I cannot reproduce any test failure locally, and in the remote log I do not see any test failure, it looks like the testsuite itself failing. Is that something I should ignore, or am I missing something?

llvm-lit: /var/lib/buildkite-agent/builds/llvm-project/llvm/utils/lit/lit/discovery.py:247: warning: test suite 'lld-Unit' contained no tests

llvm-lit: /var/lib/buildkite-agent/builds/llvm-project/llvm/utils/lit/lit/llvm/config.py:438: note: using clang: /usr/bin/clang

llvm-lit: /var/lib/buildkite-agent/builds/llvm-project/llvm/utils/lit/lit/llvm/config.py:438: note: using ld.lld: /usr/bin/ld.lld

Traceback (most recent call last):

  File "/var/lib/buildkite-agent/builds/llvm-project/build/./bin/llvm-lit", line 59, in <module>

    main(builtin_parameters)

  File "/var/lib/buildkite-agent/builds/llvm-project/llvm/utils/lit/lit/main.py", line 111, in main

    selected_tests, discovered_tests = GoogleTest.post_process_shard_results(

  File "/var/lib/buildkite-agent/builds/llvm-project/llvm/utils/lit/lit/formats/googletest.py", line 231, in post_process_shard_results

    testsuites = json.load(f)['testsuites']

  File "/usr/lib/python3.9/json/__init__.py", line 293, in load

    return loads(fp.read(),

I'm not sure what's going on; maybe rebase and re-upload the patch?

It may also be safe to ignore; it's not this patch that's responsible for this.

This revision was landed with ongoing or failed builds.May 3 2022, 9:19 PM
This revision was automatically updated to reflect the committed changes.