This is an archive of the discontinued LLVM Phabricator instance.

[Support][Parallel] Add sequential mode to TaskGroup::spawn().
ClosedPublic

Authored by avl on Apr 19 2023, 10:21 AM.

Details

Summary

This patch allows specifying that some of the tasks should be
executed in sequential order. It makes it possible to avoid using
a conditional operator to separate sequential tasks:

TaskGroup tg;
for (...) {
  if (condition)      ==>   tg.spawn([] { fn(); }, condition);
    fn();
  else
    tg.spawn([] { fn(); });
}

It also prevents execution on the main thread, which allows adding
checks for the getThreadIndex() function discussed in D142318.

The patch also replaces std::stack with std::queue in the
ThreadPoolExecutor to get a natural execution order when
parallel::strategy.ThreadsRequested == 1.

Diff Detail

Event Timeline

avl created this revision.Apr 19 2023, 10:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 19 2023, 10:21 AM
avl requested review of this revision.Apr 19 2023, 10:21 AM
avl edited the summary of this revision. (Show Details)Apr 19 2023, 10:21 AM
avl edited the summary of this revision. (Show Details)Apr 20 2023, 2:03 AM
MaskRay added inline comments.Apr 21 2023, 9:28 AM
llvm/lib/Support/Parallel.cpp
145

After Lock.unlock(), another thread may grab the lock and run a task concurrently.

llvm/unittests/Support/ParallelTest.cpp
100

llvm style prefers pre-increments ++Count

andrewng added inline comments.Apr 21 2023, 11:36 AM
llvm/lib/Support/Parallel.cpp
16–17

Unfortunately, at least in my testing on Windows, this change causes a drop in performance of around 3-4%. I believe that's why std::stack is used.

104–109

Would this be nicer?

114

Make const?

118

Make const?

145

Yes, this can happen but I believe the key point is that it can only be a thread with a different "index". However, one subtlety is that "sequential" tasks will run on different threads and thus use a different "index".

148

I think that moving this up after the other std::atomic would feel better.

llvm/unittests/Support/ParallelTest.cpp
95–119

I'm not entirely sure what this test case adds in terms of coverage that isn't covered by the test case that follows.

andrewng added inline comments.Apr 21 2023, 11:42 AM
llvm/lib/Support/Parallel.cpp
140–143

Not entirely sure this is better but does remove the code duplication.

avl added inline comments.Apr 21 2023, 2:19 PM
llvm/lib/Support/Parallel.cpp
16–17

Unfortunately, at least in my testing on Windows, this change causes a drop in performance of around 3-4%. I believe that's why std::stack is used.

will check it.

145

After Lock.unlock(), another thread may grab the lock and run a task concurrently.

After Lock.unlock(), SequentialQueueIsLocked is still true, so another thread cannot start a new sequential task until SequentialQueueIsLocked becomes false. It is set to false only when the sequential task finishes. Thus two sequential tasks can never execute in parallel.

145

Yes, this can happen but I believe the key point is that it can only be a thread with a different "index". However, one subtlety is that "sequential" tasks will run on different threads and thus use a different "index".

Yes, "sequential" tasks may run on different threads and use a different "index". I assumed that this is not a problem.

llvm/unittests/Support/ParallelTest.cpp
95–119

I'm not entirely sure what this test case adds in terms of coverage that isn't covered by the test case that follows.

The idea was to test the case where the previous task has already executed before the new .spawn() is called.
I will remove it if it seems redundant.

avl added inline comments.Apr 22 2023, 12:35 PM
llvm/lib/Support/Parallel.cpp
16–17

Unfortunately, at least in my testing on Windows, this change causes a drop in performance of around 3-4%. I believe that's why std::stack is used.

Yes, there is a small performance degradation when std::queue is used instead of std::stack.
But for it to be noticeable, a very large number of tasks has to be created.
For 100000000 tasks, std::queue is about 8 seconds slower:

llvm::parallel::TaskGroup tg;
for (size_t Idx = 0; Idx < 100000000; Idx++) {
  tg.spawn([](){});
}

                   Time(sec)  Memory(kB)
queue TaskGroup       121.28      441568
stack TaskGroup       113.61      471820

Generally, it is a bad idea to create that many tasks.
parallelFor has an optimization to avoid creating a large number of tasks.
There is no performance difference if parallelFor is used:

parallelFor(0,100000000,[](size_t){});

                     Time(sec)  Memory(kB)
queue parallelFor         0.03        9128
stack parallelFor         0.03        9128

So using std::queue is probably OK.

Or do you have a test case where using a queue is still a bad idea?

avl updated this revision to Diff 516102.Apr 22 2023, 1:28 PM

Addressed comments (except using std::queue and the extra test).

andrewng added inline comments.Apr 24 2023, 4:41 AM
llvm/lib/Support/Parallel.cpp
16–17

So using std::queue is probably OK.

Or do you have a test case where using a queue is still a bad idea?

I used my test case from D147493, which is a UE4-based link with an LLVM 16 libc++ self-hosted release build using the rpmalloc allocator, on Windows 10 with an AMD Ryzen 3900X 12C/24T, a Samsung 970 EVO NVMe SSD, and 64GB RAM. This link shows the 3-4% drop in performance, which I think is significant enough to justify consideration.

One solution would be to use std::deque and add "work" to the front or back depending on Sequential, e.g.

void add(std::function<void()> F, bool Sequential = false) override {
  {
    std::lock_guard<std::mutex> Lock(Mutex);
    if (Sequential)
      WorkQueueSequential.emplace_front(std::move(F));
    else
      WorkQueue.emplace_back(std::move(F));
  }
  Cond.notify_one();
}
void work(ThreadPoolStrategy S, unsigned ThreadID) {
  ...
    auto &Queue = Sequential ? WorkQueueSequential : WorkQueue;
    auto Task = std::move(Queue.back());
    Queue.pop_back();
  ...
}

This change seems to restore the performance of my test link. What do you think?

TBH, I think we need to be a little bit careful with any further development of these parallel methods. IIRC, they were added to support basic and simple parallelism in LLD. I don't think the intention was ever for them to be general parallel algorithms for all general use cases, e.g. nested parallel for. I think my main concern is that making them more general will likely add threading overhead which in turn may impact the effectiveness and performance in LLD.

145

Yes, "sequential" tasks may run on different threads and use a different "index". I assumed that this is not a problem.

I don't think it is a problem as far as correctness, but might be somewhat unexpected.

llvm/unittests/Support/ParallelTest.cpp
95–119

The idea was to test the case where the previous task has already executed before the new .spawn() is called.
I will remove it if it seems redundant.

OK, I can kind of see the intent, although I don't think the test actually tests or demonstrates anything extra. Personally, I think it can be removed. If you do remove it, you can also remove the two includes.

I think it would be worthwhile adding a test that shows that sequential tasks are executed in the order of submission.

avl added a comment.Apr 24 2023, 8:32 AM

I think it would be worthwhile adding a test that shows that sequential tasks are executed in the order of submission.

It looks like TaskGroupSequentialFor already does this?

llvm/lib/Support/Parallel.cpp
16–17

One solution would be to use std::deque and add "work" to the front or back dependent on Sequential, e.g.

void add(std::function<void()> F, bool Sequential = false) override {
  {
    std::lock_guard<std::mutex> Lock(Mutex);
    if (Sequential)
      WorkQueueSequential.emplace_front(std::move(F));
    else
      WorkQueue.emplace_back(std::move(F));
  }
  Cond.notify_one();
}
void work(ThreadPoolStrategy S, unsigned ThreadID) {
  ...
    auto &Queue = Sequential ? WorkQueueSequential : WorkQueue;
    auto Task = std::move(Queue.back());
    Queue.pop_back();
  ...
}

This change seems to restore the performance of my test link. What do you think?

This is OK for me; I will implement it.

By the way, since LLD is sensitive to changing the stack to a queue, it might be beneficial to replace some for loops with parallelFor at some point:

for (InputSectionBase *s : f->getSections()) {
....
for (Partition &part : partitions)

TBH, I think we need to be a little bit careful with any further development of these parallel methods. IIRC, they were added to support basic and simple parallelism in LLD. I don't think the intention was ever for them to be general parallel algorithms for all general use cases, e.g. nested parallel for. I think my main concern is that making them more general will likely add threading overhead which in turn may impact the effectiveness and performance in LLD.

Yep. My intention is not to support nested parallel for in the parallel methods, but to allow using these methods together with other solutions. E.g., a nested parallel for might be done using the existing ThreadPool plus the basic parallel methods:

std::function<void()> Fn = [&]() {
  parallelFor(0, Passes.size(), [&](size_t I) {
    // run pass I
  });
};

ThreadPool Pool;

for (size_t I = 0; I < Files.size(); I++)
  Pool.async(Fn);

Pool.wait();

In that case parallel methods just work as simple parallel methods.

145

Will document that behavior.

I think it would be worthwhile adding a test that shows that sequential tasks are executed in the order of submission.

It looks like TaskGroupSequentialFor already does this?

Yes, sorry, of course it does!

avl updated this revision to Diff 516713.Apr 25 2023, 2:13 AM

Addressed comments (deleted the extra test, changed the queue to a deque,
and put tasks at the front of the deque when sequential or parallel::strategy.ThreadsRequested == 1).

andrewng accepted this revision.Apr 25 2023, 6:55 AM

Apart from the comment nits, LGTM. Thanks!

llvm/include/llvm/Support/Parallel.h
88

how -> which.

89

done -> executed or run. Drop the the.

This revision is now accepted and ready to land.Apr 25 2023, 6:55 AM
avl added a comment.Apr 26 2023, 4:52 AM

Thank you for the review!

This revision was landed with ongoing or failed builds.Apr 26 2023, 4:53 AM
This revision was automatically updated to reflect the committed changes.