This is an archive of the discontinued LLVM Phabricator instance.

Parallel: only allow the first TaskGroup to run tasks parallelly
ClosedPublic

Authored by MaskRay on Apr 24 2019, 11:14 PM.

Details

Summary

Concurrent (e.g. nested) llvm::parallel::for_each() calls may lead to
deadlocks. See PR35788 (fixed by rLLD322041) and PR41508 (fixed by D60757).

When parallel_for_each() is about to return, ~Latch(), called from
~TaskGroup(), may block a thread (in a nested call, a worker thread of the
default executor) in Latch::sync() waiting for Count to become zero. If all
threads in the default executor are blocked this way, it is a deadlock.
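
For context, the Latch behind TaskGroup looks roughly like the following. This is a simplified sketch, not the exact code from llvm/Support/Parallel.h; sync(), which ~Latch() calls, is the blocking step described above:

#include <condition_variable>
#include <cstdint>
#include <mutex>

// Simplified sketch of the Latch used by TaskGroup (details may differ from
// the real implementation). Each spawned task calls inc() when it is queued
// and dec() when it finishes; sync() blocks until Count drops to zero.
class Latch {
  uint32_t Count = 0;
  mutable std::mutex Mutex;
  mutable std::condition_variable Cond;

public:
  ~Latch() { sync(); } // The blocking call reached via ~TaskGroup().

  void inc() {
    std::lock_guard<std::mutex> Lock(Mutex);
    ++Count;
  }

  void dec() {
    std::lock_guard<std::mutex> Lock(Mutex);
    if (--Count == 0)
      Cond.notify_all();
  }

  void sync() const {
    std::unique_lock<std::mutex> Lock(Mutex);
    Cond.wait(Lock, [&] { return Count == 0; });
  }
};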

To fix this, force serial execution if the current TaskGroup is not the
first one. For nested llvm::parallel::for_each() calls, only the outermost
one runs in parallel; inner ones run serially.
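
Roughly, the fix looks like this. This is an illustrative sketch only; the member names Parallel and L and the counter TaskGroupInstances are assumptions here, and the committed diff is authoritative:

#include <atomic>
#include <functional>

// Sketch: a global counter decides whether a TaskGroup may go parallel.
// Only the first (outermost) TaskGroup hands work to the default executor;
// nested TaskGroups run their tasks immediately on the calling thread.
class TaskGroup {
  Latch L;        // the Latch sketched above
  bool Parallel;  // true only for the outermost TaskGroup

public:
  TaskGroup();
  ~TaskGroup();
  void spawn(std::function<void()> F);
  void sync() const { L.sync(); }
};

static std::atomic<unsigned> TaskGroupInstances;

TaskGroup::TaskGroup() : Parallel(TaskGroupInstances++ == 0) {}
TaskGroup::~TaskGroup() { --TaskGroupInstances; }

void TaskGroup::spawn(std::function<void()> F) {
  if (Parallel) {
    L.inc();
    // Executor::getDefaultExecutor() is LLVM's internal default thread pool.
    Executor::getDefaultExecutor()->add([&, F] {
      F();
      L.dec();
    });
    return;
  }
  F(); // Nested group: run serially so executor threads are never exhausted.
}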

Diff Detail

Repository
rL LLVM

Event Timeline

MaskRay created this revision. Apr 24 2019, 11:14 PM
Herald added a project: Restricted Project. Apr 24 2019, 11:14 PM

A triply nested loop easily reproduces the deadlock:

#include <llvm/Support/Parallel.h>
#include <unistd.h>

int main() {
  int a[99] = {};
  // Each nesting level spawns tasks on the same default executor, so the
  // worker threads end up blocked in Latch::sync() waiting for each other.
  llvm::parallel::for_each(llvm::parallel::par, a, a + 99, [&](int) {
    usleep(1000);
    llvm::parallel::for_each(llvm::parallel::par, a, a + 99, [&](int) {
      usleep(1000);
      llvm::parallel::for_each(llvm::parallel::par, a, a + 99,
                               [&](int) { usleep(1000); });
    });
  });
}

strace -f ./a => You soon see that no nanosleep syscalls are made because all threads are stuck in futex(). This patch fixes that.

ruiu added a comment. Apr 25 2019, 12:25 AM

Don't you think we can just get rid of the notion of TaskGroup? Looks like we don't need TaskGroup to implement a thread pool and a parallel-for-loop.

Generally, it doesn't make much sense to have more than one thread pool in a process (spawning more threads than necessary defeats the purpose of a thread pool), so we can make the thread pool a singleton class. Let's call the thread pool class ThreadPool. That class would define getInstance() to get the instance. The only other member function would be void execute(std::function<void()> Fn), which executes Fn on some thread. Internally, that function would add the given function object to a queue and wake up worker threads using a condition variable.

I believe it shouldn't be too hard to implement such a thread pool class.
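
Such a pool could look roughly like this (an editorial sketch: only the names ThreadPool, getInstance(), and execute() come from the description above; the queue, worker count, and shutdown logic are assumptions):

#include <algorithm>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class ThreadPool {
public:
  static ThreadPool &getInstance() {
    static ThreadPool Instance(
        std::max(1u, std::thread::hardware_concurrency()));
    return Instance;
  }

  // Enqueue Fn; some worker thread will pick it up and run it.
  void execute(std::function<void()> Fn) {
    {
      std::lock_guard<std::mutex> Lock(Mu);
      Queue.push(std::move(Fn));
    }
    Cond.notify_one();
  }

private:
  explicit ThreadPool(unsigned N) {
    for (unsigned I = 0; I != N; ++I)
      Workers.emplace_back([this] { work(); });
  }

  ~ThreadPool() {
    {
      std::lock_guard<std::mutex> Lock(Mu);
      Stop = true;
    }
    Cond.notify_all();
    for (std::thread &T : Workers)
      T.join();
  }

  // Worker loop: wait for a queued function, run it, repeat until shutdown.
  void work() {
    for (;;) {
      std::function<void()> Fn;
      {
        std::unique_lock<std::mutex> Lock(Mu);
        Cond.wait(Lock, [this] { return Stop || !Queue.empty(); });
        if (Stop && Queue.empty())
          return;
        Fn = std::move(Queue.front());
        Queue.pop();
      }
      Fn();
    }
  }

  std::vector<std::thread> Workers;
  std::queue<std::function<void()>> Queue;
  std::mutex Mu;
  std::condition_variable Cond;
  bool Stop = false;
};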

Once we implement the thread pool, then we can implement parallel-for-each like this:

#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <iterator>
#include <mutex>

template <class IterTy, class FuncTy>
void parallel_for_each(IterTy Begin, IterTy End, FuncTy Fn) {
  // Split tasks into groups of roughly 1/1024 of the input.
  ptrdiff_t TaskSize = std::distance(Begin, End) / 1024;
  if (TaskSize == 0)
    TaskSize = 1;

  size_t NumTasks = 0;
  std::mutex Mu;
  std::condition_variable Cond;

  // Submit jobs to the thread pool.
  while (TaskSize < std::distance(Begin, End)) {
    {
      std::lock_guard<std::mutex> Lock(Mu);
      ++NumTasks;
    }
    // Capture Begin and TaskSize by value: Begin is advanced below before
    // the task necessarily runs.
    ThreadPool::getInstance().execute([&, Begin, TaskSize] {
      std::for_each(Begin, Begin + TaskSize, Fn);
      std::lock_guard<std::mutex> Lock(Mu);
      if (--NumTasks == 0)
        Cond.notify_all();
    });
    Begin += TaskSize;
  }

  // Process the remaining elements on the calling thread.
  std::for_each(Begin, End, Fn);

  // Wait for everybody to complete.
  std::unique_lock<std::mutex> Lock(Mu);
  Cond.wait(Lock, [&] { return NumTasks == 0; });
}

This parallel_for_each should be completely reentrant.

> Generally, it doesn't make much sense to have more than one thread pool in a process

The existing Executor::getDefaultExecutor() implements the thread pool. I agree that one thread pool suffices, so this patch parallelizes the outermost loop and serializes inner loops.

Note that a thread pool alone doesn't solve the problem. Latch::sync() still blocks one thread, because there must be a synchronization point before the function returns. (This could be fixed by introducing a thread scheduler, but that is unnecessary for our use cases.)

> Once we implement the thread pool, then we can implement parallel-for-each like this:

TaskGroup encapsulates the common code that parallel_sort, parallel_for_each, and parallel_for_each_n need: 1) spawning a task, and 2) providing a synchronization point (see the sketch below).
If we had only parallel_for_each and not the other parallel_* variants, I would agree that inlining TaskGroup would be good.
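
To illustrate what that factoring buys, here is a hypothetical parallel_for_each_n written on top of TaskGroup; the chunking and names are assumptions, not the library's actual implementation:

#include <algorithm>
#include <functional>

// Hypothetical sketch: spawn() submits one task per chunk, and the TaskGroup
// destructor (via Latch::sync()) is the synchronization point that waits for
// all of them. parallel_sort and parallel_for_each reuse the same two pieces.
template <class IndexTy, class FuncTy>
void parallel_for_each_n(IndexTy Begin, IndexTy End, FuncTy Fn) {
  TaskGroup TG;
  IndexTy TaskSize = std::max<IndexTy>((End - Begin) / 1024, 1);
  for (IndexTy I = Begin; I < End; I += TaskSize) {
    IndexTy Last = std::min<IndexTy>(I + TaskSize, End);
    TG.spawn([=, &Fn] {
      for (IndexTy J = I; J != Last; ++J)
        Fn(J);
    });
  }
  // ~TaskGroup() blocks here until every spawned chunk has finished.
}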

ruiu accepted this revision. Apr 25 2019, 3:58 AM

LGTM

I spent a few hours trying to simplify this code and make it fundamentally reentrant, but failed. Multi-threaded code is hard to debug. For now, I think this is a good measure to prevent accidental nested use of this thread pool executor.

This revision is now accepted and ready to land. Apr 25 2019, 3:58 AM
MaskRay updated this revision to Diff 196610. Apr 25 2019, 4:20 AM
MaskRay edited the summary of this revision.

Update comments

MaskRay updated this revision to Diff 196611. Apr 25 2019, 4:21 AM
MaskRay edited the summary of this revision.

Fix typo

This revision was automatically updated to reflect the committed changes.