This is an archive of the discontinued LLVM Phabricator instance.

Differential D32826

Move Parallel.h from LLD to LLVM
Accepted · Public

Authored by zturner on May 3 2017, 1:11 PM.

Details

Reviewers

ruiu
scott.smith
dvyukov
chandlerc
Bigcheese
espindola
labath

Summary

These algorithms are useful in many contexts, not just LLD. In particular, LLDB currently has need for similar functionality and it would be unfortunate to have to roll our own when similar functionality already exists.

Diff Detail

Event Timeline

zturner created this revision. May 3 2017, 1:11 PM

Herald added a subscriber: mgorny. May 3 2017, 1:11 PM

zturner mentioned this in D32597: Initiate loading of shared libraries in parallel. May 3 2017, 1:12 PM

Is the point just to move code, or to field improvements (in this code review)?

I can always follow up with a 2nd review for things I think would improve performance if that's better.

In D32826#745253, @scott.smith wrote:

Is the point just to move code, or to field improvements (in this code review)?

I can always follow up with a 2nd review for things I think would improve performance if that's better.

I think it would be better if improvements were made independently of a code move. If you combine improvements and code moves, the improvements kind of get "lost" in the history.

In D32826#745258, @zturner wrote:

I think it would be better if improvements were made independently of a code move. If you combine improvements and code moves, the improvements kind of get "lost" in the history.

Yeah, I tend to agree.

This revision is now accepted and ready to land. May 3 2017, 1:23 PM

labath added inline comments. May 3 2017, 1:32 PM

llvm/include/llvm/Support/Parallel.h
Line 125: There's a somewhat annoying bug (https://sourceware.org/bugzilla/show_bug.cgi?id=19951) present in all versions of glibc, which can lead to crashes if the thread happens to exit while it is being detached. I see no synchronization here which would prevent that, so I guess whether the crash happens depends on the type of work we will do here. The workaround is to add some synchronization which makes sure these two things don't happen concurrently (adds some overhead, but if done right it can be negligible), or create the threads in a detached state (impossible through the std::thread API).

labath added inline comments. May 3 2017, 2:04 PM

llvm/include/llvm/Support/Parallel.h
Line 125: Ok, I think the situation is not as bad as I first thought. The synchronization in the Latch destructor should make sure this does not happen for the worker threads. This just leaves the spawner thread as a question. It may be possible it will be slow enough that it will not trigger the bug, but heavy system load can do wonders to thread scheduling...
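The first workaround mentioned above (synchronizing so that a thread cannot exit while it is being detached) could look roughly like the sketch below. This is only an illustration, not code from the patch: `spawnDetached`, `Gate`, and `runOnDetachedThread` are hypothetical names, and the handshake shown is one possible way to order thread exit after `detach()`.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical helper (not part of the patch): run Fn on a detached thread,
// but keep that thread alive until detach() has returned in the caller.
inline void spawnDetached(std::function<void()> Fn) {
  assert(Fn && "expected a callable");
  struct Gate {
    std::mutex Mutex;
    std::condition_variable Cond;
    bool Detached = false;
  };
  auto G = std::make_shared<Gate>();

  std::thread T([G, Fn] {
    Fn();
    // Do not return (and so do not exit) until the parent is done detaching.
    std::unique_lock<std::mutex> Lock(G->Mutex);
    G->Cond.wait(Lock, [&] { return G->Detached; });
  });
  T.detach();

  {
    std::lock_guard<std::mutex> Lock(G->Mutex);
    G->Detached = true;
  }
  G->Cond.notify_one();
}

// Usage sketch: hands back a value produced on the detached thread.
inline int runOnDetachedThread() {
  auto Result = std::make_shared<std::promise<int>>();
  std::future<int> Value = Result->get_future();
  spawnDetached([Result] { Result->set_value(42); });
  return Value.get();
}
```

The cost is one mutex acquisition and one condition-variable wait per spawned thread, paid once at thread shutdown rather than per task.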

We have llvm/include/llvm/Support/ThreadPool.h and llvm/include/llvm/Support/thread.h. Did you take a look at these files? It seems parallel_for and parallel_for_each can be implemented on top of them.

In D32826#745297, @ruiu wrote:

We have llvm/include/llvm/Support/ThreadPool.h and llvm/include/llvm/Support/thread.h. Did you take a look at these files? It seems parallel_for and parallel_for_each can be implemented on top of them.

I might be willing to take a stab at using llvm/Support/thread.h, but it should be in a separate patch, because I think this one should not be any functional change. Using llvm/Support/thread.h would potentially allow us to get rid of some of the ugly preprocessor defines in Parallel.h though, so I think we should do it.

Using ThreadPool.h seems a little more risky. It's almost certain there are subtle differences in the semantics between the two things being used, so it would need to be done much more carefully.

Yes, that can be done after moving this code. I'm not sure if I'm the right person to LGTM adding a new file to llvm/Support, but at least for moving these files out of LLD, it LGTM. Thank you for doing this!

zturner added a reviewer: chandlerc. May 3 2017, 3:40 PM

So, I understand that this is just moving code from LLD to LLVM, but this is pretty complex and subtle code. I think it needs really careful and thorough review. I'm going to try to plan some time for that, but I wonder -- are there any meaningful splits you can make to introduce this more incrementally to LLVM?

D649 is the code review for the original implementation of this. +Bigcheese since he seems to have authored the original implementation.

If we are going to examine this file, I strongly recommend porting the same functionality (i.e. parallel_for and parallel_for_each) on top of llvm/Support/ThreadPool.h. I don't think that is hard to do, as ThreadPool.h seems to provide the same functionality as this file. That would be easier than reviewing this file again.
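As a rough idea of what "parallel_for on top of a pool" might involve, here is a self-contained sketch. It deliberately does not use llvm::ThreadPool: `StdAsyncPool` is a hypothetical stand-in built on `std::async`, and `parallelForOnPool` and `MaxTasks` are illustrative names. The only assumption about the pool is that it has an `async()` member returning something with `wait()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-in for a real pool: launches each task via std::async.
struct StdAsyncPool {
  std::future<void> async(std::function<void()> Task) {
    return std::async(std::launch::async, std::move(Task));
  }
};

// Splits [Begin, End) into at most MaxTasks contiguous chunks, submits one
// task per chunk to the pool, and blocks until all of them have finished.
template <class PoolTy, class FuncTy>
void parallelForOnPool(PoolTy &Pool, size_t Begin, size_t End, FuncTy Fn,
                       size_t MaxTasks = 16) {
  assert(MaxTasks > 0 && "need at least one task");
  if (Begin >= End)
    return;
  size_t Count = End - Begin;
  size_t TaskSize = (Count + MaxTasks - 1) / MaxTasks; // ceiling, so >= 1

  std::vector<decltype(Pool.async(std::function<void()>()))> Pending;
  for (size_t I = Begin; I < End; I += TaskSize) {
    size_t E = std::min(End, I + TaskSize);
    Pending.push_back(Pool.async([=, &Fn] {
      for (size_t J = I; J != E; ++J)
        Fn(J);
    }));
  }
  for (auto &F : Pending)
    F.wait();
}
```

Whether a real pool gives the same ordering and blocking behavior as the existing executor is exactly the kind of semantic difference raised earlier in the thread, so this sketch says nothing about that.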

If we are going to do that, I would definitely rather do it in a follow-up :-/

As this code has been used in LLD for over 4 years, it seems at least to pass the stability bar. So if we want to make improvements by porting to llvm::ThreadPool and then re-reviewing it in its entirety, we can do that, but I don't think it should be part of this change.

One idea might be to put this in early as a straight move, then in a follow-up patch split it up so that:

All of the LLVM_ENABLE_THREADS=0 stuff remains in the .h file.
All of the Windows / ConcRT codepath is in Windows/Parallel.inc
All of the custom implementations are in Posix/Parallel.inc.

This sort of addresses the concern Chandler had about splitting it up so it's more incremental, but it does so after the fact. I'm a little hesitant to make functional changes at the same time as a code move, since it loses history, so it would be nice if this could be a straight move.

It's been used for many years in LLD, but that doesn't cover all possible use cases. I think this implementation actually has a bug: if you call parallel_for from a function that is itself called by parallel_for, something goes wrong, IIRC. I didn't look into it at the time because that use case does not exist in LLD, but it's a valid use case.

Even if we are going to just copy it from LLD to LLVM, we should hide all but parallel_for and parallel_for_each. This patch exposes Latch and TaskGroup.

Yes, I meant to hide by moving it to .cpp or to inside the "internal" namespace.
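A minimal sketch of the ".cpp" variant of that idea follows. It is shown as a single block for brevity, with comments marking where the header would end and the source file would begin. The `sketch` namespace and the fixed task count of 4 are assumptions for illustration, and the `TaskGroup` here is a simplified stand-in (one joined std::thread per task), not the class from the patch.

```cpp
// ---- Parallel.h (sketch): the only public surface ----
#include <cstddef>
#include <functional>

namespace sketch {
void parallel_for(size_t Begin, size_t End,
                  const std::function<void(size_t)> &Fn);
} // namespace sketch

// ---- Parallel.cpp (sketch): helpers are invisible to includers ----
#include <algorithm>
#include <cassert>
#include <thread>
#include <utility>
#include <vector>

namespace {
// Simplified stand-in for TaskGroup: one std::thread per task, joined on
// destruction. The anonymous namespace keeps it out of the public API.
class TaskGroup {
  std::vector<std::thread> Threads;

public:
  void spawn(std::function<void()> F) { Threads.emplace_back(std::move(F)); }
  ~TaskGroup() {
    for (auto &T : Threads)
      T.join();
  }
};
} // namespace

void sketch::parallel_for(size_t Begin, size_t End,
                          const std::function<void(size_t)> &Fn) {
  assert(Fn && "expected a callable");
  if (Begin >= End)
    return;
  size_t NumTasks = std::min<size_t>(4, End - Begin);
  size_t TaskSize = (End - Begin + NumTasks - 1) / NumTasks;

  TaskGroup TG;
  for (size_t I = Begin; I < End; I += TaskSize) {
    size_t E = std::min(End, I + TaskSize);
    TG.spawn([I, E, &Fn] {
      for (size_t J = I; J != E; ++J)
        Fn(J);
    });
  }
  // ~TaskGroup joins every thread before parallel_for returns.
}
```

The trade-off is that the public entry point takes a std::function instead of a template parameter, which adds an indirect call per element but lets the implementation change without touching the header.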

So, my concern about reviewing this isn't just about the implementation, it's also about the *API*. We need to make sure that the API design for the parallel building blocks that we want to be available to the entire LLVM project is right. That's the particular split I was looking for...

There is also the fact that all of the code is currently still in LLD's old coding convention and needs to be updated to LLVM's coding conventions before it moves IMO.

In D32826#746628, @chandlerc wrote:

So, my concern about reviewing this isn't just about the implementation, it's also about the *API*. We need to make sure that the API design for the parallel building blocks that we want to be available to the entire LLVM project is right. That's the particular split I was looking for...

There is also the fact that all of the code is currently still in LLD's old coding convention and needs to be updated to LLVM's coding conventions before it moves IMO.

Fair enough, I'll work on fixing up the coding conventions and segregating the Windows/non-Windows codepaths so it's a little less ugly to look at. In the meantime, feel free to either continue reviewing as time permits or wait for an update.

Then maybe you want to do that in-place in LLD, so you can do it incrementally? Once it's done, you can move it to LLVM.

Talked to chandlerc@ offline about this. I'm going to work on splitting it up a bit inside of LLD and trying to get the API to match the C++17 Parallelism TS as closely as possible. Once that's done, I'll upload a new patch.
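For readers unfamiliar with that style: the C++17 parallel algorithms take an execution policy as their first argument. A sketch of what a policy-tagged `for_each` could look like is below. All names (`sketch`, `seq`, `par`, `doubledSum`) are hypothetical, and the parallel overload simply splits the range in two with `std::async`, which is an illustration and not the executor from this patch.

```cpp
#include <algorithm>
#include <cassert>
#include <future>
#include <iterator>
#include <vector>

namespace sketch {
// Tag types that select an overload, in the spirit of execution policies.
struct sequential_policy {};
struct parallel_policy {};
constexpr sequential_policy seq{};
constexpr parallel_policy par{};

template <class IterTy, class FuncTy>
void for_each(sequential_policy, IterTy Begin, IterTy End, FuncTy Fn) {
  std::for_each(Begin, End, Fn);
}

template <class IterTy, class FuncTy>
void for_each(parallel_policy, IterTy Begin, IterTy End, FuncTy Fn) {
  // Process the first half on another thread and the second half here.
  IterTy Mid = std::next(Begin, std::distance(Begin, End) / 2);
  auto Left =
      std::async(std::launch::async, [=] { std::for_each(Begin, Mid, Fn); });
  std::for_each(Mid, End, Fn);
  Left.wait();
}
} // namespace sketch

// Usage sketch: the call site differs only in the policy argument.
inline int doubledSum() {
  std::vector<int> V(100, 1);
  sketch::for_each(sketch::par, V.begin(), V.end(), [](int &X) { X *= 2; });
  int Sum = 0;
  sketch::for_each(sketch::seq, V.begin(), V.end(),
                   [&Sum](int X) { Sum += X; });
  return Sum;
}
```

With this shape, switching a call between sequential and parallel execution is a one-token change, which is the main appeal of following the standard's convention.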

@scott.smith, if you want to use this in LLDB in the interim, you can copy this file into LLDB/Utility.h, change up the namespaces, delete parallel_sort since we don't need it, and remove as much unused code as possible. Then, once we get it into LLVM, it should at least be a straightforward change in LLDB to port it over to the common version.

labath resigned from this revision. May 8 2018, 8:47 AM

Herald added a reviewer: espindola. May 8 2018, 8:47 AM

Herald added subscribers: mgrang, arichardson, emaste.

Revision Contents

Path                                              Size
lld/CMakeLists.txt                                1 line
lld/COFF/ICF.cpp                                  2 lines
lld/COFF/MapFile.cpp                              2 lines
lld/COFF/Writer.cpp                               2 lines
lld/ELF/Threads.h                                 6 lines
lld/include/lld/Core/Parallel.h
lld/lib/ReaderWriter/MachO/LayoutPass.cpp         5 lines
lld/unittests/CMakeLists.txt                      1 line
lld/unittests/CoreTests/CMakeLists.txt
lld/unittests/CoreTests/ParallelTest.cpp
llvm/include/llvm/Support/Parallel.h              331 lines
llvm/unittests/Support/CMakeLists.txt             3 lines
llvm/unittests/Support/ParallelTest.cpp           46 lines

Diff 97720

lld/CMakeLists.txt

[186 lines hidden; context: if (VTUNE_FOUND)]
 add_definitions(-DLLD_HAS_VTUNE)
 endif()
 endif()

 option(LLD_BUILD_TOOLS
 "Build the lld tools. If OFF, just generate build targets." ON)

 if (MSVC)
-add_definitions(-wd4530) # Suppress 'warning C4530: C++ exception handler used, but unwind semantics are not enabled.'
 add_definitions(-wd4062) # Suppress 'warning C4062: enumerator X in switch of enum Y is not handled' from system header.
 endif()

 include_directories(BEFORE
 ${CMAKE_CURRENT_BINARY_DIR}/include
 ${CMAKE_CURRENT_SOURCE_DIR}/include
 )

[20 lines hidden]

lld/COFF/ICF.cpp

[15 lines hidden]
 //
 // See ELF/ICF.cpp for the details about the algortihm.
 //
 //===----------------------------------------------------------------------===//

 #include "Chunks.h"
 #include "Error.h"
 #include "Symbols.h"
-#include "lld/Core/Parallel.h"
 #include "llvm/ADT/Hashing.h"
 #include "llvm/Support/Debug.h"
+#include "llvm/Support/Parallel.h"
 #include "llvm/Support/raw_ostream.h"
 #include <algorithm>
 #include <atomic>
 #include <vector>

 using namespace llvm;

 namespace lld {
[227 lines hidden]

lld/COFF/MapFile.cpp

[19 lines hidden]
 //===----------------------------------------------------------------------===//

 #include "MapFile.h"
 #include "Error.h"
 #include "SymbolTable.h"
 #include "Symbols.h"
 #include "Writer.h"

-#include "lld/Core/Parallel.h"
+#include "llvm/Support/Parallel.h"
 #include "llvm/Support/raw_ostream.h"

 using namespace llvm;
 using namespace llvm::object;

 using namespace lld;
 using namespace lld::coff;

[89 lines hidden]

lld/COFF/Writer.cpp

[11 lines hidden]
 #include "DLL.h"
 #include "Error.h"
 #include "InputFiles.h"
 #include "MapFile.h"
 #include "Memory.h"
 #include "PDB.h"
 #include "SymbolTable.h"
 #include "Symbols.h"
-#include "lld/Core/Parallel.h"
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/StringSwitch.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/Endian.h"
 #include "llvm/Support/FileOutputBuffer.h"
+#include "llvm/Support/Parallel.h"
 #include "llvm/Support/RandomNumberGenerator.h"
 #include "llvm/Support/raw_ostream.h"
 #include <algorithm>
 #include <cstdio>
 #include <map>
 #include <memory>
 #include <utility>

[841 lines hidden]

lld/ELF/Threads.h

[55 lines hidden]
 //
 //===----------------------------------------------------------------------===//

 #ifndef LLD_ELF_THREADS_H
 #define LLD_ELF_THREADS_H

 #include "Config.h"

-#include "lld/Core/Parallel.h"
+#include "llvm/Support/Parallel.h"
 #include <algorithm>
 #include <functional>

 namespace lld {
 namespace elf {

 template <class IterTy, class FuncTy>
 void parallelForEach(IterTy Begin, IterTy End, FuncTy Fn) {
 if (Config->Threads)
-parallel_for_each(Begin, End, Fn);
+llvm::parallel_for_each(Begin, End, Fn);
 else
 std::for_each(Begin, End, Fn);
 }

 inline void parallelFor(size_t Begin, size_t End,
 std::function<void(size_t)> Fn) {
 if (Config->Threads) {
-parallel_for(Begin, End, Fn);
+llvm::parallel_for(Begin, End, Fn);
 } else {
 for (size_t I = Begin; I < End; ++I)
 Fn(I);
 }
 }
 }
 }

 #endif

lld/include/lld/Core/Parallel.h

This file was deleted.

	//===- lld/Core/Parallel.h - Parallel utilities ---------------------------===//
	//
	// The LLVM Linker
	//
	// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.
	//
	//===----------------------------------------------------------------------===//

	#ifndef LLD_CORE_PARALLEL_H
	#define LLD_CORE_PARALLEL_H

	#include "lld/Core/Instrumentation.h"
	#include "lld/Core/LLVM.h"
	#include "llvm/Support/MathExtras.h"
	#include "llvm/Support/thread.h"

	#include <algorithm>
	#include <atomic>
	#include <condition_variable>
	#include <mutex>
	#include <stack>

	#if defined(_MSC_VER) && LLVM_ENABLE_THREADS
	#include <concrt.h>
	#include <ppl.h>
	#endif

	namespace lld {
	/// \brief Allows one or more threads to wait on a potentially unknown number of
	/// events.
	///
	/// A latch starts at \p count. inc() increments this, and dec() decrements it.
	/// All calls to sync() will block while the count is not 0.
	///
	/// Calling dec() on a Latch with a count of 0 has undefined behaivor.
	class Latch {
	uint32_t _count;
	mutable std::mutex _condMut;
	mutable std::condition_variable _cond;

	public:
	explicit Latch(uint32_t count = 0) : _count(count) {}
	~Latch() { sync(); }

	void inc() {
	std::unique_lock<std::mutex> lock(_condMut);
	++_count;
	}

	void dec() {
	std::unique_lock<std::mutex> lock(_condMut);
	if (--_count == 0)
	_cond.notify_all();
	}

	void sync() const {
	std::unique_lock<std::mutex> lock(_condMut);
	_cond.wait(lock, [&] {
	return _count == 0;
	});
	}
	};

	// Classes in this namespace are implementation details of this header.
	namespace internal {

	/// \brief An abstract class that takes closures and runs them asynchronously.
	class Executor {
	public:
	virtual ~Executor() = default;
	virtual void add(std::function<void()> func) = 0;
	};

	#if !defined(LLVM_ENABLE_THREADS) || LLVM_ENABLE_THREADS == 0
	class SyncExecutor : public Executor {
	public:
	virtual void add(std::function<void()> func) {
	func();
	}
	};

	inline Executor *getDefaultExecutor() {
	static SyncExecutor exec;
	return &exec;
	}
	#elif defined(_MSC_VER)
	/// \brief An Executor that runs tasks via ConcRT.
	class ConcRTExecutor : public Executor {
	struct Taskish {
	Taskish(std::function<void()> task) : _task(task) {}

	std::function<void()> _task;

	static void run(void *p) {
	Taskish *self = static_cast<Taskish *>(p);
	self->_task();
	concurrency::Free(self);
	}
	};

	public:
	virtual void add(std::function<void()> func) {
	Concurrency::CurrentScheduler::ScheduleTask(Taskish::run,
	new (concurrency::Alloc(sizeof(Taskish))) Taskish(func));
	}
	};

	inline Executor *getDefaultExecutor() {
	static ConcRTExecutor exec;
	return &exec;
	}
	#else
	/// \brief An implementation of an Executor that runs closures on a thread pool
	/// in filo order.
	class ThreadPoolExecutor : public Executor {
	public:
	explicit ThreadPoolExecutor(unsigned threadCount =
	std::thread::hardware_concurrency())
	: _stop(false), _done(threadCount) {
	// Spawn all but one of the threads in another thread as spawning threads
	// can take a while.
	std::thread([&, threadCount] {
	for (size_t i = 1; i < threadCount; ++i) {
	std::thread([=] {
	work();
	}).detach();
	}
	work();
	}).detach();
	}

	~ThreadPoolExecutor() override {
	std::unique_lock<std::mutex> lock(_mutex);
	_stop = true;
	lock.unlock();
	_cond.notify_all();
	// Wait for ~Latch.
	}

	void add(std::function<void()> f) override {
	std::unique_lock<std::mutex> lock(_mutex);
	_workStack.push(f);
	lock.unlock();
	_cond.notify_one();
	}

	private:
	void work() {
	while (true) {
	std::unique_lock<std::mutex> lock(_mutex);
	_cond.wait(lock, [&] {
	return _stop || !_workStack.empty();
	});
	if (_stop)
	break;
	auto task = _workStack.top();
	_workStack.pop();
	lock.unlock();
	task();
	}
	_done.dec();
	}

	std::atomic<bool> _stop;
	std::stack<std::function<void()>> _workStack;
	std::mutex _mutex;
	std::condition_variable _cond;
	Latch _done;
	};

	inline Executor *getDefaultExecutor() {
	static ThreadPoolExecutor exec;
	return &exec;
	}
	#endif

	} // namespace internal

	/// \brief Allows launching a number of tasks and waiting for them to finish
	/// either explicitly via sync() or implicitly on destruction.
	class TaskGroup {
	Latch _latch;

	public:
	void spawn(std::function<void()> f) {
	_latch.inc();
	internal::getDefaultExecutor()->add([&, f] {
	f();
	_latch.dec();
	});
	}

	void sync() const { _latch.sync(); }
	};

	#if !defined(LLVM_ENABLE_THREADS) || LLVM_ENABLE_THREADS == 0
	template <class RandomAccessIterator, class Comp>
	void parallel_sort(
	RandomAccessIterator start, RandomAccessIterator end,
	const Comp &comp = std::less<
	typename std::iterator_traits<RandomAccessIterator>::value_type>()) {
	std::sort(start, end, comp);
	}
	#elif defined(_MSC_VER)
	// Use ppl parallel_sort on Windows.
	template <class RandomAccessIterator, class Comp>
	void parallel_sort(
	RandomAccessIterator start, RandomAccessIterator end,
	const Comp &comp = std::less<
	typename std::iterator_traits<RandomAccessIterator>::value_type>()) {
	concurrency::parallel_sort(start, end, comp);
	}
	#else
	namespace detail {
	const ptrdiff_t minParallelSize = 1024;

	/// \brief Inclusive median.
	template <class RandomAccessIterator, class Comp>
	RandomAccessIterator medianOf3(RandomAccessIterator start,
	RandomAccessIterator end, const Comp &comp) {
	RandomAccessIterator mid = start + (std::distance(start, end) / 2);
	return comp(*start, *(end - 1))
	? (comp(*mid, *(end - 1)) ? (comp(*start, *mid) ? mid : start)
	: end - 1)
	: (comp(*mid, *start) ? (comp(*(end - 1), *mid) ? mid : end - 1)
	: start);
	}

	template <class RandomAccessIterator, class Comp>
	void parallel_quick_sort(RandomAccessIterator start, RandomAccessIterator end,
	const Comp &comp, TaskGroup &tg, size_t depth) {
	// Do a sequential sort for small inputs.
	if (std::distance(start, end) < detail::minParallelSize || depth == 0) {
	std::sort(start, end, comp);
	return;
	}

	// Partition.
	auto pivot = medianOf3(start, end, comp);
	// Move pivot to end.
	std::swap(*(end - 1), *pivot);
	pivot = std::partition(start, end - 1, [&comp, end](decltype(*start) v) {
	return comp(v, *(end - 1));
	});
	// Move pivot to middle of partition.
	std::swap(*pivot, *(end - 1));

	// Recurse.
	tg.spawn([=, &comp, &tg] {
	parallel_quick_sort(start, pivot, comp, tg, depth - 1);
	});
	parallel_quick_sort(pivot + 1, end, comp, tg, depth - 1);
	}
	}

	template <class RandomAccessIterator, class Comp>
	void parallel_sort(
	RandomAccessIterator start, RandomAccessIterator end,
	const Comp &comp = std::less<
	typename std::iterator_traits<RandomAccessIterator>::value_type>()) {
	TaskGroup tg;
	detail::parallel_quick_sort(start, end, comp, tg,
	llvm::Log2_64(std::distance(start, end)) + 1);
	}
	#endif

	template <class T> void parallel_sort(T *start, T *end) {
	parallel_sort(start, end, std::less<T>());
	}

	#if !defined(LLVM_ENABLE_THREADS) || LLVM_ENABLE_THREADS == 0
	template <class IterTy, class FuncTy>
	void parallel_for_each(IterTy Begin, IterTy End, FuncTy Fn) {
	std::for_each(Begin, End, Fn);
	}

	template <class IndexTy, class FuncTy>
	void parallel_for(IndexTy Begin, IndexTy End, FuncTy Fn) {
	for (IndexTy I = Begin; I != End; ++I)
	Fn(I);
	}
	#elif defined(_MSC_VER)
	// Use ppl parallel_for_each on Windows.
	template <class IterTy, class FuncTy>
	void parallel_for_each(IterTy Begin, IterTy End, FuncTy Fn) {
	concurrency::parallel_for_each(Begin, End, Fn);
	}

	template <class IndexTy, class FuncTy>
	void parallel_for(IndexTy Begin, IndexTy End, FuncTy Fn) {
	concurrency::parallel_for(Begin, End, Fn);
	}
	#else
	template <class IterTy, class FuncTy>
	void parallel_for_each(IterTy Begin, IterTy End, FuncTy Fn) {
	// TaskGroup has a relatively high overhead, so we want to reduce
	// the number of spawn() calls. We'll create up to 1024 tasks here.
	// (Note that 1024 is an arbitrary number. This code probably needs
	// improving to take the number of available cores into account.)
	ptrdiff_t TaskSize = std::distance(Begin, End) / 1024;
	if (TaskSize == 0)
	TaskSize = 1;

	TaskGroup Tg;
	while (TaskSize <= std::distance(Begin, End)) {
	Tg.spawn([=, &Fn] { std::for_each(Begin, Begin + TaskSize, Fn); });
	Begin += TaskSize;
	}
	Tg.spawn([=, &Fn] { std::for_each(Begin, End, Fn); });
	}

	template <class IndexTy, class FuncTy>
	void parallel_for(IndexTy Begin, IndexTy End, FuncTy Fn) {
	ptrdiff_t TaskSize = (End - Begin) / 1024;
	if (TaskSize == 0)
	TaskSize = 1;

	TaskGroup Tg;
	IndexTy I = Begin;
	for (; I + TaskSize < End; I += TaskSize) {
	Tg.spawn([=, &Fn] {
	for (IndexTy J = I, E = I + TaskSize; J != E; ++J)
	Fn(J);
	});
	}
	Tg.spawn([=, &Fn] {
	for (IndexTy J = I; J < End; ++J)
	Fn(J);
	});
	}
	#endif
	} // end namespace lld

	#endif // LLD_CORE_PARALLEL_H

lld/lib/ReaderWriter/MachO/LayoutPass.cpp

 //===-- ReaderWriter/MachO/LayoutPass.cpp - Layout atoms ------------------===//
 //
 // The LLVM Linker
 //
 // This file is distributed under the University of Illinois Open Source
 // License. See LICENSE.TXT for details.
 //
 //===----------------------------------------------------------------------===//

 #include "LayoutPass.h"
 #include "lld/Core/Instrumentation.h"
-#include "lld/Core/Parallel.h"
 #include "lld/Core/PassManager.h"
 #include "lld/ReaderWriter/MachOLinkingContext.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/Twine.h"
 #include "llvm/Support/Debug.h"
+#include "llvm/Support/Parallel.h"
 #include <algorithm>
 #include <set>
 #include <utility>

 using namespace lld;

 #define DEBUG_TYPE "LayoutPass"

[430 lines hidden; context: llvm::Error LayoutPass::perform(SimpleFile &mergedFile) {]
 buildOrdinalOverrideMap(atomRange);

 DEBUG({
 llvm::dbgs() << "unsorted atoms:\n";
 printDefinedAtoms(atomRange);
 });

 std::vector<LayoutPass::SortKey> vec = decorate(atomRange);
-parallel_sort(vec.begin(), vec.end(),
+llvm::parallel_sort(
+vec.begin(), vec.end(),
 [&](const LayoutPass::SortKey &l, const LayoutPass::SortKey &r) -> bool {
 return compareAtoms(l, r, _customSorter);
 });
 DEBUG(checkTransitivity(vec, _customSorter));
 undecorate(atomRange, vec);

 DEBUG({
 llvm::dbgs() << "sorted atoms:\n";
[17 lines hidden]

lld/unittests/CMakeLists.txt

 add_custom_target(LLDUnitTests)
 set_target_properties(LLDUnitTests PROPERTIES FOLDER "lld tests")

 set(CMAKE_BUILD_WITH_INSTALL_RPATH OFF)

 # add_lld_unittest(test_dirname file1.cpp file2.cpp)
 #
 # Will compile the list of files together and link against lld
 # Produces a binary named 'basename(test_dirname)'.
 function(add_lld_unittest test_dirname)
 add_unittest(LLDUnitTests ${test_dirname} ${ARGN})
 target_link_libraries(${test_dirname} ${LLVM_COMMON_LIBS})
 endfunction()

-add_subdirectory(CoreTests)
 add_subdirectory(DriverTests)
 add_subdirectory(MachOTests)

lld/unittests/CoreTests/CMakeLists.txt

This file was deleted.

	add_lld_unittest(CoreTests
	ParallelTest.cpp
	)

	target_link_libraries(CoreTests
	${LLVM_PTHREAD_LIB}
	)

lld/unittests/CoreTests/ParallelTest.cpp

This file was deleted.

	//===- lld/unittest/ParallelTest.cpp --------------------------------------===//
	//
	// The LLVM Compiler Infrastructure
	//
	// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.
	//
	//===----------------------------------------------------------------------===//
	///
	/// \file
	/// \brief Parallel.h unit tests.
	///
	//===----------------------------------------------------------------------===//

	#include "gtest/gtest.h"
	#include "lld/Core/Parallel.h"
	#include <array>
	#include <random>

	uint32_t array[1024 * 1024];

	TEST(Parallel, sort) {
	std::mt19937 randEngine;
	std::uniform_int_distribution<uint32_t> dist;

	for (auto &i : array)
	i = dist(randEngine);

	lld::parallel_sort(std::begin(array), std::end(array));
	ASSERT_TRUE(std::is_sorted(std::begin(array), std::end(array)));
	}

	TEST(Parallel, parallel_for) {
	// We need to test the case with a TaskSize > 1. We are white-box testing
	// here. The TaskSize is calculated as (End - Begin) / 1024 at the time of
	// writing.
	uint32_t range[2050];
	std::fill(range, range + 2050, 1);
	lld::parallel_for(0, 2049, [&range](size_t I) { ++range[I]; });

	uint32_t expected[2049];
	std::fill(expected, expected + 2049, 2);
	ASSERT_TRUE(std::equal(range, range + 2049, expected));
	// Check that we don't write past the end of the requested range.
	ASSERT_EQ(range[2049], 1u);
	}
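The white-box comment in the parallel_for test relies on the chunking arithmetic of the non-Windows parallel_for shown earlier. The sketch below reproduces only that arithmetic, so the numbers can be checked without spawning any threads. `chunkBounds` is a hypothetical helper written for this illustration. It mirrors the loop structure of parallel_for but is not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns the [first, last) index ranges that the generic parallel_for would
// hand to TaskGroup::spawn for the range [Begin, End).
inline std::vector<std::pair<size_t, size_t>> chunkBounds(size_t Begin,
                                                          size_t End) {
  assert(Begin <= End && "expected a valid range");
  size_t TaskSize = (End - Begin) / 1024;
  if (TaskSize == 0)
    TaskSize = 1;

  std::vector<std::pair<size_t, size_t>> Chunks;
  size_t I = Begin;
  // Full-sized chunks, as long as one more chunk fits strictly before End.
  for (; I + TaskSize < End; I += TaskSize)
    Chunks.emplace_back(I, I + TaskSize);
  // The final task takes whatever is left.
  Chunks.emplace_back(I, End);
  return Chunks;
}
```

For the test's range of 0 to 2049, TaskSize is 2049 / 1024 = 2, the loop produces 1024 chunks of two elements each ([0, 2) through [2046, 2048)), and the final task covers the single element [2048, 2049), for 1025 tasks in total. That is why 2049 elements are enough to exercise a TaskSize greater than 1.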

llvm/include/llvm/Support/Parallel.h

This file was added.

				//===- llvm/Support/Parallel.h - Parallel utilities -----------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_SUPPORT_PARALLEL_H
				#define LLVM_SUPPORT_PARALLEL_H

				#include "llvm/Support/MathExtras.h"
				#include "llvm/Support/thread.h"

				#include <algorithm>
				#include <atomic>
				#include <condition_variable>
				#include <mutex>
				#include <stack>

				#if defined(_MSC_VER) && LLVM_ENABLE_THREADS
				// Suppress 'warning C4530: C++ exception handler used, but unwind semantics are
				// not enabled.'
				#pragma warning(push)
				#pragma warning(disable : 4530)

				#include <concrt.h>
				#include <ppl.h>
				#pragma warning(pop)
				#endif

				namespace llvm {
				/// \brief Allows one or more threads to wait on a potentially unknown number of
				/// events.
				///
				/// A latch starts at \p count. inc() increments this, and dec() decrements it.
				/// All calls to sync() will block while the count is not 0.
				///
				/// Calling dec() on a Latch with a count of 0 has undefined behaivor.
				class Latch {
				uint32_t _count;
				mutable std::mutex _condMut;
				mutable std::condition_variable _cond;

				public:
				explicit Latch(uint32_t count = 0) : _count(count) {}
				~Latch() { sync(); }

				void inc() {
				std::unique_lock<std::mutex> lock(_condMut);
				++_count;
				}

				void dec() {
				std::unique_lock<std::mutex> lock(_condMut);
				if (--_count == 0)
				_cond.notify_all();
				}

				void sync() const {
				std::unique_lock<std::mutex> lock(_condMut);
				_cond.wait(lock, [&] { return _count == 0; });
				}
				};

				// Classes in this namespace are implementation details of this header.
				namespace internal {

				/// \brief An abstract class that takes closures and runs them asynchronously.
				class Executor {
				public:
				virtual ~Executor() = default;
				virtual void add(std::function<void()> func) = 0;
				};

				#if !defined(LLVM_ENABLE_THREADS) || LLVM_ENABLE_THREADS == 0
				class SyncExecutor : public Executor {
				public:
				virtual void add(std::function<void()> func) { func(); }
				};

				inline Executor *getDefaultExecutor() {
				static SyncExecutor exec;
				return &exec;
				}
				#elif defined(_MSC_VER)
				/// \brief An Executor that runs tasks via ConcRT.
				class ConcRTExecutor : public Executor {
				struct Taskish {
				Taskish(std::function<void()> task) : _task(task) {}

				std::function<void()> _task;

				static void run(void *p) {
				Taskish *self = static_cast<Taskish *>(p);
				self->_task();
				concurrency::Free(self);
				}
				};

				public:
				virtual void add(std::function<void()> func) {
				Concurrency::CurrentScheduler::ScheduleTask(
				Taskish::run, new (concurrency::Alloc(sizeof(Taskish))) Taskish(func));
				}
				};

				inline Executor *getDefaultExecutor() {
				static ConcRTExecutor exec;
				return &exec;
				}
				#else
				/// \brief An implementation of an Executor that runs closures on a thread pool
				/// in filo order.
				class ThreadPoolExecutor : public Executor {
				public:
				explicit ThreadPoolExecutor(
				unsigned threadCount = std::thread::hardware_concurrency())
				: _stop(false), _done(threadCount) {
				// Spawn all but one of the threads in another thread as spawning threads
				// can take a while.
				std::thread([&, threadCount] {
				for (size_t i = 1; i < threadCount; ++i) {
				std::thread([=] { work(); }).detach();
				labath (inline comment): There's a somewhat annoying bug, https://sourceware.org/bugzilla/show_bug.cgi?id=19951, present in all versions of glibc, which can lead to crashes if the thread happens to exit while it is being detached. I see no synchronization here which would prevent that, so I guess whether the crash happens depends on the type of work we will do here. The workaround is to add some synchronization which makes sure these two things don't happen concurrently (adds some overhead, but if done right it can be negligible), or create the threads in a detached state (impossible through the std::thread API).
				labath (inline comment): Ok, I think the situation is not as bad as I first thought. The synchronization in the Latch destructor should make sure this does not happen for the worker threads. This just leaves the spawner thread as a question. It may be possible it will be slow enough that it will not trigger the bug, but heavy system load can do wonders to thread scheduling...
				}
				work();
				}).detach();
				}

				~ThreadPoolExecutor() override {
				std::unique_lock<std::mutex> lock(_mutex);
				_stop = true;
				lock.unlock();
				_cond.notify_all();
				// Wait for ~Latch.
				}

				void add(std::function<void()> f) override {
				std::unique_lock<std::mutex> lock(_mutex);
				_workStack.push(f);
				lock.unlock();
				_cond.notify_one();
				}

				private:
				void work() {
				while (true) {
				std::unique_lock<std::mutex> lock(_mutex);
				_cond.wait(lock, [&] { return _stop || !_workStack.empty(); });
				if (_stop)
				break;
				auto task = _workStack.top();
				_workStack.pop();
				lock.unlock();
				task();
				}
				_done.dec();
				}

				std::atomic<bool> _stop;
				std::stack<std::function<void()>> _workStack;
				std::mutex _mutex;
				std::condition_variable _cond;
				Latch _done;
				};

				inline Executor *getDefaultExecutor() {
				static ThreadPoolExecutor exec;
				return &exec;
				}
				#endif

				} // namespace internal

				/// \brief Allows launching a number of tasks and waiting for them to finish
				/// either explicitly via sync() or implicitly on destruction.
				class TaskGroup {
				Latch _latch;

				public:
				void spawn(std::function<void()> f) {
				_latch.inc();
				internal::getDefaultExecutor()->add([&, f] {
				f();
				_latch.dec();
				});
				}

				void sync() const { _latch.sync(); }
				};

				#if !defined(LLVM_ENABLE_THREADS) || LLVM_ENABLE_THREADS == 0
				template <class RandomAccessIterator, class Comp>
				void parallel_sort(
				RandomAccessIterator start, RandomAccessIterator end,
				const Comp &comp = std::less<
				typename std::iterator_traits<RandomAccessIterator>::value_type>()) {
				std::sort(start, end, comp);
				}
				#elif defined(_MSC_VER)
				// Use ppl parallel_sort on Windows.
				template <class RandomAccessIterator, class Comp>
				void parallel_sort(
				RandomAccessIterator start, RandomAccessIterator end,
				const Comp &comp = std::less<
				typename std::iterator_traits<RandomAccessIterator>::value_type>()) {
				concurrency::parallel_sort(start, end, comp);
				}
				#else
				namespace detail {
				const ptrdiff_t minParallelSize = 1024;

				/// \brief Inclusive median.
				template <class RandomAccessIterator, class Comp>
				RandomAccessIterator medianOf3(RandomAccessIterator start,
				RandomAccessIterator end, const Comp &comp) {
				RandomAccessIterator mid = start + (std::distance(start, end) / 2);
				return comp(*start, *(end - 1))
				? (comp(*mid, *(end - 1)) ? (comp(*start, *mid) ? mid : start)
				: end - 1)
				: (comp(*mid, *start) ? (comp(*(end - 1), *mid) ? mid : end - 1)
				: start);
				}

				template <class RandomAccessIterator, class Comp>
				void parallel_quick_sort(RandomAccessIterator start, RandomAccessIterator end,
				const Comp &comp, TaskGroup &tg, size_t depth) {
				// Do a sequential sort for small inputs.
				if (std::distance(start, end) < detail::minParallelSize || depth == 0) {
				std::sort(start, end, comp);
				return;
				}

				// Partition.
				auto pivot = medianOf3(start, end, comp);
				// Move pivot to end.
				std::swap(*(end - 1), *pivot);
				pivot = std::partition(start, end - 1, [&comp, end](decltype(*start) v) {
				return comp(v, *(end - 1));
				});
				// Move pivot to middle of partition.
				std::swap(*pivot, *(end - 1));

				// Recurse.
				tg.spawn([=, &comp, &tg] {
				parallel_quick_sort(start, pivot, comp, tg, depth - 1);
				});
				parallel_quick_sort(pivot + 1, end, comp, tg, depth - 1);
				}
				}

				template <class RandomAccessIterator, class Comp>
				void parallel_sort(
				RandomAccessIterator start, RandomAccessIterator end,
				const Comp &comp = std::less<
				typename std::iterator_traits<RandomAccessIterator>::value_type>()) {
				TaskGroup tg;
				detail::parallel_quick_sort(start, end, comp, tg,
				llvm::Log2_64(std::distance(start, end)) + 1);
				}
				#endif

				template <class T> void parallel_sort(T *start, T *end) {
				parallel_sort(start, end, std::less<T>());
				}

				#if !defined(LLVM_ENABLE_THREADS) || LLVM_ENABLE_THREADS == 0
				template <class IterTy, class FuncTy>
				void parallel_for_each(IterTy Begin, IterTy End, FuncTy Fn) {
				std::for_each(Begin, End, Fn);
				}

				template <class IndexTy, class FuncTy>
				void parallel_for(IndexTy Begin, IndexTy End, FuncTy Fn) {
				for (IndexTy I = Begin; I != End; ++I)
				Fn(I);
				}
				#elif defined(_MSC_VER)
				// Use ppl parallel_for_each on Windows.
				template <class IterTy, class FuncTy>
				void parallel_for_each(IterTy Begin, IterTy End, FuncTy Fn) {
				concurrency::parallel_for_each(Begin, End, Fn);
				}

				template <class IndexTy, class FuncTy>
				void parallel_for(IndexTy Begin, IndexTy End, FuncTy Fn) {
				concurrency::parallel_for(Begin, End, Fn);
				}
				#else
				template <class IterTy, class FuncTy>
				void parallel_for_each(IterTy Begin, IterTy End, FuncTy Fn) {
				// TaskGroup has a relatively high overhead, so we want to reduce
				// the number of spawn() calls. We'll create up to 1024 tasks here.
				// (Note that 1024 is an arbitrary number. This code probably needs
				// improving to take the number of available cores into account.)
				ptrdiff_t TaskSize = std::distance(Begin, End) / 1024;
				if (TaskSize == 0)
				TaskSize = 1;

				TaskGroup Tg;
				while (TaskSize <= std::distance(Begin, End)) {
				Tg.spawn([=, &Fn] { std::for_each(Begin, Begin + TaskSize, Fn); });
				Begin += TaskSize;
				}
				Tg.spawn([=, &Fn] { std::for_each(Begin, End, Fn); });
				}

				template <class IndexTy, class FuncTy>
				void parallel_for(IndexTy Begin, IndexTy End, FuncTy Fn) {
				ptrdiff_t TaskSize = (End - Begin) / 1024;
				if (TaskSize == 0)
				TaskSize = 1;

				TaskGroup Tg;
				IndexTy I = Begin;
				for (; I + TaskSize < End; I += TaskSize) {
				Tg.spawn([=, &Fn] {
				for (IndexTy J = I, E = I + TaskSize; J != E; ++J)
				Fn(J);
				});
				}
				Tg.spawn([=, &Fn] {
				for (IndexTy J = I; J < End; ++J)
				Fn(J);
				});
				}
				#endif
				} // end namespace llvm

				#endif // LLVM_SUPPORT_PARALLEL_H
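
The inline comment on the ThreadPoolExecutor constructor suggests adding synchronization so that a thread cannot exit while it is still being detached. A minimal sketch of one way to do that is shown here, assuming a hypothetical helper named spawnDetached that is not part of this diff: the new thread blocks on a gate that the spawner opens only after detach() has returned.

```cpp
#include <future>
#include <thread>
#include <utility>

// Hypothetical helper, not part of Parallel.h: start a detached thread whose
// body cannot begin (and therefore cannot finish) before detach() has
// returned in the spawning thread.
template <class Fn> void spawnDetached(Fn fn) {
  std::promise<void> detached;
  std::shared_future<void> gate = detached.get_future().share();

  std::thread t([gate, fn = std::move(fn)]() mutable {
    gate.wait(); // Block until the spawner has finished detaching.
    fn();
  });

  t.detach();
  detached.set_value(); // Open the gate only after detach() has completed.
}
```

The cost is one shared state per thread, which is paid once at pool start-up rather than per task. Whether this is needed for the spawner thread in this patch is the open question raised in the review; the sketch only illustrates the suggested workaround.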

llvm/unittests/Support/CMakeLists.txt

[30 unchanged lines hidden]	add_llvm_unittest(SupportTests
LineIteratorTest.cpp		LineIteratorTest.cpp
LockFileManagerTest.cpp		LockFileManagerTest.cpp
MD5Test.cpp		MD5Test.cpp
ManagedStatic.cpp		ManagedStatic.cpp
MathExtrasTest.cpp		MathExtrasTest.cpp
MemoryBufferTest.cpp		MemoryBufferTest.cpp
MemoryTest.cpp		MemoryTest.cpp
NativeFormatTests.cpp		NativeFormatTests.cpp
		ParallelTest.cpp
Path.cpp		Path.cpp
ProcessTest.cpp		ProcessTest.cpp
ProgramTest.cpp		ProgramTest.cpp
RegexTest.cpp		RegexTest.cpp
ReplaceFileTest.cpp		ReplaceFileTest.cpp
ScaledNumberTest.cpp		ScaledNumberTest.cpp
SourceMgrTest.cpp		SourceMgrTest.cpp
SpecialCaseListTest.cpp		SpecialCaseListTest.cpp
[13 unchanged lines hidden]	add_llvm_unittest(SupportTests
YAMLParserTest.cpp		YAMLParserTest.cpp
formatted_raw_ostream_test.cpp		formatted_raw_ostream_test.cpp
raw_ostream_test.cpp		raw_ostream_test.cpp
raw_pwrite_stream_test.cpp		raw_pwrite_stream_test.cpp
raw_sha1_ostream_test.cpp		raw_sha1_ostream_test.cpp
xxhashTest.cpp		xxhashTest.cpp
)		)

# ManagedStatic.cpp uses <pthread>.		# ManagedStatic and Parallel uses <pthread>.
target_link_libraries(SupportTests ${LLVM_PTHREAD_LIB})		target_link_libraries(SupportTests ${LLVM_PTHREAD_LIB})

add_subdirectory(DynamicLibrary)		add_subdirectory(DynamicLibrary)

llvm/unittests/Support/ParallelTest.cpp

This file was added.

				//===- llvm/unittest/Support/ParallelTest.cpp -----------------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// \brief Parallel.h unit tests.
				///
				//===----------------------------------------------------------------------===//

				#include "llvm/Support/Parallel.h"
				#include "gtest/gtest.h"
				#include <array>
				#include <random>

				uint32_t array[1024 * 1024];

				TEST(Parallel, sort) {
				std::mt19937 randEngine;
				std::uniform_int_distribution<uint32_t> dist;

				for (auto &i : array)
				i = dist(randEngine);

				llvm::parallel_sort(std::begin(array), std::end(array));
				ASSERT_TRUE(std::is_sorted(std::begin(array), std::end(array)));
				}

				TEST(Parallel, parallel_for) {
				// We need to test the case with a TaskSize > 1. We are white-box testing
				// here. The TaskSize is calculated as (End - Begin) / 1024 at the time of
				// writing.
				uint32_t range[2050];
				std::fill(range, range + 2050, 1);
				llvm::parallel_for(0, 2049, [&range](size_t I) { ++range[I]; });

				uint32_t expected[2049];
				std::fill(expected, expected + 2049, 2);
				ASSERT_TRUE(std::equal(range, range + 2049, expected));
				// Check that we don't write past the end of the requested range.
				ASSERT_EQ(range[2049], 1u);
				}
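
The parallel_for test above depends on how the non-MSVC implementation splits an index range into tasks. As a rough illustration, the following sketch reproduces only that index arithmetic in a hypothetical helper, chunkRanges (a name invented here, not part of the diff), and records the half-open range each spawned task would cover instead of spawning anything.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper mirroring the chunking arithmetic of the thread-pool
// parallel_for in this diff. Each pair is the half-open [first, last) index
// range that one spawned task would process.
std::vector<std::pair<std::ptrdiff_t, std::ptrdiff_t>>
chunkRanges(std::ptrdiff_t Begin, std::ptrdiff_t End) {
  std::ptrdiff_t TaskSize = (End - Begin) / 1024;
  if (TaskSize == 0)
    TaskSize = 1;

  std::vector<std::pair<std::ptrdiff_t, std::ptrdiff_t>> Chunks;
  std::ptrdiff_t I = Begin;
  for (; I + TaskSize < End; I += TaskSize)
    Chunks.emplace_back(I, I + TaskSize); // Full-sized task.
  Chunks.emplace_back(I, End);            // Final task takes the remainder.
  return Chunks;
}
```

For the range used in the test, chunkRanges(0, 2049) gives a TaskSize of 2, so there are 1024 two-element tasks followed by a final task covering [2048, 2049). That final task is the one the test's last assertion guards: it must stop at End rather than run a full TaskSize.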


Revision Contents

Diff 97720
