This is an archive of the discontinued LLVM Phabricator instance.

Add TaskMap for iterating a function over a set of integers
ClosedPublic

Authored by scott.smith on May 2 2017, 10:28 AM.


Details

Reviewers

clayborg
labath
tberghammer
zturner

Commits

rG5c913e9973f4: Add TaskMap for iterating a function over a set of integers
rLLDB302223: Add TaskMap for iterating a function over a set of integers
rL302223: Add TaskMap for iterating a function over a set of integers

Summary

Many parallel tasks just want to iterate over all the possible numbers from 0 to N-1. Rather than enqueue N work items, instead just "map" the function across the requested integer space.
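The idea in the summary can be sketched outside of LLDB as follows. This is a minimal illustration that assumes plain std::thread workers instead of LLDB's TaskPool; the name MapOverInt is hypothetical and is not part of this patch.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper, not the LLDB API: applies func to every integer in
// [begin, end) using a small set of worker threads.
void MapOverInt(std::size_t begin, std::size_t end,
                const std::function<void(std::size_t)> &func) {
  assert(begin <= end);
  std::atomic<std::size_t> next{begin};
  // hardware_concurrency() may return 0, so fall back to one worker.
  std::size_t hw =
      std::max<std::size_t>(1, std::thread::hardware_concurrency());
  std::size_t num_workers = std::min(end - begin, hw);

  auto worker = [&next, end, &func]() {
    // Each worker claims the next unprocessed index until none remain.
    for (std::size_t i = next.fetch_add(1); i < end; i = next.fetch_add(1))
      func(i);
  };

  std::vector<std::thread> threads;
  threads.reserve(num_workers);
  for (std::size_t i = 0; i < num_workers; ++i)
    threads.emplace_back(worker);
  for (auto &t : threads)
    t.join();
}
```

Only one work item per worker is created, however large the range is; the shared atomic counter hands out indices, so func must be safe to call from several threads at once.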

Diff Detail

Repository: rL LLVM

Event Timeline

IMO, this is a simpler interface than TaskRunner. Also, after this, there are no users of TaskRunner left. Should I just delete them?

I did this change to help parallel symbol demangling (to come in a separate patch). Rather than have the symbol demangler use batching, etc, I thought it should be a higher level function.

scott.smith added a reviewer: tberghammer. May 2 2017, 10:55 AM

In D32757#743567, @scott.smith wrote:

IMO, this is a simpler interface than TaskRunner. Also, after this, there are no users of TaskRunner left. Should I just delete them?

I think you might not have understood TaskRunner's real benefits. It is smart in the way it runs things: it lets you run N things and get each item as soon as it completes. The TaskMap will serialize everything. So if you have 100 items to do and item 0 takes 200 seconds to complete, and items 1 - 99 take 1ms to complete, you will need to wait for task 0 to complete before you can start parsing the data. This will slow down the DWARF parser if you switch over to this. TaskRunner should not be deleted as it has a very specific purpose. Your TaskMap works fine for one case, but not in the other.

I did this change to help parallel symbol demangling (to come in a separate patch). Rather than have the symbol demangler use batching, etc, I thought it should be a higher level function.

Make sure this is a win before switching demangling over to using threads. I tried to improve performance on demangling before and I got worse performance on MacOS when we tried it because of all the malloc contention and threading overhead.

source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
1988 ↗	(On Diff #97469)	This replacement is ok, since no expensive work is being done in the while loop that did the task_runner_extract.WaitForNextCompletedTask();.
1994 ↗	(On Diff #97469)	Note that we lost performance here. The old code ran a loop of the form: while (true) { std::future<uint32_t> f = task_runner.WaitForNextCompletedTask(); /* do expensive work */ }, so the expensive work started on whichever task finished first, as soon as it finished. Your current code says "wait to do all expensive work until I complete everything and then do all of the expensive work".
source/Utility/TaskPool.cpp
77 ↗	(On Diff #97469)	Is this named correctly? Maybe SerializedTaskMap? Or BatchedTaskMap?
100 ↗	(On Diff #97469)	TaskRunner is smart in the way it runs things: it lets you run N things and get each item as it completes. This implementation, if I read it correctly, will serialize everything. So if you have 100 items to do and the item at index 0 takes 200 seconds to complete, and items 1 - 99 take 1ms to complete, you will need to wait for task 0 to complete before you can start parsing the data. This will slow down the DWARF parser if you switch over to this.
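For readers unfamiliar with the pattern being defended here, consuming results in completion order can be sketched roughly like this. It is an illustration only: it spawns one std::thread per task for brevity, and ConsumeAsCompleted, produce and consume are hypothetical names, not the TaskRunner API.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical illustration, not the TaskRunner API: runs produce(i) for each
// i in [0, count) on its own thread and hands every result to consume() on
// the calling thread in the order the tasks finish, not the order they were
// started.
void ConsumeAsCompleted(std::size_t count,
                        const std::function<int(std::size_t)> &produce,
                        const std::function<void(int)> &consume) {
  std::mutex mutex;
  std::condition_variable cv;
  std::queue<int> ready;

  std::vector<std::thread> threads;
  threads.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    threads.emplace_back([&, i]() {
      int result = produce(i);
      {
        std::lock_guard<std::mutex> lock(mutex);
        ready.push(result);
      }
      cv.notify_one();
    });
  }

  // Only the calling thread runs consume(), so consume() itself needs no
  // locking even if it updates shared state.
  for (std::size_t consumed = 0; consumed < count; ++consumed) {
    std::unique_lock<std::mutex> lock(mutex);
    cv.wait(lock, [&ready]() { return !ready.empty(); });
    int result = ready.front();
    ready.pop();
    lock.unlock();
    consume(result);
  }

  for (auto &t : threads)
    t.join();
  assert(ready.empty());
}
```

With this shape a slow task 0 does not delay the consumption of tasks 1 - 99; whether that matters in practice depends on how expensive consume() is compared with produce().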

This revision now requires changes to proceed. May 2 2017, 11:29 AM

And note that in the DWARF parser you can't add the expensive code that aggregates all of the data into the SymbolFileDWARF into "parser_fn", because only one thread can be putting stuff into SymbolFileDWARF at a time.

In D32757#743657, @clayborg wrote:

In D32757#743567, @scott.smith wrote:

IMO, this is a simpler interface than TaskRunner. Also, after this, there are no users of TaskRunner left. Should I just delete them?

I think you might not have understood TaskRunner's real benefits. It is smart in the way it runs things: it lets you run N things and get each item as soon as it completes. The TaskMap will serialize everything. So if you have 100 items to do and item 0 takes 200 seconds to complete, and items 1 - 99 take 1ms to complete, you will need to wait for task 0 to complete before you can start parsing the data. This will slow down the DWARF parser if you switch over to this. TaskRunner should not be deleted as it has a very specific purpose. Your TaskMap works fine for one case, but not in the other.

That may provide a benefit on machines with a few cores, but doesn't really help when you have 40+ cores. Granted, the average laptop has 4 cores/8 hyperthreads.

I did this change to help parallel symbol demangling (to come in a separate patch). Rather than have the symbol demangler use batching, etc, I thought it should be a higher level function.

Make sure this is a win before switching demangling over to using threads. I tried to improve performance on demangling before and I got worse performance on MacOS when we tried it because of all the malloc contention and threading overhead.

It is on my machine, but maybe not on other machines. That would be unfortunate. I'm guessing I shouldn't go ahead and add jemalloc/tcmalloc to work around a poor MacOS malloc? haha.

I would suggest putting this in LLVM, as something like this:

namespace llvm {
template<typename Range, typename Func>
void parallel_apply(Range &&R, Func &&F) {
  // enqueue items here.
  // wait for all tasks to complete.
}
}
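One way the two placeholder comments in that skeleton could be filled in is sketched below. This is an assumption for illustration only: it uses one std::async task per element instead of a thread pool, and it lives in a made-up sketch namespace since nothing like this is claimed to exist in LLVM here.

```cpp
#include <cassert>
#include <future>
#include <iterator>
#include <vector>

namespace sketch {
// Hypothetical body for a parallel_apply-style helper: one std::async task
// per element, then wait for all of them. F must be safe to call
// concurrently.
template <typename Range, typename Func>
void parallel_apply(Range &&R, Func &&F) {
  std::vector<std::future<void>> futures;
  // Enqueue one task per element.
  for (auto it = std::begin(R), e = std::end(R); it != e; ++it)
    futures.push_back(std::async(std::launch::async, [&F, it]() { F(*it); }));
  // Wait for all tasks to complete; get() also rethrows any task exception.
  for (auto &f : futures)
    f.get();
}
} // namespace sketch
```

A real implementation would presumably batch elements onto a fixed pool of workers; spawning a task per element is only reasonable for small ranges or expensive callbacks.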

In D32757#743669, @scott.smith wrote:

In D32757#743657, @clayborg wrote:

In D32757#743567, @scott.smith wrote:

IMO, this is a simpler interface than TaskRunner. Also, after this, there are no users of TaskRunner left. Should I just delete them?

I think you might not have understood TaskRunner's real benefits. It is smart in the way it runs things: it lets you run N things and get each item as soon as it completes. The TaskMap will serialize everything. So if you have 100 items to do and item 0 takes 200 seconds to complete, and items 1 - 99 take 1ms to complete, you will need to wait for task 0 to complete before you can start parsing the data. This will slow down the DWARF parser if you switch over to this. TaskRunner should not be deleted as it has a very specific purpose. Your TaskMap works fine for one case, but not in the other.

That may provide a benefit on machines with a few cores, but doesn't really help when you have 40+ cores. Granted, the average laptop has 4 cores/8 hyperthreads.

It is no different on any machine with any number of cores. TaskRunner will be faster in all cases for the second DWARF loop. It is also nice to be able to consume the information as it comes in, so TaskRunner is needed. I do like the TaskMap idea to make things batch-able so I think they both add value.

I did this change to help parallel symbol demangling (to come in a separate patch). Rather than have the symbol demangler use batching, etc, I thought it should be a higher level function.

Make sure this is a win before switching demangling over to using threads. I tried to improve performance on demangling before and I got worse performance on MacOS when we tried it because of all the malloc contention and threading overhead.

It is on my machine, but maybe not on other machines. That would be unfortunate. I'm guessing I shouldn't go ahead and add jemalloc/tcmalloc to work around a poor MacOS malloc? haha.

lol. If it does improve things and if it integrates nicely with all of the malloc tools on MacOS, I wouldn't be opposed. I don't know much about jemalloc/tcmalloc, but if there are perf wins to be had that don't hose over the malloc zone/block iterations I am all for it!

I can make more measurements on this.

source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
1994 ↗	(On Diff #97469)	The following loop is not expensive; it's simple vector concatenation of fairly simple types. The actual slow work is Finalize(), which calls Sort(). m_function_basename_index is of type NameDIE, which has a UniqueCStringMap member, which declares its collection as a std::vector. Though arguably the Append should be pushed down into the RunTasks below.
source/Utility/TaskPool.cpp
100 ↗	(On Diff #97469)	If items 1-99 complete quickly, there isn't much advantage to handling their output before handling the output of item 0, since item 0 is likely to produce more output than 1-99 combined.

IMO the vector append issue doesn't matter, because the very next thing we do is sort. Sorting is more expensive than appending, so the append is noise.

source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
1994 ↗	(On Diff #97469)	This takes <0.25s (out of a total of 15 seconds of runtime) when timing lldb starting lldb (RelWithDebInfo build). That's for an aggregate of 14M+ names being added to the vectors. IMO that should not block this change. I also moved the append into RunTasks, just because we already have those subtasks.
source/Utility/TaskPool.cpp
77 ↗	(On Diff #97469)	TaskMapOverInt?

change name to TaskRunOverint
remove TaskRunner

Not to sound like a broken record, but please try to put this in LLVM instead of LLVM. I suggested a convenient function signature earlier.

This revision now requires changes to proceed. May 2 2017, 1:50 PM

s/instead of LLVM/instead of LLDB/

In D32757#743796, @zturner wrote:

s/instead of LLVM/instead of LLDB/

I hear you, but IMO it's not ready for that yet.

It would depend on ThreadPool, but LLDB hasn't switched to ThreadPool yet, because I want to figure out how to incorporate tasks enqueuing tasks first.

I don't want to commit a monolithic patch with all my changes (and I haven't developed them all yet), so instead I submit incremental improvements.

So I don't see where you moved all of the .Append functions. And if you did your timings with them not being in there then your timings are off.

In D32757#743874, @clayborg wrote:

So I don't see where you moved all of the .Append functions. And if you did your timings with them not being in there then your timings are off.

finalize_fn calls Append on each source first, then calls Finalize. It might be hard to see because it's now a lambda that takes the src vector and dst NameToDIE as parameters.
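The shape described here (append every per-compile-unit piece, then sort once, with one task per destination index) can be sketched as follows. NameIndex is a hypothetical stand-in for NameToDIE, and std::async stands in for TaskPool::RunTasks; neither is the real LLDB code.

```cpp
#include <algorithm>
#include <cassert>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for NameToDIE: a list of names that must be sorted
// before lookups.
struct NameIndex {
  std::vector<std::string> names;
  void Append(const NameIndex &other) {
    names.insert(names.end(), other.names.begin(), other.names.end());
  }
  void Finalize() { std::sort(names.begin(), names.end()); }
};

// Merge the per-compile-unit pieces into dst, then sort once.
void FinalizeIndex(NameIndex &dst, const std::vector<NameIndex> &srcs) {
  for (const NameIndex &src : srcs)
    dst.Append(src);
  dst.Finalize();
}

// Each destination index is touched by exactly one task, so the two merges
// can run concurrently without locking.
void FinalizeBoth(NameIndex &a, const std::vector<NameIndex> &a_srcs,
                  NameIndex &b, const std::vector<NameIndex> &b_srcs) {
  auto fa = std::async(std::launch::async, [&]() { FinalizeIndex(a, a_srcs); });
  auto fb = std::async(std::launch::async, [&]() { FinalizeIndex(b, b_srcs); });
  fa.get();
  fb.get();
}
```

Because the appends now happen inside the per-index tasks, they run in parallel across indexes instead of serially on the thread that collected the results.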

I know no one is using TaskRunner, but I would rather not remove it just yet. It has the possibility of being useful. Not in this case, but in general. I'd be interested in hearing what Pavel and Tamas think. None of this affects LLDB on Mac because all Darwin platforms (macOS, iOS, tvOS, watchOS) have Apple accelerator tables emitted by default. If you really want to speed things up, at least when compiling with a new clang, we can change the compiler to emit the Apple accelerator tables on all systems...

Personally I never really liked TaskRunner (even though I was the one who implemented it) because I think it adds a lot of extra complexity for fairly little gain. Thinking about it a bit more, I think in most use cases a more targeted solution at the call site will probably give better results. Also, if we need it in the future it can always be restored from git/svn history. Based on that I am happy to delete it if we have no use case for it at the moment.

Regarding Apple accelerator tables, giving them a try could be interesting (pass '-glldb' to a sufficiently new clang), but as far as I know they are not compatible with split DWARF, which can be a show stopper for very large applications (e.g. linking chromium on linux with "no-limit-debug-info" and without split DWARF requires an unreasonably large amount of memory).

-glldb doesn't emit the Apple accelerator tables IIRC. We don't need to change the DWARF, but just add the Apple accelerator tables by modifying clang to emit them. They will be just fine with DWO as each DWO file would have a set of them. The other way to check whether this works is to modify "llvm-dsymutil --update" to be able to work on ELF files. "llvm-dsymutil --update" regenerates the accelerator tables and puts them into the DWARF using an existing file with debug info. It was made in case we were missing something in the accelerator tables that we added later, so we could update old DWARF files gotten from build servers.

That being said, if no one is going to miss TaskRunner then we can let it go.

scott.smith mentioned this in D32820: Parallelize demangling. May 3 2017, 11:55 AM

In D32757#743793, @zturner wrote:

Not to sound like a broken record, but please try to put this in LLVM instead of LLVM. I suggested a convenient function signature earlier.

@zturner ok to commit? TaskMapOverInt(x, y, fn) maps directly to parallel_for(0, x, fn). Rather than rebundle the change you have for lldb, only for it to be deleted once you get it into llvm, can we just commit this as a stopgap?

It is a step in the right direction as it removes TaskRunner and puts us on an API more likely to end up in LLVM.

zturner accepted this revision. May 4 2017, 5:58 PM

This revision is now accepted and ready to land. May 4 2017, 5:58 PM

Change the API to more closely match parallel_for (begin, end, fn) instead of (end, batch, fn).
Fix unit test. I didn't realize I had to run check-lldb-unit separately from dotest (oops).
Fix formatting via git-clang-format.

The last changes are cosmetic (though they look big because I captured a different amount of context), so hopefully they don't need a re-review. Can someone push them for me? Thank you!

I can do the pushing. :)

Thanks for the patch.

include/lldb/Utility/TaskPool.h
89 ↗	(On Diff #97907)	Making this a template would enable you to get rid of the std::function wrapper overhead. I have no idea whether it would make a difference in practice.
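A templated variant along the lines suggested in this comment might look like the sketch below. It assumes std::thread workers instead of TaskPool, and MapOverIntT is a hypothetical name; as the comment says, whether avoiding std::function makes a measurable difference is unknown.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical templated variant: Func is deduced, so the callable is invoked
// directly rather than through std::function's type erasure.
template <typename Func>
void MapOverIntT(std::size_t begin, std::size_t end, Func &&func) {
  assert(begin <= end);
  std::atomic<std::size_t> next{begin};
  // hardware_concurrency() may return 0, so fall back to one worker.
  std::size_t hw =
      std::max<std::size_t>(1, std::thread::hardware_concurrency());
  std::size_t num_workers = std::min(end - begin, hw);

  auto worker = [&next, end, &func]() {
    // Claim indices one at a time until the range is exhausted.
    for (std::size_t i = next.fetch_add(1); i < end; i = next.fetch_add(1))
      func(i);
  };

  std::vector<std::thread> threads;
  threads.reserve(num_workers);
  for (std::size_t i = 0; i < num_workers; ++i)
    threads.emplace_back(worker);
  for (auto &t : threads)
    t.join();
}
```

The cost of the template is that the definition has to live in the header, which is one reason a std::function parameter can still be a reasonable choice for a declaration in TaskPool.h.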

Closed by commit rL302223: Add TaskMap for iterating a function over a set of integers (authored by labath). May 5 2017, 4:30 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lldb/trunk/include/lldb/Utility/TaskPool.h	108 lines
lldb/trunk/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp	75 lines
lldb/trunk/source/Utility/TaskPool.cpp	23 lines
lldb/trunk/unittests/Utility/TaskPoolTest.cpp	31 lines

Diff 97931

lldb/trunk/include/lldb/Utility/TaskPool.h

private:		private:
TaskPool() = delete;		TaskPool() = delete;

template <typename... T> struct RunTaskImpl;		template <typename... T> struct RunTaskImpl;

static void AddTaskImpl(std::function<void()> &&task_fn);		static void AddTaskImpl(std::function<void()> &&task_fn);
};		};

// Wrapper class around the global TaskPool implementation to make it possible
// to create a set of
// tasks and then wait for the tasks to be completed by the
// WaitForNextCompletedTask call. This
// class should be used when WaitForNextCompletedTask is needed because this
// class add no other
// extra functionality to the TaskPool class and it have a very minor
// performance overhead.
template <typename T> // The return type of the tasks what will be added to this
// task runner
class TaskRunner {
public:
// Add a task to the task runner what will also add the task to the global
// TaskPool. The
// function doesn't return the std::future for the task because it will be
// supplied by the
// WaitForNextCompletedTask after the task is completed.
template <typename F, typename... Args> void AddTask(F &&f, Args &&... args);

// Wait for the next task in this task runner to finish and then return the
// std::future what
// belongs to the finished task. If there is no task in this task runner
// (neither pending nor
// comleted) then this function will return an invalid future. Usually this
// function should be
// called in a loop processing the results of the tasks until it returns an
// invalid std::future
// what means that all task in this task runner is completed.
std::future<T> WaitForNextCompletedTask();

// Convenience method to wait for all task in this TaskRunner to finish. Do
// NOT use this class
// just because of this method. Use TaskPool instead and wait for each
// std::future returned by
// AddTask in a loop.
void WaitForAllTasks();

private:
std::list<std::future<T>> m_ready;
std::list<std::future<T>> m_pending;
std::mutex m_mutex;
std::condition_variable m_cv;
};

template <typename F, typename... Args>		template <typename F, typename... Args>
std::future<typename std::result_of<F(Args...)>::type>		std::future<typename std::result_of<F(Args...)>::type>
TaskPool::AddTask(F &&f, Args &&... args) {		TaskPool::AddTask(F &&f, Args &&... args) {
auto task_sp = std::make_shared<		auto task_sp = std::make_shared<
std::packaged_task<typename std::result_of<F(Args...)>::type()>>(		std::packaged_task<typename std::result_of<F(Args...)>::type()>>(
std::bind(std::forward<F>(f), std::forward<Args>(args)...));		std::bind(std::forward<F>(f), std::forward<Args>(args)...));

AddTaskImpl([task_sp]() { (*task_sp)(); });		AddTaskImpl([task_sp]() { (*task_sp)(); });
Show All 13 Lines	static void Run(Head &&h, Tail &&... t) {
f.wait();		f.wait();
}		}
};		};

template <> struct TaskPool::RunTaskImpl<> {		template <> struct TaskPool::RunTaskImpl<> {
static void Run() {}		static void Run() {}
};		};

template <typename T>		// Run 'func' on every value from begin .. end-1. Each worker will grab
template <typename F, typename... Args>		// 'batch_size' numbers at a time to work on, so for very fast functions, batch
void TaskRunner<T>::AddTask(F &&f, Args &&... args) {		// should be large enough to avoid too much cache line contention.
std::unique_lock<std::mutex> lock(m_mutex);		void TaskMapOverInt(size_t begin, size_t end,
auto it = m_pending.emplace(m_pending.end());		std::function<void(size_t)> const &func);
*it = std::move(TaskPool::AddTask(
[this, it](F f, Args... args) {
T &&r = f(std::forward<Args>(args)...);

std::unique_lock<std::mutex> lock(this->m_mutex);
this->m_ready.splice(this->m_ready.end(), this->m_pending, it);
lock.unlock();

this->m_cv.notify_one();
return r;
},
std::forward<F>(f), std::forward<Args>(args)...));
}

template <>
template <typename F, typename... Args>
void TaskRunner<void>::AddTask(F &&f, Args &&... args) {
std::unique_lock<std::mutex> lock(m_mutex);
auto it = m_pending.emplace(m_pending.end());
*it = std::move(TaskPool::AddTask(
[this, it](F f, Args... args) {
f(std::forward<Args>(args)...);

std::unique_lock<std::mutex> lock(this->m_mutex);
this->m_ready.emplace_back(std::move(*it));
this->m_pending.erase(it);
lock.unlock();

this->m_cv.notify_one();
},
std::forward<F>(f), std::forward<Args>(args)...));
}

template <typename T> std::future<T> TaskRunner<T>::WaitForNextCompletedTask() {
std::unique_lock<std::mutex> lock(m_mutex);
if (m_ready.empty() && m_pending.empty())
return std::future<T>(); // No more tasks

if (m_ready.empty())
m_cv.wait(lock, [this]() { return !this->m_ready.empty(); });

std::future<T> res = std::move(m_ready.front());
m_ready.pop_front();

lock.unlock();
res.wait();

return std::move(res);
}

template <typename T> void TaskRunner<T>::WaitForAllTasks() {
while (WaitForNextCompletedTask().valid())
;
}

#endif // #ifndef utility_TaskPool_h_		#endif // #ifndef utility_TaskPool_h_

lldb/trunk/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp

Show First 20 Lines • Show All 1,940 Lines • ▼ Show 20 Lines	if (debug_info) {
std::vector<NameToDIE> function_fullname_index(num_compile_units);		std::vector<NameToDIE> function_fullname_index(num_compile_units);
std::vector<NameToDIE> function_method_index(num_compile_units);		std::vector<NameToDIE> function_method_index(num_compile_units);
std::vector<NameToDIE> function_selector_index(num_compile_units);		std::vector<NameToDIE> function_selector_index(num_compile_units);
std::vector<NameToDIE> objc_class_selectors_index(num_compile_units);		std::vector<NameToDIE> objc_class_selectors_index(num_compile_units);
std::vector<NameToDIE> global_index(num_compile_units);		std::vector<NameToDIE> global_index(num_compile_units);
std::vector<NameToDIE> type_index(num_compile_units);		std::vector<NameToDIE> type_index(num_compile_units);
std::vector<NameToDIE> namespace_index(num_compile_units);		std::vector<NameToDIE> namespace_index(num_compile_units);

std::vector<bool> clear_cu_dies(num_compile_units, false);		// std::vector<bool> might be implemented using bit test-and-set, so use
		// uint8_t instead.
		std::vector<uint8_t> clear_cu_dies(num_compile_units, false);
auto parser_fn = [debug_info, &function_basename_index,		auto parser_fn = [debug_info, &function_basename_index,
&function_fullname_index, &function_method_index,		&function_fullname_index, &function_method_index,
&function_selector_index, &objc_class_selectors_index,		&function_selector_index, &objc_class_selectors_index,
&global_index, &type_index,		&global_index, &type_index,
&namespace_index](uint32_t cu_idx) {		&namespace_index](uint32_t cu_idx) {
DWARFCompileUnit *dwarf_cu = debug_info->GetCompileUnitAtIndex(cu_idx);		DWARFCompileUnit *dwarf_cu = debug_info->GetCompileUnitAtIndex(cu_idx);
if (dwarf_cu) {		if (dwarf_cu) {
dwarf_cu->Index(		dwarf_cu->Index(
function_basename_index[cu_idx], function_fullname_index[cu_idx],		function_basename_index[cu_idx], function_fullname_index[cu_idx],
function_method_index[cu_idx], function_selector_index[cu_idx],		function_method_index[cu_idx], function_selector_index[cu_idx],
objc_class_selectors_index[cu_idx], global_index[cu_idx],		objc_class_selectors_index[cu_idx], global_index[cu_idx],
type_index[cu_idx], namespace_index[cu_idx]);		type_index[cu_idx], namespace_index[cu_idx]);
}		}
return cu_idx;		return cu_idx;
};		};

auto extract_fn = [debug_info](uint32_t cu_idx) {		auto extract_fn = [debug_info, &clear_cu_dies](uint32_t cu_idx) {
DWARFCompileUnit *dwarf_cu = debug_info->GetCompileUnitAtIndex(cu_idx);		DWARFCompileUnit *dwarf_cu = debug_info->GetCompileUnitAtIndex(cu_idx);
if (dwarf_cu) {		if (dwarf_cu) {
// dwarf_cu->ExtractDIEsIfNeeded(false) will return zero if the		// dwarf_cu->ExtractDIEsIfNeeded(false) will return zero if the
// DIEs for a compile unit have already been parsed.		// DIEs for a compile unit have already been parsed.
return std::make_pair(cu_idx, dwarf_cu->ExtractDIEsIfNeeded(false) > 1);		if (dwarf_cu->ExtractDIEsIfNeeded(false) > 1)
		clear_cu_dies[cu_idx] = true;
}		}
return std::make_pair(cu_idx, false);
};		};

// Create a task runner that extracts dies for each DWARF compile unit in a		// Create a task runner that extracts dies for each DWARF compile unit in a
// separate thread		// separate thread
TaskRunner<std::pair<uint32_t, bool>> task_runner_extract;
for (uint32_t cu_idx = 0; cu_idx < num_compile_units; ++cu_idx)
task_runner_extract.AddTask(extract_fn, cu_idx);

//----------------------------------------------------------------------		//----------------------------------------------------------------------
// First figure out which compile units didn't have their DIEs already		// First figure out which compile units didn't have their DIEs already
// parsed and remember this. If no DIEs were parsed prior to this index		// parsed and remember this. If no DIEs were parsed prior to this index
// function call, we are going to want to clear the CU dies after we		// function call, we are going to want to clear the CU dies after we
// are done indexing to make sure we don't pull in all DWARF dies, but		// are done indexing to make sure we don't pull in all DWARF dies, but
// we need to wait until all compile units have been indexed in case		// we need to wait until all compile units have been indexed in case
// a DIE in one compile unit refers to another and the indexes accesses		// a DIE in one compile unit refers to another and the indexes accesses
// those DIEs.		// those DIEs.
//----------------------------------------------------------------------		//----------------------------------------------------------------------
while (true) {		TaskMapOverInt(0, num_compile_units, extract_fn);
auto f = task_runner_extract.WaitForNextCompletedTask();
if (!f.valid())
break;
unsigned cu_idx;
bool clear;
std::tie(cu_idx, clear) = f.get();
clear_cu_dies[cu_idx] = clear;
}

// Now create a task runner that can index each DWARF compile unit in a		// Now create a task runner that can index each DWARF compile unit in a
// separate		// separate
// thread so we can index quickly.		// thread so we can index quickly.

TaskRunner<uint32_t> task_runner;		TaskMapOverInt(0, num_compile_units, parser_fn);
for (uint32_t cu_idx = 0; cu_idx < num_compile_units; ++cu_idx)
task_runner.AddTask(parser_fn, cu_idx);		auto finalize_fn = [](NameToDIE &index, std::vector<NameToDIE> &srcs) {
		for (auto &src : srcs)
while (true) {		index.Append(src);
std::future<uint32_t> f = task_runner.WaitForNextCompletedTask();		index.Finalize();
if (!f.valid())		};
break;
uint32_t cu_idx = f.get();		TaskPool::RunTasks(
		[&]() {
m_function_basename_index.Append(function_basename_index[cu_idx]);		finalize_fn(m_function_basename_index, function_basename_index);
m_function_fullname_index.Append(function_fullname_index[cu_idx]);		},
m_function_method_index.Append(function_method_index[cu_idx]);		[&]() {
m_function_selector_index.Append(function_selector_index[cu_idx]);		finalize_fn(m_function_fullname_index, function_fullname_index);
m_objc_class_selectors_index.Append(objc_class_selectors_index[cu_idx]);		},
m_global_index.Append(global_index[cu_idx]);		[&]() { finalize_fn(m_function_method_index, function_method_index); },
m_type_index.Append(type_index[cu_idx]);		[&]() {
m_namespace_index.Append(namespace_index[cu_idx]);		finalize_fn(m_function_selector_index, function_selector_index);
}		},
		[&]() {
TaskPool::RunTasks([&]() { m_function_basename_index.Finalize(); },		finalize_fn(m_objc_class_selectors_index, objc_class_selectors_index);
[&]() { m_function_fullname_index.Finalize(); },		},
[&]() { m_function_method_index.Finalize(); },		[&]() { finalize_fn(m_global_index, global_index); },
[&]() { m_function_selector_index.Finalize(); },		[&]() { finalize_fn(m_type_index, type_index); },
[&]() { m_objc_class_selectors_index.Finalize(); },		[&]() { finalize_fn(m_namespace_index, namespace_index); });
[&]() { m_global_index.Finalize(); },
[&]() { m_type_index.Finalize(); },
[&]() { m_namespace_index.Finalize(); });

//----------------------------------------------------------------------		//----------------------------------------------------------------------
// Keep memory down by clearing DIEs for any compile units if indexing		// Keep memory down by clearing DIEs for any compile units if indexing
// caused us to load the compile unit's DIEs.		// caused us to load the compile unit's DIEs.
//----------------------------------------------------------------------		//----------------------------------------------------------------------
for (uint32_t cu_idx = 0; cu_idx < num_compile_units; ++cu_idx) {		for (uint32_t cu_idx = 0; cu_idx < num_compile_units; ++cu_idx) {
if (clear_cu_dies[cu_idx])		if (clear_cu_dies[cu_idx])
debug_info->GetCompileUnitAtIndex(cu_idx)->ClearDIEs(true);		debug_info->GetCompileUnitAtIndex(cu_idx)->ClearDIEs(true);
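
The switch from std::vector<bool> to std::vector<uint8_t> for clear_cu_dies in the hunk above matters because the flags are now written from worker threads. A minimal sketch of the safe pattern, using plain std::thread and a hypothetical MarkEvenIndices helper that is not part of this patch:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: each thread writes only flags[i] for its own i.
// With std::vector<std::uint8_t> every element is a distinct byte, so these
// writes do not race. std::vector<bool> may pack several elements into one
// word, so the same pattern would be a data race there.
std::vector<std::uint8_t> MarkEvenIndices(std::size_t count) {
  std::vector<std::uint8_t> flags(count, 0);
  std::vector<std::thread> threads;
  threads.reserve(count);
  for (std::size_t i = 0; i < count; ++i)
    threads.emplace_back([&flags, i]() {
      if (i % 2 == 0)
        flags[i] = 1;
    });
  for (auto &t : threads)
    t.join();
  return flags;
}
```

The vector is sized before any thread starts and never resized while they run, which is what makes the unsynchronized element writes acceptable.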

lldb/trunk/source/Utility/TaskPool.cpp

Show First 20 Lines • Show All 67 Lines • ▼ Show 20 Lines	while (true) {

std::function<void()> f = pool->m_tasks.front();		std::function<void()> f = pool->m_tasks.front();
pool->m_tasks.pop();		pool->m_tasks.pop();
lock.unlock();		lock.unlock();

f();		f();
}		}
}		}

		void TaskMapOverInt(size_t begin, size_t end,
		std::function<void(size_t)> const &func) {
		std::atomic<size_t> idx{begin};
		size_t num_workers =
		std::min<size_t>(end, std::thread::hardware_concurrency());

		auto wrapper = [&idx, end, &func]() {
		while (true) {
		size_t i = idx.fetch_add(1);
		if (i >= end)
		break;
		func(i);
		}
		};

		std::vector<std::future<void>> futures;
		futures.reserve(num_workers);
		for (size_t i = 0; i < num_workers; i++)
		futures.push_back(TaskPool::AddTask(wrapper));
		for (size_t i = 0; i < num_workers; i++)
		futures[i].wait();
		}

lldb/trunk/unittests/Utility/TaskPoolTest.cpp

Show All 24 Lines	TaskPool::RunTasks([fn, &r]() { fn(1, r[0]); }, [fn, &r]() { fn(2, r[1]); },
[fn, &r]() { fn(3, r[2]); }, [fn, &r]() { fn(4, r[3]); });		[fn, &r]() { fn(3, r[2]); }, [fn, &r]() { fn(4, r[3]); });

ASSERT_EQ(2, r[0]);		ASSERT_EQ(2, r[0]);
ASSERT_EQ(5, r[1]);		ASSERT_EQ(5, r[1]);
ASSERT_EQ(10, r[2]);		ASSERT_EQ(10, r[2]);
ASSERT_EQ(17, r[3]);		ASSERT_EQ(17, r[3]);
}		}

TEST(TaskPoolTest, TaskRunner) {		TEST(TaskPoolTest, TaskMap) {
auto fn = [](int x) { return std::make_pair(x, x * x); };		int data[4];
		auto fn = [&data](int x) { data[x] = x * x; };
TaskRunner<std::pair<int, int>> tr;
tr.AddTask(fn, 1);		TaskMapOverInt(0, 4, fn);
tr.AddTask(fn, 2);
tr.AddTask(fn, 3);		ASSERT_EQ(data[0], 0);
tr.AddTask(fn, 4);		ASSERT_EQ(data[1], 1);
		ASSERT_EQ(data[2], 4);
int count = 0;		ASSERT_EQ(data[3], 9);
while (true) {
auto f = tr.WaitForNextCompletedTask();
if (!f.valid())
break;

++count;
std::pair<int, int> v = f.get();
ASSERT_EQ(v.first * v.first, v.second);
}

ASSERT_EQ(4, count);
}		}
