This is an archive of the discontinued LLVM Phabricator instance.

Enable tasks dependencies hashmaps resizing
ClosedPublic

Authored by viroulep on Sep 11 2019, 7:31 AM.

Details

Summary

This patch is a follow up to https://reviews.llvm.org/D63196.
In this paper (full text) we studied the impact of the dependencies hash table capacity on the performance of one of our applications (see section 3.3; experiments were made using the "jacobi" kernel from the KASTORS benchmark suite).
We did a breakdown of the task-management-related time (Figure 3), which showed that a lot of time was spent checking dependencies, and we narrowed that down to the dependency lookup.
Statically changing the size proved to be an appropriate quick fix, but it would be nicer to have the hashtables resized automatically when they reach some threshold.

While simply doubling the hashtable capacity would be quite easy to implement, it also leads to disastrous collision statistics (see here and here for some experiments on a 24-core Haswell machine and an Arm machine).
The plots show how many buckets hold x elements just before the hashtable is resized (I removed the 0-element bar for visibility, as most buckets are actually empty), which basically shows there are a whole lot of collisions.

So instead I went for an arbitrary fixed set of capacities, using prime numbers close to twice the previous capacity.
The bucket distributions for the same application executions (here and here) look far better: the majority of the buckets are used, and collisions don't go as high as before.

The hashtable resizing is triggered when the total number of conflicts in all buckets exceeds the number of buckets.
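
To make the mechanism concrete, here is a minimal sketch assuming a chained hash table in which every insertion into an already occupied bucket counts as a conflict; the identifiers, the hash function and the size table are illustrative only, not the actual code from kmp_taskdeps.cpp:

#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Candidate capacities: primes close to twice the previous value
   (illustrative numbers, not the ones from the patch). */
static const size_t hash_sizes[] = {997, 2003, 4001, 8191, 16001, 32003};
#define NUM_HASH_SIZES (sizeof(hash_sizes) / sizeof(hash_sizes[0]))

typedef struct dep_entry {
  void *addr;              /* dependence address used as the key */
  struct dep_entry *next;  /* chaining within a bucket */
} dep_entry_t;

typedef struct dep_hash {
  size_t size;             /* number of buckets */
  size_t nconflicts;       /* insertions that landed in a non-empty bucket */
  dep_entry_t **buckets;
} dep_hash_t;

static size_t hash_bucket(void *addr, size_t size) {
  return ((uintptr_t)addr >> 4) % size;  /* toy hash function */
}

/* Grow to the next prime capacity (if any) and re-insert every entry. */
static void dep_hash_resize(dep_hash_t *h) {
  size_t new_size = h->size;
  for (size_t i = 0; i < NUM_HASH_SIZES; i++) {
    if (hash_sizes[i] > h->size) {
      new_size = hash_sizes[i];
      break;
    }
  }
  if (new_size == h->size)
    return;  /* largest size reached: growth stays bounded */

  dep_entry_t **new_buckets =
      (dep_entry_t **)calloc(new_size, sizeof(dep_entry_t *));
  if (new_buckets == NULL)
    return;  /* keep the old table on allocation failure */

  h->nconflicts = 0;
  for (size_t i = 0; i < h->size; i++) {
    dep_entry_t *e = h->buckets[i];
    while (e != NULL) {
      dep_entry_t *next = e->next;
      size_t b = hash_bucket(e->addr, new_size);
      if (new_buckets[b] != NULL)
        h->nconflicts++;
      e->next = new_buckets[b];
      new_buckets[b] = e;
      e = next;
    }
  }
  free(h->buckets);
  h->buckets = new_buckets;
  h->size = new_size;
}

/* Insert an entry; once the total number of conflicts exceeds the number
   of buckets, grow the table as described above. */
static void dep_hash_insert(dep_hash_t *h, dep_entry_t *e) {
  size_t b = hash_bucket(e->addr, h->size);
  if (h->buckets[b] != NULL)
    h->nconflicts++;
  e->next = h->buckets[b];
  h->buckets[b] = e;
  if (h->nconflicts > h->size)
    dep_hash_resize(h);
}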

By using this resizing mechanism we managed to observe roughly the same performance gains as in the paper, but without having to manually change the hashtable size.

Diff Detail

Repository
rL LLVM

Event Timeline

viroulep created this revision. Sep 11 2019, 7:31 AM

I like this. If there are objections, questions, concerns, please raise them now.

runtime/src/kmp_taskdeps.cpp
72 ↗(On Diff #219714)

Do we want to give up here? I've seen people with *a lot* of dynamic tasks, so we might want to scale somehow.

AndreyChurbanov accepted this revision. Sep 16 2019, 9:10 AM
AndreyChurbanov added inline comments.
runtime/src/kmp_taskdeps.cpp
72 ↗(On Diff #219714)

I don't think making the hash unlimited is the right way to scale an application with millions of different dependences per single hash. There should be a trivial workaround for users, like replacing

for (i = 0; i < N; i++) {
  #pragma omp task depend(inout: deps[i])
  { /* ... task body ... */ }
}

with

for (i = 0; i < N; i++) {
  #pragma omp task depend(inout: deps[i%some_threshold])
  { /* ... task body ... */ }
}

This should work fine given that some_threshold is reasonably bigger than the number of threads. Rather than having an unlimited hash, the same dependences can better be re-used without losing semantics, e.g. if the previous tasks have already been executed by the team. Even if this is not realistic, e.g. because tasks are executed too slowly compared to the speed of their generation, the introduced overhead of extra dependences should be negligible compared to maintaining an unlimited hash size (my view may be questionable of course...).
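
Spelled out as a complete, self-contained example (N, SOME_THRESHOLD and work() below are placeholders for application-specific code, and deps must hold at least SOME_THRESHOLD elements), the suggested workaround amounts to:

#define N 1000000
#define SOME_THRESHOLD 1024  /* should be reasonably bigger than the number of threads */

static void work(int i) { (void)i; /* application-specific task body (placeholder) */ }

void run_tasks(int *deps) {
  #pragma omp parallel
  #pragma omp single
  for (int i = 0; i < N; i++) {
    /* Reuse SOME_THRESHOLD dependence addresses instead of N distinct ones,
       keeping the dependence hash small; tasks i and i+SOME_THRESHOLD become
       ordered, which is harmless when the threshold is much larger than the
       number of threads. */
    #pragma omp task depend(inout: deps[i % SOME_THRESHOLD])
    work(i);
  }
}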

I also thought of things like cleaning up a hash entry once the dependence is not needed any more, but this does not look simple to implement, and it may impact performance because of the extra synchronization needed.

This revision is now accepted and ready to land. Sep 16 2019, 9:10 AM

@viroulep do you have commit access or should we land that for you?

Sorry for the delay in getting back to you there.
@protze.joachim I actually don't have commit access; it would be awesome if one of you could land this patch for me!

@AndreyChurbanov, @jdoerfert thanks for the reviews and comments!

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. Sep 25 2019, 7:39 AM