This is an archive of the discontinued LLVM Phabricator instance.

Create a runtime option to disable task throttling
ClosedPublic

Authored by viroulep on Jun 12 2019, 6:06 AM.

Details

Summary

By default, when a task queue is full, execution of a new task is serialized: the task is run immediately instead of being enqueued. The one exception is a task whose dependencies are not met, in which case the task queue is extended.

This behavior may cripple application performance in some specific task graph scenarios, like the ones detailed in section 4.2 of this paper published at IWOMP 2018 (the full text can be found here).
In such cases, not having the full task graph prevents some opportunities for cache reuse between successive tasks of the stencil algorithm.

After the paper was published, Thierry Gautier (whom I work with) was contacted by Jim Cownie, who seemed to think that some of the improvements described in the paper may be interesting enough to be merged upstream.

While the changes in the paper were made against an earlier version of the runtime, they are actually easier to integrate now, as the mechanism to resize task queues already exists.
So my suggestion would be to add an option to disable task throttling, and to make sure a task queue is always resized when it is full and task throttling is disabled.
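
To make the proposed behavior concrete, here is a minimal sketch of the decision taken when a task is pushed. It is a toy model under stated assumptions, not the libomp source: `task_queue`, `push_task` and the `push_result` values are hypothetical names, and doubling the capacity is only one possible resize policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the push decision; none of these names come
 * from the runtime itself. */
typedef enum { TASK_ENQUEUED, TASK_QUEUE_RESIZED, TASK_SERIALIZED } push_result;

typedef struct {
  size_t size;     /* tasks currently queued */
  size_t capacity; /* current queue capacity */
} task_queue;

push_result push_task(task_queue *q, bool throttling_enabled,
                      bool has_unmet_deps) {
  assert(q != NULL);

  /* Room left: enqueue normally. */
  if (q->size < q->capacity) {
    q->size++;
    return TASK_ENQUEUED;
  }

  /* Queue is full. With throttling enabled, the task is serialized
   * unless its dependencies are not met. */
  if (throttling_enabled && !has_unmet_deps)
    return TASK_SERIALIZED;

  /* Throttling disabled, or dependencies not met: extend the queue.
   * Doubling is an assumption made for this sketch. */
  q->capacity = (q->capacity == 0) ? 1 : q->capacity * 2;
  q->size++;
  return TASK_QUEUE_RESIZED;
}
```

With throttling enabled, a push on a full queue returns `TASK_SERIALIZED` and leaves the queue untouched; with throttling disabled, the same push always grows the queue, so the full task graph stays visible to the scheduler.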

Please let me know what you think.

Diff Detail

Repository
rL LLVM

Event Timeline

viroulep created this revision. Jun 12 2019, 6:06 AM
Herald added a project: Restricted Project. Jun 12 2019, 6:06 AM

Hi there,

By any chance, has anyone had the time to take a look at this?

AndreyChurbanov accepted this revision. Jun 25 2019, 5:25 AM

LGTM,

sorry for the delay.

This revision is now accepted and ready to land. Jun 25 2019, 5:25 AM

Thank you very much for the review Andrey!
I don't have commit access to the repository. Would you mind committing these changes on my behalf, or pointing me to someone who can?

Isn't this missing test coverage?

Yes, adding tests would be great. Simplest thing to do is to extend existing tests with throttling on/off, I think.

Philippe,
If you don't mind, could you please update the patch to add tests? I'd suggest using two existing tests that create a large number of tasks (all the others create a small number of tasks and thus are not relevant to this patch). A possible additional change is:

Index: runtime/test/tasking/omp_taskloop_grainsize.c
===================================================================
--- runtime/test/tasking/omp_taskloop_grainsize.c       (revision 364425)
+++ runtime/test/tasking/omp_taskloop_grainsize.c       (working copy)
@@ -1,5 +1,7 @@
 // RUN: %libomp-compile-and-run
 // RUN: %libomp-compile && env KMP_TASKLOOP_MIN_TASKS=1 %libomp-run
+// RUN: %libomp-compile && env KMP_ENABLE_TASK_THROTTLING=0 %libomp-run
+// RUN: %libomp-compile && env KMP_ENABLE_TASK_THROTTLING=1 %libomp-run
 // REQUIRES: openmp-4.5

 // These compilers don't support the taskloop construct
Index: runtime/test/tasking/omp_taskloop_num_tasks.c
===================================================================
--- runtime/test/tasking/omp_taskloop_num_tasks.c       (revision 364425)
+++ runtime/test/tasking/omp_taskloop_num_tasks.c       (working copy)
@@ -2,6 +2,8 @@
 // UNSUPPORTED: netbsd
 // RUN: %libomp-compile-and-run
 // RUN: %libomp-compile && env KMP_TASKLOOP_MIN_TASKS=1 %libomp-run
+// RUN: %libomp-compile && env KMP_ENABLE_TASK_THROTTLING=0 %libomp-run
+// RUN: %libomp-compile && env KMP_ENABLE_TASK_THROTTLING=1 %libomp-run
 // REQUIRES: openmp-4.5

 // These compilers don't support the taskloop construct
AndreyChurbanov requested changes to this revision. Jun 27 2019, 6:01 AM

Of course, you are free to write your own test, or simply extend existing test(s).

This revision now requires changes to proceed. Jun 27 2019, 6:01 AM
viroulep updated this revision to Diff 207317. Jul 1 2019, 8:16 AM

This adds a specific test for the task throttling behavior.

Thanks Roman and Andrey for the additional comments!
Sorry for not including tests right away, and thanks for the detailed suggestions, I really appreciate it.

I ended up adding an extra test: I didn't want to mess around with the grain size in the taskloop test (as the number of tasks created is not high enough), and I also wanted to explicitly test the task throttling behavior.
I did this by creating a lot of tasks and checking the number of tasks enqueued when the master thread starts executing tasks.
I've written a (hopefully) comprehensive explanation at the beginning of the test.
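
For readers who want to try the option by hand, the following is a rough sketch of the "create a lot of tasks" pattern. It is not the test added in this revision: `run_many_tasks` is a hypothetical helper, and it only counts executed tasks rather than inspecting how many were enqueued. The pragmas are standard OpenMP, so the function also compiles (and runs serially) without OpenMP support.

```c
#include <assert.h>

/* Hypothetical helper, not part of the committed test: the master
 * thread creates n tiny tasks and the function returns how many ran. */
int run_many_tasks(int n) {
  int executed = 0;
  assert(n >= 0);

#pragma omp parallel
#pragma omp master
  {
    /* Only the master thread creates tasks, so its queue fills quickly. */
    for (int i = 0; i < n; i++) {
#pragma omp task shared(executed)
      {
#pragma omp atomic
        executed++;
      }
    }
    /* Wait for every task created above before leaving the region. */
#pragma omp taskwait
  }
  return executed;
}
```

Calling `run_many_tasks` from a small `main`, building with OpenMP enabled, and running once with `KMP_ENABLE_TASK_THROTTLING=0` and once with `KMP_ENABLE_TASK_THROTTLING=1` should give the same count in both cases. Only the scheduling differs: with throttling disabled the queue is resized instead of tasks being executed immediately by the creating thread.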

This revision is now accepted and ready to land. Jul 2 2019, 8:05 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. Jul 2 2019, 8:11 AM

Thanks for the merge!