Page MenuHomePhabricator

[OpenMP] Added the support for unshackled task in RTL
Needs ReviewPublic

Authored by tianshilei1992 on Apr 6 2020, 4:54 PM.

Details

Summary

The basic design is to create an outer-most parallel team. It is not a regular team because it is only created when the first unshackled task is encountered, and is only responsible for the execution of unshackled tasks. We first use pthread_create to create a new thread, let's call it the initial and also the master thread of the unshackled team. This initial thread then initializes a new root, just like what RTL does in initialization. After that, it directly calls __kmpc_fork_call. It is like the initial thread encounters a parallel region. The wrapped function for this team is, for master thread, which is the initial thread that we create via pthread_create on Linux, waits on a condition variable. The condition variable can only be signaled when RTL is being destroyed. For other work threads, they just do nothing. The reason that master thread needs to wait there is, in current implementation, once master thread finishes the wrapped function of this team, it starts to free the team which is not what we want.

Two environment variables, LIBOMP_NUM_UNSHACKLED_THREADS and LIBOMP_USE_UNSHACKLED_TASK, are also set to configure the number of threads and enable/disable this feature. By default, the number of unshackled threads is 8.

Here are some open issues to be discussed:

  1. The master thread goes to sleeping when the initialization is finished. As Andrey mentioned, we might need it to be awaken from time to time to do some stuffs. What kind of update/check should be put here?

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
tianshilei1992 added inline comments.Aug 26 2020, 6:42 PM
openmp/runtime/src/kmp_tasking.cpp
3727

Sorry I still didn't understand this part. Could you please expatiate it?

Fixed several issues

tianshilei1992 marked an inline comment as done.

Fixed one issue in __kmp_allocate_thread

tianshilei1992 marked an inline comment as done.Aug 30 2020, 5:21 PM

Fixed wrong code comments

Harbormaster completed remote builds in B70056: Diff 288879.

Fixed the issue in must_wait

tianshilei1992 marked an inline comment as done.Aug 31 2020, 12:06 PM
adurang added inline comments.Sep 1 2020, 4:57 AM
openmp/runtime/src/kmp_tasking.cpp
3727

For the code below:

#pragma omp parallel num_threads(2)
{
#pragma omp target nowait
   blah()
#pragma omp taskwait
}

With your current code (because you're using a shared counter for the whole team), both thread 1 and 2 are waiting for each others target regions ( so for example, even if target-th1 was finished thread1 would be blocked until target-th2 was completed). Each taskwait should only be waiting for their own child target tasks.

Hope this helps.

tianshilei1992 added inline comments.Sep 1 2020, 9:54 AM
openmp/runtime/src/kmp_tasking.cpp
3727

Thanks for the explanation. But this lines of code are in the function __kmp_task_team_wait that is not called by __kmpc_omp_taskwait. If I understand correctly, __kmp_task_team_wait is called by the master thread of a team to wait for all tasks created in the team to finish so that it can proceed. So we need to wait for all unshackled tasks encountered/created in the task team.

ye-luo added a subscriber: ye-luo.Sep 1 2020, 2:00 PM
ye-luo added inline comments.
openmp/runtime/src/kmp_tasking.cpp
3727

@adurang your example demonstrates exactly the code pattern I use. taskwait should only wait for the child tasks.

adurang added inline comments.Sep 1 2020, 3:00 PM
openmp/runtime/src/kmp_tasking.cpp
3727

Ah sorry, I didn't notice the patch changing functions. I should really look at the whole file!

But if taskwait is working correctly the flag.wait call in __kmp_task_team_wait should also make sure that no outstanding unshackled tasks are left and then I don't think the extra check shouldn't be needed. Do you have any tests with taskgroup/taskwait?

In any case, could you move it inside the same if statement as the other checks? Also, you need to set tt_unfinished_unshackled_tasks to FALSE in case the same task_team structure is reused (Note that is done the same for various fields just above).

Added another condition to see whether we need to wait in the task team

Couple of tests needed to check if the implementation works - one with unshackled task encountered before parallel, and another with unshackled task encountered after / between parallels.

openmp/runtime/src/kmp_runtime.cpp
4335

This condition is false if new_gtid started with (__kmp_unshackled_threads_num + 1), that is for regular thread. Thus all threads will mistakenly get the same gtid.

Fixed some problems and added the first test case. More cases are on the way.

Added another test case to test dependence process

Fixed a potental race condition

Still trying to fix a race problem

Added the missing part for the team destroy

Refactored code in z_Linux_util.cpp

Updated some tests

tianshilei1992 marked 5 inline comments as done.Oct 16 2020, 1:38 PM

Enabled unshackled thread by default

ye-luo added a comment.EditedOct 16 2020, 4:31 PM

Enabled unshackled thread by default

What is the current supported way of turning off unshackled thread team with zero side effect?

Enabled unshackled thread by default

What is the current supported way of turning off unshackled thread team with zero side effect?

Currently it can be disabled by setting LIBOMP_USE_UNSHACKLED_TASK to OFF at the CMake stage. It is also feasible to make it during runtime but that would bring in extra overhead.

Added a new test case for taskgroup

Fixed test case description

Disabled unshackled task on macOS as well

tianshilei1992 retitled this revision from [OpenMP][WIP] Added the support for unshackled task in RTL to [OpenMP] Added the support for unshackled task in RTL.Oct 19 2020, 12:58 PM

I left some comments. Generally, I would prefer we minimize the use of the macro to elide declarations. I'd also prefer to use the macro as part of the conditions to avoid duplication.
Instead of

#ifdef MACRO
foo(X)
#else 
foo(Y)
#endif

we do

v = MACRO ? X : Y;
foo(v);

which is really helpful if foo is complex code and just as fast.

@adurang @AndreyChurbanov have your concerns been addressed?

openmp/runtime/src/kmp.h
2244–2245

Do we really need the USE_UNSHACKLED_TASK flag? Even if we want it, we don't need it here in the struct. Let's waste one bit on windows until we catch up and remove complexity for everyone.

2299

Grammar:
"The task team of its parent task team"
and
"therefore we it when this task is created"

2309

Similarly, I don't think the byte savings here are worth it.

openmp/runtime/src/kmp_runtime.cpp
3644

Is the code in the #else case the same as in the } else { case? If so, make the conditional if (USE_UNSHACKLED_TASK && ...) and avoid the duplication of bugs.

4334

Nit: can we move the initialization out of the loop, hard to read. A comment might help as well.

Looking at the code more generally, is this the same code as below with different bounds? If so, avoid the duplication all together please, same way as suggested above.

@adurang @AndreyChurbanov have your concerns been addressed?

I didn't see the problem with release_deps being solved (maybe I missed). And I think we should really have a mechanism to set the number of threads instead of a hardcoded '8' and not have the threads created until is necessary.

Also, given efforts in OpenMP to remove master and similar terms maybe we should think about renaming "unshackled" to something else like "helper" or "auxilary"? I know is a bit of a pain to do that so I won't press for this but thought that I should mention it.

tianshilei1992 added inline comments.Wed, Oct 28, 7:41 AM
openmp/runtime/src/kmp_taskdeps.h
126

@adurang The problem of release deeps was fixed here.

Enhanced one test case and fixed some comments

tianshilei1992 marked 2 inline comments as done.Wed, Oct 28, 8:04 PM

I left some comments. Generally, I would prefer we minimize the use of the macro to elide declarations. I'd also prefer to use the macro as part of the conditions to avoid duplication.
Instead of

#ifdef MACRO
foo(X)
#else 
foo(Y)
#endif

we do

v = MACRO ? X : Y;
foo(v);

which is really helpful if foo is complex code and just as fast.

Some variables are only defined when the MACRO is enabled. I have changed some code to make it more readable and less complex.

Changed some code to make it more readable and less complex.

The failed case is because the gtid is not offset. What is a right way to detect whether a CMake variable or macro is defined?

jdoerfert added a comment.EditedThu, Oct 29, 1:08 PM

Some variables are only defined when the MACRO is enabled. I have changed some code to make it more readable and less complex.

As I said before, I don't see the point in omitting declarations. It just increases our testing surface for no real benefit. If you don't use this but have two more functions and a few declarations, all of which you don't use, you really don't pay a price in the big scheme of things.

What is a right way to detect whether a CMake variable or macro is defined?

In C/C++ (#ifdef) or in CMake (idk)?

Added support for setting number of unshackled threads via environment variable

Some variables are only defined when the MACRO is enabled. I have changed some code to make it more readable and less complex.

As I said before, I don't see the point in omitting declarations. It just increases our testing surface for no real benefit. If you don't use this but have two more functions and a few declarations, all of which you don't use, you really don't pay a price in the big scheme of things.

What is a right way to detect whether a CMake variable or macro is defined?

In C/C++ (#ifdef) or in CMake (idk)?

The point is, our test cases are not run by CMake, so it cannot detect whether we define any variable.

Some variables are only defined when the MACRO is enabled. I have changed some code to make it more readable and less complex.

As I said before, I don't see the point in omitting declarations. It just increases our testing surface for no real benefit. If you don't use this but have two more functions and a few declarations, all of which you don't use, you really don't pay a price in the big scheme of things.

What is a right way to detect whether a CMake variable or macro is defined?

In C/C++ (#ifdef) or in CMake (idk)?

The point is, our test cases are not run by CMake, so it cannot detect whether we define any variable.

Then make USE_UNSHACKLED_TASK default and remove all the uses that elide declarations and definitions.

ye-luo added a comment.EditedTue, Nov 10, 10:05 AM

Some variables are only defined when the MACRO is enabled. I have changed some code to make it more readable and less complex.

As I said before, I don't see the point in omitting declarations. It just increases our testing surface for no real benefit. If you don't use this but have two more functions and a few declarations, all of which you don't use, you really don't pay a price in the big scheme of things.

What is a right way to detect whether a CMake variable or macro is defined?

In C/C++ (#ifdef) or in CMake (idk)?

The point is, our test cases are not run by CMake, so it cannot detect whether we define any variable.

Then make USE_UNSHACKLED_TASK default and remove all the uses that elide declarations and definitions.

Better to have a way to elide unshackled thread team creation at runtime before putting LIBOMP_USE_UNSHACKLED_TASK by default.

Some variables are only defined when the MACRO is enabled. I have changed some code to make it more readable and less complex.

As I said before, I don't see the point in omitting declarations. It just increases our testing surface for no real benefit. If you don't use this but have two more functions and a few declarations, all of which you don't use, you really don't pay a price in the big scheme of things.

What is a right way to detect whether a CMake variable or macro is defined?

In C/C++ (#ifdef) or in CMake (idk)?

The point is, our test cases are not run by CMake, so it cannot detect whether we define any variable.

Then make USE_UNSHACKLED_TASK default and remove all the uses that elide declarations and definitions.

Better to have a way to elide unshackled thread team creation at runtime before putting LIBOMP_USE_UNSHACKLED_TASK by default.

It's already included in this patch.

Added the missing variable initialization

Removed the marcro USE_UNSHACKLED_TASK

tianshilei1992 edited the summary of this revision. (Show Details)Wed, Nov 11, 8:58 AM
tianshilei1992 marked 3 inline comments as done.Wed, Nov 11, 12:51 PM