
[OpenMP] Added the support for unshackled task in RTL
Needs Review · Public

Authored by tianshilei1992 on Apr 6 2020, 4:54 PM.

Details

Reviewers
jdoerfert
Summary

This patch is just a draft version to show my design for implementing unshackled tasks.
It still contains some problems, e.g. the error messages are not right, but it should convey the whole picture.
I will fix things up if the design itself is generally right.

The basic design is to create an outer-most parallel team.
It is not a regular team because it has its own root.
We first use pthread_create to create a new thread; let's call it the initial thread, and also the master thread, of the unshackled team.
This initial thread then initializes a new root, just like what the RTL does during initialization.
After that, it directly calls __kmpc_fork_call, as if the initial thread had encountered a parallel region.
In the wrapped function for this team, the master thread (the initial thread we create via pthread_create on Linux) waits on a condition variable.
The condition variable is only signaled when the RTL is being deconstructed.
The other worker threads just do nothing.
The reason the master thread needs to wait there is that, in the current implementation, once the master thread finishes the wrapped function of this team,
it starts to free the team, which is not what we want.

Here are some open issues to be discussed:

  1. The master thread sleeps once initialization is finished. As Andrey mentioned, we might need it to be woken up from time to time to do some work. What kind of update/check should be put here?

Diff Detail

Event Timeline

tianshilei1992 created this revision. Apr 6 2020, 4:54 PM
tianshilei1992 edited the summary of this revision. (Show Details) Apr 6 2020, 4:54 PM

@jdoerfert Please help add more people. :-)

tianshilei1992 edited the summary of this revision. (Show Details) Apr 7 2020, 10:50 AM

The idea looks feasible to me in general.

Major unclear points off the top of my head are:

  • The implementation behavior is not yet defined by the specification, so it will in any case be a pilot implementation. Or are you trying it to help the specification progress?
  • "For now all tasks are unshackled" - good. But to me this looks like a central point of the implementation. What tasks should/can be given to unshackled threads? Is it the user's decision, the runtime's decision, or some combination (suggest a new clause/hint?)? If such a task produces other tasks, what do we do with them (which thread do we push them to, which team should execute them)? Tasks may have dependencies, affinity, and priority; what do we do with those? Should we exclude them from unshackled threads, or have some heuristics here as well? Etc.

Regarding platform (in)dependence, we do have an implementation of a monitor thread; a similar technique can be used for the unshackled master thread. It would not have to sleep on a condition variable forever; it could also do some periodic bookkeeping for its team, e.g. to support various wait policies.

There is also the question of the parent of a task delegated to an unshackled thread. The parent-child relation involves some bookkeeping that should not be broken.

  • Andrey

Thanks, Andrey, for your valuable comments.

The implementation behavior is not yet defined by the specification, so it will in any way be pilot implementation. Or are you trying it to help the specification progress?

That is a good question. The vision of course is to make it part of specification. :-)

"For now all tasks are unshackled" - good. But to me this looks like a central point of implementation.

Sorry for being unclear here. "All tasks are unshackled" means that, if you look at the source code, the flag is always set to 1. That is just for debugging, no other reason. Basically, unshackled tasks should not impact current regular tasks, at least until the spec says anything different.

What tasks should/can be given to unshackled threads? Is it user's decision, or runtime decision, or some combination (suggest new clause/hint?)?

That is a good point. For now, it can only be used by the compiler. If a nowait target is not in any parallel region, the compiler will create an unshackled task. The motivation is (just for your information, in case you don't know the context) that "unshackled" here means the task will not be bound to any parallel region. That is particularly useful for the case I mentioned. Of course, it also implies, especially in the current implementation design, that such tasks are actually implicitly bound to the same outer-most parallel region. Regular task synchronization should NOT work for them, except for one special case: a taskwait that depends on an unshackled task. What's more, since this is not part of the spec, it cannot be used by users. Of course, if it becomes part of the spec in the future, users can decide, and there will have to be a new clause for it.

If such a task produce another tasks what to do with them (which thread to push them to, which team should execute them)? Tasks may have dependencies, affinity, priority, what to do with them? Should we exclude them from unshackled threads, or have some heuristics here as well? Etc.

Those are good open problems. From my perspective, a task generated by an unshackled task should also be unshackled. Dependencies are not a problem; they work the same as with regular tasks. As for the others, maybe we should exclude those parts. Unshackled tasks are not meant to replace regular tasks. They are a convenient way to write some tasks without wrapping them into a parallel region, which is pretty weird. If users require those fancy features, they should use regular tasks.

Regarding platform (in)dependence, we do have the implementation of monitor thread, similar technique can be used for the unshackled master thread.

Could you please tell me which part is the monitor thread?

And it could not only sleep on condition variable forever, but do some periodic bookkeeping for its team, e.g. to support various wait policies.

It might depend on whether we will have various wait policies. :-)

Could you please tell me which part is the monitor thread?

The code under "#if KMP_USE_MONITOR" implements the monitor thread, which was used earlier to control the behavior of worker threads. It is not used by default now, and worker threads follow the requested wait policy themselves.

It might depend on whether we will have various wait policies. :-)

We do have active/passive wait policies for worker threads. I see no reason why an unshackled thread would not follow the same policy as the other threads.

Kept the parent-child chain so that taskwait can work seamlessly. Also marked must_wait as true if unshackled tasks are enabled, because we might have a task outside of any parallel region, which means we must wait even though the team is serial.

This patch basically did the following things:

  1. Rebased the source code;
  2. Changed the way macros are defined to align with the existing method;
  3. Fixed an issue where taskwait might get stuck because the unshackled threads could not steal tasks from the master thread correctly.
tianshilei1992 updated this revision to Diff 263093. (Edited) May 10 2020, 5:48 PM

This update fixed various race conditions and introduced the concept of a shadow thread. Now unshackled tasks are distributed to different unshackled threads rather than all being pushed to the master thread of the unshackled team.

FWIW, we are still developing this (or Shilei is), then we'll evaluate it, and then we will propose it "properly". Early feedback is useful though :)

What tasks should/can be given to unshackled threads? Is it user's decision, or runtime decision, or some combination (suggest new clause/hint?)?

That is a good point. For now, it can only be used by the compiler. If a nowait target is not in any parallel region, the compiler will create an unshackled task. The motivation is (just for your information, in case you don't know the context) that "unshackled" here means the task will not be bound to any parallel region. That is particularly useful for the case I mentioned. Of course, it also implies, especially in the current implementation design, that such tasks are actually implicitly bound to the same outer-most parallel region. Regular task synchronization should NOT work for them, except for one special case: a taskwait that depends on an unshackled task. What's more, since this is not part of the spec, it cannot be used by users. Of course, if it becomes part of the spec in the future, users can decide, and there will have to be a new clause for it.

According to the OpenMP specification, you always have an implicit parallel region surrounding the whole execution. Your target nowait binds to this implicit parallel region. For this reason, the target task must respect all means of synchronization like dependencies, taskwait, barrier, ...

What is the concrete scenario, in which you need this monitor thread to schedule device activity?
Did you consider to use detached tasks to implement asynchronous target offloading?

What is the concrete scenario, in which you need this monitor thread to schedule device activity?
Did you consider to use detached tasks to implement asynchronous target offloading?

Yes. In fact, it is natural to think about using detachable tasks. However, if the host side just creates a detachable task and moves on to some heavy workload, such as encountering a work-sharing construct, no thread can pick up the task and execute it, which hurts the concurrency between host and device.

What tasks should/can be given to unshackled threads? Is it user's decision, or runtime decision, or some combination (suggest new clause/hint?)?

That is a good point. For now, it can only be used by the compiler. If a nowait target is not in any parallel region, the compiler will create an unshackled task. The motivation is (just for your information, in case you don't know the context) that "unshackled" here means the task will not be bound to any parallel region. That is particularly useful for the case I mentioned. Of course, it also implies, especially in the current implementation design, that such tasks are actually implicitly bound to the same outer-most parallel region. Regular task synchronization should NOT work for them, except for one special case: a taskwait that depends on an unshackled task. What's more, since this is not part of the spec, it cannot be used by users. Of course, if it becomes part of the spec in the future, users can decide, and there will have to be a new clause for it.

According to the OpenMP specification, you always have an implicit parallel region surrounding the whole execution. Your target nowait binds to this implicit parallel region. For this reason, the target task must respect all means of synchronization like dependencies, taskwait, barrier, ...

Sure. Do you have a scenario in mind that you expect not to work? The encountering thread will wait for these "unshackled" tasks at a taskwait, for example. They also interact with depend clauses the way you want them to, as far as I can tell.

What is the concrete scenario, in which you need this monitor thread to schedule device activity?

As an alternative to "real tasks":

  • no explicit host parallel region
  • all threads in the host parallel region busy

As an alternative to how detached tasks would look (see below):

  • no asynchronous native runtime calls available (e.g., host)
  • can be much faster if host dependence resolution is involved
  • is as fast as the async native runtime call solution in XL in the absence of dependences but *reusable* for other cases & runtimes

Did you consider to use detached tasks to implement asynchronous target offloading?

Sure, but they don't work very differently after all, at least not in the way you would use them here.
You would *not* create a task to be picked up, as that has the drawbacks mentioned above, among other things.
You would execute async native runtime calls and instruct the native runtime to fulfill the event once all the work has been done.
From the user's perspective, this is no different from a new thread picking it up (as the native runtime actually *is* a new thread picking it up).

tianshilei1992 edited the summary of this revision. (Show Details) Mon, Jun 29, 9:35 PM