
[Clang][OpenMP] Added support for nowait target in CodeGen via regular task
Closed · Public

Authored by tianshilei1992 on Apr 13 2020, 7:48 PM.

Details

Summary

Previously, for a nowait target, CodeGen emitted a call to __tgt_target_nowait and related functions. However, in the OpenMP RTL these functions simply call their non-nowait counterparts, so nowait does not work as expected.

The OpenMP specification says a target is actually a target task, which is an untied and detachable task. It is natural to generate a task for a nowait target. However, an OpenMP task must be within a parallel region; otherwise the task is executed immediately. As a result, if we simply wrap the region in a regular task, a target nowait outside of a parallel region still runs synchronously.

In D77609, I added support for unshackled tasks in the OpenMP RTL. An unshackled task is a task that is not bound to any parallel region, so every nowait target will be transformed into an unshackled task. To distinguish it from a regular task, a new flag bit is set for an unshackled task. The RTL uses this flag in later processing.

All target tasks are allocated via __kmpc_omp_target_task_alloc, and in the current libomptarget, __kmpc_omp_target_task_alloc just calls __kmpc_omp_task_alloc. Therefore, we can set the flag in __kmpc_omp_target_task_alloc and avoid modifying the FE much. Users who want to opt out of the feature just need to use an RTL without support for unshackled threads.
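The forwarding idea in the paragraph above can be illustrated with a small standalone sketch. Every name here (TaskDescriptor, task_alloc, target_task_alloc, kUnshackledBit and its bit position) is a hypothetical stand-in chosen for illustration; none of it reflects the actual signatures or flag layout of __kmpc_omp_task_alloc or __kmpc_omp_target_task_alloc.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits; the real runtime has its own flag layout.
constexpr std::uint32_t kTiedBit = 1u << 0;
constexpr std::uint32_t kUnshackledBit = 1u << 7; // assumed position, illustration only

// Hypothetical task descriptor that only records the flags.
struct TaskDescriptor {
  std::uint32_t flags;
};

// Stand-in for the regular allocation entry point.
inline TaskDescriptor task_alloc(std::uint32_t flags) {
  return TaskDescriptor{flags};
}

// Stand-in for the target-task entry point: it tags the task and forwards,
// so callers (the front end) pass the same flags as for a regular task.
inline TaskDescriptor target_task_alloc(std::uint32_t flags) {
  TaskDescriptor t = task_alloc(flags | kUnshackledBit);
  assert((t.flags & flags) == flags); // caller-provided flags are preserved
  return t;
}

// What a runtime could check later to route the task differently.
inline bool is_unshackled(const TaskDescriptor &t) {
  return (t.flags & kUnshackledBit) != 0;
}
```

In this sketch only the target-task entry point knows about the extra bit, which mirrors the point that a runtime lacking the feature could simply not set it.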

As a result, in this patch, the target nowait region is simply wrapped into a regular task. Later once we have RTL support for unshackled tasks, the wrapped tasks can be executed by unshackled threads w/o changes in the FE.
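At the source level, the wrapping described in this summary can be pictured roughly as follows. This is a sketch of the semantics only, not the IR that Clang emits; the function names scale_nowait and scale_wrapped are made up for the example. Without OpenMP enabled the pragmas are ignored and both functions run serially.

```cpp
#include <cassert>
#include <vector>

// A nowait target region followed by an explicit synchronization point.
inline std::vector<int> scale_nowait(std::vector<int> v) {
  int *data = v.data();
  const int n = static_cast<int>(v.size());
  assert(data != nullptr || n == 0);
  #pragma omp target nowait map(tofrom: data[0:n])
  for (int i = 0; i < n; ++i)
    data[i] *= 2;
  // Host work could overlap with the target region here.
  #pragma omp taskwait
  return v;
}

// Conceptually similar form: a synchronous target region inside a regular task.
inline std::vector<int> scale_wrapped(std::vector<int> v) {
  int *data = v.data();
  const int n = static_cast<int>(v.size());
  assert(data != nullptr || n == 0);
  #pragma omp task firstprivate(data, n)
  {
    #pragma omp target map(tofrom: data[0:n])
    for (int i = 0; i < n; ++i)
      data[i] *= 2;
  }
  #pragma omp taskwait
  return v;
}
```

When either function is called outside a parallel region, the task has no other thread to run on, which is the synchronous behavior the summary describes.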

Event Timeline

tianshilei1992 created this revision. Apr 13 2020, 7:48 PM
Herald added a project: Restricted Project.

You need to update the tests too.

Yeah, I will do that. Basically I would like to do that if this direction is not wrong... :-)

Will update failed tests later

Rebased the patch

tianshilei1992 retitled this revision from [Clang][OpenMP] Added support for nowait target in CodeGen to [WIP][Clang][OpenMP] Added support for nowait target in CodeGen. Sep 1 2020, 8:39 PM
tianshilei1992 removed a reviewer: jdoerfert.

Update one test

ye-luo added a subscriber: ye-luo. Sep 14 2020, 3:51 PM

However, an OpenMP task must be within
a parallel region; otherwise the task will be executed immediately. As a
result, if we directly wrap to a regular task, the nowait target outside of a
parallel region is still a synchronous version.

The spec says an implicit task can be generated by an implicit parallel region, which can be the whole OpenMP program. For this reason, the need for an explicit parallel region is a limitation of the LLVM OpenMP runtime, right?

Can I have an option to run the nowait region as a regular task instead of an unshackled task? Then I can use "parallel" and well-established ways to control the thread affinity.

According to the spec, an implicit parallel region is an inactive parallel region that is not generated from a parallel construct. And based on the definition of active parallel region, which is a parallel region that is executed by a team consisting of more than one thread, an inactive parallel region only has one thread. Since we only have one thread, if we encounter a task, executing it immediately does make sense as we don't have another thread to execute it.
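The single-thread situation described above can be observed with the standard OpenMP runtime routines omp_get_num_threads and omp_in_parallel. The helper names below are made up for this sketch, and the non-OpenMP fallback values are an assumption that models a purely serial build.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of threads in the team executing the current region.
inline int team_size_here() {
#ifdef _OPENMP
  const int n = omp_get_num_threads();
#else
  const int n = 1; // assumption: serial build behaves like a one-thread team
#endif
  assert(n >= 1);
  return n;
}

// True only inside an active parallel region (a team of more than one thread).
inline bool in_active_parallel() {
#ifdef _OPENMP
  return omp_in_parallel() != 0;
#else
  return false;
#endif
}
```

Called from the sequential part of a program, team_size_here() reports a team of one and in_active_parallel() is false, so a task created there has no second thread that could pick it up.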

I do remember your request about the regular task. This patch is exactly what you need. Later when I finish the RTL, I could provide an option.

If I remember correctly, you may yield the thread inside a target region after enqueuing kernels and transfers. So even with one thread, there is a chance to run other tasks without finishing this target. Isn't that possible?

Thanks. I see, we will be able to control that in the runtime library.

Fixed declare_mapper_codegen.cpp with CK0

I assume you were referring to taskyield, because a thread yield doesn't make sense here. I don't think it can help. Where should we insert the taskyield? What's more, yielding the current task means "blocking" the encountering thread, so it is no different from executing the task immediately. Besides, we synchronize at the end of the wrapped task.

The only way to make things right without unshackled tasks when we have only one regular OpenMP thread (that is, one user-visible thread) is to use a detached task. However, as we discussed in the paper, that approach has some drawbacks, so we decided to use unshackled tasks.

Fixed an issue that one wildcard is missing in the CHECK line

tianshilei1992 edited the summary of this revision. Sep 16 2020, 12:14 PM

Fixed the test case target_codegen.cpp

Fixed the test failure in OMP50

Fixed the test case target_parallel_codegen.cpp

Fixed the case target_parallel_for_codegen.cpp

Continued to fix the case target_parallel_for_codegen.cpp

Fixed the case target_parallel_for_simd_codegen.cpp

Fixed the test case target_simd_codegen.cpp

Fixed the case target_teams_codegen.cpp

Fixed the case target_teams_distribute_codegen.cpp

Fixed the case target_teams_distribute_simd_codegen.cpp

Fixed the case target_parallel_for_codegen.cpp

Refined the case declare_mapper_codegen.cpp

tianshilei1992 retitled this revision from [WIP][Clang][OpenMP] Added support for nowait target in CodeGen to [Clang][OpenMP] Added support for nowait target in CodeGen. Sep 22 2020, 11:24 AM

Rebased the source code and refined declare_mapper_codegen.cpp

jdoerfert accepted this revision. Sep 25 2020, 11:00 AM

LGTM. Nice to get async through "regular" tasks :)

This revision is now accepted and ready to land. Sep 25 2020, 11:00 AM
tianshilei1992 retitled this revision from [Clang][OpenMP] Added support for nowait target in CodeGen to [Clang][OpenMP] Added support for nowait target in CodeGen via regular task. Sep 25 2020, 11:09 AM
tianshilei1992 edited the summary of this revision.