This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Fix task wait doesn't work as expected in serialized team
ClosedPublic

Authored by tianshilei1992 on Aug 4 2021, 12:29 PM.

Details

Summary

As discussed in D107121, task wait doesn't work when a regular task T depends on
a detached task or a hidden helper task T' in a serialized team. The root cause is,
since the team is serialized, the last task will not be tracked by
td_incomplete_child_tasks. When T' is finished, it first releases its
dependences, and then decrements its parent counter. So far so good. For the thread
that is running task wait, if at the moment it is still spinning and trying to
execute tasks, it is fine because it can detect the new task and execute it.
However, if it happends to finish the function flag.execute_tasks(...), it will
be broken because td_incomplete_child_tasks is 0 now.

In this patch, we update the rule to track children tasks a little bit. If the
task team encounters a proxy task or a hidden helper task, all following tasks
will be tracked.

Diff Detail

Event Timeline

tianshilei1992 created this revision.Aug 4 2021, 12:29 PM
tianshilei1992 requested review of this revision.Aug 4 2021, 12:29 PM
Herald added a project: Restricted Project. · View Herald Transcript

It's not just the last task, but any task after a detached/hidden task that has the issue to not be synchronized with the taskwait.

Instead of looking for the values in task_team, you could also just add || taskdata->td_parent->td_incomplete_child_tasks>0 and avoid a branch.

tianshilei1992 edited the summary of this revision. (Show Details)Aug 4 2021, 5:47 PM

It's not just the last task, but any task after a detached/hidden task that has the issue to not be synchronized with the taskwait.

Thanks for pointing out. I've updated the description.

Instead of looking for the values in task_team, you could also just add || taskdata->td_parent->td_incomplete_child_tasks>0 and avoid a branch.

They're not equivalent I think, especially when decrementing the counter. We should not decrement the counter if only td_incomplete_child_tasks > 0. I prefer to make increment and decrement "symmetric" such that we know they can be paired.

protze.joachim added a comment.EditedAug 4 2021, 11:01 PM

You are right with your statement, that the two are not completely equivalent. But I still prefer my approach.

Consider the following sequence:

#pragma omp task detach(e) depend(out:a)
{}
#pragma omp taskwait depend(in:a)
#pragma omp task
{}

Your approach will enforce reference counting of child tasks until the end of the parallel region (remember that also main has an implicit inactive parallel region!). So, you will in/decrement the counter also for the last task.

My approach will stop reference counting, as soon as no other task is alive in the team and the task is really serialized. The race condition for at td_allocated_child_tasks>0 and then possibly incrementing from td_allocated_child_tasks=0 is actually harmless, because the same task will decrement the value at the end of it's execution.

For a really serialized task (which does no reference counting), no other task can be created during it's execution, so that td_allocated_child_tasks cannot be >0 at the end of execution.
Similar reasoning holds for your approach: for a serialized task, neither of the two flags in task_team can be set while the task executes and they are never set back to 0, aren't they?

The only requirement is, that the same set of flags is checked for the creation and end of task execution; and it is also important, that the values cannot change between the two encountering. I think, we should collect the flags into a macro/function to make sure we keep them symmetric.

With more and more conditions to this "fast path" for serialized tasks: when does the number of conditions and checks become more expensive than the operations we try to avoid - atomic writes to some variable with no contention?

openmp/runtime/src/kmp_tasking.cpp
1406

Btw: How can this be valid? From my understanding, task_serial marks the task as undeferred. Execution cannot continue before the task finished execution.

You are right with your statement, that the two are not completely equivalent. But I still prefer my approach.

Consider the following sequence:

#pragma omp task detach(e) depend(out:a)
{}
#pragma omp taskwait depend(in:a)
#pragma omp task
{}

Your approach will enforce reference counting of child tasks until the end of the parallel region (remember that also main has an implicit inactive parallel region!). So, you will in/decrement the counter also for the last task.

My approach will stop reference counting, as soon as no other task is alive in the team and the task is really serialized. The race condition for at td_allocated_child_tasks>0 and then possibly incrementing from td_allocated_child_tasks=0 is actually harmless, because the same task will decrement the value at the end of it's execution.

Yes, you're right. Mine will start the counting as long as there is a detached task or hidden helper task found. I'm just thinking if it is possible there is a task that doesn't increment the count when it is allocated but decrements the counter by mistake.

For a really serialized task (which does no reference counting), no other task can be created during it's execution, so that td_allocated_child_tasks cannot be >0 at the end of execution.
Similar reasoning holds for your approach: for a serialized task, neither of the two flags in task_team can be set while the task executes and they are never set back to 0, aren't they?

Tasks can still be created but whether they are able to deferred, it depends. No?

The only requirement is, that the same set of flags is checked for the creation and end of task execution; and it is also important, that the values cannot change between the two encountering. I think, we should collect the flags into a macro/function to make sure we keep them symmetric.

I'll wrap them into a function.

With more and more conditions to this "fast path" for serialized tasks: when does the number of conditions and checks become more expensive than the operations we try to avoid - atomic writes to some variable with no contention?

That's a good point. It might worth a try to see which path is better.

rebase and fix comments

tianshilei1992 added inline comments.Aug 5 2021, 9:45 AM
openmp/runtime/src/kmp_tasking.cpp
1406

Hidden helper task must be deferred task, right?

I think this patch can also fix the flaky failure of openmp/libomptarget/test/offloading/bug50022.cpp.

AndreyChurbanov accepted this revision.Aug 31 2021, 2:52 AM

LGTM with one nit.

openmp/runtime/src/kmp_tasking.cpp
821

Why not use flags here instead of taskdata->td_flags?

This revision is now accepted and ready to land.Aug 31 2021, 2:52 AM

rebase and fix comments

tianshilei1992 marked 2 inline comments as done.Aug 31 2021, 8:44 AM