This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Add missing `tt_hidden_helper_task_encountered` along with `tt_found_proxy_tasks`

Authored by tianshilei1992 on Aug 2 2021, 4:06 PM.



In most cases, hidden helper task behave similar as detached tasks. That means,
for example, if we have to wait for detached tasks, we have to do the same thing
for hidden helper tasks as well. This patch adds the missing condition for hidden
helper task accordingly along with detached task.

Diff Detail

Event Timeline

tianshilei1992 created this revision.Aug 2 2021, 4:06 PM
tianshilei1992 requested review of this revision.Aug 2 2021, 4:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 2 2021, 4:06 PM

Is it necessary, to have tt_hidden_helper_task_encountered and tt_found_proxy_tasks? Would it be sufficient, to just use tt_found_proxy_tasks all the time?

tianshilei1992 added a comment.EditedAug 4 2021, 9:50 AM

Is it necessary, to have tt_hidden_helper_task_encountered and tt_found_proxy_tasks? Would it be sufficient, to just use tt_found_proxy_tasks all the time?

Technically, I think it might be fine. However, I still would like to keep them separated as though proxy task and hidden helper task are similar, their execution are totally different. There is no proxy task generated during the whole execution, unlike detached task.

This patch can potentially fix a problem exposed in the added test case. However,
the case cannot pass now because it hangs in __kmp_execute_tasks_template.

rebase and fix hangs


That is the reason that we can observe hangs in the past. Basically, D107496 changed the way to track incomplete children. Once we encountered a hidden helper task or detachable task, all following tasks will be tracked. In the simple case we added to this patch, we have a regular target region depending on a hidden helper task. The allocation of the regular target region increments the counter of incomplete children. Since it is a regular target region with dependences, clang lowers it to a function call to __kmpc_omp_wait_deps first followed by invoking the task directly. At this moment, current_task->td_incomplete_child_tasks is either 2 or 1, depending on whether the hidden helper task is finished, but it will never be 0. Therefore, we are trapped into the infinite loop here.
I simply change it to check if the td_incomplete_child_tasks is greater than 1 which can exclude itself. Since we don't know if the function call to __kmpc_omp_wait_deps is for taskwait or for a if0 task, I don't have any other good idea on the top of my mind to check here. We might have very tiny performance loss, caused by returning from this function and then entering it again, for the fairly rare case.


I think the correct fix here would be addition of condition && !task_team->tt.tt_hidden_helper_task_encountered instead of changing the condition on current_task->td_incomplete_child_tasks. That should cause the thread to exit task execution loop only in the presence of a hidden helper task to check if it still needs to wait for this task or can proceed. In other cases the thread may need to execute tasks from its own queue without exiting the loop.

tianshilei1992 marked an inline comment as done.Dec 26 2021, 5:50 PM
tianshilei1992 added inline comments.

I fixed it in another way that is to add another check of the flag before set use_own_tasks. We do have two similar checks in different code path, but in the test case added to this patch, existing check cannot cover.

tianshilei1992 marked an inline comment as done.Dec 28 2021, 1:24 PM
This revision is now accepted and ready to land.Dec 29 2021, 1:52 PM