
[OpenMP][OMPT] ompt_parallel_invoker_program used when clang generates the code of parallel if(0)-serialized regions
Needs Review · Public

Authored by vladaindjic on Oct 14 2021, 4:00 AM.



When clang generates the code for a serialized parallel region whose omp parallel
directive contains an if(0) clause, the implicit task of that region is invoked
directly by the application, so the invoker takes the value
“ompt_parallel_invoker_program”.

In all other cases, the implicit tasks of parallel regions are invoked by
the runtime, so the “ompt_parallel_invoker_runtime” flag is used instead.
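
The selection rule described above can be sketched in a few lines of C. This is only an illustration, not the patch's code: the enum, the function names, and the two boolean parameters are hypothetical stand-ins, and the real flag constants come from omp-tools.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two OMPT invoker flags named above.
   The real constants are defined in omp-tools.h. */
typedef enum {
  SKETCH_INVOKER_PROGRAM, /* implicit task is called directly by the application */
  SKETCH_INVOKER_RUNTIME  /* implicit task is called by the OpenMP runtime */
} sketch_invoker_t;

/* Pick the invoker to report for a parallel region.
   if0_serialized:   the directive carries if(0), so the region is serialized.
   direct_call_code: the compiler emitted a direct call to the region's
                     outlined function (what the summary says clang does). */
sketch_invoker_t sketch_select_invoker(bool if0_serialized, bool direct_call_code) {
  if (if0_serialized && direct_call_code)
    return SKETCH_INVOKER_PROGRAM;
  return SKETCH_INVOKER_RUNTIME;
}

/* Small self-check covering the case singled out in the summary. */
void sketch_select_invoker_check(void) {
  assert(sketch_select_invoker(true, true) == SKETCH_INVOKER_PROGRAM);
  assert(sketch_select_invoker(false, false) == SKETCH_INVOKER_RUNTIME);
}
```

Only the combination of an if(0) clause and a compiler-emitted direct call yields the program invoker; every other combination falls through to the runtime invoker.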

Four test cases have been introduced to verify that the “invoker” argument
is determined correctly while dispatching the “ompt_callback_parallel_begin”
and “ompt_callback_parallel_end” callbacks. Each test case supplies the
master thread’s call stack, which points to the call site of the corresponding
implicit task, and explains the correct value of the “invoker” argument.
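
For orientation, the kind of region under discussion looks roughly like the sketch below. It is not one of the patch's tests (those register OMPT callbacks and inspect call stacks); the function names are hypothetical, and the example only shows that a region with if(0) runs its body once, on the encountering thread.

```c
#include <assert.h>

/* Placeholder region body; it only counts how often it is executed. */
static void sketch_region_body(int *executions) { *executions += 1; }

/* Runs one serialized parallel region and returns how many times
   its body was executed. */
int sketch_run_if0_region(void) {
  int executions = 0;
  /* if(0) forces a team of one thread, so the encountering thread
     executes the body itself. */
  #pragma omp parallel if(0)
  {
    sketch_region_body(&executions);
  }
  return executions;
}
```

Because the region is serialized, the counter needs no synchronization. The function also returns 1 when the file is compiled without OpenMP support, since the pragma is then ignored.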


Event Timeline

vladaindjic created this revision. Oct 14 2021, 4:00 AM
vladaindjic requested review of this revision. Oct 14 2021, 4:00 AM
Herald added a project: Restricted Project.
  • clang-formatted diffs in kmp_csupport.cpp

Tests should not burn CPU cycles. We use my_sleep from omp_my_sleep.h or delay from ompt-signal.h instead, to allow other tests to make progress in the meantime.

As mentioned in the comment, a better solution would be to refactor the runtime and avoid internally calling into the __kmpc_ interface.

vladaindjic updated this revision to Diff 390654. Edited Nov 30 2021, 3:25 AM

As @protze.joachim suggested, "__kmpc_end_serialized_parallel" is no longer called from within the runtime code on the host.
Instead, a "__kmp_end_serialized_parallel" function has been created by copying the body
of its __kmpc_ counterpart. Its parameter list has been extended with the “invoker” parameter,
which distinguishes whether the parallel region’s wrapper function is called
by the runtime code or directly by the OpenMP program. The latter corresponds to a
parallel construct that contains an if(0) clause and is compiled with clang.
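
The shape of this change can be sketched as follows. All names are hypothetical and the callback dispatch is reduced to recording a value. One deliberate simplification: the compiler-facing entry point here delegates to the internal helper, whereas the update above says the internal function was created by copying the body.

```c
#include <assert.h>

/* Hypothetical stand-ins for the OMPT invoker flags. */
typedef enum { SKETCH_INVOKER_PROGRAM, SKETCH_INVOKER_RUNTIME } sketch_invoker_t;

/* Last invoker handed to an imaginary parallel-end callback. */
static sketch_invoker_t sketch_last_reported = SKETCH_INVOKER_RUNTIME;

/* Internal helper: the caller states who invoked the region's wrapper. */
void sketch_end_serialized_internal(sketch_invoker_t invoker) {
  sketch_last_reported = invoker; /* stands in for dispatching parallel-end */
}

/* Compiler-facing entry point: reached only from generated program code. */
void sketch_end_serialized_entry(void) {
  sketch_end_serialized_internal(SKETCH_INVOKER_PROGRAM);
}

/* A runtime-internal path: calls the helper, never the entry point. */
void sketch_runtime_internal_path(void) {
  sketch_end_serialized_internal(SKETCH_INVOKER_RUNTIME);
}

/* Accessor so callers can inspect what was reported. */
sketch_invoker_t sketch_last_invoker(void) { return sketch_last_reported; }
```

Passing the invoker explicitly means the helper never has to guess its caller: each call site knows whether it is program code or runtime code.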

The “foo” function has been introduced to make the tests clearer, and “burn_CPU” has been removed.

Please rebase, as the patch cannot be applied currently.