This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] libomp: Fix dist barrier for nested parallel/team
Needs Review · Public

Authored by t-msn on Apr 6 2022, 2:19 AM.

Details

Summary

Some code paths in nested parallelism miss updates for the dist barrier,
which can lead to a hang. Add the appropriate function calls.

Event Timeline

t-msn created this revision. Apr 6 2022, 2:19 AM
Herald added a project: Restricted Project. Apr 6 2022, 2:19 AM
t-msn added a project: Restricted Project.
t-msn added a subscriber: openmp-commits.
t-msn published this revision for review. Apr 6 2022, 2:30 AM

I noticed this when running check-openmp with the dist barrier enabled, using https://reviews.llvm.org/D122645.
By the way, running the tests in parallel does not seem to work with the dist barrier, but I couldn't figure out the reason.

Hi -- can you tell me which specific tests failed reliably? Was it one particular arch? I want to make sure I'm covering those tests and will give your change a try and review soon. Thanks!

t-msn added a comment. Apr 7 2022, 9:13 PM

Hi
I saw check-openmp hang on both x86 and arm64. Adding a timeout value to the lit options gives me the following result:

$ env LIT_OPTS="--show-xfail --timeout=60 -j 1"  CHECK_OPENMP_ENV="KMP_FORKJOIN_BARRIER_PATTERN=dist,dist KMP_PLAIN_BARRIER_PATTERN=dist,dist KMP_REDUCTION_BARRIER_PATTERN=dist,dist" ninja check-openmp
...
Timed Out Tests (15):
  libomp :: affinity/bug-nested.c
  libomp :: affinity/format/fields_values.c
  libomp :: affinity/format/nested.c
  libomp :: affinity/format/nested2.c
  libomp :: affinity/format/nested_mixed.c
  libomp :: env/omp_thread_limit.c
  libomp :: ompt/misc/threads_nested.c
  libomp :: ompt/parallel/nested.c
  libomp :: ompt/parallel/nested_lwt.c
  libomp :: ompt/parallel/nested_thread_num.c
  libomp :: ompt/parallel/nested_threadnum.c
  libomp :: ompt/teams/parallel_team.c
  libomp :: parallel/omp_nested.c
  libomp :: teams/kmp_num_teams.c
  libomp :: worksharing/for/omp_monotonic_schedule_set_get.c

Since these tests all involve nested parallelism or teams, I looked for suspect code paths in that area and found them. With this patch, these timeout failures are gone.
(In addition to the timeouts above, ompt/synchronization/reduction/tree_reduce.c fails. I believe this is a test problem, and it is partially addressed in D123359.)
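
For readers unfamiliar with the pattern these tests share, here is a minimal sketch of nested parallelism. It is not taken from the patch or from the libomp test suite, and it is not guaranteed to reproduce the hang. The helper name run_nested_regions is hypothetical. Each outer thread opens its own inner team, so the join barriers at both nesting levels are exercised.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper (not part of libomp or its tests): opens an outer
   parallel region whose threads each open an inner parallel region, and
   returns how many times the inner region body was executed. */
int run_nested_regions(int outer_threads, int inner_threads) {
  int inner_executions = 0;
#ifdef _OPENMP
  /* Allow two active levels so the inner regions are not serialized. */
  omp_set_max_active_levels(2);
#else
  /* Without OpenMP the pragmas are ignored and the body runs once. */
  (void)outer_threads;
  (void)inner_threads;
#endif
#pragma omp parallel num_threads(outer_threads)
  {
#pragma omp parallel num_threads(inner_threads)
    {
#pragma omp atomic
      inner_executions++;
    } /* implicit join barrier of each inner team */
  }   /* implicit join barrier of the outer team */
  return inner_executions;
}
```

Built with -fopenmp and called as run_nested_regions(2, 2), this returns at most 4. The runtime may provide fewer threads than requested. To try it against the dist barrier, the KMP_FORKJOIN_BARRIER_PATTERN=dist,dist, KMP_PLAIN_BARRIER_PATTERN=dist,dist and KMP_REDUCTION_BARRIER_PATTERN=dist,dist settings from the command above can be exported before running the program.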