This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] libomp: fix dynamic loop dispatcher
ClosedPublic

Authored by AndreyChurbanov on Jun 3 2021, 2:23 PM.

Details

Summary

Restructured dynamic loop dispatcher code.
Fixed work with dispatch buffers for nonmonotonic dynamic (static_steal) schedule:

  • eliminated possibility of stealing iterations of the wrong loop when victim thread changed its buffer to work on another loop;
  • fixed race when victim thread changed its buffer to work in nested parallel.
  • eliminated "static" property of the schedule, that is now a single thread can execute whole loop.

Diff Detail

Event Timeline

AndreyChurbanov created this revision.Jun 3 2021, 2:23 PM
AndreyChurbanov requested review of this revision.Jun 3 2021, 2:23 PM

Enable one more test fixed by this patch.

Can you elaborate in the summary what your fixing about the static_steal schedule?

openmp/runtime/src/kmp_dispatch.cpp
1254–1256

Can we get rid of these commented lines?

1400–1402

Can we get rid of these commented lines?

AndreyChurbanov edited the summary of this revision. (Show Details)

Some comments removed.

This revision is now accepted and ready to land.Jun 17 2021, 1:06 PM
This revision was landed with ongoing or failed builds.Jun 22 2021, 6:29 AM
This revision was automatically updated to reflect the committed changes.
AndreyChurbanov marked 2 inline comments as done.
hoy added a subscriber: hoy.Jun 24 2021, 1:25 PM

Hello, I was seeing an assert firing in kmp_dispatch.cpp when running the omp_parallel_reduction.c test with a -DLLVM_ENABLE_ASSERTIONS enabled toolset. Would you mind taking a look? Thanks.

******************** TEST 'libomp :: parallel/omp_parallel_reduction.c' FAILED ********************
Script:
... omitted 

Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
OMP: Error #13: Assertion failure at kmp_dispatch.cpp(1453).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.

Hello, I was seeing an assert firing in kmp_dispatch.cpp when running the omp_parallel_reduction.c test with a -DLLVM_ENABLE_ASSERTIONS enabled toolset. Would you mind taking a look? Thanks.

  • TEST 'libomp :: parallel/omp_parallel_reduction.c' FAILED ****

Script:
... omitted

Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
OMP: Error #13: Assertion failure at kmp_dispatch.cpp(1453).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.

Thanks for reporting.
Actually the wrong assertion existed before this patch, which apparently increased the probability of its triggering.
I've fixed the assertion in https://reviews.llvm.org/D104880.

hoy added a comment.Jun 24 2021, 3:53 PM

Hello, I was seeing an assert firing in kmp_dispatch.cpp when running the omp_parallel_reduction.c test with a -DLLVM_ENABLE_ASSERTIONS enabled toolset. Would you mind taking a look? Thanks.

  • TEST 'libomp :: parallel/omp_parallel_reduction.c' FAILED ****

Script:
... omitted

Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.
OMP: Error #13: Assertion failure at kmp_dispatch.cpp(1453).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.
Assertion failure at kmp_dispatch.cpp(1453): (vnew.p.ub - 1) * (UT)chunk <= trip.

Thanks for reporting.
Actually the wrong assertion existed before this patch, which apparently increased the probability of its triggering.
I've fixed the assertion in https://reviews.llvm.org/D104880.

Thanks for the fast turnaround!

rogfer01 added inline comments.
openmp/runtime/src/kmp_dispatch.cpp
911

Sometimes (not always, so it seems a data race) running this test in an Arm 64-bit machine with 46 cores (and in a Power9 machine with 40 cores) all the threads end waiting here, so the test doesn't progress anymore.

All the cases I've seen happen with KMP_DISP_NUM_BUFFERS=3 and -DMY_SCHEDULE=guided.

Any idea how I could debug this further?

A quick look about sh->buffer_index shows it is a volatile and it is updated in

sh->buffer_index += __kmp_dispatch_num_buffers;
KD_TRACE(100, ("__kmp_dispatch_next: T#%d change buffer_index:%d\n",
               gtid, sh->buffer_index));

KMP_MB(); /* Flush all pending memory write invalidates.  */

Given that this is not an atomic operation (yet it goes followed by a memory barrier) my only hypothesis is that the original load of sh->buffer_index might have read an old value but that would suggest KMP_MB() is not effective in these targets? So I am at loss here.

Thanks!

openmp/runtime/src/kmp_dispatch.cpp
911

Which test(s) you see hanging?

There indeed may be a data race somewhere in the code, I will try to take a look. Still better to know the exact test case.

rogfer01 added inline comments.Aug 18 2021, 9:29 AM
openmp/runtime/src/kmp_dispatch.cpp
911

Hi Andrey,

apologies I forgot to mention that in the comment above

env/kmp_set_dispatch_buf.c was easy to cause a deadlock with KMP_DISP_NUM_BUFFERS=3 and -DMY_SCHEDULE=guided.

Kind regards,