Page MenuHomePhabricator

Fix for third case reported in https://bugs.llvm.org/show_bug.cgi?id=41584
ClosedPublic

Authored by AndreyChurbanov on May 22 2019, 5:58 AM.

Details

Summary

The code fragments of recruiting a thread from pool (kmp_runtime.cpp:4297):

new_thr->th.th_in_pool = FALSE;
__kmp_lock_suspend_mx(new_thr);
if (new_thr->th.th_active_in_pool == TRUE) {
  ...
else
  KMP_DEBUG_ASSERT(new_thr->th.th_active == FALSE);

and its awakening, executed under the mutex (z_Linux_util.cpp:1534):

th->th.th_active = TRUE;
if (th->th.th_in_pool) {
  th->th.th_active_in_pool = TRUE;
}

can be executed in parallel. Since the "th_in_pool=0" assignment is out of critical section, it is valid to have "th_active==1" and "th_active_in_pool==0" for the worker thread been recruited from common pool (e.g. in case of spurious wakeup). So the assertion triggered looks wrong. The patch deletes it.

I actually failed to reproduce the assertion on my server with RHEL7 (probably because it is hard to force spurious wakeup of a thread, and without it I failed to find code path of simultaneous execution of code fragments I referenced), but the patch looks logically correct.

Diff Detail

Repository
rL LLVM

Event Timeline

AndreyChurbanov edited the summary of this revision. (Show Details)May 22 2019, 6:10 AM

I tried at r361335, where the original reproducer failed this assert consistently quickly for me today. After applying your patch there and running each reproducer from my bug report in a shell while loop for about an hour, I saw no failures. Thanks!

hbae accepted this revision.May 22 2019, 9:36 AM

LGTM.

This revision is now accepted and ready to land.May 22 2019, 9:36 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMay 22 2019, 9:48 AM