
[OpenMP] Fix hang on Windows
ClosedPublic

Authored by jlpeyton on Mar 25 2019, 12:21 PM.

Details

Summary

A debug dump on a large machine shows that when many OpenMP threads (401 in
total) sleep on a barrier, one thread at one of the innermost nesting levels
is asleep on a child's b_arrived flag whose value is 4, which equals the
checker value. That is, (1) the sleep bit is 0, and (2) done_check() would
return true if called.
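
The inconsistent state described above can be written down as a small check. This is only an illustrative sketch: the names `kSleepBit`, `sleep_bit_set`, and this standalone `done_check` are hypothetical, and the assumption that the sleep state lives in the lowest bit of the flag is made for the example rather than taken from this revision.

```cpp
#include <cstdint>

// ASSUMPTION: the sleep state is encoded in the lowest bit of the flag word.
// This constant is hypothetical and chosen only for illustration.
constexpr std::uint64_t kSleepBit = 1u << 0;

// True if the flag still advertises a sleeping waiter.
constexpr bool sleep_bit_set(std::uint64_t flag) {
    return (flag & kSleepBit) != 0;
}

// Hypothetical stand-in for done_check(): the wait is over once the flag
// has reached the checker value.
constexpr bool done_check(std::uint64_t flag, std::uint64_t checker) {
    return flag == checker;
}
```

With the values from the dump (flag 4, checker 4), `sleep_bit_set(4)` is false and `done_check(4, 4)` is true, so under these assumptions the thread has no reason to remain asleep.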

It is unclear how this can happen. It could be a bug in Windows Server 2016's
EnterCriticalSection / LeaveCriticalSection, or
in WaitForSingleObject / SetEvent / ResetEvent, or
a bug in the library that is very difficult to find.

As a workaround, change the INFINITE wait to a timed wait, so that each
thread wakes up every 5 seconds (the timeout was chosen arbitrarily, so as
not to disturb other threads much), checks the flag condition under the lock,
and either goes back to sleep or stops sleeping depending on the result.
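
The general pattern of the workaround can be sketched in portable C++. This is not the patch itself: it uses `std::condition_variable::wait_for` as a stand-in for a Win32 timed wait, and `SleepFlag`, `timed_sleep_until`, `release_without_notify`, and `demo_lost_wakeup` are hypothetical names introduced only for this example. The demo uses a short timeout instead of 5 seconds so that it finishes quickly.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a barrier flag; not the runtime's data structure.
struct SleepFlag {
    std::mutex lock;
    std::condition_variable cv;
    unsigned value = 0;
};

// Sleep until flag.value reaches checker. Instead of waiting indefinitely,
// wake up after every timeout and re-check the condition under the lock.
void timed_sleep_until(SleepFlag &flag, unsigned checker,
                       std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> guard(flag.lock);
    while (flag.value != checker) {
        // Returns on notification, timeout, or spurious wake-up; in every
        // case the loop condition is re-evaluated with the lock held.
        flag.cv.wait_for(guard, timeout);
    }
}

// Simulates a lost wake-up: the flag is updated but nobody is notified.
void release_without_notify(SleepFlag &flag, unsigned checker) {
    std::lock_guard<std::mutex> guard(flag.lock);
    flag.value = checker;
}

// Returns true once the sleeper has woken up despite the missing notify.
bool demo_lost_wakeup() {
    SleepFlag flag;
    std::thread sleeper([&flag] {
        timed_sleep_until(flag, 4, std::chrono::milliseconds(20));
    });
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    release_without_notify(flag, 4);
    sleeper.join();  // would block forever with an untimed wait
    std::lock_guard<std::mutex> guard(flag.lock);
    return flag.value == 4;
}
```

With an untimed wait, the sleeper in `demo_lost_wakeup` would never observe the updated flag. The periodic re-check bounds the extra latency by the timeout, at the cost of occasional wake-ups that find nothing to do.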

Patch by Andrey Churbanov

Diff Detail

Repository
rL LLVM

Event Timeline

jlpeyton created this revision. Mar 25 2019, 12:21 PM
hbae accepted this revision. Apr 3 2019, 2:25 PM

LGTM.

This revision is now accepted and ready to land. Apr 3 2019, 2:25 PM
Closed by commit rL357722: [OpenMP] Fix hang on Windows (authored by jlpeyton, committed by ). Apr 4 2019, 1:37 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. Apr 4 2019, 1:37 PM