This is an archive of the discontinued LLVM Phabricator instance.

Lock-free start of serialized parallel regions
ClosedPublic

Authored by jlpeyton on Aug 12 2015, 1:49 PM.

Details

Reviewers
hfinkel
Summary

Moved initial checks of num_threads vs. 1 from kmp_reserve_threads(), which is called under the forkjoin lock, to the upper level kmp_fork_call().
Thus most cases of serialized parallels do not interfere with the forkjoin lock (e.g. nested parallels when omp nested is disabled).

Diff Detail

Repository
rL LLVM

Event Timeline

jlpeyton updated this revision to Diff 31975.Aug 12 2015, 1:49 PM
jlpeyton retitled this revision from to Lock-free start of serialized parallel regions.
jlpeyton updated this object.
jlpeyton added a reviewer: hfinkel.
jlpeyton set the repository for this revision to rL LLVM.
hfinkel added inline comments.Aug 16 2015, 2:59 AM
runtime/src/kmp_runtime.c
1740

Why do we only release the lock when nthreads == 1? Does __kmp_reserve_threads release it otherwise?

(I realize that you've only moved this line from down below, but this seems non-obvious)

runtime/src/kmp_runtime.c
1740

No, the __kmp_reserve_threads does not release the lock.

Let me detail the rational of the change:

Old code: get lock always at the beginning, then release lock for nthreads==1 on line 1756, for nthreads>1 on line 2168 when a lot of multithread-sensitive actions have completed.

New code: lock skipped for simple 1-thread cases, but still got lock for other 1-thread cases (e.g. when serial execution caused by dynamic threads adjustment inside __kmp_reserve_threads). As a result the lock releasing for 1-thread moved here, because it now cannot be done for all 1-thread cases. Multi-thread case releases the lock in the same place as earlier.

Performance result - 10x or more speedup of the code like

<long loop>
  #pragma omp parallel
    #pragma omp parallel

where inner parallel region are serialized by default because OMP nesting is disabled, and number of threads in outer region is big (e.g. 60 threads on Xeon PHI to keep all cores busy).

hfinkel accepted this revision.Aug 18 2015, 1:30 AM
hfinkel edited edge metadata.
hfinkel added inline comments.
runtime/src/kmp_runtime.c
1740

Okay, sounds good. Please add a comment here explaining that in the multi-thread case the lock is released later on in the function. Otherwise, LGTM.

This revision is now accepted and ready to land.Aug 18 2015, 1:30 AM