This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
libcxx/include/
-
include/
-
__threading_support

Differential D148681

[libc++] Make std::mutex use ADAPTIVE_NP intializer if available on gnu
Needs RevisionPublic

Authored by goldstein.w.n on Apr 18 2023, 7:56 PM.

Download Raw Diff

Details

Reviewers

ldionne
Mordante
philnik

Group Reviewers

Restricted Project

Summary

The most performant pthread_mutex_t configuration in GLIBC is
PTHREAD_MUTEX_ADAPTIVE_NP.

The default is timed mutex which is implemented almost entirely
through the futex syscall (no attempt at looping for any amount of
time in userland). This adds has a high context switch cost in almost
all situations.

The ADAPTIVE_NP implementation, on the other hand, progresses through
exponential backoff before going to the futex allowing for lower
latency for shorter waits with extra minimal overhead or memory
traffic for longer waits.

On my Icelake machine running the GLIBC mutex benchmarks suite for N=5
runs.

There is a 12% performance improvement for low contention cases
(nthreads <= nprocs / 2) and a 2.5% performance improvement in high
contention cases (nthreads > nprocs / 2)

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

goldstein.w.n created this revision.Apr 18 2023, 7:56 PM

Herald added a project: Restricted Project. · View Herald TranscriptApr 18 2023, 7:56 PM

goldstein.w.n requested review of this revision.Apr 18 2023, 7:56 PM

Herald added a project: Restricted Project. · View Herald TranscriptApr 18 2023, 7:56 PM

Herald added a reviewer: Restricted Project. · View Herald Transcript

Herald added a subscriber: libcxx-commits. · View Herald Transcript

Here are the full results for GLIBC benchmark suite:

                                                         Lower=better
Benchmark Config                                        ,Speedup (ADAPTIVE_NP / TIMED_NP)
type=adaptive/non_crt_len=1/crt_len=0/threads=2         ,0.605
type=adaptive/non_crt_len=1/crt_len=0/threads=4         ,0.861
type=adaptive/non_crt_len=1/crt_len=0/threads=8         ,1.140
type=adaptive/non_crt_len=1/crt_len=0/threads=16        ,1.161
type=adaptive/non_crt_len=1/crt_len=0/threads=20        ,1.180
type=adaptive/non_crt_len=1/crt_len=1/threads=2         ,0.658
type=adaptive/non_crt_len=1/crt_len=1/threads=4         ,0.830
type=adaptive/non_crt_len=1/crt_len=1/threads=8         ,1.092
type=adaptive/non_crt_len=1/crt_len=1/threads=16        ,1.119
type=adaptive/non_crt_len=1/crt_len=1/threads=20        ,1.120
type=adaptive/non_crt_len=1/crt_len=2/threads=2         ,0.585
type=adaptive/non_crt_len=1/crt_len=2/threads=4         ,0.735
type=adaptive/non_crt_len=1/crt_len=2/threads=8         ,1.019
type=adaptive/non_crt_len=1/crt_len=2/threads=16        ,1.095
type=adaptive/non_crt_len=1/crt_len=2/threads=20        ,1.098
type=adaptive/non_crt_len=1/crt_len=4/threads=2         ,0.637
type=adaptive/non_crt_len=1/crt_len=4/threads=4         ,0.746
type=adaptive/non_crt_len=1/crt_len=4/threads=8         ,0.966
type=adaptive/non_crt_len=1/crt_len=4/threads=16        ,1.020
type=adaptive/non_crt_len=1/crt_len=4/threads=20        ,1.034
type=adaptive/non_crt_len=1/crt_len=8/threads=2         ,0.512
type=adaptive/non_crt_len=1/crt_len=8/threads=4         ,0.781
type=adaptive/non_crt_len=1/crt_len=8/threads=8         ,0.906
type=adaptive/non_crt_len=1/crt_len=8/threads=16        ,0.951
type=adaptive/non_crt_len=1/crt_len=8/threads=20        ,0.951
type=adaptive/non_crt_len=1/crt_len=16/threads=2        ,0.860
type=adaptive/non_crt_len=1/crt_len=16/threads=4        ,0.867
type=adaptive/non_crt_len=1/crt_len=16/threads=8        ,0.893
type=adaptive/non_crt_len=1/crt_len=16/threads=16       ,0.901
type=adaptive/non_crt_len=1/crt_len=16/threads=20       ,0.903
type=adaptive/non_crt_len=1/crt_len=32/threads=2        ,0.939
type=adaptive/non_crt_len=1/crt_len=32/threads=4        ,0.845
type=adaptive/non_crt_len=1/crt_len=32/threads=8        ,0.901
type=adaptive/non_crt_len=1/crt_len=32/threads=16       ,0.960
type=adaptive/non_crt_len=1/crt_len=32/threads=20       ,0.955
type=adaptive/non_crt_len=1/crt_len=64/threads=2        ,0.941
type=adaptive/non_crt_len=1/crt_len=64/threads=4        ,0.837
type=adaptive/non_crt_len=1/crt_len=64/threads=8        ,1.000
type=adaptive/non_crt_len=1/crt_len=64/threads=16       ,0.967
type=adaptive/non_crt_len=1/crt_len=64/threads=20       ,0.949
type=adaptive/non_crt_len=1/crt_len=128/threads=2       ,0.992
type=adaptive/non_crt_len=1/crt_len=128/threads=4       ,0.984
type=adaptive/non_crt_len=1/crt_len=128/threads=8       ,0.997
type=adaptive/non_crt_len=1/crt_len=128/threads=16      ,0.994
type=adaptive/non_crt_len=1/crt_len=128/threads=20      ,0.989
type=adaptive/non_crt_len=32/crt_len=0/threads=2        ,0.981
type=adaptive/non_crt_len=32/crt_len=0/threads=4        ,0.914
type=adaptive/non_crt_len=32/crt_len=0/threads=8        ,0.995
type=adaptive/non_crt_len=32/crt_len=0/threads=16       ,1.049
type=adaptive/non_crt_len=32/crt_len=0/threads=20       ,1.055
type=adaptive/non_crt_len=32/crt_len=1/threads=2        ,0.989
type=adaptive/non_crt_len=32/crt_len=1/threads=4        ,0.924
type=adaptive/non_crt_len=32/crt_len=1/threads=8        ,0.983
type=adaptive/non_crt_len=32/crt_len=1/threads=16       ,1.044
type=adaptive/non_crt_len=32/crt_len=1/threads=20       ,1.037
type=adaptive/non_crt_len=32/crt_len=2/threads=2        ,0.996
type=adaptive/non_crt_len=32/crt_len=2/threads=4        ,0.937
type=adaptive/non_crt_len=32/crt_len=2/threads=8        ,0.973
type=adaptive/non_crt_len=32/crt_len=2/threads=16       ,1.040
type=adaptive/non_crt_len=32/crt_len=2/threads=20       ,1.038
type=adaptive/non_crt_len=32/crt_len=4/threads=2        ,1.005
type=adaptive/non_crt_len=32/crt_len=4/threads=4        ,0.951
type=adaptive/non_crt_len=32/crt_len=4/threads=8        ,0.964
type=adaptive/non_crt_len=32/crt_len=4/threads=16       ,1.006
type=adaptive/non_crt_len=32/crt_len=4/threads=20       ,0.999
type=adaptive/non_crt_len=32/crt_len=8/threads=2        ,0.993
type=adaptive/non_crt_len=32/crt_len=8/threads=4        ,0.883
type=adaptive/non_crt_len=32/crt_len=8/threads=8        ,0.924
type=adaptive/non_crt_len=32/crt_len=8/threads=16       ,0.964
type=adaptive/non_crt_len=32/crt_len=8/threads=20       ,0.961
type=adaptive/non_crt_len=32/crt_len=16/threads=2       ,0.933
type=adaptive/non_crt_len=32/crt_len=16/threads=4       ,0.904
type=adaptive/non_crt_len=32/crt_len=16/threads=8       ,0.909
type=adaptive/non_crt_len=32/crt_len=16/threads=16      ,0.924
type=adaptive/non_crt_len=32/crt_len=16/threads=20      ,0.919
type=adaptive/non_crt_len=32/crt_len=32/threads=2       ,0.922
type=adaptive/non_crt_len=32/crt_len=32/threads=4       ,0.918
type=adaptive/non_crt_len=32/crt_len=32/threads=8       ,0.913
type=adaptive/non_crt_len=32/crt_len=32/threads=16      ,0.978
type=adaptive/non_crt_len=32/crt_len=32/threads=20      ,0.959
type=adaptive/non_crt_len=32/crt_len=64/threads=2       ,0.991
type=adaptive/non_crt_len=32/crt_len=64/threads=4       ,0.979
type=adaptive/non_crt_len=32/crt_len=64/threads=8       ,0.952
type=adaptive/non_crt_len=32/crt_len=64/threads=16      ,0.879
type=adaptive/non_crt_len=32/crt_len=64/threads=20      ,0.896
type=adaptive/non_crt_len=32/crt_len=128/threads=2      ,0.912
type=adaptive/non_crt_len=32/crt_len=128/threads=4      ,0.979
type=adaptive/non_crt_len=32/crt_len=128/threads=8      ,0.947
type=adaptive/non_crt_len=32/crt_len=128/threads=16     ,0.956
type=adaptive/non_crt_len=32/crt_len=128/threads=20     ,0.954
type=adaptive/non_crt_len=128/crt_len=0/threads=2       ,1.005
type=adaptive/non_crt_len=128/crt_len=0/threads=4       ,0.988
type=adaptive/non_crt_len=128/crt_len=0/threads=8       ,0.850
type=adaptive/non_crt_len=128/crt_len=0/threads=16      ,1.045
type=adaptive/non_crt_len=128/crt_len=0/threads=20      ,1.039
type=adaptive/non_crt_len=128/crt_len=1/threads=2       ,0.990
type=adaptive/non_crt_len=128/crt_len=1/threads=4       ,0.988
type=adaptive/non_crt_len=128/crt_len=1/threads=8       ,0.845
type=adaptive/non_crt_len=128/crt_len=1/threads=16      ,1.043
type=adaptive/non_crt_len=128/crt_len=1/threads=20      ,1.043
type=adaptive/non_crt_len=128/crt_len=2/threads=2       ,1.004
type=adaptive/non_crt_len=128/crt_len=2/threads=4       ,0.994
type=adaptive/non_crt_len=128/crt_len=2/threads=8       ,0.852
type=adaptive/non_crt_len=128/crt_len=2/threads=16      ,1.044
type=adaptive/non_crt_len=128/crt_len=2/threads=20      ,1.039
type=adaptive/non_crt_len=128/crt_len=4/threads=2       ,0.993
type=adaptive/non_crt_len=128/crt_len=4/threads=4       ,1.003
type=adaptive/non_crt_len=128/crt_len=4/threads=8       ,0.864
type=adaptive/non_crt_len=128/crt_len=4/threads=16      ,1.026
type=adaptive/non_crt_len=128/crt_len=4/threads=20      ,1.019
type=adaptive/non_crt_len=128/crt_len=8/threads=2       ,1.005
type=adaptive/non_crt_len=128/crt_len=8/threads=4       ,0.975
type=adaptive/non_crt_len=128/crt_len=8/threads=8       ,0.898
type=adaptive/non_crt_len=128/crt_len=8/threads=16      ,0.979
type=adaptive/non_crt_len=128/crt_len=8/threads=20      ,0.977
type=adaptive/non_crt_len=128/crt_len=16/threads=2      ,1.035
type=adaptive/non_crt_len=128/crt_len=16/threads=4      ,0.809
type=adaptive/non_crt_len=128/crt_len=16/threads=8      ,0.887
type=adaptive/non_crt_len=128/crt_len=16/threads=16     ,0.939
type=adaptive/non_crt_len=128/crt_len=16/threads=20     ,0.942
type=adaptive/non_crt_len=128/crt_len=32/threads=2      ,1.026
type=adaptive/non_crt_len=128/crt_len=32/threads=4      ,0.805
type=adaptive/non_crt_len=128/crt_len=32/threads=8      ,0.889
type=adaptive/non_crt_len=128/crt_len=32/threads=16     ,0.956
type=adaptive/non_crt_len=128/crt_len=32/threads=20     ,0.937
type=adaptive/non_crt_len=128/crt_len=64/threads=2      ,0.971
type=adaptive/non_crt_len=128/crt_len=64/threads=4      ,0.822
type=adaptive/non_crt_len=128/crt_len=64/threads=8      ,0.944
type=adaptive/non_crt_len=128/crt_len=64/threads=16     ,0.914
type=adaptive/non_crt_len=128/crt_len=64/threads=20     ,0.928
type=adaptive/non_crt_len=128/crt_len=128/threads=2     ,0.810
type=adaptive/non_crt_len=128/crt_len=128/threads=4     ,0.824
type=adaptive/non_crt_len=128/crt_len=128/threads=8     ,0.940
type=adaptive/non_crt_len=128/crt_len=128/threads=16    ,0.919
type=adaptive/non_crt_len=128/crt_len=128/threads=20    ,0.929

Looking at https://stackoverflow.com/questions/19863734/what-is-pthread-mutex-adaptive-np I don't think we want to do this change. If someone needs such a special mutex, I guess they have to implement it themselves. We have to provide a useful mutex for all cases, not just for low contention cases that only have short times during which the mutex is locked.

This revision now requires changes to proceed.Apr 18 2023, 8:48 PM

In D148681#4279310, @philnik wrote:

Looking at https://stackoverflow.com/questions/19863734/what-is-pthread-mutex-adaptive-np I don't think we want to do this change. If someone needs such a special mutex, I guess they have to implement it themselves. We have to provide a useful mutex for all cases, not just for low contention cases that only have short times during which the mutex is locked.

I don't think the SO link really does justice to the reality of the implementation.
At the very least since the SO post was been written the following have been
added.

expo backoff to avoid memory traffic
Read guards on the CAS (same as above as exclusive state in mind)

It still hits the futex, just not immediately as doing so has a serious
latency cost.

Spinning ~100 times with a dozen or so memory accesses (with likely
all of them read-only) has a minimal cost in the high contention case
(its +maybe 100-200 cycles but the futex alone is 10-100x that) and
helps ALOT in the lower contention case (avoids really expensive syscalls).

Edit: To be clear. I belive ADAPTIVE_NP is the generic implementation that
does well in all cases whereas TIMED_NP is the specialized.

Harbormaster completed remote builds in B226504: Diff 514811.Apr 18 2023, 11:56 PM

In D148681#4279326, @goldstein.w.n wrote:

In D148681#4279310, @philnik wrote:

Looking at https://stackoverflow.com/questions/19863734/what-is-pthread-mutex-adaptive-np I don't think we want to do this change. If someone needs such a special mutex, I guess they have to implement it themselves. We have to provide a useful mutex for all cases, not just for low contention cases that only have short times during which the mutex is locked.

I don't think the SO link really does justice to the reality of the implementation.
At the very least since the SO post was been written the following have been
added.

expo backoff to avoid memory traffic

Read guards on the CAS (same as above as exclusive state in mind)

It still hits the futex, just not immediately as doing so has a serious
latency cost.

Spinning ~100 times with a dozen or so memory accesses (with likely
all of them read-only) has a minimal cost in the high contention case
(its +maybe 100-200 cycles but the futex alone is 10-100x that) and
helps ALOT in the lower contention case (avoids really expensive syscalls).

Edit: To be clear. I belive ADAPTIVE_NP is the generic implementation that
does well in all cases whereas TIMED_NP is the specialized.

Also FWIW, the fast both is identical for both of them. The only difference
is whether they handle contention by just going directly to futex (TIMED_NP)
or spin for an adaptive number of iterations <= 100 (ADAPTIVE_NP).

Revision Contents

Path

Size

libcxx/

include/

__threading_support

4 lines

Diff 514811

libcxx/include/__threading_support

	Show First 20 Lines • Show All 58 Lines • ▼ Show 20 Lines

	_LIBCPP_BEGIN_NAMESPACE_STD			_LIBCPP_BEGIN_NAMESPACE_STD

	#if !defined(_LIBCPP_HAS_NO_THREADS)			#if !defined(_LIBCPP_HAS_NO_THREADS)

	#if defined(_LIBCPP_HAS_THREAD_API_PTHREAD)			#if defined(_LIBCPP_HAS_THREAD_API_PTHREAD)
	// Mutex			// Mutex
	typedef pthread_mutex_t __libcpp_mutex_t;			typedef pthread_mutex_t __libcpp_mutex_t;
				#if defined(__USE_GNU) && defined(PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP)
				#define _LIBCPP_MUTEX_INITIALIZER PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP
				#else
	#define _LIBCPP_MUTEX_INITIALIZER PTHREAD_MUTEX_INITIALIZER			#define _LIBCPP_MUTEX_INITIALIZER PTHREAD_MUTEX_INITIALIZER
				#endif

	typedef pthread_mutex_t __libcpp_recursive_mutex_t;			typedef pthread_mutex_t __libcpp_recursive_mutex_t;

	// Condition Variable			// Condition Variable
	typedef pthread_cond_t __libcpp_condvar_t;			typedef pthread_cond_t __libcpp_condvar_t;
	#define _LIBCPP_CONDVAR_INITIALIZER PTHREAD_COND_INITIALIZER			#define _LIBCPP_CONDVAR_INITIALIZER PTHREAD_COND_INITIALIZER

	// Execute once			// Execute once
	▲ Show 20 Lines • Show All 633 Lines • Show Last 20 Lines