The most performant pthread_mutex_t configuration in GLIBC is
PTHREAD_MUTEX_ADAPTIVE_NP.
The default is the timed mutex, which is implemented almost entirely
through the futex syscall (no attempt at looping for any amount of
time in userland). This incurs a high context-switch cost in almost
all situations.
The ADAPTIVE_NP implementation, on the other hand, progresses through
exponential backoff before going to the futex, allowing for lower
latency for shorter waits with minimal extra overhead or memory
traffic for longer waits.
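For reference, a caller might opt in to the adaptive type roughly as
follows. This is a minimal sketch, not part of the change itself: the
helper name init_adaptive_mutex is hypothetical, and the only calls
used are the standard pthread_mutexattr_* and pthread_mutex_init
functions together with the GLIBC-specific PTHREAD_MUTEX_ADAPTIVE_NP
constant.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* harmless guard; the _NP mutex types are GLIBC extensions */
#endif
#include <pthread.h>

/* Hypothetical helper: initialize *m as an adaptive mutex.
   Returns 0 on success or a pthread error number on failure.  */
int
init_adaptive_mutex (pthread_mutex_t *m)
{
  pthread_mutexattr_t attr;
  int rc = pthread_mutexattr_init (&attr);
  if (rc != 0)
    return rc;

  /* Request the adaptive type instead of the default (timed) type.  */
  rc = pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
  if (rc == 0)
    rc = pthread_mutex_init (m, &attr);

  /* The attribute object is no longer needed once the mutex exists.  */
  pthread_mutexattr_destroy (&attr);
  return rc;
}
```

Locking and unlocking are unchanged (pthread_mutex_lock and
pthread_mutex_unlock); only the behaviour under contention differs.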
On my Icelake machine, running the GLIBC mutex benchmark suite for
N=5 runs, there is a 12% performance improvement for low contention
cases (nthreads <= nprocs / 2) and a 2.5% performance improvement in
high contention cases (nthreads > nprocs / 2).