Instead of trying to be clever and design our own locking primitive,
simply rely on the OS-provided implementation to do the right thing.
Indeed, manually yielding to the OS does not provide the necessary
information for it to make good prioritization decisions. For example,
if a thread with higher priority yields while waiting for a lock held
by a thread with lower priority but the system is contended, it is
possible for the thread with lower priority to not run until the higher
priority thread has yielded 16 times and goes for __libcpp_mutex_lock().
Once that happens, the OS can bump the priority of the thread that
currently holds the lock to unblock everyone. So instead, we might as
well give the system all the information from the start so it can make
appropriate decisions.
As a fly-by change, also increase the number of locks in the table.
The size increase is modest, but has the potential to half the amount
of contention on those locks.
rdar://93598606
Since you change this value it might be a good moment to document why 32 is "the right" value.