
sanitizer_common: optimize Mutex for high contention
ClosedPublic

Authored by dvyukov on Aug 10 2021, 6:55 AM.

Details

Summary

After switching tsan from the old mutex to the new sanitizer_common mutex,
we've observed a significant degradation of performance on a test.
The test effectively stresses a lock-free stack with 4 threads
with a mix of atomic_compare_exchange and atomic_load operations.
The former takes the write lock, while the latter takes the read lock.
It turned out the new mutex performs worse because readers don't
use active spinning, which results in a significant amount of thread
blocking/unblocking. The old tsan mutex used active spinning
for both writers and readers.

Add active spinning for readers.
Don't hand off the mutex to readers; instead make them
compete for the mutex again after wake-up.
This makes readers and writers almost symmetric.
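
For illustration, a minimal sketch of what active spinning for readers can look like, assuming a hypothetical state word with a writer bit and a reader counter (names such as RWSpinSketch, kSpinIterations, kWriterLock, and ReadLockSlow are invented for this sketch and are not the actual sanitizer_common code):

  #include <atomic>
  #include <cstdint>
  #include <thread>

  // Illustrative sketch only (not the actual sanitizer_common Mutex):
  // readers spin for a bounded number of iterations before falling back
  // to a blocking slow path.
  class RWSpinSketch {
    static constexpr int kSpinIterations = 1500;           // hypothetical budget
    static constexpr uint64_t kWriterLock = 1;             // hypothetical bit layout
    static constexpr uint64_t kReaderIncrement = 1 << 1;
    std::atomic<uint64_t> state_{0};

   public:
    void ReadLock() {
      for (int i = 0; i < kSpinIterations; i++) {
        uint64_t s = state_.load(std::memory_order_relaxed);
        if (!(s & kWriterLock) &&
            state_.compare_exchange_weak(s, s + kReaderIncrement,
                                         std::memory_order_acquire))
          return;  // acquired the read lock without blocking
      }
      ReadLockSlow();
    }

    void ReadUnlock() {
      state_.fetch_sub(kReaderIncrement, std::memory_order_release);
    }

   private:
    void ReadLockSlow() {
      // The real mutex would register as a waiter and block on a semaphore;
      // this sketch just yields and retries.
      for (;;) {
        uint64_t s = state_.load(std::memory_order_relaxed);
        if (!(s & kWriterLock) &&
            state_.compare_exchange_weak(s, s + kReaderIncrement,
                                         std::memory_order_acquire))
          return;
        std::this_thread::yield();
      }
    }
  };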

Diff Detail

Event Timeline

dvyukov created this revision. Aug 10 2021, 6:55 AM
dvyukov requested review of this revision. Aug 10 2021, 6:55 AM
Herald added a project: Restricted Project. Aug 10 2021, 6:55 AM
Herald added a subscriber: Restricted Project.
dvyukov updated this revision to Diff 365468. Aug 10 2021, 7:09 AM

fixed some DCHECK checks

melver added inline comments. Aug 10 2021, 8:02 AM
compiler-rt/lib/sanitizer_common/sanitizer_mutex.h
194

I think we need a 'pause' hint in the active spinning. Do we have an arch-independent pause-hint wrapper already?

216–217

Could that unnecessarily shift fairness towards readers? E.g. if there are lots of readers, the probability that the kReaderSpinWait bit is set is high, which would mean wake_writer upon Unlock() is false.

Before, it would always prioritize writers, but now it prioritizes any spinner. If the other writer is Wait()'ing, it would prioritize the read-spinners (+ any Wait()'ing reader as a side-effect).

Is that ok?

I guess it's fine to prefer to let the spinners through first, but if there are read-spinners + read-waiters it'll turn the Wait()'ing readers into spinners as well, without waking the waiting writers, and therefore prioritize readers.

.. or I missed something that's currently preventing that, or perhaps it's not as bad as I think.

248

Is it ok that the kReaderSpinWait bit is imprecise, given that it's just an optimization?
Multiple readers may keep flipping this bit on and off, and a waiting reader may unset the bit while there is still an in-flight spinning reader (which would immediately set it back).

264

Same here, I think we need a 'pause' hint for the CPU.

dvyukov added inline comments. Aug 10 2021, 8:09 AM
compiler-rt/lib/sanitizer_common/sanitizer_mutex.h
194

We have, but the problem with PAUSE on x86 is that its latency recently increased from 1 cycle to 100+. I am not sure how it's supposed to be used now. Also, 100+ cycles looks like too much for a backoff.
We used to do 10 PAUSEs in a row, but now I did what the abseil mutex does (just 1500 active spin iterations w/o PAUSE):
https://github.com/abseil/abseil-cpp/blob/master/absl/synchronization/mutex.cc#L144
(what abseil does can't be too bad)
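
For reference, a hedged sketch of the two backoff styles being compared here; the pause intrinsic is the GCC/Clang x86 builtin, the 1500-iteration figure is the one from the abseil link above, and the helper names are invented for this sketch:

  // Style 1: issue an explicit CPU pause hint per iteration. On recent x86
  // microarchitectures a single PAUSE can cost 100+ cycles, which makes it
  // questionable as a fine-grained backoff.
  inline void PauseBackoff(int iterations) {
  #if defined(__x86_64__) || defined(__i386__)
    for (int i = 0; i < iterations; i++) __builtin_ia32_pause();
  #else
    for (volatile int i = 0; i < iterations; i++) {}  // no portable pause hint
  #endif
  }

  // Style 2 (the shape of what abseil does): a plain bounded spin of ~1500
  // iterations with no pause hint, then give up and block.
  template <typename TryAcquire>
  bool SpinThenGiveUp(TryAcquire try_acquire, int iterations = 1500) {
    for (int i = 0; i < iterations; i++)
      if (try_acquire()) return true;
    return false;  // caller should fall back to blocking
  }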

dvyukov added inline comments. Aug 10 2021, 8:10 AM
compiler-rt/lib/sanitizer_common/sanitizer_mutex.h
248

Looks OK to me.
It would not be OK to set kReaderSpinWait when there are no spinners, but it's fine the other way around.

melver accepted this revision. Aug 10 2021, 8:11 AM
melver added inline comments.
compiler-rt/lib/sanitizer_common/sanitizer_mutex.h
194

Ok, that's bad. I guess some recent microarch broke it then :-)
That's fine then.

This revision is now accepted and ready to land. Aug 10 2021, 8:11 AM
dvyukov added inline comments. Aug 10 2021, 8:20 AM
compiler-rt/lib/sanitizer_common/sanitizer_mutex.h
216–217

Yes, it's more unfair now, with more bias towards readers.
I don't know what's better. There is an inherent conflict between fairness and throughput. If we make it more fair, we get these 100x slowdown fallouts. I don't think there is even a single right answer.
We have already seen that too much fairness hits us badly. We have not yet seen how too little fairness hits us :)
I don't think we want too-strong fairness guarantees (like a production mutex could have). For us throughput is quite important since we slow down execution a lot and tend to create high contention in some cases. As long as the program makes forward progress, I think throughput is preferable.
If this version proves to be problematically unfair, maybe we could add the starvation prevention logic I used in Go's sync.Mutex:
https://github.com/golang/go/commit/0556e26273f704db73df9e7c4c3d2e8434dec7be
It ensures there is no pathological unfairness w/o penalizing normal execution.
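
For context, a much-simplified, self-contained approximation of that starvation-prevention idea: a waiter that has been waiting "too long" suppresses barging acquisitions until it gets the lock, so normal execution stays fast but no thread can be starved indefinitely. Go's real sync.Mutex uses a wait queue with direct handoff and a ~1ms threshold; the class and member names here are invented for this sketch and are unrelated to the sanitizer code.

  #include <atomic>
  #include <chrono>
  #include <thread>

  // Rough sketch only: not Go's actual implementation (which hands the mutex
  // off to the oldest queued waiter) and not the sanitizer_common mutex.
  class AntiStarvationLockSketch {
    std::atomic<bool> locked_{false};
    std::atomic<bool> starving_{false};

   public:
    void Lock() {
      const auto start = std::chrono::steady_clock::now();
      for (;;) {
        bool i_am_starving = std::chrono::steady_clock::now() - start >
                             std::chrono::milliseconds(1);
        if (i_am_starving) starving_.store(true, std::memory_order_relaxed);
        bool expected = false;
        // Barging is allowed only while no waiter is starving (or we are the
        // starving waiter ourselves).
        if ((i_am_starving || !starving_.load(std::memory_order_relaxed)) &&
            locked_.compare_exchange_weak(expected, true,
                                          std::memory_order_acquire)) {
          if (i_am_starving) starving_.store(false, std::memory_order_relaxed);
          return;
        }
        std::this_thread::yield();
      }
    }

    void Unlock() { locked_.store(false, std::memory_order_release); }
  };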

melver added inline comments. Aug 10 2021, 8:25 AM
compiler-rt/lib/sanitizer_common/sanitizer_mutex.h
216–217

Ok makes sense.
Just wanted to check; and thanks for the explanation!

LGTM, thanks!

This revision was automatically updated to reflect the committed changes.