Releasing the mutex before the call to notify_all is an optimization,
but it cannot be used here. The thread waiting on the condition might
destroy the associated resources (the mutex and the condition
variable) as soon as it wakes, and the notifier thread would then
access a destroyed object, namely the condition variable. In fact,
notify_all_at_thread_exit is meant exactly for joining on detached
threads, and the waiting thread doesn't expect the notifier thread to
access any further shared resources, which makes this scenario very
likely to happen: the waiting thread might wake spuriously as soon as
the mutex is released. Calling notify_all before releasing the mutex
is necessary to prevent this race.
Further details can be found at
https://cplusplus.github.io/LWG/issue3343.
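
For illustration, here is a minimal sketch of the usage pattern and of the corrected ordering. It is a simplified model, not the libc++ implementation: `notify_then_unlock` and `join_detached_worker` are hypothetical names, and the worker calls the helper as its last action instead of registering a real thread-exit hook.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for the thread-exit hook, using the corrected order.
void notify_then_unlock(std::condition_variable& cv,
                        std::unique_lock<std::mutex> lk) {
    assert(lk.owns_lock());  // same precondition as notify_all_at_thread_exit
    cv.notify_all();         // waiters cannot return from wait(): we hold the mutex
    lk.unlock();             // last access to state the waiter may destroy
}

// Hypothetical waiter that "joins" a detached worker, then lets the
// mutex and the condition variable go out of scope.
bool join_detached_worker() {
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    std::thread([&m, &cv, &done] {
        std::unique_lock<std::mutex> lk(m);
        done = true;
        notify_then_unlock(cv, std::move(lk));
        // Nothing captured is touched past this point.
    }).detach();

    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [&done] { return done; });
    return done;
}  // lk unlocks first, then cv and m are destroyed
```

With the opposite order (unlock, then notify_all), the waiter could reacquire the mutex after a spurious wakeup, observe `done == true`, return, and destroy `cv` while the worker is still about to call `cv.notify_all()`.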
Btw, update: I've emailed @lewissbaker to ask why the proof-of-concept wandbox from https://cplusplus.github.io/LWG/issue3343 (namely, https://wandbox.org/permlink/eUu3eiQbLl7JQKMm ) doesn't reproduce the bug anymore. It's trivial to turn that code snippet into a libc++ unit test; but since it's already green on trunk, such a test wouldn't be very useful.
He hasn't gotten back to me yet. But this PR is still on my radar!