There is a hard-to-reproduce crash happening on OS X that involves terminating the main thread (dispatch_main does that, see discussion at http://reviews.llvm.org/D18496) and later reusing the main thread's ThreadContext. This patch disables reuse of the main thread. I believe this problem exists only on OS X, because on other systems the main thread cannot be terminated without exiting the process.
Repository: rL LLVM
Event Timeline
Why is it difficult to reproduce? The quarantine is a FIFO queue of size 16, IIRC. So if you create and destroy 16 threads, the next thread creation will reuse the oldest thread context. Or does the crash not always happen after reuse?
I am hinting at a test.
The crash happens with GCD worker threads. The only way I know of to wait for a worker thread to be destroyed is to use long sleep()s, and even then it's non-deterministic and the actual delays differ between OS versions.
If you can make the test crash once in 100 runs that's still better than no test. If we have a regression, the failure will be detected eventually (on bots or in manual test runs).
No, it is not OK. What does the test look like?
Note that to get thread id reuse, you don't need to use GCD worker threads: you can create 16 normal threads, ensure that they have started, and join them.
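A minimal sketch of what that suggestion might look like, assuming the quarantine really holds 16 entries as recalled above. The names `churn_threads` and `NUM_THREADS` are illustrative and not part of the patch. This only cycles ordinary pthreads through creation and join; by itself it does not make the main thread's context eligible for reuse.

```c
#include <pthread.h>

/* Assumed quarantine size, taken from the "IIRC" estimate in this thread. */
#define NUM_THREADS 16

static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int started = 0;

/* Each thread only announces that it is running, then exits. */
static void *thread_body(void *arg) {
  (void)arg;
  pthread_mutex_lock(&mu);
  started++;
  pthread_cond_signal(&cv);
  pthread_mutex_unlock(&mu);
  return NULL;
}

/* Creates NUM_THREADS threads, waits until all have started, joins them.
   Returns the number of threads joined, or -1 if creation failed. */
int churn_threads(void) {
  pthread_t threads[NUM_THREADS];
  int created = 0;
  int joined = 0;
  int failed = 0;

  pthread_mutex_lock(&mu);
  started = 0;
  pthread_mutex_unlock(&mu);

  for (; created < NUM_THREADS; created++) {
    if (pthread_create(&threads[created], NULL, thread_body, NULL) != 0) {
      failed = 1;
      break;
    }
  }

  /* Ensure every successfully created thread has actually started. */
  pthread_mutex_lock(&mu);
  while (started < created)
    pthread_cond_wait(&cv, &mu);
  pthread_mutex_unlock(&mu);

  for (int i = 0; i < created; i++) {
    if (pthread_join(threads[i], NULL) == 0)
      joined++;
  }
  return failed ? -1 : joined;
}
```

Calling `churn_threads()` once before spawning the thread of interest should, under the stated assumption, push 16 finished thread contexts through the quarantine.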
The crash only happens when the main thread is reused as a worker thread. It's hard to trigger that, since you can't control the creation and termination of worker threads. I don't know how throwing in some regular pthreads would help.