This is an archive of the discontinued LLVM Phabricator instance.

Differential D19722

[sanitizer] Don't reuse the main thread in ThreadRegistry
ClosedPublic

Authored by kubamracek on Apr 29 2016, 6:49 AM.

Details

Reviewers

kcc
glider
dvyukov
samsonov

Commits

rGd052a579008b: [sanitizer] Don't reuse the main thread in ThreadRegistry
rCRT268238: [sanitizer] Don't reuse the main thread in ThreadRegistry
rL268238: [sanitizer] Don't reuse the main thread in ThreadRegistry

Summary

There is a hard-to-reproduce crash happening on OS X that involves terminating the main thread (dispatch_main does that, see discussion at http://reviews.llvm.org/D18496) and later reusing the main thread's ThreadContext. This patch disables reuse of the main thread. I believe this problem exists only on OS X, because on other systems the main thread cannot be terminated without exiting the process.

Diff Detail

Repository: rL LLVM

Event Timeline

kubamracek updated this revision to Diff 55591. Apr 29 2016, 6:49 AM

kubamracek retitled this revision to [sanitizer] Don't reuse the main thread in ThreadRegistry.

kubamracek updated this object.

kubamracek added reviewers: dvyukov, samsonov, glider, kcc.

kubamracek added a project: Restricted Project.

kubamracek added subscribers: llvm-commits, zaks.anna, dcoughlin.

Herald added a subscriber: kubamracek. Apr 29 2016, 6:49 AM

Why is it difficult to reproduce? The quarantine is a FIFO queue of size 16, IIRC. So if you create and destroy 16 threads, the next thread creation will reuse the oldest thread context. Or does the crash not always happen after reuse?
I am hinting at a test.
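
The reuse behavior described in this comment can be modeled with a small stand-alone sketch. Ctx and QuarantineModel are hypothetical names, not sanitizer_common types, and the size of 16 is the figure recalled in the comment rather than a value checked against the runtime. The model has no special case for tid 0, so it reflects the behavior without this patch: every dead context is eventually evicted and made reusable.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for ThreadContextBase; only the fields used here.
struct Ctx {
  unsigned tid = 0;
  unsigned reuse_count = 0;
};

// Simplified model of a FIFO quarantine for dead thread contexts.
// This is an illustration, not the sanitizer_common implementation.
class QuarantineModel {
 public:
  explicit QuarantineModel(std::size_t size) : size_(size) {}

  // Adds a dead context. Returns the context that becomes reusable,
  // or nullptr while the quarantine still has room.
  Ctx *Push(Ctx *ctx) {
    dead_.push_back(ctx);
    if (dead_.size() <= size_)
      return nullptr;
    Ctx *oldest = dead_.front();  // FIFO: the oldest dead context leaves first.
    dead_.pop_front();
    oldest->reuse_count++;
    return oldest;
  }

 private:
  std::size_t size_;
  std::deque<Ctx *> dead_;
};
```

With a quarantine of size 16, the first 16 calls to Push return nullptr and the 17th returns the context that was pushed first.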

The crash happens with GCD worker threads. The only way I know of to wait for a worker thread to be destroyed is to use long sleep()s, and even then it's non-deterministic and the actual delays differ between OS versions.

If you can make the test crash once in 100 runs, that's still better than no test. If we have a regression, the failure will be detected eventually (on bots or in manual test runs).

The test would need to sleep for a long time (~30 seconds total). Is that okay?

No, it is not OK. What does the test look like?
Note that to get thread ID reuse, you don't need GCD worker threads: you can create 16 normal threads, ensure that they have started, and join them.
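
The create-and-join approach suggested here might look roughly like the sketch below. It uses std::thread instead of raw pthreads, and StartAndJoinThreads is a hypothetical helper name. It only shows how to cycle a batch of ordinary threads through creation and termination; it makes no claim to reproduce the crash, which depends on the main thread's context being the one that is reused.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Starts n threads, waits until every one of them has begun running,
// then joins them all. Returns how many threads reported that they started.
int StartAndJoinThreads(int n) {
  std::atomic<int> started{0};
  std::vector<std::thread> threads;
  threads.reserve(n);
  for (int i = 0; i < n; i++)
    threads.emplace_back([&started] { started.fetch_add(1); });
  // Ensure all threads have started before joining.
  while (started.load() < n)
    std::this_thread::yield();
  for (auto &t : threads)
    t.join();
  return started.load();
}
```

Calling StartAndJoinThreads(16) creates and retires 16 threads, matching the quarantine size mentioned in the discussion.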

In D19722#418402, @dvyukov wrote:

No, it is not OK. What does the test look like?
Note that to get thread ID reuse, you don't need GCD worker threads: you can create 16 normal threads, ensure that they have started, and join them.

The crash only happens when the main thread is reused as a worker thread. That is hard to trigger, since you can't control the creation and termination of worker threads. I don't see how throwing in some regular pthreads would help.

OK, let's leave it as is.

This revision is now accepted and ready to land. May 2 2016, 3:13 AM

Closed by commit rL268238: [sanitizer] Don't reuse the main thread in ThreadRegistry (authored by kuba.brecka). May 2 2016, 8:12 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                                   Size
compiler-rt/trunk/lib/sanitizer_common/sanitizer_thread_registry.cc    2 lines

Diff 55821

compiler-rt/trunk/lib/sanitizer_common/sanitizer_thread_registry.cc

[271 earlier lines not shown]
@@ void ThreadRegistry::StartThread(u32 tid, uptr os_id, void *arg) {
   CHECK_LT(tid, n_contexts_);
   ThreadContextBase *tctx = threads_[tid];
   CHECK_NE(tctx, 0);
   CHECK_EQ(ThreadStatusCreated, tctx->status);
   tctx->SetStarted(os_id, arg);
 }

 void ThreadRegistry::QuarantinePush(ThreadContextBase *tctx) {
+  if (tctx->tid == 0)
+    return; // Don't reuse the main thread. It's a special snowflake.
   dead_threads_.push_back(tctx);
   if (dead_threads_.size() <= thread_quarantine_size_)
     return;
   tctx = dead_threads_.front();
   dead_threads_.pop_front();
   CHECK_EQ(tctx->status, ThreadStatusDead);
   tctx->Reset();
   tctx->reuse_count++;
[14 more lines not shown]