This is an archive of the discontinued LLVM Phabricator instance.

[scudo] RFC thread specific data refactoring
Abandoned · Public

Authored by cryptoad on Sep 7 2017, 1:27 PM.


Details

Reviewers

eugenis
kcc
alekseyshl
dvyukov
vitalybuka
filcab

Summary

This is a request for comment, not an actual patch submission.

I would like to get some reviewers' feedback on a refactoring of how thread
specific data (TSD) is handled in Scudo. The Linux vs. Android distinction is too
narrow and can't sensibly be extended to other platforms.

The core of the idea is to drop this distinction and instead introduce
a shared TSD model (N caches shared between threads, with locking) vs. an exclusive
TSD model (1 cache per thread, no locking required). This would make it easy to
add platforms, as demonstrated here with the addition of Fuchsia. The
model could ultimately be specified via defines as opposed to being set in stone
for a given platform (e.g. we could do shared caches on Linux).
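To make the proposed split concrete, here is a minimal, self-contained sketch of the two models behind a single `getTSDAndLock()`/`unlock()` interface. It is not Scudo code: `TSD`, `bumpCache()` and the `SKETCH_TSD_EXCLUSIVE` macro are hypothetical, `std::mutex` stands in for Scudo's spin mutex, and the shared variant only does the round-robin assignment, not the try-lock/slow-path logic.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical build knob: 1 = exclusive model, 0 = shared model.
#ifndef SKETCH_TSD_EXCLUSIVE
#define SKETCH_TSD_EXCLUSIVE 1
#endif

// Stand-in for ScudoTSD: the payload plus whatever locking the model needs.
struct TSD {
  int Cache = 0;  // placeholder for AllocatorCache, Prng, quarantine cache

#if SKETCH_TSD_EXCLUSIVE
  // Exclusive: only the owning thread ever touches this TSD.
  void lock() {}
  void unlock() {}
#else
  // Shared: several threads can map to the same TSD, so it needs a mutex.
  void lock() { Mutex.lock(); }
  void unlock() { Mutex.unlock(); }

 private:
  std::mutex Mutex;
#endif
};

#if SKETCH_TSD_EXCLUSIVE
// One TSD per thread, created on first use.
inline TSD *getTSDAndLock() {
  static thread_local TSD ThreadTSD;
  ThreadTSD.lock();  // no-op in this model
  return &ThreadTSD;
}
#else
// A fixed pool shared by all threads, assigned round-robin on first use.
constexpr std::size_t NumTSDs = 4;  // the Android code sizes this by CPU count
static TSD Pool[NumTSDs];
static std::atomic<std::size_t> NextIndex{0};

inline TSD *getTSDAndLock() {
  static thread_local TSD *Current =
      &Pool[NextIndex.fetch_add(1, std::memory_order_relaxed) % NumTSDs];
  Current->lock();
  return Current;
}
#endif

// Allocator-side code is identical under both models.
inline int bumpCache() {
  TSD *T = getTSDAndLock();
  assert(T != nullptr);
  const int Value = ++T->Cache;
  T->unlock();
  return Value;
}
```

Building with `-DSKETCH_TSD_EXCLUSIVE=0` switches to the shared pool without touching `bumpCache()`, which is the property the RFC is after.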

While the included code works, it's merely for demonstration purposes. The
previous organization involved per-platform .inc files, which I am not opposed
to going back to, but which felt cumbersome to deal with. I am looking for comments,
suggestions and ideas regarding code organization and naming, so that this refactoring
makes sense to others.

Thanks in advance!

Diff Detail

Build Status

Buildable 9996
Build 9996: arc lint + arc unit

Event Timeline

cryptoad created this revision. · Sep 7 2017, 1:27 PM

Herald added subscribers: mgorny, srhines. · Sep 7 2017, 1:27 PM

Harbormaster completed remote builds in B9996: Diff 114247. · Sep 7 2017, 1:30 PM

eugenis added inline comments. · Sep 7 2017, 5:02 PM

lib/scudo/scudo_allocator.cpp
460	Perhaps this could be done in a RAII fashion? Having to call unlock() after every successful TSD access is error-prone.

vitalybuka added inline comments. · Sep 7 2017, 5:05 PM

lib/scudo/scudo_allocator.cpp
460	e.g. with __sanitizer::at_scope_exit
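A minimal sketch of the RAII idea, assuming a hypothetical `ScopedTSDUnlock` guard and a mock `MockTSD` so that it compiles on its own. In compiler-rt the same effect could presumably be had with `__sanitizer::at_scope_exit`, as suggested above; that helper is not used here.

```cpp
#include <cassert>

// Hypothetical stand-in for a TSD; records unlocks for the demo.
struct MockTSD {
  bool Locked = false;
  int Unlocks = 0;
  void lock() { Locked = true; }
  void unlock() {
    Locked = false;
    ++Unlocks;
  }
};

// Scope guard: unlocks the TSD on every path out of the enclosing scope.
// A null pointer (e.g. "no TSD, use the fallback") is simply ignored.
class ScopedTSDUnlock {
 public:
  explicit ScopedTSDUnlock(MockTSD *Tsd) : Tsd(Tsd) {}
  ~ScopedTSDUnlock() {
    if (Tsd)
      Tsd->unlock();
  }
  ScopedTSDUnlock(const ScopedTSDUnlock &) = delete;
  ScopedTSDUnlock &operator=(const ScopedTSDUnlock &) = delete;

 private:
  MockTSD *Tsd;
};

// With the guard, an early return can no longer leak the lock.
inline int useTSD(MockTSD *Tsd, bool EarlyReturn) {
  Tsd->lock();
  ScopedTSDUnlock Unlocker(Tsd);
  if (EarlyReturn)
    return -1;
  assert(Tsd->Locked);  // still held inside the scope
  return 0;
}
```

The guard is constructed right after the lock is taken, so each of the several `TSD->unlock()` calls in the allocator would collapse into one declaration per access site.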

dvyukov added a comment.

Looks reasonable.

Would be easier to understand if s/ScudoThreadContext/ScudoTSD/ renaming is separated.
So this is basically no functional changes, just rebranding what we had for linux as "context per thread" and what we had for android as "several shared contexts", right?

The model could ultimately be specified via defines as opposed to being set in stone for a given platform (e.g. we could do shared caches on Linux).

It would be interesting to benchmark Android with cache per thread model as well.

dvyukov added inline comments. · Sep 12 2017, 1:34 AM

lib/scudo/scudo_allocator.cpp
255	This and getPrng look unnecessary now.
385	It feels that in this new model getTSDAndLock() should just never return NULL. If some TSD implementation can't return a thread-local descriptor, it should be _its_ problem to allocate and return a fallback descriptor instead. Since we now have a notion of getTSDAndLock()/TSD->unlock(), a global mutex-protected fallback descriptor fits into this model well. But the "global hashed set of TSDs" implementation just does not have this problem.
lib/scudo/scudo_tsd.h
27	I think this file and the cc will be much easier to understand if split into multiple files (one per model).
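One way to read the "never return NULL" suggestion, sketched under assumptions: all names are hypothetical, `std::mutex` stands in for `StaticSpinMutex`, and thread initialization is reduced to a single state assignment. Note the runtime `Shared` flag: the exclusive TSD and the locking fallback now coexist in one type, which is the extra complexity described in the author's reply.

```cpp
#include <cassert>
#include <mutex>

enum ThreadState { NotInitialized, Initialized, TornDown };

// Hypothetical TSD: only the shared fallback instance actually locks.
struct TSD {
  explicit TSD(bool Shared) : Shared(Shared) {}
  void lock() {
    if (Shared)
      Mutex.lock();
  }
  void unlock() {
    if (Shared)
      Mutex.unlock();
  }
  int Allocations = 0;

 private:
  const bool Shared;
  std::mutex Mutex;
};

static TSD FallbackTSD(/*Shared=*/true);
thread_local ThreadState State = NotInitialized;
thread_local TSD ExclusiveTSD(/*Shared=*/false);

// Never returns null: a torn-down thread receives the locked global fallback.
inline TSD *getTSDAndLock() {
  if (State == TornDown) {
    FallbackTSD.lock();
    return &FallbackTSD;
  }
  State = Initialized;   // simplified stand-in for initThreadMaybe()
  return &ExclusiveTSD;  // exclusive TSD: nothing to lock
}

// The caller loses its separate fallback branch.
inline void allocateOne() {
  TSD *T = getTSDAndLock();
  assert(T != nullptr);
  ++T->Allocations;
  T->unlock();
}
```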

cryptoad added a comment.

In D37590#867632, @dvyukov wrote:

So this is basically no functional changes, just rebranding what we had for linux as "context per thread" and what we had for android as "several shared contexts", right?

This is correct. Rebranding and reorganization of the code.

lib/scudo/scudo_allocator.cpp
255	I think I had them for accessor consistency more than anything else.
385	I think I tried that at some point. I ran into issues with the fact that the fallback TSD is a shared one (e.g. with a mutex) while using exclusive per-thread TSDs (no locks). This ended up with both TSD versions having to be present in the code (currently it's one or the other) and added a layer of complexity that ended up being counterintuitive to me. There is likely a more clever way to do it that I haven't thought about, though. I'll have another look.
lib/scudo/scudo_tsd.h
27	I think I will indeed have to keep the previous multiple .h/.inc/.cpp organization.

cryptoad added a comment.

In D37590#867633, @dvyukov wrote:

It would be interesting to benchmark Android with cache per thread model as well.

As of now, Android still doesn't have ELF TLS but is using emutls for its thread_local variables, which doesn't work for us.

dvyukov added a comment.

In D37590#868114, @cryptoad wrote:

In D37590#867633, @dvyukov wrote:

It would be interesting to benchmark Android with cache per thread model as well.

As of now, Android still doesn't have ELF TLS but is using emutls for its thread_local variables, which doesn't work for us.

We do have a TLS slot, check out get_android_tls_ptr.

cryptoad added a comment.

In D37590#868133, @dvyukov wrote:

We do have a TLS slot, check out get_android_tls_ptr.

My bad, misunderstood your comment.

Abandoning this, I am gonna work on an actual CL now.

cryptoad mentioned this in D38139: [scudo] Scudo thread specific data refactor, part 1. · Sep 21 2017, 9:12 AM

cryptoad mentioned this in rL313987: [scudo] Scudo thread specific data refactor, part 1. · Sep 22 2017, 8:37 AM

Revision Contents

Path (Size)

lib/scudo/CMakeLists.txt (3 lines)
lib/scudo/scudo_allocator.cpp (58 lines)
lib/scudo/scudo_platform.h (35 lines)
lib/scudo/scudo_tls_android.cpp
lib/scudo/scudo_tls_android.inc
lib/scudo/scudo_tls_context_android.inc
lib/scudo/scudo_tls_context_linux.inc

131 lines

141 lines

Diff 114247

lib/scudo/CMakeLists.txt

	add_compiler_rt_component(scudo)			add_compiler_rt_component(scudo)

	include_directories(..)			include_directories(..)

	set(SCUDO_CFLAGS ${SANITIZER_COMMON_CFLAGS})			set(SCUDO_CFLAGS ${SANITIZER_COMMON_CFLAGS})
	# SANITIZER_COMMON_CFLAGS include -fno-builtin, but we actually want builtins!			# SANITIZER_COMMON_CFLAGS include -fno-builtin, but we actually want builtins!
	list(APPEND SCUDO_CFLAGS -fbuiltin)			list(APPEND SCUDO_CFLAGS -fbuiltin)
	append_rtti_flag(OFF SCUDO_CFLAGS)			append_rtti_flag(OFF SCUDO_CFLAGS)

	set(SCUDO_SOURCES			set(SCUDO_SOURCES
	scudo_allocator.cpp			scudo_allocator.cpp
	scudo_flags.cpp			scudo_flags.cpp
	scudo_crc32.cpp			scudo_crc32.cpp
	scudo_interceptors.cpp			scudo_interceptors.cpp
	scudo_new_delete.cpp			scudo_new_delete.cpp
	scudo_termination.cpp			scudo_termination.cpp
	scudo_tls_android.cpp			scudo_tsd.cpp
	scudo_tls_linux.cpp
	scudo_utils.cpp)			scudo_utils.cpp)

	# Enable the SSE 4.2 instruction set for scudo_crc32.cpp, if available.			# Enable the SSE 4.2 instruction set for scudo_crc32.cpp, if available.
	if (COMPILER_RT_HAS_MSSE4_2_FLAG)			if (COMPILER_RT_HAS_MSSE4_2_FLAG)
	set_source_files_properties(scudo_crc32.cpp PROPERTIES COMPILE_FLAGS -msse4.2)			set_source_files_properties(scudo_crc32.cpp PROPERTIES COMPILE_FLAGS -msse4.2)
	endif()			endif()

	# Enable the AArch64 CRC32 feature for scudo_crc32.cpp, if available.			# Enable the AArch64 CRC32 feature for scudo_crc32.cpp, if available.
	[remaining 18 lines not shown]

lib/scudo/scudo_allocator.cpp

[first 11 lines not shown]
/// heap corruption vulnerabilities. It provides a checksum-guarded chunk		/// heap corruption vulnerabilities. It provides a checksum-guarded chunk
/// header, a delayed free list, and additional sanity checks.		/// header, a delayed free list, and additional sanity checks.
///		///
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "scudo_allocator.h"		#include "scudo_allocator.h"
#include "scudo_crc32.h"		#include "scudo_crc32.h"
#include "scudo_flags.h"		#include "scudo_flags.h"
#include "scudo_tls.h"		#include "scudo_tsd.h"
#include "scudo_utils.h"		#include "scudo_utils.h"

#include "sanitizer_common/sanitizer_allocator_checks.h"		#include "sanitizer_common/sanitizer_allocator_checks.h"
#include "sanitizer_common/sanitizer_allocator_interface.h"		#include "sanitizer_common/sanitizer_allocator_interface.h"
#include "sanitizer_common/sanitizer_errno.h"		#include "sanitizer_common/sanitizer_errno.h"
#include "sanitizer_common/sanitizer_quarantine.h"		#include "sanitizer_common/sanitizer_quarantine.h"

#include <string.h>		#include <string.h>
[216 lines not shown] context: struct QuarantineCallback {
}		}

AllocatorCache *Cache_;		AllocatorCache *Cache_;
};		};

typedef Quarantine<QuarantineCallback, ScudoChunk> ScudoQuarantine;		typedef Quarantine<QuarantineCallback, ScudoChunk> ScudoQuarantine;
typedef ScudoQuarantine::Cache ScudoQuarantineCache;		typedef ScudoQuarantine::Cache ScudoQuarantineCache;
COMPILER_CHECK(sizeof(ScudoQuarantineCache) <=		COMPILER_CHECK(sizeof(ScudoQuarantineCache) <=
sizeof(ScudoThreadContext::QuarantineCachePlaceHolder));		sizeof(ScudoTSD::QuarantineCachePlaceHolder));

AllocatorCache *getAllocatorCache(ScudoThreadContext *ThreadContext) {		AllocatorCache *getAllocatorCache(ScudoTSD *TSD) {
return &ThreadContext->Cache;		return &TSD->Cache;
}		}

ScudoQuarantineCache *getQuarantineCache(ScudoThreadContext *ThreadContext) {		ScudoQuarantineCache *getQuarantineCache(ScudoTSD *TSD) {
return reinterpret_cast<		return reinterpret_cast<ScudoQuarantineCache *>(
ScudoQuarantineCache *>(ThreadContext->QuarantineCachePlaceHolder);		TSD->QuarantineCachePlaceHolder);
}		}

ScudoPrng *getPrng(ScudoThreadContext *ThreadContext) {		ScudoPrng *getPrng(ScudoTSD *TSD) {
return &ThreadContext->Prng;		return &TSD->Prng;
}		}

struct ScudoAllocator {		struct ScudoAllocator {
static const uptr MaxAllowedMallocSize =		static const uptr MaxAllowedMallocSize =
FIRST_32_SECOND_64(2UL << 30, 1ULL << 40);		FIRST_32_SECOND_64(2UL << 30, 1ULL << 40);

typedef ReturnNullOrDieOnFailure FailureHandler;		typedef ReturnNullOrDieOnFailure FailureHandler;

[102 lines not shown] context: void *allocate(uptr Size, uptr Alignment, AllocType Type,
// but the Secondary will take care of its own alignment needs.		// but the Secondary will take care of its own alignment needs.
bool FromPrimary = PrimaryAllocator::CanAllocate(AlignedSize, MinAlignment);		bool FromPrimary = PrimaryAllocator::CanAllocate(AlignedSize, MinAlignment);

void *Ptr;		void *Ptr;
u8 Salt;		u8 Salt;
uptr AllocSize;		uptr AllocSize;
if (FromPrimary) {		if (FromPrimary) {
AllocSize = AlignedSize;		AllocSize = AlignedSize;
ScudoThreadContext *ThreadContext = getThreadContextAndLock();		ScudoTSD *TSD = getTSDAndLock();
if (LIKELY(ThreadContext)) {		if (LIKELY(TSD)) {
Salt = getPrng(ThreadContext)->getU8();		Salt = getPrng(TSD)->getU8();
Ptr = BackendAllocator.allocatePrimary(getAllocatorCache(ThreadContext),		Ptr = BackendAllocator.allocatePrimary(getAllocatorCache(TSD),
AllocSize);		AllocSize);
ThreadContext->unlock();		TSD->unlock();
} else {		} else {
SpinMutexLock l(&FallbackMutex);		SpinMutexLock l(&FallbackMutex);
Salt = FallbackPrng.getU8();		Salt = FallbackPrng.getU8();
Ptr = BackendAllocator.allocatePrimary(&FallbackAllocatorCache,		Ptr = BackendAllocator.allocatePrimary(&FallbackAllocatorCache,
AllocSize);		AllocSize);
}		}
} else {		} else {
{		{
[51 lines not shown] context: struct ScudoAllocator {
void quarantineOrDeallocateChunk(ScudoChunk *Chunk, UnpackedHeader *Header,		void quarantineOrDeallocateChunk(ScudoChunk *Chunk, UnpackedHeader *Header,
uptr Size) {		uptr Size) {
const bool BypassQuarantine = (AllocatorQuarantine.GetCacheSize() == 0) ||		const bool BypassQuarantine = (AllocatorQuarantine.GetCacheSize() == 0) ||
(Size > QuarantineChunksUpToSize);		(Size > QuarantineChunksUpToSize);
if (BypassQuarantine) {		if (BypassQuarantine) {
Chunk->eraseHeader();		Chunk->eraseHeader();
void *Ptr = Chunk->getAllocBeg(Header);		void *Ptr = Chunk->getAllocBeg(Header);
if (Header->FromPrimary) {		if (Header->FromPrimary) {
ScudoThreadContext *ThreadContext = getThreadContextAndLock();		ScudoTSD *TSD = getTSDAndLock();
if (LIKELY(ThreadContext)) {		if (LIKELY(TSD)) {
getBackendAllocator().deallocatePrimary(		getBackendAllocator().deallocatePrimary(getAllocatorCache(TSD), Ptr);
getAllocatorCache(ThreadContext), Ptr);		TSD->unlock();
ThreadContext->unlock();
} else {		} else {
SpinMutexLock Lock(&FallbackMutex);		SpinMutexLock Lock(&FallbackMutex);
getBackendAllocator().deallocatePrimary(&FallbackAllocatorCache, Ptr);		getBackendAllocator().deallocatePrimary(&FallbackAllocatorCache, Ptr);
}		}
} else {		} else {
getBackendAllocator().deallocateSecondary(Ptr);		getBackendAllocator().deallocateSecondary(Ptr);
}		}
} else {		} else {
// If a small memory amount was allocated with a larger alignment, we want		// If a small memory amount was allocated with a larger alignment, we want
// to take that into account. Otherwise the Quarantine would be filled		// to take that into account. Otherwise the Quarantine would be filled
// with tiny chunks, taking a lot of VA memory. This is an approximation		// with tiny chunks, taking a lot of VA memory. This is an approximation
// of the usable size, that allows us to not call		// of the usable size, that allows us to not call
// GetActuallyAllocatedSize.		// GetActuallyAllocatedSize.
uptr EstimatedSize = Size + (Header->Offset << MinAlignmentLog);		uptr EstimatedSize = Size + (Header->Offset << MinAlignmentLog);
UnpackedHeader NewHeader = *Header;		UnpackedHeader NewHeader = *Header;
NewHeader.State = ChunkQuarantine;		NewHeader.State = ChunkQuarantine;
Chunk->compareExchangeHeader(&NewHeader, Header);		Chunk->compareExchangeHeader(&NewHeader, Header);
ScudoThreadContext *ThreadContext = getThreadContextAndLock();		ScudoTSD *TSD = getTSDAndLock();
if (LIKELY(ThreadContext)) {		if (LIKELY(TSD)) {
AllocatorQuarantine.Put(getQuarantineCache(ThreadContext),		AllocatorQuarantine.Put(getQuarantineCache(TSD),
QuarantineCallback(		QuarantineCallback(getAllocatorCache(TSD)),
getAllocatorCache(ThreadContext)),
Chunk, EstimatedSize);		Chunk, EstimatedSize);
ThreadContext->unlock();		TSD->unlock();
} else {		} else {
SpinMutexLock l(&FallbackMutex);		SpinMutexLock l(&FallbackMutex);
AllocatorQuarantine.Put(&FallbackQuarantineCache,		AllocatorQuarantine.Put(&FallbackQuarantineCache,
QuarantineCallback(&FallbackAllocatorCache),		QuarantineCallback(&FallbackAllocatorCache),
Chunk, EstimatedSize);		Chunk, EstimatedSize);
}		}
}		}
}		}
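A worked version of the `EstimatedSize` approximation above, assuming (as the surrounding code suggests) that `Header->Offset` stores the distance from the start of the backend block to the chunk in `MinAlignment` units, and assuming `MinAlignmentLog = 4` (16 bytes) purely for illustration. For a 16-byte request aligned to 64 bytes whose chunk starts 48 bytes into its block, `Offset` is 3 and the estimate is 16 + (3 << 4) = 64 bytes, so the quarantine is charged for the alignment padding as well.

```cpp
#include <cassert>
#include <cstdint>

// Assumption for illustration: a 16-byte minimum alignment (log2 = 4).
constexpr std::uint64_t MinAlignmentLog = 4;
constexpr std::uint64_t MinAlignment = std::uint64_t{1} << MinAlignmentLog;

// Converts a byte distance into the MinAlignment units stored in the header.
inline std::uint64_t toOffsetUnits(std::uint64_t BytesIntoBlock) {
  assert(BytesIntoBlock % MinAlignment == 0);
  return BytesIntoBlock >> MinAlignmentLog;
}

// Same formula as the diff: requested size plus the padding that sits in
// front of the chunk inside the backend block.
constexpr std::uint64_t estimatedSize(std::uint64_t Size, std::uint64_t Offset) {
  return Size + (Offset << MinAlignmentLog);
}
```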
[102 lines not shown] context: struct ScudoAllocator {

void *calloc(uptr NMemB, uptr Size) {		void *calloc(uptr NMemB, uptr Size) {
initThreadMaybe();		initThreadMaybe();
if (UNLIKELY(CheckForCallocOverflow(NMemB, Size)))		if (UNLIKELY(CheckForCallocOverflow(NMemB, Size)))
return FailureHandler::OnBadRequest();		return FailureHandler::OnBadRequest();
return allocate(NMemB * Size, MinAlignment, FromMalloc, true);		return allocate(NMemB * Size, MinAlignment, FromMalloc, true);
}		}

void commitBack(ScudoThreadContext *ThreadContext) {		void commitBack(ScudoTSD *TSD) {
AllocatorCache *Cache = getAllocatorCache(ThreadContext);		AllocatorCache *Cache = getAllocatorCache(TSD);
AllocatorQuarantine.Drain(getQuarantineCache(ThreadContext),		AllocatorQuarantine.Drain(getQuarantineCache(TSD),
QuarantineCallback(Cache));		QuarantineCallback(Cache));
BackendAllocator.destroyCache(Cache);		BackendAllocator.destroyCache(Cache);
}		}

uptr getStats(AllocatorStat StatType) {		uptr getStats(AllocatorStat StatType) {
initThreadMaybe();		initThreadMaybe();
uptr stats[AllocatorStatCount];		uptr stats[AllocatorStatCount];
BackendAllocator.getStats(stats);		BackendAllocator.getStats(stats);
return stats[StatType];		return stats[StatType];
}		}
};		};

static ScudoAllocator Instance(LINKER_INITIALIZED);		static ScudoAllocator Instance(LINKER_INITIALIZED);

static ScudoBackendAllocator &getBackendAllocator() {		static ScudoBackendAllocator &getBackendAllocator() {
return Instance.BackendAllocator;		return Instance.BackendAllocator;
}		}

static void initScudoInternal(const AllocatorOptions &Options) {		static void initScudoInternal(const AllocatorOptions &Options) {
Instance.init(Options);		Instance.init(Options);
}		}

void ScudoThreadContext::init() {		void ScudoTSD::init() {
getBackendAllocator().initCache(&Cache);		getBackendAllocator().initCache(&Cache);
Prng.init();		Prng.init();
memset(QuarantineCachePlaceHolder, 0, sizeof(QuarantineCachePlaceHolder));		memset(QuarantineCachePlaceHolder, 0, sizeof(QuarantineCachePlaceHolder));
}		}

void ScudoThreadContext::commitBack() {		void ScudoTSD::commitBack() {
Instance.commitBack(this);		Instance.commitBack(this);
}		}

void *scudoMalloc(uptr Size, AllocType Type) {		void *scudoMalloc(uptr Size, AllocType Type) {
return SetErrnoOnNull(Instance.allocate(Size, MinAlignment, Type));		return SetErrnoOnNull(Instance.allocate(Size, MinAlignment, Type));
}		}

void scudoFree(void *Ptr, AllocType Type) {		void scudoFree(void *Ptr, AllocType Type) {
[remaining 102 lines not shown]
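`ScudoQuarantineCache` is only defined inside scudo_allocator.cpp (it depends on `QuarantineCallback`), so the TSD reserves `uptr QuarantineCachePlaceHolder[4]` and the allocator checks at compile time that the real type fits. The sketch below reproduces that pattern with a hypothetical `FakeQuarantineCache`; it uses `static_assert` in place of `COMPILER_CHECK`, and placement `new` where Scudo zeroes the words and uses `reinterpret_cast`.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Hypothetical stand-in for ScudoQuarantineCache.
struct FakeQuarantineCache {
  std::uintptr_t Size;
  void *Head;
  void *Tail;
};

// The TSD only reserves raw words, so its header never needs the full type.
struct SketchTSD {
  alignas(FakeQuarantineCache) std::uintptr_t QuarantineCachePlaceHolder[4];
};

// Compile-time counterpart of the COMPILER_CHECK in scudo_allocator.cpp.
static_assert(sizeof(FakeQuarantineCache) <=
                  sizeof(SketchTSD::QuarantineCachePlaceHolder),
              "quarantine cache does not fit in the TSD placeholder");

// Placement new is the portable way to begin the object's lifetime in the
// reserved storage; the result aliases the placeholder words.
inline FakeQuarantineCache *initQuarantineCache(SketchTSD *T) {
  assert(T != nullptr);
  return new (T->QuarantineCachePlaceHolder)
      FakeQuarantineCache{0, nullptr, nullptr};
}
```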

lib/scudo/scudo_platform.h

This file was added.

				//===-- scudo_platform.h ----------------------------------------- C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// Scudo platform specific definitions.
				///
				//===----------------------------------------------------------------------===//

				#ifndef SCUDO_PLATFORM_H_
				#define SCUDO_PLATFORM_H_

				#include "sanitizer_common/sanitizer_platform.h"

				#if !SANITIZER_LINUX && !SANITIZER_FUCHSIA
				# error "The Scudo hardened allocator is currently not supported on this platform."
				#endif

				// Android and Fuchsia use a pool of TSDs shared between threads.
				#if SANITIZER_ANDROID || SANITIZER_FUCHSIA
				# define SCUDO_TSD_EXCLUSIVE 0
				#endif // SANITIZER_ANDROID || SANITIZER_FUCHSIA

				// Non-Android Linux uses an exclusive TSD per thread.
				#if SANITIZER_LINUX && !SANITIZER_ANDROID
				# define SCUDO_TSD_EXCLUSIVE 1
				#endif // SANITIZER_LINUX && !SANITIZER_ANDROID

				// TODO(kostyak): ideally we would like a replacement to NanoTime that can
				// leverage the vDSO instead of doing a syscall each time.
				#endif // SCUDO_PLATFORM_H_
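The summary suggests the model could eventually be chosen via defines rather than per platform. A minimal sketch of how the selection above might honor a build-time override, assuming the compiler-predefined macros `__ANDROID__`, `__Fuchsia__` and `__linux__` in place of the `SANITIZER_*` ones so that it compiles standalone. Android is tested first because it also defines `__linux__`.

```cpp
// An explicit -DSCUDO_TSD_EXCLUSIVE=0 or =1 from the build takes priority.
#ifndef SCUDO_TSD_EXCLUSIVE
# if defined(__ANDROID__) || defined(__Fuchsia__)
#  define SCUDO_TSD_EXCLUSIVE 0  // pool of TSDs shared between threads
# elif defined(__linux__)
#  define SCUDO_TSD_EXCLUSIVE 1  // one TSD per thread
# else
#  error "No default TSD model for this platform; define SCUDO_TSD_EXCLUSIVE."
# endif
#endif

// Reject values other than 0 and 1 early.
#if SCUDO_TSD_EXCLUSIVE != 0 && SCUDO_TSD_EXCLUSIVE != 1
# error "SCUDO_TSD_EXCLUSIVE must be 0 or 1."
#endif

constexpr bool TSDExclusive = SCUDO_TSD_EXCLUSIVE == 1;
```

With this shape, passing `-DSCUDO_TSD_EXCLUSIVE=0` would give a Linux build the shared model, e.g. for comparing both models on the same platform.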

lib/scudo/scudo_tls_android.cpp

This file was deleted.

	//===-- scudo_tls_android.cpp ------------------------------------ C++ --===//
	//
	// The LLVM Compiler Infrastructure
	//
	// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.
	//
	//===----------------------------------------------------------------------===//
	///
	/// Scudo thread local structure implementation for Android.
	///
	//===----------------------------------------------------------------------===//

	#include "sanitizer_common/sanitizer_platform.h"

	#if SANITIZER_LINUX && SANITIZER_ANDROID

	#include "scudo_tls.h"

	#include <pthread.h>

	namespace __scudo {

	static pthread_once_t GlobalInitialized = PTHREAD_ONCE_INIT;
	static pthread_key_t PThreadKey;

	static atomic_uint32_t ThreadContextCurrentIndex;
	static ScudoThreadContext *ThreadContexts;
	static uptr NumberOfContexts;

	// sysconf(_SC_NPROCESSORS_{CONF,ONLN}) cannot be used as they allocate memory.
	static uptr getNumberOfCPUs() {
	cpu_set_t CPUs;
	CHECK_EQ(sched_getaffinity(0, sizeof(cpu_set_t), &CPUs), 0);
	return CPU_COUNT(&CPUs);
	}

	static void initOnce() {
	// Hack: TLS_SLOT_TSAN was introduced in N. To be able to use it on M for
	// testing, we create an unused key. Since the key_data array follows the tls
	// array, it basically gives us the extra entry we need.
	// TODO(kostyak): remove and restrict to N and above.
	CHECK_EQ(pthread_key_create(&PThreadKey, NULL), 0);
	initScudo();
	NumberOfContexts = getNumberOfCPUs();
	ThreadContexts = reinterpret_cast<ScudoThreadContext *>(
	MmapOrDie(sizeof(ScudoThreadContext) * NumberOfContexts, __func__));
	for (uptr i = 0; i < NumberOfContexts; i++)
	ThreadContexts[i].init();
	}

	void initThread() {
	pthread_once(&GlobalInitialized, initOnce);
	// Initial context assignment is done in a plain round-robin fashion.
	u32 Index = atomic_fetch_add(&ThreadContextCurrentIndex, 1,
	memory_order_relaxed);
	ScudoThreadContext *ThreadContext =
	&ThreadContexts[Index % NumberOfContexts];
	*get_android_tls_ptr() = reinterpret_cast<uptr>(ThreadContext);
	}

	ScudoThreadContext *getThreadContextAndLockSlow() {
	ScudoThreadContext *ThreadContext;
	// Go through all the contexts and find the first unlocked one.
	for (u32 i = 0; i < NumberOfContexts; i++) {
	ThreadContext = &ThreadContexts[i];
	if (ThreadContext->tryLock()) {
	*get_android_tls_ptr() = reinterpret_cast<uptr>(ThreadContext);
	return ThreadContext;
	}
	}
	// No luck, find the one with the lowest precedence, and slow lock it.
	u64 Precedence = UINT64_MAX;
	for (u32 i = 0; i < NumberOfContexts; i++) {
	u64 SlowLockPrecedence = ThreadContexts[i].getSlowLockPrecedence();
	if (SlowLockPrecedence && SlowLockPrecedence < Precedence) {
	ThreadContext = &ThreadContexts[i];
	Precedence = SlowLockPrecedence;
	}
	}
	if (LIKELY(Precedence != UINT64_MAX)) {
	ThreadContext->lock();
	*get_android_tls_ptr() = reinterpret_cast<uptr>(ThreadContext);
	return ThreadContext;
	}
	// Last resort (can this happen?), stick with the current one.
	ThreadContext =
	reinterpret_cast<ScudoThreadContext *>(*get_android_tls_ptr());
	ThreadContext->lock();
	return ThreadContext;
	}

	} // namespace __scudo

	#endif // SANITIZER_LINUX && SANITIZER_ANDROID
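The slow path above can be summarized as a three-step choice. The sketch below is a pure function over hypothetical `TSDSnapshot` values, so it can be tested without threads; unlike the real code, which locks a TSD as soon as `tryLock()` succeeds, it only decides which index to use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical snapshot of one shared TSD, for illustration only.
struct TSDSnapshot {
  bool Busy;                 // would tryLock() fail right now?
  std::uint64_t Precedence;  // time of first failed tryLock(); 0 = none
};

// Mirrors the order of getThreadContextAndLockSlow():
//  1. the first TSD that can be locked without waiting;
//  2. else the busy TSD with the smallest non-zero precedence;
//  3. else the thread's current TSD.
inline std::size_t pickTSD(const std::vector<TSDSnapshot> &TSDs,
                           std::size_t Current) {
  assert(Current < TSDs.size());
  for (std::size_t I = 0; I < TSDs.size(); I++)
    if (!TSDs[I].Busy)
      return I;
  std::uint64_t Lowest = UINT64_MAX;
  std::size_t Choice = Current;
  for (std::size_t I = 0; I < TSDs.size(); I++) {
    const std::uint64_t P = TSDs[I].Precedence;
    if (P != 0 && P < Lowest) {
      Lowest = P;
      Choice = I;
    }
  }
  return Choice;
}
```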

lib/scudo/scudo_tls_android.inc

This file was deleted.

	//===-- scudo_tls_android.inc ------------------------------------ C++ --===//
	//
	// The LLVM Compiler Infrastructure
	//
	// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.
	//
	//===----------------------------------------------------------------------===//
	///
	/// Scudo thread local structure fastpath functions implementation for Android.
	///
	//===----------------------------------------------------------------------===//

	#ifndef SCUDO_TLS_ANDROID_H_
	#define SCUDO_TLS_ANDROID_H_

	#ifndef SCUDO_TLS_H_
	# error "This file must be included inside scudo_tls.h."
	#endif // SCUDO_TLS_H_

	#if SANITIZER_LINUX && SANITIZER_ANDROID

	ALWAYS_INLINE void initThreadMaybe() {
	if (LIKELY(*get_android_tls_ptr()))
	return;
	initThread();
	}

	ScudoThreadContext *getThreadContextAndLockSlow();

	ALWAYS_INLINE ScudoThreadContext *getThreadContextAndLock() {
	ScudoThreadContext *ThreadContext =
	reinterpret_cast<ScudoThreadContext *>(*get_android_tls_ptr());
	CHECK(ThreadContext);
	// Try to lock the currently associated context.
	if (ThreadContext->tryLock())
	return ThreadContext;
	// If it failed, go the slow path.
	return getThreadContextAndLockSlow();
	}

	#endif // SANITIZER_LINUX && SANITIZER_ANDROID

	#endif // SCUDO_TLS_ANDROID_H_

lib/scudo/scudo_tls_context_android.inc

This file was deleted.

	//===-- scudo_tls_context_android.inc ---------------------------- C++ --===//
	//
	// The LLVM Compiler Infrastructure
	//
	// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.
	//
	//===----------------------------------------------------------------------===//
	///
	/// Android specific base thread context definition.
	///
	//===----------------------------------------------------------------------===//

	#ifndef SCUDO_TLS_CONTEXT_ANDROID_INC_
	#define SCUDO_TLS_CONTEXT_ANDROID_INC_

	#ifndef SCUDO_TLS_H_
	# error "This file must be included inside scudo_tls.h."
	#endif // SCUDO_TLS_H_

	#if SANITIZER_LINUX && SANITIZER_ANDROID

	struct ScudoThreadContextPlatform {
	INLINE bool tryLock() {
	if (Mutex.TryLock()) {
	atomic_store_relaxed(&SlowLockPrecedence, 0);
	return true;
	}
	if (atomic_load_relaxed(&SlowLockPrecedence) == 0)
	atomic_store_relaxed(&SlowLockPrecedence, NanoTime());
	return false;
	}

	INLINE void lock() {
	Mutex.Lock();
	atomic_store_relaxed(&SlowLockPrecedence, 0);
	}

	INLINE void unlock() {
	Mutex.Unlock();
	}

	INLINE u64 getSlowLockPrecedence() {
	return atomic_load_relaxed(&SlowLockPrecedence);
	}

	private:
	StaticSpinMutex Mutex;
	atomic_uint64_t SlowLockPrecedence;
	};

	#endif // SANITIZER_LINUX && SANITIZER_ANDROID

	#endif // SCUDO_TLS_CONTEXT_ANDROID_INC_
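How a failed `tryLock()` feeds the slow path: the first failure since the TSD was last acquired stamps the current time into its precedence, and a successful acquisition clears it. In this sketch, `fakeNanoTime()` is a hypothetical controllable clock replacing `NanoTime()`, and `SpinLock` is a minimal `std::atomic_flag` lock standing in for `StaticSpinMutex`; a spin lock also keeps a same-thread `tryLock()` well defined, which `std::mutex::try_lock` would not be.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical clock for the demo; the real code calls NanoTime().
static std::uint64_t FakeNow = 0;
inline std::uint64_t fakeNanoTime() { return FakeNow; }

// Minimal spin lock standing in for StaticSpinMutex.
class SpinLock {
 public:
  bool tryLock() { return !Flag.test_and_set(std::memory_order_acquire); }
  void lock() {
    while (!tryLock()) {
    }
  }
  void unlock() { Flag.clear(std::memory_order_release); }

 private:
  std::atomic_flag Flag = ATOMIC_FLAG_INIT;
};

struct ContendedTSD {
  bool tryLock() {
    if (Mutex.tryLock()) {
      Precedence.store(0, std::memory_order_relaxed);  // acquired: clear
      return true;
    }
    // Only the first failure since the last acquisition is stamped.
    if (Precedence.load(std::memory_order_relaxed) == 0)
      Precedence.store(fakeNanoTime(), std::memory_order_relaxed);
    return false;
  }
  void lock() {
    Mutex.lock();
    Precedence.store(0, std::memory_order_relaxed);
  }
  void unlock() { Mutex.unlock(); }
  std::uint64_t getPrecedence() const {
    return Precedence.load(std::memory_order_relaxed);
  }

 private:
  SpinLock Mutex;
  std::atomic<std::uint64_t> Precedence{0};
};
```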

lib/scudo/scudo_tls_context_linux.inc

This file was deleted.

	//===-- scudo_tls_context_linux.inc ------------------------------ C++ --===//
	//
	// The LLVM Compiler Infrastructure
	//
	// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.
	//
	//===----------------------------------------------------------------------===//
	///
	/// Linux specific base thread context definition.
	///
	//===----------------------------------------------------------------------===//

	#ifndef SCUDO_TLS_CONTEXT_LINUX_INC_
	#define SCUDO_TLS_CONTEXT_LINUX_INC_

	#ifndef SCUDO_TLS_H_
	# error "This file must be included inside scudo_tls.h."
	#endif // SCUDO_TLS_H_

	#if SANITIZER_LINUX && !SANITIZER_ANDROID

	struct ScudoThreadContextPlatform {
	ALWAYS_INLINE void unlock() {}
	};

	#endif // SANITIZER_LINUX && !SANITIZER_ANDROID

	#endif // SCUDO_TLS_CONTEXT_LINUX_INC_

lib/scudo/scudo_tls_linux.cpp

This file was deleted.

	//===-- scudo_tls_linux.cpp -------------------------------------- C++ --===//
	//
	// The LLVM Compiler Infrastructure
	//
	// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.
	//
	//===----------------------------------------------------------------------===//
	///
	/// Scudo thread local structure implementation for platforms supporting
	/// thread_local.
	///
	//===----------------------------------------------------------------------===//

	#include "sanitizer_common/sanitizer_platform.h"

	#if SANITIZER_LINUX && !SANITIZER_ANDROID

	#include "scudo_tls.h"

	#include <pthread.h>

	namespace __scudo {

	static pthread_once_t GlobalInitialized = PTHREAD_ONCE_INIT;
	static pthread_key_t PThreadKey;

	__attribute__((tls_model("initial-exec")))
	THREADLOCAL ThreadState ScudoThreadState = ThreadNotInitialized;
	__attribute__((tls_model("initial-exec")))
	THREADLOCAL ScudoThreadContext ThreadLocalContext;

	static void teardownThread(void *Ptr) {
	uptr I = reinterpret_cast<uptr>(Ptr);
	// The glibc POSIX thread-local-storage deallocation routine calls user
	// provided destructors in a loop of PTHREAD_DESTRUCTOR_ITERATIONS.
	// We want to be called last since other destructors might call free and the
	// like, so we wait until PTHREAD_DESTRUCTOR_ITERATIONS before draining the
	// quarantine and swallowing the cache.
	if (I > 1) {
	// If pthread_setspecific fails, we will go ahead with the teardown.
	if (LIKELY(pthread_setspecific(PThreadKey,
	reinterpret_cast<void *>(I - 1)) == 0))
	return;
	}
	ThreadLocalContext.commitBack();
	ScudoThreadState = ThreadTornDown;
	}


	static void initOnce() {
	CHECK_EQ(pthread_key_create(&PThreadKey, teardownThread), 0);
	initScudo();
	}

	void initThread() {
	CHECK_EQ(pthread_once(&GlobalInitialized, initOnce), 0);
	CHECK_EQ(pthread_setspecific(PThreadKey, reinterpret_cast<void *>(
	GetPthreadDestructorIterations())), 0);
	ThreadLocalContext.init();
	ScudoThreadState = ThreadInitialized;
	}

	} // namespace __scudo

	#endif // SANITIZER_LINUX && !SANITIZER_ANDROID
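The deferral trick in `teardownThread()` relies on POSIX key semantics: at thread exit the key's value is set to NULL before the destructor is called, and destructors that store a new non-NULL value are called again, for at least `PTHREAD_DESTRUCTOR_ITERATIONS` rounds (glibc stops there, as the comment notes). The sketch below simulates that loop in plain C++ (no real pthread keys; `FakeKey` and the helpers are hypothetical) to show why `initThread()` seeds the key with `GetPthreadDestructorIterations()`: the real teardown lands in the last permitted round, while a seed larger than the number of rounds actually run would skip it.

```cpp
#include <cassert>
#include <cstdint>

// Simulation only: a fake pthread key slot plus bookkeeping for the demo.
struct FakeKey {
  std::uintptr_t Value = 0;  // what pthread_getspecific() would return
  bool TornDown = false;
  int Calls = 0;
};

// Same decision as teardownThread(): while the counter is above 1, re-arm
// the key with counter - 1 and return; the last call does the real work.
inline void teardownThreadSketch(FakeKey &Key, std::uintptr_t I) {
  assert(I != 0);  // destructors only run for non-null values
  ++Key.Calls;
  if (I > 1) {
    Key.Value = I - 1;  // stands in for pthread_setspecific(PThreadKey, I - 1)
    return;
  }
  Key.TornDown = true;  // stands in for commitBack() + ThreadTornDown
}

// Models thread exit: the slot is cleared before the destructor runs, and
// the destructor is re-run while the slot is non-null, up to MaxIterations.
inline void simulateThreadExit(FakeKey &Key, int MaxIterations) {
  for (int Round = 0; Round < MaxIterations && Key.Value != 0; Round++) {
    const std::uintptr_t Old = Key.Value;
    Key.Value = 0;
    teardownThreadSketch(Key, Old);
  }
}
```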

lib/scudo/scudo_tls_linux.inc

This file was deleted.

	//===-- scudo_tls_linux.inc -------------------------------------- C++ --===//
	//
	// The LLVM Compiler Infrastructure
	//
	// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.
	//
	//===----------------------------------------------------------------------===//
	///
	/// Scudo thread local structure fastpath functions implementation for platforms
	/// supporting thread_local.
	///
	//===----------------------------------------------------------------------===//

	#ifndef SCUDO_TLS_LINUX_H_
	#define SCUDO_TLS_LINUX_H_

	#ifndef SCUDO_TLS_H_
	# error "This file must be included inside scudo_tls.h."
	#endif // SCUDO_TLS_H_

	#if SANITIZER_LINUX && !SANITIZER_ANDROID

	enum ThreadState : u8 {
	ThreadNotInitialized = 0,
	ThreadInitialized,
	ThreadTornDown,
	};
	__attribute__((tls_model("initial-exec")))
	extern THREADLOCAL ThreadState ScudoThreadState;
	__attribute__((tls_model("initial-exec")))
	extern THREADLOCAL ScudoThreadContext ThreadLocalContext;

	ALWAYS_INLINE void initThreadMaybe() {
	if (LIKELY(ScudoThreadState != ThreadNotInitialized))
	return;
	initThread();
	}

	ALWAYS_INLINE ScudoThreadContext *getThreadContextAndLock() {
	if (UNLIKELY(ScudoThreadState == ThreadTornDown))
	return nullptr;
	return &ThreadLocalContext;
	}

	#endif // SANITIZER_LINUX && !SANITIZER_ANDROID

	#endif // SCUDO_TLS_LINUX_H_
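A small model of the exclusive fast path above, with hypothetical names (`Context`, `InitCalls`, `getContextAndLock()`). It highlights two properties of the three-state design: after the first call, `initThreadMaybe()` costs one thread-local load, and a thread that has been torn down is not silently re-initialized; its `getThreadContextAndLock()` returns null instead, which is what sends the allocator to its global fallback.

```cpp
#include <cassert>

enum ThreadState : unsigned char {
  ThreadNotInitialized = 0,
  ThreadInitialized,
  ThreadTornDown,
};

struct Context {
  int Uses = 0;
};

// Hypothetical per-thread state mirroring the exclusive model.
thread_local ThreadState State = ThreadNotInitialized;
thread_local Context LocalContext;
static int InitCalls = 0;  // demo bookkeeping

inline void initThread() {
  assert(State == ThreadNotInitialized);
  ++InitCalls;
  State = ThreadInitialized;
}

// Fast path: one thread-local load once the thread is set up. A torn-down
// thread is not re-initialized, because its state is not ThreadNotInitialized.
inline void initThreadMaybe() {
  if (State != ThreadNotInitialized)
    return;
  initThread();
}

// Exclusive model: no lock to take; null tells the caller to use the fallback.
inline Context *getContextAndLock() {
  if (State == ThreadTornDown)
    return nullptr;
  return &LocalContext;
}
```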

lib/scudo/scudo_tsd.h

This file was added.

				//===-- scudo_tsd.h ---------------------------------------------- C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// Scudo thread specific data definition.
				///
				//===----------------------------------------------------------------------===//

				#ifndef SCUDO_TSD_H_
				#define SCUDO_TSD_H_

				#include "scudo_allocator.h"
				#include "scudo_platform.h"
				#include "scudo_utils.h"

				#include "sanitizer_common/sanitizer_linux.h"

				#include <pthread.h>

				namespace __scudo {

				#if !SCUDO_TSD_EXCLUSIVE
				struct ScudoTSDPlatform {
				  INLINE bool tryLock() {
				    if (Mutex.TryLock()) {
				      atomic_store_relaxed(&Precedence, 0);
				      return true;
				    }
				    if (atomic_load_relaxed(&Precedence) == 0)
				      atomic_store_relaxed(&Precedence, NanoTime());
				    return false;
				  }

				  INLINE void lock() {
				    // TODO(kostyak): can the following prevent multiple threads from waiting on
				    // the same TSD that would have the lowest precedence?
				    // atomic_store_relaxed(&Precedence, NanoTime());
				    Mutex.Lock();
				    atomic_store_relaxed(&Precedence, 0);
				  }

				  INLINE void unlock() { Mutex.Unlock(); }

				  INLINE u64 getPrecedence() {
				    return atomic_load_relaxed(&Precedence);
				  }

				 private:
				  StaticSpinMutex Mutex;
				  atomic_uint64_t Precedence;
				};
				#else
				struct ScudoTSDPlatform {
				  ALWAYS_INLINE void unlock() {}
				};
				#endif // !SCUDO_TSD_EXCLUSIVE

				struct ALIGNED(64) ScudoTSD final : ScudoTSDPlatform {
				  AllocatorCache Cache;
				  ScudoPrng Prng;
				  uptr QuarantineCachePlaceHolder[4];
				  void init();
				  void commitBack();
				};

				void initThread();

				#if !SCUDO_TSD_EXCLUSIVE

				# if SANITIZER_ANDROID
				ALWAYS_INLINE ScudoTSD *getCurrentTSD() {
				  return reinterpret_cast<ScudoTSD *>(*get_android_tls_ptr());
				}
				# else
				extern pthread_key_t PThreadKey;

				ALWAYS_INLINE ScudoTSD *getCurrentTSD() {
				  return reinterpret_cast<ScudoTSD *>(pthread_getspecific(PThreadKey));
				}
				# endif // SANITIZER_ANDROID

				ALWAYS_INLINE void initThreadMaybe() {
				  if (LIKELY(getCurrentTSD()))
				    return;
				  initThread();
				}

				ScudoTSD *getTSDAndLockSlow();

				ALWAYS_INLINE ScudoTSD *getTSDAndLock() {
				  ScudoTSD *TSD = getCurrentTSD();
				  CHECK(TSD);
				  // Try to lock the currently associated context.
				  if (TSD->tryLock())
				    return TSD;
				  // If that failed, go to the slow path.
				  return getTSDAndLockSlow();
				}
				#else
				enum ThreadState : u8 {
				  ThreadNotInitialized = 0,
				  ThreadInitialized,
				  ThreadTornDown,
				};

				__attribute__((tls_model("initial-exec")))
				extern THREADLOCAL ThreadState ScudoThreadState;
				__attribute__((tls_model("initial-exec")))
				extern THREADLOCAL ScudoTSD TSD;

				ALWAYS_INLINE void initThreadMaybe() {
				  if (LIKELY(ScudoThreadState != ThreadNotInitialized))
				    return;
				  initThread();
				}

				ALWAYS_INLINE ScudoTSD *getTSDAndLock() {
				  if (UNLIKELY(ScudoThreadState == ThreadTornDown))
				    return nullptr;
				  return &TSD;
				}
				#endif // !SCUDO_TSD_EXCLUSIVE

				} // namespace __scudo

				#endif // SCUDO_TSD_H_
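Both ScudoTSDPlatform variants in this header expose unlock(), and in the exclusive model it is a no-op. Callers of getTSDAndLock() could therefore release the TSD through a scope guard instead of calling unlock() on every exit path, which is the RAII approach raised in review. The sketch below is minimal and standalone. ScopedTSDUnlock, the mock TSD types, lockAndGet() and allocateWith() are hypothetical names for illustration only, not Scudo or sanitizer_common code. The guard tolerates a null pointer because the exclusive getTSDAndLock() returns nullptr for a torn-down thread.

```cpp
#include <cassert>

// Hypothetical scope guard: calls unlock() on the TSD when it goes out of scope.
template <typename TSDT>
class ScopedTSDUnlock {
 public:
  explicit ScopedTSDUnlock(TSDT *TSD) : TSD(TSD) {}
  ~ScopedTSDUnlock() {
    if (TSD)
      TSD->unlock();
  }
  ScopedTSDUnlock(const ScopedTSDUnlock &) = delete;
  ScopedTSDUnlock &operator=(const ScopedTSDUnlock &) = delete;
  TSDT *get() const { return TSD; }

 private:
  TSDT *TSD;
};

// Mock standing in for a shared-model TSD: tracks the lock state so that a
// missing or doubled unlock() trips an assertion.
struct MockSharedTSD {
  bool Locked = false;
  int UnlockCalls = 0;
  MockSharedTSD *lockAndGet() {
    Locked = true;
    return this;
  }
  void unlock() {
    assert(Locked && "unlock() without a matching lock");
    Locked = false;
    UnlockCalls++;
  }
};

// Mock standing in for an exclusive-model TSD: unlock() is a no-op.
struct MockExclusiveTSD {
  MockExclusiveTSD *lockAndGet() { return this; }
  void unlock() {}
};

// Hypothetical allocation path with an early return; the guard unlocks on
// every exit. A null Source models a torn-down thread.
template <typename TSDT>
int allocateWith(TSDT *Source, bool ReturnEarly) {
  ScopedTSDUnlock<TSDT> Guard(Source ? Source->lockAndGet() : nullptr);
  if (!Guard.get())
    return -1;  // The caller needs a fallback path here (not shown).
  if (ReturnEarly)
    return 0;
  return 1;
}
```

Because the exclusive unlock() is an empty inline function, the guard should add no work in that model, while the shared model gets its unlock on every return path.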

lib/scudo/scudo_tsd.cpp

This file was added.

				//===-- scudo_tsd.cpp -------------------------------------------- C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// Scudo thread specific data implementation.
				///
				//===----------------------------------------------------------------------===//

				// TODO(kostyak): solve annoying UINT64_MAX & the like redefinitions
				#include "sanitizer_common/sanitizer_platform.h"
				#if SANITIZER_FUCHSIA
				# include <magenta/syscalls.h>
				#endif // SANITIZER_FUCHSIA

				#include "scudo_tsd.h"

				namespace __scudo {

				static pthread_once_t GlobalInitialized = PTHREAD_ONCE_INIT;
				pthread_key_t PThreadKey;

				#if !SCUDO_TSD_EXCLUSIVE
				static ScudoTSD *TSDs;
				static uptr NumberOfTSDs;

				// sysconf(_SC_NPROCESSORS_{CONF,ONLN}) cannot be used as they allocate memory.
				static uptr getNumberOfCPUs() {
				#if SANITIZER_FUCHSIA
				  return _mx_system_get_num_cpus();
				#else
				  cpu_set_t CPUs;
				  CHECK_EQ(sched_getaffinity(0, sizeof(cpu_set_t), &CPUs), 0);
				  return CPU_COUNT(&CPUs);
				#endif // SANITIZER_FUCHSIA
				}

				static void initOnce() {
				  CHECK_EQ(pthread_key_create(&PThreadKey, nullptr), 0);
				  initScudo();
				  NumberOfTSDs = getNumberOfCPUs();
				  if (UNLIKELY(NumberOfTSDs == 0))
				    NumberOfTSDs = 1;
				  if (NumberOfTSDs > 8)
				    NumberOfTSDs = 8;
				  TSDs = reinterpret_cast<ScudoTSD *>(
				      MmapOrDie(sizeof(ScudoTSD) * NumberOfTSDs, "ScudoTSDs"));
				  for (uptr i = 0; i < NumberOfTSDs; i++)
				    TSDs[i].init();
				}

				ALWAYS_INLINE void setCurrentTSD(ScudoTSD *TSD) {
				# if SANITIZER_ANDROID
				  *get_android_tls_ptr() = reinterpret_cast<uptr>(TSD);
				# else
				  CHECK_EQ(pthread_setspecific(PThreadKey, reinterpret_cast<void *>(TSD)), 0);
				# endif // SANITIZER_ANDROID
				}

				void initThread() {
				  static atomic_uint32_t CurrentTSDIndex;
				  pthread_once(&GlobalInitialized, initOnce);
				  // Initial context assignment is done in a plain round-robin fashion.
				  u32 Index = atomic_fetch_add(&CurrentTSDIndex, 1, memory_order_relaxed);
				  ScudoTSD *TSD = &TSDs[Index % NumberOfTSDs];
				  setCurrentTSD(TSD);
				}

				ScudoTSD *getTSDAndLockSlow() {
				  ScudoTSD *TSD;
				  if (NumberOfTSDs > 1) {
				    // Go through all the contexts and find the first unlocked one.
				    for (uptr i = 0; i < NumberOfTSDs; i++) {
				      TSD = &TSDs[i];
				      if (TSD->tryLock()) {
				        setCurrentTSD(TSD);
				        return TSD;
				      }
				    }
				    // No luck, find the one with the earliest attempted lock, and slow lock it.
				    u64 LowestPrecedence = UINT64_MAX;
				    for (uptr i = 0; i < NumberOfTSDs; i++) {
				      u64 Precedence = TSDs[i].getPrecedence();
				      if (Precedence && Precedence < LowestPrecedence) {
				        TSD = &TSDs[i];
				        LowestPrecedence = Precedence;
				      }
				    }
				    if (LIKELY(LowestPrecedence != UINT64_MAX)) {
				      TSD->lock();
				      setCurrentTSD(TSD);
				      return TSD;
				    }
				  }
				  // Stick with the current one.
				  TSD = getCurrentTSD();
				  TSD->lock();
				  return TSD;
				}
				#else
				__attribute__((tls_model("initial-exec")))
				THREADLOCAL ThreadState ScudoThreadState = ThreadNotInitialized;
				__attribute__((tls_model("initial-exec")))
				THREADLOCAL ScudoTSD TSD;

				static void teardownThread(void *Ptr) {
				  uptr I = reinterpret_cast<uptr>(Ptr);
				  // The glibc POSIX thread-local-storage deallocation routine calls user
				  // provided destructors in a loop of PTHREAD_DESTRUCTOR_ITERATIONS.
				  // We want to be called last since other destructors might call free and the
				  // like, so we wait until PTHREAD_DESTRUCTOR_ITERATIONS before draining the
				  // quarantine and swallowing the cache.
				  if (I > 1) {
				    // If pthread_setspecific fails, we will go ahead with the teardown.
				    if (LIKELY(pthread_setspecific(PThreadKey,
				                                   reinterpret_cast<void *>(I - 1)) == 0))
				      return;
				  }
				  TSD.commitBack();
				  ScudoThreadState = ThreadTornDown;
				}

				static void initOnce() {
				  CHECK_EQ(pthread_key_create(&PThreadKey, teardownThread), 0);
				  initScudo();
				}

				void initThread() {
				  CHECK_EQ(pthread_once(&GlobalInitialized, initOnce), 0);
				  CHECK_EQ(pthread_setspecific(PThreadKey, reinterpret_cast<void *>(
				               GetPthreadDestructorIterations())), 0);
				  TSD.init();
				  ScudoThreadState = ThreadInitialized;
				}
				#endif // !SCUDO_TSD_EXCLUSIVE

				} // namespace __scudo
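The shared model's round-robin assignment (initThread()) and precedence-based slow path (getTSDAndLockSlow()) can be exercised outside the allocator. The standalone sketch below makes several assumptions. A spin lock built on std::atomic<bool> stands in for StaticSpinMutex. A hypothetical counter, fakeNanoTime(), stands in for NanoTime(), so only the ordering of recorded values matters. The pool is a std::vector indexed by position rather than an mmap'ed array. Unlike Scudo, the function returns an index instead of calling setCurrentTSD(), so a caller would store that index itself.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumption: a deterministic counter stands in for NanoTime(); only the
// relative order of the recorded values matters to the slow path.
static std::atomic<uint64_t> FakeClock{0};
static uint64_t fakeNanoTime() { return FakeClock.fetch_add(1) + 1; }

// Hypothetical shared-model TSD: a spin lock (standing in for StaticSpinMutex)
// plus the time of the first failed lock attempt since the last acquisition.
struct SharedTSD {
  std::atomic<bool> Locked{false};
  std::atomic<uint64_t> Precedence{0};

  bool tryLock() {
    if (!Locked.exchange(true, std::memory_order_acquire)) {
      Precedence.store(0, std::memory_order_relaxed);
      return true;
    }
    if (Precedence.load(std::memory_order_relaxed) == 0)
      Precedence.store(fakeNanoTime(), std::memory_order_relaxed);
    return false;
  }
  void lock() {
    while (Locked.exchange(true, std::memory_order_acquire))
      std::this_thread::yield();
    Precedence.store(0, std::memory_order_relaxed);
  }
  void unlock() { Locked.store(false, std::memory_order_release); }
  uint64_t getPrecedence() const {
    return Precedence.load(std::memory_order_relaxed);
  }
};

// Initial assignment, mirroring initThread(): plain round-robin.
static size_t assignRoundRobin(size_t NumberOfTSDs) {
  static std::atomic<uint32_t> Next{0};
  return Next.fetch_add(1, std::memory_order_relaxed) % NumberOfTSDs;
}

// Slow path, mirroring getTSDAndLockSlow(): take the first TSD that can be
// try-locked; otherwise block on the one with the lowest non-zero precedence;
// otherwise block on the current one. Returns the index of the locked TSD.
static size_t getTSDAndLockSlow(std::vector<SharedTSD> &TSDs, size_t Current) {
  assert(Current < TSDs.size());
  if (TSDs.size() > 1) {
    for (size_t I = 0; I < TSDs.size(); I++)
      if (TSDs[I].tryLock())
        return I;
    uint64_t Lowest = UINT64_MAX;
    size_t Candidate = Current;
    for (size_t I = 0; I < TSDs.size(); I++) {
      uint64_t P = TSDs[I].getPrecedence();
      if (P != 0 && P < Lowest) {
        Lowest = P;
        Candidate = I;
      }
    }
    if (Lowest != UINT64_MAX) {
      TSDs[Candidate].lock();
      return Candidate;
    }
  }
  TSDs[Current].lock();
  return Current;
}
```

A thread that finds every TSD busy blocks on the one whose first failed lock attempt is oldest. As the TODO in lock() points out, several waiters can still end up queued on that same TSD, because lock() does not publish a new precedence while waiting.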