This is an archive of the discontinued LLVM Phabricator instance.

Differential D62258

[scudo][standalone] Introduce the thread specific data structures
ClosedPublic

Authored by cryptoad on May 22 2019, 8:18 AM.

Details

Reviewers

eugenis
vitalybuka
morehouse
hctim

Commits

rG52bfd673d155: [scudo][standalone] Introduce the thread specific data structures
rL362962: [scudo][standalone] Introduce the thread specific data structures
rCRT362962: [scudo][standalone] Introduce the thread specific data structures

Summary

This CL adds the structures dealing with thread specific data for the
allocator. This includes the thread specific data structure itself and
two registries for said structures: an exclusive one, where each thread
will have its own TSD struct, and a shared one, where a pool of TSD
structs will be shared by all threads, with dynamic reassignment at
runtime based on contention.

This departs from the current Scudo implementation: we intend to make
the Registry a template parameter of the allocator (as opposed to a
single global entity), allowing various allocators to coexist with
different TSD registry models. As a result, TSD registry and Allocator
are tightly coupled.
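
As a rough illustration of the registry being a template parameter of the allocator, the sketch below mirrors the shape of the `MockAllocator` used in tsd_test.cc. The names `SketchAllocator`, `ExclusiveRegistry`, `SharedRegistry`, `ExclusiveConfig` and `SharedConfig` are hypothetical and are not the classes added by this CL; the sketch only shows how allocators with different registry models could coexist as distinct types.

```cpp
#include <cstddef>

// Hypothetical registry models (illustrative stand-ins, not the patch's code).
template <class AllocatorT> struct ExclusiveRegistry {
  // One TSD per thread, so there is no shared pool.
  static constexpr std::size_t MaxSharedTSDs = 0;
};

template <class AllocatorT, std::size_t N> struct SharedRegistry {
  // A pool of at most N TSDs shared by all threads.
  static constexpr std::size_t MaxSharedTSDs = N;
};

// Each config selects a registry model through a template alias.
struct ExclusiveConfig {
  template <class A> using TSDRegistryT = ExclusiveRegistry<A>;
};
struct SharedConfig {
  template <class A> using TSDRegistryT = SharedRegistry<A, 4>;
};

// The allocator instantiates the registry with its own type, which is what
// couples the two: the registry knows the allocator, and vice versa.
template <class Config> class SketchAllocator {
public:
  using ThisT = SketchAllocator<Config>;
  using TSDRegistryT = typename Config::template TSDRegistryT<ThisT>;

  TSDRegistryT *getTSDRegistry() { return &Registry; }
  static constexpr std::size_t sharedTSDCount() {
    return TSDRegistryT::MaxSharedTSDs;
  }

private:
  TSDRegistryT Registry;
};
```

With this shape, `SketchAllocator<ExclusiveConfig>` and `SketchAllocator<SharedConfig>` are separate types, each carrying its own registry instance.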

This also corrects a couple of things in other files that I noticed
while adding this.

Diff Detail

Repository: rL LLVM

Event Timeline

cryptoad created this revision. May 22 2019, 8:18 AM

Herald added projects: Restricted Project, Restricted Project. May 22 2019, 8:18 AM

Herald added subscribers: Restricted Project, jfb, delcypher and 2 others.

Harbormaster completed remote builds in B32317: Diff 200755. May 22 2019, 8:19 AM

Correct a test and some formatting.

Harbormaster completed remote builds in B32324: Diff 200779. May 22 2019, 9:55 AM

morehouse added inline comments. May 23 2019, 5:08 PM

lib/scudo/standalone/tests/tsd_test.cc
34 ↗	(On Diff #200779)	I assume allocators will always be linker-initialized, which is why we have `reset` for testing instead of an `init`?
49 ↗	(On Diff #200779)	`Allocator->` is cleaner than `Allocator.get()->`, unless there's a good reason to get the raw pointer.
lib/scudo/standalone/tsd.h
27 ↗	(On Diff #200779)	`Mutex.initLinkerInitialized()`
lib/scudo/standalone/tsd_exclusive.h
28 ↗	(On Diff #200779)	Shouldn't this also call `initLinkerInitialized`?
60 ↗	(On Diff #200779)	Will threads have different allocators?
64 ↗	(On Diff #200779)	What is the use-case for `MinimalInit` initialization?
74 ↗	(On Diff #200779)	Any reason this needs to be a pointer?
94 ↗	(On Diff #200779)	Assigning to `N` seems pointless. Can we simplify to: if (TSDRegistryT::ThreadTSD.DestructorIterations > 1) { TSDRegistryT::ThreadTSD.DestructorIterations--; ... }
lib/scudo/standalone/tsd_shared.h
21 ↗	(On Diff #200779)	Shouldn't this also call `initLinkerInitialized`?
23 ↗	(On Diff #200779)	Should Allocator initialization just be done in `initLinkerInitialized`?
61 ↗	(On Diff #200779)	What are the CoPrimes used for, and what is the strange calculation above doing?
74 ↗	(On Diff #200779)	Do we really need three ways to do TLS? Can we just use pthreads for all?
97 ↗	(On Diff #200779)	Is it sufficient to just check `NumberOfTSDs`?
123 ↗	(On Diff #200779)	`Index %= NumberOfTSDs`
144 ↗	(On Diff #200779)	Why do we need these guards here but not in tsd_exclusive.h?

Matt, thank you for all the reviews you are doing. Very insightful points.

lib/scudo/standalone/tests/tsd_test.cc
34 ↗	(On Diff #200779)	There is indeed a subtlety here: the `initOnce` function of a TSD will call the `initLinkerInitialized` function of the allocator. Calling this `init` would, I feel, suggest the usual construct of `init` calling `initLinkerInitialized`, which shouldn't be the case here: we just want a zeroed-out structure, without calling the `initLinkerInitialized` wannabe constructor. Hence using `reset`.
lib/scudo/standalone/tsd_exclusive.h
60 ↗	(On Diff #200779)	Multiple allocators can coexist in a process, hence a need to know which allocator a registry belongs to.
64 ↗	(On Diff #200779)	Right, this is documented in combined.h, which hasn't landed yet. Here is the comment: // For a deallocation, we only ensure minimal initialization, meaning thread // local data will be left uninitialized for now (when using ELF TLS). The // fallback cache will be used instead. This is a workaround for a situation // where the only heap operation performed in a thread would be a free past // the TLS destructors, ending up in initialized thread specific data never // being destroyed properly. Any other heap operation will do a full init. There is a test case in the current Scudo for that situation that will be included as well.
74 ↗	(On Diff #200779)	The main reason is for an uninitialized allocator to take as little memory space as possible. A TSD is usually around 8kB (varying based on the number of classes and pointers cached). It costs an extra map() on init, but allows several allocators to coexist without using extra memory (the code footprint cost still being there).
lib/scudo/standalone/tsd_shared.h
23 ↗	(On Diff #200779)	This question made me realize that something is a bit sketchy. `initOnce` is initializing the allocator, which in turn is calling `initLinkerInitialized` of the registry: at this point we are holding the spin mutex, and we call the `initLinkerInitialized` method of that same mutex. It all works out because 1) everything is zero-initialized and 2) the `initLinkerInitialized` of the spin mutex is a no-op. But it is definitely logically skewed. The original version was using `pthread_once`, which removed the need for the mutex, but I needed the Instance parameter for `initOnce` (`pthread_once` doesn't allow parameters to the once function). I am going to have to rethink that.
61 ↗	(On Diff #200779)	I added a comment. The original idea was from Dmitry, but there are online references such as: https://lemire.me/blog/2017/09/18/visiting-all-values-in-an-array-exactly-once-in-random-order/
74 ↗	(On Diff #200779)	The ELF TLS would be the fastest as it's only a couple of instructions to access the data relative to %fs. The pthread implementations vary widely. They require a call, which is not ideal, but also they range in terms of efficiency. The `getAndroidTlsPtr` ends up being a few asm instructions as well. Now with the introduction of ELF TLS in Android (beginning of this year), we might be able to get rid of the Android TLS ptr trick. I haven't tested it yet. Also I am not sure the ELF TLS works from within the libc (it doesn't on Fuchsia for example).
97 ↗	(On Diff #200779)	This is to help the compiler optimize the block out. If `MaxTSDCount` is 1 (likely the svelte Android config case), the compiler doesn't know that NumberOfTSDs is 1 and leaves the block in. The `MaxTSDCount` check will be optimized at compile time, and for the 1 case, the whole block goes away.
123 ↗	(On Diff #200779)	Here the division is potentially costly, hence going for a comparison and subtraction. I am not sure the performance difference is significant in this situation (slow path, 4 iterations at most) but I tried to avoid divisions everywhere as a general rule of thumb.
144 ↗	(On Diff #200779)	The exclusive version only uses ELF TLS, so opting for an exclusive TSD assumes the platform supports it. The shared version can work without ELF TLS, hence having the `THREADLOCAL` variables within defines. A couple of things to consider: Fuchsia supports `THREADLOCAL`, but not within the libc, and we are in the libc. Android recently got support for ELF TLS, but 1) I haven't tested it yet and 2) I don't know if `THREADLOCAL` works from within the C library. I will revisit this when things settle, but as of now, I am sure this version works.
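
To make the co-prime discussion (line 61) and the "comparison and subtraction instead of a modulo" point (line 123) more concrete, here is a minimal sketch. It is not the code from tsd_shared.h: `computeCoPrimes` and `visitOrder` are hypothetical helper names, and the sketch assumes `Start < N` and `Stride <= N`. Stepping through N slots with a stride that is co-prime with N visits every slot exactly once before repeating, which lets different threads probe the TSD pool in different orders.

```cpp
#include <cstdint>
#include <vector>

// Returns every stride in [1, N] whose greatest common divisor with N is 1.
std::vector<uint32_t> computeCoPrimes(uint32_t N) {
  std::vector<uint32_t> CoPrimes;
  for (uint32_t Candidate = 1; Candidate <= N; Candidate++) {
    uint32_t A = Candidate, B = N;
    while (B != 0) { // Euclid's algorithm: afterwards A holds the GCD.
      const uint32_t T = A;
      A = B;
      B = T % B;
    }
    if (A == 1)
      CoPrimes.push_back(Candidate);
  }
  return CoPrimes;
}

// Lists the N indices visited when starting at Start and stepping by Stride.
// Assumes Start < N and Stride <= N, so a single subtraction is enough to
// wrap around and no division is needed inside the loop.
std::vector<uint32_t> visitOrder(uint32_t N, uint32_t Start, uint32_t Stride) {
  std::vector<uint32_t> Order;
  uint32_t Index = Start;
  for (uint32_t I = 0; I < N; I++) {
    Order.push_back(Index);
    Index += Stride;
    if (Index >= N)
      Index -= N;
  }
  return Order;
}
```

For example, with 8 slots the co-prime strides are 1, 3, 5 and 7, and starting at slot 2 with a stride of 3 visits 2, 5, 0, 3, 6, 1, 4, 7.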

Addressing several of Matt's review points:

adding comments to obscure code snippets
calling initLinkerInitialized when needed
changing unique_ptr's get()-> to ->
simplifying some code constructs

Harbormaster completed remote builds in B32461: Diff 201263. May 24 2019, 8:51 AM

Switching from a spin mutex to a blocking mutex for initOnce.

Harbormaster completed remote builds in B32480: Diff 201333. May 24 2019, 2:07 PM

morehouse added inline comments. May 29 2019, 10:40 AM

lib/scudo/standalone/tsd_exclusive.h
28 ↗	(On Diff #200779)	I think switching to `BlockingMutex` introduced a new bug. `BlockingMutex` has a default constructor that will run at global ctor time. So the following sequence could happen: (1) `TSDRegistryExT::initLinkerInitialized()` is called; (2) it calls `Mutex.lock()`; (3) the `TSDRegistryExT()` implicit constructor runs, calling `Mutex`'s ctor. So now `Mutex` is unlocked even though `unlock` was never called. (This also applies to the other registry.)
60 ↗	(On Diff #200779)	Ok, but will a single registry ever have multiple allocators? If not, we don't need to pass the allocator to the thread-specific `initThread` and should instead pass it to the global `init`.
64 ↗	(On Diff #200779)	Thanks for the explanation. Would it make sense to put that comment here, or is the combined a better place?

cryptoad marked an inline comment as done. May 29 2019, 11:49 AM

cryptoad added inline comments.

lib/scudo/standalone/tsd_exclusive.h
28 ↗	(On Diff #200779)	Damn, I failed. The other option was to implement a call_once type function, that would be specific to the registry, using an atomic cas (like llvm::call_once) and a little spin, and not have a mutex at all. I can't use std::call_once and pthread_once, but if you have another idea, I am happy to oblige.
60 ↗	(On Diff #200779)	Ah, I see what you mean. That is indeed the case, I'll reorganize that.
64 ↗	(On Diff #200779)	I'll add a short version of it here.

morehouse added inline comments. May 29 2019, 12:54 PM

lib/scudo/standalone/tsd_exclusive.h
28 ↗	(On Diff #200779)	Is there a reason we need the init loop to begin with? It would be nice to have a one-way initialization instead.
lib/scudo/standalone/tsd_shared.h
23 ↗	(On Diff #200779)	Seems this is a consequence of the tight coupling between registry and allocator. I think ideally we'd have a one-way dependency so either allocator-has-a-registry, or registry-has-an-allocator. Then the initialization only ever makes sense one-way. Loose coupling in general would also make it easier to understand each piece in isolation, which makes future code maintenance much easier.

cryptoad marked 2 inline comments as not done. May 29 2019, 3:12 PM

cryptoad added inline comments.

lib/scudo/standalone/tsd_exclusive.h
28 ↗	(On Diff #200779)	The loop isn't required; it's a sort of residue of my attempt at lazily initializing everything while keeping the init/initLinkerInitialized construct. The initialization flow should be as follows: (1) we have a zero-initialized Combined structure; (2) someone calls allocate() on that Combined; (3) this does initThread, which calls initThread in the registry (since it's really a registry thing); (4) the thread isn't initialized, and neither is the allocator, so we call the Combined's initOnce; (5) we parse the flags and initialize the internal structures.

This, hopefully, detangles a bit the initialization process for
the registry. In order to do that, we get rid of the mutex in favor
of our own call_once type construct (loosely inspired by
llvm::call_once). We move the initialization code into
initLinkerInitialized.

This gets rid of the potential init loop, and hopefully makes the
code structure more coherent.
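
For readers unfamiliar with this kind of construct, the following is a minimal sketch of a CAS-based "call once" built on `std::atomic`. It is an assumption-laden illustration, not the code from this diff and not `llvm::call_once` itself: `OnceFlag`, `callOnce` and the three state names are hypothetical. The first caller wins the compare-and-swap and runs the initializer; any other caller spins until the state reaches `Done`.

```cpp
#include <atomic>

// Possible states of the flag; zero means "not initialized yet", so a
// zero-initialized flag is immediately usable.
enum : unsigned char { Uninitialized = 0, Running = 1, Done = 2 };

struct OnceFlag {
  std::atomic<unsigned char> State{Uninitialized};
};

// Runs Init exactly once across all callers sharing the same Flag.
template <class F> void callOnce(OnceFlag &Flag, F &&Init) {
  unsigned char Expected = Uninitialized;
  if (Flag.State.compare_exchange_strong(Expected, Running,
                                         std::memory_order_acquire)) {
    // This caller won the race and performs the initialization.
    Init();
    // Release ordering publishes everything Init wrote before Done is seen.
    Flag.State.store(Done, std::memory_order_release);
    return;
  }
  // Another caller is (or was) initializing: wait until it has finished.
  while (Flag.State.load(std::memory_order_acquire) != Done) {
  }
}
```

Note that, as with any such construct, calling `callOnce` on the same flag from inside the initializer would spin forever, which is one reason a one-way initialization flow matters here.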

Additionally, I had some local flakes while testing this, and realized
that I messed up some region log sizes in the tests. Since we are
modifying that file here, unflake the test as well.

Harbormaster completed remote builds in B32827: Diff 202742. Jun 3 2019, 9:47 AM

Add a comment for MinimalInit.

Harbormaster completed remote builds in B32828: Diff 202744. Jun 3 2019, 9:59 AM

Change the Primary test again to be more forgiving to OOM.

Harbormaster completed remote builds in B32830: Diff 202747. Jun 3 2019, 10:13 AM

Disable a test on 32-bit for now: we are running out of address space.
This will be fixed in a subsequent CL, but affects the tests of this one.

Harbormaster completed remote builds in B32832: Diff 202752. Jun 3 2019, 10:32 AM

Ping pretty please! I think all the comments were addressed one way or another. The new structure doesn't "cycle" anymore.

morehouse added inline comments. Jun 6 2019, 12:56 PM

lib/scudo/standalone/tests/tsd_test.cc
29 ↗	(On Diff #201333)	Don't we need `TSDRegistry.initLinkerInitialized()`?
lib/scudo/standalone/tsd_exclusive.h
62 ↗	(On Diff #202752)	With this change, we now expect initialization to only happen through `initThreadMaybe`, right? Should we even keep `initLinkerInitialized` around then?
67 ↗	(On Diff #202752)	Is the fence necessary? Not an atomic expert, but doesn't the load achieve the same thing?
77 ↗	(On Diff #202752)	I think things would be much simpler if we can use a `StaticSpinMutex`. We're assuming we're zero-initialized anyway, so the mutex will be initialized before use as well.

cryptoad added inline comments. Jun 6 2019, 1:17 PM

lib/scudo/standalone/tsd_exclusive.h
62 ↗	(On Diff #202752)	If I did things correctly, the Registry can also now be initialized with `initLinkerInitialized`, which will carry out the "once" initialization. Then `initThreadMaybe` will do the thread initialization skipping the "once". My current plan is to have everything lazy initialized the first time someone calls `malloc` (or whatever else), meaning the first `initThreadMaybe` will carry the "once" initialization, but the alternative of pre-initializing is now offered.
67 ↗	(On Diff #202752)	Not an atomic expert either; it's inspired by `llvm::call_once`, which does this. I figured it had a purpose and that I should keep it.
77 ↗	(On Diff #202752)	I agree. But the issue that would come up using `StaticSpinMutex` is that we would land in `initLinkerInitialized`, where we should call the Mutex's `initLinkerInitialized` (to be consistent) while holding it. It works because it's a no-op, but it doesn't make sense from a logical perspective. Implementing it this way (with our wannabe `call_once`) allows us to not have such a loophole.

morehouse added inline comments. Jun 6 2019, 3:37 PM

lib/scudo/standalone/tsd_exclusive.h
77 ↗	(On Diff #202752)	Can we simply remove `initLinkerInitialized` from `StaticSpinMutex`, as we have in sanitizer_common? We could possibly add a comment that calls to `init` are only necessary if the mutex is not linker initialized.

cryptoad added inline comments. Jun 6 2019, 5:02 PM

lib/scudo/standalone/tsd_exclusive.h
77 ↗	(On Diff #202752)	Definitely can!

As discussed through comments, re-introduce a StaticSpinMutex to the
"once" initialization of the TSD registry, and remove its no-op
initLinkerInitialized in the various places it was used.
Also adds a test to exercise the path of "direct" initialization
via calling initLinkerInitialized directly on the registry.

Harbormaster completed remote builds in B33064: Diff 203569. Jun 7 2019, 9:16 AM

morehouse added inline comments. Jun 7 2019, 1:00 PM

lib/scudo/standalone/tsd_exclusive.h
69 ↗	(On Diff #203569)	This is unguarded by the mutex. I think what we need to do is call `initOnce` unconditionally.
lib/scudo/standalone/tsd_shared.h
99 ↗	(On Diff #203569)	This is also unguarded by mutex.

cryptoad added inline comments. Jun 7 2019, 1:13 PM

lib/scudo/standalone/tsd_exclusive.h
69 ↗	(On Diff #203569)	My understanding is that it's not needed. If it's `true`, then initialization has been carried out; if `false`, then we lock the mutex in `initOnce`. This assumes that the bool can't be partially modified, but I guess I should enforce that with an atomic_u8 instead.

Use an atomic_u8 for Initialized instead of a bool.

Harbormaster completed remote builds in B33080: Diff 203611. Jun 7 2019, 1:20 PM

clang-format'ing the code.

Harbormaster completed remote builds in B33081: Diff 203613. Jun 7 2019, 1:24 PM

morehouse accepted this revision. Jun 7 2019, 1:50 PM

morehouse added inline comments.

lib/scudo/standalone/tsd_exclusive.h
69 ↗	(On Diff #203569)	I think another issue is that the compiler is allowed to reorder instructions within a locking context. So it is possible that `Initialized = true` but initialization hasn't fully finished. If we always acquire the lock before accessing `Initialized`, we guarantee that we're fully initialized since compiler can't reorder instructions outside of the locking context.

This revision is now accepted and ready to land. Jun 7 2019, 1:50 PM

Always call initOnce, and rename it initOnceMaybe to reflect that
initialization might not necessarily occur if it already happened.
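
The "always take the lock, then check the flag" pattern that the discussion converged on can be sketched as follows. This is only an illustration under stated assumptions: `SketchRegistry` is a hypothetical class, `std::mutex` stands in for the `StaticSpinMutex` used by the patch, and the real `initOnceMaybe` takes an allocator instance and does actual initialization work.

```cpp
#include <mutex>

class SketchRegistry {
public:
  // Returns true if this call performed the one-time initialization, and
  // false if it had already happened. The flag is only ever read or written
  // while holding the mutex, so a caller that sees Initialized == true also
  // sees all the writes made by the initializing thread.
  bool initOnceMaybe() {
    std::lock_guard<std::mutex> Lock(Mutex);
    if (Initialized)
      return false;
    InitCount++; // Stand-in for the real "once" initialization work.
    Initialized = true;
    return true;
  }

  int initCount() {
    std::lock_guard<std::mutex> Lock(Mutex);
    return InitCount;
  }

private:
  std::mutex Mutex;
  bool Initialized = false;
  int InitCount = 0;
};
```

The cost is one lock acquisition per call, in exchange for not having to reason about the ordering of an unguarded flag check.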

Harbormaster completed remote builds in B33083: Diff 203621. Jun 7 2019, 2:01 PM

Closed by commit rL362962: [scudo][standalone] Introduce the thread specific data structures (authored by cryptoad). Jun 10 2019, 9:48 AM

This revision was automatically updated to reflect the committed changes.

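Revision Contents

Path	Size
compiler-rt/trunk/lib/scudo/standalone/CMakeLists.txt	3 lines
compiler-rt/trunk/lib/scudo/standalone/internal_defs.h	2 lines
compiler-rt/trunk/lib/scudo/standalone/mutex.h	1 line
compiler-rt/trunk/lib/scudo/standalone/quarantine.h	2 lines
compiler-rt/trunk/lib/scudo/standalone/tests/CMakeLists.txt	1 line
compiler-rt/trunk/lib/scudo/standalone/tests/primary_test.cc	21 lines
compiler-rt/trunk/lib/scudo/standalone/tests/tsd_test.cc	152 lines
compiler-rt/trunk/lib/scudo/standalone/tsd.h	61 lines
compiler-rt/trunk/lib/scudo/standalone/tsd_exclusive.h	114 lines
compiler-rt/trunk/lib/scudo/standalone/tsd_shared.h	166 lines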

Diff 203850

compiler-rt/trunk/lib/scudo/standalone/CMakeLists.txt

[74 lines not shown] set(SCUDO_HEADERS
    primary64.h
    quarantine.h
    release.h
    report.h
    secondary.h
    size_class_map.h
    stats.h
    string_utils.h
+   tsd.h
+   tsd_exclusive.h
+   tsd_shared.h
    vector.h)

  if(COMPILER_RT_HAS_SCUDO_STANDALONE)
    add_compiler_rt_object_libraries(RTScudoStandalone
      ARCHS ${SCUDO_STANDALONE_SUPPORTED_ARCH}
      SOURCES ${SCUDO_SOURCES}
      ADDITIONAL_HEADERS ${SCUDO_HEADERS}
      CFLAGS ${SCUDO_CFLAGS})
[13 lines not shown]

compiler-rt/trunk/lib/scudo/standalone/internal_defs.h

[11 lines not shown]
  #include "platform.h"

  #include <stdint.h>

  #ifndef SCUDO_DEBUG
  #define SCUDO_DEBUG 0
  #endif

- #define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
+ #define ARRAY_SIZE(A) (sizeof(A) / sizeof((A)[0]))

  // String related macros.

  #define STRINGIFY_(S) #S
  #define STRINGIFY(S) STRINGIFY_(S)
  #define CONCATENATE_(S, C) S##C
  #define CONCATENATE(S, C) CONCATENATE_(S, C)

[107 lines not shown]

compiler-rt/trunk/lib/scudo/standalone/mutex.h

[10 lines not shown]

  #include "atomic_helpers.h"
  #include "common.h"

  namespace scudo {

  class StaticSpinMutex {
  public:
-   void initLinkerInitialized() {}
    void init() { atomic_store_relaxed(&State, 0); }

    void lock() {
      if (tryLock())
        return;
      lockSlow();
    }

[82 lines not shown]

compiler-rt/trunk/lib/scudo/standalone/quarantine.h

[179 lines not shown] void initLinkerInitialized(uptr Size, uptr CacheSize) {
      // is zero (it allows us to perform just one atomic read per put() call).
      CHECK((Size == 0 && CacheSize == 0) || CacheSize != 0);

      atomic_store_relaxed(&MaxSize, Size);
      atomic_store_relaxed(&MinSize, Size / 10 * 9); // 90% of max size.
      atomic_store_relaxed(&MaxCacheSize, CacheSize);

      Cache.initLinkerInitialized();
-     CacheMutex.initLinkerInitialized();
-     RecyleMutex.initLinkerInitialized();
    }
    void init(uptr Size, uptr CacheSize) {
      memset(this, 0, sizeof(*this));
      initLinkerInitialized(Size, CacheSize);
    }

    uptr getMaxSize() const { return atomic_load_relaxed(&MaxSize); }
    uptr getCacheSize() const { return atomic_load_relaxed(&MaxCacheSize); }
[94 lines not shown]

compiler-rt/trunk/lib/scudo/standalone/tests/CMakeLists.txt

[59 lines not shown] set(SCUDO_UNIT_TEST_SOURCES
    primary_test.cc
    quarantine_test.cc
    release_test.cc
    report_test.cc
    secondary_test.cc
    size_class_map_test.cc
    stats_test.cc
    strings_test.cc
+   tsd_test.cc
    vector_test.cc
    scudo_unit_test_main.cc)

  add_scudo_unittest(ScudoUnitTest
    SOURCES ${SCUDO_UNIT_TEST_SOURCES})

compiler-rt/trunk/lib/scudo/standalone/tests/primary_test.cc

[41 lines not shown] template <typename Primary> static void testPrimary() {
    }
    Cache.destroy(nullptr);
    Allocator->releaseToOS();
    Allocator->printStats();
  }

  TEST(ScudoPrimaryTest, BasicPrimary) {
    using SizeClassMap = scudo::DefaultSizeClassMap;
-   testPrimary<scudo::SizeClassAllocator32<SizeClassMap, 24U>>();
+   testPrimary<scudo::SizeClassAllocator32<SizeClassMap, 18U>>();
    testPrimary<scudo::SizeClassAllocator64<SizeClassMap, 24U>>();
  }

  // The 64-bit SizeClassAllocator can be easily OOM'd with small region sizes.
  // For the 32-bit one, it requires actually exhausting memory, so we skip it.
  TEST(ScudoPrimaryTest, Primary64OOM) {
    using Primary = scudo::SizeClassAllocator64<scudo::DefaultSizeClassMap, 20U>;
    using TransferBatch = Primary::CacheT::TransferBatch;
[57 lines not shown] template <typename Primary> static void testIteratePrimary() {
    }
    Cache.destroy(nullptr);
    Allocator->releaseToOS();
    Allocator->printStats();
  }

  TEST(ScudoPrimaryTest, PrimaryIterate) {
    using SizeClassMap = scudo::DefaultSizeClassMap;
-   testIteratePrimary<scudo::SizeClassAllocator32<SizeClassMap, 24U>>();
+   testIteratePrimary<scudo::SizeClassAllocator32<SizeClassMap, 18U>>();
    testIteratePrimary<scudo::SizeClassAllocator64<SizeClassMap, 24U>>();
  }

+ // TODO(kostyak): reenable on 32-bit after implementing unmapTestOnly for the
+ // primary: we are running out of addressable space without.
+ #if SCUDO_WORDSIZE == 64U
+
  static std::mutex Mutex;
  static std::condition_variable Cv;
  static bool Ready = false;

  template <typename Primary> static void performAllocations(Primary *Allocator) {
    static THREADLOCAL typename Primary::CacheT Cache;
    Cache.init(nullptr, Allocator);
    std::vector<std::pair<scudo::uptr, void *>> V;
    {
      std::unique_lock<std::mutex> Lock(Mutex);
      while (!Ready)
        Cv.wait(Lock);
    }
    for (scudo::uptr I = 0; I < 256U; I++) {
      const scudo::uptr Size = std::rand() % Primary::SizeClassMap::MaxSize;
      const scudo::uptr ClassId = Primary::SizeClassMap::getClassIdBySize(Size);
      void *P = Cache.allocate(ClassId);
-     V.push_back(std::make_pair(ClassId, P));
+     if (P)
+       V.push_back(std::make_pair(ClassId, P));
    }
    while (!V.empty()) {
      auto Pair = V.back();
      Cache.deallocate(Pair.first, Pair.second);
      V.pop_back();
    }
    Cache.destroy(nullptr);
  }

  template <typename Primary> static void testPrimaryThreaded() {
    std::unique_ptr<Primary> Allocator(new Primary);
    Allocator->init(/*ReleaseToOsInterval=*/-1);
-   std::thread Threads[10];
-   for (scudo::uptr I = 0; I < 10U; I++)
+   std::thread Threads[32];
+   for (scudo::uptr I = 0; I < ARRAY_SIZE(Threads); I++)
      Threads[I] = std::thread(performAllocations<Primary>, Allocator.get());
    {
      std::unique_lock<std::mutex> Lock(Mutex);
      Ready = true;
      Cv.notify_all();
    }
    for (auto &T : Threads)
      T.join();
    Allocator->releaseToOS();
    Allocator->printStats();
  }

  TEST(ScudoPrimaryTest, PrimaryThreaded) {
    using SizeClassMap = scudo::SvelteSizeClassMap;
-   testPrimaryThreaded<scudo::SizeClassAllocator32<SizeClassMap, 24U>>();
-   testPrimaryThreaded<scudo::SizeClassAllocator64<SizeClassMap, 24U>>();
+   testPrimaryThreaded<scudo::SizeClassAllocator32<SizeClassMap, 18U>>();
+   testPrimaryThreaded<scudo::SizeClassAllocator64<SizeClassMap, 28U>>();
  }

+
+ #endif // SCUDO_WORDSIZE == 64U

compiler-rt/trunk/lib/scudo/standalone/tests/tsd_test.cc

				//===-- tsd_test.cc ---------------------------------------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "tsd_exclusive.h"
				#include "tsd_shared.h"

				#include "gtest/gtest.h"

				#include <condition_variable>
				#include <mutex>
				#include <thread>

				// We mock out an allocator with a TSD registry, mostly using empty stubs. The
				// cache contains a single volatile uptr, to be able to test that several
				// concurrent threads will not access or modify the same cache at the same time.
				template <class Config> class MockAllocator {
				public:
				using ThisT = MockAllocator<Config>;
				using TSDRegistryT = typename Config::template TSDRegistryT<ThisT>;
				using CacheT = struct MockCache { volatile scudo::uptr Canary; };
				using QuarantineCacheT = struct MockQuarantine {};

				void initLinkerInitialized() {
				// This should only be called once by the registry.
				EXPECT_FALSE(Initialized);
				Initialized = true;
				}
				void reset() { memset(this, 0, sizeof(*this)); }

				void initCache(CacheT *Cache) { memset(Cache, 0, sizeof(*Cache)); }
				void commitBack(scudo::TSD<MockAllocator> *TSD) {}
				TSDRegistryT *getTSDRegistry() { return &TSDRegistry; }

				bool isInitialized() { return Initialized; }

				private:
				bool Initialized;
				TSDRegistryT TSDRegistry;
				};

				struct OneCache {
				template <class Allocator>
				using TSDRegistryT = scudo::TSDRegistrySharedT<Allocator, 1U>;
				};

				struct SharedCaches {
				template <class Allocator>
				using TSDRegistryT = scudo::TSDRegistrySharedT<Allocator, 16U>;
				};

				struct ExclusiveCaches {
				template <class Allocator>
				using TSDRegistryT = scudo::TSDRegistryExT<Allocator>;
				};

				TEST(ScudoTSDTest, TSDRegistryInit) {
				using AllocatorT = MockAllocator<OneCache>;
				std::unique_ptr<AllocatorT> Allocator(new AllocatorT);
				Allocator->reset();
				EXPECT_FALSE(Allocator->isInitialized());

				auto Registry = Allocator->getTSDRegistry();
				Registry->initLinkerInitialized(Allocator.get());
				EXPECT_TRUE(Allocator->isInitialized());
				}

				template <class AllocatorT> static void testRegistry() {
				std::unique_ptr<AllocatorT> Allocator(new AllocatorT);
				Allocator->reset();
				EXPECT_FALSE(Allocator->isInitialized());

				auto Registry = Allocator->getTSDRegistry();
				Registry->initThreadMaybe(Allocator.get(), /*MinimalInit=*/true);
				EXPECT_TRUE(Allocator->isInitialized());

				bool UnlockRequired;
				auto TSD = Registry->getTSDAndLock(&UnlockRequired);
				EXPECT_NE(TSD, nullptr);
				EXPECT_EQ(TSD->Cache.Canary, 0U);
				if (UnlockRequired)
				TSD->unlock();

				Registry->initThreadMaybe(Allocator.get(), /*MinimalInit=*/false);
				TSD = Registry->getTSDAndLock(&UnlockRequired);
				EXPECT_NE(TSD, nullptr);
				EXPECT_EQ(TSD->Cache.Canary, 0U);
				memset(&TSD->Cache, 0x42, sizeof(TSD->Cache));
				if (UnlockRequired)
				TSD->unlock();
				}

				TEST(ScudoTSDTest, TSDRegistryBasic) {
				testRegistry<MockAllocator<OneCache>>();
				testRegistry<MockAllocator<SharedCaches>>();
				testRegistry<MockAllocator<ExclusiveCaches>>();
				}

				static std::mutex Mutex;
				static std::condition_variable Cv;
				static bool Ready = false;

				template <typename AllocatorT> static void stressCache(AllocatorT *Allocator) {
				auto Registry = Allocator->getTSDRegistry();
				{
				std::unique_lock<std::mutex> Lock(Mutex);
				while (!Ready)
				Cv.wait(Lock);
				}
				Registry->initThreadMaybe(Allocator, /*MinimalInit=*/false);
				bool UnlockRequired;
				auto TSD = Registry->getTSDAndLock(&UnlockRequired);
				EXPECT_NE(TSD, nullptr);
				// For an exclusive TSD, the cache should be empty. We cannot guarantee the
				// same for a shared TSD.
				if (!UnlockRequired)
				EXPECT_EQ(TSD->Cache.Canary, 0U);
				// Transform the thread id to a uptr to use it as canary.
				const scudo::uptr Canary = static_cast<scudo::uptr>(
				std::hash<std::thread::id>{}(std::this_thread::get_id()));
				TSD->Cache.Canary = Canary;
				// Loop a few times to make sure that a concurrent thread isn't modifying it.
				for (scudo::uptr I = 0; I < 4096U; I++)
				EXPECT_EQ(TSD->Cache.Canary, Canary);
				if (UnlockRequired)
				TSD->unlock();
				}

				template <class AllocatorT> static void testRegistryThreaded() {
				std::unique_ptr<AllocatorT> Allocator(new AllocatorT);
				Allocator->reset();
				std::thread Threads[32];
				for (scudo::uptr I = 0; I < ARRAY_SIZE(Threads); I++)
				Threads[I] = std::thread(stressCache<AllocatorT>, Allocator.get());
				{
				std::unique_lock<std::mutex> Lock(Mutex);
				Ready = true;
				Cv.notify_all();
				}
				for (auto &T : Threads)
				T.join();
				}

				TEST(ScudoTSDTest, TSDRegistryThreaded) {
				testRegistryThreaded<MockAllocator<OneCache>>();
				testRegistryThreaded<MockAllocator<SharedCaches>>();
				testRegistryThreaded<MockAllocator<ExclusiveCaches>>();
				}

compiler-rt/trunk/lib/scudo/standalone/tsd.h

				//===-- tsd.h ---------------------------------------------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef SCUDO_TSD_H_
				#define SCUDO_TSD_H_

				#include "atomic_helpers.h"
				#include "common.h"
				#include "mutex.h"

				#include <limits.h> // for PTHREAD_DESTRUCTOR_ITERATIONS

				namespace scudo {

				template <class Allocator> struct ALIGNED(SCUDO_CACHE_LINE_SIZE) TSD {
				typename Allocator::CacheT Cache;
				typename Allocator::QuarantineCacheT QuarantineCache;
				u8 DestructorIterations;

				void initLinkerInitialized(Allocator *Instance) {
				Instance->initCache(&Cache);
				DestructorIterations = PTHREAD_DESTRUCTOR_ITERATIONS;
				}
				void init(Allocator *Instance) {
				memset(this, 0, sizeof(*this));
				initLinkerInitialized(Instance);
				}

				void commitBack(Allocator *Instance) { Instance->commitBack(this); }

				INLINE bool tryLock() {
				if (Mutex.tryLock()) {
				atomic_store_relaxed(&Precedence, 0);
				return true;
				}
				if (atomic_load_relaxed(&Precedence) == 0)
				atomic_store_relaxed(
				&Precedence,
				static_cast<uptr>(getMonotonicTime() >> FIRST_32_SECOND_64(16, 0)));
				return false;
				}
				INLINE void lock() {
				atomic_store_relaxed(&Precedence, 0);
				Mutex.lock();
				}
				INLINE void unlock() { Mutex.unlock(); }
				INLINE uptr getPrecedence() { return atomic_load_relaxed(&Precedence); }

				private:
				StaticSpinMutex Mutex;
				atomic_uptr Precedence;
				};

				} // namespace scudo

				#endif // SCUDO_TSD_H_
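
The precedence bookkeeping in TSD::tryLock() above can be illustrated with a small standalone sketch. This is not the patch's code: ContendedSlot, nowTicks(), the std::atomic<bool> spin lock, and the `| 1U` that forces a non-zero stamp are assumptions made for illustration, standing in for StaticSpinMutex and getMonotonicTime().

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the locking part of scudo::TSD.
class ContendedSlot {
public:
  // Returns true if the lock was taken. On failure, stamps the slot with the
  // time contention was first observed, unless a stamp is already present.
  bool tryLock() {
    if (!Locked.exchange(true, std::memory_order_acquire)) {
      Precedence.store(0, std::memory_order_relaxed);
      return true;
    }
    if (Precedence.load(std::memory_order_relaxed) == 0)
      Precedence.store(nowTicks(), std::memory_order_relaxed);
    return false;
  }

  // Blocking lock: clears the stamp, then spins until the lock is free.
  void lock() {
    Precedence.store(0, std::memory_order_relaxed);
    while (Locked.exchange(true, std::memory_order_acquire)) {
    }
  }

  void unlock() {
    assert(Locked.load(std::memory_order_relaxed) && "slot is not locked");
    Locked.store(false, std::memory_order_release);
  }

  std::uintptr_t getPrecedence() const {
    return Precedence.load(std::memory_order_relaxed);
  }

private:
  // Assumption: a steady-clock reading with the low bit set, so that a stamp
  // can never be mistaken for the "no contention" value 0.
  static std::uintptr_t nowTicks() {
    const auto Ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                        std::chrono::steady_clock::now().time_since_epoch())
                        .count();
    return static_cast<std::uintptr_t>(Ns) | 1U;
  }

  std::atomic<bool> Locked{false};
  std::atomic<std::uintptr_t> Precedence{0};
};
```

In this sketch a successful tryLock() resets the stamp to 0, the first failed tryLock() records a non-zero stamp, and later failures leave that stamp untouched. The shared registry's slow path reads this value through getPrecedence() to pick the contended TSD with the lowest non-zero precedence.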

compiler-rt/trunk/lib/scudo/standalone/tsd_exclusive.h

				//===-- tsd_exclusive.h -----------------------------------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef SCUDO_TSD_EXCLUSIVE_H_
				#define SCUDO_TSD_EXCLUSIVE_H_

				#include "tsd.h"

				#include <pthread.h>

				namespace scudo {

				enum class ThreadState : u8 {
				NotInitialized = 0,
				Initialized,
				TornDown,
				};

				template <class Allocator> void teardownThread(void *Ptr);

				template <class Allocator> struct TSDRegistryExT {
				void initLinkerInitialized(Allocator *Instance) {
				Instance->initLinkerInitialized();
				CHECK_EQ(pthread_key_create(&PThreadKey, teardownThread<Allocator>), 0);
				FallbackTSD = reinterpret_cast<TSD<Allocator> *>(
				map(nullptr, sizeof(TSD<Allocator>), "scudo:tsd"));
				FallbackTSD->initLinkerInitialized(Instance);
				Initialized = true;
				}
				void init(Allocator *Instance) {
				memset(this, 0, sizeof(*this));
				initLinkerInitialized(Instance);
				}

				ALWAYS_INLINE void initThreadMaybe(Allocator *Instance, bool MinimalInit) {
				if (LIKELY(State != ThreadState::NotInitialized))
				return;
				initThread(Instance, MinimalInit);
				}

				ALWAYS_INLINE TSD<Allocator> *getTSDAndLock(bool *UnlockRequired) {
				if (LIKELY(State == ThreadState::Initialized)) {
				*UnlockRequired = false;
				return &ThreadTSD;
				}
				DCHECK(FallbackTSD);
				FallbackTSD->lock();
				*UnlockRequired = true;
				return FallbackTSD;
				}

				private:
				void initOnceMaybe(Allocator *Instance) {
				SpinMutexLock L(&Mutex);
				if (Initialized)
				return;
				initLinkerInitialized(Instance); // Sets Initialized.
				}

				// Using minimal initialization allows for global initialization while keeping
				// the thread specific structure untouched. The fallback structure will be
				// used instead.
				NOINLINE void initThread(Allocator *Instance, bool MinimalInit) {
				initOnceMaybe(Instance);
				if (MinimalInit)
				return;
				CHECK_EQ(
				pthread_setspecific(PThreadKey, reinterpret_cast<void *>(Instance)), 0);
				ThreadTSD.initLinkerInitialized(Instance);
				State = ThreadState::Initialized;
				}

				pthread_key_t PThreadKey;
				bool Initialized;
				TSD<Allocator> *FallbackTSD;
				StaticSpinMutex Mutex;
				static THREADLOCAL ThreadState State;
				static THREADLOCAL TSD<Allocator> ThreadTSD;

				friend void teardownThread<Allocator>(void *Ptr);
				};

				template <class Allocator>
				THREADLOCAL TSD<Allocator> TSDRegistryExT<Allocator>::ThreadTSD;
				template <class Allocator>
				THREADLOCAL ThreadState TSDRegistryExT<Allocator>::State;

				template <class Allocator> void teardownThread(void *Ptr) {
				typedef TSDRegistryExT<Allocator> TSDRegistryT;
				Allocator *Instance = reinterpret_cast<Allocator *>(Ptr);
				// The glibc POSIX thread-local-storage deallocation routine calls user
				// provided destructors in a loop of PTHREAD_DESTRUCTOR_ITERATIONS.
				// We want to be called last since other destructors might call free and the
				// like, so we re-register the key until the final iteration before draining
				// the quarantine and swallowing the cache.
				if (TSDRegistryT::ThreadTSD.DestructorIterations > 1) {
				TSDRegistryT::ThreadTSD.DestructorIterations--;
				// If pthread_setspecific fails, we will go ahead with the teardown.
				if (LIKELY(pthread_setspecific(Instance->getTSDRegistry()->PThreadKey,
				Ptr) == 0))
				return;
				}
				TSDRegistryT::ThreadTSD.commitBack(Instance);
				TSDRegistryT::State = ThreadState::TornDown;
				}

				} // namespace scudo

				#endif // SCUDO_TSD_EXCLUSIVE_H_
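
The teardown logic above relies on the destructor re-registering its value so that it runs again on the next pass. A minimal sketch of that pattern with plain pthreads follows, under the assumption that the platform defines PTHREAD_DESTRUCTOR_ITERATIONS in <limits.h>. TeardownState, destructor, threadMain and runOnce are hypothetical names and are not part of the patch.

```cpp
#include <cassert>
#include <limits.h> // PTHREAD_DESTRUCTOR_ITERATIONS (assumed to be defined)
#include <pthread.h>

struct TeardownState {
  int Remaining; // Passes left before the real teardown.
  int Calls;     // How many times the destructor ran.
  bool Done;     // Set once the real teardown happened.
};

static pthread_key_t Key;

static void destructor(void *Ptr) {
  TeardownState *S = static_cast<TeardownState *>(Ptr);
  S->Calls++;
  if (S->Remaining > 1) {
    S->Remaining--;
    // Re-arm: a non-null value makes the runtime call us again next pass.
    // If re-arming fails, fall through and tear down right away.
    if (pthread_setspecific(Key, Ptr) == 0)
      return;
  }
  S->Done = true; // Stand-in for commitBack().
}

static void *threadMain(void *Arg) {
  // Registering a non-null value is what schedules the destructor at exit.
  const int Rc = pthread_setspecific(Key, Arg);
  assert(Rc == 0);
  (void)Rc;
  return nullptr;
}

// Runs one thread to completion and reports what its destructor observed.
static TeardownState runOnce() {
  TeardownState S = {PTHREAD_DESTRUCTOR_ITERATIONS, 0, false};
  int Rc = pthread_key_create(&Key, destructor);
  assert(Rc == 0);
  pthread_t T;
  Rc = pthread_create(&T, nullptr, threadMain, &S);
  assert(Rc == 0);
  Rc = pthread_join(T, nullptr);
  assert(Rc == 0);
  (void)Rc;
  pthread_key_delete(Key);
  return S;
}
```

Calling runOnce() should report Done set after the destructor has been invoked PTHREAD_DESTRUCTOR_ITERATIONS times, which mirrors how DestructorIterations delays commitBack() until the last pass.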

compiler-rt/trunk/lib/scudo/standalone/tsd_shared.h

				//===-- tsd_shared.h --------------------------------------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef SCUDO_TSD_SHARED_H_
				#define SCUDO_TSD_SHARED_H_

				#include "linux.h" // for getAndroidTlsPtr()
				#include "tsd.h"

				#include <pthread.h>

				namespace scudo {

				template <class Allocator, u32 MaxTSDCount> struct TSDRegistrySharedT {
				void initLinkerInitialized(Allocator *Instance) {
				Instance->initLinkerInitialized();
				CHECK_EQ(pthread_key_create(&PThreadKey, nullptr), 0); // For non-TLS
				NumberOfTSDs = Min(Max(1U, getNumberOfCPUs()), MaxTSDCount);
				TSDs = reinterpret_cast<TSD<Allocator> *>(
				map(nullptr, sizeof(TSD<Allocator>) * NumberOfTSDs, "scudo:tsd"));
				for (u32 I = 0; I < NumberOfTSDs; I++)
				TSDs[I].initLinkerInitialized(Instance);
				// Compute all the coprimes of NumberOfTSDs. This will be used to walk the
				// array of TSDs in a random order. For details, see:
				// https://lemire.me/blog/2017/09/18/visiting-all-values-in-an-array-exactly-once-in-random-order/
				for (u32 I = 0; I < NumberOfTSDs; I++) {
				u32 A = I + 1;
				u32 B = NumberOfTSDs;
				// Find the GCD between I + 1 and NumberOfTSDs. If 1, they are coprimes.
				while (B != 0) {
				const u32 T = A;
				A = B;
				B = T % B;
				}
				if (A == 1)
				CoPrimes[NumberOfCoPrimes++] = I + 1;
				}
				Initialized = true;
				}
				void init(Allocator *Instance) {
				memset(this, 0, sizeof(*this));
				initLinkerInitialized(Instance);
				}

				ALWAYS_INLINE void initThreadMaybe(Allocator *Instance,
				UNUSED bool MinimalInit) {
				if (LIKELY(getCurrentTSD()))
				return;
				initThread(Instance);
				}

				ALWAYS_INLINE TSD<Allocator> *getTSDAndLock(bool *UnlockRequired) {
				TSD<Allocator> *TSD = getCurrentTSD();
				DCHECK(TSD);
				*UnlockRequired = true;
				// Try to lock the currently associated context.
				if (TSD->tryLock())
				return TSD;
				// If that fails, go down the slow path.
				return getTSDAndLockSlow(TSD);
				}

				private:
				ALWAYS_INLINE void setCurrentTSD(TSD<Allocator> *CurrentTSD) {
				#if SCUDO_ANDROID
				*getAndroidTlsPtr() = reinterpret_cast<uptr>(CurrentTSD);
				#elif SCUDO_LINUX
				ThreadTSD = CurrentTSD;
				#else
				CHECK_EQ(
				pthread_setspecific(PThreadKey, reinterpret_cast<void *>(CurrentTSD)),
				0);
				#endif
				}

				ALWAYS_INLINE TSD<Allocator> *getCurrentTSD() {
				#if SCUDO_ANDROID
				return reinterpret_cast<TSD<Allocator> *>(*getAndroidTlsPtr());
				#elif SCUDO_LINUX
				return ThreadTSD;
				#else
				return reinterpret_cast<TSD<Allocator> *>(pthread_getspecific(PThreadKey));
				#endif
				}

				void initOnceMaybe(Allocator *Instance) {
				SpinMutexLock L(&Mutex);
				if (Initialized)
				return;
				initLinkerInitialized(Instance); // Sets Initialized.
				}

				NOINLINE void initThread(Allocator *Instance) {
				initOnceMaybe(Instance);
				// Initial context assignment is done in a plain round-robin fashion.
				const u32 Index = atomic_fetch_add(&CurrentIndex, 1U, memory_order_relaxed);
				setCurrentTSD(&TSDs[Index % NumberOfTSDs]);
				}

				NOINLINE TSD<Allocator> *getTSDAndLockSlow(TSD<Allocator> *CurrentTSD) {
				if (MaxTSDCount > 1U && NumberOfTSDs > 1U) {
				// Use the Precedence of the current TSD as our random seed. Since we are
				// in the slow path, it means that tryLock failed, and as a result it's
				// very likely that said Precedence is non-zero.
				u32 RandState = static_cast<u32>(CurrentTSD->getPrecedence());
				const u32 R = getRandomU32(&RandState);
				const u32 Inc = CoPrimes[R % NumberOfCoPrimes];
				u32 Index = R % NumberOfTSDs;
				uptr LowestPrecedence = UINTPTR_MAX;
				TSD<Allocator> *CandidateTSD = nullptr;
				// Go randomly through at most 4 contexts and find a candidate.
				for (u32 I = 0; I < Min(4U, NumberOfTSDs); I++) {
				if (TSDs[Index].tryLock()) {
				setCurrentTSD(&TSDs[Index]);
				return &TSDs[Index];
				}
				const uptr Precedence = TSDs[Index].getPrecedence();
				// A 0 precedence here means another thread just locked this TSD. Skip it
				// as a candidate, but still advance Index below.
				if (Precedence && Precedence < LowestPrecedence) {
				CandidateTSD = &TSDs[Index];
				LowestPrecedence = Precedence;
				}
				Index += Inc;
				if (Index >= NumberOfTSDs)
				Index -= NumberOfTSDs;
				}
				if (CandidateTSD) {
				CandidateTSD->lock();
				setCurrentTSD(CandidateTSD);
				return CandidateTSD;
				}
				}
				// Last resort, stick with the current one.
				CurrentTSD->lock();
				return CurrentTSD;
				}

				pthread_key_t PThreadKey;
				atomic_u32 CurrentIndex;
				u32 NumberOfTSDs;
				TSD<Allocator> *TSDs;
				u32 NumberOfCoPrimes;
				u32 CoPrimes[MaxTSDCount];
				bool Initialized;
				StaticSpinMutex Mutex;
				#if SCUDO_LINUX && !SCUDO_ANDROID
				static THREADLOCAL TSD<Allocator> *ThreadTSD;
				#endif
				};

				#if SCUDO_LINUX && !SCUDO_ANDROID
				template <class Allocator, u32 MaxTSDCount>
				THREADLOCAL TSD<Allocator>
				*TSDRegistrySharedT<Allocator, MaxTSDCount>::ThreadTSD;
				#endif

				} // namespace scudo

				#endif // SCUDO_TSD_SHARED_H_
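
The coprime-based walk used by initLinkerInitialized() and getTSDAndLockSlow() above can be sketched in isolation. The helpers below (gcd, computeCoPrimes, walk) are hypothetical names written for illustration and use std::vector rather than the fixed-size CoPrimes array. They show why stepping by an increment coprime with the array size visits every slot exactly once.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Euclid's algorithm, same loop shape as in the registry initialization.
static uint32_t gcd(uint32_t A, uint32_t B) {
  while (B != 0) {
    const uint32_t T = A;
    A = B;
    B = T % B;
  }
  return A;
}

// Returns every value in [1, N] that is coprime with N.
static std::vector<uint32_t> computeCoPrimes(uint32_t N) {
  std::vector<uint32_t> Result;
  for (uint32_t I = 1; I <= N; I++)
    if (gcd(I, N) == 1)
      Result.push_back(I);
  return Result;
}

// Visits N slots starting at Start, stepping by Inc modulo N, and returns the
// visit order. When Inc is coprime with N, each slot appears exactly once.
static std::vector<uint32_t> walk(uint32_t N, uint32_t Start, uint32_t Inc) {
  assert(N > 0 && Start < N && Inc >= 1 && Inc <= N);
  std::vector<uint32_t> Order;
  uint32_t Index = Start;
  for (uint32_t I = 0; I < N; I++) {
    Order.push_back(Index);
    Index += Inc;
    if (Index >= N) // Inc <= N, so a single subtraction is enough.
      Index -= N;
  }
  return Order;
}
```

For example, computeCoPrimes(8) yields 1, 3, 5, 7, and walk(8, 2, 3) visits 2, 5, 0, 3, 6, 1, 4, 7. The registry itself only probes the first few slots of such a walk (at most 4) rather than the whole sequence.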