This is an archive of the discontinued LLVM Phabricator instance.

[Sanitizers] Allocator: new "release memory to OS" implementation
Closed, Public

Authored by alekseyshl on Sep 25 2017, 11:30 AM.

Details

Reviewers

eugenis
cryptoad
dvyukov

Commits

rG04ce5ac30690: [Sanitizers] Allocator: new "release memory to OS" implementation
rCRT314311: [Sanitizers] Allocator: new "release memory to OS" implementation
rL314311: [Sanitizers] Allocator: new "release memory to OS" implementation

Summary

The current implementation of the allocator returning freed memory
back to the OS (controlled by the allocator_release_to_os_interval_ms flag)
requires sorting the list of free chunks, which has two major issues.
First, when the free list grows to millions of chunks, sorting, even the
fastest one, is just too slow. Second, sorting chunks in place
is unacceptable for the Scudo allocator, as it makes allocations more
predictable and less secure.

The proposed approach is linear in complexity (although it requires quite
a bit more temporary memory). The idea is to count the number of free
chunks on each memory page and release only the pages that consist entirely
of free chunks. It requires one iteration over the free list of chunks and one
iteration over the array of page counters. The obvious disadvantage
is the allocation of the counter array, but even the worst
case we support (4T allocator space, 64 buckets, 16-byte bucket size,
full free list, which leads to 2 bytes per page counter and ~17M page
counters) requires only about 34 MB for the intermediate buffer (compared
to ~64 GB of actually allocated chunks). Usually the buffer stays under 100K,
and it is released after each use. Releasing memory back to the OS is expected
to be a relatively rare event, so keeping the buffer between those runs,
with the added complexity of the bookkeeping, seems unnecessary here (it can
always be improved later, though, never say never).
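
The two passes described above can be sketched as follows. This is a minimal illustration, not the code from the patch: FindReleasablePageRanges is a hypothetical name, chunks are given as plain byte offsets rather than compact pointers, counters are stored unpacked, and the page size is assumed to be a multiple of the chunk size so that every page holds the same number of chunks.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the patch). Given the byte offsets of free
// chunks within a region, returns [first_page, last_page) ranges of pages
// that contain free chunks only.
// Assumption: page_size % chunk_size == 0, so each page holds exactly
// page_size / chunk_size chunks and no chunk crosses a page boundary.
std::vector<std::pair<size_t, size_t>> FindReleasablePageRanges(
    const std::vector<uint64_t>& free_chunk_offsets, uint64_t chunk_size,
    uint64_t page_size, size_t allocated_pages) {
  const uint64_t chunks_per_page = page_size / chunk_size;
  std::vector<uint32_t> counters(allocated_pages, 0);

  // Pass 1: a single iteration over the (unsorted) free list.
  for (uint64_t offset : free_chunk_offsets)
    counters[offset / page_size]++;

  // Pass 2: a single iteration over the counters, merging adjacent
  // fully free pages into ranges.
  std::vector<std::pair<size_t, size_t>> ranges;
  size_t range_start = 0;
  bool in_range = false;
  for (size_t page = 0; page < allocated_pages; page++) {
    const bool fully_free = counters[page] == chunks_per_page;
    if (fully_free && !in_range) {
      range_start = page;
      in_range = true;
    } else if (!fully_free && in_range) {
      ranges.emplace_back(range_start, page);
      in_range = false;
    }
  }
  if (in_range)
    ranges.emplace_back(range_start, allocated_pages);
  return ranges;
}
```

Neither pass depends on the order of the free list, which is why no sorting is needed.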

The most interesting problem here is how to calculate the number of chunks
falling into each memory page in the bucket. Skipping all the details,
there are three cases when the number of chunks per page is constant:

P >= C, P % C == 0 --> N = P / C
C > P, C % P == 0 --> N = 1
C <= P, P % C != 0 && C % (P % C) == 0 --> N = P / C + 1

where P is the page size, C is the chunk size and N is the number of chunks per
page. In all other cases, the number of chunks per page is
calculated on the go, during the page counter array iteration.
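
As a rough sketch, the three constant cases can be written as a single function. ConstantChunksPerPage is a hypothetical name chosen for this illustration, and returning 0 for "not constant" is likewise an assumption of the sketch, not something the patch does.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the patch). P is the page size, C is the
// chunk size. Returns the number of chunks N touching every page when that
// number is the same for all pages, or 0 when it varies from page to page
// and has to be computed during the counter array iteration.
uint64_t ConstantChunksPerPage(uint64_t P, uint64_t C) {
  if (C <= P && P % C == 0)
    return P / C;      // Chunks tile the page exactly.
  if (C > P && C % P == 0)
    return 1;          // One chunk spans a whole number of pages.
  if (C <= P && P % C != 0 && C % (P % C) == 0)
    return P / C + 1;  // Chunks cross page boundaries, but the count is fixed.
  return 0;            // Count varies per page.
}
```

For example, with 4096-byte pages, 16-byte chunks give N = 256, 8192-byte chunks give N = 1, 48-byte chunks give N = 86, and 96-byte chunks fall outside the three cases.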

Among the rest, there are still cases where N can be deduced from the
page index, but they require almost as many calculations per page
as the current "brute force" way, and 2/3 of the buckets fall into
the first three categories anyway. So, for the sake of simplicity,
it was decided to stick to those two variations (constant count and brute
force). It can always be
refined and improved later, should we see that the brute force way slows
us down unacceptably.
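
To see why the count varies in the remaining cases, the number of chunks overlapping a given page can be computed directly from the page index. This is only an illustration under the assumption that chunks are laid out back to back from offset 0 of the region. ChunksTouchingPage is a hypothetical name, and the patch itself uses an incremental walk over page boundaries rather than per-page divisions.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the patch). Returns the number of chunks
// of size C that overlap page number page_index, where P is the page size.
// Assumption: chunks are placed contiguously starting at offset 0.
uint64_t ChunksTouchingPage(uint64_t page_index, uint64_t P, uint64_t C) {
  const uint64_t page_begin = page_index * P;
  const uint64_t page_end = page_begin + P;        // Exclusive.
  const uint64_t first_chunk = page_begin / C;     // Chunk covering the first byte.
  const uint64_t last_chunk = (page_end - 1) / C;  // Chunk covering the last byte.
  return last_chunk - first_chunk + 1;
}
```

With 4096-byte pages and 96-byte chunks, pages 0, 1 and 2 are touched by 43, 44 and 43 chunks respectively, so a single expected value cannot be used for every page.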

Diff Detail

Repository: rL LLVM

Event Timeline

alekseyshl created this revision. Sep 25 2017, 11:30 AM

Herald added subscribers: mehdi_amini, kubamracek. Sep 25 2017, 11:30 AM

Harbormaster completed remote builds in B10574: Diff 116579. Sep 25 2017, 11:30 AM

I do not have comments on the code, it looks good to me.

There are a few points I was thinking about, not in the sense of suggestions for this CL, but rather general wondering:

1. How about bumping last_release_at_ns when a region is grown? This would prevent reclaiming from occurring shortly after, and could lower the chances of releasing pages that were just allocated.
2. Could there be enough room in the free array to not have to map memory? E.g., the memory between num_freed_chunks*sizeof(CompactPtrT) and mapped_free_array could be enough. I am not sure it would necessarily be a gain, except maybe in memory usage. But since the memory is unmapped right after, it might not be worth it.
3. Did the tests show in any way what could be a good balance, timing/CPU wise, to set a default to?

Why are you mapping the counters in a specific place just after the free array, instead of simply anywhere? I.e. why MAP_FIXED at all?

lib/sanitizer_common/sanitizer_allocator_primary64.h
317 ↗	(On Diff #116579)	Either use a named constant instead of UINT64_MAX, or use 8 instead of sizeof(*buffer).
772 ↗	(On Diff #116579)	These are const pointers to mutable objects. What's the point?
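
For readers following the inline comment about the counter width, the packing idea can be sketched as below. This is a simplified stand-in, not the PackedCounterArray from the diff: the class name PackedCounters is hypothetical, it stores words in a std::vector instead of a mapped buffer, and it performs no bounds or overflow checks.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified, hypothetical stand-in for a packed counter array. Each counter
// occupies a power-of-two number of bits (at most 64), so the word index and
// the bit offset of a counter can be computed with shifts and masks.
class PackedCounters {
 public:
  PackedCounters(uint64_t num_counters, uint64_t max_value) {
    // Smallest power-of-two width (in bits) able to hold max_value.
    uint64_t bits = 1;
    counter_bits_log_ = 0;
    while (bits < 64 && (max_value >> bits) != 0) {
      bits *= 2;
      counter_bits_log_++;
    }
    counter_mask_ = bits == 64 ? ~0ULL : (1ULL << bits) - 1;
    per_word_log_ = 6 - counter_bits_log_;  // log2(64 / bits)
    const uint64_t per_word = 1ULL << per_word_log_;
    index_mask_ = per_word - 1;
    words_.assign((num_counters + per_word - 1) / per_word, 0);
  }

  uint64_t Get(uint64_t i) const {
    const uint64_t offset = (i & index_mask_) << counter_bits_log_;
    return (words_[i >> per_word_log_] >> offset) & counter_mask_;
  }

  // The caller must not increment a counter past max_value.
  void Inc(uint64_t i) {
    const uint64_t offset = (i & index_mask_) << counter_bits_log_;
    words_[i >> per_word_log_] += 1ULL << offset;
  }

  size_t SizeInBytes() const { return words_.size() * sizeof(uint64_t); }

 private:
  uint64_t counter_bits_log_;
  uint64_t counter_mask_;
  uint64_t per_word_log_;
  uint64_t index_mask_;
  std::vector<uint64_t> words_;
};
```

Here the literal 64 plays the role of the "bits per buffer word" constant discussed in the comment, and a named constant would serve the same purpose.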

looks good to me

Switched temporary counter buffer from fixed to random mapping and addressed minor style comments.

In D38245#880753, @eugenis wrote:

Why are you mapping the counters in a specific place just after the free array, instead of simply anywhere? I.e. why MAP_FIXED at all?

The original idea was to keep it mapped between iterations, but yes, it does not make much sense anymore and we can always get back to it later. Switched to just mapping.

lib/sanitizer_common/sanitizer_allocator_primary64.h
772 ↗	(On Diff #116579)	The point is to indicate that this pointer does not change, but you just reminded me that I was actually going to make CompactPtrToPointer et al. const and switch to const refs. Done.

In D38245#880666, @cryptoad wrote:

I do not have comments on the code, it looks good to me.

There are a few points I was thinking about, not in the sense of suggestions for this CL, but rather general wondering:

1. How about bumping last_release_at_ns when a region is grown? This would prevent reclaiming from occurring shortly after, and could lower the chances of releasing pages that were just allocated.

2. Could there be enough room in the free array to not have to map memory? E.g., the memory between num_freed_chunks*sizeof(CompactPtrT) and mapped_free_array could be enough. I am not sure it would necessarily be a gain, except maybe in memory usage. But since the memory is unmapped right after, it might not be worth it.

3. Did the tests show in any way what could be a good balance, timing/CPU wise, to set a default to?

1. Bumping last_release_at_ns makes sense, let me think about it a bit more.
2. I'd rather not complicate the code just yet, but I like the idea of using whatever is already mapped. Will add a comment about it.
3. This version is pretty much linear in the number of pages allocated; everything else does not matter that much, so the more memory is allocated, the slower it is. Compared to the previous version, even the larger apps I tried spent an insignificant amount of time releasing pages. I'd like to collect more diverse data before drawing any conclusions (fuzzer would make a great test app).

Add a couple TODOs

Regarding 1), decided to have a TODO for now. I am concerned about the case when free_list is growing relatively slowly, but fast enough to block release to OS (on the other hand, if it grows constantly, it means that we should not release anything yet anyway). I'd rather do it as a separate experiment and a separate patch.
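
The concern can be illustrated with a toy model of the release interval check. Everything here is hypothetical: ReleaseGate, OnRegionGrown and TryRelease are names invented for this sketch, and time is an abstract counter rather than NanoTime().

```cpp
#include <cstdint>

// Toy model (not part of the patch) of gating releases by a minimum interval,
// with an optional "bump the timestamp when the region grows" policy.
class ReleaseGate {
 public:
  ReleaseGate(uint64_t interval, bool bump_on_growth)
      : interval_(interval), bump_on_growth_(bump_on_growth),
        last_release_at_(0) {}

  // Called when the region is grown to provide new chunks.
  void OnRegionGrown(uint64_t now) {
    if (bump_on_growth_)
      last_release_at_ = now;
  }

  // Returns true if a release is allowed at time now, and records it.
  bool TryRelease(uint64_t now) {
    if (last_release_at_ + interval_ > now)
      return false;  // Memory was returned (or the region grew) recently.
    last_release_at_ = now;
    return true;
  }

 private:
  const uint64_t interval_;
  const bool bump_on_growth_;
  uint64_t last_release_at_;
};
```

In this model, if the region grows more often than once per interval, the bumping policy never lets TryRelease succeed, while the non-bumping policy still releases periodically.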

Harbormaster completed remote builds in B10603: Diff 116687. Sep 26 2017, 3:31 PM

Harbormaster completed remote builds in B10604: Diff 116690.

LGTM on my side.

This revision is now accepted and ready to land. Sep 26 2017, 3:37 PM

eugenis accepted this revision. Sep 26 2017, 4:23 PM

Closed by commit rL314311: [Sanitizers] Allocator: new "release memory to OS" implementation (authored by alekseyshl). Sep 27 2017, 8:39 AM

This revision was automatically updated to reflect the committed changes.

alekseyshl mentioned this in rL314318: [Sanitizer] Disable compact size class tests on Android. Sep 27 2017, 10:12 AM

alekseyshl mentioned this in D39318: [Sanitizers] Set default allocator_release_to_os_interval_ms to 5 seconds. Oct 25 2017, 9:48 PM

alekseyshl mentioned this in rL316683: [Sanitizers] Set default allocator_release_to_os_interval_ms to 5 seconds. Oct 26 2017, 10:59 AM

Revision Contents

Path	Size
compiler-rt/trunk/lib/sanitizer_common/sanitizer_allocator_primary64.h	343 lines
compiler-rt/trunk/lib/sanitizer_common/tests/sanitizer_allocator_test.cc	275 lines

Diff 116821

compiler-rt/trunk/lib/sanitizer_common/sanitizer_allocator_primary64.h

(56 lines not shown)	public:

typedef SizeClassAllocator64<Params> ThisT;		typedef SizeClassAllocator64<Params> ThisT;
typedef SizeClassAllocator64LocalCache<ThisT> AllocatorCache;		typedef SizeClassAllocator64LocalCache<ThisT> AllocatorCache;

// When we know the size class (the region base) we can represent a pointer		// When we know the size class (the region base) we can represent a pointer
// as a 4-byte integer (offset from the region start shifted right by 4).		// as a 4-byte integer (offset from the region start shifted right by 4).
typedef u32 CompactPtrT;		typedef u32 CompactPtrT;
static const uptr kCompactPtrScale = 4;		static const uptr kCompactPtrScale = 4;
CompactPtrT PointerToCompactPtr(uptr base, uptr ptr) {		CompactPtrT PointerToCompactPtr(uptr base, uptr ptr) const {
return static_cast<CompactPtrT>((ptr - base) >> kCompactPtrScale);		return static_cast<CompactPtrT>((ptr - base) >> kCompactPtrScale);
}		}
uptr CompactPtrToPointer(uptr base, CompactPtrT ptr32) {		uptr CompactPtrToPointer(uptr base, CompactPtrT ptr32) const {
return base + (static_cast<uptr>(ptr32) << kCompactPtrScale);		return base + (static_cast<uptr>(ptr32) << kCompactPtrScale);
}		}

void Init(s32 release_to_os_interval_ms) {		void Init(s32 release_to_os_interval_ms) {
uptr TotalSpaceSize = kSpaceSize + AdditionalSize();		uptr TotalSpaceSize = kSpaceSize + AdditionalSize();
if (kUsingConstantSpaceBeg) {		if (kUsingConstantSpaceBeg) {
CHECK_EQ(kSpaceBeg, reinterpret_cast<uptr>(		CHECK_EQ(kSpaceBeg, reinterpret_cast<uptr>(
MmapFixedNoAccess(kSpaceBeg, TotalSpaceSize)));		MmapFixedNoAccess(kSpaceBeg, TotalSpaceSize)));
(73 lines not shown)	public:
uptr GetRegionBegin(const void *p) {		uptr GetRegionBegin(const void *p) {
if (kUsingConstantSpaceBeg)		if (kUsingConstantSpaceBeg)
return reinterpret_cast<uptr>(p) & ~(kRegionSize - 1);		return reinterpret_cast<uptr>(p) & ~(kRegionSize - 1);
uptr space_beg = SpaceBeg();		uptr space_beg = SpaceBeg();
return ((reinterpret_cast<uptr>(p) - space_beg) & ~(kRegionSize - 1)) +		return ((reinterpret_cast<uptr>(p) - space_beg) & ~(kRegionSize - 1)) +
space_beg;		space_beg;
}		}

uptr GetRegionBeginBySizeClass(uptr class_id) {		uptr GetRegionBeginBySizeClass(uptr class_id) const {
return SpaceBeg() + kRegionSize * class_id;		return SpaceBeg() + kRegionSize * class_id;
}		}

uptr GetSizeClass(const void *p) {		uptr GetSizeClass(const void *p) {
if (kUsingConstantSpaceBeg && (kSpaceBeg % kSpaceSize) == 0)		if (kUsingConstantSpaceBeg && (kSpaceBeg % kSpaceSize) == 0)
return ((reinterpret_cast<uptr>(p)) / kRegionSize) % kNumClassesRounded;		return ((reinterpret_cast<uptr>(p)) / kRegionSize) % kNumClassesRounded;
return ((reinterpret_cast<uptr>(p) - SpaceBeg()) / kRegionSize) %		return ((reinterpret_cast<uptr>(p) - SpaceBeg()) / kRegionSize) %
kNumClassesRounded;		kNumClassesRounded;
(122 lines not shown)	static uptr AdditionalSize() {
return RoundUpTo(sizeof(RegionInfo) * kNumClassesRounded,		return RoundUpTo(sizeof(RegionInfo) * kNumClassesRounded,
GetPageSizeCached());		GetPageSizeCached());
}		}

typedef SizeClassMap SizeClassMapT;		typedef SizeClassMap SizeClassMapT;
static const uptr kNumClasses = SizeClassMap::kNumClasses;		static const uptr kNumClasses = SizeClassMap::kNumClasses;
static const uptr kNumClassesRounded = SizeClassMap::kNumClassesRounded;		static const uptr kNumClassesRounded = SizeClassMap::kNumClassesRounded;

		// A packed array of counters. Each counter occupies 2^n bits, enough to store
		// counter's max_value. Ctor will try to allocate the required buffer via
		// mapper->MapPackedCounterArrayBuffer and the caller is expected to check
		// whether the initialization was successful by checking IsAllocated() result.
		// For the performance sake, none of the accessors check the validity of the
		// arguments, it is assumed that index is always in [0, n) range and the value
		// is not incremented past max_value.
		template<class MemoryMapperT>
		class PackedCounterArray {
		public:
		PackedCounterArray(u64 num_counters, u64 max_value, MemoryMapperT *mapper)
		: n(num_counters), memory_mapper(mapper) {
		CHECK_GT(num_counters, 0);
		CHECK_GT(max_value, 0);
		constexpr u64 kMaxCounterBits = sizeof(*buffer) * 8ULL;
		// Rounding counter storage size up to the power of two allows for using
		// bit shifts calculating particular counter's index and offset.
		uptr counter_size_bits =
		RoundUpToPowerOfTwo(MostSignificantSetBitIndex(max_value) + 1);
		CHECK_LE(counter_size_bits, kMaxCounterBits);
		counter_size_bits_log = Log2(counter_size_bits);
		counter_mask = ~0ULL >> (kMaxCounterBits - counter_size_bits);

		uptr packing_ratio = kMaxCounterBits >> counter_size_bits_log;
		CHECK_GT(packing_ratio, 0);
		packing_ratio_log = Log2(packing_ratio);
		bit_offset_mask = packing_ratio - 1;

		buffer_size =
		(RoundUpTo(n, 1ULL << packing_ratio_log) >> packing_ratio_log) *
		sizeof(*buffer);
		buffer = reinterpret_cast<u64*>(
		memory_mapper->MapPackedCounterArrayBuffer(buffer_size));
		}
		~PackedCounterArray() {
		if (buffer) {
		memory_mapper->UnmapPackedCounterArrayBuffer(
		reinterpret_cast<uptr>(buffer), buffer_size);
		}
		}

		bool IsAllocated() const {
		return !!buffer;
		}

		u64 GetCount() const {
		return n;
		}

		uptr Get(uptr i) const {
		DCHECK_LT(i, n);
		uptr index = i >> packing_ratio_log;
		uptr bit_offset = (i & bit_offset_mask) << counter_size_bits_log;
		return (buffer[index] >> bit_offset) & counter_mask;
		}

		void Inc(uptr i) const {
		DCHECK_LT(Get(i), counter_mask);
		uptr index = i >> packing_ratio_log;
		uptr bit_offset = (i & bit_offset_mask) << counter_size_bits_log;
		buffer[index] += 1ULL << bit_offset;
		}

		void IncRange(uptr from, uptr to) const {
		DCHECK_LE(from, to);
		for (uptr i = from; i <= to; i++)
		Inc(i);
		}

		private:
		const u64 n;
		u64 counter_size_bits_log;
		u64 counter_mask;
		u64 packing_ratio_log;
		u64 bit_offset_mask;

		MemoryMapperT* const memory_mapper;
		u64 buffer_size;
		u64* buffer;
		};

		template<class MemoryMapperT>
		class FreePagesRangeTracker {
		public:
		explicit FreePagesRangeTracker(MemoryMapperT* mapper)
		: memory_mapper(mapper),
		page_size_scaled_log(Log2(GetPageSizeCached() >> kCompactPtrScale)),
		in_the_range(false), current_page(0), current_range_start_page(0) {}

		void NextPage(bool freed) {
		if (freed) {
		if (!in_the_range) {
		current_range_start_page = current_page;
		in_the_range = true;
		}
		} else {
		CloseOpenedRange();
		}
		current_page++;
		}

		void Done() {
		CloseOpenedRange();
		}

		private:
		void CloseOpenedRange() {
		if (in_the_range) {
		memory_mapper->ReleasePageRangeToOS(
		current_range_start_page << page_size_scaled_log,
		current_page << page_size_scaled_log);
		in_the_range = false;
		}
		}

		MemoryMapperT* const memory_mapper;
		const uptr page_size_scaled_log;
		bool in_the_range;
		uptr current_page;
		uptr current_range_start_page;
		};

		// Iterates over the free_array to identify memory pages containing freed
		// chunks only and returns these pages back to OS.
		// allocated_pages_count is the total number of pages allocated for the
		// current bucket.
		template<class MemoryMapperT>
		static void ReleaseFreeMemoryToOS(CompactPtrT *free_array,
		uptr free_array_count, uptr chunk_size,
		uptr allocated_pages_count,
		MemoryMapperT *memory_mapper) {
		const uptr page_size = GetPageSizeCached();

		// Figure out the number of chunks per page and whether we can take a fast
		// path (the number of chunks per page is the same for all pages).
		uptr full_pages_chunk_count_max;
		bool same_chunk_count_per_page;
		if (chunk_size <= page_size && page_size % chunk_size == 0) {
		// Same number of chunks per page, no cross overs.
		full_pages_chunk_count_max = page_size / chunk_size;
		same_chunk_count_per_page = true;
		} else if (chunk_size <= page_size && page_size % chunk_size != 0 &&
		chunk_size % (page_size % chunk_size) == 0) {
		// Some chunks are crossing page boundaries, which means that the page
		// contains one or two partial chunks, but all pages contain the same
		// number of chunks.
		full_pages_chunk_count_max = page_size / chunk_size + 1;
		same_chunk_count_per_page = true;
		} else if (chunk_size <= page_size) {
		// Some chunks are crossing page boundaries, which means that the page
		// contains one or two partial chunks.
		full_pages_chunk_count_max = page_size / chunk_size + 2;
		same_chunk_count_per_page = false;
		} else if (chunk_size > page_size && chunk_size % page_size == 0) {
		// One chunk covers multiple pages, no cross overs.
		full_pages_chunk_count_max = 1;
		same_chunk_count_per_page = true;
		} else if (chunk_size > page_size) {
		// One chunk covers multiple pages, Some chunks are crossing page
		// boundaries. Some pages contain one chunk, some contain two.
		full_pages_chunk_count_max = 2;
		same_chunk_count_per_page = false;
		} else {
		UNREACHABLE("All chunk_size/page_size ratios must be handled.");
		}

		PackedCounterArray<MemoryMapperT> counters(allocated_pages_count,
		full_pages_chunk_count_max,
		memory_mapper);
		if (!counters.IsAllocated())
		return;

		const uptr chunk_size_scaled = chunk_size >> kCompactPtrScale;
		const uptr page_size_scaled = page_size >> kCompactPtrScale;
		const uptr page_size_scaled_log = Log2(page_size_scaled);

		// Iterate over free chunks and count how many free chunks affect each
		// allocated page.
		if (chunk_size <= page_size && page_size % chunk_size == 0) {
		// Each chunk affects one page only.
		for (uptr i = 0; i < free_array_count; i++)
		counters.Inc(free_array[i] >> page_size_scaled_log);
		} else {
		// In all other cases chunks might affect more than one page.
		for (uptr i = 0; i < free_array_count; i++) {
		counters.IncRange(
		free_array[i] >> page_size_scaled_log,
		(free_array[i] + chunk_size_scaled - 1) >> page_size_scaled_log);
		}
		}

		// Iterate over pages detecting ranges of pages with chunk counters equal
		// to the expected number of chunks for the particular page.
		FreePagesRangeTracker<MemoryMapperT> range_tracker(memory_mapper);
		if (same_chunk_count_per_page) {
		// Fast path, every page has the same number of chunks affecting it.
		for (uptr i = 0; i < counters.GetCount(); i++)
		range_tracker.NextPage(counters.Get(i) == full_pages_chunk_count_max);
		} else {
		// Slow path, go through the pages keeping count how many chunks affect
		// each page.
		const uptr pn =
		chunk_size < page_size ? page_size_scaled / chunk_size_scaled : 1;
		const uptr pnc = pn * chunk_size_scaled;
		// The idea is to increment the current page pointer by the first chunk
		// size, middle portion size (the portion of the page covered by chunks
		// except the first and the last one) and then the last chunk size, adding
		// up the number of chunks on the current page and checking on every step
		// whether the page boundary was crossed.
		uptr prev_page_boundary = 0;
		uptr current_boundary = 0;
		for (uptr i = 0; i < counters.GetCount(); i++) {
		uptr page_boundary = prev_page_boundary + page_size_scaled;
		uptr chunks_per_page = pn;
		if (current_boundary < page_boundary) {
		if (current_boundary > prev_page_boundary)
		chunks_per_page++;
		current_boundary += pnc;
		if (current_boundary < page_boundary) {
		chunks_per_page++;
		current_boundary += chunk_size_scaled;
		}
		}
		prev_page_boundary = page_boundary;

		range_tracker.NextPage(counters.Get(i) == chunks_per_page);
		}
		}
		range_tracker.Done();
		}

private:		private:
		friend class MemoryMapper;

static const uptr kRegionSize = kSpaceSize / kNumClassesRounded;		static const uptr kRegionSize = kSpaceSize / kNumClassesRounded;
// FreeArray is the array of free-d chunks (stored as 4-byte offsets).		// FreeArray is the array of free-d chunks (stored as 4-byte offsets).
// In the worst case it may require kRegionSize/SizeClassMap::kMinSize		// In the worst case it may require kRegionSize/SizeClassMap::kMinSize
// elements, but in reality this will not happen. For simplicity we		// elements, but in reality this will not happen. For simplicity we
// dedicate 1/8 of the region's virtual space to FreeArray.		// dedicate 1/8 of the region's virtual space to FreeArray.
static const uptr kFreeArraySize = kRegionSize / 8;		static const uptr kFreeArraySize = kRegionSize / 8;

static const bool kUsingConstantSpaceBeg = kSpaceBeg != ~(uptr)0;		static const bool kUsingConstantSpaceBeg = kSpaceBeg != ~(uptr)0;
(48 lines not shown)	private:
u32 RandN(u32 *state, u32 n) { return Rand(state) % n; } // [0, n)		u32 RandN(u32 *state, u32 n) { return Rand(state) % n; } // [0, n)

void RandomShuffle(u32 *a, u32 n, u32 *rand_state) {		void RandomShuffle(u32 *a, u32 n, u32 *rand_state) {
if (n <= 1) return;		if (n <= 1) return;
for (u32 i = n - 1; i > 0; i--)		for (u32 i = n - 1; i > 0; i--)
Swap(a[i], a[RandN(rand_state, i + 1)]);		Swap(a[i], a[RandN(rand_state, i + 1)]);
}		}

RegionInfo *GetRegionInfo(uptr class_id) {		RegionInfo *GetRegionInfo(uptr class_id) const {
CHECK_LT(class_id, kNumClasses);		CHECK_LT(class_id, kNumClasses);
RegionInfo *regions =		RegionInfo *regions =
reinterpret_cast<RegionInfo *>(SpaceBeg() + kSpaceSize);		reinterpret_cast<RegionInfo *>(SpaceBeg() + kSpaceSize);
return &regions[class_id];		return &regions[class_id];
}		}

uptr GetMetadataEnd(uptr region_beg) {		uptr GetMetadataEnd(uptr region_beg) const {
return region_beg + kRegionSize - kFreeArraySize;		return region_beg + kRegionSize - kFreeArraySize;
}		}

uptr GetChunkIdx(uptr chunk, uptr size) {		uptr GetChunkIdx(uptr chunk, uptr size) const {
if (!kUsingConstantSpaceBeg)		if (!kUsingConstantSpaceBeg)
chunk -= SpaceBeg();		chunk -= SpaceBeg();

uptr offset = chunk % kRegionSize;		uptr offset = chunk % kRegionSize;
// Here we divide by a non-constant. This is costly.		// Here we divide by a non-constant. This is costly.
// size always fits into 32-bits. If the offset fits too, use 32-bit div.		// size always fits into 32-bits. If the offset fits too, use 32-bit div.
if (offset >> (SANITIZER_WORDSIZE / 2))		if (offset >> (SANITIZER_WORDSIZE / 2))
return offset / size;		return offset / size;
return (u32)offset / (u32)size;		return (u32)offset / (u32)size;
}		}

CompactPtrT *GetFreeArray(uptr region_beg) {		CompactPtrT *GetFreeArray(uptr region_beg) const {
return reinterpret_cast<CompactPtrT *>(region_beg + kRegionSize -		return reinterpret_cast<CompactPtrT *>(GetMetadataEnd(region_beg));
kFreeArraySize);
}		}

bool MapWithCallback(uptr beg, uptr size) {		bool MapWithCallback(uptr beg, uptr size) {
uptr mapped = reinterpret_cast<uptr>(MmapFixedOrDieOnFatalError(beg, size));		uptr mapped = reinterpret_cast<uptr>(MmapFixedOrDieOnFatalError(beg, size));
if (UNLIKELY(!mapped))		if (UNLIKELY(!mapped))
return false;		return false;
CHECK_EQ(beg, mapped);		CHECK_EQ(beg, mapped);
MapUnmapCallback().OnMap(beg, size);		MapUnmapCallback().OnMap(beg, size);
(9 lines not shown)	void UnmapWithCallbackOrDie(uptr beg, uptr size) {
MapUnmapCallback().OnUnmap(beg, size);		MapUnmapCallback().OnUnmap(beg, size);
UnmapOrDie(reinterpret_cast<void *>(beg), size);		UnmapOrDie(reinterpret_cast<void *>(beg), size);
}		}

bool EnsureFreeArraySpace(RegionInfo *region, uptr region_beg,		bool EnsureFreeArraySpace(RegionInfo *region, uptr region_beg,
uptr num_freed_chunks) {		uptr num_freed_chunks) {
uptr needed_space = num_freed_chunks * sizeof(CompactPtrT);		uptr needed_space = num_freed_chunks * sizeof(CompactPtrT);
if (region->mapped_free_array < needed_space) {		if (region->mapped_free_array < needed_space) {
CHECK_LE(needed_space, kFreeArraySize);
uptr new_mapped_free_array = RoundUpTo(needed_space, kFreeArrayMapSize);		uptr new_mapped_free_array = RoundUpTo(needed_space, kFreeArrayMapSize);
		CHECK_LE(new_mapped_free_array, kFreeArraySize);
uptr current_map_end = reinterpret_cast<uptr>(GetFreeArray(region_beg)) +		uptr current_map_end = reinterpret_cast<uptr>(GetFreeArray(region_beg)) +
region->mapped_free_array;		region->mapped_free_array;
uptr new_map_size = new_mapped_free_array - region->mapped_free_array;		uptr new_map_size = new_mapped_free_array - region->mapped_free_array;
if (UNLIKELY(!MapWithCallback(current_map_end, new_map_size)))		if (UNLIKELY(!MapWithCallback(current_map_end, new_map_size)))
return false;		return false;
region->mapped_free_array = new_mapped_free_array;		region->mapped_free_array = new_mapped_free_array;
}		}
return true;		return true;
(67 lines not shown)	NOINLINE bool PopulateFreeArray(AllocatorStats *stat, uptr class_id,
// 'allocated_*' counters.		// 'allocated_*' counters.
region->num_freed_chunks += new_chunks_count;		region->num_freed_chunks += new_chunks_count;
region->allocated_user += new_chunks_count * size;		region->allocated_user += new_chunks_count * size;
CHECK_LE(region->allocated_user, region->mapped_user);		CHECK_LE(region->allocated_user, region->mapped_user);
region->allocated_meta = requested_allocated_meta;		region->allocated_meta = requested_allocated_meta;
CHECK_LE(region->allocated_meta, region->mapped_meta);		CHECK_LE(region->allocated_meta, region->mapped_meta);
region->exhausted = false;		region->exhausted = false;

		// TODO(alekseyshl): Consider bumping last_release_at_ns here to prevent
		// MaybeReleaseToOS from releasing just allocated pages or protect these
		// not yet used chunks some other way.

return true;		return true;
}		}

void MaybeReleaseChunkRange(uptr region_beg, uptr chunk_size,		class MemoryMapper {
CompactPtrT first, CompactPtrT last) {		public:
uptr beg_ptr = CompactPtrToPointer(region_beg, first);		MemoryMapper(const ThisT& base_allocator, uptr class_id)
uptr end_ptr = CompactPtrToPointer(region_beg, last) + chunk_size;		: allocator(base_allocator),
ReleaseMemoryPagesToOS(beg_ptr, end_ptr);		region_base(base_allocator.GetRegionBeginBySizeClass(class_id)),
		released_ranges_count(0) {
		}

		uptr GetReleasedRangesCount() const {
		return released_ranges_count;
		}

		uptr MapPackedCounterArrayBuffer(uptr buffer_size) {
		// TODO(alekseyshl): The idea to explore is to check if we have enough
		// space between num_freed_chunks*sizeof(CompactPtrT) and
		// mapped_free_array to fit buffer_size bytes and use that space instead
		// of mapping a temporary one.
		return reinterpret_cast<uptr>(
		MmapOrDieOnFatalError(buffer_size, "ReleaseToOSPageCounters"));
		}

		void UnmapPackedCounterArrayBuffer(uptr buffer, uptr buffer_size) {
		UnmapOrDie(reinterpret_cast<void *>(buffer), buffer_size);
		}

		// Releases [from, to) range of pages back to OS.
		void ReleasePageRangeToOS(CompactPtrT from, CompactPtrT to) {
		ReleaseMemoryPagesToOS(
		allocator.CompactPtrToPointer(region_base, from),
		allocator.CompactPtrToPointer(region_base, to));
		released_ranges_count++;
}		}

// Attempts to release some RAM back to OS. The region is expected to be		private:
// locked.		const ThisT& allocator;
// Algorithm:		const uptr region_base;
// * Sort the chunks.		uptr released_ranges_count;
// * Find ranges fully covered by free-d chunks		};
// * Release them to OS with madvise.
		// Attempts to release RAM occupied by freed chunks back to OS. The region is
		// expected to be locked.
void MaybeReleaseToOS(uptr class_id) {		void MaybeReleaseToOS(uptr class_id) {
RegionInfo *region = GetRegionInfo(class_id);		RegionInfo *region = GetRegionInfo(class_id);
const uptr chunk_size = ClassIdToSize(class_id);		const uptr chunk_size = ClassIdToSize(class_id);
const uptr page_size = GetPageSizeCached();		const uptr page_size = GetPageSizeCached();

uptr n = region->num_freed_chunks;		uptr n = region->num_freed_chunks;
if (n * chunk_size < page_size)		if (n * chunk_size < page_size)
return; // No chance to release anything.		return; // No chance to release anything.
if ((region->stats.n_freed -		if ((region->stats.n_freed -
region->rtoi.n_freed_at_last_release) * chunk_size < page_size) {		region->rtoi.n_freed_at_last_release) * chunk_size < page_size) {
return; // Nothing new to release.		return; // Nothing new to release.
}		}

s32 interval_ms = ReleaseToOSIntervalMs();		s32 interval_ms = ReleaseToOSIntervalMs();
if (interval_ms < 0)		if (interval_ms < 0)
return;		return;

u64 now_ns = NanoTime();		if (region->rtoi.last_release_at_ns + interval_ms * 1000000ULL > NanoTime())
if (region->rtoi.last_release_at_ns + interval_ms * 1000000ULL > now_ns)
return; // Memory was returned recently.		return; // Memory was returned recently.
region->rtoi.last_release_at_ns = now_ns;

uptr region_beg = GetRegionBeginBySizeClass(class_id);		MemoryMapper memory_mapper(*this, class_id);
CompactPtrT *free_array = GetFreeArray(region_beg);
SortArray(free_array, n);

const uptr scaled_chunk_size = chunk_size >> kCompactPtrScale;		ReleaseFreeMemoryToOS<MemoryMapper>(
const uptr kScaledGranularity = page_size >> kCompactPtrScale;		GetFreeArray(GetRegionBeginBySizeClass(class_id)), n, chunk_size,
		RoundUpTo(region->allocated_user, page_size) / page_size,
		&memory_mapper);

uptr range_beg = free_array[0];		if (memory_mapper.GetReleasedRangesCount() > 0) {
uptr prev = free_array[0];
for (uptr i = 1; i < n; i++) {
uptr chunk = free_array[i];
CHECK_GT(chunk, prev);
if (chunk - prev != scaled_chunk_size) {
CHECK_GT(chunk - prev, scaled_chunk_size);
if (prev + scaled_chunk_size - range_beg >= kScaledGranularity) {
MaybeReleaseChunkRange(region_beg, chunk_size, range_beg, prev);
region->rtoi.n_freed_at_last_release = region->stats.n_freed;		region->rtoi.n_freed_at_last_release = region->stats.n_freed;
region->rtoi.num_releases++;		region->rtoi.num_releases += memory_mapper.GetReleasedRangesCount();
}
range_beg = chunk;
}
prev = chunk;
}		}
		region->rtoi.last_release_at_ns = NanoTime();
}		}
};		};

compiler-rt/trunk/lib/sanitizer_common/tests/sanitizer_allocator_test.cc

(14 lines not shown)
#include "sanitizer_common/sanitizer_allocator_internal.h"		#include "sanitizer_common/sanitizer_allocator_internal.h"
#include "sanitizer_common/sanitizer_common.h"		#include "sanitizer_common/sanitizer_common.h"

#include "sanitizer_test_utils.h"		#include "sanitizer_test_utils.h"
#include "sanitizer_pthread_wrappers.h"		#include "sanitizer_pthread_wrappers.h"

#include "gtest/gtest.h"		#include "gtest/gtest.h"

		#include <stdio.h>
#include <stdlib.h>		#include <stdlib.h>
#include <algorithm>		#include <algorithm>
#include <vector>		#include <vector>
#include <random>		#include <random>
#include <set>		#include <set>

using namespace __sanitizer;		using namespace __sanitizer;

(977 lines not shown)	TEST(SanitizerCommon, SizeClassAllocator64PopulateFreeListOOM) {
cache.Drain(a);		cache.Drain(a);
ASSERT_EQ(p[6][Size2 - 1], 42);		ASSERT_EQ(p[6][Size2 - 1], 42);
a->TestOnlyUnmap();		a->TestOnlyUnmap();
delete a;		delete a;
}		}

#endif		#endif

		#if SANITIZER_CAN_USE_ALLOCATOR64

		class NoMemoryMapper {
		public:
		uptr last_request_buffer_size;

		NoMemoryMapper() : last_request_buffer_size(0) {}

		uptr MapPackedCounterArrayBuffer(uptr buffer_size) {
		last_request_buffer_size = buffer_size;
		return 0;
		}
		void UnmapPackedCounterArrayBuffer(uptr buffer, uptr buffer_size) {}
		};

		class RedZoneMemoryMapper {
		public:
		RedZoneMemoryMapper() {
		const auto page_size = GetPageSize();
		buffer = MmapOrDie(3ULL * page_size, "");
		MprotectNoAccess(reinterpret_cast<uptr>(buffer), page_size);
		MprotectNoAccess(reinterpret_cast<uptr>(buffer) + page_size * 2, page_size);
		}
		~RedZoneMemoryMapper() {
		UnmapOrDie(buffer, 3 * GetPageSize());
		}

		uptr MapPackedCounterArrayBuffer(uptr buffer_size) {
		const auto page_size = GetPageSize();
		CHECK_EQ(buffer_size, page_size);
		memset(reinterpret_cast<void*>(reinterpret_cast<uptr>(buffer) + page_size),
		0, page_size);
		return reinterpret_cast<uptr>(buffer) + page_size;
		}
		void UnmapPackedCounterArrayBuffer(uptr buffer, uptr buffer_size) {}

		private:
		void *buffer;
		};

		TEST(SanitizerCommon, SizeClassAllocator64PackedCounterArray) {
		NoMemoryMapper no_memory_mapper;
		typedef Allocator64::PackedCounterArray<NoMemoryMapper>
		NoMemoryPackedCounterArray;

		for (int i = 0; i < 64; i++) {
		// Various valid counter's max values packed into one word.
		NoMemoryPackedCounterArray counters_2n(1, 1ULL << i, &no_memory_mapper);
		EXPECT_EQ(8ULL, no_memory_mapper.last_request_buffer_size);

		// Check the "all bits set" values too.
		NoMemoryPackedCounterArray counters_2n1_1(1, ~0ULL >> i, &no_memory_mapper);
		EXPECT_EQ(8ULL, no_memory_mapper.last_request_buffer_size);

		// Verify the packing ratio, the counter is expected to be packed into the
		// closest power of 2 bits.
		NoMemoryPackedCounterArray counters(64, 1ULL << i, &no_memory_mapper);
		EXPECT_EQ(8ULL * RoundUpToPowerOfTwo(i + 1),
		no_memory_mapper.last_request_buffer_size);
		}

		RedZoneMemoryMapper memory_mapper;
		typedef Allocator64::PackedCounterArray<RedZoneMemoryMapper>
		RedZonePackedCounterArray;
		// Go through 1, 2, 4, 8, .. 64 bits per counter.
		for (int i = 0; i < 7; i++) {
		// Make sure counters request one memory page for the buffer.
		const u64 kNumCounters = (GetPageSize() / 8) * (64 >> i);
		RedZonePackedCounterArray counters(kNumCounters,
		1ULL << ((1 << i) - 1),
		&memory_mapper);
		counters.Inc(0);
		for (u64 c = 1; c < kNumCounters - 1; c++) {
		ASSERT_EQ(0ULL, counters.Get(c));
		counters.Inc(c);
		ASSERT_EQ(1ULL, counters.Get(c - 1));
		}
		ASSERT_EQ(0ULL, counters.Get(kNumCounters - 1));
		counters.Inc(kNumCounters - 1);

		if (i > 0) {
		counters.IncRange(0, kNumCounters - 1);
		for (u64 c = 0; c < kNumCounters; c++)
		ASSERT_EQ(2ULL, counters.Get(c));
		}
		}
		}
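
Reviewer's aside on the packing ratio checked above: the test expects each counter to occupy the closest power-of-two number of bits, so that 64 counters with a maximum value of 1 << i need 8 * RoundUpToPowerOfTwo(i + 1) bytes. The following is a minimal sketch of that idea, not the PackedCounterArray from this patch. The class name PackedCounterSketch, its methods, and the use of std::vector for storage are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical illustration of power-of-two counter packing.
// It is NOT the PackedCounterArray implementation from the patch.
class PackedCounterSketch {
 public:
  PackedCounterSketch(uint64_t num_counters, uint64_t max_value)
      : num_counters_(num_counters) {
    assert(num_counters > 0 && max_value > 0);
    // Smallest number of bits able to represent max_value.
    uint64_t bits = 1;
    while (bits < 64 && (max_value >> bits) != 0) bits++;
    // Round up to a power of two so a counter never straddles two words.
    counter_bits_ = 1;
    while (counter_bits_ < bits) counter_bits_ <<= 1;
    counter_mask_ =
        counter_bits_ == 64 ? ~0ULL : (1ULL << counter_bits_) - 1;
    counters_per_word_ = 64 / counter_bits_;
    words_.assign(
        (num_counters + counters_per_word_ - 1) / counters_per_word_, 0);
  }

  // Size of the backing storage in bytes (whole 64-bit words).
  uint64_t BufferSizeBytes() const { return words_.size() * 8; }

  uint64_t Get(uint64_t i) const {
    assert(i < num_counters_);
    const uint64_t shift = (i % counters_per_word_) * counter_bits_;
    return (words_[i / counters_per_word_] >> shift) & counter_mask_;
  }

  void Inc(uint64_t i) {
    // Overflowing a counter would corrupt its neighbour, so forbid it.
    assert(Get(i) < counter_mask_);
    const uint64_t shift = (i % counters_per_word_) * counter_bits_;
    words_[i / counters_per_word_] += 1ULL << shift;
  }

 private:
  uint64_t num_counters_;
  uint64_t counter_bits_;
  uint64_t counter_mask_;
  uint64_t counters_per_word_;
  std::vector<uint64_t> words_;
};
```

With this sketch, a maximum value of 1 << 2 needs 3 bits, which rounds up to 4 bits per counter, so 16 counters share one 64-bit word. Rounding to a power of two wastes some bits but keeps Get and Inc down to a single shift and mask.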

		class RangeRecorder {
		public:
		std::string reported_pages;

		RangeRecorder()
		: page_size_scaled_log(
		Log2(GetPageSizeCached() >> Allocator64::kCompactPtrScale)),
		last_page_reported(0) {}

		void ReleasePageRangeToOS(u32 from, u32 to) {
		from >>= page_size_scaled_log;
		to >>= page_size_scaled_log;
		ASSERT_LT(from, to);
		if (!reported_pages.empty())
		ASSERT_LT(last_page_reported, from);
		reported_pages.append(from - last_page_reported, '.');
		reported_pages.append(to - from, 'x');
		last_page_reported = to;
		}
		private:
		const uptr page_size_scaled_log;
		u32 last_page_reported;
		};

		TEST(SanitizerCommon, SizeClassAllocator64FreePagesRangeTracker) {
		typedef Allocator64::FreePagesRangeTracker<RangeRecorder> RangeTracker;

		// 'x' denotes a page to be released, '.' denotes a page to be kept around.
		const char* test_cases[] = {
		"",
		".",
		"x",
		"........",
		"xxxxxxxxxxx",
		"..............xxxxx",
		"xxxxxxxxxxxxxxxxxx.....",
		"......xxxxxxxx........",
		"xxx..........xxxxxxxxxxxxxxx",
		"......xxxx....xxxx........",
		"xxx..........xxxxxxxx....xxxxxxx",
		"x.x.x.x.x.x.x.x.x.x.x.x.",
		".x.x.x.x.x.x.x.x.x.x.x.x",
		".x.x.x.x.x.x.x.x.x.x.x.x.",
		"x.x.x.x.x.x.x.x.x.x.x.x.x",
		};

		for (auto test_case : test_cases) {
		RangeRecorder range_recorder;
		RangeTracker tracker(&range_recorder);
		for (int i = 0; test_case[i] != 0; i++)
		tracker.NextPage(test_case[i] == 'x');
		tracker.Done();
		// Strip trailing '.'-pages before comparing the results as they are not
		// going to be reported to range_recorder anyway.
		const char* last_x = strrchr(test_case, 'x');
		std::string expected(
		test_case,
		last_x == nullptr ? 0 : (last_x - test_case + 1));
		EXPECT_STREQ(expected.c_str(), range_recorder.reported_pages.c_str());
		}
		}
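
Reviewer's aside on the tracker exercised above: the test feeds one flag per page and expects maximal runs of releasable pages to be reported as half-open ranges, with trailing kept pages never reported. A minimal sketch of such a run tracker follows. It works in plain page indices rather than the scaled compact-pointer units used by the patch, and the names RangeTrackerSketch and CollectRanges are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration; not the FreePagesRangeTracker from the patch.
// Collects maximal runs of releasable pages as [from, to) page-index pairs.
class RangeTrackerSketch {
 public:
  void NextPage(bool freed) {
    if (freed) {
      if (!in_range_) {
        range_start_ = current_page_;
        in_range_ = true;
      }
    } else {
      CloseOpenedRange();
    }
    current_page_++;
  }

  // Must be called after the last page to flush a run that is still open.
  void Done() { CloseOpenedRange(); }

  const std::vector<std::pair<uint32_t, uint32_t>>& ranges() const {
    return ranges_;
  }

 private:
  void CloseOpenedRange() {
    if (in_range_) {
      ranges_.emplace_back(range_start_, current_page_);
      in_range_ = false;
    }
  }

  bool in_range_ = false;
  uint32_t current_page_ = 0;
  uint32_t range_start_ = 0;
  std::vector<std::pair<uint32_t, uint32_t>> ranges_;
};

// Runs the tracker over a pattern in the test's notation:
// 'x' is a page to release, '.' is a page to keep.
std::vector<std::pair<uint32_t, uint32_t>> CollectRanges(
    const std::string& pattern) {
  RangeTrackerSketch tracker;
  for (char c : pattern) {
    assert(c == 'x' || c == '.');
    tracker.NextPage(c == 'x');
  }
  tracker.Done();
  return tracker.ranges();
}
```

Coalescing adjacent pages this way means one release call per run instead of one per page, which is presumably why the patch reports ranges rather than single pages.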

		class ReleasedPagesTrackingMemoryMapper {
		public:
		std::set<u32> reported_pages;

		uptr MapPackedCounterArrayBuffer(uptr buffer_size) {
		reported_pages.clear();
		return reinterpret_cast<uptr>(calloc(1, buffer_size));
		}
		void UnmapPackedCounterArrayBuffer(uptr buffer, uptr buffer_size) {
		free(reinterpret_cast<void*>(buffer));
		}

		void ReleasePageRangeToOS(u32 from, u32 to) {
		uptr page_size_scaled =
		GetPageSizeCached() >> Allocator64::kCompactPtrScale;
		for (u32 i = from; i < to; i += page_size_scaled)
		reported_pages.insert(i);
		}
		};

		template <class Allocator>
		void TestReleaseFreeMemoryToOS() {
		ReleasedPagesTrackingMemoryMapper memory_mapper;
		const uptr kAllocatedPagesCount = 1024;
		const uptr page_size = GetPageSizeCached();
		const uptr page_size_scaled = page_size >> Allocator::kCompactPtrScale;
		std::mt19937 r;
		uint32_t rnd_state = 42;

		for (uptr class_id = 1; class_id <= Allocator::SizeClassMapT::kLargestClassID;
		class_id++) {
		const uptr chunk_size = Allocator::SizeClassMapT::Size(class_id);
		const uptr chunk_size_scaled = chunk_size >> Allocator::kCompactPtrScale;
		const uptr max_chunks =
		kAllocatedPagesCount * GetPageSizeCached() / chunk_size;

		// Generate the random free list.
		std::vector<u32> free_array;
		bool in_free_range = false;
		uptr current_range_end = 0;
		for (uptr i = 0; i < max_chunks; i++) {
		if (i == current_range_end) {
		in_free_range = (my_rand_r(&rnd_state) & 1U) == 1;
		current_range_end += my_rand_r(&rnd_state) % 100 + 1;
		}
		if (in_free_range)
		free_array.push_back(i * chunk_size_scaled);
		}
		if (free_array.empty())
		continue;
		// Shuffle free_array to verify that ReleaseFreeMemoryToOS does not depend on
		// the list ordering.
		std::shuffle(free_array.begin(), free_array.end(), r);

		Allocator::ReleaseFreeMemoryToOS(&free_array[0], free_array.size(),
		chunk_size, kAllocatedPagesCount,
		&memory_mapper);

		// Verify that there are no released pages touched by used chunks and all
		// ranges of free chunks big enough to contain the entire memory pages had
		// these pages released.
		uptr verified_released_pages = 0;
		std::set<u32> free_chunks(free_array.begin(), free_array.end());

		u32 current_chunk = 0;
		in_free_range = false;
		u32 current_free_range_start = 0;
		for (uptr i = 0; i <= max_chunks; i++) {
		bool is_free_chunk = free_chunks.find(current_chunk) != free_chunks.end();

		if (is_free_chunk) {
		if (!in_free_range) {
		in_free_range = true;
		current_free_range_start = current_chunk;
		}
		} else {
		// Verify that this used chunk does not touch any released page.
		for (uptr i_page = current_chunk / page_size_scaled;
		i_page <= (current_chunk + chunk_size_scaled - 1) /
		page_size_scaled;
		i_page++) {
		bool page_released =
		memory_mapper.reported_pages.find(i_page * page_size_scaled) !=
		memory_mapper.reported_pages.end();
		ASSERT_EQ(false, page_released);
		}

		if (in_free_range) {
		in_free_range = false;
		// Verify that all entire memory pages covered by this range of free
		// chunks were released.
		u32 page = RoundUpTo(current_free_range_start, page_size_scaled);
		while (page + page_size_scaled <= current_chunk) {
		bool page_released =
		memory_mapper.reported_pages.find(page) !=
		memory_mapper.reported_pages.end();
		ASSERT_EQ(true, page_released);
		verified_released_pages++;
		page += page_size_scaled;
		}
		}
		}

		current_chunk += chunk_size_scaled;
		}

		ASSERT_EQ(memory_mapper.reported_pages.size(), verified_released_pages);
		}
		}

		TEST(SanitizerCommon, SizeClassAllocator64ReleaseFreeMemoryToOS) {
		TestReleaseFreeMemoryToOS<Allocator64>();
		}

		TEST(SanitizerCommon, SizeClassAllocator64CompactReleaseFreeMemoryToOS) {
		TestReleaseFreeMemoryToOS<Allocator64Compact>();
		}

		TEST(SanitizerCommon, SizeClassAllocator64VeryCompactReleaseFreeMemoryToOS) {
		TestReleaseFreeMemoryToOS<Allocator64VeryCompact>();
		}
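
Reviewer's aside on the property these tests verify: per the summary, the new approach counts the free chunks touching each page and releases only pages where every touching chunk is free, in one pass over the free list and one pass over the counters. The sketch below is a simplified version of that counting idea. It uses plain byte offsets and one full-width counter per page instead of compact pointers and packed counters, and it recomputes the expected chunk count per page directly rather than using whatever shortcut the patch takes. The function name PagesToRelease is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical illustration of the page-counting approach; it is not the
// ReleaseFreeMemoryToOS implementation from the patch.
// free_chunk_offsets holds byte offsets of free chunks from the region
// start, in any order. Returns indices of pages touched only by free chunks.
std::vector<uint64_t> PagesToRelease(
    const std::vector<uint64_t>& free_chunk_offsets, uint64_t chunk_size,
    uint64_t page_size, uint64_t num_pages) {
  assert(chunk_size > 0 && page_size > 0);
  // Pass 1: for every free chunk, bump the counter of each page it touches.
  std::vector<uint64_t> counters(num_pages, 0);
  for (uint64_t offset : free_chunk_offsets) {
    const uint64_t first_page = offset / page_size;
    const uint64_t last_page = (offset + chunk_size - 1) / page_size;
    for (uint64_t p = first_page; p <= last_page && p < num_pages; p++)
      counters[p]++;
  }
  // Pass 2: a page is releasable when the number of free chunks touching
  // it equals the total number of chunks that overlap it.
  std::vector<uint64_t> result;
  for (uint64_t p = 0; p < num_pages; p++) {
    const uint64_t first_chunk = p * page_size / chunk_size;
    const uint64_t last_chunk = ((p + 1) * page_size - 1) / chunk_size;
    const uint64_t expected = last_chunk - first_chunk + 1;
    if (counters[p] == expected) result.push_back(p);
  }
  return result;
}
```

Because the counters are indexed by page, the result does not depend on the order of the free list, which is the property the shuffle in the test checks. A page overlapped by a partial chunk at the tail of the region is never released in this sketch, which is conservative.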

		#endif // SANITIZER_CAN_USE_ALLOCATOR64

TEST(SanitizerCommon, TwoLevelByteMap) {		TEST(SanitizerCommon, TwoLevelByteMap) {
const u64 kSize1 = 1 << 6, kSize2 = 1 << 12;		const u64 kSize1 = 1 << 6, kSize2 = 1 << 12;
const u64 n = kSize1 * kSize2;		const u64 n = kSize1 * kSize2;
TwoLevelByteMap<kSize1, kSize2> m;		TwoLevelByteMap<kSize1, kSize2> m;
m.TestOnlyInit();		m.TestOnlyInit();
for (u64 i = 0; i < n; i += 7) {		for (u64 i = 0; i < n; i += 7) {
m.set(i, (i % 100) + 1);		m.set(i, (i % 100) + 1);
}		}
(54 lines not shown)