This is an archive of the discontinued LLVM Phabricator instance.


Differential D33454

[sanitizer] Change the 32-bit Primary AllocateRegion to reduce fragmentation
Closed (Public)

Authored by cryptoad on May 23 2017, 11:29 AM.


Details

Reviewers

alekseyshl
kcc
dvyukov

Commits

rG0dd40cf28d2e: [sanitizer] Change the 32-bit Primary AllocateRegion to reduce fragmentation
rCRT303879: [sanitizer] Change the 32-bit Primary AllocateRegion to reduce fragmentation
rL303879: [sanitizer] Change the 32-bit Primary AllocateRegion to reduce fragmentation

Summary

Currently, AllocateRegion has a tendency to fragment memory: it allocates
2*kRegionSize, and if the memory is aligned, will unmap kRegionSize bytes,
thus creating a hole, which can't itself be reused for another region. This
is exacerbated by the fact that if 2 regions get allocated one after another
without any mmap in between, the second will be aligned due to mappings
generally being contiguous.
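To make the arithmetic in the paragraph above concrete, here is a minimal sketch (not the sanitizer's code) of what happens to a 2*kRegionSize mapping when only one aligned region is kept. The 1 MiB region size and the names AlignedMapResult, RoundUpToRegion and TrimToAlignedRegion are assumptions made for illustration only.

```cpp
#include <cstdint>

// Assumed region size (1 MiB), for illustration only.
constexpr std::uintptr_t kRegionSize = std::uintptr_t{1} << 20;

// Hypothetical result type: which aligned region is kept, and how many bytes
// of the original 2 * kRegionSize mapping are unmapped on each side of it.
struct AlignedMapResult {
  std::uintptr_t region;          // start of the aligned region that is kept
  std::uintptr_t unmapped_front;  // bytes unmapped before the region
  std::uintptr_t unmapped_back;   // bytes unmapped after the region
};

// Rounds addr up to the next multiple of kRegionSize (a power of two).
constexpr std::uintptr_t RoundUpToRegion(std::uintptr_t addr) {
  return (addr + kRegionSize - 1) & ~(kRegionSize - 1);
}

// map_res is the address at which the 2 * kRegionSize mapping landed.
constexpr AlignedMapResult TrimToAlignedRegion(std::uintptr_t map_res) {
  return {RoundUpToRegion(map_res),
          RoundUpToRegion(map_res) - map_res,
          map_res + 2 * kRegionSize - (RoundUpToRegion(map_res) + kRegionSize)};
}
```

For a mapping that is already aligned, such as TrimToAlignedRegion(16 * kRegionSize), nothing is trimmed in front and exactly kRegionSize bytes are unmapped behind. That is the hole described above: it is one region in size, so a later 2*kRegionSize request cannot fit into it.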

An idea, suggested by @alekseyshl, to prevent such a behavior is to have a
stash of regions: if the 2*kRegionSize allocation is properly aligned, split
it in two, and stash the second part to be returned next time a region is
requested.

At this point, I thought about a couple of ways to implement this:

either an IntrusiveList of region candidates, storing the next pointer at the beginning of the region;
or a small array of region candidates living in the Primary.

While the second option is more constrained in terms of size, it offers several
advantages:

security-wise, a pointer stored in a region candidate could be overflowed into, and abused when popping an element;
we do not dirty the first page of the region by storing something in it;
unless several threads request regions simultaneously from different size classes, the stash rarely goes above 1 entry.
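The second option can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the diff: RegionStash, TryPush and TryPop are hypothetical names, std::mutex stands in for the sanitizer's SpinMutex, and the capacity of 8 simply mirrors the kMaxStashedRegions value that appears in the final diff.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical fixed-capacity stash of region start addresses. The addresses
// live in the allocator object itself, so nothing is written into the stashed
// regions and their first page stays clean.
class RegionStash {
 public:
  static constexpr std::size_t kCapacity = 8;  // illustrative capacity

  // Returns false when the stash is full; the caller would then unmap the
  // region instead of keeping it.
  bool TryPush(std::uintptr_t region) {
    assert(region != 0);  // 0 is reserved as the "empty" result of TryPop
    std::lock_guard<std::mutex> lock(mutex_);
    if (count_ == kCapacity) return false;
    slots_[count_++] = region;
    return true;
  }

  // Returns 0 when the stash is empty; the caller would then map fresh memory.
  std::uintptr_t TryPop() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (count_ == 0) return 0;
    return slots_[--count_];
  }

 private:
  std::mutex mutex_;
  std::size_t count_ = 0;
  std::array<std::uintptr_t, kCapacity> slots_{};
};
```

The stash behaves as a small LIFO: the most recently stashed region is handed out first, and a full stash rejects further pushes rather than growing.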

I am not certain about the Windows impact of this change, as sanitizer_win.cc
has its own version of MmapAlignedOrDie, maybe someone could chime in on this.

MmapAlignedOrDie is effectively unused after this change and could be removed
at a later point. I didn't notice any sizeable performance gain, even though we
are saving a few mmap/munmap syscalls.

Diff Detail

Build Status

Buildable 6734
Build 6734: arc lint + arc unit

Event Timeline

cryptoad created this revision. May 23 2017, 11:29 AM

Herald added a subscriber: kubamracek. May 23 2017, 11:29 AM

cryptoad edited the summary of this revision. May 23 2017, 11:50 AM

How about having a stash for one region in SizeClassInfo? You won't need more than one since sci is locked and it is available when AllocateRegion is called, just pass it there.

alekseyshl added inline comments. May 23 2017, 5:16 PM

lib/sanitizer_common/sanitizer_allocator_primary32.h
346	I'd move the allocation into separate private function anyway.

In D33454#762799, @alekseyshl wrote:

How about having a stash for one region in SizeClassInfo? You won't need more than one since sci is locked and it is available when AllocateRegion is called, just pass it there.

I thought about it since you mentioned it in your initial suggestion. It's not improbable to have several "hanging" regions within different SCI that wouldn't be used for a while. With 52 classes (and as many SCI), the worst case scenario would be 51MB of VA lost in stashes. In a regular use, I assume it would be several MB.
I feel like having a common stash, while adding a mutex, allows to avoid this, allowing to repurpose the stashed ones pretty quickly.
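The worst-case figure above can be reproduced with a short calculation. This is only a sketch: the 1 MiB region size is inferred from the 51MB figure rather than stated in the comment, and the "all classes but one" reading of 51 is likewise an assumption. The bound of 8 for the shared stash uses the kMaxStashedRegions value from the final diff.

```cpp
#include <cstdint>

// Assumptions for illustration: 1 MiB regions, 52 size classes (from the
// comment above), and one stashed region in every class except one.
constexpr std::uint64_t kAssumedRegionBytes = std::uint64_t{1} << 20;
constexpr std::uint64_t kAssumedNumClasses = 52;

// Worst-case address space parked in per-class stashes.
constexpr std::uint64_t WorstCasePerClassStashBytes() {
  return (kAssumedNumClasses - 1) * kAssumedRegionBytes;
}

// Upper bound for a single shared stash with max_stashed slots.
constexpr std::uint64_t WorstCaseSharedStashBytes(std::uint64_t max_stashed) {
  return max_stashed * kAssumedRegionBytes;
}

static_assert(WorstCasePerClassStashBytes() == (std::uint64_t{51} << 20),
              "51 MiB with per-class stashes");
static_assert(WorstCaseSharedStashBytes(8) == (std::uint64_t{8} << 20),
              "8 MiB with a shared stash of 8 entries");
```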

lib/sanitizer_common/sanitizer_allocator_primary32.h
346	Will do.

FWIW I've solved a similar problem in tcmalloc by trimming blocks either on left or on right. Namely, if heap grows up we trim on right; if heap grows down we trim on left. But I don't see how it is better than this solution.

I am not certain about the Windows impact of this change

I have not looked at the code, but windows has a thing called "allocation granularity" which is larger than page size (historically equal to 64K). OS allocator manages virtual address space only at that granularity. I.e. it is not possible to allocate/free a single page. That can be the reason for the different implementation.

looks good to me

In D33454#762830, @cryptoad wrote:

In D33454#762799, @alekseyshl wrote:

How about having a stash for one region in SizeClassInfo? You won't need more than one since sci is locked and it is available when AllocateRegion is called, just pass it there.

I thought about it since you mentioned it in your initial suggestion. It's not improbable to have several "hanging" regions within different SCI that wouldn't be used for a while. With 52 classes (and as many SCI), the worst case scenario would be 51MB of VA lost in stashes. In a regular use, I assume it would be several MB.
I feel like having a common stash, while adding a mutex, allows to avoid this, allowing to repurpose the stashed ones pretty quickly.

You can store the region in size_class_info_array and traverse it instead of maintaining and traversing regions_stash, but it will complicate the synchronization. Ok, let's move the allocation into separate function and I'm fine with the rest.

As per review feedback, move the new code into its own function.
Additionally, change the top file comment to reflect the new way of doing
things.

Harbormaster completed remote builds in B6726: Diff 100113. May 24 2017, 9:24 AM

LGTM with some code structure suggestions which you're free to ignore, if you like your way better.

lib/sanitizer_common/sanitizer_allocator_primary32.h
287	I'd just inline map_res + map_size in the only place you use map_end.
288	Can we rename it to 'region'?
289	The name 'extra_region' is a bit narrower than what this flag indicates. How about changing the structure a bit:
    bool trim_region = true;
    if (IsAligned(res, kRegionSize)) {
      SpinMutexLock l(&regions_stash_mutex);
      if (num_stashed_regions < kMaxStashedRegions) {
        ...
        trim_region = false;
      }
    }
    if (trim_region) {
      trim both left and right
    }
UnmapOrDie handles 0 sizes just fine.
306	I know, the result is the same, but doesn't it make more sense to just assign here? It aligns better with how 'end' is calculated. map_size = kRegionSize;
357	Please add your "unless several threads request regions simultaneously from different size classes, the stash rarely goes above 1 entry" comment to this constant.

This revision is now accepted and ready to land. May 24 2017, 11:36 AM

cryptoad marked 7 inline comments as done. May 24 2017, 1:36 PM

Including @alekseyshl's suggestions.

Harbormaster completed remote builds in B6733: Diff 100156. May 24 2017, 1:36 PM

Correcting grammar in the newly added comment.

Harbormaster completed remote builds in B6734: Diff 100157. May 24 2017, 1:38 PM

LGTM

alekseyshl accepted this revision. May 25 2017, 9:14 AM

cryptoad closed this revision. May 25 2017, 9:20 AM

cryptoad mentioned this in D34152: [sanitizer] MmapAlignedOrDie changes to reduce fragmentation. Jun 13 2017, 9:11 AM

cryptoad mentioned this in rL305391: [sanitizer] MmapAlignedOrDie changes to reduce fragmentation. Jun 14 2017, 8:33 AM

Revision Contents

Path: lib/sanitizer_common/sanitizer_allocator_primary32.h
Size: 63 lines

Diff 100157

lib/sanitizer_common/sanitizer_allocator_primary32.h

[18 lines not shown]
// SizeClassAllocator32 -- allocator for 32-bit address space.		// SizeClassAllocator32 -- allocator for 32-bit address space.
// This allocator can theoretically be used on 64-bit arch, but there it is less		// This allocator can theoretically be used on 64-bit arch, but there it is less
// efficient than SizeClassAllocator64.		// efficient than SizeClassAllocator64.
//		//
// [kSpaceBeg, kSpaceBeg + kSpaceSize) is the range of addresses which can		// [kSpaceBeg, kSpaceBeg + kSpaceSize) is the range of addresses which can
// be returned by MmapOrDie().		// be returned by MmapOrDie().
//		//
// Region:		// Region:
// a result of a single call to MmapAlignedOrDie(kRegionSize, kRegionSize).		// a result of an allocation of kRegionSize bytes aligned on kRegionSize.
// Since the regions are aligned by kRegionSize, there are exactly		// Since the regions are aligned by kRegionSize, there are exactly
// kNumPossibleRegions possible regions in the address space and so we keep		// kNumPossibleRegions possible regions in the address space and so we keep
// a ByteMap possible_regions to store the size classes of each Region.		// a ByteMap possible_regions to store the size classes of each Region.
// 0 size class means the region is not used by the allocator.		// 0 size class means the region is not used by the allocator.
//		//
// One Region is used to allocate chunks of a single size class.		// One Region is used to allocate chunks of a single size class.
// A Region looks like this:		// A Region looks like this:
// UserChunk1 .. UserChunkN <gap> MetaChunkN .. MetaChunk1		// UserChunk1 .. UserChunkN <gap> MetaChunkN .. MetaChunk1
[65 lines not shown; the diff resumes in the public: section]
}		}

typedef SizeClassAllocator32<Params> ThisT;		typedef SizeClassAllocator32<Params> ThisT;
typedef SizeClassAllocator32LocalCache<ThisT> AllocatorCache;		typedef SizeClassAllocator32LocalCache<ThisT> AllocatorCache;

void Init(s32 release_to_os_interval_ms) {		void Init(s32 release_to_os_interval_ms) {
possible_regions.TestOnlyInit();		possible_regions.TestOnlyInit();
internal_memset(size_class_info_array, 0, sizeof(size_class_info_array));		internal_memset(size_class_info_array, 0, sizeof(size_class_info_array));
		num_stashed_regions = 0;
}		}

s32 ReleaseToOSIntervalMs() const {		s32 ReleaseToOSIntervalMs() const {
return kReleaseToOSIntervalNever;		return kReleaseToOSIntervalNever;
}		}

void SetReleaseToOSIntervalMs(s32 release_to_os_interval_ms) {		void SetReleaseToOSIntervalMs(s32 release_to_os_interval_ms) {
// This is empty here. Currently only implemented in 64-bit allocator.		// This is empty here. Currently only implemented in 64-bit allocator.
[153 lines not shown; the diff resumes inside uptr ComputeRegionId(uptr mem)]
CHECK_LT(res, kNumPossibleRegions);		CHECK_LT(res, kNumPossibleRegions);
return res;		return res;
}		}

uptr ComputeRegionBeg(uptr mem) {		uptr ComputeRegionBeg(uptr mem) {
return mem & ~(kRegionSize - 1);		return mem & ~(kRegionSize - 1);
}		}

		// Allocates a region of kRegionSize bytes, aligned on kRegionSize, by first
		// allocating 2 * kRegionSize. If the result of the initial allocation is
		// aligned, split it in two, and attempt to store the second part into a
		// stash. In the event the stash is full, just unmap the superfluous memory.
		// If the initial allocation is not aligned, trim the memory before and after.
		uptr AllocateRegionSlow(AllocatorStats *stat) {
		uptr map_size = 2 * kRegionSize;
		uptr map_res = (uptr)MmapOrDie(map_size, "SizeClassAllocator32");
		uptr region = map_res;
		bool trim_region = true;
		if (IsAligned(region, kRegionSize)) {
		// We are aligned, attempt to stash the second half.
		SpinMutexLock l(&regions_stash_mutex);
		if (num_stashed_regions < kMaxStashedRegions) {
		regions_stash[num_stashed_regions++] = region + kRegionSize;
		trim_region = false;
		}
		}
		// Trim the superfluous memory in front and behind us.
		if (trim_region) {
		// If map_res is already aligned on kRegionSize (in the event of a full
		// stash), the following two lines amount to a no-op.
		region = (map_res + kRegionSize - 1) & ~(kRegionSize - 1);
		UnmapOrDie((void*)map_res, region - map_res);
		uptr end = region + kRegionSize;
		UnmapOrDie((void*)end, map_res + map_size - end);
		map_size = kRegionSize;
		}
		MapUnmapCallback().OnMap(region, map_size);
		stat->Add(AllocatorStatMapped, map_size);
		return region;
		}

uptr AllocateRegion(AllocatorStats *stat, uptr class_id) {		uptr AllocateRegion(AllocatorStats *stat, uptr class_id) {
CHECK_LT(class_id, kNumClasses);		CHECK_LT(class_id, kNumClasses);
uptr res = reinterpret_cast<uptr>(MmapAlignedOrDie(kRegionSize, kRegionSize,		uptr region = 0;
"SizeClassAllocator32"));		{
MapUnmapCallback().OnMap(res, kRegionSize);		SpinMutexLock l(&regions_stash_mutex);
stat->Add(AllocatorStatMapped, kRegionSize);		if (num_stashed_regions > 0)
CHECK_EQ(0U, (res & (kRegionSize - 1)));		region = regions_stash[--num_stashed_regions];
possible_regions.set(ComputeRegionId(res), static_cast<u8>(class_id));		}
return res;		if (!region)
		region = AllocateRegionSlow(stat);
		CHECK(IsAligned(region, kRegionSize));
		possible_regions.set(ComputeRegionId(region), static_cast<u8>(class_id));
		return region;
}		}

SizeClassInfo *GetSizeClassInfo(uptr class_id) {		SizeClassInfo *GetSizeClassInfo(uptr class_id) {
CHECK_LT(class_id, kNumClasses);		CHECK_LT(class_id, kNumClasses);
return &size_class_info_array[class_id];		return &size_class_info_array[class_id];
}		}

void PopulateFreeList(AllocatorStats *stat, AllocatorCache *c,		void PopulateFreeList(AllocatorStats *stat, AllocatorCache *c,
SizeClassInfo *sci, uptr class_id) {		SizeClassInfo *sci, uptr class_id) {
uptr size = ClassIdToSize(class_id);		uptr size = ClassIdToSize(class_id);
uptr reg = AllocateRegion(stat, class_id);		uptr reg = AllocateRegion(stat, class_id);
uptr n_chunks = kRegionSize / (size + kMetadataSize);		uptr n_chunks = kRegionSize / (size + kMetadataSize);
uptr max_count = TransferBatch::MaxCached(class_id);		uptr max_count = TransferBatch::MaxCached(class_id);
TransferBatch *b = nullptr;		TransferBatch *b = nullptr;
for (uptr i = reg; i < reg + n_chunks * size; i += size) {		for (uptr i = reg; i < reg + n_chunks * size; i += size) {
if (!b) {		if (!b) {
b = c->CreateBatch(class_id, this, (TransferBatch*)i);		b = c->CreateBatch(class_id, this, (TransferBatch*)i);
b->Clear();		b->Clear();
}		}
b->Add((void*)i);		b->Add((void*)i);
if (b->Count() == max_count) {		if (b->Count() == max_count) {
CHECK_GT(b->Count(), 0);		CHECK_GT(b->Count(), 0);
sci->free_list.push_back(b);		sci->free_list.push_back(b);
b = nullptr;		b = nullptr;
}		}
}		}
if (b) {		if (b) {
CHECK_GT(b->Count(), 0);		CHECK_GT(b->Count(), 0);
sci->free_list.push_back(b);		sci->free_list.push_back(b);
}		}
}		}

		// Unless several threads request regions simultaneously from different size
		// classes, the stash rarely contains more than 1 entry.
		static const uptr kMaxStashedRegions = 8;
		SpinMutex regions_stash_mutex;
		uptr num_stashed_regions;
		uptr regions_stash[kMaxStashedRegions];

ByteMap possible_regions;		ByteMap possible_regions;
SizeClassInfo size_class_info_array[kNumClasses];		SizeClassInfo size_class_info_array[kNumClasses];
};		};
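The overall flow of the new AllocateRegion / AllocateRegionSlow pair can be exercised without touching real memory. The sketch below is a simplified, single-threaded model and not the sanitizer implementation: FakeMapper is a hypothetical stand-in for MmapOrDie/UnmapOrDie that hands out contiguous addresses and counts calls, the 1 MiB region size is assumed, and locking, stats and the MapUnmapCallback are omitted.

```cpp
#include <cassert>
#include <cstdint>

namespace region_demo {

constexpr std::uintptr_t kRegion = std::uintptr_t{1} << 20;  // assumed 1 MiB
constexpr int kMaxStashed = 8;

// Hypothetical stand-in for the OS: returns contiguous address ranges and
// counts calls. No memory is actually mapped.
struct FakeMapper {
  explicit FakeMapper(std::uintptr_t start) : next(start) {}
  std::uintptr_t Map(std::uintptr_t size) {
    ++map_calls;
    const std::uintptr_t res = next;
    next += size;
    return res;
  }
  void Unmap(std::uintptr_t /*addr*/, std::uintptr_t size) {
    if (size != 0) ++unmap_calls;  // zero-sized unmaps are treated as no-ops
  }
  std::uintptr_t next;
  int map_calls = 0;
  int unmap_calls = 0;
};

class RegionAllocator {
 public:
  explicit RegionAllocator(FakeMapper *mapper) : mapper_(mapper) {}

  // Fast path: reuse a stashed region if one is available.
  std::uintptr_t AllocateRegion() {
    if (num_stashed_ > 0) return stash_[--num_stashed_];
    return AllocateRegionSlow();
  }

 private:
  // Slow path: map 2 * kRegion, then either stash the second half (aligned
  // result, room in the stash) or trim the excess in front and behind.
  std::uintptr_t AllocateRegionSlow() {
    const std::uintptr_t map_size = 2 * kRegion;
    const std::uintptr_t map_res = mapper_->Map(map_size);
    if (map_res % kRegion == 0 && num_stashed_ < kMaxStashed) {
      stash_[num_stashed_++] = map_res + kRegion;
      return map_res;
    }
    const std::uintptr_t region = (map_res + kRegion - 1) & ~(kRegion - 1);
    const std::uintptr_t end = region + kRegion;
    mapper_->Unmap(map_res, region - map_res);
    mapper_->Unmap(end, map_res + map_size - end);
    assert(region % kRegion == 0);
    return region;
  }

  FakeMapper *mapper_;
  int num_stashed_ = 0;
  std::uintptr_t stash_[kMaxStashed] = {};
};

}  // namespace region_demo
```

Starting the fake mapper at an aligned address such as 16 * kRegion, two consecutive AllocateRegion() calls are served by a single Map() call and no Unmap(): the second region comes out of the stash. Starting it at an unaligned address, the first call trims both ends and stashes nothing.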