This is an archive of the discontinued LLVM Phabricator instance.

[scudo] Adjust page map buffer size
ClosedPublic

Authored by Chia-hungDuan on Feb 24 2023, 12:23 PM.

Details

Reviewers

cferris
cryptoad

Commits

rGc514198e4d39: [scudo] Adjust page map buffer size

Summary

Given the memory group, we are unlikely to need a huge page map to
record an entire region. This CL reduces the size of the default page map
buffer from 2048 to 512 and increases the number of static buffers to 2.
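
As a rough check of what this means for static storage, the arithmetic can be sketched as below. The constant and function names are hypothetical; only the values 2048, 512 and 2 come from this revision, and the element type is assumed to be a pointer-sized word, as in release.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names; only the numeric constants come from the revision.
using uptr = std::uintptr_t;

// Before: a single static buffer of 2048 words.
constexpr std::size_t OldBufferCount = 1;
constexpr std::size_t OldBufferWords = 2048;
// After: 2 static buffers of 512 words each.
constexpr std::size_t NewBufferCount = 2;
constexpr std::size_t NewBufferWords = 512;

// Total bytes reserved statically for a given buffer layout.
constexpr std::size_t staticFootprintBytes(std::size_t Count,
                                           std::size_t Words) {
  return Count * Words * sizeof(uptr);
}

constexpr std::size_t OldBytes =
    staticFootprintBytes(OldBufferCount, OldBufferWords);
constexpr std::size_t NewBytes =
    staticFootprintBytes(NewBufferCount, NewBufferWords);

// The new layout reserves half the static storage of the old one.
static_assert(NewBytes * 2 == OldBytes, "static storage is halved");
```

With 8-byte words this works out to 16 KiB before and 8 KiB after, while allowing two concurrent users of the static storage instead of one.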

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Chia-hungDuan created this revision. Feb 24 2023, 12:23 PM

Herald added a project: Restricted Project. Feb 24 2023, 12:23 PM

Herald added subscribers: yaneury, supersymetrie, Enna1, cryptoad.

Chia-hungDuan requested review of this revision. Feb 24 2023, 12:23 PM

Herald added a project: Restricted Project. Feb 24 2023, 12:23 PM

Herald added a subscriber: Restricted Project.

Chia-hungDuan added a parent revision: D143303: [scudo] Simplify markFreeBlocks. Feb 24 2023, 12:24 PM

Chia-hungDuan added reviewers: cferris, cryptoad.

Chia-hungDuan added a child revision: D144768: [scudo] Mitigate page releasing thrashing. Feb 24 2023, 4:55 PM

Harbormaster completed remote builds in B215816: Diff 500272. Feb 24 2023, 10:18 PM

Update thread-safety annotation

Harbormaster completed remote builds in B216564: Diff 501271. Feb 28 2023, 1:01 PM

A couple of nits, but one question about the design.

Also, it would be good to include the data you gathered about the sizes needed to show that the 512 size was not grabbed completely out of the air.

compiler-rt/lib/scudo/standalone/release.h
69	Would it make sense to change this to a try lock? Because if you are stuck waiting for another thread to finish, it might take more time than falling back to doing the mmap. Using the try lock should be fast and avoid blocking multiple threads trying to get a block.
118–121	I know this is pulled from the previous code, but the comment should reference that this is only done for Fuchsia. We might want to say that it hasn't proven a performance benefit on other platforms.
129	This is slightly confusing. Maybe something like a '1' means that buffer index is not used. '0' means the buffer is in use.
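
The try-lock idea raised in the first inline comment can be sketched as follows. This is an illustration under stated assumptions, not the scudo code: `TryLockSlots`, `tryReserve`, `release` and `kNoSlot` are hypothetical names, and `std::mutex` with `std::try_to_lock` stands in for scudo's `HybridMutex::tryLock()`. The mask follows the convention suggested in the last comment: '1' means the slot is free, '0' means it is in use.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical illustration only; not the scudo implementation.
class TryLockSlots {
public:
  static constexpr int kNoSlot = -1;

  // Returns the index of a reserved slot, or kNoSlot when the lock is
  // contended or every slot is taken. The caller is then expected to fall
  // back to a dynamic allocation instead of blocking on the mutex.
  int tryReserve() {
    std::unique_lock<std::mutex> Lock(Mu, std::try_to_lock);
    if (!Lock.owns_lock())
      return kNoSlot;
    for (int I = 0; I < kSlots; ++I) {
      const std::uint32_t Bit = 1u << I;
      if (FreeMask & Bit) { // '1' means the slot is free.
        FreeMask &= ~Bit;   // '0' means the slot is in use.
        return I;
      }
    }
    return kNoSlot;
  }

  // Marks a previously reserved slot as free again.
  void release(int Slot) {
    std::lock_guard<std::mutex> Lock(Mu);
    FreeMask |= 1u << Slot;
  }

private:
  static constexpr int kSlots = 2;
  std::mutex Mu;
  std::uint32_t FreeMask = (1u << kSlots) - 1;
};
```

The trade-off is that a thread which loses the race pays for a dynamic mapping even when a static slot would have become free shortly afterwards.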

This revision now requires changes to proceed. Feb 28 2023, 2:17 PM

Address review comment and slightly reorganize conditional branch

Herald added subscribers: abrachet, phosek. Mar 3 2023, 10:57 AM

Chia-hungDuan added inline comments. Mar 3 2023, 10:57 AM

compiler-rt/lib/scudo/standalone/release.h
69	Added a comment and moved the getDynamicBuffer() call out of the lock-acquiring scope.
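
The resulting shape, where the slot is picked under the lock and the slow fallback runs only after the lock is dropped, can be sketched roughly as below. This is a simplified stand-in rather than the code in the revision: `SketchPool`, `get`, `put` and `isStatic` are hypothetical names, `std::mutex` replaces `HybridMutex`, `calloc`/`free` replace `map()`/`unmap()`, and `__builtin_ctzll` is a GCC/Clang builtin assumed to be available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <mutex>

// Hypothetical, simplified pool; not the scudo BufferPool.
template <std::size_t Count, std::size_t Words> class SketchPool {
  static_assert(Count < 64, "keep at least one mask bit permanently set");

public:
  std::uint64_t *get() {
    std::size_t Index;
    {
      std::lock_guard<std::mutex> Lock(Mu);
      // Bits at positions >= Count are never cleared, so Mask is never
      // zero and count-trailing-zeros is always well defined here.
      Index = static_cast<std::size_t>(__builtin_ctzll(Mask));
      if (Index < Count)
        Mask ^= std::uint64_t{1} << Index; // Mark the slot as in use.
    }
    // The slow fallback runs only after the lock has been released.
    if (Index >= Count)
      return static_cast<std::uint64_t *>(
          std::calloc(Words, sizeof(std::uint64_t)));
    std::uint64_t *Slot = &Raw[Index * Words];
    std::memset(Slot, 0, Words * sizeof(std::uint64_t));
    return Slot;
  }

  // True when P points into the static storage of this pool.
  bool isStatic(const std::uint64_t *P) const {
    const auto Addr = reinterpret_cast<std::uintptr_t>(P);
    const auto Base = reinterpret_cast<std::uintptr_t>(Raw);
    return Addr >= Base && Addr < Base + sizeof(Raw);
  }

  void put(std::uint64_t *P) {
    if (!isStatic(P)) {
      std::free(P);
      return;
    }
    const std::size_t Index = static_cast<std::size_t>(P - Raw) / Words;
    std::lock_guard<std::mutex> Lock(Mu);
    Mask |= std::uint64_t{1} << Index; // Mark the slot as free again.
  }

private:
  std::mutex Mu;
  std::uint64_t Mask = ~std::uint64_t{0}; // '1' = free, '0' = in use.
  std::uint64_t Raw[Count * Words];
};
```

Keeping the fallback outside the critical section means a thread that finds the pool exhausted does not hold up other threads that only need to flip a mask bit.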

Harbormaster completed remote builds in B217212: Diff 502189. Mar 3 2023, 10:58 AM

The size was determined by collecting the buffer usage from several apps. We didn't see any usage over 256, and most of them are smaller than 100. With partial range releasing, the required buffer size is even smaller. The following sample lists the usages with an original buffer size greater than 200; that is 15 out of 312 buffer usages.

I scudo   : AllocatedUser = 28734656, Origin Buffer Size = 220, Optimized BufferSize = 177
I scudo   : AllocatedUser = 28734656, Origin Buffer Size = 220, Optimized BufferSize = 17
I scudo   : AllocatedUser = 28734656, Origin Buffer Size = 220, Optimized BufferSize = 81
I scudo   : AllocatedUser = 29947696, Origin Buffer Size = 229, Optimized BufferSize = 5 
I scudo   : AllocatedUser = 29947696, Origin Buffer Size = 229, Optimized BufferSize = 173 
I scudo   : AllocatedUser = 29947696, Origin Buffer Size = 229, Optimized BufferSize = 25
I scudo   : AllocatedUser = 29947696, Origin Buffer Size = 229, Optimized BufferSize = 181 
I scudo   : AllocatedUser = 29947696, Origin Buffer Size = 229, Optimized BufferSize = 33
I scudo   : AllocatedUser = 29947696, Origin Buffer Size = 229, Optimized BufferSize = 9 
I scudo   : AllocatedUser = 29947696, Origin Buffer Size = 229, Optimized BufferSize = 177 
I scudo   : AllocatedUser = 29947696, Origin Buffer Size = 229, Optimized BufferSize = 89
I scudo   : AllocatedUser = 29947696, Origin Buffer Size = 229, Optimized BufferSize = 177 
I scudo   : AllocatedUser = 29947696, Origin Buffer Size = 229, Optimized BufferSize = 9 
I scudo   : AllocatedUser = 29947696, Origin Buffer Size = 229, Optimized BufferSize = 41
I scudo   : AllocatedUser = 29947696, Origin Buffer Size = 229, Optimized BufferSize = 41

LGTM

This revision is now accepted and ready to land. Mar 3 2023, 12:20 PM

Chia-hungDuan removed a child revision: D144768: [scudo] Mitigate page releasing thrashing. Mar 3 2023, 1:27 PM

Chia-hungDuan added a child revision: D145419: [scudo] Slightly improve the handling of last block in a region. Mar 6 2023, 1:37 PM

Closed by commit rGc514198e4d39: [scudo] Adjust page map buffer size (authored by Chia-hungDuan). Mar 8 2023, 1:21 PM

This revision was automatically updated to reflect the committed changes.

Chia-hungDuan added a commit: rGc514198e4d39: [scudo] Adjust page map buffer size.

Revision Contents

Path	Size
compiler-rt/lib/scudo/standalone/release.h	129 lines
compiler-rt/lib/scudo/standalone/release.cpp	4 lines
compiler-rt/lib/scudo/standalone/tests/release_test.cpp	21 lines

Diff 503494

compiler-rt/lib/scudo/standalone/release.h

// ... (43 lines not shown; context: private:) ...
// Therefore, store them separately to make it work on all the platforms.		// Therefore, store them separately to make it work on all the platforms.
uptr Base = 0;		uptr Base = 0;
// The release offset from Base. This is used when we know a given range after		// The release offset from Base. This is used when we know a given range after
// Base will not be released.		// Base will not be released.
uptr Offset = 0;		uptr Offset = 0;
MapPlatformData *Data = nullptr;		MapPlatformData *Data = nullptr;
};		};

		// A buffer pool which holds a fixed number of static buffers for fast buffer
		// allocation. If the request size is greater than `StaticBufferSize`, it'll
		// delegate the allocation to map().
		template <uptr StaticBufferCount, uptr StaticBufferSize> class BufferPool {
		public:
		// Preserve 1 bit in the `Mask` so that we don't need to do zero-check while
		// extracting the least significant bit from the `Mask`.
		static_assert(StaticBufferCount < SCUDO_WORDSIZE, "");
		static_assert(isAligned(StaticBufferSize, SCUDO_CACHE_LINE_SIZE), "");

		// Return a buffer which is at least `BufferSize`.
		uptr *getBuffer(const uptr BufferSize) {
		if (UNLIKELY(BufferSize > StaticBufferSize))
		return getDynamicBuffer(BufferSize);

		uptr index;
		{
		// TODO: In general, we expect this operation should be fast so the
		// waiting thread won't be put into sleep. The HybridMutex does implement
		// the busy-waiting but we may want to review the performance and see if
		// we need an explict spin lock here.
		ScopedLock L(Mutex);
		index = getLeastSignificantSetBitIndex(Mask);
		if (index < StaticBufferCount)
		Mask ^= static_cast<uptr>(1) << index;
		}

		if (index >= StaticBufferCount)
		return getDynamicBuffer(BufferSize);

		const uptr Offset = index * StaticBufferSize;
		memset(&RawBuffer[Offset], 0, StaticBufferSize);
		return &RawBuffer[Offset];
		}

		void releaseBuffer(uptr *Buffer, const uptr BufferSize) {
		const uptr index = getStaticBufferIndex(Buffer, BufferSize);
		if (index < StaticBufferCount) {
		ScopedLock L(Mutex);
		DCHECK_EQ((Mask & (static_cast<uptr>(1) << index)), 0U);
		Mask |= static_cast<uptr>(1) << index;
		} else {
		unmap(reinterpret_cast<void *>(Buffer),
		roundUp(BufferSize, getPageSizeCached()));
		}
		}

		bool isStaticBufferTestOnly(uptr *Buffer, uptr BufferSize) {
		return getStaticBufferIndex(Buffer, BufferSize) < StaticBufferCount;
		}

		private:
		uptr getStaticBufferIndex(uptr *Buffer, uptr BufferSize) {
		if (UNLIKELY(BufferSize > StaticBufferSize))
		return StaticBufferCount;

		const uptr BufferBase = reinterpret_cast<uptr>(Buffer);
		const uptr RawBufferBase = reinterpret_cast<uptr>(RawBuffer);

		if (BufferBase < RawBufferBase ||
		BufferBase >= RawBufferBase + sizeof(RawBuffer)) {
		return StaticBufferCount;
		}

		DCHECK_LE(BufferSize, StaticBufferSize);
		DCHECK_LE(BufferBase + BufferSize, RawBufferBase + sizeof(RawBuffer));
		DCHECK_EQ((BufferBase - RawBufferBase) % StaticBufferSize, 0U);

		const uptr index =
		(BufferBase - RawBufferBase) / (StaticBufferSize * sizeof(uptr));
		DCHECK_LT(index, StaticBufferCount);
		return index;
		}

		uptr *getDynamicBuffer(const uptr BufferSize) {
		// When using a heap-based buffer, precommit the pages backing the
		// Vmar by passing |MAP_PRECOMMIT| flag. This allows an optimization
		// where page fault exceptions are skipped as the allocated memory
		// is accessed. So far, this is only enabled on Fuchsia. It hasn't proven a
		// performance benefit on other platforms.
		const uptr MmapFlags = MAP_ALLOWNOMEM | (SCUDO_FUCHSIA ? MAP_PRECOMMIT : 0);
		return reinterpret_cast<uptr *>(
		map(nullptr, roundUp(BufferSize, getPageSizeCached()), "scudo:counters",
		MmapFlags, &MapData));
		}

		HybridMutex Mutex;
		// '1' means that buffer index is not used. '0' means the buffer is in use.
		uptr Mask GUARDED_BY(Mutex) = ~static_cast<uptr>(0);
		uptr RawBuffer[StaticBufferCount * StaticBufferSize] GUARDED_BY(Mutex);
		[[no_unique_address]] MapPlatformData MapData = {};
		};

// A Region page map is used to record the usage of pages in the regions. It		// A Region page map is used to record the usage of pages in the regions. It
// implements a packed array of Counters. Each counter occupies 2^N bits, enough		// implements a packed array of Counters. Each counter occupies 2^N bits, enough
// to store counter's MaxValue. Ctor will try to use a static buffer first, and		// to store counter's MaxValue. Ctor will try to use a static buffer first, and
// if that fails (the buffer is too small or already locked), will allocate the		// if that fails (the buffer is too small or already locked), will allocate the
// required Buffer via map(). The caller is expected to check whether the		// required Buffer via map(). The caller is expected to check whether the
// initialization was successful by checking isAllocated() result. For		// initialization was successful by checking isAllocated() result. For
// performance sake, none of the accessors check the validity of the arguments,		// performance sake, none of the accessors check the validity of the arguments,
// It is assumed that Index is always in [0, N) range and the value is not		// It is assumed that Index is always in [0, N) range and the value is not
// ... (11 lines not shown; context: RegionPageMap()) ...
BufferSize(0),		BufferSize(0),
Buffer(nullptr) {}		Buffer(nullptr) {}
RegionPageMap(uptr NumberOfRegions, uptr CountersPerRegion, uptr MaxValue) {		RegionPageMap(uptr NumberOfRegions, uptr CountersPerRegion, uptr MaxValue) {
reset(NumberOfRegions, CountersPerRegion, MaxValue);		reset(NumberOfRegions, CountersPerRegion, MaxValue);
}		}
~RegionPageMap() {		~RegionPageMap() {
if (!isAllocated())		if (!isAllocated())
return;		return;
if (Buffer == &StaticBuffer[0])		Buffers.releaseBuffer(Buffer, BufferSize);
Mutex.unlock();
else
unmap(reinterpret_cast<void *>(Buffer),
roundUp(BufferSize, getPageSizeCached()));
Buffer = nullptr;		Buffer = nullptr;
}		}

// Lock of `StaticBuffer` is acquired conditionally and there's no easy way to		// Lock of `StaticBuffer` is acquired conditionally and there's no easy way to
// specify the thread-safety attribute properly in current code structure.		// specify the thread-safety attribute properly in current code structure.
// Besides, it's the only place we may want to check thread safety. Therefore,		// Besides, it's the only place we may want to check thread safety. Therefore,
// it's fine to bypass the thread-safety analysis now.		// it's fine to bypass the thread-safety analysis now.
void reset(uptr NumberOfRegion, uptr CountersPerRegion,		void reset(uptr NumberOfRegion, uptr CountersPerRegion, uptr MaxValue) {
uptr MaxValue) NO_THREAD_SAFETY_ANALYSIS {
DCHECK_GT(NumberOfRegion, 0);		DCHECK_GT(NumberOfRegion, 0);
DCHECK_GT(CountersPerRegion, 0);		DCHECK_GT(CountersPerRegion, 0);
DCHECK_GT(MaxValue, 0);		DCHECK_GT(MaxValue, 0);

Regions = NumberOfRegion;		Regions = NumberOfRegion;
NumCounters = CountersPerRegion;		NumCounters = CountersPerRegion;

constexpr uptr MaxCounterBits = sizeof(*Buffer) * 8UL;		constexpr uptr MaxCounterBits = sizeof(*Buffer) * 8UL;
// ... (9 lines not shown; context: void reset(uptr NumberOfRegion, uptr CountersPerRegion, uptr MaxValue) {) ...
DCHECK_GT(PackingRatio, 0);		DCHECK_GT(PackingRatio, 0);
PackingRatioLog = getLog2(PackingRatio);		PackingRatioLog = getLog2(PackingRatio);
BitOffsetMask = PackingRatio - 1;		BitOffsetMask = PackingRatio - 1;

SizePerRegion =		SizePerRegion =
roundUp(NumCounters, static_cast<uptr>(1U) << PackingRatioLog) >>		roundUp(NumCounters, static_cast<uptr>(1U) << PackingRatioLog) >>
PackingRatioLog;		PackingRatioLog;
BufferSize = SizePerRegion * sizeof(*Buffer) * Regions;		BufferSize = SizePerRegion * sizeof(*Buffer) * Regions;
if (BufferSize <= (StaticBufferCount * sizeof(Buffer[0])) &&		Buffer = Buffers.getBuffer(BufferSize);
Mutex.tryLock()) {		DCHECK_NE(Buffer, nullptr);
Buffer = &StaticBuffer[0];
memset(Buffer, 0, BufferSize);
} else {
// When using a heap-based buffer, precommit the pages backing the
// Vmar by passing |MAP_PRECOMMIT| flag. This allows an optimization
// where page fault exceptions are skipped as the allocated memory
// is accessed.
const uptr MmapFlags =
MAP_ALLOWNOMEM | (SCUDO_FUCHSIA ? MAP_PRECOMMIT : 0);
Buffer = reinterpret_cast<uptr *>(
map(nullptr, roundUp(BufferSize, getPageSizeCached()),
"scudo:counters", MmapFlags, &MapData));
}
}		}

bool isAllocated() const { return !!Buffer; }		bool isAllocated() const { return !!Buffer; }

uptr getCount() const { return NumCounters; }		uptr getCount() const { return NumCounters; }

uptr get(uptr Region, uptr I) const {		uptr get(uptr Region, uptr I) const {
DCHECK_LT(Region, Regions);		DCHECK_LT(Region, Regions);
// ... (60 lines not shown; context: bool updateAsAllCountedIf(uptr Region, uptr I, uptr MaxCount) {) ...
return false;		return false;
}		}
bool isAllCounted(uptr Region, uptr I) const {		bool isAllCounted(uptr Region, uptr I) const {
return get(Region, I) == CounterMask;		return get(Region, I) == CounterMask;
}		}

uptr getBufferSize() const { return BufferSize; }		uptr getBufferSize() const { return BufferSize; }

static const uptr StaticBufferCount = 2048U;

private:		private:
uptr Regions;		uptr Regions;
uptr NumCounters;		uptr NumCounters;
uptr CounterSizeBitsLog;		uptr CounterSizeBitsLog;
uptr CounterMask;		uptr CounterMask;
uptr PackingRatioLog;		uptr PackingRatioLog;
uptr BitOffsetMask;		uptr BitOffsetMask;

uptr SizePerRegion;		uptr SizePerRegion;
uptr BufferSize;		uptr BufferSize;
uptr *Buffer;		uptr *Buffer;
[[no_unique_address]] MapPlatformData MapData = {};

static HybridMutex Mutex;		// We may consider making this configurable if there are cases which may
static uptr StaticBuffer[StaticBufferCount] GUARDED_BY(Mutex);		// benefit from this.
		static const uptr StaticBufferCount = 2U;
		static const uptr StaticBufferSize = 512U;
		static BufferPool<StaticBufferCount, StaticBufferSize> Buffers;
};		};

template <class ReleaseRecorderT> class FreePagesRangeTracker {		template <class ReleaseRecorderT> class FreePagesRangeTracker {
public:		public:
explicit FreePagesRangeTracker(ReleaseRecorderT &Recorder)		explicit FreePagesRangeTracker(ReleaseRecorderT &Recorder)
: Recorder(Recorder), PageSizeLog(getLog2(getPageSizeCached())) {}		: Recorder(Recorder), PageSizeLog(getLog2(getPageSizeCached())) {}

void processNextPage(bool Released) {		void processNextPage(bool Released) {
// ... (318 lines not shown) ...

compiler-rt/lib/scudo/standalone/release.cpp

	//===-- release.cpp ---------------------------------------------- C++ --===//			//===-- release.cpp ---------------------------------------------- C++ --===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "release.h"			#include "release.h"

	namespace scudo {			namespace scudo {

	HybridMutex RegionPageMap::Mutex = {};			BufferPool<RegionPageMap::StaticBufferCount, RegionPageMap::StaticBufferSize>
	uptr RegionPageMap::StaticBuffer[RegionPageMap::StaticBufferCount];			RegionPageMap::Buffers;

	} // namespace scudo			} // namespace scudo

compiler-rt/lib/scudo/standalone/tests/release_test.cpp

	// ... (556 lines not shown) ...
	}			}

	TEST(ScudoReleaseTest, ReleasePartialRegion) {			TEST(ScudoReleaseTest, ReleasePartialRegion) {
	testReleasePartialRegion<scudo::DefaultSizeClassMap>();			testReleasePartialRegion<scudo::DefaultSizeClassMap>();
	testReleasePartialRegion<scudo::AndroidSizeClassMap>();			testReleasePartialRegion<scudo::AndroidSizeClassMap>();
	testReleasePartialRegion<scudo::FuchsiaSizeClassMap>();			testReleasePartialRegion<scudo::FuchsiaSizeClassMap>();
	testReleasePartialRegion<scudo::SvelteSizeClassMap>();			testReleasePartialRegion<scudo::SvelteSizeClassMap>();
	}			}

				TEST(ScudoReleaseTest, BufferPool) {
				constexpr scudo::uptr StaticBufferCount = SCUDO_WORDSIZE - 1;
				constexpr scudo::uptr StaticBufferSize = 512U;
				scudo::BufferPool<StaticBufferCount, StaticBufferSize> Pool;

				std::vector<std::pair<scudo::uptr *, scudo::uptr>> Buffers;
				for (scudo::uptr I = 0; I < StaticBufferCount; ++I) {
				scudo::uptr *P = Pool.getBuffer(StaticBufferSize);
				EXPECT_TRUE(Pool.isStaticBufferTestOnly(P, StaticBufferSize));
				Buffers.emplace_back(P, StaticBufferSize);
				}

				// The static buffer is supposed to be used up.
				scudo::uptr *P = Pool.getBuffer(StaticBufferSize);
				EXPECT_FALSE(Pool.isStaticBufferTestOnly(P, StaticBufferSize));

				Pool.releaseBuffer(P, StaticBufferSize);
				for (auto &Buffer : Buffers)
				Pool.releaseBuffer(Buffer.first, Buffer.second);
				}