This is an archive of the discontinued LLVM Phabricator instance.

Differential D39244

[sanitizer] Random shuffling of chunks for the 32-bit Primary Allocator
ClosedPublic

Authored by cryptoad on Oct 24 2017, 8:57 AM.

Details

Reviewers

alekseyshl

Commits

rGc484912b068f: [sanitizer] Random shuffling of chunks for the 32-bit Primary Allocator
rCRT316596: [sanitizer] Random shuffling of chunks for the 32-bit Primary Allocator
rL316596: [sanitizer] Random shuffling of chunks for the 32-bit Primary Allocator

Summary

The 64-bit primary has had random shuffling of chunks for a while; this
implements it for the 32-bit primary. Scudo is currently the only user of
kRandomShuffleChunks.

This change consists of a few modifications:

move the random shuffling functions out of the 64-bit primary to sanitizer_common.h. Alternatively, I could move them to sanitizer_allocator.h as they are only used in the allocator; I don't feel strongly either way;
small change in the 64-bit primary to make the rand_state initialization UNLIKELY;
addition of a rand_state in the 32-bit primary's SizeClassInfo and shuffling of chunks when populating the free list;
enabling the random_shuffle.cpp test on platforms using the 32-bit primary for Scudo.

Some comments on why the shuffling is done that way. Initially I just
implemented a Shuffle function in the TransferBatch, which was simpler, but I
came to realize this wasn't good enough: for chunks of 10000 bytes for example,
with a CompactSizeClassMap, a batch holds only 1 chunk, meaning shuffling the
batch has no effect, while a region is usually 1MB, i.e. 104 chunks of that size.
So I decided to "stage" the newly gathered chunks in a temporary array that
would be shuffled prior to placing the chunks in batches.
The result is looping twice through n_chunks even if shuffling is not enabled,
but I didn't notice any significant performance impact.
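
The staging idea described above can be sketched roughly as follows. This is a simplified, standalone illustration and not the patch itself: StageAndShuffle and the output vector are hypothetical stand-ins for PopulateFreeList and the transfer batches, std::swap replaces the sanitizer's Swap, and the window size of 48 is taken from the diff further down.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// ANSI C style linear congruential generator, mirroring Rand() in the patch.
inline uint32_t Rand(uint32_t *state) {
  return (*state = *state * 1103515245u + 12345u) >> 16;
}

// Value in [0, n); n must be non-zero.
inline uint32_t RandN(uint32_t *state, uint32_t n) {
  assert(n != 0);
  return Rand(state) % n;
}

// Fisher-Yates shuffle of a[0..n).
template <typename T>
void RandomShuffle(T *a, uint32_t n, uint32_t *rand_state) {
  if (n <= 1) return;
  for (uint32_t i = n - 1; i > 0; i--)
    std::swap(a[i], a[RandN(rand_state, i + 1)]);
}

// Hypothetical stand-in for PopulateFreeList: chunk addresses are staged in a
// small stack array, each full window is shuffled, and the shuffled window is
// then handed over (here: appended to `out` instead of filling batches).
std::vector<uintptr_t> StageAndShuffle(uintptr_t region, size_t chunk_size,
                                       size_t n_chunks, uint32_t *rand_state) {
  constexpr size_t kShuffleArraySize = 48;
  uintptr_t shuffle_array[kShuffleArraySize];
  size_t count = 0;
  std::vector<uintptr_t> out;
  auto flush = [&] {
    RandomShuffle(shuffle_array, static_cast<uint32_t>(count), rand_state);
    out.insert(out.end(), shuffle_array, shuffle_array + count);
    count = 0;
  };
  for (size_t i = 0; i < n_chunks; i++) {
    shuffle_array[count++] = region + i * chunk_size;
    if (count == kShuffleArraySize) flush();
  }
  if (count) flush();  // Last, possibly partial, window.
  return out;
}
```

With 10000-byte chunks, a single-chunk batch cannot be shuffled on its own, but the 104 chunks of a region pass through the staging array in windows of up to 48 and come out reordered within each window.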

Diff Detail

Build Status

Buildable 11488
Build 11488: arc lint + arc unit

Event Timeline

cryptoad created this revision. Oct 24 2017, 8:57 AM

Harbormaster completed remote builds in B11432: Diff 120078. Oct 24 2017, 8:57 AM

Herald added subscribers: kubamracek, srhines. Oct 24 2017, 8:57 AM

Updating some indentation & a comment.

Harbormaster completed remote builds in B11436: Diff 120083. Oct 24 2017, 9:20 AM

alekseyshl added inline comments. Oct 24 2017, 10:51 AM

lib/sanitizer_common/sanitizer_allocator_primary32.h
273	How about using [kCacheLineSize - offsetof(SizeClassInfo, padding)] instead?
345	How about using the newly allocated region for the array? You can fill and shuffle all the pointers at once, create the batches and then zero reg out. Still two passes, but less stack and chunks will be farther apart. The code will be simpler too and I have a feeling that we can even avoid double copying for !kShuffleArraySize.
lib/sanitizer_common/sanitizer_common.h
933 ↗	(On Diff #120083)	Yep, you're right, let's not pollute this file more than is necessary and move it to sanitizer_allocator.h.

cryptoad added inline comments. Oct 24 2017, 11:34 AM

lib/sanitizer_common/sanitizer_allocator_primary32.h
345	This is tricky because if kRandomShuffleChunks is not used with kUseSeparateSizeClassForBatch, then the batches can also use the region itself. This means that while we iterate through the chunks, we could create a batch that overlaps and corrupts the array we are iterating on. We can't really assume any part of the region is "safe" to store the array either, due to the randomness, so using the end of the region or the beginning doesn't change that. If kUseSeparateSizeClassForBatch is used, then we are golden and that can work. Currently Scudo uses both, but I feel they should be treated distinctly nonetheless.

Moving Rand* to sanitizer_allocator.h instead.

Harbormaster completed remote builds in B11443: Diff 120108. Oct 24 2017, 11:44 AM

cryptoad added inline comments. Oct 24 2017, 12:14 PM

lib/sanitizer_common/sanitizer_allocator_primary32.h
273	I can't seem to make that general idea work one way or another. The other possibility, which some sanitizer parts use, is just char padding[kCacheLineSize].

cryptoad added inline comments. Oct 24 2017, 12:34 PM

lib/sanitizer_common/sanitizer_allocator_primary32.h
345	Rectification: even with kUseSeparateSizeClassForBatch, there is the special case of kBatchClassID for which the batches are stored in the same region.

alekseyshl added inline comments. Oct 24 2017, 1:50 PM

lib/sanitizer_common/sanitizer_allocator_primary32.h
273	Ah, right, that should not work there, sorry. Why not make it explicit then:

kCacheLineSize - sizeof(SpinMutex) - sizeof(IntrusiveList<TransferBatch>) - sizeof(u32)

or

struct SizeClassInfoData {
  SpinMutex mutex;
  IntrusiveList<TransferBatch> free_list;
  u32 rand_state;
};

template <typename T, uptr N>
struct Padded : public T {
 private:
  char padding[N - sizeof(T)];
};

typedef Padded<SizeClassInfoData, kCacheLineSize> SizeClassInfo;

but yeah, it's too much for the cause :)
345	Doesn't this latter case create a loop? We're in PopulateFreeList because AllocateBatch figured that sci->free_list.empty(), and when PopulateFreeList calls CreateBatch, don't we get back to AllocateBatch with free list still empty? Sorry for too many questions, just want to understand what's going on here.

cryptoad added inline comments. Oct 24 2017, 2:37 PM

lib/sanitizer_common/sanitizer_allocator_primary32.h

273

Regarding the first part, it actually raises an interesting point: when the 32-bit primary is used on a 64-bit platform, the fields are uptr-aligned.
So subtracting sizeof(u32) doesn't work, whereas subtracting sizeof(uptr) does.
Which raises the question: should the structures be packed tighter?
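
As a rough illustration of the padding discussion, the Padded wrapper suggested in the review can be checked at compile time. The sketch below is standalone and makes several assumptions: FakeSpinMutex and FakeIntrusiveList are hypothetical stand-ins for SpinMutex and IntrusiveList<TransferBatch>, and kCacheLineSize is assumed to be 64.

```cpp
#include <cstddef>
#include <cstdint>

constexpr size_t kCacheLineSize = 64;  // Assumed value for this sketch.

// Hypothetical stand-ins for SpinMutex and IntrusiveList<TransferBatch>.
struct FakeSpinMutex {
  volatile uint8_t state;
};
struct FakeIntrusiveList {
  size_t size;
  void *first;
  void *last;
};

struct SizeClassInfoData {
  FakeSpinMutex mutex;
  FakeIntrusiveList free_list;
  uint32_t rand_state;
};

// Pads T up to exactly N bytes; sizeof(T) already accounts for any alignment
// padding the compiler inserted after rand_state.
template <typename T, size_t N>
struct Padded : public T {
 private:
  char padding[N - sizeof(T)];
};

typedef Padded<SizeClassInfoData, kCacheLineSize> SizeClassInfo;

// The padded structure occupies exactly one (assumed) cache line.
static_assert(sizeof(SizeClassInfo) == kCacheLineSize,
              "SizeClassInfo must be one cache line");
// The unpadded data is rounded up to pointer alignment, which is why a
// hand-written "- sizeof(u32)" term undercounts on 64-bit targets.
static_assert(sizeof(SizeClassInfoData) % alignof(void *) == 0,
              "data size is a multiple of pointer alignment");
```

Computing the padding from sizeof(T) avoids having to reason by hand about whether the trailing u32 is followed by alignment padding.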

345

No worries, all questions are welcome!

So the first PopulateFreeList call is for the class_id that we requested with malloc. It then goes down into PopulateFreeList again, except that this time the class_id is kBatchClassID, which is serviced from its own region (i.e. it doesn't go down into PopulateFreeList a third time).
So for the first invocation (where it needs to create a region for the requested size and then one for the batches), the stack looks like:

#0  __sanitizer::SizeClassAllocator32LocalCache<__sanitizer::SizeClassAllocator32<__scudo::AP32> >::CreateBatch
#1  0x00000000005095f1 in __sanitizer::SizeClassAllocator32<__scudo::AP32>::PopulateBatches
#2  0x0000000000509123 in __sanitizer::SizeClassAllocator32<__scudo::AP32>::PopulateFreeList
#3  0x0000000000508c4e in __sanitizer::SizeClassAllocator32<__scudo::AP32>::AllocateBatch
#4  0x0000000000508a26 in __sanitizer::SizeClassAllocator32LocalCache<__sanitizer::SizeClassAllocator32<__scudo::AP32> >::Refill
#5  0x0000000000508942 in __sanitizer::SizeClassAllocator32LocalCache<__sanitizer::SizeClassAllocator32<__scudo::AP32> >::Allocate
#6  0x00000000005085f2 in __sanitizer::SizeClassAllocator32LocalCache<__sanitizer::SizeClassAllocator32<__scudo::AP32> >::CreateBatch
#7  0x00000000005095f1 in __sanitizer::SizeClassAllocator32<__scudo::AP32>::PopulateBatches
#8  0x0000000000509123 in __sanitizer::SizeClassAllocator32<__scudo::AP32>::PopulateFreeList
#9  0x0000000000508c4e in __sanitizer::SizeClassAllocator32<__scudo::AP32>::AllocateBatch
#10 0x0000000000508a26 in __sanitizer::SizeClassAllocator32LocalCache<__sanitizer::SizeClassAllocator32<__scudo::AP32> >::Refill
#11 0x0000000000508942 in __sanitizer::SizeClassAllocator32LocalCache<__sanitizer::SizeClassAllocator32<__scudo::AP32> >::Allocate

Frames 11 to 6 relate to the requested class_id; frames 5 to 0 are for kBatchClassID.
When reaching frame 0, we just return b and everything unwinds back up.
This only happens for the first allocation. Afterwards, the region for the TransferBatch is created and the associated free list is populated, so we won't go there again unless we need to re-populate them.
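
The bounded recursion described above can be modelled with a toy allocator. This is only a sketch under assumptions, not the sanitizer code: ToyAllocator, the class numbering and kChunksPerRegion are all hypothetical, and the model assumes a configuration where batches for regular classes come from a dedicated batch class.

```cpp
#include <cassert>
#include <cstddef>

constexpr size_t kBatchClassID = 0;     // Hypothetical numbering.
constexpr size_t kNumClasses = 4;       // Hypothetical.
constexpr size_t kChunksPerRegion = 8;  // Hypothetical.

struct ToyAllocator {
  size_t free_chunks[kNumClasses] = {};
  size_t populate_calls[kNumClasses] = {};
  size_t depth = 0;
  size_t max_depth = 0;

  // Obtains storage for a batch describing chunks of class_id.
  bool CreateBatch(size_t class_id) {
    // The batch class stores a batch inside one of its own chunks, so no
    // further allocation, and hence no further recursion, is needed.
    if (class_id == kBatchClassID) return true;
    // Every other class takes its batch storage from the batch class.
    return Allocate(kBatchClassID);
  }

  bool PopulateFreeList(size_t class_id) {
    populate_calls[class_id]++;
    if (!CreateBatch(class_id)) return false;
    free_chunks[class_id] += kChunksPerRegion;
    return true;
  }

  bool Allocate(size_t class_id) {
    assert(class_id < kNumClasses);
    depth++;
    if (depth > max_depth) max_depth = depth;
    bool ok = true;
    if (free_chunks[class_id] == 0) ok = PopulateFreeList(class_id);
    if (ok) free_chunks[class_id]--;
    depth--;
    return ok;
  }
};
```

In this model the first allocation of a regular class nests exactly two Allocate calls (the requested class, then the batch class), and later allocations reuse the already populated batch class.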

alekseyshl added inline comments. Oct 24 2017, 3:34 PM

lib/sanitizer_common/sanitizer_allocator_primary32.h
345	Thanks for the stack! So you're saying it never happens: we never allocate batches from the same bucket (and thus the same region)? Then why is my proposal of using the new region for the temp buffer not viable? Otherwise, if I got your explanation wrong and there are cases when batches are allocated from the same region (the region we're populating the free list for), how come it does not get into the loop I mentioned earlier? I am a bit confused...

cryptoad added inline comments. Oct 24 2017, 4:02 PM

lib/sanitizer_common/sanitizer_allocator_primary32.h

345

I am going to try and take a step back to make sure I am on the same wavelength as you.

The proposal as I understand it is to put all the possible pointers in the newly allocated region and shuffle them there, thus allowing more than 48 chunks to be shuffled at a time. Then create and fill batches by iterating through the shuffled array, and potentially zero out the region. If this is not the actual proposal, please let me know.

With the proposal, there are 2 cases to take into account: kUseSeparateSizeClassForBatch is true or false.

If kUseSeparateSizeClassForBatch is false, then if a chunk in a class is large enough to hold a transfer batch, it will be used as a TransferBatch:
https://github.com/llvm-mirror/compiler-rt/blob/master/lib/sanitizer_common/sanitizer_allocator_local_cache.h#L146
This means that when we do a b->Add(xxx), we actually overwrite some of the region memory itself. In this case, it sounds like it would be tough not to overlap an in-region transfer batch with the pointers we are iterating through. As such, I am not sure the proposal is viable (at least not without significant code complexity).

If kUseSeparateSizeClassForBatch is true, then we allocate transfer batches from a separate size class, so we could reuse the region for all other classes except that one. For that particular one, we fall back into an equivalent of the preceding case: batches for the kBatchClassID class are allocated within the class region itself. Now we could make this work by ensuring there is no randomization in that specific kBatchClassID class: indeed, we do not need batches to be randomized.

A couple of other points to consider:

For class 1 (16 bytes) with a 1MB region, we get 65536 chunks in the region. If we are using the region to hold the array, with 64-bit pointers, it's 524288 bytes that we are dirtying in the region (as well as the transfer batch region really);
Zeroing that memory (to avoid potential disclosure of valid pointers) will be somewhat costly as well.

By using a local array, we reduce the randomness of our chunk shuffle (at most 48 consecutive chunks get shuffled at a time), but we do not touch the newly allocated region (yet) and save on some complications.

I hope this clarifies my point of view (if my understanding of what you are suggesting is correct).
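
The figures quoted in this comment can be double-checked with a few compile-time computations. The sketch below assumes a 1MB region and ignores per-chunk metadata (kMetadataSize is taken as 0 here, which is an assumption); the helper names are hypothetical.

```cpp
#include <cstddef>

constexpr size_t kRegionSize = 1 << 20;  // 1MB region, as in the comment.

// Number of chunks of a given size in one region (metadata ignored).
constexpr size_t ChunksPerRegion(size_t chunk_size) {
  return kRegionSize / chunk_size;
}

// Bytes needed to store one pointer per chunk inside the region.
constexpr size_t PointerArrayBytes(size_t chunk_size, size_t pointer_size) {
  return ChunksPerRegion(chunk_size) * pointer_size;
}

// Class 1 (16 bytes): 65536 chunks, 524288 bytes of 64-bit pointers.
static_assert(ChunksPerRegion(16) == 65536, "chunk count for 16-byte class");
static_assert(PointerArrayBytes(16, 8) == 524288, "in-region array size");
// 10000-byte chunks from the summary: 104 chunks per region.
static_assert(ChunksPerRegion(10000) == 104, "chunk count for 10000 bytes");
// The local staging array of 48 64-bit entries only needs 384 bytes of stack.
static_assert(48 * 8 == 384, "stack usage of the staging array");
```

So the in-region array would dirty half of the region for the smallest class, whereas the local array costs a few hundred bytes of stack.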

alekseyshl accepted this revision. Oct 24 2017, 4:54 PM

alekseyshl added inline comments.

lib/sanitizer_common/sanitizer_allocator_primary32.h
273	I guess they should not, uptr alignment is faster, right? So, yep, leave it as it is.
345	Yes, your understanding is correct, that was exactly what I proposed. So, when batches are allocated from the same region as regular chunks, the batch contains a pointer to itself, right? Ok, now it makes more sense. Considering your other points, yes, a local array seems like a proper (and safer) solution.
346	Can you rename j to something less Fortran-y?
351	Move the shuffling into PopulateBatches.

This revision is now accepted and ready to land. Oct 24 2017, 4:54 PM

Addressing comments from the review:

moving shuffling into PopulateBatches
renaming an ambiguous variable

I also changed a couple more things:

if using a separate class for batches, do not shuffle it (there are no user-controlled allocations in that region, so randomizing isn't useful);
the 32-bit primary's rand_state is initialized with sci rather than this, so that 2 different classes initialized at the same time get different seeds.
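
The seeding change can be illustrated with a small standalone sketch. It is not the sanitizer code: this SizeClassInfo is a simplified stand-in, InitialSeed and EnsureSeeded are hypothetical helpers, and the timestamp is passed in explicitly instead of calling NanoTime().

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in: one 64-byte slot per size class.
struct SizeClassInfo {
  uint32_t rand_state = 0;
  char padding[60];
};

// Mixes the address of the per-class structure with a timestamp, following
// the "reinterpret_cast<uptr>(sci) ^ NanoTime()" expression in the diff.
inline uint32_t InitialSeed(const void *sci, uint64_t now_ns) {
  return static_cast<uint32_t>(reinterpret_cast<uintptr_t>(sci) ^ now_ns);
}

// Lazily seeds a class the first time it is populated.
inline void EnsureSeeded(SizeClassInfo *sci, uint64_t now_ns) {
  assert(sci != nullptr);
  if (sci->rand_state == 0) sci->rand_state = InitialSeed(sci, now_ns);
}
```

Because each class has its own SizeClassInfo address, two classes seeded with the same timestamp still end up with different states, which would not be the case if the allocator's own address (this) were used.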

alekseyshl accepted this revision. Oct 25 2017, 10:19 AM

cryptoad closed this revision. Oct 25 2017, 10:25 AM

Revision Contents

Path

Size

lib/

sanitizer_common/

sanitizer_allocator.h

13 lines

sanitizer_allocator_primary32.h

54 lines

sanitizer_allocator_primary64.h

18 lines

test/

scudo/

random_shuffle.cpp

2 lines

Diff 120272

lib/sanitizer_common/sanitizer_allocator.h

	Show First 20 Lines • Show All 50 Lines • ▼ Show 20 Lines
	struct NoOpMapUnmapCallback {			struct NoOpMapUnmapCallback {
	void OnMap(uptr p, uptr size) const { }			void OnMap(uptr p, uptr size) const { }
	void OnUnmap(uptr p, uptr size) const { }			void OnUnmap(uptr p, uptr size) const { }
	};			};

	// Callback type for iterating over chunks.			// Callback type for iterating over chunks.
	typedef void (*ForEachChunkCallback)(uptr chunk, void *arg);			typedef void (*ForEachChunkCallback)(uptr chunk, void *arg);

				INLINE u32 Rand(u32 *state) { // ANSI C linear congruential PRNG.
				return (*state = *state * 1103515245 + 12345) >> 16;
				}

				INLINE u32 RandN(u32 *state, u32 n) { return Rand(state) % n; } // [0, n)

				template<typename T>
				INLINE void RandomShuffle(T *a, u32 n, u32 *rand_state) {
				if (n <= 1) return;
				for (u32 i = n - 1; i > 0; i--)
				Swap(a[i], a[RandN(rand_state, i + 1)]);
				}

	#include "sanitizer_allocator_size_class_map.h"			#include "sanitizer_allocator_size_class_map.h"
	#include "sanitizer_allocator_stats.h"			#include "sanitizer_allocator_stats.h"
	#include "sanitizer_allocator_primary64.h"			#include "sanitizer_allocator_primary64.h"
	#include "sanitizer_allocator_bytemap.h"			#include "sanitizer_allocator_bytemap.h"
	#include "sanitizer_allocator_primary32.h"			#include "sanitizer_allocator_primary32.h"
	#include "sanitizer_allocator_local_cache.h"			#include "sanitizer_allocator_local_cache.h"
	#include "sanitizer_allocator_secondary.h"			#include "sanitizer_allocator_secondary.h"
	#include "sanitizer_allocator_combined.h"			#include "sanitizer_allocator_combined.h"

	} // namespace __sanitizer			} // namespace __sanitizer

	#endif // SANITIZER_ALLOCATOR_H			#endif // SANITIZER_ALLOCATOR_H

lib/sanitizer_common/sanitizer_allocator_primary32.h

Show First 20 Lines • Show All 262 Lines • ▼ Show 20 Lines

private:		private:
static const uptr kRegionSize = 1 << kRegionSizeLog;		static const uptr kRegionSize = 1 << kRegionSizeLog;
static const uptr kNumPossibleRegions = kSpaceSize / kRegionSize;		static const uptr kNumPossibleRegions = kSpaceSize / kRegionSize;

struct SizeClassInfo {		struct SizeClassInfo {
SpinMutex mutex;		SpinMutex mutex;
IntrusiveList<TransferBatch> free_list;		IntrusiveList<TransferBatch> free_list;
char padding[kCacheLineSize - sizeof(uptr) -		u32 rand_state;
		char padding[kCacheLineSize - 2 * sizeof(uptr) -
sizeof(IntrusiveList<TransferBatch>)];		sizeof(IntrusiveList<TransferBatch>)];
};		};
COMPILER_CHECK(sizeof(SizeClassInfo) == kCacheLineSize);		COMPILER_CHECK(sizeof(SizeClassInfo) == kCacheLineSize);

uptr ComputeRegionId(uptr mem) {		uptr ComputeRegionId(uptr mem) {
uptr res = mem >> kRegionSizeLog;		uptr res = mem >> kRegionSizeLog;
CHECK_LT(res, kNumPossibleRegions);		CHECK_LT(res, kNumPossibleRegions);
return res;		return res;
}		}
Show All 15 Lines	uptr AllocateRegion(AllocatorStats *stat, uptr class_id) {
return res;		return res;
}		}

SizeClassInfo *GetSizeClassInfo(uptr class_id) {		SizeClassInfo *GetSizeClassInfo(uptr class_id) {
CHECK_LT(class_id, kNumClasses);		CHECK_LT(class_id, kNumClasses);
return &size_class_info_array[class_id];		return &size_class_info_array[class_id];
}		}

		bool PopulateBatches(AllocatorCache *c, SizeClassInfo *sci, uptr class_id,
		TransferBatch **current_batch, uptr max_count,
		uptr *pointers_array, uptr count) {
		// If using a separate class for batches, we do not need to shuffle it.
		if (kRandomShuffleChunks && (!kUseSeparateSizeClassForBatch ||
		class_id != SizeClassMap::kBatchClassID))
		RandomShuffle(pointers_array, count, &sci->rand_state);
		TransferBatch *b = *current_batch;
		for (uptr i = 0; i < count; i++) {
		if (!b) {
		b = c->CreateBatch(class_id, this, (TransferBatch*)pointers_array[i]);
		if (UNLIKELY(!b))
		return false;
		b->Clear();
		}
		b->Add((void*)pointers_array[i]);
		if (b->Count() == max_count) {
		sci->free_list.push_back(b);
		b = nullptr;
		}
		}
		*current_batch = b;
		return true;
		}

bool PopulateFreeList(AllocatorStats *stat, AllocatorCache *c,		bool PopulateFreeList(AllocatorStats *stat, AllocatorCache *c,
SizeClassInfo *sci, uptr class_id) {		SizeClassInfo *sci, uptr class_id) {
uptr size = ClassIdToSize(class_id);		uptr size = ClassIdToSize(class_id);
uptr reg = AllocateRegion(stat, class_id);		uptr reg = AllocateRegion(stat, class_id);
if (UNLIKELY(!reg))		if (UNLIKELY(!reg))
return false;		return false;
		if (kRandomShuffleChunks)
		if (UNLIKELY(sci->rand_state == 0))
		// The random state is initialized from ASLR (PIE) and time.
		sci->rand_state = reinterpret_cast<uptr>(sci) ^ NanoTime();
uptr n_chunks = kRegionSize / (size + kMetadataSize);		uptr n_chunks = kRegionSize / (size + kMetadataSize);
uptr max_count = TransferBatch::MaxCached(class_id);		uptr max_count = TransferBatch::MaxCached(class_id);
CHECK_GT(max_count, 0);		CHECK_GT(max_count, 0);
TransferBatch *b = nullptr;		TransferBatch *b = nullptr;
		const uptr kShuffleArraySize = 48;
		uptr shuffle_array[kShuffleArraySize];
		uptr count = 0;
for (uptr i = reg; i < reg + n_chunks * size; i += size) {		for (uptr i = reg; i < reg + n_chunks * size; i += size) {
if (!b) {		shuffle_array[count++] = i;
b = c->CreateBatch(class_id, this, (TransferBatch*)i);		if (count == kShuffleArraySize) {
if (UNLIKELY(!b))		if (UNLIKELY(!PopulateBatches(c, sci, class_id, &b, max_count,
		shuffle_array, count)))
return false;		return false;
b->Clear();		count = 0;
}		}
b->Add((void*)i);
if (b->Count() == max_count) {
sci->free_list.push_back(b);
b = nullptr;
}		}
		if (count) {
		if (UNLIKELY(!PopulateBatches(c, sci, class_id, &b, max_count,
		shuffle_array, count)))
		return false;
}		}
if (b) {		if (b) {
CHECK_GT(b->Count(), 0);		CHECK_GT(b->Count(), 0);
sci->free_list.push_back(b);		sci->free_list.push_back(b);
}		}
return true;		return true;
}		}

ByteMap possible_regions;		ByteMap possible_regions;
SizeClassInfo size_class_info_array[kNumClasses];		SizeClassInfo size_class_info_array[kNumClasses];
};		};

lib/sanitizer_common/sanitizer_allocator_primary64.h

Show First 20 Lines • Show All 591 Lines • ▼ Show 20 Lines	struct RegionInfo {
uptr mapped_meta; // Bytes mapped for metadata.		uptr mapped_meta; // Bytes mapped for metadata.
u32 rand_state; // Seed for random shuffle, used if kRandomShuffleChunks.		u32 rand_state; // Seed for random shuffle, used if kRandomShuffleChunks.
bool exhausted; // Whether region is out of space for new chunks.		bool exhausted; // Whether region is out of space for new chunks.
Stats stats;		Stats stats;
ReleaseToOsInfo rtoi;		ReleaseToOsInfo rtoi;
};		};
COMPILER_CHECK(sizeof(RegionInfo) >= kCacheLineSize);		COMPILER_CHECK(sizeof(RegionInfo) >= kCacheLineSize);

u32 Rand(u32 *state) { // ANSI C linear congruential PRNG.
return (*state = *state * 1103515245 + 12345) >> 16;
}

u32 RandN(u32 *state, u32 n) { return Rand(state) % n; } // [0, n)

void RandomShuffle(u32 *a, u32 n, u32 *rand_state) {
if (n <= 1) return;
for (u32 i = n - 1; i > 0; i--)
Swap(a[i], a[RandN(rand_state, i + 1)]);
}

RegionInfo *GetRegionInfo(uptr class_id) const {		RegionInfo *GetRegionInfo(uptr class_id) const {
CHECK_LT(class_id, kNumClasses);		CHECK_LT(class_id, kNumClasses);
RegionInfo *regions =		RegionInfo *regions =
reinterpret_cast<RegionInfo *>(SpaceBeg() + kSpaceSize);		reinterpret_cast<RegionInfo *>(SpaceBeg() + kSpaceSize);
return &regions[class_id];		return &regions[class_id];
}		}

uptr GetMetadataEnd(uptr region_beg) const {		uptr GetMetadataEnd(uptr region_beg) const {
▲ Show 20 Lines • Show All 56 Lines • ▼ Show 20 Lines	NOINLINE bool PopulateFreeArray(AllocatorStats *stat, uptr class_id,
// region->mutex is held.		// region->mutex is held.
const uptr size = ClassIdToSize(class_id);		const uptr size = ClassIdToSize(class_id);
const uptr new_space_beg = region->allocated_user;		const uptr new_space_beg = region->allocated_user;
const uptr new_space_end = new_space_beg + requested_count * size;		const uptr new_space_end = new_space_beg + requested_count * size;
const uptr region_beg = GetRegionBeginBySizeClass(class_id);		const uptr region_beg = GetRegionBeginBySizeClass(class_id);

// Map more space for chunks, if necessary.		// Map more space for chunks, if necessary.
if (new_space_end > region->mapped_user) {		if (new_space_end > region->mapped_user) {
if (!kUsingConstantSpaceBeg && region->mapped_user == 0)		if (!kUsingConstantSpaceBeg && kRandomShuffleChunks)
region->rand_state = static_cast<u32>(region_beg >> 12); // From ASLR.		if (UNLIKELY(region->mapped_user == 0))
		// The random state is initialized from ASLR.
		region->rand_state = static_cast<u32>(region_beg >> 12);
// Do the mmap for the user memory.		// Do the mmap for the user memory.
uptr map_size = kUserMapSize;		uptr map_size = kUserMapSize;
while (new_space_end > region->mapped_user + map_size)		while (new_space_end > region->mapped_user + map_size)
map_size += kUserMapSize;		map_size += kUserMapSize;
CHECK_GE(region->mapped_user + map_size, new_space_end);		CHECK_GE(region->mapped_user + map_size, new_space_end);
if (UNLIKELY(!MapWithCallback(region_beg + region->mapped_user,		if (UNLIKELY(!MapWithCallback(region_beg + region->mapped_user,
map_size)))		map_size)))
return false;		return false;
▲ Show 20 Lines • Show All 147 Lines • Show Last 20 Lines

test/scudo/random_shuffle.cpp

	// RUN: %clang_scudo %s -o %t			// RUN: %clang_scudo %s -o %t
	// RUN: rm -rf %T/random_shuffle_tmp_dir			// RUN: rm -rf %T/random_shuffle_tmp_dir
	// RUN: mkdir %T/random_shuffle_tmp_dir			// RUN: mkdir %T/random_shuffle_tmp_dir
	// RUN: %run %t 100 > %T/random_shuffle_tmp_dir/out1			// RUN: %run %t 100 > %T/random_shuffle_tmp_dir/out1
	// RUN: %run %t 100 > %T/random_shuffle_tmp_dir/out2			// RUN: %run %t 100 > %T/random_shuffle_tmp_dir/out2
	// RUN: %run %t 10000 > %T/random_shuffle_tmp_dir/out1			// RUN: %run %t 10000 > %T/random_shuffle_tmp_dir/out1
	// RUN: %run %t 10000 > %T/random_shuffle_tmp_dir/out2			// RUN: %run %t 10000 > %T/random_shuffle_tmp_dir/out2
	// RUN: not diff %T/random_shuffle_tmp_dir/out?			// RUN: not diff %T/random_shuffle_tmp_dir/out?
	// RUN: rm -rf %T/random_shuffle_tmp_dir			// RUN: rm -rf %T/random_shuffle_tmp_dir
	// UNSUPPORTED: i386-linux,arm-linux,armhf-linux,aarch64-linux,mips-linux,mipsel-linux,mips64-linux,mips64el-linux
	// UNSUPPORTED: android

	// Tests that the allocator shuffles the chunks before returning to the user.			// Tests that the allocator shuffles the chunks before returning to the user.

	#include <stdlib.h>			#include <stdlib.h>
	#include <stdio.h>			#include <stdio.h>

	int main(int argc, char **argv) {			int main(int argc, char **argv) {
	int alloc_size = argc == 2 ? atoi(argv[1]) : 100;			int alloc_size = argc == 2 ? atoi(argv[1]) : 100;
	char *base = new char[alloc_size];			char *base = new char[alloc_size];
	for (int i = 0; i < 20; i++) {			for (int i = 0; i < 20; i++) {
	char *p = new char[alloc_size];			char *p = new char[alloc_size];
	printf("%zd\n", base - p);			printf("%zd\n", base - p);
	}			}
	}			}
