This is an archive of the discontinued LLVM Phabricator instance.

Differential D35221

[scudo] PRNG makeover
ClosedPublic

Authored by cryptoad on Jul 10 2017, 1:24 PM.

Details

Reviewers

alekseyshl

Commits

rG00582563bef7: [scudo] PRNG makeover
rCRT307798: [scudo] PRNG makeover
rL307798: [scudo] PRNG makeover

Summary

This follows the addition of GetRandom in D34412. We remove our
/dev/urandom code and use the new function instead. Additionally, we switch the
PRNG to a slightly faster one. One issue with the old code is that each call to
"next" yields 64 full bits of randomness, of which we use only 8 for the Salt
and discard the rest. So we add a cached u64 to the PRNG that can serve up to
8 u8 before having to call the "next" function again.

During some integration work, I also realized that some very early processes
(like init) do not benefit from /dev/urandom yet. So if the getrandom syscall
is not available either, we have to fall back to some other way of
initializing the PRNG.

Now a few words on why XoRoShiRo and not something else. I experimented for a
while with various PRNGs on 32 & 64 bit platforms. Some results are below. LCG
32 & 64 are usually faster but produce respectively 15 & 31 bits of entropy,
meaning that to get a full 64 bits, you would need to call them several times.
The simple XorShift is fast and produces 32 bits, but is mediocre with regard
to PRNG test suites. PCG is slower overall, and XoRoShiRo is faster than
XorShift128+ and produces a full 64 bits.

root@tulip-chiphd:/data # ./randtest.arm
[+] starting xs32...
[?] xs32 duration: 22431833053ns
[+] starting lcg32...
[?] lcg32 duration: 14941402090ns
[+] starting pcg32...
[?] pcg32 duration: 44941973771ns
[+] starting xs128p...
[?] xs128p duration: 48889786981ns
[+] starting lcg64...
[?] lcg64 duration: 33831042391ns
[+] starting xos128p...
[?] xos128p duration: 44850878605ns

root@tulip-chiphd:/data # ./randtest.aarch64
[+] starting xs32...
[?] xs32 duration: 22425151678ns
[+] starting lcg32...
[?] lcg32 duration: 14954255257ns
[+] starting pcg32...
[?] pcg32 duration: 37346265726ns
[+] starting xs128p...
[?] xs128p duration: 22523807219ns
[+] starting lcg64...
[?] lcg64 duration: 26141304679ns
[+] starting xos128p...
[?] xos128p duration: 14937033215ns

Diff Detail

Build Status

Buildable 8160
Build 8160: arc lint + arc unit

Event Timeline

cryptoad created this revision. (Jul 10 2017, 1:24 PM)

Harbormaster completed remote builds in B8106: Diff 105911. (Jul 10 2017, 1:24 PM)

Herald added subscribers: kristof.beyls, aemerson. (Jul 10 2017, 1:24 PM)

alekseyshl accepted this revision. (Jul 10 2017, 5:28 PM)

alekseyshl added inline comments.

lib/scudo/scudo_allocator.cpp
583	This change is unrelated, please revert.
lib/scudo/scudo_utils.h
86	Does this trick really help with performance?

This revision is now accepted and ready to land. (Jul 10 2017, 5:28 PM)

cryptoad added inline comments. (Jul 10 2017, 6:51 PM)

lib/scudo/scudo_allocator.cpp
583	Ack. Slipped in from another one sorry.
lib/scudo/scudo_utils.h
86	The numbers do not differ enough to go one way or the other, but the u8 was taking up to 8 extra bytes depending on the architecture. It feels like it should be faster, but it definitely saves space.

Addressing comment: removing an UNLIKELY that slipped through.

Harbormaster completed remote builds in B8139: Diff 106032. (Jul 11 2017, 8:27 AM)

cryptoad added inline comments. (Jul 11 2017, 9:09 AM)

lib/scudo/scudo_utils.h
86	Actually scratch that. My machine was too loaded to give correct results. The initial version appears to be faster (with the u8):

kostyak@kostyak-linux:~$ clang++ -O3 rand.cc -o rand -DWITH_CACHEDBYTESAVAILABLE
kostyak@kostyak-linux:~$ ./rand
[?] duration: 4009332558ns
kostyak@kostyak-linux:~$ clang++ -O3 rand.cc -o rand
kostyak@kostyak-linux:~$ ./rand
[?] duration: 4788913046ns

For 1<<32 iterations of `getU8`, and quite stable over the course of multiple runs. I am going to reintroduce it.

cryptoad requested review of this revision. (Jul 11 2017, 10:08 AM)

cryptoad edited edge metadata.

cryptoad added inline comments.

lib/scudo/scudo_utils.h

It's actually a lot more nuanced and tricky than I was expecting.
Seeding through /dev/urandom 1<<12 times, and iterating 1<<20 times per seeding, we get the numbers below (stable over multiple runs).
With clang, 32-bit seems equivalent, 64-bit favors the CachedBytesAvailable version.
With gcc, 32-bit favors CachedBytesAvailable, 64-bit favors the other (and overall slower for either).

kostyak@kostyak-linux:~$ clang++ -m32 -O3 rand.cc -o rand -DWITH_CACHEDBYTESAVAILABLE                                                  
kostyak@kostyak-linux:~$ ./rand                                                                                                        
[?] duration: 4814670294ns
kostyak@kostyak-linux:~$ clang++ -m32 -O3 rand.cc -o rand
kostyak@kostyak-linux:~$ ./rand
[?] duration: 4830693788ns
kostyak@kostyak-linux:~$ clang++ -O3 rand.cc -o rand -DWITH_CACHEDBYTESAVAILABLE                                                       
kostyak@kostyak-linux:~$ ./rand
[?] duration: 3115400364ns
kostyak@kostyak-linux:~$ clang++ -O3 rand.cc -o rand
kostyak@kostyak-linux:~$ ./rand
[?] duration: 4394574294ns
kostyak@kostyak-linux:~$ g++ -m32 -O3 rand.cc -o rand -DWITH_CACHEDBYTESAVAILABLE                                                                                                                                                                                               
kostyak@kostyak-linux:~$ ./rand
[?] duration: 8782558601ns
kostyak@kostyak-linux:~$ g++ -m32 -O3 rand.cc -o rand
kostyak@kostyak-linux:~$ ./rand
[?] duration: 9332069877ns
kostyak@kostyak-linux:~$ g++ -O3 rand.cc -o rand -DWITH_CACHEDBYTESAVAILABLE                                                                                                                                                                                                    
kostyak@kostyak-linux:~$ ./rand
[?] duration: 5651244009ns
kostyak@kostyak-linux:~$ g++ -O3 rand.cc -o rand
kostyak@kostyak-linux:~$ ./rand
[?] duration: 4407575998ns

I am running some additional ARM & AArch64 tests.
At this point I still feel that reintroducing CachedBytesAvailable might provide the most benefit. Let me know what you think.

alekseyshl added inline comments. (Jul 11 2017, 11:05 AM)

lib/scudo/scudo_utils.h
86	Given the lack of a clear performance advantage, we should go with the simpler, more straightforward version (CachedBytesAvailable).

cryptoad added inline comments. (Jul 11 2017, 11:07 AM)

lib/scudo/scudo_utils.h
86	And to make things a bit more complicated, on ARM & Aarch64 (at least with the Android toolchain), the version without CachedBytesAvailable appears faster for 32-bit, way faster for 64-bit.

Reintroducing CachedBytesAvailable to track the number of bytes left in the
cached u64.

Harbormaster completed remote builds in B8144: Diff 106073. (Jul 11 2017, 11:11 AM)

alekseyshl accepted this revision. (Jul 11 2017, 2:41 PM)

alekseyshl added inline comments.

lib/scudo/scudo_utils.h
85	sizeof(u64) -> sizeof(CachedBytes)

This revision is now accepted and ready to land. (Jul 11 2017, 2:41 PM)

Addressing Aleksey's comment.

cryptoad closed this revision. (Jul 12 2017, 8:29 AM)

Revision Contents

Path                             Size
lib/scudo/scudo_allocator.cpp    16 lines
lib/scudo/scudo_tls.h            2 lines
lib/scudo/scudo_utils.h          57 lines
lib/scudo/scudo_utils.cpp        36 lines

Diff 106203

lib/scudo/scudo_allocator.cpp

@@ [258 lines hidden] @@ AllocatorCache *getAllocatorCache(ScudoThreadContext *ThreadContext) {
   return &ThreadContext->Cache;
 }

 ScudoQuarantineCache *getQuarantineCache(ScudoThreadContext *ThreadContext) {
   return reinterpret_cast<
       ScudoQuarantineCache *>(ThreadContext->QuarantineCachePlaceHolder);
 }

-Xorshift128Plus *getPrng(ScudoThreadContext *ThreadContext) {
+ScudoPrng *getPrng(ScudoThreadContext *ThreadContext) {
   return &ThreadContext->Prng;
 }

 struct ScudoAllocator {
   static const uptr MaxAllowedMallocSize =
       FIRST_32_SECOND_64(2UL << 30, 1ULL << 40);

   typedef ReturnNullOrDieOnFailure FailureHandler;

   ScudoBackendAllocator BackendAllocator;
   ScudoQuarantine AllocatorQuarantine;

   // The fallback caches are used when the thread local caches have been
   // 'detroyed' on thread tear-down. They are protected by a Mutex as they can
   // be accessed by different threads.
   StaticSpinMutex FallbackMutex;
   AllocatorCache FallbackAllocatorCache;
   ScudoQuarantineCache FallbackQuarantineCache;
-  Xorshift128Plus FallbackPrng;
+  ScudoPrng FallbackPrng;

   bool DeallocationTypeMismatch;
   bool ZeroContents;
   bool DeleteSizeMismatch;

   explicit ScudoAllocator(LinkerInitialized)
     : AllocatorQuarantine(LINKER_INITIALIZED),
       FallbackQuarantineCache(LINKER_INITIALIZED) {}
@@ [33 lines hidden] @@ void init(const AllocatorOptions &Options) {
     DeleteSizeMismatch = Options.DeleteSizeMismatch;
     ZeroContents = Options.ZeroContents;
     SetAllocatorMayReturnNull(Options.MayReturnNull);
     BackendAllocator.Init(Options.ReleaseToOSIntervalMs);
     AllocatorQuarantine.Init(
         static_cast<uptr>(Options.QuarantineSizeMb) << 20,
         static_cast<uptr>(Options.ThreadLocalQuarantineSizeKb) << 10);
     BackendAllocator.InitCache(&FallbackAllocatorCache);
-    FallbackPrng.initFromURandom();
-    Cookie = FallbackPrng.getNext();
+    FallbackPrng.init();
+    Cookie = FallbackPrng.getU64();
   }

   // Helper function that checks for a valid Scudo chunk. nullptr isn't.
   bool isValidPointer(const void *UserPtr) {
     initThreadMaybe();
     if (UNLIKELY(!UserPtr))
       return false;
     uptr UserBeg = reinterpret_cast<uptr>(UserPtr);
@@ [22 lines hidden] @@ if (UNLIKELY(AlignedSize >= MaxAllowedMallocSize))
       return FailureHandler::OnBadRequest();

     // Primary and Secondary backed allocations have a different treatment. We
     // deal with alignment requirements of Primary serviced allocations here,
     // but the Secondary will take care of its own alignment needs.
     bool FromPrimary = PrimaryAllocator::CanAllocate(AlignedSize, MinAlignment);

     void *Ptr;
-    uptr Salt;
+    u8 Salt;
     uptr AllocationSize = FromPrimary ? AlignedSize : NeededSize;
     uptr AllocationAlignment = FromPrimary ? MinAlignment : Alignment;
     ScudoThreadContext *ThreadContext = getThreadContextAndLock();
     if (LIKELY(ThreadContext)) {
-      Salt = getPrng(ThreadContext)->getNext();
+      Salt = getPrng(ThreadContext)->getU8();
       Ptr = BackendAllocator.Allocate(getAllocatorCache(ThreadContext),
                                       AllocationSize, AllocationAlignment,
                                       FromPrimary);
       ThreadContext->unlock();
     } else {
       SpinMutexLock l(&FallbackMutex);
-      Salt = FallbackPrng.getNext();
+      Salt = FallbackPrng.getU8();
       Ptr = BackendAllocator.Allocate(&FallbackAllocatorCache, AllocationSize,
                                       AllocationAlignment, FromPrimary);
     }
     if (UNLIKELY(!Ptr))
       return FailureHandler::OnOOM();

     // If requested, we will zero out the entire contents of the returned chunk.
     if ((ForceZeroContents || ZeroContents) && FromPrimary)
@@ [178 lines hidden] @@ if (UNLIKELY(Header.State != ChunkAllocated)) {
       dieWithMessage("ERROR: invalid chunk state when sizing address %p\n",
                      Ptr);
     }
     return Chunk->getUsableSize(&Header);
   }

   void *calloc(uptr NMemB, uptr Size) {
     initThreadMaybe();
     if (CheckForCallocOverflow(NMemB, Size))
       return FailureHandler::OnBadRequest();
     return allocate(NMemB * Size, MinAlignment, FromMalloc, true);
   }

   void commitBack(ScudoThreadContext *ThreadContext) {
     AllocatorCache *Cache = getAllocatorCache(ThreadContext);
     AllocatorQuarantine.Drain(getQuarantineCache(ThreadContext),
                               QuarantineCallback(Cache));
@@ [15 lines hidden] @@
 }

 static void initScudoInternal(const AllocatorOptions &Options) {
   Instance.init(Options);
 }

 void ScudoThreadContext::init() {
   getBackendAllocator().InitCache(&Cache);
-  Prng.initFromURandom();
+  Prng.init();
   memset(QuarantineCachePlaceHolder, 0, sizeof(QuarantineCachePlaceHolder));
 }

 void ScudoThreadContext::commitBack() {
   Instance.commitBack(this);
 }

 void *scudoMalloc(uptr Size, AllocType Type) {
@@ [100 lines hidden] @@

lib/scudo/scudo_tls.h

@@ [24 lines hidden] @@
 namespace __scudo {

 // Platform specific base thread context definitions.
 #include "scudo_tls_context_android.inc"
 #include "scudo_tls_context_linux.inc"

 struct ALIGNED(64) ScudoThreadContext : public ScudoThreadContextPlatform {
   AllocatorCache Cache;
-  Xorshift128Plus Prng;
+  ScudoPrng Prng;
   uptr QuarantineCachePlaceHolder[4];
   void init();
   void commitBack();
 };

 void initThread();

 // Platform specific dastpath functions definitions.
 #include "scudo_tls_android.inc"
 #include "scudo_tls_linux.inc"

 } // namespace __scudo

 #endif // SCUDO_TLS_H_

lib/scudo/scudo_utils.h

@@ [30 lines hidden] @@
 void NORETURN dieWithMessage(const char *Format, ...);

 enum CPUFeature {
   CRC32CPUFeature = 0,
   MaxCPUFeature,
 };
 bool testCPUFeature(CPUFeature feature);

-// Tiny PRNG based on https://en.wikipedia.org/wiki/Xorshift#xorshift.2B
-// The state (128 bits) will be stored in thread local storage.
-struct Xorshift128Plus {
+INLINE u64 rotl(const u64 X, int K) {
+  return (X << K) | (X >> (64 - K));
+}
+
+// XoRoShiRo128+ PRNG (http://xoroshiro.di.unimi.it/).
+struct XoRoShiRo128Plus {
  public:
-  void initFromURandom();
-  u64 getNext() {
-    u64 x = State[0];
-    const u64 y = State[1];
-    State[0] = y;
-    x ^= x << 23;
-    State[1] = x ^ y ^ (x >> 17) ^ (y >> 26);
-    return State[1] + y;
+  void init() {
+    if (UNLIKELY(!GetRandom(reinterpret_cast<void *>(State), sizeof(State)))) {
+      // Early processes (eg: init) do not have /dev/urandom yet, but we still
+      // have to provide them with some degree of entropy. Not having a secure
+      // seed is not as problematic for them, as they are less likely to be
+      // the target of heap based vulnerabilities exploitation attempts.
+      State[0] = NanoTime();
+      State[1] = 0;
+    }
+    fillCache();
   }
+  u8 getU8() {
+    if (UNLIKELY(isCacheEmpty()))
+      fillCache();
+    const u8 Result = static_cast<u8>(CachedBytes & 0xff);
+    CachedBytes >>= 8;
+    CachedBytesAvailable--;
+    return Result;
+  }
+  u64 getU64() { return next(); }

  private:
+  u8 CachedBytesAvailable;
+  u64 CachedBytes;
   u64 State[2];
+  u64 next() {
+    const u64 S0 = State[0];
+    u64 S1 = State[1];
+    const u64 Result = S0 + S1;
+    S1 ^= S0;
+    State[0] = rotl(S0, 55) ^ S1 ^ (S1 << 14);
+    State[1] = rotl(S1, 36);
+    return Result;
+  }
+  bool isCacheEmpty() {
+    return CachedBytesAvailable == 0;
+  }
+  void fillCache() {
+    CachedBytes = next();
+    CachedBytesAvailable = sizeof(CachedBytes);
+  }
 };

+typedef XoRoShiRo128Plus ScudoPrng;
+
 } // namespace __scudo

 #endif // SCUDO_UTILS_H_

lib/scudo/scudo_utils.cpp

@@ [117 lines hidden] @@ bool testCPUFeature(CPUFeature Feature) {
   return false;
 }
 #else
 bool testCPUFeature(CPUFeature Feature) {
   return false;
 }
 #endif // defined(__x86_64__) || defined(__i386__)

-// readRetry will attempt to read Count bytes from the Fd specified, and if
-// interrupted will retry to read additional bytes to reach Count.
-static ssize_t readRetry(int Fd, u8 *Buffer, size_t Count) {
-  ssize_t AmountRead = 0;
-  while (static_cast<size_t>(AmountRead) < Count) {
-    ssize_t Result = read(Fd, Buffer + AmountRead, Count - AmountRead);
-    if (Result > 0)
-      AmountRead += Result;
-    else if (!Result)
-      break;
-    else if (errno != EINTR) {
-      AmountRead = -1;
-      break;
-    }
-  }
-  return AmountRead;
-}
-
-static void fillRandom(u8 *Data, ssize_t Size) {
-  int Fd = open("/dev/urandom", O_RDONLY);
-  if (Fd < 0) {
-    dieWithMessage("ERROR: failed to open /dev/urandom.\n");
-  }
-  bool Success = readRetry(Fd, Data, Size) == Size;
-  close(Fd);
-  if (!Success) {
-    dieWithMessage("ERROR: failed to read enough data from /dev/urandom.\n");
-  }
-}
-
-// Seeds the xorshift state with /dev/urandom.
-// TODO(kostyak): investigate using getrandom() if available.
-void Xorshift128Plus::initFromURandom() {
-  fillRandom(reinterpret_cast<u8 *>(State), sizeof(State));
-}
-
 } // namespace __scudo
