This is an archive of the discontinued LLVM Phabricator instance.

Differential D32310

[scudo] Bypass Quarantine if its size is set to 0
Closed, Public

Authored by cryptoad on Apr 20 2017, 2:41 PM.

Details

Reviewers

kcc
dvyukov
alekseyshl

Commits

rGf1a54fdfd616: [scudo] Bypass Quarantine if its size is set to 0
rCRT301015: [scudo] Bypass Quarantine if its size is set to 0
rL301015: [scudo] Bypass Quarantine if its size is set to 0

Summary

In the current state of things, the deallocation path puts a chunk in the
Quarantine whether it's enabled or not (size of 0). When the Quarantine is
disabled, this results in the header being loaded (and checked) twice, and
stored (and checksummed) once, in deallocate and Recycle.

This change introduces a quarantineOrDeallocateChunk function that has a
fast path to deallocation if the Quarantine is disabled. Even though this is
not the preferred configuration security-wise, this change saves a sizeable
amount of processing for that particular situation (which could be adopted by
low-memory devices). Additionally, this simplifies deallocate and reallocate a
bit.
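The saving described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyAllocator`, `HeaderOps`, and the two `deallocate*` methods are hypothetical names that count header operations, and they are not Scudo's real types or API.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy model, not Scudo's real types. It only counts header operations.
struct HeaderOps {
  int Loads = 0;   // header loaded and its checksum verified
  int Stores = 0;  // header checksummed and written back
};

class ToyAllocator {
public:
  explicit ToyAllocator(std::size_t MaxQuarantineBytes)
      : MaxBytes(MaxQuarantineBytes) {}

  // Previous flow: every freed chunk goes through the quarantine.
  void deallocateViaQuarantine(std::size_t Size) {
    ++Ops.Loads;   // deallocate: load and verify the header
    ++Ops.Stores;  // deallocate: mark the chunk as quarantined
    put(Size);
  }

  // Flow with the fast path: skip the quarantine when its size is 0.
  void deallocateWithBypass(std::size_t Size) {
    ++Ops.Loads;  // deallocate: load and verify the header
    if (MaxBytes == 0) {
      ++Released;  // erase the header and release the chunk directly
      return;
    }
    ++Ops.Stores;
    put(Size);
  }

  const HeaderOps &ops() const { return Ops; }
  int released() const { return Released; }

private:
  // Adds a chunk, then recycles the oldest chunks while over the limit.
  void put(std::size_t Size) {
    Pending.push_back(Size);
    PendingBytes += Size;
    while (PendingBytes > MaxBytes) {
      assert(!Pending.empty());
      ++Ops.Loads;  // recycle: load and verify the header again
      PendingBytes -= Pending.front();
      Pending.pop_front();
      ++Released;
    }
  }

  std::size_t MaxBytes;
  std::size_t PendingBytes = 0;
  std::deque<std::size_t> Pending;
  HeaderOps Ops;
  int Released = 0;
};
```

With a limit of 0, one free through `deallocateViaQuarantine` costs two verified loads and one checksummed store in this model, while `deallocateWithBypass` costs a single load.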

Diff Detail

Build Status

Buildable 5724
Build 5724: arc lint + arc unit

Event Timeline

cryptoad created this revision. Apr 20 2017, 2:41 PM

Harbormaster completed remote builds in B5724: Diff 96023. Apr 20 2017, 2:41 PM

If we actually plan to use such a configuration, does it make sense to check the header for corruption when we reallocate a block (in allocate)? That would give us at least some window for UAF detection.

Also, we place the header before the user block, but buffer overruns are more common than underruns. Does it make sense to also check the end of the block for corruption (either add another checksum at the end, or move the header to the end of the block)?

This revision is now accepted and ready to land. Apr 21 2017, 1:46 AM

In D32310#733329, @dvyukov wrote:

If we actually plan to use such a configuration, does it make sense to check the header for corruption when we reallocate a block (in allocate)? That would give us at least some window for UAF detection.

I thought about this at some point, and I think it wasn't reasonably achievable for a few reasons.
IIRC the main issue was that when a chunk is returned to the backend, we basically lose control over its header.
If it's in the quarantine-batch size-class or transfer-batch size-class, it could be overwritten by whatever is at the start of those structures.
The first allocation of a chunk being all 0 is also problematic: it could be a special case, but then it wouldn't be distinguishable from an all-0 overwrite.
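The all-0 ambiguity can be sketched as follows. Everything here is an assumption made for illustration: the 48/16-bit header layout, `toyChecksum`, `packHeader`, and `checkOnAllocate` are hypothetical and do not reflect Scudo's actual header format or checksum.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit packed header: 48 payload bits, 16 checksum bits.
using PackedHeader = std::uint64_t;

// Stand-in checksum for illustration only; it is not what Scudo computes.
inline std::uint16_t toyChecksum(std::uint64_t Payload) {
  std::uint16_t Sum = 0xA5A5;
  for (int I = 0; I < 4; ++I)
    Sum ^= static_cast<std::uint16_t>(Payload >> (16 * I));
  return Sum;
}

// Packs a payload together with its checksum in the low 16 bits.
inline PackedHeader packHeader(std::uint64_t Payload) {
  assert(Payload < (1ULL << 48));
  return (Payload << 16) | toyChecksum(Payload);
}

enum class Verdict { NeverUsed, Valid, Corrupted };

// A check at allocation time that special-cases fresh, zeroed memory.
inline Verdict checkOnAllocate(PackedHeader H) {
  if (H == 0)
    return Verdict::NeverUsed;  // also what a zeroing overwrite looks like
  const std::uint64_t Payload = H >> 16;
  const std::uint16_t Stored = static_cast<std::uint16_t>(H);
  return toyChecksum(Payload) == Stored ? Verdict::Valid : Verdict::Corrupted;
}
```

In this model a flipped bit is reported as `Corrupted`, but a header overwritten with zeros returns `NeverUsed`, the same verdict as memory that was never handed out.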

Also, we place the header before the user block, but buffer overruns are more common than underruns. Does it make sense to also check the end of the block for corruption (either add another checksum at the end, or move the header to the end of the block)?

With the current implementation, we rely on the fact that what comes after a chunk will also have a header (if in use), which will be checked in due time.
This is not 100% true due to batches, and is a weakness that is to be addressed.
The initial reasoning was that it was computationally cheaper to get the header before the chunk rather than behind (simple subtraction vs getting the size of the chunk each time).

Having an additional checksum or marker at the end of the chunk is a possibility.
I think it would be better as an option, since it would come with an additional performance hit, which at this point in time I am trying to avoid.
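The cost difference between the two placements, and what an optional end-of-chunk marker could look like, can be sketched as below. This is a hypothetical illustration: `kHeaderSize`, `kTrailer`, and the helper functions are assumed names and values, not part of Scudo.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kHeaderSize = 16;  // assumed size, for illustration
constexpr std::uint64_t kTrailer = 0x5CD05CD05CD05CD0ULL;  // arbitrary marker

// Header in front of the user block: a constant subtraction finds it.
inline unsigned char *frontHeader(unsigned char *UserBeg) {
  return UserBeg - kHeaderSize;
}

// Marker behind the user block: the size must be known to find it.
inline unsigned char *trailer(unsigned char *UserBeg, std::size_t Size) {
  return UserBeg + Size;
}

inline void writeTrailer(unsigned char *UserBeg, std::size_t Size) {
  assert(UserBeg != nullptr);
  std::memcpy(trailer(UserBeg, Size), &kTrailer, sizeof(kTrailer));
}

// The check is optional so that its cost is only paid when requested.
inline bool trailerIntact(const unsigned char *UserBeg, std::size_t Size,
                          bool CheckTrailer) {
  if (!CheckTrailer)
    return true;
  std::uint64_t Value;
  std::memcpy(&Value, UserBeg + Size, sizeof(Value));
  return Value == kTrailer;
}
```

A one-byte overrun past the user block changes the marker, so `trailerIntact` returns false when the option is on and stays silent when it is off.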

which will be checked in due time.

It's not checked until the object is freed, which may not happen at all. Attackers are good at laying out objects in the required order, so if they place something long-lived afterwards, it won't be freed.
But it's up to you.

In D32310#733792, @dvyukov wrote:

It's not checked until the object is freed, which may not happen at all. Attackers are good at laying out objects in the required order, so if they place something long-lived afterwards, it won't be freed.

I am in agreement with you.
I can also see that happening for both chunk A and B, whether the checksum is at the end of A or at the beginning of B, e.g. overflowing A into B without A or B getting freed for a long time.
I think this is a tough problem to solve with an allocator only, and hopefully the randomness of the chunk layout will help make it harder to have B after A (though not impossible).
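The detection delay discussed in this exchange can be modelled with two adjacent chunks. This is a simplified sketch under assumptions: `ToyHeap`, its fixed layout, and the constant `kValidTag` standing in for a real checksum are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy layout, not Scudo's: [header A][data A][header B][data B].
constexpr std::size_t kHdr = 8;
constexpr std::size_t kData = 24;
// Stand-in for a checksummed header: it is "valid" if it equals this tag.
constexpr std::uint64_t kValidTag = 0x0123456789ABCDEFULL;

struct ToyHeap {
  std::vector<unsigned char> Mem =
      std::vector<unsigned char>(2 * (kHdr + kData));

  ToyHeap() {
    stamp(0);
    stamp(1);
  }

  std::size_t headerOffset(int Chunk) const {
    assert(Chunk == 0 || Chunk == 1);
    return static_cast<std::size_t>(Chunk) * (kHdr + kData);
  }

  unsigned char *user(int Chunk) {
    return Mem.data() + headerOffset(Chunk) + kHdr;
  }

  void stamp(int Chunk) {
    std::memcpy(Mem.data() + headerOffset(Chunk), &kValidTag, kHdr);
  }

  bool headerValid(int Chunk) const {
    std::uint64_t Value;
    std::memcpy(&Value, Mem.data() + headerOffset(Chunk), kHdr);
    return Value == kValidTag;
  }

  // The only point where this model inspects a header.
  bool deallocate(int Chunk) const { return headerValid(Chunk); }
};
```

Writing `kData + kHdr` bytes from the start of A's data overwrites B's header, yet freeing A still succeeds; the corruption surfaces only if and when B is freed.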

cryptoad closed this revision. Apr 21 2017, 11:23 AM

Revision Contents

Path: lib/scudo/scudo_allocator.cpp
Size: 62 lines

Diff 96023

lib/scudo/scudo_allocator.cpp

[... 454 lines not shown; in: void *allocate(uptr Size, uptr Alignment, AllocType Type, ...]
     }
     Header.Salt = static_cast<u8>(Prng.getNext());
     getScudoChunk(UserBeg)->storeHeader(&Header);
     void *UserPtr = reinterpret_cast<void *>(UserBeg);
     // if (&__sanitizer_malloc_hook) __sanitizer_malloc_hook(UserPtr, Size);
     return UserPtr;
   }
 
+  // Place a chunk in the quarantine. In the event of a zero-sized quarantine,
+  // we directly deallocate the chunk, otherwise the flow would lead to the
+  // chunk being checksummed twice, once before Put and once in Recycle, with
+  // no additional security value.
+  void quarantineOrDeallocateChunk(ScudoChunk *Chunk, UnpackedHeader *Header,
+                                   uptr Size) {
+    bool BypassQuarantine = (AllocatorQuarantine.GetCacheSize() == 0);
+    if (BypassQuarantine) {
+      Chunk->eraseHeader();
+      void *Ptr = Chunk->getAllocBeg(Header);
+      if (LIKELY(!ThreadTornDown)) {
+        getBackendAllocator().Deallocate(&Cache, Ptr);
+      } else {
+        SpinMutexLock Lock(&FallbackMutex);
+        getBackendAllocator().Deallocate(&FallbackAllocatorCache, Ptr);
+      }
+    } else {
+      UnpackedHeader NewHeader = *Header;
+      NewHeader.State = ChunkQuarantine;
+      Chunk->compareExchangeHeader(&NewHeader, Header);
+      if (LIKELY(!ThreadTornDown)) {
+        AllocatorQuarantine.Put(&ThreadQuarantineCache,
+                                QuarantineCallback(&Cache), Chunk, Size);
+      } else {
+        SpinMutexLock l(&FallbackMutex);
+        AllocatorQuarantine.Put(&FallbackQuarantineCache,
+                                QuarantineCallback(&FallbackAllocatorCache),
+                                Chunk, Size);
+      }
+    }
+  }
+
   // Deallocates a Chunk, which means adding it to the delayed free list (or
   // Quarantine).
   void deallocate(void *UserPtr, uptr DeleteSize, AllocType Type) {
     if (UNLIKELY(!ThreadInited))
       initThread();
     // if (&__sanitizer_free_hook) __sanitizer_free_hook(UserPtr);
     if (!UserPtr)
       return;
[... 23 lines not shown ...]
     uptr Size = OldHeader.FromPrimary ? OldHeader.SizeOrUnusedBytes :
         Chunk->getUsableSize(&OldHeader) - OldHeader.SizeOrUnusedBytes;
     if (DeleteSizeMismatch) {
       if (DeleteSize && DeleteSize != Size) {
         dieWithMessage("ERROR: invalid sized delete on chunk at address %p\n",
                        UserPtr);
       }
     }
 
-    UnpackedHeader NewHeader = OldHeader;
-    NewHeader.State = ChunkQuarantine;
-    Chunk->compareExchangeHeader(&NewHeader, &OldHeader);
-
     // If a small memory amount was allocated with a larger alignment, we want
     // to take that into account. Otherwise the Quarantine would be filled with
-    // tiny chunks, taking a lot of VA memory. This an approximation of the
+    // tiny chunks, taking a lot of VA memory. This is an approximation of the
     // usable size, that allows us to not call GetActuallyAllocatedSize.
     uptr LiableSize = Size + (OldHeader.Offset << MinAlignment);
-    if (LIKELY(!ThreadTornDown)) {
-      AllocatorQuarantine.Put(&ThreadQuarantineCache,
-                              QuarantineCallback(&Cache), Chunk, LiableSize);
-    } else {
-      SpinMutexLock l(&FallbackMutex);
-      AllocatorQuarantine.Put(&FallbackQuarantineCache,
-                              QuarantineCallback(&FallbackAllocatorCache),
-                              Chunk, LiableSize);
-    }
+    quarantineOrDeallocateChunk(Chunk, &OldHeader, LiableSize);
   }
 
   // Reallocates a chunk. We can save on a new allocation if the new requested
   // size still fits in the chunk.
   void *reallocate(void *OldPtr, uptr NewSize) {
     if (UNLIKELY(!ThreadInited))
       initThread();
     uptr UserBeg = reinterpret_cast<uptr>(OldPtr);
     if (UNLIKELY(!IsAligned(UserBeg, MinAlignment))) {
       dieWithMessage("ERROR: attempted to reallocate a chunk not properly "
                      "aligned at address %p\n", OldPtr);
     }
     ScudoChunk *Chunk = getScudoChunk(UserBeg);
     UnpackedHeader OldHeader;
     Chunk->loadHeader(&OldHeader);
     if (UNLIKELY(OldHeader.State != ChunkAllocated)) {
       dieWithMessage("ERROR: invalid chunk state when reallocating address "
                      "%p\n", OldPtr);
     }
     if (UNLIKELY(OldHeader.AllocType != FromMalloc)) {
       dieWithMessage("ERROR: invalid chunk type when reallocating address %p\n",
                      OldPtr);
     }
     uptr UsableSize = Chunk->getUsableSize(&OldHeader);
-    UnpackedHeader NewHeader = OldHeader;
     // The new size still fits in the current chunk, and the size difference
     // is reasonable.
     if (NewSize <= UsableSize &&
         (UsableSize - NewSize) < (SizeClassMap::kMaxSize / 2)) {
+      UnpackedHeader NewHeader = OldHeader;
       NewHeader.SizeOrUnusedBytes =
           OldHeader.FromPrimary ? NewSize : UsableSize - NewSize;
       Chunk->compareExchangeHeader(&NewHeader, &OldHeader);
       return OldPtr;
     }
     // Otherwise, we have to allocate a new chunk and copy the contents of the
     // old one.
     void *NewPtr = allocate(NewSize, MinAlignment, FromMalloc);
     if (NewPtr) {
       uptr OldSize = OldHeader.FromPrimary ? OldHeader.SizeOrUnusedBytes :
           UsableSize - OldHeader.SizeOrUnusedBytes;
       memcpy(NewPtr, OldPtr, Min(NewSize, OldSize));
-      NewHeader.State = ChunkQuarantine;
-      Chunk->compareExchangeHeader(&NewHeader, &OldHeader);
-      if (LIKELY(!ThreadTornDown)) {
-        AllocatorQuarantine.Put(&ThreadQuarantineCache,
-                                QuarantineCallback(&Cache), Chunk, UsableSize);
-      } else {
-        SpinMutexLock l(&FallbackMutex);
-        AllocatorQuarantine.Put(&FallbackQuarantineCache,
-                                QuarantineCallback(&FallbackAllocatorCache),
-                                Chunk, UsableSize);
-      }
+      quarantineOrDeallocateChunk(Chunk, &OldHeader, UsableSize);
     }
     return NewPtr;
   }
 
   // Helper function that returns the actual usable size of a chunk.
   uptr getUsableSize(const void *Ptr) {
     if (UNLIKELY(!ThreadInited))
       initThread();
[... 145 lines not shown ...]