This is an archive of the discontinued LLVM Phabricator instance.

[scudo][standalone] Introduce the Primary(s) and LocalCache
ClosedPublic

Authored by cryptoad on May 9 2019, 11:15 AM.

Details

Summary

This CL introduces the 32 & 64-bit primary allocators, and associated
Local Cache. While the general idea is mostly similar to what exists
in sanitizer_common, it departs from the original code somewhat
significantly:

  • the 64-bit primary no longer uses a free array at the end of a region but uses batches of free blocks in region 0, allowing for a convergence with the 32-bit primary behavior;
  • as a result, there is only one (templated) local cache type for both primary allocators, and memory reclaiming can be implemented similarly for the 32-bit & 64-bit platforms;
  • 64-bit primary regions are handled a bit differently: we do not reserve 4TB of memory that we split, but reserve NumClasses * 2^RegionSizeLog, each region being offseted by a random number of pages from its computed base. A side effect of this is that the 64-bit primary works on 32-bit platform (I don't think we want to encourage it but it's an interesting side effect);

Event Timeline

cryptoad created this revision.May 9 2019, 11:15 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMay 9 2019, 11:15 AM
Herald added subscribers: Restricted Project, jfb, delcypher and 2 others. · View Herald Transcript

This could use a review please!

morehouse added inline comments.May 14 2019, 12:05 PM
lib/scudo/standalone/local_cache.h
18

Local cache and SizeClassAllocator are very tightly coupled. It's pretty difficult to understand what's going on.

For example, a call to SizeClassAllocatorLocalCache::allocate takes a SizeClassAllocator as a param and may call popBatch on that allocator. popBatch takes the current SizeClassAllocatorLocalCache as a param and passes it down through populateFreeList and populateBatch which may then call SizeClassAllocatorLocalCache::createBatch. createBatch may call allocate, which may then call into SizeClassAllocator::popBatch. Which makes a very confusing and ugly cycle in the call graph (whether it can happen in practice or not).

Surely there's a less convoluted way to implement this...

cryptoad added inline comments.May 14 2019, 2:05 PM
lib/scudo/standalone/local_cache.h
18

Right, this is fairly intricate and entangled. I'm going to see if I can sort this out.

cryptoad updated this revision to Diff 199834.May 16 2019, 8:10 AM

Simplify a lot of the functions prototypes by storing a pointer to the
size-class allocator in the cache, or getting the stats from the cache,
etc.

As for avoid the apparent cycle (note that in reality it only recurses
twice at most), currently I do not see a way of getting rid of it while
conserving the properties we want.

morehouse added inline comments.May 16 2019, 5:02 PM
lib/scudo/standalone/local_cache.h
79

Maybe just use C->ClassSize directly below.

96

Again, this line seems unnecessary.

150

We always return true, so why not void?

152

(Future improvement)
Why do we need to pass this to the allocator here? We just copy the returned batch to C->Chunks and destroy it anyway. Why not directly pass the PerClass object and have popBatch populate it? This could be faster and we can break the call graph cycle here.

153

What if B == nullptr?

163

What if B == nullptr?

lib/scudo/standalone/primary32.h
77

Why can't we release small size classes?

lib/scudo/standalone/primary64.h
121

static_cast

lib/scudo/standalone/tests/primary_test.cc
47

unique_ptr or allocate on stack

122

unique_ptr or allocate on stack

173

unique_ptr or allocate on stack

cryptoad marked 4 inline comments as done.May 17 2019, 10:29 AM
cryptoad added inline comments.
lib/scudo/standalone/local_cache.h
79

So this is a memory access trick. PerClassArray is accessed via Count first, and ClassSize is close to Count, while Chunks can be further off if Count is large.
This way we get ClassSize loaded in a register (or enticing the compiler to at least), before accessing Chunks.
I'll add a comment to explain this.

96

Same here.

152

Ack, good idea, thanks!

153

Batch creation failure is fatal, I'll add a CHECK.

163

Same.

lib/scudo/standalone/primary32.h
77

It's a tradeoff. Since we randomize allocations, smaller class sizes are likelier to be fragmented between allocated/available chunks, making them harder to be entirely empty.
On top of it, smaller size classes require more computing power (we iterate through more chunks).
The net benefit of trying to release smaller classes is not obvious in terms of lowering the RSS, hence the lower (arbitrary limit).
I want at some point to make that configurable via options.
I'll add some comments.

lib/scudo/standalone/primary64.h
121

Ack

lib/scudo/standalone/tests/primary_test.cc
122

Ack

173

Ack

cryptoad marked an inline comment as done.May 17 2019, 10:34 AM
cryptoad added inline comments.
lib/scudo/standalone/local_cache.h
150

With B==nullptr this will return false. Updating.

cryptoad added inline comments.May 17 2019, 10:36 AM
lib/scudo/standalone/local_cache.h
153

Nevermind it has to return false!

cryptoad updated this revision to Diff 200074.May 17 2019, 11:56 AM
cryptoad marked 18 inline comments as done.

Correcting some code constructs as suggested by Matt:

  • using unique_ptr in tests;
  • accounting for a potential null TransferBatch in refill;
  • other corrections.

On top of that, adding some comments to justify:

  • using extra variables to improve memory access patterns
  • the lower limit at which size classes can be released (and a TODO)
cryptoad updated this revision to Diff 200075.May 17 2019, 12:07 PM

Adding a forgotten CHECK as well.

This revision is now accepted and ready to land.May 17 2019, 12:18 PM
cryptoad updated this revision to Diff 200080.May 17 2019, 12:33 PM

Correcting a typo (meaningul -> meaningful)

This revision was automatically updated to reflect the committed changes.