This is an archive of the discontinued LLVM Phabricator instance.

[Sanitizers] Allocator: new "release memory to OS" implementation
ClosedPublic

Authored by alekseyshl on Sep 25 2017, 11:30 AM.

Details

Summary

The current implementation of returning freed memory back to the OS
(controlled by the allocator_release_to_os_interval_ms flag) requires
sorting the free chunk list, which has two major issues. First, when the
free list grows to millions of chunks, sorting, even the fastest one, is
simply too slow. Second, sorting chunks in place is unacceptable for the
Scudo allocator, as it makes allocations more predictable and therefore
less secure.

The proposed approach is linear in complexity (although it requires quite
a bit more temporary memory). The idea is to count the number of free
chunks falling on each memory page and to release only those pages that
are covered entirely by free chunks. This takes one iteration over the
free list of chunks and one iteration over the array of page counters.
The obvious disadvantage is the allocation of the counter array, but even
in the worst case we support (4 TB of allocator space, 64 buckets, 16-byte
bucket size, and a completely full free list, which leads to 2 bytes per
page counter and ~17M page counters), the intermediate buffer takes only
about 34 MB (compared to ~64 GB of actually allocated chunks); in practice
it usually stays under 100 KB and is released after each use. Releasing
memory back to the OS is expected to be a relatively rare event, so
keeping the buffer between runs and adding bookkeeping complexity seems
unnecessary here (it can always be improved later, though; never say never).
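To make the idea concrete, here is a minimal sketch of the two passes. This
is not the actual sanitizer code: the names (free_chunks, release_page,
chunks_per_page) are hypothetical, only the constant chunks-per-page case
is handled, and the real implementation packs the counters and coalesces
adjacent releasable pages into single release calls.

  #include <cstdint>
  #include <vector>

  // Hypothetical sketch: pass 1 walks the free list and counts, per page,
  // how many free chunks touch that page; pass 2 walks the counters and
  // releases every page that is covered by free chunks only.
  void ReleaseFreeMemoryToOS(const std::vector<uint64_t> &free_chunks,  // chunk offsets within the region
                             uint64_t region_size, uint64_t chunk_size,
                             uint64_t page_size,
                             uint64_t chunks_per_page,  // N: chunks touching each page (constant case)
                             void (*release_page)(uint64_t page_index)) {
    const uint64_t page_count = region_size / page_size;
    // Temporary per-page counters, alive only for the duration of this call;
    // 2 bytes per page is enough even for 16-byte chunks on 4 KB pages.
    std::vector<uint16_t> counters(page_count, 0);

    // Pass 1: one iteration over the free list.
    for (uint64_t chunk : free_chunks) {
      const uint64_t first_page = chunk / page_size;
      const uint64_t last_page = (chunk + chunk_size - 1) / page_size;
      for (uint64_t page = first_page; page <= last_page; page++)
        counters[page]++;
    }

    // Pass 2: one iteration over the page counters. If all N chunks touching
    // a page are free, the page holds no live data and can be released.
    for (uint64_t page = 0; page < page_count; page++)
      if (counters[page] == chunks_per_page)
        release_page(page);
  }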

The most interesting problem here is how to calculate the number of chunks
falling into each memory page in the bucket. Skipping all the details,
there are three cases when the number of chunks per page is constant:

  1. P >= C, P % C == 0 --> N = P / C
  2. C > P , C % P == 0 --> N = 1
  3. C <= P, P % C != 0 && C % (P % C) == 0 --> N = P / C + 1

where P is the page size, C is the chunk size, and N is the number of
chunks per page. In the remaining cases, the number of chunks per page is
not constant and is calculated on the go, during the iteration over the
page counter array.
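For illustration only (a hypothetical helper, not the committed code), the
classification can be expressed as a single check. For example, with a
4096-byte page, a 64-byte chunk hits case 1 (N = 64), a 16384-byte chunk
hits case 2 (N = 1), and a 48-byte chunk hits case 3 (4096 % 48 == 16 and
48 % 16 == 0, so N = 85 + 1 = 86):

  #include <cstdint>

  // Hypothetical helper: returns true and sets *chunks_per_page to N when
  // every page of the bucket intersects the same number of chunks; returns
  // false for the remaining cases, where N has to be computed per page.
  bool ConstantChunksPerPage(uint64_t page_size, uint64_t chunk_size,
                             uint64_t *chunks_per_page) {
    if (chunk_size <= page_size && page_size % chunk_size == 0) {
      *chunks_per_page = page_size / chunk_size;      // case 1: no page crossings
      return true;
    }
    if (chunk_size > page_size && chunk_size % page_size == 0) {
      *chunks_per_page = 1;                           // case 2: each page lies inside one chunk
      return true;
    }
    if (chunk_size < page_size && page_size % chunk_size != 0 &&
        chunk_size % (page_size % chunk_size) == 0) {
      *chunks_per_page = page_size / chunk_size + 1;  // case 3: one chunk straddles each boundary
      return true;
    }
    return false;  // N varies from page to page ("brute force" path)
  }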

Among the remaining cases, there are still ones where N can be deduced
from the page index, but they require not much less calculation per page
than the "brute force" way, and 2/3 of the buckets fall into the first
three categories anyway, so, for the sake of simplicity, it was decided to
stick to those two variations (constant and computed on the go). This can
always be refined and improved later, should the brute force way turn out
to slow us down unacceptably.

Diff Detail

Repository
rL LLVM

Event Timeline

alekseyshl created this revision. Sep 25 2017, 11:30 AM
cryptoad edited edge metadata. Sep 25 2017, 2:23 PM

I do not have comments on the code, it looks good to me.

There are a few points I was thinking about, not in the sense of suggestions for this CL, but rather general wondering:

  • How about bumping last_release_at_ns when a region is grown? This would prevent the reclaiming from occurring shortly after, and could lower the chances of releasing pages that were just allocated.
  • Could there be enough room in the free array to not have to map memory? E.g., the memory between num_freed_chunks*sizeof(CompactPtrT) and mapped_free_array could be enough. I am not sure it would necessarily be a gain except maybe in memory usage. But since the memory is unmapped right after, it might be worthless.
  • Did the tests show in any way what could be a good timing/CPU-wise balance to set a default to?
eugenis edited edge metadata. Sep 25 2017, 3:46 PM

Why are you mapping the counters in a specific place just after the free array, instead of simply anywhere? I.e. why MAP_FIXED at all?

lib/sanitizer_common/sanitizer_allocator_primary64.h
317 ↗(On Diff #116579)

Either use a named constant instead of UINT64_MAX, or use 8 instead of sizeof(*buffer).

772 ↗(On Diff #116579)

These are const pointers to mutable objects. What's the point?

dvyukov edited edge metadata. Sep 25 2017, 11:48 PM

looks good to me

alekseyshl marked an inline comment as done.
  • Switched the temporary counter buffer from a fixed to a random mapping and addressed minor style comments.

Why are you mapping the counters in a specific place just after the free array, instead of simply anywhere? I.e. why MAP_FIXED at all?

The original idea was to keep it mapped between iterations, but yes, it does not make much sense anymore and we can always get back to it later. Switched to just mapping.

lib/sanitizer_common/sanitizer_allocator_primary64.h
772 ↗(On Diff #116579)

The point is to indicate that these pointers do not change, but you just reminded me that I was actually going to make CompactPtrToPointer et al. const and switch to const refs. Done.
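For context, the distinction being discussed, as a generic illustration unrelated to the actual allocator types:

  struct Chunk { int data; };

  void Example(Chunk &c) {
    Chunk *const p = &c;   // const pointer to a mutable object: p cannot be
    p->data = 1;           // reseated, but the pointee can still be modified.

    const Chunk &r = c;    // const reference: the referent cannot be modified
    (void)r;               // through it (r.data = 2 would not compile).
  }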

I do not have comments on the code, it looks good to me.

There are a few points I was thinking about, not in the sense of suggestions for this CL, but rather general wondering:

  • How about bumping last_release_at_ns when a region is grown? This would prevent the reclaiming from occurring shortly after, and could lower the chances of releasing pages that were just allocated.
  • Could there be enough room in the free array to not have to map memory? E.g., the memory between num_freed_chunks*sizeof(CompactPtrT) and mapped_free_array could be enough. I am not sure it would necessarily be a gain except maybe in memory usage. But since the memory is unmapped right after, it might be worthless.
  • Did the tests show in any way what could be a good timing/CPU-wise balance to set a default to?
  1. Bumping last_release_at_ns makes sense, let me think about it a bit more.
  2. I'd rather not complicate the code just yet, but I like the idea of using whatever is already mapped. Will add a comment about it.
  3. This version is pretty much linear in the number of pages allocated, everything else does not matter that much, so the more memory allocated, the slower it is. Compared to the previous version, even the larger apps I tried spent an insignificant amount of time releasing pages. I'd like to collect more diverse data before drawing any conclusions (fuzzer would make a great test app).
  • Add a couple TODOs

Regarding 1), decided to have a TODO for now. I am concerned about the case when free_list grows relatively slowly, but enough to block the release to OS (on the other hand, if it grows constantly, it means we should not release anything yet anyway). I'd rather do it as a separate experiment and a separate patch.

Harbormaster completed remote builds in B10604: Diff 116690.
cryptoad accepted this revision. Sep 26 2017, 3:37 PM

LGTM on my side.

This revision is now accepted and ready to land. Sep 26 2017, 3:37 PM
eugenis accepted this revision. Sep 26 2017, 4:23 PM
This revision was automatically updated to reflect the committed changes.