This is an archive of the discontinued LLVM Phabricator instance.

[sanitizer] Change the 32-bit Primary AllocateRegion to reduce fragmentation
ClosedPublic

Authored by cryptoad on May 23 2017, 11:29 AM.

Details

Summary

Currently, AllocateRegion has a tendency to fragment memory: it allocates
2*kRegionSize, and if the memory is aligned, will unmap kRegionSize bytes,
thus creating a hole, which can't itself be reused for another region. This
is exacerbated by the fact that if 2 regions get allocated one after another
without any mmap in between, the second will be aligned due to mappings
generally being contiguous.

An idea, suggested by @alekseyshl, to prevent such a behavior is to have a
stash of regions: if the 2*kRegionSize allocation is properly aligned, split
it in two, and stash the second part to be returned next time a region is
requested.

At this point, I thought about a couple of ways to implement this:

  • either an IntrusiveList of regions candidates, storing next at the begining of the region;
  • a small array of regions candidates existing in the Primary.

While the second option is more constrained in terms of size, it offers several
advantages:

  • security wise, a pointer in a region candidate could be overflowed into, and abused when popping an element;
  • we do not dirty the first page of the region by storing something in it;
  • unless several threads request regions simultaneously from different size classes, the stash rarely goes above 1 entry.

I am not certain about the Windows impact of this change, as sanitizer_win.cc
has its own version of MmapAlignedOrDie, maybe someone could chime in on this.

MmapAlignedOrDie is effectively unused after this change and could be removed
at a later point. I didn't notice any sizeable performance gain, even though we
are saving a few mmap/munmap syscalls.

Event Timeline

cryptoad created this revision.May 23 2017, 11:29 AM
cryptoad edited the summary of this revision. (Show Details)May 23 2017, 11:50 AM
alekseyshl edited edge metadata.EditedMay 23 2017, 5:12 PM

How about having a stash for one region in SizeClassInfo? You won't need more than one since sci is locked and it is available when AllocateRegion is called, just pass it there.

alekseyshl added inline comments.May 23 2017, 5:16 PM
lib/sanitizer_common/sanitizer_allocator_primary32.h
347

I'd move the allocation into separate private function anyway.

cryptoad added a comment.EditedMay 23 2017, 5:58 PM

How about having a stash for one region in SizeClassInfo? You won't need more than one since sci is locked and it is available when AllocateRegion is called, just pass it there.

I thought about it since you mentioned it in your initial suggestion. It's not improbable to have several "hanging" regions within different SCI that wouldn't be used for a while. With 52 classes (and as many SCI), the worst case scenario would be 51MB of VA lost in stashes. In a regular use, I assume it would be several MB.
I feel like having a common stash, while adding a mutex, allows to avoid this, allowing to repurpose the stashed ones pretty quickly.

lib/sanitizer_common/sanitizer_allocator_primary32.h
347

Will do.

dvyukov edited edge metadata.May 24 2017, 1:37 AM

FWIW I've solved a similar problem in tcmalloc by trimming blocks either on left or on right. Namely, if heap grows up we trim on right; if heap grows down we trim on left. But I don't see how it is better than this solution.

I am not certain about the Windows impact of this change

I have not looked at the code, but windows has a thing called "allocation granularity" which is larger than page size (historically equal to 64K). OS allocator manages virtual address space only at that granularity. I.e. it is not possible to allocate/free a single page. That can be the reason for the different implementation.

looks good to me

How about having a stash for one region in SizeClassInfo? You won't need more than one since sci is locked and it is available when AllocateRegion is called, just pass it there.

I thought about it since you mentioned it in your initial suggestion. It's not improbable to have several "hanging" regions within different SCI that wouldn't be used for a while. With 52 classes (and as many SCI), the worst case scenario would be 51MB of VA lost in stashes. In a regular use, I assume it would be several MB.
I feel like having a common stash, while adding a mutex, allows to avoid this, allowing to repurpose the stashed ones pretty quickly.

You can store the region in size_class_info_array and traverse it instead of maintaining and traversing regions_stash, but it will complicate the synchronization. Ok, let's move the allocation into separate function and I'm fine with the rest.

cryptoad updated this revision to Diff 100113.May 24 2017, 9:24 AM

As per review feedback, move the new code into its own function.
Additionally, change the top file comment to reflect the new way of doing
things.

alekseyshl accepted this revision.May 24 2017, 11:36 AM

LGTM with some code structure suggestions which you're free to ignore, if you like your way better.

lib/sanitizer_common/sanitizer_allocator_primary32.h
287

I'd just inline map_res + map_size in the only place you use map_end.

288

Can we rename it to 'region'?

289

The name 'extra_region' is a bit narrower than what this flag indicates. How about changing the structure a bit:

bool trim_region = true;
if (IsAligned(res, kRegionSize)) {
   SpinMutexLock l(&regions_stash_mutex);
   if (num_stashed_regions < kMaxStashedRegions) {
     ...
     trim_region = false;
   }
}
if (trim_region) {
  trim both left and right
}

UnmapOrDie handles 0 sizes just fine.

306

I know, the result is the same, but doesn't it make more sense to just assign here? It aligns better with how 'end' is calculated.

map_size = kRegionSize;
358

Please add your "unless several threads request regions simultaneously from different size classes, the stash rarely goes above 1 entry" comment to this constant.

This revision is now accepted and ready to land.May 24 2017, 11:36 AM
cryptoad marked 7 inline comments as done.May 24 2017, 1:36 PM
cryptoad updated this revision to Diff 100156.May 24 2017, 1:36 PM

Including @alekseyshl's suggestions.

cryptoad updated this revision to Diff 100157.May 24 2017, 1:38 PM

Correcting grammar in the newly added comment.

alekseyshl accepted this revision.May 25 2017, 9:14 AM
cryptoad closed this revision.May 25 2017, 9:20 AM