Replace the 32-bit allocator with a 64-bit one that uses a non-constant
base address, and reduce both the number of size classes and the maximum
size of per-thread caches.
As measured on [1], this reduces average weighted memory overhead
(MaxRSS) from 26% to 12% over the stock Android allocator. These numbers
include overhead from code instrumentation and the hwasan shadow (i.e.
this is not a pure allocator benchmark).
This switch also enables release-to-OS functionality, which is not
implemented in the 32-bit allocator. I have not seen any effect from
that on the benchmark.
[1] https://android.googlesource.com/platform/system/extras/+/master/memory_replay/
@kcc btw, these checks rely on the fact that this allocation is not the first in its size class. This assumption can be broken even without llvm/compiler-rt code changes, e.g. by a change in libc startup code.
Not sure what to do about this. Maybe we should also describe the chunk that we think is the source of the overflow (that would be an improvement regardless) and test for that.
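
To make the concern concrete, here is a toy model of one size-class region. It is only a sketch, not compiler-rt code: `ToyRegion` and `ChunkContaining` are hypothetical names, and it assumes that chunks of one size class sit back to back and that the checks expect an access just before the allocation to land in a neighboring chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical toy model: one region holding equally sized chunks, back to back.
struct ToyRegion {
  std::uintptr_t begin;    // address of chunk 0
  std::size_t chunk_size;  // size of every chunk in this size class
  std::size_t allocated;   // number of chunks handed out so far
};

// Returns the index of the allocated chunk containing `addr`, or nullopt
// if `addr` falls outside the allocated part of the region.
std::optional<std::size_t> ChunkContaining(const ToyRegion &r,
                                           std::uintptr_t addr) {
  assert(r.chunk_size > 0);
  if (addr < r.begin) return std::nullopt;  // before the first chunk
  std::size_t idx = (addr - r.begin) / r.chunk_size;
  if (idx >= r.allocated) return std::nullopt;  // past the last chunk
  return idx;
}
```

In this model, an access one byte before chunk 1 resolves to chunk 0, while the same access one byte before chunk 0 resolves to no chunk at all. If startup code happens to consume or skip the earlier chunks of that size class, the allocation under test can become chunk 0 and the expected neighbor disappears.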