Can we instead do what bionic-benchmarks does in its malloc test and touch each page to ensure residency (although we're only allocating 8 bytes at a time)?
Similarly, they also bulk allocate and then bulk deallocate using a storage vector. That seems like a better solution, so we don't just hit the freelist 128 * 1024 times.
Can we make 128 * 1024 and 8 constants?
I have absolutely no idea why I read this loop as memcpy, but please ignore the first point.
Bulk allocate/delete would still be great.
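A minimal sketch of the bulk allocate, then bulk deallocate pattern, assuming plain `malloc`/`free` stand in for the allocator under test. The names `kNumBlocks`, `kBlockSize`, and `bulkAllocateThenFree` are hypothetical, and the values are illustrative rather than taken from the patch:

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical named constants replacing magic numbers; values are illustrative.
constexpr std::size_t kNumBlocks = 1024;
constexpr std::size_t kBlockSize = 8;

// Allocates NumBlocks blocks of BlockSize bytes, keeps them all live in a
// storage vector, then frees them in a second pass. Returns the number of
// blocks that were successfully allocated.
std::size_t bulkAllocateThenFree(std::size_t NumBlocks, std::size_t BlockSize) {
  std::vector<void *> Ptrs;
  Ptrs.reserve(NumBlocks);

  // Bulk allocate: nothing is freed until every block has been requested,
  // so the allocator cannot simply recycle the same freelist entry.
  for (std::size_t I = 0; I < NumBlocks; ++I) {
    void *P = std::malloc(BlockSize);
    if (!P)
      break;
    Ptrs.push_back(P);
  }

  const std::size_t Allocated = Ptrs.size();

  // Bulk deallocate.
  for (void *P : Ptrs)
    std::free(P);

  return Allocated;
}
```

In a real benchmark the two passes would sit inside the timed loop, with the vector reserved outside it so its growth is not measured.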
Are we looking at different benchmarks? This is what I've been using:
I guess a storage vector would be more realistic, although it may end up conflating the speed of the allocator's main path with other things (e.g. mmap performance, reclaiming, and cache, although maybe not cache because of the quarantine?). So maybe it should be a separate benchmark?
(that wouldn't happen, the number here is the number of *bytes* to allocate, not the number of iterations)
Testing with DefaultConfig caused crashes for me. I think it was a D70760-style problem with the exclusive TSD, but I couldn't solve it. I'll add a FIXME here.