A few small improvements and optimizations:
- when refilling the free list, push back the last batch and return the front one: this keeps the allocations toward the front of the region;
- instead of using 48 entries in the shuffle array, use a multiple of MaxNumCached;
- make the maximum number of batches to create on refill a constant; ultimately it should be configurable, but that's for later;
- initCache doesn't need to zero out the cache, since it is already zeroed;
- it turns out that when using || or &&, the compiler insists on adding a short circuit for every operand of the expression, which results in somewhat annoying asm with lots of test and conditional jump instructions. I am changing those to bitwise | or & in two places so that the generated code looks better, and added comments since it might look weird to people.
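The refill ordering can be illustrated with a minimal sketch. It is not the allocator's code: Batch, refill and kMaxNumBatches are hypothetical names, and a std::deque stands in for what would normally be an intrusive free list. The point is only the ordering. The lowest-address batch goes straight to the caller, and the remaining batches are appended in address order, so later pops from the front keep handing out memory from the start of the region.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for a batch of free blocks carved from a region.
struct Batch {
  std::uintptr_t First; // address of the first block in the batch
  std::size_t Count;    // number of blocks in the batch
};

// Hypothetical constant: maximum number of batches created per refill.
constexpr std::size_t kMaxNumBatches = 8;

// Carves up to kMaxNumBatches batches of BlocksPerBatch blocks (BlockSize
// bytes each) from [Beg, End). The front batch is returned through Out; the
// others are pushed to the back of FreeList in increasing address order.
// Returns false if not even one full batch fits.
bool refill(std::deque<Batch> &FreeList, std::uintptr_t Beg,
            std::uintptr_t End, std::size_t BlockSize,
            std::size_t BlocksPerBatch, Batch *Out) {
  assert(Out != nullptr && Beg <= End);
  const std::size_t BatchBytes = BlockSize * BlocksPerBatch;
  std::size_t N = 0;
  for (std::uintptr_t P = Beg; N < kMaxNumBatches && End - P >= BatchBytes;
       P += BatchBytes, ++N) {
    const Batch B{P, BlocksPerBatch};
    if (N == 0)
      *Out = B;              // front (lowest-address) batch goes to the caller
    else
      FreeList.push_back(B); // later batches queue up behind it
  }
  return N != 0;
}
```

With a 640-byte region, 16-byte blocks and 4 blocks per batch, ten batches would fit, but only eight are created: the one at the region start is returned, and the other seven sit in the free list from lowest to highest address.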
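To illustrate the || versus | point, here is a minimal sketch using a hypothetical size/alignment check (isInvalidLogical, isInvalidBitwise and their parameters are made up for the example, not taken from the patch). Both functions compute the same result; the difference is only in the code a compiler is likely to generate, which varies with compiler and optimization level.

```cpp
#include <cstdint>

// Hypothetical size/alignment check, for illustration only.

// Logical form: with short-circuit evaluation, the compiler may emit a
// separate test and conditional jump for each operand.
constexpr bool isInvalidLogical(std::uintptr_t Size, std::uintptr_t Alignment,
                                std::uintptr_t MaxSize) {
  return Size == 0 || Size > MaxSize || (Alignment & (Alignment - 1)) != 0;
}

// Bitwise form: every comparison is evaluated unconditionally and the
// results are OR'ed together, which typically leaves a single branch where
// the result is used. Only valid because no operand has side effects or
// relies on an earlier operand to guard it (e.g. a null check before a
// dereference).
constexpr bool isInvalidBitwise(std::uintptr_t Size, std::uintptr_t Alignment,
                                std::uintptr_t MaxSize) {
  // Using | on purpose instead of ||: avoids a short circuit per operand.
  return (Size == 0) | (Size > MaxSize) | ((Alignment & (Alignment - 1)) != 0);
}
```

The trade-off is that the bitwise form always evaluates every operand, so it only fits cheap, side-effect-free conditions like these.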
This yields some small performance gains overall, nothing drastic.