For the 32-bit TransferBatch:
- SetFromArray callers already bound the count, so relax the CHECK to a DCHECK;
- same for Add;
- mark CopyToArray as const.
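
A minimal sketch of the TransferBatch items above, not the actual sanitizer code. `TransferBatchSketch`, `SKETCH_DCHECK_LE` and `kMaxNumCached = 8` are hypothetical names and values chosen for illustration; the debug-only check is modelled with `assert`, which compiles out under `NDEBUG` in the same spirit as a DCHECK.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical debug-only check: active in debug builds, compiled out
// when NDEBUG is defined (stand-in for a DCHECK-style macro).
#define SKETCH_DCHECK_LE(a, b) assert((a) <= (b))

// Hypothetical, simplified batch of cached pointers.
class TransferBatchSketch {
 public:
  static constexpr std::size_t kMaxNumCached = 8;  // illustrative value

  // Callers are assumed to have bounded `count` already, so the check
  // here is only a debug-time sanity check.
  void SetFromArray(void *const *batch, std::size_t count) {
    SKETCH_DCHECK_LE(count, kMaxNumCached);
    count_ = count;
    for (std::size_t i = 0; i < count; i++) batch_[i] = batch[i];
  }

  void Add(void *ptr) {
    SKETCH_DCHECK_LE(count_ + 1, kMaxNumCached);
    batch_[count_++] = ptr;
  }

  // Does not modify the batch, so it can be const.
  void CopyToArray(void **to_array) const {
    for (std::size_t i = 0; i < count_; i++) to_array[i] = batch_[i];
  }

  std::size_t Count() const { return count_; }

 private:
  std::size_t count_ = 0;
  void *batch_[kMaxNumCached] = {};
};
```

With CopyToArray marked const, it can be called through a const reference to the batch.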
For the 32-bit Primary:
- {Dea,A}llocateBatch are only called from places that check class_id, so relax the CHECK to a DCHECK;
- same for AllocateRegion;
- remove GetRegionBeginBySizeClass, which is unused;
- use a local variable for the random shuffle state, so that the compiler can keep it in a register instead of reading and writing the SizeClassInfo at every iteration.
For the 32-bit local cache:
- pass the count to drain instead of computing a Min every time, which is at times superfluous.
Add a comment explaining why a local variable is needed here; otherwise someone will "optimize" it back later. Better yet, add the local variable inside RandomShuffle and let the compiler do its job of generating the best code possible.
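
One way the second suggestion might look, as a sketch only: the signature, the `SizeClassInfo` layout and the LCG constants below are assumptions for illustration, not the actual sanitizer code. The shuffle copies the state into a local, loops on it, and writes it back once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for the per-class info; only the shuffle state
// is modelled here.
struct SizeClassInfo {
  std::uint32_t rand_state;
};

// Fisher-Yates shuffle driven by a simple LCG (illustrative constants).
template <typename T>
void RandomShuffle(T *a, std::size_t n, std::uint32_t *rand_state) {
  assert(rand_state != nullptr);  // debug-only sanity check
  if (n <= 1) return;
  // Work on a local copy: the compiler may be unable to prove that
  // *rand_state does not alias a[i], which would force a load and a
  // store through the pointer on every iteration. A local can stay in
  // a register for the whole loop.
  std::uint32_t state = *rand_state;
  for (std::size_t i = n - 1; i > 0; i--) {
    state = state * 1664525u + 1013904223u;
    const std::size_t j = (state >> 16) % (i + 1);
    std::swap(a[i], a[j]);
  }
  // Single write-back after the loop.
  *rand_state = state;
}
```

Callers would then pass `&sci->rand_state` directly, and the comment explaining the local lives in one place rather than at every call site.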