Page MenuHomePhabricator

[scudo][standalone] Change the release loop for efficiency purposes
ClosedPublic

Authored by cryptoad on Jul 16 2020, 4:13 PM.

Details

Summary

On 32-b, the release algo loops multiple times over the freelist for a size
class, which lead to a decrease in performance when there were a lot of free
blocks.

This changes the release functions to loop only once over the freelist, at the
cost of using a little bit more memory for the release process: instead of
working on one region at a time, we pass the whole memory area covered by all
the regions for a given size class, and work on sub-areas of RegionSize in
this large area. For 64-b, we just have 1 sub-area encompassing the whole
region. Of course, not all the sub-areas within that large memory area will
belong to the class id we are working on, but those will just be left untouched
(which will not add to the RSS during the release process).

Diff Detail

Event Timeline

cryptoad created this revision.Jul 16 2020, 4:13 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 16 2020, 4:13 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript

I didn't merge properly, updating the CL.

cryptoad updated this revision to Diff 278624.Jul 16 2020, 4:18 PM

With proper merge of the primary*.h.

cryptoad updated this revision to Diff 278625.Jul 16 2020, 4:19 PM

With proper merge of the primary*.h.

cryptoad updated this revision to Diff 278838.Jul 17 2020, 10:27 AM

Corrected a bunch of things as apparently getting right into things
post vacation yielded a bunch of errors.

cferris accepted this revision.Jul 21 2020, 12:25 PM

I verified that this does not result in any loss in performance or RSS on Android. I also verified that this fixes the slow run of the test suite and that none of the tests cause a long os release.

This revision is now accepted and ready to land.Jul 21 2020, 12:25 PM
pcc added inline comments.Jul 23 2020, 4:33 PM
compiler-rt/lib/scudo/standalone/release.h
270

I guess this is equivalent to the code previously on lines 252-253. But I'm confused about why we would have more than one of these. In my mind these blocks would consume as much of the region as possible, which means that at the end we would just have a fraction of a block at most. Is that not the case?

cryptoad added inline comments.Jul 24 2020, 7:25 AM
compiler-rt/lib/scudo/standalone/release.h
270

I went for something generic, and I think you are right in the sense that the loop will only iterate once. I will verify it and update it if it's the case.

cryptoad added inline comments.Jul 24 2020, 10:02 AM
compiler-rt/lib/scudo/standalone/release.h
270

I ran some tests, and I was wrong. There cases where there is more than 1 block.
This is because we use AllocatedUser as the region size, which might not be a multiple of a page (it's bound by a maximum # of batches we want to allocate at a time), so there are several circumstances where there would be more than one block in the last page. Here are some examples that got logged on Android:
block size, last block, region size, rounded region size
320, 173e0, 17700, 18000
1c0, 5e800, 5e9c0, 5f000
120, baf60, bb080, bc000

So the loop ends up necessary.

pcc accepted this revision.Jul 24 2020, 10:27 AM

LGTM

compiler-rt/lib/scudo/standalone/release.h
270

I see. This is fine then, but I wonder whether there's some way that we can avoid wasting space at the end of the region like this. This is follow-up territory of course though.

This revision was automatically updated to reflect the committed changes.