This is an archive of the discontinued LLVM Phabricator instance.

[scudo] Manage free blocks in BatchGroup.
ClosedPublic

Authored by Chia-hungDuan on Sep 14 2022, 2:20 PM.

Details

Summary

Scudo is supposed to allocate any blocks across the entired mapped
pages and each page is equally likely to be selected. Which means Scudo
is leaning to touch as many pages as possible. This brings better
security but it also sacrifices the chance of releasing dirty pages.

To alleviate the unmanagable footprint growing, this CL introduces the
BatchGroup concept. Each blocks will be classified into a BatchGroup
according to its address. While allocation, we are leaning to allocate
blocks in the same group first. Note that the blocks selected from a
group is still random over several pages. At the same time, we have
better prediction of dirty page growing speed. Besides, we are able to
do partial page releasing by examing part of BatchGroups.

Diff Detail

Event Timeline

Chia-hungDuan created this revision.Sep 14 2022, 2:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 14 2022, 2:20 PM
Herald added a subscriber: Enna1. · View Herald Transcript
Chia-hungDuan requested review of this revision.Sep 14 2022, 2:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 14 2022, 2:20 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript

Do we have rough numbers as to how this lowers the "randomness" of the pool of blocks to be chosen for various classes?

Do we have rough numbers as to how this lowers the "randomness" of the pool of blocks to be chosen for various classes?

AFK, I'll update the numbers later

I did the evaluation with these steps,

In each round,

  1. It will do certain size allocation and have total 1 MB. (For example, for 128 size, it'll do 1 MB / 128 allocations)
  2. Count the address difference between two consecutive allocations
  3. Put the block to a set to keep track of number of different pointers
  4. Release the blocks randomly

It does total 10 rounds and it has 6 threads run in parallel. The tables down below compare the "block-group" and "original".

  1. There is almost the equivalent values in 64 bits configuration.
  2. There's a little difference in number of difference pointers. The "original" has a little higher number of different pointers. This is expected because of in 32 bits config, the group size is the same as region size, which is 256 KB (it's 1 MB in 64 bits config). The address difference is almost the same
64 bits config
Size (block-group)3212851240961638465536
# of diff ptrs13028929492479467714061
# of allocs32768081920204802560640160
Avg ptr diff261532021780302244120158787016539702276290
Size (original)3212851240961638465536
# of diff ptrs13225230392566756515051
# of allocs32768081920204802560640160
Avg ptr diff353795023360402516820179744016587601283180
32 bits config
Size (block-group)3212851240961638465536
# of diff ptrs24001222691410854814650
# of allocs32768081920204802560640160
Avg ptr diff39993400748122011188800347280035004104181950
Size(original)3212851240961638465536
# of diff ptrs27315923260523979217257
# of allocs32768081920204802560640160
Avg ptr diff4728820091313004941100248167032326402864730
cryptoad added inline comments.Sep 19 2022, 11:39 AM
compiler-rt/lib/scudo/standalone/local_cache.h
32

Add a DCHECK that Count + N doesn't overflow a u16

compiler-rt/lib/scudo/standalone/primary32.h
445

AppendSize is a u32 while the argument to appendFromArray is a u16.
We might want to make sure it fits prior to truncating it.

compiler-rt/lib/scudo/standalone/primary64.h
449

Same u16 vs u32 comment

Chia-hungDuan marked 2 inline comments as done.

Fix type truncation error

compiler-rt/lib/scudo/standalone/local_cache.h
32

Use

DCHECK_LE(N, MaxNumCached - Count);

Count has been checked to be smaller or equal to MaxNumCached.

compiler-rt/lib/scudo/standalone/primary32.h
445

Sorry, my bad. Didn't include the fix in this CL.
AppendSize and Unused are u16, did the required static_cast here.

abrachet added inline comments.
compiler-rt/lib/scudo/standalone/local_cache.h
72

This fails on Fuchsia because TransferBatch::MaxNumCached = SizeClassMap::MaxNumCachedHint which on Fuchsia is 10, if I switch it to 13, like Android this passes. Is this static_assert necessary or alternatively is there a downside to moving up Fuchsia's MaxNumCachedHint?

Chia-hungDuan added inline comments.Sep 29 2022, 11:23 AM
compiler-rt/lib/scudo/standalone/local_cache.h
72

To reduce the complexity of support block grouping, BatchGroup and TransferBatch are using the same block so this static_assert is ensuring they can be put in the same block.

The impact of increasing the MaxNumCachedHint is
(1. it will slightly increase the memory used by TransferBatch (TransferBatch is used to record the free blocks in the free list), In general, this won't bring huge impact on memory usage and
(2. it may cache a little more free blocks in thread local.

Given that we don't change the number a lot. I think the impact will be very small. On Fuchsia, I think set it to 12 will pass the assertion.

I'll review the size used by TransferBatch and BatchGroup later and maybe we will find some savings.

Do you think it's fine for Fuchsia to set it as 12?

abrachet added inline comments.Sep 29 2022, 2:03 PM
compiler-rt/lib/scudo/standalone/local_cache.h
72

Do you think it's fine for Fuchsia to set it as 12?

Sure, done in rG2449f42427481518e3ca072d7d8db26c7020c3ce

Mostly a lot of small things.

There are some times I gave suggestions for wording changes in comments, so whatever you do, make sure it matches the same/similar comment in primary32.h and primary64.h.

compiler-rt/lib/scudo/standalone/allocator_config.h
38

addresses

compiler-rt/lib/scudo/standalone/local_cache.h
146

Probably worth wrapping in an UNLIKELY().

153

Same as above, wrap in UNLIKELY()

compiler-rt/lib/scudo/standalone/primary32.h
125

Wrap in UNLIKELY()

143

Looks like a tab snuck in here.

157

Make sure to file a bug for this.

160

belonging

163

Should this be u16? Same for J? I don't think it really matters since you are only using them as array indices, so feel free to ignore.

385

from the same

391

most cases will fall

392

Did you mean "sooner or later"?

392

In that case,

393

worth sorting the array with the

396

Slightly awkward, maybe better as:

The region mutex needs to be held while calling this method.

435

Should this be u16?

437

This is a slightly confusing name since unused can mean the variable is unused. Would it be better to call it UnusedBatches?

456

Can this return nullptr?

489

As before, should these be u16? Same for the loop variable below?

518

As above, might be better worded as:

The region mutex needs to be held while calling this method.

But this is more of an opinion, so take it or leave it.

537

allocating

600–605

Given that it only happens

602

"treat" might be a better wording of this.

compiler-rt/lib/scudo/standalone/primary64.h
105

Add UNLIKELY

106

Another candidate for UNLIKELY.

124

Constructing

127

a recursive

I believe this is a duplicate comment, so make sure the two locations are the same.

140

belonging

438

As above, should this be u16?

440

As above, maybe "UnusedBlocks" to be clear about what this variable represents.

494

u16? As well as the I variable below?

520

the smallest

543

allocating

616

it only happens

617

treat

compiler-rt/lib/scudo/standalone/tests/combined_test.cpp
551

Extra line.

compiler-rt/lib/scudo/standalone/tests/primary_test.cpp
337

times the

340

the

341

isn't

344

cross

350

This is going to cause the test to run slightly different every time. I don't know if that matters too much, but sometimes we use a known seed so the test always runs the same.

I'm not sure if that's necessary here, but something to think about.

357

including

Chia-hungDuan marked 33 inline comments as done.

Address review comment

compiler-rt/lib/scudo/standalone/primary32.h
163

This is for pushing blocks, so theoretically we accept a larger number of blocks to be reported at once and we will split them out into different TransferBatch according to the size of TransferBatch.

So I don't use u16 which is specific to mean size in TransferBatch.

392

Sorry I wanted to say "later soon". We call popBlocks right after populateFreeList, so the blocks are supposed to be popped soon

435

Same reason as above. u16 is only used by TransferBatch. We can push a number of blocks which exceed u16 and we will handle it properly. In reality, we don't.

437

Use "UnusedSlots" instead. It means the unused slots in a batch.

456

If it's supposed to return nullptr, then the oom execption will have thrown. Do you think we need to add any comment here?

489

Same as above. The size for TransferBatch and "Array of blocks" can have different scale.

compiler-rt/lib/scudo/standalone/primary64.h
438

Same as above

440

Changed to UnusedSlots

494

Same as above

compiler-rt/lib/scudo/standalone/tests/primary_test.cpp
350

Got it. I follow the usage as in release_test and secondary_test. For our case, we expect the different seed should still give the expected result. So I think it's fine here.

Chia-hungDuan added inline comments.Oct 6 2022, 11:20 AM
compiler-rt/lib/scudo/standalone/primary32.h
157

Done. Filed in internal

Just a few more suggestions.

compiler-rt/lib/scudo/standalone/primary32.h
392

Probably better to remove the later, and leave it as "to be popped soon."

393

I think this nit got missed.

456

No, if this aborts, then leaving it like this is fine.

Chia-hungDuan marked 2 inline comments as done.

Address review comment

Mark comments as done.

cferris accepted this revision.Oct 7 2022, 1:01 PM

LGTM

Any other reviewers, please take a quick look to make everything looks okay. If you feel that you might not understand all of the deeper changes, we'd still appreciate a higher level review to make sure there are no obvious errors.

This revision is now accepted and ready to land.Oct 7 2022, 1:01 PM
cryptoad accepted this revision.Oct 12 2022, 10:53 AM
This revision was landed with ongoing or failed builds.Oct 13 2022, 4:35 PM
This revision was automatically updated to reflect the committed changes.
nemanjai added a subscriber: nemanjai.

It would seem that either this commit or the commit for https://reviews.llvm.org/D134226 has caused terrible instability on a number of PPC bots:
https://lab.llvm.org/buildbot/#/builders/231/builds/3768
https://lab.llvm.org/buildbot/#/builders/230/builds/3987
https://lab.llvm.org/buildbot/#/builders/121/builds/24139

Do you know what the issue might be? It doesn't seem consistent on all the bots.