This is an archive of the discontinued LLVM Phabricator instance.

[scudo] Mark all blocks in a range without visiting each of them
ClosedPublic

Authored by Chia-hungDuan on Jan 17 2023, 11:53 AM.

Details

Summary

When all the blocks in the group are known to be used, we should just
mark the pages in the range as all counted instead of visiting each of
them. This will reduce the time of marking free blocks especially for
smaller size class.

Diff Detail

Event Timeline

Chia-hungDuan created this revision.Jan 17 2023, 11:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 17 2023, 11:53 AM
Chia-hungDuan requested review of this revision.Jan 17 2023, 11:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 17 2023, 11:53 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript

Rebase to a utility change and add more tests

Do you have any numbers that show how many times the code chooses the mark all range path? Also, what is the difference in performance using the mark all range path? I assume the other path is variable so it maybe some kind of average savings.

compiler-rt/lib/scudo/standalone/primary32.h
785

from the same

786

tricky case

787

while

Don't need the doing.

793

This check looks odd. Don't the blocks have to be in the same group id to do range marking? I'm probably misunderstanding what this is actually checking though.

795

It's kind of confusing to do it this way. Can you reverse the if and else so that you are not checking nots and ors? I think it makes the code easier to understand when you make the check CandDoRangeMark && NumBlocks == MaxContainedBlocks.

compiler-rt/lib/scudo/standalone/release.h
348

to the

348

that

349

sitting across

compiler-rt/lib/scudo/standalone/tests/release_test.cpp
335

not supposed

350

sitting

355

straddling

Chia-hungDuan marked 9 inline comments as done.

Address review commnet

Do you have any numbers that show how many times the code chooses the mark all range path? Also, what is the difference in performance using the mark all range path? I assume the other path is variable so it maybe some kind of average savings.

Sure, I will bring some numbers later soon.

compiler-rt/lib/scudo/standalone/primary32.h
793

The trick thing is the current region may not have used up all region space. It only uses AllocatedUser. However, in PageReleaseContext, we have the logic of marking the last in-use page for each Region but the (RegionSize == AllocatedUser) introduces a different assumption which we need special handling for it. This condition will be removed in D143303 which simplify the block marking logic.

BTW, we don't release the last page in current region before D143303 either.

795

Yep, Agree! I thought the CanDoRangeMark would be removed in the later CL soon so I left it as it is. Also change the order in primary64 to make them consistent

Just a quick update,

Size class 32 w/o range marking

-- Average Operation Time -- -- Name (# of Calls) --
       1987080.0(ns)            DoPageRelease (1)
       1849800.0(ns)              PageMarking (1)
        135860.0(ns)              ReleasePage (1)

Size class 32 w/ range marking

-- Average Operation Time -- -- Name (# of Calls) --
       1435780.0(ns)            DoPageRelease (1)
       1283760.0(ns)              PageMarking (1)
        150330.0(ns)              ReleasePage (1)

The PageMarking difference is 566040 (ns). Total allocated blocks is 65536. This aims to simply measure the worst case only so I disable proactive releasing, i.e., only M_PURGE will release pages. Will collect more data later.

cferris accepted this revision.Feb 16 2023, 5:53 PM

Assuming all of the benchmarking keeps showing positive results, LGTM.

This revision is now accepted and ready to land.Feb 16 2023, 5:53 PM

The followings are measured by stressing certain size class, allocate up to 16 MB then randomly release them

size class 32
-- Average Operation Time -- -- Name (# of Calls) --
        987335.9(ns)            markFreeBlocks (39)
          2447.4(ns)            markRangeAsAllCounted (47)
size class 512

-- Average Operation Time -- -- Name (# of Calls) --
         26269.0(ns)            markFreeBlocks (51)
           495.8(ns)            markRangeAsAllCounted (16)
size class 8192
-- Average Operation Time -- -- Name (# of Calls) --
          9530.6(ns)            markFreeBlocks (49)
          1427.0(ns)            markRangeAsAllCounted (14)

Even the randomness, we can see that it still has some cases fall into range-marking. For real use cases on the phone, the frequency of falling into the range-marking is lower, usually happens when there's some heavy computation like scrolling the launcher. As a result, I think this still give good performance improvement in real cases.