When all the blocks in the group are known to be used, we should just
mark the pages in the range as all counted instead of visiting each of
them. This will reduce the time of marking free blocks especially for
smaller size class.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Do you have any numbers that show how many times the code chooses the mark all range path? Also, what is the difference in performance using the mark all range path? I assume the other path is variable so it maybe some kind of average savings.
compiler-rt/lib/scudo/standalone/primary32.h | ||
---|---|---|
786 | from the same | |
787 | tricky case | |
788 | while Don't need the doing. | |
794 | This check looks odd. Don't the blocks have to be in the same group id to do range marking? I'm probably misunderstanding what this is actually checking though. | |
796 | It's kind of confusing to do it this way. Can you reverse the if and else so that you are not checking nots and ors? I think it makes the code easier to understand when you make the check CandDoRangeMark && NumBlocks == MaxContainedBlocks. | |
compiler-rt/lib/scudo/standalone/release.h | ||
348 | to the | |
348 | that | |
349 | sitting across | |
compiler-rt/lib/scudo/standalone/tests/release_test.cpp | ||
335 | not supposed | |
350 | sitting | |
355 | straddling |
Sure, I will bring some numbers later soon.
compiler-rt/lib/scudo/standalone/primary32.h | ||
---|---|---|
794 | The trick thing is the current region may not have used up all region space. It only uses AllocatedUser. However, in PageReleaseContext, we have the logic of marking the last in-use page for each Region but the (RegionSize == AllocatedUser) introduces a different assumption which we need special handling for it. This condition will be removed in D143303 which simplify the block marking logic. BTW, we don't release the last page in current region before D143303 either. | |
796 | Yep, Agree! I thought the CanDoRangeMark would be removed in the later CL soon so I left it as it is. Also change the order in primary64 to make them consistent |
Just a quick update,
Size class 32 w/o range marking
-- Average Operation Time -- -- Name (# of Calls) -- 1987080.0(ns) DoPageRelease (1) 1849800.0(ns) PageMarking (1) 135860.0(ns) ReleasePage (1)
Size class 32 w/ range marking
-- Average Operation Time -- -- Name (# of Calls) -- 1435780.0(ns) DoPageRelease (1) 1283760.0(ns) PageMarking (1) 150330.0(ns) ReleasePage (1)
The PageMarking difference is 566040 (ns). Total allocated blocks is 65536. This aims to simply measure the worst case only so I disable proactive releasing, i.e., only M_PURGE will release pages. Will collect more data later.
The followings are measured by stressing certain size class, allocate up to 16 MB then randomly release them
size class 32 -- Average Operation Time -- -- Name (# of Calls) -- 987335.9(ns) markFreeBlocks (39) 2447.4(ns) markRangeAsAllCounted (47)
size class 512 -- Average Operation Time -- -- Name (# of Calls) -- 26269.0(ns) markFreeBlocks (51) 495.8(ns) markRangeAsAllCounted (16)
size class 8192 -- Average Operation Time -- -- Name (# of Calls) -- 9530.6(ns) markFreeBlocks (49) 1427.0(ns) markRangeAsAllCounted (14)
Even the randomness, we can see that it still has some cases fall into range-marking. For real use cases on the phone, the frequency of falling into the range-marking is lower, usually happens when there's some heavy computation like scrolling the launcher. As a result, I think this still give good performance improvement in real cases.
from the same