This is an archive of the discontinued LLVM Phabricator instance.

scudo: Table driven size classes for Android allocator.
ClosedPublic

Authored by pcc on Jan 31 2020, 6:29 PM.

Details

Summary

Add an optional table lookup after the existing logarithm computation
for MidSize < Size <= MaxSize during size -> class lookups. The lookup is
O(1) due to indexing a precomputed (via constexpr) table based on a size
table. Switch to this approach for the Android size class maps.

Other approaches considered:

  • Binary search was found to have an unacceptable (~30%) performance cost.
  • An approach using NEON instructions (see older version of D73824) was found to be slightly slower than this approach on newer SoCs but significantly slower on older ones.

By selecting the values in the size tables to minimize wastage (for example,
by passing the malloc_info output of a target program to the included
compute_size_class_config program), we can increase the density of allocations
at a small (~0.5% on bionic malloc_sql_trace as measured using an identity
table) performance cost.

Reduces RSS on specific Android processes as follows (KB):

Before  After

zygote (median of 50 runs) 26836 26792 (-0.2%)
zygote64 (median of 50 runs) 30384 30076 (-1.0%)
dex2oat (median of 3 runs) 375792 372952 (-0.8%)

I also measured the amount of whole-system idle dirty heap on Android by
rebooting the system and then running the following script repeatedly until
the results were stable:

for i in $(seq 1 50); do grep -A5 scudo: /proc/*/smaps | grep Pss: | cut -d: -f2 | awk '{s+=$1} END {print s}' ; sleep 1; done

I did this 3 times both before and after this change and the results were:

Before: 365650, 356795, 372663
After: 344521, 356328, 342589

These results are noisy so it is hard to make a definite conclusion, but
there does appear to be a significant effect.

On other platforms, increase the sizes of all size classes by a fixed offset
equal to the size of the allocation header. This has also been found to improve
density, since it is likely for allocation sizes to be a power of 2, which
would otherwise waste space by pushing the allocation into the next size class.

Diff Detail

Event Timeline

pcc created this revision.Jan 31 2020, 6:29 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJan 31 2020, 6:29 PM
Herald added subscribers: llvm-commits, Restricted Project. · View Herald Transcript

Unit tests: fail. 62377 tests passed, 1 failed and 839 were skipped.

failed: libc++.std/thread/thread_mutex/thread_mutex_requirements/thread_timedmutex_requirements/thread_timedmutex_recursive/try_lock_until.pass.cpp

clang-tidy: pass.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

pcc updated this revision to Diff 242508.Feb 4 2020, 8:30 PM

New approach

pcc retitled this revision from [wip] table driven size classes to [wip] scudo: Table driven size classes for Android allocator..Feb 4 2020, 8:31 PM
pcc edited the summary of this revision. (Show Details)

Unit tests: unknown.

clang-tidy: fail. clang-tidy found 2 errors and 2 warnings. 0 of them are added as review comments below (why?).

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

pcc updated this revision to Diff 243342.Feb 7 2020, 7:24 PM
pcc retitled this revision from [wip] scudo: Table driven size classes for Android allocator. to scudo: Table driven size classes for Android allocator..
pcc edited the summary of this revision. (Show Details)

Finalize

I do see RSS benefits in 32 bit, but the benefits are much larger in 64 bit runs. I see some performance fluctuations, but it looks like some got faster, some got slower, so probably in the noise.

On the whole, this does look like a net benefit.

I think this looks good.
I tried it on Android and it seems to be doing as advertised, have you tried this on Fuchsia/Linux? I'm currently giving it a go in g3.

compiler-rt/lib/scudo/standalone/size_class_map.h
90

Do we want to do this here or make sure that any call to this function will fall into the next if? I am not sure which one is better.

compiler-rt/lib/scudo/standalone/tests/combined_test.cpp
168

Why that line?

pcc marked 2 inline comments as done.Feb 10 2020, 10:07 AM
pcc added inline comments.
compiler-rt/lib/scudo/standalone/size_class_map.h
90

Hmm, it seems like we would need something like a conditional either way to make sure that getClassIdBySize(16) doesn't return an invalid class id. The way that I've done it seems more clear/explicit.

compiler-rt/lib/scudo/standalone/tests/combined_test.cpp
168

Sorry, this was just something that I added temporarily to keep the tests passing (because we're now moving between size classes in the test code here) and forgot to clean up. I'll fix this so that it's testing what is intended.

pcc updated this revision to Diff 243670.Feb 10 2020, 1:54 PM
  • Adjust the test to use MaxSize-64
pcc marked an inline comment as done.Feb 10 2020, 1:54 PM
cryptoad accepted this revision.Feb 10 2020, 2:00 PM

Thank you Peter!

This revision is now accepted and ready to land.Feb 10 2020, 2:00 PM
This revision was automatically updated to reflect the committed changes.