This is an archive of the discontinued LLVM Phabricator instance.

scudo: Add a basic malloc/free benchmark.
ClosedPublic

Authored by pcc on Dec 5 2019, 6:34 PM.

Diff Detail

Event Timeline

pcc created this revision.Dec 5 2019, 6:34 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript
Herald added subscribers: Restricted Project, lebedev.ri, jfb, mgorny. · View Herald Transcript
pcc updated this revision to Diff 232489.Dec 5 2019, 6:40 PM
  • license header

Build result: pass - 60541 tests passed, 0 failed and 726 were skipped.

Log files: console-log.txt, CMakeCache.txt

Build result: pass - 60541 tests passed, 0 failed and 726 were skipped.

Log files: console-log.txt, CMakeCache.txt

hctim added inline comments.Dec 6 2019, 7:21 AM
compiler-rt/lib/scudo/standalone/benchmarks/malloc_benchmark.cpp
34

Can we instead do what bionic-benchmarks does on their malloc test, and touch each page to ensure residency (although we're only allocating 8-bytes at a time).

Similarly, they also bulk allocate -> bulk deallocate using a storage vector. Seems like a better solution - so we don't just hit the freelist 128 * 1024 times.

43

Missing DefaultConfig?

47

Can we make 128 * 1024 and 8 constants?

hctim added inline comments.Dec 6 2019, 7:52 AM
compiler-rt/lib/scudo/standalone/benchmarks/malloc_benchmark.cpp
34

I have absolutely no idea why I read this loop as memcpy, but please ignore the first point.

Bulk allocate/delete would still be great.

pcc marked 3 inline comments as done.Dec 6 2019, 10:45 AM
pcc added inline comments.
compiler-rt/lib/scudo/standalone/benchmarks/malloc_benchmark.cpp
34

Similarly, they also bulk allocate -> bulk deallocate using a storage vector.

Are we looking at different benchmarks? This is what I've been using:
https://android.googlesource.com/platform/bionic/+/master/benchmarks/stdlib_benchmark.cpp#41

I guess a storage vector would be more realistic, although it may end up conflating the speed of the allocator's main path with other things (e.g. mmap performance, reclaiming, cache (but maybe not because of the quarantine?)) so maybe it should be a separate benchmark?

we don't just hit the freelist 128 * 1024 times.

(that wouldn't happen, the number here is the number of *bytes* to allocate, not the number of iterations)

43

Testing with DefaultConfig caused crashes for me, I think it was a D70760 style problem with the exclusive TSD but I couldn't seem to solve it. I'll add a FIXME here.

47

Sure.

hctim added inline comments.Dec 6 2019, 12:02 PM
compiler-rt/lib/scudo/standalone/benchmarks/malloc_benchmark.cpp
34

BM_stdlib_malloc_multiple_8192_allocs_default - maybe it's worth doing both the retained/non-retained tests?

Could you please put a comment somewhere on how to use the benchmark?

compiler-rt/lib/scudo/standalone/benchmarks/malloc_benchmark.cpp
29

size_t maybe? getpagesize returns an int but that's awkward I feel, hence getPageSizeCached returning an uptr.

pcc updated this revision to Diff 232686.Dec 6 2019, 7:05 PM
pcc marked 3 inline comments as done.
  • Address review comments
pcc added a comment.Dec 6 2019, 7:05 PM

Could you please put a comment somewhere on how to use the benchmark?

Added a comment at the top of the cmake file.

Build result: pass - 60578 tests passed, 0 failed and 726 were skipped.

Log files: console-log.txt, CMakeCache.txt

cryptoad accepted this revision.Dec 6 2019, 9:26 PM

Thanks!

This revision is now accepted and ready to land.Dec 6 2019, 9:26 PM
hctim accepted this revision.Dec 9 2019, 8:08 AM
This revision was automatically updated to reflect the committed changes.