This is an archive of the discontinued LLVM Phabricator instance.

[libcxx] Improve shared_ptr dtor performance
ClosedPublic

Authored by bcraig on Jul 18 2016, 11:31 AM.

Details

Summary

Improve shared_ptr dtor performance

If the last destruction is uncontended, skip the atomic store on
__shared_weak_owners_. This shifts some costs from normal
shared_ptr usage to weak_ptr uses.

For x86_64, this results in an 8% improvement in shared_ptr ctor+dtor
performance.
Old benchmarks/shared_ptr_create_destroy.cpp: 26.8638 seconds
New benchmarks/shared_ptr_create_destroy.cpp: 24.6019 seconds

weak_ptr increment / decrement is now about 29% slower.
Old benchmarks/weak_ptr_inc_dec_ref.cpp: 11.2892 seconds
New benchmarks/weak_ptr_inc_dec_ref.cpp: 14.5522 seconds

The shared_ptr increment / decrement code path did not degrade on x86_64.
Old benchmarks/shared_ptr_inc_dec_ref.cpp: 13.0896 seconds
New benchmarks/shared_ptr_inc_dec_ref.cpp: 13.0463 seconds
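
For readers who want to reproduce this kind of measurement, here is a rough sketch of a create/destroy timing loop. It is not the contents of benchmarks/shared_ptr_create_destroy.cpp; `time_create_destroy` and `g_sink` are hypothetical names, and the loop body is an assumption about what such a benchmark might exercise.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>

// Sink that keeps each iteration's result observable. An aggressive
// optimizer may still remove some work; dedicated benchmark harnesses
// guard against that more carefully.
volatile int g_sink = 0;

// Hypothetical helper: seconds spent on `iterations` create+destroy cycles.
double time_create_destroy(std::size_t iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        std::shared_ptr<int> sp = std::make_shared<int>(static_cast<int>(i));
        g_sink = *sp;  // sp is destroyed at the end of each iteration
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Absolute times from a loop like this depend heavily on the machine, compiler, and flags, so only old-versus-new comparisons on the same setup are meaningful.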

Event Timeline

bcraig updated this revision to Diff 64357. Jul 18 2016, 11:31 AM
bcraig retitled this revision to [libcxx] Improve shared_ptr dtor performance.
bcraig updated this object.
bcraig added reviewers: jfb, mclow.lists, EricWF.
bcraig added a subscriber: cfe-commits.
bcraig updated this revision to Diff 64504. Jul 19 2016, 8:10 AM
bcraig updated this object.

Added weak_ptr benchmark, as that's where the cost shifted.

EricWF accepted this revision. Jul 24 2016, 10:51 PM
EricWF edited edge metadata.

LGTM. Thank you for the thorough doc. It sure made the review easy on my end.

I took the liberty of rewriting your 3 benchmarks to use Google Benchmark: https://gist.github.com/EricWF/6ab9d3ca9315f2dcf8b0b8a7e47a9ac8
I'm hoping to have a single benchmark format so it's easier to write tooling for. Here are the docs for building libc++'s benchmarks.

This revision is now accepted and ready to land. Jul 24 2016, 10:51 PM

I am going to submit the code changes and the tests independently. I'm having trouble getting cmake to use the right compiler for the libcxx-benchmarks target.

bcraig closed this revision. Aug 1 2016, 11:01 AM

committed rL277357: Improve shared_ptr dtor performance.