This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Floyd's improvement to pop_heap
ClosedPublic

Authored by Quuxplusone on Jan 23 2022, 12:00 PM.

Details

Summary
[libc++] "Bottom-up heapsort" improvement to sort_heap.

https://en.wikipedia.org/wiki/Heapsort#Bottom-up_heapsort
In `pop_heap` specifically, the item we insert at the top and
sift downward is guaranteed to be leaf-sized, so we expect it
to go pretty far down. Sift it down as if it were INT_MIN, and
then bubble it back up if needed.
Also known as "heapsort with bounce."

Numbers are here: https://godbolt.org/z/cvfnYW6fe

Fixes #10008.

Differential Revision: https://reviews.llvm.org/D118003

Fixes https://github.com/llvm/llvm-project/issues/10008

As the issue description says:

[__sift_down] is normally used to put in the correct position an element that was previously a leaf of a heap
(as in the pop_heap implementation). [When] that element was a leaf, it is likely that it will move down when
placed in the root.

An optimization is to move down the empty space at the root by swapping it with the largest child.
When the empty space becomes a leaf, the last element is moved to its place and moved up if necessary.

The advantage is that only one comparison per step is needed to move the empty space down.

A demo is here: https://godbolt.org/z/M8do8sWhT
For the same 100,000 random elements made into a heap using make_heap:
libstdc++ sort_heap takes 1,934,727 moves; 1,534,715 comparisons.
libc++ trunk sort_heap takes 1,900,731 moves; 2,831,757 comparisons.
libc++ after this patch takes 1,764,997 moves; 1,534,701 comparisons.
(Note that this really is apples to apples: hashing proves that libstdc++'s make_heap produces the exact same 100,000-element sequence as libc++'s make_heap.)

Diff Detail

Event Timeline

Quuxplusone requested review of this revision.Jan 23 2022, 12:00 PM
Quuxplusone created this revision.
Herald added 1 blocking reviewer(s): Restricted Project. · View Herald TranscriptJan 23 2022, 12:00 PM
philnik accepted this revision as: philnik.Jan 23 2022, 1:57 PM

LGTM % nits, but I'm not exactly an expert here.

libcxx/include/__algorithm/pop_heap.h
32
34–47

I'd rather see an early return, but I may be alone.

libcxx/include/__algorithm/sift_down.h
78

Is _LIBCPP_HIDE_FROM_ABI purposefully not here?

82
104–106
MTC added a subscriber: MTC.Jan 23 2022, 7:35 PM

I'm not too familiar with this code, but I see no real issues.

libcxx/include/__algorithm/sift_down.h
79

Since you use a know algorithm, please provide a link to the algorithm. Since the OP didn't know which paper I'm happy with just using https://en.wikipedia.org/wiki/Heapsort#Floyd's_heap_construction.

79

Can you add a benchmark for the new algorithm?

libcxx/include/__algorithm/sift_down.h
78

Nope, accidental... but maybe it should be on purpose? We don't put either inline _LIBCPP_HIDE_FROM_ABI or inline _LIBCPP_INLINE_VISIBILITY on ordinary __sift_down either; nor on __make_heap; nor on __nth_element... It seems like (at least in this old code) the visibility macro is intentionally affiliated with the inline keyword, and the inline keyword goes only on short functions. I don't know whether we should be following that rule, or fixing it. My default of course is to leave it as-is (i.e. not add _LIBCPP_HIDE_FROM_ABI here).

79

https://en.wikipedia.org/wiki/Heapsort#Floyd's_heap_construction is describing the make_heap side of things (which we already implement): make_heap can be done with push_heap in a loop but that's slow. This PR is the sort_heap side of things: sort_heap is still done with pop_heap in a loop but pop_heap itself can be faster.

Ah, but this is the very next subsection: https://en.wikipedia.org/wiki/Heapsort#Bottom-up_heapsort I'll add a link to the commit message and also change the summary to mention "bottom-up heapsort / heapsort with bounce" since those are more googleable than "Floyd's improvement" :)

I'll investigate benchmarking. See the current commit message for my informal benchmark https://godbolt.org/z/M8do8sWhT ; it probably won't take much to turn it into a libcxx/benchmark/ test... although at the same time I don't know that anyone ever actually notices regressions in those benchmarks.

82

FWIW this is not libc++'s historical style, but since Clang C++03 supports using= now, I'm cool with it if everyone else is. Will fix.

Quuxplusone marked 6 inline comments as done.Mar 7 2022, 10:01 AM
Quuxplusone added inline comments.
libcxx/include/__algorithm/sift_down.h
79

Turns out we already have libcxx/benchmarks/ code for this!
https://pastebin.com/iewiGLcN (before-bench.txt)
https://pastebin.com/MLP8f7y4 (after-bench.txt)

My hot takes are that:

  • it's really noisy (he says defensively); note the changes (mostly badwards) in MakeHeap, whose actual code is unaffected by this patch
  • this might be a pessimization for short arrays of trivial types, e.g. BM_MakeThenSortHeap_uint64_Random_262144 goes from 101ns to 104ns
  • this is, as expected, a major win for very long arrays of non-trivial types, e.g. BM_MakeThenSortHeap_string_Random_262144 goes from 832ns to 669ns
Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2022, 10:01 AM
Quuxplusone marked an inline comment as done.

s/typedef/using/
s/_VSTD/std/
I claim this is ready to land. @ldionne?

Quuxplusone edited the summary of this revision. (Show Details)Mar 7 2022, 10:04 AM
ldionne accepted this revision.Mar 7 2022, 10:32 AM

Can you add a release note? This is a nice improvement. I'd also like us to add a libc++ specific test that counts the number of comparisons. It can be pretty basic, but I'd like to regression-lock this improvement.

I tried poking holes into the implementation by grabbing the patch, making small changes (e.g. introducing off-by-ones) and literally every single line I touched would make the tests fail. In other words, I am fairly confident this is correct, not because I trust myself to validate the algorithm itself, but because I think our test coverage in that area is pretty solid.

LGTM with a regression test that checks for the number of comparisons and my __assert comment.

libcxx/include/__algorithm/sift_down.h
14

You want to include <__assert> here instead.

This revision is now accepted and ready to land.Mar 7 2022, 10:32 AM

Add complexity.pass.cpp for sort_heap and also for make_heap, since I've got those numbers already.
I thought of trying to make the assertions "fuzzy," but couldn't see a mathematically justifiable way of doing that, and besides, these numbers should never change unless they're deliberately being changed by someone who cares about these numbers! (As in this PR.)

Will land the tests first, with the old numbers, so that the main commit can braggingly update the numbers in place:

diff --git a/libcxx/test/libcxx/algorithms/alg.sorting/alg.heap.operations/sort.heap/complexity.pass.cpp b/libcxx/test/libcxx/algorithms/alg.sorting/alg.heap.operations/sort.heap/complexity.pass.cpp
index 28f82d61da33..8c10f81d767e 100644
--- a/libcxx/test/libcxx/algorithms/alg.sorting/alg.heap.operations/sort.heap/complexity.pass.cpp
+++ b/libcxx/test/libcxx/algorithms/alg.sorting/alg.heap.operations/sort.heap/complexity.pass.cpp
@@ -65,8 +65,8 @@ int main(int, char**)
   stats = {};
   std::sort_heap(v.begin(), v.end());
   assert(stats.copied == 0);
-  assert(stats.moved == 1'900'731);
-  assert(stats.compared == 2'831'757);
+  assert(stats.moved == 1'764'997);
+  assert(stats.compared == 1'534'701);
 
   assert(std::is_sorted(v.begin(), v.end()));

Re-poking CI, though, to make sure all platforms agree about these numbers. (They're deterministic AFAIK, but I could be wrong.)

Oops, also add a release note. @ldionne LGTY?

ldionne accepted this revision.Mar 7 2022, 2:41 PM

Argh. Unsupport the new complexity test on C++11 because it lacks 1'000'000 literals, and I think the 's are nice enough that I don't feel like losing them. It seems unimaginable that C++03/11 would have different complexity characteristics from C++14/17/20/2b (and if they did, really, I'm not sure we'd care).

For now, skip asserting the number of comparisons in debug mode (attn @ldionne, this is relevant to your interests).
I think the GCC11 failure in https://buildkite.com/llvm-project/libcxx-ci/builds/9373 was a fluke, but let's see if it happens again.

EricWF added a subscriber: EricWF.Mar 8 2022, 5:27 PM
EricWF added inline comments.
libcxx/include/__algorithm/sift_down.h
83

LIBCPP_ASSERT is not meant to guard against internal library logic errors.

101

You move __child_i here, but you reference it again at the start of the next loop iteration

Quuxplusone added inline comments.Mar 8 2022, 7:00 PM
libcxx/include/__algorithm/sift_down.h
83

You (or anyone) got any better suggestions? Should this just be a comment // __len >= 2 at this point?

101

Yikes! Thanks, fixed in 2b0ec7ca44ea227665f25b7dc3a3c097f16673cb — but tomorrow I'll think about ways to test for this. I wonder if the test iterators (like forward_iterator<T>) should be augmented to globally detect use-after-move.

EricWF added inline comments.Mar 8 2022, 7:57 PM
libcxx/include/__algorithm/sift_down.h
83

Write a test that would fail were this to be called (failing only under ASAN or MSAN is fine).

Quuxplusone added inline comments.Mar 9 2022, 6:28 AM
libcxx/include/__algorithm/sift_down.h
83

Write a test that would fail were this to be called

@EricWF: I don't understand that comment. My intention is that __floyd_sift_down is never called with __len < 2. If it were ever called with __len < 2, bad stuff would happen. In other words, what we really mean here is something like [[pre: __len >= 2]] or __builtin_assume(len >= 2), but libc++ traditionally hasn't used those, so I just reached for _LIBCPP_ASSERT as the handiest documentation tool. As I suggested above, I could just replace it with a comment // Precondition: __len >= 2, but I'm soliciting better suggestions.