This is an archive of the discontinued LLVM Phabricator instance.

Add introsort to avoid O(n^2) behavior and a benchmark for adversarial quick sort input.
ClosedPublic

Authored by nilayvaish on Nov 8 2021, 8:48 AM.

Details

Reviewers
jdoerfert
ldionne
Group Reviewers
Restricted Project
Commits
rG7f287390d78d: [libc++] Add introsort to avoid O(n^2) behavior
Summary
There are two commits in this request.  The first one adds a benchmark that tests std::sort on an adversarial inputs.  The second 
 commit adds intro sort to the std::sort.  Inputs where partitions are unbalanced even after 2 log(n) pivots have been selected, the 
 algorithm switches to heap sort to avoid the possibility of spending O(n^2) time on sorting the input.  Benchmark results show that
 the intro sort implementation does significantly better.

Benchmarking results before this change.  Time represents the sorting time
required per element.

----------------------------------------------------------------------------------------------------------
Benchmark                                                                Time             CPU   Iterations
----------------------------------------------------------------------------------------------------------
BM_Sort_uint32_QuickSortAdversary_1                                   3.75 ns         3.74 ns    187432960
BM_Sort_uint32_QuickSortAdversary_4                                   3.05 ns         3.05 ns    231211008
BM_Sort_uint32_QuickSortAdversary_16                                  2.45 ns         2.45 ns    288096256
BM_Sort_uint32_QuickSortAdversary_64                                  32.8 ns         32.8 ns     21495808
BM_Sort_uint32_QuickSortAdversary_256                                  132 ns          132 ns      5505024
BM_Sort_uint32_QuickSortAdversary_1024                                 498 ns          497 ns      1572864
BM_Sort_uint32_QuickSortAdversary_16384                               3846 ns         3845 ns       262144
BM_Sort_uint32_QuickSortAdversary_262144                             61431 ns        61400 ns       262144
BM_Sort_uint64_QuickSortAdversary_1                                   3.93 ns         3.92 ns    181141504
BM_Sort_uint64_QuickSortAdversary_4                                   3.10 ns         3.09 ns    222560256
BM_Sort_uint64_QuickSortAdversary_16                                  2.50 ns         2.50 ns    283639808
BM_Sort_uint64_QuickSortAdversary_64                                  33.2 ns         33.2 ns     21757952
BM_Sort_uint64_QuickSortAdversary_256                                  132 ns          132 ns      5505024
BM_Sort_uint64_QuickSortAdversary_1024                                 478 ns          477 ns      1572864
BM_Sort_uint64_QuickSortAdversary_16384                               3932 ns         3930 ns       262144
BM_Sort_uint64_QuickSortAdversary_262144                             61646 ns        61615 ns       262144

Benchmarking results after this change

----------------------------------------------------------------------------------------------------------
Benchmark                                                                Time             CPU   Iterations
----------------------------------------------------------------------------------------------------------
BM_Sort_uint32_QuickSortAdversary_1                                   6.31 ns         6.30 ns    107741184
BM_Sort_uint32_QuickSortAdversary_4                                   4.51 ns         4.50 ns    158859264
BM_Sort_uint32_QuickSortAdversary_16                                  3.00 ns         3.00 ns    223608832
BM_Sort_uint32_QuickSortAdversary_64                                  44.8 ns         44.8 ns     15990784
BM_Sort_uint32_QuickSortAdversary_256                                 69.0 ns         68.9 ns      9961472
BM_Sort_uint32_QuickSortAdversary_1024                                 118 ns          118 ns      6029312
BM_Sort_uint32_QuickSortAdversary_16384                                175 ns          175 ns      4194304
BM_Sort_uint32_QuickSortAdversary_262144                               210 ns          210 ns      3407872
BM_Sort_uint64_QuickSortAdversary_1                                   6.75 ns         6.73 ns    103809024
BM_Sort_uint64_QuickSortAdversary_4                                   4.53 ns         4.53 ns    160432128
BM_Sort_uint64_QuickSortAdversary_16                                  2.98 ns         2.97 ns    234356736 
BM_Sort_uint64_QuickSortAdversary_64                                  44.3 ns         44.3 ns     15990784
BM_Sort_uint64_QuickSortAdversary_256                                 69.2 ns         69.2 ns     10223616
BM_Sort_uint64_QuickSortAdversary_1024                                 119 ns          119 ns      6029312
BM_Sort_uint64_QuickSortAdversary_16384                                173 ns          173 ns      4194304
BM_Sort_uint64_QuickSortAdversary_262144                               212 ns          212 ns      3407872

Diff Detail

Event Timeline

nilayvaish created this revision.Nov 8 2021, 8:48 AM

Fix compilation issues with cxx03

nilayvaish updated this revision to Diff 385648.Nov 8 2021, 3:32 PM

Updated the commit message.

nilayvaish edited the summary of this revision. (Show Details)Nov 8 2021, 3:33 PM
nilayvaish published this revision for review.Nov 8 2021, 3:40 PM
nilayvaish edited the summary of this revision. (Show Details)
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a reviewer: Restricted Project. · View Herald Transcript
ldionne requested changes to this revision.Nov 9 2021, 7:03 AM
ldionne added inline comments.
libcxx/include/__algorithm/sort.h
264–273

It seems to me like you reformatted a lot of this code, making the change itself a lot more difficult to understand. Please do the re-formatting either before or after your functional change, in a separate "no functionality change" patch.

This revision now requires changes to proceed.Nov 9 2021, 7:03 AM
nilayvaish updated this revision to Diff 385819.Nov 9 2021, 7:44 AM
nilayvaish edited the summary of this revision. (Show Details)

Moved the diffs due to formatting by arc to a separate commit.

nilayvaish updated this revision to Diff 385823.Nov 9 2021, 7:49 AM

Remove formatting diffs from the diff. It seems Phabricator is unable to show the diffs between different commits.

nilayvaish updated this revision to Diff 385824.Nov 9 2021, 7:50 AM

Another attempt at uploading changes without formatting diff.

nilayvaish updated this revision to Diff 385833.Nov 9 2021, 8:15 AM
nilayvaish marked an inline comment as done.

Move the depth check after the initial switch so that small sized sorts are not affected by the depth check.

nilayvaish edited the summary of this revision. (Show Details)Nov 9 2021, 8:17 AM
nilayvaish added inline comments.Nov 9 2021, 8:20 AM
libcxx/include/__algorithm/sort.h
264–273

I have dropped the formatting changes. Those were done by arc and I just accepted them.

ldionne accepted this revision.Nov 16 2021, 8:09 AM

Do you need someone to commit this for you? If so, please provide Author Name <email@domain> for attribution. Thanks!

libcxx/include/__algorithm/sort.h
265
This revision is now accepted and ready to land.Nov 16 2021, 8:09 AM

Do you need someone to commit this for you? If so, please provide Author Name <email@domain> for attribution. Thanks!

Yes, if you can commit this for me that would be great. You can use 'Nilay Vaish <nilayvaish@google.com>' for attribution. Thanks for all the help!

This revision was automatically updated to reflect the committed changes.

We are seeing build breakages on Fuchsia builders after this change. It complains definition of implicit copy constructor is deprecated because it has a user-provided copy assignment operator. Example of the failure can be found at https://ci.chromium.org/ui/p/fuchsia/builders/ci/clang_toolchain.ci.core.arm64-release/b8830264610876104385/overview . It is confirmed through bisection. Code that failed to build can be found at https://fuchsia.googlesource.com/third_party/flatbuffers/+/f22618113aea6d04ed10bcaf28cc3621eea146d2/include/flatbuffers/flatbuffers.h#1154

Would you mind taking a look? We can suppress the error on our end but we still haven't figure out why this change could trigger this error(warning).

Quuxplusone added inline comments.
libcxx/include/__algorithm/sort.h
477

This line needs to use _VSTD::__introsort (ADL-proofing — hmm, how did this pass robust_against_adl.pass.cpp? I should investigate).
Also, it needs to use _Comp_ref<_Compare> instead of making a copy of __comp.
@nilayvaish, please hunt through the rest of this patch to see if there are any other similar cases. (I did not look.)

I will investigate adding a new test along the lines of algorithms/robust_*.cpp to verify that we never accidentally copy comparators or predicates.

nilayvaish added inline comments.Nov 17 2021, 7:54 PM
libcxx/include/__algorithm/sort.h
477

Noted. Will send a diff later today.

libcxx/include/__algorithm/sort.h
477

Actually, nvm, I got nerdsniped into taking a look myself: D114133

ayzhao added a subscriber: ayzhao.May 13 2022, 2:03 PM

We see that this patch is causing the size of the Chrome Android APK to increase by ~120 KB. This is confirmed via bisection. A breakdown of the symbols contributing to the size increase can be found at https://chrome-supersize.firebaseapp.com/viewer.html?load_url=https%3A%2F%2Fstorage.googleapis.com%2Fchromium-binary-size-trybot-results%2Fandroid-binary-size%2F2022%2F05%2F10%2F1112911%2Fsupersize_diff.sizediff&group_by=template .

At this point in time, I'm not exactly sure why this patch would increase the size by such a large amount. Does anyone have any ideas?

Herald added a project: Restricted Project. · View Herald TranscriptMay 13 2022, 2:03 PM

We see that this patch is causing the size of the Chrome Android APK to increase by ~120 KB. This is confirmed via bisection. A breakdown of the symbols contributing to the size increase can be found at https://chrome-supersize.firebaseapp.com/viewer.html?load_url=https%3A%2F%2Fstorage.googleapis.com%2Fchromium-binary-size-trybot-results%2Fandroid-binary-size%2F2022%2F05%2F10%2F1112911%2Fsupersize_diff.sizediff&group_by=template .

At this point in time, I'm not exactly sure why this patch would increase the size by such a large amount. Does anyone have any ideas?

@From the symbol diffs on e.g. ng_grid_layout_algorithm.cc, it looks like we have lots of negative deltas for sort and larger positive deltas for introsort, which suggests that the difference is the cumulative effect of inlining many instances of __introsort with its slightly longer (len > 5) case in the new code.

Perhaps this code is being built with excessive inlining and needs -Os or similar? Perhaps Clang itself needs some inliner heuristic tuning? Perhaps this is an unavoidable size hit unless you want to take a speed hit?

We see that this patch is causing the size of the Chrome Android APK to increase by ~120 KB. This is confirmed via bisection. A breakdown of the symbols contributing to the size increase can be found at https://chrome-supersize.firebaseapp.com/viewer.html?load_url=https%3A%2F%2Fstorage.googleapis.com%2Fchromium-binary-size-trybot-results%2Fandroid-binary-size%2F2022%2F05%2F10%2F1112911%2Fsupersize_diff.sizediff&group_by=template .

At this point in time, I'm not exactly sure why this patch would increase the size by such a large amount. Does anyone have any ideas?

@From the symbol diffs on e.g. ng_grid_layout_algorithm.cc, it looks like we have lots of negative deltas for sort and larger positive deltas for introsort, which suggests that the difference is the cumulative effect of inlining many instances of __introsort with its slightly longer (len > 5) case in the new code.

Perhaps this code is being built with excessive inlining and needs -Os or similar? Perhaps Clang itself needs some inliner heuristic tuning? Perhaps this is an unavoidable size hit unless you want to take a speed hit?

For documentation purposes:

I chatted offline with pkasting@ and we found out that adding __attribute__((noinline)) to __introsort actually made the binary size even worse - it added an additional ~40KB.

re: binary size - std::sort-related symbols make up more than 1% of Chrome's binary size (for arm32 Android at least). Any work to reduce the size overhead of std::sort() would be very welcome. From what I can tell, the main contributing factor is the templated nature of the function. It gets stamped out a *lot* of times, and in a way that identical-code-folding is often not applicable.

glad to see this land. we had a very similar patch almost 5 years ago but got lost in the review process: https://reviews.llvm.org/D36423

re: binary size - std::sort-related symbols make up more than 1% of Chrome's binary size (for arm32 Android at least). Any work to reduce the size overhead of std::sort() would be very welcome. From what I can tell, the main contributing factor is the templated nature of the function. It gets stamped out a *lot* of times, and in a way that identical-code-folding is often not applicable.

Will it help to explicitly specialize some of the commonly used templates in Chrome?

re: binary size - std::sort-related symbols make up more than 1% of Chrome's binary size (for arm32 Android at least). Any work to reduce the size overhead of std::sort() would be very welcome. From what I can tell, the main contributing factor is the templated nature of the function. It gets stamped out a *lot* of times, and in a way that identical-code-folding is often not applicable.

Will it help to explicitly specialize some of the commonly used templates in Chrome?

We enable identical code folding, so I don't see how that would help.

If we could identify at compile time when sort is being called with aggregates / PODs and route all such calls through qsort(), I think that might help.

I'm working on upgrading Meta's Android apps from libc++ 13 to libc++ 15, and seeing large size regressions caused by this change (similar to the Chrome case above). Was there any progress on figuring out how to offset the binary size increase?

(I recognize it's pretty annoying on my part to be raising the issue 9 months after the code was committed, and I apologize for that. We're working on being able to catch these issues earlier in the future.)