Bitset Sort
Bitset Sort is a variant of quick sort, specifically BlockQuickSort. Bitset Sort uses a carefully written partition function to let the compiler generates SIMD instructions without actually writing SIMD intrinsics in the loop.
Bitset Sort is interface compatible with std::sort and meant to replace std::sort in libc++.
Bitset Sort is 3.4x faster (or spends 71% less time) than libc++ std::sort when sorting uint64s and 1.58x faster (or spends 37% less time) when sorting std::string.
Bitset Sort uses multiple techniques to improve runtime performance of sort. This includes sorting networks, a variant of merge sort called Bitonic Order Merge Sort that is faster for small N, and pattern recognitions.
Please see Github page for more detailed documentations and full results.