This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Refactor deque::iterator algorithm optimizations
ClosedPublic

Authored by philnik on Aug 23 2022, 3:00 PM.

Details

Summary

This has multiple benefits:

  • The optimizations are also performed for the ranges:: versions of the algorithms
  • Code duplication is reduced
  • it is simpler to add this optimization for other segmented iterators, like ranges::join_view::iterator
  • Algorithm code is removed from <deque>

Diff Detail

Event Timeline

philnik created this revision.Aug 23 2022, 3:00 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 23 2022, 3:00 PM
Herald added a subscriber: mgorny. · View Herald Transcript
philnik requested review of this revision.Aug 23 2022, 3:00 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 23 2022, 3:00 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
philnik edited the summary of this revision. (Show Details)Aug 23 2022, 3:10 PM
philnik edited the summary of this revision. (Show Details)
philnik updated this revision to Diff 454984.Aug 23 2022, 3:13 PM
  • Generate files
philnik updated this revision to Diff 455151.Aug 24 2022, 4:04 AM
  • Try to fix CI

I found this approach "having a boolean + some operations" less clearer than defining a concept (or pseudo concept with constexpr bool + sfinae) with several customization points.

libcxx/include/__algorithm/copy.h
42

sentinel is always copyable.
this applies for other places

60

+= isn't required as part of your "concept" of segmented iterator

72

As we get more and more optimisations for different types, it is harder to make sure all of these overloads are mutually exclusive. Do you think this is (or will be) a problem?

libcxx/include/__iterator/segmented_iterator.h
18–26

It might be a good idea to put these functions in the primary template __segmented_iterator_traits and =delete them. This way it is more self documentary. What do you think?

huixie90 requested changes to this revision.Aug 25 2022, 6:06 AM
This revision now requires changes to proceed.Aug 25 2022, 6:06 AM
philnik marked an inline comment as done.Aug 25 2022, 7:17 AM
philnik added inline comments.
libcxx/include/__algorithm/copy.h
42

Since this is pre-existing I'd rather fix it in a follow-up. This patch is already quite large.

72

I think this is a problem. But I don't really have a good idea how to fix it. Using if constexpr would probably do the job, but we don't have that option.

libcxx/include/__iterator/segmented_iterator.h
18–26

I didn't define the return type more specifically here on purpose, so what should I return from the deleted functions?

huixie90 added inline comments.Aug 25 2022, 7:36 AM
libcxx/include/__algorithm/copy.h
42

sounds good to me

72

I feel that someone is going to promote his priority_tag thing

libcxx/include/__iterator/segmented_iterator.h
18–26

perhaps just return auto?

ldionne requested changes to this revision.Sep 8 2022, 10:49 AM

I really like this patch in spirit! It should be rebased onto main, though.

libcxx/include/__algorithm/copy.h
59

It seems to me that what you really need here is some kind of __copy_up_to primitive, where we would copy [__first, as-far-as-we-can-go) to [__segment.__begin_, segment-end). Then you would not need to know the size of the input segment, and this optimization would apply to non-random-access iterators as well.

It might not be worth trying to optimize for non random-access input iterators anyway though, since we will end up having to loop over each element regardless inside __copy_up_to, which is not much better than what we'd do without applying any segmented optimization at all.

libcxx/include/__iterator/segmented_iterator.h
13–27

I think your definition of segmented iterators is an excellent starting point, however we could make the following changes based on prior art (http://lafstern.org/matt/segmented.pdf):

// Segmented iterators are iterators over (not necessarily contiguous) sub-ranges.
//
// For example, std::deque stores its data into multiple blocks of contiguous memory,
// which are not stored contiguously themselves. The concept of segmented iterators
// allows algorithms to operate over these multi-level iterators natively, opening the
// door to various optimizations. See http://lafstern.org/matt/segmented.pdf for details.
//
// Given Traits = __segmented_iterator_traits<It>, if Traits::__is_segmented_iterator::value
// is true, the following functions and associated types must be provided:
// - Traits::__local_iterator
//    The type of iterators used to iterate inside a segment.
//
// - Traits::__segment_iterator
//    The type of iterators used to iterate over segments.
//    Segment iterators can be forward iterators or bidirectional iterators, depending on the
//    underlying data structure.
//
// - static __segment_iterator Traits::__segment(It __it)
//   Returns an iterator to the segment that the provided iterator is in.
//
// - static __local_iterator Traits::__local(It __it)
//   Returns the local iterator pointing to the element that the provided iterator points to.
//
// - static __local_iterator Traits::__begin(__segment_iterator __it)
//   Returns the local iterator to the beginning of the segment that the provided iterator is pointing into.
//
// - static __local_iterator Traits::__end(__segment_iterator __it)
//   Returns the one-past-the-end local iterator to the segment that the provided iterator is pointing into.

Then, you can use it as:

void f(__deque_iterator<T> __first, __deque_iterator<T> __last) {
    using _Traits = __segmented_iterator_traits<__deque_iterator<T> >;

    Traits::__segment_iterator __sfirst = _Traits::__segment(__first);
    Traits::__segment_iterator __slast = _Traits::__segment(__last);
    use(Traits::__local(__first), Traits::__end(__sfirst)); // special case the first block
    for (++__sfirst; __sfirst != __slast; ++__sfirst) {
      use(Traits::__begin(__sfirst), Traits::__end(__sfirst));
    }
    use(Traits::__begin(__slast), Traits::__local(__last));
}

At a glance, I think we would implement the traits for deque something like:

template <class T>
struct __segmented_iterator_traits<__deque_iterator<T> > {
  struct __segment_iterator {
    // fun stuff
  };

  using __local_iterator = T*;

  static __segment_iterator __segment(__deque_iterator<T> __it) {
    return __segment_iterator(__it.__m_iter_);
  }

  static __local_iterator __local(__deque_iterator<T> __it) {
    return __it.__ptr_;
  }

  static __local_iterator __begin(__segment_iterator __it) {
    return *__it.__m_iter_;
  }

  static __local_iterator __end(__segment_iterator __it) {
    return *__it.__m_iter_ + BLOCK_SIZE;
  }
};
libcxx/test/std/algorithms/alg.modifying.operations/alg.move/ranges.move.pass.cpp
82

Also fix ranges::move with contiguous iterators and non-iterator sentinels as a drive-by.

I would really prefer doing this separately. If @var-const 's patch does that already, can you simply double-check that your added tests pass now? Basically I want to make sure that you don't lose test coverage when you rebase on top of his change.

EricWF added a subscriber: EricWF.Sep 8 2022, 12:19 PM

My understanding of these algorithms is that they were hand-tuned to produce optimizer friendly (at the time) codegen.

How are we verifying the performance impact of this change? Or is that not a concern?

My understanding of these algorithms is that they were hand-tuned to produce optimizer friendly (at the time) codegen.

How are we verifying the performance impact of this change? Or is that not a concern?

Performance is clearly a concern, and in fact it's the whole purpose of this part of the codebase. I think we should take a look at the codegen before and after. We can also add a simple benchmark in our benchmark infrastructure.

That being said, this is a fairly macro optimization, not a micro optimization, so I'm less concerned about the exact codegen (not that we should ignore it). The real micro optimization happens inside the non-segmented version of std::__copy, where we use memmove if we can.

philnik updated this revision to Diff 460406.Sep 15 2022, 7:42 AM
philnik marked 4 inline comments as done.

Address comments

philnik marked 4 inline comments as done.Sep 15 2022, 7:46 AM

Here are the benchmarks for the current patch:

-----------------------------------------------------------------------
Benchmark                                             old           new
-----------------------------------------------------------------------
BM_deque_vector_copy/0                           0.272 ns       1.58 ns
BM_deque_vector_copy/1                            3.40 ns       1.85 ns
BM_deque_vector_copy/2                            3.30 ns       1.72 ns
BM_deque_vector_copy/64                           4.24 ns       2.90 ns
BM_deque_vector_copy/512                          18.6 ns       17.2 ns
BM_deque_vector_copy/1024                         41.3 ns       40.1 ns
BM_deque_vector_copy/4000                          145 ns        127 ns
BM_deque_vector_copy/4096                          148 ns        132 ns
BM_deque_vector_copy/5500                          196 ns        169 ns
BM_deque_vector_copy/64000                        5518 ns       5554 ns
BM_deque_vector_copy/65536                        5665 ns       5680 ns
BM_deque_vector_copy/70000                        5852 ns       5772 ns
BM_deque_vector_ranges_copy/0                    0.265 ns       1.58 ns
BM_deque_vector_ranges_copy/1                     1.30 ns       1.87 ns
BM_deque_vector_ranges_copy/2                     1.60 ns       1.72 ns
BM_deque_vector_ranges_copy/64                    23.5 ns       2.90 ns
BM_deque_vector_ranges_copy/512                    188 ns       16.9 ns
BM_deque_vector_ranges_copy/1024                   370 ns       39.5 ns
BM_deque_vector_ranges_copy/4000                  1437 ns        128 ns
BM_deque_vector_ranges_copy/4096                  1474 ns        128 ns
BM_deque_vector_ranges_copy/5500                  1990 ns        170 ns
BM_deque_vector_ranges_copy/64000                23213 ns       5503 ns
BM_deque_vector_ranges_copy/65536                23703 ns       5436 ns
BM_deque_vector_ranges_copy/70000                25287 ns       5897 ns
BM_deque_deque_copy/0                             1.06 ns       7.18 ns
BM_deque_deque_copy/1                             5.77 ns       12.8 ns
BM_deque_deque_copy/2                             5.63 ns       13.1 ns
BM_deque_deque_copy/64                            6.42 ns       15.2 ns
BM_deque_deque_copy/512                           21.4 ns       29.1 ns
BM_deque_deque_copy/1024                          43.0 ns       56.9 ns
BM_deque_deque_copy/4000                           114 ns        116 ns
BM_deque_deque_copy/4096                           171 ns        177 ns
BM_deque_deque_copy/5500                           236 ns        225 ns
BM_deque_deque_copy/64000                         5387 ns       5431 ns
BM_deque_deque_copy/65536                         5552 ns       5560 ns
BM_deque_deque_copy/70000                         5882 ns       5941 ns
BM_deque_deque_ranges_copy/0                     0.793 ns       7.12 ns
BM_deque_deque_ranges_copy/1                      1.85 ns       12.8 ns
BM_deque_deque_ranges_copy/2                      2.38 ns       13.0 ns
BM_deque_deque_ranges_copy/64                     44.4 ns       15.2 ns
BM_deque_deque_ranges_copy/512                     281 ns       29.1 ns
BM_deque_deque_ranges_copy/1024                    555 ns       56.9 ns
BM_deque_deque_ranges_copy/4000                   2155 ns        115 ns
BM_deque_deque_ranges_copy/4096                   2217 ns        177 ns
BM_deque_deque_ranges_copy/5500                   2977 ns        226 ns
BM_deque_deque_ranges_copy/64000                 34584 ns       5432 ns
BM_deque_deque_ranges_copy/65536                 35419 ns       5572 ns
BM_deque_deque_ranges_copy/70000                 37847 ns       5972 ns
BM_vector_deque_copy/0                           0.585 ns      0.529 ns
BM_vector_deque_copy/1                            2.98 ns       2.73 ns
BM_vector_deque_copy/2                            2.79 ns       2.62 ns
BM_vector_deque_copy/64                           3.70 ns       3.44 ns
BM_vector_deque_copy/512                          13.5 ns       13.6 ns
BM_vector_deque_copy/1024                         40.0 ns       40.1 ns
BM_vector_deque_copy/4000                          145 ns        144 ns
BM_vector_deque_copy/4096                          147 ns        147 ns
BM_vector_deque_copy/5500                          199 ns        201 ns
BM_vector_deque_copy/64000                        5452 ns       5462 ns
BM_vector_deque_copy/65536                        5718 ns       5691 ns
BM_vector_deque_copy/70000                        5985 ns       5988 ns
BM_vector_deque_ranges_copy/0                    0.529 ns      0.528 ns
BM_vector_deque_ranges_copy/1                     1.06 ns       2.69 ns
BM_vector_deque_ranges_copy/2                     1.46 ns       2.65 ns
BM_vector_deque_ranges_copy/64                    23.2 ns       3.51 ns
BM_vector_deque_ranges_copy/512                    187 ns       14.0 ns
BM_vector_deque_ranges_copy/1024                   369 ns       40.2 ns
BM_vector_deque_ranges_copy/4000                  1440 ns        145 ns
BM_vector_deque_ranges_copy/4096                  1474 ns        146 ns
BM_vector_deque_ranges_copy/5500                  1979 ns        202 ns
BM_vector_deque_ranges_copy/64000                22995 ns       5512 ns
BM_vector_deque_ranges_copy/65536                23549 ns       5695 ns
BM_vector_deque_ranges_copy/70000                25182 ns       6064 ns
BM_deque_vector_move/0                           0.266 ns       1.61 ns
BM_deque_vector_move/1                            3.26 ns       1.91 ns
BM_deque_vector_move/2                            3.23 ns       1.73 ns
BM_deque_vector_move/64                           4.24 ns       2.90 ns
BM_deque_vector_move/512                          14.1 ns       13.5 ns
BM_deque_vector_move/1024                         40.4 ns       39.5 ns
BM_deque_vector_move/4000                          144 ns        128 ns
BM_deque_vector_move/4096                          144 ns        128 ns
BM_deque_vector_move/5500                          200 ns        170 ns
BM_deque_vector_move/64000                        5482 ns       5481 ns
BM_deque_vector_move/65536                        5436 ns       5397 ns
BM_deque_vector_move/70000                        5846 ns       5795 ns
BM_deque_vector_ranges_move/0                    0.264 ns       1.58 ns
BM_deque_vector_ranges_move/1                     1.32 ns       1.85 ns
BM_deque_vector_ranges_move/2                     1.59 ns       1.72 ns
BM_deque_vector_ranges_move/64                    35.3 ns       2.90 ns
BM_deque_vector_ranges_move/512                    202 ns       13.5 ns
BM_deque_vector_ranges_move/1024                   395 ns       39.5 ns
BM_deque_vector_ranges_move/4000                  1548 ns        128 ns
BM_deque_vector_ranges_move/4096                  1559 ns        128 ns
BM_deque_vector_ranges_move/5500                  2133 ns        170 ns
BM_deque_vector_ranges_move/64000                24609 ns       5478 ns
BM_deque_vector_ranges_move/65536                25182 ns       5396 ns
BM_deque_vector_ranges_move/70000                27414 ns       5783 ns
BM_deque_deque_move/0                             1.06 ns       7.12 ns
BM_deque_deque_move/1                             5.67 ns       12.7 ns
BM_deque_deque_move/2                             5.54 ns       12.9 ns
BM_deque_deque_move/64                            6.37 ns       15.1 ns
BM_deque_deque_move/512                           21.3 ns       28.8 ns
BM_deque_deque_move/1024                          43.0 ns       56.8 ns
BM_deque_deque_move/4000                           112 ns        116 ns
BM_deque_deque_move/4096                           171 ns        177 ns
BM_deque_deque_move/5500                           233 ns        228 ns
BM_deque_deque_move/64000                         5378 ns       5442 ns
BM_deque_deque_move/65536                         5546 ns       5578 ns
BM_deque_deque_move/70000                         5877 ns       5932 ns
BM_deque_deque_ranges_move/0                     0.792 ns       7.12 ns
BM_deque_deque_ranges_move/1                      1.85 ns       12.7 ns
BM_deque_deque_ranges_move/2                      2.38 ns       13.0 ns
BM_deque_deque_ranges_move/64                     43.9 ns       15.1 ns
BM_deque_deque_ranges_move/512                     281 ns       28.8 ns
BM_deque_deque_ranges_move/1024                    560 ns       56.8 ns
BM_deque_deque_ranges_move/4000                   2171 ns        115 ns
BM_deque_deque_ranges_move/4096                   2245 ns        177 ns
BM_deque_deque_ranges_move/5500                   3013 ns        225 ns
BM_deque_deque_ranges_move/64000                 35085 ns       5429 ns
BM_deque_deque_ranges_move/65536                 35939 ns       5560 ns
BM_deque_deque_ranges_move/70000                 38388 ns       5935 ns
BM_vector_deque_move/0                           0.597 ns      0.534 ns
BM_vector_deque_move/1                            2.83 ns       2.71 ns
BM_vector_deque_move/2                            2.70 ns       2.60 ns
BM_vector_deque_move/64                           3.68 ns       3.44 ns
BM_vector_deque_move/512                          13.5 ns       13.6 ns
BM_vector_deque_move/1024                         39.9 ns       40.1 ns
BM_vector_deque_move/4000                          145 ns        144 ns
BM_vector_deque_move/4096                          146 ns        146 ns
BM_vector_deque_move/5500                          200 ns        200 ns
BM_vector_deque_move/64000                        5454 ns       5460 ns
BM_vector_deque_move/65536                        5722 ns       5686 ns
BM_vector_deque_move/70000                        5986 ns       5984 ns
BM_vector_deque_ranges_move/0                    0.539 ns      0.528 ns
BM_vector_deque_ranges_move/1                     1.06 ns       2.71 ns
BM_vector_deque_ranges_move/2                     1.47 ns       2.58 ns
BM_vector_deque_ranges_move/64                    24.0 ns       3.44 ns
BM_vector_deque_ranges_move/512                    189 ns       13.6 ns
BM_vector_deque_ranges_move/1024                   375 ns       40.1 ns
BM_vector_deque_ranges_move/4000                  1436 ns        144 ns
BM_vector_deque_ranges_move/4096                  1472 ns        146 ns
BM_vector_deque_ranges_move/5500                  1977 ns        200 ns
BM_vector_deque_ranges_move/64000                22981 ns       5466 ns
BM_vector_deque_ranges_move/65536                23577 ns       5688 ns
BM_vector_deque_ranges_move/70000                25131 ns       5985 ns
BM_deque_vector_copy_backward/0                  0.264 ns       1.58 ns
BM_deque_vector_copy_backward/1                   2.96 ns       1.86 ns
BM_deque_vector_copy_backward/2                   3.55 ns       1.72 ns
BM_deque_vector_copy_backward/64                  4.49 ns       2.93 ns
BM_deque_vector_copy_backward/512                 16.1 ns       13.2 ns
BM_deque_vector_copy_backward/1024                41.1 ns       40.0 ns
BM_deque_vector_copy_backward/4000                 151 ns        126 ns
BM_deque_vector_copy_backward/4096                 145 ns        127 ns
BM_deque_vector_copy_backward/5500                 211 ns        170 ns
BM_deque_vector_copy_backward/64000               5471 ns       5506 ns
BM_deque_vector_copy_backward/65536               5439 ns       5415 ns
BM_deque_vector_copy_backward/70000               5838 ns       5786 ns
BM_deque_vector_ranges_copy_backward/0           0.264 ns       1.58 ns
BM_deque_vector_ranges_copy_backward/1            1.17 ns       1.85 ns
BM_deque_vector_ranges_copy_backward/2            1.45 ns       1.73 ns
BM_deque_vector_ranges_copy_backward/64           26.0 ns       3.11 ns
BM_deque_vector_ranges_copy_backward/512           147 ns       13.3 ns
BM_deque_vector_ranges_copy_backward/1024          282 ns       40.1 ns
BM_deque_vector_ranges_copy_backward/4000         1103 ns        127 ns
BM_deque_vector_ranges_copy_backward/4096         1131 ns        127 ns
BM_deque_vector_ranges_copy_backward/5500         1514 ns        170 ns
BM_deque_vector_ranges_copy_backward/6400        17553 ns       5515 ns
BM_deque_vector_ranges_copy_backward/6553        17944 ns       5415 ns
BM_deque_vector_ranges_copy_backward/7000        19183 ns       5784 ns
BM_deque_deque_copy_backward/0                    1.16 ns       1.32 ns
BM_deque_deque_copy_backward/1                    6.58 ns       3.17 ns
BM_deque_deque_copy_backward/2                    6.87 ns       3.17 ns
BM_deque_deque_copy_backward/64                   7.77 ns       4.19 ns
BM_deque_deque_copy_backward/512                  24.3 ns       19.2 ns
BM_deque_deque_copy_backward/1024                 46.2 ns       43.5 ns
BM_deque_deque_copy_backward/4000                  121 ns        101 ns
BM_deque_deque_copy_backward/4096                  179 ns        163 ns
BM_deque_deque_copy_backward/5500                  247 ns        216 ns
BM_deque_deque_copy_backward/64000                5362 ns       5408 ns
BM_deque_deque_copy_backward/65536                5474 ns       5647 ns
BM_deque_deque_copy_backward/70000                5856 ns       5948 ns
BM_deque_deque_ranges_copy_backward/0            0.792 ns       1.33 ns
BM_deque_deque_ranges_copy_backward/1             2.04 ns       3.17 ns
BM_deque_deque_ranges_copy_backward/2             2.93 ns       3.17 ns
BM_deque_deque_ranges_copy_backward/64            56.0 ns       4.19 ns
BM_deque_deque_ranges_copy_backward/512            372 ns       19.2 ns
BM_deque_deque_ranges_copy_backward/1024           715 ns       43.5 ns
BM_deque_deque_ranges_copy_backward/4000          2839 ns        101 ns
BM_deque_deque_ranges_copy_backward/4096          2861 ns        163 ns
BM_deque_deque_ranges_copy_backward/5500          3850 ns        217 ns
BM_deque_deque_ranges_copy_backward/64000        42909 ns       5404 ns
BM_deque_deque_ranges_copy_backward/65536        44236 ns       5572 ns
BM_deque_deque_ranges_copy_backward/70000        47484 ns       5997 ns
BM_vector_deque_copy_backward/0                  0.597 ns      0.532 ns
BM_vector_deque_copy_backward/1                   4.17 ns       2.15 ns
BM_vector_deque_copy_backward/2                   3.83 ns       2.03 ns
BM_vector_deque_copy_backward/64                  4.99 ns       3.49 ns
BM_vector_deque_copy_backward/512                 18.1 ns       14.0 ns
BM_vector_deque_copy_backward/1024                43.6 ns       41.1 ns
BM_vector_deque_copy_backward/4000                 160 ns        135 ns
BM_vector_deque_copy_backward/4096                 160 ns        138 ns
BM_vector_deque_copy_backward/5500                 225 ns        180 ns
BM_vector_deque_copy_backward/64000               5458 ns       5435 ns
BM_vector_deque_copy_backward/65536               5648 ns       5652 ns
BM_vector_deque_copy_backward/70000               6021 ns       6026 ns
BM_vector_deque_ranges_copy_backward/0           0.529 ns      0.529 ns
BM_vector_deque_ranges_copy_backward/1            1.06 ns       2.11 ns
BM_vector_deque_ranges_copy_backward/2            1.35 ns       1.98 ns
BM_vector_deque_ranges_copy_backward/64           25.4 ns       3.43 ns
BM_vector_deque_ranges_copy_backward/512           166 ns       13.8 ns
BM_vector_deque_ranges_copy_backward/1024          286 ns       41.0 ns
BM_vector_deque_ranges_copy_backward/4000         1149 ns        134 ns
BM_vector_deque_ranges_copy_backward/4096         1138 ns        138 ns
BM_vector_deque_ranges_copy_backward/5500         1536 ns        180 ns
BM_vector_deque_ranges_copy_backward/6400        17771 ns       5435 ns
BM_vector_deque_ranges_copy_backward/6553        18343 ns       5653 ns
BM_vector_deque_ranges_copy_backward/7000        19422 ns       6024 ns
BM_deque_vector_move_backward/0                  0.271 ns       1.58 ns
BM_deque_vector_move_backward/1                   2.91 ns       1.85 ns
BM_deque_vector_move_backward/2                   3.51 ns       1.72 ns
BM_deque_vector_move_backward/64                  4.49 ns       2.94 ns
BM_deque_vector_move_backward/512                 15.8 ns       13.3 ns
BM_deque_vector_move_backward/1024                41.2 ns       40.0 ns
BM_deque_vector_move_backward/4000                 147 ns        126 ns
BM_deque_vector_move_backward/4096                 145 ns        127 ns
BM_deque_vector_move_backward/5500                 207 ns        170 ns
BM_deque_vector_move_backward/64000               5465 ns       5512 ns
BM_deque_vector_move_backward/65536               5435 ns       5414 ns
BM_deque_vector_move_backward/70000               5835 ns       5788 ns
BM_deque_vector_ranges_move_backward/0           0.264 ns       1.58 ns
BM_deque_vector_ranges_move_backward/1            1.17 ns       1.85 ns
BM_deque_vector_ranges_move_backward/2            1.45 ns       1.72 ns
BM_deque_vector_ranges_move_backward/64           23.2 ns       3.11 ns
BM_deque_vector_ranges_move_backward/512           147 ns       13.3 ns
BM_deque_vector_ranges_move_backward/1024          281 ns       39.9 ns
BM_deque_vector_ranges_move_backward/4000         1097 ns        126 ns
BM_deque_vector_ranges_move_backward/4096         1122 ns        127 ns
BM_deque_vector_ranges_move_backward/5500         1514 ns        170 ns
BM_deque_vector_ranges_move_backward/6400        17551 ns       5502 ns
BM_deque_vector_ranges_move_backward/6553        17944 ns       5415 ns
BM_deque_vector_ranges_move_backward/7000        19183 ns       5787 ns
BM_deque_deque_move_backward/0                    1.17 ns       1.32 ns
BM_deque_deque_move_backward/1                    6.60 ns       3.17 ns
BM_deque_deque_move_backward/2                    6.87 ns       3.17 ns
BM_deque_deque_move_backward/64                   7.78 ns       4.19 ns
BM_deque_deque_move_backward/512                  24.2 ns       19.2 ns
BM_deque_deque_move_backward/1024                 46.2 ns       43.5 ns
BM_deque_deque_move_backward/4000                  121 ns        101 ns
BM_deque_deque_move_backward/4096                  179 ns        163 ns
BM_deque_deque_move_backward/5500                  247 ns        216 ns
BM_deque_deque_move_backward/64000                5361 ns       5401 ns
BM_deque_deque_move_backward/65536                5465 ns       5531 ns
BM_deque_deque_move_backward/70000                5845 ns       5942 ns
BM_deque_deque_ranges_move_backward/0            0.791 ns       1.32 ns
BM_deque_deque_ranges_move_backward/1             2.04 ns       3.17 ns
BM_deque_deque_ranges_move_backward/2             2.93 ns       3.17 ns
BM_deque_deque_ranges_move_backward/64            55.7 ns       4.19 ns
BM_deque_deque_ranges_move_backward/512            351 ns       19.2 ns
BM_deque_deque_ranges_move_backward/1024           689 ns       43.5 ns
BM_deque_deque_ranges_move_backward/4000          2685 ns        102 ns
BM_deque_deque_ranges_move_backward/4096          2743 ns        163 ns
BM_deque_deque_ranges_move_backward/5500          3698 ns        215 ns
BM_deque_deque_ranges_move_backward/64000        42808 ns       5394 ns
BM_deque_deque_ranges_move_backward/65536        43858 ns       5531 ns
BM_deque_deque_ranges_move_backward/70000        46853 ns       5941 ns
BM_vector_deque_move_backward/0                  0.621 ns      0.528 ns
BM_vector_deque_move_backward/1                   4.17 ns       2.11 ns
BM_vector_deque_move_backward/2                   3.84 ns       1.98 ns
BM_vector_deque_move_backward/64                  4.99 ns       3.43 ns
BM_vector_deque_move_backward/512                 18.1 ns       13.8 ns
BM_vector_deque_move_backward/1024                43.6 ns       41.0 ns
BM_vector_deque_move_backward/4000                 160 ns        134 ns
BM_vector_deque_move_backward/4096                 160 ns        138 ns
BM_vector_deque_move_backward/5500                 225 ns        180 ns
BM_vector_deque_move_backward/64000               5457 ns       5433 ns
BM_vector_deque_move_backward/65536               5646 ns       5650 ns
BM_vector_deque_move_backward/70000               6020 ns       6027 ns
BM_vector_deque_ranges_move_backward/0           0.536 ns      0.529 ns
BM_vector_deque_ranges_move_backward/1            1.06 ns       2.11 ns
BM_vector_deque_ranges_move_backward/2            1.33 ns       1.98 ns
BM_vector_deque_ranges_move_backward/64           28.9 ns       3.43 ns
BM_vector_deque_ranges_move_backward/512           160 ns       13.8 ns
BM_vector_deque_ranges_move_backward/1024          286 ns       41.0 ns
BM_vector_deque_ranges_move_backward/4000         1197 ns        134 ns
BM_vector_deque_ranges_move_backward/4096         1138 ns        138 ns
BM_vector_deque_ranges_move_backward/5500         1552 ns        180 ns
BM_vector_deque_ranges_move_backward/6400        17834 ns       5432 ns
BM_vector_deque_ranges_move_backward/6553        18351 ns       5654 ns
BM_vector_deque_ranges_move_backward/7000        19473 ns       6028 ns
libcxx/include/__algorithm/copy.h
42

Doesn't apply anymore, since @var-const refactored it.

72

Also doesn't apply anymore.

ldionne requested changes to this revision.Sep 15 2022, 9:04 AM

I think this is a really really nice generalization. It's super elegant how the segmented iterator traits bridge perfectly with our existing deque iterator.

The speedup this provides for ranges is extremely valuable too, and it opens the door to doing the same for other segmented iterators (maybe join_view)?

libcxx/benchmarks/CMakeLists.txt
171

Not tied to this file: Let's add a release note. This is an interesting optimization and it will yield large performance benefits for all segmented iterators, so it's worth advertising.

libcxx/benchmarks/deque_iterator.bench.cpp
1

I would be curious to try adding _LIBCPP_ALWAYS_INLINE to __unwrap_and_dispatch (and possibly other implementation details in that code path) to see what impact it has on the benchmarks. Could you try that out and report? I would assume that for raw pointers, that should all end up being inlined away.

libcxx/include/__algorithm/copy_backward.h
49

Just a matter of personal preference, but I find these names quite long (__first_segment, __last_segment). IMO it would be a bit easier to read with shorter names because of how the lines would be wrapped. I'd suggest __sfirst and __slast, which I think is idiomatic to segmented iterators based on the paper we looked at.

91

Here and in a few other places, a using _DiffT = typename common_type<__iter_diff_t<_InIter>, __iter_diff_t<_OutIter> >::type; would increase readability.

93
103–115

I think you can pretty easily remove the special casing for the last segment by doing something like this. I'm not saying it's the best way to write it, but it should be possible to remove the special case.

Also, let's use {} on the if since it spans multiple lines.

libcxx/include/__iterator/segmented_iterator.h
30

Indentation here and below doesn't match above.

libcxx/test/std/algorithms/alg.modifying.operations/alg.copy/ranges.copy.pass.cpp
60

The indentation is crazy here.

This revision now requires changes to proceed.Sep 15 2022, 9:04 AM
philnik updated this revision to Diff 461004.Sep 17 2022, 7:06 AM
philnik marked 11 inline comments as done.Sep 17 2022, 7:06 AM
  • Address comments

For completeness, here are the updated numbers:

--------------------------------------------------------------------
Benchmark                                         old            new
--------------------------------------------------------------------
BM_deque_vector_copy/0                       0.272 ns        1.58 ns
BM_deque_vector_copy/1                        3.40 ns        1.85 ns
BM_deque_vector_copy/2                        3.30 ns        1.72 ns
BM_deque_vector_copy/64                       4.24 ns        2.90 ns
BM_deque_vector_copy/512                      18.6 ns        26.1 ns
BM_deque_vector_copy/1024                     41.3 ns        40.1 ns
BM_deque_vector_copy/4000                      145 ns         127 ns
BM_deque_vector_copy/4096                      148 ns         131 ns
BM_deque_vector_copy/5500                      196 ns         167 ns
BM_deque_vector_copy/64000                    5518 ns        5647 ns
BM_deque_vector_copy/65536                    5665 ns        5674 ns
BM_deque_vector_copy/70000                    5852 ns        5852 ns
BM_deque_vector_ranges_copy/0                0.265 ns        1.58 ns
BM_deque_vector_ranges_copy/1                 1.30 ns        1.85 ns
BM_deque_vector_ranges_copy/2                 1.60 ns        1.73 ns
BM_deque_vector_ranges_copy/64                23.5 ns        2.90 ns
BM_deque_vector_ranges_copy/512                188 ns        16.9 ns
BM_deque_vector_ranges_copy/1024               370 ns        39.5 ns
BM_deque_vector_ranges_copy/4000              1437 ns         128 ns
BM_deque_vector_ranges_copy/4096              1474 ns         128 ns
BM_deque_vector_ranges_copy/5500              1990 ns         171 ns
BM_deque_vector_ranges_copy/64000            23213 ns        5533 ns
BM_deque_vector_ranges_copy/65536            23703 ns        5469 ns
BM_deque_vector_ranges_copy/70000            25287 ns        5884 ns
BM_deque_deque_copy/0                         1.06 ns        1.24 ns
BM_deque_deque_copy/1                         5.77 ns        3.11 ns
BM_deque_deque_copy/2                         5.63 ns        2.91 ns
BM_deque_deque_copy/64                        6.42 ns        4.03 ns
BM_deque_deque_copy/512                       21.4 ns        18.8 ns
BM_deque_deque_copy/1024                      43.0 ns        41.7 ns
BM_deque_deque_copy/4000                       114 ns        97.6 ns
BM_deque_deque_copy/4096                       171 ns         156 ns
BM_deque_deque_copy/5500                       236 ns         204 ns
BM_deque_deque_copy/64000                     5387 ns        5406 ns
BM_deque_deque_copy/65536                     5552 ns        5566 ns
BM_deque_deque_copy/70000                     5882 ns        5907 ns
BM_deque_deque_ranges_copy/0                 0.793 ns        1.21 ns
BM_deque_deque_ranges_copy/1                  1.85 ns        3.12 ns
BM_deque_deque_ranges_copy/2                  2.38 ns        2.90 ns
BM_deque_deque_ranges_copy/64                 44.4 ns        4.03 ns
BM_deque_deque_ranges_copy/512                 281 ns        18.7 ns
BM_deque_deque_ranges_copy/1024                555 ns        41.8 ns
BM_deque_deque_ranges_copy/4000               2155 ns        97.8 ns
BM_deque_deque_ranges_copy/4096               2217 ns         156 ns
BM_deque_deque_ranges_copy/5500               2977 ns         206 ns
BM_deque_deque_ranges_copy/64000             34584 ns        5412 ns
BM_deque_deque_ranges_copy/65536             35419 ns        5582 ns
BM_deque_deque_ranges_copy/70000             37847 ns        5912 ns
BM_vector_deque_copy/0                       0.585 ns       0.528 ns
BM_vector_deque_copy/1                        2.98 ns        2.11 ns
BM_vector_deque_copy/2                        2.79 ns        1.98 ns
BM_vector_deque_copy/64                       3.70 ns        3.44 ns
BM_vector_deque_copy/512                      13.5 ns        13.4 ns
BM_vector_deque_copy/1024                     40.0 ns        39.2 ns
BM_vector_deque_copy/4000                      145 ns         130 ns
BM_vector_deque_copy/4096                      147 ns         135 ns
BM_vector_deque_copy/5500                      199 ns         182 ns
BM_vector_deque_copy/64000                    5452 ns        5424 ns
BM_vector_deque_copy/65536                    5718 ns        5698 ns
BM_vector_deque_copy/70000                    5985 ns        5992 ns
BM_vector_deque_ranges_copy/0                0.529 ns       0.531 ns
BM_vector_deque_ranges_copy/1                 1.06 ns        2.11 ns
BM_vector_deque_ranges_copy/2                 1.46 ns        1.98 ns
BM_vector_deque_ranges_copy/64                23.2 ns        3.44 ns
BM_vector_deque_ranges_copy/512                187 ns        13.5 ns
BM_vector_deque_ranges_copy/1024               369 ns        39.2 ns
BM_vector_deque_ranges_copy/4000              1440 ns         130 ns
BM_vector_deque_ranges_copy/4096              1474 ns         135 ns
BM_vector_deque_ranges_copy/5500              1979 ns         182 ns
BM_vector_deque_ranges_copy/64000            22995 ns        5421 ns
BM_vector_deque_ranges_copy/65536            23549 ns        5705 ns
BM_vector_deque_ranges_copy/70000            25182 ns        5994 ns
BM_deque_vector_move/0                       0.266 ns        1.59 ns
BM_deque_vector_move/1                        3.26 ns        1.85 ns
BM_deque_vector_move/2                        3.23 ns        1.72 ns
BM_deque_vector_move/64                       4.24 ns        2.90 ns
BM_deque_vector_move/512                      14.1 ns        13.5 ns
BM_deque_vector_move/1024                     40.4 ns        39.5 ns
BM_deque_vector_move/4000                      144 ns         128 ns
BM_deque_vector_move/4096                      144 ns         128 ns
BM_deque_vector_move/5500                      200 ns         171 ns
BM_deque_vector_move/64000                    5482 ns        5458 ns
BM_deque_vector_move/65536                    5436 ns        5469 ns
BM_deque_vector_move/70000                    5846 ns        5781 ns
BM_deque_vector_ranges_move/0                0.264 ns        1.58 ns
BM_deque_vector_ranges_move/1                 1.32 ns        1.85 ns
BM_deque_vector_ranges_move/2                 1.59 ns        1.72 ns
BM_deque_vector_ranges_move/64                35.3 ns        2.91 ns
BM_deque_vector_ranges_move/512                202 ns        13.4 ns
BM_deque_vector_ranges_move/1024               395 ns        39.5 ns
BM_deque_vector_ranges_move/4000              1548 ns         128 ns
BM_deque_vector_ranges_move/4096              1559 ns         128 ns
BM_deque_vector_ranges_move/5500              2133 ns         170 ns
BM_deque_vector_ranges_move/64000            24609 ns        5460 ns
BM_deque_vector_ranges_move/65536            25182 ns        5470 ns
BM_deque_vector_ranges_move/70000            27414 ns        5774 ns
BM_deque_deque_move/0                         1.06 ns        1.22 ns
BM_deque_deque_move/1                         5.67 ns        3.10 ns
BM_deque_deque_move/2                         5.54 ns        2.90 ns
BM_deque_deque_move/64                        6.37 ns        4.00 ns
BM_deque_deque_move/512                       21.3 ns        18.8 ns
BM_deque_deque_move/1024                      43.0 ns        41.7 ns
BM_deque_deque_move/4000                       112 ns        97.6 ns
BM_deque_deque_move/4096                       171 ns         156 ns
BM_deque_deque_move/5500                       233 ns         206 ns
BM_deque_deque_move/64000                     5378 ns        5401 ns
BM_deque_deque_move/65536                     5546 ns        5562 ns
BM_deque_deque_move/70000                     5877 ns        5911 ns
BM_deque_deque_ranges_move/0                 0.792 ns        1.21 ns
BM_deque_deque_ranges_move/1                  1.85 ns        3.10 ns
BM_deque_deque_ranges_move/2                  2.38 ns        2.90 ns
BM_deque_deque_ranges_move/64                 43.9 ns        4.02 ns
BM_deque_deque_ranges_move/512                 281 ns        18.8 ns
BM_deque_deque_ranges_move/1024                560 ns        41.7 ns
BM_deque_deque_ranges_move/4000               2171 ns        97.6 ns
BM_deque_deque_ranges_move/4096               2245 ns         156 ns
BM_deque_deque_ranges_move/5500               3013 ns         204 ns
BM_deque_deque_ranges_move/64000             35085 ns        5399 ns
BM_deque_deque_ranges_move/65536             35939 ns        5560 ns
BM_deque_deque_ranges_move/70000             38388 ns        5903 ns
BM_vector_deque_move/0                       0.597 ns       0.579 ns
BM_vector_deque_move/1                        2.83 ns        2.11 ns
BM_vector_deque_move/2                        2.70 ns        1.98 ns
BM_vector_deque_move/64                       3.68 ns        3.43 ns
BM_vector_deque_move/512                      13.5 ns        13.5 ns
BM_vector_deque_move/1024                     39.9 ns        39.1 ns
BM_vector_deque_move/4000                      145 ns         130 ns
BM_vector_deque_move/4096                      146 ns         135 ns
BM_vector_deque_move/5500                      200 ns         182 ns
BM_vector_deque_move/64000                    5454 ns        5422 ns
BM_vector_deque_move/65536                    5722 ns        5693 ns
BM_vector_deque_move/70000                    5986 ns        5992 ns
BM_vector_deque_ranges_move/0                0.539 ns       0.528 ns
BM_vector_deque_ranges_move/1                 1.06 ns        2.13 ns
BM_vector_deque_ranges_move/2                 1.47 ns        1.99 ns
BM_vector_deque_ranges_move/64                24.0 ns        3.44 ns
BM_vector_deque_ranges_move/512                189 ns        13.8 ns
BM_vector_deque_ranges_move/1024               375 ns        39.5 ns
BM_vector_deque_ranges_move/4000              1436 ns         130 ns
BM_vector_deque_ranges_move/4096              1472 ns         135 ns
BM_vector_deque_ranges_move/5500              1977 ns         183 ns
BM_vector_deque_ranges_move/64000            22981 ns        5660 ns
BM_vector_deque_ranges_move/65536            23577 ns        5888 ns
BM_vector_deque_ranges_move/70000            25131 ns        6174 ns
BM_deque_vector_copy_backward/0              0.264 ns        1.61 ns
BM_deque_vector_copy_backward/1               2.96 ns        1.87 ns
BM_deque_vector_copy_backward/2               3.55 ns        1.75 ns
BM_deque_vector_copy_backward/64              4.49 ns        3.02 ns
BM_deque_vector_copy_backward/512             16.1 ns        13.4 ns
BM_deque_vector_copy_backward/1024            41.1 ns        40.3 ns
BM_deque_vector_copy_backward/4000             151 ns         128 ns
BM_deque_vector_copy_backward/4096             145 ns         129 ns
BM_deque_vector_copy_backward/5500             211 ns         171 ns
BM_deque_vector_copy_backward/64000           5471 ns        5517 ns
BM_deque_vector_copy_backward/65536           5439 ns        5440 ns
BM_deque_vector_copy_backward/70000           5838 ns        5775 ns
BM_deque_vector_ranges_copy_backward/0       0.264 ns        1.58 ns
BM_deque_vector_ranges_copy_backward/1        1.17 ns        1.85 ns
BM_deque_vector_ranges_copy_backward/2        1.45 ns        1.72 ns
BM_deque_vector_ranges_copy_backward/64       26.0 ns        3.12 ns
BM_deque_vector_ranges_copy_backward/512       147 ns        13.2 ns
BM_deque_vector_ranges_copy_backward/1024      282 ns        39.9 ns
BM_deque_vector_ranges_copy_backward/4000     1103 ns         126 ns
BM_deque_vector_ranges_copy_backward/4096     1131 ns         127 ns
BM_deque_vector_ranges_copy_backward/5500     1514 ns         171 ns
BM_deque_vector_ranges_copy_backward/64000   17553 ns        5512 ns
BM_deque_vector_ranges_copy_backward/65536   17944 ns        5441 ns
BM_deque_vector_ranges_copy_backward/70000   19183 ns        5772 ns
BM_deque_deque_copy_backward/0                1.16 ns        1.33 ns
BM_deque_deque_copy_backward/1                6.58 ns        3.17 ns
BM_deque_deque_copy_backward/2                6.87 ns        3.17 ns
BM_deque_deque_copy_backward/64               7.77 ns        4.18 ns
BM_deque_deque_copy_backward/512              24.3 ns        19.2 ns
BM_deque_deque_copy_backward/1024             46.2 ns        43.2 ns
BM_deque_deque_copy_backward/4000              121 ns         101 ns
BM_deque_deque_copy_backward/4096              179 ns         162 ns
BM_deque_deque_copy_backward/5500              247 ns         216 ns
BM_deque_deque_copy_backward/64000            5362 ns        5429 ns
BM_deque_deque_copy_backward/65536            5474 ns        5551 ns
BM_deque_deque_copy_backward/70000            5856 ns        5942 ns
BM_deque_deque_ranges_copy_backward/0        0.792 ns        1.34 ns
BM_deque_deque_ranges_copy_backward/1         2.04 ns        3.17 ns
BM_deque_deque_ranges_copy_backward/2         2.93 ns        3.17 ns
BM_deque_deque_ranges_copy_backward/64        56.0 ns        4.20 ns
BM_deque_deque_ranges_copy_backward/512        372 ns        19.2 ns
BM_deque_deque_ranges_copy_backward/1024       715 ns        43.2 ns
BM_deque_deque_ranges_copy_backward/4000      2839 ns         101 ns
BM_deque_deque_ranges_copy_backward/4096      2861 ns         163 ns
BM_deque_deque_ranges_copy_backward/5500      3850 ns         216 ns
BM_deque_deque_ranges_copy_backward/64000    42909 ns        5425 ns
BM_deque_deque_ranges_copy_backward/65536    44236 ns        5547 ns
BM_deque_deque_ranges_copy_backward/70000    47484 ns        5939 ns
BM_vector_deque_copy_backward/0              0.597 ns       0.566 ns
BM_vector_deque_copy_backward/1               4.17 ns        2.11 ns
BM_vector_deque_copy_backward/2               3.83 ns        1.98 ns
BM_vector_deque_copy_backward/64              4.99 ns        3.43 ns
BM_vector_deque_copy_backward/512             18.1 ns        13.8 ns
BM_vector_deque_copy_backward/1024            43.6 ns        41.0 ns
BM_vector_deque_copy_backward/4000             160 ns         134 ns
BM_vector_deque_copy_backward/4096             160 ns         138 ns
BM_vector_deque_copy_backward/5500             225 ns         180 ns
BM_vector_deque_copy_backward/64000           5458 ns        5440 ns
BM_vector_deque_copy_backward/65536           5648 ns        5657 ns
BM_vector_deque_copy_backward/70000           6021 ns        6018 ns
BM_vector_deque_ranges_copy_backward/0       0.529 ns       0.538 ns
BM_vector_deque_ranges_copy_backward/1        1.06 ns        2.11 ns
BM_vector_deque_ranges_copy_backward/2        1.35 ns        1.98 ns
BM_vector_deque_ranges_copy_backward/64       25.4 ns        3.43 ns
BM_vector_deque_ranges_copy_backward/512       166 ns        13.8 ns
BM_vector_deque_ranges_copy_backward/1024      286 ns        41.0 ns
BM_vector_deque_ranges_copy_backward/4000     1149 ns         134 ns
BM_vector_deque_ranges_copy_backward/4096     1138 ns         139 ns
BM_vector_deque_ranges_copy_backward/5500     1536 ns         182 ns
BM_vector_deque_ranges_copy_backward/64000   17771 ns        5467 ns
BM_vector_deque_ranges_copy_backward/65536   18343 ns        5658 ns
BM_vector_deque_ranges_copy_backward/70000   19422 ns        6062 ns
BM_deque_vector_move_backward/0              0.271 ns        1.60 ns
BM_deque_vector_move_backward/1               2.91 ns        1.87 ns
BM_deque_vector_move_backward/2               3.51 ns        1.74 ns
BM_deque_vector_move_backward/64              4.49 ns        3.13 ns
BM_deque_vector_move_backward/512             15.8 ns        13.4 ns
BM_deque_vector_move_backward/1024            41.2 ns        40.0 ns
BM_deque_vector_move_backward/4000             147 ns         126 ns
BM_deque_vector_move_backward/4096             145 ns         127 ns
BM_deque_vector_move_backward/5500             207 ns         171 ns
BM_deque_vector_move_backward/64000           5465 ns        5515 ns
BM_deque_vector_move_backward/65536           5435 ns        5441 ns
BM_deque_vector_move_backward/70000           5835 ns        5787 ns
BM_deque_vector_ranges_move_backward/0       0.264 ns        1.58 ns
BM_deque_vector_ranges_move_backward/1        1.17 ns        1.85 ns
BM_deque_vector_ranges_move_backward/2        1.45 ns        1.72 ns
BM_deque_vector_ranges_move_backward/64       23.2 ns        3.12 ns
BM_deque_vector_ranges_move_backward/512       147 ns        13.3 ns
BM_deque_vector_ranges_move_backward/1024      281 ns        39.9 ns
BM_deque_vector_ranges_move_backward/4000     1097 ns         126 ns
BM_deque_vector_ranges_move_backward/4096     1122 ns         127 ns
BM_deque_vector_ranges_move_backward/5500     1514 ns         170 ns
BM_deque_vector_ranges_move_backward/64000   17551 ns        5507 ns
BM_deque_vector_ranges_move_backward/65536   17944 ns        5436 ns
BM_deque_vector_ranges_move_backward/70000   19183 ns        5781 ns
BM_deque_deque_move_backward/0                1.17 ns        1.34 ns
BM_deque_deque_move_backward/1                6.60 ns        3.17 ns
BM_deque_deque_move_backward/2                6.87 ns        3.17 ns
BM_deque_deque_move_backward/64               7.78 ns        4.20 ns
BM_deque_deque_move_backward/512              24.2 ns        19.2 ns
BM_deque_deque_move_backward/1024             46.2 ns        43.2 ns
BM_deque_deque_move_backward/4000              121 ns         101 ns
BM_deque_deque_move_backward/4096              179 ns         163 ns
BM_deque_deque_move_backward/5500              247 ns         217 ns
BM_deque_deque_move_backward/64000            5361 ns        5431 ns
BM_deque_deque_move_backward/65536            5465 ns        5547 ns
BM_deque_deque_move_backward/70000            5845 ns        5939 ns
BM_deque_deque_ranges_move_backward/0        0.791 ns        1.33 ns
BM_deque_deque_ranges_move_backward/1         2.04 ns        3.17 ns
BM_deque_deque_ranges_move_backward/2         2.93 ns        3.17 ns
BM_deque_deque_ranges_move_backward/64        55.7 ns        4.18 ns
BM_deque_deque_ranges_move_backward/512        351 ns        19.2 ns
BM_deque_deque_ranges_move_backward/1024       689 ns        43.2 ns
BM_deque_deque_ranges_move_backward/4000      2685 ns         101 ns
BM_deque_deque_ranges_move_backward/4096      2743 ns         162 ns
BM_deque_deque_ranges_move_backward/5500      3698 ns         215 ns
BM_deque_deque_ranges_move_backward/64000    42808 ns        5426 ns
BM_deque_deque_ranges_move_backward/65536    43858 ns        5551 ns
BM_deque_deque_ranges_move_backward/70000    46853 ns        5947 ns
BM_vector_deque_move_backward/0              0.621 ns       0.532 ns
BM_vector_deque_move_backward/1               4.17 ns        2.11 ns
BM_vector_deque_move_backward/2               3.84 ns        1.98 ns
BM_vector_deque_move_backward/64              4.99 ns        3.43 ns
BM_vector_deque_move_backward/512             18.1 ns        13.8 ns
BM_vector_deque_move_backward/1024            43.6 ns        41.0 ns
BM_vector_deque_move_backward/4000             160 ns         134 ns
BM_vector_deque_move_backward/4096             160 ns         138 ns
BM_vector_deque_move_backward/5500             225 ns         181 ns
BM_vector_deque_move_backward/64000           5457 ns        5436 ns
BM_vector_deque_move_backward/65536           5646 ns        5658 ns
BM_vector_deque_move_backward/70000           6020 ns        6022 ns
BM_vector_deque_ranges_move_backward/0       0.536 ns       0.528 ns
BM_vector_deque_ranges_move_backward/1        1.06 ns        2.11 ns
BM_vector_deque_ranges_move_backward/2        1.33 ns        1.98 ns
BM_vector_deque_ranges_move_backward/64       28.9 ns        3.43 ns
BM_vector_deque_ranges_move_backward/512       160 ns        13.8 ns
BM_vector_deque_ranges_move_backward/1024      286 ns        41.0 ns
BM_vector_deque_ranges_move_backward/4000     1197 ns         134 ns
BM_vector_deque_ranges_move_backward/4096     1138 ns         138 ns
BM_vector_deque_ranges_move_backward/5500     1552 ns         181 ns
BM_vector_deque_ranges_move_backward/64000   17834 ns        5436 ns
BM_vector_deque_ranges_move_backward/65536   18351 ns        5660 ns
BM_vector_deque_ranges_move_backward/70000   19473 ns        6022 ns
libcxx/benchmarks/deque_iterator.bench.cpp
1

I've changed the implementation of copy and move to use the segment iterators and that fixed the performance issues, so I didn't test with _LIBCPP_ALWAYS_INLINE.

libcxx/include/__algorithm/copy.h
60

I'd like to investigate that when adding a non-random-access iterator to the segmented iterators, since then it will actually matter. I don't know yet whether local iterators should always be random-access or copying to output segmented iterators should be constrained on the local iterator being random-access.

philnik edited the summary of this revision. (Show Details)Sep 18 2022, 4:27 AM
philnik updated this revision to Diff 461830.Sep 21 2022, 2:54 AM
philnik edited the summary of this revision. (Show Details)
  • Rebased
  • Generate files
ldionne accepted this revision.Sep 23 2022, 8:20 AM

LGTM but let's wait for the CI to be back online for a green run and then submit.

libcxx/docs/ReleaseNotes.rst
50–51 ↗(On Diff #461835)
libcxx/include/__algorithm/copy_backward.h
83
libcxx/include/__algorithm/move_backward.h
82
huixie90 accepted this revision.Sep 28 2022, 4:10 AM

Thanks for the update. LGTM!

This revision is now accepted and ready to land.Sep 28 2022, 4:10 AM
philnik updated this revision to Diff 463986.Sep 29 2022, 12:01 PM
  • Try to fix CI
philnik updated this revision to Diff 464550.Oct 2 2022, 5:59 AM
  • Rebased

If this change builds on top of D130695, then I would request to hold off of proceeding with and landing this - I have bisected downstream breakage down to D130695.

EricWF requested changes to this revision.Oct 2 2022, 12:24 PM
EricWF added inline comments.
libcxx/benchmarks/deque_iterator.bench.cpp
1

Have you verified the assembly for these benchmarks?

125

Can you meaningfully do this more than once on the same container?

139

Can you meaningfully do this more than once on the same container?

This revision now requires changes to proceed.Oct 2 2022, 12:24 PM
philnik added inline comments.Oct 2 2022, 12:34 PM
libcxx/benchmarks/deque_iterator.bench.cpp
1

No, but why should I? The assembly is obviously different, since the benchmarks show differences. The differences aren't very significant though, i.e. a pessimization for the 0-case, but an optimization for small N for some algorithms and an optimization across the board for others.

125

When the types are trivial they are just copied, so it should be fine.

philnik updated this revision to Diff 465850.Oct 6 2022, 1:18 PM
  • Try to fix CI
philnik updated this revision to Diff 476775.Nov 20 2022, 2:33 PM

Fix C++03

ldionne requested changes to this revision.Nov 21 2022, 9:34 AM
ldionne added inline comments.
libcxx/include/__iterator/segmented_iterator.h
54
template <class _Iterator, class = void>
struct __segmented_iterator_traits;
/* exposition only:
 {
  using __segment_iterator      = ...;
  using __local_iterator        = ...;

  static __segment_iterator __segment(_Iterator);
  static __local_iterator __local(_Iterator);
  static __local_iterator __begin(__segment_iterator);
  static __local_iterator __end(__segment_iterator);
  static _Iterator __compose(__segment_iterator, __local_iterator);
};
*/

template <class _Tp, size_t>
constexpr bool __has_specialization_v = false;

template <class _Tp>
constexpr bool __has_specialization_v<_Tp, sizeof(_Tp)> = true;

template <class _Iterator>
using __is_segmented_iterator = bool_constant<__has_specialization_v<segmented_iterator_traits<_Iterator> >;

That way, we avoid forming __segmented_iterator_traits specializations for types that are not segmented iterators. That's more in-line with what we do for other traits classes, e.g. char_traits, allocator_traits, etc.

libcxx/include/deque
399
This revision now requires changes to proceed.Nov 21 2022, 9:34 AM
philnik updated this revision to Diff 477435.Nov 23 2022, 3:09 AM
philnik marked 5 inline comments as done.

Address comments

philnik updated this revision to Diff 489181.Jan 13 2023, 6:28 PM

Try to fix CI

philnik updated this revision to Diff 489242.Jan 14 2023, 6:59 AM

Fix modulemap

philnik updated this revision to Diff 489262.Jan 14 2023, 8:39 AM

Try to fix CI

philnik updated this revision to Diff 489326.Jan 14 2023, 8:15 PM

Try to fix CI

ldionne requested changes to this revision.Jan 16 2023, 8:01 AM
ldionne added inline comments.
libcxx/include/__iterator/segmented_iterator.h
54

I don't think you addressed this feedback, have you?

This revision now requires changes to proceed.Jan 16 2023, 8:01 AM
philnik updated this revision to Diff 489602.Jan 16 2023, 10:48 AM
philnik marked an inline comment as done.

Address comments

philnik updated this revision to Diff 489610.Jan 16 2023, 11:19 AM

Try to fix CI

ldionne accepted this revision.Jan 19 2023, 8:23 AM

Thanks!

This revision was not accepted when it landed; it landed in state Needs Review.Jan 19 2023, 11:11 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.