This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Introduce __for_each_segment and use it in copy/move
ClosedPublic

Authored by philnik on May 23 2023, 3:22 PM.

Details

Summary

This simplifies the code inside copy/move and makes it easier to apply the optimization to other algorithms.

Diff Detail

Event Timeline

philnik created this revision.May 23 2023, 3:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2023, 3:22 PM
philnik requested review of this revision.May 23 2023, 3:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2023, 3:22 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
ldionne accepted this revision.May 24 2023, 8:16 AM

This is a nice refactoring! LGTM w/ green CI.

libcxx/include/__algorithm/for_each_segment.h
22

Can you please document what this function does in a comment?

30
This revision is now accepted and ready to land.May 24 2023, 8:16 AM
ldionne added inline comments.May 24 2023, 8:16 AM
libcxx/include/__algorithm/for_each_segment.h
22

In particular please document the input and expected output of the __func you're taking.

Would it make sense to add benchmarks? The number in D151274 look quite impressive.

libcxx/include/__algorithm/copy.h
51

I'm not too thrilled with this part of the changes. Is it possible to remove this helper struct and use a lambda instead?

std::__for_each_segment(__first, __last, _CopySegment, [&__result](_Traits::__local_iterator __lfirst, _Traits::__local_iterator __llast) mutable
{
  __result_ = std::__copy<_AlgPolicy>(__lfirst, __llast, std::move(__result_)).second;
});

If so the same for move.

libcxx/include/__algorithm/for_each_segment.h
30

Thanks a lot for the comments, that makes understanding the code a lot easier.

philnik updated this revision to Diff 525301.May 24 2023, 12:42 PM
philnik marked 5 inline comments as done.

Try to fix CI

libcxx/include/__algorithm/copy.h
51

I would love to, but unfortunately we are in C++03 land here.

Mordante accepted this revision.May 24 2023, 2:05 PM

LGTM, if possible I still would like benchmarks, but not a blocker.

libcxx/include/__algorithm/copy.h
51

ah yes right, I missed that. Then we need to keep our lambda emulation.

LGTM, if possible I still would like benchmarks, but not a blocker.

Benchmarks are within margin of error:

---------------------------------------------------------------------------
Benchmark                                               old             new
---------------------------------------------------------------------------
BM_deque_vector_copy/0                              2.20 ns         2.19 ns
BM_deque_vector_copy/1                              4.16 ns         4.08 ns
BM_deque_vector_copy/2                              2.55 ns         2.51 ns
BM_deque_vector_copy/64                             5.04 ns         4.95 ns
BM_deque_vector_copy/512                            26.4 ns         26.5 ns
BM_deque_vector_copy/1024                           54.6 ns         53.5 ns
BM_deque_vector_copy/4000                            216 ns          211 ns
BM_deque_vector_copy/4096                            227 ns          224 ns
BM_deque_vector_copy/5500                            308 ns          308 ns
BM_deque_vector_copy/64000                          3826 ns         3742 ns
BM_deque_vector_copy/65536                          3779 ns         3869 ns
BM_deque_vector_copy/70000                          4108 ns         4101 ns
BM_deque_vector_ranges_copy/0                       2.23 ns         2.19 ns
BM_deque_vector_ranges_copy/1                       4.15 ns         4.07 ns
BM_deque_vector_ranges_copy/2                       2.55 ns         2.51 ns
BM_deque_vector_ranges_copy/64                      5.04 ns         5.04 ns
BM_deque_vector_ranges_copy/512                     26.5 ns         26.2 ns
BM_deque_vector_ranges_copy/1024                    54.7 ns         54.6 ns
BM_deque_vector_ranges_copy/4000                     215 ns          208 ns
BM_deque_vector_ranges_copy/4096                     224 ns          220 ns
BM_deque_vector_ranges_copy/5500                     307 ns          301 ns
BM_deque_vector_ranges_copy/64000                   3797 ns         3729 ns
BM_deque_vector_ranges_copy/65536                   3800 ns         3829 ns
BM_deque_vector_ranges_copy/70000                   4086 ns         4112 ns
BM_deque_deque_copy/0                               1.17 ns         1.17 ns
BM_deque_deque_copy/1                               4.47 ns         4.44 ns
BM_deque_deque_copy/2                               2.91 ns         2.92 ns
BM_deque_deque_copy/64                              6.20 ns         6.17 ns
BM_deque_deque_copy/512                             27.6 ns         27.4 ns
BM_deque_deque_copy/1024                            55.1 ns         54.0 ns
BM_deque_deque_copy/4000                             214 ns          211 ns
BM_deque_deque_copy/4096                             226 ns          225 ns
BM_deque_deque_copy/5500                             307 ns          308 ns
BM_deque_deque_copy/64000                           3881 ns         3877 ns
BM_deque_deque_copy/65536                           3993 ns         3963 ns
BM_deque_deque_copy/70000                           4261 ns         4234 ns
BM_deque_deque_ranges_copy/0                        1.17 ns         1.17 ns
BM_deque_deque_ranges_copy/1                        4.48 ns         4.47 ns
BM_deque_deque_ranges_copy/2                        2.91 ns         2.85 ns
BM_deque_deque_ranges_copy/64                       6.17 ns         6.05 ns
BM_deque_deque_ranges_copy/512                      27.6 ns         27.6 ns
BM_deque_deque_ranges_copy/1024                     54.7 ns         53.7 ns
BM_deque_deque_ranges_copy/4000                      216 ns          216 ns
BM_deque_deque_ranges_copy/4096                      225 ns          227 ns
BM_deque_deque_ranges_copy/5500                      307 ns          305 ns
BM_deque_deque_ranges_copy/64000                    3873 ns         3872 ns
BM_deque_deque_ranges_copy/65536                    3992 ns         3997 ns
BM_deque_deque_ranges_copy/70000                    4236 ns         4250 ns
BM_vector_deque_copy/0                             0.638 ns        0.637 ns
BM_vector_deque_copy/1                              4.15 ns         4.07 ns
BM_vector_deque_copy/2                              2.55 ns         2.51 ns
BM_vector_deque_copy/64                             5.14 ns         5.03 ns
BM_vector_deque_copy/512                            26.6 ns         26.8 ns
BM_vector_deque_copy/1024                           53.1 ns         52.4 ns
BM_vector_deque_copy/4000                            216 ns          211 ns
BM_vector_deque_copy/4096                            221 ns          220 ns
BM_vector_deque_copy/5500                            299 ns          305 ns
BM_vector_deque_copy/64000                          3945 ns         3849 ns
BM_vector_deque_copy/65536                          3853 ns         3933 ns
BM_vector_deque_copy/70000                          4181 ns         4326 ns
BM_vector_deque_ranges_copy/0                      0.638 ns        0.637 ns
BM_vector_deque_ranges_copy/1                       4.16 ns         4.14 ns
BM_vector_deque_ranges_copy/2                       2.55 ns         2.52 ns
BM_vector_deque_ranges_copy/64                      5.15 ns         5.10 ns
BM_vector_deque_ranges_copy/512                     26.7 ns         26.5 ns
BM_vector_deque_ranges_copy/1024                    53.0 ns         52.6 ns
BM_vector_deque_ranges_copy/4000                     217 ns          208 ns
BM_vector_deque_ranges_copy/4096                     222 ns          220 ns
BM_vector_deque_ranges_copy/5500                     299 ns          307 ns
BM_vector_deque_ranges_copy/64000                   3769 ns         3849 ns
BM_vector_deque_ranges_copy/65536                   4055 ns         3926 ns
BM_vector_deque_ranges_copy/70000                   4422 ns         4194 ns
BM_deque_vector_move/0                              2.23 ns         2.23 ns
BM_deque_vector_move/1                              4.15 ns         4.07 ns
BM_deque_vector_move/2                              2.55 ns         2.51 ns
BM_deque_vector_move/64                             5.04 ns         5.03 ns
BM_deque_vector_move/512                            26.5 ns         26.5 ns
BM_deque_vector_move/1024                           54.6 ns         54.6 ns
BM_deque_vector_move/4000                            215 ns          216 ns
BM_deque_vector_move/4096                            226 ns          219 ns
BM_deque_vector_move/5500                            307 ns          303 ns
BM_deque_vector_move/64000                          3782 ns         4021 ns
BM_deque_vector_move/65536                          3778 ns         3765 ns
BM_deque_vector_move/70000                          4113 ns         4128 ns
BM_deque_vector_ranges_move/0                       2.21 ns         2.19 ns
BM_deque_vector_ranges_move/1                       4.15 ns         4.14 ns
BM_deque_vector_ranges_move/2                       2.55 ns         2.55 ns
BM_deque_vector_ranges_move/64                      5.04 ns         5.03 ns
BM_deque_vector_ranges_move/512                     26.5 ns         26.5 ns
BM_deque_vector_ranges_move/1024                    54.8 ns         54.4 ns
BM_deque_vector_ranges_move/4000                     216 ns          216 ns
BM_deque_vector_ranges_move/4096                     220 ns          225 ns
BM_deque_vector_ranges_move/5500                     306 ns          309 ns
BM_deque_vector_ranges_move/64000                   3798 ns         3845 ns
BM_deque_vector_ranges_move/65536                   3793 ns         3757 ns
BM_deque_vector_ranges_move/70000                   4161 ns         4115 ns
BM_deque_deque_move/0                               1.17 ns         1.15 ns
BM_deque_deque_move/1                               4.47 ns         4.47 ns
BM_deque_deque_move/2                               2.91 ns         2.91 ns
BM_deque_deque_move/64                              6.22 ns         6.17 ns
BM_deque_deque_move/512                             27.6 ns         27.7 ns
BM_deque_deque_move/1024                            54.7 ns         54.7 ns
BM_deque_deque_move/4000                             215 ns          215 ns
BM_deque_deque_move/4096                             227 ns          226 ns
BM_deque_deque_move/5500                             309 ns          308 ns
BM_deque_deque_move/64000                           3868 ns         3854 ns
BM_deque_deque_move/65536                           3987 ns         3985 ns
BM_deque_deque_move/70000                           4231 ns         4218 ns
BM_deque_deque_ranges_move/0                        1.17 ns         1.17 ns
BM_deque_deque_ranges_move/1                        4.48 ns         4.47 ns
BM_deque_deque_ranges_move/2                        2.91 ns         2.90 ns
BM_deque_deque_ranges_move/64                       6.19 ns         6.16 ns
BM_deque_deque_ranges_move/512                      27.5 ns         27.6 ns
BM_deque_deque_ranges_move/1024                     54.8 ns         54.9 ns
BM_deque_deque_ranges_move/4000                      217 ns          217 ns
BM_deque_deque_ranges_move/4096                      227 ns          226 ns
BM_deque_deque_ranges_move/5500                      307 ns          307 ns
BM_deque_deque_ranges_move/64000                    3861 ns         3862 ns
BM_deque_deque_ranges_move/65536                    3990 ns         3956 ns
BM_deque_deque_ranges_move/70000                    4402 ns         4223 ns
BM_vector_deque_move/0                             0.638 ns        0.638 ns
BM_vector_deque_move/1                              4.12 ns         4.07 ns
BM_vector_deque_move/2                              2.55 ns         2.51 ns
BM_vector_deque_move/64                             5.06 ns         5.06 ns
BM_vector_deque_move/512                            26.6 ns         26.6 ns
BM_vector_deque_move/1024                           53.1 ns         53.1 ns
BM_vector_deque_move/4000                            215 ns          216 ns
BM_vector_deque_move/4096                            221 ns          222 ns
BM_vector_deque_move/5500                            303 ns          300 ns
BM_vector_deque_move/64000                          3847 ns         3876 ns
BM_vector_deque_move/65536                          3930 ns         3927 ns
BM_vector_deque_move/70000                          4206 ns         4182 ns
BM_vector_deque_ranges_move/0                      0.638 ns        0.628 ns
BM_vector_deque_ranges_move/1                       4.10 ns         4.14 ns
BM_vector_deque_ranges_move/2                       2.55 ns         2.55 ns
BM_vector_deque_ranges_move/64                      5.07 ns         5.07 ns
BM_vector_deque_ranges_move/512                     26.6 ns         26.6 ns
BM_vector_deque_ranges_move/1024                    53.0 ns         53.0 ns
BM_vector_deque_ranges_move/4000                     213 ns          216 ns
BM_vector_deque_ranges_move/4096                     221 ns          222 ns
BM_vector_deque_ranges_move/5500                     304 ns          299 ns
BM_vector_deque_ranges_move/64000                   3971 ns         3867 ns
BM_vector_deque_ranges_move/65536                   4032 ns         4107 ns
BM_vector_deque_ranges_move/70000                   4285 ns         4185 ns
BM_deque_vector_copy_backward/0                     2.53 ns         2.50 ns
BM_deque_vector_copy_backward/1                     4.43 ns         4.39 ns
BM_deque_vector_copy_backward/2                     2.87 ns         2.87 ns
BM_deque_vector_copy_backward/64                    6.01 ns         5.95 ns
BM_deque_vector_copy_backward/512                   28.6 ns         28.4 ns
BM_deque_vector_copy_backward/1024                  54.1 ns         54.0 ns
BM_deque_vector_copy_backward/4000                   211 ns          211 ns
BM_deque_vector_copy_backward/4096                   218 ns          220 ns
BM_deque_vector_copy_backward/5500                   294 ns          295 ns
BM_deque_vector_copy_backward/64000                 7915 ns         8409 ns
BM_deque_vector_copy_backward/65536                 8402 ns         9041 ns
BM_deque_vector_copy_backward/70000                 9309 ns         9253 ns
BM_deque_vector_ranges_copy_backward/0              2.23 ns         2.21 ns
BM_deque_vector_ranges_copy_backward/1              4.39 ns         4.46 ns
BM_deque_vector_ranges_copy_backward/2              2.83 ns         2.83 ns
BM_deque_vector_ranges_copy_backward/64             6.02 ns         5.95 ns
BM_deque_vector_ranges_copy_backward/512            28.6 ns         28.6 ns
BM_deque_vector_ranges_copy_backward/1024           54.1 ns         54.1 ns
BM_deque_vector_ranges_copy_backward/4000            212 ns          212 ns
BM_deque_vector_ranges_copy_backward/4096            219 ns          220 ns
BM_deque_vector_ranges_copy_backward/5500            295 ns          298 ns
BM_deque_vector_ranges_copy_backward/64000          8584 ns         8819 ns
BM_deque_vector_ranges_copy_backward/65536          9253 ns         9739 ns
BM_deque_vector_ranges_copy_backward/70000          9012 ns        10445 ns
BM_deque_deque_copy_backward/0                      1.28 ns         1.28 ns
BM_deque_deque_copy_backward/1                      4.79 ns         4.70 ns
BM_deque_deque_copy_backward/2                      3.22 ns         3.19 ns
BM_deque_deque_copy_backward/64                     7.14 ns         7.13 ns
BM_deque_deque_copy_backward/512                    29.3 ns         29.2 ns
BM_deque_deque_copy_backward/1024                   58.5 ns         58.5 ns
BM_deque_deque_copy_backward/4000                    221 ns          221 ns
BM_deque_deque_copy_backward/4096                    231 ns          228 ns
BM_deque_deque_copy_backward/5500                    302 ns          302 ns
BM_deque_deque_copy_backward/64000                  8029 ns         8367 ns
BM_deque_deque_copy_backward/65536                  9023 ns         8962 ns
BM_deque_deque_copy_backward/70000                  9403 ns         9668 ns
BM_deque_deque_ranges_copy_backward/0               1.28 ns         1.27 ns
BM_deque_deque_ranges_copy_backward/1               4.79 ns         4.78 ns
BM_deque_deque_ranges_copy_backward/2               3.19 ns         3.19 ns
BM_deque_deque_ranges_copy_backward/64              7.13 ns         7.04 ns
BM_deque_deque_ranges_copy_backward/512             29.4 ns         29.4 ns
BM_deque_deque_ranges_copy_backward/1024            58.2 ns         58.2 ns
BM_deque_deque_ranges_copy_backward/4000             221 ns          221 ns
BM_deque_deque_ranges_copy_backward/4096             227 ns          227 ns
BM_deque_deque_ranges_copy_backward/5500             302 ns          304 ns
BM_deque_deque_ranges_copy_backward/64000           7961 ns         8356 ns
BM_deque_deque_ranges_copy_backward/65536           8818 ns         8913 ns
BM_deque_deque_ranges_copy_backward/70000           9904 ns         9605 ns
BM_vector_deque_copy_backward/0                    0.638 ns        0.638 ns
BM_vector_deque_copy_backward/1                     4.15 ns         4.07 ns
BM_vector_deque_copy_backward/2                     2.55 ns         2.55 ns
BM_vector_deque_copy_backward/64                    6.11 ns         6.12 ns
BM_vector_deque_copy_backward/512                   28.3 ns         28.3 ns
BM_vector_deque_copy_backward/1024                  57.0 ns         57.0 ns
BM_vector_deque_copy_backward/4000                   215 ns          214 ns
BM_vector_deque_copy_backward/4096                   222 ns          221 ns
BM_vector_deque_copy_backward/5500                   296 ns          299 ns
BM_vector_deque_copy_backward/64000                 8013 ns         8780 ns
BM_vector_deque_copy_backward/65536                 9448 ns         9191 ns
BM_vector_deque_copy_backward/70000                 9345 ns        10010 ns
BM_vector_deque_ranges_copy_backward/0             0.638 ns        0.626 ns
BM_vector_deque_ranges_copy_backward/1              4.07 ns         4.14 ns
BM_vector_deque_ranges_copy_backward/2              2.53 ns         2.56 ns
BM_vector_deque_ranges_copy_backward/64             6.12 ns         6.11 ns
BM_vector_deque_ranges_copy_backward/512            28.3 ns         28.4 ns
BM_vector_deque_ranges_copy_backward/1024           57.0 ns         57.0 ns
BM_vector_deque_ranges_copy_backward/4000            214 ns          216 ns
BM_vector_deque_ranges_copy_backward/4096            221 ns          223 ns
BM_vector_deque_ranges_copy_backward/5500            297 ns          293 ns
BM_vector_deque_ranges_copy_backward/64000          8795 ns         9020 ns
BM_vector_deque_ranges_copy_backward/65536          9478 ns         9255 ns
BM_vector_deque_ranges_copy_backward/70000          9165 ns         8989 ns
BM_deque_vector_move_backward/0                     2.19 ns         2.23 ns
BM_deque_vector_move_backward/1                     4.46 ns         4.48 ns
BM_deque_vector_move_backward/2                     2.87 ns         2.86 ns
BM_deque_vector_move_backward/64                    6.01 ns         5.95 ns
BM_deque_vector_move_backward/512                   28.6 ns         28.4 ns
BM_deque_vector_move_backward/1024                  54.1 ns         54.1 ns
BM_deque_vector_move_backward/4000                   211 ns          213 ns
BM_deque_vector_move_backward/4096                   217 ns          219 ns
BM_deque_vector_move_backward/5500                   295 ns          293 ns
BM_deque_vector_move_backward/64000                 9059 ns         8965 ns
BM_deque_vector_move_backward/65536                 9081 ns         9236 ns
BM_deque_vector_move_backward/70000                 9029 ns         9900 ns
BM_deque_vector_ranges_move_backward/0              2.19 ns         2.19 ns
BM_deque_vector_ranges_move_backward/1              4.41 ns         4.47 ns
BM_deque_vector_ranges_move_backward/2              2.87 ns         2.87 ns
BM_deque_vector_ranges_move_backward/64             6.01 ns         5.94 ns
BM_deque_vector_ranges_move_backward/512            28.6 ns         28.6 ns
BM_deque_vector_ranges_move_backward/1024           54.2 ns         54.1 ns
BM_deque_vector_ranges_move_backward/4000            211 ns          210 ns
BM_deque_vector_ranges_move_backward/4096            218 ns          221 ns
BM_deque_vector_ranges_move_backward/5500            294 ns          295 ns
BM_deque_vector_ranges_move_backward/64000          8919 ns         8326 ns
BM_deque_vector_ranges_move_backward/65536          9199 ns         8949 ns
BM_deque_vector_ranges_move_backward/70000          9086 ns         9893 ns
BM_deque_deque_move_backward/0                      1.27 ns         1.28 ns
BM_deque_deque_move_backward/1                      4.72 ns         4.70 ns
BM_deque_deque_move_backward/2                      3.19 ns         3.18 ns
BM_deque_deque_move_backward/64                     7.13 ns         7.13 ns
BM_deque_deque_move_backward/512                    29.3 ns         29.2 ns
BM_deque_deque_move_backward/1024                   58.2 ns         58.3 ns
BM_deque_deque_move_backward/4000                    221 ns          220 ns
BM_deque_deque_move_backward/4096                    226 ns          226 ns
BM_deque_deque_move_backward/5500                    303 ns          303 ns
BM_deque_deque_move_backward/64000                  8372 ns         8382 ns
BM_deque_deque_move_backward/65536                  9157 ns         8910 ns
BM_deque_deque_move_backward/70000                  9703 ns         9593 ns
BM_deque_deque_ranges_move_backward/0               1.27 ns         1.25 ns
BM_deque_deque_ranges_move_backward/1               4.79 ns         4.70 ns
BM_deque_deque_ranges_move_backward/2               3.19 ns         3.16 ns
BM_deque_deque_ranges_move_backward/64              7.13 ns         7.13 ns
BM_deque_deque_ranges_move_backward/512             29.3 ns         29.1 ns
BM_deque_deque_ranges_move_backward/1024            58.8 ns         58.6 ns
BM_deque_deque_ranges_move_backward/4000             221 ns          221 ns
BM_deque_deque_ranges_move_backward/4096             226 ns          226 ns
BM_deque_deque_ranges_move_backward/5500             304 ns          304 ns
BM_deque_deque_ranges_move_backward/64000           7948 ns         8372 ns
BM_deque_deque_ranges_move_backward/65536           8921 ns         9021 ns
BM_deque_deque_ranges_move_backward/70000           9359 ns         9551 ns
BM_vector_deque_move_backward/0                    0.638 ns        0.638 ns
BM_vector_deque_move_backward/1                     4.15 ns         4.15 ns
BM_vector_deque_move_backward/2                     2.55 ns         2.53 ns
BM_vector_deque_move_backward/64                    6.09 ns         6.10 ns
BM_vector_deque_move_backward/512                   28.4 ns         28.4 ns
BM_vector_deque_move_backward/1024                  57.2 ns         56.6 ns
BM_vector_deque_move_backward/4000                   214 ns          213 ns
BM_vector_deque_move_backward/4096                   221 ns          222 ns
BM_vector_deque_move_backward/5500                   296 ns          299 ns
BM_vector_deque_move_backward/64000                 8854 ns         9105 ns
BM_vector_deque_move_backward/65536                 9196 ns         8751 ns
BM_vector_deque_move_backward/70000                 9860 ns         9946 ns
BM_vector_deque_ranges_move_backward/0             0.638 ns        0.634 ns
BM_vector_deque_ranges_move_backward/1              4.08 ns         4.15 ns
BM_vector_deque_ranges_move_backward/2              2.55 ns         2.55 ns
BM_vector_deque_ranges_move_backward/64             6.10 ns         6.11 ns
BM_vector_deque_ranges_move_backward/512            28.4 ns         28.4 ns
BM_vector_deque_ranges_move_backward/1024           57.1 ns         57.1 ns
BM_vector_deque_ranges_move_backward/4000            214 ns          216 ns
BM_vector_deque_ranges_move_backward/4096            222 ns          221 ns
BM_vector_deque_ranges_move_backward/5500            297 ns          294 ns
BM_vector_deque_ranges_move_backward/64000          8939 ns         9241 ns
BM_vector_deque_ranges_move_backward/65536          9277 ns         8730 ns
BM_vector_deque_ranges_move_backward/70000          9972 ns         9853 ns
philnik updated this revision to Diff 525669.May 25 2023, 9:44 AM

Try to fix CI

philnik updated this revision to Diff 525824.May 25 2023, 2:47 PM

Try to fix CI

philnik updated this revision to Diff 526071.May 26 2023, 8:14 AM

Try to fix CI

philnik updated this revision to Diff 526186.May 26 2023, 1:51 PM

Try to fix CI

philnik updated this revision to Diff 526661.May 30 2023, 8:56 AM

Try to fix CI

philnik updated this revision to Diff 527052.May 31 2023, 8:19 AM

Try to fix CI

This revision was landed with ongoing or failed builds.May 31 2023, 6:15 PM
This revision was automatically updated to reflect the committed changes.