This simplifies the code inside copy/move and makes it easier to apply the optimization to other algorithms.
Details
- Reviewers
ldionne Mordante - Group Reviewers
Restricted Project - Commits
- rGdc124cda7c78: [libc++] Introduce __for_each_segment and use it in copy/move
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
libcxx/include/__algorithm/for_each_segment.h | ||
---|---|---|
22 | In particular please document the input and expected output of the __func you're taking. |
Would it make sense to add benchmarks? The number in D151274 look quite impressive.
libcxx/include/__algorithm/copy.h | ||
---|---|---|
51 | I'm not too thrilled with this part of the changes. Is it possible to remove this helper struct and use a lambda instead? std::__for_each_segment(__first, __last, _CopySegment, [&__result](_Traits::__local_iterator __lfirst, _Traits::__local_iterator __llast) mutable { __result_ = std::__copy<_AlgPolicy>(__lfirst, __llast, std::move(__result_)).second; }); If so the same for move. | |
libcxx/include/__algorithm/for_each_segment.h | ||
30 | Thanks a lot for the comments, that makes understanding the code a lot easier. |
Try to fix CI
libcxx/include/__algorithm/copy.h | ||
---|---|---|
51 | I would love to, but unfortunately we are in C++03 land here. |
LGTM, if possible I still would like benchmarks, but not a blocker.
libcxx/include/__algorithm/copy.h | ||
---|---|---|
51 | ah yes right, I missed that. Then we need to keep our lambda emulation. |
Benchmarks are within margin of error:
--------------------------------------------------------------------------- Benchmark old new --------------------------------------------------------------------------- BM_deque_vector_copy/0 2.20 ns 2.19 ns BM_deque_vector_copy/1 4.16 ns 4.08 ns BM_deque_vector_copy/2 2.55 ns 2.51 ns BM_deque_vector_copy/64 5.04 ns 4.95 ns BM_deque_vector_copy/512 26.4 ns 26.5 ns BM_deque_vector_copy/1024 54.6 ns 53.5 ns BM_deque_vector_copy/4000 216 ns 211 ns BM_deque_vector_copy/4096 227 ns 224 ns BM_deque_vector_copy/5500 308 ns 308 ns BM_deque_vector_copy/64000 3826 ns 3742 ns BM_deque_vector_copy/65536 3779 ns 3869 ns BM_deque_vector_copy/70000 4108 ns 4101 ns BM_deque_vector_ranges_copy/0 2.23 ns 2.19 ns BM_deque_vector_ranges_copy/1 4.15 ns 4.07 ns BM_deque_vector_ranges_copy/2 2.55 ns 2.51 ns BM_deque_vector_ranges_copy/64 5.04 ns 5.04 ns BM_deque_vector_ranges_copy/512 26.5 ns 26.2 ns BM_deque_vector_ranges_copy/1024 54.7 ns 54.6 ns BM_deque_vector_ranges_copy/4000 215 ns 208 ns BM_deque_vector_ranges_copy/4096 224 ns 220 ns BM_deque_vector_ranges_copy/5500 307 ns 301 ns BM_deque_vector_ranges_copy/64000 3797 ns 3729 ns BM_deque_vector_ranges_copy/65536 3800 ns 3829 ns BM_deque_vector_ranges_copy/70000 4086 ns 4112 ns BM_deque_deque_copy/0 1.17 ns 1.17 ns BM_deque_deque_copy/1 4.47 ns 4.44 ns BM_deque_deque_copy/2 2.91 ns 2.92 ns BM_deque_deque_copy/64 6.20 ns 6.17 ns BM_deque_deque_copy/512 27.6 ns 27.4 ns BM_deque_deque_copy/1024 55.1 ns 54.0 ns BM_deque_deque_copy/4000 214 ns 211 ns BM_deque_deque_copy/4096 226 ns 225 ns BM_deque_deque_copy/5500 307 ns 308 ns BM_deque_deque_copy/64000 3881 ns 3877 ns BM_deque_deque_copy/65536 3993 ns 3963 ns BM_deque_deque_copy/70000 4261 ns 4234 ns BM_deque_deque_ranges_copy/0 1.17 ns 1.17 ns BM_deque_deque_ranges_copy/1 4.48 ns 4.47 ns BM_deque_deque_ranges_copy/2 2.91 ns 2.85 ns BM_deque_deque_ranges_copy/64 6.17 ns 6.05 ns BM_deque_deque_ranges_copy/512 27.6 ns 27.6 ns BM_deque_deque_ranges_copy/1024 54.7 ns 53.7 ns BM_deque_deque_ranges_copy/4000 216 ns 216 ns BM_deque_deque_ranges_copy/4096 225 ns 227 ns BM_deque_deque_ranges_copy/5500 307 ns 305 ns BM_deque_deque_ranges_copy/64000 3873 ns 3872 ns BM_deque_deque_ranges_copy/65536 3992 ns 3997 ns BM_deque_deque_ranges_copy/70000 4236 ns 4250 ns BM_vector_deque_copy/0 0.638 ns 0.637 ns BM_vector_deque_copy/1 4.15 ns 4.07 ns BM_vector_deque_copy/2 2.55 ns 2.51 ns BM_vector_deque_copy/64 5.14 ns 5.03 ns BM_vector_deque_copy/512 26.6 ns 26.8 ns BM_vector_deque_copy/1024 53.1 ns 52.4 ns BM_vector_deque_copy/4000 216 ns 211 ns BM_vector_deque_copy/4096 221 ns 220 ns BM_vector_deque_copy/5500 299 ns 305 ns BM_vector_deque_copy/64000 3945 ns 3849 ns BM_vector_deque_copy/65536 3853 ns 3933 ns BM_vector_deque_copy/70000 4181 ns 4326 ns BM_vector_deque_ranges_copy/0 0.638 ns 0.637 ns BM_vector_deque_ranges_copy/1 4.16 ns 4.14 ns BM_vector_deque_ranges_copy/2 2.55 ns 2.52 ns BM_vector_deque_ranges_copy/64 5.15 ns 5.10 ns BM_vector_deque_ranges_copy/512 26.7 ns 26.5 ns BM_vector_deque_ranges_copy/1024 53.0 ns 52.6 ns BM_vector_deque_ranges_copy/4000 217 ns 208 ns BM_vector_deque_ranges_copy/4096 222 ns 220 ns BM_vector_deque_ranges_copy/5500 299 ns 307 ns BM_vector_deque_ranges_copy/64000 3769 ns 3849 ns BM_vector_deque_ranges_copy/65536 4055 ns 3926 ns BM_vector_deque_ranges_copy/70000 4422 ns 4194 ns BM_deque_vector_move/0 2.23 ns 2.23 ns BM_deque_vector_move/1 4.15 ns 4.07 ns BM_deque_vector_move/2 2.55 ns 2.51 ns BM_deque_vector_move/64 5.04 ns 5.03 ns BM_deque_vector_move/512 26.5 ns 26.5 ns BM_deque_vector_move/1024 54.6 ns 54.6 ns BM_deque_vector_move/4000 215 ns 216 ns BM_deque_vector_move/4096 226 ns 219 ns BM_deque_vector_move/5500 307 ns 303 ns BM_deque_vector_move/64000 3782 ns 4021 ns BM_deque_vector_move/65536 3778 ns 3765 ns BM_deque_vector_move/70000 4113 ns 4128 ns BM_deque_vector_ranges_move/0 2.21 ns 2.19 ns BM_deque_vector_ranges_move/1 4.15 ns 4.14 ns BM_deque_vector_ranges_move/2 2.55 ns 2.55 ns BM_deque_vector_ranges_move/64 5.04 ns 5.03 ns BM_deque_vector_ranges_move/512 26.5 ns 26.5 ns BM_deque_vector_ranges_move/1024 54.8 ns 54.4 ns BM_deque_vector_ranges_move/4000 216 ns 216 ns BM_deque_vector_ranges_move/4096 220 ns 225 ns BM_deque_vector_ranges_move/5500 306 ns 309 ns BM_deque_vector_ranges_move/64000 3798 ns 3845 ns BM_deque_vector_ranges_move/65536 3793 ns 3757 ns BM_deque_vector_ranges_move/70000 4161 ns 4115 ns BM_deque_deque_move/0 1.17 ns 1.15 ns BM_deque_deque_move/1 4.47 ns 4.47 ns BM_deque_deque_move/2 2.91 ns 2.91 ns BM_deque_deque_move/64 6.22 ns 6.17 ns BM_deque_deque_move/512 27.6 ns 27.7 ns BM_deque_deque_move/1024 54.7 ns 54.7 ns BM_deque_deque_move/4000 215 ns 215 ns BM_deque_deque_move/4096 227 ns 226 ns BM_deque_deque_move/5500 309 ns 308 ns BM_deque_deque_move/64000 3868 ns 3854 ns BM_deque_deque_move/65536 3987 ns 3985 ns BM_deque_deque_move/70000 4231 ns 4218 ns BM_deque_deque_ranges_move/0 1.17 ns 1.17 ns BM_deque_deque_ranges_move/1 4.48 ns 4.47 ns BM_deque_deque_ranges_move/2 2.91 ns 2.90 ns BM_deque_deque_ranges_move/64 6.19 ns 6.16 ns BM_deque_deque_ranges_move/512 27.5 ns 27.6 ns BM_deque_deque_ranges_move/1024 54.8 ns 54.9 ns BM_deque_deque_ranges_move/4000 217 ns 217 ns BM_deque_deque_ranges_move/4096 227 ns 226 ns BM_deque_deque_ranges_move/5500 307 ns 307 ns BM_deque_deque_ranges_move/64000 3861 ns 3862 ns BM_deque_deque_ranges_move/65536 3990 ns 3956 ns BM_deque_deque_ranges_move/70000 4402 ns 4223 ns BM_vector_deque_move/0 0.638 ns 0.638 ns BM_vector_deque_move/1 4.12 ns 4.07 ns BM_vector_deque_move/2 2.55 ns 2.51 ns BM_vector_deque_move/64 5.06 ns 5.06 ns BM_vector_deque_move/512 26.6 ns 26.6 ns BM_vector_deque_move/1024 53.1 ns 53.1 ns BM_vector_deque_move/4000 215 ns 216 ns BM_vector_deque_move/4096 221 ns 222 ns BM_vector_deque_move/5500 303 ns 300 ns BM_vector_deque_move/64000 3847 ns 3876 ns BM_vector_deque_move/65536 3930 ns 3927 ns BM_vector_deque_move/70000 4206 ns 4182 ns BM_vector_deque_ranges_move/0 0.638 ns 0.628 ns BM_vector_deque_ranges_move/1 4.10 ns 4.14 ns BM_vector_deque_ranges_move/2 2.55 ns 2.55 ns BM_vector_deque_ranges_move/64 5.07 ns 5.07 ns BM_vector_deque_ranges_move/512 26.6 ns 26.6 ns BM_vector_deque_ranges_move/1024 53.0 ns 53.0 ns BM_vector_deque_ranges_move/4000 213 ns 216 ns BM_vector_deque_ranges_move/4096 221 ns 222 ns BM_vector_deque_ranges_move/5500 304 ns 299 ns BM_vector_deque_ranges_move/64000 3971 ns 3867 ns BM_vector_deque_ranges_move/65536 4032 ns 4107 ns BM_vector_deque_ranges_move/70000 4285 ns 4185 ns BM_deque_vector_copy_backward/0 2.53 ns 2.50 ns BM_deque_vector_copy_backward/1 4.43 ns 4.39 ns BM_deque_vector_copy_backward/2 2.87 ns 2.87 ns BM_deque_vector_copy_backward/64 6.01 ns 5.95 ns BM_deque_vector_copy_backward/512 28.6 ns 28.4 ns BM_deque_vector_copy_backward/1024 54.1 ns 54.0 ns BM_deque_vector_copy_backward/4000 211 ns 211 ns BM_deque_vector_copy_backward/4096 218 ns 220 ns BM_deque_vector_copy_backward/5500 294 ns 295 ns BM_deque_vector_copy_backward/64000 7915 ns 8409 ns BM_deque_vector_copy_backward/65536 8402 ns 9041 ns BM_deque_vector_copy_backward/70000 9309 ns 9253 ns BM_deque_vector_ranges_copy_backward/0 2.23 ns 2.21 ns BM_deque_vector_ranges_copy_backward/1 4.39 ns 4.46 ns BM_deque_vector_ranges_copy_backward/2 2.83 ns 2.83 ns BM_deque_vector_ranges_copy_backward/64 6.02 ns 5.95 ns BM_deque_vector_ranges_copy_backward/512 28.6 ns 28.6 ns BM_deque_vector_ranges_copy_backward/1024 54.1 ns 54.1 ns BM_deque_vector_ranges_copy_backward/4000 212 ns 212 ns BM_deque_vector_ranges_copy_backward/4096 219 ns 220 ns BM_deque_vector_ranges_copy_backward/5500 295 ns 298 ns BM_deque_vector_ranges_copy_backward/64000 8584 ns 8819 ns BM_deque_vector_ranges_copy_backward/65536 9253 ns 9739 ns BM_deque_vector_ranges_copy_backward/70000 9012 ns 10445 ns BM_deque_deque_copy_backward/0 1.28 ns 1.28 ns BM_deque_deque_copy_backward/1 4.79 ns 4.70 ns BM_deque_deque_copy_backward/2 3.22 ns 3.19 ns BM_deque_deque_copy_backward/64 7.14 ns 7.13 ns BM_deque_deque_copy_backward/512 29.3 ns 29.2 ns BM_deque_deque_copy_backward/1024 58.5 ns 58.5 ns BM_deque_deque_copy_backward/4000 221 ns 221 ns BM_deque_deque_copy_backward/4096 231 ns 228 ns BM_deque_deque_copy_backward/5500 302 ns 302 ns BM_deque_deque_copy_backward/64000 8029 ns 8367 ns BM_deque_deque_copy_backward/65536 9023 ns 8962 ns BM_deque_deque_copy_backward/70000 9403 ns 9668 ns BM_deque_deque_ranges_copy_backward/0 1.28 ns 1.27 ns BM_deque_deque_ranges_copy_backward/1 4.79 ns 4.78 ns BM_deque_deque_ranges_copy_backward/2 3.19 ns 3.19 ns BM_deque_deque_ranges_copy_backward/64 7.13 ns 7.04 ns BM_deque_deque_ranges_copy_backward/512 29.4 ns 29.4 ns BM_deque_deque_ranges_copy_backward/1024 58.2 ns 58.2 ns BM_deque_deque_ranges_copy_backward/4000 221 ns 221 ns BM_deque_deque_ranges_copy_backward/4096 227 ns 227 ns BM_deque_deque_ranges_copy_backward/5500 302 ns 304 ns BM_deque_deque_ranges_copy_backward/64000 7961 ns 8356 ns BM_deque_deque_ranges_copy_backward/65536 8818 ns 8913 ns BM_deque_deque_ranges_copy_backward/70000 9904 ns 9605 ns BM_vector_deque_copy_backward/0 0.638 ns 0.638 ns BM_vector_deque_copy_backward/1 4.15 ns 4.07 ns BM_vector_deque_copy_backward/2 2.55 ns 2.55 ns BM_vector_deque_copy_backward/64 6.11 ns 6.12 ns BM_vector_deque_copy_backward/512 28.3 ns 28.3 ns BM_vector_deque_copy_backward/1024 57.0 ns 57.0 ns BM_vector_deque_copy_backward/4000 215 ns 214 ns BM_vector_deque_copy_backward/4096 222 ns 221 ns BM_vector_deque_copy_backward/5500 296 ns 299 ns BM_vector_deque_copy_backward/64000 8013 ns 8780 ns BM_vector_deque_copy_backward/65536 9448 ns 9191 ns BM_vector_deque_copy_backward/70000 9345 ns 10010 ns BM_vector_deque_ranges_copy_backward/0 0.638 ns 0.626 ns BM_vector_deque_ranges_copy_backward/1 4.07 ns 4.14 ns BM_vector_deque_ranges_copy_backward/2 2.53 ns 2.56 ns BM_vector_deque_ranges_copy_backward/64 6.12 ns 6.11 ns BM_vector_deque_ranges_copy_backward/512 28.3 ns 28.4 ns BM_vector_deque_ranges_copy_backward/1024 57.0 ns 57.0 ns BM_vector_deque_ranges_copy_backward/4000 214 ns 216 ns BM_vector_deque_ranges_copy_backward/4096 221 ns 223 ns BM_vector_deque_ranges_copy_backward/5500 297 ns 293 ns BM_vector_deque_ranges_copy_backward/64000 8795 ns 9020 ns BM_vector_deque_ranges_copy_backward/65536 9478 ns 9255 ns BM_vector_deque_ranges_copy_backward/70000 9165 ns 8989 ns BM_deque_vector_move_backward/0 2.19 ns 2.23 ns BM_deque_vector_move_backward/1 4.46 ns 4.48 ns BM_deque_vector_move_backward/2 2.87 ns 2.86 ns BM_deque_vector_move_backward/64 6.01 ns 5.95 ns BM_deque_vector_move_backward/512 28.6 ns 28.4 ns BM_deque_vector_move_backward/1024 54.1 ns 54.1 ns BM_deque_vector_move_backward/4000 211 ns 213 ns BM_deque_vector_move_backward/4096 217 ns 219 ns BM_deque_vector_move_backward/5500 295 ns 293 ns BM_deque_vector_move_backward/64000 9059 ns 8965 ns BM_deque_vector_move_backward/65536 9081 ns 9236 ns BM_deque_vector_move_backward/70000 9029 ns 9900 ns BM_deque_vector_ranges_move_backward/0 2.19 ns 2.19 ns BM_deque_vector_ranges_move_backward/1 4.41 ns 4.47 ns BM_deque_vector_ranges_move_backward/2 2.87 ns 2.87 ns BM_deque_vector_ranges_move_backward/64 6.01 ns 5.94 ns BM_deque_vector_ranges_move_backward/512 28.6 ns 28.6 ns BM_deque_vector_ranges_move_backward/1024 54.2 ns 54.1 ns BM_deque_vector_ranges_move_backward/4000 211 ns 210 ns BM_deque_vector_ranges_move_backward/4096 218 ns 221 ns BM_deque_vector_ranges_move_backward/5500 294 ns 295 ns BM_deque_vector_ranges_move_backward/64000 8919 ns 8326 ns BM_deque_vector_ranges_move_backward/65536 9199 ns 8949 ns BM_deque_vector_ranges_move_backward/70000 9086 ns 9893 ns BM_deque_deque_move_backward/0 1.27 ns 1.28 ns BM_deque_deque_move_backward/1 4.72 ns 4.70 ns BM_deque_deque_move_backward/2 3.19 ns 3.18 ns BM_deque_deque_move_backward/64 7.13 ns 7.13 ns BM_deque_deque_move_backward/512 29.3 ns 29.2 ns BM_deque_deque_move_backward/1024 58.2 ns 58.3 ns BM_deque_deque_move_backward/4000 221 ns 220 ns BM_deque_deque_move_backward/4096 226 ns 226 ns BM_deque_deque_move_backward/5500 303 ns 303 ns BM_deque_deque_move_backward/64000 8372 ns 8382 ns BM_deque_deque_move_backward/65536 9157 ns 8910 ns BM_deque_deque_move_backward/70000 9703 ns 9593 ns BM_deque_deque_ranges_move_backward/0 1.27 ns 1.25 ns BM_deque_deque_ranges_move_backward/1 4.79 ns 4.70 ns BM_deque_deque_ranges_move_backward/2 3.19 ns 3.16 ns BM_deque_deque_ranges_move_backward/64 7.13 ns 7.13 ns BM_deque_deque_ranges_move_backward/512 29.3 ns 29.1 ns BM_deque_deque_ranges_move_backward/1024 58.8 ns 58.6 ns BM_deque_deque_ranges_move_backward/4000 221 ns 221 ns BM_deque_deque_ranges_move_backward/4096 226 ns 226 ns BM_deque_deque_ranges_move_backward/5500 304 ns 304 ns BM_deque_deque_ranges_move_backward/64000 7948 ns 8372 ns BM_deque_deque_ranges_move_backward/65536 8921 ns 9021 ns BM_deque_deque_ranges_move_backward/70000 9359 ns 9551 ns BM_vector_deque_move_backward/0 0.638 ns 0.638 ns BM_vector_deque_move_backward/1 4.15 ns 4.15 ns BM_vector_deque_move_backward/2 2.55 ns 2.53 ns BM_vector_deque_move_backward/64 6.09 ns 6.10 ns BM_vector_deque_move_backward/512 28.4 ns 28.4 ns BM_vector_deque_move_backward/1024 57.2 ns 56.6 ns BM_vector_deque_move_backward/4000 214 ns 213 ns BM_vector_deque_move_backward/4096 221 ns 222 ns BM_vector_deque_move_backward/5500 296 ns 299 ns BM_vector_deque_move_backward/64000 8854 ns 9105 ns BM_vector_deque_move_backward/65536 9196 ns 8751 ns BM_vector_deque_move_backward/70000 9860 ns 9946 ns BM_vector_deque_ranges_move_backward/0 0.638 ns 0.634 ns BM_vector_deque_ranges_move_backward/1 4.08 ns 4.15 ns BM_vector_deque_ranges_move_backward/2 2.55 ns 2.55 ns BM_vector_deque_ranges_move_backward/64 6.10 ns 6.11 ns BM_vector_deque_ranges_move_backward/512 28.4 ns 28.4 ns BM_vector_deque_ranges_move_backward/1024 57.1 ns 57.1 ns BM_vector_deque_ranges_move_backward/4000 214 ns 216 ns BM_vector_deque_ranges_move_backward/4096 222 ns 221 ns BM_vector_deque_ranges_move_backward/5500 297 ns 294 ns BM_vector_deque_ranges_move_backward/64000 8939 ns 9241 ns BM_vector_deque_ranges_move_backward/65536 9277 ns 8730 ns BM_vector_deque_ranges_move_backward/70000 9972 ns 9853 ns
I'm not too thrilled with this part of the changes. Is it possible to remove this helper struct and use a lambda instead?
If so the same for move.