--------------------------------------------------- Benchmark old new --------------------------------------------------- bm_for_each/1 3.00 ns 2.98 ns bm_for_each/2 4.53 ns 4.57 ns bm_for_each/3 5.82 ns 5.82 ns bm_for_each/4 6.94 ns 6.91 ns bm_for_each/5 7.55 ns 7.75 ns bm_for_each/6 7.06 ns 7.45 ns bm_for_each/7 6.69 ns 7.14 ns bm_for_each/8 6.86 ns 4.06 ns bm_for_each/16 11.5 ns 5.73 ns bm_for_each/64 43.7 ns 4.06 ns bm_for_each/512 356 ns 7.98 ns bm_for_each/4096 2787 ns 53.6 ns bm_for_each/32768 20836 ns 438 ns bm_for_each/262144 195362 ns 4945 ns bm_for_each/1048576 685482 ns 19822 ns
Details
- Reviewers
- ldionne - Mordante - var-const - huixie90 - jloser 
- Group Reviewers
- Restricted Project 
- Commits
- rGc81bfc61da7e: [libc++] Optimize for_each for segmented iterators
 rGb1dc43aa3a05: [libc++] Optimize for_each for segmented iterators
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Benchmark numbers:
--------------------------------------------------- Benchmark old new --------------------------------------------------- bm_for_each/1 3.00 ns 2.98 ns bm_for_each/2 4.53 ns 4.57 ns bm_for_each/3 5.82 ns 5.82 ns bm_for_each/4 6.94 ns 6.91 ns bm_for_each/5 7.55 ns 7.75 ns bm_for_each/6 7.06 ns 7.45 ns bm_for_each/7 6.69 ns 7.14 ns bm_for_each/8 6.86 ns 4.06 ns bm_for_each/16 11.5 ns 5.73 ns bm_for_each/64 43.7 ns 4.06 ns bm_for_each/512 356 ns 7.98 ns bm_for_each/4096 2787 ns 53.6 ns bm_for_each/32768 20836 ns 438 ns bm_for_each/262144 195362 ns 4945 ns bm_for_each/1048576 685482 ns 19822 ns
I think the benchmark numbers aren't in any way surprising (we can get similar numbers with similar patches for most algorithms). This patch is more focused on the actual implementation. This is only optimized for C++20-and-later as-is. The question is whether that is OK for the library. The implementation is a lot easier, but we essentially say "use the latest C++ version or you don't get nice optimizations".
LGTM but I'd like to see again with the test.
| libcxx/benchmarks/algorithms/for_each.bench.cpp | ||
|---|---|---|
| 14 | ||
| libcxx/include/__algorithm/for_each.h | ||
| 1 | Can we have a test that checks specifically std::for_each on a segmented iterator. Basically we can have a libcxx/test/std test that runs for_each on deque and explains that we're testing the interaction of for_each with segmented iterators. It'll be valid (but perhaps not *relevant*) for all implementations. | |
| 33 | I am fine with only optimizing this in C++20. This is QOI and we should be pushing folks towards newer Standards. | |
Can you put these numbers in the commit message? That way we can be certain they will be preserved even when Phab ever disappears.
This is only optimized for C++20-and-later as-is. The question is whether that is OK for the library. The implementation is a lot easier, but we essentially say "use the latest C++ version or you don't get nice optimizations".
I'm totally fine with that. If it's easy to backport the changes without ugly code that would be great, otherwise let's not do it.
In 10 years we still need to maintain this code and most users have probably moved to C++20 or later.
| libcxx/include/__algorithm/for_each.h | ||
|---|---|---|
| 33 | +1 | |
| 38 | Seems this answers my lambda question in the other review ;-) | |
| 43 | ||
| libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each.pass.cpp | ||
|---|---|---|
| 51 | There's a lot of logic crammed into the same expression otherwise. | |
this causes
$ cat test.cc #include <algorithm> #include <deque> void f(std::deque<int> & v) { std::for_each(v.begin(), v.end(), [&](int) {}); }
$ clang++ -std=c++20 -fsyntax-only test.cc In file included from test.cc:1: In file included from ~/llvm/inst/bin/../include/c++/v1/algorithm:1758: ~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each.h:39:12: error: object of type '(lambda at test.cc:4:39)' cannot be assigned because its copy assignment operator is implicitly deleted 39 | __func = std::for_each(__lfirst, __llast, std::move(__func)); | ^ ~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each_segment.h:35:5: note: in instantiation of function template specialization 'std::for_each(std::__deque_iterator<int, int *, int &, int **, long>, std::__deque_iterator<int, int *, int &, int **, long>, (lambda at test.cc:4:39))::(anonymous class)::operator()<int *, int *>' requested here 35 | __func(_Traits::__local(__first), _Traits::__local(__last)); | ^ ~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each.h:38:8: note: in instantiation of function template specialization 'std::__for_each_segment<std::__deque_iterator<int, int *, int &, int **, long>, (lambda at ~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each.h:38:44)>' requested here 38 | std::__for_each_segment(__first, __last, [&](auto __lfirst, auto __llast) { | ^ test.cc:4:10: note: in instantiation of function template specialization 'std::for_each<std::__deque_iterator<int, int *, int &, int **, long>, (lambda at test.cc:4:39)>' requested here 4 | std::for_each(v.begin(), v.end(), [&](int) {}); | ^ test.cc:4:39: note: lambda expression begins here 4 | std::for_each(v.begin(), v.end(), [&](int) {}); | ^ In file included from test.cc:1: In file included from ~/llvm/inst/bin/../include/c++/v1/algorithm:1743: In file included from ~/llvm/inst/bin/../include/c++/v1/__algorithm/copy.h:13: ~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each_segment.h:40:3: error: no matching function for call to object of type '(lambda at ~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each.h:38:44)' 40 | __func(_Traits::__local(__first), _Traits::__end(__sfirst)); | ^~~~~~ ~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each.h:38:8: note: in instantiation of function template specialization 'std::__for_each_segment<std::__deque_iterator<int, int *, int &, int **, long>, (lambda at ~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each.h:38:44)>' requested here 38 | std::__for_each_segment(__first, __last, [&](auto __lfirst, auto __llast) { | ^ test.cc:4:10: note: in instantiation of function template specialization 'std::for_each<std::__deque_iterator<int, int *, int &, int **, long>, (lambda at test.cc:4:39)>' requested here 4 | std::for_each(v.begin(), v.end(), [&](int) {}); | ^ ~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each.h:38:44: note: candidate template ignored: substitution failure [with __lfirst:auto = __local_iterator, __llast:auto = __local_iterator] 38 | std::__for_each_segment(__first, __last, [&](auto __lfirst, auto __llast) { | ^ In file included from test.cc:1: In file included from ~/llvm/inst/bin/../include/c++/v1/algorithm:1743: In file included from ~/llvm/inst/bin/../include/c++/v1/__algorithm/copy.h:13: ~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each_segment.h:44:5: error: no matching function for call to object of type '(lambda at ~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each.h:38:44)' 44 | __func(_Traits::__begin(__sfirst), _Traits::__end(__sfirst)); | ^~~~~~ ~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each.h:38:44: note: candidate template ignored: substitution failure [with __lfirst:auto = __local_iterator, __llast:auto = __local_iterator] 38 | std::__for_each_segment(__first, __last, [&](auto __lfirst, auto __llast) { | ^ In file included from test.cc:1: In file included from ~/llvm/inst/bin/../include/c++/v1/algorithm:1743: In file included from ~/llvm/inst/bin/../include/c++/v1/__algorithm/copy.h:13: ~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each_segment.h:48:3: error: no matching function for call to object of type '(lambda at ~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each.h:38:44)' 48 | __func(_Traits::__begin(__sfirst), _Traits::__local(__last)); | ^~~~~~ ~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each.h:38:44: note: candidate template ignored: substitution failure [with __lfirst:auto = __local_iterator, __llast:auto = __local_iterator] 38 | std::__for_each_segment(__first, __last, [&](auto __lfirst, auto __llast) { | ^ 4 errors generated.
We seem to be using copy assignment of the function object but that isn't a requirement of the algorithm. I think this problem is on our side. @philnik we could do something like:
diff --git a/libcxx/include/__algorithm/for_each.h b/libcxx/include/__algorithm/for_each.h index 5e273cf1b9b1..5afd25b337e7 100644 --- a/libcxx/include/__algorithm/for_each.h +++ b/libcxx/include/__algorithm/for_each.h @@ -35,10 +35,11 @@ template <class _SegmentedIterator, class _Function> requires __is_segmented_iterator<_SegmentedIterator>::value _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _Function for_each(_SegmentedIterator __first, _SegmentedIterator __last, _Function __func) { + std::__movable_box<_Function> __tmp(in_place, std::move(__func)); std::__for_each_segment(__first, __last, [&](auto __lfirst, auto __llast) { - __func = std::for_each(__lfirst, __llast, std::move(__func)); + __tmp = std::__movable_box<_Function>(in_place, std::for_each(__lfirst, __llast, std::move(*__tmp))); }); - return __func; + return std::move(*__tmp); } #endif // _LIBCPP_STD_VER >= 20
| libcxx/include/__algorithm/for_each.h | ||
|---|---|---|
| 33 | Let's just do it for C++ >= 23 since it will require __movable_box, which is C++23. | |
| libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each.pass.cpp | ||
| 1 | Not attached: We need to add a test for std::for_each where we pass a function object that isn't copy-constructible. They are supposed to work. They need to be move constructible though. | |
I think this will be LGTM after comments, but I'd like to take a quick look again.
| libcxx/docs/ReleaseNotes/18.rst | ||
|---|---|---|
| 60–61 ↗ | (On Diff #557520) | |
| libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each.pass.cpp | ||
| 13 | This seems wrong? It was there before, but it still seems incorrect. | |
| 14 | ||
| 41 | We need tests with 0 and 1 length sequences at the very least. | |
| 69–73 | Same here, we should test a couple of different sizes of deque, in particular around the size of underlying segments. And 0 and 1. | |
| libcxx/test/tools/clang_tidy_checks/CMakeLists.txt | ||
| 12 ↗ | (On Diff #557520) | I don't think this should be part of this patch. |