This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Optimize for_each for segmented iterators
ClosedPublic

Authored by philnik on May 23 2023, 5:13 PM.

Details

Summary
---------------------------------------------------
Benchmark                       old             new
---------------------------------------------------
bm_for_each/1               3.00 ns         2.98 ns
bm_for_each/2               4.53 ns         4.57 ns
bm_for_each/3               5.82 ns         5.82 ns
bm_for_each/4               6.94 ns         6.91 ns
bm_for_each/5               7.55 ns         7.75 ns
bm_for_each/6               7.06 ns         7.45 ns
bm_for_each/7               6.69 ns         7.14 ns
bm_for_each/8               6.86 ns         4.06 ns
bm_for_each/16              11.5 ns         5.73 ns
bm_for_each/64              43.7 ns         4.06 ns
bm_for_each/512              356 ns         7.98 ns
bm_for_each/4096            2787 ns         53.6 ns
bm_for_each/32768          20836 ns          438 ns
bm_for_each/262144        195362 ns         4945 ns
bm_for_each/1048576       685482 ns        19822 ns

Diff Detail

Event Timeline

philnik created this revision.May 23 2023, 5:13 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2023, 5:13 PM
philnik requested review of this revision.May 23 2023, 5:13 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2023, 5:13 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript

Benchmark numbers:

---------------------------------------------------
Benchmark                       old             new
---------------------------------------------------
bm_for_each/1               3.00 ns         2.98 ns
bm_for_each/2               4.53 ns         4.57 ns
bm_for_each/3               5.82 ns         5.82 ns
bm_for_each/4               6.94 ns         6.91 ns
bm_for_each/5               7.55 ns         7.75 ns
bm_for_each/6               7.06 ns         7.45 ns
bm_for_each/7               6.69 ns         7.14 ns
bm_for_each/8               6.86 ns         4.06 ns
bm_for_each/16              11.5 ns         5.73 ns
bm_for_each/64              43.7 ns         4.06 ns
bm_for_each/512              356 ns         7.98 ns
bm_for_each/4096            2787 ns         53.6 ns
bm_for_each/32768          20836 ns          438 ns
bm_for_each/262144        195362 ns         4945 ns
bm_for_each/1048576       685482 ns        19822 ns

I think the benchmark numbers aren't in any way surprising (we can get similar numbers with similar patches for most algorithms). This patch is more focused on the actual implementation. This is only optimized for C++20-and-later as-is. The question is whether that is OK for the library. The implementation is a lot easier, but we essentially say "use the latest C++ version or you don't get nice optimizations".

ldionne requested changes to this revision.May 24 2023, 8:31 AM

LGTM but I'd like to see again with the test.

libcxx/benchmarks/algorithms/for_each.bench.cpp
14
libcxx/include/__algorithm/for_each.h
1

Can we have a test that checks specifically std::for_each on a segmented iterator. Basically we can have a libcxx/test/std test that runs for_each on deque and explains that we're testing the interaction of for_each with segmented iterators. It'll be valid (but perhaps not *relevant*) for all implementations.

38

I am fine with only optimizing this in C++20. This is QOI and we should be pushing folks towards newer Standards.

This revision now requires changes to proceed.May 24 2023, 8:31 AM

Benchmark numbers:

Can you put these numbers in the commit message? That way we can be certain they will be preserved even when Phab ever disappears.

This is only optimized for C++20-and-later as-is. The question is whether that is OK for the library. The implementation is a lot easier, but we essentially say "use the latest C++ version or you don't get nice optimizations".

I'm totally fine with that. If it's easy to backport the changes without ugly code that would be great, otherwise let's not do it.
In 10 years we still need to maintain this code and most users have probably moved to C++20 or later.

libcxx/include/__algorithm/for_each.h
38

+1

43

Seems this answers my lambda question in the other review ;-)

48
philnik edited the summary of this revision. (Show Details)May 24 2023, 12:48 PM
philnik edited the summary of this revision. (Show Details)
philnik updated this revision to Diff 525315.May 24 2023, 1:15 PM
philnik marked 5 inline comments as done.

Address comments

Mordante accepted this revision as: Mordante.May 24 2023, 2:08 PM

I'm happy, but since Louis had some remarks I leave the final approval to him.

ldionne accepted this revision.May 25 2023, 8:09 AM
ldionne added inline comments.
libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each.pass.cpp
51

There's a lot of logic crammed into the same expression otherwise.

This revision is now accepted and ready to land.May 25 2023, 8:09 AM
philnik updated this revision to Diff 525672.May 25 2023, 9:47 AM
  • Rebased
  • Fix formatting
This revision was automatically updated to reflect the committed changes.
sberg added a subscriber: sberg.Jun 5 2023, 4:39 AM

this causes

$ cat test.cc
#include <algorithm>
#include <deque>
void f(std::deque<int> & v) {
    std::for_each(v.begin(), v.end(), [&](int) {});
}
$ clang++ -std=c++20 -fsyntax-only test.cc
In file included from test.cc:1:
In file included from ~/llvm/inst/bin/../include/c++/v1/algorithm:1758:
~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each.h:39:12: error: object of type '(lambda at test.cc:4:39)' cannot be assigned because its copy assignment operator is implicitly deleted
   39 |     __func = std::for_each(__lfirst, __llast, std::move(__func));
      |            ^
~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each_segment.h:35:5: note: in instantiation of function template specialization 'std::for_each(std::__deque_iterator<int, int *, int &, int **, long>, std::__deque_iterator<int, int *, int &, int **, long>, (lambda at test.cc:4:39))::(anonymous class)::operator()<int *, int *>' requested here
   35 |     __func(_Traits::__local(__first), _Traits::__local(__last));
      |     ^
~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each.h:38:8: note: in instantiation of function template specialization 'std::__for_each_segment<std::__deque_iterator<int, int *, int &, int **, long>, (lambda at ~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each.h:38:44)>' requested here
   38 |   std::__for_each_segment(__first, __last, [&](auto __lfirst, auto __llast) {
      |        ^
test.cc:4:10: note: in instantiation of function template specialization 'std::for_each<std::__deque_iterator<int, int *, int &, int **, long>, (lambda at test.cc:4:39)>' requested here
    4 |     std::for_each(v.begin(), v.end(), [&](int) {});
      |          ^
test.cc:4:39: note: lambda expression begins here
    4 |     std::for_each(v.begin(), v.end(), [&](int) {});
      |                                       ^
In file included from test.cc:1:
In file included from ~/llvm/inst/bin/../include/c++/v1/algorithm:1743:
In file included from ~/llvm/inst/bin/../include/c++/v1/__algorithm/copy.h:13:
~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each_segment.h:40:3: error: no matching function for call to object of type '(lambda at ~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each.h:38:44)'
   40 |   __func(_Traits::__local(__first), _Traits::__end(__sfirst));
      |   ^~~~~~
~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each.h:38:8: note: in instantiation of function template specialization 'std::__for_each_segment<std::__deque_iterator<int, int *, int &, int **, long>, (lambda at ~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each.h:38:44)>' requested here
   38 |   std::__for_each_segment(__first, __last, [&](auto __lfirst, auto __llast) {
      |        ^
test.cc:4:10: note: in instantiation of function template specialization 'std::for_each<std::__deque_iterator<int, int *, int &, int **, long>, (lambda at test.cc:4:39)>' requested here
    4 |     std::for_each(v.begin(), v.end(), [&](int) {});
      |          ^
~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each.h:38:44: note: candidate template ignored: substitution failure [with __lfirst:auto = __local_iterator, __llast:auto = __local_iterator]
   38 |   std::__for_each_segment(__first, __last, [&](auto __lfirst, auto __llast) {
      |                                            ^
In file included from test.cc:1:
In file included from ~/llvm/inst/bin/../include/c++/v1/algorithm:1743:
In file included from ~/llvm/inst/bin/../include/c++/v1/__algorithm/copy.h:13:
~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each_segment.h:44:5: error: no matching function for call to object of type '(lambda at ~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each.h:38:44)'
   44 |     __func(_Traits::__begin(__sfirst), _Traits::__end(__sfirst));
      |     ^~~~~~
~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each.h:38:44: note: candidate template ignored: substitution failure [with __lfirst:auto = __local_iterator, __llast:auto = __local_iterator]
   38 |   std::__for_each_segment(__first, __last, [&](auto __lfirst, auto __llast) {
      |                                            ^
In file included from test.cc:1:
In file included from ~/llvm/inst/bin/../include/c++/v1/algorithm:1743:
In file included from ~/llvm/inst/bin/../include/c++/v1/__algorithm/copy.h:13:
~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each_segment.h:48:3: error: no matching function for call to object of type '(lambda at ~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each.h:38:44)'
   48 |   __func(_Traits::__begin(__sfirst), _Traits::__local(__last));
      |   ^~~~~~
~/llvm/inst/bin/../include/c++/v1/__algorithm/for_each.h:38:44: note: candidate template ignored: substitution failure [with __lfirst:auto = __local_iterator, __llast:auto = __local_iterator]
   38 |   std::__for_each_segment(__first, __last, [&](auto __lfirst, auto __llast) {
      |                                            ^
4 errors generated.

We seem to be using copy assignment of the function object but that isn't a requirement of the algorithm. I think this problem is on our side. @philnik we could do something like:

diff --git a/libcxx/include/__algorithm/for_each.h b/libcxx/include/__algorithm/for_each.h
index 5e273cf1b9b1..5afd25b337e7 100644
--- a/libcxx/include/__algorithm/for_each.h
+++ b/libcxx/include/__algorithm/for_each.h
@@ -35,10 +35,11 @@ template <class _SegmentedIterator, class _Function>
   requires __is_segmented_iterator<_SegmentedIterator>::value
 _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _Function
 for_each(_SegmentedIterator __first, _SegmentedIterator __last, _Function __func) {
+  std::__movable_box<_Function> __tmp(in_place, std::move(__func));
   std::__for_each_segment(__first, __last, [&](auto __lfirst, auto __llast) {
-    __func = std::for_each(__lfirst, __llast, std::move(__func));
+    __tmp = std::__movable_box<_Function>(in_place, std::for_each(__lfirst, __llast, std::move(*__tmp)));
   });
-  return __func;
+  return std::move(*__tmp);
 }
 #endif // _LIBCPP_STD_VER >= 20

I just spoke with @philnik and we're going to revert this patch until we land D151629 and apply something like the proposed diff above. We'll also add a test to make sure we'd catch this in the future.

asmok-g added a subscriber: asmok-g.Jun 5 2023, 8:31 AM
bgraur added a subscriber: bgraur.Jun 5 2023, 10:01 AM
philnik reopened this revision.Jun 5 2023, 10:03 AM
This revision is now accepted and ready to land.Jun 5 2023, 10:03 AM
ldionne requested changes to this revision.Sep 14 2023, 9:03 AM
ldionne added inline comments.
libcxx/include/__algorithm/for_each.h
38

Let's just do it for C++ >= 23 since it will require __movable_box, which is C++23.

libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each.pass.cpp
2

Not attached: We need to add a test for std::for_each where we pass a function object that isn't copy-constructible. They are supposed to work. They need to be move constructible though.

This revision now requires changes to proceed.Sep 14 2023, 9:03 AM
philnik updated this revision to Diff 557520.Oct 1 2023, 7:00 AM
philnik marked 3 inline comments as done.

Address comments

ldionne requested changes to this revision.Oct 5 2023, 8:59 AM

I think this will be LGTM after comments, but I'd like to take a quick look again.

libcxx/docs/ReleaseNotes/18.rst
56–57
libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each.pass.cpp
13

This seems wrong? It was there before, but it still seems incorrect.

14
41

We need tests with 0 and 1 length sequences at the very least.

69–73

Same here, we should test a couple of different sizes of deque, in particular around the size of underlying segments. And 0 and 1.

libcxx/test/tools/clang_tidy_checks/CMakeLists.txt
12 ↗(On Diff #557520)

I don't think this should be part of this patch.

This revision now requires changes to proceed.Oct 5 2023, 8:59 AM
philnik updated this revision to Diff 557836.Oct 22 2023, 5:14 AM
philnik marked 6 inline comments as done.

Address comments

ldionne added inline comments.Oct 23 2023, 6:04 PM
libcxx/include/__algorithm/for_each.h
42

We need to test constexpr for segmented iterators (especially since we've had bugs related to that recently).

libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each.pass.cpp
76

I think this range-based for-loop will break your C++03 CI run.

asmok-g removed a subscriber: asmok-g.Oct 23 2023, 11:39 PM
philnik updated this revision to Diff 557892.Oct 26 2023, 3:35 AM
philnik marked 2 inline comments as done.

Address comments

ldionne accepted this revision.Oct 26 2023, 8:40 AM

LGTM once CI is green. Thanks!

This revision is now accepted and ready to land.Oct 26 2023, 8:40 AM
philnik updated this revision to Diff 557908.Oct 27 2023, 2:26 AM

Try to fix CI

philnik updated this revision to Diff 557936.Oct 30 2023, 9:01 AM

Try to fix CI

This revision was automatically updated to reflect the committed changes.
libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each.pass.cpp