This is an archive of the discontinued LLVM Phabricator instance.

libcxx/include/__algorithm/pstl_for_each.h
50	I guess we could do an ADL call here and if that resolves, we use that, otherwise we use this implementation. There's still difficulties with the fact that we have both a `par` and an `unseq` backend, though.

I generally feel that we are better served with a overload set based on internal execution policy types. One problem I see is that we very, very fast are gonna get asked how to for example choose either SIMD or OpenMP based parallelism on a per invocation basis. The overload set based approach makes it easy to support this. Note we ARE allowed to ship implementation defined execution policies: "The semantics of parallel algorithms invoked with an execution policy object of implementation-defined type are implementation-defined." This is not undefined behavior, we just need to say what it does. That means for example we could ship std::omp_par std::omp_par_simd, std::gcd etc.. Or rather the LLVM-OpenMP project could ship std::omp_par together with a customization implementation of the algorithms. And AMD in their ROCm toolchain could add std::par_hip or something like that.

Then the only configuration decision is essentially what to map the mandated execution policies too.

Here is some code from our std::linalg prototype which does something like this:

c++
template<class .........>
void matrix_vector_product(
  ExecutionPolicy&& exec, mdspan<...> A, mdspan<...> x, mdspan<...> y)
{
  constexpr bool use_custom = is_custom_mat_vec_product_avail<
    decltype(execpolicy_mapper(exec)), decltype(A), decltype(x), decltype(y)>::value;

  if constexpr(use_custom) {
    matrix_vector_product(execpolicy_mapper(exec), A, x, y);
  } else {
    matrix_vector_product(std::experimental::linalg::impl::inline_exec_t(), A, x, y);
  }
}

Basically there are no real implementations which take any of the standard std::execution policies. We implement overloads with internal execution policies. The execpolicy_mapper(exec) will by default return exec, except for the official std::execution. policies which map to an internal one (right now for us its all mapping to
std::experimental::linalg::impl::inline_exec_t ). That actually has the advantage that you know what the internal impl does.

The is_custom_mat_vec_product_avail will check whether an overload is visible for the provided args. In linalg that allows vendor plugins to for example only provide implementations for the scalar types they got (like the fortran BLAS). That latter point probably doesn't matter to PSTL.

So a vendor shipping LLVM would be able to modify the mapper to let say std::execution::par_unseq map to hip::gpu_exec or whatever. They then can still only provide the overloads their customers asked for, while the other stuff will fallback to the default impl.

A non-llvm-shipper like Kokkos can also provide their own overloads, but they would only be called if you call std::linalg::matrix_vector_product(Kokkos::exec_policy, ...);

The last little piece not up there is that one would probably want to not just call inline_exec_t() in the else branch (if more than one internal implementation exists) but maybe check what public thing the handed exec policy is convertible too and then call the corresponding internal thing of that. I.e. you get Kokkos::exec_policy but no overload Kokkos::sort exists, however Kokkos::exec_policy is convertible to std::execution::par_unseq, so you could call impl::par_unseq overload or so.

Oh if you like to I am happy to draft a different revision sketching out the above approach similar to the one here.

In D149686#4313646, @crtrott wrote:

I generally feel that we are better served with a overload set based on internal execution policy types. One problem I see is that we very, very fast are gonna get asked how to for example choose either SIMD or OpenMP based parallelism on a per invocation basis.

I don't understand what distinction you are making here. You can use OpenMP to generate SIMD instructions or you can use it to multi-thread your code, but there is no SIMD or OpenMP.

The overload set based approach makes it easy to support this. Note we ARE allowed to ship implementation defined execution policies: "The semantics of parallel algorithms invoked with an execution policy object of implementation-defined type are implementation-defined." This is not undefined behavior, we just need to say what it does. That means for example we could ship std::omp_par std::omp_par_simd, std::gcd etc.. Or rather the LLVM-OpenMP project could ship std::omp_par together with a customization implementation of the algorithms. And AMD in their ROCm toolchain could add std::par_hip or something like that.

That's a lot more complicated than it seems to be at first. There are a few problems I see:

is_execution_policy_v<std::omp_par> has to return true for implementation-defined execution policies, so we wouldn't be able to use it inside the implementation of the algorithms
We would have to push/pop macros everywhere to avoid users #defineing our implementation non-conforming
using OpenMP, std::threads, GCD and whatever other mechanism together seems to defeat the purpose of the interface. At least as I understand it, the idea is to have an interface which brings you >90% of the way compared to writing it by hand.
I don't really see the purpose of having the choice of selecting the backend. If there is a significant performance improvement, the implementation should be tuned instead of having the user try out X different backends to find the best one.
This kind of customization is there in the PSTL, but it never materialized, so I don't think it's actually asked for that much

BTW the "implementation-defined" only forces us to document this. If it weren't mentioned, it would still be allowed for an implementation to support other execution policies, it just wouldn't have to be documented.

I don't understand what distinction you are making here. You can use OpenMP to generate SIMD instructions or you can use it to multi-thread your code, but there is no SIMD or OpenMP.

The current draft thingy has macros to at configure time decide whether to use PSTL_UNSEQ_BACKEND_SIMD or PSTL_UNSEQ_BACKEND_SERIAL my example was just riffing off that.
But even if we just talk OpenMP there is a difference between #pragma omp parallel for and #pragma omp parallel for simd. Now we could just have that map to std::par and std::par_unseq if OpenMP is detected as enabled,
but I certainly don't want folks to need to install to versions of libcxx - one with OpenMP enabled and one with it disabled. We don't install to versions of clang for this right now.
Furthermore, in our use cases one would enable for example in ROCM HIP and OpenMP at the same time. So now users want to decide on a case by case basis whether a for_each runs with OpenMP "parallel for" "parallel for simd" or "hip_gpu". And both the "omp parallel for simd" and the "hip_gpu" presumably have the semantic meaning of "par_unseq".

That's a lot more complicated than it seems to be at first. There are a few problems I see:

is_execution_policy_v<std::omp_par> has to return true for implementation-defined execution policies, so we wouldn't be able to use it inside the implementation of the algorithms

That is what the mapper to fully internal exec policies is for, and there would never be actual implementations using std::execution::par etc.. Those ones would always hit the "dispatch" overload, not an implementation overload.

We would have to push/pop macros everywhere to avoid users #defineing our implementation non-conforming

How so? At most we would need to have a check function in each dispatch function which tests whether the execution policy is an allowed one in "no-implementation-defined-behavior" mode, and so that one check function needs an ifdef.

using OpenMP, std::threads, GCD and whatever other mechanism together seems to defeat the purpose of the interface. At least as I understand it, the idea is to have an interface which brings you >90% of the way compared to writing it by hand.

I don't really see the purpose of having the choice of selecting the backend. If there is a significant performance improvement, the implementation should be tuned instead of having the user try out X different backends to find the best one.

But for different uses of the same algorithm (i.e. different callable, different number of iterations) a different backend will be optimal. There is no way to automatically choose that optimally on the backend side.

This kind of customization is there in the PSTL, but it never materialized, so I don't think it's actually asked for that much

Nobody really worked on this, partly because we didn't have the right people or time to do this. We (as in the HPC community and explicitly DOE) had to get ready for Exascale and there wasn't a clear path for us how to do that with pstl. Now however we are shipping implementations of pstl (in our own namespaces). Kokkos, Intel's OneAPI and NVIDIA are all doing their own version. One common thing across all of them is that we pass in stateful execution policies, because users need to provide some information in an important subset of cases. On our side probably 75% of use cases get away with defaults and then there are some which don't. More importantly we got more than one par_unseq equivalent users choose from.

BTW the "implementation-defined" only forces us to document this. If it weren't mentioned, it would still be allowed for an implementation to support other execution policies, it just wouldn't have to be documented.

Agreed, however this wording means we are allowed to have other execution policies for which is_execution_policy is true. I don't think users are allowed to provide a specialization of is_execution_policy.

Update after today's discussion

Herald added a subscriber: miyuki. · View Herald TranscriptMay 8 2023, 1:10 PM

Harbormaster completed remote builds in B230696: Diff 520470.May 8 2023, 2:08 PM

Update per discussion with @philnik

ldionne added inline comments.May 8 2023, 2:14 PM

libcxx/include/__algorithm/pstl_for_each.h
39	As @philnik pointed out, we actually can't use ADL here because our code needs to be "robust-against-adl" (tm).

Harbormaster completed remote builds in B230703: Diff 520482.May 8 2023, 3:10 PM

philnik commandeered this revision.May 9 2023, 10:18 AM

philnik added a reviewer: ldionne.

Updated approach

I like this. I think this answers all the constraints we had determined yesterday.

libcxx/include/__algorithm/pstl_backend.h
30–42
67–91	# if defined(_PSTL_PAR_BACKEND_STD_THREAD) \|\| defined(_PSTL_PAR_BACKEND_GCD) \|\| defined(_PSTL_PAR_BACKEND_TBB) \|\| defined(_PSTL_PAR_BACKEND_SERIAL) # include <__algorithm/pstl_backends/cpu_backend.h> template <> struct __select_backend<std::parallel_policy> { using type = __cpu_backend; }; template <> struct __select_backend<std::parallel_unsequenced_policy> { using type = __cpu_backend; }; #elif defined(_PSTL_PAR_BACKEND_SOME_FUNKY_GPU) # include <__algorithm/pstl_backends/funky_gpu_backend.h> template <> struct __select_backend<std::parallel_policy> { using type = __funky_gpu_backend; }; template <> struct __select_backend<std::parallel_unsequenced_policy> { using type = __funky_gpu_backend; }; # else // ...New vendors can add parallel backends here... # error "Invalid choice of a PSTL parallel backend" # endif
libcxx/include/__algorithm/pstl_backends/cpu_backend.h
18–22 ↗	(On Diff #520740)	#ifdef _LIBCPP_HAS_NO_THREADS # include <__algorithm/pstl_backends/cpu_backends/serial.h> #elif defined(_PSTL_PAR_BACKEND_STD_THREAD) # include <__algorithm/pstl_backends/cpu_backends/thread.h> #elif defined(_PSTL_PAR_BACKEND_GCD) # include <__algorithm/pstl_backends/cpu_backends/gcd.h> #elif defined(_PSTL_PAR_BACKEND_TBB) # include <__algorithm/pstl_backends/cpu_backends/tbb.h> #elif defined(_PSTL_PAR_BACKEND_SERIAL) # include <__algorithm/pstl_backends/cpu_backends/serial.h> #else # error "Invalid backend choice for a CPU backend" #endif
libcxx/include/__algorithm/pstl_for_each.h
52

ldionne added inline comments.May 9 2023, 10:53 AM

libcxx/include/__algorithm/pstl_for_each.h
51	We could do this in C++17: auto __for_each_n_test = [](auto&& ...args) -> void_t<decltype(std::__pstl_for_each_n<_RawPolicy>(args...))> {}; if constexpr (__is_valid(__for_each_n_test, _Backend{}, __first, __size, __func)) { // ... } else { // ... } Where: template <typename _Func, typename ..._Args, typename = decltype( std::declval<_Func&&>()(std::declval<_Args&&>()...) )> constexpr bool __is_valid_impl(int) { return true; } template <typename _Func, typename ..._Args> constexpr bool __is_valid_impl(...) { return false; } template <typename _Func, typename ..._Args> constexpr bool __is_valid(_Func&&, _Args&& ...) { return __is_valid_impl<_Func&&, _Args&&...>(int{}); } You might run into issues with `__is_valid(__for_each_n_test, _Backend{}, __first, __size, __func)` not being a constant expression because you are passing references to function arguments. Not sure if it'll be a problem cause you never actually read them. But if it is, then you can switch to returning `-> auto std::true_c{}` and `-> auto std::false_c{}` from your `__is_valid` function, and then you call it like: if constexpr (decltype(__is_valid(as-before)){}) { // ... } OK, not quite as nice, but it works around the constexpr issue. If you don't like this, you can also try to pass the argument types directly instead of the arguments themselves, like `__is_valid<decltype(__for_each_n_test), args...>()`. I'm not sure I quite like this, but it's an option on the table.

Refactor to making this a proper patch (and not a draft anymore)

Harbormaster completed remote builds in B230933: Diff 520782.May 9 2023, 12:00 PM

philnik added a parent revision: D150217: [libc++][PSTL] Move the remaining configuration into __config.May 9 2023, 1:00 PM

philnik retitled this revision from [libc++][DISCUSSION] Exploring PSTL backend customization points to [libc++][PSTL] Add more specialized backend customization points.May 9 2023, 1:03 PM

philnik edited the summary of this revision. (Show Details)

Try to fix CI

Update wording

Design updated after trying to implement a few more algorithms

Fix the implementation

Harbormaster completed remote builds in B230980: Diff 520842.May 9 2023, 5:23 PM

Generate files

Harbormaster completed remote builds in B231093: Diff 520988.May 10 2023, 7:55 AM

Try to fix CI

Harbormaster completed remote builds in B231099: Diff 520995.May 10 2023, 8:09 AM

I think this is an excellent start. Then we can clean up some stuff, improve comments and move all the existing algorithms to this approach.

It turns out that once we get to the CPU backend, we basically do what the original PSTL did -- we're really just adding an additional layer of customizability on top for backends where the par/non-par split might not make sense. LGTM w/ green CI and comments addressed.

libcxx/include/__algorithm/pstl_backend.h
46–47	Let's move this to `pstl_for_each.h`, it seems to belong there more than here. We can also add `// declaration needed for the frontend dispatch below`.
libcxx/include/__algorithm/pstl_backends/cpu_backends/for_each.h
35 ↗	(On Diff #520995)	`_HIDE_FROM_ABI`
libcxx/include/__algorithm/pstl_for_each.h
65	Nit but I don't think you can `move(__first)` here since you are then using `__first + __size`.
libcxx/include/__type_traits/is_execution_policy.h
47 ↗	(On Diff #520995)	`// TODO: Remove default argument once algorithms are using the new backend dispatching`

This revision is now accepted and ready to land.May 10 2023, 8:45 AM

Try to fix CI

philnik added a child revision: D150277: [libc++][PSTL] Move the already implemented functions to the new dispatching scheme.May 10 2023, 9:26 AM

Harbormaster completed remote builds in B231108: Diff 521013.May 10 2023, 10:33 AM

Try to fix CI

Herald added a subscriber: arichardson. · View Herald TranscriptMay 10 2023, 10:47 AM

I did not look it over super careful, but I agree that this looks like it can do the things I was most interested in having available. So I am good with merging this.

Harbormaster completed remote builds in B231125: Diff 521036.May 10 2023, 1:05 PM

Next try

Harbormaster completed remote builds in B231158: Diff 521083.May 10 2023, 3:43 PM

Try to fix CI

Harbormaster completed remote builds in B231197: Diff 521128.May 10 2023, 5:12 PM

Try to fix CI

Fix formatting

Next try

Harbormaster completed remote builds in B231369: Diff 521350.May 11 2023, 1:06 PM

This revision was landed with ongoing or failed builds.May 11 2023, 1:54 PM

Closed by commit rG8e2d09c33938: [libc++][PSTL] Add more specialized backend customization points (authored by ldionne, committed by philnik). · Explain Why

This revision was automatically updated to reflect the committed changes.

philnik added a commit: rG8e2d09c33938: [libc++][PSTL] Add more specialized backend customization points.

Revision Contents

Path

Size

libcxx/

include/

__algorithm/

pstl_backend.h

91 lines

pstl_backends/

57 lines

57 lines

40 lines

50 lines

Diff 520482

libcxx/include/__algorithm/pstl_backend.h

This file was added.

//===----------------------------------------------------------------------===//

// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.

// See https://llvm.org/LICENSE.txt for license information.

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

//===----------------------------------------------------------------------===//

#ifndef _LIBCPP___ALGORITHM_PSTL_BACKEND_H

#define _LIBCPP___ALGORITHM_PSTL_BACKEND_H

#include <__algorithm/pstl_backends/serial.h>

#include <__algorithm/pstl_backends/simd.h>

#include <__config>

#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)

# pragma GCC system_header

#endif

#if defined(_LIBCPP_HAS_PARALLEL_ALGORITHMS) && _LIBCPP_STD_VER >= 17

_LIBCPP_BEGIN_NAMESPACE_STD

TODO: Documentation of how backends work

A PSTL parallel backend is a tag type to which the following functions are associated, at minimum:

template <class _ExecutionPolicy, class _Iterator, class _Func>

void __for_each(_Backend, _ExecutionPolicy&&, _Iterator __first, _Iterator __last, _Func __f);

template <class _ExecutionPolicy, class _Iterator, class _Tp, class _BinOp>

_Tp __reduce(_Backend, _ExecutionPolicy&&, _Iterator __first, _Iterator __last, _Tp const& __value, _BinOp __op);

etc...

The following functions are optional but can be provided. If provided, they are used by the corresponding

algorithms, otherwise they are implemented in terms of the basis operations mentioned above:

template <class _ExecutionPolicy, class _Iterator, class _Size, class _Func>

void __for_each_n(_Backend, _ExecutionPolicy&&, _Iterator __first, _Size __n, _Func __f);

ldionneUnsubmitted

Done

A PSTL parallel backend is a tag type to which the following functions are associated, at minimum:

template <class _ExecutionPolicy, class _Iterator, class _Func>

- void __for_each(_Backend, _ExecutionPolicy&&, _Iterator __first, _Iterator __last, _Func __f);

+ void __pstl_for_each(_Backend, _ExecutionPolicy&&, _Iterator __first, _Iterator __last, _Func __f);

template <class _ExecutionPolicy, class _Iterator, class _Tp, class _BinOp>

- _Tp __reduce(_Backend, _ExecutionPolicy&&, _Iterator __first, _Iterator __last, _Tp const& __value, _BinOp __op);

+ _Tp __pstl_reduce(_Backend, _ExecutionPolicy&&, _Iterator __first, _Iterator __last, _Tp const& __value, _BinOp __op);

etc...

The following functions are optional but can be provided. If provided, they are used by the corresponding

algorithms, otherwise they are implemented in terms of the basis operations mentioned above:

template <class _ExecutionPolicy, class _Iterator, class _Size, class _Func>

- void __for_each_n(_Backend, _ExecutionPolicy&&, _Iterator __first, _Size __n, _Func __f);

+ void __pstl_for_each_n(_Backend, _ExecutionPolicy&&, _Iterator __first, _Size __n, _Func __f);

etc...

ldionne:

etc...

template <class _ExecutionPolicy>

ldionneUnsubmitted

Done

Let's move this to pstl_for_each.h, it seems to belong there more than here. We can also add // declaration needed for the frontend dispatch below.

ldionne: Let's move this to `pstl_for_each.h`, it seems to belong there more than here. We can also add…

struct __select_backend;

template <>

struct __select_backend<std::sequenced_policy> {

using type = __serial_backend;

};

template <>

struct __select_backend<std::unsequenced_policy> {

using type = __simd_backend;

};

# if defined(_PSTL_PAR_BACKEND_STD_THREAD)

# include <__algorithm/pstl_backends/thread.h>

template <> struct __select_backend<std::parallel_policy> { using type = __std_thread_backend; };

template <> struct __select_backend<std::parallel_unsequenced_policy> { using type = __std_thread_backend; };

# elif defined(_PSTL_PAR_BACKEND_GCD)

# include <__algorithm/pstl_backends/gcd.h>

template <> struct __select_backend<std::parallel_policy> { using type = __gcd_backend; };

template <> struct __select_backend<std::parallel_unsequenced_policy> { using type = __gcd_backend; };

# elif defined(_PSTL_PAR_BACKEND_TBB)

# include <__algorithm/pstl_backends/tbb.h>

template <> struct __select_backend<std::parallel_policy> { using type = __tbb_backend; };

template <> struct __select_backend<std::parallel_unsequenced_policy> { using type = __tbb_backend; };

# elif defined(_PSTL_PAR_BACKEND_SERIAL)

# include <__algorithm/pstl_backends/serial.h>

template <> struct __select_backend<std::parallel_policy> { using type = __serial_backend; };

template <> struct __select_backend<std::parallel_unsequenced_policy> { using type = __serial_backend; };

# else

// ...New vendors can add parallel backends here...

# error "Invalid choice of a PSTL parallel backend"

# endif

_LIBCPP_END_NAMESPACE_STD

#endif // defined(_LIBCPP_HAS_PARALLEL_ALGORITHMS) && _LIBCPP_STD_VER >= 17

#endif // _LIBCPP___ALGORITHM_PSTL_BACKEND_H

ldionneUnsubmitted

Done

#  if defined(_PSTL_PAR_BACKEND_STD_THREAD) || defined(_PSTL_PAR_BACKEND_GCD) || defined(_PSTL_PAR_BACKEND_TBB) || defined(_PSTL_PAR_BACKEND_SERIAL)
#    include <__algorithm/pstl_backends/cpu_backend.h>
template <> struct __select_backend<std::parallel_policy> { using type = __cpu_backend; };
template <> struct __select_backend<std::parallel_unsequenced_policy> { using type = __cpu_backend; };

#elif defined(_PSTL_PAR_BACKEND_SOME_FUNKY_GPU)
#    include <__algorithm/pstl_backends/funky_gpu_backend.h>
template <> struct __select_backend<std::parallel_policy> { using type = __funky_gpu_backend; };
template <> struct __select_backend<std::parallel_unsequenced_policy> { using type = __funky_gpu_backend; };

#  else

// ...New vendors can add parallel backends here...

#    error "Invalid choice of a PSTL parallel backend"
#  endif

ldionne: ``` # if defined(_PSTL_PAR_BACKEND_STD_THREAD) || defined(_PSTL_PAR_BACKEND_GCD) || defined…

libcxx/include/__algorithm/pstl_backends/serial.h

This file was added.

				// -- C++ --
				//===----------------------------------------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef _LIBCPP___ALGORITHM_PSTL_BACKENDS_SERIAL_H
				#define _LIBCPP___ALGORITHM_PSTL_BACKENDS_SERIAL_H

				#include <__config>
				#include <__algorithm/for_each.h>
				#include <__numeric/reduce.h>

				#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
				# pragma GCC system_header
				#endif

				#if defined(_LIBCPP_HAS_PARALLEL_ALGORITHMS) && _LIBCPP_STD_VER >= 17

				_LIBCPP_BEGIN_NAMESPACE_STD

				struct __serial_backend { };

				//
				// Mandatory customization points
				//
				template <class _ExecutionPolicy, class _Iterator, class _Fp>
				_LIBCPP_HIDE_FROM_ABI
				void __pstl_for_each(__serial_backend, _Iterator __first, _Iterator __last, _Fp __f) {
				std::for_each(__first, __last, __f);
				}

				template <class _ExecutionPolicy, class _Iterator, class _Tp, class _BinOp>
				_LIBCPP_HIDE_FROM_ABI
				_Tp __pstl_reduce(__serial_backend, _Iterator __first, _Iterator __last, _Tp const& __value, _BinOp __op) {
				return std::reduce(__first, __last, __value, __op);
				}

				// etc...

				//
				// Optional customization points
				//
				template <class _ExecutionPolicy, class _Iterator, class _Size, class _Fp>
				_LIBCPP_HIDE_FROM_ABI
				void __pstl_for_each_n(__serial_backend, _Iterator __first, _Size __n, _Fp __f) {
				std::for_each_n(__first, __n, __f);
				}

				_LIBCPP_END_NAMESPACE_STD

				#endif // defined(_LIBCPP_HAS_PARALLEL_ALGORITHMS) && _LIBCPP_STD_VER >= 17

				#endif // _LIBCPP___ALGORITHM_PSTL_BACKENDS_SERIAL_H

libcxx/include/__algorithm/pstl_backends/thread.h

This file was added.

				// -- C++ --
				//===----------------------------------------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef _LIBCPP___ALGORITHM_PSTL_BACKENDS_THREAD_H
				#define _LIBCPP___ALGORITHM_PSTL_BACKENDS_THREAD_H

				#include <__config>
				#include <__algorithm/for_each.h>
				#include <__numeric/reduce.h>

				#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
				# pragma GCC system_header
				#endif

				#if defined(_LIBCPP_HAS_PARALLEL_ALGORITHMS) && _LIBCPP_STD_VER >= 17

				_LIBCPP_BEGIN_NAMESPACE_STD

				struct __thread_backend { };

				//
				// Mandatory customization points
				//
				template <class _ExecutionPolicy, class _Iterator, class _Fp>
				_LIBCPP_HIDE_FROM_ABI
				void __pstl_for_each(__thread_backend, _Iterator __first, _Iterator __last, _Fp __f) {
				static_assert(__is_parallel_policy_v<_ExecutionPolicy>);
				auto& __thread_pool = __get_thread_pool();
				std::__terminate_on_exception([&] {
				// chunk up the range and dispatch it to the threads
				for (TODO) {
				__thread_pool.__dispatch([&] {
				__simd_utils::__for_each<_ExecutionPolicy>(__brick_first, __brick_last, __f);
				});
				}
				});
				}

				template <class _ExecutionPolicy, class _Iterator, class _Tp, class _BinOp>
				_LIBCPP_HIDE_FROM_ABI
				_Tp __pstl_reduce(__thread_backend, _Iterator __first, _Iterator __last, _Tp const& __value, _BinOp __op) {
				// TODO: Implement
				}

				// etc...

				_LIBCPP_END_NAMESPACE_STD

				#endif // defined(_LIBCPP_HAS_PARALLEL_ALGORITHMS) && _LIBCPP_STD_VER >= 17

				#endif // _LIBCPP___ALGORITHM_PSTL_BACKENDS_THREAD_H

libcxx/include/__algorithm/pstl_for_each.h

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

// //

// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.

// See https://llvm.org/LICENSE.txt for license information. // See https://llvm.org/LICENSE.txt for license information.

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

// //

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

#ifndef _LIBCPP___ALGORITHM_PSTL_FOR_EACH_H #ifndef _LIBCPP___ALGORITHM_PSTL_FOR_EACH_H

#define _LIBCPP___ALGORITHM_PSTL_FOR_EACH_H #define _LIBCPP___ALGORITHM_PSTL_FOR_EACH_H

#include <__algorithm/pstl_backend.h>

#include <__algorithm/for_each.h> #include <__algorithm/for_each.h>

#include <__algorithm/for_each_n.h> #include <__algorithm/for_each_n.h>

#include <__config> #include <__config>

#include <__iterator/iterator_traits.h> #include <__iterator/iterator_traits.h>

#include <__pstl/internal/parallel_backend.h> #include <__pstl/internal/parallel_backend.h>

#include <__pstl/internal/unseq_backend_simd.h> #include <__pstl/internal/unseq_backend_simd.h>

#include <__type_traits/is_execution_policy.h> #include <__type_traits/is_execution_policy.h>

#include <__type_traits/remove_cvref.h> #include <__type_traits/remove_cvref.h>

#include <__utility/terminate_on_exception.h> #include <__utility/terminate_on_exception.h>

#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER) #if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)

# pragma GCC system_header # pragma GCC system_header

#endif #endif

#if !defined(_LIBCPP_HAS_NO_INCOMPLETE_PSTL) && _LIBCPP_STD_VER >= 17 #if !defined(_LIBCPP_HAS_NO_INCOMPLETE_PSTL) && _LIBCPP_STD_VER >= 17

_LIBCPP_BEGIN_NAMESPACE_STD _LIBCPP_BEGIN_NAMESPACE_STD

template <class _ExecutionPolicy, template <class _ExecutionPolicy,

class _ForwardIterator, class _ForwardIterator,

class _Function, class _Function,

enable_if_t<is_execution_policy_v<__remove_cvref_t<_ExecutionPolicy>>, int> = 0> class _RawPolicy = __remove_cvref_t<_ExecutionPolicy>,

enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>

_LIBCPP_HIDE_FROM_ABI void _LIBCPP_HIDE_FROM_ABI void

for_each(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Function __func) { for_each(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Function __func) {

if constexpr (__is_parallel_execution_policy_v<_ExecutionPolicy> && using _Backend = typename __select_backend<_RawPolicy>::type;

__is_cpp17_random_access_iterator<_ForwardIterator>::value) { __pstl_for_each<_RawPolicy>(_Backend{}, std::move(__first), std::move(__last), std::move(__func)); // TODO: Can't do ADL

ldionneUnsubmitted

Done

As @philnik pointed out, we actually can't use ADL here because our code needs to be "robust-against-adl" (tm).

ldionne: As @philnik pointed out, we actually can't use ADL here because our code needs to be "robust…

std::__terminate_on_exception([&] {

__pstl::__par_backend::__parallel_for(

{},

__policy,

__first,

__last,

[&__policy, __func](_ForwardIterator __brick_first, _ForwardIterator __brick_last) {

std::for_each(std::__remove_parallel_policy(__policy), __brick_first, __brick_last, __func);

});

} else if constexpr (__is_unsequenced_execution_policy_v<_ExecutionPolicy> &&

__is_cpp17_random_access_iterator<_ForwardIterator>::value) {

__pstl::__unseq_backend::__simd_walk_1(__first, __last - __first, __func);

} else {

std::for_each(__first, __last, __func);

}

} }

template <class _ExecutionPolicy, template <class _ExecutionPolicy,

class _ForwardIterator, class _ForwardIterator,

class _Size, class _Size,

class _Function, class _Function,

enable_if_t<is_execution_policy_v<__remove_cvref_t<_ExecutionPolicy>>, int> = 0> class _RawPolicy = __remove_cvref_t<_ExecutionPolicy>,

enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>

_LIBCPP_HIDE_FROM_ABI void _LIBCPP_HIDE_FROM_ABI void

for_each_n(_ExecutionPolicy&& __policy, _ForwardIterator __first, _Size __size, _Function __func) { for_each_n(_ExecutionPolicy&& __policy, _ForwardIterator __first, _Size __size, _Function __func) {

if constexpr (__is_cpp17_random_access_iterator<_ForwardIterator>::value) { using _Backend = typename __select_backend<_RawPolicy>::type;

ldionneUnsubmitted

Done

I guess we could do an ADL call here and if that resolves, we use that, otherwise we use this implementation. There's still difficulties with the fact that we have both a par and an unseq backend, though.

ldionne: I guess we could do an ADL call here and if that resolves, we use that, otherwise we use this…

if constexpr (HAS-FOR_EACH_N(_Backend)) {

ldionneUnsubmitted

Done

We could do this in C++17:

auto __for_each_n_test = [](auto&& ...args) -> void_t<decltype(std::__pstl_for_each_n<_RawPolicy>(args...))> {};
if constexpr (__is_valid(__for_each_n_test, _Backend{}, __first, __size, __func)) {
  // ...
} else {
  // ...
}

Where:

template <typename _Func, typename ..._Args, typename = decltype(
  std::declval<_Func&&>()(std::declval<_Args&&>()...)
)>
constexpr bool __is_valid_impl(int) { return true; }

template <typename _Func, typename ..._Args>
constexpr bool __is_valid_impl(...) { return false; }

template <typename _Func, typename ..._Args>
constexpr bool __is_valid(_Func&&, _Args&& ...) {
  return __is_valid_impl<_Func&&, _Args&&...>(int{});
}

You might run into issues with __is_valid(__for_each_n_test, _Backend{}, __first, __size, __func) not being a constant expression because you are passing references to function arguments. Not sure if it'll be a problem cause you never actually read them. But if it is, then you can switch to returning -> auto std::true_c{} and -> auto std::false_c{} from your __is_valid function, and then you call it like:

if constexpr (decltype(__is_valid(as-before)){}) {
  // ...
}

OK, not quite as nice, but it works around the constexpr issue.

If you don't like this, you can also try to pass the argument types directly instead of the arguments themselves, like __is_valid<decltype(__for_each_n_test), args...>(). I'm not sure I quite like this, but it's an option on the table.

ldionne: We could do this in C++17: ``` auto __for_each_n_test = [](auto&& ...args) -> void_t<decltype…

__pstl_for_each_n<_RawPolicy>(_Backend{}, std::move(__first), __size, std::move(__func));

ldionneUnsubmitted

Done

if constexpr (requires {std::__pstl_for_each_n(_Backend{}, __first, __size, __func); }) {

- __pstl_for_each_n<_RawPolicy>(_Backend{}, std::move(__first), __size, std::move(__func));

+ std::__pstl_for_each_n<_RawPolicy>(_Backend{}, std::move(__first), __size, std::move(__func));

} else {

ldionne:

} else {

// Default implementation

if constexpr (__is_cpp17_random_access_iterator<_ForwardIterator>::value &&

__is_parallel_execution_policy_v<_RawPolicy>) {

std::for_each(__policy, __first, __first + __size, __func); std::for_each(__policy, __first, __first + __size, __func);

} else { } else {

std::for_each_n(__first, __size, __func); std::for_each_n(__first, __size, __func);

} }

}

_LIBCPP_END_NAMESPACE_STD _LIBCPP_END_NAMESPACE_STD

ldionneUnsubmitted

Done

Nit but I don't think you can move(__first) here since you are then using __first + __size.

ldionne: Nit but I don't think you can `move(__first)` here since you are then using `__first + __size`.

#endif // !defined(_LIBCPP_HAS_NO_INCOMPLETE_PSTL) && _LIBCPP_STD_VER >= 17 #endif // !defined(_LIBCPP_HAS_NO_INCOMPLETE_PSTL) && _LIBCPP_STD_VER >= 17

#endif // _LIBCPP___ALGORITHM_PSTL_FOR_EACH_H #endif // _LIBCPP___ALGORITHM_PSTL_FOR_EACH_H

libcxx/include/__algorithm/pstl_simd_utils.h

This file was added.

				// -- C++ --
				//===----------------------------------------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef _LIBCPP___ALGORITHM_PSTL_SIMD_UTILS_H
				#define _LIBCPP___ALGORITHM_PSTL_SIMD_UTILS_H

				#include <__config>
				#include <__algorithm/for_each.h>
				#include <__numeric/reduce.h>

				#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
				# pragma GCC system_header
				#endif

				#if defined(_LIBCPP_HAS_PARALLEL_ALGORITHMS) && _LIBCPP_STD_VER >= 17

				_LIBCPP_BEGIN_NAMESPACE_STD

				namespace __simd_utils {
				template <class _ExecutionPolicy, class _ForwardIterator, class _Func>
				_LIBCPP_HIDE_FROM_ABI
				void __for_each(_ForwardIterator __first, _ForwardIterator __last, _Func __f) {
				if constexpr (__is_unsequenced_policy_v<_ExecutionPolicy> &&
				__is_cpp17_random_access_iterator<_ForwardIterator>::value) {
				__simd_walk_1(__first, __last - __first, __f);
				} else {
				std::for_each(__first, __last, __f);
				}
				}

				template <class _ExecutionPolicy, class _Iterator, class _Tp, class _BinOp>
				_LIBCPP_HIDE_FROM_ABI
				_Tp __reduce(_Iterator __first, _Iterator __last, _Tp const& __value, _BinOp __op) {
				// TODO: Implement
				}
				}

				// etc...

				_LIBCPP_END_NAMESPACE_STD

				#endif // defined(_LIBCPP_HAS_PARALLEL_ALGORITHMS) && _LIBCPP_STD_VER >= 17

				#endif // _LIBCPP___ALGORITHM_PSTL_SIMD_UTILS_H

This is an archive of the discontinued LLVM Phabricator instance.

[libc++][PSTL] Add more specialized backend customization pointsClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 520482

libcxx/include/__algorithm/pstl_backend.h

libcxx/include/__algorithm/pstl_backends/serial.h

libcxx/include/__algorithm/pstl_backends/thread.h

libcxx/include/__algorithm/pstl_for_each.h

libcxx/include/__algorithm/pstl_simd_utils.h

[libc++][PSTL] Add more specialized backend customization points
ClosedPublic