This is an archive of the discontinued LLVM Phabricator instance.

[pstl] Add a serial backend for the PSTL
ClosedPublic

Authored by ldionne on Mar 25 2019, 12:13 PM.

Details

Summary

The serial backend performs all tasks serially and does not require
threads. It does not have any dependencies beyond normal C++, but
it is not very efficient either.

Event Timeline

ldionne created this revision.Mar 25 2019, 12:13 PM

This is a WIP patch for adding a serial backend. This is a work in progress because I strongly suspect I'm not implementing the backend correctly, and I'm looking for guidance on how to do it correctly. One of the difficulties in implementing this was the lack of documentation for the parameters of each function constituting the backend. For example, I have no idea what __real_body is in

parallel_reduce(_ExecutionPolicy&&, _Index __first, _Index __last, const _Value& __identity, const _RealBody& __real_body, const _Reduction&)

so I tried guessing from the existing code. I feel it would be useful to document the backend API properly and I'm willing to do it, but I need help from @MikeDvorskiy and others to achieve that.

MikeDvorskiy added a comment.EditedMar 26 2019, 2:27 AM

Hi Louis,

Actually, there is a draft of the Parallel STL back-end API documentation. It's probably not very detailed, but it exists. I sent it to Thomas several months ago, and I've just sent it to you as well.
Maybe it makes sense to put it into the repo?
And of course, don't hesitate to write to me directly about Parallel STL first, to avoid wasting time on guessing.

In essence, I think we don't need to write a special serial back-end if we want serial execution under a parallel policy. According to the PSTL design, we can just add compile-time dispatching at the "patterns of bricks" design level, which will redirect to the serial patterns. See "is_parallelization_preferred".
Furthermore, we can do it with just one change within "is_parallelization_preferred", or even within the parallel policy traits. But I lean towards "is_parallelization_preferred", to avoid modifying the policy traits.

For example,

template <typename _ExecutionPolicy, typename... _IteratorTypes>
auto is_parallelization_preferred(_ExecutionPolicy&& __exec)
    -> decltype(lazy_and(__exec.__allow_parallel(), typename is_random_access_iterator<_IteratorTypes...>::type()))
{
#if __PSTL_PAR_BACKEND_SERIAL
    return std::false_type();
#else
    return internal::lazy_and(__exec.__allow_parallel(), typename is_random_access_iterator<_IteratorTypes...>::type());
#endif
}

> Actually, there is a draft of Parallel STL back-end API documentation. [...]
> In essence. I think we don't need to write a special serial back-end if we want a serial execution by a parallel policy. [...] See "is_parallelization_preferred". [...]

Per today's discussion, there IS interest for having a serial backend from multiple vendors. I think we all agreed that this was worth implementing, now I have to figure out how to implement it properly.

> I think we all agreed that this was worth implementing

Yes, we agreed. I'm ready to explain any details if you need.

ldionne marked 3 inline comments as done.Apr 9 2019, 10:32 AM

> I think we all agreed that this was worth implementing
>
> Yes, we agreed. I'm ready to explain any details if you need.

See the questions in this review.

pstl/include/pstl/internal/parallel_backend_serial.h
86

So, I strongly suspect this implementation is incorrect. Can you draft what a correct serial implementation would be?

94

Is __nsort the number of elements that should be sorted in the resulting range? In other words, this is a partial sort where you want to get the n smallest elements of the whole range in sorted order.

It's not clear to me how to achieve this without a __leaf_sort that itself accepts an n to only partially sort the first n elements. I mean, I could probably find a way to do it iteratively by sorting a bit more every time, but what I really want is to just call std::partial_sort. And actually, while we're at it, calling std::partial_sort isn't enough, since this sort needs to be stable, but we don't have a std::stable_partial_sort.

106

Note that the obvious implementation doesn't work:

std::merge(__first1, __last1, __first2, __last2, __out, __comp);

because it requires the elements to be copyable, but pstl apparently expects the merge to only move elements around without copying them.

MikeDvorskiy added inline comments.Apr 10 2019, 4:38 AM
pstl/include/pstl/internal/parallel_backend_serial.h
94

Actually, __leaf_sort is not only std::sort or std::partial_sort.
__leaf_sort (as a lambda) has already captured the information about "sort" vs. "partial sort": the __n variable.
See the code of the lambda passed into __parallel_stable_sort:

[__n](_RandomAccessIterator __begin, _RandomAccessIterator __end, _Compare __comp) {
                if (__n < __end - __begin)
                    std::partial_sort(__begin, __begin + __n, __end, __comp);
                else
                    std::sort(__begin, __end, __comp);
            },

So in the serial case you don't use the __nsort parameter here, because the special lambda already takes the "sort" vs. "partial sort" logic into account.

In a parallel implementation the n parameter may be useful, e.g. for merging just the first n elements of two sorted sub-ranges.

106

Why doesn't it work?

In the serial special case,
...

if (__n <= __merge_cut_off)
{
    // Fall back on serial merge
    __leaf_merge(__xs, __xe, __ys, __ye, __zs, __comp);
}

...
where __leaf_merge is std::merge.

ldionne marked 3 inline comments as done.Apr 10 2019, 1:55 PM
ldionne added a subscriber: wash.

I'd love to get concrete feedback on whether the implementation of the algorithms is wrong, and if so, why. This would be really helpful in understanding the backend API better and would allow me to make progress on implementing this backend, which is a pre-requisite for many other things.

Also note that I don't expect this backend implementation to be final after this commit. I just want us to get something "correct" so as to nail down the semantics of the backend functions and make progress on other tasks. I strongly suspect that we'll have to make changes to the backend API to get a more straightforward serial implementation, but now's not the time to tackle this.

pstl/include/pstl/internal/parallel_backend_serial.h
86

This is the only thing I strongly suspect is incorrectly implemented. Can you please check this?

94

Ah, I missed the bit in the lambda. So this implementation is correct, then. Thanks!

106

Like I said, std::merge requires elements to be copyable, but the PSTL tests call parallel_merge with elements that are move-only. It also looks like you jumped through some hoops to make this work; see __serial_move_merge in parallel_backend_utils.h.

However, since it looks like calling __leaf_merge is a valid implementation, let's revisit this copyability issue at another point. I want to land this patch ASAP because other stuff depends on it.

In case you're curious, the error message for copyability looks like this:

<toolchain>/usr/include/c++/v1/algorithm:4392:23: error: object of type 'LocalWrapper<float>' cannot be assigned because its copy assignment operator is implicitly deleted
            *__result = *__first2;
                      ^
<toolchain>/usr/include/c++/v1/algorithm:4416:19: note: in instantiation of function template specialization 'std::__1::__merge<std::__1::less<LocalWrapper<float> > &, std::__1::__wrap_iter<LocalWrapper<float> *>, std::__1::__wrap_iter<LocalWrapper<float> *>, LocalWrapper<float> *>' requested here
    return _VSTD::__merge<_Comp_ref>(__first1, __last1, __first2, __last2, __result, __comp);
                  ^
<pstl-root>/include/pstl/internal/parallel_backend_serial.h:117:10: note: in instantiation of function template specialization 'std::__1::merge<std::__1::__wrap_iter<LocalWrapper<float> *>, std::__1::__wrap_iter<LocalWrapper<float> *>, LocalWrapper<float> *, std::__1::less<LocalWrapper<float> > >' requested here
    std::merge(__first1, __last1, __first2, __last2, __out, __comp);
         ^
<pstl-root>/include/pstl/internal/algorithm_impl.h:2667:24: note: in instantiation of function template specialization '__pstl::__serial::__parallel_merge<const __pstl::execution::v1::parallel_policy &, std::__1::__wrap_iter<LocalWrapper<float> *>, std::__1::__wrap_iter<LocalWrapper<float> *>, LocalWrapper<float> *, std::__1::less<LocalWrapper<float> >, (lambda at <pstl-root>/include/pstl/internal/algorithm_impl.h:2669:13)>' requested here
        __par_backend::__parallel_merge(
                       ^
<pstl-root>/include/pstl/internal/glue_algorithm_impl.h:899:17: note: in instantiation of function template specialization '__pstl::__internal::__pattern_inplace_merge<const __pstl::execution::v1::parallel_policy &, std::__1::__wrap_iter<LocalWrapper<float> *>, std::__1::less<LocalWrapper<float> >, std::__1::integral_constant<bool, false> >' requested here
    __internal::__pattern_inplace_merge(
                ^
<pstl-root>/test/std/algorithms/alg.merge/inplace_merge.pass.cpp:63:14: note: in instantiation of function template specialization 'std::inplace_merge<const __pstl::execution::v1::parallel_policy &, std::__1::__wrap_iter<LocalWrapper<float> *>, std::__1::less<LocalWrapper<float> > >' requested here
        std::inplace_merge(exec, first2, mid2, last2, comp);
             ^
<pstl-root>/test/support/utils.h:757:9: note: (skipping 2 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
        op(std::forward<Rest>(rest)...);
        ^
...
ldionne updated this revision to Diff 194585.Apr 10 2019, 1:55 PM
ldionne removed a subscriber: wash.

Rebase on top of master

Herald added a project: Restricted Project. · View Herald TranscriptApr 10 2019, 1:55 PM
MikeDvorskiy added inline comments.Apr 11 2019, 5:25 AM
pstl/include/pstl/internal/parallel_backend_serial.h
106
  1. Yes, std::merge requires copyable elements.

Actually, all of the "move" operations in "__serial_move_merge" reduce to "copy" operations if move semantics are not provided.

But you are right: if we pass a non-const iterator into std::merge and the value type has non-trivial move semantics, we get the wrong effect. It can be quickly fixed in "__parallel_merge". I'll do it.

  2. But the compiler diagnostic shown here is about the "std::inplace_merge" algorithm. That algorithm requires only move semantics, and the test checks exactly that.

The problem is that "__pattern_inplace_merge" re-uses "__par_backend::__parallel_merge" because there is no dedicated "__par_backend::__parallel_inplace_merge" API. I think it makes sense to add a "__par_backend::__parallel_inplace_merge" API (and move the relevant code from "__pattern_inplace_merge" into "__par_backend"). I'll do it. After that, your serial back-end can just call the serial "std::inplace_merge".

Thanks for raising the issues.

MikeDvorskiy added inline comments.Apr 11 2019, 9:01 AM
pstl/include/pstl/internal/parallel_backend_serial.h
86

I saw your question about __parallel_strict_scan as well.
I need additional time to work out the right serial code here.
I will try to answer today or tomorrow.

MikeDvorskiy added inline comments.Apr 12 2019, 9:14 AM
pstl/include/pstl/internal/parallel_backend_serial.h
86

Yes, this serial version is right for __parallel_strict_scan.

Is there anything to fix in this patch? If not, and if that implementation is the correct one given today's backend API, I'd like to check this in so we can do followup cleanup. It'll also help me as I try to see if a more general backend API is possible because I'll have both the TBB and the serial (trivial) examples to work off of.

MikeDvorskiy accepted this revision.Apr 18 2019, 10:34 AM

If the problem with the failing test "inplace_merge.pass.cpp" is still present, I'll investigate and fix it in a new patch on top of your changes.

This revision is now accepted and ready to land.Apr 18 2019, 10:34 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 18 2019, 11:19 AM