The serial backend performs all tasks serially and does not require
threads. It has no dependencies beyond standard C++, but it is not
very efficient either.
This is a WIP patch for adding a serial backend. This is a work in progress because I strongly suspect I'm not implementing the backend correctly, and I'm looking for guidance on how to do it correctly. One of the difficulties in implementing this was the lack of documentation for the parameters of each function constituting the backend. For example, I have no idea what __real_body is in
```cpp
parallel_reduce(_ExecutionPolicy&&, _Index __first, _Index __last, const _Value& __identity,
                const _RealBody& __real_body, const _Reduction&)
```
so I tried guessing from the existing code. I feel it would be useful to document the backend API properly and I'm willing to do it, but I need help from @MikeDvorskiy and others to achieve that.
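For example, a minimal serial sketch along those lines, assuming __real_body is a callable that reduces the sub-range [__first, __last) starting from a given initial value (that reading is a guess from the existing code, not documented behavior), would be:

```cpp
template <class _ExecutionPolicy, class _Index, class _Value, class _RealBody, class _Reduction>
_Value
parallel_reduce(_ExecutionPolicy&&, _Index __first, _Index __last, const _Value& __identity,
                const _RealBody& __real_body, const _Reduction&)
{
    // Assumption: __real_body(__first, __last, __init) serially reduces the whole
    // sub-range starting from __init, so a single call covers the entire range and
    // the _Reduction combiner (presumably used to join parallel chunks) is unused.
    if (__first == __last)
        return __identity;
    return __real_body(__first, __last, __identity);
}
```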
Hi Louis,
Actually, there is a draft of the Parallel STL back-end API documentation. It is probably not very detailed, but it exists. I sent it to Thomas several months ago, and I've just sent it to you as well.
Maybe it makes sense to put it into the repo?
And of course, don't hesitate to write to me directly about Parallel STL first, to avoid wasting time on guessing.
In essence, I think we don't need to write a special serial back-end if we want serial execution under a parallel policy. According to the PSTL design, we can just add compile-time dispatching at the "pattern of bricks" design level, which redirects to the serial patterns. See "is_parallelization_preferred".
Furthermore, we could make just one change within "is_parallelization_preferred", or even within the parallel policy traits. But I lean toward "is_parallelization_preferred" and would avoid modifying the policy traits.
For example,
```cpp
template <typename _ExecutionPolicy, typename... _IteratorTypes>
auto
is_parallelization_preferred(_ExecutionPolicy&& __exec)
    -> decltype(lazy_and(__exec.__allow_parallel(),
                         typename is_random_access_iterator<_IteratorTypes...>::type()))
{
#if __PSTL_PAR_BACKEND_SERIAL
    return std::false_type();
#else
    return internal::lazy_and(__exec.__allow_parallel(),
                              typename is_random_access_iterator<_IteratorTypes...>::type());
#endif
}
```
Per today's discussion, there is interest in having a serial backend from multiple vendors. I think we all agreed that this was worth implementing; now I have to figure out how to implement it properly.
I think we all agreed that this was worth implementing
Yes, we agreed. I'm ready to explain any details if you need them.
See questions in this review.
pstl/include/pstl/internal/parallel_backend_serial.h

Line 86: So, I strongly suspect this implementation is incorrect. Can you draft what a correct serial implementation would be?

Line 94: Is __nsort the number of elements that should be sorted in the resulting range? In other words, is this a partial sort where you want the n smallest elements of the whole range in sorted order? It's not clear to me how to achieve that without a __leaf_sort that itself accepts an n to only partially sort the first n elements. I could probably find a way to do it iteratively by sorting a bit more every time, but what I really want is to just call std::partial_sort. And while we're at it, calling std::partial_sort isn't enough, since this sort needs to be stable, and we don't have a std::stable_partial_sort.

Line 106: Note that the obvious implementation, std::merge(__first1, __last1, __first2, __last2, __out, __comp), doesn't work: it requires the elements to be copyable, but pstl apparently expects the merge to only move elements around without copying them.
pstl/include/pstl/internal/parallel_backend_serial.h

Line 94: Actually, __leaf_sort is not just std::sort or std::partial_sort:

```cpp
[__n](_RandomAccessIterator __begin, _RandomAccessIterator __end, _Compare __comp)
{
    if (__n < __end - __begin)
        std::partial_sort(__begin, __begin + __n, __end, __comp);
    else
        std::sort(__begin, __end, __comp);
}
```

So in the serial case you don't use the __nsort parameter here, because this lambda already encapsulates the "sort" vs. "partial sort" logic (see the sketch after these replies). In a parallel implementation the n parameter may be useful, e.g. for merging just the first n elements of two sorted sub-ranges.

Line 106: Why "doesn't work"? In the special case:

```cpp
if (__n <= __merge_cut_off)
{
    // Fall back on serial merge
    __leaf_merge(__xs, __xe, __ys, __ye, __zs, __comp);
}
...
```
I'd love to get concrete feedback on whether the implementation of the algorithms is wrong, and if so, why. This would be really helpful in understanding the backend API better and would allow me to make progress on implementing this backend, which is a pre-requisite for many other things.
Also note that I don't expect this backend implementation to be final after this commit. I just want us to get something "correct" so as to nail down the semantics of the backend functions and make progress on other tasks. I strongly suspect that we'll have to make changes to the backend API to get a more straightforward serial implementation, but now's not the time to tackle this.
pstl/include/pstl/internal/parallel_backend_serial.h

Line 86: This is the only thing I strongly suspect is incorrectly implemented. Can you please check this?

Line 94: Ah, I missed the bit in the lambda. So this implementation is correct, then. Thanks!

Line 106: Like I said, std::merge requires elements to be copyable, but the PSTL tests call parallel_merge with elements that are move-only. It also looks like you guys went through some hoops to make this work, see __serial_move_merge in parallel_backend_utils.h. However, since it looks like calling __leaf_merge is a valid implementation, let's revisit this copyability issue at another point. I want to land this patch ASAP because other stuff depends on it. In case you're curious, the error message for copyability looks like this:

```
<toolchain>/usr/include/c++/v1/algorithm:4392:23: error: object of type 'LocalWrapper<float>' cannot be assigned because its copy assignment operator is implicitly deleted
    *__result = *__first2;
              ^
<toolchain>/usr/include/c++/v1/algorithm:4416:19: note: in instantiation of function template specialization 'std::__1::__merge<std::__1::less<LocalWrapper<float> > &, std::__1::__wrap_iter<LocalWrapper<float> *>, std::__1::__wrap_iter<LocalWrapper<float> *>, LocalWrapper<float> *>' requested here
    return _VSTD::__merge<_Comp_ref>(__first1, __last1, __first2, __last2, __result, __comp);
                  ^
<pstl-root>/include/pstl/internal/parallel_backend_serial.h:117:10: note: in instantiation of function template specialization 'std::__1::merge<std::__1::__wrap_iter<LocalWrapper<float> *>, std::__1::__wrap_iter<LocalWrapper<float> *>, LocalWrapper<float> *, std::__1::less<LocalWrapper<float> > >' requested here
    std::merge(__first1, __last1, __first2, __last2, __out, __comp);
         ^
<pstl-root>/include/pstl/internal/algorithm_impl.h:2667:24: note: in instantiation of function template specialization '__pstl::__serial::__parallel_merge<const __pstl::execution::v1::parallel_policy &, std::__1::__wrap_iter<LocalWrapper<float> *>, std::__1::__wrap_iter<LocalWrapper<float> *>, LocalWrapper<float> *, std::__1::less<LocalWrapper<float> >, (lambda at <pstl-root>/include/pstl/internal/algorithm_impl.h:2669:13)>' requested here
    __par_backend::__parallel_merge(
    ^
<pstl-root>/include/pstl/internal/glue_algorithm_impl.h:899:17: note: in instantiation of function template specialization '__pstl::__internal::__pattern_inplace_merge<const __pstl::execution::v1::parallel_policy &, std::__1::__wrap_iter<LocalWrapper<float> *>, std::__1::less<LocalWrapper<float> >, std::__1::integral_constant<bool, false> >' requested here
    __internal::__pattern_inplace_merge(
    ^
<pstl-root>/test/std/algorithms/alg.merge/inplace_merge.pass.cpp:63:14: note: in instantiation of function template specialization 'std::inplace_merge<const __pstl::execution::v1::parallel_policy &, std::__1::__wrap_iter<LocalWrapper<float> *>, std::__1::less<LocalWrapper<float> > >' requested here
    std::inplace_merge(exec, first2, mid2, last2, comp);
         ^
<pstl-root>/test/support/utils.h:757:9: note: (skipping 2 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
    op(std::forward<Rest>(rest)...);
    ^
...
```
pstl/include/pstl/internal/parallel_backend_serial.h

Line 106: Actually, all the "move" operations in __serial_move_merge reduce to "copy" operations if the value type does not provide move semantics. But you are right: if we pass a non-const iterator into std::merge and the value type has non-trivial move semantics, we get the wrong effect.
The problem is that __pattern_inplace_merge re-uses __par_backend::__parallel_merge because there is no dedicated __par_backend::__parallel_inplace_merge API. I think it makes sense to add a __parallel_inplace_merge API (and move the relevant code from __pattern_inplace_merge into __par_backend). I'll do it. After that, your serial back-end can just call the serial std::inplace_merge. Thanks for raising the issues.
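In the meantime, since forwarding to __leaf_merge is apparently a valid implementation, the serial __parallel_merge can presumably be a plain delegation. A minimal sketch, with the signature guessed from the call sites quoted in the error backtrace above rather than from documentation:

```cpp
template <class _ExecutionPolicy, class _RandomAccessIterator1, class _RandomAccessIterator2,
          class _OutputIterator, class _Compare, class _LeafMerge>
void
__parallel_merge(_ExecutionPolicy&&, _RandomAccessIterator1 __first1, _RandomAccessIterator1 __last1,
                 _RandomAccessIterator2 __first2, _RandomAccessIterator2 __last2, _OutputIterator __out,
                 _Compare __comp, _LeafMerge __leaf_merge)
{
    // __leaf_merge is the serial "brick" supplied by the caller; unlike a direct
    // std::merge call, it already copes with move-only value types, so the serial
    // back-end can delegate the whole range to it in one call.
    __leaf_merge(__first1, __last1, __first2, __last2, __out, __comp);
}
```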
pstl/include/pstl/internal/parallel_backend_serial.h

Line 86: I see your question about strict_scan as well.
pstl/include/pstl/internal/parallel_backend_serial.h

Line 86: Yes, this serial version is right for __parallel_strict_scan.
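The serial version under discussion is not quoted in this thread. A plausible shape for it, assuming __reduce produces the partial result of an index sub-range, __combine joins two partial results, __apex is handed the grand total, and __scan writes the output of a sub-range given its starting value (these parameter roles are inferred from how the TBB back-end uses them, not from documentation), would be:

```cpp
template <class _ExecutionPolicy, class _Index, class _Tp, class _Rp, class _Cp, class _Sp, class _Ap>
void
__parallel_strict_scan(_ExecutionPolicy&&, _Index __n, _Tp __initial, _Rp __reduce, _Cp __combine,
                       _Sp __scan, _Ap __apex)
{
    // Assumed roles (not documented): __reduce(i, len) reduces a sub-range,
    // __combine joins two partial results, __apex observes the grand total,
    // and __scan(i, len, init) writes the scan output for a sub-range whose
    // running value starts at 'init'.
    _Tp __sum = __initial;
    if (__n)
        __sum = __combine(__initial, __reduce(_Index(0), __n));
    __apex(__sum);
    if (__n)
        __scan(_Index(0), __n, __initial);
}
```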
Is there anything to fix in this patch? If not, and if that implementation is the correct one given today's backend API, I'd like to check this in so we can do followup cleanup. It'll also help me as I try to see if a more general backend API is possible because I'll have both the TBB and the serial (trivial) examples to work off of.
If the problem with the inplace_merge.pass.cpp test is still present, I'll investigate and fix it with a new patch on top of your changes.