This is an archive of the discontinued LLVM Phabricator instance.

Implement inclusive_scan and transform_inclusive_scan
AbandonedPublic

Authored by mclow.lists on Jun 7 2017, 12:16 PM.

Details

Reviewers
EricWF
wash
Summary

Like D33997, this implements the non-parallel versions of these algorithms

D33997 implemented reduce and transform_reduce, this adds inclusive_scan and transform_inclusive_scan.

There will be another patch that adds exclusive_scan and transform_exclusive_scan

Diff Detail

Event Timeline

mclow.lists created this revision.Jun 7 2017, 12:16 PM

Re-reading this, I may have implemented exclusive_scan instead of inclusive_scan here.

mclow.lists abandoned this revision.Jun 7 2017, 3:51 PM

I don't think that this is a correct implementation. Also, I need tests for when the result overwrites the source.
As they say .. I'll be back :-)

wash edited edge metadata.Jun 8 2017, 3:33 PM

Minor note: there's a mix of tabs and spaces in this diff.

wash added a comment.EditedJun 8 2017, 3:35 PM

So, the inclusive_scan overloads that do not take an init parameter should be equivalent to partial_sum for the non-parallel version.

Here's partial_sum:

template <class _InputIterator, class _OutputIterator, class _BinaryOperation>
inline _LIBCPP_INLINE_VISIBILITY
_OutputIterator
partial_sum(_InputIterator __first, _InputIterator __last, _OutputIterator __result,
            _BinaryOperation __binary_op)
{
    if (__first != __last)
    {
        typename iterator_traits<_InputIterator>::value_type __t(*__first);
        *__result = __t;
        for (++__first, (void) ++__result; __first != __last; ++__first, (void) ++__result)
        {
            __t = __binary_op(__t, *__first);
            *__result = __t;
        }
    }
    return __result;
}

And here's the inclusive_scan that should be equivalent to that partial_sum:

template <class _InputIterator, class _OutputIterator, class _BinaryOp>
inline _LIBCPP_INLINE_VISIBILITY
_OutputIterator
inclusive_scan(_InputIterator __first, _InputIterator __last, 
               _OutputIterator __result, _BinaryOp __b)
{
    if (__first != __last) {
        typename iterator_traits<_InputIterator>::value_type __init = *__first++;
        return inclusive_scan(__first, __last, __result, __b, __init);
     }
    
    return __result;
}

The inclusive_scan that it forwards to is:

template <class _InputIterator, class _OutputIterator, class _Tp, class _BinaryOp>
inline _LIBCPP_INLINE_VISIBILITY
_OutputIterator 
inclusive_scan(_InputIterator __first, _InputIterator __last, 
               _OutputIterator __result, _BinaryOp __b,  _Tp __init)
{
    *__result++ = __init;
    for (; __first != __last; ++__first) {
        __init = __b(__init, *__first);
        *__result++ = __init;
     }

    return __result;
}

Inlining it, we get:

template <class _InputIterator, class _OutputIterator, class _BinaryOp>
inline _LIBCPP_INLINE_VISIBILITY
_OutputIterator
inclusive_scan(_InputIterator __first, _InputIterator __last, 
               _OutputIterator __result, _BinaryOp __b)
{
    if (__first != __last) {
        typename iterator_traits<_InputIterator>::value_type __init = *__first++;
        *__result++ = __init;

        for (; __first != __last; ++__first) {
            __init = __b(__init, *__first);
            *__result++ = __init;
        }
     }
    
    return __result;
}

That looks equivalent to the partial_sum implementation above, so I think it is correct.