This is an archive of the discontinued LLVM Phabricator instance.

Differential D151521

[libc++] Optimize transform_reduce for floating point types
Needs Review · Public

Authored by philnik on May 25 2023, 4:58 PM.

Details

Reviewers

None

Group Reviewers

Restricted Project

Summary

The standard doesn't define the order of execution for transform_reduce on purpose, so we might as well make use of it.
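
To see why the unspecified order matters for floating point, here is a minimal sketch. It is not part of the patch, and `left_fold`, `four_lane_sum`, and `check_grouping_example` are hypothetical names. It compares a strict left-to-right sum with a four-accumulator sum of the kind a vectorizer might produce. Because `float` addition is not associative, the two groupings can return different values for the same input, which is the latitude `std::reduce` and `std::transform_reduce` are allowed to use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Strict left-to-right sum, the order std::accumulate guarantees.
inline float left_fold(const std::vector<float>& v) {
  float acc = 0.0f;
  for (float x : v)
    acc += x;
  return acc;
}

// Four independent partial sums combined at the end, roughly the shape a
// vectorized reduction takes. Illustrative only; not libc++'s implementation.
inline float four_lane_sum(const std::vector<float>& v) {
  float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
  std::size_t i = 0;
  for (; i + 4 <= v.size(); i += 4)
    for (std::size_t k = 0; k != 4; ++k)
      lane[k] += v[i + k];
  float acc = (lane[0] + lane[1]) + (lane[2] + lane[3]);
  for (; i != v.size(); ++i) // leftover elements that do not fill a lane group
    acc += v[i];
  return acc;
}

// A cancellation-heavy input on which the two groupings disagree.
inline void check_grouping_example() {
  const std::vector<float> v = {1e20f, 1.0f, -1e20f, 1.0f};
  assert(left_fold(v) == 1.0f);     // ((1e20 + 1) - 1e20) + 1: only the first 1 is absorbed
  assert(four_lane_sum(v) == 0.0f); // (1e20 + 1) + (-1e20 + 1): both 1s are absorbed
}
```

Both results are valid return values for `std::reduce` on this input. Callers who need a fixed grouping can use `std::accumulate`, which guarantees the left-to-right order.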

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	20,960 ms	libcxx CI - C++26 > llvm-libc++-shared-cfg-in.libcxx::transitive_includes.sh.cpp
	1,270 ms	libcxx CI - GCC 12 / C++latest > llvm-libc++abi-shared-gcc-cfg-in.llvm-libc++abi-shared-gcc-cfg-in::test_demangle.pass.cpp
	410 ms	libcxx CI - GCC 12 / C++latest > llvm-libc++abi-shared-gcc-cfg-in.llvm-libc++abi-shared-gcc-cfg-in::test_exception_storage.pass.cpp
	1,640 ms	libcxx CI - GCC 12 / C++latest > llvm-libc++abi-shared-gcc-cfg-in.llvm-libc++abi-shared-gcc-cfg-in::unittest_demangle.pass.cpp
	1,180 ms	libcxx CI - Modular build > llvm-libc++-shared-cfg-in.std/algorithms/numeric_ops/reduce::pstl.reduce.pass.cpp
		View Full Test Results (7 Failed)

Event Timeline

philnik created this revision. May 25 2023, 4:58 PM

Herald added a project: Restricted Project. May 25 2023, 4:59 PM

philnik requested review of this revision. May 25 2023, 4:59 PM

Herald added a project: Restricted Project. May 25 2023, 4:59 PM

Herald added a reviewer: Restricted Project.

Herald added subscribers: libcxx-commits, pcwang-thead.

Harbormaster completed remote builds in B234718: Diff 525884. May 25 2023, 4:59 PM

philnik added a parent revision: D150736: [libc++][PSTL] Implement std::reduce and std::transform_reduce. May 26 2023, 9:15 AM

Fix stuff

Benchmarks:

------------------------------------------------------
Benchmark                           old            new
------------------------------------------------------
bm_reduce<float>/1             0.555 ns       0.710 ns
bm_reduce<float>/2             0.703 ns        1.58 ns
bm_reduce<float>/3              1.56 ns        1.88 ns
bm_reduce<float>/4              1.88 ns        2.19 ns
bm_reduce<float>/5              2.21 ns        2.51 ns
bm_reduce<float>/6              2.50 ns        2.84 ns
bm_reduce<float>/7              2.82 ns        3.16 ns
bm_reduce<float>/8              3.13 ns        3.45 ns
bm_reduce<float>/16             5.63 ns        1.69 ns
bm_reduce<float>/64             26.9 ns        2.88 ns
bm_reduce<float>/512             412 ns        17.9 ns
bm_reduce<float>/4096           3778 ns         219 ns
bm_reduce<float>/32768         30722 ns        1913 ns
bm_reduce<float>/262144       246213 ns       15445 ns
bm_reduce<float>/1048576      994918 ns       62948 ns

----------------------------------------------------------------
Benchmark                                    old             new
----------------------------------------------------------------
bm_transform_reduce<float>/1            0.863 ns        0.872 ns
bm_transform_reduce<float>/2             1.57 ns         1.58 ns
bm_transform_reduce<float>/3             1.88 ns         1.92 ns
bm_transform_reduce<float>/4             2.19 ns         2.23 ns
bm_transform_reduce<float>/5             2.50 ns         2.51 ns
bm_transform_reduce<float>/6             2.83 ns         2.82 ns
bm_transform_reduce<float>/7             3.13 ns         3.14 ns
bm_transform_reduce<float>/8             3.44 ns         3.46 ns
bm_transform_reduce<float>/16            3.23 ns         2.24 ns
bm_transform_reduce<float>/64            19.6 ns         4.56 ns
bm_transform_reduce<float>/512            356 ns         30.3 ns
bm_transform_reduce<float>/4096          3720 ns          241 ns
bm_transform_reduce<float>/32768        31159 ns         3177 ns
bm_transform_reduce<float>/262144      246021 ns        25263 ns
bm_transform_reduce<float>/1048576    1069957 ns       111642 ns

ldionne added a subscriber: ldionne. May 26 2023, 11:43 AM

ldionne added inline comments.

libcxx/include/__numeric/reduce.h
38	Same here.
libcxx/include/__numeric/transform_reduce.h
51	We shouldn't be optimizing for user-defined types even if the operation is "trivial" (also applies to the other algo). This makes me think that we need to revisit our definition of a trivial operation. `std::plus<>` is trivial but only for a specific set of types. And in fact I am not certain what we mean by "is trivial" anymore.
59–63	I would call the `unseq` version in the PSTL instead, that's more general and we'll end up using the appropriate backend.
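
One way to express the restriction raised in the comment on line 51 is to gate the fast path on the value type as well as on the operation. The sketch below is a hypothetical illustration built only from standard traits. `is_known_op`, `can_reorder_v`, and `Money` are invented names, and this is not libc++'s `__is_trivial_operation` machinery.

```cpp
#include <functional>
#include <type_traits>

// True only for the standard function objects this sketch recognises.
template <class Op, class T>
struct is_known_op : std::false_type {};
template <class T>
struct is_known_op<std::plus<T>, T> : std::true_type {};
template <class T>
struct is_known_op<std::plus<>, T> : std::true_type {};
template <class T>
struct is_known_op<std::multiplies<T>, T> : std::true_type {};
template <class T>
struct is_known_op<std::multiplies<>, T> : std::true_type {};

// Reordering is considered only when the operation is a known functor AND the
// operand is a built-in arithmetic type, so user-defined operators are excluded.
template <class Op, class T>
inline constexpr bool can_reorder_v = std::is_arithmetic_v<T> && is_known_op<Op, T>::value;

// A user-defined type: std::plus<> works on it, but calls a user-provided operator+.
struct Money {
  long cents;
  friend Money operator+(Money a, Money b) { return Money{a.cents + b.cents}; }
};

static_assert(can_reorder_v<std::plus<>, float>);
static_assert(can_reorder_v<std::multiplies<double>, double>);
static_assert(!can_reorder_v<std::plus<>, Money>);  // known functor, user-defined type
static_assert(!can_reorder_v<std::minus<>, float>); // arithmetic type, unrecognised functor
```

Under these assumptions `std::plus<>` qualifies for `float` but not for `Money`, which matches the observation that the functor is only "trivial" for a specific set of types.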

Not looked at the patch closely but I wonder about the usage of the pragma and their scope.

libcxx/include/__numeric/reduce.h
41	Will this restore the old value of `clang loop vectorize`?
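
On the scope question: Clang documents `#pragma clang loop` as a hint for the loop statement that immediately follows it, not as a persistent setting, which suggests there is no earlier value to restore. The sketch below mirrors the patch's `_LIBCPP_CLANG_PRAGMA` with a hypothetical `EXAMPLE_CLANG_PRAGMA` macro and a hypothetical `sum_twice` function, placing the hint on one loop and leaving the next loop untouched.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the patch's _LIBCPP_CLANG_PRAGMA: expands to a
// _Pragma on Clang and to nothing on other compilers.
#if defined(__clang__)
#  define EXAMPLE_CLANG_PRAGMA(...) _Pragma(__VA_ARGS__)
#else
#  define EXAMPLE_CLANG_PRAGMA(...)
#endif

inline int sum_twice(const int* data, std::size_t n) {
  assert(data != nullptr || n == 0);

  int first = 0;
  // The hint is attached to the loop directly below it.
  EXAMPLE_CLANG_PRAGMA("clang loop vectorize(enable)")
  for (std::size_t i = 0; i != n; ++i)
    first += data[i];

  int second = 0;
  // No pragma here: this loop gets the compiler's default treatment.
  for (std::size_t i = 0; i != n; ++i)
    second += data[i];

  return first + second;
}
```

Whether the second loop is vectorized is left to the optimizer's usual heuristics. The point of the sketch is only that the pragma is written per loop.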

Harbormaster completed remote builds in B234886: Diff 526095. May 27 2023, 2:47 AM

Revision Contents

Path	Size
libcxx/benchmarks/CMakeLists.txt	2 lines
libcxx/benchmarks/algorithms/reduce.bench.cpp	29 lines
libcxx/benchmarks/algorithms/transform_reduce.bench.cpp	31 lines
libcxx/include/__config	6 lines
libcxx/include/__functional/operations.h	6 lines
libcxx/include/__numeric/reduce.h	13 lines
libcxx/include/__numeric/transform_reduce.h	23 lines
libcxx/include/__type_traits/operation_traits.h	9 lines
libcxx/test/std/numerics/numeric.ops/reduce/reduce.pass.cpp	6 lines
libcxx/test/std/numerics/numeric.ops/transform.reduce/transform_reduce_iter_iter_iter_init.pass.cpp	8 lines

Diff 526095

libcxx/benchmarks/CMakeLists.txt

(168 lines not shown)	set(BENCHMARK_TESTS
algorithms/push_heap.bench.cpp		algorithms/push_heap.bench.cpp
algorithms/ranges_make_heap.bench.cpp		algorithms/ranges_make_heap.bench.cpp
algorithms/ranges_make_heap_then_sort_heap.bench.cpp		algorithms/ranges_make_heap_then_sort_heap.bench.cpp
algorithms/ranges_pop_heap.bench.cpp		algorithms/ranges_pop_heap.bench.cpp
algorithms/ranges_push_heap.bench.cpp		algorithms/ranges_push_heap.bench.cpp
algorithms/ranges_sort.bench.cpp		algorithms/ranges_sort.bench.cpp
algorithms/ranges_sort_heap.bench.cpp		algorithms/ranges_sort_heap.bench.cpp
algorithms/ranges_stable_sort.bench.cpp		algorithms/ranges_stable_sort.bench.cpp
		algorithms/reduce.bench.cpp
algorithms/sort.bench.cpp		algorithms/sort.bench.cpp
algorithms/sort_heap.bench.cpp		algorithms/sort_heap.bench.cpp
algorithms/stable_sort.bench.cpp		algorithms/stable_sort.bench.cpp
		algorithms/transform_reduce.bench.cpp
libcxxabi/dynamic_cast.bench.cpp		libcxxabi/dynamic_cast.bench.cpp
allocation.bench.cpp		allocation.bench.cpp
deque.bench.cpp		deque.bench.cpp
deque_iterator.bench.cpp		deque_iterator.bench.cpp
filesystem.bench.cpp		filesystem.bench.cpp
format_to_n.bench.cpp		format_to_n.bench.cpp
format_to.bench.cpp		format_to.bench.cpp
format.bench.cpp		format.bench.cpp
(51 lines not shown)

libcxx/benchmarks/algorithms/reduce.bench.cpp

This file was added.

				//===----------------------------------------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include <benchmark/benchmark.h>
				#include <numeric>

				template <class T>
				static void bm_reduce(benchmark::State& state) {
				  auto size = state.range();
				  std::vector<T> a;
				  for (size_t i = 0; i != size; ++i) {
				    a.emplace_back(i * 3);
				  }

				  for (auto _ : state) {
				    benchmark::ClobberMemory();
				    auto ret = std::reduce(a.begin(), a.end());
				    benchmark::DoNotOptimize(ret);
				  }
				}

				BENCHMARK(bm_reduce<float>)->DenseRange(1, 8)->Range(16, 1 << 20);

				BENCHMARK_MAIN();

libcxx/benchmarks/algorithms/transform_reduce.bench.cpp

This file was added.

				//===----------------------------------------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include <benchmark/benchmark.h>
				#include <numeric>

				template <class T>
				static void bm_transform_reduce(benchmark::State& state) {
				  auto size = state.range();
				  std::vector<T> a;
				  std::vector<T> b;
				  for (size_t i = 0; i != size; ++i) {
				    a.emplace_back(i * 3);
				    b.emplace_back(i + 2);
				  }

				  for (auto _ : state) {
				    benchmark::ClobberMemory();
				    auto ret = std::transform_reduce(a.begin(), a.end(), b.begin(), T(1));
				    benchmark::DoNotOptimize(ret);
				  }
				}

				BENCHMARK(bm_transform_reduce<float>)->DenseRange(1, 8)->Range(16, 1 << 20);

				BENCHMARK_MAIN();

libcxx/include/__config

	(1,272 lines not shown)
	# endif			# endif

	// TODO(varconst): currently, there are bugs in Clang's intrinsics when handling Objective-C++ `id`, so don't use			// TODO(varconst): currently, there are bugs in Clang's intrinsics when handling Objective-C++ `id`, so don't use
	// compiler intrinsics in the Objective-C++ mode.			// compiler intrinsics in the Objective-C++ mode.
	# ifdef __OBJC__			# ifdef __OBJC__
	# define _LIBCPP_WORKAROUND_OBJCXX_COMPILER_INTRINSICS			# define _LIBCPP_WORKAROUND_OBJCXX_COMPILER_INTRINSICS
	# endif			# endif

				#ifdef _LIBCPP_COMPILER_CLANG_BASED
				# define _LIBCPP_CLANG_PRAGMA(...) _Pragma(__VA_ARGS__)
				#else
				# define _LIBCPP_CLANG_PRAGMA(...)
				#endif

	// TODO: Make this a proper configuration option			// TODO: Make this a proper configuration option
	#define _PSTL_PAR_BACKEND_SERIAL			#define _PSTL_PAR_BACKEND_SERIAL

	#define _PSTL_PRAGMA(x) _Pragma(# x)			#define _PSTL_PRAGMA(x) _Pragma(# x)

	// Enable SIMD for compilers that support OpenMP 4.0			// Enable SIMD for compilers that support OpenMP 4.0
	#if (defined(_OPENMP) && _OPENMP >= 201307)			#if (defined(_OPENMP) && _OPENMP >= 201307)

	(35 lines not shown)

libcxx/include/__functional/operations.h

	(101 lines not shown)
	{			{
	typedef _Tp __result_type; // used by valarray			typedef _Tp __result_type; // used by valarray
	_LIBCPP_CONSTEXPR_SINCE_CXX14 _LIBCPP_INLINE_VISIBILITY			_LIBCPP_CONSTEXPR_SINCE_CXX14 _LIBCPP_INLINE_VISIBILITY
	_Tp operator()(const _Tp& __x, const _Tp& __y) const			_Tp operator()(const _Tp& __x, const _Tp& __y) const
	{return __x * __y;}			{return __x * __y;}
	};			};
	_LIBCPP_CTAD_SUPPORTED_FOR_TYPE(multiplies);			_LIBCPP_CTAD_SUPPORTED_FOR_TYPE(multiplies);

				template <class _Tp>
				struct __is_trivial_multiplies_operation<multiplies<_Tp>, _Tp, _Tp> : true_type {};

	#if _LIBCPP_STD_VER >= 14			#if _LIBCPP_STD_VER >= 14
	template <>			template <>
	struct _LIBCPP_TEMPLATE_VIS multiplies<void>			struct _LIBCPP_TEMPLATE_VIS multiplies<void>
	{			{
	template <class _T1, class _T2>			template <class _T1, class _T2>
	_LIBCPP_CONSTEXPR_SINCE_CXX14 _LIBCPP_INLINE_VISIBILITY			_LIBCPP_CONSTEXPR_SINCE_CXX14 _LIBCPP_INLINE_VISIBILITY
	auto operator()(_T1&& __t, _T2&& __u) const			auto operator()(_T1&& __t, _T2&& __u) const
	noexcept(noexcept(_VSTD::forward<_T1>(__t) * _VSTD::forward<_T2>(__u)))			noexcept(noexcept(_VSTD::forward<_T1>(__t) * _VSTD::forward<_T2>(__u)))
	-> decltype( _VSTD::forward<_T1>(__t) * _VSTD::forward<_T2>(__u))			-> decltype( _VSTD::forward<_T1>(__t) * _VSTD::forward<_T2>(__u))
	{ return _VSTD::forward<_T1>(__t) * _VSTD::forward<_T2>(__u); }			{ return _VSTD::forward<_T1>(__t) * _VSTD::forward<_T2>(__u); }
	typedef void is_transparent;			typedef void is_transparent;
	};			};

				template <class _Lhs, class _Rhs>
				struct __is_trivial_multiplies_operation<multiplies<>, _Lhs, _Rhs> : true_type {};
	#endif			#endif

	#if _LIBCPP_STD_VER >= 14			#if _LIBCPP_STD_VER >= 14
	template <class _Tp = void>			template <class _Tp = void>
	#else			#else
	template <class _Tp>			template <class _Tp>
	#endif			#endif
	struct _LIBCPP_TEMPLATE_VIS divides			struct _LIBCPP_TEMPLATE_VIS divides
	(470 lines not shown)

libcxx/include/__numeric/reduce.h

	// -- C++ --			// -- C++ --
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef _LIBCPP___NUMERIC_REDUCE_H			#ifndef _LIBCPP___NUMERIC_REDUCE_H
	#define _LIBCPP___NUMERIC_REDUCE_H			#define _LIBCPP___NUMERIC_REDUCE_H

	#include <__config>			#include <__config>
	#include <__functional/operations.h>			#include <__functional/operations.h>
	#include <__iterator/iterator_traits.h>			#include <__iterator/iterator_traits.h>
				#include <__type_traits/is_same.h>
				#include <__type_traits/operation_traits.h>

	#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)			#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
	# pragma GCC system_header			# pragma GCC system_header
	#endif			#endif

	_LIBCPP_BEGIN_NAMESPACE_STD			_LIBCPP_BEGIN_NAMESPACE_STD

	#if _LIBCPP_STD_VER >= 17			#if _LIBCPP_STD_VER >= 17
	template <class _InputIterator, class _Tp, class _BinaryOp>			template <class _InputIterator, class _Tp, class _BinaryOp>
	_LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_SINCE_CXX20 _Tp reduce(_InputIterator __first, _InputIterator __last,			_LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_SINCE_CXX20 _Tp reduce(_InputIterator __first, _InputIterator __last,
	_Tp __init, _BinaryOp __b) {			_Tp __init, _BinaryOp __b) {
	for (; __first != __last; ++__first)			for (; __first != __last; ++__first)
	__init = __b(__init, *__first);			__init = __b(__init, *__first);
	return __init;			return __init;
	}			}

				#if _LIBCPP_STD_VER >= 20
				template <class _InputIterator, class _Tp, class _BinaryOp>
				  requires(is_same_v<__iter_value_type<_InputIterator>, _Tp> && __is_trivial_operation<_BinaryOp, _Tp, _Tp>::value)
				_LIBCPP_HIDE_FROM_ABI constexpr _Tp reduce(_InputIterator __first, _InputIterator __last, _Tp __init, _BinaryOp __b) {
				  _LIBCPP_CLANG_PRAGMA("clang loop vectorize(enable)")
				ldionne (inline comment): Same here.
				  for (; __first != __last; ++__first)
				    __init = __b(__init, *__first);
				  return __init;
				Mordante (inline comment): Will this restore the old value of `clang loop vectorize`?
				}
				# endif // _LIBCPP_STD_VER >= 20

	template <class _InputIterator, class _Tp>			template <class _InputIterator, class _Tp>
	_LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_SINCE_CXX20 _Tp reduce(_InputIterator __first, _InputIterator __last,			_LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_SINCE_CXX20 _Tp reduce(_InputIterator __first, _InputIterator __last,
	_Tp __init) {			_Tp __init) {
	return _VSTD::reduce(__first, __last, __init, _VSTD::plus<>());			return _VSTD::reduce(__first, __last, __init, _VSTD::plus<>());
	}			}

	template <class _InputIterator>			template <class _InputIterator>
	_LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_SINCE_CXX20 typename iterator_traits<_InputIterator>::value_type			_LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_SINCE_CXX20 typename iterator_traits<_InputIterator>::value_type
	reduce(_InputIterator __first, _InputIterator __last) {			reduce(_InputIterator __first, _InputIterator __last) {
	return _VSTD::reduce(__first, __last, typename iterator_traits<_InputIterator>::value_type{});			return _VSTD::reduce(__first, __last, typename iterator_traits<_InputIterator>::value_type{});
	}			}
	#endif			#endif

	_LIBCPP_END_NAMESPACE_STD			_LIBCPP_END_NAMESPACE_STD

	#endif // _LIBCPP___NUMERIC_REDUCE_H			#endif // _LIBCPP___NUMERIC_REDUCE_H

libcxx/include/__numeric/transform_reduce.h

// -- C++ --		// -- C++ --
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef _LIBCPP___NUMERIC_TRANSFORM_REDUCE_H		#ifndef _LIBCPP___NUMERIC_TRANSFORM_REDUCE_H
#define _LIBCPP___NUMERIC_TRANSFORM_REDUCE_H		#define _LIBCPP___NUMERIC_TRANSFORM_REDUCE_H

		#include <__concepts/arithmetic.h>
#include <__config>		#include <__config>
#include <__functional/operations.h>		#include <__functional/operations.h>
		#include <__iterator/iterator_traits.h>
		#include <__type_traits/is_same.h>
		#include <__type_traits/operation_traits.h>
#include <__utility/move.h>		#include <__utility/move.h>

#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)		#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
# pragma GCC system_header		# pragma GCC system_header
#endif		#endif

_LIBCPP_BEGIN_NAMESPACE_STD		_LIBCPP_BEGIN_NAMESPACE_STD

(12 lines not shown)	_LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_SINCE_CXX20 _Tp transform_reduce(_InputIterator1 __first1,
_InputIterator1 __last1,		_InputIterator1 __last1,
_InputIterator2 __first2, _Tp __init,		_InputIterator2 __first2, _Tp __init,
_BinaryOp1 __b1, _BinaryOp2 __b2) {		_BinaryOp1 __b1, _BinaryOp2 __b2) {
for (; __first1 != __last1; ++__first1, (void)++__first2)		for (; __first1 != __last1; ++__first1, (void)++__first2)
__init = __b1(std::move(__init), __b2(*__first1, *__first2));		__init = __b1(std::move(__init), __b2(*__first1, *__first2));
return __init;		return __init;
}		}

		#if _LIBCPP_STD_VER >= 20
		template <class _InputIterator1, class _InputIterator2, class _Tp, class _ReductionOp, class _TransformOp>
		  requires(__is_trivial_operation<_TransformOp, _Tp, _Tp>::value &&
		           __is_trivial_operation<_ReductionOp, _Tp, _Tp>::value &&
		           is_same_v<__iter_value_type<_InputIterator1>, _Tp> && is_same_v<__iter_value_type<_InputIterator2>, _Tp>)
		ldionne (inline comment): We shouldn't be optimizing for user-defined types even if the operation is "trivial" (also applies to the other algo). This makes me think that we need to revisit our definition of a trivial operation. `std::plus<>` is trivial but only for a specific set of types. And in fact I am not certain what we mean by "is trivial" anymore.
		_LIBCPP_HIDE_FROM_ABI constexpr _Tp transform_reduce(
		    _InputIterator1 __first1,
		    _InputIterator1 __last1,
		    _InputIterator2 __first2,
		    _Tp __init,
		    _ReductionOp __b1,
		    _TransformOp __b2) {
		  _LIBCPP_CLANG_PRAGMA("clang loop vectorize(enable)")
		  for (; __first1 != __last1; ++__first1, (void)++__first2)
		    __init = __b1(__init, __b2(*__first1, *__first2));
		  return __init;
		}
		ldionne (inline comment): I would call the `unseq` version in the PSTL instead, that's more general and we'll end up using the appropriate backend.
		# endif

template <class _InputIterator1, class _InputIterator2, class _Tp>		template <class _InputIterator1, class _InputIterator2, class _Tp>
_LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_SINCE_CXX20 _Tp transform_reduce(_InputIterator1 __first1,		_LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_SINCE_CXX20 _Tp transform_reduce(_InputIterator1 __first1,
_InputIterator1 __last1,		_InputIterator1 __last1,
_InputIterator2 __first2, _Tp __init) {		_InputIterator2 __first2, _Tp __init) {
return _VSTD::transform_reduce(__first1, __last1, __first2, _VSTD::move(__init), _VSTD::plus<>(),		return _VSTD::transform_reduce(__first1, __last1, __first2, _VSTD::move(__init), _VSTD::plus<>(),
_VSTD::multiplies<>());		_VSTD::multiplies<>());
}		}
#endif		#endif

_LIBCPP_END_NAMESPACE_STD		_LIBCPP_END_NAMESPACE_STD

#endif // _LIBCPP___NUMERIC_TRANSFORM_REDUCE_H		#endif // _LIBCPP___NUMERIC_TRANSFORM_REDUCE_H

libcxx/include/__type_traits/operation_traits.h

	(15 lines not shown)
	# pragma GCC system_header			# pragma GCC system_header
	#endif			#endif

	_LIBCPP_BEGIN_NAMESPACE_STD			_LIBCPP_BEGIN_NAMESPACE_STD

	template <class _Pred, class _Lhs, class _Rhs>			template <class _Pred, class _Lhs, class _Rhs>
	struct __is_trivial_plus_operation : false_type {};			struct __is_trivial_plus_operation : false_type {};

				template <class _Pred, class _Lhs, class _Rhs>
				struct __is_trivial_multiplies_operation : false_type {};

				template <class _Pred, class _Lhs, class _Rhs>
				using __is_trivial_operation =
				integral_constant<bool,
				__is_trivial_plus_operation<_Pred, _Lhs, _Rhs>::value ||
				__is_trivial_multiplies_operation<_Pred, _Lhs, _Rhs>::value>;

	_LIBCPP_END_NAMESPACE_STD			_LIBCPP_END_NAMESPACE_STD

	#endif // _LIBCPP___TYPE_TRAITS_OPERATION_TRAITS_H			#endif // _LIBCPP___TYPE_TRAITS_OPERATION_TRAITS_H

libcxx/test/std/numerics/numeric.ops/reduce/reduce.pass.cpp

	(44 lines not shown)
	template <typename T>			template <typename T>
	TEST_CONSTEXPR_CXX20 void			TEST_CONSTEXPR_CXX20 void
	test_return_type()			test_return_type()
	{			{
	T *p = nullptr;			T *p = nullptr;
	static_assert( std::is_same_v<T, decltype(std::reduce(p, p))> );			static_assert( std::is_same_v<T, decltype(std::reduce(p, p))> );
	}			}

				TEST_CONSTEXPR_CXX20 void test_optimized_path() {
				  float a[] = {1.f, 2.f, 3.f, 4.f};
				  auto ret = std::reduce(a, a + 4);
				  assert(ret > 9.5f && ret < 10.5f);
				}

	TEST_CONSTEXPR_CXX20 bool			TEST_CONSTEXPR_CXX20 bool
	test()			test()
	{			{
	test_return_type<char>();			test_return_type<char>();
	test_return_type<int>();			test_return_type<int>();
	test_return_type<unsigned long>();			test_return_type<unsigned long>();
	test_return_type<float>();			test_return_type<float>();
	test_return_type<double>();			test_return_type<double>();
	(18 lines not shown)

libcxx/test/std/numerics/numeric.ops/transform.reduce/transform_reduce_iter_iter_iter_init.pass.cpp

	(64 lines not shown)
	test_move_only_types()			test_move_only_types()
	{			{
	MoveOnly ia[] = {{1}, {2}, {3}};			MoveOnly ia[] = {{1}, {2}, {3}};
	MoveOnly ib[] = {{1}, {2}, {3}};			MoveOnly ib[] = {{1}, {2}, {3}};
	assert(14 ==			assert(14 ==
	std::transform_reduce(std::begin(ia), std::end(ia), std::begin(ib), MoveOnly{0}).get());			std::transform_reduce(std::begin(ia), std::end(ia), std::begin(ib), MoveOnly{0}).get());
	}			}

				TEST_CONSTEXPR_CXX20 void test_optimized_path() {
				  float a[] = {1.f, 2.f, 3.f, 4.f};
				  float b[] = {1.f, 2.f, 3.f, 4.f};
				  auto ret = std::transform_reduce(a, a + 4, b, 0);
				  assert(ret > 29.5f && ret < 30.5f);
				}

	TEST_CONSTEXPR_CXX20 bool			TEST_CONSTEXPR_CXX20 bool
	test()			test()
	{			{
	test_return_type<char, int>();			test_return_type<char, int>();
	test_return_type<int, int>();			test_return_type<int, int>();
	test_return_type<int, unsigned long>();			test_return_type<int, unsigned long>();
	test_return_type<float, int>();			test_return_type<float, int>();
	test_return_type<short, float>();			test_return_type<short, float>();
	(23 lines not shown)

	// just plain pointers (const vs. non-const, too)			// just plain pointers (const vs. non-const, too)
	test<const int, const unsigned int >();			test<const int, const unsigned int >();
	test<const int, unsigned int >();			test<const int, unsigned int >();
	test< int, const unsigned int >();			test< int, const unsigned int >();
	test< int, unsigned int >();			test< int, unsigned int >();

	test_move_only_types();			test_move_only_types();
				test_optimized_path();

	return true;			return true;
	}			}

	int main(int, char**)			int main(int, char**)
	{			{
	test();			test();
	#if TEST_STD_VER > 17			#if TEST_STD_VER > 17
	static_assert(test());			static_assert(test());
	#endif			#endif
	return 0;			return 0;
	}			}
