This is an archive of the discontinued LLVM Phabricator instance.

Differential D42357

Under limitation of allocated buffer, inplace_merge() does NOT take usage of partial allocated buffer but applies native rotate directly.
Needs Review · Public

Authored by WeiChungChang on Jan 21 2018, 6:06 PM.

This revision needs review, but there are no reviewers specified.

Details

Reviewers: None

Summary

When the allocated buffer is limited, inplace_merge does NOT make use of the partially allocated buffer but falls back to a plain rotate directly. This leaves its performance far behind the corresponding GCC implementation. This patch tries to use the partially allocated buffer first. Experiments show speedups of 28.35% with -O3 and 25.06% without -O3 when merging two equal-sized sorted ranges of integers.

Refer to the experimental results below
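
The idea in the summary, rotating through whatever buffer is available and falling back to an in-place rotate only when neither half fits, can be sketched standalone as follows. This is a minimal illustration, not the patch itself: `buffered_rotate` is a hypothetical name, it assumes random access iterators, and it assumes the buffer already holds constructed elements (a `std::vector`), which sidesteps the question of raw memory.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper: rotate [first, last) so that *middle becomes the first
// element, using `buf` as scratch space when one of the halves fits into it.
// Assumes random access iterators and a buffer of already-constructed elements.
template <class RandomIt, class T>
RandomIt buffered_rotate(RandomIt first, RandomIt middle, RandomIt last,
                         std::vector<T>& buf)
{
    using diff_t = typename std::iterator_traits<RandomIt>::difference_type;
    assert(first <= middle && middle <= last);
    if (first == middle) return last;
    if (middle == last) return first;

    const diff_t len1 = middle - first;
    const diff_t len2 = last - middle;
    const diff_t cap = static_cast<diff_t>(buf.size());

    if (len1 <= cap && len1 <= len2) {
        // Park the shorter left half, slide the right half down, put it back.
        auto buf_end = std::move(first, middle, buf.begin());
        RandomIt new_middle = std::move(middle, last, first);
        std::move(buf.begin(), buf_end, new_middle);
        return new_middle;
    }
    if (len2 <= cap) {
        // Park the right half, slide the left half up, put it back in front.
        auto buf_end = std::move(middle, last, buf.begin());
        RandomIt new_middle = std::move_backward(first, middle, last);
        std::move(buf.begin(), buf_end, first);
        return new_middle;
    }
    // Neither half fits: fall back to the unbuffered rotate.
    return std::rotate(first, middle, last);
}
```

Like `std::rotate`, the sketch returns the new position of the element that was originally at `first`.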

Diff Detail

Repository

rCXX libc++

Build Status

Buildable 14074
Build 14074: arc lint + arc unit

Event Timeline

WeiChungChang created this revision. · Jan 21 2018, 6:06 PM

Herald added a subscriber: cfe-commits. · Jan 21 2018, 6:06 PM

WeiChungChang edited the summary of this revision. · Jan 21 2018, 6:07 PM

WeiChungChang retitled this revision from "Under limitation of allocated buffer, inplace_merge does NOT take usge of partial allocated buffer but uses native rotate directly. It makes the performace far behind from corresponding part of GCC. In this patch, it tries to use partial allocated..." to "Under limitation of allocated buffer, inplace_merge() does NOT take usage of partial allocated buffer but applies native rotate directly."

WeiChungChang edited the summary of this revision.

WeiChungChang edited the summary of this revision. · Jan 21 2018, 6:12 PM

WeiChungChang added a subscriber: mclow.lists. · Jan 21 2018, 6:58 PM

mclow.lists added inline comments. · Jan 22 2018, 9:38 AM

include/algorithm
2511	These iterator calculations only work for random access iterators. Does this actually work with forward iterators?
2515	Probably a good idea to qualify these calls with `_VSTD::` to ensure that no inadvertent ADL happens.
4582	It seems a shame to calculate `__len1`, etc on L#2511 above when you have them right here.
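
To illustrate the first comment: `operator-` is only defined for random access iterators, whereas `std::distance` works for every iterator category (at linear cost for forward iterators). A minimal sketch; `range_length` and `range_length_demo` are hypothetical names, not part of libc++.

```cpp
#include <cassert>
#include <forward_list>
#include <iterator>
#include <vector>

// Hypothetical helper: length of [first, last) for any forward iterator.
// std::distance dispatches on the iterator category: O(1) for random access
// iterators, O(n) increments otherwise.
template <class ForwardIt>
typename std::iterator_traits<ForwardIt>::difference_type
range_length(ForwardIt first, ForwardIt last)
{
    return std::distance(first, last);
}

// Self-check covering a forward-only and a random access container.
inline void range_length_demo()
{
    std::forward_list<int> fl{1, 2, 3, 4};
    std::vector<int> v{1, 2, 3, 4};

    auto fl_mid = std::next(fl.begin(), 2);
    // auto bad = fl_mid - fl.begin();  // does not compile: no operator- here
    assert(range_length(fl.begin(), fl_mid) == 2);
    assert(range_length(v.begin(), v.begin() + 2) == 2);
}
```

A related constraint applies to `std::move_backward`, which requires bidirectional iterators, so a helper declared on forward iterators cannot call it unconditionally.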

mclow.lists added inline comments. · Jan 23 2018, 10:40 AM

include/algorithm
2515	I think that you should run some tests with types that aren't `int`. You don't know what the state of the buffer is here; how much of it is actual objects, and how much of it is just raw memory. For raw memory, `__move` is the wrong call, because it will attempt to "clean up" the objects that are already there.
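
The distinction drawn in this comment can be sketched with standard C++17 facilities only. Elements moved into raw storage have to be constructed there, for example with `std::uninitialized_move`, and destroyed afterwards; move assignment is only valid onto objects that are already alive. `stash_and_restore` is a hypothetical helper written for illustration and is not how libc++ manages its buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical helper: park the elements of `v` in raw storage and move them
// back, to show the construct/destroy bookkeeping a raw buffer needs.
inline void stash_and_restore(std::vector<std::string>& v)
{
    const std::size_t n = v.size();
    std::allocator<std::string> alloc;
    std::string* raw = alloc.allocate(n);  // raw memory: no objects live here yet

    // Wrong for raw memory: std::move(v.begin(), v.end(), raw) would
    // move-ASSIGN onto objects that do not exist (undefined behaviour).
    // Right: move-CONSTRUCT new objects in the raw storage.
    std::string* raw_end = std::uninitialized_move(v.begin(), v.end(), raw);
    assert(raw_end - raw == static_cast<std::ptrdiff_t>(n));

    // Now [raw, raw_end) holds live objects, so move assignment back is fine.
    std::move(raw, raw_end, v.begin());

    std::destroy(raw, raw_end);  // end the lifetime of the buffer objects
    alloc.deallocate(raw, n);
}
```

With `int` both variants happen to behave the same, which is why tests with a type such as `std::string` are more revealing.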

Dear mclow:

Thanks a lot for your kind reply!

I would like to briefly describe the purpose of this fix here.

This patch is one of a series of three targeted fixes.
I can explain them item by item in detail, with experiments.
However, my main focus is the algorithm, so if there are any syntax issues I may need your help to correct them.

Please refer to this report for details on the final correction of the algorithm, which may avoid unnecessary splits and rotates under a limited buffer.

The following are the three items I found when comparing against GCC's libstdc++:

1. sort is faster, but merge is far slower than in libstdc++.
2. Merge speed is unstable between the backward and forward cases (I will open another ticket later).
3. An efficient algorithm for pivot selection (I will open another ticket later).

This patch addresses the first item only.
Please refer to my test code (test source code & make file); it simply sorts and merges two vectors.
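
The linked test code is not part of this archive. As a rough idea of what such a test could look like, here is a hypothetical sketch (the function name, seed, and element distribution are assumptions): it fills a vector with random integers, sorts both halves, merges them with `std::inplace_merge`, and returns the time spent in the merge.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical benchmark helper: returns the seconds spent in
// std::inplace_merge for two sorted halves of `half` integers each.
inline double time_inplace_merge(std::size_t half, unsigned seed = 42)
{
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist;
    std::vector<int> v(2 * half);
    for (int& x : v) x = dist(gen);

    auto mid = v.begin() + static_cast<std::ptrdiff_t>(half);
    std::sort(v.begin(), mid);
    std::sort(mid, v.end());

    const auto start = std::chrono::steady_clock::now();
    std::inplace_merge(v.begin(), mid, v.end());
    const auto stop = std::chrono::steady_clock::now();

    assert(std::is_sorted(v.begin(), v.end()));
    return std::chrono::duration<double>(stop - start).count();
}
```

The runs reported in this thread use 8*1024*1024 integers per half. How the buffer limit was imposed is not shown here, so this sketch always runs with whatever buffer the library manages to allocate.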

I will attach a comparison of merge speed under different cases (different maximum allocated buffer constraints) later.

Following your suggestion, I would also like to know which object type, rather than integer, would be representative for analyzing the problem, so that I can test it too.

Thanks a lot for your suggestions.

Here is the comparison between GCC's and LLVM's C++ standard libraries.

It shows that libc++ quickly suffers from rotate speed once the allocated buffer is limited, and becomes slower than GCC's corresponding version.
Both LLVM and GCC run the same test program.

Note that "full" does not mean the same thing for the two libraries.
When merging [A, B] with sorted A and B, each holding 8*1024*1024 integers,
LLVM takes the better approach of allocating 8*1024*1024 initially, whereas GCC allocates 16*1024*1024.

	full	25%	12.5%	6.25%	3.125%
LLVM	0.238614667	0.733295778	1.137285111	1.348671667	1.603639556
GCC	0.368778444	0.351659222	0.532976333	0.660785778	0.832495444
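
To make the gap concrete, the ratio of the two rows can be computed per column. A small sketch using the figures from the table above; the constant and function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Timings copied from the table above, columns: full, 25%, 12.5%, 6.25%, 3.125%.
constexpr std::array<double, 5> kLlvm{0.238614667, 0.733295778, 1.137285111,
                                      1.348671667, 1.603639556};
constexpr std::array<double, 5> kGcc{0.368778444, 0.351659222, 0.532976333,
                                     0.660785778, 0.832495444};

// Hypothetical helper: how many times slower LLVM is than GCC in column `i`
// (values below 1 mean LLVM is faster).
inline double llvm_over_gcc(std::size_t i)
{
    assert(i < kLlvm.size());
    return kLlvm[i] / kGcc[i];
}
```

With these figures LLVM is faster only in the "full" column (ratio about 0.65) and roughly twice as slow as GCC in every constrained column.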

Here is the comparison after the fix.

It is better now but still slower than GCC; one can also notice the variance (the execution time is unstable from round to round even though the input is uniform).

That problem will be shown in the other ticket.

	full	25%	12.5%	6.25%	3.125%
LLVM Opt 1	0.2401988	0.407111	0.851043399029	1.2216213	1.3857887
LLVM	0.238614667	0.733295778	1.137285111	1.348671667	1.603639556
GCC	0.368778444	0.351659222	0.532976333	0.660785778	0.832495444

Revision Contents

Path	Size
include/algorithm	38 lines

Diff 130824

include/algorithm

[2,491 lines not shown; context: if (_VSTD::is_trivially_move_assignable<value_type>::value)]
         return _VSTD::__rotate_gcd(__first, __middle, __last);
     }
     return _VSTD::__rotate_forward(__first, __middle, __last);
 }

 template <class _ForwardIterator>
 inline _LIBCPP_INLINE_VISIBILITY
 _ForwardIterator
+__buffered__rotate(_ForwardIterator __first, _ForwardIterator __middle, _ForwardIterator __last,
+                   typename iterator_traits<_ForwardIterator>::value_type* __buff, ptrdiff_t __buff_size)
+{
+    if (__first == __middle)
+        return __last;
+    if (__middle == __last)
+        return __first;
+    typedef typename _VSTD::iterator_traits<_ForwardIterator>::value_type value_type;
+    typedef typename _VSTD::iterator_traits<_ForwardIterator>::difference_type difference_type;
+    typedef typename _VSTD::iterator_traits<_ForwardIterator>::value_type* pointer;
+    difference_type __len1, __len2;
+    __len1 = __middle - __first;
[inline comment, mclow.lists: These iterator calculations only work for random access iterators. Does this actually work with forward iterators?]
+    __len2 = __last - __middle;
+    if ((__len1 <= __buff_size) && __len1 < __len2)
+    {
+        pointer __buff_end = __move(__first, __middle, __buff);
[inline comment, mclow.lists: Probably a good idea to qualify these calls with `_VSTD::` to ensure that no inadvertent ADL happens.]
[inline comment, mclow.lists: I think that you should run some tests with types that aren't `int`. You don't know what the state of the buffer is here; how much of it is actual objects, and how much of it is just raw memory. For raw memory, `__move` is the wrong call, because it will attempt to "clean up" the objects that are already there.]
+        __move(__middle, __last, __first);
+        return __move_backward(__buff, __buff_end, __last);
+    }
+    else if (__len2 <= __buff_size)
+    {
+        pointer __buffer_end = __move(__middle, __last, __buff);
+        __move_backward(__first, __middle, __last);
+        return __move(__buff, __buffer_end, __first);
+    }
+    else
+    {
+        return _VSTD::__rotate(__first, __middle, __last,
+                               typename _VSTD::iterator_traits<_ForwardIterator>::iterator_category());
+    }
+}
+
+template <class _ForwardIterator>
+inline _LIBCPP_INLINE_VISIBILITY
+_ForwardIterator
 rotate(_ForwardIterator __first, _ForwardIterator __middle, _ForwardIterator __last)
 {
     if (__first == __middle)
         return __last;
     if (__middle == __last)
         return __first;
     return _VSTD::__rotate(__first, __middle, __last,
                            typename _VSTD::iterator_traits<_ForwardIterator>::iterator_category());
[2,030 lines not shown; context: while (true)]
             _VSTD::advance(__m1, __len11);
             __m2 = __lower_bound<_Compare>(__middle, __last, *__m1, __comp);
             __len21 = _VSTD::distance(__middle, __m2);
         }
         difference_type __len12 = __len1 - __len11;  // distance(__m1, __middle)
         difference_type __len22 = __len2 - __len21;  // distance(__m2, __last)
         // [__first, __m1) [__m1, __middle) [__middle, __m2) [__m2, __last)
         // swap middle two partitions
-        __middle = _VSTD::rotate(__m1, __middle, __m2);
+        //__middle = _VSTD::rotate(__m1, __middle, __m2);
+        __middle = _VSTD::__buffered_rotate(__m1, __middle, __m2, __buff, __buff_size);
[inline comment, mclow.lists: It seems a shame to calculate `__len1`, etc on L#2511 above when you have them right here.]
         // __len12 and __len21 now have swapped meanings
         // merge smaller range with recurisve call and larger with tail recursion elimination
         if (__len11 + __len21 < __len12 + __len22)
         {
             __inplace_merge<_Compare>(__first, __m1, __middle, __comp, __len11, __len21, __buff, __buff_size);
 //          __inplace_merge<_Compare>(__middle, __m2, __last, __comp, __len12, __len22, __buff, __buff_size);
             __first = __middle;
             __middle = __m2;
[1,258 lines not shown]
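
The inline comment about qualifying calls with `_VSTD::` can be illustrated outside libc++. In this sketch (all namespaces and names are hypothetical), an unqualified call inside a template picks up a user function through argument-dependent lookup, while the qualified call does not.

```cpp
#include <cassert>
#include <string>

namespace lib {
template <class T> std::string helper(const T&) { return "lib::helper"; }

// Unqualified: ADL also searches the namespaces of T at instantiation time.
template <class T> std::string call_unqualified(const T& x) { return helper(x); }

// Qualified: only lib::helper is considered.
template <class T> std::string call_qualified(const T& x) { return lib::helper(x); }
}  // namespace lib

namespace user {
struct Widget {};
// A non-template overload in the argument's namespace; it wins overload
// resolution against the lib::helper template when found via ADL.
inline std::string helper(const Widget&) { return "user::helper"; }
}  // namespace user

inline void adl_demo()
{
    user::Widget w;
    assert(lib::call_unqualified(w) == "user::helper");
    assert(lib::call_qualified(w) == "lib::helper");
}
```

Inside a standard library the same mechanism could silently redirect an internal call to a user's function of the same name, which is why internal calls are usually fully qualified.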