This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
benchmarks/
1
algorithms.bench.cpp
-
include/
24
algorithm
-
test/std/algorithms/alg.modifying.operations/alg.partitions/
-
std/
-
algorithms/
-
alg.modifying.operations/
-
alg.partitions/
2
partition_point.pass.cpp

Differential D52697

Bug 39129: Speeding up partition_point/lower_bound/upper_bound/ by using unsigned division by 2 when possible.
ClosedPublic

Authored by dyaroshev on Sep 29 2018, 2:35 PM.

Download Raw Diff

Details

Reviewers

mclow.lists
EricWF

Summary

The rational and measurements can be found in the bug description: https://bugs.llvm.org/show_bug.cgi?id=39129

Diff Detail

Event Timeline

dyaroshev created this revision.Sep 29 2018, 2:35 PM

Herald added subscribers: libcxx-commits, ldionne, mgrang. · View Herald TranscriptSep 29 2018, 2:35 PM

Unfortunately didn't manage to run a clang-format on my changes - it kept ignoring the library settings - so probably the formatting needs to be reviewed too.

Offtopic: compiling and running benchmarks is non-trivial - I had to lurk through Makefiles to realize how to compile them and then there was a link error with file system stuff. Updating the docs would be great.

include/algorithm
3216	If you know a better metaprogramming trick to do this, would be great.
4106	We are not allowed (by default) to use C++11 for this file? A simple lambda would do this much better.
4119	Code duplication between partition_point, lower_bound and upper_bound doesn't seem to have any practical value.
4167	Do you know of a good way to remove code duplication? Move common parts into a base class?

In general, this is the kind of optimization that I would rather see the compiler do (everywhere, automatically) than libc++ (here and there, manually).

benchmarks/algorithms.bench.cpp
90	Seems like we need more than 32-bit ints here.
include/algorithm
4162	Why constexpr after C++11?
test/std/algorithms/alg.modifying.operations/alg.partitions/partition_point.pass.cpp
92	You can't test the functionality of `__difference_type_to_size_t_cast` (a libc++-specific type) in the `test/std` hierarchy. That's what `test/libc++` tests are for.
102	This needs to be wrapped in a conditional checking to see if we have `__int128_t`

Thanks for the quick review! I will address you comments tomorrow.

In general, this is the kind of optimization that I would rather see the compiler do (everywhere, automatically) than libc++ (here and there, manually).

I don't think it can, can it? The optimization comes from knowing the last - first will never be < 0. How would the compiler know that?

include/algorithm
4162	Mostly to be constexpr in c++17. Do I say 17 instead? Can be contstexpr in 11 no problem

Addressed review comments.
Changed (I measured - the result performs identically) cast to __half_unsigned function
Added similar optimization and benchmarks to equal_range.

(updated measurements can be found in the bug description).

Herald added a subscriber: christof. · View Herald TranscriptSep 30 2018, 9:57 AM

Minor comments typos.

In D52697#1250269, @dyaroshev wrote:

In general, this is the kind of optimization that I would rather see the compiler do (everywhere, automatically) than libc++ (here and there, manually).

I don't think it can, can it? The optimization comes from knowing the last - first will never be < 0. How would the compiler know that?

We can tell the compiler that std::distance always returns >= 0.
The optimizer already knows a lot about various standard library calls.

In D52697#1250525, @mclow.lists wrote:

We can tell the compiler that std::distance always returns >= 0.
The optimizer already knows a lot about various standard library calls.

Oh, that could be interesting. Or this could be an example of the contracts based optimization.

You don't think that it's right to merge this patch? The performance wins are pretty nice. There are plenty of other things for optimizer guys to work on, this is a quite specific case.

In D52697#1250540, @dyaroshev wrote:

You don't think that it's right to merge this patch? The performance wins are pretty nice. There are plenty of other things for optimizer guys to work on, this is a quite specific case.

I don't think there's a hurry to merge this.
The performance wins are nice - on a micro scale.
Do you have an example in a real app where this makes a difference?

Note: I'm not disparaging the work you've done.
Nor do I think this isn't worth pursuing.
I'm just looking at other approaches.

In D52697#1251063, @mclow.lists wrote:

I don't think there's a hurry to merge this.
The performance wins are nice - on a micro scale.
Do you have an example in a real app where this makes a difference?

Note: I'm not disparaging the work you've done.
Nor do I think this isn't worth pursuing.
I'm just looking at other approaches.

At my current job I don't run libc++ in production, so I can't give you bigger measurements.

This is the examples I know:
Chromium actively using flat containers which rely on standard binary search: https://cs.chromium.org/chromium/src/base/containers/flat_tree.h?q=flat_tr&sq=package:chromium&g=0&l=904
FB does it too: https://github.com/facebook/folly/blob/master/folly/sorted_vector_types.h#L448
There are some implementations of hash maps that rely on std::lower_bound to find the correct bucket (I just talked to people who do that, I don't have a github link)
There was a few talks about flat_hash_map - they use the lower_bound too: https://github.com/skarupke/flat_hash_map/blob/master/flat_hash_map.hpp#L1218

I, too some extend, don't know what your concern is. It's 30 lines of code that makes code much faster at least on some benchmarks and smaller (I looked at the assembly).
These are highly used and important functions and if not to push for every bit of performance in the standard library, where to?

On the Note:
No worries, I know that you mean well. You have to maintain this staff and libc++ is your responsibility not mine.
I am completely OK with you making the decisions and appreciate you taking the time to review this stuff.

mclow.lists added inline comments.Oct 1 2018, 9:09 AM

include/algorithm
3213	Does `make_unsigned` do what you want here?

dyaroshev added inline comments.Oct 1 2018, 9:26 AM

include/algorithm
3213	I don't think so. Seems like size_t if faster than unsigned: http://quick-bench.com/hATsToKHx8UP3legquGgvT-4qpU (this is not only this benchmark, I've seen other cases when it happens.

These are highly used and important functions and if not to push for every bit of performance in the standard library, where to?

Oh, I want the speedup. I'm just exploring other ways of getting it.

In D52697#1251160, @mclow.lists wrote:

These are highly used and important functions and if not to push for every bit of performance in the standard library, where to?

Oh, I want the speedup. I'm just exploring other ways of getting it.

FWTW just adding an assume is *NOT* enough, sadly.
https://godbolt.org/z/v8fjnG

In D52697#1251168, @lebedev.ri wrote:

FWTW just adding an assume is *NOT* enough, sadly.
https://godbolt.org/z/v8fjnG

Good thinking though. I checked in libstdc++ (they don't have this performance issue).
They just do right shift instead of casting to unsigned: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/stl_algobase.h#L972
I'm not too sure how >> works, but if their way works for you, it looks simpler. (I don't know the concept requirements for difference type, but libstdc++ guys seem to require >> operation - which probably means it's OK).
http://quick-bench.com/9tsZ0ZTUn0bUTy1gPBvFnpYD4oA

My bad, size_t cast is faster: http://quick-bench.com/klty7gya6LMs5p8YVNSQ3bnUXcA

size_t still produces less code: https://godbolt.org/z/OCuCtN

mclow.lists added inline comments.Oct 1 2018, 11:01 AM

include/algorithm
3213	I think we may be talking about different things. I was talking about the template metafunction `std::make_unsigned`, which gives you an unsigned type with the same size as the parameter. From your comment, I get the feeling you are talking about the literal type `unsigned`.

mclow.lists added inline comments.Oct 1 2018, 11:07 AM

include/algorithm
3213	If I change line 9 in your quickbench example to: using len_type = std::make_unsigned_t<typename std::iterator_traits<I>::difference_type>; then the two chunks of code run at the same speed.

dyaroshev added inline comments.Oct 1 2018, 11:34 AM

include/algorithm
3213	Sorry, wasn't clear enough. Not all unsigned types are created equal: for int - unsigned type would unsigned(32 bit) not size_t - 64 bit. If you use size_t - it's faster, so if we would get int difference type - I'd still like to use size_t. Realistically, the only difference_type we care about is std::ptrdiff_t, which is 64 bit, so either way the important case works. But, if this is really the only case we care about, we can just specialise for it.

Hi!

Just wanted to know what is the current status of this PR.

EricWF added a reviewer: EricWF.Oct 10 2018, 9:24 PM

EricWF added a subscriber: EricWF.

EricWF added inline comments.

include/algorithm
3199	`__value_` with the trailing underscore should only be used for members. And pass by value. You'll have to forgive the existing code around you that's doing it wrong.
3205	This is always true. It's a quirk of `_LIBCPP_STD_VER` that it's never less than 11. What you want here is `#if !defined(_LIBCPP_CXX03_LANG)`
3210	If you really want to make this work in C++03, You should be able to use the super-secret member `numeric_limits<Integral>::__max`. But this only works because you know you're dealing with a builtin type in this specialization.
3223	I think this comment can be reduced to something like: // Perform division by two quickly for positive integers (llvm.org/PR39129) We want to keep our header sizes small
3227	s/Bug 39129/llvm.org/PR39129
3233	s/`__value_`/`__value`. And I think we can pass by value here.
3233	Nit on the name: I would rather it reflected (A) that the number must be positive, and (B) it's fast. The fact that it does it using unsigned numbers is an implementation detail.
3245	Needs _VSTD:: qualifier to defeat ADL.
4105	I would prefer `__less_than_t` and `__less_than` had more specific names which reflect how the types/functions bind to comparators/iterators. That being said, I'm also OK if we scrap this optimization in C++03 and then implement it using C++11 lambdas.
4107	This seems like a job for `__compressed_pair`.
4110	There is no reason the functor is guaranteed to be compiled out completely, is there?
4115	We could move here.
4235	Needs _VSTD:: qualifier to defeat ADL.

Changing the status to "Changes Requested"

This revision now requires changes to proceed.Oct 12 2018, 9:22 PM

Simplifying the patch after review.

Forgotten _VSTD::

Seems like you are not comfortable with some additional changes that I did. I removed all of them and just left what was 100% necessary - casting to size_t for positive ptrdiff_t.

EricWF added inline comments.Oct 13 2018, 1:46 PM

include/algorithm
754	I think we can use `_LIBCPP_CONSTEXPR` here and get better behavior in C++03.
760	While this version is certainly simplier, I would still like to have this work for all of the builtin signed integral types.
test/libcxx/algorithms/half_positive.pass.cpp
27 ↗	(On Diff #169572)	Please make sure this test compiles in C++03, C++11, ect. You'll want to use `TEST_STD_VER` from `test_macros.h` to guard the constexpr tests.

dyaroshev added inline comments.Oct 13 2018, 2:22 PM

include/algorithm
754	Sure, will do. Why _LIBCPP_CONSTEXPR helps in C++03?
760	Will do. I tried experimenting with this a bit more, but it's quite hard to tell what divisions are actually faster. The problem with my previous benchmark is that I cast from 64bit pointer to 32bit to do division. While the question that I really want to answer is - is it a better to cast 32 bit thingy to 32 unsigned or to 64 bit unsigned to do the division. Measuring it in isolation seems shady: http://quick-bench.com/Dn1QDyTLbYjH68qDydX3rM6Azko I tried to write an implementation of binary search that would reliably use 32bit arithmetics, but it doesn't really work out, since the pointer dereference is still 64 bit. It's also important (I've seen a similar issue) that the unsigned division doesn't occupy an extra register, but definetely not when you measure this in isolation (see line seven in godbolt: https://gcc.godbolt.org/z/Fx5M6l) My conclusion is: Are you OK if I just do make_unsigned? Seems to be better than signed all the time but not 100% that it's the best cast.
test/libcxx/algorithms/half_positive.pass.cpp
27 ↗	(On Diff #169572)	Will do. Can you please point me to how do I compile in C++03 mode?

__half_positve now just uses a cast to make_unsigned and other review issues.

dyaroshev marked 3 inline comments as done.Oct 14 2018, 7:24 AM

dyaroshev added inline comments.

include/algorithm
760	Just used make_unsigned.
test/libcxx/algorithms/half_positive.pass.cpp
27 ↗	(On Diff #169572)	found in the docs. Instead of feature tests, used internal macroses.

@EricWF can we maybe merge it now?

Hi again! What is the situation with this PR?

EricWF requested changes to this revision.Oct 25 2018, 11:10 AM

EricWF added inline comments.

include/algorithm
754	Woops. I meant to write "C++11". My mistake.
test/libcxx/algorithms/half_positive.pass.cpp
31 ↗	(On Diff #169601)	Other libraries run our test suite, so we can't use internal macros. Could you please write one set of tests that run at runtime, and another set that uses constexpr and static assert. And then guard the constexpr tests under `#if TEST_STD_VER >= 11`.

This revision now requires changes to proceed.Oct 25 2018, 11:10 AM

dyaroshev added inline comments.Oct 25 2018, 11:13 AM

test/libcxx/algorithms/half_positive.pass.cpp
31 ↗	(On Diff #169601)	I can but how they can run these tests? There are public tests right? And these are private tests. Other libraries cannot run this test because they do not have __half_positive not because they do not have a macro? Or am I missing smth?

@EricWF?

Other than the comment about the test, this LGTM.

test/libcxx/algorithms/half_positive.pass.cpp
31 ↗	(On Diff #169601)	Good point. These tests are specific to libc++. My mistake. We still don't use `_LIBCPP_CONSTEXPR` to declare constexpr variables in tests. It's just not how that's supposed to be used. Plus, for the constexpr tests you want to use `static_assert` and not assert. Testing both runtime and compile time behavior will require a little extra duplication, but it's probably worth it. (If you're curious here's how I would have written it: https://gist.github.com/EricWF/df7e02ca931075b62caac587b7c0cf27)

Updating tests according to review comments.

dyaroshev added inline comments.Oct 27 2018, 9:12 AM

benchmarks/ordered_set.bench.cpp
86 ↗	(On Diff #171401)	Had a compilation error here: /space/llvm/llvm/projects/libcxx/benchmarks/ordered_set.bench.cpp:37:6: note: candidate function not viable: no known conversion from 'std::vector<size_t>' (aka 'vector<unsigned long>') to 'std::vector<uint64_t> &' (aka 'vector<unsigned long long> &') for 1st argument

I'm much happier with this patch now. It's a lot simpler, and still accomplishes what you want.

LGTM. Thanks so much for this change.

I'm still quite surprised about how much of a performance difference this make.

This revision is now accepted and ready to land.Oct 27 2018, 10:31 AM

In D52697#1278218, @EricWF wrote:

LGTM. Thanks so much for this change.

I'm still quite surprised about how much of a performance difference this make.

My pleasure, thanks for the review.

Yeah, this in my limited experience performance involves a lot of REALY??? moments. Which is kinda fun though)

I'm sorry, how do I merge this patch?

In D52697#1278285, @dyaroshev wrote:

I'm sorry, how do I merge this patch?

Do you have commit access? If not I can go ahead and merge it for you with your permission.

In D52697#1279140, @EricWF wrote:

In D52697#1278285, @dyaroshev wrote:

I'm sorry, how do I merge this patch?

Do you have commit access? If not I can go ahead and merge it for you with your permission.

I don't! Please merge it.

Committed as r345525.

Thanks again.

Hi again.

Do you remeber if the difference was this big (5 times): http://quick-bench.com/kguTLHndCQWFuzlxJQT4tJQa_p8 ?
This seems insane

I don't believe that quickbench already has the latest version of the library, because this is the assembly generated
3.69% mov %rdx,%rsi
0.12% mov %rcx,%rdx

shr    $0x3f,%rdx

2.89% add %rcx,%rdx
5.42% sar %rdx
5.06% add $0xffffffffffffffff,%rcx

The last 0xffff ... was exactly what this patch removed.
Can someone please take a look? It doesn't feel right.

FOUND IT! ! difference_type len = _VSTD::distance(first, __last); - should be size_t on the left
I don't know what happened - I ran benchmarks locally before merging!
I'm sorry - no idea why it didn't reproduce on my machine. Maybe I messed something up
I can do the patch tommorow. It will take a while to merge it (judging by this merge experience).
Do you want to maybe roll this back?
http://quick-bench.com/fcV9gaAWPeqB_3-AePOw9H2wxXU

Again, I don't know how this happened, I ran the benchmarks. I'm really sorry.

Revision Contents

Path

Size

benchmarks/

algorithms.bench.cpp

30 lines

include/

algorithm

123 lines

test/

std/

algorithms/

alg.modifying.operations/

alg.partitions/

partition_point.pass.cpp

20 lines

Diff 167628

benchmarks/algorithms.bench.cpp

Context not available.
	BENCHMARK_CAPTURE(BM_Sort, single_element_strings,	BENCHMARK_CAPTURE(BM_Sort, single_element_strings,
	getDuplicateStringInputs)->Arg(TestNumInputs);	getDuplicateStringInputs)->Arg(TestNumInputs);

		template <typename GenInputs>
		void BM_LowerBound(benchmark::State& st, GenInputs gen) {
		using ValueType = typename decltype(gen(0))::value_type;
		auto in = gen(st.range(0));
		std::sort(in.begin(), in.end());

		const auto every_10_percentile = [&]() -> std::vector<ValueType*> {
		size_t step = in.size() / 10;

		if (step == 0) {
		st.SkipWithError("Input doesn't contain enough elements");
		return {};
		}

		std::vector<ValueType*> res;
		for (size_t i = 0; i < in.size(); i += step)
		res.push_back(&in[i]);

		return res;
		}();

		for (auto _ : st)
		{
		for (auto* test : every_10_percentile)
		benchmark::DoNotOptimize(std::lower_bound(in.begin(), in.end(), *test));
		}
		}

		BENCHMARK_CAPTURE(BM_LowerBound, random_int32,
		getRandomIntegerInputs<int32_t>)->Arg(TestNumInputs);
		mclow.listsUnsubmitted Not Done Reply Inline Actions Seems like we need more than 32-bit ints here. mclow.lists: Seems like we need more than 32-bit ints here.

	BENCHMARK_MAIN();	BENCHMARK_MAIN();
Context not available.

include/algorithm

Context not available.

	// partition_point	// partition_point

		// Maybe casts to size_t (if the conversion from the type is safe).
		// Implicit assumption - the cast will happen from the positive number.
		//
		// This is necessary because some operations are faster on the unsigned numbers,
		EricWFUnsubmitted Not Done Reply Inline Actions `__value_` with the trailing underscore should only be used for members. And pass by value. You'll have to forgive the existing code around you that's doing it wrong. EricWF: `__value_` with the trailing underscore should only be used for members. And pass by value.
		// see Bug 39129.
		template <typename _Tp, bool isNumeric = is_integral<_Tp>::value>
		struct __difference_type_to_size_t_cast {
		typedef _Tp __type;
		};

		EricWFUnsubmitted Not Done Reply Inline Actions This is always true. It's a quirk of `_LIBCPP_STD_VER` that it's never less than 11. What you want here is `#if !defined(_LIBCPP_CXX03_LANG)` EricWF: This is always true. It's a quirk of `_LIBCPP_STD_VER` that it's never less than 11. What you…
		#if _LIBCPP_STD_VER >= 11 // There isn't a way to get numeric_limits::max() in C++03.
		template <typename _Tp>
		struct __difference_type_to_size_t_cast<_Tp, /isNumeric = / true> {
		typedef typename
		conditional<
		EricWFUnsubmitted Not Done Reply Inline Actions If you really want to make this work in C++03, You should be able to use the super-secret member `numeric_limits<Integral>::__max`. But this only works because you know you're dealing with a builtin type in this specialization. EricWF: If you really want to make this work in C++03, You should be able to use the super-secret…
		std::numeric_limits<_Tp>::max() <= numeric_limits<size_t>::max(),
		size_t,
		_Tp
		mclow.listsUnsubmitted Not Done Reply Inline Actions Does `make_unsigned` do what you want here? mclow.lists: Does `make_unsigned` do what you want here?
		dyaroshevAuthorUnsubmitted Not Done Reply Inline Actions I don't think so. Seems like size_t if faster than unsigned: http://quick-bench.com/hATsToKHx8UP3legquGgvT-4qpU (this is not only this benchmark, I've seen other cases when it happens. dyaroshev: I don't think so. Seems like size_t if faster than unsigned: http://quick-bench.
		mclow.listsUnsubmitted Not Done Reply Inline Actions I think we may be talking about different things. I was talking about the template metafunction `std::make_unsigned`, which gives you an unsigned type with the same size as the parameter. From your comment, I get the feeling you are talking about the literal type `unsigned`. mclow.lists: I think we may be talking about different things. I was talking about the template…
		mclow.listsUnsubmitted Not Done Reply Inline Actions If I change line 9 in your quickbench example to: using len_type = std::make_unsigned_t<typename std::iterator_traits<I>::difference_type>; then the two chunks of code run at the same speed. mclow.lists: If I change line 9 in your quickbench example to: using len_type = std…
		dyaroshevAuthorUnsubmitted Not Done Reply Inline Actions Sorry, wasn't clear enough. Not all unsigned types are created equal: for int - unsigned type would unsigned(32 bit) not size_t - 64 bit. If you use size_t - it's faster, so if we would get int difference type - I'd still like to use size_t. Realistically, the only difference_type we care about is std::ptrdiff_t, which is 64 bit, so either way the important case works. But, if this is really the only case we care about, we can just specialise for it. dyaroshev: Sorry, wasn't clear enough. Not all unsigned types are created equal: for int - unsigned type…
		>::type __type;
		};
		#endif // _LIBCPP_STD_VER >= 11
		dyaroshevAuthorUnsubmitted Not Done Reply Inline Actions If you know a better metaprogramming trick to do this, would be great. dyaroshev: If you know a better metaprogramming trick to do this, would be great.

	template<class _ForwardIterator, class _Predicate>	template<class _ForwardIterator, class _Predicate>
	_LIBCPP_CONSTEXPR_AFTER_CXX17 _ForwardIterator	_LIBCPP_CONSTEXPR_AFTER_CXX17 _ForwardIterator
	partition_point(_ForwardIterator __first, _ForwardIterator __last, _Predicate __pred)	partition_point(_ForwardIterator __first, _ForwardIterator __last, _Predicate __pred)
	{	{
	typedef typename iterator_traits<_ForwardIterator>::difference_type difference_type;	typedef typename iterator_traits<_ForwardIterator>::difference_type __difference_type;
	difference_type __len = _VSTD::distance(__first, __last);	typedef typename __difference_type_to_size_t_cast<__difference_type>::__type __len_type;
		EricWFUnsubmitted Not Done Reply Inline Actions I think this comment can be reduced to something like: // Perform division by two quickly for positive integers (llvm.org/PR39129) We want to keep our header sizes small EricWF: I think this comment can be reduced to something like: ``` // Perform division by two quickly…

		__len_type __len = static_cast<__len_type>(_VSTD::distance(__first, __last));
	while (__len != 0)	while (__len != 0)
	{	{
		EricWFUnsubmitted Not Done Reply Inline Actions s/Bug 39129/llvm.org/PR39129 EricWF: s/Bug 39129/llvm.org/PR39129
	difference_type __l2 = __len / 2;	__len_type __l2 = __len / 2;
	_ForwardIterator __m = __first;	_ForwardIterator __m = __first;
	_VSTD::advance(__m, __l2);	_VSTD::advance(__m, static_cast<__difference_type>(__l2));
	if (__pred(*__m))	if (__pred(*__m))
	{	{
	__first = ++__m;	__first = ++__m;
		EricWFUnsubmitted Not Done Reply Inline Actions Needs _VSTD:: qualifier to defeat ADL. EricWF: Needs _VSTD:: qualifier to defeat ADL.
		EricWFUnsubmitted Not Done Reply Inline Actions s/`__value_`/`__value`. And I think we can pass by value here. EricWF: s/`__value_`/`__value`. And I think we can pass by value here.
		EricWFUnsubmitted Not Done Reply Inline Actions Nit on the name: I would rather it reflected (A) that the number must be positive, and (B) it's fast. The fact that it does it using unsigned numbers is an implementation detail. EricWF: Nit on the name: I would rather it reflected (A) that the number must be positive, and (B)…
Context not available.

	// lower_bound	// lower_bound

		template <class _Compare, class _Tp, class _Reference>
		struct __less_than_t
		{
		_Compare __comp_; // We can't use an empty base class optimization for comp
		// since it might be a function pointer.
		//
		// Because this functor is compiled out completly, it doesn't matter.
		const _Tp* __value_;

		_LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_AFTER_CXX11
		__less_than_t(_Compare __comp, const _Tp& __value_) :
		__comp_(__comp),
		__value_(_VSTD::addressof(__value_)) {}

		_LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_AFTER_CXX11
		bool operator()(_Reference __u)
		{
		return __comp_(__u, *__value_);
		}
		EricWFUnsubmitted Not Done Reply Inline Actions I would prefer `__less_than_t` and `__less_than` had more specific names which reflect how the types/functions bind to comparators/iterators. That being said, I'm also OK if we scrap this optimization in C++03 and then implement it using C++11 lambdas. EricWF: I would prefer `__less_than_t` and `__less_than` had more specific names which reflect how the…
		};
		dyaroshevAuthorUnsubmitted Not Done Reply Inline Actions We are not allowed (by default) to use C++11 for this file? A simple lambda would do this much better. dyaroshev: We are not allowed (by default) to use C++11 for this file? A simple lambda would do this much…

		EricWFUnsubmitted Not Done Reply Inline Actions This seems like a job for `__compressed_pair`. EricWF: This seems like a job for `__compressed_pair`.
		template <class _Iterator, class _Compare, class _Tp>
		_LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_AFTER_CXX11
		__less_than_t<_Compare, _Tp, typename iterator_traits<_Iterator>::reference>
		EricWFUnsubmitted Not Done Reply Inline Actions There is no reason the functor is guaranteed to be compiled out completely, is there? EricWF: There is no reason the functor is guaranteed to be compiled out completely, is there?
		__less_than(_Compare __comp, const _Tp& __value_) {
		return __less_than_t<_Compare, _Tp, typename iterator_traits<_Iterator>::reference>(__comp, __value_);
		}

	template <class _Compare, class _ForwardIterator, class _Tp>	template <class _Compare, class _ForwardIterator, class _Tp>
		EricWFUnsubmitted Not Done Reply Inline Actions We could move here. EricWF: We could move here.
	_LIBCPP_CONSTEXPR_AFTER_CXX17 _ForwardIterator	_LIBCPP_CONSTEXPR_AFTER_CXX17 _ForwardIterator
	__lower_bound(_ForwardIterator __first, _ForwardIterator __last, const _Tp& __value_, _Compare __comp)	__lower_bound(_ForwardIterator __first, _ForwardIterator __last, const _Tp& __value_, _Compare __comp)
	{	{
	typedef typename iterator_traits<_ForwardIterator>::difference_type difference_type;	return _VSTD::partition_point(__first, __last, __less_than<_ForwardIterator>(__comp, __value_));
		dyaroshevAuthorUnsubmitted Not Done Reply Inline Actions Code duplication between partition_point, lower_bound and upper_bound doesn't seem to have any practical value. dyaroshev: Code duplication between partition_point, lower_bound and upper_bound doesn't seem to have any…
	difference_type __len = _VSTD::distance(__first, __last);
	while (__len != 0)
	{
	difference_type __l2 = __len / 2;
	_ForwardIterator __m = __first;
	_VSTD::advance(__m, __l2);
	if (__comp(*__m, __value_))
	{
	__first = ++__m;
	__len -= __l2 + 1;
	}
	else
	__len = __l2;
	}
	return __first;
	}	}

	template <class _ForwardIterator, class _Tp, class _Compare>	template <class _ForwardIterator, class _Tp, class _Compare>
Context not available.

	// upper_bound	// upper_bound

		template <class _Compare, class _Tp, class _Reference>
		struct __not_greater_than_t
		{
		_Compare __comp_; // We can't use an empty base class optimization for comp
		// since it might be a function pointer.
		//
		// Because this functor is compiled out completly, it doesn't matter.
		const _Tp* __value_;

		_LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_AFTER_CXX11
		__not_greater_than_t(_Compare __comp, const _Tp& __value_) :
		__comp_(__comp),
		__value_(_VSTD::addressof(__value_)) {}

		_LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_AFTER_CXX11
		mclow.listsUnsubmitted Not Done Reply Inline Actions Why constexpr after C++11? mclow.lists: Why constexpr after C++11?
		dyaroshevAuthorUnsubmitted Not Done Reply Inline Actions Mostly to be constexpr in c++17. Do I say 17 instead? Can be contstexpr in 11 no problem dyaroshev: Mostly to be constexpr in c++17. Do I say 17 instead? Can be contstexpr in 11 no problem
		bool operator()(_Reference __u)
		{
		return !__comp_(*__value_, __u);
		}
		};
		dyaroshevAuthorUnsubmitted Not Done Reply Inline Actions Do you know of a good way to remove code duplication? Move common parts into a base class? dyaroshev: Do you know of a good way to remove code duplication? Move common parts into a base class?

		template <class _Iterator, class _Compare, class _Tp>
		_LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_AFTER_CXX11
		__not_greater_than_t<_Compare, _Tp, typename iterator_traits<_Iterator>::reference>
		__not_greater_than(_Compare __comp, const _Tp& __value_)
		{
		return __not_greater_than_t<_Compare, _Tp, typename iterator_traits<_Iterator>::reference>(__comp, __value_);
		}

	template <class _Compare, class _ForwardIterator, class _Tp>	template <class _Compare, class _ForwardIterator, class _Tp>
	_LIBCPP_CONSTEXPR_AFTER_CXX17 _ForwardIterator	_LIBCPP_CONSTEXPR_AFTER_CXX17 _ForwardIterator
	__upper_bound(_ForwardIterator __first, _ForwardIterator __last, const _Tp& __value_, _Compare __comp)	__upper_bound(_ForwardIterator __first, _ForwardIterator __last, const _Tp& __value_, _Compare __comp)
	{	{
	typedef typename iterator_traits<_ForwardIterator>::difference_type difference_type;	return _VSTD::partition_point(__first, __last, __not_greater_than<_ForwardIterator>(__comp, __value_));
	difference_type __len = _VSTD::distance(__first, __last);
	while (__len != 0)
	{
	difference_type __l2 = __len / 2;
	_ForwardIterator __m = __first;
	_VSTD::advance(__m, __l2);
	if (__comp(__value_, *__m))
	__len = __l2;
	else
	{
	__first = ++__m;
	__len -= __l2 + 1;
	}
	}
	return __first;
	}	}

	template <class _ForwardIterator, class _Tp, class _Compare>	template <class _ForwardIterator, class _Tp, class _Compare>
Context not available.
		EricWFUnsubmitted Not Done Reply Inline Actions Needs _VSTD:: qualifier to defeat ADL. EricWF: Needs _VSTD:: qualifier to defeat ADL.

test/std/algorithms/alg.modifying.operations/alg.partitions/partition_point.pass.cpp

Context not available.

	#include <algorithm>	#include <algorithm>
	#include <cassert>	#include <cassert>
		#include <vector>

	#include "test_macros.h"	#include "test_macros.h"
	#include "test_iterators.h"	#include "test_iterators.h"
Context not available.
	is_odd()) == forward_iterator<const int*>(ia));	is_odd()) == forward_iterator<const int*>(ia));
	}	}

		{
		typedef typename std::__difference_type_to_size_t_cast<int>::__type cast_int;
		mclow.listsUnsubmitted Not Done Reply Inline Actions You can't test the functionality of `__difference_type_to_size_t_cast` (a libc++-specific type) in the `test/std` hierarchy. That's what `test/libc++` tests are for. mclow.lists: You can't test the functionality of `__difference_type_to_size_t_cast` (a libc++-specific type)…
		static_assert(std::is_same<cast_int, size_t>::value, "");

		typedef typename std::__difference_type_to_size_t_cast<std::ptrdiff_t>::__type cast_ptrdiff;
		static_assert(std::is_same<cast_ptrdiff, size_t>::value, "");

		typedef typename
		std::__difference_type_to_size_t_cast<std::vector<int>::difference_type>::__type cast_vector_difference_type;
		static_assert(std::is_same<cast_vector_difference_type, size_t>::value, "");

		typedef typename std::__difference_type_to_size_t_cast<__int128_t>::__type cast_int128_t;
		mclow.listsUnsubmitted Not Done Reply Inline Actions This needs to be wrapped in a conditional checking to see if we have `__int128_t` mclow.lists: This needs to be wrapped in a conditional checking to see if we have `__int128_t`
		static_assert(std::is_same<cast_int128_t, __int128_t>::value, "");

		struct S {};
		typedef typename std::__difference_type_to_size_t_cast<S>::__type cast_user_defined;
		static_assert(std::is_same<cast_user_defined, S>::value, "");
		}

	#if TEST_STD_VER > 17	#if TEST_STD_VER > 17
	static_assert(test_constexpr());	static_assert(test_constexpr());
	#endif	#endif
Context not available.