This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
benchmarks/
1
algorithms.bench.cpp
1
ordered_set.bench.cpp
-
include/
2/30
algorithm
-
test/libcxx/algorithms/
-
libcxx/
-
algorithms/
1/6
half_positive.pass.cpp

Differential D52697

Bug 39129: Speeding up partition_point/lower_bound/upper_bound/ by using unsigned division by 2 when possible.
ClosedPublic

Authored by dyaroshev on Sep 29 2018, 2:35 PM.

Download Raw Diff

Details

Reviewers

mclow.lists
EricWF

Summary

The rational and measurements can be found in the bug description: https://bugs.llvm.org/show_bug.cgi?id=39129

Diff Detail

Event Timeline

dyaroshev created this revision.Sep 29 2018, 2:35 PM

Herald added subscribers: libcxx-commits, ldionne, mgrang. · View Herald TranscriptSep 29 2018, 2:35 PM

Unfortunately didn't manage to run a clang-format on my changes - it kept ignoring the library settings - so probably the formatting needs to be reviewed too.

Offtopic: compiling and running benchmarks is non-trivial - I had to lurk through Makefiles to realize how to compile them and then there was a link error with file system stuff. Updating the docs would be great.

include/algorithm
3216	If you know a better metaprogramming trick to do this, would be great.
4106	We are not allowed (by default) to use C++11 for this file? A simple lambda would do this much better.
4119	Code duplication between partition_point, lower_bound and upper_bound doesn't seem to have any practical value.
4167	Do you know of a good way to remove code duplication? Move common parts into a base class?

In general, this is the kind of optimization that I would rather see the compiler do (everywhere, automatically) than libc++ (here and there, manually).

benchmarks/algorithms.bench.cpp
90	Seems like we need more than 32-bit ints here.
include/algorithm
4162	Why constexpr after C++11?
test/std/algorithms/alg.modifying.operations/alg.partitions/partition_point.pass.cpp
92 ↗	(On Diff #167628)	You can't test the functionality of `__difference_type_to_size_t_cast` (a libc++-specific type) in the `test/std` hierarchy. That's what `test/libc++` tests are for.
102 ↗	(On Diff #167628)	This needs to be wrapped in a conditional checking to see if we have `__int128_t`

Thanks for the quick review! I will address you comments tomorrow.

In general, this is the kind of optimization that I would rather see the compiler do (everywhere, automatically) than libc++ (here and there, manually).

I don't think it can, can it? The optimization comes from knowing the last - first will never be < 0. How would the compiler know that?

include/algorithm
4162	Mostly to be constexpr in c++17. Do I say 17 instead? Can be contstexpr in 11 no problem

Addressed review comments.
Changed (I measured - the result performs identically) cast to __half_unsigned function
Added similar optimization and benchmarks to equal_range.

(updated measurements can be found in the bug description).

Herald added a subscriber: christof. · View Herald TranscriptSep 30 2018, 9:57 AM

Minor comments typos.

In D52697#1250269, @dyaroshev wrote:

In general, this is the kind of optimization that I would rather see the compiler do (everywhere, automatically) than libc++ (here and there, manually).

I don't think it can, can it? The optimization comes from knowing the last - first will never be < 0. How would the compiler know that?

We can tell the compiler that std::distance always returns >= 0.
The optimizer already knows a lot about various standard library calls.

In D52697#1250525, @mclow.lists wrote:

We can tell the compiler that std::distance always returns >= 0.
The optimizer already knows a lot about various standard library calls.

Oh, that could be interesting. Or this could be an example of the contracts based optimization.

You don't think that it's right to merge this patch? The performance wins are pretty nice. There are plenty of other things for optimizer guys to work on, this is a quite specific case.

In D52697#1250540, @dyaroshev wrote:

You don't think that it's right to merge this patch? The performance wins are pretty nice. There are plenty of other things for optimizer guys to work on, this is a quite specific case.

I don't think there's a hurry to merge this.
The performance wins are nice - on a micro scale.
Do you have an example in a real app where this makes a difference?

Note: I'm not disparaging the work you've done.
Nor do I think this isn't worth pursuing.
I'm just looking at other approaches.

In D52697#1251063, @mclow.lists wrote:

I don't think there's a hurry to merge this.
The performance wins are nice - on a micro scale.
Do you have an example in a real app where this makes a difference?

Note: I'm not disparaging the work you've done.
Nor do I think this isn't worth pursuing.
I'm just looking at other approaches.

At my current job I don't run libc++ in production, so I can't give you bigger measurements.

This is the examples I know:
Chromium actively using flat containers which rely on standard binary search: https://cs.chromium.org/chromium/src/base/containers/flat_tree.h?q=flat_tr&sq=package:chromium&g=0&l=904
FB does it too: https://github.com/facebook/folly/blob/master/folly/sorted_vector_types.h#L448
There are some implementations of hash maps that rely on std::lower_bound to find the correct bucket (I just talked to people who do that, I don't have a github link)
There was a few talks about flat_hash_map - they use the lower_bound too: https://github.com/skarupke/flat_hash_map/blob/master/flat_hash_map.hpp#L1218

I, too some extend, don't know what your concern is. It's 30 lines of code that makes code much faster at least on some benchmarks and smaller (I looked at the assembly).
These are highly used and important functions and if not to push for every bit of performance in the standard library, where to?

On the Note:
No worries, I know that you mean well. You have to maintain this staff and libc++ is your responsibility not mine.
I am completely OK with you making the decisions and appreciate you taking the time to review this stuff.

mclow.lists added inline comments.Oct 1 2018, 9:09 AM

include/algorithm
3213	Does `make_unsigned` do what you want here?

dyaroshev added inline comments.Oct 1 2018, 9:26 AM

include/algorithm
3213	I don't think so. Seems like size_t if faster than unsigned: http://quick-bench.com/hATsToKHx8UP3legquGgvT-4qpU (this is not only this benchmark, I've seen other cases when it happens.

These are highly used and important functions and if not to push for every bit of performance in the standard library, where to?

Oh, I want the speedup. I'm just exploring other ways of getting it.

In D52697#1251160, @mclow.lists wrote:

These are highly used and important functions and if not to push for every bit of performance in the standard library, where to?

Oh, I want the speedup. I'm just exploring other ways of getting it.

FWTW just adding an assume is *NOT* enough, sadly.
https://godbolt.org/z/v8fjnG

In D52697#1251168, @lebedev.ri wrote:

FWTW just adding an assume is *NOT* enough, sadly.
https://godbolt.org/z/v8fjnG

Good thinking though. I checked in libstdc++ (they don't have this performance issue).
They just do right shift instead of casting to unsigned: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/stl_algobase.h#L972
I'm not too sure how >> works, but if their way works for you, it looks simpler. (I don't know the concept requirements for difference type, but libstdc++ guys seem to require >> operation - which probably means it's OK).
http://quick-bench.com/9tsZ0ZTUn0bUTy1gPBvFnpYD4oA

My bad, size_t cast is faster: http://quick-bench.com/klty7gya6LMs5p8YVNSQ3bnUXcA

size_t still produces less code: https://godbolt.org/z/OCuCtN

mclow.lists added inline comments.Oct 1 2018, 11:01 AM

include/algorithm
3213	I think we may be talking about different things. I was talking about the template metafunction `std::make_unsigned`, which gives you an unsigned type with the same size as the parameter. From your comment, I get the feeling you are talking about the literal type `unsigned`.

mclow.lists added inline comments.Oct 1 2018, 11:07 AM

include/algorithm
3213	If I change line 9 in your quickbench example to: using len_type = std::make_unsigned_t<typename std::iterator_traits<I>::difference_type>; then the two chunks of code run at the same speed.

dyaroshev added inline comments.Oct 1 2018, 11:34 AM

include/algorithm
3213	Sorry, wasn't clear enough. Not all unsigned types are created equal: for int - unsigned type would unsigned(32 bit) not size_t - 64 bit. If you use size_t - it's faster, so if we would get int difference type - I'd still like to use size_t. Realistically, the only difference_type we care about is std::ptrdiff_t, which is 64 bit, so either way the important case works. But, if this is really the only case we care about, we can just specialise for it.

Hi!

Just wanted to know what is the current status of this PR.

EricWF added a reviewer: EricWF.Oct 10 2018, 9:24 PM

EricWF added a subscriber: EricWF.

EricWF added inline comments.

include/algorithm
3199	`__value_` with the trailing underscore should only be used for members. And pass by value. You'll have to forgive the existing code around you that's doing it wrong.
3205	This is always true. It's a quirk of `_LIBCPP_STD_VER` that it's never less than 11. What you want here is `#if !defined(_LIBCPP_CXX03_LANG)`
3210	If you really want to make this work in C++03, You should be able to use the super-secret member `numeric_limits<Integral>::__max`. But this only works because you know you're dealing with a builtin type in this specialization.
3223	I think this comment can be reduced to something like: // Perform division by two quickly for positive integers (llvm.org/PR39129) We want to keep our header sizes small
3227	s/Bug 39129/llvm.org/PR39129
3233	s/`__value_`/`__value`. And I think we can pass by value here.
3233	Nit on the name: I would rather it reflected (A) that the number must be positive, and (B) it's fast. The fact that it does it using unsigned numbers is an implementation detail.
3245	Needs _VSTD:: qualifier to defeat ADL.
4105	I would prefer `__less_than_t` and `__less_than` had more specific names which reflect how the types/functions bind to comparators/iterators. That being said, I'm also OK if we scrap this optimization in C++03 and then implement it using C++11 lambdas.
4107	This seems like a job for `__compressed_pair`.
4110	There is no reason the functor is guaranteed to be compiled out completely, is there?
4115	We could move here.
4235	Needs _VSTD:: qualifier to defeat ADL.

Changing the status to "Changes Requested"

This revision now requires changes to proceed.Oct 12 2018, 9:22 PM

Simplifying the patch after review.

Forgotten _VSTD::

Seems like you are not comfortable with some additional changes that I did. I removed all of them and just left what was 100% necessary - casting to size_t for positive ptrdiff_t.

EricWF added inline comments.Oct 13 2018, 1:46 PM

include/algorithm
754	I think we can use `_LIBCPP_CONSTEXPR` here and get better behavior in C++03.
760	While this version is certainly simplier, I would still like to have this work for all of the builtin signed integral types.
test/libcxx/algorithms/half_positive.pass.cpp
28	Please make sure this test compiles in C++03, C++11, ect. You'll want to use `TEST_STD_VER` from `test_macros.h` to guard the constexpr tests.

dyaroshev added inline comments.Oct 13 2018, 2:22 PM

include/algorithm
754	Sure, will do. Why _LIBCPP_CONSTEXPR helps in C++03?
760	Will do. I tried experimenting with this a bit more, but it's quite hard to tell what divisions are actually faster. The problem with my previous benchmark is that I cast from 64bit pointer to 32bit to do division. While the question that I really want to answer is - is it a better to cast 32 bit thingy to 32 unsigned or to 64 bit unsigned to do the division. Measuring it in isolation seems shady: http://quick-bench.com/Dn1QDyTLbYjH68qDydX3rM6Azko I tried to write an implementation of binary search that would reliably use 32bit arithmetics, but it doesn't really work out, since the pointer dereference is still 64 bit. It's also important (I've seen a similar issue) that the unsigned division doesn't occupy an extra register, but definetely not when you measure this in isolation (see line seven in godbolt: https://gcc.godbolt.org/z/Fx5M6l) My conclusion is: Are you OK if I just do make_unsigned? Seems to be better than signed all the time but not 100% that it's the best cast.
test/libcxx/algorithms/half_positive.pass.cpp
28	Will do. Can you please point me to how do I compile in C++03 mode?

__half_positve now just uses a cast to make_unsigned and other review issues.

dyaroshev marked 3 inline comments as done.Oct 14 2018, 7:24 AM

dyaroshev added inline comments.

include/algorithm
760	Just used make_unsigned.
test/libcxx/algorithms/half_positive.pass.cpp
28	found in the docs. Instead of feature tests, used internal macroses.

@EricWF can we maybe merge it now?

Hi again! What is the situation with this PR?

EricWF requested changes to this revision.Oct 25 2018, 11:10 AM

EricWF added inline comments.

include/algorithm
754	Woops. I meant to write "C++11". My mistake.
test/libcxx/algorithms/half_positive.pass.cpp
32	Other libraries run our test suite, so we can't use internal macros. Could you please write one set of tests that run at runtime, and another set that uses constexpr and static assert. And then guard the constexpr tests under `#if TEST_STD_VER >= 11`.

This revision now requires changes to proceed.Oct 25 2018, 11:10 AM

dyaroshev added inline comments.Oct 25 2018, 11:13 AM

test/libcxx/algorithms/half_positive.pass.cpp
32	I can but how they can run these tests? There are public tests right? And these are private tests. Other libraries cannot run this test because they do not have __half_positive not because they do not have a macro? Or am I missing smth?

@EricWF?

Other than the comment about the test, this LGTM.

test/libcxx/algorithms/half_positive.pass.cpp
32	Good point. These tests are specific to libc++. My mistake. We still don't use `_LIBCPP_CONSTEXPR` to declare constexpr variables in tests. It's just not how that's supposed to be used. Plus, for the constexpr tests you want to use `static_assert` and not assert. Testing both runtime and compile time behavior will require a little extra duplication, but it's probably worth it. (If you're curious here's how I would have written it: https://gist.github.com/EricWF/df7e02ca931075b62caac587b7c0cf27)

Updating tests according to review comments.

dyaroshev added inline comments.Oct 27 2018, 9:12 AM

benchmarks/ordered_set.bench.cpp
86	Had a compilation error here: /space/llvm/llvm/projects/libcxx/benchmarks/ordered_set.bench.cpp:37:6: note: candidate function not viable: no known conversion from 'std::vector<size_t>' (aka 'vector<unsigned long>') to 'std::vector<uint64_t> &' (aka 'vector<unsigned long long> &') for 1st argument

I'm much happier with this patch now. It's a lot simpler, and still accomplishes what you want.

LGTM. Thanks so much for this change.

I'm still quite surprised about how much of a performance difference this make.

This revision is now accepted and ready to land.Oct 27 2018, 10:31 AM

In D52697#1278218, @EricWF wrote:

LGTM. Thanks so much for this change.

I'm still quite surprised about how much of a performance difference this make.

My pleasure, thanks for the review.

Yeah, this in my limited experience performance involves a lot of REALY??? moments. Which is kinda fun though)

I'm sorry, how do I merge this patch?

In D52697#1278285, @dyaroshev wrote:

I'm sorry, how do I merge this patch?

Do you have commit access? If not I can go ahead and merge it for you with your permission.

In D52697#1279140, @EricWF wrote:

In D52697#1278285, @dyaroshev wrote:

I'm sorry, how do I merge this patch?

Do you have commit access? If not I can go ahead and merge it for you with your permission.

I don't! Please merge it.

Committed as r345525.

Thanks again.

Hi again.

Do you remeber if the difference was this big (5 times): http://quick-bench.com/kguTLHndCQWFuzlxJQT4tJQa_p8 ?
This seems insane

I don't believe that quickbench already has the latest version of the library, because this is the assembly generated
3.69% mov %rdx,%rsi
0.12% mov %rcx,%rdx

shr    $0x3f,%rdx

2.89% add %rcx,%rdx
5.42% sar %rdx
5.06% add $0xffffffffffffffff,%rcx

The last 0xffff ... was exactly what this patch removed.
Can someone please take a look? It doesn't feel right.

FOUND IT! ! difference_type len = _VSTD::distance(first, __last); - should be size_t on the left
I don't know what happened - I ran benchmarks locally before merging!
I'm sorry - no idea why it didn't reproduce on my machine. Maybe I messed something up
I can do the patch tommorow. It will take a while to merge it (judging by this merge experience).
Do you want to maybe roll this back?
http://quick-bench.com/fcV9gaAWPeqB_3-AePOw9H2wxXU

Again, I don't know how this happened, I ran the benchmarks. I'm really sorry.

Revision Contents

Path

Size

benchmarks/

algorithms.bench.cpp

64 lines

ordered_set.bench.cpp

2 lines

include/

algorithm

34 lines

test/

libcxx/

algorithms/

half_positive.pass.cpp

59 lines

Diff 171401

benchmarks/algorithms.bench.cpp

Context not available.
	BENCHMARK_CAPTURE(BM_Sort, single_element_strings,	BENCHMARK_CAPTURE(BM_Sort, single_element_strings,
	getDuplicateStringInputs)->Arg(TestNumInputs);	getDuplicateStringInputs)->Arg(TestNumInputs);

		template <typename GenInputs, typename Alg>
		void do_binary_search_benchmark(benchmark::State& st, GenInputs gen, Alg alg)
		{
		using ValueType = typename decltype(gen(0))::value_type;
		auto in = gen(st.range(0));
		std::sort(in.begin(), in.end());

		const auto every_10_percentile = [&]() -> std::vector<ValueType*> {
		size_t step = in.size() / 10;

		if (step == 0) {
		st.SkipWithError("Input doesn't contain enough elements");
		return {};
		}

		std::vector<ValueType*> res;
		for (size_t i = 0; i < in.size(); i += step)
		res.push_back(&in[i]);

		return res;
		}();

		for (auto _ : st)
		{
		for (auto* test : every_10_percentile)
		benchmark::DoNotOptimize(alg(in.begin(), in.end(), *test));
		}
		}

		template <typename GenInputs>
		mclow.listsUnsubmitted Not Done Reply Inline Actions Seems like we need more than 32-bit ints here. mclow.lists: Seems like we need more than 32-bit ints here.
		void BM_LowerBound(benchmark::State& st, GenInputs gen)
		{
		do_binary_search_benchmark(st, gen, [](auto f, auto l, const auto& v) {
		return std::lower_bound(f, l, v);
		});
		}

		BENCHMARK_CAPTURE(BM_LowerBound, random_int32, getRandomIntegerInputs<int32_t>)
		->Arg(TestNumInputs) // Small int32_t vector
		->Arg(TestNumInputs * TestNumInputs); // Big int32_t vector

		BENCHMARK_CAPTURE(BM_LowerBound, random_int64, getRandomIntegerInputs<int64_t>)
		->Arg(TestNumInputs); // Small int64_t vector. Should also represent pointers.

		BENCHMARK_CAPTURE(BM_LowerBound, random_strings, getRandomStringInputs)
		->Arg(TestNumInputs); // Small string vector. What happens if the comparison is not very cheap.

		template <typename GenInputs>
		void BM_EqualRange(benchmark::State& st, GenInputs gen)
		{
		do_binary_search_benchmark(st, gen, [](auto f, auto l, const auto& v) {
		return std::equal_range(f, l, v);
		});
		}

		BENCHMARK_CAPTURE(BM_EqualRange, random_int32, getRandomIntegerInputs<int32_t>)
		->Arg(TestNumInputs) // Small int32_t vector
		->Arg(TestNumInputs * TestNumInputs); // Big int32_t vector

		BENCHMARK_CAPTURE(BM_EqualRange, random_int64, getRandomIntegerInputs<int64_t>)
		->Arg(TestNumInputs); // Small int64_t vector. Should also represent pointers.

		BENCHMARK_CAPTURE(BM_EqualRange, random_strings, getRandomStringInputs)
		->Arg(TestNumInputs); // Small string vector. What happens if the comparison is not very cheap.

	BENCHMARK_MAIN();	BENCHMARK_MAIN();
Context not available.

benchmarks/ordered_set.bench.cpp

Context not available.
	using Base::Base;	using Base::Base;

	void run(benchmark::State& State) const {	void run(benchmark::State& State) const {
	std::vector<size_t> Keys(TableSize);	std::vector<uint64_t> Keys(TableSize);
		dyaroshevAuthorUnsubmitted Not Done Reply Inline Actions Had a compilation error here: /space/llvm/llvm/projects/libcxx/benchmarks/ordered_set.bench.cpp:37:6: note: candidate function not viable: no known conversion from 'std::vector<size_t>' (aka 'vector<unsigned long>') to 'std::vector<uint64_t> &' (aka 'vector<unsigned long long> &') for 1st argument dyaroshev: Had a compilation error here: /space/llvm/llvm/projects/libcxx/benchmarks/ordered_set.bench.
	std::iota(Keys.begin(), Keys.end(), size_t{0});	std::iota(Keys.begin(), Keys.end(), size_t{0});
	sortKeysBy(Keys, Access());	sortKeysBy(Keys, Access());

Context not available.

include/algorithm

Context not available.
	bool operator()(const _T1& __x, const _T2& __y) {return __p_(__y, __x);}	bool operator()(const _T1& __x, const _T2& __y) {return __p_(__y, __x);}
	};	};

		// Perform division by two quickly for positive integers (llvm.org/PR39129)

		EricWFUnsubmitted Done Reply Inline Actions I think we can use `_LIBCPP_CONSTEXPR` here and get better behavior in C++03. EricWF: I think we can use `_LIBCPP_CONSTEXPR` here and get better behavior in C++03.
		dyaroshevAuthorUnsubmitted Not Done Reply Inline Actions Sure, will do. Why _LIBCPP_CONSTEXPR helps in C++03? dyaroshev: Sure, will do. Why _LIBCPP_CONSTEXPR helps in C++03?
		EricWFUnsubmitted Not Done Reply Inline Actions Woops. I meant to write "C++11". My mistake. EricWF: Woops. I meant to write "C++11". My mistake.
		template <typename _Integral>
		_LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR
		typename enable_if
		<
		is_integral<_Integral>::value,
		_Integral
		EricWFUnsubmitted Done Reply Inline Actions While this version is certainly simplier, I would still like to have this work for all of the builtin signed integral types. EricWF: While this version is certainly simplier, I would still like to have this work for all of the…
		dyaroshevAuthorUnsubmitted Not Done Reply Inline Actions Will do. I tried experimenting with this a bit more, but it's quite hard to tell what divisions are actually faster. The problem with my previous benchmark is that I cast from 64bit pointer to 32bit to do division. While the question that I really want to answer is - is it a better to cast 32 bit thingy to 32 unsigned or to 64 bit unsigned to do the division. Measuring it in isolation seems shady: http://quick-bench.com/Dn1QDyTLbYjH68qDydX3rM6Azko I tried to write an implementation of binary search that would reliably use 32bit arithmetics, but it doesn't really work out, since the pointer dereference is still 64 bit. It's also important (I've seen a similar issue) that the unsigned division doesn't occupy an extra register, but definetely not when you measure this in isolation (see line seven in godbolt: https://gcc.godbolt.org/z/Fx5M6l) My conclusion is: Are you OK if I just do make_unsigned? Seems to be better than signed all the time but not 100% that it's the best cast. dyaroshev: Will do. I tried experimenting with this a bit more, but it's quite hard to tell what divisions…
		dyaroshevAuthorUnsubmitted Not Done Reply Inline Actions Just used make_unsigned. dyaroshev: Just used make_unsigned.
		>::type
		__half_positive(_Integral __value)
		{
		return static_cast<_Integral>(static_cast<typename make_unsigned<_Integral>::type>(__value) / 2);
		}

		template <typename _Tp>
		_LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR
		typename enable_if
		<
		!is_integral<_Tp>::value,
		_Tp
		>::type
		__half_positive(_Tp __value)
		{
		return __value / 2;
		}

	#ifdef _LIBCPP_DEBUG	#ifdef _LIBCPP_DEBUG

	template <class _Compare>	template <class _Compare>
		dyaroshevAuthorUnsubmitted Not Done Reply Inline Actions If you know a better metaprogramming trick to do this, would be great. dyaroshev: If you know a better metaprogramming trick to do this, would be great.
		mclow.listsUnsubmitted Not Done Reply Inline Actions Does `make_unsigned` do what you want here? mclow.lists: Does `make_unsigned` do what you want here?
		dyaroshevAuthorUnsubmitted Not Done Reply Inline Actions I don't think so. Seems like size_t if faster than unsigned: http://quick-bench.com/hATsToKHx8UP3legquGgvT-4qpU (this is not only this benchmark, I've seen other cases when it happens. dyaroshev: I don't think so. Seems like size_t if faster than unsigned: http://quick-bench.
		mclow.listsUnsubmitted Not Done Reply Inline Actions I think we may be talking about different things. I was talking about the template metafunction `std::make_unsigned`, which gives you an unsigned type with the same size as the parameter. From your comment, I get the feeling you are talking about the literal type `unsigned`. mclow.lists: I think we may be talking about different things. I was talking about the template…
		mclow.listsUnsubmitted Not Done Reply Inline Actions If I change line 9 in your quickbench example to: using len_type = std::make_unsigned_t<typename std::iterator_traits<I>::difference_type>; then the two chunks of code run at the same speed. mclow.lists: If I change line 9 in your quickbench example to: using len_type = std…
		dyaroshevAuthorUnsubmitted Not Done Reply Inline Actions Sorry, wasn't clear enough. Not all unsigned types are created equal: for int - unsigned type would unsigned(32 bit) not size_t - 64 bit. If you use size_t - it's faster, so if we would get int difference type - I'd still like to use size_t. Realistically, the only difference_type we care about is std::ptrdiff_t, which is 64 bit, so either way the important case works. But, if this is really the only case we care about, we can just specialise for it. dyaroshev: Sorry, wasn't clear enough. Not all unsigned types are created equal: for int - unsigned type…
		EricWFUnsubmitted Not Done Reply Inline Actions This is always true. It's a quirk of `_LIBCPP_STD_VER` that it's never less than 11. What you want here is `#if !defined(_LIBCPP_CXX03_LANG)` EricWF: This is always true. It's a quirk of `_LIBCPP_STD_VER` that it's never less than 11. What you…
		EricWFUnsubmitted Not Done Reply Inline Actions s/Bug 39129/llvm.org/PR39129 EricWF: s/Bug 39129/llvm.org/PR39129
		EricWFUnsubmitted Not Done Reply Inline Actions `__value_` with the trailing underscore should only be used for members. And pass by value. You'll have to forgive the existing code around you that's doing it wrong. EricWF: `__value_` with the trailing underscore should only be used for members. And pass by value.
		EricWFUnsubmitted Not Done Reply Inline Actions If you really want to make this work in C++03, You should be able to use the super-secret member `numeric_limits<Integral>::__max`. But this only works because you know you're dealing with a builtin type in this specialization. EricWF: If you really want to make this work in C++03, You should be able to use the super-secret…
		EricWFUnsubmitted Not Done Reply Inline Actions I think this comment can be reduced to something like: // Perform division by two quickly for positive integers (llvm.org/PR39129) We want to keep our header sizes small EricWF: I think this comment can be reduced to something like: ``` // Perform division by two quickly…
Context not available.
	difference_type __len = _VSTD::distance(__first, __last);	difference_type __len = _VSTD::distance(__first, __last);
	while (__len != 0)	while (__len != 0)
	{	{
	difference_type __l2 = __len / 2;	difference_type __l2 = _VSTD::__half_positive(__len);
	_ForwardIterator __m = __first;	_ForwardIterator __m = __first;
	_VSTD::advance(__m, __l2);	_VSTD::advance(__m, __l2);
		EricWFUnsubmitted Not Done Reply Inline Actions s/`__value_`/`__value`. And I think we can pass by value here. EricWF: s/`__value_`/`__value`. And I think we can pass by value here.
		EricWFUnsubmitted Not Done Reply Inline Actions Nit on the name: I would rather it reflected (A) that the number must be positive, and (B) it's fast. The fact that it does it using unsigned numbers is an implementation detail. EricWF: Nit on the name: I would rather it reflected (A) that the number must be positive, and (B)…
	if (__pred(*__m))	if (__pred(*__m))
		EricWFUnsubmitted Not Done Reply Inline Actions Needs _VSTD:: qualifier to defeat ADL. EricWF: Needs _VSTD:: qualifier to defeat ADL.
Context not available.
	difference_type __len = _VSTD::distance(__first, __last);	difference_type __len = _VSTD::distance(__first, __last);
	while (__len != 0)	while (__len != 0)
	{	{
	difference_type __l2 = __len / 2;	difference_type __l2 = _VSTD::__half_positive(__len);
	_ForwardIterator __m = __first;	_ForwardIterator __m = __first;
	_VSTD::advance(__m, __l2);	_VSTD::advance(__m, __l2);
	if (__comp(*__m, __value_))	if (__comp(*__m, __value_))
		dyaroshevAuthorUnsubmitted Not Done Reply Inline Actions We are not allowed (by default) to use C++11 for this file? A simple lambda would do this much better. dyaroshev: We are not allowed (by default) to use C++11 for this file? A simple lambda would do this much…
		dyaroshevAuthorUnsubmitted Not Done Reply Inline Actions Code duplication between partition_point, lower_bound and upper_bound doesn't seem to have any practical value. dyaroshev: Code duplication between partition_point, lower_bound and upper_bound doesn't seem to have any…
		EricWFUnsubmitted Not Done Reply Inline Actions This seems like a job for `__compressed_pair`. EricWF: This seems like a job for `__compressed_pair`.
		EricWFUnsubmitted Not Done Reply Inline Actions There is no reason the functor is guaranteed to be compiled out completely, is there? EricWF: There is no reason the functor is guaranteed to be compiled out completely, is there?
		EricWFUnsubmitted Not Done Reply Inline Actions We could move here. EricWF: We could move here.
		EricWFUnsubmitted Not Done Reply Inline Actions I would prefer `__less_than_t` and `__less_than` had more specific names which reflect how the types/functions bind to comparators/iterators. That being said, I'm also OK if we scrap this optimization in C++03 and then implement it using C++11 lambdas. EricWF: I would prefer `__less_than_t` and `__less_than` had more specific names which reflect how the…
Context not available.
	difference_type __len = _VSTD::distance(__first, __last);	difference_type __len = _VSTD::distance(__first, __last);
	while (__len != 0)	while (__len != 0)
	{	{
	difference_type __l2 = __len / 2;	difference_type __l2 = _VSTD::__half_positive(__len);
	_ForwardIterator __m = __first;	_ForwardIterator __m = __first;
	_VSTD::advance(__m, __l2);	_VSTD::advance(__m, __l2);
	if (__comp(__value_, *__m))	if (__comp(__value_, *__m))
		dyaroshevAuthorUnsubmitted Not Done Reply Inline Actions Do you know of a good way to remove code duplication? Move common parts into a base class? dyaroshev: Do you know of a good way to remove code duplication? Move common parts into a base class?
		mclow.listsUnsubmitted Not Done Reply Inline Actions Why constexpr after C++11? mclow.lists: Why constexpr after C++11?
		dyaroshevAuthorUnsubmitted Not Done Reply Inline Actions Mostly to be constexpr in c++17. Do I say 17 instead? Can be contstexpr in 11 no problem dyaroshev: Mostly to be constexpr in c++17. Do I say 17 instead? Can be contstexpr in 11 no problem
Context not available.
	difference_type __len = _VSTD::distance(__first, __last);	difference_type __len = _VSTD::distance(__first, __last);
	while (__len != 0)	while (__len != 0)
	{	{
	difference_type __l2 = __len / 2;	difference_type __l2 = _VSTD::__half_positive(__len);
	_ForwardIterator __m = __first;	_ForwardIterator __m = __first;
	_VSTD::advance(__m, __l2);	_VSTD::advance(__m, __l2);
	if (__comp(*__m, __value_))	if (__comp(*__m, __value_))
Context not available.
		EricWFUnsubmitted Not Done Reply Inline Actions Needs _VSTD:: qualifier to defeat ADL. EricWF: Needs _VSTD:: qualifier to defeat ADL.

test/libcxx/algorithms/half_positive.pass.cpp

This file was added.

				//===----------------------------------------------------------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is dual licensed under the MIT and the University of Illinois Open
				// Source Licenses. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				// <algorithm>

				// template <typename _Tp> _Tp __half_positive(const _Tp&);

				// __half_positive divide integer number by 2 as unsigned number
				// if it's safe to do so. It can be an important optimization for lower bound,
				// for example.

				#include <algorithm>
				#include <cassert>
				#include <limits>
				#include <type_traits>

				#include "test_macros.h"
				#include "user_defined_integral.hpp"

				namespace {

				template <class IntType, class UnderlyingType = IntType>
				EricWFUnsubmitted Done Reply Inline Actions Please make sure this test compiles in C++03, C++11, ect. You'll want to use `TEST_STD_VER` from `test_macros.h` to guard the constexpr tests. EricWF: Please make sure this test compiles in C++03, C++11, ect. You'll want to use `TEST_STD_VER`…
				dyaroshevAuthorUnsubmitted Not Done Reply Inline Actions Will do. Can you please point me to how do I compile in C++03 mode? dyaroshev: Will do. Can you please point me to how do I compile in C++03 mode?
				dyaroshevAuthorUnsubmitted Not Done Reply Inline Actions found in the docs. Instead of feature tests, used internal macroses. dyaroshev: found in the docs. Instead of feature tests, used internal macroses.
				TEST_CONSTEXPR bool test(IntType max_v = IntType(std::numeric_limits<UnderlyingType>::max())) {
				return std::__half_positive(max_v) == max_v / 2;
				}

				EricWFUnsubmitted Not Done Reply Inline Actions Other libraries run our test suite, so we can't use internal macros. Could you please write one set of tests that run at runtime, and another set that uses constexpr and static assert. And then guard the constexpr tests under `#if TEST_STD_VER >= 11`. EricWF: Other libraries run our test suite, so we can't use internal macros. Could you please write…
				dyaroshevAuthorUnsubmitted Not Done Reply Inline Actions I can but how they can run these tests? There are public tests right? And these are private tests. Other libraries cannot run this test because they do not have __half_positive not because they do not have a macro? Or am I missing smth? dyaroshev: I can but how they can run these tests? There are public tests right? And these are private…
				EricWFUnsubmitted Not Done Reply Inline Actions Good point. These tests are specific to libc++. My mistake. We still don't use `_LIBCPP_CONSTEXPR` to declare constexpr variables in tests. It's just not how that's supposed to be used. Plus, for the constexpr tests you want to use `static_assert` and not assert. Testing both runtime and compile time behavior will require a little extra duplication, but it's probably worth it. (If you're curious here's how I would have written it: https://gist.github.com/EricWF/df7e02ca931075b62caac587b7c0cf27) EricWF: Good point. These tests are specific to libc++. My mistake. We still don't use…
				} // namespace

				int main()
				{
				{
				assert(test<char>());
				assert(test<int>());
				assert(test<long>());
				assert((test<UserDefinedIntegral<int>, int>()));
				assert(test<size_t>());
				#if !defined(_LIBCPP_HAS_NO_INT128)
				assert(test<__int128_t>());
				#endif // !defined(_LIBCPP_HAS_NO_INT128)
				}

				#if TEST_STD_VER >= 11
				{
				static_assert(test<char>(), "");
				static_assert(test<int>(), "");
				static_assert(test<long>(), "");
				static_assert(test<size_t>(), "");
				#if !defined(_LIBCPP_HAS_NO_INT128)
				static_assert(test<__int128_t>(), "");
				#endif // !defined(_LIBCPP_HAS_NO_INT128)
				}
				#endif // TEST_STD_VER >= 11
				}