Patch https://reviews.llvm.org/D52697 introduced a regression on certain platforms. (for example on quick-bench: http://quick-bench.com/kguTLHndCQWFuzlxJQT4tJQa_p8)
This patch should fix that regression: http://quick-bench.com/fcV9gaAWPeqB_3-AePOw9H2wxXU
I do not know yet why it behaves like it is.
I do think that these measurements are correct since I've tried multiple variations on quick-bench and didn't stumble into the opposite.
On my machine this regression does not reproduce.
As part of the review, please check that you do not run into this issue on your machine.
Related bug: https://bugs.llvm.org/show_bug.cgi?id=39129
UPD: have a strong suspicion that this is the issue with quick-bench using clang6: https://gcc.godbolt.org/z/WqhWpj
The same problem does not reproduce with gcc on quick-bench
UPD2: OK, I did the investigation and I think this was a false alarm - we can recommit the previous patch.
This issue does not arise as a result of miscompliation, though clang7.0 has a significantly better codegen.
Here are reruns of the benchmarks on my machine for different clang versions (3.9, 5.0. 6.0, 7.0):
https://gist.github.com/DenisYaroshevskiy/eac461b1f8c2f80ce52790a056e460a8
The thing that I didn't test when I raised this issue is I didn't try to reimplement what went in the standard library from scratch and see how it performs compared to the std::lower_bound available on quick-bench. Here is what it looks like: http://quick-bench.com/prKysK3i2idH483CrKDJ45CsGuo
(don't mind that in this measurement the Sean Paren's version is slightly faster - it's an artefact of benchmarking - most likely code alignment issue. You can remove std::lower_bound benchmark - and see that the new version of lb will be faster: http://quick-bench.com/hor6m5_f0Jy2ZCQPcwc94Mj-Ga0)
Actually, what I now think is that the standard library on the quick-bench website is not updated from master. It seemed to me that the difference between std::lower_bound and the Sean Parent's one was not that big - but since it has been more than a week - the version with reversed patch would have been picked up already, so I was wrong about that.
I emailed Frederic Tingaud (the quick-bench) to confirm this- waiting for answer.
Note: we can tweak code to compile better on clang6.0 and the previous ones - but it will make it less readable. I don't think it's worth it - new patch is still a win and I'd expect users to move to the new compiler/libcxx somewhat simultaneously. Though this is up to you.
I'll admit to being extremely surprised this makes any difference whatsoever. In my experience, Clang always emits the same instruction no matter what: shr. Adding @jfb to take a look. If he's OK with this, I'm OK with this.