Details
- Reviewers
ldionne - Group Reviewers
Restricted Project - Commits
- rG1fd08edd5808: [libc++] Forward to std::{,w}memchr in std::find
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Thanks for working on this, the benchmarks look very nice!
libcxx/benchmarks/algorithms/find.bench.cpp | ||
---|---|---|
10 | I really would like the benchmark results you posted in the commit message. | |
libcxx/include/__algorithm/find.h | ||
54 | Why not test is_same_v<_Tp, wchar_t>? | |
58 | Does this function work with char16_t or char32_t (depending on sizeof(wchar_t).) | |
libcxx/include/__string/constexpr_c_functions.h | ||
78–88 | Note that the current way it's written generates unneeded code | |
libcxx/include/cwchar | ||
242 | This seems duplicated code from the char version, can we combine it in a templated function. |
libcxx/include/__algorithm/find.h | ||
---|---|---|
54 | Because that would disable the optimization for other types that have the same size as wchar_t. For these types this optimization is probably UB, but that shouldn't be a problem in practice, since it's on an ABI boundary and no compiler currently optimizes wmemchr specifically (and IMO no compiler should ever optimize based on this, since there is no potential gain I can see). | |
58 | See my comment above. | |
libcxx/include/__string/constexpr_c_functions.h | ||
78–88 | This looks quite surprising to me, and I'm not sure it's worth the little bit of time the compiler spends on this. | |
libcxx/include/cwchar | ||
242 | I don't think it's worth it. The loop is really simple, and deduplicating this seems like a lot of extra complexity compared to the benefit. |
libcxx/include/cwchar | ||
---|---|---|
230 | Should we also assert the we have the same alignment as wchar_t here? | |
250–251 | For the non-constexpr case, I would do this to make sure that we really never exploit any UB, even if it may seem to be benign right now: if constexpr (std::is_trivial_v<_Tp>) { // we know it's safe to reinterpret cast (at least in C++20 with implicit lifetime types) return reinterpret_cast<_Tp*>(std::wmemchr(reinterpret_cast<__apply_cv<_Tp, wchar_t>::type*>(__str), __char, __count)); } else { // loop } Concretely, this won't change anything since we already only call this function with integral types, however if we were to e.g. specialize __is_trivially_equality_comparable for a non-trivial type, then my suggested code would work whereas the original code might be (benign) UB. WDYT? | |
libcxx/test/std/algorithms/alg.nonmodifying/alg.find/find.pass.cpp | ||
10 | We should also update the tests for ranges::find. | |
38–86 | We should also have a test that checks that if multiple elements match, we return the first one. | |
53 | Let's also add a test for an empty range. | |
57 | This shouldn't do the same compare that would be generated anyway, otherwise we are not testing properly that we don't use memchr. | |
59 | We also need to test with types that have different sizes so we trigger the memchr and the wmemchr code paths. Right now we don't trigger both with non-integral types. |
libcxx/include/__algorithm/find.h | ||
---|---|---|
54 | The function you call has the following signature inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 const wchar_t* __constexpr_wmemchr(const wchar_t* __str, wchar_t __char, size_t __count) So how can this work when _Tp is not a wchar_t? That requires a reinterpret_cast which is not allowed during constant evaluation. What am I missing? | |
libcxx/include/__string/constexpr_c_functions.h | ||
78–88 | This is something we have used before. Now it will generate dead code. | |
libcxx/test/std/algorithms/alg.nonmodifying/alg.find/find.pass.cpp | ||
78 | I really would like to see tests for all types that will use this optimization. Especially to make sure the code is valid during constant evaluation. |
libcxx/include/__algorithm/find.h | ||
---|---|---|
54 | Your signature is wrong, __constexpr_wmemchr is templated. | |
libcxx/include/__string/constexpr_c_functions.h | ||
78–88 | clang doesn't even emit the condition from the fronted AFAICT, let alone the backend: https://godbolt.org/z/87bvzbhbT. This is at best a minimal compile time speedup. | |
libcxx/test/std/algorithms/alg.nonmodifying/alg.find/find.pass.cpp | ||
78 | We currently only have integral types and pointers marked trivially equality comparable, and I don't think it's worth testing for pointers specifically, since they are for all intents and purposes integers with a funny syntax in this case. |
libcxx/include/cwchar | ||
---|---|---|
250–251 | I went for the "only optimize if we have __builtin_wmemchr" option instead. IMO we should try to apply this optimization as broadly as possible, and we can at least heavily influence how clang optimizes __builtin_wmemchr. This means that GCC won't optimize this as well, but I think this is OK given that by far most libc++ users use clang. |
Mostly LGTM but I'd like to see again before merging, especially after splitting up the test changes.
libcxx/benchmarks/algorithms/find.bench.cpp | ||
---|---|---|
22 | We should stop/start the timing to avoid including this bit in the benchmark, since this is calling into mt199whatever which is non trivial. You can use state.PauseTiming() and ResumeTiming() to do this. You should also check whether that introduces a bunch of noise in the timings, I've noticed that before. | |
libcxx/include/__algorithm/find.h | ||
65 | HIDE_FROM_ABI while we're at it. | |
libcxx/include/__string/constexpr_c_functions.h | ||
82–84 | I would suggest this instead, otherwise it looks like the memcpy is trying to tell us something but it's really just a more complicated way to do what's below: return static_cast<_Tp*>(__builtin_memchr(__str, *reinterpret_cast<char*>(&__value), __count)); | |
libcxx/test/std/algorithms/alg.nonmodifying/alg.find/ranges.find.pass.cpp | ||
198 ↗ | (On Diff #522193) | Could you please split these removals in a different review so we can talk about them separately? |
libcxx/benchmarks/algorithms/find.bench.cpp | ||
---|---|---|
22 | When adding them the times are in the hundreds of nanoseconds for single elements. Also, the documentation says: // NOTE: PauseTiming()/ResumeTiming() are relatively // heavyweight, and so their use should generally be avoided // within each benchmark iteration, if possible. |
I think this might have broken one of the sanitizer buildbots:
https://lab.llvm.org/buildbot/#/builders/37/builds/22340
+ diff -u /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/lib/sanitizer_common/symbolizer/scripts/global_symbols.txt undefined.new +wmemchr U + echo 'Failed: unexpected symbols'
(it might just need to have that line added?)
Yeah, that looks like a compiler-rt thing. I guess they check which symbols they rely on (and apparently now they rely on ẁmemchr`). Given that wmemcpy, wmemmove and wmemset are also listed there, it's probably fine to add wmemchr.
The libcxx tests might also need updating:
https://lab.llvm.org/buildbot/#/builders/74/builds/19459/steps/14/logs/stdio
*** /b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/libcxx/test/libcxx/transitive_includes/cxx26.csv --- /b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/test/libcxx/Output/transitive_includes.sh.cpp.dir/t.tmp/transitive_includes.csv *************** *** 3,8 **** --- 3,9 ---- algorithmcstdint algorithmcstring algorithmctime + algorithmcwchar algorithmexecution algorithminitializer_list algorithmiosfwd *************** *** 127,138 **** --- 128,141 ---- cstddefversion ctgmathccomplex ctgmathcmath + cwcharcstddef cwcharcwctype cwctypecctype dequecompare dequecstddef dequecstdint dequecstring + dequecwchar dequeinitializer_list dequelimits dequenew *************** *** 442,447 **** --- 445,451 ---- randomversion rangescompare rangescstddef + rangescwchar rangesinitializer_list rangesiosfwd rangesiterator *************** *** 640,645 **** --- 644,650 ---- vectorcstdint vectorcstdlib vectorcstring + vectorcwchar vectorinitializer_list vectoriosfwd vectorlimits