Page MenuHomePhabricator

[libc++] Forward to std::{,w}memchr in std::find
ClosedPublic

Authored by philnik on Feb 20 2023, 6:57 AM.

Details

Reviewers
ldionne
Group Reviewers
Restricted Project
Commits
rG1fd08edd5808: [libc++] Forward to std::{,w}memchr in std::find

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Thanks for working on this, the benchmarks look very nice!

libcxx/benchmarks/algorithms/find.bench.cpp
10

I really would like the benchmark results you posted in the commit message.

libcxx/include/__algorithm/find.h
54

Why not test is_same_v<_Tp, wchar_t>?

58

Does this function work with char16_t or char32_t (depending on sizeof(wchar_t).)

libcxx/include/__string/constexpr_c_functions.h
78–88

Note that the current way it's written generates unneeded code

libcxx/include/cwchar
242

This seems duplicated code from the char version, can we combine it in a templated function.
The same issue with not using else in if constexpr applies here too,

philnik marked 2 inline comments as done.Mar 22 2023, 1:45 PM
philnik added inline comments.
libcxx/include/__algorithm/find.h
54

Because that would disable the optimization for other types that have the same size as wchar_t. For these types this optimization is probably UB, but that shouldn't be a problem in practice, since it's on an ABI boundary and no compiler currently optimizes wmemchr specifically (and IMO no compiler should ever optimize based on this, since there is no potential gain I can see).

58

See my comment above.

libcxx/include/__string/constexpr_c_functions.h
78–88

This looks quite surprising to me, and I'm not sure it's worth the little bit of time the compiler spends on this.

libcxx/include/cwchar
242

I don't think it's worth it. The loop is really simple, and deduplicating this seems like a lot of extra complexity compared to the benefit.

ldionne added inline comments.Mar 23 2023, 10:20 AM
libcxx/include/cwchar
230

Should we also assert the we have the same alignment as wchar_t here?

250–251

For the non-constexpr case, I would do this to make sure that we really never exploit any UB, even if it may seem to be benign right now:

if constexpr (std::is_trivial_v<_Tp>) { // we know it's safe to reinterpret cast (at least in C++20 with implicit lifetime types)
    return reinterpret_cast<_Tp*>(std::wmemchr(reinterpret_cast<__apply_cv<_Tp, wchar_t>::type*>(__str), __char, __count));
} else {
    // loop
}

Concretely, this won't change anything since we already only call this function with integral types, however if we were to e.g. specialize __is_trivially_equality_comparable for a non-trivial type, then my suggested code would work whereas the original code might be (benign) UB. WDYT?

libcxx/test/std/algorithms/alg.nonmodifying/alg.find/find.pass.cpp
10

We should also update the tests for ranges::find.

38–86

We should also have a test that checks that if multiple elements match, we return the first one.

53

Let's also add a test for an empty range.

57

This shouldn't do the same compare that would be generated anyway, otherwise we are not testing properly that we don't use memchr.

59

We also need to test with types that have different sizes so we trigger the memchr and the wmemchr code paths. Right now we don't trigger both with non-integral types.

Mordante added inline comments.Mar 24 2023, 10:23 AM
libcxx/include/__algorithm/find.h
54

The function you call has the following signature

inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 const wchar_t*
__constexpr_wmemchr(const wchar_t* __str, wchar_t __char, size_t __count)

So how can this work when _Tp is not a wchar_t? That requires a reinterpret_cast which is not allowed during constant evaluation. What am I missing?

libcxx/include/__string/constexpr_c_functions.h
78–88

This is something we have used before. Now it will generate dead code.

libcxx/test/std/algorithms/alg.nonmodifying/alg.find/find.pass.cpp
78

I really would like to see tests for all types that will use this optimization. Especially to make sure the code is valid during constant evaluation.

philnik marked 2 inline comments as done.Mar 24 2023, 10:40 AM
philnik added inline comments.
libcxx/include/__algorithm/find.h
54

Your signature is wrong, __constexpr_wmemchr is templated.

libcxx/include/__string/constexpr_c_functions.h
78–88

clang doesn't even emit the condition from the fronted AFAICT, let alone the backend: https://godbolt.org/z/87bvzbhbT. This is at best a minimal compile time speedup.

libcxx/test/std/algorithms/alg.nonmodifying/alg.find/find.pass.cpp
78

We currently only have integral types and pointers marked trivially equality comparable, and I don't think it's worth testing for pointers specifically, since they are for all intents and purposes integers with a funny syntax in this case.

philnik updated this revision to Diff 521079.Wed, May 10, 1:08 PM
philnik marked 7 inline comments as done.

Address comments

philnik retitled this revision from [libc++] Forward to std::{,w}memcmp in std::find to [libc++] Forward to std::{,w}memchr in std::find.Wed, May 10, 2:23 PM
philnik added inline comments.Wed, May 10, 4:15 PM
libcxx/include/cwchar
250–251

I went for the "only optimize if we have __builtin_wmemchr" option instead. IMO we should try to apply this optimization as broadly as possible, and we can at least heavily influence how clang optimizes __builtin_wmemchr. This means that GCC won't optimize this as well, but I think this is OK given that by far most libc++ users use clang.

philnik updated this revision to Diff 521150.Wed, May 10, 4:25 PM

Try to fix CI

philnik updated this revision to Diff 521352.Thu, May 11, 9:43 AM

Try to fix CI

philnik updated this revision to Diff 521393.Thu, May 11, 11:45 AM

Try to fix CI

philnik updated this revision to Diff 521442.Thu, May 11, 2:06 PM

Fix formatting

philnik updated this revision to Diff 521496.Thu, May 11, 4:48 PM

Try to fix CI

philnik updated this revision to Diff 521693.Fri, May 12, 9:48 AM

Try to fix CI

philnik updated this revision to Diff 521774.Fri, May 12, 1:18 PM

Try to fix CI

philnik updated this revision to Diff 522193.Mon, May 15, 7:58 AM

Try to fix CI

Mostly LGTM but I'd like to see again before merging, especially after splitting up the test changes.

libcxx/benchmarks/algorithms/find.bench.cpp
22

We should stop/start the timing to avoid including this bit in the benchmark, since this is calling into mt199whatever which is non trivial.

You can use state.PauseTiming() and ResumeTiming() to do this. You should also check whether that introduces a bunch of noise in the timings, I've noticed that before.

libcxx/include/__algorithm/find.h
65

HIDE_FROM_ABI while we're at it.

libcxx/include/__string/constexpr_c_functions.h
82–84

I would suggest this instead, otherwise it looks like the memcpy is trying to tell us something but it's really just a more complicated way to do what's below:

return static_cast<_Tp*>(__builtin_memchr(__str, *reinterpret_cast<char*>(&__value), __count));
libcxx/test/std/algorithms/alg.nonmodifying/alg.find/ranges.find.pass.cpp
198 ↗(On Diff #522193)

Could you please split these removals in a different review so we can talk about them separately?

philnik updated this revision to Diff 522244.Mon, May 15, 9:59 AM
philnik marked 4 inline comments as done.

Address comments

philnik updated this revision to Diff 522273.Mon, May 15, 11:14 AM

Rebased on top of D150588

philnik added inline comments.Tue, May 16, 9:59 AM
libcxx/benchmarks/algorithms/find.bench.cpp
22

When adding them the times are in the hundreds of nanoseconds for single elements. Also, the documentation says:

// NOTE: PauseTiming()/ResumeTiming() are relatively
// heavyweight, and so their use should generally be avoided
// within each benchmark iteration, if possible.
ldionne accepted this revision.Wed, May 17, 8:08 AM

LGTM w/ green CI!

This revision is now accepted and ready to land.Wed, May 17, 8:08 AM
philnik updated this revision to Diff 523094.Wed, May 17, 10:09 AM

Try to fix CI

philnik updated this revision to Diff 524866.Tue, May 23, 1:40 PM

Try to fix CI

philnik updated this revision to Diff 525203.Wed, May 24, 8:54 AM

Try to fix CI

philnik updated this revision to Diff 525342.Wed, May 24, 2:43 PM

Try to fix CI

This revision was automatically updated to reflect the committed changes.

I think this might have broken one of the sanitizer buildbots:

https://lab.llvm.org/buildbot/#/builders/37/builds/22340

+ diff -u /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/lib/sanitizer_common/symbolizer/scripts/global_symbols.txt undefined.new
+wmemchr U
+ echo 'Failed: unexpected symbols'

(it might just need to have that line added?)

I think this might have broken one of the sanitizer buildbots:

https://lab.llvm.org/buildbot/#/builders/37/builds/22340

+ diff -u /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/lib/sanitizer_common/symbolizer/scripts/global_symbols.txt undefined.new
+wmemchr U
+ echo 'Failed: unexpected symbols'

(it might just need to have that line added?)

Yeah, that looks like a compiler-rt thing. I guess they check which symbols they rely on (and apparently now they rely on ẁmemchr`). Given that wmemcpy, wmemmove and wmemset are also listed there, it's probably fine to add wmemchr.

I think this might have broken one of the sanitizer buildbots:

https://lab.llvm.org/buildbot/#/builders/37/builds/22340

+ diff -u /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/lib/sanitizer_common/symbolizer/scripts/global_symbols.txt undefined.new
+wmemchr U
+ echo 'Failed: unexpected symbols'

(it might just need to have that line added?)

Yeah, that looks like a compiler-rt thing. I guess they check which symbols they rely on (and apparently now they rely on ẁmemchr`). Given that wmemcpy, wmemmove and wmemset are also listed there, it's probably fine to add wmemchr.

Thanks Phil! I've added it in D151484

The libcxx tests might also need updating:

https://lab.llvm.org/buildbot/#/builders/74/builds/19459/steps/14/logs/stdio

*** /b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/libcxx/test/libcxx/transitive_includes/cxx26.csv
--- /b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/test/libcxx/Output/transitive_includes.sh.cpp.dir/t.tmp/transitive_includes.csv
***************
*** 3,8 ****
--- 3,9 ----
  algorithmcstdint
  algorithmcstring
  algorithmctime
+ algorithmcwchar
  algorithmexecution
  algorithminitializer_list
  algorithmiosfwd
***************
*** 127,138 ****
--- 128,141 ----
  cstddefversion
  ctgmathccomplex
  ctgmathcmath
+ cwcharcstddef
  cwcharcwctype
  cwctypecctype
  dequecompare
  dequecstddef
  dequecstdint
  dequecstring
+ dequecwchar
  dequeinitializer_list
  dequelimits
  dequenew
***************
*** 442,447 ****
--- 445,451 ----
  randomversion
  rangescompare
  rangescstddef
+ rangescwchar
  rangesinitializer_list
  rangesiosfwd
  rangesiterator
***************
*** 640,645 ****
--- 644,650 ----
  vectorcstdint
  vectorcstdlib
  vectorcstring
+ vectorcwchar
  vectorinitializer_list
  vectoriosfwd
  vectorlimits

Hi @philnik,
yes, correct. Thank you.
There is currently another story there.