This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Forward to std::{,w}memchr in std::find
ClosedPublic

Authored by philnik on Feb 20 2023, 6:57 AM.

Details

Reviewers
ldionne
Group Reviewers
Restricted Project
Commits
rG1fd08edd5808: [libc++] Forward to std::{,w}memchr in std::find

Diff Detail

Event Timeline

philnik created this revision.Feb 20 2023, 6:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 20 2023, 6:57 AM

Benchmarks:

---------------------------------------------------------------
Benchmark                                   old             new
---------------------------------------------------------------
bm_find<char>/1                         10.5 ns         10.5 ns
bm_find<char>/2                         18.3 ns         10.4 ns
bm_find<char>/3                         21.1 ns         10.4 ns
bm_find<char>/4                         22.4 ns         10.3 ns
bm_find<char>/5                         23.3 ns         10.3 ns
bm_find<char>/6                         23.7 ns         10.4 ns
bm_find<char>/7                         24.0 ns         10.3 ns
bm_find<char>/8                         24.5 ns         10.2 ns
bm_find<char>/16                        26.2 ns         10.2 ns
bm_find<char>/64                        29.9 ns         20.8 ns
bm_find<char>/512                       80.2 ns         36.0 ns
bm_find<char>/4096                       523 ns         43.0 ns
bm_find<char>/32768                     3838 ns          105 ns
bm_find<char>/262144                   30832 ns          988 ns
bm_find<char>/1048576                 122541 ns         4854 ns
bm_find<short>/1                        10.5 ns         10.4 ns
bm_find<short>/2                        18.3 ns         18.5 ns
bm_find<short>/3                        21.2 ns         21.4 ns
bm_find<short>/4                        22.6 ns         22.7 ns
bm_find<short>/5                        23.6 ns         23.6 ns
bm_find<short>/6                        24.1 ns         24.2 ns
bm_find<short>/7                        24.5 ns         24.6 ns
bm_find<short>/8                        24.6 ns         24.8 ns
bm_find<short>/16                       26.1 ns         26.6 ns
bm_find<short>/64                       33.3 ns         31.3 ns
bm_find<short>/512                       134 ns         80.4 ns
bm_find<short>/4096                      967 ns          499 ns
bm_find<short>/32768                    7670 ns         3842 ns
bm_find<short>/262144                  61106 ns        31371 ns
bm_find<short>/1048576                244510 ns       122120 ns
bm_find<int>/1                          10.4 ns         12.1 ns
bm_find<int>/2                          18.4 ns         12.0 ns
bm_find<int>/3                          21.9 ns         12.0 ns
bm_find<int>/4                          23.3 ns         11.9 ns
bm_find<int>/5                          24.2 ns         12.1 ns
bm_find<int>/6                          24.7 ns         12.0 ns
bm_find<int>/7                          24.8 ns         11.9 ns
bm_find<int>/8                          25.2 ns         11.9 ns
bm_find<int>/16                         26.2 ns         21.9 ns
bm_find<int>/64                         30.5 ns         33.8 ns
bm_find<int>/512                        80.4 ns         40.4 ns
bm_find<int>/4096                        497 ns         71.7 ns
bm_find<int>/32768                      3844 ns          498 ns
bm_find<int>/262144                    30722 ns         4901 ns
bm_find<int>/1048576                  123303 ns        22864 ns
bm_ranges_find<char>/1                  10.4 ns         10.4 ns
bm_ranges_find<char>/2                  18.3 ns         10.4 ns
bm_ranges_find<char>/3                  21.1 ns         10.4 ns
bm_ranges_find<char>/4                  22.3 ns         10.3 ns
bm_ranges_find<char>/5                  23.2 ns         10.3 ns
bm_ranges_find<char>/6                  23.6 ns         10.3 ns
bm_ranges_find<char>/7                  24.0 ns         10.4 ns
bm_ranges_find<char>/8                  24.4 ns         10.5 ns
bm_ranges_find<char>/16                 26.0 ns         10.3 ns
bm_ranges_find<char>/64                 30.6 ns         20.9 ns
bm_ranges_find<char>/512                80.2 ns         36.2 ns
bm_ranges_find<char>/4096                496 ns         43.1 ns
bm_ranges_find<char>/32768              3844 ns          106 ns
bm_ranges_find<char>/262144            30632 ns          994 ns
bm_ranges_find<char>/1048576          123127 ns         5099 ns
bm_ranges_find<short>/1                 10.5 ns         10.5 ns
bm_ranges_find<short>/2                 18.4 ns         18.3 ns
bm_ranges_find<short>/3                 21.2 ns         21.2 ns
bm_ranges_find<short>/4                 22.7 ns         22.5 ns
bm_ranges_find<short>/5                 23.6 ns         23.4 ns
bm_ranges_find<short>/6                 24.1 ns         23.8 ns
bm_ranges_find<short>/7                 24.4 ns         24.1 ns
bm_ranges_find<short>/8                 24.7 ns         24.5 ns
bm_ranges_find<short>/16                26.2 ns         26.1 ns
bm_ranges_find<short>/64                33.2 ns         30.4 ns
bm_ranges_find<short>/512                134 ns         80.4 ns
bm_ranges_find<short>/4096               969 ns          506 ns
bm_ranges_find<short>/32768             7648 ns         3853 ns
bm_ranges_find<short>/262144           61198 ns        30507 ns
bm_ranges_find<short>/1048576         248057 ns       122080 ns
bm_ranges_find<int>/1                   10.5 ns         11.8 ns
bm_ranges_find<int>/2                   18.5 ns         11.9 ns
bm_ranges_find<int>/3                   22.0 ns         11.9 ns
bm_ranges_find<int>/4                   23.4 ns         11.8 ns
bm_ranges_find<int>/5                   24.3 ns         11.8 ns
bm_ranges_find<int>/6                   24.7 ns         12.0 ns
bm_ranges_find<int>/7                   24.8 ns         11.9 ns
bm_ranges_find<int>/8                   25.1 ns         11.8 ns
bm_ranges_find<int>/16                  26.2 ns         21.9 ns
bm_ranges_find<int>/64                  30.7 ns         33.8 ns
bm_ranges_find<int>/512                 80.4 ns         39.7 ns
bm_ranges_find<int>/4096                 498 ns         72.6 ns
bm_ranges_find<int>/32768               3842 ns          506 ns
bm_ranges_find<int>/262144             30566 ns         5311 ns
bm_ranges_find<int>/1048576           123103 ns        22743 ns
philnik updated this revision to Diff 505808.Mar 16 2023, 8:01 AM

Fix formatting

ldionne published this revision for review.Mar 16 2023, 9:27 AM
ldionne added a subscriber: ldionne.
ldionne added inline comments.
libcxx/include/__algorithm/ranges_find.h
42–48

Maybe this conveys the intent a bit better?

libcxx/include/__string/constexpr_c_functions.h
68
73–87

I would suggest reorganizing like this, to make it more obvious what the intent of using __builtin_char_memchr is.

if (__libcpp_is_constant_evaluated()) {
  // use __builtin_char_memchr to optimize constexpr evaluation if we can
#if __has_builtin(__builtin_char_memchr) 
  if (is_same<_Tp, char>::value)
    return __builtin_char_memchr(__str, __char, __count);
#endif

  for (; __count; --__count) {
    if (*__str == __char)
      return __str;
    ++__str;
  }
  return nullptr;
} else {
  return static_cast<_Tp*>(__builtin_memchr(__str, __char, __count));
}
libcxx/include/cwchar
231

static_assert("diagnostic")

You also seem to be missing an important requirement on _Up? You need _Up to be trivially copyable to wchar_t or else your std::wmemcpy may not be valid? I would actually just replace _Up by wchar_t since we don't need the additional generalization, at least for now.

232

You probably want to remove_cv_t<_Tp> here to also catch the case where _Tp* == wchar_t const*.

233–234

Now you don't need this anymore if you take wchar_t __char as input.

235

Does reintepret_cast<T>(object-of-type-T) work inside constexpr? If not then this doesn't compile when constant-evaluated, and it means your test coverage is also insufficient in that area. Either way please make sure that is tested.

Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2023, 9:27 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
ldionne requested changes to this revision.Mar 16 2023, 9:27 AM
This revision now requires changes to proceed.Mar 16 2023, 9:27 AM
philnik updated this revision to Diff 506290.Mar 18 2023, 5:48 AM
philnik marked 7 inline comments as done.

Address comments

philnik updated this revision to Diff 506371.Mar 19 2023, 3:00 AM

Try to fix CI

Thanks for working on this, the benchmarks look very nice!

Mordante added inline comments.Mar 19 2023, 11:04 AM
libcxx/benchmarks/algorithms/find.bench.cpp
10

I really would like the benchmark results you posted in the commit message.

libcxx/include/__algorithm/find.h
54

Why not test is_same_v<_Tp, wchar_t>?

58

Does this function work with char16_t or char32_t (depending on sizeof(wchar_t).)

libcxx/include/__string/constexpr_c_functions.h
81–91

Note that the current way it's written generates unneeded code

libcxx/include/cwchar
240

This seems duplicated code from the char version, can we combine it in a templated function.
The same issue with not using else in if constexpr applies here too,

philnik marked 2 inline comments as done.Mar 22 2023, 1:45 PM
philnik added inline comments.
libcxx/include/__algorithm/find.h
54

Because that would disable the optimization for other types that have the same size as wchar_t. For these types this optimization is probably UB, but that shouldn't be a problem in practice, since it's on an ABI boundary and no compiler currently optimizes wmemchr specifically (and IMO no compiler should ever optimize based on this, since there is no potential gain I can see).

58

See my comment above.

libcxx/include/__string/constexpr_c_functions.h
81–91

This looks quite surprising to me, and I'm not sure it's worth the little bit of time the compiler spends on this.

libcxx/include/cwchar
240

I don't think it's worth it. The loop is really simple, and deduplicating this seems like a lot of extra complexity compared to the benefit.

ldionne added inline comments.Mar 23 2023, 10:20 AM
libcxx/include/cwchar
232

Should we also assert the we have the same alignment as wchar_t here?

248–249

For the non-constexpr case, I would do this to make sure that we really never exploit any UB, even if it may seem to be benign right now:

if constexpr (std::is_trivial_v<_Tp>) { // we know it's safe to reinterpret cast (at least in C++20 with implicit lifetime types)
    return reinterpret_cast<_Tp*>(std::wmemchr(reinterpret_cast<__apply_cv<_Tp, wchar_t>::type*>(__str), __char, __count));
} else {
    // loop
}

Concretely, this won't change anything since we already only call this function with integral types, however if we were to e.g. specialize __is_trivially_equality_comparable for a non-trivial type, then my suggested code would work whereas the original code might be (benign) UB. WDYT?

libcxx/test/std/algorithms/alg.nonmodifying/alg.find/find.pass.cpp
10

We should also update the tests for ranges::find.

38–86

We should also have a test that checks that if multiple elements match, we return the first one.

53

Let's also add a test for an empty range.

57

This shouldn't do the same compare that would be generated anyway, otherwise we are not testing properly that we don't use memchr.

59

We also need to test with types that have different sizes so we trigger the memchr and the wmemchr code paths. Right now we don't trigger both with non-integral types.

Mordante added inline comments.Mar 24 2023, 10:23 AM
libcxx/include/__algorithm/find.h
54

The function you call has the following signature

inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 const wchar_t*
__constexpr_wmemchr(const wchar_t* __str, wchar_t __char, size_t __count)

So how can this work when _Tp is not a wchar_t? That requires a reinterpret_cast which is not allowed during constant evaluation. What am I missing?

libcxx/include/__string/constexpr_c_functions.h
81–91

This is something we have used before. Now it will generate dead code.

libcxx/test/std/algorithms/alg.nonmodifying/alg.find/find.pass.cpp
78

I really would like to see tests for all types that will use this optimization. Especially to make sure the code is valid during constant evaluation.

philnik marked 2 inline comments as done.Mar 24 2023, 10:40 AM
philnik added inline comments.
libcxx/include/__algorithm/find.h
54

Your signature is wrong, __constexpr_wmemchr is templated.

libcxx/include/__string/constexpr_c_functions.h
81–91

clang doesn't even emit the condition from the fronted AFAICT, let alone the backend: https://godbolt.org/z/87bvzbhbT. This is at best a minimal compile time speedup.

libcxx/test/std/algorithms/alg.nonmodifying/alg.find/find.pass.cpp
78

We currently only have integral types and pointers marked trivially equality comparable, and I don't think it's worth testing for pointers specifically, since they are for all intents and purposes integers with a funny syntax in this case.

philnik updated this revision to Diff 521079.May 10 2023, 1:08 PM
philnik marked 7 inline comments as done.

Address comments

philnik retitled this revision from [libc++] Forward to std::{,w}memcmp in std::find to [libc++] Forward to std::{,w}memchr in std::find.May 10 2023, 2:23 PM
philnik added inline comments.May 10 2023, 4:15 PM
libcxx/include/cwchar
248–249

I went for the "only optimize if we have __builtin_wmemchr" option instead. IMO we should try to apply this optimization as broadly as possible, and we can at least heavily influence how clang optimizes __builtin_wmemchr. This means that GCC won't optimize this as well, but I think this is OK given that by far most libc++ users use clang.

philnik updated this revision to Diff 521150.May 10 2023, 4:25 PM

Try to fix CI

philnik updated this revision to Diff 521352.May 11 2023, 9:43 AM

Try to fix CI

philnik updated this revision to Diff 521393.May 11 2023, 11:45 AM

Try to fix CI

philnik updated this revision to Diff 521442.May 11 2023, 2:06 PM

Fix formatting

philnik updated this revision to Diff 521496.May 11 2023, 4:48 PM

Try to fix CI

philnik updated this revision to Diff 521693.May 12 2023, 9:48 AM

Try to fix CI

philnik updated this revision to Diff 521774.May 12 2023, 1:18 PM

Try to fix CI

philnik updated this revision to Diff 522193.May 15 2023, 7:58 AM

Try to fix CI

Mostly LGTM but I'd like to see again before merging, especially after splitting up the test changes.

libcxx/benchmarks/algorithms/find.bench.cpp
22

We should stop/start the timing to avoid including this bit in the benchmark, since this is calling into mt199whatever which is non trivial.

You can use state.PauseTiming() and ResumeTiming() to do this. You should also check whether that introduces a bunch of noise in the timings, I've noticed that before.

libcxx/include/__algorithm/find.h
65

HIDE_FROM_ABI while we're at it.

libcxx/include/__string/constexpr_c_functions.h
85–87

I would suggest this instead, otherwise it looks like the memcpy is trying to tell us something but it's really just a more complicated way to do what's below:

return static_cast<_Tp*>(__builtin_memchr(__str, *reinterpret_cast<char*>(&__value), __count));
libcxx/test/std/algorithms/alg.nonmodifying/alg.find/ranges.find.pass.cpp
198 ↗(On Diff #522193)

Could you please split these removals in a different review so we can talk about them separately?

philnik updated this revision to Diff 522244.May 15 2023, 9:59 AM
philnik marked 4 inline comments as done.

Address comments

philnik updated this revision to Diff 522273.May 15 2023, 11:14 AM

Rebased on top of D150588

philnik added inline comments.May 16 2023, 9:59 AM
libcxx/benchmarks/algorithms/find.bench.cpp
22

When adding them the times are in the hundreds of nanoseconds for single elements. Also, the documentation says:

// NOTE: PauseTiming()/ResumeTiming() are relatively
// heavyweight, and so their use should generally be avoided
// within each benchmark iteration, if possible.
ldionne accepted this revision.May 17 2023, 8:08 AM

LGTM w/ green CI!

This revision is now accepted and ready to land.May 17 2023, 8:08 AM
philnik updated this revision to Diff 523094.May 17 2023, 10:09 AM

Try to fix CI

philnik updated this revision to Diff 524866.May 23 2023, 1:40 PM

Try to fix CI

philnik updated this revision to Diff 525203.May 24 2023, 8:54 AM

Try to fix CI

philnik updated this revision to Diff 525342.May 24 2023, 2:43 PM

Try to fix CI

This revision was automatically updated to reflect the committed changes.

I think this might have broken one of the sanitizer buildbots:

https://lab.llvm.org/buildbot/#/builders/37/builds/22340

+ diff -u /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/lib/sanitizer_common/symbolizer/scripts/global_symbols.txt undefined.new
+wmemchr U
+ echo 'Failed: unexpected symbols'

(it might just need to have that line added?)

I think this might have broken one of the sanitizer buildbots:

https://lab.llvm.org/buildbot/#/builders/37/builds/22340

+ diff -u /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/lib/sanitizer_common/symbolizer/scripts/global_symbols.txt undefined.new
+wmemchr U
+ echo 'Failed: unexpected symbols'

(it might just need to have that line added?)

Yeah, that looks like a compiler-rt thing. I guess they check which symbols they rely on (and apparently now they rely on ẁmemchr`). Given that wmemcpy, wmemmove and wmemset are also listed there, it's probably fine to add wmemchr.

I think this might have broken one of the sanitizer buildbots:

https://lab.llvm.org/buildbot/#/builders/37/builds/22340

+ diff -u /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/lib/sanitizer_common/symbolizer/scripts/global_symbols.txt undefined.new
+wmemchr U
+ echo 'Failed: unexpected symbols'

(it might just need to have that line added?)

Yeah, that looks like a compiler-rt thing. I guess they check which symbols they rely on (and apparently now they rely on ẁmemchr`). Given that wmemcpy, wmemmove and wmemset are also listed there, it's probably fine to add wmemchr.

Thanks Phil! I've added it in D151484

The libcxx tests might also need updating:

https://lab.llvm.org/buildbot/#/builders/74/builds/19459/steps/14/logs/stdio

*** /b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/libcxx/test/libcxx/transitive_includes/cxx26.csv
--- /b/sanitizer-x86_64-linux-bootstrap-msan/build/libcxx_build_msan/test/libcxx/Output/transitive_includes.sh.cpp.dir/t.tmp/transitive_includes.csv
***************
*** 3,8 ****
--- 3,9 ----
  algorithmcstdint
  algorithmcstring
  algorithmctime
+ algorithmcwchar
  algorithmexecution
  algorithminitializer_list
  algorithmiosfwd
***************
*** 127,138 ****
--- 128,141 ----
  cstddefversion
  ctgmathccomplex
  ctgmathcmath
+ cwcharcstddef
  cwcharcwctype
  cwctypecctype
  dequecompare
  dequecstddef
  dequecstdint
  dequecstring
+ dequecwchar
  dequeinitializer_list
  dequelimits
  dequenew
***************
*** 442,447 ****
--- 445,451 ----
  randomversion
  rangescompare
  rangescstddef
+ rangescwchar
  rangesinitializer_list
  rangesiosfwd
  rangesiterator
***************
*** 640,645 ****
--- 644,650 ----
  vectorcstdint
  vectorcstdlib
  vectorcstring
+ vectorcwchar
  vectorinitializer_list
  vectoriosfwd
  vectorlimits

Hi @philnik,
yes, correct. Thank you.
There is currently another story there.