This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Implement std::boyer_moore{, _horspool}_searcher
ClosedPublic

Authored by philnik on Mar 6 2022, 10:54 AM.

Details

Summary

This mostly copys the <experimental/functional> stuff and updates the code to current libc++ style.

Diff Detail

Event Timeline

philnik created this revision.Mar 6 2022, 10:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 6 2022, 10:54 AM
Herald added a subscriber: mgorny. · View Herald Transcript
philnik requested review of this revision.Mar 6 2022, 10:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 6 2022, 10:54 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
Quuxplusone requested changes to this revision.Mar 6 2022, 11:50 AM

Thanks for working on this! This will be super useful — not really to working programmers ;) but in terms of checking boxes for C++17 conformance and finally getting to remove the experimental versions!

libcxx/include/__functional/boyer_moore_searcher.h
35–40

Personally I'd prefer to see the first specialization here done as the primary template, and then we just have one specialization for _BMSkipTable<..., false>. I see it's less symmetric, but it eliminates a forward declaration (and ~7 LoC).

57

Likewise throughout. "All your ctors should be explicit unless they're used for implicit conversions" (or unless the Standard API mandates it :()

87

#include <__algorithm/fill_n.h>
Also, please prefer __table_.data() here. begin+end and data+size always go together as pairs, respectively. (It's also slightly cheaper to compile data than begin because raw pointers, but that's not the reason it raised the red flag for me.)

Actually, whoa, I thought this was our <vector> dependency, but it's not! This specialization should just use a plain old C array value_type __table_[256]; and cut out all the metaprogramming. This header shouldn't need to depend on <array> at all.
You might static_assert(numeric_limits<value_type>::max() < 256) just to be on the absolutely safe side.

123

Let's use __last - __first and save some instantiations. These are random-access iterators.

125

First reaction: Why depend on shared_ptr? Second reaction: Why two dereferences (one for the shared_ptr and another for the vector)? At the very least this should use make_shared<difference_type[]>(__pattern_length_ + 1).

But it would be awesome if we could eliminate the shared_ptr too, somehow. I assume it's in there because std::search wants to copy these things but we don't want the copy to be expensive.

Could we just make std::search not copy these things? And then it wouldn't matter if the copy was expensive, and we could just use std::vector<D> instead of std::shared_ptr<D[]>. (The copy ctor still needs to exist, IIUC, but it doesn't have to be fast, if we can claim it's the user's fault if they ever call it.)

187

As above, __last - __first — this is instantiated only with RAI and reverse_iterator<RAI>, both of which are random-access.

193
206–207

Nit: Should be if (__first == __last) or if (__count == 0), I'd think. size_t can't be negative.

libcxx/include/optional
165–167 ↗(On Diff #413306)

This, and the diff in the <optional> test, looks unrelated to boyer_moore_searcher. Please do this in a separate PR (which I'll rubberstamp if CI is green).

This revision now requires changes to proceed.Mar 6 2022, 11:50 AM
philnik marked 8 inline comments as done.Apr 21 2022, 4:59 AM
philnik added inline comments.
libcxx/include/__functional/boyer_moore_searcher.h
35–40

Is that purely style or is there any technical reason to change it?

87

I think it's easier to read when using an array.

125

std::search doesn't copy this. It just gets a const ref and returns the search result. I'm not sure why you wouldn't just call the search directly, but hey. I'd argue that using std::shared_ptr<D[]> might be the better alternative, since after construction the elements will never change. And AFAIK it would be cheaper to copy and as cheap to move.

philnik updated this revision to Diff 424154.Apr 21 2022, 4:59 AM
philnik marked 2 inline comments as done.
  • Address comments
philnik updated this revision to Diff 424591.Apr 22 2022, 1:25 PM
  • generate files
ldionne accepted this revision.Apr 26 2022, 1:27 PM

LGTM once CI is green and comments are addressed. I'm fine with the code not being perfect since we're basically copying it from <experimental/>. I'm fine (and actually prefer) making changes as follow-up patches.

libcxx/test/std/utilities/optional/optional.hash/enabled_hash.pass.cpp
28

Please add a release note mentioning that we will remove the experimental version in two releases, i.e. LLVM 17. And also mark the experimental feature as deprecated. This is documented in libcxx/docs/DesignDocs/ExperimentalFeatures.rst.

30

Can we tick some boxes in the status pages?

philnik updated this revision to Diff 426221.Apr 30 2022, 6:20 AM
philnik marked an inline comment as done.
  • Try to fix CI
philnik marked an inline comment as done.Apr 30 2022, 6:25 AM
philnik added inline comments.
libcxx/test/std/utilities/optional/optional.hash/enabled_hash.pass.cpp
30

No, we can't. It's part of library fundamentals, and I think not everything has been implemented yet. Maybe we should add a status page for that, because it's really hard to find out what is already implemented.

philnik updated this revision to Diff 426226.Apr 30 2022, 6:27 AM
philnik marked an inline comment as done.

Rebase onto main

philnik updated this revision to Diff 426294.May 1 2022, 5:20 AM
  • Try to fix CI
adamdebreceni added inline comments.
libcxx/include/__functional/boyer_moore_searcher.h
209

shouldn't this be vector<difference_type>?

philnik updated this revision to Diff 428326.May 10 2022, 3:15 AM
  • Try to fix GCC
libcxx/include/__functional/boyer_moore_searcher.h
209

Why do you think it should be vector<difference_type>?

adamdebreceni added inline comments.May 11 2022, 1:24 AM
libcxx/include/__functional/boyer_moore_searcher.h
209

if value_type is signed and the range consists of e.g. the same values with length more than the max, __compute_bm_prefix will cause a signed overflow, moreover as __scratch can now contain negative values we could index out of __suffix_

ldionne requested changes to this revision.May 11 2022, 9:35 AM
ldionne added inline comments.
libcxx/include/__functional/boyer_moore_searcher.h
209

We should add a test and fix this bug in both this version and the experimental/functional version as well (even though we'll eventually remove that one).

This revision now requires changes to proceed.May 11 2022, 9:35 AM
adamdebreceni added inline comments.May 17 2022, 12:10 AM
libcxx/include/__functional/boyer_moore_searcher.h
209

if we plan on fixing the expirmental/functional we should add a +1 to the size in the _BMSkipTable array-based specialization typedef std::array<value_type, numeric_limits<unsigned_key_type>::max()> skip_map; otherwise we could index out of the array (not an issue in the new one)

philnik updated this revision to Diff 436335.Jun 13 2022, 3:54 AM
philnik marked 4 inline comments as done.
  • Add test and fix bug
ldionne accepted this revision.Jun 16 2022, 11:22 AM

LGTM but please fix the experimental version too since it's so trivial.

libcxx/include/__functional/boyer_moore_searcher.h
209

This comment above was marked as done but wasn't done AFAICT.

philnik updated this revision to Diff 437696.Jun 16 2022, 1:50 PM
  • Rebased
  • Address comments
philnik updated this revision to Diff 437851.Jun 17 2022, 4:50 AM
  • Try to fix CI
philnik updated this revision to Diff 437856.Jun 17 2022, 5:09 AM
  • Next try
philnik updated this revision to Diff 437887.Jun 17 2022, 7:03 AM
  • Try to fix CI
This revision was not accepted when it landed; it landed in state Needs Review.Jun 17 2022, 10:10 AM
This revision was automatically updated to reflect the committed changes.