This is an archive of the discontinued LLVM Phabricator instance.

[libc] Add two-way strstr algorithm.
Needs RevisionPublic

Authored by cgyurgyik on Aug 18 2020, 12:00 PM.

Details

Summary

[libc] This adds the two-way strstr algorithm, and uses it when the length of the haystack is above a certain threshold, for which I picked at random (currently). I look forward to your reviews.

Rationale: The two-way matching algorithm allows for string matching in linear time and constant space complexity. However, it (1) adds a lot of complexity to the code, and (2) Is probably worse when the haystack and needle are small. When this "change-over" occurs for when Two Way >> Brute Force, I am currently unsure. I could write a few benchmarks and get a rough estimate, but looking for input first on the best way to approach this.

As far as testing, both the brute force and the two-way algorithms pass the unit tests and the fuzz test for strstr.

Diff Detail

Event Timeline

cgyurgyik created this revision.Aug 18 2020, 12:00 PM
cgyurgyik requested review of this revision.Aug 18 2020, 12:00 PM
cgyurgyik added a comment.EditedAug 19 2020, 10:55 AM

QuickBenchmark

I've provided some quick benchmarks, which takes a string of length N and uses both the two-way algorithm and the brute force algorithm on a relatively short needle. This is run 4 times with the following placements:

  1. Needle in the beginning.
  2. Needle in the middle.
  3. Needle in the end.
  4. Needle not in the string.

What this does NOT account for: long needles, patterns that might make the brute force or two-way algorithm worse.
Without profiling, this obviously doesn't mean much, but gives us some idea of where the change-over may be.

Edit: Here is another example benchmark with a much longer haystack (and still a relatively small needle): https://quick-bench.com/q/CLFSyNlLlPkkyuPTSu7TD6sgiuI

cgyurgyik updated this revision to Diff 287274.Aug 23 2020, 2:29 PM

Use operator< for internal::Max.

cgyurgyik updated this revision to Diff 287276.Aug 23 2020, 2:34 PM

As far as testing, both the brute force and the two-way algorithms pass the unit tests and the fuzz test for strstr.

In case you have some supplementary materials that you may want to share with reviewers, there is a Files application of Phabricator (usually, you can just drag files over the editor of web UI).

As far as testing, both the brute force and the two-way algorithms pass the unit tests and the fuzz test for strstr.

In case you have some supplementary materials that you may want to share with reviewers, there is a Files application of Phabricator (usually, you can just drag files over the editor of web UI).

Good to know, thank you.

abrachet requested changes to this revision.Jul 6 2022, 11:55 AM

Is this still actively being worked on? I'm requesting changes to get this out of my queue

This revision now requires changes to proceed.Jul 6 2022, 11:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 6 2022, 11:55 AM