This is an archive of the discontinued LLVM Phabricator instance.

[Support] Use find() for faster StringRef::count.
ClosedPublic

Authored by ishitatsuyuki on Sep 11 2022, 12:19 AM.

Details

Summary

While profiling InclusionRewriter, it was found that counting lines was
so slow that it took up 20% of the processing time. Surely, calling
memcmp() of size 1 on every substring in the window isn't a good idea.

Use StringRef::find() instead; in the case of N=1 it will forward to
memcmp which is much more optimal. For 2<=N<256 it will run the same
memcmp loop as we have now, which is still suboptimal but at least does
not regress anything.

Diff Detail

Event Timeline

ishitatsuyuki created this revision.Sep 11 2022, 12:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 11 2022, 12:19 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
ishitatsuyuki requested review of this revision.Sep 11 2022, 12:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 11 2022, 12:19 AM
ishitatsuyuki retitled this revision from [Support] Use find() for faster StringRef::count. to [Support] Optimize StringRef::count,find..Sep 11 2022, 12:21 AM
ishitatsuyuki edited the summary of this revision. (Show Details)
ishitatsuyuki added a reviewer: chandlerc.

A note that the N=2 search code is mostly mirrored from glibc; I believe it's short enough to not cause copyright issues but please let me know if I should be more defensive here.

https://codebrowser.dev/glibc/glibc/string/memmem.c.html#70

nikic added a subscriber: nikic.Sep 11 2022, 12:44 AM

Please split this into two patches, one for count(), and one for find().

Split revision; this becomes the first revision (StringRef::count).

ishitatsuyuki retitled this revision from [Support] Optimize StringRef::count,find. to [Support] Use find() for faster StringRef::count..Sep 11 2022, 1:00 AM
ishitatsuyuki edited the summary of this revision. (Show Details)

The split is done; the second revision is D133660.

nikic accepted this revision.Sep 11 2022, 2:17 AM

LGTM

This revision is now accepted and ready to land.Sep 11 2022, 2:17 AM

Looks like this still hasn't been committed. Can anyone help me commit this (as I don't have commiter rights)?

nikic requested changes to this revision.Oct 26 2022, 2:45 AM
nikic added inline comments.
llvm/lib/Support/StringRef.cpp
386

The !N condition needs to be retained, otherwise we will go into an infinite loop for an empty Str. I believe this causes the ADTTest timeout.

(Returning 0 for an empty string isn't really right, this should return Length + 1, but this is orthogonal to your change.)

This revision now requires changes to proceed.Oct 26 2022, 2:45 AM

Restore check for zero N

ishitatsuyuki added inline comments.Oct 26 2022, 4:11 AM
llvm/lib/Support/StringRef.cpp
386

Looks like there's a test with hardcoded expect result of 0. Should I change the test too or retain the old behavior?

nikic added inline comments.Oct 26 2022, 5:58 AM
llvm/lib/Support/StringRef.cpp
386

I'd suggest keeping the old behavior in this patch (maybe with a TODO) to keep it NFC. We should fix this separately.

Restore behavior when Str == "" and add TODO

nikic accepted this revision.Oct 27 2022, 1:12 AM

LGTM

This revision is now accepted and ready to land.Oct 27 2022, 1:12 AM
This revision was landed with ongoing or failed builds.Oct 27 2022, 1:22 AM
This revision was automatically updated to reflect the committed changes.