This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fold memchr of sequences of same characters
ClosedPublic

Authored by msebor on May 26 2022, 7:04 PM.

Details

Summary

This change enhances the memchr library function handler to fold calls with one or two consecutive sequences of the same character, analogously to the corresponding feature recently added to the memrchr handler (D123631). Rather than just one sequence as for memrchr, this one handles also two sequences in order to also benefit strchr.

Tested on x86_64-linux by building Clang and running make check-all.

Diff Detail

Event Timeline

msebor created this revision.May 26 2022, 7:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 26 2022, 7:04 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
msebor requested review of this revision.May 26 2022, 7:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 26 2022, 7:04 PM
msebor edited the summary of this revision. (Show Details)May 27 2022, 8:18 AM
msebor added reviewers: bkramer, xbolva00, nikic.
nikic added inline comments.May 30 2022, 3:49 AM
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
302

Unrelated change (in strchr rather than memchr)?

1055

As far as I can tell EndOff doesn't actually get used later, it's only used for the substr(), so I'm not sure why this code is changing rather than only being moved.

I think in https://reviews.llvm.org/D123472 the wrong version of the patch ended up being committed, I remember there being a static substr() function for this purpose, but it doesn't seem to be present on main.

1066

I don't think this is correct for the case where N is variable. Let's say we have S = "12", C = 2 and N = 1 (known only at runtime). The correct result for this is null. I believe your code would return S + 1.

There doesn't seem to be test coverage for this case (variable N with two runs -- there's only one with three runs).

llvm/test/Transforms/InstCombine/memchr-6.ll
19

Why does this one not fold?

53

Unrelated: There's a canonicalization opportunity here to convert C1 ? X : (C2 ? X : Y) to (C1 ? true : C2) ? X : Y` (or vice versa, but I'd consider logical or more canonical here).

msebor updated this revision to Diff 433202.May 31 2022, 3:10 PM
msebor marked an inline comment as done.
  • Add a substr helper function and call it.
  • Remove a spurious change to LibCallSimplifier::optimizeStrChr.
  • Add a missing conditional and a test case for it.
msebor added inline comments.May 31 2022, 3:10 PM
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
1055

Yes, I must have committed a prior version of the patch.

1066

Good catch! I've added the missing conditional and a test case for it.

(For what it's worth, plain C runtime tests would help spot these mistakes sooner.)

llvm/test/Transforms/InstCombine/memchr-6.ll
19

Because a00000 has no initializer getConstantDataArrayInfo returns a null Slice.Array that getConstantStringInfo then fails for. I've added a comment to the test.

I thought I'd handled it D125114 but apparently it needs another enhancement. I can take care of it in a followup.

53

I've reworked the conditional a bit and I'm not sure I see this opportunity there anymore but let me know if I missed it.

nikic accepted this revision.Jun 1 2022, 1:21 AM

LGTM

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
1055

I'd suggest landing the substr() function as a separate NFC patch, as it's not directly related to this one (and we'd want to replace the other usages at the same time as well).

llvm/test/Transforms/InstCombine/memchr-6.ll
19

Hrm, that would probably also require an API change, because we don't have any already allocated string we could create a StringRef to.

This revision is now accepted and ready to land.Jun 1 2022, 1:21 AM
This revision was landed with ongoing or failed builds.Jun 7 2022, 12:28 PM
This revision was automatically updated to reflect the committed changes.