This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fold memrchr calls with sequences of identical bytes.
ClosedPublic

Authored by msebor on Apr 12 2022, 1:41 PM.

Details

Summary

This change implements folding of memrchr calls with sequences of identical characters.

Depends on D123629.

Diff Detail

Event Timeline

msebor created this revision.Apr 12 2022, 1:41 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 12 2022, 1:41 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
msebor requested review of this revision.Apr 12 2022, 1:41 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 12 2022, 1:41 PM
msebor edited the summary of this revision. (Show Details)Apr 12 2022, 1:51 PM
msebor updated this revision to Diff 422360.Apr 12 2022, 4:09 PM

Add context.

nikic added a comment.Apr 14 2022, 5:55 AM

Just wondering if there's any specific motivation for this pattern, as it seems oddly specific. Is this something that GCC implements?

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
984

Why do we need the N <= sizeof S comparison here? As that would be UB, I think we can omit it here.

msebor added inline comments.Apr 14 2022, 8:59 AM
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
984

I did it for the same reason as the folding of the excessive bounds to null. In my view, if it can be done cheaply it's preferable to avoid out of bounds accesses. GCC takes this approach in some cases (e.g., for strlen("123" + i)) but not in others. But unless these cases are handled consistently in LLVM I don't think it matters much one way or the other either. I can remove the check if you prefer. (It would be helpful to have some documented guidance on how these issues are expected to be handled in general, or to try to come to a consensus if it's not yet clear.)

My motivation for recognizing this particular pattern is to optimize searches in zero-initialized arrays (including the trailing sequences of nulls following initial strings in bigger arrays, ending at the offset of one of the nulls). The former are rare on their own but common as members of bigger aggregates. They're yet to be handled by LLVM (the latter is only handled in strlen) but my hope is to add support for both to all the function folders. I'd also like to extend this pattern to memchr, for the same reason. (The non-null cases aren't as important but they naturally fall out of this.)

GCC doesn't have a built-in for memrchr so it doesn't do anything interesting with calls to it. Its `memchr folder is very simplistic and doesn't support this pattern (unlike LLVM, it does handle searches in all kinds of constant aggregates).

nikic added inline comments.Apr 14 2022, 9:20 AM
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
984

I did it for the same reason as the folding of the excessive bounds to null. In my view, if it can be done cheaply it's preferable to avoid out of bounds accesses. GCC takes this approach in some cases (e.g., for strlen("123" + i)) but not in others. But unless these cases are handled consistently in LLVM I don't think it matters much one way or the other either. I can remove the check if you prefer. (It would be helpful to have some documented guidance on how these issues are expected to be handled in general, or to try to come to a consensus if it's not yet clear.)

I don't think there is any official guidance for this, but my 2c is that we're free to exploit UB aggressively, but also don't need to go out of the way to do so, if there's no benefit. In D123628 the choice was between retaining the libcall (and thus sanitizer checks) or returning null. In this case, the libcall is going to be removed either way, and we're going to return an "incorrect" result either way, whether that is s + n or null. I don't think one of those values is preferable over the other. As such, we should make the choice which results in fewer instructions here, which is to omit the additional bounds check.

My motivation for recognizing this particular pattern is to optimize searches in zero-initialized arrays (including the trailing sequences of nulls following initial strings in bigger arrays, ending at the offset of one of the nulls). The former are rare on their own but common as members of bigger aggregates. They're yet to be handled by LLVM (the latter is only handled in strlen) but my hope is to add support for both to all the function folders. I'd also like to extend this pattern to memchr, for the same reason. (The non-null cases aren't as important but they naturally fall out of this.)

Ah, I see, thanks for the context.

msebor updated this revision to Diff 425294.Apr 26 2022, 1:46 PM

Remove bounds check on the assumption it's valid.

msebor added a comment.EditedApr 26 2022, 2:31 PM

There is a build error with the patch that I don't understand:

Build 240741: pre-merge checks patch application failed

ERROR error: patch failed: llvm/test/Transforms/InstCombine/memrchr-2.ll:1
error: llvm/test/Transforms/InstCombine/memrchr-2.ll: patch does not apply

There is no memrchr-2.ll in this patch. Not sure what to make of it.

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
984

I forgot this is still open. I'll go ahead and remove the check. Just one remark: the essential difference between returning null and an invalid past-the-end pointer is that most callers are prepared to handle the former while none can meaningfully deal with the latter. Whether the null result is correct or not depends. The invalid result is never correct and will almost certainly cause trouble.

I wonder if the error has something to do with merging the patch from D123628 into the next one in the series (D123629) in the final commit. Should I abandon the former?

nikic added a comment.Apr 27 2022, 2:52 AM

I wonder if the error has something to do with merging the patch from D123628 into the next one in the series (D123629) in the final commit. Should I abandon the former?

Yes, that's likely. Pre-merge checks apply dependencies of the patch first, though I'm not sure how exactly that works if some of them are closed. In either case, you can abandon the patch as it is no longer relevant.

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
962

Use the helper function that avoids the 32-bit truncation issue?

970

I don't get the AKA part of this comment.

977

I think this should be using CreateLogicalAnd, in case CharVal is poison (unless we want to argue that this memrchr argument cannot be poison, even for a zero bound).

msebor added inline comments.May 6 2022, 4:47 PM
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
962

The issue can't happen here because EndOff is either restricted to at most Str.size() earlier on, or, for variable offsets, left initialized to UINT64_MAX and then implicitly converted to SIZE_MAX AKA std::npos when passed to substr.

970

When N is constant it's not zero at this point (constants zero and one are handled above). It's might be a vestige of the constant zero handling being done later in one of the early patches. Let me remove it.

977

I don't have enough experience with poison to understand how it makes a difference but sure, CreateLogicalAnd sounds fine.

(Since CharVal is treated as an unsigned char by these functions, any value should be allowed here, including indeterminate/uninitialized. What exactly that means in practical C terms has been debated but the official albeit sometimes contested position is that each evaluation of an indeterminate unsigned char might yield a different result).

msebor updated this revision to Diff 431721.May 24 2022, 10:57 AM

Use CreateLogicalAnd instead of CreateAnd and update comment.

nikic accepted this revision.May 24 2022, 12:12 PM

LGTM

This revision is now accepted and ready to land.May 24 2022, 12:12 PM
This revision was landed with ongoing or failed builds.May 24 2022, 4:03 PM
This revision was automatically updated to reflect the committed changes.