This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fold strlen and strnlen recursively.
AbandonedPublic

Authored by msebor on Apr 14 2022, 1:28 PM.

Details

Reviewers
nikic
xbolva00
Summary

This patch in the strnlen series introduces a recursive function, LibCallSimplifier::minStringLength(), to handle calls with pointer arguments involving nested conditional expressions. This also enhances the folding of strlen. For strnlen, only constant bounds are handled.

Depends on D123818.

Diff Detail

Event Timeline

msebor created this revision.Apr 14 2022, 1:28 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 14 2022, 1:28 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
msebor requested review of this revision.Apr 14 2022, 1:28 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 14 2022, 1:28 PM
msebor updated this revision to Diff 425598.Apr 27 2022, 1:35 PM

Patch rebased on top of the latest trunk, including all dependencies.

nikic added a comment.May 3 2022, 3:52 AM

As mentioned on the original review, I think we would be better off not passing the Bound to minStringLength() and instead doing one umin at the end -- this avoids the need to handle this in each branch. Though possibly I forgot some reason why that doesn't work from the previous discussion.

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
714

Why is this condition needed?

nikic added inline comments.May 3 2022, 3:54 AM
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
635

Can pass in IRBuilderBase by-reference? It seems like part of the changes here are just B. to B->.

msebor added a comment.May 3 2022, 9:43 AM

As mentioned on the original review, I think we would be better off not passing the Bound to minStringLength() and instead doing one umin at the end -- this avoids the need to handle this in each branch. Though possibly I forgot some reason why that doesn't work from the previous discussion.

You made this suggestion in the context of the select statement where I implemented it (see line 748). Removing the Bound (and MaxLen) from the function completely would prevent handling variable offsets in strlen while punting on them in strnlen as you requested in D122686. It would also prevent ultimately handling it in strnlen as proposed D123820 in cases that the strlen folding cannot handle (arrays with embedded nuls where the bound is less than the length of any of the substrings).

Splitting up the variable offset handling between two patches makes the work harder to think about and, it seems, also harder to review, rather than easier as you'd hoped. Would it help if I merged them into one?

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
635

Passing it by reference means the lambda capture cannot be simply by value. Instead, each lambda must explicitly specify the capture for each variable, making it more verbose, harder to read, etc.

714

To prevent folding strnlen calls with variable offsets as you requested in the previous review. See for example fold_strnlen_s3_pi_n in strnlen-3.ll and others. The folding is enabled in D123820.

...cases that the strlen folding cannot handle (arrays with embedded nuls where the bound is less than the length of any of the substrings).

Whoops, that was a thinko. Those are cases that strnlen cannot safely handle. Those it can handle that strlen can't are unterminated arrays with variable offsets.

nikic added a comment.May 4 2022, 8:20 AM

As mentioned on the original review, I think we would be better off not passing the Bound to minStringLength() and instead doing one umin at the end -- this avoids the need to handle this in each branch. Though possibly I forgot some reason why that doesn't work from the previous discussion.

You made this suggestion in the context of the select statement where I implemented it (see line 748). Removing the Bound (and MaxLen) from the function completely would prevent handling variable offsets in strlen while punting on them in strnlen as you requested in D122686. It would also prevent ultimately handling it in strnlen as proposed D123820 in cases that the strlen folding cannot handle (arrays with embedded nuls where the bound is less than the length of any of the substrings).

Splitting up the variable offset handling between two patches makes the work harder to think about and, it seems, also harder to review, rather than easier as you'd hoped. Would it help if I merged them into one?

Okay, I think I have made things a bit confusing here: The bit I was originally concerned about is the special handling for the NullTermIdx == ~((uint64_t)0) case -- I don't think the implementation from D123820 really makes sense. I don't think there's a strong need to split off the handling of variable bounds as long as that special case is not included.

Basically, what I would expect is something along these two patches:
https://gist.github.com/nikic/d847d507ee89f9a2ccb07e39c18ea356 - This extends strnlen support to all the cases strlen supports
https://gist.github.com/nikic/dfa8018bc82ab6347887c8c65249f084 - This adds handling for the special "null not contained in string" case

The latter patch doesn't make a distinction between strlen and strnlen: If there is no null in the string, we just return -1, which will result in the bound getting picked as the final result for the strnlen case, which is the only possible non-UB result. The strlen case is always UB, thus returning -1 is fine there as well. Passing the Bound does allow us to distinguish these and avoid optimizing some UB cases -- but it also does cause additional complexity in this case, because it means the Bound is not handled in a single place.

I'm not sure where this stands now. Are you expecting me to submit your patches for review and then incorporate the recursion part separately?

Cases like strlen (x ? s1 : s2) should be handled in more general fashion and rewrite fn (x ? s1 : s2) as x ? fn(s1) : fn(s2) if fn(s1) and (or?) fn(s2) can be optimized to just some (constant) value.

Same for more chains like fn (x == 3 ? s3 : x == 5 ? s5 : s7) to x == 3 ? fn(s3) : x == 5 ? fn(s2) : fn(s5).

msebor added a comment.EditedMay 27 2022, 9:55 AM

Cases like strlen (x ? s1 : s2) should be handled in more general fashion

Agreed. Such a general rewrite seems outside the scope of the simple transformations done by the folder (when either s1 or s2 is not constant). Does that mean we don't want the proposed change and should instead wait for the more general transform?

For what it's worth, GCC does this transform in its Partial Redundancy Elimination pass that sits on top of Global Value Numbering. I don't see a PRE pass in LLVM and its GVN pass documents its purpose as "to eliminate fully redundant instructions." So it seems like the transformation we're looking for might be a suitable enhancement for that pass.

nikic added a comment.May 31 2022, 9:49 AM

I'm not sure where this stands now. Are you expecting me to submit your patches for review and then incorporate the recursion part separately?

Yeah, that's what I'd suggest. Maybe not exactly those patches, but something that's functionally similar.

For the recursion part, there are a few additional issues we'd want to consider: a) reused operands in nested selects, b) potential infinite loops in unreachable code (not sure if SLC can run on unreachable code), c) potentially non-profitable transform for multi-use selects, d) potentially deep recursion without limit. (Some of those have a common solution.)

Cases like strlen (x ? s1 : s2) should be handled in more general fashion and rewrite fn (x ? s1 : s2) as x ? fn(s1) : fn(s2) if fn(s1) and (or?) fn(s2) can be optimized to just some (constant) value.

Same for more chains like fn (x == 3 ? s3 : x == 5 ? s5 : s7) to x == 3 ? fn(s3) : x == 5 ? fn(s2) : fn(s5).

We generally cannot optimize fn(x ? s1 : s2) to x ? fn(s1) : fn(s2), because this would unconditionally execute fn() on both inputs. We can only do this if both sides fold to a speculatable result. Otherwise this is only possible by introducing a branch, which is not possible in SLC (and has unclear profitability).

For what it's worth, GCC does this transform in its Partial Redundancy Elimination pass that sits on top of Global Value Numbering. I don't see a PRE pass in LLVM and its GVN pass documents its purpose as "to eliminate fully redundant instructions." So it seems like the transformation we're looking for might be a suitable enhancement for that pass.

GVN does perform PRE as well. Though I believe it is only performed for phis, not selects (at least for scalar PRE, I think load PRE does handle select). Though to be honest I don't really understand why this particular transform would fall under the purview of PRE.

msebor abandoned this revision.Jun 14 2022, 3:07 PM

@nikic, it doesn't make sense to me to submit your work for your review. Please feel free to submit the patches yourself. I'll work on the recursive approach separately.