This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyLibCalls] Simplify strlen to a subtraction for certain cases
ClosedPublic

Authored by lihuang on Mar 16 2016, 3:02 PM.

Details

Summary

This patch implements this optimization:

strlen(s + x)  --> strlen(s) - x,

where s is a pointer to a string literal, and x is an integer variable. Note that strlen(s) is a constant which is known at compile time. This optimization is legal when one of the following is true:

A. x is known to be within the range [0, strlen(s)]
B. The string literal is null-terminated and has only one null terminator "\0" at its end, and (s + x) is proved to be an out-of-bound pointer when x < 0 or x > strlen(s).

Case A is straightforward. For case B, it's legal to do this optimization because strlen will have undefined behavior when (s + x) is out of bound.

Diff Detail

Repository
rL LLVM

Event Timeline

lihuang updated this revision to Diff 50860.Mar 16 2016, 3:02 PM
lihuang retitled this revision from to [SimplifyLibCalls] Simplify strlen to a subtraction for certain cases.
lihuang updated this object.
lihuang added reviewers: spatel, DavidKreitzer.
lihuang added a subscriber: llvm-commits.
nlopes added a subscriber: nlopes.Mar 16 2016, 4:13 PM

Could you please explain the motivation for this transformation?
Doesn't seem very profitable to me unless strlen(s) has already been computed before. Otherwise we end up visiting more memory positions than with the original code.

We only do this optimization when s is a string literal, so that strlen(s) could be computed at compile time. The motivation is, this could save some function calls, and gcc has this optimization.

majnemer added inline comments.
lib/Transforms/Utils/SimplifyLibCalls.cpp
538–558 ↗(On Diff #50860)

Have you tried using something like GetPointerBaseWithConstantOffset? You might be able to replace this code with that.

lihuang added inline comments.Mar 17 2016, 11:13 AM
lib/Transforms/Utils/SimplifyLibCalls.cpp
538–558 ↗(On Diff #50860)

The GEP in this case is not with a constant offset, as x is a variable. So GetPointerBaseWithConstantOffset doesn't help here.

This code actually duplicates part of the code in getConstantStringInfo, but I cannot just use getConstantStringInfo for GEP because it only works with constant offset. I need the same checks here to make sure that the GEP is pointing to a string and the first index is 0.

Just a kindly ping :)

Hi Sanjay, do you have any comments on this change?

Thanks!

spatel edited edge metadata.Apr 5 2016, 11:21 AM

Can you rebase this patch after http://reviews.llvm.org/rL265431 ?
You can use utils/update_test_checks.py to regenerate the CHECK lines.
Sorry for the extra effort, but I used your test cases as experiments for the script.

spatel added inline comments.Apr 5 2016, 11:24 AM
lib/Transforms/Utils/SimplifyLibCalls.cpp
541–558 ↗(On Diff #50860)

These checks are included in getConstantStringInfo(). You do not need to repeat them here?

Sure, I will rebase the patch after 265431.

lib/Transforms/Utils/SimplifyLibCalls.cpp
541–558 ↗(On Diff #50860)

Yes these lines are the same lines in getConstantStringInfo(). The reason I didn't use getConstantStringInfo() here is it only works for GEP with a constant offset. In this case, the offset is a variable, and we need the same checks.

lihuang updated this revision to Diff 52749.Apr 5 2016, 5:28 PM
lihuang edited edge metadata.

Hi Sanjay,

I have updated the test using your script, it's an awesome script!

So, uh, why not make getConstantStringInfo work for this case too, rather
than duplicate it?

I tried to make getConstantStringInfo work for this case but didn't find a good way.

I think this case needs a different logic than getConstantStringInfo, though requiring same checks on the GEP. getConstantStringInfo returns the string pointed to by the input pointer, but the string is not decided at compile time when the input pointer is a GEP with variable index. We could extend the logic of getConstantStringInfo, but that will need changing the signature and refactoring lots of code, as this function is already used in many places.

In order to avoid duplicate code, I could extract the GEP part from getConstantStringInfo into another function so it could be shared.

lihuang updated this revision to Diff 53356.Apr 11 2016, 8:31 PM

Updated the changed. Added a helper function in ValueTracking.cpp/h to check if a given GEP is based on a pointer to string and is indexing into that string. This helper function can be used by both getConstantStringInfo() and optimizeStrlen().

spatel added inline comments.Apr 12 2016, 9:45 AM
include/llvm/Analysis/ValueTracking.h
185–186 ↗(On Diff #53356)

Doxygen comments start with 3 slashes ('///'). Also, don't repeat the function name in the comment:
http://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments

Feel free to fix the rest of the file in a separate commit. :)

lib/Analysis/ValueTracking.cpp
2654–2655 ↗(On Diff #53356)

Don't repeat the block comment from the header file.

lib/Transforms/Utils/SimplifyLibCalls.cpp
551–552 ↗(On Diff #53356)

You can move this down to where it is actually used.

lihuang updated this revision to Diff 53430.Apr 12 2016, 11:08 AM
lihuang marked 3 inline comments as done.Apr 12 2016, 11:13 AM
lihuang added inline comments.
include/llvm/Analysis/ValueTracking.h
185–186 ↗(On Diff #53430)

Thanks! Will update the rest of this file :)

spatel accepted this revision.Apr 12 2016, 11:21 AM
spatel edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Apr 12 2016, 11:21 AM
This revision was automatically updated to reflect the committed changes.