This is an archive of the discontinued LLVM Phabricator instance.

Strlen loop idiom recognition
Needs ReviewPublic

Authored by anjankgk on Sep 28 2020, 6:19 PM.

Details

Summary

(Original patch by musman under D83392. I am planning to commit this in his absence, with his consent.)
This patch adds strlen to LoopIdiomRecognize.cpp. It is the first part of 3 patches:

  1. Strlen loop idiom recognition
  2. Strlen16 recognition and creation of new strlen16 intrinsic
  3. Folding of strlen/strlen16 call if they are only used for zero equality comparison (replace with load of first char)

Example that this recognizes:

unsigned strlen(char *Str) {

char *Src = Str;
if (!Src)
  return 0;
for (; *Src;)
  Src++;
return Src - Str;

}

Diff Detail

Event Timeline

anjankgk created this revision.Sep 28 2020, 6:19 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 28 2020, 6:19 PM
anjankgk requested review of this revision.Sep 28 2020, 6:19 PM
shawnl requested changes to this revision.Oct 2 2020, 6:57 PM

This pass should recognise the nonnull attribute, as it is analyzing against null.

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1614

If the passed pointer has the "nonnull" attribute then this condition need not be present.

This revision now requires changes to proceed.Oct 2 2020, 6:57 PM
shawnl added a comment.Oct 3 2020, 5:33 AM

It is possible DCE will handle the null check with "nonnull" attribute, so add the test first.

Otherwise I like how little code this took compared to earlier versions.

(For future reference, Phabricator has an option "commandeer" a revision, to take control if the original author of a patch leaves a patch incomplete for whatever reason. Obviously use sparingly.)

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1614

The icmp isn't against a pointer; it's against an i8. Or, it should be; I guess the code is missing a check that it's i8, as opposed to some arbitrary integer type.

For comparison, it's undefined behavior to call strlen with a null pointer.

1666

I'm not sure how ResInst is connected to the rest of this transform; why are we transforming the operand of some random PHI node?

1683

"dead instructions", not "death instructions".

llvm/test/Transforms/LoopIdiom/recognize-strlen.ll
1

Can you generate the check lines use update_test_checks.py? It isn't obvious what the whole function looks like as a result of this transform.

shawnl resigned from this revision.Oct 4 2020, 4:15 PM
shawnl added inline comments.
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1614

My bad. I was really working from example code (and the tests also have a null check). This is why in my C code I always use '\x00' for NUL (the character) and not 0 or NULL.

This revision now requires review to proceed.Oct 4 2020, 4:15 PM
anjankgk marked 4 inline comments as done.Oct 7 2020, 3:15 PM
anjankgk added inline comments.
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1666

I added a comment for clarity -

// Finally, we find the phi node that corresponds to the distance between the
// pointers and replace it's uses by the call to strlen in the transformed
// code.
anjankgk updated this revision to Diff 296817.Oct 7 2020, 3:24 PM
anjankgk marked an inline comment as done.

Check for i8 type and use update_test_checks.py for testfile.

efriedma added inline comments.Oct 7 2020, 3:25 PM
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1666

The patch doesn't appear to prove that anywhere.

anjankgk added inline comments.Oct 7 2020, 3:31 PM
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1666

I am getting the sub instruction saved in ResInst and replacing that in line# 1689. Is that something that you prefer to be done in a different way?

efriedma added inline comments.Oct 7 2020, 3:38 PM
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1666

How do you know it's a sub instruction?

anjankgk added inline comments.Oct 7 2020, 8:05 PM
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1666

I agree. The current code is actually just looking for the phi node from the predecessor (LoopExitBB) block. I think, I should add a code to ensure that is actually a sub instruction. However, I am afraid that that could again lead to a pattern-matching kind of code which was initially posted.

efriedma added inline comments.Oct 7 2020, 10:36 PM
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1666

What happens if we ignore ResInst, and rewrite LCSSAPhi instead? That still eliminates the inner loop; later passes should optimize the resulting arithmetic.

anjankgk added inline comments.Oct 15 2020, 2:37 PM
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1666

I tried a few things, but I am not sure how to do that.
This is the i/p BB:

for.end:                                          ; preds = %for.inc
  %incdec.ptr.lcssa = phi i8* [ %incdec.ptr, %for.inc ]
  %sub.ptr.lhs.cast = ptrtoint i8* %incdec.ptr.lcssa to i64
  %sub.ptr.rhs.cast = ptrtoint i8* %Str to i64
  %sub.ptr.sub = sub i64 %sub.ptr.lhs.cast, %sub.ptr.rhs.cast
  br label %cleanup

return type of strlen is i64 (same as ResInt which is %sub.ptr.sub), as compared to the type of lcssa instrn (%incdec.ptr.lcssa) which is a i8*. I am not clear how I can rewrite lcssa with the strlen as suggested.

efriedma added inline comments.Oct 15 2020, 2:46 PM
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1666

%incdec.ptr.lcssa is going to be equivalent to something like getelementptr i8, i8* %Str, i64 %strlen. Speaking in generic terms, replace %Str with the start value of LCSSAEv. Add the two values together with SCEV, and use SCEVExpander to materialize it.

anjankgk marked an inline comment as done.Oct 19 2020, 3:24 PM
anjankgk updated this revision to Diff 299193.Oct 19 2020, 3:27 PM

Rewrite the LCSSAPhi instruction to reflect strlen idiom recognition.

Phabricator doesn't track incremental patches the way you're uploading them; please upload the full diff against master.

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1680–1681

LCSSAEv->getStart() is the value of the output of the loop; the input to strlen needs to be the input to the load (LoadEv->getStart()).

More tests would be helpful to catch this sort of issue.

1690

getPointerBase() isn't right; it strips all offsets from the pointer.

I'd like to see more regression test coverage of different variations on the values produced by the loop.

It'll be nice to know if this patch worked fine with a set of regression tests.
results on LLVM testsuite, or clang bootstrap will really help.

Thanks for working on this.

anjankgk updated this revision to Diff 312230.Dec 16 2020, 9:12 AM

Changes from previous review comments:

  • Replaced LCSSAEv->getStart() by LoadEv->getStart()
  • Added another test function where multiple LCSSAPhi nodes were present and therefore shouldn't have replaced with strlen function.

Phabricator doesn't track incremental patches the way you're uploading them; please upload the full diff against master.

Thanks @efriedma! I made the changes suggested and added another test as well. I have uploaded the right patch this time. Pls let me know in case it's still showing up wrongly.

anjankgk added a comment.EditedDec 16 2020, 9:16 AM

It'll be nice to know if this patch worked fine with a set of regression tests.
results on LLVM testsuite, or clang bootstrap will really help.

Thanks for working on this.

Thank you for the feedback @hiraditya! I have tested the latest patch at various options (O2, O3, LTO and PGO) against the SPEC benchmarks as well as run the lnt tests and bootstrap tests successfully. If you were concerned about any other specific testsuite, pls let me know.

It'll be nice to know if this patch worked fine with a set of regression tests.
results on LLVM testsuite, or clang bootstrap will really help.

Thanks for working on this.

Thank you for the feedback @hiraditya! I have tested the latest patch at various options (O2, O3, LTO and PGO) against the SPEC benchmarks as well as run the lnt tests and bootstrap tests successfully. If you were concerned about any other specific testsuite, pls let me know.

Nice, if we have performance improvements on any of the benchmarks/workloads then do share.

It'll be nice to know if this patch worked fine with a set of regression tests.
results on LLVM testsuite, or clang bootstrap will really help.

Thanks for working on this.

Thank you for the feedback @hiraditya! I have tested the latest patch at various options (O2, O3, LTO and PGO) against the SPEC benchmarks as well as run the lnt tests and bootstrap tests successfully. If you were concerned about any other specific testsuite, pls let me know.

Nice, if we have performance improvements on any of the benchmarks/workloads then do share.

I did track performance measurements on a bunch of benchmarks but there was nothing noticeable since what this patch is doing is just recognizing the idiom and replacing with the lib call. However, going forward, if we could better the implementation of these library calls (which could be target platform specific) this should give us gains.

Gentle ping...

Another gentle reminder for the review.. Thanks!

hiraditya added inline comments.Feb 16 2021, 11:00 AM
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
1657

break when I != PN because PNs are in the beginning of a basic block.

if (!PN)
  break;
1719

Should we call forgetLoop before deleting the loop?

1721

Shouldn't be necessary as ORE is already there.

anjankgk updated this revision to Diff 335289.Apr 5 2021, 10:20 AM
anjankgk marked 3 inline comments as done.

Updates from tests and fix to identifying the correct replace instruction from exit BB.

anjankgk added a comment.EditedApr 5 2021, 10:21 AM

I have updated the code based on some more test results and based on the review comments.
Thanks!

Gentle ping..

Do we have an example of real world workload where this pattern was found?

This comment was removed by hiraditya.
joerg added a subscriber: joerg.Jun 13 2021, 5:18 PM

Please make sure at the very least that it doesn't get matched if a function named strlen and also test for that.

Please make sure at the very least that it doesn't get matched if a function named strlen and also test for that.

yeah, transforming strlen itself will be a lot of trouble ;)

Herald added a project: Restricted Project. · View Herald TranscriptAug 22 2022, 7:57 AM

Can this patch also convert this one to strlen? gcc is able to recognize this idiom: https://godbolt.org/z/7ra1ErqKK

size_t strlen2(char *Str) {
    char *Src = Str;
    int i = 0;
    while (*Src) {
        ++i;
        ++Src;
    }
    return i;
}