This is an archive of the discontinued LLVM Phabricator instance.

[libc][NFC] Optimize and standardize the C string functions
AbandonedPublic

Authored by gAlfonso-bit on Jul 21 2021, 5:02 PM.

Details

Reviewers
sivachandra
Group Reviewers
Restricted Project

Diff Detail

Event Timeline

gAlfonso-bit created this revision.Jul 21 2021, 5:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 21 2021, 5:02 PM
gAlfonso-bit requested review of this revision.Jul 21 2021, 5:02 PM
This comment was removed by gAlfonso-bit.
gAlfonso-bit retitled this revision from [libc] Add explicit C++ casting to strrchr to [libc] Add explicit C++ casting to strchr and strrchr.Jul 21 2021, 6:21 PM

Added \0 so we know what we are searching for

gAlfonso-bit accepted this revision.Jul 22 2021, 9:17 AM

If there are any objections, please let me know.

This revision is now accepted and ready to land.Jul 22 2021, 9:17 AM

About strchr, it is the way it is because the standard says, "each character interpreted as unsigned char". I see almost no difference before and after wrt codegen.

Before: https://godbolt.org/z/89sqGrYY9
After: https://godbolt.org/z/4h9eWzKKT

The changes comparing with `\0' are good though.

gAlfonso-bit added a comment.EditedJul 23 2021, 7:10 AM

You are confusing strchr and memchr it seems. Str - standard says char. Memchr - standard says unsigned char.

Easy way to remember this is by knowing: strings are char arrays. Mem is by the raw byte.

This comment was removed by gAlfonso-bit.
gAlfonso-bit retitled this revision from [libc] Add explicit C++ casting to strchr and strrchr to [libc] Add explicit C++ casting to strchr, strrchr, memchr, and memrchr.Jul 23 2021, 7:48 AM
gAlfonso-bit edited the summary of this revision. (Show Details)
This comment was removed by gAlfonso-bit.
gAlfonso-bit resigned from this revision.Jul 23 2021, 11:48 AM
This revision now requires review to proceed.Jul 23 2021, 11:48 AM
This comment was removed by gAlfonso-bit.
This comment was removed by gAlfonso-bit.
This comment was removed by gAlfonso-bit.
gAlfonso-bit updated this revision to Diff 361325.

To get this reviewed, you should split this up into multiple small and focused patches. I have also left a few comments inline you can use as a general guidelines for splitting.

You are confusing strchr and memchr it seems. Str - standard says char. Memchr - standard says unsigned char.

I think the original author looked at cppreference.com which has almost the same text for strchr and memchr.

strchr - https://en.cppreference.com/w/c/string/byte/strchr
memchr - https://en.cppreference.com/w/c/string/byte/memchr

But, I also looked up the actual standard document and it indeed follows what you are saying.

libc/src/string/memchr.cpp
21

Using these explicit casts is probably a good thing, but it mostly cosmetic. If there is a reason beyond just the cosmetics, elaborate that.

libc/src/string/memrchr.cpp
19–20

Instead of folding existing code into patterns like this (using an increment and decrement operator along with a another operator like the dereference operator), I would suggest doing the other way round cleanup. For example, the existing code is more readable than this new code.

libc/src/string/strchr.cpp
19

Comparing with an exact value instead of relying on its boolean interpretation is a good change I think. So, please do it in a separate patch just focused on these kind of changes. It makes the intention clear to the reviewers.

libc/src/string/strcpy.cpp
14

The changes in this file have to be a patch by itself explaining why and what clearly.

libc/src/string/string_utils.h
21–22

For me, most of the changes in this file look like cosmetic churn. So, please explain why you want to do these changes.

gAlfonso-bit marked an inline comment as done.Jul 23 2021, 2:04 PM
This comment was removed by gAlfonso-bit.
libc/src/string/strchr.cpp
19

It actually does what the point of this patch is, which is that it coverts to char, not unsigned char, and because \0 is a char literal for 0, it makes sense here

libc/src/string/strcpy.cpp
14

It’s an exact copy of the one in memcpy except the argument types align with what the functions are supposed to be handling.

libc/src/string/string_utils.h
21–22

It’s actually more efficient this way and closer to the other functions.

gAlfonso-bit marked an inline comment as done.
This comment was removed by gAlfonso-bit.
This comment was removed by gAlfonso-bit.
This comment was removed by gAlfonso-bit.
gAlfonso-bit marked an inline comment as done.
gAlfonso-bit retitled this revision from [libc] Add explicit C++ casting to strchr, strrchr, memchr, and memrchr to [libc] Optimize and standardize the C string functions.Jul 25 2021, 5:59 PM
gAlfonso-bit edited the summary of this revision. (Show Details)
gAlfonso-bit edited the summary of this revision. (Show Details)
gAlfonso-bit marked 3 inline comments as done.
This comment was removed by gAlfonso-bit.
This comment was removed by gAlfonso-bit.
libc/src/string/strcpy.cpp
14

Addressed

This comment was removed by gAlfonso-bit.
This comment was removed by gAlfonso-bit.

Is this good? @sivachandra

A large part of this change is what I would would consider as "churn". So, if you can demonstrate why you want this change, it will be easier to review for people like me (for whom it might not be clear as to why one flavor is better than the other). For example, for code this like:

const char ch = static_cast<char>(c); // c is of type int

The implicit conversion rules make it equivalent to

const char ch = c;

Unless you want the code to work under -Wconversion or something similar, the explicit cast is redundant. Speaking for myself (as a reviewer), if you can let us know the intention of your change, I will be able read it with that in my mind.

Don't get me wrong here; I totally appreciate you taking time to do this. In fact, I have already commented earlier that some of the changes you included in here are good from consistency and readability point of view, but that they should be done separately in order not to mix-up concerns.

This comment was removed by gAlfonso-bit.
This comment was removed by gAlfonso-bit.
This comment was removed by gAlfonso-bit.
This comment was removed by gAlfonso-bit.
gAlfonso-bit retitled this revision from [libc] Optimize and standardize the C string functions to [libc][NFC] Optimize and standardize the C string functions.Jul 27 2021, 2:16 PM
gAlfonso-bit added a subscriber: MaskRay.
This comment was removed by gAlfonso-bit.
gAlfonso-bit edited reviewers, added: ldionne; removed: MaskRay.Jul 28 2021, 8:40 AM

Rebased on LLVM 13 Release branch

This comment was removed by gAlfonso-bit.
libc/src/string/memrchr.cpp
19–20

The code is more efficient this way than the last way.

Rebased to current

Remove unneeded check

This comment was removed by gAlfonso-bit.
This comment was removed by gAlfonso-bit.

One way is to fold this entire patch "Under standardize and optimize" but I would rather split this patch to actually see the "standardize" parts and the "optimize" parts separately. In fact, as it stands, this patch is doing cleanups beyond "optimize" and "standardize". So, I would strongly encourage you to separate out all the individual concerns clearly in different patches.

libc/src/string/memmove.cpp
1

Some of the changes in this file have to separated out into a patch of their own.

libc/src/string/memory_utils/elements.h
1

Same here - some of the changes have to be a patch of their own explaining the rationale clearly. You can combine this change with the change to memmove.cpp.

libc/src/string/memrchr.cpp
19–20

Can you please demonstrate that at -O2 or higher, mixing an increment/decrement operator with another operator is more efficient?

gAlfonso-bit edited the summary of this revision. (Show Details)Sep 27 2023, 11:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 27 2023, 11:07 AM
gAlfonso-bit abandoned this revision.Sep 27 2023, 11:08 AM