This is an archive of the discontinued LLVM Phabricator instance.

[libc] Add strcmp implementation.
ClosedPublic

Authored by cgyurgyik on Jun 18 2020, 4:49 PM.

Details

Summary

[libc] Adds string compare (strcmp) implementation to the libc string library.

Diff Detail

Event Timeline

cgyurgyik created this revision.Jun 18 2020, 4:49 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 18 2020, 4:49 PM
cgyurgyik edited the summary of this revision. (Show Details)Jun 18 2020, 5:04 PM
cgyurgyik added reviewers: sivachandra, PaulkaToast.
sivachandra added inline comments.Jun 18 2020, 5:39 PM
libc/src/string/strcmp.cpp
15

Is there benefit in comparing words at a time? I know that it leads to potential OOB reads, so we can handle it separately. But if it can lead to speed up, just a leave a TODO about it and we will get to it in a later round.

22

Use reinterpret cast instead of C style casts.

libc/test/src/string/strcmp_test.cpp
11

Where relevant in the following tests, can you add a check with the operands reversed?

PaulkaToast added inline comments.Jun 18 2020, 6:06 PM
libc/src/string/strcmp.cpp
15

I would suggest to opt for more descriptive variable names over single letter one. i.e ("left" instead of "l").

cgyurgyik added inline comments.Jun 18 2020, 6:12 PM
libc/src/string/strcmp.cpp
15

What is the standard for leaving TODOs? I'm not finding anything in the llvm Coding Standards.

FWIW, this is a good candidate for fuzz testing

libc/src/string/strcmp.cpp
16–21

for ( ; *l && *l == *r; ++l, ++r); maybe

libc/src/string/strcmp.h
12

No need for this include

cgyurgyik updated this revision to Diff 271892.Jun 18 2020, 6:41 PM

Add TODO for word comparison, add better variable names, add operands reversed to tests, removed unnecessary header.

cgyurgyik marked 4 inline comments as done.Jun 18 2020, 6:42 PM
sivachandra accepted this revision.Jun 19 2020, 9:36 AM

As @abrachet has said, you should consider adding a fuzz target as a follow up. Also, before committing, please update the commit message as follows:

[libc] Add strcmp implementation.

The [libc] prefix is important (and even I miss it sometimes.)

This revision is now accepted and ready to land.Jun 19 2020, 9:36 AM
cgyurgyik retitled this revision from Add strcmp. to [libc] Add strcmp..Jun 19 2020, 9:37 AM
cgyurgyik retitled this revision from [libc] Add strcmp. to [libc] Add strcmp implementation..Jun 19 2020, 9:40 AM
cgyurgyik edited the summary of this revision. (Show Details)

As @abrachet has said, you should consider adding a fuzz target as a follow up. Also, before committing, please update the commit message as follows:

[libc] Add strcmp implementation.

The [libc] prefix is important (and even I miss it sometimes.)

Acknowledged.

cgyurgyik accepted this revision.Jun 23 2020, 5:52 AM
cgyurgyik closed this revision.

Submitted.

gchatelet added inline comments.
libc/src/string/strcmp.cpp
15

Is there benefit in comparing words at a time? I know that it leads to potential OOB reads, so we can handle it separately. But if it can lead to speed up, just a leave a TODO about it and we will get to it in a later round.

Technically we have to read the buffers byte per byte as we're not supposed to read past the \0.
I'm not optimistic on the ability to accelerate this routine because -even if a page fault can only occur at page boundaries- some processors are offering hardware pointer authentication which can break reading buffers cache line per cache line.

The granularity of ARMv8.3 PAC is 16 byte. Can you read an invalid address?

The granularity of ARMv8.3 PAC is 16 byte. Can you read an invalid address?

@tschuett I didn't dig into the address granularity, thank you for mentioning it.
Problem here is that we have two pointers to read from so this makes for the following logic:

  • both pointers are aligned: we can use 16B loads,
  • both pointers are unaligned of the same amount, we can load the first few bytes up to the next 16B boundary and then load 16B at a time,
  • pointers have unrelated alignment, not much we can do...

On top of this the return of investment of the "align + load 16B chunks" strategy heavily depends on the size of the two strings - which we can't know in advance since they're 0 terminated.
If on average strings are a few tens of bytes the added complexity will never pay off.

This is different from memcmp which provides the size argument that we can use to decide the best strategy in advance.

So it's unclear whether the added complexity will yield any substantial benefit over a simple version that also uses less space in the L1 instruction cache.

sivachandra added inline comments.Dec 14 2020, 7:35 AM
libc/src/string/strcmp.cpp
15

Is there benefit in comparing words at a time? I know that it leads to potential OOB reads, so we can handle it separately. But if it can lead to speed up, just a leave a TODO about it and we will get to it in a later round.

Technically we have to read the buffers byte per byte as we're not supposed to read past the \0.
I'm not optimistic on the ability to accelerate this routine because -even if a page fault can only occur at page boundaries- some processors are offering hardware pointer authentication which can break reading buffers cache line per cache line.

So, going by that, we cannot employ similar techniques (of comparing word lengths at a time) even for functions like strlen. Just to note, few other libcs take such an approach with strlen.

gchatelet added inline comments.Dec 14 2020, 8:19 AM
libc/src/string/strcmp.cpp
15

So, going by that, we cannot employ similar techniques (of comparing word lengths at a time) even for functions like strlen. Just to note, few other libcs take such an approach with strlen.

My understanding is that reading memory past the null character is undefined behavior.

7.21.1.1 p325 of the C standard:
Various methods are used for determining the lengths of the arrays, but in all cases a char * or void * argument points to the initial (lowest addressed) character of the array. If an array is accessed beyond the end of an object, the behavior is undefined.

These optimizations work for now on a particular implementation but the standard does not make any guarantees whatsoever.

As a matter of fact neither GCC nor Clang try to vectorize this code with optimization turned on.