This is an archive of the discontinued LLVM Phabricator instance.

libc/src/string/strcmp.cpp
15	Is there benefit in comparing words at a time? I know that it leads to potential OOB reads, so we can handle it separately. But if it can lead to speed up, just leave a TODO about it and we will get to it in a later round.
22	Use reinterpret cast instead of C style casts.
libc/test/src/string/strcmp_test.cpp
11	Where relevant in the following tests, can you add a check with the operands reversed?

Harbormaster failed remote builds in B60917: Diff 271880! Jun 18 2020, 6:03 PM

PaulkaToast added inline comments. Jun 18 2020, 6:06 PM

libc/src/string/strcmp.cpp
15	I would suggest opting for more descriptive variable names over single-letter ones, e.g. "left" instead of "l".

cgyurgyik added inline comments. Jun 18 2020, 6:12 PM

libc/src/string/strcmp.cpp
15	What is the standard for leaving TODOs? I'm not finding anything in the LLVM Coding Standards.

FWIW, this is a good candidate for fuzz testing

libc/src/string/strcmp.cpp
16–21	`for ( ; *l && *l == *r; ++l, ++r);` maybe
libc/src/string/strcmp.h
12	No need for this include
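
The loop form suggested in this comment can be sketched as a stand-alone function. This is only an illustration: the name `sketch_strcmp` and the helper `sketch_strcmp_examples` are hypothetical, and the patch itself defines `__llvm_libc::strcmp` through `LLVM_LIBC_ENTRYPOINT`.

```cpp
#include <cassert>

// Hypothetical stand-alone name, used here only for illustration.
int sketch_strcmp(const char *left, const char *right) {
  // Advance while both strings hold the same non-null byte.
  for (; *left && *left == *right; ++left, ++right)
    ;
  // Compare as unsigned char so that bytes above 0x7F order after ASCII.
  return *reinterpret_cast<const unsigned char *>(left) -
         *reinterpret_cast<const unsigned char *>(right);
}

// A few spot checks mirroring the unit tests in this revision.
void sketch_strcmp_examples() {
  assert(sketch_strcmp("abc", "abc") == 0);
  assert(sketch_strcmp("", "abc") == -97);      // '\0' - 'a'
  assert(sketch_strcmp("abCd", "abcd") == -32); // 'C' - 'c'
  assert(sketch_strcmp("\x80", "a") > 0);       // 0x80 compares as 128
}
```

The last check shows why the operands are read as `unsigned char`: with a signed `char`, the byte `0x80` would compare as negative.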

Add TODO for word comparison, add better variable names, add operands reversed to tests, removed unnecessary header.

cgyurgyik marked 4 inline comments as done. Jun 18 2020, 6:42 PM

Harbormaster failed remote builds in B60922: Diff 271892! Jun 18 2020, 7:39 PM

As @abrachet has said, you should consider adding a fuzz target as a follow up. Also, before committing, please update the commit message as follows:

[libc] Add strcmp implementation.

The [libc] prefix is important (and even I miss it sometimes.)
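
A follow-up fuzz target along the lines suggested here might look like the sketch below. It assumes the libFuzzer entry point `LLVMFuzzerTestOneInput`; the helper `strcmp_under_test` is a hypothetical stand-in for `__llvm_libc::strcmp`, and the host `std::strcmp` serves as the reference. Only the sign of the result is compared, since that is all the C standard specifies.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-in; a real target would call __llvm_libc::strcmp.
static int strcmp_under_test(const char *left, const char *right) {
  for (; *left && *left == *right; ++left, ++right)
    ;
  return *reinterpret_cast<const unsigned char *>(left) -
         *reinterpret_cast<const unsigned char *>(right);
}

static int sign(int value) { return (value > 0) - (value < 0); }

extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t *data,
                                      std::size_t size) {
  if (data == nullptr)
    return 0;
  // Split the input in two and copy each half into a std::string, so that
  // c_str() always yields a null-terminated buffer.
  const std::size_t half = size / 2;
  const char *bytes = reinterpret_cast<const char *>(data);
  const std::string first(bytes, half);
  const std::string second(bytes + half, size - half);

  const int got = strcmp_under_test(first.c_str(), second.c_str());
  const int want = std::strcmp(first.c_str(), second.c_str());
  if (sign(got) != sign(want))
    std::abort();
  // Swapping the operands should flip the sign.
  if (sign(strcmp_under_test(second.c_str(), first.c_str())) != -sign(got))
    std::abort();
  return 0;
}
```

Embedded null bytes in the fuzz input simply shorten the C strings seen by both implementations, so they need no special handling.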

This revision is now accepted and ready to land. Jun 19 2020, 9:36 AM

cgyurgyik retitled this revision from "Add strcmp." to "[libc] Add strcmp.". Jun 19 2020, 9:37 AM

Herald added subscribers: ecnelises, tschuett. Jun 19 2020, 9:37 AM

In D82134#2103965, @sivachandra wrote:
As @abrachet has said, you should consider adding a fuzz target as a follow up. Also, before committing, please update the commit message as follows:
[libc] Add strcmp implementation.
The [libc] prefix is important (and even I miss it sometimes.)

Acknowledged.

Submitted.

cheng.w mentioned this in D93009: [libc] Add memcmp implementation. Dec 10 2020, 12:40 AM

gchatelet added a subscriber: gchatelet. Dec 14 2020, 1:06 AM

gchatelet added inline comments.

libc/src/string/strcmp.cpp
15	> Is there benefit in comparing words at a time? I know that it leads to potential OOB reads, so we can handle it separately. But if it can lead to speed up, just leave a TODO about it and we will get to it in a later round.
Technically we have to read the buffers byte by byte, as we're not supposed to read past the `\0`. I'm not optimistic about the ability to accelerate this routine because, even if a page fault can only occur at page boundaries, some processors offer hardware pointer authentication, which can break reading buffers one cache line at a time.

The granularity of ARMv8.3 PAC is 16 bytes. Can you read an invalid address?

In D82134#2451435, @tschuett wrote:

The granularity of ARMv8.3 PAC is 16 bytes. Can you read an invalid address?

@tschuett I didn't dig into the address granularity, thank you for mentioning it.
The problem here is that we have two pointers to read from, which gives the following cases:

both pointers are aligned: we can use 16B loads,
both pointers are misaligned by the same amount: we can load the first few bytes up to the next 16B boundary and then load 16B at a time,
the pointers have unrelated alignment: not much we can do.

On top of this, the return on investment of the "align + load 16B chunks" strategy heavily depends on the size of the two strings, which we can't know in advance since they're null-terminated.
If on average strings are a few tens of bytes, the added complexity will never pay off.

This is different from memcmp which provides the size argument that we can use to decide the best strategy in advance.

So it's unclear whether the added complexity will yield any substantial benefit over a simple version that also uses less space in the L1 instruction cache.
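
The three alignment cases described in this comment can be sketched as a small classifier. This is a hypothetical illustration only: the names `LoadStrategy`, `kChunk`, `classify`, and `classify_examples` are not from the patch, and the code only classifies the pointers without performing any wide loads.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for the three cases discussed in the comment.
enum class LoadStrategy { Aligned16, AlignAfterPrologue, ByteByByte };

constexpr std::uintptr_t kChunk = 16; // assumed chunk size in bytes

LoadStrategy classify(const char *left, const char *right) {
  const std::uintptr_t left_offset =
      reinterpret_cast<std::uintptr_t>(left) % kChunk;
  const std::uintptr_t right_offset =
      reinterpret_cast<std::uintptr_t>(right) % kChunk;
  if (left_offset == 0 && right_offset == 0)
    return LoadStrategy::Aligned16; // both aligned: 16B loads from the start
  if (left_offset == right_offset)
    return LoadStrategy::AlignAfterPrologue; // same misalignment: bytes first
  return LoadStrategy::ByteByByte; // unrelated alignment
}

void classify_examples() {
  alignas(16) static char buffer[64] = {};
  assert(classify(buffer, buffer + 16) == LoadStrategy::Aligned16);
  assert(classify(buffer + 3, buffer + 19) == LoadStrategy::AlignAfterPrologue);
  assert(classify(buffer + 1, buffer + 2) == LoadStrategy::ByteByByte);
}
```

Even this classification adds a branch or two before the first byte is compared, which is part of the cost weighed against short strings in the comment.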

sivachandra added inline comments. Dec 14 2020, 7:35 AM

libc/src/string/strcmp.cpp
15	> > Is there benefit in comparing words at a time? I know that it leads to potential OOB reads, so we can handle it separately. But if it can lead to speed up, just leave a TODO about it and we will get to it in a later round.
> Technically we have to read the buffers byte by byte, as we're not supposed to read past the `\0`. I'm not optimistic about the ability to accelerate this routine because, even if a page fault can only occur at page boundaries, some processors offer hardware pointer authentication, which can break reading buffers one cache line at a time.
So, going by that, we cannot employ similar techniques (comparing a word at a time) even for functions like `strlen`. Just to note, a few other libcs take such an approach with `strlen`.

gchatelet added inline comments. Dec 14 2020, 8:19 AM

libc/src/string/strcmp.cpp
15	> So, going by that, we cannot employ similar techniques (comparing a word at a time) even for functions like `strlen`. Just to note, a few other libcs take such an approach with `strlen`.
My understanding is that reading memory past the null character is undefined behavior. 7.21.1.1 p325 of the C standard:
> Various methods are used for determining the lengths of the arrays, but in all cases a char * or void * argument points to the initial (lowest addressed) character of the array. If an array is accessed beyond the end of an object, the behavior is undefined.
These optimizations work for now on a particular implementation, but the standard does not make any guarantees whatsoever. As a matter of fact, neither GCC nor Clang tries to vectorize this code with optimization turned on.
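
For context, the word-at-a-time approach mentioned in this thread usually rests on a bit trick that tests a whole word for a zero byte. The sketch below shows only that test, on a value that has already been loaded; `has_zero_byte` and `has_zero_byte_examples` are hypothetical names, and the examples copy from arrays of exactly eight bytes so that no read goes past the end of an object.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns true if any of the eight bytes in `word` is zero.
bool has_zero_byte(std::uint64_t word) {
  constexpr std::uint64_t kOnes = 0x0101010101010101ULL;
  constexpr std::uint64_t kHighBits = 0x8080808080808080ULL;
  // Subtracting 1 from each byte sets its high bit only if the byte was zero
  // or already had the high bit set; `& ~word` filters out the latter.
  return ((word - kOnes) & ~word & kHighBits) != 0;
}

void has_zero_byte_examples() {
  std::uint64_t word = 0;
  // Exactly eight bytes each, so the copy stays inside the array.
  const char no_null[8] = {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'};
  std::memcpy(&word, no_null, sizeof(word));
  assert(!has_zero_byte(word));

  const char with_null[8] = {'a', 'b', 'c', '\0', 'e', 'f', 'g', 'h'};
  std::memcpy(&word, with_null, sizeof(word));
  assert(has_zero_byte(word));
}
```

The test itself is well defined; the concern raised in the comment is about the eight-byte load that precedes it when the string ends before the word does.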

Revision Contents

Path	Size
libc/src/string/strcmp.h	4 lines
libc/src/string/strcmp.cpp	14 lines
libc/test/src/string/strcmp_test.cpp	33 lines

Diff 271892

libc/src/string/strcmp.h

	//===-- Implementation header for strcmp ------------------------- C++ --===//			//===-- Implementation header for strcmp ------------------------- C++ --===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_LIBC_SRC_STRING_STRCMP_H			#ifndef LLVM_LIBC_SRC_STRING_STRCMP_H
	#define LLVM_LIBC_SRC_STRING_STRCMP_H			#define LLVM_LIBC_SRC_STRING_STRCMP_H

	#include "include/string.h"

	namespace __llvm_libc {			namespace __llvm_libc {

	int strcmp(const char *l, const char *r);			int strcmp(const char *left, const char *right);

	} // namespace __llvm_libc			} // namespace __llvm_libc

	#endif // LLVM_LIBC_SRC_STRING_STRCMP_H			#endif // LLVM_LIBC_SRC_STRING_STRCMP_H

libc/src/string/strcmp.cpp

	//===-- Implementation of strcmp ------------------------------------------===//			//===-- Implementation of strcmp ------------------------------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "src/string/strcmp.h"			#include "src/string/strcmp.h"

	#include "src/__support/common.h"			#include "src/__support/common.h"

	namespace __llvm_libc {			namespace __llvm_libc {

	int LLVM_LIBC_ENTRYPOINT(strcmp)(const char *l, const char *r) {			// TODO: Look at benefits for comparing words at a time.
	  while (*l) {			int LLVM_LIBC_ENTRYPOINT(strcmp)(const char *left, const char *right) {
	    if (*l != *r)			  for (; *left && *left == *right; ++left, ++right)
	      break;			    ;
	    ++l;			  return *reinterpret_cast<const unsigned char *>(left) -
	    ++r;			         *reinterpret_cast<const unsigned char *>(right);
	  }
	  return *(const unsigned char *)l - *(const unsigned char *)r;
	}			}
	} // namespace __llvm_libc			} // namespace __llvm_libc

libc/test/src/string/strcmp_test.cpp

	//===-- Unittests for strcmp ----------------------------------------------===//			//===-- Unittests for strcmp ----------------------------------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "src/string/strcmp.h"			#include "src/string/strcmp.h"
	#include "utils/UnitTest/Test.h"			#include "utils/UnitTest/Test.h"

	TEST(StrCmpTest, EmptyStringsShouldReturnZero) {			TEST(StrCmpTest, EmptyStringsShouldReturnZero) {
	const char *s1 = "";			const char *s1 = "";
	const char *s2 = "";			const char *s2 = "";
	const int result = __llvm_libc::strcmp(s1, s2);			int result = __llvm_libc::strcmp(s1, s2);
				ASSERT_EQ(result, 0);

				// Verify operands reversed.
				result = __llvm_libc::strcmp(s2, s1);
	ASSERT_EQ(result, 0);			ASSERT_EQ(result, 0);
	}			}

	TEST(StrCmpTest, EmptyStringShouldNotEqualNonEmptyString) {			TEST(StrCmpTest, EmptyStringShouldNotEqualNonEmptyString) {
	const char *empty = "";			const char *empty = "";
	const char *s2 = "abc";			const char *s2 = "abc";
	int result = __llvm_libc::strcmp(empty, s2);			int result = __llvm_libc::strcmp(empty, s2);
	// This should be '\0' - 'a' = -97			// This should be '\0' - 'a' = -97
	ASSERT_EQ(result, -97);			ASSERT_EQ(result, -97);

	// Similar case if empty string is second argument.			// Similar case if empty string is second argument.
	const char *s3 = "123";			const char *s3 = "123";
	result = __llvm_libc::strcmp(s3, empty);			result = __llvm_libc::strcmp(s3, empty);
	// This should be '1' - '\0' = 49			// This should be '1' - '\0' = 49
	ASSERT_EQ(result, 49);			ASSERT_EQ(result, 49);
	}			}

	TEST(StrCmpTest, EqualStringsShouldReturnZero) {			TEST(StrCmpTest, EqualStringsShouldReturnZero) {
	const char *s1 = "abc";			const char *s1 = "abc";
	const char *s2 = "abc";			const char *s2 = "abc";
	const int result = __llvm_libc::strcmp(s1, s2);			int result = __llvm_libc::strcmp(s1, s2);
				ASSERT_EQ(result, 0);

				// Verify operands reversed.
				result = __llvm_libc::strcmp(s2, s1);
	ASSERT_EQ(result, 0);			ASSERT_EQ(result, 0);
	}			}

	TEST(StrCmpTest, ShouldReturnResultOfFirstDifference) {			TEST(StrCmpTest, ShouldReturnResultOfFirstDifference) {
	const char *s1 = "___B42__";			const char *s1 = "___B42__";
	const char *s2 = "___C55__";			const char *s2 = "___C55__";
	const int result = __llvm_libc::strcmp(s1, s2);			int result = __llvm_libc::strcmp(s1, s2);
	// This should return 'B' - 'C' = -1.			// This should return 'B' - 'C' = -1.
	ASSERT_EQ(result, -1);			ASSERT_EQ(result, -1);

				// Verify operands reversed.
				result = __llvm_libc::strcmp(s2, s1);
				// This should return 'C' - 'B' = 1.
				ASSERT_EQ(result, 1);
	}			}

	TEST(StrCmpTest, CapitalizedLetterShouldNotBeEqual) {			TEST(StrCmpTest, CapitalizedLetterShouldNotBeEqual) {
	const char *s1 = "abcd";			const char *s1 = "abcd";
	const char *s2 = "abCd";			const char *s2 = "abCd";
	const int result = __llvm_libc::strcmp(s1, s2);			int result = __llvm_libc::strcmp(s1, s2);
	// 'c' - 'C' = 32.			// 'c' - 'C' = 32.
	ASSERT_EQ(result, 32);			ASSERT_EQ(result, 32);

				// Verify operands reversed.
				result = __llvm_libc::strcmp(s2, s1);
				// 'C' - 'c' = -32.
				ASSERT_EQ(result, -32);
	}			}

	TEST(StrCmpTest, UnequalLengthStringsShouldNotReturnZero) {			TEST(StrCmpTest, UnequalLengthStringsShouldNotReturnZero) {
	const char *s1 = "abc";			const char *s1 = "abc";
	const char *s2 = "abcd";			const char *s2 = "abcd";
	const int result = __llvm_libc::strcmp(s1, s2);			int result = __llvm_libc::strcmp(s1, s2);
	// '\0' - 'd' = -100.			// '\0' - 'd' = -100.
	ASSERT_EQ(result, -100);			ASSERT_EQ(result, -100);

				// Verify operands reversed.
				result = __llvm_libc::strcmp(s2, s1);
				// 'd' - '\0' = 100.
				ASSERT_EQ(result, 100);
	}			}

	TEST(StrCmpTest, StringArgumentSwapChangesSign) {			TEST(StrCmpTest, StringArgumentSwapChangesSign) {
	const char *a = "a";			const char *a = "a";
	const char *b = "b";			const char *b = "b";
	int result = __llvm_libc::strcmp(b, a);			int result = __llvm_libc::strcmp(b, a);
	// 'b' - 'a' = 1.			// 'b' - 'a' = 1.
	ASSERT_EQ(result, 1);			ASSERT_EQ(result, 1);

	result = __llvm_libc::strcmp(a, b);			result = __llvm_libc::strcmp(a, b);
	// 'a' - 'b' = -1.			// 'a' - 'b' = -1.
	ASSERT_EQ(result, -1);			ASSERT_EQ(result, -1);
	}			}
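
The tests above assert exact differences such as -97 and 32, which match this byte-subtraction implementation. The C standard only specifies the sign of the result, so a test meant to hold for any conforming `strcmp` could normalize the result first. The sketch below uses the host `std::strcmp` and the hypothetical helpers `sign_of` and `sign_of_examples`; it is not part of the revision.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: maps any int to -1, 0, or 1.
int sign_of(int value) { return (value > 0) - (value < 0); }

void sign_of_examples() {
  // Only the sign is checked, so these hold for any conforming strcmp.
  assert(sign_of(std::strcmp("abc", "abcd")) == -1);
  assert(sign_of(std::strcmp("abcd", "abc")) == 1);
  assert(sign_of(std::strcmp("abc", "abc")) == 0);
  assert(sign_of(std::strcmp("abCd", "abcd")) == -1);
}
```

Asserting exact values, as the revision does, additionally pins down the difference-of-bytes behavior of this particular implementation.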

[libc] Add strcmp implementation. (Closed, Public)
