This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
libc/
-
src/string/
-
string/
-
strlen.cpp
-
test/src/string/
-
src/
-
string/
-
strlen_test.cpp

Differential D77949

[libc] Add SIMD strlen implementation .
AbandonedPublic

Authored by tschuett on Apr 11 2020, 9:51 AM.

Download Raw Diff

Details

Reviewers

sivachandra
abrachet
PaulkaToast
ckennelly
hctim

Summary

It tries to improve the performance of strlen by processing multiple bytes at a time.

Diff Detail

Event Timeline

tschuett created this revision.Apr 11 2020, 9:51 AM

As a high level comment, this scheme is technically safe because no mmu that I know of can deal make fault pages on non word boundaries (of course they all do at page boundaries) so readying some extra bytes past the end of the string will never cause SIGBUS or SIGSEGV. However it is still undefined behavior and sanitizers will complain. If you build with LLVM_USE_SANITIZER=Address I'm pretty sure that FourBytes and TwoBytes will fail. And certainly __llvm_libc::strlen(__llvm_libc::strcpy(::malloc(6), "12345")); will get the sanitizers to complain.

I think it is worth discussing if we think it's valuable to turn off sanitizers for this function because we can conclude that it is safe to read past the string in the name of speed. I'd like to hear what the others have to say.

abrachet added subscribers: gchatelet, ckennelly.Apr 11 2020, 10:39 AM

For reference, this was also discussed in https://reviews.llvm.org/D77279.

Added SevenBytes and SixBytes

Some stylistic improvements.

AddressSanitizer is already unhappy about EmptyString.

Hey tschuett, thank you for this contribution! (:

This could be great optimization and I believe it might be worth exploring if this change can be made in clang.
It would be fantastic if this could be recognized by the compiler and optimized there rather then having the logic here.
Afaik one of the design goals of llvm-libc is to make the implementations as platform agnostic as possible, this means avoiding assembly and also letting the compiler handle optimizations from a platform to platform basis.

Just something to consider, I think it would be quite meaningful.

sivachandra edited the summary of this revision. (Show Details)Apr 13 2020, 12:36 PM

sivachandra added reviewers: ckennelly, hctim.

Added @hctim to guide us wrt ASAN.

In general, we want to use the compiler to avoid assembly code (in standalone .S files or via inline assembly). Since, this patch does not use either of them, it is OK. But, I will let @ckennelly answer Paula. Even if we can do something in Clang, I am not sure if we can do the same in other compilers.

In D77949#1976185, @abrachet wrote:

As a high level comment, this scheme is technically safe because no mmu that I know of can deal make fault pages on non word boundaries (of course they all do at page boundaries) so readying some extra bytes past the end of the string will never cause SIGBUS or SIGSEGV. However it is still undefined behavior and sanitizers will complain. If you build with LLVM_USE_SANITIZER=Address I'm pretty sure that FourBytes and TwoBytes will fail. And certainly __llvm_libc::strlen(__llvm_libc::strcpy(::malloc(6), "12345")); will get the sanitizers to complain.

I think it is worth discussing if we think it's valuable to turn off sanitizers for this function because we can conclude that it is safe to read past the string in the name of speed. I'd like to hear what the others have to say.

Sorry - I fail to see how this is safe. There are no guarantees about the alignment of the string, the OOB read could trivially cross a page boundary (and even then, I don't think we should be relying on UB).

In D77949#1978750, @hctim wrote:

In D77949#1976185, @abrachet wrote:

As a high level comment, this scheme is technically safe because no mmu that I know of can deal make fault pages on non word boundaries (of course they all do at page boundaries) so readying some extra bytes past the end of the string will never cause SIGBUS or SIGSEGV. However it is still undefined behavior and sanitizers will complain. If you build with LLVM_USE_SANITIZER=Address I'm pretty sure that FourBytes and TwoBytes will fail. And certainly __llvm_libc::strlen(__llvm_libc::strcpy(::malloc(6), "12345")); will get the sanitizers to complain.

I think it is worth discussing if we think it's valuable to turn off sanitizers for this function because we can conclude that it is safe to read past the string in the name of speed. I'd like to hear what the others have to say.

Sorry - I fail to see how this is safe. There are no guarantees about the alignment of the string, the OOB read could trivially cross a page boundary (and even then, I don't think we should be relying on UB).

This implementation doesn't cross a page boundary (assuming the string doesn't of course) because of this:

// align char_ptr to multiple of sizeof(uintptr_t)
for (charPtr = src; ((uintptr_t)charPtr & (sizeof(uintptr_t) - 1)) != 0;
       ++charPtr) {
    if (*charPtr == '\0')
      return charPtr - src;
  }

This doesn't die:

char *twoPages = (char *)mmap(nullptr, 0x2000, ...);
mprotect(twoPages + 0x1000, 0x1000, PROT_NONE);
std::strcpy(twoPages + 0x0FFA, "12345");
__llvm_libc::strlen(twoPages + 0x0FFA);

In D77949#1978924, @abrachet wrote:
In D77949#1978750, @hctim wrote:

In D77949#1976185, @abrachet wrote:

As a high level comment, this scheme is technically safe because no mmu that I know of can deal make fault pages on non word boundaries (of course they all do at page boundaries) so readying some extra bytes past the end of the string will never cause SIGBUS or SIGSEGV. However it is still undefined behavior and sanitizers will complain. If you build with LLVM_USE_SANITIZER=Address I'm pretty sure that FourBytes and TwoBytes will fail. And certainly __llvm_libc::strlen(__llvm_libc::strcpy(::malloc(6), "12345")); will get the sanitizers to complain.

I think it is worth discussing if we think it's valuable to turn off sanitizers for this function because we can conclude that it is safe to read past the string in the name of speed. I'd like to hear what the others have to say.

Sorry - I fail to see how this is safe. There are no guarantees about the alignment of the string, the OOB read could trivially cross a page boundary (and even then, I don't think we should be relying on UB).

This implementation doesn't cross a page boundary (assuming the string doesn't of course) because of this:
// align char_ptr to multiple of sizeof(uintptr_t)
for (charPtr = src; ((uintptr_t)charPtr & (sizeof(uintptr_t) - 1)) != 0;
       ++charPtr) {
    if (*charPtr == '\0')
      return charPtr - src;
  }
This doesn't die:
char *twoPages = (char *)mmap(nullptr, 0x2000, ...);
mprotect(twoPages + 0x1000, 0x1000, PROT_NONE);
std::strcpy(twoPages + 0x0FFA, "12345");
__llvm_libc::strlen(twoPages + 0x0FFA);

I agree that this implementation does not cross the page boundary unless the string itself crosses the page boundary. But, if I am reading @hctim right, he is probably saying that reading OOB is by itself UB. Even if I understood @hctim right, another aspect to note is that this implementation would read strictly less than word-size worth of bytes OOB. I do not know if sanitizers are sensitive to this. If they are, instead of a blanket disable of sanitizers, are there any finer grain knobs which say reading (wordsize - 1) bytes OOB is OK?

Thanks for the clarification on word boundary - I missed that part of the code.

ASan + HWASan are byte-precise, MSan is bit-precise. You would have to blacklist strlen from being sanitized and write a range check in the sanitizer (like the interceptors currently do).

I don't think relying on UB is a good idea. LLVM libc would immediately become incompatible with anything that has protection granularity less than sizeof(uintptr_t).

this scheme is technically safe because no mmu that I know of can deal make fault pages on non word boundaries

Sure - it might not seem to break x86 or amd64's MMUs, but we can't make the same guarantee of other RISC implementations or DMA systems. I feel like this would just bury the gremlins for someone to find out about later. I'm fairly sure it also could break Intel SGX (as enclave size can be 1, 2, or 4 bytes).

glibc is doing it in a similar way. It exhibits the same UB.
https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strlen.c;h=f934e1a1e897c0176c04566a182f092dc1ab2a37;hb=HEAD

I guess this is one of those times where we would have to give up correctness for speed. Quick question - do we have performance numbers that indicate this is faster? I did some quick runs on quick-bench and found it had improvements, but was surprised that the compiler didn't produce a repnz for the old strlen variant...

Bearing in mind to keep the LLVM libc goal here of being sanitizer-friendly, you'd have to implement the sanitizer behaviour yourself. It would basically be:

Foreach bounds-based sanitizer: -fsanitize=address, -fsanitize=bounds, -fsanitize=memory, -fsanitize=hwasan, -fsanitize=memtag:
- Mark the function as unsanitizeable: __attribute__(no_sanitize("address"))
- Guard the following code behind an if has_feature(address_sanitizer): (note that the interfaces will slightly differ depending on the sanitizer)
  1. Check the shadow before reading each word (__asan_region_is_poisoned(ptrPtr, sizeof(uintptr_t)))
  2. If the string still looks like it should continue, but the sanitizer reported that it's OOB, you should trap (__asan_report_error)
  3. If the string terminates, and the sanitizer reported that it's OOB, you should check that the string termination point is strictly less than the sanitizer OOB point (fallback to a single-byte slow path using __asan_address_is_poisoned)
  4. If the string terminates, and sanitizers don't report OOB, continue as usual.

Alternatively you could range-check based on the return value, which would probably be simpler (this is what we do with the interceptors currently, see sanitizer_common_interceptors.inc:364).

In D77949#1981766, @hctim wrote:

I guess this is one of those times where we would have to give up correctness for speed. Quick question - do we have performance numbers that indicate this is faster? I did some quick runs on quick-bench and found it had improvements, but was surprised that the compiler didn't produce a repnz for the old strlen variant...

Bearing in mind to keep the LLVM libc goal here of being sanitizer-friendly, you'd have to implement the sanitizer behaviour yourself. It would basically be:

Foreach bounds-based sanitizer: -fsanitize=address, -fsanitize=bounds, -fsanitize=memory, -fsanitize=hwasan, -fsanitize=memtag:

Mark the function as unsanitizeable: __attribute__(no_sanitize("address"))

Guard the following code behind an if has_feature(address_sanitizer): (note that the interfaces will slightly differ depending on the sanitizer)

Check the shadow before reading each word (__asan_region_is_poisoned(ptrPtr, sizeof(uintptr_t)))

If the string still looks like it should continue, but the sanitizer reported that it's OOB, you should trap (__asan_report_error)

If the string terminates, and the sanitizer reported that it's OOB, you should check that the string termination point is strictly less than the sanitizer OOB point (fallback to a single-byte slow path using __asan_address_is_poisoned)

If the string terminates, and sanitizers don't report OOB, continue as usual.

Alternatively you could range-check based on the return value, which would probably be simpler (this is what we do with the interceptors currently, see sanitizer_common_interceptors.inc:364).

We could also make LLVM_USE_SANITIZER define a macro so we know when if we are building with any sanitizers and take the slow path in that case. I think it is prohibitively complicated to try to hand write what the compiler can emit for us.

In D77949#1982189, @abrachet wrote:

We could also make LLVM_USE_SANITIZER define a macro so we know when if we are building with any sanitizers and take the slow path in that case. I think it is prohibitively complicated to try to hand write what the compiler can emit for us.

This also sounds reasonable to me. It might be worth creating a pattern of having a _sanitizeable variant and a "vanilla" variant, and one of the implementations is inlined at compile time. I don't imagine this is the last time this comes up.

In D77949#1982360, @hctim wrote:

In D77949#1982189, @abrachet wrote:

We could also make LLVM_USE_SANITIZER define a macro so we know when if we are building with any sanitizers and take the slow path in that case. I think it is prohibitively complicated to try to hand write what the compiler can emit for us.

This also sounds reasonable to me. It might be worth creating a pattern of having a _sanitizeable variant and a "vanilla" variant, and one of the implementations is inlined at compile time. I don't imagine this is the last time this comes up.

In general, we want to avoid having separate instrumented and non-instrumented versions.

In the case of this patch, I think there is only one line that needs special handling at strlen.cpp:51. I would think a static inline helper like safe_read_word can be implemented. This function can handle all the sanitizer specific logic as explained by @hctim.

@tschuett, I hope you are not getting the wrong signal here. We want this patch. We are only debating on ways to get it in :)

Also, composing what a compiler should ideally be doing might seem daunting. But again, I think for each such special case, one will find a clear cut point which needs to be handled specially.

No worries. I am only afraid that a future compiler detects the UB and starts to optimize.

In D77949#1982778, @tschuett wrote:

No worries. I am only afraid that a future compiler detects the UB and starts to optimize.

At least wrt strlen, I am not sure if compilers can detect the kind of UB we are discussing here. The strlen function gets a pointer. The compilers cannot assume anything about what that pointer is pointing to. Is there something else in your mind that I am missing?

Pattern matching? Most compilers can recognize matrix-matrix multiplication. It bit me once.
It might eventually learn these SIMD patterns.

In D77949#1983650, @tschuett wrote:

Pattern matching? Most compilers can recognize matrix-matrix multiplication. It bit me once.
It might eventually learn these SIMD patterns.

Wouldn't it have the same address sanitizer problem we are discussing here?

Anyway, to help further this discussion, I have prepared a patch over this patch: https://reviews.llvm.org/D78612

abrachet mentioned this in D78612: [libc] Add sanitizer instrumentation to the SIMD strlen implementation..Apr 22 2020, 1:24 AM

tschuett abandoned this revision.Dec 21 2020, 12:22 PM

Herald added a subscriber: ecnelises. · View Herald TranscriptDec 21 2020, 12:22 PM

Herald added a project: Restricted Project. · View Herald TranscriptDec 14 2022, 9:13 PM

Revision Contents

Path

Size

libc/

src/

string/

strlen.cpp

54 lines

test/

src/

string/

strlen_test.cpp

35 lines

Diff 256789

libc/src/string/strlen.cpp

	//===-- Implementation of strlen ------------------------------------------===//			//===-- Implementation of strlen ------------------------------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "src/string/strlen.h"			#include "src/string/strlen.h"

	#include "src/__support/common.h"			#include "src/__support/common.h"

				#include <stdint.h>

	namespace __llvm_libc {			namespace __llvm_libc {

	// TODO: investigate the performance of this function.			// For LLP64, i.e., Windows, pointers are 64 bit and long is only 32 bit
	// There might be potential for compiler optimization.			// Thus, this implementation uses uintptr_t instead of long

	size_t LLVM_LIBC_ENTRYPOINT(strlen)(const char *src) {			size_t LLVM_LIBC_ENTRYPOINT(strlen)(const char *src) {
				const char *charPtr;
				const uintptr_t *ptrPtr;
				uintptr_t highMask = 0x80808080L;
				uintptr_t lowMask = 0x01010101L;

				if (sizeof(uintptr_t) == 4) {
				// do nothing
				} else if (sizeof(uintptr_t) == 8) {
				// setup 64 bit masks
				highMask = ((highMask << 16) << 16) \| highMask;
				lowMask = ((lowMask << 16) << 16) \| lowMask;
				} else {
				// fallback to slow path
	const char *end = src;			const char *end = src;
	while (*end != '\0')			while (*end != '\0')
	++end;			++end;
	return end - src;			return end - src;
	}			}

				// align char_ptr to multiple of sizeof(uintptr_t)
				for (charPtr = src; ((uintptr_t)charPtr & (sizeof(uintptr_t) - 1)) != 0;
				++charPtr) {
				if (*charPtr == '\0')
				return charPtr - src;
				}

				ptrPtr = (const uintptr_t *)charPtr;

				// process sizeof(uintptr_t) bytes at a time
				for (;;) {
				uintptr_t value = *ptrPtr;

				if (((value - lowMask) & ~value & highMask) != 0) {
				// found a zero byte
				const char cp = (const char )ptrPtr;

				for (unsigned i = 0; i < sizeof(uintptr_t); ++i)
				if (cp[i] == 0)
				return cp - src + i;
				}

				ptrPtr++;
				}
				}

	} // namespace __llvm_libc			} // namespace __llvm_libc

libc/test/src/string/strlen_test.cpp

	Show All 16 Lines
	}			}

	TEST(StrLenTest, AnyString) {			TEST(StrLenTest, AnyString) {
	const char *any = "Hello World!";			const char *any = "Hello World!";

	size_t result = __llvm_libc::strlen(any);			size_t result = __llvm_libc::strlen(any);
	ASSERT_EQ((size_t)12, result);			ASSERT_EQ((size_t)12, result);
	}			}

				TEST(StrLenTest, EightBytes) {
				const char *eight = "1234567";

				size_t result = __llvm_libc::strlen(eight);
				ASSERT_EQ((size_t)7, result);
				}

				TEST(StrLenTest, SevenBytes) {
				const char *seven = "123456";

				size_t result = __llvm_libc::strlen(seven);
				ASSERT_EQ((size_t)6, result);
				}

				TEST(StrLenTest, SixBytes) {
				const char *six = "12345";

				size_t result = __llvm_libc::strlen(six);
				ASSERT_EQ((size_t)5, result);
				}

				TEST(StrLenTest, FourBytes) {
				const char *four = "123";

				size_t result = __llvm_libc::strlen(four);
				ASSERT_EQ((size_t)3, result);
				}

				TEST(StrLenTest, TwoBytes) {
				const char *two = "1";

				size_t result = __llvm_libc::strlen(two);
				ASSERT_EQ((size_t)1, result);
				}