This is an archive of the discontinued LLVM Phabricator instance.

[libc] Add SIMD strlen implementation .
AbandonedPublic

Authored by tschuett on Apr 11 2020, 9:51 AM.

Details

Summary

It tries to improve the performance of strlen by processing multiple bytes at a time.

Diff Detail

Event Timeline

tschuett created this revision.Apr 11 2020, 9:51 AM

As a high level comment, this scheme is technically safe because no mmu that I know of can deal make fault pages on non word boundaries (of course they all do at page boundaries) so readying some extra bytes past the end of the string will never cause SIGBUS or SIGSEGV. However it is still undefined behavior and sanitizers will complain. If you build with LLVM_USE_SANITIZER=Address I'm pretty sure that FourBytes and TwoBytes will fail. And certainly __llvm_libc::strlen(__llvm_libc::strcpy(::malloc(6), "12345")); will get the sanitizers to complain.

I think it is worth discussing if we think it's valuable to turn off sanitizers for this function because we can conclude that it is safe to read past the string in the name of speed. I'd like to hear what the others have to say.

For reference, this was also discussed in https://reviews.llvm.org/D77279.

tschuett updated this revision to Diff 256786.Apr 11 2020, 10:55 AM

Added SevenBytes and SixBytes

tschuett updated this revision to Diff 256789.Apr 11 2020, 11:30 AM

Some stylistic improvements.

AddressSanitizer is already unhappy about EmptyString.

PaulkaToast added a comment.EditedApr 13 2020, 12:27 AM

Hey tschuett, thank you for this contribution! (:

This could be great optimization and I believe it might be worth exploring if this change can be made in clang.
It would be fantastic if this could be recognized by the compiler and optimized there rather then having the logic here.
Afaik one of the design goals of llvm-libc is to make the implementations as platform agnostic as possible, this means avoiding assembly and also letting the compiler handle optimizations from a platform to platform basis.

Just something to consider, I think it would be quite meaningful.

sivachandra edited the summary of this revision. (Show Details)Apr 13 2020, 12:36 PM
sivachandra added reviewers: ckennelly, hctim.

Added @hctim to guide us wrt ASAN.

In general, we want to use the compiler to avoid assembly code (in standalone .S files or via inline assembly). Since, this patch does not use either of them, it is OK. But, I will let @ckennelly answer Paula. Even if we can do something in Clang, I am not sure if we can do the same in other compilers.

hctim added a comment.Apr 13 2020, 1:12 PM

As a high level comment, this scheme is technically safe because no mmu that I know of can deal make fault pages on non word boundaries (of course they all do at page boundaries) so readying some extra bytes past the end of the string will never cause SIGBUS or SIGSEGV. However it is still undefined behavior and sanitizers will complain. If you build with LLVM_USE_SANITIZER=Address I'm pretty sure that FourBytes and TwoBytes will fail. And certainly __llvm_libc::strlen(__llvm_libc::strcpy(::malloc(6), "12345")); will get the sanitizers to complain.

I think it is worth discussing if we think it's valuable to turn off sanitizers for this function because we can conclude that it is safe to read past the string in the name of speed. I'd like to hear what the others have to say.

Sorry - I fail to see how this is safe. There are no guarantees about the alignment of the string, the OOB read could trivially cross a page boundary (and even then, I don't think we should be relying on UB).

As a high level comment, this scheme is technically safe because no mmu that I know of can deal make fault pages on non word boundaries (of course they all do at page boundaries) so readying some extra bytes past the end of the string will never cause SIGBUS or SIGSEGV. However it is still undefined behavior and sanitizers will complain. If you build with LLVM_USE_SANITIZER=Address I'm pretty sure that FourBytes and TwoBytes will fail. And certainly __llvm_libc::strlen(__llvm_libc::strcpy(::malloc(6), "12345")); will get the sanitizers to complain.

I think it is worth discussing if we think it's valuable to turn off sanitizers for this function because we can conclude that it is safe to read past the string in the name of speed. I'd like to hear what the others have to say.

Sorry - I fail to see how this is safe. There are no guarantees about the alignment of the string, the OOB read could trivially cross a page boundary (and even then, I don't think we should be relying on UB).

This implementation doesn't cross a page boundary (assuming the string doesn't of course) because of this:

// align char_ptr to multiple of sizeof(uintptr_t)
for (charPtr = src; ((uintptr_t)charPtr & (sizeof(uintptr_t) - 1)) != 0;
       ++charPtr) {
    if (*charPtr == '\0')
      return charPtr - src;
  }

This doesn't die:

char *twoPages = (char *)mmap(nullptr, 0x2000, ...);
mprotect(twoPages + 0x1000, 0x1000, PROT_NONE);
std::strcpy(twoPages + 0x0FFA, "12345");
__llvm_libc::strlen(twoPages + 0x0FFA);

As a high level comment, this scheme is technically safe because no mmu that I know of can deal make fault pages on non word boundaries (of course they all do at page boundaries) so readying some extra bytes past the end of the string will never cause SIGBUS or SIGSEGV. However it is still undefined behavior and sanitizers will complain. If you build with LLVM_USE_SANITIZER=Address I'm pretty sure that FourBytes and TwoBytes will fail. And certainly __llvm_libc::strlen(__llvm_libc::strcpy(::malloc(6), "12345")); will get the sanitizers to complain.

I think it is worth discussing if we think it's valuable to turn off sanitizers for this function because we can conclude that it is safe to read past the string in the name of speed. I'd like to hear what the others have to say.

Sorry - I fail to see how this is safe. There are no guarantees about the alignment of the string, the OOB read could trivially cross a page boundary (and even then, I don't think we should be relying on UB).

This implementation doesn't cross a page boundary (assuming the string doesn't of course) because of this:

// align char_ptr to multiple of sizeof(uintptr_t)
for (charPtr = src; ((uintptr_t)charPtr & (sizeof(uintptr_t) - 1)) != 0;
       ++charPtr) {
    if (*charPtr == '\0')
      return charPtr - src;
  }

This doesn't die:

char *twoPages = (char *)mmap(nullptr, 0x2000, ...);
mprotect(twoPages + 0x1000, 0x1000, PROT_NONE);
std::strcpy(twoPages + 0x0FFA, "12345");
__llvm_libc::strlen(twoPages + 0x0FFA);

I agree that this implementation does not cross the page boundary unless the string itself crosses the page boundary. But, if I am reading @hctim right, he is probably saying that reading OOB is by itself UB. Even if I understood @hctim right, another aspect to note is that this implementation would read strictly less than word-size worth of bytes OOB. I do not know if sanitizers are sensitive to this. If they are, instead of a blanket disable of sanitizers, are there any finer grain knobs which say reading (wordsize - 1) bytes OOB is OK?

hctim added a comment.Apr 13 2020, 4:11 PM

Thanks for the clarification on word boundary - I missed that part of the code.

ASan + HWASan are byte-precise, MSan is bit-precise. You would have to blacklist strlen from being sanitized and write a range check in the sanitizer (like the interceptors currently do).

I don't think relying on UB is a good idea. LLVM libc would immediately become incompatible with anything that has protection granularity less than sizeof(uintptr_t).

this scheme is technically safe because no mmu that I know of can deal make fault pages on non word boundaries

Sure - it might not seem to break x86 or amd64's MMUs, but we can't make the same guarantee of other RISC implementations or DMA systems. I feel like this would just bury the gremlins for someone to find out about later. I'm fairly sure it also could break Intel SGX (as enclave size can be 1, 2, or 4 bytes).

hctim added a comment.Apr 14 2020, 1:31 PM

I guess this is one of those times where we would have to give up correctness for speed. Quick question - do we have performance numbers that indicate this is faster? I did some quick runs on quick-bench and found it had improvements, but was surprised that the compiler didn't produce a repnz for the old strlen variant...

Bearing in mind to keep the LLVM libc goal here of being sanitizer-friendly, you'd have to implement the sanitizer behaviour yourself. It would basically be:

  1. Foreach bounds-based sanitizer: -fsanitize=address, -fsanitize=bounds, -fsanitize=memory, -fsanitize=hwasan, -fsanitize=memtag:
    • Mark the function as unsanitizeable: __attribute__(no_sanitize("address"))
    • Guard the following code behind an if has_feature(address_sanitizer): (note that the interfaces will slightly differ depending on the sanitizer)
      1. Check the shadow before reading each word (__asan_region_is_poisoned(ptrPtr, sizeof(uintptr_t)))
      2. If the string still looks like it should continue, but the sanitizer reported that it's OOB, you should trap (__asan_report_error)
      3. If the string terminates, and the sanitizer reported that it's OOB, you should check that the string termination point is strictly less than the sanitizer OOB point (fallback to a single-byte slow path using __asan_address_is_poisoned)
      4. If the string terminates, and sanitizers don't report OOB, continue as usual.

Alternatively you could range-check based on the return value, which would probably be simpler (this is what we do with the interceptors currently, see sanitizer_common_interceptors.inc:364).

I guess this is one of those times where we would have to give up correctness for speed. Quick question - do we have performance numbers that indicate this is faster? I did some quick runs on quick-bench and found it had improvements, but was surprised that the compiler didn't produce a repnz for the old strlen variant...

Bearing in mind to keep the LLVM libc goal here of being sanitizer-friendly, you'd have to implement the sanitizer behaviour yourself. It would basically be:

  1. Foreach bounds-based sanitizer: -fsanitize=address, -fsanitize=bounds, -fsanitize=memory, -fsanitize=hwasan, -fsanitize=memtag:
    • Mark the function as unsanitizeable: __attribute__(no_sanitize("address"))
    • Guard the following code behind an if has_feature(address_sanitizer): (note that the interfaces will slightly differ depending on the sanitizer)
      1. Check the shadow before reading each word (__asan_region_is_poisoned(ptrPtr, sizeof(uintptr_t)))
      2. If the string still looks like it should continue, but the sanitizer reported that it's OOB, you should trap (__asan_report_error)
      3. If the string terminates, and the sanitizer reported that it's OOB, you should check that the string termination point is strictly less than the sanitizer OOB point (fallback to a single-byte slow path using __asan_address_is_poisoned)
      4. If the string terminates, and sanitizers don't report OOB, continue as usual.

Alternatively you could range-check based on the return value, which would probably be simpler (this is what we do with the interceptors currently, see sanitizer_common_interceptors.inc:364).

We could also make LLVM_USE_SANITIZER define a macro so we know when if we are building with any sanitizers and take the slow path in that case. I think it is prohibitively complicated to try to hand write what the compiler can emit for us.

hctim added a comment.Apr 14 2020, 4:43 PM

We could also make LLVM_USE_SANITIZER define a macro so we know when if we are building with any sanitizers and take the slow path in that case. I think it is prohibitively complicated to try to hand write what the compiler can emit for us.

This also sounds reasonable to me. It might be worth creating a pattern of having a _sanitizeable variant and a "vanilla" variant, and one of the implementations is inlined at compile time. I don't imagine this is the last time this comes up.

We could also make LLVM_USE_SANITIZER define a macro so we know when if we are building with any sanitizers and take the slow path in that case. I think it is prohibitively complicated to try to hand write what the compiler can emit for us.

This also sounds reasonable to me. It might be worth creating a pattern of having a _sanitizeable variant and a "vanilla" variant, and one of the implementations is inlined at compile time. I don't imagine this is the last time this comes up.

In general, we want to avoid having separate instrumented and non-instrumented versions.

In the case of this patch, I think there is only one line that needs special handling at strlen.cpp:51. I would think a static inline helper like safe_read_word can be implemented. This function can handle all the sanitizer specific logic as explained by @hctim.

@tschuett, I hope you are not getting the wrong signal here. We want this patch. We are only debating on ways to get it in :)

Also, composing what a compiler should ideally be doing might seem daunting. But again, I think for each such special case, one will find a clear cut point which needs to be handled specially.

No worries. I am only afraid that a future compiler detects the UB and starts to optimize.

No worries. I am only afraid that a future compiler detects the UB and starts to optimize.

At least wrt strlen, I am not sure if compilers can detect the kind of UB we are discussing here. The strlen function gets a pointer. The compilers cannot assume anything about what that pointer is pointing to. Is there something else in your mind that I am missing?

Pattern matching? Most compilers can recognize matrix-matrix multiplication. It bit me once.
It might eventually learn these SIMD patterns.

Pattern matching? Most compilers can recognize matrix-matrix multiplication. It bit me once.
It might eventually learn these SIMD patterns.

Wouldn't it have the same address sanitizer problem we are discussing here?

Anyway, to help further this discussion, I have prepared a patch over this patch: https://reviews.llvm.org/D78612

tschuett abandoned this revision.Dec 21 2020, 12:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 14 2022, 9:13 PM