This is an archive of the discontinued LLVM Phabricator instance.

I do not think this patch is ready. But, I am sending this out for review by @hctim to get his feedback about the hand-written sanitizer instrumentation in strlen.cpp and the corresponding testing in strlen_test.cpp.

Harbormaster failed remote builds in B54195: Diff 259166!Apr 21 2020, 10:40 PM

I'm not sure if this discussion belongs here or D77949, but as this complicates that patch even more I will do so here.

I really wonder if the added complexity is worth the speedup we are hoping to achieve. Firstly even if this implements ASan which is by far the most common and ubiquitously useful, it makes sense to implement it here, we also have fuzz tests set up which rely on -fsanitize=fuzzer. That won't be implemented here. The whole appeal of sanitizers is that nothing needs to change in our code, but it's not true here no matter what we do if we rely on UB.

This patch is well done and it's very clever and easy to understand way to implement by hand what -fsanitize=address is doing. But it is undeniably making this much less maintainable. Is the cost we are paying in maintainability worth the speedup we are hoping to get using this implementation? If this patch is only for strlen and not an example for how we will handle all such cases which rely on UB for clever tricks, then I would say it is not worth it.

FWIW all of that would be moot if this was performance critical, but I'm not convinced that strlen is a particularly hot function, especially given its use outside of C is very limited. Even in C++ it would be used much less frequently and for programs written in other languages, that would be even less than that is my guess. Even if it were a hot function, how large does the string need to be before the implementation in D77949 is better than what we currently have?

In D78612#1996257, @abrachet wrote:

I'm not sure if this discussion belongs here or D77949, but as this complicates that patch even more I will do so here.

I really wonder if the added complexity is worth the speedup we are hoping to achieve. Firstly even if this implements ASan which is by far the most common and ubiquitously useful, it makes sense to implement it here, we also have fuzz tests set up which rely on -fsanitize=fuzzer. That won't be implemented here. The whole appeal of sanitizers is that nothing needs to change in our code, but it's not true here no matter what we do if we rely on UB.

This patch is well done and it's very clever and easy to understand way to implement by hand what -fsanitize=address is doing. But it is undeniably making this much less maintainable. Is the cost we are paying in maintainability worth the speedup we are hoping to get using this implementation? If this patch is only for strlen and not an example for how we will handle all such cases which rely on UB for clever tricks, then I would say it is not worth it.

Yes, the idea is to use strlen as the first example. The approach we want to take with LLVM libc is to hand craft the instrumentation vs using interceptors as is done for other libcs. So, we want to start learning about such hand-crafted instrumentation using this "simple" example.

About other sanitizers, that is a very valid question. I want @hctim or other sanitizer people to tell us more. FWIW, I heard of informal interest from glibc developers also about using sanitizer instrumentation. Currently, each sanitizer has its own interface API. I would really prefer that all of them are unified in some manner to make them simpler to use. So, my personal view is that we might be heading towards a more generalized/standardized sanitizer API.

FWIW all of that would be moot if this was performance critical, but I'm not convinced that strlen is a particularly hot function, especially given its use outside of C is very limited. Even in C++ it would be used much less frequently and for programs written in other languages, that would be even less than that is my guess. Even if it were a hot function, how large does the string need to be before the implementation in D77949 is better than what we currently have?

I do not have any benchmarks to share or point to wrt strlen particularly. I have done some asking around, and the feedback I received is that we should match at least glibc's performance.

Also adding @phosek who can share his view/opinion about this.

I personally think this is too clever. The whole templating thing seems like a good overall QOL improvement - but the whole unsanitized safe_check_word that we have to manually sanitize seems challenging to maintain. You would need to do sanitizer instrumentation manually for ASan, MSan, UBSan, HWASan, MTE, DFSan, and any other fun toys we come up with. Each of these has a different interface, and different quirks. Why not just let the compiler do these quirks?

Instead, can't we just use the slow path always under sanitizers? i.e.:

#if __has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__)
  static constexpr bool word_optimized = false;
#else // ASan
  static constexpr bool word_optimized = Mask<sizeof(uintptr_t)>::opt;
#endif

Then, the whole dancing-around-managing sanitizers is solved as the compiler just does normal instrumentation of the slow path.

libc/test/src/string/strlen_test.cpp
70	Wouldn't this be simpler just as `const char *hello = "hello"`? We definitely shouldn't be manually doing asan-ish things in the test.

hctim added subscribers: eugenis, pcc, kcc.Apr 22 2020, 10:15 AM

In D78612#1997305, @hctim wrote:

I personally think this is too clever. The whole templating thing seems like a good overall QOL improvement - but the whole unsanitized safe_check_word that we have to manually sanitize seems challenging to maintain. You would need to do sanitizer instrumentation manually for ASan, MSan, UBSan, HWASan, MTE, DFSan, and any other fun toys we come up with. Each of these has a different interface, and different quirks. Why not just let the compiler do these quirks?

Sorry, the templating thing is probably a distraction at this point.

With respect to the sanitizer parts: In this patch, strlen is an example. With the present sanitizer API, wouldn't hand-crafting instrumentation mean that we have to manually deal with multiple sanitizers in general anyway?

Instead, can't we just use the slow path always under sanitizers? i.e.:
#if __has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__)
  static constexpr bool word_optimized = false;
#else // ASan
  static constexpr bool word_optimized = Mask<sizeof(uintptr_t)>::opt;
#endif
Then, the whole dancing-around-managing sanitizers is solved as the compiler just does normal instrumentation of the slow path.

My view is that this just defeats the purpose of sanitizing. Also, we will be setting an example which says, "it is OK to keep fast paths unsanitized."

There is an alternate approach we can take here: We can keep the safe_word_check function unsanitized without any instrumentation. Since the main function is sanitized, it will catch bad memory reads as it does the reading again anyway. This is probably a nice compromise, but I would like to hear about the spirit of such an approach: we are not sanitizing one function at all.

libc/test/src/string/strlen_test.cpp
70	Other tests do the equivalent. The literal string ends up in the data section which is properly aligned. So, will not trigger an ASAN error as the word read is still within the data section?

I am starting to believe that it is better to stay with the slow and safe strlen implementation. Avoid the sanitizer dancing. Spend more energy on adding features. So that people start using this libc. Then you can use a profiler to find the hot functions.

In D78612#1997367, @tschuett wrote:

I am starting to believe that it is better to stay with the slow and safe strlen implementation. Avoid the sanitizer dancing. Spend more energy on adding features. So that people starting using this libc. Then you can use a profiler to find the hot functions.

Making LLVM libc sanitizer friendly is a major goal. So, sorting out sanitizer issues here is not a waste but actually contributing to a feature of LLVM libc.

Sorry, your patch had to be hijacked. But, I felt strlen was simple enough to understand and set an example with.

Sorry, my mistake. With "avoid the sanitizer dancing" I meant that the compiler should do the instrumentation instead of doing it manually, which is error-prone.

In D78612#1997364, @sivachandra wrote:
In D78612#1997305, @hctim wrote:

I personally think this is too clever. The whole templating thing seems like a good overall QOL improvement - but the whole unsanitized safe_check_word that we have to manually sanitize seems challenging to maintain. You would need to do sanitizer instrumentation manually for ASan, MSan, UBSan, HWASan, MTE, DFSan, and any other fun toys we come up with. Each of these has a different interface, and different quirks. Why not just let the compiler do these quirks?

Sorry, the templating thing is probably a distraction at this point.

With respect to the sanitizer parts: In this patch, strlen is an example. With the present sanitizer API, wouldn't hand-crafting instrumentation mean that we have to manually deal with multiple sanitizers in general anyway?
Instead, can't we just use the slow path always under sanitizers? i.e.:
#if __has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__)
  static constexpr bool word_optimized = false;
#else // ASan
  static constexpr bool word_optimized = Mask<sizeof(uintptr_t)>::opt;
#endif
Then, the whole dancing-around-managing sanitizers is solved as the compiler just does normal instrumentation of the slow path.
My view is that this just defeats the purpose of sanitizing. Also, we will be setting an example which says, "it is OK to keep fast paths unsanitized."

I agree, sanitizers are useful for detecting problems in programs using libc and detecting problems with the libc itself. It's really the "fast path" where we want the second kind, because it's more complicated and harder to see if it's safe. @sivachandra is right, and thats not an example I would want to set here.

There is an alternate approach we can take here: We can keep the safe_word_check function unsanitized without any instrumentation. Since the main function is sanitized, it will catch bad memory reads as it does the reading again anyway. This is probably a nice compromise, but I would like to hear about the spirit of such an approach: we are not sanitizing one function at all.

This makes a lot of sense, I am in strong favor of this approach. We keep both the performance of D77949 and we achieve the goal of easy sanitization. Theres also a lot of merit in pulling out the unsafe bits of a function and explicitly marking them as such, it documents we understand it is UB and is a good starting point if there are ever any bugs.

In D78612#1997446, @abrachet wrote:

In D78612#1997364, @sivachandra wrote:

There is an alternate approach we can take here: We can keep the safe_word_check function unsanitized without any instrumentation. Since the main function is sanitized, it will catch bad memory reads as it does the reading again anyway. This is probably a nice compromise, but I would like to hear about the spirit of such an approach: we are not sanitizing one function at all.

This makes a lot of sense, I am in strong favor of this approach. We keep both the performance of D77949 and we achieve the goal of easy sanitization. Theres also a lot of merit in pulling out the unsafe bits of a function and explicitly marking them as such, it documents we understand it is UB and is a good starting point if there are ever any bugs.

For completeness, I would like to point out that it will not catch bad strings. But, I am not sure if it is the job of strlen to catch bad strings. If we ensure that the implementation is safe for good strings, we are probably good.

Do what ever you want, but I would go with the slow and safe strlen implementation until somebody proves that strlen is a performance issue.

My view is that this just defeats the purpose of sanitizing. Also, we will be setting an example which says, "it is OK to keep fast paths unsanitized."

I agree, sanitizers are useful for detecting problems in programs using libc and detecting problems with the libc itself. It's really the "fast path" where we want the second kind, because it's more complicated and harder to see if it's safe. @sivachandra is right, and thats not an example I would want to set here.

From a libc-user's perspective - I want to know if my call to libc was buggy. How that function is implemented is not really important (i.e. whether I get one implementation or another) - and this is how the interceptors work currently with glibc. I don't expect that a sanitizer-friendly implementation would be more than 2x slower than a "fast variant".

From a libc-developer's perspective - I completely understand that having a sanitizer-friendly implementation and a sanitizer-unfriendly implementation is undesirable. The tradeoff here is an enormous maintenance burden. By emulating ASan's behaviour here - it's creating the expectation to emulate all of MSan, TSan, HWASan, UBSan, MTE and DFSan's behaviours as well. This turns strlen() into hundreds of lines of sanitizer-specific behaviour and tricks.

There is an alternate approach we can take here: We can keep the safe_word_check function unsanitized without any instrumentation. Since the main function is sanitized, it will catch bad memory reads as it does the reading again anyway. This is probably a nice compromise, but I would like to hear about the spirit of such an approach: we are not sanitizing one function at all.

This makes a lot of sense, I am in strong favor of this approach. We keep both the performance of D77949 and we achieve the goal of easy sanitization. Theres also a lot of merit in pulling out the unsafe bits of a function and explicitly marking them as such, it documents we understand it is UB and is a good starting point if there are ever any bugs.

This approach works well here. Bearing in mind that it doesn't work trivially for anything that doesn't "double back" on the sanitizer-friendly version. Think SIMD strcpy() + MSan.

libc/test/src/string/strlen_test.cpp
70	ASan doesn't care about alignment. It stores a description of the real size of the global - which is caught by partially addressable dwords.

Calling __asan_region_is_poisoned on every word will have very poor performance under ASan. Not sanitizing that entire function would allow strlen to "hop over" inaccessible regions as long as the (alleged) end of the string is inside a valid allocation.

IMHO, running slow path under asan is not that bad, on the balance. The fast path is simple enough to verify by code inspection. Of course, we should only do this for truly simple and performance-critical functions.

Another option is to run the fast path unsanitized, and then check that the reported string bytes are actually accessible according to ASan. That's what the interceptors do. It is possible that the fast path escapes, hits an unmapped page and dies, but we have not seen in practice at all.

In D78612#1997532, @hctim wrote:

From a libc-developer's perspective - I completely understand that having a sanitizer-friendly implementation and a sanitizer-unfriendly implementation is undesirable. The tradeoff here is an enormous maintenance burden. By emulating ASan's behaviour here - it's creating the expectation to emulate all of MSan, TSan, HWASan, UBSan, MTE and DFSan's behaviours as well. This turns strlen() into hundreds of lines of sanitizer-specific behaviour and tricks.

Totally agree that emulating every sanitizer's behavior is not the approach we should be taking. This is probably dumb, but like I have mentioned previously, is it possible to unify/simplify the API of the various sanitizers?

Since everyone from sanitizers is subscribed on this change, let me ask few general questions:

Should LLVM libc have code which relies on, for example, the undefined behavior like in the implementation of strlen in the change?
Should LLVM libc have hand-crafted sanitizer instrumentation at all? If yes, what should be the criteria to determine whether it is required?

Should LLVM libc have code which relies on, for example, the undefined behavior like in the implementation of strlen in the change?

IMHO, yes, if you can show meaningful performance improvement on non-synthetic benchmarks.

Should LLVM libc have hand-crafted sanitizer instrumentation at all? If yes, what should be the criteria to determine whether it is required?

If the answer to the previous question is yes, then it is pretty much required.
Even w/o UB, sanitizers may need source-level help. For example, TSan may need happens-before annotations around atomic-based (or external to the process) synchronization in some cases for correctness.

About unifiying the sanitizer API. Do you have any concrete suggestions?
Sanitizers are generally interested in different code properties. For example this strlen algorithm would be an issue for addressability checkers only - asan and hwasan.
TSan is entirely different and would be interested in synchronization events but not much else.

As a LibreOffice committer, strlen and friends are definitely hot :-)

strlen was the in the top 4 hottest libc routines in my statically linked image of clang when building the Linux kernel.

Performance numbers from @michaelrj here: https://quick-bench.com/q/Z7wq6I3K6xUIeAIogIpHpPgBwSw

Herald added a project: Restricted Project. · View Herald TranscriptDec 14 2022, 9:14 PM

Herald added a subscriber: ecnelises. · View Herald Transcript

Revision Contents

Path

Size

libc/

src/

string/

CMakeLists.txt

2 lines

strlen.cpp

71 lines

test/

src/

string/

CMakeLists.txt

1 line

strlen_test.cpp

36 lines

Diff 259164

libc/src/string/CMakeLists.txt

	Show All 25 Lines

	add_entrypoint_object(			add_entrypoint_object(
	strlen			strlen
	SRCS			SRCS
	strlen.cpp			strlen.cpp
	HDRS			HDRS
	strlen.h			strlen.h
	DEPENDS			DEPENDS
				libc.config.${LIBC_TARGET_OS}.lowlevel_utils
	libc.include.string			libc.include.string
				libc.src.__support.lowlevel_utils
	)			)

	# ------------------------------------------------------------------------------			# ------------------------------------------------------------------------------
	# memcpy			# memcpy
	# ------------------------------------------------------------------------------			# ------------------------------------------------------------------------------

	# include the relevant architecture specific implementations			# include the relevant architecture specific implementations
	if(${LIBC_TARGET_MACHINE} STREQUAL "x86_64")			if(${LIBC_TARGET_MACHINE} STREQUAL "x86_64")
	▲ Show 20 Lines • Show All 55 Lines • Show Last 20 Lines

libc/src/string/strlen.cpp

	//===-- Implementation of strlen ------------------------------------------===//			//===-- Implementation of strlen ------------------------------------------===//
				Lint: Lint Inline Actions clang-format suggested style edits found: Lint: Lint: clang-format suggested style edits found:
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "src/string/strlen.h"			#include "src/string/strlen.h"

	#include "src/__support/common.h"			#include "src/__support/common.h"
				#include "src/__support/lowlevel_utils.h"

	#include <stdint.h>			#include <stdint.h>
				#if __has_feature(address_sanitizer) \|\| defined(__SANITIZE_ADDRESS__)
				#include <sanitizer/asan_interface.h>
				#endif

	namespace __llvm_libc {			namespace __llvm_libc {

				template <int ptr_size>
				struct Mask {
				static constexpr bool opt = false;
				static constexpr uintptr_t high = 0;
				static constexpr uintptr_t low = 0;
				};

				template <>
				struct Mask<4> {
				static constexpr bool opt = true;
				static constexpr uintptr_t high = 0x80808080L;
				static constexpr uintptr_t low = 0x01010101L;
				};

				template <>
				struct Mask<8> {
				static constexpr bool opt = true;
				static constexpr uintptr_t high = (Mask<4>::high << 32) \| Mask<4>::high;
				static constexpr uintptr_t low = (Mask<4>::low << 32) \| Mask<4>::low;
				};

				static constexpr bool word_optimized = Mask<sizeof(uintptr_t)>::opt;
				static constexpr uintptr_t high_mask = Mask<sizeof(uintptr_t)>::high;
				static constexpr uintptr_t low_mask = Mask<sizeof(uintptr_t)>::low;

				__attribute__((no_sanitize("address", "bounds", "memory")))
				static inline bool safe_check_word(const uintptr_t *ptr) {
				#if __has_feature(address_sanitizer) \|\| defined(__SANITIZE_ADDRESS__)
				void *poison_addr = __asan_region_is_poisoned(
				const_cast<void >(reinterpret_cast<const void >(ptr)),
				sizeof(uintptr_t));
				#endif

				uintptr_t value = *ptr;
				if (((value - low_mask) & ~value & high_mask) != 0) {
				return true;
				}

				#if __has_feature(address_sanitizer) \|\| defined(__SANITIZE_ADDRESS__)
				if (poison_addr != nullptr) {
				// A NUL byte was not found and some part of the word is OOB.
				__asan_report_error(
				reinterpret_cast<void *>(get_current_pc()),
				reinterpret_cast<void *>(get_current_bp()),
				reinterpret_cast<void *>(get_current_sp()), poison_addr, false, 1);
				}
				#endif

				return false;
				}

	// For LLP64, i.e., Windows, pointers are 64 bit and long is only 32 bit			// For LLP64, i.e., Windows, pointers are 64 bit and long is only 32 bit
	// Thus, this implementation uses uintptr_t instead of long			// Thus, this implementation uses uintptr_t instead of long

	size_t LLVM_LIBC_ENTRYPOINT(strlen)(const char *src) {			size_t LLVM_LIBC_ENTRYPOINT(strlen)(const char *src) {
	const char *charPtr;			const char *charPtr;
	const uintptr_t *ptrPtr;			const uintptr_t *ptrPtr;
	uintptr_t highMask = 0x80808080L;
	uintptr_t lowMask = 0x01010101L;

	if (sizeof(uintptr_t) == 4) {			if (!word_optimized) {
	// do nothing
	} else if (sizeof(uintptr_t) == 8) {
	// setup 64 bit masks
	highMask = ((highMask << 16) << 16) \| highMask;
	lowMask = ((lowMask << 16) << 16) \| lowMask;
	} else {
	// fallback to slow path			// fallback to slow path
	const char *end = src;			const char *end = src;
	while (*end != '\0')			while (*end != '\0')
	++end;			++end;
	return end - src;			return end - src;
	}			}

	// align char_ptr to multiple of sizeof(uintptr_t)			// align char_ptr to multiple of sizeof(uintptr_t)
	for (charPtr = src; ((uintptr_t)charPtr & (sizeof(uintptr_t) - 1)) != 0;			for (charPtr = src; ((uintptr_t)charPtr & (sizeof(uintptr_t) - 1)) != 0;
	++charPtr) {			++charPtr) {
	if (*charPtr == '\0')			if (*charPtr == '\0')
	return charPtr - src;			return charPtr - src;
	}			}

	ptrPtr = (const uintptr_t *)charPtr;			ptrPtr = (const uintptr_t *)charPtr;

	// process sizeof(uintptr_t) bytes at a time			// process sizeof(uintptr_t) bytes at a time
	for (;;) {			for (;;) {
	uintptr_t value = *ptrPtr;			if (safe_check_word(ptrPtr)) {

	if (((value - lowMask) & ~value & highMask) != 0) {
	// found a zero byte			// found a zero byte
	const char cp = (const char )ptrPtr;			const char cp = (const char )ptrPtr;

	for (unsigned i = 0; i < sizeof(uintptr_t); ++i)			for (unsigned i = 0; i < sizeof(uintptr_t); ++i)
	if (cp[i] == 0)			if (cp[i] == 0)
	return cp - src + i;			return cp - src + i;
	}			}

	ptrPtr++;			ptrPtr++;
	}			}
	}			}

	} // namespace __llvm_libc			} // namespace __llvm_libc

libc/test/src/string/CMakeLists.txt

	Show All 23 Lines

	add_libc_unittest(			add_libc_unittest(
	strlen_test			strlen_test
	SUITE			SUITE
	libc_string_unittests			libc_string_unittests
	SRCS			SRCS
	strlen_test.cpp			strlen_test.cpp
	DEPENDS			DEPENDS
				libc.include.signal
	libc.src.string.strlen			libc.src.string.strlen
	)			)

	# Tests all implementations of memcpy that can run on the host.			# Tests all implementations of memcpy that can run on the host.
	get_property(memcpy_implementations GLOBAL PROPERTY memcpy_implementations)			get_property(memcpy_implementations GLOBAL PROPERTY memcpy_implementations)
	foreach(memcpy_config_name IN LISTS memcpy_implementations)			foreach(memcpy_config_name IN LISTS memcpy_implementations)
	get_target_property(require_cpu_features libc.src.string.${memcpy_config_name} REQUIRE_CPU_FEATURES)			get_target_property(require_cpu_features libc.src.string.${memcpy_config_name} REQUIRE_CPU_FEATURES)
	host_supports(can_run "${require_cpu_features}")			host_supports(can_run "${require_cpu_features}")
	Show All 15 Lines

libc/test/src/string/strlen_test.cpp

	//===-- Unittests for strlen ----------------------------------------------===//			//===-- Unittests for strlen ----------------------------------------------===//
				Lint: Lint Inline Actions clang-format suggested style edits found: Lint: Lint: clang-format suggested style edits found:
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "src/string/strlen.h"			#include "src/string/strlen.h"
	#include "utils/UnitTest/Test.h"			#include "utils/UnitTest/Test.h"

				#if __has_feature(address_sanitizer) \|\| defined(__SANITIZE_ADDRESS__)
				#include "include/signal.h"
				#include "src/signal/signal.h"

				#include <sanitizer/asan_interface.h>
				#endif

	TEST(StrLenTest, EmptyString) {			TEST(StrLenTest, EmptyString) {
	const char *empty = "";			const char *empty = "";

	size_t result = __llvm_libc::strlen(empty);			size_t result = __llvm_libc::strlen(empty);
	ASSERT_EQ((size_t)0, result);			ASSERT_EQ((size_t)0, result);
	}			}

	TEST(StrLenTest, AnyString) {			TEST(StrLenTest, AnyString) {
	Show All 32 Lines
	}			}

	TEST(StrLenTest, TwoBytes) {			TEST(StrLenTest, TwoBytes) {
	const char *two = "1";			const char *two = "1";

	size_t result = __llvm_libc::strlen(two);			size_t result = __llvm_libc::strlen(two);
	ASSERT_EQ((size_t)1, result);			ASSERT_EQ((size_t)1, result);
	}			}

				TEST(StrLenTest, SafeOOBRead) {
				const char hello[8] = {'h', 'e', 'l', 'l', 'o', '\0', '0', '0'} ;
				#if __has_feature(address_sanitizer) \|\| defined(__SANITIZE_ADDRESS__)
				hctimUnsubmitted Not Done Reply Inline Actions Wouldn't this be simpler just as `const char hello = "hello"`? We definitely shouldn't be manually doing asan-ish things in the test. hctim:* Wouldn't this be simpler just as `const char *hello = "hello"`? We definitely shouldn't be…
				sivachandraAuthorUnsubmitted Done Reply Inline Actions Other tests do the equivalent. The literal string ends up in the data section which is properly aligned. So, will not trigger an ASAN error as the word read is still within the data section? sivachandra: Other tests do the equivalent. The literal string ends up in the data section which is properly…
				hctimUnsubmitted Not Done Reply Inline Actions ASan doesn't care about alignment. It stores a description of the real size of the global - which is caught by partially addressable dwords. hctim: ASan doesn't care about alignment. It stores a description of the real size of the global…
				__asan_poison_memory_region(hello + 6, 2);
				#endif

				size_t result = __llvm_libc::strlen(hello);
				ASSERT_EQ((size_t)5, result);
				}

				#if __has_feature(address_sanitizer) \|\| defined(__SANITIZE_ADDRESS__)
				TEST(StrLenTest, OOBFailWhenCheckingWord) {
				auto strlen_call = [] {
				const char hello[8] = {'h', 'e', 'l', 'l', 'o', '0', '0', '0'};
				__asan_poison_memory_region(hello + 3, 6);
				__llvm_libc::strlen(hello); };
				EXPECT_EXITS(strlen_call, 1);
				}

				TEST(StrLenTest, OOBFailWhenReadingLastByte) {
				auto strlen_call = [] {
				const char hello[8] = {'h', 'e', 'l', 'l', 'o', '\0', '0', '0'};
				__asan_poison_memory_region(hello + 3, 5);
				__llvm_libc::strlen(hello); };
				EXPECT_EXITS(strlen_call, 1);
				}

				#endif

This is an archive of the discontinued LLVM Phabricator instance.

[libc] Add sanitizer instrumentation to the SIMD strlen implementation.Needs ReviewPublic

Details

Diff Detail