This is an archive of the discontinued LLVM Phabricator instance.

[libc] Align functions to 64B to lower benchmarking variance
Needs Review · Public

Authored by gchatelet on May 4 2022, 1:30 PM.

Details

Reviewers
sivachandra

Diff Detail

Event Timeline

gchatelet created this revision. May 4 2022, 1:30 PM
Herald added projects: Restricted Project, Restricted Project. May 4 2022, 1:30 PM
gchatelet requested review of this revision. May 4 2022, 1:30 PM

@sivachandra at first I thought I should use LLVM_LIBC_FUNCTION_ATTR but it does not work. Should it be moved to the definition instead of the declaration?
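
For illustration, here is a minimal sketch of the kind of attribute in question, assuming a GCC/Clang toolchain; the LIBC_ALIGN_64 macro and my_memcpy function below are hypothetical stand-ins, not the names used by the patch:

// Hypothetical sketch only: GCC and Clang accept the `aligned` attribute on
// functions, raising the code alignment of the definition it is attached to.
#define LIBC_ALIGN_64 __attribute__((aligned(64)))

// Placeholder standing in for a libc entry point.
LIBC_ALIGN_64 void *my_memcpy(void *dst, const void *src, unsigned long count) {
  auto *d = static_cast<char *>(dst);
  const auto *s = static_cast<const char *>(src);
  for (unsigned long i = 0; i < count; ++i)
    d[i] = s[i];
  return dst;
}

A global alternative, if attributes on declarations turn out not to propagate, would be a compiler flag such as -falign-functions=64.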

I have a few questions about this patch:

  1. Do you want all libc functions to be aligned to 64 bytes, or all public functions to be aligned to 64 bytes? Or, do you just want the memory functions to be aligned to 64 bytes?
  2. What effect should I see with and without this patch? For example, with memcpy, I notice that it gets an address of 0 with or without this patch.

I have a few questions about this patch:

  1. Do you want all libc functions to be aligned to 64 bytes, or all public functions to be aligned to 64 bytes? Or, do you just want the memory functions to be aligned to 64 bytes?

This was my original answer:

We've measured a significant (+30%) swing in performance in microbenchmarks for several x86 microarchitectures (Intel Haswell, Intel Skylake, AMD Rome) and for a variety of memory functions: read-only functions (e.g. memcmp, bcmp), write-only functions (e.g. memset, bzero), and read/write functions (e.g. memmove, memcpy).
So I'd be inclined to think that at least all functions touching memory are subject to this swing on x86.
Now, I don't see a specific link between the alignment of the code and the performance of read/write operations, so I'd be tempted to conclude that this behaviour generalizes to all functions.
Do you want me to gather evidence of this behaviour for other functions before moving forward with this patch?
Also, maybe we can lower the alignment requirement to 32B; the default is 16B.

But taking a step back here, it seems that the swing happens mostly for distributions that exercise large sizes (namely uniform 384 to 4096).
This corresponds to code running in a loop and using vector instructions.
On x86, these instructions take many bytes to encode, and the CPU's frontend can only decode up to 16B per cycle (32B for Rome, AFAIR).
Usually the decoded instructions are cached to prevent tight loops from being frontend bound, but we know that under certain circumstances the cache is evicted, forcing decoding to happen again.
We also know that the caching is based on instruction addresses, so aligning the function may just, by chance, help with this.
I'd need to do more tests to check this assumption though.

(source https://stackoverflow.com/a/26003659)


  2. What effect should I see with and without this patch? For example, with memcpy, I notice that it gets an address of 0 with or without this patch.

You cannot see the effect of this patch by looking at the asm in the generated .o or .a; the final addresses are only visible once linked into the final binary.
You can witness the requirement with objdump though. The -h option dumps the section headers, where you can see the required alignment of 2**6 (64); without this patch the alignment is 2**4 (16).

 % objdump -h ~/llvm-project_rel_compiled-with-clang/projects/libc/src/string/CMakeFiles/libc.src.string.memcpy_opt_host.__internal__.dir/memcpy.cpp.o       

/redacted/llvm-project_rel_compiled-with-clang/projects/libc/src/string/CMakeFiles/libc.src.string.memcpy_opt_host.__internal__.dir/memcpy.cpp.o:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000000  0000000000000000  0000000000000000  00000040  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .text._ZN11__llvm_libc6memcpyEPvPKvm 00000183  0000000000000000  0000000000000000  00000040  2**6
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  2 .rodata._ZN11__llvm_libc6memcpyEPvPKvm 00000014  0000000000000000  0000000000000000  000001c4  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
  3 .comment      00000067  0000000000000000  0000000000000000  000001d8  2**0
                  CONTENTS, READONLY
  4 .note.GNU-stack 00000000  0000000000000000  0000000000000000  0000023f  2**0
                  CONTENTS, READONLY
  5 .llvm_addrsig 00000000  0000000000000000  0000000000000000  00000348  2**0
                  CONTENTS, READONLY, EXCLUDE
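
To observe the effect at run time in a fully linked binary (where the symbol no longer sits at address 0), one hypothetical check, sketched here against the system memcpy rather than the patched libc, is to print the function's address and its offset within a 64B boundary:

#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
  // Hypothetical check: a 64B-aligned function should land at an address
  // that is a multiple of 64 in the final binary. In the unlinked .o the
  // symbol's VMA is still 0, which is why objdump -h inspects the section
  // alignment instead.
  auto addr = reinterpret_cast<std::uintptr_t>(&std::memcpy);
  printf("memcpy @ 0x%llx, addr %% 64 = %llu\n",
         (unsigned long long)addr, (unsigned long long)(addr % 64));
  return 0;
}

What this prints also depends on how the symbol is resolved (a PLT stub or an IFUNC target may be what the address actually points at), so it is only a sanity check, not a substitute for the objdump output above.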