This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

-
libc/
-
CMakeLists.txt
-
cmake/modules/
-
modules/
2/2
LLVMLibCCheckCpuFeatures.cmake
7/7
LLVMLibCRules.cmake
-
cpu_features/
-
check_avx.cpp
-
check_avx512f.cpp
4/4
check_cpu_features.cpp.in
-
check_sse.cpp
-
check_sse2.cpp
-
lib/
-
CMakeLists.txt
-
src/string/
-
string/
22/22
CMakeLists.txt
-
memcpy.h
2/2
memcpy.cpp
5/5
memcpy_arch_specific.h.def
-
memory_utils/
-
CMakeLists.txt
3/3
memcpy_utils.h
-
utils.h
-
x86/
-
CMakeLists.txt
-
memcpy_arch_specific.h.inc
-
test/src/string/
-
src/
-
string/
9/9
CMakeLists.txt
4/4
memcpy_test.cpp
-
memory_utils/
-
CMakeLists.txt
12/12
memcpy_utils_test.cpp
-
utils_test.cpp

Differential D74397

[libc] Adding memcpy implementation for x86_64
ClosedPublic

Authored by gchatelet on Feb 11 2020, 4:51 AM.

Details

Reviewers

sivachandra
abrachet

Commits

rG04a309dd0be3: [libc] Adding memcpy implementation for x86_64

Summary

It is advised to read the post motivating the creation of __builtin_memcpy_inline first.

The patch focuses on static library but allows creation of several implementations depending on cpu features. The default implementation will be optimized for the host capabilities.
Currently the use of rep movsb is disabled but we plan to enable it via CMake options.

This implementation is mainly tested on clang but should compile with GCC as well. For now it doesn't build on MSVC.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

gchatelet created this revision.Feb 11 2020, 4:51 AM

Herald added a project: Restricted Project. Feb 11 2020, 4:51 AM

Herald added subscribers: libc-commits, tschuett, MaskRay, mgorny.

Harbormaster completed remote builds in B46212: Diff 243817.Feb 11 2020, 4:58 AM

I've gone ahead and copy and pasted some of this into godbolt to save others some time if they wish to play around https://godbolt.org/z/z4dCmj :)

How do we build? We may want to test in debug but build the libc with -march=native for instance,

This is logical. To my knowledge we don't currently do anything special when CMAKE_BUILD_TYPE=Release but this makes sense to turn on for release and benchmarking.

With gcc we can use __builtin_memcpy but then we'd need a postprocess step to check that the final assembly do not contain call to memcpy (unlikely but allowed),

I think we can do this in cmake with add_custom_command and POST_BUILD specified then just nm $<TARGET_FILE:target> | grep "U memcpy"

libc/src/string/memcpy_x86_64.cpp
12 ↗	(On Diff #243817)	It might be worth it as a sanity check because this is an x86 specific file to static_assert LLVM_LIBC_CACHELINE_SIZE == 64
21–30 ↗	(On Diff #243817)	Off of intuition I would imagine these are very rare sizes to call `memcpy` with. Would it be better to do something like `if (count < 5) goto smallCount;` and move these down? It's worth linking while we are here @ckennelly's thread from a few weeks ago. "For `memcpy`, 96% of sizes are <= 128 bytes. 99% are <= 1024 bytes."
44 ↗	(On Diff #243817)	I'm not sure I'm understanding this. It's not like we can change this at compile time, line 47 is dead code.
46 ↗	(On Diff #243817)	Why was 32 chosen?
libc/test/src/string/memory_utils/memcpy_utils_test.cpp
15	D74091 Was also wanting to use `assert`/`abort` we should add them. Also that comment is funny to me because its the only thing in assert.h :)
18	Not used as far as I can tell

How do we customize the implementation? (i.e. how to define kRepMovsBSize),

What kind of customizations?

How do we specify custom compilation flags? (We'd need -fno-builtin-memcpy to be passed in),

Do we need to pass this flag for building llvm-libc, or to user code (and llvm-libc tests?) Reading the code, it seems to me that it is the latter, correct?

How do we build? We may want to test in debug but build the libc with -march=native for instance,

Not sure I understand this fully. What are the use cases/goals of what you are describing here?

Clang has a brand new builtin __builtin_memcpy_inline which makes the implementation easy and efficient, but:

If we compile with gcc or msvc we can't use it, resorting on less efficient code generation,

Less efficient wrt clang compiled code? Can we rephrase what you are saying as, "we make use of a better optimization when compiled with clang?"

With gcc we can use __builtin_memcpy but then we'd need a postprocess step to check that the final assembly do not contain call to memcpy (unlikely but allowed),

The concern is that it will become a recursive call? What action is to be taken if we do find a call to memcpy in the final assembly?

For msvc we'd need to resort on the compiler optimization passes.

Does "we" mean llvm-libc developers? A related question: considering we are using __builtin_memcpy and inline assembly, does the code work as is with MSVC?

libc/src/string/memcpy_x86_64.cpp
15 ↗	(On Diff #243817)	Is this the only reason this file name has the `_86_64` in it? If yes, is this function the only machine specific piece for other archs as well? If yes, we should put it in a header file and use the same scheme we use for `LLVM_LIBC_CACHELINE_SIZE`.
44 ↗	(On Diff #243817)	I guess this relates to the customization @gchatelet mentioned about in the patch description.
libc/test/src/string/memory_utils/memcpy_utils_test.cpp
10	Where is `monitor_memcpy` defined?
15	Point taken. I will prepare something to address this.
16	Use `stdint.h` instead of `cstdint`.

MaskRay added inline comments.Feb 11 2020, 10:34 PM

libc/src/string/memcpy_x86_64.cpp
16 ↗	(On Diff #243817)	`LIBC_INLINE_ASM` does not improve readability. `asm volatile` Any reason using constraint modifier `+`? It adds two operands, one input and one output. Just use `"D"(dst), "S"(src), "c"(count)`

abrachet added inline comments.Feb 11 2020, 10:43 PM

libc/src/string/memcpy_x86_64.cpp
16 ↗	(On Diff #243817)	+1 I think the reasoning behind the macro is for MSVC, which as far as I can tell doesn't support GCC's syntax anyway https://docs.microsoft.com/en-us/cpp/assembler/inline/asm?view=vs-2019

In D74397#1870989, @abrachet wrote:

I've gone ahead and copy and pasted some of this into godbolt to save others some time if they wish to play around https://godbolt.org/z/z4dCmj :)

Thx : )
One should add -fno-builtin-memcpy
Also it's worth playing with -mno-avx or -mavx512f to see the difference in codegen.

How do we build? We may want to test in debug but build the libc with -march=native for instance,

This is logical. To my knowledge we don't currently do anything special when CMAKE_BUILD_TYPE=Release but this makes sense to turn on for release and benchmarking.

Yes I think we want to get the most out of the architecture we target (see the difference in codegen depending on available features avx / avx512f)

With gcc we can use __builtin_memcpy but then we'd need a postprocess step to check that the final assembly do not contain call to memcpy (unlikely but allowed),

I think we can do this in cmake with add_custom_command and POST_BUILD specified then just nm $<TARGET_FILE:target> | grep "U memcpy"

Thx that's useful

In D74397#1871338, @sivachandra wrote:

How do we customize the implementation? (i.e. how to define kRepMovsBSize),

What kind of customizations?

The rep movsb instruction performance is highly tied to the targeted microarchitecture.
For one there is the ERMS cpuid flag that helps to know if it should be used at all.
Then depending on the microarchitecture the crosspoint between aligned copy and rep movsb varies between 512 to a few kilobytes.
Ideally we need to adapt this threshold somehow to provide the best implementation.

Eventually, when rep movsb becomes excellent we can replace the function entirely with this single instruction.

I understand that llvm-libc is to be a pick and choose what you need libc which implies some sort of customization anyways. Am I right?
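
As an illustration of the ERMS check mentioned above, here is a minimal sketch and not code from the patch. It assumes a GCC-compatible compiler providing `<cpuid.h>`; `HasErms` and `RepMovsbThreshold` are hypothetical names, and the 512-byte value is only the lower bound of the range quoted above, not a measured threshold.

```cpp
#include <cstddef>
#include <cstdint>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>  // GCC/Clang helper header; MSVC does not provide it.
#define SKETCH_HAS_CPUID 1
#endif

// Returns true when CPUID reports Enhanced REP MOVSB/STOSB (ERMS).
// ERMS is bit 9 of EBX for leaf EAX=7, sub-leaf ECX=0.
// On other targets or compilers this conservatively returns false.
bool HasErms() {
#ifdef SKETCH_HAS_CPUID
  unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
  // __get_cpuid_count returns 0 when the requested leaf is not supported.
  if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
    return false;
  return ((ebx >> 9) & 1u) != 0;
#else
  return false;
#endif
}

// Hypothetical policy: size above which `rep movsb` would be preferred.
// SIZE_MAX means "never use it"; 512 is an assumed placeholder value.
std::size_t RepMovsbThreshold(bool has_erms) {
  return has_erms ? 512 : SIZE_MAX;
}
```

A real implementation would presumably map each microarchitecture to its own threshold rather than use a single constant.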

How do we specify custom compilation flags? (We'd need -fno-builtin-memcpy to be passed in),

Do we need to pass this flag for building llvm-libc, or to user code (and llvm-libc tests?) Reading the code, it seems to me that it is the latter, correct?

It is solely for this specific compilation unit.
When using clang we can use __attribute__((no_builtin("memcpy"))) https://clang.llvm.org/docs/AttributeReference.html#no-builtin but it won't work with gcc.
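
A small sketch of how the clang attribute could be applied to a single function, with the macro expanding to nothing on compilers that lack it. `SKETCH_NO_BUILTIN_MEMCPY` and `SketchMemcpy` are hypothetical names, and the byte loop is only a stand-in for the real implementation; on gcc the `-fno-builtin-memcpy` flag would still be needed.

```cpp
#include <cstddef>

// Clang-only attribute; elsewhere the macro expands to nothing.
#if defined(__has_attribute)
#if __has_attribute(no_builtin)
#define SKETCH_NO_BUILTIN_MEMCPY __attribute__((no_builtin("memcpy")))
#endif
#endif
#ifndef SKETCH_NO_BUILTIN_MEMCPY
#define SKETCH_NO_BUILTIN_MEMCPY
#endif

// Byte-wise copy. With the attribute, clang is asked not to treat memcpy as a
// builtin inside this function, which is what avoids a recursive call when
// this function *is* memcpy.
SKETCH_NO_BUILTIN_MEMCPY
void *SketchMemcpy(void *dst, const void *src, std::size_t count) {
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);
  for (std::size_t i = 0; i < count; ++i)
    d[i] = s[i];
  return dst;
}
```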

How do we build? We may want to test in debug but build the libc with -march=native for instance,

Not sure I understand this fully. What are the use cases/goals of what you are describing here?

See above, the generated code is improved (smaller and faster) when targeting specific architecture.
glibc is using IFUNC to that matter and lets the runtime pick the best implementation.
Although feasible in llvm-libc we noticed that the required extra level of indirection is hurting branch prediction and latency for small sizes (which are the most frequent)

Clang has a brand new builtin __builtin_memcpy_inline which makes the implementation easy and efficient, but:

If we compile with gcc or msvc we can't use it, resorting on less efficient code generation,

Less efficient wrt clang compiled code? Can we rephrase what you are saying as, "we make use of a better optimization when compiled with clang?"

With gcc we can use __builtin_memcpy but then we'd need a postprocess step to check that the final assembly do not contain call to memcpy (unlikely but allowed),

The concern is that it will become a recursive call?

Yes this is technically possible but I've never seen it in practice,
It is highly unlikely that a compiler would generate a call to memcpy for sizes <=64B.

What action is to be taken if we do find a call to memcpy in the final assembly?

I believe we should just refuse to compile on such a compiler if it happens.

For msvc we'd need to resort on the compiler optimization passes.

Does "we" mean llvm-libc developers?

Yes, sorry for not being clear here.

A related question: considering we are using __builtin_memcpy and inline assembly, does the code work as is with MSVC?

No it doesn't, I can add a fallback case if you want but the generated code is not good right now.
This means we'd need to specialize the CopyRepMovsb function so it uses the correct syntax for msvc __asm instead of the provided LIBC_INLINE_ASM

The patch is to get the conversation started and is not a full implementation just yet (although it is functional).

Quick question @sivachandra , how do we currently build llvm-libc?
utils/build_scripts https://github.com/llvm/llvm-project/blob/master/libc/docs/source_layout.rst#the-utilsbuild_scripts-directory seems to be missing so I could only build the entrypoints via transitive dependency through tests.

gchatelet edited the summary of this revision. Feb 12 2020, 2:41 AM

gchatelet marked 13 inline comments as done.Feb 12 2020, 4:57 AM

gchatelet added inline comments.

libc/src/string/memcpy_x86_64.cpp
12 ↗	(On Diff #243817)	Agreed but I'm deferring the addition of the static assert to the point where we know which layout to use for the source code. I'm keeping your comment as `Not Done` so I don't forget to add it later.
15 ↗	(On Diff #243817)	Yes for now. I have yet to see how this implementation performs on other architectures (ARM, PowerPC, ...). But yes indeed this implementation will at least be common to `x86` and `x86_64`. I'll adapt the code accordingly.
16 ↗	(On Diff #243817)	@MaskRay `LIBC_INLINE_ASM` was available so I used it. I thought it would be used to mask out the differences between compilers, but I'm not so sure. The operand/constraint modifiers might be sufficiently different that it is not possible to write it in a compiler-agnostic way. For instance `asm` is not to be used with MSVC (from the link @abrachet sent). As for the `+` modifiers, they are indeed not needed in this context.
21–30 ↗	(On Diff #243817)	"Off of intuition I would imagine these are very rare sizes to call memcpy with." They are not rare actually. The rationale here is to have a branching cost that is proportional to the copy cost. Now since the code is C++ the compiler sees it and understands the semantics. For people using Profile Guided Optimization techniques the compiler will be able to reorder the branches according to branching probabilities. Thx for linking in the thread.
44 ↗	(On Diff #243817)	Yes as I stated in the introduction of the patch this is for early feedback. Ultimately `kRepMovsBSize` needs to be defined by the environment.
46 ↗	(On Diff #243817)	It gave good results : ) `CopyAligned<N>` is prone to memory reloads so there's a balance between copying big chunks and reloading data from memory. To be honest I would like a few pieces of the code to be customizable (this is one) so we could be benchmarking the cross-product of the implementations and pick the best performing ones per uarch. The overall design should be better documented for sure, I'll work on it. Code size is especially important for icache pressure and as you saw the generated code is quite small.
libc/test/src/string/memory_utils/memcpy_utils_test.cpp
10	Further down after the `GetTrace` function.
15	I think that would be nice to have the functions in `memcpy_utils.h` have some precondition checking, an `abort` function would be useful indeed:
  #if not defined(NDEBUG)
  // check precondition
  #endif
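
To make the "branching cost proportional to the copy cost" rationale concrete, here is a hedged sketch of a size dispatch. It is not the code under review: `CopyBlock`, `CopyOverlap` and `SketchCopy` are hypothetical names, the thresholds are assumptions, and the buffers are assumed not to overlap. Tiny sizes go through one or two comparisons, while larger copies can afford more branches.

```cpp
#include <cstddef>

// Copies exactly N bytes. N is a compile-time constant, so the compiler can
// usually expand __builtin_memcpy into plain loads and stores.
template <std::size_t N>
inline void CopyBlock(char *dst, const char *src) {
  __builtin_memcpy(dst, src, N);
}

// Copies count bytes, with N <= count <= 2 * N, using two blocks of N bytes
// that may overlap in the middle.
template <std::size_t N>
inline void CopyOverlap(char *dst, const char *src, std::size_t count) {
  CopyBlock<N>(dst, src);
  CopyBlock<N>(dst + count - N, src + count - N);
}

void SketchCopy(char *dst, const char *src, std::size_t count) {
  if (count == 0) return;
  if (count == 1) return CopyBlock<1>(dst, src);
  if (count == 2) return CopyBlock<2>(dst, src);
  if (count == 3) return CopyBlock<3>(dst, src);
  if (count == 4) return CopyBlock<4>(dst, src);
  if (count < 8) return CopyOverlap<4>(dst, src, count);
  if (count < 16) return CopyOverlap<8>(dst, src, count);
  if (count < 32) return CopyOverlap<16>(dst, src, count);
  if (count < 64) return CopyOverlap<32>(dst, src, count);
  // Larger sizes: loop over 32-byte blocks, then one overlapping tail block.
  std::size_t offset = 0;
  for (; offset + 32 <= count; offset += 32)
    CopyBlock<32>(dst + offset, src + offset);
  CopyBlock<32>(dst + count - 32, src + count - 32);
}
```

Because every branch is ordinary C++, a compiler with profile data remains free to reorder the comparisons.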

Address comments

gchatelet marked an inline comment as done.Feb 12 2020, 5:27 AM

gchatelet added inline comments.

libc/src/string/memcpy_arch_specific.h.def
18	@sivachandra I'm not super happy with this pattern that dispatches the logic in many files. Would it be possible to generate the cpp file directly instead of generating an intermediate header file? At least the common implementation would be in the template.

Harbormaster completed remote builds in B46323: Diff 244144.Feb 12 2020, 5:27 AM

Improved documentation / fallback code for msvc

Harbormaster completed remote builds in B46330: Diff 244162.Feb 12 2020, 6:41 AM

Revert rep movsb

libc/src/string/memcpy_x86_64.cpp
16 ↗	(On Diff #243817)	Actually I believe we still need to use `+` See clang implementation. This is to inform the compiler that executing `rep mov` will change the underlying registers, it can't assume that `ECX`, `ESI`, `EDI` won't change.
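
For illustration, a minimal sketch of a `rep movsb` wrapper using the `+` modifiers discussed here. It assumes a GCC-compatible compiler targeting x86 or x86_64 and falls back to a byte loop elsewhere; it is not necessarily the exact form used in the patch.

```cpp
#include <cstddef>

// Copies count bytes from src to dst with `rep movsb` where available.
inline void CopyRepMovsb(char *dst, const char *src, std::size_t count) {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
  // "+D", "+S" and "+c" bind dst, src and count to (R|E)DI, (R|E)SI and
  // (R|E)CX as read-write operands: the instruction advances both pointers
  // and decrements the counter, so the compiler must not assume the
  // registers still hold their initial values afterwards.
  // The "memory" clobber states that memory is read and written.
  asm volatile("rep movsb" : "+D"(dst), "+S"(src), "+c"(count) : : "memory");
#else
  // Portable fallback for other targets and compilers.
  for (std::size_t i = 0; i < count; ++i)
    dst[i] = src[i];
#endif
}
```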

Harbormaster completed remote builds in B46331: Diff 244170.Feb 12 2020, 7:09 AM

The rep movsb instruction performance is highly tied to the targeted microarchitecture.
For one there is the ERMS cpuid flag https://en.wikipedia.org/wiki/CPUID#EAX=7,_ECX=0:_Extended_Features that helps to know if it should be used at all.
Then depending on the microarchitecture the crosspoint between aligned copy and rep movsb varies between 512 to a few kilobytes.
Ideally we need to adapt this threshold somehow to provide the best implementation.

Eventually, when rep movsb becomes excellent we can replace the function entirely with this single instruction.

I understand that llvm-libc is to be a pick and choose what you need libc which implies some sort of customization anyways. Am I right?

Would configure time sniffing to detect what is available work? It will not work in case of cross-compilation. But, requiring explicit setting of build params for cross-compilation seems OK to me.

A related question: considering we are using __builtin_memcpy and inline assembly, does the code work as is with MSVC?

No it doesn't, I can add a fallback case if you want but the generated code is not good right now.
This means we'd need to specialize the CopyRepMovsb function so it uses the correct syntax for msvc __asm instead of the provided LIBC_INLINE_ASM

We don't need to provide the specialization in this patch, but pointing out which parts needs to be specialized would help.

Quick question @sivachandra , how do we currently build llvm-libc?
utils/build_scripts https://github.com/llvm/llvm-project/blob/master/libc/docs/source_layout.rst#the-utilsbuild_scripts-directory seems to be missing so I could only build the entrypoints via transitive dependency through tests.

Sorry, that file is out of date. Will fix it soon.
Add the target of your entrypoint to the list of DEPENDS for the llvmlibc target here: https://github.com/llvm/llvm-project/blob/master/libc/lib/CMakeLists.txt
Running ninja llvmlibc after that will produce libllvmlibc.a which includes your entrypoint.

libc/test/src/string/memory_utils/memcpy_utils_test.cpp
10	Ah sorry, I missed it. So, does it mean that you want that name to be overridable/customizable? If yes, then would it make sense to define it to `monitor_memcpy` only if not defined?
  #ifndef LLVM_LIBC_MEMCPY_MONITOR
  #define LLVM_LIBC_MEMCPY_MONITOR monitor_memcpy
  #endif
15	Okay. There are two kinds of `abort` here. One to use in the implementation, another to use in the test. For the `abort` used in the test, we should use the one coming from the system libc. For the one going into the implementation, we should use the one from llvm-libc. The abort function is fairly involved and needs other pieces to be in place for us to build llvm-libc's implementation. Let me do my homework and get back to you on that.

See above, the generated code is improved (smaller and faster) when targeting specific architecture.
glibc is using IFUNC to that matter and lets the runtime pick the best implementation.
Although feasible in llvm-libc we noticed that the required extra level of indirection is hurting branch prediction and latency for small sizes (which are the most frequent)

The resolver function is only called the first time that ld.so binds the symbol into the plt so I don't think it would hurt branch prediction any more than any other dynamic symbol. Notably a big goal of llvm libc stated from Siva early on was static linking, so I am sure we will end up having a build time way of determining which specialization to call, as well as runtime with ifunc.

How do we specify custom compilation flags? (We'd need -fno-builtin-memcpy to be passed in),

Were you asking how to express this in CMake?

set_property(
  SOURCE memcpy.cpp
  PROPERTY COMPILE_OPTIONS
  -fno-builtin-memcpy
)

This worked in my (quick) testing. ninja -v will show the commands it's running so you can see if it worked here too.

libc/src/string/memcpy_arch_specific.h.def
22	remove `#define`
libc/src/string/memory_utils/memcpy_utils.h
23	Presumably should be just `__builtin_memcpy`
libc/test/src/string/memory_utils/memcpy_utils_test.cpp
15	"The abort function is fairly involved and needs other pieces to be in place for us to build llvm-libc's implementation." I was actually working on writing to the list about getting started on signals, which is how abort should be implemented I think, `for(;;) raise(SIGABRT);`. I'd be happy to get started on signals :) I think for now though, it is fine to have abort be `__builtin_trap()` because we are mainly focused on x86 at the moment. (I remember on ARM32 __builtin_trap() calls abort for example)
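
As a sketch of how the precondition checks requested earlier in this thread could use `__builtin_trap()` as a temporary `abort`, under the assumption of a GCC-compatible compiler: `SKETCH_PRECONDITION` is a hypothetical macro and `OffsetFromLastAligned` is an illustrative helper, neither is confirmed to match `memcpy_utils.h`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical precondition macro: traps in debug builds and compiles to
// nothing when NDEBUG is defined.
#ifndef NDEBUG
#define SKETCH_PRECONDITION(cond) \
  do {                            \
    if (!(cond))                  \
      __builtin_trap();           \
  } while (false)
#else
#define SKETCH_PRECONDITION(cond) \
  do {                            \
  } while (false)
#endif

// Returns how many bytes ptr lies past the previous multiple of alignment.
// Precondition: alignment is a non-zero power of two.
inline std::size_t OffsetFromLastAligned(const void *ptr,
                                         std::size_t alignment) {
  SKETCH_PRECONDITION(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1);
}
```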

sivachandra added inline comments.Feb 12 2020, 2:46 PM

libc/test/src/string/memory_utils/memcpy_utils_test.cpp
15	It would be awesome if you can start work on signals.

gchatelet edited the summary of this revision. Feb 13 2020, 1:12 AM

In D74397#1872961, @sivachandra wrote:

The rep movsb instruction performance is highly tied to the targeted microarchitecture.
For one there is the ERMS cpuid flag https://en.wikipedia.org/wiki/CPUID#EAX=7,_ECX=0:_Extended_Features that helps to know if it should be used at all.
Then depending on the microarchitecture the crosspoint between aligned copy and rep movsb varies between 512 to a few kilobytes.
Ideally we need to adapt this threshold somehow to provide the best implementation.

Eventually, when rep movsb becomes excellent we can replace the function entirely with this single instruction.

I understand that llvm-libc is to be a pick and choose what you need libc which implies some sort of customization anyways. Am I right?

Would configure time sniffing to detect what is available work? It will not work in case of cross-compilation. But, requiring explicit setting of build params for cross-compilation seems OK to me.

Yes that would work, we can map from micro architecture to parameters and be conservative when we don't know.
The possibility to force the configuration manually would be required for cross compilation as you noticed.

A related question: considering we are using __builtin_memcpy and inline assembly, does the code work as is with MSVC?

No it doesn't, I can add a fallback case if you want but the generated code is not good right now.
This means we'd need to specialize the CopyRepMovsb function so it uses the correct syntax for msvc __asm instead of the provided LIBC_INLINE_ASM

We don't need to provide the specialization in this patch, but pointing out which parts needs to be specialized would help.

I'll document the code then.

Quick question @sivachandra , how do we currently build llvm-libc?
utils/build_scripts https://github.com/llvm/llvm-project/blob/master/libc/docs/source_layout.rst#the-utilsbuild_scripts-directory seems to be missing so I could only build the entrypoints via transitive dependency through tests.

Sorry, that file is out of date. Will fix it soon.
Add the target of your entrypoint to the list of DEPENDS for the llvmlibc target here: https://github.com/llvm/llvm-project/blob/master/libc/lib/CMakeLists.txt
Running ninja llvmlibc after that will produce libllvmlibc.a which includes your entrypoint.

Perfect! Thx!

In D74397#1873189, @abrachet wrote:

See above, the generated code is improved (smaller and faster) when targeting specific architecture.
glibc is using IFUNC to that matter and lets the runtime pick the best implementation.
Although feasible in llvm-libc we noticed that the required extra level of indirection is hurting branch prediction and latency for small sizes (which are the most frequent)

The resolver function is only called the first time that ld.so binds the symbol into the plt

Yes

so I don't think it would hurt branch prediction any more than any other dynamic symbol.

Not more than dynamic symbol but still noticeable for small sizes.
That also means one branch prediction entry is kept for each runtime-dispatched function (at least memcpy, memset, memmove, memcmp). These functions are called every now and then and prevent other code from using those entries.

Notably a big goal of llvm libc stated from Siva early on was static linking, so I am sure we will end up having a build time way of determining which specialization to call, as well as runtime with ifunc.

Yes, I agree that static linking is the primary goal, we can always provide an ifunc selected function on top of individual implementations.
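
The extra indirection under discussion can be sketched with a plain function pointer resolved once at startup. This only stands in for ifunc, which resolves at symbol binding time, and is an assumption-laden illustration: `MemcpyGeneric`, `MemcpyAvx`, `SelectMemcpy` and `DispatchedMemcpy` are hypothetical names, and both "implementations" simply forward to `std::memcpy`.

```cpp
#include <cstddef>
#include <cstring>

using MemcpyFn = void *(*)(void *, const void *, std::size_t);

// Stand-ins for two builds of the same source with different -m flags.
void *MemcpyGeneric(void *dst, const void *src, std::size_t n) {
  return std::memcpy(dst, src, n);
}
void *MemcpyAvx(void *dst, const void *src, std::size_t n) {
  return std::memcpy(dst, src, n);
}

// Picks an implementation from the features of the CPU we are running on.
MemcpyFn SelectMemcpy() {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx"))
    return MemcpyAvx;
#endif
  return MemcpyGeneric;
}

// Resolved once during static initialization.
static const MemcpyFn g_memcpy = SelectMemcpy();

// Every call pays for one indirect branch through g_memcpy. This is the cost
// that a build-time selection avoids for statically linked binaries.
void *DispatchedMemcpy(void *dst, const void *src, std::size_t n) {
  return g_memcpy(dst, src, n);
}
```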

How do we specify custom compilation flags? (We'd need -fno-builtin-memcpy to be passed in),

Were you asking how to express this in CMake?
set_property(
  SOURCE memcpy.cpp
  PROPERTY COMPILE_OPTIONS
  -fno-builtin-memcpy
)
This worked in my (quick) testing. ninja -v will show the commands it's running so you can see if it worked here too.

Thx !

gchatelet marked 2 inline comments as done.Feb 13 2020, 7:05 AM

gchatelet added inline comments.

libc/src/string/memory_utils/memcpy_utils.h
23	Good catch

Address comments
Improved documentation / fallback code for msvc
Revert rep movsb
Add memcpy to llvmlibc
Add -fnobuiltin and check no undefined symbols
Fix LLVM_LIBC_MEMCPY_MONITOR macro
Remove define in comment
Add restrict and documentation for MSVC

libc/src/string/memory_utils/memcpy_utils.h
23	`__has_builtin` is actually fairly new and `__builtin_memcpy` pretty old so I'm just going to assume that gcc has `__builtin_memcpy`.
libc/test/src/string/memory_utils/memcpy_utils_test.cpp
10	I've defined it in the `CMakeLists.txt` with a check in the test file.
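
A hedged sketch of the kind of builtin selection discussed for `memcpy_utils.h`: use `__builtin_memcpy_inline` when the compiler reports it, otherwise assume `__builtin_memcpy` exists (as on gcc). `CopyFixed` is a hypothetical name, and MSVC, which provides neither builtin, is not handled.

```cpp
#include <cstddef>

// Compilers without __has_builtin are treated as having no optional builtins.
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

// Copies exactly kBlockSize bytes.
template <std::size_t kBlockSize>
inline void CopyFixed(char *dst, const char *src) {
#if __has_builtin(__builtin_memcpy_inline)
  // Clang: guaranteed to be expanded inline, never lowered to a memcpy call.
  __builtin_memcpy_inline(dst, src, kBlockSize);
#else
  // Assumed to be available; the compiler may still emit a call to memcpy,
  // which is why the thread proposes checking for undefined symbols.
  __builtin_memcpy(dst, src, kBlockSize);
#endif
}
```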

Remove unneeded comment

Harbormaster completed remote builds in B46413: Diff 244426.Feb 13 2020, 7:21 AM

Harbormaster completed remote builds in B46414: Diff 244428.

Tighten MaxReloads test

Harbormaster completed remote builds in B46415: Diff 244429.Feb 13 2020, 7:30 AM

Herald added a subscriber: tatyana-krasnukha. Feb 13 2020, 7:30 AM

Restrict check undefined symbols to release builds
Added thorough test of memcpy function

Harbormaster completed remote builds in B46424: Diff 244454.Feb 13 2020, 9:01 AM

Allow multiple code generation

gchatelet marked an inline comment as done.Feb 14 2020, 5:00 AM

gchatelet added inline comments.

libc/src/string/CMakeLists.txt
66	This now creates the following memcpy implementations:
  - `__llvm_libc::memcpy_x86_64_avx512`
  - `__llvm_libc::memcpy_x86_64_avx`
  - `__llvm_libc::memcpy_x86_64_sse2`
  - `__llvm_libc::memcpy_x86_64_sse`
  - `__llvm_libc::memcpy_x86_64_unopt`
For shared libc, we need an ifunc-like trampoline to select the correct version. For static libc, we need to select an implementation. @sivachandra how do you see this kind of code generation integrating into the more general cmake functions from `libc/cmake/modules/LLVMLibCRules.cmake`? I expect other memory functions to follow the same scheme.

Harbormaster completed remote builds in B46495: Diff 244626.Feb 14 2020, 5:01 AM

sivachandra added inline comments.Feb 16 2020, 12:56 AM

libc/src/string/CMakeLists.txt
66	Instead of building all the possible implementations, could we use the CMake `try_compile` https://cmake.org/cmake/help/v3.14/command/try_compile.html and/or `try_run` https://cmake.org/cmake/help/v3.14/command/try_run.html command to sniff out the best flags to use? I think `try_run` is more appropriate as I expect that we need to run the `cpuid` instruction? Also, compilers have a convenience macro `__cpuid` to run this instruction on x86/x86_64? BTW, one can have ifuncs in static libraries as well. But, I do understand we want to avoid the overhead of the added indirection, so sniffing out at configure time is the best. If we can set up something for configure time sniffing, I believe we should be able to use it (maybe with straightforward extension/modification if required) as the ifunc selector as well.
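
The probe that `try_run` would execute could look roughly like the sketch below. It is an assumption, not the content of the patch's `check_cpu_features.cpp.in`: `DetectCpuFeatures` is a hypothetical name, it relies on the GCC/Clang `__builtin_cpu_supports` builtin rather than `__cpuid`, and a real probe would print the returned string from `main` so that CMake can capture it.

```cpp
#include <string>

// Returns a ';'-separated list (CMake list syntax) of the features supported
// by the CPU running this code. Empty on non-x86 targets.
std::string DetectCpuFeatures() {
  std::string features;
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
  __builtin_cpu_init();
  auto add = [&features](bool supported, const char *name) {
    if (!supported)
      return;
    if (!features.empty())
      features += ';';
    features += name;
  };
  // __builtin_cpu_supports requires a string literal argument.
  add(__builtin_cpu_supports("sse"), "sse");
  add(__builtin_cpu_supports("sse2"), "sse2");
  add(__builtin_cpu_supports("avx"), "avx");
  add(__builtin_cpu_supports("avx512f"), "avx512f");
#endif
  return features;
}
```

Since the probe reports on the machine that runs it, the result only applies when host and target are the same.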

abrachet added inline comments.Feb 16 2020, 3:35 PM

libc/src/string/CMakeLists.txt
11	llvm-nm (the nm that ships with MacOS) for mach-o files prints just the symbol and not its value or type character so grepping for U wouldn't work here (in this very niche case that I just happened to test). If we're using --undefined-only we could just `grep .` perhaps?
66	I think this is going to get quickly outside of the scope of this patch. It would probably be better to work on this in a separate patch.
libc/src/string/memcpy_arch_specific.h.def
34–35	Seems like clang-format interfered a little bit here.
libc/src/string/memcpy_entrypoint.cpp
1 ↗	(On Diff #244626)	Must this be in its own file? Why not in src/string/memcpy.cpp?

sivachandra added inline comments.Feb 17 2020, 12:34 AM

libc/src/string/CMakeLists.txt
66	If there are any infrastructure pieces required to build a full solution, then we can do them separately as a prerequisite. However, if correct compile options are critical to memcpy implementation, then any code added to deduce them should belong to this patch. I agree that the patch will start to become too big. But, I think the additions to memcpy_utils and their tests can be split out to another prerequisite patch.

gchatelet marked 11 inline comments as done.Feb 19 2020, 5:30 AM

gchatelet added inline comments.

libc/src/string/CMakeLists.txt
11	thx for letting me know. I guess `grep .` will work.
66	@sivachandra `try_compile` and `try_run` will only work when the host is also the target of the build. For instance it won't work when cross compiling. There are a few dimensions here:
  1. static library: autodetect which implementation to use with `try_compile` / `try_run`, or manually choose the implementation for cross compilation
  2. dynamic library: select a set of implementations to include in the library and the associated dispatch code
  3. test: test all the implementations that can run on the host
I agree with both of you that this is getting too big for this patch. Let me do my homework and come back with the prerequisite patches.
66	Let me think about it, I'll get back with the required building blocks.
libc/src/string/memcpy_arch_specific.h.def
34–35	Good catch
libc/src/string/memcpy_entrypoint.cpp
1 ↗	(On Diff #244626)	On the one hand we need to pick one implementation to be the `LLVM_LIBC_ENTRYPOINT(memcpy)` and on the other hand we want to generate multiple implementations to be able to test them and embed them into the library. I don't like having two files for this purpose, this is a dirty hack that is not to be submitted, I'll come back with a better solution.

sivachandra added inline comments.Feb 19 2020, 11:17 AM

libc/src/string/CMakeLists.txt
66	We don't need to worry about #2 for now. For cross-compilation in case of #1, if we set up the build vars appropriately, I believe it should not be a problem. #3 is currently a problem because of the way the `add_entrypoint_object` rule works: it expects the entrypoint name and the target name to be the same. We can add an optional argument to the rule which explicitly provides the entrypoint name. This will allow us to add multiple targets for the same entrypoint. We can set up our targets as follows:
  src/string/
    - x86_64            # x86_64 specific directory.
      - CMakeLists.txt  # Lists the targets for the various
                        # x86_64 flavors which all use the
                        # single memcpy.cpp source file
    - CMakeLists.txt    # Lists the target for the release version
                        # of memcpy built using the best compile
                        # for the target machine.
    - memcpy.cpp        # The actual platform independent memcpy
                        # implementation.
The same structure can be followed for tests as well. WDYT?

gchatelet marked 5 inline comments as done.Feb 20 2020, 5:58 AM

gchatelet added inline comments.

libc/src/string/CMakeLists.txt
66	@sivachandra SGTM. Can you try to amend `add_entrypoint_object` in a separate patch? Separately, I came up with a CMake function to test cpu flags on linux/mac; this will help for `3` and `2`. `x86` and `x86_64` will have the same implementation. Any suggestion on how to name the directory? `x86_32_64`?

sivachandra added inline comments.Feb 20 2020, 10:27 PM

libc/src/string/CMakeLists.txt
66	Done now: https://reviews.llvm.org/D74948 Conventionally, pieces common to x86_64 and other x86 flavors are all put together in a directory named `x86`. I do not have an easy way to test on non-x86_64 machines. So, maybe just start with `x86_64` for now and leave a comment that it works for other x86 flavors also? If you have a better way, feel free to propose.

This version creates and tests many implementations, and selects one for publication into llvmlibc.a

Harbormaster completed remote builds in B47589: Diff 247261.Feb 28 2020, 7:19 AM

@sivachandra it seems there's a problem with the NAME attribute in add_entrypoint_object as the symbol being exported in the final library is not memcpy anymore but the provided NAME

 % nm /tmp/llvm-project_rel_compiled-with-clang/projects/libc/lib/libllvmlibc.a

__errno_location.o:
0000000000000000 T __errno_location
0000000000000000 T _ZN11__llvm_libc16__errno_locationEv
0000000000000000 b _ZN11__llvm_libcL7__errnoE

strcpy.o:
                 U memcpy
0000000000000000 T strcpy
                 U strlen
0000000000000000 T _ZN11__llvm_libc6strcpyEPcPKc

strcat.o:
0000000000000000 T strcat
                 U strlen
0000000000000000 T _ZN11__llvm_libc6strcatEPcPKc
                 U _ZN11__llvm_libc6strcpyEPcPKc

memcpy.o:
0000000000000000 A memcpy_x86_64_opt_avx512f
0000000000000000 T _ZN11__llvm_libc25memcpy_x86_64_opt_avx512fEPvPKvm
0000000000000000 T _ZN11__llvm_libc6memcpyEPvPKvm

mmap.o:
0000000000000000 T mmap
                 U _ZN11__llvm_libc16__errno_locationEv
0000000000000000 T _ZN11__llvm_libc4mmapEPvmiiil

munmap.o:
0000000000000000 T munmap
                 U _ZN11__llvm_libc16__errno_locationEv
0000000000000000 T _ZN11__llvm_libc6munmapEPvm

raise.o:
0000000000000000 T raise
0000000000000000 T _ZN11__llvm_libc5raiseEi

libc/cmake/modules/LLVMLibCRules.cmake
160	@sivachandra this should be submitted as a separate patch but it's better to have context for the change.
libc/src/string/CMakeLists.txt
0–1	I'm removing this since all implementations will be tested from now on.
libc/src/string/memcpy.cpp
15	We need to selectively declare the entry point depending on if we generate test or final implementation.

In D74397#1898087, @gchatelet wrote:

@sivachandra it seems there's a problem with the NAME attribute in add_entrypoint_object as the symbol being exported in the final library is not memcpy anymore but the provided NAME

I thought your use case was the other way around: you want to have multiple targets with the same entrypoint name. Is this not correct? The first argument to add_entrypoint_object is the target name (which is in sync with the CMake convention of using target names as the first arg.) The NAME option specifies the entrypoint name.

libc/cmake/modules/LLVMLibCRules.cmake
160	This is fine.
libc/src/string/CMakeLists.txt
27	The `NAME` is the entrypoint name and not the target name. Maybe you mean the opposite here? As in, instead of `memcpy` on line 54, you want `${memcpy_name}` and `memcpy` for the `NAME` argument?
libc/src/string/memcpy.cpp
15	Test or final, all of them should have the same entrypoint name. No? For example, `memcpy_config1_test` will depend on `memcpy_config1` but will call __llvm_libc::memcpy for the test.
libc/src/string/memcpy_arch_specific.h.def
61	This should all be just `memcpy`. Maybe I am missing something still.
libc/test/src/string/CMakeLists.txt
30	This seems to be set up correctly. But, instead of `memcpy_name`, they should be called `memcpy_config_name` to make it clear that they are all `memcpy` implementations in different configs.
36	This as well.
libc/test/src/string/memcpy_test.cpp
12	This should not be required.
46	This should also not be required.

MaskRay added inline comments.Mar 2 2020, 10:00 PM

libc/src/string/CMakeLists.txt
66	Does `__attribute__((target("avx")))` meet the needs? As to ifunc, it needs non-trivial work in the rtld. Even in a -static (but not -static-pie) context, there will be R_X86_64_IRELATIVE relocations and crt should resolve them.

address comments and rebase

I think we misunderstood each other.

My original goal was to also support a shared library with runtime dispatch (postponed, but we'll do it eventually).
In that case we need a single library with all the implementations; they need to have different names to prevent ODR violations, and memcpy would be the dispatching function.
That's why I provided the custom definitions MEMCPY_NAME=${memcpy_name}.

Now I can indeed remove all of this and have different targets generating different object files, all with the same memcpy function in them. This means that I need to create X unit tests and X benchmarks, each linking with a different target but all referring to the same memcpy function (just a different implementation). This works, and I've updated the patch accordingly.

However, it will not be straightforward to go from this to the "shared library with runtime dispatch" state. We would need extra machinery to copy the memcpy function of each implementation into the final object and rename them to prevent ODR violations.

The current implementation is simpler though. Let me know what you think.

libc/src/string/CMakeLists.txt
66	Does `__attribute__((target("avx")))` meet the needs? Kind of, but it's brittle: I've been bitten a few times passing features with a typo, and the compiler will happily compile without a warning. It also won't work with MSVC. For these reasons I'd rather not use it.

Harbormaster failed remote builds in B48818: Diff 249635!Mar 11 2020, 9:37 AM

Mostly LGTM. Few minor questions inline.

libc/cmake/modules/LLVMLibCRules.cmake
202	Sorry for not mentioning this earlier. I have copied this part into another patch of mine.
208	Do we need this still?
376	Likewise, seems to me like `COMPILE_DEFINITIONS` is not used.
libc/src/string/CMakeLists.txt
45	Should we check if `${opt_level}` is available for the machine this target is being built?
libc/test/src/string/memcpy_test.cpp
13	Instead of this, you can include `src/string/memcpy.h`.

Address comments

libc/cmake/modules/LLVMLibCRules.cmake
202	Yes, I think I messed up with the merge, should be fixed.
208	Right now, support for `repmovsb` is disabled. This is a customization point that should be exposed through preprocessor definitions (and CMake options). As of this patch this is not needed but I expect to need it soon-ish.
libc/src/string/CMakeLists.txt
45	Yes, makes sense. I'm still not super happy about the introspection part, it's cumbersome to use... I'll work on it as a separate patch if you don't mind.
libc/test/src/string/memcpy_test.cpp
13	Ha sure !

rebase

Harbormaster completed remote builds in B48972: Diff 249905.Mar 12 2020, 5:15 AM

Harbormaster completed remote builds in B48970: Diff 249903.

rebasing

Harbormaster failed remote builds in B48974: Diff 249908!Mar 12 2020, 6:29 AM

sivachandra accepted this revision.Mar 12 2020, 8:12 AM

sivachandra marked an inline comment as done.

sivachandra added inline comments.

libc/src/string/CMakeLists.txt
45	SGTM

This revision is now accepted and ready to land.Mar 12 2020, 8:12 AM

I revamped the cpu feature part. It's much simpler and easier to understand.
Now tests that can't run on the host will be skipped.

This last modification does not change the memcpy implementation but how we detect cpu features and how we build/test the memcpy implementations.
There are fewer global variables: only ALL_CPU_FEATURES and HOST_CPU_FEATURES.
The flags are built through compute_flags, and host compatibility is tested with host_supports.
All memcpy implementations are added to a global property to be passed down to tests.
Tests can query the target for their required cpu features and test the host against them, hence testing only the implementations that the host can execute.
This also removes the need for individual check files.

Overall I think the design is much simpler and straightforward.
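To make the host compatibility test concrete, here is a small C++ model of the `host_supports` and `_intersection` helpers as they appear in the diff further down. It is a sketch for illustration only: the real helpers are CMake functions operating on semicolon-separated lists, and the `sketch` namespace and `FeatureList` alias are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

namespace sketch {

using FeatureList = std::vector<std::string>;

// Models `_intersection`: keeps the elements of list1 that also occur in
// list2, in the order in which they appear in list1.
inline FeatureList intersection(const FeatureList &list1,
                                const FeatureList &list2) {
  FeatureList out;
  for (const auto &element : list1)
    if (std::find(list2.begin(), list2.end(), element) != list2.end())
      out.push_back(element);
  return out;
}

// Models `host_supports`: the host qualifies when intersecting its features
// with the required ones gives back exactly the required list.
inline bool host_supports(const FeatureList &host_features,
                          const FeatureList &required) {
  return intersection(host_features, required) == required;
}

} // namespace sketch
```

One consequence of comparing whole lists, at least in this model, is that the check is order-sensitive: a required list given in a different order than the host list would be reported as unsupported even if every feature is present.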

As this patch gets closer to landing I think it would make sense to update the description :)

libc/cmake/modules/LLVMLibCCheckCpuFeatures.cmake
10	Is it necessary to print this?
libc/cmake/modules/cpu_features/check_cpu_features.cpp.in
5–11	Is it necessary to define these? Does DEFINITIONS rely on the definitions of macros? Also I think handling MSVC here is strange when `compute_flags` does not currently.
20–21	Why not `putchar` and `puts`?
libc/test/src/string/CMakeLists.txt
32	Would you mind explaining this? It seems like ${flags} will just be -march=native, and the work above to find flags gets ignored.

Harbormaster failed remote builds in B49183: Diff 250296!Mar 13 2020, 2:34 PM

gchatelet edited the summary of this revision. (Show Details)Mar 16 2020, 7:40 AM

Address comments and rebase

libc/cmake/modules/LLVMLibCCheckCpuFeatures.cmake
10	Not really.
libc/cmake/modules/cpu_features/check_cpu_features.cpp.in
5–11	Right, it's probably better to defer anything related to MSVC at that point. Also I'll add documentation to this file because it's a bit obscure.
20–21	I used `putchar` for a single character but I can't use `puts` since it automatically appends a line feed.
libc/test/src/string/CMakeLists.txt
32	Actually there's no need for flags here, the implementation has already been compiled with the correct flags. The test itself doesn't need them.

sivachandra marked an inline comment as done.Mar 16 2020, 9:29 AM

sivachandra added inline comments.

libc/src/string/CMakeLists.txt
83	It seems like this will not ensure the best flags. Is that the intention? If so, why?
libc/src/string/x86_64/CMakeLists.txt
1 ↗	(On Diff #250296)	These listings are already under `x86_64`. So, do we need to use `${LIBC_TARGET_MACHINE}`?
4 ↗	(On Diff #250296)	It would be nice if a descriptive error message is shown if one tries to build a target which isn't supported on their machine. ISTM that it is not the case. If indeed so, we can do it in a later round I think.
libc/test/src/string/CMakeLists.txt
32	Same question for me as well. I left a related comment at a different place above.

sivachandra added inline comments.Mar 16 2020, 10:20 AM

libc/test/src/string/CMakeLists.txt
32	Ah, so remove this line then?

Harbormaster failed remote builds in B49325: Diff 250584!Mar 16 2020, 10:21 AM

gchatelet marked 6 inline comments as done.Mar 16 2020, 11:16 AM

gchatelet added inline comments.

libc/src/string/CMakeLists.txt
83	It will: `-march=native` enables all the features available on the host. Why do you think `-march=native` won't get the best flags? That said, if you're on a `skylake-avx512` machine it will not use avx512 instructions. This is because `-mprefer-vector-width` defaults to 256-bit operations (see this phoronix article). Currently, on a `skylake-avx512` machine the implementation will be the same as the `avx` one. We would have to measure to be sure it's worth forcing `-mprefer-vector-width=512` as well.
libc/src/string/x86_64/CMakeLists.txt
1 ↗	(On Diff #250296)	This is because we redirect both x86 and x86_64 versions to this folder and they ought to have different names. We should probably rename this folder if it's unclear. Maybe `x86_multi` ?
libc/test/src/string/CMakeLists.txt
32	This is a two-step process. First, the implementations get compiled with specific flags. Second, when testing, we retrieve these flags and check whether the current host supports them. If they are compatible, the already compiled `.o` file can run on the host; the test file itself doesn't need to receive the flags (only the implementation does). This will be the same for benchmarking: the benchmarking code does not need to be compiled with `avx` support, only the code under test needs to be. Do you think it deserves a comment?

I think what is outstanding wrt comments is very minor. So, feel free to land and we can iterate if required after landing.

libc/src/string/CMakeLists.txt
83	I was of the opinion that the compilers do not have the complete set of capabilities available to them and that is why we have facilities like `HWCAP`, `cpuid` etc. But, you are the expert, and if you say what you have is enough, I will take it :)
libc/src/string/x86_64/CMakeLists.txt
1 ↗	(On Diff #250296)	Normal convention is to name it just `x86`. So, name it just `x86` then?
libc/test/src/string/CMakeLists.txt
32	AFAICT, `compute_flags` does not have any side effects. Also, it doesn't look like `${flags}` is used anywhere here. So, why call it at all?

Address comments and rebase

libc/src/string/CMakeLists.txt
83	With `-march=native` the compiler will introspect the CPU with cpuid and detect the available capabilities. We're deferring to the compiler here. Now for shared libraries and runtime dispatch we'll have to provide such code but we're not there yet.
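The runtime introspection code mentioned above as future work could start from something like the following sketch. It assumes GCC or Clang on x86, where `<cpuid.h>` provides `__get_cpuid`, and it reports `false` on any other toolchain or architecture. The `sketch` namespace, `kTargetIsX86_64`, and `host_has_sse2` are hypothetical names; nothing here is part of the patch.

```cpp
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define SKETCH_HAS_CPUID 1
#else
#define SKETCH_HAS_CPUID 0
#endif

namespace sketch {

// True when this translation unit targets x86_64 with a GCC-compatible
// compiler.
constexpr bool kTargetIsX86_64 =
#if defined(__x86_64__)
    true;
#else
    false;
#endif

// Asks the running CPU, rather than the compiler, whether SSE2 is available.
// CPUID leaf 1 reports SSE2 in bit 26 of EDX.
inline bool host_has_sse2() {
#if SKETCH_HAS_CPUID
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
  if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
    return false;
  return (edx & (1u << 26)) != 0;
#else
  return false;
#endif
}

} // namespace sketch
```

Wider features such as AVX need more than a CPUID bit: the operating system must also save the extended registers, which is checked separately. That is one reason a complete runtime dispatcher is more work than this sketch suggests.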
libc/test/src/string/CMakeLists.txt
32	Yes thank you it was a leftover.

Thank you @sivachandra and @abrachet for the review.
I'll submit it as-is for now and we can iterate from here.

In D74397#1929348, @gchatelet wrote:

Thank you @sivachandra and @abrachet for the review.
I'll submit it as-is for now and we can iterate from here.

No really, thank you for patiently working this out. The bots are green so it is sticking.

Harbormaster failed remote builds in B49615: Diff 251115!Mar 18 2020, 10:20 AM

Closed by commit rG04a309dd0be3: [libc] Adding memcpy implementation for x86_64 (authored by gchatelet). · Explain WhyMar 18 2020, 10:20 AM

This revision was automatically updated to reflect the committed changes.

This change breaks -DLLVM_INCLUDE_TESTS=OFF:

$ cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS=libc -DLLVM_INCLUDE_TESTS=OFF ../llvm
...
CMake Error at /home/nathan/src/llvm-project/libc/test/src/string/memory_utils/CMakeLists.txt:13 (target_compile_definitions):
  Cannot specify compile definitions for target "utils_test" which is not
  built by this project.
...

abrachet mentioned this in D76577: [libc] Don't configure test and fuzzer when -DLLVM_INCLUDE_TESTS=OFF.Mar 22 2020, 3:38 PM

abrachet mentioned this in rGa1762f9ceb95: [libc] Don't configure test and fuzzer when -DLLVM_INCLUDE_TESTS=OFF.Mar 22 2020, 11:06 PM

Revision Contents

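Path                                                           Size
libc/CMakeLists.txt                                            1 line
libc/cmake/modules/LLVMLibCCheckCpuFeatures.cmake              180 lines
libc/cmake/modules/LLVMLibCRules.cmake                         9 lines
libc/cmake/modules/cpu_features/check_avx.cpp                  (deleted)
libc/cmake/modules/cpu_features/check_avx512f.cpp              (deleted)
libc/cmake/modules/cpu_features/check_cpu_features.cpp.in      29 lines
libc/cmake/modules/cpu_features/check_sse.cpp                  (deleted)
libc/cmake/modules/cpu_features/check_sse2.cpp                 (deleted)
libc/lib/CMakeLists.txt                                        1 line
libc/src/string/CMakeLists.txt                                 63 lines
libc/src/string/memcpy.h                                       21 lines
libc/src/string/memcpy.cpp                                     22 lines
libc/src/string/memcpy_arch_specific.h.def                     65 lines
libc/src/string/memory_utils/CMakeLists.txt                    7 lines
libc/src/string/memory_utils/memcpy_utils.h                    100 lines
libc/src/string/memory_utils/utils.h                           7 lines
libc/src/string/x86/CMakeLists.txt                             4 lines
libc/src/string/x86/memcpy_arch_specific.h.inc                 35 lines
libc/test/src/string/CMakeLists.txt                            21 lines
libc/test/src/string/memcpy_test.cpp                           53 lines
libc/test/src/string/memory_utils/CMakeLists.txt               7 lines
libc/test/src/string/memory_utils/memcpy_utils_test.cpp        208 lines
libc/test/src/string/memory_utils/utils_test.cpp               8 lines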

Diff 251128

libc/CMakeLists.txt

	Show All 15 Lines

	set(LIBC_TARGET_OS ${CMAKE_SYSTEM_NAME})			set(LIBC_TARGET_OS ${CMAKE_SYSTEM_NAME})
	string(TOLOWER ${LIBC_TARGET_OS} LIBC_TARGET_OS)			string(TOLOWER ${LIBC_TARGET_OS} LIBC_TARGET_OS)

	set(LIBC_TARGET_MACHINE ${CMAKE_SYSTEM_PROCESSOR})			set(LIBC_TARGET_MACHINE ${CMAKE_SYSTEM_PROCESSOR})

	include(CMakeParseArguments)			include(CMakeParseArguments)
	include(LLVMLibCRules)			include(LLVMLibCRules)
				include(LLVMLibCCheckCpuFeatures)

	add_subdirectory(src)			add_subdirectory(src)
	add_subdirectory(config)			add_subdirectory(config)
	add_subdirectory(include)			add_subdirectory(include)
	add_subdirectory(utils)			add_subdirectory(utils)

	# The lib and test directories are added at the very end as tests			# The lib and test directories are added at the very end as tests
	# and libraries potentially draw from the components present in all			# and libraries potentially draw from the components present in all
	# of the other directories.			# of the other directories.
	add_subdirectory(lib)			add_subdirectory(lib)
	add_subdirectory(test)			add_subdirectory(test)
	add_subdirectory(fuzzing)			add_subdirectory(fuzzing)

libc/cmake/modules/LLVMLibCCheckCpuFeatures.cmake

	#------------------------------------------------------------------------------			# ------------------------------------------------------------------------------
	# Cpu features definition and flags			# Cpu features definition and flags
	#
	# Declare a list of all supported cpu features in ALL_CPU_FEATURES.
	#
	# Declares associated flags to enable/disable individual feature of the form:
	# - CPU_FEATURE_<FEATURE>_ENABLE_FLAG
	# - CPU_FEATURE_<FEATURE>_DISABLE_FLAG
	#
	#------------------------------------------------------------------------------			# ------------------------------------------------------------------------------

	if(${LIBC_TARGET_MACHINE} MATCHES "x86\|x86_64")			if(${LIBC_TARGET_MACHINE} MATCHES "x86\|x86_64")
	set(ALL_CPU_FEATURES SSE SSE2 AVX AVX512F)			set(ALL_CPU_FEATURES SSE SSE2 AVX AVX2 AVX512F)
	endif()			endif()

	function(_define_cpu_feature_flags feature)			list(SORT ALL_CPU_FEATURES)
	if(${CMAKE_CXX_COMPILER_ID} MATCHES "Clang")
				# Function to check whether the host supports the provided set of features.
				# Usage:
				# host_supports(
				# <output variable>
				# <list of cpu features>
				# )
				function(host_supports output_var features)
				_intersection(a "${HOST_CPU_FEATURES}" "${features}")
				if("${a}" STREQUAL "${features}")
				set(${output_var} TRUE PARENT_SCOPE)
				else()
				unset(${output_var} PARENT_SCOPE)
				endif()
				endfunction()

				# Function to compute the flags to pass down to the compiler.
				# Usage:
				# compute_flags(
				# <output variable>
				# MARCH <arch name or "native">
				# REQUIRE <list of mandatory features to enable>
				# REJECT <list of features to disable>
				# )
				function(compute_flags output_var)
				cmake_parse_arguments(
				"COMPUTE_FLAGS"
				"" # Optional arguments
				"MARCH" # Single value arguments
				"REQUIRE;REJECT" # Multi value arguments
				${ARGN})
				# Check that features are not required and rejected at the same time.
				if(COMPUTE_FLAGS_REQUIRE AND COMPUTE_FLAGS_REJECT)
				_intersection(var ${COMPUTE_FLAGS_REQUIRE} ${COMPUTE_FLAGS_REJECT})
				if(var)
				message(FATAL_ERROR "Cpu Features REQUIRE and REJECT ${var}")
				endif()
				endif()
				# Generate the compiler flags in `current`.
				if(${CMAKE_CXX_COMPILER_ID} MATCHES "Clang\|GNU")
				if(COMPUTE_FLAGS_MARCH)
				list(APPEND current "-march=${COMPUTE_FLAGS_MARCH}")
				endif()
				foreach(feature IN LISTS COMPUTE_FLAGS_REQUIRE)
				string(TOLOWER ${feature} lowercase_feature)
				list(APPEND current "-m${lowercase_feature}")
				endforeach()
				foreach(feature IN LISTS COMPUTE_FLAGS_REJECT)
	string(TOLOWER ${feature} lowercase_feature)			string(TOLOWER ${feature} lowercase_feature)
	set(CPU_FEATURE_${feature}_ENABLE_FLAG "-m${lowercase_feature}" PARENT_SCOPE)			list(APPEND current "-mno-${lowercase_feature}")
	set(CPU_FEATURE_${feature}_DISABLE_FLAG "-mno-${lowercase_feature}" PARENT_SCOPE)			endforeach()
	else()			else()
	# In future, we can extend for other compilers.			# In future, we can extend for other compilers.
	message(FATAL_ERROR "Unkown compiler ${CMAKE_CXX_COMPILER_ID}.")			message(FATAL_ERROR "Unkown compiler ${CMAKE_CXX_COMPILER_ID}.")
	endif()			endif()
				# Export the list of flags.
				set(${output_var} "${current}" PARENT_SCOPE)
	endfunction()			endfunction()

	# Defines cpu features flags
	foreach(feature IN LISTS ALL_CPU_FEATURES)
	_define_cpu_feature_flags(${feature})
	endforeach()

	#------------------------------------------------------------------------------			# ------------------------------------------------------------------------------
	# Optimization level flags			# Internal helpers and utilities.
	#
	# Generates the set of flags needed to compile for a up to a particular
	# optimization level.
	#
	# Creates variables of the form `CPU_FEATURE_OPT_<FEATURE>_FLAGS`.
	# CPU_FEATURE_OPT_NONE_FLAGS is a special flag for which no feature is needed.
	#
	# e.g.
	# CPU_FEATURE_OPT_NONE_FLAGS : -mno-sse;-mno-sse2;-mno-avx;-mno-avx512f
	# CPU_FEATURE_OPT_SSE_FLAGS : -msse;-mno-sse2;-mno-avx;-mno-avx512f
	# CPU_FEATURE_OPT_SSE2_FLAGS : -msse;-msse2;-mno-avx;-mno-avx512f
	# CPU_FEATURE_OPT_AVX_FLAGS : -msse;-msse2;-mavx;-mno-avx512f
	# CPU_FEATURE_OPT_AVX512F_FLAGS : -msse;-msse2;-mavx;-mavx512f
	#------------------------------------------------------------------------------			# ------------------------------------------------------------------------------

	# Helper function to concatenate flags needed to support optimization up to			# Computes the intersection between two lists.
	# a particular feature.			function(_intersection output_var list1 list2)
	function(_generate_flags_for_up_to feature flag_variable)			foreach(element IN LISTS list1)
	list(FIND ALL_CPU_FEATURES ${feature} feature_index)			if("${list2}" MATCHES "(^\|;)${element}(;\|$)")
	foreach(current_feature IN LISTS ALL_CPU_FEATURES)			list(APPEND tmp "${element}")
	list(FIND ALL_CPU_FEATURES ${current_feature} current_feature_index)
	if(${current_feature_index} GREATER ${feature_index})
	list(APPEND flags ${CPU_FEATURE_${current_feature}_DISABLE_FLAG})
	else()
	list(APPEND flags ${CPU_FEATURE_${current_feature}_ENABLE_FLAG})
	endif()			endif()
	endforeach()			endforeach()
	set(${flag_variable} ${flags} PARENT_SCOPE)			set(${output_var} ${tmp} PARENT_SCOPE)
	endfunction()			endfunction()

	function(_generate_opt_levels)			# Generates a cpp file to introspect the compiler defined flags.
	set(opt_levels NONE)			function(_generate_check_code)
	list(APPEND opt_levels ${ALL_CPU_FEATURES})			foreach(feature IN LISTS ALL_CPU_FEATURES)
	foreach(feature IN LISTS opt_levels)			set(DEFINITIONS
	set(flag_name "CPU_FEATURE_OPT_${feature}_FLAGS")			"${DEFINITIONS}
	_generate_flags_for_up_to(${feature} ${flag_name})			#ifdef __${feature}__
	set(${flag_name} ${${flag_name}} PARENT_SCOPE)			\"${feature}\",
				#endif")
	endforeach()			endforeach()
				configure_file(
				"${LIBC_SOURCE_DIR}/cmake/modules/cpu_features/check_cpu_features.cpp.in"
				"cpu_features/check_cpu_features.cpp" @ONLY)
	endfunction()			endfunction()
				_generate_check_code()

	_generate_opt_levels()			# Compiles and runs the code generated above with the specified requirements.
				# This is helpful to infer which features a particular target supports or if
	#------------------------------------------------------------------------------			# a specific features implies other features (e.g. BMI2 implies SSE2 and SSE).
	# Host cpu feature introspection			function(_check_defined_cpu_feature output_var)
	#			cmake_parse_arguments(
	# Populates a HOST_CPU_FEATURES list containing the available CPU_FEATURE.			"CHECK_DEFINED"
	#------------------------------------------------------------------------------			"" # Optional arguments
	function(_check_host_cpu_feature feature)			"MARCH" # Single value arguments
	string(TOLOWER ${feature} lowercase_feature)			"REQUIRE;REJECT" # Multi value arguments
				${ARGN})
				compute_flags(
				flags
				MARCH ${CHECK_DEFINED_MARCH}
				REQUIRE ${CHECK_DEFINED_REQUIRE}
				REJECT ${CHECK_DEFINED_REJECT})
	try_run(			try_run(
	run_result			run_result compile_result "${CMAKE_CURRENT_BINARY_DIR}/check_${feature}"
	compile_result			"${CMAKE_CURRENT_BINARY_DIR}/cpu_features/check_cpu_features.cpp"
	"${CMAKE_CURRENT_BINARY_DIR}/check_${lowercase_feature}"			COMPILE_DEFINITIONS ${flags}
	"${CMAKE_MODULE_PATH}/cpu_features/check_${lowercase_feature}.cpp"			COMPILE_OUTPUT_VARIABLE compile_output
	COMPILE_DEFINITIONS ${CPU_FEATURE_${feature}_ENABLE_FLAG}			RUN_OUTPUT_VARIABLE run_output)
	OUTPUT_VARIABLE compile_output
	)
	if(${compile_result} AND ("${run_result}" EQUAL 0))			if(${compile_result} AND ("${run_result}" EQUAL 0))
	list(APPEND HOST_CPU_FEATURES ${feature})			set(${output_var}
	set(HOST_CPU_FEATURES ${HOST_CPU_FEATURES} PARENT_SCOPE)			"${run_output}"
				PARENT_SCOPE)
				else()
				message(FATAL_ERROR "${compile_output}")
	endif()			endif()
	endfunction()			endfunction()

	foreach(feature IN LISTS ALL_CPU_FEATURES)			# Populates the HOST_CPU_FEATURES list.
	_check_host_cpu_feature(${feature})			_check_defined_cpu_feature(HOST_CPU_FEATURES MARCH native)
	endforeach()
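The flag construction performed by `compute_flags` in the diff above can be modelled in a few lines of C++. This is only a sketch of the Clang/GCC branch: the `sketch` namespace, the `List` alias, and the use of an exception in place of `message(FATAL_ERROR ...)` are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

namespace sketch {

using List = std::vector<std::string>;

inline std::string to_lower(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Models `compute_flags`: an optional -march flag, then -m<feature> for each
// required feature and -mno-<feature> for each rejected one.
inline List compute_flags(const std::string &march, const List &require,
                          const List &reject) {
  // A feature cannot be required and rejected at the same time.
  for (const auto &feature : require)
    if (std::find(reject.begin(), reject.end(), feature) != reject.end())
      throw std::invalid_argument("Cpu Features REQUIRE and REJECT " + feature);

  List flags;
  if (!march.empty())
    flags.push_back("-march=" + march);
  for (const auto &feature : require)
    flags.push_back("-m" + to_lower(feature));
  for (const auto &feature : reject)
    flags.push_back("-mno-" + to_lower(feature));
  return flags;
}

} // namespace sketch
```

For example, requiring `SSE2` and rejecting `AVX512F` with `native` as the architecture gives `-march=native`, `-msse2`, `-mno-avx512f`, in that order.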

libc/cmake/modules/LLVMLibCRules.cmake

Show First 20 Lines • Show All 151 Lines • ▼ Show 20 Lines
# Usage:		# Usage:
# add_entrypoint_object(		# add_entrypoint_object(
# <target_name>		# <target_name>
# [REDIRECTED] # Specified if the entrypoint is redirected.		# [REDIRECTED] # Specified if the entrypoint is redirected.
# [NAME] <the C name of the entrypoint if different from target_name>		# [NAME] <the C name of the entrypoint if different from target_name>
# SRCS <list of .cpp files>		# SRCS <list of .cpp files>
# HDRS <list of .h files>		# HDRS <list of .h files>
# DEPENDS <list of dependencies>		# DEPENDS <list of dependencies>
# COMPILE_OPTIONS <optional list of special compile options for this target>		# COMPILE_OPTIONS <optional list of special compile options for this target>
# SPECIAL_OBJECTS <optional list of special object targets added by the rule `add_object`>		# SPECIAL_OBJECTS <optional list of special object targets added by the rule `add_object`>
# )		# )
function(add_entrypoint_object target_name)		function(add_entrypoint_object target_name)
cmake_parse_arguments(		cmake_parse_arguments(
"ADD_ENTRYPOINT_OBJ"		"ADD_ENTRYPOINT_OBJ"
"REDIRECTED" # Optional argument		"REDIRECTED" # Optional argument
"NAME" # Single value arguments		"NAME" # Single value arguments
"SRCS;HDRS;SPECIAL_OBJECTS;DEPENDS;COMPILE_OPTIONS" # Multi value arguments		"SRCS;HDRS;SPECIAL_OBJECTS;DEPENDS;COMPILE_OPTIONS" # Multi value arguments
Show All 25 Lines	target_compile_options(
PRIVATE		PRIVATE
-fpie ${LLVM_CXX_STD_default}		-fpie ${LLVM_CXX_STD_default}
)		)
target_include_directories(		target_include_directories(
${target_name}_objects		${target_name}_objects
PRIVATE		PRIVATE
"${LIBC_BUILD_DIR}/include;${LIBC_SOURCE_DIR};${LIBC_BUILD_DIR}"		"${LIBC_BUILD_DIR}/include;${LIBC_SOURCE_DIR};${LIBC_BUILD_DIR}"
)		)
add_dependencies(		add_dependencies(
${target_name}_objects		${target_name}_objects
support_common_h		support_common_h
)		)
if(ADD_ENTRYPOINT_OBJ_DEPENDS)		if(ADD_ENTRYPOINT_OBJ_DEPENDS)
add_dependencies(		add_dependencies(
${target_name}_objects		${target_name}_objects
${ADD_ENTRYPOINT_OBJ_DEPENDS}		${ADD_ENTRYPOINT_OBJ_DEPENDS}
)		)
endif()		endif()
if(ADD_ENTRYPOINT_OBJ_COMPILE_OPTIONS)		if(ADD_ENTRYPOINT_OBJ_COMPILE_OPTIONS)
target_compile_options(		target_compile_options(
${target_name}_objects		${target_name}_objects
PRIVATE ${ADD_ENTRYPOINT_OBJ_COMPILE_OPTIONS}		PRIVATE ${ADD_ENTRYPOINT_OBJ_COMPILE_OPTIONS}
)		)
▲ Show 20 Lines • Show All 150 Lines • ▼ Show 20 Lines
# Rule to add a libc unittest.		# Rule to add a libc unittest.
# Usage		# Usage
# add_libc_unittest(		# add_libc_unittest(
# <target name>		# <target name>
# SUITE <name of the suite this test belongs to>		# SUITE <name of the suite this test belongs to>
# SRCS <list of .cpp files for the test>		# SRCS <list of .cpp files for the test>
# HDRS <list of .h files for the test>		# HDRS <list of .h files for the test>
# DEPENDS <list of dependencies>		# DEPENDS <list of dependencies>
		# COMPILE_OPTIONS <list of special compile options for this target>
# )		# )
function(add_libc_unittest target_name)		function(add_libc_unittest target_name)
if(NOT LLVM_INCLUDE_TESTS)		if(NOT LLVM_INCLUDE_TESTS)
return()		return()
endif()		endif()

cmake_parse_arguments(		cmake_parse_arguments(
"LIBC_UNITTEST"		"LIBC_UNITTEST"
"" # No optional arguments		"" # No optional arguments
"SUITE" # Single value arguments		"SUITE" # Single value arguments
"SRCS;HDRS;DEPENDS" # Multi-value arguments		"SRCS;HDRS;DEPENDS;COMPILE_OPTIONS" # Multi-value arguments
${ARGN}		${ARGN}
)		)
if(NOT LIBC_UNITTEST_SRCS)		if(NOT LIBC_UNITTEST_SRCS)
message(FATAL_ERROR "'add_libc_unittest' target requires a SRCS list of .cpp files.")		message(FATAL_ERROR "'add_libc_unittest' target requires a SRCS list of .cpp files.")
endif()		endif()
if(NOT LIBC_UNITTEST_DEPENDS)		if(NOT LIBC_UNITTEST_DEPENDS)
message(FATAL_ERROR "'add_libc_unittest' target requires a DEPENDS list of 'add_entrypoint_object' targets.")		message(FATAL_ERROR "'add_libc_unittest' target requires a DEPENDS list of 'add_entrypoint_object' targets.")
endif()		endif()
Show All 21 Lines	function(add_libc_unittest target_name)
)		)
target_include_directories(		target_include_directories(
${target_name}		${target_name}
PRIVATE		PRIVATE
${LIBC_SOURCE_DIR}		${LIBC_SOURCE_DIR}
${LIBC_BUILD_DIR}		${LIBC_BUILD_DIR}
${LIBC_BUILD_DIR}/include		${LIBC_BUILD_DIR}/include
)		)
		if(LIBC_UNITTEST_COMPILE_OPTIONS)
		target_compile_options(
		${target_name}
		PRIVATE ${LIBC_UNITTEST_COMPILE_OPTIONS}
		)
		endif()

if(library_deps)		if(library_deps)
target_link_libraries(${target_name} PRIVATE ${library_deps})		target_link_libraries(${target_name} PRIVATE ${library_deps})
endif()		endif()

set_target_properties(${target_name} PROPERTIES RUNTIME_OUTPUT_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})		set_target_properties(${target_name} PROPERTIES RUNTIME_OUTPUT_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})

add_dependencies(		add_dependencies(
▲ Show 20 Lines • Show All 132 Lines • Show Last 20 Lines

libc/cmake/modules/cpu_features/check_avx.cpp

This file was deleted.

	#if !defined __AVX__
	#error "missing __AVX__"
	#endif
	#include <immintrin.h>
	int main() {
	  (void)_mm256_set1_epi8('0');
	  return 0;
	}

libc/cmake/modules/cpu_features/check_avx512f.cpp

This file was deleted.

	#if !defined __AVX512F__
	#error "missing __AVX512F__"
	#endif
	#include <immintrin.h>
	int main() {
	  (void)_mm512_undefined();
	  return 0;
	}

libc/cmake/modules/cpu_features/check_cpu_features.cpp.in

This file was added.

				#include <cstdio>
				#include <cstdlib>

				// This file is instantiated by CMake.
				// DEFINITIONS below is replaced with a set of lines like so:
				// #ifdef __SSE2__
				// "SSE2",
				// #endif
				//
				// This allows for introspection of compiler definitions.
				// The output of the program is a single line of semi colon separated feature
				abrachetUnsubmitted Done Reply Inline Actions Is it necessary to define these? Does DEFINITIONS rely on the definitions of macros? Also I think handling MSVC here is strange when `compute_flags` does not currently. abrachet: Is it necessary to define these? Does DEFINITIONS rely on the definitions of macros? Also I…
				gchateletAuthorUnsubmitted Done Reply Inline Actions Right, it's probably better to defer anything related to MSVC at that point. Also I'll add documentation to this file because it's a bit obscure. gchatelet: Right, it's probably better to defer anything related to MSVC at that point. Also I'll add…
				// names.

				// MSVC is using a different set of preprocessor definitions for
				// SSE and SSE2, see _M_IX86_FP in
				// https://docs.microsoft.com/en-us/cpp/preprocessor/predefined-macros

				int main(int, char **) {
				const char *strings[] = {
				@DEFINITIONS@
				};
				abrachet: Why not `putchar` and `puts`?
				gchatelet: I used `putchar` for a single character but I can't use `puts` since it automatically appends a line feed.
				const size_t size = sizeof(strings) / sizeof(strings[0]);
				for (size_t i = 0; i < size; ++i) {
				if (i)
				putchar(';');
				fputs(strings[i], stdout);
				}
				return EXIT_SUCCESS;
				}
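
To make the template above concrete, here is a minimal sketch of the joining step it performs once CMake has substituted `@DEFINITIONS@`. The helper name `join_features` is hypothetical and not part of the patch; the patch writes directly to stdout with `putchar` and `fputs`, whereas this sketch returns a string so the result is easy to inspect.

```cpp
#include <stddef.h>
#include <string>

// Hypothetical helper (not in the patch): joins feature names with ';'
// and no trailing line feed, mirroring the loop in the generated program.
std::string join_features(const char *const *names, size_t size) {
  std::string out;
  for (size_t i = 0; i < size; ++i) {
    if (i)
      out += ';'; // separator only between entries, never at the end
    out += names[i];
  }
  return out;
}
```

With an array holding "SSE", "SSE2" and "AVX", the sketch yields the single line `SSE;SSE2;AVX`, which CMake can treat directly as a list because `;` is its list separator.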

libc/cmake/modules/cpu_features/check_sse.cpp

This file was deleted.

	#if !defined __SSE__
	#error "missing __SSE__"
	#endif
	#include <immintrin.h>
	int main() {
	(void)_mm_set_ss(1.0f);
	return 0;
	}

libc/cmake/modules/cpu_features/check_sse2.cpp

This file was deleted.

	#if !defined __SSE2__
	#error "missing __SSE2__"
	#endif
	#include <immintrin.h>
	int main() {
	(void)_mm_set1_epi8('0');
	return 0;
	}

libc/lib/CMakeLists.txt


	add_entrypoint_library(			add_entrypoint_library(
	llvmlibc			llvmlibc
	DEPENDS			DEPENDS
	# assert.h entrypoints			# assert.h entrypoints
	__assert_fail			__assert_fail

	# errno.h entrypoints			# errno.h entrypoints
	__errno_location			__errno_location

	# string.h entrypoints			# string.h entrypoints
	strcpy			strcpy
	strcat			strcat
				memcpy

	# sys/mman.h entrypoints			# sys/mman.h entrypoints
	mmap			mmap
	munmap			munmap

	# signal.h entrypoints			# signal.h entrypoints
	raise			raise
	sigaction			sigaction

libc/src/string/CMakeLists.txt

				add_subdirectory(memory_utils)

	add_entrypoint_object(			add_entrypoint_object(
	gchatelet: I'm removing this since all implementations will be tested from now on.
	strcat			strcat
	SRCS			SRCS
	strcat.cpp			strcat.cpp
	HDRS			HDRS
	strcat.h			strcat.h
	DEPENDS			DEPENDS
	strcpy			strcpy
	string_h			string_h
				abrachet: llvm-nm (https://github.com/llvm/llvm-project/blob/master/llvm/tools/llvm-nm/llvm-nm.cpp#L811), the nm that ships with MacOS, prints just the symbol for mach-o files and not its value or type character, so grepping for U wouldn't work here (in this very niche case that I just happened to test). If we're using --undefined-only we could just `grep .` perhaps?
				gchatelet: thx for letting me know. I guess `grep .` will work.
	)			)

	add_entrypoint_object(			add_entrypoint_object(
	strcpy			strcpy
	SRCS			SRCS
	strcpy.cpp			strcpy.cpp
	HDRS			HDRS
	strcpy.h			strcpy.h
	DEPENDS			DEPENDS
	string_h			string_h
	)			)

	add_subdirectory(memory_utils)			# ------------------------------------------------------------------------------
				# memcpy
				# ------------------------------------------------------------------------------

				sivachandra: The `NAME` is the entrypoint name and not the target name. Maybe you mean the opposite here? As in, instead of `memcpy` on line 54, you want `${memcpy_name}` and `memcpy` for the `NAME` argument?
				# include the relevant architecture specific implementations
				if(${LIBC_TARGET_MACHINE} STREQUAL "x86_64")
				set(LIBC_MEMCPY_IMPL_FOLDER "x86")
				else()
				set(LIBC_MEMCPY_IMPL_FOLDER ${LIBC_TARGET_MACHINE})
				endif()

				add_gen_header(
				memcpy_arch_specific
				DEF_FILE
				memcpy_arch_specific.h.def
				GEN_HDR
				memcpy_arch_specific.h
				PARAMS
				memcpy_arch_specific=${LIBC_MEMCPY_IMPL_FOLDER}/memcpy_arch_specific.h.inc
				DATA_FILES
				${LIBC_MEMCPY_IMPL_FOLDER}/memcpy_arch_specific.h.inc
				)
				sivachandra: Should we check if `${opt_level}` is available for the machine this target is being built?
				gchatelet: Yes, makes sense. I'm still not super happy about the introspection part, it's cumbersome to use... I'll work on it as a separate patch if you don't mind.
				sivachandra: SGTM

				# Helper to define an implementation of memcpy.
				# - Computes flags to satisfy required/rejected features and arch,
				# - Declares an entry point,
				# - Attaches the REQUIRE_CPU_FEATURES property to the target,
				# - Adds the target to the `memcpy_implementations` global property for tests.
				function(add_memcpy memcpy_name)
				cmake_parse_arguments(
				"ADD_MEMCPY"
				"" # Optional arguments
				"MARCH" # Single value arguments
				"REQUIRE;REJECT" # Multi value arguments
				${ARGN})
				compute_flags(flags
				MARCH ${ADD_MEMCPY_MARCH}
				REQUIRE ${ADD_MEMCPY_REQUIRE}
				REJECT ${ADD_MEMCPY_REJECT}
				)
				add_entrypoint_object(
				${memcpy_name}
				SRCS ${LIBC_SOURCE_DIR}/src/string/memcpy.cpp
				gchatelet: This now creates the following memcpy implementations:
				- `__llvm_libc::memcpy_x86_64_avx512`
				- `__llvm_libc::memcpy_x86_64_avx`
				- `__llvm_libc::memcpy_x86_64_sse2`
				- `__llvm_libc::memcpy_x86_64_sse`
				- `__llvm_libc::memcpy_x86_64_unopt`
				For shared libc, we need an ifunc-like trampoline to select the correct version. For static libc, we need to select an implementation.
				@sivachandra how do you see this kind of code generation integrating into the more general cmake functions from `libc/cmake/modules/LLVMLibCRules.cmake`? I expect other memory functions to follow the same scheme.
				sivachandra: Instead of building all the possible implementations, could we use the CMake `try_compile` (https://cmake.org/cmake/help/v3.14/command/try_compile.html) and/or `try_run` (https://cmake.org/cmake/help/v3.14/command/try_run.html) command to sniff out the best flags to use? I think `try_run` is more appropriate as I expect that we need to run the `cpuid` instruction? Also, compilers have a convenience macro `__cpuid` to run this instruction on x86/x86_64?
				BTW, one can have ifuncs in static libraries as well. But, I do understand we want to avoid the overhead of the added indirection, so sniffing out at configure time is the best. If we can set up something for configure time sniffing, I believe we should be able to use it (maybe with a straightforward extension/modification if required) as the ifunc selector as well.
				abrachet: I think this is going to get quickly outside of the scope of this patch. It would probably be better to work on this in a separate patch.
				sivachandra: If there are any infrastructure pieces required to build a full solution, then we can do them separately as a prerequisite. However, if correct compile options are critical to the memcpy implementation, then any code added to deduce them should belong to this patch. I agree that the patch will start to become too big. But, I think the additions to memcpy_utils and their tests can be split out to another prerequisite patch.
				gchatelet: @sivachandra `try_compile` and `try_run` will only work when the host is also the target of the build. For instance it won't work when cross compiling. There are a few dimensions here:
				1. static library
				   - autodetect which implementation to use with `try_compile` / `try_run`
				   - manually choose the implementation for cross compilation
				2. dynamic library
				   - select a set of implementations to include in the library and the associated dispatch code
				3. test
				   - test all the implementations that can run on the host
				I agree with both of you that this is getting too big for this patch. Let me do my homework and come back with the prerequisite patches.
				sivachandra: We don't need to worry about #2 for now. For cross-compilation in case of #1, if we set up the build vars appropriately, I believe it should not be a problem. #3 is currently a problem because of the way the `add_entrypoint_object` rule works: it expects the entrypoint name and the target name to be the same. We can add an optional argument to the rule which explicitly provides the entrypoint name. This will allow us to add multiple targets for the same entrypoint. We can set up our targets as follows:
				src/string/
				  - x86_64           # x86_64 specific directory.
				    - CMakeLists.txt # Lists the targets for the various x86_64 flavors which all use the single memcpy.cpp source file
				  - CMakeLists.txt   # Lists the target for the release version of memcpy built using the best compile for the target machine.
				  - memcpy.cpp       # The actual platform independent memcpy implementation.
				The same structure can be followed for tests as well. WDYT?
				gchatelet: @sivachandra SGTM. Can you try to amend `add_entrypoint_object` in a separate patch? Separately, I came up with a CMake function to test cpu flags on linux/mac; this will help for `3` and `2`. `x86` and `x86_64` will have the same implementation. Any suggestion on how to name the directory? `x86_32_64`?
				sivachandra: Done now: https://reviews.llvm.org/D74948
				Conventionally, pieces common to x86_64 and other x86 flavors are all put together in a directory named `x86`. I do not have an easy way to test on non-x86_64 machines. So, maybe just start with `x86_64` for now and leave a comment that it works for other x86 flavors also? If you have a better way, feel free to propose.
				gchatelet: Let me think about it, I'll get back with the required building blocks.
				MaskRay: Does `__attribute__((target("avx")))` meet the needs? As to ifunc, it needs non-trivial work in the rtld. Even in a -static (but not -static-pie) context, there will be R_X86_64_IRELATIVE relocations and crt should resolve them.
				gchatelet: > Does `__attribute__((target("avx")))` meet the needs?
				Kind of, but
				- It's brittle, I've been bitten a few times passing features with typos and the compiler will happily compile without a warning.
				- It won't work with MSVC.
				For these reasons I'd rather not use it.
				HDRS ${LIBC_SOURCE_DIR}/src/string/memcpy.h
				DEPENDS
				string_h
				memory_utils
				memcpy_arch_specific
				COMPILE_OPTIONS
				-fno-builtin-memcpy
				${flags}
				)
				set_target_properties(${memcpy_name} PROPERTIES REQUIRE_CPU_FEATURES "${ADD_MEMCPY_REQUIRE}")
				get_property(all GLOBAL PROPERTY memcpy_implementations)
				list(APPEND all ${memcpy_name})
				set_property(GLOBAL PROPERTY memcpy_implementations "${all}")
				endfunction()

				add_subdirectory(${LIBC_MEMCPY_IMPL_FOLDER})
				add_memcpy(memcpy MARCH native)
				sivachandra: It seems like this will not ensure the best flags. Is that the intention? If so, why?
				gchatelet: It will, `-march=native` enables all the features available on the host. Why do you think `-march=native` won't get the best flags? That said, if you're on a `skylake-avx512` machine it will not use avx512 instructions. This is because `-mprefer-vector-width` defaults to 256 bit width operations (see this phoronix article). Currently, if on a `skylake-avx512` machine the implementation will be the same as the `avx` one. We would have to measure to be sure it's worth forcing `-mprefer-vector-width=512` as well.
				sivachandra: I was of the opinion that the compilers do not have the complete set of capabilities available to them and that is why we have facilities like `HWCAP`, `cpuid` etc. But, you are the expert, and if you say what you have is enough, I will take it :)
				gchatelet: With `-march=native` the compiler will introspect the CPU with cpuid and detect the available capabilities. We're deferring to the compiler here. Now for shared libraries and runtime dispatch we'll have to provide such code but we're not there yet.

libc/src/string/memcpy.h

This file was added.

				//===----------------- Implementation header for memcpy -------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIBC_SRC_STRING_MEMCPY_H
				#define LLVM_LIBC_SRC_STRING_MEMCPY_H

				#include "include/string.h"
				#include <stddef.h> // size_t

				namespace __llvm_libc {

				void *memcpy(void *__restrict, const void *__restrict, size_t);

				} // namespace __llvm_libc

				#endif // LLVM_LIBC_SRC_STRING_MEMCPY_H

libc/src/string/memcpy.cpp

This file was added.

				//===--------------------- Implementation of memcpy -----------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "src/string/memcpy.h"
				#include "src/__support/common.h"
				#include "src/string/memcpy_arch_specific.h"

				namespace __llvm_libc {

				void *LLVM_LIBC_ENTRYPOINT(memcpy)(void *__restrict dst,
				gchatelet: We need to selectively declare the entry point depending on if we generate test or final implementation.
				sivachandra: Test or final, all of them should have the same entrypoint name. No? For example, `memcpy_config1_test` will depend on `memcpy_config1` but will call __llvm_libc::memcpy for the test.
				const void *__restrict src, size_t size) {
				memcpy_no_return(reinterpret_cast<char *>(dst),
				reinterpret_cast<const char *>(src), size);
				return dst;
				}

				} // namespace __llvm_libc

libc/src/string/memcpy_arch_specific.h.def

This file was added.

				//===-------------- Implementation of arch specific memcpy ----------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIBC_SRC_STRING_MEMORY_ARCH_H
				#define LLVM_LIBC_SRC_STRING_MEMORY_ARCH_H

				%%include_file(${memcpy_arch_specific})

				namespace __llvm_libc {

				// Design rationale
				// ================
				//
				gchatelet: @sivachandra I'm not super happy with this pattern that dispatches the logic in many files. Would it be possible to generate the cpp file directly instead of generating an intermediate header file? At least the common implementation would be in the template.
				// Using a profiler to observe size distributions for calls into libc
				// functions, it was found most operations act on a small number of bytes.
				// This makes it important to favor small sizes.
				//
				abrachet: remove `#define`
				// The tests for `count` are in ascending order so the cost of branching is
				// proportional to the cost of copying.
				//
				// The function is written in C++ for several reasons:
				// - The compiler can __see__ the code, this is useful when performing Profile
				// Guided Optimization as the optimized code can take advantage of branching
				// probabilities.
				// - It also allows for easier customization and favors testing multiple
				// implementation parameters.
				// - As compilers and processors get better, the generated code is improved
				// with little change on the code side.
				static void memcpy_no_return(char *__restrict dst, const char *__restrict src,
				size_t count) {
				abrachet: Seems like clang-format interfered a little bit here.
				gchatelet: Good catch
				if (count == 0)
				return;
				if (count == 1)
				return Copy<1>(dst, src);
				if (count == 2)
				return Copy<2>(dst, src);
				if (count == 3)
				return Copy<3>(dst, src);
				if (count == 4)
				return Copy<4>(dst, src);
				if (count < 8)
				return CopyOverlap<4>(dst, src, count);
				if (count == 8)
				return Copy<8>(dst, src);
				if (count < 16)
				return CopyOverlap<8>(dst, src, count);
				if (count == 16)
				return Copy<16>(dst, src);
				if (count < 32)
				return CopyOverlap<16>(dst, src, count);
				if (count < 64)
				return CopyOverlap<32>(dst, src, count);
				if (count < 128)
				return CopyOverlap<64>(dst, src, count);
				CopyGE128(dst, src, count);
				}
				sivachandra: This should all be just `memcpy`. Maybe I am missing something still.

				} // namespace __llvm_libc

				#endif // LLVM_LIBC_SRC_STRING_MEMORY_ARCH_H
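
As a rough illustration of the design rationale ("the cost of branching is proportional to the cost of copying"), the sketch below counts how many size tests `memcpy_no_return` evaluates before it reaches a copy routine. The names `SizeTest`, `kSizeTests` and `comparisons_before_copy` are hypothetical and exist only for this illustration; the table simply transcribes the `if` chain above.

```cpp
#include <stddef.h>

// Hypothetical description of one `if` in memcpy_no_return:
// either `count == bound` (exact) or `count < bound`.
struct SizeTest {
  bool exact;
  size_t bound;
};

// The tests in the same ascending order as the patch.
constexpr SizeTest kSizeTests[] = {
    {true, 0},   {true, 1},   {true, 2},   {true, 3},
    {true, 4},   {false, 8},  {true, 8},   {false, 16},
    {true, 16},  {false, 32}, {false, 64}, {false, 128},
};

// Returns the number of comparisons evaluated before dispatching.
// Sizes >= 128 fail every test and fall through to CopyGE128.
int comparisons_before_copy(size_t count) {
  int evaluated = 0;
  for (const SizeTest &test : kSizeTests) {
    ++evaluated;
    if (test.exact ? count == test.bound : count < test.bound)
      return evaluated;
  }
  return evaluated;
}
```

Under this model a zero-byte copy pays for one comparison while anything of 64 bytes or more pays for twelve, so the smallest and most frequent sizes see the least branching overhead.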

libc/src/string/memory_utils/CMakeLists.txt

	add_gen_header(			add_gen_header(
	cacheline_size			cacheline_size
	DEF_FILE			DEF_FILE
	cacheline_size.h.def			cacheline_size.h.def
	GEN_HDR			GEN_HDR
	cacheline_size.h			cacheline_size.h
	PARAMS			PARAMS
	machine_cacheline_size=cacheline_size_${LIBC_TARGET_MACHINE}.h.inc			machine_cacheline_size=cacheline_size_${LIBC_TARGET_MACHINE}.h.inc
	DATA_FILES			DATA_FILES
	cacheline_size_${LIBC_TARGET_MACHINE}.h.inc			cacheline_size_${LIBC_TARGET_MACHINE}.h.inc
	)			)

	add_header_library(			add_header_library(
	memory_utils			memory_utils
	HDRS utils.h			HDRS
	DEPENDS cacheline_size			utils.h
				memcpy_utils.h
				DEPENDS
				cacheline_size
	)			)

libc/src/string/memory_utils/memcpy_utils.h

This file was added.

				//===---------------------------- Memcpy utils ----------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIBC_SRC_MEMORY_UTILS_MEMCPY_UTILS_H
				#define LLVM_LIBC_SRC_MEMORY_UTILS_MEMCPY_UTILS_H

				#include "src/string/memory_utils/utils.h"
				#include <stddef.h> // size_t

				// __builtin_memcpy_inline guarantees to never call external functions.
				// Unfortunately it is not widely available.
				#if defined(__clang__) && __has_builtin(__builtin_memcpy_inline)
				#define USE_BUILTIN_MEMCPY_INLINE
				#elif defined(__GNUC__)
				#define USE_BUILTIN_MEMCPY
				#endif

				// This is useful for testing.
				abrachet: Presumably should be just `__builtin_memcpy`
				gchatelet: Good catch
				gchatelet: `__has_builtin` is actually fairly new and `__builtin_memcpy` pretty old so I'm just going to assume that gcc has `__builtin_memcpy`.
				#if defined(LLVM_LIBC_MEMCPY_MONITOR)
				extern "C" void LLVM_LIBC_MEMCPY_MONITOR(char *__restrict,
				const char *__restrict, size_t);
				#endif

				namespace __llvm_libc {

				// Copies `kBlockSize` bytes from `src` to `dst`.
				template <size_t kBlockSize>
				static void Copy(char *__restrict dst, const char *__restrict src) {
				#if defined(LLVM_LIBC_MEMCPY_MONITOR)
				LLVM_LIBC_MEMCPY_MONITOR(dst, src, kBlockSize);
				#elif defined(USE_BUILTIN_MEMCPY_INLINE)
				__builtin_memcpy_inline(dst, src, kBlockSize);
				#elif defined(USE_BUILTIN_MEMCPY)
				__builtin_memcpy(dst, src, kBlockSize);
				#else
				for (size_t i = 0; i < kBlockSize; ++i)
				dst[i] = src[i];
				#endif
				}

				// Copies `kBlockSize` bytes from `src + count - kBlockSize` to
				// `dst + count - kBlockSize`.
				// Precondition: `count >= kBlockSize`.
				template <size_t kBlockSize>
				static void CopyLastBlock(char *__restrict dst, const char *__restrict src,
				size_t count) {
				const size_t offset = count - kBlockSize;
				Copy<kBlockSize>(dst + offset, src + offset);
				}

				// Copies `kBlockSize` bytes twice with an overlap between the two.
				//
				// [1234567812345678123]
				// [__XXXXXXXXXXXXXX___]
				// [__XXXXXXXX_________]
				// [________XXXXXXXX___]
				//
				// Precondition: `count >= kBlockSize && count <= 2 * kBlockSize`.
				template <size_t kBlockSize>
				static void CopyOverlap(char *__restrict dst, const char *__restrict src,
				size_t count) {
				Copy<kBlockSize>(dst, src);
				CopyLastBlock<kBlockSize>(dst, src, count);
				}

				// Copies `count` bytes by blocks of `kBlockSize` bytes.
				// Copies at the start and end of the buffer are unaligned.
				// Copies in the middle of the buffer are aligned to `kBlockSize`.
				//
				// e.g. with
				// [12345678123456781234567812345678]
				// [__XXXXXXXXXXXXXXXXXXXXXXXXXXX___]
				// [__XXXXXXXX______________________]
				// [________XXXXXXXX________________]
				// [________________XXXXXXXX________]
				// [_____________________XXXXXXXX___]
				//
				// Precondition: `count > 2 * kBlockSize` for efficiency.
				// `count >= kBlockSize` for correctness.
				template <size_t kBlockSize>
				static void CopyAligned(char *__restrict dst, const char *__restrict src,
				size_t count) {
				Copy<kBlockSize>(dst, src); // Copy first block

				// Copy aligned blocks
				size_t offset = kBlockSize - offset_from_last_aligned<kBlockSize>(dst);
				for (; offset + kBlockSize < count; offset += kBlockSize)
				Copy<kBlockSize>(dst + offset, src + offset);

				CopyLastBlock<kBlockSize>(dst, src, count); // Copy last block
				}

				} // namespace __llvm_libc

				#endif // LLVM_LIBC_SRC_MEMORY_UTILS_MEMCPY_UTILS_H
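
The overlapping-copy trick used by `CopyOverlap` can be tried in isolation with the sketch below. It is an approximation under stated assumptions: `copy_block` and `copy_overlap` are hypothetical stand-ins for `Copy` and `CopyOverlap`, and `std::memcpy` replaces `__builtin_memcpy_inline` so the code builds with any compiler. The `assert` spells out the range in which two blocks are enough to cover the whole buffer.

```cpp
#include <cassert>
#include <cstring>
#include <stddef.h>

// Stand-in for Copy<kBlockSize>: copies exactly kBlockSize bytes.
template <size_t kBlockSize>
void copy_block(char *dst, const char *src) {
  std::memcpy(dst, src, kBlockSize);
}

// Copies `count` bytes using two fixed-size copies: one anchored at the
// start and one anchored at the end. The two blocks overlap in the middle
// whenever count < 2 * kBlockSize.
template <size_t kBlockSize>
void copy_overlap(char *dst, const char *src, size_t count) {
  assert(count >= kBlockSize && count <= 2 * kBlockSize);
  copy_block<kBlockSize>(dst, src);
  const size_t offset = count - kBlockSize; // start of the last block
  copy_block<kBlockSize>(dst + offset, src + offset);
}
```

For example, `copy_overlap<8>(dst, src, 13)` writes bytes 0 to 7 and then bytes 5 to 12; bytes 5 to 7 are written twice with the same values, which is harmless and avoids a loop or a size-dependent tail.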

libc/src/string/memory_utils/utils.h

	}			}

	// Returns the first power of two following value or value if it is already a			// Returns the first power of two following value or value if it is already a
	// power of two (or 0 when value is 0).			// power of two (or 0 when value is 0).
	static constexpr size_t ge_power2(size_t value) {			static constexpr size_t ge_power2(size_t value) {
	return is_power2_or_zero(value) ? value : 1ULL << (log2(value) + 1);			return is_power2_or_zero(value) ? value : 1ULL << (log2(value) + 1);
	}			}

				template <size_t alignment> intptr_t offset_from_last_aligned(const void *ptr) {
				static_assert(is_power2(alignment), "alignment must be a power of 2");
				return reinterpret_cast<uintptr_t>(ptr) & (alignment - 1U);
				}

	template <size_t alignment> intptr_t offset_to_next_aligned(const void *ptr) {			template <size_t alignment> intptr_t offset_to_next_aligned(const void *ptr) {
	static_assert(is_power2(alignment), "alignment must be a power of 2");			static_assert(is_power2(alignment), "alignment must be a power of 2");
	// The logic is not straightforward and involves unsigned modulo arithmetic			// The logic is not straightforward and involves unsigned modulo arithmetic
	// but the generated code is as fast as it can be.			// but the generated code is as fast as it can be.
	return -reinterpret_cast<uintptr_t>(ptr) & (alignment - 1U);			return -reinterpret_cast<uintptr_t>(ptr) & (alignment - 1U);
	}			}

	// Returns the offset from `ptr` to the next cache line.			// Returns the offset from `ptr` to the next cache line.
	static intptr_t offset_to_next_cache_line(const void *ptr) {			static inline intptr_t offset_to_next_cache_line(const void *ptr) {
	return offset_to_next_aligned<LLVM_LIBC_CACHELINE_SIZE>(ptr);			return offset_to_next_aligned<LLVM_LIBC_CACHELINE_SIZE>(ptr);
	}			}

	} // namespace __llvm_libc			} // namespace __llvm_libc

	#endif // LLVM_LIBC_SRC_MEMORY_UTILS_H			#endif // LLVM_LIBC_SRC_MEMORY_UTILS_H
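
The two alignment helpers are complementary, which the following sketch tries to show. It deliberately differs from the patch: the functions take an integer address instead of a `const void *` and return `uintptr_t`, so that they can be `constexpr` and checked at compile time. Those signatures are an assumption made for illustration only.

```cpp
#include <stddef.h>
#include <stdint.h>

// Distance from the previous `alignment` boundary to `address`.
template <size_t alignment>
constexpr uintptr_t offset_from_last_aligned(uintptr_t address) {
  static_assert(alignment != 0 && (alignment & (alignment - 1)) == 0,
                "alignment must be a power of 2");
  return address & (alignment - 1U);
}

// Distance from `address` to the next `alignment` boundary (0 if aligned).
// Negation is done in unsigned arithmetic, so it wraps modulo 2^N.
template <size_t alignment>
constexpr uintptr_t offset_to_next_aligned(uintptr_t address) {
  static_assert(alignment != 0 && (alignment & (alignment - 1)) == 0,
                "alignment must be a power of 2");
  return (uintptr_t(0) - address) & (alignment - 1U);
}

// For an unaligned address the two offsets add up to the alignment.
static_assert(offset_from_last_aligned<8>(0x1003) +
                      offset_to_next_aligned<8>(0x1003) == 8,
              "offsets are complementary");
```

`CopyAligned` relies on this relationship: it starts its aligned loop at `kBlockSize - offset_from_last_aligned<kBlockSize>(dst)`, which is the first position past `dst` where the destination is block-aligned (or a full block further when `dst` is already aligned).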

libc/src/string/x86/CMakeLists.txt

This file was added.

				add_memcpy("memcpy_${LIBC_TARGET_MACHINE}_opt_none" REJECT "${ALL_CPU_FEATURES}")
				add_memcpy("memcpy_${LIBC_TARGET_MACHINE}_opt_sse" REQUIRE "SSE" REJECT "SSE2")
				add_memcpy("memcpy_${LIBC_TARGET_MACHINE}_opt_avx" REQUIRE "AVX" REJECT "AVX2")
				add_memcpy("memcpy_${LIBC_TARGET_MACHINE}_opt_avx512f" REQUIRE "AVX512F")

libc/src/string/x86/memcpy_arch_specific.h.inc

This file was added.

				#include "src/string/memory_utils/memcpy_utils.h"

				namespace __llvm_libc {

				static void CopyRepMovsb(char *__restrict dst, const char *__restrict src,
				size_t count) {
				// FIXME: Add MSVC support with
				// #include <intrin.h>
				// __movsb(reinterpret_cast<unsigned char *>(dst),
				// reinterpret_cast<const unsigned char *>(src), count);
				asm volatile("rep movsb" : "+D"(dst), "+S"(src), "+c"(count) : : "memory");
				}

				#if defined(__AVX__)
				#define BEST_SIZE 64
				#else
				#define BEST_SIZE 32
				#endif

				static void CopyGE128(char *__restrict dst, const char *__restrict src,
				size_t count) {
				#if defined(__AVX__)
				if (count < 256)
				return CopyOverlap<128>(dst, src, count);
				#endif
				// kRepMovsBSize == -1 : Only CopyAligned is used.
				// kRepMovsBSize == 0 : Only RepMovsb is used.
				// else CopyAligned is used up to kRepMovsBSize and then RepMovsb.
				constexpr size_t kRepMovsBSize = -1;
				if (count <= kRepMovsBSize)
				return CopyAligned<BEST_SIZE>(dst, src, count);
				CopyRepMovsb(dst, src, count);
				}

				} // namespace __llvm_libc
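
The meaning of the `kRepMovsBSize` threshold, in particular why `-1` disables `rep movsb`, may be easier to see with the threshold turned into a parameter. The sketch below is illustrative only: `Strategy`, `choose_strategy` and `kAlwaysAligned` are hypothetical names, and the function reproduces just the comparison made in `CopyGE128`, not the copy itself.

```cpp
#include <stddef.h>

enum class Strategy { Aligned, RepMovsb };

// Same comparison as in CopyGE128: sizes up to the threshold use the
// aligned block copy, larger sizes use `rep movsb`.
constexpr Strategy choose_strategy(size_t count, size_t rep_movsb_threshold) {
  return count <= rep_movsb_threshold ? Strategy::Aligned
                                      : Strategy::RepMovsb;
}

// size_t is unsigned, so -1 converts to the largest representable value
// and every count compares less than or equal to it.
constexpr size_t kAlwaysAligned = static_cast<size_t>(-1);

static_assert(choose_strategy(size_t(1) << 20, kAlwaysAligned) ==
                  Strategy::Aligned,
              "-1 disables rep movsb");
static_assert(choose_strategy(128, 0) == Strategy::RepMovsb,
              "0 always selects rep movsb for sizes reaching CopyGE128");
```

An intermediate value such as 256 would send copies of up to 256 bytes to the aligned path and everything larger to `rep movsb`; which value is best is a measurement question that the patch leaves open.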

libc/test/src/string/CMakeLists.txt

add_libc_unittest(		add_libc_unittest(
strcpy_test		strcpy_test
SUITE		SUITE
libc_string_unittests		libc_string_unittests
SRCS		SRCS
strcpy_test.cpp		strcpy_test.cpp
DEPENDS		DEPENDS
strcpy		strcpy
)		)

		# Tests all implementations of memcpy that can run on the host.
		get_property(memcpy_implementations GLOBAL PROPERTY memcpy_implementations)
		foreach(memcpy_config_name IN LISTS memcpy_implementations)
		get_target_property(require_cpu_features ${memcpy_config_name} REQUIRE_CPU_FEATURES)
		host_supports(can_run "${require_cpu_features}")
		sivachandra: This seems to be set up correctly. But, instead of `memcpy_name`, they should be called `memcpy_config_name` to make it clear that they are all `memcpy` implementations in different configs.
		if(can_run)
		add_libc_unittest(
		abrachet: Would you mind explaining this? It seems like ${flags} will just be -march=native, and the work above to find flags gets ignored.
		gchatelet: Actually there's no need for flags here, the implementation has already been compiled with the correct flags. The test itself doesn't need them.
		sivachandra: Ah, so remove this line then?
		gchatelet: Yes thank you it was a leftover.
		sivachandra: Same question for me as well. I left a related comment at a different place above.
		gchatelet: This is a two step process:
		- the implementations get compiled with specific flags
		- when testing we retrieve these flags and check whether the current host supports them
		If they are compatible, the already compiled `.o` file can run on the host; the test file itself doesn't need to receive the flags (only the implementation). This will be the same for benchmarking: the benchmarking code does not need to be compiled with `avx` support, only the code under test needs to be. Do you think it deserves a comment?
		sivachandra: AFAICT, `compute_flags` does not have any side effects. Also, it doesn't look like `${flags}` is used anywhere here. So, why call it at all?
		${memcpy_config_name}_test
		SUITE
		libc_string_unittests
		SRCS
		sivachandra: This as well.
		memcpy_test.cpp
		DEPENDS
		${memcpy_config_name}
		)
		else()
		message(STATUS "Skipping test for '${memcpy_config_name}' insufficient host cpu features")
		endif()
		endforeach()

libc/test/src/string/memcpy_test.cpp

This file was added.

				//===----------------------- Unittests for memcpy -------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "utils/CPP/ArrayRef.h"
				#include "utils/UnitTest/Test.h"
				#include "src/string/memcpy.h"

				sivachandra: This should not be required.
				using __llvm_libc::cpp::Array;
				sivachandra: Instead of this, you can include `src/string/memcpy.h`.
				gchatelet (Author): Ha sure!
				using __llvm_libc::cpp::ArrayRef;
				using __llvm_libc::cpp::MutableArrayRef;
				using Data = Array<char, 2048>;

				static const ArrayRef<char> kNumbers("0123456789", 10);
				static const ArrayRef<char> kDeadcode("DEADC0DE", 8);

				// Returns a Data object filled with a repetition of `filler`.
				Data getData(ArrayRef<char> filler) {
				  Data out;
				  for (size_t i = 0; i < out.size(); ++i)
				    out[i] = filler[i % filler.size()];
				  return out;
				}

				TEST(MemcpyTest, Thorough) {
				  const Data groundtruth = getData(kNumbers);
				  const Data dirty = getData(kDeadcode);
				  for (size_t count = 0; count < 1024; ++count) {
				    for (size_t align = 0; align < 64; ++align) {
				      auto buffer = dirty;
				      const char *const src = groundtruth.data();
				      char *const dst = &buffer[align];
				      __llvm_libc::memcpy(dst, src, count);
				      // Everything before copy is untouched.
				      for (size_t i = 0; i < align; ++i)
				        ASSERT_EQ(buffer[i], dirty[i]);
				      // Everything in between is copied.
				      for (size_t i = 0; i < count; ++i)
				        ASSERT_EQ(buffer[align + i], groundtruth[i]);
				      // Everything after copy is untouched.
				      for (size_t i = align + count; i < dirty.size(); ++i)
				        ASSERT_EQ(buffer[i], dirty[i]);
				sivachandra: This should also not be required.
				    }
				  }
				}

				// FIXME: Add tests with reads and writes on the boundary of a read/write
				// protected page to check we're not reading nor writing prior/past the allowed
				// regions.
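
The FIXME above asks for tests that place buffers on the boundary of a protected page. One possible shape for such a test is sketched below, assuming a POSIX system with `mmap` and `mprotect`. `GuardedBuffer` and `copy_up_to_guard_page` are hypothetical names, and the standard `memcpy` stands in for `__llvm_libc::memcpy`. A copy that reads or writes past the allowed region hits the inaccessible page and crashes the process instead of passing silently.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: maps two pages and makes the second one inaccessible,
// so any access at or beyond end() faults.
struct GuardedBuffer {
  char *base = nullptr;
  size_t page = 0;

  GuardedBuffer() {
    page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
    void *mapping = mmap(nullptr, 2 * page, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(mapping != MAP_FAILED);
    base = static_cast<char *>(mapping);
    const int rc = mprotect(base + page, page, PROT_NONE);
    assert(rc == 0);
    (void)rc;
  }
  GuardedBuffer(const GuardedBuffer &) = delete;
  GuardedBuffer &operator=(const GuardedBuffer &) = delete;
  ~GuardedBuffer() { munmap(base, 2 * page); }

  // First byte of the protected page, i.e. one past the usable region.
  char *end() const { return base + page; }
};

// Copies `count` bytes between two buffers that both end exactly at a
// protected page, then checks the copied bytes.
bool copy_up_to_guard_page(size_t count) {
  GuardedBuffer src, dst;
  if (count > src.page)
    return false;
  char *const s = src.end() - count;
  char *const d = dst.end() - count;
  for (size_t i = 0; i < count; ++i)
    s[i] = static_cast<char>('0' + i % 10);
  memcpy(d, s, count); // stand-in for the implementation under test
  return memcmp(d, s, count) == 0;
}
```

A symmetric variant that protects the page before the buffers would cover accesses prior to the allowed region.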

libc/test/src/string/memory_utils/CMakeLists.txt

	add_libc_unittest(			add_libc_unittest(
	utils_test			utils_test
	SUITE			SUITE
	libc_string_unittests			libc_string_unittests
	SRCS			SRCS
	utils_test.cpp			utils_test.cpp
				memcpy_utils_test.cpp
	DEPENDS			DEPENDS
	memory_utils			memory_utils
	standalone_cpp			standalone_cpp
	)			)

				target_compile_definitions(
				utils_test
				PRIVATE
				LLVM_LIBC_MEMCPY_MONITOR=memcpy_monitor
				)
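
The compile definition above names a monitor function that the test later defines. How the header consumes that macro is not shown in this part of the diff, so the following is only a sketch of the general technique, under the assumption that a block copy is redirected to the monitor when the macro is defined. `SKETCH_MEMCPY_MONITOR`, `copy_block`, `fake_pointer` and `g_last_call` are hypothetical names.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Stand-in for passing -DLLVM_LIBC_MEMCPY_MONITOR=memcpy_monitor from CMake.
#define SKETCH_MEMCPY_MONITOR sketch_memcpy_monitor

// Records the arguments of the most recent monitored copy.
struct LastCall {
  uintptr_t dst = 0;
  uintptr_t src = 0;
  size_t count = 0;
  int calls = 0;
};
static LastCall g_last_call;

// The monitor only observes the request; it never dereferences the pointers.
extern "C" void SKETCH_MEMCPY_MONITOR(char *dst, const char *src,
                                      size_t count) {
  g_last_call.dst = reinterpret_cast<uintptr_t>(dst);
  g_last_call.src = reinterpret_cast<uintptr_t>(src);
  g_last_call.count = count;
  ++g_last_call.calls;
}

// Assumed shape of a fixed-size copy: call the monitor when one is
// configured, otherwise perform the real copy.
template <size_t kBlockSize> void copy_block(char *dst, const char *src) {
#ifdef SKETCH_MEMCPY_MONITOR
  SKETCH_MEMCPY_MONITOR(dst, src, kBlockSize);
#else
  __builtin_memcpy(dst, src, kBlockSize);
#endif
}

// Turns a small integer into a pointer, like the test's I() helper.
static char *fake_pointer(uintptr_t offset) {
  return reinterpret_cast<char *>(offset);
}
```

Because the monitor never touches memory, a test can pass small integers disguised as pointers and inspect which offsets each copy would have read and written.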

libc/test/src/string/memory_utils/memcpy_utils_test.cpp

This file was added.

				//===-------------------- Unittests for memory_utils ----------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "src/string/memory_utils/memcpy_utils.h"
				#include "utils/CPP/Array.h"
				sivachandra: Where is `monitor_memcpy` defined?
				gchatelet (Author): Further down after the `GetTrace` function.
				sivachandra: Ah sorry, I missed it. So, does it mean that you want that name to be overridable/customizable? If yes, then would it make sense to define it to `monitor_memcpy` only if not defined? `#ifndef LLVM_LIBC_MEMCPY_MONITOR` / `#define LLVM_LIBC_MEMCPY_MONITOR monitor_memcpy` / `#endif`
				gchatelet (Author): I've defined it in the `CMakeLists.txt` with a check in the test file.
				#include "utils/UnitTest/Test.h"

				#include <assert.h>
				#include <stdint.h> // uintptr_t

				abrachet: D74091 was also wanting to use `assert`/`abort`; we should add them. Also that comment is funny to me because it's the only thing in assert.h :)
				sivachandra: Point taken. I will prepare something to address this.
				gchatelet (Author): I think it would be nice to have the functions in `memcpy_utils.h` do some precondition checking; an `abort` function would be useful indeed: `#if not defined(NDEBUG)` / `// check precondition` / `#endif`
				sivachandra: Okay. There are two kinds of `abort` here. One to use in the implementation, another to use in the test. For the `abort` used in the test, we should use the one coming from the system libc. For the one going into the implementation, we should use the one from llvm-libc. The abort function is fairly involved and needs other pieces to be in place for us to build llvm-libc's implementation. Let me do my homework and get back to you on that.
				abrachet: > The abort function is fairly involved and needs other pieces to be in place for us to build llvm-libc's implementation.
				I was actually working on writing to the list about getting started on signals, which is how abort should be implemented I think: `for(;;) raise(SIGABRT);`. I'd be happy to get started on signals :) I think for now though, it is fine to have abort be `__builtin_trap()` because we are mainly focused on x86 at the moment. (I remember on ARM32 `__builtin_trap()` calls abort for example.)
				sivachandra: It would be awesome if you can start work on signals.
				#ifndef LLVM_LIBC_MEMCPY_MONITOR
				sivachandra: Use `stdint.h` instead of `cstdint`.
				#error LLVM_LIBC_MEMCPY_MONITOR must be defined for this test.
				#endif
				abrachet: Not used as far as I can tell

				namespace __llvm_libc {

				struct Buffer {
				  static constexpr size_t kMaxBuffer = 1024;
				  char buffer[kMaxBuffer + 1];
				  size_t last = 0;

				  void Clear() {
				    last = 0;
				    for (size_t i = 0; i < kMaxBuffer; ++i)
				      buffer[i] = '0';
				    buffer[kMaxBuffer] = '\0';
				  }

				  void Increment(const void *ptr) {
				    const auto offset = reinterpret_cast<uintptr_t>(ptr);
				    assert(offset < kMaxBuffer);
				    ++buffer[offset];
				    if (offset > last)
				      last = offset;
				  }

				  char *Finish() {
				    assert(last < kMaxBuffer);
				    buffer[last + 1] = '\0';
				    return buffer;
				  }
				};

				struct Trace {
				  Buffer read;
				  Buffer write;

				  void Add(char *__restrict dst, const char *__restrict src, size_t count) {
				    for (size_t i = 0; i < count; ++i)
				      read.Increment(src + i);
				    for (size_t i = 0; i < count; ++i)
				      write.Increment(dst + i);
				  }

				  void Clear() {
				    read.Clear();
				    write.Clear();
				  }

				  char *Read() { return read.Finish(); }
				  char *Write() { return write.Finish(); }
				};

				static Trace &GetTrace() {
				  static thread_local Trace events;
				  return events;
				}

				extern "C" void LLVM_LIBC_MEMCPY_MONITOR(char *__restrict dst,
				                                         const char *__restrict src,
				                                         size_t count) {
				  GetTrace().Add(dst, src, count);
				}

				char *I(uintptr_t offset) { return reinterpret_cast<char *>(offset); }

				TEST(MemcpyUtilsTest, CopyTrivial) {
				  auto &trace = GetTrace();

				  trace.Clear();
				  Copy<1>(I(0), I(0));
				  EXPECT_STREQ(trace.Write(), "1");
				  EXPECT_STREQ(trace.Read(), "1");

				  trace.Clear();
				  Copy<2>(I(0), I(0));
				  EXPECT_STREQ(trace.Write(), "11");
				  EXPECT_STREQ(trace.Read(), "11");

				  trace.Clear();
				  Copy<4>(I(0), I(0));
				  EXPECT_STREQ(trace.Write(), "1111");
				  EXPECT_STREQ(trace.Read(), "1111");

				  trace.Clear();
				  Copy<8>(I(0), I(0));
				  EXPECT_STREQ(trace.Write(), "11111111");
				  EXPECT_STREQ(trace.Read(), "11111111");

				  trace.Clear();
				  Copy<16>(I(0), I(0));
				  EXPECT_STREQ(trace.Write(), "1111111111111111");
				  EXPECT_STREQ(trace.Read(), "1111111111111111");

				  trace.Clear();
				  Copy<32>(I(0), I(0));
				  EXPECT_STREQ(trace.Write(), "11111111111111111111111111111111");
				  EXPECT_STREQ(trace.Read(), "11111111111111111111111111111111");

				  trace.Clear();
				  Copy<64>(I(0), I(0));
				  EXPECT_STREQ(
				      trace.Write(),
				      "1111111111111111111111111111111111111111111111111111111111111111");
				  EXPECT_STREQ(
				      trace.Read(),
				      "1111111111111111111111111111111111111111111111111111111111111111");
				}

				TEST(MemcpyUtilsTest, CopyOffset) {
				  auto &trace = GetTrace();

				  trace.Clear();
				  Copy<1>(I(3), I(1));
				  EXPECT_STREQ(trace.Write(), "0001");
				  EXPECT_STREQ(trace.Read(), "01");

				  trace.Clear();
				  Copy<1>(I(2), I(1));
				  EXPECT_STREQ(trace.Write(), "001");
				  EXPECT_STREQ(trace.Read(), "01");
				}

				TEST(MemcpyUtilsTest, CopyOverlap) {
				  auto &trace = GetTrace();

				  trace.Clear();
				  CopyOverlap<2>(I(0), I(0), 2);
				  EXPECT_STREQ(trace.Write(), "22");
				  EXPECT_STREQ(trace.Read(), "22");

				  trace.Clear();
				  CopyOverlap<2>(I(0), I(0), 3);
				  EXPECT_STREQ(trace.Write(), "121");
				  EXPECT_STREQ(trace.Read(), "121");

				  trace.Clear();
				  CopyOverlap<2>(I(0), I(0), 4);
				  EXPECT_STREQ(trace.Write(), "1111");
				  EXPECT_STREQ(trace.Read(), "1111");

				  trace.Clear();
				  CopyOverlap<4>(I(2), I(1), 7);
				  EXPECT_STREQ(trace.Write(), "001112111");
				  EXPECT_STREQ(trace.Read(), "01112111");
				}

				TEST(MemcpyUtilsTest, CopyAligned) {
				  auto &trace = GetTrace();
				  // Destination is aligned already.
				  //   "1111000000000"
				  // + "0000111100000"
				  // + "0000000011110"
				  // + "0000000001111"
				  // = "1111111112221"
				  trace.Clear();
				  CopyAligned<4>(I(0), I(0), 13);
				  EXPECT_STREQ(trace.Write(), "1111111112221");
				  EXPECT_STREQ(trace.Read(), "1111111112221");

				  // Misaligned destination
				  //   "01111000000000"
				  // + "00001111000000"
				  // + "00000000111100"
				  // + "00000000001111"
				  // = "01112111112211"
				  trace.Clear();
				  CopyAligned<4>(I(1), I(0), 13);
				  EXPECT_STREQ(trace.Write(), "01112111112211");
				  EXPECT_STREQ(trace.Read(), "1112111112211");
				}

				TEST(MemcpyUtilsTest, MaxReloads) {
				  auto &trace = GetTrace();
				  for (size_t alignment = 0; alignment < 32; ++alignment) {
				    for (size_t count = 64; count < 768; ++count) {
				      trace.Clear();
				      // We should never reload more than twice when copying from count = 2x32.
				      CopyAligned<32>(I(alignment), I(0), count);
				      const char *const written = trace.Write();
				      // First bytes are untouched.
				      for (size_t i = 0; i < alignment; ++i)
				        EXPECT_EQ(written[i], '0');
				      // Next bytes are loaded once or twice but no more.
				      for (size_t i = alignment; i < count; ++i) {
				        EXPECT_GE(written[i], '1');
				        EXPECT_LE(written[i], '2');
				      }
				    }
				  }
				}

				} // namespace __llvm_libc
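
The trace strings in the tests above count how many times each byte offset is touched. The sketch below reproduces the expected write traces with a small simulation. It does not use the functions from `memcpy_utils.h`: `touch`, `overlap_trace` and `aligned_trace` are hypothetical names, and the block placement (one block at the start, blocks at aligned offsets in between, one block ending at the last byte) is an assumption inferred from the comments in `CopyAligned` and from the expected strings.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string>

// Bumps the per-offset counter characters for one block access, growing the
// trace with '0' as needed (mirrors Buffer::Increment in the test).
static void touch(std::string &trace, size_t start, size_t block) {
  if (trace.size() < start + block)
    trace.resize(start + block, '0');
  for (size_t i = 0; i < block; ++i)
    ++trace[start + i];
}

// Assumed CopyOverlap behaviour: one block at the start and one block ending
// at the last byte. Requires block <= count <= 2 * block.
std::string overlap_trace(size_t block, size_t offset, size_t count) {
  std::string trace;
  touch(trace, offset, block);
  touch(trace, offset + count - block, block);
  return trace;
}

// Assumed CopyAligned behaviour: first block at the start, then blocks at
// block-aligned offsets, then one block ending at the last byte.
// Requires count > block.
std::string aligned_trace(size_t block, size_t offset, size_t count) {
  std::string trace;
  const size_t end = offset + count;
  touch(trace, offset, block);
  for (size_t next = (offset / block + 1) * block; next + block < end;
       next += block)
    touch(trace, next, block);
  touch(trace, end - block, block);
  return trace;
}
```

For instance, `overlap_trace(2, 0, 3)` touches offsets 0..1 and 1..2, so the middle byte is counted twice, which is the `"121"` pattern expected by `CopyOverlap<2>(I(0), I(0), 3)`.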

libc/test/src/string/memory_utils/utils_test.cpp

	TEST(UtilsTest, OffsetToNextAligned) {			TEST(UtilsTest, OffsetToNextAligned) {
	EXPECT_EQ(offset_to_next_aligned<16>(forge(0)), I(0));			EXPECT_EQ(offset_to_next_aligned<16>(forge(0)), I(0));
	EXPECT_EQ(offset_to_next_aligned<16>(forge(1)), I(15));			EXPECT_EQ(offset_to_next_aligned<16>(forge(1)), I(15));
	EXPECT_EQ(offset_to_next_aligned<16>(forge(16)), I(0));			EXPECT_EQ(offset_to_next_aligned<16>(forge(16)), I(0));
	EXPECT_EQ(offset_to_next_aligned<16>(forge(15)), I(1));			EXPECT_EQ(offset_to_next_aligned<16>(forge(15)), I(1));
	EXPECT_EQ(offset_to_next_aligned<32>(forge(16)), I(16));			EXPECT_EQ(offset_to_next_aligned<32>(forge(16)), I(16));
	}			}

				TEST(UtilsTest, OffsetFromLastAligned) {
				EXPECT_EQ(offset_from_last_aligned<16>(forge(0)), I(0));
				EXPECT_EQ(offset_from_last_aligned<16>(forge(1)), I(1));
				EXPECT_EQ(offset_from_last_aligned<16>(forge(16)), I(0));
				EXPECT_EQ(offset_from_last_aligned<16>(forge(15)), I(15));
				EXPECT_EQ(offset_from_last_aligned<32>(forge(16)), I(16));
				}

	TEST(UtilsTest, OffsetToNextCacheLine) {			TEST(UtilsTest, OffsetToNextCacheLine) {
	EXPECT_GT(LLVM_LIBC_CACHELINE_SIZE, 0);			EXPECT_GT(LLVM_LIBC_CACHELINE_SIZE, 0);
	EXPECT_EQ(offset_to_next_cache_line(forge(0)), I(0));			EXPECT_EQ(offset_to_next_cache_line(forge(0)), I(0));
	EXPECT_EQ(offset_to_next_cache_line(forge(1)),			EXPECT_EQ(offset_to_next_cache_line(forge(1)),
	I(LLVM_LIBC_CACHELINE_SIZE - 1));			I(LLVM_LIBC_CACHELINE_SIZE - 1));
	EXPECT_EQ(offset_to_next_cache_line(forge(LLVM_LIBC_CACHELINE_SIZE)), I(0));			EXPECT_EQ(offset_to_next_cache_line(forge(LLVM_LIBC_CACHELINE_SIZE)), I(0));
	EXPECT_EQ(offset_to_next_cache_line(forge(LLVM_LIBC_CACHELINE_SIZE - 1)),			EXPECT_EQ(offset_to_next_cache_line(forge(LLVM_LIBC_CACHELINE_SIZE - 1)),
	I(1));			I(1));
	}			}
	} // namespace __llvm_libc			} // namespace __llvm_libc
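
The expectations in `OffsetToNextAligned` and `OffsetFromLastAligned` are consistent with simple mask arithmetic on power-of-two alignments. The sketch below is one way to satisfy them and is not the code from `utils.h`: it lives in a hypothetical `sketch` namespace and works on plain `uintptr_t` values instead of pointers.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

namespace sketch {

// Distance from the previous aligned address (0 if already aligned).
template <size_t alignment>
constexpr uintptr_t offset_from_last_aligned(uintptr_t address) {
  static_assert(alignment != 0 && (alignment & (alignment - 1)) == 0,
                "alignment must be a power of two");
  return address & (alignment - 1);
}

// Distance to the next aligned address (0 if already aligned).
// The unsigned subtraction may wrap, which is harmless once masked.
template <size_t alignment>
constexpr uintptr_t offset_to_next_aligned(uintptr_t address) {
  static_assert(alignment != 0 && (alignment & (alignment - 1)) == 0,
                "alignment must be a power of two");
  return (alignment - address) & (alignment - 1);
}

} // namespace sketch

// Same values as the test expectations above.
static_assert(sketch::offset_to_next_aligned<16>(1) == 15, "");
static_assert(sketch::offset_from_last_aligned<16>(15) == 15, "");
static_assert(sketch::offset_to_next_aligned<32>(16) == 16, "");
```

For an unaligned address the two offsets always add up to the alignment, and both are zero for an aligned one.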


Diff 251128
