Page MenuHomePhabricator

[libc] Adding memcpy implementation for x86_64
ClosedPublic

Authored by gchatelet on Feb 11 2020, 4:51 AM.

Details

Summary

It is advised to read the post motivating the creation of __builtin_memcpy_inline first.

The patch focuses on static library but allows creation of several implementations depending on cpu features. The default implementation will be optimized for the host capabilities.
Currently the use of rep movsb is disabled but we plan to unable it via CMake options.

This implementation is mainly tested on clang but should compile with GCC as well. For now it doesn't build on MSVC.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
sivachandra added inline comments.Feb 19 2020, 11:17 AM
libc/src/string/CMakeLists.txt
66

We don't need to worry about #2 for now.

For cross-compilation in case of #1, if we set up the build vars appropriately, I believe it should not be a problem.

#3 is currently a problem because of the way the add_entrypoint_object rule works: it expects the entrypoint name and the target name to be the same. We can add an optional argument to the rule which explicitly provides the entrypoint name. This will allow us to add multiple targets for the same entrypoint. We can setup our targets as follows:

src/string/
    - x86_64 # x86_64 specific directory.
          - CMakeLists.txt # Lists the targets for the various
                           # x86_64 flavors which all use the
                           # single memcpy.cpp source file
    - CMakeLists.txt # Lists the target for the release version
                     # of memcpy built using the best compile
                     # for the target machine.
    - memcpy.cpp # The actual platform independent memcpy
                 # implementation.

The same structure can be followed for tests as well. WDYT?

gchatelet marked 5 inline comments as done.Feb 20 2020, 5:58 AM
gchatelet added inline comments.
libc/src/string/CMakeLists.txt
66

@sivachandra SGTM
Can you try to amend add_entrypoint_object in a separate patch?

Separately, I came up with a CMake function to test cpu flags on linux/mac this will help for 3 and 2.

x86 and x86_64will have the same implementation any suggestion on how to name the directory? x86_32_64 ?

sivachandra added inline comments.Feb 20 2020, 10:27 PM
libc/src/string/CMakeLists.txt
66

Done now: https://reviews.llvm.org/D74948

Conventionally, pieces common to x86_64 and other x86 flavors are all put together in a directory named x86. I do not have an easy to way to test on non-x86_64 machines. So, may be just start with x86_64 for now and leave a comment that it works for other x86 flavors also? If you have a better way, feel free to propose.

gchatelet updated this revision to Diff 247261.Feb 28 2020, 7:13 AM

This version creates and tests many implementations, and select one for publication into llvmlibc.a

gchatelet marked 3 inline comments as done.Feb 28 2020, 7:20 AM

@sivachandra it seems there's a problem with the NAME attribute in add_entrypoint_object as the symbol being exported in the final library is not memcpy anymore but the provided NAME

 % nm /tmp/llvm-project_rel_compiled-with-clang/projects/libc/lib/libllvmlibc.a

__errno_location.o:
0000000000000000 T __errno_location
0000000000000000 T _ZN11__llvm_libc16__errno_locationEv
0000000000000000 b _ZN11__llvm_libcL7__errnoE

strcpy.o:
                 U memcpy
0000000000000000 T strcpy
                 U strlen
0000000000000000 T _ZN11__llvm_libc6strcpyEPcPKc

strcat.o:
0000000000000000 T strcat
                 U strlen
0000000000000000 T _ZN11__llvm_libc6strcatEPcPKc
                 U _ZN11__llvm_libc6strcpyEPcPKc

memcpy.o:
0000000000000000 A memcpy_x86_64_opt_avx512f
0000000000000000 T _ZN11__llvm_libc25memcpy_x86_64_opt_avx512fEPvPKvm
0000000000000000 T _ZN11__llvm_libc6memcpyEPvPKvm

mmap.o:
0000000000000000 T mmap
                 U _ZN11__llvm_libc16__errno_locationEv
0000000000000000 T _ZN11__llvm_libc4mmapEPvmiiil

munmap.o:
0000000000000000 T munmap
                 U _ZN11__llvm_libc16__errno_locationEv
0000000000000000 T _ZN11__llvm_libc6munmapEPvm

raise.o:
0000000000000000 T raise
0000000000000000 T _ZN11__llvm_libc5raiseEi
libc/cmake/modules/LLVMLibCRules.cmake
160

@sivachandra this should be submitted as a separate patch but it's better to have context for the change.

libc/src/string/CMakeLists.txt
0–1

I'm removing this since all implementations will be tested from now on.

libc/src/string/memcpy.cpp
15

We need to selectively declare the entry point depending on if we generate test or final implementation.

@sivachandra it seems there's a problem with the NAME attribute in add_entrypoint_object as the symbol being exported in the final library is not memcpy anymore but the provided NAME

 % nm /tmp/llvm-project_rel_compiled-with-clang/projects/libc/lib/libllvmlibc.a

__errno_location.o:
0000000000000000 T __errno_location
0000000000000000 T _ZN11__llvm_libc16__errno_locationEv
0000000000000000 b _ZN11__llvm_libcL7__errnoE

strcpy.o:
                 U memcpy
0000000000000000 T strcpy
                 U strlen
0000000000000000 T _ZN11__llvm_libc6strcpyEPcPKc

strcat.o:
0000000000000000 T strcat
                 U strlen
0000000000000000 T _ZN11__llvm_libc6strcatEPcPKc
                 U _ZN11__llvm_libc6strcpyEPcPKc

memcpy.o:
0000000000000000 A memcpy_x86_64_opt_avx512f
0000000000000000 T _ZN11__llvm_libc25memcpy_x86_64_opt_avx512fEPvPKvm
0000000000000000 T _ZN11__llvm_libc6memcpyEPvPKvm

mmap.o:
0000000000000000 T mmap
                 U _ZN11__llvm_libc16__errno_locationEv
0000000000000000 T _ZN11__llvm_libc4mmapEPvmiiil

munmap.o:
0000000000000000 T munmap
                 U _ZN11__llvm_libc16__errno_locationEv
0000000000000000 T _ZN11__llvm_libc6munmapEPvm

raise.o:
0000000000000000 T raise
0000000000000000 T _ZN11__llvm_libc5raiseEi

I thought your use case was the other way around: you want to have multiple targets with the same entrypoint name. Is this not correct? The first argument to add_entrypoint_object is the target name (which is in sync with the CMake convention of using target names as the first arg.) The NAME option specifies the entrypoint name.

libc/cmake/modules/LLVMLibCRules.cmake
160

This is fine.

libc/src/string/CMakeLists.txt
27

The NAME is the entrypoint name and not the target name. May be you mean the opposite here? As in, instead of memcpy on line 54, you want ${memcpy_name} and memcpy for the NAME argument?

libc/src/string/memcpy.cpp
15

Test or final, all of them should have the same entrypoint name. No?

For example, memcpy_config1_test will depend on memcpy_config1 but will call __llvm_libc::memcpy for the test.

libc/src/string/memcpy_arch_specific.h.def
61

This should all be just memcpy. May be I am missing something still.

libc/test/src/string/CMakeLists.txt
30

This seems to be setup correctly. But, instead of memcpy_name, they should be called memcpy_config_name to make it clear that they are all memcpy implementations in different configs.

36

This as well.

libc/test/src/string/memcpy_test.cpp
12

This should not be required.

46

This should also not be required.

MaskRay added inline comments.Mon, Mar 2, 10:00 PM
libc/src/string/CMakeLists.txt
66

Does __attribute__((target("avx"))) meet the needs?

As to ifunc, it needs non-trivial work in the rtld. Even in a -static (but not -static-pie) context, there will be R_X86_64_IRELATIVE relocations and crt should resolve them.

gchatelet updated this revision to Diff 249635.Wed, Mar 11, 8:33 AM
gchatelet marked 12 inline comments as done.
  • address comments and rebase

I think we misunderstood each other.

My original goal was to also support shared library with runtime dispatch (postponed but we'll do it eventually)
In that case we need a single library with all the implementations, they need to have different names to prevent ODR, and memcpy would be the dispatching function.
That's why I provided the custom definitions MEMCPY_NAME=${memcpy_name}.

Now I can indeed remove all of this, have different targets generating different object files all with the same memcpy function in it. This means that I need to create X unit tests and X benchmarks all linking with a different targets but all referring to the same memcpy function (just a different implementation). This works, and I've updated the patch accordingly.

However it will not be straightforward to go from this to the "shared library with runtime dispatch" state. We would need extra machinery to copy the memcpy functions of each implementations into the final object and rename them to prevent ODR.

The current implementation is simpler though. Let me know what you think.

libc/src/string/CMakeLists.txt
66

Does __attribute__((target("avx"))) meet the needs?

Kind of, but

  • It's brittle, I've been bitten a few times passing features with typo and the compiler will happily compile without a warning.
  • It won't work with MSVC

For these reasons I'd rather not use it.

Mostly LGTM. Few minor questions inline.

libc/cmake/modules/LLVMLibCRules.cmake
202

Sorry for not mentioning this earlier. I have copied this part in to another patch of mine.

208

Do we need this still?

376

Likewise, seems to me like COMPILE_DEFINITIONS is not used.

libc/src/string/CMakeLists.txt
45

Should we check if ${opt_level} is available for the machine this target is being built?

libc/test/src/string/memcpy_test.cpp
13

Instead of this, you can include src/string/memcpy.h.

gchatelet updated this revision to Diff 249903.Thu, Mar 12, 5:05 AM
gchatelet marked 9 inline comments as done.

Address comments

libc/cmake/modules/LLVMLibCRules.cmake
202

Yes, I think I messed up with the merge, should be fixed.

208

Right now, support for repmovsb is disabled. This is a customization point that should be exposed through preprocessor definitions (and CMake options).
As of this patch this is not needed but I expect to need it soon-ish.

libc/src/string/CMakeLists.txt
45

Yes, makes sense.
I'm still not super happy about the introspection part, it's cumbersome to use...
I'll work on it as a separate patch if you don't mind.

libc/test/src/string/memcpy_test.cpp
13

Ha sure !

Harbormaster completed remote builds in B48970: Diff 249903.
sivachandra accepted this revision.Thu, Mar 12, 8:12 AM
sivachandra marked an inline comment as done.
sivachandra added inline comments.
libc/src/string/CMakeLists.txt
45

SGTM

This revision is now accepted and ready to land.Thu, Mar 12, 8:12 AM
gchatelet updated this revision to Diff 250296.Fri, Mar 13, 1:54 PM

I revamped the cpu feature part. It's much simpler and easier to understand.
Now tests that can't run on the host will be skipped.

This last modification does not change the memcpy implementation but how we detect cpu features and how we build/test the memcpy implementations.
There are less global variables only ALL_CPU_FEATURES and HOST_CPU_FEATURES.
The flags are built through compute_flags and host compatibility tested with host_supports.
All memcpy implementations are added to a global property to be passed down to tests.
Tests can query the target for their required cpu features and test the host against them, hence testing only the implementations that the host can execute.
This also remove the need for individual check files.

Overall I think the design is much simpler and straightforward.

As this patch gets closer to landing I think it would make sense to update the description :)

libc/cmake/modules/LLVMLibCCheckCpuFeatures.cmake
10

Is it necessary to print this?

libc/cmake/modules/cpu_features/check_cpu_features.cpp.in
5–11

Is it necessary to define these? Does DEFINITIONS rely on the definitions of macros? Also I think handling MSVC here is strange when compute_flags does not currently.

20–21

Why not putchar and puts?

libc/test/src/string/CMakeLists.txt
32

Would you mind explaining this? It seems like ${flags} will just be -march=native, and the work above to find flags gets ignored.

gchatelet edited the summary of this revision. (Show Details)Mon, Mar 16, 7:40 AM
gchatelet updated this revision to Diff 250584.Mon, Mar 16, 9:24 AM
gchatelet marked 8 inline comments as done.

Address comments and rebase

libc/cmake/modules/LLVMLibCCheckCpuFeatures.cmake
10

Not really.

libc/cmake/modules/cpu_features/check_cpu_features.cpp.in
5–11

Right, it's probably better to defer anything related to MSVC at that point.
Also I'll add documentation to this file because it's a bit obscure.

20–21

I used putchar for a single character but I can't use puts since it automatically appends a line feed.

libc/test/src/string/CMakeLists.txt
32

Actually there's no need for flags here, the implementation has already been compiled with the correct flags. The test itself doesn't need them.

sivachandra marked an inline comment as done.Mon, Mar 16, 9:29 AM
sivachandra added inline comments.
libc/src/string/CMakeLists.txt
83

It seems like this will not ensure the best flags. Is that the intention? If so, why?

libc/src/string/x86_64/CMakeLists.txt
1 ↗(On Diff #250296)

These listings are already under x86_64. So, do we need to use ${LIBC_TARGET_MACHINE}?

4 ↗(On Diff #250296)

It would be nice if a descriptive error message is shown if one tries to build a target which isn't supported on their machine. ISTM that it is not the case. If indeed so, we can do it in a later round I think.

libc/test/src/string/CMakeLists.txt
32

Same question for me as well. I left a related comment at a different place above.

sivachandra added inline comments.Mon, Mar 16, 10:20 AM
libc/test/src/string/CMakeLists.txt
32

Ah, so remove this line then?

gchatelet marked 6 inline comments as done.Mon, Mar 16, 11:16 AM
gchatelet added inline comments.
libc/src/string/CMakeLists.txt
83

It will, -march=native enables all the features available on the host.
Why do you think -march=native won't get the best flags?

That said if you're on a skylake-avx512 machine it will not use avx512 instructions. This is because -mprefer-vector-width is defaulted to 256 bit width operations (see this phoronix article)

Currently, if on a skylake-avx512 machine the implementation will be the same as the avx one. We would have to measure to be sure it's worth forcing -mprefer-vector-width=512 as well.

libc/src/string/x86_64/CMakeLists.txt
1 ↗(On Diff #250296)

This is because we redirect both x86 and x86_64 versions to this folder and they ought to have different names. We should probably rename this folder if it's unclear. Maybe x86_multi ?

libc/test/src/string/CMakeLists.txt
32

This is a two step process:

  • the implementations get compiled with specific flags
  • when testing we retrieve these flags and check whether the current host supports them

If they are compatible, the already compiled .o file can run on the host, the test file itself doesn't need to receive the flags (only the implementation)

This will be the same for benchmarking, the benchmarking code does not need to be compiled with avx support, only the code under test needs to be.

Do you think it deserves a comment?

sivachandra accepted this revision.Mon, Mar 16, 11:55 AM
sivachandra marked an inline comment as done.

I think what is outstanding wrt comments is very minor. So, feel free to land and we can iterate if required after landing.

libc/src/string/CMakeLists.txt
83

I was of the opinion that the compilers do not have the complete set of capabilities available to them and that is why we have facilities like HWCAP, cpuid etc. But, you are the expert, and if you say what you have is enough, I will take it :)

libc/src/string/x86_64/CMakeLists.txt
1 ↗(On Diff #250296)

Normal convention is to name it just x86. So, name it just x86 then?

libc/test/src/string/CMakeLists.txt
32

AFAICT, compute_flags does not have any side effects. Also, it doesn't look like ${flags} is used anywhere here. So, why call it at all?

gchatelet updated this revision to Diff 251115.Wed, Mar 18, 9:44 AM
gchatelet marked 8 inline comments as done.

Address comments and rebase

libc/src/string/CMakeLists.txt
83

With -march=native the compiler will introspect the CPU with cpuid and detect the available capabilities. We're deferring to the compiler here.

Now for shared libraries and runtime dispatch we'll have to provide such code but we're not there yet.

libc/test/src/string/CMakeLists.txt
32

Yes thank you it was a leftover.

Thank you @sivachandra and @abrachet for the review.
I'll submit it as-is for now and we can iterate from here.

Thank you @sivachandra and @abrachet for the review.
I'll submit it as-is for now and we can iterate from here.

No really, thank you for patiently working this out. The bots are green so it is sticking.

This revision was automatically updated to reflect the committed changes.

This change breaks -DLLVM_INCLUDE_TESTS=OFF:

$ cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS=libc -DLLVM_INCLUDE_TESTS=OFF ../llvm
...
CMake Error at /home/nathan/src/llvm-project/libc/test/src/string/memory_utils/CMakeLists.txt:13 (target_compile_definitions):
  Cannot specify compile definitions for target "utils_test" which is not
  built by this project.
...