It is advised to read the post motivating the creation of __builtin_memcpy_inline first.
The patch focuses on the static library but allows several implementations to be created depending on CPU features. The default implementation is optimized for the capabilities of the host.
Currently the use of rep movsb is disabled, but we plan to enable it via CMake options.
This implementation is mainly tested with Clang but should compile with GCC as well. For now it does not build with MSVC.
@sivachandra this should be submitted as a separate patch but it's better to have context for the change.