Hi,
Here is an optimized memcmp for AArch64. It uses __builtin_memcmp_inline, for which I have an RFC at D105440.
I also changed some of the general implementations to prefer Equal where possible and to use fewer ThreeWayCmp calls. I have not benchmarked these changes on x86, though. Let me know what you think.
I also added extra testing as I found the existing tests to be lacking at times.
Kind regards,
Andre
Let's use const char *lhs, const char *rhs, size_t count here and do the casting in the calling function.
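Something along these lines is what I have in mind. This is only a sketch: memcmp_impl and my_memcmp are hypothetical names, not the ones in the patch, and the byte loop stands in for whatever the real implementation does. The helper takes typed pointers, and the void pointers are cast once in the caller.

```cpp
#include <stddef.h>

// Hypothetical internal helper: works on const char * directly, so no
// casts are needed in the comparison logic itself.
static int memcmp_impl(const char *lhs, const char *rhs, size_t count) {
  for (size_t i = 0; i < count; ++i) {
    // Compare as unsigned char, as memcmp semantics require.
    const unsigned char a = static_cast<unsigned char>(lhs[i]);
    const unsigned char b = static_cast<unsigned char>(rhs[i]);
    if (a != b)
      return a < b ? -1 : 1;
  }
  return 0;
}

// Hypothetical entry point: the only place that sees const void *, and the
// only place that casts.
int my_memcmp(const void *lhs, const void *rhs, size_t count) {
  return memcmp_impl(static_cast<const char *>(lhs),
                     static_cast<const char *>(rhs), count);
}
```

That keeps the casting in one place and leaves the implementation functions with a uniform signature.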