Patch D56593 by @courbet causes bcmp() to be emitted in some cases. Unfortunately, TTI::MemCmpExpansionOptions() is not overridden by AArch64, which results in calls to bcmp() even for small constant values. In a proprietary benchmark we see a performance drop of about 12% on PNG compression before this patch, though it passes all tests.
This patch mirrors how X86 initializes TTI::MemCmpExpansionOptions() to then expand calls to bcmp() when appropriate. No tuning of the parameters was performed, but, at this point, it's enough to recover the performance drop above.
This problem also exists on ARM. Once a consensus is reached for AArch64, we can work to fix ARM as well.