Improve the 16-byte (16B) case for equality and three-way comparison on AArch64. This benefits bcmp and memcmp.
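For readers unfamiliar with the two primitives, here is a minimal sketch of what a 16-byte equality check and a 16-byte three-way compare can look like in portable C++. It is not the code from this patch: the names `load64`, `load64_be`, `equals16` and `three_way16` are hypothetical, and the sketch uses two 64-bit loads per operand rather than whatever AArch64-specific sequence the patch actually emits.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: unaligned 8-byte load in native byte order.
static inline uint64_t load64(const unsigned char *p) {
  uint64_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

// Hypothetical helper: 8-byte load assembled as a big-endian integer, so that
// unsigned integer order matches lexicographic byte order.
static inline uint64_t load64_be(const unsigned char *p) {
  uint64_t v = 0;
  for (int i = 0; i < 8; ++i)
    v = (v << 8) | p[i];
  return v;
}

// bcmp-style check: only "same or different" matters, so the two halves can
// be XORed and ORed together without caring about byte order.
static inline bool equals16(const void *a, const void *b) {
  const auto *pa = static_cast<const unsigned char *>(a);
  const auto *pb = static_cast<const unsigned char *>(b);
  const uint64_t lo = load64(pa) ^ load64(pb);
  const uint64_t hi = load64(pa + 8) ^ load64(pb + 8);
  return (lo | hi) == 0;
}

// memcmp-style check: returns <0, 0 or >0 according to the first differing
// byte, compared as unsigned char.
static inline int three_way16(const void *a, const void *b) {
  const auto *pa = static_cast<const unsigned char *>(a);
  const auto *pb = static_cast<const unsigned char *>(b);
  for (int off = 0; off < 16; off += 8) {
    const uint64_t x = load64_be(pa + off);
    const uint64_t y = load64_be(pb + off);
    if (x != y)
      return x < y ? -1 : 1;
  }
  return 0;
}
```

The equality path needs no byte reversal and no branch per half, which is why bcmp can in principle be cheaper than memcmp for the same block size.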
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
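This patch does not appear to change the performance of bcmp and memcmp substantially. On Neoverse N1 it is neutral or slightly negative: memcmp moves by about 2% at most, while bcmp regresses by up to 5.55% on the measured size distributions.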
name                                                          old speed       new speed       delta
BM_Memcpy/0/0 [__llvm_libc::memcpy,memcpy Google A ]          16.0GB/s ± 4%   15.9GB/s ± 3%   ~        (p=0.127 n=20+20)
BM_Memcpy/1/0 [__llvm_libc::memcpy,memcpy Google B ]          6.77GB/s ± 9%   7.00GB/s ± 7%   +3.47%   (p=0.017 n=20+20)
BM_Memcpy/2/0 [__llvm_libc::memcpy,memcpy Google D ]          31.0GB/s ± 2%   30.9GB/s ± 2%   ~        (p=0.529 n=20+20)
BM_Memcpy/3/0 [__llvm_libc::memcpy,memcpy Google L ]          6.39GB/s ± 9%   6.45GB/s ±10%   ~        (p=0.588 n=20+19)
BM_Memcpy/4/0 [__llvm_libc::memcpy,memcpy Google M ]          5.55GB/s ± 8%   5.64GB/s ± 7%   ~        (p=0.398 n=20+20)
BM_Memcpy/5/0 [__llvm_libc::memcpy,memcpy Google Q ]          2.77GB/s ±12%   2.77GB/s ± 8%   ~        (p=0.923 n=19+20)
BM_Memcpy/6/0 [__llvm_libc::memcpy,memcpy Google S ]          6.51GB/s ± 6%   6.62GB/s ± 5%   ~        (p=0.068 n=20+20)
BM_Memcpy/7/0 [__llvm_libc::memcpy,memcpy Google U ]          8.08GB/s ±10%   7.84GB/s ± 9%   ~        (p=0.059 n=18+20)
BM_Memcpy/8/0 [__llvm_libc::memcpy,memcpy Google W ]          5.80GB/s ± 5%   5.77GB/s ± 5%   ~        (p=0.565 n=20+20)
BM_Memcpy/9/0 [__llvm_libc::memcpy,uniform 384 to 4096 ]      41.3GB/s ± 0%   41.3GB/s ± 0%   ~        (p=0.547 n=20+20)
BM_Memmove/0/0 [__llvm_libc::memmove,memmove Google A ]       2.34GB/s ± 8%   2.37GB/s ± 9%   ~        (p=0.369 n=20+20)
BM_Memmove/1/0 [__llvm_libc::memmove,memmove Google B ]       5.18GB/s ± 2%   5.17GB/s ± 3%   ~        (p=0.583 n=20+20)
BM_Memmove/2/0 [__llvm_libc::memmove,memmove Google D ]       10.4GB/s ± 5%   10.3GB/s ± 6%   ~        (p=0.461 n=20+20)
BM_Memmove/3/0 [__llvm_libc::memmove,memmove Google L ]       4.39GB/s ± 9%   4.42GB/s ± 8%   ~        (p=0.659 n=20+20)
BM_Memmove/4/0 [__llvm_libc::memmove,memmove Google M ]       3.88GB/s ± 4%   3.84GB/s ± 7%   ~        (p=0.383 n=20+20)
BM_Memmove/5/0 [__llvm_libc::memmove,memmove Google Q ]       3.57GB/s ±11%   3.51GB/s ±14%   ~        (p=0.461 n=20+20)
BM_Memmove/6/0 [__llvm_libc::memmove,memmove Google S ]       6.97GB/s ± 5%   7.00GB/s ± 5%   ~        (p=0.428 n=19+20)
BM_Memmove/7/0 [__llvm_libc::memmove,memmove Google U ]       2.95GB/s ±11%   3.00GB/s ±11%   ~        (p=0.583 n=20+20)
BM_Memmove/8/0 [__llvm_libc::memmove,memmove Google W ]       5.41GB/s ± 4%   5.41GB/s ± 3%   ~        (p=0.925 n=20+20)
BM_Memmove/9/0 [__llvm_libc::memmove,uniform 384 to 4096]     34.2GB/s ± 1%   34.1GB/s ± 0%   ~        (p=0.102 n=20+20)
BM_Memcmp/0/0 [__llvm_libc::memcmp,memcmp Google A ]          1.50GB/s ± 6%   1.47GB/s ± 5%   -2.07%   (p=0.028 n=20+20)
BM_Memcmp/1/0 [__llvm_libc::memcmp,memcmp Google B ]          4.25GB/s ± 3%   4.27GB/s ± 4%   ~        (p=0.565 n=20+20)
BM_Memcmp/2/0 [__llvm_libc::memcmp,memcmp Google D ]          2.87GB/s ± 3%   2.85GB/s ± 4%   ~        (p=0.201 n=20+20)
BM_Memcmp/3/0 [__llvm_libc::memcmp,memcmp Google L ]          3.14GB/s ± 1%   3.17GB/s ± 1%   +0.98%   (p=0.000 n=19+20)
BM_Memcmp/4/0 [__llvm_libc::memcmp,memcmp Google M ]          1.29GB/s ± 8%   1.29GB/s ± 9%   ~        (p=0.620 n=20+20)
BM_Memcmp/5/0 [__llvm_libc::memcmp,memcmp Google Q ]          2.57GB/s ± 6%   2.53GB/s ±10%   -1.63%   (p=0.046 n=20+20)
BM_Memcmp/6/0 [__llvm_libc::memcmp,memcmp Google S ]          3.80GB/s ± 2%   3.80GB/s ± 3%   ~        (p=0.835 n=19+20)
BM_Memcmp/7/0 [__llvm_libc::memcmp,memcmp Google U ]          2.78GB/s ± 4%   2.74GB/s ± 3%   -1.49%   (p=0.017 n=20+20)
BM_Memcmp/8/0 [__llvm_libc::memcmp,memcmp Google W ]          1.54GB/s ± 1%   1.52GB/s ± 1%   -1.67%   (p=0.000 n=20+20)
BM_Memcmp/9/0 [__llvm_libc::memcmp,uniform 384 to 4096 ]      23.8GB/s ± 0%   23.8GB/s ± 0%   +0.20%   (p=0.000 n=20+20)
BM_Bcmp/0/0 [__llvm_libc::bcmp,memcmp Google A ]              1.66GB/s ± 2%   1.58GB/s ± 2%   -4.52%   (p=0.000 n=16+16)
BM_Bcmp/1/0 [__llvm_libc::bcmp,memcmp Google B ]              4.13GB/s ± 3%   4.12GB/s ± 3%   ~        (p=0.435 n=19+19)
BM_Bcmp/2/0 [__llvm_libc::bcmp,memcmp Google D ]              3.01GB/s ± 1%   2.89GB/s ± 4%   -4.09%   (p=0.000 n=17+20)
BM_Bcmp/3/0 [__llvm_libc::bcmp,memcmp Google L ]              3.14GB/s ± 1%   3.07GB/s ± 1%   -2.22%   (p=0.000 n=19+20)
BM_Bcmp/4/0 [__llvm_libc::bcmp,memcmp Google M ]              1.29GB/s ± 7%   1.23GB/s ± 9%   ~        (p=0.081 n=20+20)
BM_Bcmp/5/0 [__llvm_libc::bcmp,memcmp Google Q ]              2.46GB/s ± 4%   2.32GB/s ± 8%   -5.55%   (p=0.000 n=20+19)
BM_Bcmp/6/0 [__llvm_libc::bcmp,memcmp Google S ]              3.81GB/s ± 3%   3.75GB/s ± 3%   -1.55%   (p=0.002 n=20+20)
BM_Bcmp/7/0 [__llvm_libc::bcmp,memcmp Google U ]              2.81GB/s ± 3%   2.72GB/s ± 1%   -3.18%   (p=0.000 n=20+17)
BM_Bcmp/8/0 [__llvm_libc::bcmp,memcmp Google W ]              1.57GB/s ± 1%   1.52GB/s ± 1%   -3.23%   (p=0.000 n=19+19)
BM_Bcmp/9/0 [__llvm_libc::bcmp,uniform 384 to 4096 ]          23.0GB/s ± 0%   23.0GB/s ± 0%   -0.28%   (p=0.000 n=20+20)
BM_Memset/0/0 [__llvm_libc::memset,memset Google A ]          10.3GB/s ± 3%   10.4GB/s ± 5%   ~        (p=0.224 n=19+20)
BM_Memset/1/0 [__llvm_libc::memset,memset Google B ]          8.64GB/s ± 8%   8.64GB/s ± 6%   ~        (p=0.967 n=20+19)
BM_Memset/2/0 [__llvm_libc::memset,memset Google D ]          26.1GB/s ± 2%   26.2GB/s ± 2%   ~        (p=0.081 n=20+20)
BM_Memset/3/0 [__llvm_libc::memset,memset Google L ]          10.9GB/s ± 4%   11.0GB/s ± 4%   ~        (p=0.174 n=20+20)
BM_Memset/4/0 [__llvm_libc::memset,memset Google M ]          20.1GB/s ± 2%   20.0GB/s ± 3%   ~        (p=0.383 n=20+20)
BM_Memset/5/0 [__llvm_libc::memset,memset Google Q ]          13.4GB/s ± 3%   13.1GB/s ± 4%   -2.07%   (p=0.001 n=20+20)
BM_Memset/6/0 [__llvm_libc::memset,memset Google S ]          10.5GB/s ± 4%   10.6GB/s ± 4%   ~        (p=0.253 n=20+20)
BM_Memset/7/0 [__llvm_libc::memset,memset Google U ]          9.51GB/s ± 4%   9.47GB/s ± 4%   ~        (p=0.461 n=20+20)
BM_Memset/8/0 [__llvm_libc::memset,memset Google W ]          12.1GB/s ± 2%   12.1GB/s ± 2%   ~        (p=0.512 n=20+20)
BM_Memset/9/0 [__llvm_libc::memset,uniform 384 to 4096 ]      44.8GB/s ± 0%   44.8GB/s ± 0%   ~        (p=0.620 n=20+20)
BM_Bzero/0/0 [__llvm_libc::bzero,memset Google A ]            10.6GB/s ± 4%   10.6GB/s ± 4%   ~        (p=0.383 n=20+20)
BM_Bzero/1/0 [__llvm_libc::bzero,memset Google B ]            8.76GB/s ± 4%   8.83GB/s ± 4%   ~        (p=0.495 n=20+20)
BM_Bzero/2/0 [__llvm_libc::bzero,memset Google D ]            28.5GB/s ± 2%   28.5GB/s ± 2%   ~        (p=0.341 n=20+20)
BM_Bzero/3/0 [__llvm_libc::bzero,memset Google L ]            10.9GB/s ± 3%   10.9GB/s ± 2%   ~        (p=0.778 n=19+17)
BM_Bzero/4/0 [__llvm_libc::bzero,memset Google M ]            21.6GB/s ± 2%   21.4GB/s ± 3%   -0.96%   (p=0.024 n=20+19)
BM_Bzero/5/0 [__llvm_libc::bzero,memset Google Q ]            13.4GB/s ± 3%   13.3GB/s ± 5%   ~        (p=0.121 n=20+20)
BM_Bzero/6/0 [__llvm_libc::bzero,memset Google S ]            10.4GB/s ± 4%   10.4GB/s ± 4%   ~        (p=0.445 n=20+20)
BM_Bzero/7/0 [__llvm_libc::bzero,memset Google U ]            9.45GB/s ± 5%   9.48GB/s ± 5%   ~        (p=0.512 n=20+20)
BM_Bzero/8/0 [__llvm_libc::bzero,memset Google W ]            12.1GB/s ± 2%   12.1GB/s ± 2%   ~        (p=0.841 n=20+20)
BM_Bzero/9/0 [__llvm_libc::bzero,uniform 384 to 4096 ]        71.4GB/s ± 0%   71.4GB/s ± 0%   ~        (p=0.904 n=20+20)