This is an archive of the discontinued LLVM Phabricator instance.

[libc] improve {mem|b}cmp for aarch64
Abandoned · Public

Authored by gchatelet on Aug 18 2022, 5:31 AM.

Details

Reviewers
danlark
avieira
Summary

Improving the 16B case for equals and three way compare on aarch64. This is beneficial for bcmp and memcmp.
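
For readers unfamiliar with the two operations named above, here is a minimal portable sketch of a 16-byte "equals" (bcmp-style) and a 16-byte three-way compare (memcmp-style). It is not the code from this patch: the names load64, equals16, three_way16 and check_examples are hypothetical, and the sketch assumes a little-endian host and a GCC/Clang compiler providing __builtin_bswap64.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the patch): unaligned-safe 8-byte load.
static inline std::uint64_t load64(const unsigned char *p) {
  std::uint64_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

// bcmp-style check over exactly 16 bytes: returns 0 iff the blocks are equal.
static inline int equals16(const void *a, const void *b) {
  const auto *pa = static_cast<const unsigned char *>(a);
  const auto *pb = static_cast<const unsigned char *>(b);
  // XOR each 8-byte half, then OR the results so any differing bit survives.
  const std::uint64_t diff =
      (load64(pa) ^ load64(pb)) | (load64(pa + 8) ^ load64(pb + 8));
  return diff != 0;
}

// memcmp-style compare over exactly 16 bytes: negative, zero or positive.
static inline int three_way16(const void *a, const void *b) {
  const auto *pa = static_cast<const unsigned char *>(a);
  const auto *pb = static_cast<const unsigned char *>(b);
  for (int i = 0; i < 16; i += 8) {
    std::uint64_t x = load64(pa + i);
    std::uint64_t y = load64(pb + i);
    if (x != y) {
      // Assumption: little-endian host. Byte-swapping makes the first byte
      // in memory the most significant, so an integer comparison matches
      // the byte-wise lexicographic order that memcmp requires.
      x = __builtin_bswap64(x);
      y = __builtin_bswap64(y);
      return x < y ? -1 : 1;
    }
  }
  return 0;
}

// Usage example: the blocks differ in their first byte (0x01 vs 0x02).
static inline void check_examples() {
  const unsigned char lo[16] = {0x01, 0xff};
  const unsigned char hi[16] = {0x02, 0x00};
  assert(equals16(lo, lo) == 0);
  assert(equals16(lo, hi) != 0);
  assert(three_way16(lo, hi) < 0);
}
```

The equality check needs no byte swap because it only asks whether any bit differs, which is one reason bcmp can be cheaper than memcmp for the same size.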

Event Timeline

gchatelet created this revision. Aug 18 2022, 5:31 AM
Herald added projects: Restricted Project, Restricted Project. Aug 18 2022, 5:31 AM
gchatelet requested review of this revision. Aug 18 2022, 5:31 AM

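This patch does not radically change the performance of bcmp and memcmp: on Neoverse N1 it is neutral or slightly negative. In the results below, the statistically significant memcmp deltas stay within roughly 2% in either direction, while bcmp regresses on eight of the ten workloads, by 0.28% to 5.55%.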

name                                                       old speed               new speed               delta
BM_Memcpy/0/0  [__llvm_libc::memcpy,memcpy Google A     ]  16.0GB/s ± 4%           15.9GB/s ± 3%    ~           (p=0.127 n=20+20)
BM_Memcpy/1/0  [__llvm_libc::memcpy,memcpy Google B     ]  6.77GB/s ± 9%           7.00GB/s ± 7%  +3.47%        (p=0.017 n=20+20)
BM_Memcpy/2/0  [__llvm_libc::memcpy,memcpy Google D     ]  31.0GB/s ± 2%           30.9GB/s ± 2%    ~           (p=0.529 n=20+20)
BM_Memcpy/3/0  [__llvm_libc::memcpy,memcpy Google L     ]  6.39GB/s ± 9%           6.45GB/s ±10%    ~           (p=0.588 n=20+19)
BM_Memcpy/4/0  [__llvm_libc::memcpy,memcpy Google M     ]  5.55GB/s ± 8%           5.64GB/s ± 7%    ~           (p=0.398 n=20+20)
BM_Memcpy/5/0  [__llvm_libc::memcpy,memcpy Google Q     ]  2.77GB/s ±12%           2.77GB/s ± 8%    ~           (p=0.923 n=19+20)
BM_Memcpy/6/0  [__llvm_libc::memcpy,memcpy Google S     ]  6.51GB/s ± 6%           6.62GB/s ± 5%    ~           (p=0.068 n=20+20)
BM_Memcpy/7/0  [__llvm_libc::memcpy,memcpy Google U     ]  8.08GB/s ±10%           7.84GB/s ± 9%    ~           (p=0.059 n=18+20)
BM_Memcpy/8/0  [__llvm_libc::memcpy,memcpy Google W     ]  5.80GB/s ± 5%           5.77GB/s ± 5%    ~           (p=0.565 n=20+20)
BM_Memcpy/9/0  [__llvm_libc::memcpy,uniform 384 to 4096 ]  41.3GB/s ± 0%           41.3GB/s ± 0%    ~           (p=0.547 n=20+20)
BM_Memmove/0/0 [__llvm_libc::memmove,memmove Google A   ]  2.34GB/s ± 8%           2.37GB/s ± 9%    ~           (p=0.369 n=20+20)
BM_Memmove/1/0 [__llvm_libc::memmove,memmove Google B   ]  5.18GB/s ± 2%           5.17GB/s ± 3%    ~           (p=0.583 n=20+20)
BM_Memmove/2/0 [__llvm_libc::memmove,memmove Google D   ]  10.4GB/s ± 5%           10.3GB/s ± 6%    ~           (p=0.461 n=20+20)
BM_Memmove/3/0 [__llvm_libc::memmove,memmove Google L   ]  4.39GB/s ± 9%           4.42GB/s ± 8%    ~           (p=0.659 n=20+20)
BM_Memmove/4/0 [__llvm_libc::memmove,memmove Google M   ]  3.88GB/s ± 4%           3.84GB/s ± 7%    ~           (p=0.383 n=20+20)
BM_Memmove/5/0 [__llvm_libc::memmove,memmove Google Q   ]  3.57GB/s ±11%           3.51GB/s ±14%    ~           (p=0.461 n=20+20)
BM_Memmove/6/0 [__llvm_libc::memmove,memmove Google S   ]  6.97GB/s ± 5%           7.00GB/s ± 5%    ~           (p=0.428 n=19+20)
BM_Memmove/7/0 [__llvm_libc::memmove,memmove Google U   ]  2.95GB/s ±11%           3.00GB/s ±11%    ~           (p=0.583 n=20+20)
BM_Memmove/8/0 [__llvm_libc::memmove,memmove Google W   ]  5.41GB/s ± 4%           5.41GB/s ± 3%    ~           (p=0.925 n=20+20)
BM_Memmove/9/0 [__llvm_libc::memmove,uniform 384 to 4096]  34.2GB/s ± 1%           34.1GB/s ± 0%    ~           (p=0.102 n=20+20)
BM_Memcmp/0/0  [__llvm_libc::memcmp,memcmp Google A     ]  1.50GB/s ± 6%           1.47GB/s ± 5%  -2.07%        (p=0.028 n=20+20)
BM_Memcmp/1/0  [__llvm_libc::memcmp,memcmp Google B     ]  4.25GB/s ± 3%           4.27GB/s ± 4%    ~           (p=0.565 n=20+20)
BM_Memcmp/2/0  [__llvm_libc::memcmp,memcmp Google D     ]  2.87GB/s ± 3%           2.85GB/s ± 4%    ~           (p=0.201 n=20+20)
BM_Memcmp/3/0  [__llvm_libc::memcmp,memcmp Google L     ]  3.14GB/s ± 1%           3.17GB/s ± 1%  +0.98%        (p=0.000 n=19+20)
BM_Memcmp/4/0  [__llvm_libc::memcmp,memcmp Google M     ]  1.29GB/s ± 8%           1.29GB/s ± 9%    ~           (p=0.620 n=20+20)
BM_Memcmp/5/0  [__llvm_libc::memcmp,memcmp Google Q     ]  2.57GB/s ± 6%           2.53GB/s ±10%  -1.63%        (p=0.046 n=20+20)
BM_Memcmp/6/0  [__llvm_libc::memcmp,memcmp Google S     ]  3.80GB/s ± 2%           3.80GB/s ± 3%    ~           (p=0.835 n=19+20)
BM_Memcmp/7/0  [__llvm_libc::memcmp,memcmp Google U     ]  2.78GB/s ± 4%           2.74GB/s ± 3%  -1.49%        (p=0.017 n=20+20)
BM_Memcmp/8/0  [__llvm_libc::memcmp,memcmp Google W     ]  1.54GB/s ± 1%           1.52GB/s ± 1%  -1.67%        (p=0.000 n=20+20)
BM_Memcmp/9/0  [__llvm_libc::memcmp,uniform 384 to 4096 ]  23.8GB/s ± 0%           23.8GB/s ± 0%  +0.20%        (p=0.000 n=20+20)
BM_Bcmp/0/0    [__llvm_libc::bcmp,memcmp Google A       ]  1.66GB/s ± 2%           1.58GB/s ± 2%  -4.52%        (p=0.000 n=16+16)
BM_Bcmp/1/0    [__llvm_libc::bcmp,memcmp Google B       ]  4.13GB/s ± 3%           4.12GB/s ± 3%    ~           (p=0.435 n=19+19)
BM_Bcmp/2/0    [__llvm_libc::bcmp,memcmp Google D       ]  3.01GB/s ± 1%           2.89GB/s ± 4%  -4.09%        (p=0.000 n=17+20)
BM_Bcmp/3/0    [__llvm_libc::bcmp,memcmp Google L       ]  3.14GB/s ± 1%           3.07GB/s ± 1%  -2.22%        (p=0.000 n=19+20)
BM_Bcmp/4/0    [__llvm_libc::bcmp,memcmp Google M       ]  1.29GB/s ± 7%           1.23GB/s ± 9%    ~           (p=0.081 n=20+20)
BM_Bcmp/5/0    [__llvm_libc::bcmp,memcmp Google Q       ]  2.46GB/s ± 4%           2.32GB/s ± 8%  -5.55%        (p=0.000 n=20+19)
BM_Bcmp/6/0    [__llvm_libc::bcmp,memcmp Google S       ]  3.81GB/s ± 3%           3.75GB/s ± 3%  -1.55%        (p=0.002 n=20+20)
BM_Bcmp/7/0    [__llvm_libc::bcmp,memcmp Google U       ]  2.81GB/s ± 3%           2.72GB/s ± 1%  -3.18%        (p=0.000 n=20+17)
BM_Bcmp/8/0    [__llvm_libc::bcmp,memcmp Google W       ]  1.57GB/s ± 1%           1.52GB/s ± 1%  -3.23%        (p=0.000 n=19+19)
BM_Bcmp/9/0    [__llvm_libc::bcmp,uniform 384 to 4096   ]  23.0GB/s ± 0%           23.0GB/s ± 0%  -0.28%        (p=0.000 n=20+20)
BM_Memset/0/0  [__llvm_libc::memset,memset Google A     ]  10.3GB/s ± 3%           10.4GB/s ± 5%    ~           (p=0.224 n=19+20)
BM_Memset/1/0  [__llvm_libc::memset,memset Google B     ]  8.64GB/s ± 8%           8.64GB/s ± 6%    ~           (p=0.967 n=20+19)
BM_Memset/2/0  [__llvm_libc::memset,memset Google D     ]  26.1GB/s ± 2%           26.2GB/s ± 2%    ~           (p=0.081 n=20+20)
BM_Memset/3/0  [__llvm_libc::memset,memset Google L     ]  10.9GB/s ± 4%           11.0GB/s ± 4%    ~           (p=0.174 n=20+20)
BM_Memset/4/0  [__llvm_libc::memset,memset Google M     ]  20.1GB/s ± 2%           20.0GB/s ± 3%    ~           (p=0.383 n=20+20)
BM_Memset/5/0  [__llvm_libc::memset,memset Google Q     ]  13.4GB/s ± 3%           13.1GB/s ± 4%  -2.07%        (p=0.001 n=20+20)
BM_Memset/6/0  [__llvm_libc::memset,memset Google S     ]  10.5GB/s ± 4%           10.6GB/s ± 4%    ~           (p=0.253 n=20+20)
BM_Memset/7/0  [__llvm_libc::memset,memset Google U     ]  9.51GB/s ± 4%           9.47GB/s ± 4%    ~           (p=0.461 n=20+20)
BM_Memset/8/0  [__llvm_libc::memset,memset Google W     ]  12.1GB/s ± 2%           12.1GB/s ± 2%    ~           (p=0.512 n=20+20)
BM_Memset/9/0  [__llvm_libc::memset,uniform 384 to 4096 ]  44.8GB/s ± 0%           44.8GB/s ± 0%    ~           (p=0.620 n=20+20)
BM_Bzero/0/0   [__llvm_libc::bzero,memset Google A      ]  10.6GB/s ± 4%           10.6GB/s ± 4%    ~           (p=0.383 n=20+20)
BM_Bzero/1/0   [__llvm_libc::bzero,memset Google B      ]  8.76GB/s ± 4%           8.83GB/s ± 4%    ~           (p=0.495 n=20+20)
BM_Bzero/2/0   [__llvm_libc::bzero,memset Google D      ]  28.5GB/s ± 2%           28.5GB/s ± 2%    ~           (p=0.341 n=20+20)
BM_Bzero/3/0   [__llvm_libc::bzero,memset Google L      ]  10.9GB/s ± 3%           10.9GB/s ± 2%    ~           (p=0.778 n=19+17)
BM_Bzero/4/0   [__llvm_libc::bzero,memset Google M      ]  21.6GB/s ± 2%           21.4GB/s ± 3%  -0.96%        (p=0.024 n=20+19)
BM_Bzero/5/0   [__llvm_libc::bzero,memset Google Q      ]  13.4GB/s ± 3%           13.3GB/s ± 5%    ~           (p=0.121 n=20+20)
BM_Bzero/6/0   [__llvm_libc::bzero,memset Google S      ]  10.4GB/s ± 4%           10.4GB/s ± 4%    ~           (p=0.445 n=20+20)
BM_Bzero/7/0   [__llvm_libc::bzero,memset Google U      ]  9.45GB/s ± 5%           9.48GB/s ± 5%    ~           (p=0.512 n=20+20)
BM_Bzero/8/0   [__llvm_libc::bzero,memset Google W      ]  12.1GB/s ± 2%           12.1GB/s ± 2%    ~           (p=0.841 n=20+20)
BM_Bzero/9/0   [__llvm_libc::bzero,uniform 384 to 4096  ]  71.4GB/s ± 0%           71.4GB/s ± 0%    ~           (p=0.904 n=20+20)
gchatelet abandoned this revision. Nov 16 2022, 2:43 AM