This is an archive of the discontinued LLVM Phabricator instance.

[libc] Add optimized bcmp for RISCV
ClosedPublic

Authored by gchatelet on May 15 2023, 6:14 AM.

Details

Summary

[libc] Add optimized bcmp for RISCV

This patch adds two versions of bcmp optimized for architectures where unaligned accesses are either illegal or extremely slow.
It is currently enabled for RISCV 64 and RISCV 32 but it could be used for ARM 32 architectures as well.

Here is the before / after output of libc.benchmarks.memory_functions.opt_host --benchmark_filter=BM_Bcmp on a quad core Linux starfive RISCV 64 board running at 1.5GHz.

Before

Run on (4 X 1500 MHz CPU s)
CPU Caches:
  L1 Instruction 32 KiB (x4)
  L1 Data 32 KiB (x4)
  L2 Unified 2048 KiB (x1)
Load Average: 7.03, 5.98, 3.71
----------------------------------------------------------------------
Benchmark            Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------
BM_Bcmp/0/0        102 ns         60.5 ns     11662336 bytes_per_cycle=0.122696/s bytes_per_second=175.518M/s items_per_second=16.5258M/s __llvm_libc::bcmp,memcmp Google A
BM_Bcmp/1/0        328 ns          172 ns      3737600 bytes_per_cycle=0.15256/s bytes_per_second=218.238M/s items_per_second=5.80575M/s __llvm_libc::bcmp,memcmp Google B
BM_Bcmp/2/0        199 ns         99.7 ns      7019520 bytes_per_cycle=0.141897/s bytes_per_second=202.986M/s items_per_second=10.032M/s __llvm_libc::bcmp,memcmp Google D
BM_Bcmp/3/0        173 ns         86.5 ns      8361984 bytes_per_cycle=0.13863/s bytes_per_second=198.312M/s items_per_second=11.5669M/s __llvm_libc::bcmp,memcmp Google L
BM_Bcmp/4/0        105 ns         51.8 ns     13213696 bytes_per_cycle=0.116399/s bytes_per_second=166.51M/s items_per_second=19.2931M/s __llvm_libc::bcmp,memcmp Google M
BM_Bcmp/5/0        167 ns         93.9 ns      7853056 bytes_per_cycle=0.139432/s bytes_per_second=199.459M/s items_per_second=10.6503M/s __llvm_libc::bcmp,memcmp Google Q
BM_Bcmp/6/0        262 ns          165 ns      3931136 bytes_per_cycle=0.151516/s bytes_per_second=216.745M/s items_per_second=6.07091M/s __llvm_libc::bcmp,memcmp Google S
BM_Bcmp/7/0        168 ns          105 ns      6665216 bytes_per_cycle=0.143159/s bytes_per_second=204.791M/s items_per_second=9.52163M/s __llvm_libc::bcmp,memcmp Google U
BM_Bcmp/8/0        108 ns         68.0 ns     10175488 bytes_per_cycle=0.125504/s bytes_per_second=179.535M/s items_per_second=14.701M/s __llvm_libc::bcmp,memcmp Google W
BM_Bcmp/9/0      15371 ns         9007 ns        78848 bytes_per_cycle=0.166128/s bytes_per_second=237.648M/s items_per_second=111.031k/s __llvm_libc::bcmp,uniform 384 to 4096

After

BM_Bcmp/0/0       74.2 ns         49.7 ns     14306304 bytes_per_cycle=0.148927/s bytes_per_second=213.042M/s items_per_second=20.1101M/s __llvm_libc::bcmp,memcmp Google A
BM_Bcmp/1/0        108 ns         68.1 ns     10350592 bytes_per_cycle=0.411197/s bytes_per_second=588.222M/s items_per_second=14.6849M/s __llvm_libc::bcmp,memcmp Google B
BM_Bcmp/2/0       80.2 ns         56.0 ns     12386304 bytes_per_cycle=0.258588/s bytes_per_second=369.912M/s items_per_second=17.8585M/s __llvm_libc::bcmp,memcmp Google D
BM_Bcmp/3/0       92.4 ns         55.7 ns     12555264 bytes_per_cycle=0.206835/s bytes_per_second=295.88M/s items_per_second=17.943M/s __llvm_libc::bcmp,memcmp Google L
BM_Bcmp/4/0       79.3 ns         46.8 ns     14288896 bytes_per_cycle=0.125872/s bytes_per_second=180.061M/s items_per_second=21.3611M/s __llvm_libc::bcmp,memcmp Google M
BM_Bcmp/5/0       98.0 ns         57.9 ns     12232704 bytes_per_cycle=0.268815/s bytes_per_second=384.543M/s items_per_second=17.2711M/s __llvm_libc::bcmp,memcmp Google Q
BM_Bcmp/6/0        132 ns         65.5 ns     10474496 bytes_per_cycle=0.417246/s bytes_per_second=596.875M/s items_per_second=15.2673M/s __llvm_libc::bcmp,memcmp Google S
BM_Bcmp/7/0        101 ns         60.9 ns     11505664 bytes_per_cycle=0.253733/s bytes_per_second=362.968M/s items_per_second=16.4202M/s __llvm_libc::bcmp,memcmp Google U
BM_Bcmp/8/0       72.5 ns         50.2 ns     14082048 bytes_per_cycle=0.183262/s bytes_per_second=262.158M/s items_per_second=19.9271M/s __llvm_libc::bcmp,memcmp Google W
BM_Bcmp/9/0        852 ns          803 ns       854016 bytes_per_cycle=1.85028/s bytes_per_second=2.58481G/s items_per_second=1.24597M/s __llvm_libc::bcmp,uniform 384 to 4096

For comparison with glibc

BM_Bcmp/0/0        106 ns         52.6 ns     12906496 bytes_per_cycle=0.142072/s bytes_per_second=203.235M/s items_per_second=19.0271M/s glibc::bcmp,memcmp Google A
BM_Bcmp/1/0        132 ns         77.1 ns      8905728 bytes_per_cycle=0.365072/s bytes_per_second=522.239M/s items_per_second=12.9782M/s glibc::bcmp,memcmp Google B
BM_Bcmp/2/0        122 ns         62.3 ns     10909696 bytes_per_cycle=0.222667/s bytes_per_second=318.527M/s items_per_second=16.0563M/s glibc::bcmp,memcmp Google D
BM_Bcmp/3/0       99.5 ns         64.2 ns     11074560 bytes_per_cycle=0.185126/s bytes_per_second=264.825M/s items_per_second=15.5674M/s glibc::bcmp,memcmp Google L
BM_Bcmp/4/0       86.6 ns         50.2 ns     13488128 bytes_per_cycle=0.117941/s bytes_per_second=168.717M/s items_per_second=19.9053M/s glibc::bcmp,memcmp Google M
BM_Bcmp/5/0        106 ns         61.4 ns     11344896 bytes_per_cycle=0.248968/s bytes_per_second=356.151M/s items_per_second=16.284M/s glibc::bcmp,memcmp Google Q
BM_Bcmp/6/0        145 ns         71.9 ns     10046464 bytes_per_cycle=0.389814/s bytes_per_second=557.633M/s items_per_second=13.9019M/s glibc::bcmp,memcmp Google S
BM_Bcmp/7/0        119 ns         65.6 ns     10718208 bytes_per_cycle=0.243756/s bytes_per_second=348.696M/s items_per_second=15.2329M/s glibc::bcmp,memcmp Google U
BM_Bcmp/8/0       86.4 ns         54.5 ns     13250560 bytes_per_cycle=0.154831/s bytes_per_second=221.488M/s items_per_second=18.3532M/s glibc::bcmp,memcmp Google W
BM_Bcmp/9/0       1090 ns          604 ns      1186816 bytes_per_cycle=2.53848/s bytes_per_second=3.54622G/s items_per_second=1.65598M/s glibc::bcmp,uniform 384 to 4096

Diff Detail

Event Timeline

gchatelet created this revision.May 15 2023, 6:14 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMay 15 2023, 6:14 AM
gchatelet requested review of this revision.May 15 2023, 6:14 AM
sivachandra added inline comments.May 15 2023, 10:54 AM
libc/src/string/memory_utils/bcmp_implementations.h
27

s/uint32_t/BcmpReturnType ? I think it might not compile? May be a cleaner approach would be to provide a BcmpReturnType::TRUE and BcmpReturnType::FALSE, or may be BmpReturnType::ONE.

27

What was offset previously? Was this dead code?

27

Is this the correct check? Should we early-return on equality? IIUC, previously, it was doing an early return on XOR success.

56

Same here: 1U need not always be of uint32_t type.

81

And here.

sivachandra added inline comments.May 15 2023, 11:01 AM
libc/src/string/memory_utils/bcmp_implementations.h
27

Ignore this comment - I misread this.

gchatelet marked 5 inline comments as done.May 16 2023, 2:24 AM
gchatelet added inline comments.
libc/src/string/memory_utils/bcmp_implementations.h
27

s/uint32_t/BcmpReturnType ? I think it might not compile? May be a cleaner approach would be to provide a BcmpReturnType::TRUE and BcmpReturnType::FALSE, or may be BmpReturnType::ONE.

BcmpReturnType is implicitly convertible from uint32_t by design. Since bool
We could provide a NONZERO value though. The best value might be platform dependant.

27

Is this the correct check? Should we early-return on equality? IIUC, previously, it was doing an early return on XOR success.

The code is wrong indeed, I'm surprised that the tests didn't catch it. I'm investigating.

56

Same here: 1U need not always be of uint32_t type.

True, this might break on platforms where unsigned is different from uint32_t

gchatelet edited the summary of this revision. (Show Details)May 16 2023, 5:24 AM
gchatelet updated this revision to Diff 522557.May 16 2023, 5:25 AM
gchatelet marked 3 inline comments as done.
  • address comments
gchatelet added inline comments.May 16 2023, 5:35 AM
libc/src/string/memory_utils/bcmp_implementations.h
27

The tests did catch it, I just didn't rerun them after changing this line...

sivachandra accepted this revision.May 16 2023, 9:00 AM
This revision is now accepted and ready to land.May 16 2023, 9:00 AM

rebase and amend original commit message to reflect correct benchmark numbers

This revision was landed with ongoing or failed builds.May 16 2023, 10:28 AM
This revision was automatically updated to reflect the committed changes.