This code is taken from a public-domain implementation:
https://github.com/jsonn/src/blob/trunk/common/lib/libc/hash/sha1/sha1.c
I wrote a sha1 command and ran it on my Xeon E5-2680 v2 2.80GHz machine.
Here are the results. The new hash function takes 37% less time than
the old one (4.16 s versus 6.63 s elapsed).
Performance counter stats for './llvm-sha1-old /ssd/build/bin/lld' (10 runs):

       6640.503687 task-clock (msec)         #    1.001 CPUs utilized            ( +-  0.03% )
                54 context-switches          #    0.008 K/sec                    ( +-  5.03% )
                 5 cpu-migrations            #    0.001 K/sec                    ( +- 31.73% )
           183,803 page-faults               #    0.028 M/sec                    ( +-  0.00% )
    18,527,954,113 cycles                    #    2.790 GHz                      ( +-  0.03% )
     4,993,237,485 stalled-cycles-frontend   #   26.95% frontend cycles idle     ( +-  0.11% )
   <not supported> stalled-cycles-backend
    50,217,149,423 instructions              #    2.71  insns per cycle
                                             #    0.10  stalled cycles per insn  ( +-  0.00% )
     6,094,322,337 branches                  #  917.750 M/sec                    ( +-  0.00% )
        11,778,239 branch-misses             #    0.19% of all branches          ( +-  0.01% )

       6.634017401 seconds time elapsed                                          ( +-  0.03% )

Performance counter stats for './llvm-sha1-new /ssd/build/bin/lld' (10 runs):

       4167.062720 task-clock (msec)         #    1.001 CPUs utilized            ( +-  0.02% )
                52 context-switches          #    0.012 K/sec                    ( +- 16.45% )
                 7 cpu-migrations            #    0.002 K/sec                    ( +- 32.20% )
           183,804 page-faults               #    0.044 M/sec                    ( +-  0.00% )
    11,626,611,958 cycles                    #    2.790 GHz                      ( +-  0.02% )
     4,491,897,976 stalled-cycles-frontend   #   38.63% frontend cycles idle     ( +-  0.05% )
   <not supported> stalled-cycles-backend
    24,320,180,617 instructions              #    2.09  insns per cycle
                                             #    0.18  stalled cycles per insn  ( +-  0.00% )
     1,574,674,576 branches                  #  377.886 M/sec                    ( +-  0.00% )
        11,769,693 branch-misses             #    0.75% of all branches          ( +-  0.00% )

       4.163251552 seconds time elapsed                                          ( +-  0.02% )
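
The 37% figure can be checked against the two "seconds time elapsed" values reported above. The following is only a small sketch of that arithmetic; the function names are illustrative and are not part of the patch.

```python
# Elapsed wall-clock times (seconds) copied from the two perf stat runs.
OLD_ELAPSED = 6.634017401  # llvm-sha1-old
NEW_ELAPSED = 4.163251552  # llvm-sha1-new


def time_reduction(old: float, new: float) -> float:
    """Fraction of the old run time that the new build saves."""
    return (old - new) / old


def speedup(old: float, new: float) -> float:
    """How many times faster the new build finishes the same work."""
    return old / new


if __name__ == "__main__":
    print(f"time reduction: {time_reduction(OLD_ELAPSED, NEW_ELAPSED):.1%}")  # 37.2%
    print(f"speedup:        {speedup(OLD_ELAPSED, NEW_ELAPSED):.2f}x")        # 1.59x
```

Read this way, "37% faster" means a 37% reduction in elapsed time, which corresponds to roughly 1.59x the throughput on the same input.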