D57000 / PR37698 added support for measuring of the inverse throughput.
But the support for the analysis was not added.
This attempts to fix that. (analysis done o bdver2 / piledriver)
First, small-scale experiment:
$ ./bin/llvm-exegesis -num-repetitions=10000 -mode=inverse_throughput -opcode-name=BSF64rr Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-d0acdd.o --- mode: inverse_throughput key: instructions: - 'BSF64rr RAX RDX' config: '' register_initial_values: - 'RDX=0x0' cpu_name: bdver2 llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { key: inverse_throughput, value: 3.0278, per_snippet_value: 3.0278 } error: '' info: instruction has no tied variables picking Uses different from defs assembled_snippet: 48BA0000000000000000480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2C3 ...
If we plug bsfq %r12, %r10 into llvm-mca:
https://godbolt.org/z/ZtOyhJ
Dispatch Width: 4 uOps Per Cycle: 3.00 IPC: 0.50 Block RThroughput: 2.0
So RThroughput mismatch exists.
Now, let's upscale and analyse:
$ ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html:
And if we now look at https://www.agner.org/optimize/instruction_tables.pdf,
Reciprocal throughput for BSF r,r is listed as 3.
Yay?