This is an archive of the discontinued LLVM Phabricator instance.

[llvm-exegesis] Throughput support in analysis mode
ClosedPublic

Authored by lebedev.ri on Feb 3 2019, 2:40 AM.

Details

Summary

D57000 / PR37698 added support for measuring inverse throughput,
but the analysis mode was not taught to handle it.
This attempts to fix that. (Analysis done on bdver2 / Piledriver.)

First, small-scale experiment:

$ ./bin/llvm-exegesis -num-repetitions=10000 -mode=inverse_throughput -opcode-name=BSF64rr
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-d0acdd.o
---
mode:            inverse_throughput
key:             
  instructions:    
    - 'BSF64rr RAX RDX'
  config:          ''
  register_initial_values: 
    - 'RDX=0x0'
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:    
  - { key: inverse_throughput, value: 3.0278, per_snippet_value: 3.0278 }
error:           ''
info:            instruction has no tied variables picking Uses different from defs
assembled_snippet: 48BA0000000000000000480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2C3
...

If we plug bsfq %r12, %r10 into llvm-mca:
https://godbolt.org/z/ZtOyhJ

Dispatch Width:    4
uOps Per Cycle:    3.00
IPC:               0.50
Block RThroughput: 2.0

So an RThroughput mismatch exists: llvm-mca models 2.0, while the measured value is about 3.03.
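To make the comparison above explicit, here is a minimal C++ sketch. The helper names are hypothetical and are not part of llvm-exegesis or llvm-mca. It assumes a block containing a single instruction, in which case the reciprocal throughput in cycles per instruction is simply the inverse of the reported IPC.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not an LLVM API): for a single-instruction block,
// reciprocal throughput (cycles per instruction) is 1 / IPC.
double reciprocalThroughputFromIPC(double IPC) {
  assert(IPC > 0.0 && "IPC must be positive");
  return 1.0 / IPC;
}

// Hypothetical helper: deviation of the modeled value from the measured
// value, expressed as a fraction of the measured value.
double relativeDeviation(double Measured, double Modeled) {
  assert(Measured > 0.0 && "measured value must be positive");
  return std::fabs(Measured - Modeled) / Measured;
}
```

With the numbers quoted above (IPC 0.50 from llvm-mca, 3.0278 from llvm-exegesis), the modeled value is 2.0 and the deviation is roughly a third of the measured value.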

Now, let's upscale and analyse:


$ ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html


And if we now look at https://www.agner.org/optimize/instruction_tables.pdf,
the reciprocal throughput for BSF r,r is listed as 3, which agrees with the measurement.
Yay?

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision. Feb 3 2019, 2:40 AM
courbet accepted this revision. Feb 4 2019, 12:20 AM

Thanks for working on this Roman !

Yay?

Indeed :)

I've had a look at the analysis file; there are a bunch of other interesting ones, e.g. BT, BTC, CRC32, DPPD.

We seem to be doing a bad job on X87, but that does not surprise me too much :(

This revision is now accepted and ready to land. Feb 4 2019, 12:20 AM
lebedev.ri marked an inline comment as done. Feb 4 2019, 12:25 AM

Thanks for working on this Roman !

Yay?

Indeed :)

Thank you for the review :)

I've had a look at the analysis file; there are a bunch of other interesting ones, e.g. BT, BTC, CRC32, DPPD.

We seem to be doing a bad job on X87, but that does not surprise me too much :(

tools/llvm-exegesis/lib/Analysis.cpp
515

@courbet, one point of note here: is min() the correct pick?
It does not seem to matter, though, and it is what is done in https://reviews.llvm.org/D57000#change-FpxQcgWblhyw

case InstructionBenchmark::InverseThroughput:
  Result = {BenchmarkMeasure::Create("inverse_throughput", MinValue)};
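As an illustration of the min() choice questioned above, here is a minimal, self-contained sketch of reducing several measurements that share a key to their minimum. The Measurement struct and minValueForKey are hypothetical stand-ins, not the llvm-exegesis types. One common argument for the minimum is that interference tends to inflate cycle counts rather than reduce them; whether that reasoning applies here is exactly the open question.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <string>
#include <vector>

// Hypothetical stand-in for a benchmark measurement; not the real
// llvm-exegesis BenchmarkMeasure type.
struct Measurement {
  std::string Key;
  double Value;
};

// Returns the smallest value recorded under Key, or +infinity when no
// measurement carries that key.
double minValueForKey(const std::vector<Measurement> &Measurements,
                      const std::string &Key) {
  double Min = std::numeric_limits<double>::infinity();
  for (const Measurement &M : Measurements)
    if (M.Key == Key)
      Min = std::min(Min, M.Value);
  return Min;
}
```

A caller would pass all measurements collected for one snippet together with the key "inverse_throughput" and use the returned value as the representative figure.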
This revision was automatically updated to reflect the committed changes.