As discussed in https://github.com/llvm/llvm-project/issues/59325,
at least for instructions with tied variables,
register selection is rather poor when trying to parallelize the instructions:
a use may pick a register that we have already used for a def,
or vice versa.
That creates a dependency between the instructions, which serializes them and leads to
overly pessimistic inverse throughput measurements.
The new implementation avoids that by picking def and use registers that do not alias each other.
New result:
$ ninja llvm-exegesis && ./bin/llvm-exegesis --mode=inverse_throughput --opcode-name=VFMADD132PDr --max-configs-per-opcode=9182
ninja: no work to do.
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-4af034.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VFMADD132PDr XMM3 XMM3 XMM4 XMM8'
    - 'VFMADD132PDr XMM5 XMM5 XMM14 XMM7'
    - 'VFMADD132PDr XMM10 XMM10 XMM11 XMM15'
    - 'VFMADD132PDr XMM13 XMM13 XMM15 XMM15'
    - 'VFMADD132PDr XMM12 XMM12 XMM11 XMM1'
    - 'VFMADD132PDr XMM0 XMM0 XMM6 XMM9'
    - 'VFMADD132PDr XMM2 XMM2 XMM15 XMM11'
  config:          ''
  register_initial_values:
    - 'XMM3=0x0'
    - 'XMM4=0x0'
    - 'XMM8=0x0'
    - 'MXCSR=0x0'
    - 'XMM5=0x0'
    - 'XMM14=0x0'
    - 'XMM7=0x0'
    - 'XMM10=0x0'
    - 'XMM11=0x0'
    - 'XMM15=0x0'
    - 'XMM13=0x0'
    - 'XMM12=0x0'
    - 'XMM1=0x0'
    - 'XMM0=0x0'
    - 'XMM6=0x0'
    - 'XMM9=0x0'
    - 'XMM2=0x0'
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: inverse_throughput, value: 0.6403, per_snippet_value: 4.4821 }
error:           ''
info:            instruction has tied variables, avoiding Read-After-Write issue, picking random def and use registers not aliasing each other, randomizing registers for uses
assembled_snippet: ...
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-f05c2f.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VFMADD132PDr XMM15 XMM15 XMM11 XMM2'
    - 'VFMADD132PDr XMM5 XMM5 XMM11 XMM2'
    - 'VFMADD132PDr XMM14 XMM14 XMM11 XMM2'
    - 'VFMADD132PDr XMM4 XMM4 XMM11 XMM2'
    - 'VFMADD132PDr XMM8 XMM8 XMM11 XMM2'
    - 'VFMADD132PDr XMM3 XMM3 XMM11 XMM2'
    - 'VFMADD132PDr XMM10 XMM10 XMM11 XMM2'
    - 'VFMADD132PDr XMM7 XMM7 XMM11 XMM2'
    - 'VFMADD132PDr XMM13 XMM13 XMM11 XMM2'
    - 'VFMADD132PDr XMM9 XMM9 XMM11 XMM2'
    - 'VFMADD132PDr XMM1 XMM1 XMM11 XMM2'
    - 'VFMADD132PDr XMM6 XMM6 XMM11 XMM2'
    - 'VFMADD132PDr XMM0 XMM0 XMM11 XMM2'
    - 'VFMADD132PDr XMM12 XMM12 XMM11 XMM2'
  config:          ''
  register_initial_values:
    - 'XMM15=0x0'
    - 'XMM11=0x0'
    - 'XMM2=0x0'
    - 'MXCSR=0x0'
    - 'XMM5=0x0'
    - 'XMM14=0x0'
    - 'XMM4=0x0'
    - 'XMM8=0x0'
    - 'XMM3=0x0'
    - 'XMM10=0x0'
    - 'XMM7=0x0'
    - 'XMM13=0x0'
    - 'XMM9=0x0'
    - 'XMM1=0x0'
    - 'XMM6=0x0'
    - 'XMM0=0x0'
    - 'XMM12=0x0'
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: inverse_throughput, value: 0.5312, per_snippet_value: 7.4368 }
error:           ''
info:            instruction has tied variables, avoiding Read-After-Write issue, picking random def and use registers not aliasing each other, one unique register for each use position
assembled_snippet: ...
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-c32060.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VFMADD132PDr XMM10 XMM10 XMM6 XMM6'
    - 'VFMADD132PDr XMM8 XMM8 XMM6 XMM6'
    - 'VFMADD132PDr XMM12 XMM12 XMM6 XMM6'
    - 'VFMADD132PDr XMM9 XMM9 XMM6 XMM6'
    - 'VFMADD132PDr XMM7 XMM7 XMM6 XMM6'
    - 'VFMADD132PDr XMM1 XMM1 XMM6 XMM6'
    - 'VFMADD132PDr XMM0 XMM0 XMM6 XMM6'
    - 'VFMADD132PDr XMM5 XMM5 XMM6 XMM6'
    - 'VFMADD132PDr XMM11 XMM11 XMM6 XMM6'
    - 'VFMADD132PDr XMM2 XMM2 XMM6 XMM6'
    - 'VFMADD132PDr XMM15 XMM15 XMM6 XMM6'
    - 'VFMADD132PDr XMM3 XMM3 XMM6 XMM6'
    - 'VFMADD132PDr XMM14 XMM14 XMM6 XMM6'
    - 'VFMADD132PDr XMM4 XMM4 XMM6 XMM6'
    - 'VFMADD132PDr XMM13 XMM13 XMM6 XMM6'
  config:          ''
  register_initial_values:
    - 'XMM10=0x0'
    - 'XMM6=0x0'
    - 'MXCSR=0x0'
    - 'XMM8=0x0'
    - 'XMM12=0x0'
    - 'XMM9=0x0'
    - 'XMM7=0x0'
    - 'XMM1=0x0'
    - 'XMM0=0x0'
    - 'XMM5=0x0'
    - 'XMM11=0x0'
    - 'XMM2=0x0'
    - 'XMM15=0x0'
    - 'XMM3=0x0'
    - 'XMM14=0x0'
    - 'XMM4=0x0'
    - 'XMM13=0x0'
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: inverse_throughput, value: 0.5311, per_snippet_value: 7.9665 }
error:           ''
info:            instruction has tied variables, avoiding Read-After-Write issue, picking random def and use registers not aliasing each other, reusing the same register for all uses
assembled_snippet:
4883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F14244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F34244883C4104883EC04C70424801F0000C5F8AE14244883C4044883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F04244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F24244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F0C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F3C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F0C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F04244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F2C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F1C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F14244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F3C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F1C244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F34244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F24244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C57A6F2C244883C410C462C998D6C462C998C6C462C998E6C462C998CEC4E2C998FEC4E2C998CEC4E2C998C6C4E2C998EEC462C998DEC4E2C998D6C462C998FEC4E2C998DEC462C998F6C4E2C998E6C462C998EEC462C998D6C462C998C6C462C998E6C462C998CEC4E2C998FEC4E2C998CEC4E2C998C6C4E2C998EEC462C998DEC4E2C998D6C462C998FEC4E2C998DEC462C998F6C4E2C998E6C462C998EEC462C998D6C462C998C6C462C998E6C462C998CEC4E2C998FEC4E2C998CEC4E2C998C6C4E2C998EEC462C998DEC4E2C998D6C462C998FEC4E2C998DEC462C998F6C4E2C998E6C462C998EEC462C998D6C462C998C6C462C998E6C462C998CEC4E2C998FEC4E2C998CEC4E2C998C6C4E2C998EEC462C998DEC4E2C998D6C4
62C998FEC4E2C998DEC462C998F6C4E2C998E6C462C998EEC3 ...
(trivial) Not sure we need the 'Use' prefix. Perhaps just RegRandomizationStrategy?