This patch teaches llvm-mca how to identify register writes that implicitly zero the upper portion of a super-register.
On X86-64, a general purpose register is implemented in hardware as a 64-bit register. Quoting the Intel 64 Software Developer's Manual: "an update to the lower 32 bits of a 64 bit integer register is architecturally defined to zero extend the upper 32 bits".
Also, a write to an XMM register performed by an AVX instruction implicitly zeroes the upper 128 bits of the aliasing YMM register.
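To make the GPR case concrete, here is a minimal sketch of the architectural behaviour described above. The helper names (write32, write16) are hypothetical and exist only for illustration; they are not part of LLVM or of this patch. The contrast with a 16-bit write is included because that write merges with the old register contents, so it really does depend on the previous value.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a 64-bit general purpose register such as RAX.
// write32/write16 are hypothetical helper names used only for illustration.

// A 32-bit write (e.g. to EAX) zero-extends into the full 64-bit register,
// so the result does not depend on the old register value at all.
constexpr uint64_t write32(uint64_t /*OldValue*/, uint32_t Value) {
  return static_cast<uint64_t>(Value);
}

// A 16-bit write (e.g. to AX) merges with the old value: bits 63:16 are
// preserved, so the result depends on the previous contents of the register.
constexpr uint64_t write16(uint64_t OldValue, uint16_t Value) {
  return (OldValue & ~uint64_t{0xFFFF}) | Value;
}
```

Since write32 ignores its first argument, a 32-bit write needs nothing from the previous definition of the 64-bit register.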
This patch adds a new method named clearsSuperRegisters to the MCInstrAnalysis interface to help identify instructions that implicitly clear the upper portion of a super-register.
The rest of the patch teaches llvm-mca how to use that new method to obtain this information and to update the register dependencies accordingly.
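As a rough picture of what "update the register dependencies" could mean, here is a toy dependency tracker. It is a sketch under stated assumptions, not the llvm-mca implementation: ToyWrite, ToyRegisterFile, and the ClearsSuperReg flag are hypothetical names, and the flag merely stands in for the answer a clearsSuperRegisters-style query would give for a particular write.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy model; none of these names are llvm-mca data structures.
struct ToyWrite {
  std::string Reg;      // register being written, e.g. "EAX"
  std::string SuperReg; // aliasing super-register, e.g. "RAX" (empty if none)
  bool ClearsSuperReg;  // stand-in for a clearsSuperRegisters-style answer
};

class ToyRegisterFile {
  // Maps a register name to the index of the instruction that last wrote it.
  std::map<std::string, int> LastWriter;

public:
  // Records a write and returns the index of the earlier instruction this
  // write must wait for, or -1 if it has no such dependency.
  int addWrite(int InstrIndex, const ToyWrite &W) {
    int DependsOn = -1;
    if (!W.SuperReg.empty() && !W.ClearsSuperReg) {
      // A partial write that preserves the upper bits merges with the old
      // super-register value, so it depends on that register's last writer.
      auto It = LastWriter.find(W.SuperReg);
      if (It != LastWriter.end())
        DependsOn = It->second;
    }
    // This instruction now provides the latest value of both registers.
    LastWriter[W.Reg] = InstrIndex;
    if (!W.SuperReg.empty())
      LastWriter[W.SuperReg] = InstrIndex;
    return DependsOn;
  }
};
```

In this model, a write to AX after a write to RAX reports a dependency on the RAX writer, while a write to EAX flagged as clearing RAX reports none. That is the kind of false dependency the new information should let the tool drop.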
I ran the kernels from tests clear-super-register-1.s and clear-super-register-2.s through llvm-mca and compared the results against the output from perf on btver2.
Before this patch there was a large discrepancy between the estimated IPC and the measured IPC. With the patch, the remaining differences are mostly in the noise.
Please let me know if this is okay to commit.
Thanks,
Andrea
When is it better to use BitVector vs APInt? I don't have an answer but we're incredibly inconsistent on this!