This stops reporting CostPerUse 1 for R8-R15 and XMM8-XMM31. This was
previously done because instruction encodings require a REX prefix when
using them, resulting in longer instruction encodings. I found that this
regresses the quality of the register allocation, as the costs impose an
ordering on eviction candidates. I also feel that there is a bit of an
impedance mismatch: the actual costs occur when encoding instructions
using those registers, but the order of VReg assignments is not
primarily determined by the number of Defs+Uses.
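
To illustrate how a per-use cost can impose an ordering on candidates, here
is a toy model. It is not LLVM's allocator code: the preferred order, the
cost rule and the helper names (cost_per_use, candidate_order) are all
assumptions made up for this sketch.

```python
# Toy model only, not LLVM's RAGreedy implementation.
# PREFERRED is a hypothetical order the allocator would like to use if
# encoding cost played no role.
PREFERRED = ["RAX", "R8", "RCX", "R9", "RDX", "R10"]


def cost_per_use(reg, penalize_rex):
    """Return 1 for R8..R15 when the REX penalty is enabled, else 0."""
    needs_rex = reg[1:].isdigit()  # "R8" -> "8", "RAX" -> "AX"
    return 1 if (penalize_rex and needs_rex) else 0


def candidate_order(regs, penalize_rex):
    """Stable sort by cost, so cheaper registers are always tried first."""
    return sorted(regs, key=lambda r: cost_per_use(r, penalize_rex))


with_cost = candidate_order(PREFERRED, penalize_rex=True)
without_cost = candidate_order(PREFERRED, penalize_rex=False)
```

In this model the penalty moves R8, R9 and R10 behind every legacy register
regardless of the preferred order. Without it the preferred order is kept.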

I did extensive measurements with the llvm-test-suite with SPEC2006 +
SPEC2017 included; internal services showed similar patterns. Generally
there are a lot of improvements but also a lot of regressions. But on
average the allocation quality seems to improve at the cost of a small
code size regression.

Results for measuring static and dynamic instruction counts:

-O3 + ThinLTO + Instr-PGO
  Dynamic Counts (scaled by execution frequency) / Optimization Remarks:
    Spills+FoldedSpills   -5.6%
    Reloads+FoldedReloads -4.2%
    Copies                -0.1%
  Static / LLVM Statistics:
    regalloc.NumSpills    mean -1.6%, geomean -2.8%
    regalloc.NumReloads   mean -1.7%, geomean -3.1%
    size..text            mean +0.4%, geomean +0.4%

-O3
  Static / LLVM Statistics:
    regalloc.NumSpills    mean -2.2%, geomean -3.1%
    regalloc.NumReloads   mean -2.6%, geomean -3.9%
    size..text            mean +0.6%, geomean +0.6%

-Os
  Static / LLVM Statistics:
    regalloc.NumSpills    mean -3.0%
    regalloc.NumReloads   mean -3.3%
    size..text            mean +0.3%, geomean +0.3%

Detailed numbers in https://reviews.llvm.org/P8290
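
For readers comparing the mean and geomean figures, the sketch below shows
one common way to derive both from per-benchmark counts. This message does
not say how the numbers were computed, so treating them as aggregates of
after/before ratios is an assumption. The helper names and the sample
counts are hypothetical and are not measured data.

```python
import math


def relative_changes(before, after):
    """Per-benchmark ratio after/before. Counts must be non-zero."""
    return [a / b for b, a in zip(before, after)]


def mean_change(ratios):
    """Arithmetic mean of the ratios, as a fractional change."""
    return sum(ratios) / len(ratios) - 1.0


def geomean_change(ratios):
    """Geometric mean of the ratios, as a fractional change."""
    log_sum = sum(math.log(r) for r in ratios)
    return math.exp(log_sum / len(ratios)) - 1.0


# Hypothetical spill counts for three benchmarks, before and after a change.
before = [100, 200, 400]
after = [50, 200, 800]
ratios = relative_changes(before, after)
mean_delta = mean_change(ratios)
geomean_delta = geomean_change(ratios)
```

Here one benchmark halves and one doubles, so the geomean reports no change
while the arithmetic mean reports an increase. That difference is why the
two aggregates can disagree for the same data set.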