This patch adds a new PowerPC-specific peephole optimization pass that runs after register allocation.
The new peephole optimization rewrites operands of instructions after a register copy to reduce latency.
For example, in code sequences like
  mr   X, Y
  (no update in X or Y)
  addi Z, X, 1
this pass updates addi Z, X, 1 to addi Z, Y, 1 to make the execution of the addi independent from the preceding mr instruction.
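The rewrite rule above can be illustrated with a small, self-contained C++ sketch. This is a toy model, not the code in this patch: the Inst struct and the forwardCopySource function are hypothetical names, registers are plain strings, and immediates are omitted. A real pass working on machine instructions also has to handle register aliasing, implicit operands, and tied operands, none of which is modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy instruction: one destination register plus the registers
// it reads. Immediate operands are left out to keep the model small.
struct Inst {
    std::string opcode;
    std::string dst;
    std::vector<std::string> srcs;
};

// For the copy at block[copyIdx] ("mr X, Y"), rewrite later reads of X into
// reads of Y, stopping at the first instruction that redefines X or Y.
// Returns the number of operands that were rewritten.
std::size_t forwardCopySource(std::vector<Inst>& block, std::size_t copyIdx) {
    assert(copyIdx < block.size() && block[copyIdx].opcode == "mr");
    assert(block[copyIdx].srcs.size() == 1);

    // Copy the names by value so later edits to the block cannot change them.
    const std::string x = block[copyIdx].dst;
    const std::string y = block[copyIdx].srcs[0];
    if (x == y)
        return 0;  // Self-copy: nothing to forward.

    std::size_t rewritten = 0;
    for (std::size_t i = copyIdx + 1; i < block.size(); ++i) {
        Inst& inst = block[i];
        // Sources are read before the destination is written, so the reads
        // of this instruction still see the copied value and can be rewritten.
        for (std::string& src : inst.srcs) {
            if (src == x) {
                src = y;
                ++rewritten;
            }
        }
        // After a write to X or Y the two registers may differ; stop here.
        if (inst.dst == x || inst.dst == y)
            break;
    }
    return rewritten;
}
```

In this sketch the scan stops at the first redefinition of either register, which corresponds to the "no update in X or Y" condition in the example sequence.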
Typically, such register copies are for method parameters or a return value (e.g., copying a method parameter to another register before its first use).
This pattern appears quite frequently; this optimization rewrites more than 100k instructions while compiling LLVM+Clang.
Performance improved on most of the SPECCPU benchmarks on POWER8:

  Benchmark          Improvement
  I400.perlbench      0.03%
  I401.bzip2          0.28%
  I403.gcc            0.32%
  I429.mcf            1.43%
  I445.gobmk         -0.05%
  I456.hmmer          0.13%
  I458.sjeng          0.06%
  I462.libquantum     0.70%
  I464.h264ref        0.03%
  I471.omnetpp        0.37%
  I473.astar         -0.03%
  I483.xalancbmk      0.85%
  f433.milc           0.70%
  f444.namd           0.06%
  f447.dealII         0.57%
  f450.soplex        -0.74%
  f453.povray         0.28%
  f470.lbm            0.06%
  f482.sphinx3        0.78%

  SPECINT             0.34%
  SPECFP              0.25%
  TOTAL               0.31%

(Average of 8 runs. A positive number means improvement by the patch.)
The comments below should be doxygen comments, identified by "///" instead of the usual "//".