This patch improves code generation for float vector gather on POWER9. It adds patterns that select lower-latency instructions, reducing the overall latency of the generated sequence from 16 to 12 cycles.
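For context, here is a minimal sketch of the kind of source that produces a float vector gather: four scalar floats loaded from unrelated addresses and packed into one vector. It assumes the GCC/Clang generic vector extension. The names `v4f32` and `gather4` are hypothetical and not taken from the patch or its tests.

```c
#include <assert.h>

/* GCC/Clang generic vector extension: four 32-bit floats in a 16-byte vector. */
typedef float v4f32 __attribute__((vector_size(16)));

/* Hypothetical helper: loads one float from each of four independent
 * addresses and places them in lanes 0..3 of the result. */
v4f32 gather4(const float *a, const float *b, const float *c, const float *d) {
    assert(a && b && c && d);        /* every pointer must be dereferenceable */
    v4f32 v = { *a, *b, *c, *d };    /* four scalar loads, then a vector build */
    return v;
}
```

With four pointer arguments, the loads would go through the first four argument registers, which appears to match the `3`, `4`, `5`, `6` base registers in the listings.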
- Before Patch
    lfs 0, 0(3)
    lfs 2, 0(5)
    lfs 1, 0(4)
    xxmrghd 0, 2, 0
    lfs 3, 0(6)
    xvcvdpsp 34, 0
    xxmrghd 0, 3, 1
    xvcvdpsp 35, 0
    vmrgew 2, 3, 2
- After Patch (using POWER9 instructions)
    lfiwzx 0, 0, 6
    lfiwzx 1, 0, 5
    xxmrghw 0, 0, 1
    lfiwzx 1, 0, 4
    lfiwzx 2, 0, 3
    xxmrghw 1, 1, 2
    xxmrgld 34, 0, 1
Why do these DAGs belong to AlignValues? Why not MrgFP?