When e.g. a VR64 register was spilled to a stack slot requiring a long (20-bit) displacement, an LAY was always emitted.
This patch checks whether the allocated phys reg is legal with an FP instruction and, if so, uses that instruction instead of a separate LAY.
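The decision described above can be sketched roughly as follows. This is only an illustrative model, not the code from the patch: the names `Strategy`, `isLegalForFPInstr` and `selectSpillStrategy` are hypothetical, and the sketch assumes that the short form takes an unsigned 12-bit displacement, the long form a signed 20-bit displacement, and that only register numbers 0-15 are usable with the FP instructions.

```cpp
#include <cstdint>

// Hypothetical model of how a VR64 spill/reload could be addressed.
enum class Strategy {
  ShortDisp,        // displacement fits the short form, nothing extra needed
  LongDispFP,       // use the long-displacement FP instruction directly
  LayThenShortDisp, // materialize the address with LAY, then use the short form
  Unsupported       // outside the 20-bit range, not handled in this sketch
};

// Assumption: the short form has an unsigned 12-bit displacement.
bool fitsUnsigned12(int64_t Disp) { return Disp >= 0 && Disp <= 4095; }

// Assumption: the long form has a signed 20-bit displacement.
bool fitsSigned20(int64_t Disp) { return Disp >= -524288 && Disp <= 524287; }

// Assumption: registers 0-15 are legal with FP instructions, while
// registers 16-31 can only be accessed with vector instructions.
bool isLegalForFPInstr(unsigned Reg) { return Reg < 16; }

// Pick an addressing strategy for register Reg at stack displacement Disp.
Strategy selectSpillStrategy(unsigned Reg, int64_t Disp) {
  if (fitsUnsigned12(Disp))
    return Strategy::ShortDisp;
  if (!fitsSigned20(Disp))
    return Strategy::Unsupported;
  // Long displacement: avoid the LAY when the FP instruction is legal.
  if (isLegalForFPInstr(Reg))
    return Strategy::LongDispFP;
  return Strategy::LayThenShortDisp;
}
```

In this model, `selectSpillStrategy(3, 5000)` picks the long-displacement FP form, while `selectSpillStrategy(20, 5000)` still needs the LAY.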
These were the general code differences with the patch:
SPEC17:
         before   patched     diff
lay  :    64372     58227    -6145
stdy :     1540      4965    +3425
std  :    64568     61143    -3425
ldy  :     3478      5821    +2343
ld   :    87066     84723    -2343
stey :      267       544     +277
ste  :    51282     51005     -277
ley  :      352       462     +110
lde  :    64929     64827     -102
...
Most of the improvement happens in cactus. (Cactus only:)
         before   patched     diff
lay  :    23523     17984    -5539
stdy :      683      4036    +3353
std  :    11327      7974    -3353
ldy  :     2463      4651    +2188
ld   :    12831     10643    -2188
So far, I have done quick measurements of this patch with fp-contract set to both "off" and "fast". On both z14 and z15 I saw mainly a 2% improvement on cactus, with both "off" and "fast":
Z14 "off"
Improvements:
0.979: f507.cactuBSSN_r
0.989: f526.blender_r
0.995: f519.lbm_r
(No regressions)
Z14 "fast"
Improvements:
0.978: f507.cactuBSSN_r
Regressions:
1.007: i500.perlbench_r
Z15 similar, except one regression with "off":
1.048: f538.imagick_r
This regression does not happen with "fast" or with the default fp-flags. It disappeared nicely when I disabled LEY instructions, so it is most likely the partial register dependency issue (I could be wrong, since LAY+LDE -> LEY also changes alignments).
imagick:
         before   patched     diff
lay  :     5768      5437     -331
ldy  :      686       841     +155
ld   :     3823      3668     -155
stey :       36       138     +102
ste  :     1770      1668     -102
stdy :      551       623      +72
std  :     2791      2719      -72
ley  :       19        25       +6
lde  :     1221      1215       -6
stg  :    20220     20216       -4
lg   :    28317     28313       -4
If the LEYs are removed again, like this:
       with LEY    no LEY     diff
lde  :     1215      1221       +6
ley  :       25        19       -6
lay  :     5437      5443       +6
the regression disappears. The same thing happens with "full" runs.
Therefore, I have removed the LEYs from the patch. (I will still remeasure the benchmarks one more time without LEY.)