This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ] Improve eliminateFrameIndex() to use STDY/STEY/LDY for VR32/VR64 when possible.
ClosedPublic

Authored by jonpa on Dec 8 2021, 3:55 PM.

Details

Summary

When e.g. a VR64 register got spilled to a stack slot requiring a long displacement (20-bit), an LAY was emitted always.

This patch checks if the allocated phys reg is legal with an FP instruction and if so uses it instead of the separate LAY.

These were the general code differences with the patch:

SPEC17:
lay            :                64372                58227    -6145
stdy           :                 1540                 4965    +3425
std            :                64568                61143    -3425
ldy            :                 3478                 5821    +2343
ld             :                87066                84723    -2343
stey           :                  267                  544     +277
ste            :                51282                51005     -277
ley            :                  352                  462     +110
lde            :                64929                64827     -102
...

Most of the improvement happens in cactus. (Cactus only:)

lay            :                23523                17984    -5539
stdy           :                  683                 4036    +3353
std            :                11327                 7974    -3353
ldy            :                 2463                 4651    +2188
ld             :                12831                10643    -2188

So far, I have measured (quick) this patch with both fp-contract "off" and "fast". On both z14 and z15 I saw mainly a 2% improvement on cactus, with both "off" and "fast":

Z14 "off"
Improvements:
0.979: f507.cactuBSSN_r
0.989: f526.blender_r
0.995: f519.lbm_r

(No regressions)

Z14 "fast"
Improvements:
0.978: f507.cactuBSSN_r

Regressions:
1.007: i500.perlbench_r

Z15 similar, except one regression with "off":
1.048: f538.imagick_r

This regression does not happen with "fast", or with default fp-flags. It disappeared nicely if I disabled LEY instructions - the partial register dependency issue most likely then... (I could be wrong as LAY+LDE -> LEY changes alignments...).

imagick:
lay            :                 5768                 5437     -331                    
ldy            :                  686                  841     +155                    
ld             :                 3823                 3668     -155 
stey           :                   36                  138     +102                    
ste            :                 1770                 1668     -102                    
stdy           :                  551                  623      +72                    
std            :                 2791                 2719      -72                    
ley            :                   19                   25       +6                    
lde            :                 1221                 1215       -6                    
stg            :                20220                20216       -4                    
lg             :                28317                28313       -4

So if removing the LEY:s like

lde            :                 1215                 1221       +6                    
ley            :                   25                   19       -6   
lay            :                 5437                 5443       +6

, the regression disappears. This also happens exactly the same way with "full" runs.

Therefore, I have removed the LEYs from the patch. (I will still remeasure benchmarks one more time with w/out LEY...)

Diff Detail

Event Timeline

jonpa created this revision.Dec 8 2021, 3:55 PM
jonpa requested review of this revision.Dec 8 2021, 3:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 8 2021, 3:55 PM
jonpa added a comment.Dec 9 2021, 9:36 AM

I made a new set of quick benchmark runs with fp-contract=default/off/fast for main/patch/"patch with LEYs", giving 9 runs for each benchmark.

On Z14
default: cactus improving 1.5%, LEY:s give 1.2% improvement on blender (no regressions)
fast: cactus improving 2.5%, blender improving 0.5%. LEY:s give no further improvements (but possible slight regression on xalank 0.4%).
off: cactus improving 2%. LEY:s give 1.1% improvement on blender (no regressions).

On Z15
default: cactus improving 1%, nab improving 0.7%, deepsheng regressing 0.8%. Imagick regressing 2.3%, also without LEYs. LEYs give 1% improvement on perl.
fast: cactus improving 1.8%
off: cactus improving 2.4%. LEYs improve perl 1%, but cause 5% regression on imagick.

Z15/default/Imagick regresssion 2.3%:

  • seems to be <1% with a "full" run (~0.7%)
  • Opcode diff look ok:
lay            :                 5776                 5449     -327
ld             :                 4160                 4004     -156
ldy            :                  684                  840     +156
stey           :                   36                  138     +102
ste            :                 1776                 1674     -102
std            :                 2786                 2713      -73
stdy           :                  550                  623      +73
lg             :                28283                28275       -8
stg            :                20219                20211       -8

Hot file is morphology.s:

lay            :                  221                  120     -101
stey           :                    0                   93      +93
ste            :                   93                    0      -93
stg            :                  293                  285       -8
lg             :                  633                  625       -8
std            :                  224                  219       -5
stdy           :                   14                   19       +5
ld             :                  316                  312       -4
ldy            :                   31                   35       +4

Looks ok as well.. (Less STG/LG are cases where the LAYs needed one extra register.) Not sure then what this could be, or if it's worth more time...

jonpa added a comment.EditedMay 7 2022, 10:53 AM

This patch seems to have a much lesser impact now: In December last year this gave ~6000 less LAYs, and now only ~250 less. This seems to be due to the added ordering of the frame objects - if I disable that ordering I get some 6000 more LAYs in the output...

I ran the patch with and without the use of LEY and this time I did not see that it made any difference (earlier there was a regression with LEY). Now there are only a handful of extra LEYs emitted anyway...

I also made the experiment again with substituting all LEY:s with LAY+(LDE or LEY). In one build an LEY would result in LAY+LDE, and in another LAY+LEY. This would for all else the same, including alignments. In the "LDE build", the number LEYs went from 390 to 38, so most were gone. I could not see in any benchmark on either default or off (fp-contract) that LEYs would be worse.

So overall, this seems to not give any noticable performance impact on SPEC to me. Given the above I would probably suggest putting back the LEY emission to the patch, or?

"default fp-contract":

lay            :                55499                55231     -268
std            :                65122                64967     -155
stdy           :                  865                 1020     +155
ld             :                98755                98666      -89
ldy            :                 3031                 3120      +89
ste            :                51716                51698      -18
stey           :                  286                  304      +18
ley            :                  423                  429       +6
lde            :                68442                68436       -6
lg             :               976008               976007       -1
stg            :               369621               369620       -1
OPCDIFFS: -270
Even less for some reason with "fp-contract off":

lay            :                55247                55193      -54
...
OPCDIFFS: -54
Herald added a project: Restricted Project. · View Herald TranscriptMay 7 2022, 10:53 AM
jonpa updated this revision to Diff 430082.May 17 2022, 8:50 AM

Emission of LEY reinserted into patch. Test cases updated to work also with the ordering of frame objects.

uweigand accepted this revision.May 17 2022, 8:59 AM

LGTM, thanks!

This revision is now accepted and ready to land.May 17 2022, 8:59 AM
This revision was landed with ongoing or failed builds.Jun 8 2022, 8:10 AM
This revision was automatically updated to reflect the committed changes.