This is a follow-up patch that came up when working on the handling of stores of replicated values. This is the case that involves loading any integer constant into a register and immediately storing it. There are some cases where VGBM / VGM can be used for this.
This (at least currently) involves a new SystemZ method "isSupportedByScalarStore()" which is used to find cases where it would be worthwhile to use a vector constant instead. I have tried two variants:
- Replace all immediate loads that would require one or two scalar instructions.
- Only replace cases where two scalar instructions would be needed (-keep-imm-load passed).
(1) might make sense as there are more vector registers so register pressure might be improved. It seems that one time that gave less spills, and the other time it was more, and only with small differences, so that does not really seem to be a big deal.
(2) Is a bit more of a clear improvement as one immediate load is used instead of two scalar ones always.
main <> patch (1) vsteg : 6564 9016 +2452 stg : 388272 385858 -2414 vstef : 6615 8761 +2146 st : 130605 128843 -1762 vgmf : 13125 14464 +1339 vgmg : 8152 9472 +1320 llilh : 6956 5825 -1131 llihh : 2549 1692 -857 llihl : 7114 6668 -446 stfh : 17089 16791 -298 vgbm : 11085 11288 +203 oilh : 1539 1351 -188 llihf : 11156 10978 -178 oilf : 8959 8819 -140 iilf : 5084 4965 -119 iihf : 4918 4814 -104 lay : 56028 56126 +98 llilf : 2249 2177 -72 oill : 18106 18045 -61 ... OPCDIFFS: -212
main <> patch with experimental option "-keep-imm-load" passed (2) stg : 388272 387648 -624 vsteg : 6564 7177 +613 vgmg : 8152 8355 +203 oilh : 1539 1351 -188 llihh : 2549 2367 -182 vgmf : 13125 13304 +179 oilf : 8959 8819 -140 llihl : 7114 7003 -111 llihf : 11156 11065 -91 oill : 18106 18045 -61 lg : 1050050 1050003 -47 vgbm : 11085 11122 +37 lay : 56028 56058 +30 mvc : 59773 59755 -18 lgr : 842765 842758 -7 cgije : 142634 142629 -5 iihf : 4918 4913 -5 je : 337365 337370 +5 ldr : 51266 51270 +4 OPCDIFFS: -408
With the latest build (2) seems better, but again I am not sure how stable that is as it seems to change over time.
The patch is not quite finalized and the next step would be to decide between (1) and (2)...
Nightly performance measurements do not show any changes above 1%. Given these measurements I would take (2) as there are a few 0.25% improvements with that, but I am not sure so small differences are trustworthy...
XXX is for maybe using isSupportedByScalarStore() here as well