A spilled load of an immediate can use MVHI/MVGHI instead.
A compare of a spilled register against an immediate can use CHSI/CGHSI.
On SPEC 17 (trunk <> patched):

    opcode      trunk   patched    diff
    chsi        53231     59356   +6125
    lt          19324     14060   -5264
    cghsi       29368     34598   +5230
    ltg        166914    161949   -4965
    mvhi        29323     33923   +4600
    lhi        262083    257623   -4460
    st         181993    177559   -4434
    mvghi       54650     58599   +3949
    stg        409267    405380   -3887
    lghi       467915    464077   -3838
    l          231431    230640    -791
    jlh        178461    178961    +500
    lg        1093362   1092985    -377
    cijlh       83235     82875    -360
    je         340896    341233    +337
    cije       111150    110969    -181
    jl          52685     52808    +123
    chi         60634     60530    -104
    cijl        13434     13358     -76
Since LT/LTG and LHI/LGHI involve a register write and an extra instruction, while CHSI/CGHSI and MVHI/MVGHI do not, this should be a general improvement. I didn't see any big change in spilling/reloading, though (in fact there was a very slight increase in the number of instructions, which is probably related to later optimizations).
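The choice between the folded and unfolded forms can be sketched as a small decision function. This is a hypothetical model, not the code in this patch: `fitsDisp12`, `spillOfImmediate` and `compareSpilled` are made-up names, and the fallback used when the stack slot offset does not fit the SIL format's unsigned 12-bit displacement is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helpers modelling the folding decision; the names are
// illustrative and are not LLVM's SystemZInstrInfo API.
// MVHI, MVGHI, CHSI and CGHSI use the SIL format: a base register plus an
// unsigned 12-bit displacement, and a signed 16-bit immediate.
bool fitsDisp12(int64_t Disp) { return Disp >= 0 && Disp <= 4095; }

// Spill of a register defined by LHI/LGHI (so the immediate already fits in
// 16 bits). Disp is the spill slot offset; it is assumed to fit the 20-bit
// long-displacement forms used in the fallback.
std::vector<std::string> spillOfImmediate(bool Is64, int64_t Disp) {
  assert(Disp >= 0 && "spill slot offsets are assumed non-negative");
  if (fitsDisp12(Disp))
    return {Is64 ? "MVGHI" : "MVHI"};  // one instruction, no register written
  // Materialize the immediate in a register, then store it (STY/STG take a
  // 20-bit displacement).
  if (Is64)
    return {"LGHI", "STG"};
  return {"LHI", "STY"};
}

// Compare of a spilled register against a signed 16-bit immediate.
std::vector<std::string> compareSpilled(bool Is64, int16_t Imm, int64_t Disp) {
  assert(Disp >= 0 && "spill slot offsets are assumed non-negative");
  if (fitsDisp12(Disp))
    return {Is64 ? "CGHSI" : "CHSI"};  // compares the memory operand directly
  if (Imm == 0)
    return {Is64 ? "LTG" : "LT"};  // load and test: writes a register, sets CC
  if (Is64)
    return {"LG", "CGHI"};  // reload into a register, then compare
  return {"LY", "CHI"};
}
```

For example, `compareSpilled(false, 0, 168)` yields just `CHSI`, while the same compare against a slot at offset 8000 falls back to `LT` in this model.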
These are the remaining improvements I could see while looking at imagick. They seem to improve it by maybe another percent or so.
This handles only some 30-40 cases, so I am not sure how useful it is, but I suppose it may still be worthwhile.
There are a few more (50?) unhandled cases of LEFR/LFER that appear to require things like new opcodes with special handling or similar ('%gr64bit = LFER %vr32bit', for instance, seems awkward to handle ...)