A spilled load of an immediate can use MVHI/MVGHI instead.
A compare of a spilled register against an immediate can use CHSI/CGHSI.
On SPEC 17:

            trunk    patched    delta
chsi  :     53231      59356    +6125
lt    :     19324      14060    -5264
cghsi :     29368      34598    +5230
ltg   :    166914     161949    -4965
mvhi  :     29323      33923    +4600
lhi   :    262083     257623    -4460
st    :    181993     177559    -4434
mvghi :     54650      58599    +3949
stg   :    409267     405380    -3887
lghi  :    467915     464077    -3838
l     :    231431     230640     -791
jlh   :    178461     178961     +500
lg    :   1093362    1092985     -377
cijlh :     83235      82875     -360
je    :    340896     341233     +337
cije  :    111150     110969     -181
jl    :     52685      52808     +123
chi   :     60634      60530     -104
cijl  :     13434      13358      -76
Since LT/LTG and LHI/LGHI use a register write and an extra instruction, while CHSI/CGHSI and MVHI/MVGHI do not, this should be a general improvement. I didn't see any big change in spilling/reloading, though (in fact there is a very slight increase in the number of instructions, which is probably related to later optimizations).
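As a rough illustration of the two folds (the register numbers, stack offsets, and immediates here are made up for the example and are not taken from the SPEC output):

```asm
# Spilled load of an immediate, before:
        lhi     %r0, 42            # materialize the immediate in a register
        st      %r0, 160(%r15)     # spill it to the stack slot
# after: one instruction, no register written
        mvhi    160(%r15), 42

# Compare of a spilled value against an immediate, before:
        lt      %r0, 160(%r15)     # reload and test against zero, writes %r0
# after: compare the stack slot directly, no register written
        chsi    160(%r15), 0

# The 64-bit forms would follow the same pattern:
# lghi + stg -> mvghi, and ltg -> cghsi.
```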
These are the remaining improvements I could see while looking at imagick. They seem to improve it by maybe another percent or so.
Ugh, that's annoying. Can you at least move the mnemonic twiddling to the other side, i.e. to MemFoldPseudo? We already have similar twiddling for LOC vs SEL there.