Some 64-bit constants can be materialized with fewer instructions than we currently use.
Consider a 64-bit immediate divided into four 16-bit parts: Hi16OfHi32 (bits 48...63), Lo16OfHi32 (bits 32...47), Hi16OfLo32 (bits 16...31), and Lo16OfLo32 (bits 0...15). When any three parts are equal, the immediate can be treated as "almost" a splat of a 32-bit value in a 64-bit register.
For example:
define i64 @almost_splat() {
entry:
  ; 0xCCFFCCFF0123CCFF (Hi16OfHi32 == Lo16OfHi32 == Lo16OfLo32)
  ret i64 14771750698406366463
}
Currently we use 5 instructions to materialize the immediate:
# %bb.0:                                # %entry
        lis 3, -13057
        ori 3, 3, 52479
        rldic 3, 3, 32, 0
        oris 3, 3, 291
        ori 3, 3, 52479
        blr
We can reduce this to 4 instructions: 3 to generate the 32-bit splat and 1 to patch the part that differs:
# %bb.0:                                # %entry
        lis 3, 291
        ori 3, 3, 52479
        rldimi 3, 3, 32, 0              # 0x0123CCFF0123CCFF is generated here
        rldimi 3, 3, 48, 0              # patch Hi16OfHi32, giving 0xCCFFCCFF0123CCFF
        blr
For compile-time reasons, since the single-instruction patterns for Imm & 0xffffffff00000000 are simple, could we move the new code before line 1318 and add a simple check for those single-instruction patterns in selectI64ImmDirect?