Reduces scalarization overhead via custom lowering of v2f64 = fpext v2f32.
For example, given the following IR:
  %0 = load <2 x float>, <2 x float>* %Ptr, align 8
  %1 = fpext <2 x float> %0 to <2 x double>
  ret <2 x double> %1
Before custom lowering:
  ld r3, 0(r3)
  mtvsrd f0, r3
  xxswapd vs34, vs0
  xscvspdpn f0, vs0
  xxsldwi vs1, vs34, vs34, 3
  xscvspdpn f1, vs1
  xxmrghd vs34, vs0, vs1
After custom lowering:
  lfd f0, 0(r3)
  xxmrghw vs0, vs0, vs0
  xvcvspdp vs34, vs0
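
For reference, the semantics of the load plus fpext in the IR above can be sketched in scalar C++ as below. This is only an illustrative model of what both instruction sequences compute, not code from the patch; the helper name fpext_v2f32_to_v2f64 is hypothetical.

```cpp
#include <array>
#include <cassert>

// Hypothetical reference model (not part of the patch): load two adjacent
// floats and widen each lane to double, as `fpext <2 x float> to <2 x double>`
// does. Widening float to double is exact, so no rounding occurs.
std::array<double, 2> fpext_v2f32_to_v2f64(const float *ptr) {
    return {static_cast<double>(ptr[0]), static_cast<double>(ptr[1])};
}

// Small self-check: each lane is converted independently of the other.
void check_example() {
    const float in[2] = {1.5f, -0.25f};
    const std::array<double, 2> out = fpext_v2f32_to_v2f64(in);
    assert(out[0] == 1.5);
    assert(out[1] == -0.25);
}
```

The scalarized sequence handles each lane with its own xscvspdpn and then merges the results, while the custom lowering converts both lanes with a single xvcvspdp.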
SPEC CPU2017 improvements:
- parest by 1.16%
- blender by 1.24%
SPEC CPU2006 improvements:
- mcf by 2%
- xalancbmk by 1.29%