An initial attempt at improving vXi64 UITOFP conversions:
vXi64-vXf64 - perform this as a true vectorization instead of (partially vectorized) scalar conversions, by adding vector support to ExpandLegalINT_TO_FP
vXi64-vXf32 - SSE-customized versions of the ExpandLegalINT_TO_FP code, avoiding many branches that were often poorly predicted
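For reference, the branch-free u64 to f64 technique that vectorizes well can be modelled per lane in scalar C++. This is only a sketch of the well-known exponent-bias trick, not the code in the patch; the names u64ToF64 and bitsToDouble are hypothetical, and it assumes IEEE-754 doubles with round-to-nearest.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret 64 bits as a double without violating aliasing rules.
static double bitsToDouble(uint64_t Bits) {
  double D;
  std::memcpy(&D, &Bits, sizeof(D));
  return D;
}

// Scalar model of one vector lane (hypothetical helper, not LLVM API).
double u64ToF64(uint64_t X) {
  const uint64_t TwoP52 = 0x4330000000000000ULL;           // bits of 2^52
  const uint64_t TwoP84 = 0x4530000000000000ULL;           // bits of 2^84
  const uint64_t TwoP84PlusTwoP52 = 0x4530000000100000ULL; // bits of 2^84 + 2^52

  // Low 32 bits placed in the mantissa of 2^52: value is 2^52 + lo.
  double Lo = bitsToDouble((X & 0xFFFFFFFFULL) | TwoP52);
  // High 32 bits placed in the mantissa of 2^84: value is 2^84 + hi * 2^32.
  double Hi = bitsToDouble((X >> 32) | TwoP84);

  // The subtraction is exact; the final add performs the only rounding.
  return (Hi - bitsToDouble(TwoP84PlusTwoP52)) + Lo;
}
```

Every step is a bitwise op, a shift, or a floating-point add/sub, so the same sequence applies unchanged to each element of a vXi64 vector.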
There's still room for improvement:
uitofp_4i64_to_4f64 - AVX1 codegen should be able to perform the vpsrlq xmm shifts as ymm (v8f32) shuffles
uitofp_Xi64_to_Xf32 - some of the BLENDV cases should be selected from the sign bit directly and not need a shift/comparison
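To illustrate the select in the vXi64-vXf32 case, here is a scalar model of one lane of a branchless u64 to f32 conversion. It is a sketch under assumptions, not the patch's code: the name u64ToF32 is hypothetical, and the two ternaries stand in for vector selects. Since BLENDV picks each element from the sign bit of its mask element, the input value itself could in principle serve as the mask, which is the improvement the item above describes.

```cpp
#include <cstdint>

// Scalar model of one vector lane (hypothetical helper, not LLVM API).
float u64ToF32(uint64_t X) {
  // Halve the input, ORing the dropped bit back in as a sticky bit so the
  // later rounding to float sees the same tie-breaking information.
  uint64_t Halved = (X >> 1) | (X & 1);

  // The select condition is just the sign bit of the input.
  bool SignSet = (X >> 63) != 0;

  // Either operand now fits in a signed 64-bit integer.
  int64_t Src = static_cast<int64_t>(SignSet ? Halved : X);

  // Signed conversion, then undo the halving with an exact doubling.
  float Conv = static_cast<float>(Src);
  float Doubled = Conv + Conv;

  return SignSet ? Doubled : Conv;
}
```

Both paths are computed unconditionally and the result is selected at the end, which trades a little extra work for the removal of a data-dependent branch.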
I think we really need to handle this in the expand code in LegalizeVectorOps. We hit the expand there first, which scalarizes it. Then DAG combine reassembles it in reduceBuildVecConvertToConvertBuildVec. Then we hit this code in LegalizeDAG. But I don't think we really want to rely on DAG combine like that.