This is another enhancement to D77895/D78362 to avoid a round-trip from XMM->GPR->XMM.
This time we handle the case where the source and destination FP types differ (for example, float to double), always with a signed i32 as the intermediate value.
I think this covers all of the faux vector optimization possibilities for pre-AVX512 targets.
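
For illustration, here is a minimal source-level sketch of the float -> i32 -> double case using SSE2 intrinsics. The function names are hypothetical and this is not the patch's code; it only contrasts the scalar sequence that moves the integer through a GPR with a "faux vector" sequence that keeps the value in XMM registers. The exact instructions the backend selects may differ.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2 intrinsics (also pulls in the SSE header)

// Scalar path: the truncated integer leaves the XMM register file.
// Roughly cvttss2si (XMM -> GPR) followed by cvtsi2sd (GPR -> XMM).
inline double round_trip_gpr(float x) {
    int i = _mm_cvttss_si32(_mm_set_ss(x));
    return _mm_cvtsd_f64(_mm_cvtsi32_sd(_mm_setzero_pd(), i));
}

// "Faux vector" path: use the packed conversions and read only lane 0.
// Roughly cvttps2dq followed by cvtdq2pd, with no GPR involved.
// _mm_set_ss zeroes the upper lanes, so they convert harmlessly.
inline double round_trip_xmm(float x) {
    __m128i vi = _mm_cvttps_epi32(_mm_set_ss(x));  // float lanes -> i32 lanes
    __m128d vd = _mm_cvtepi32_pd(vi);              // low two i32 lanes -> double
    return _mm_cvtsd_f64(vd);                      // extract lane 0
}

// Sanity check on a few in-range inputs: both paths should agree.
inline void check_equivalence() {
    const float inputs[] = {0.0f, 3.75f, -2.5f, 123456.0f, -0.25f};
    for (float f : inputs)
        assert(round_trip_gpr(f) == round_trip_xmm(f));
}
```

Both sequences truncate toward zero, so the result should be the same; the second form only avoids the register-file crossing.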
There is at least one other transform mentioned in PR36617:
https://bugs.llvm.org/show_bug.cgi?id=36617#c19
...where we fold an 'fpext' into a preceding 'sitofp'. I think we will want to handle that earlier (DAGCombiner or InstCombine) because it is a target-independent optimization.