This is an enhancement to D77895 to avoid another XMM->GPR->XMM round-trip. This time we handle the case where the value starts and ends as an f64 and passes through a signed i32 as the intermediate value.
It's a bit more involved than I initially assumed because we need target-specific opcodes to represent the non-standard cast ops, which have no direct generic equivalent.
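For illustration, here is a minimal C sketch of the source-level pattern this targets, assuming an x86-64 target with SSE2: a double truncated to a signed 32-bit int and immediately converted back. The naive scalar lowering is roughly `cvttsd2si` (XMM -> GPR) followed by `cvtsi2sd` (GPR -> XMM). The function name `fp_int_fp_roundtrip` is hypothetical and not from the patch. The expectation that the folded form uses packed conversions such as `cvttpd2dq`/`cvtdq2pd` is an assumption; the actual instruction selection depends on the target and flags.

```c
#include <assert.h>

/* Hypothetical example (not from the patch): f64 -> signed i32 -> f64.
 * On x86-64 the scalar lowering moves the value XMM -> GPR -> XMM;
 * the change described above aims to keep it in an XMM register. */
double fp_int_fp_roundtrip(double x) {
    /* In C, the cast is undefined if x is NaN or its truncated value
     * does not fit in an int, so check the precondition in debug builds
     * (this check compiles away with NDEBUG). NaN fails both comparisons. */
    assert(x > -2147483649.0 && x < 2147483648.0);
    int truncated = (int)x;   /* f64 -> signed i32, truncates toward zero */
    return (double)truncated; /* signed i32 -> f64, exact for every int32 */
}
```

Note that this is not the same as `trunc()`: for example, `-0.5` becomes `+0.0` after the round trip, because the intermediate int has no negative zero.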