This canonicalization step saves us 3 pattern-matching possibilities for each of the 4 math ops (12 patterns total) for scalar math that uses xmm regs.
The tests in llvm/test/CodeGen/X86/sse-scalar-fp-arith.ll cover this scenario already, so I don't think we need to add any more tests.
Is it possible to add an assert to make sure we're not bypassing this canonicalization? Where would it go?
Also, I'm assuming that we don't turn this into a 'movsd' because of its partial register update, but should there be a 'FIXME' comment here for the optimize-for-size case, since 'movsd' is 2 bytes shorter?
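For reference, the 2-byte difference comes from the non-VEX register-to-register encodings. The sketch below spells them out for xmm0 as destination and xmm1 as source (ModRM 0xC1). `blendpd xmm0, xmm1, 1` takes lane 0 from xmm1 and keeps lane 1 of xmm0, which is the same result as `movsd xmm0, xmm1`. The constant names are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Non-VEX register-to-register encodings, xmm0 <- xmm1 (ModRM 0xC1:
// mod=11, reg=xmm0, rm=xmm1). No REX prefix, since both regs are < xmm8.

// movsd xmm0, xmm1: F2 0F 10 /r
constexpr std::array<std::uint8_t, 4> kMovsd = {0xF2, 0x0F, 0x10, 0xC1};

// blendpd xmm0, xmm1, 1: 66 0F 3A 0D /r ib (imm bit 0 -> lane 0 from xmm1)
constexpr std::array<std::uint8_t, 6> kBlendpd = {0x66, 0x0F, 0x3A,
                                                  0x0D, 0xC1, 0x01};

// movss xmm0, xmm1: F3 0F 10 /r
constexpr std::array<std::uint8_t, 4> kMovss = {0xF3, 0x0F, 0x10, 0xC1};

// blendps xmm0, xmm1, 1: 66 0F 3A 0C /r ib
constexpr std::array<std::uint8_t, 6> kBlendps = {0x66, 0x0F, 0x3A,
                                                  0x0C, 0xC1, 0x01};

// The move form is 2 bytes shorter in both the double and float cases.
static_assert(kBlendpd.size() - kMovsd.size() == 2, "movsd saves 2 bytes");
static_assert(kBlendps.size() - kMovss.size() == 2, "movss saves 2 bytes");
```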
Should also mention that the backend knows how to commute this instruction again as needed to match the register allocation.
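To illustrate why commuting is always possible here, below is a minimal model of a 2-lane blend. It is not the backend's actual code: `Blend2`, `evaluate`, and `commute` are hypothetical names. It assumes BLENDPD semantics, where bit i of the immediate set means lane i comes from the second source. Swapping the two sources and flipping the low mask bits selects the same value in every lane. That freedom is what lets the register allocator pick whichever operand it prefers to tie to the destination.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a BLENDPD-style op on a 2 x double vector:
// bit i of imm set => lane i comes from b, otherwise from a.
struct Blend2 {
  std::array<double, 2> a;
  std::array<double, 2> b;
  std::uint8_t imm;
};

std::array<double, 2> evaluate(const Blend2 &blend) {
  std::array<double, 2> result{};
  for (std::size_t i = 0; i < result.size(); ++i)
    result[i] = ((blend.imm >> i) & 1) ? blend.b[i] : blend.a[i];
  return result;
}

// Commuting swaps the two sources and inverts the low kNumElts mask bits,
// so every lane still selects the same value as before.
Blend2 commute(const Blend2 &blend) {
  constexpr unsigned kNumElts = 2;
  const std::uint8_t eltMask = (1u << kNumElts) - 1;
  return Blend2{blend.b, blend.a,
                static_cast<std::uint8_t>(blend.imm ^ eltMask)};
}
```

For example, a blend with `imm = 1` (scalar result in lane 0 from `b`) commutes to one with `imm = 2` and the operands swapped, and both evaluate to the same vector.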