The CBW instruction sign-extends AL into AX. If the input isn't already in AL, we first need to copy it there. MOVSX, on the other hand, has independent source and destination registers, so it can sign-extend any 8-bit value into AX in one operation, saving the copy. It can also fold a load, which CBW can't.
This patch switches our sdiv isel to use MOVSX to allow these improvements. I'm extending all the way to i32 because that's one byte shorter to encode than i8->i16, which needs an operand-size prefix. It also avoids a partial register dependency on bits 31:16 of the output register on recent Intel CPUs. Unfortunately, this prevents X86MCInstLowering from turning the MOVSX back into CBW when register allocation happens to place the input in AL already. CBW is immune to the partial read issue on recent CPUs. But I think CBW is 2 bytes to encode in 32-bit and 64-bit mode and MOVSX i8->i32 is 3 bytes, so the loss is at most one byte and maybe doesn't matter. If it's important, we could probably emit i8->i16 from isel and teach FixupBWInst to turn i8->i16 into i8->i32 when it won't interfere with CBW formation.
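
To illustrate why extending to i32 is safe for the divide, here is a rough C++ model of the three instructions involved. It is only a sketch: the function names (cbw, movsx_8_to_32, idiv8, check_low16_agree) are made up for this example and are not LLVM APIs, and the model ignores flags and the divide-error cases (divisor 0, or a quotient that doesn't fit in 8 bits). It also assumes narrowing casts wrap modulo 2^N, which C++20 guarantees and earlier compilers do in practice. The point is that the 8-bit IDIV only reads AX, and the low 16 bits of the MOVSX i8->i32 result match what CBW would have produced.

```cpp
#include <cassert>
#include <cstdint>

// Model of CBW: implicitly reads AL (the low byte of AX) and overwrites AX
// with its sign extension. Whatever was in AH is discarded.
uint16_t cbw(uint16_t ax) {
    int8_t al = static_cast<int8_t>(ax & 0xFF);
    return static_cast<uint16_t>(static_cast<int16_t>(al));
}

// Model of MOVSX r32, r/m8: the source can be any 8-bit value, independent of
// the destination, and the full 32-bit destination is written.
uint32_t movsx_8_to_32(uint8_t src) {
    return static_cast<uint32_t>(static_cast<int32_t>(static_cast<int8_t>(src)));
}

// Model of IDIV r/m8: divides the signed 16-bit value in AX by the signed
// 8-bit divisor. Returns the new AX with the quotient in AL and the remainder
// in AH. The caller must avoid a zero divisor and overflowing quotients,
// which would fault on real hardware.
uint16_t idiv8(uint16_t ax, uint8_t divisor) {
    int dividend = static_cast<int16_t>(ax);
    int d = static_cast<int8_t>(divisor);
    uint8_t quotient = static_cast<uint8_t>(dividend / d);   // truncates toward zero
    uint8_t remainder = static_cast<uint8_t>(dividend % d);  // sign follows the dividend
    return static_cast<uint16_t>((remainder << 8) | quotient);
}

// For every possible 8-bit input, the dividend IDIV sees is the same whether
// it was produced by CBW or by the low half of MOVSX i8->i32.
void check_low16_agree() {
    for (unsigned v = 0; v < 256; ++v) {
        uint8_t byte = static_cast<uint8_t>(v);
        uint16_t via_cbw = cbw(byte);
        uint16_t via_movsx = static_cast<uint16_t>(movsx_8_to_32(byte) & 0xFFFF);
        assert(via_cbw == via_movsx);
    }
}
```

Calling check_low16_agree() walks all 256 inputs. Bits 31:16 of the MOVSX result are simply never read by the 8-bit divide, so the wider extension costs nothing semantically.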