This is an archive of the discontinued LLVM Phabricator instance.

[X86] Use MOVSX instead of CBW to extend i8 to AX for i8 sdiv.
ClosedPublic

Authored by craig.topper on Sep 4 2019, 1:12 PM.

Details

Summary

The CBW instruction sign extends AL to AX. But if the input isn't in AL yet we need to copy it there. MOVSX on the other hand has an independent source and destination register. So it can copy any 8-bit value into AX in one operation saving the copy. It can also fold a load which CBW can't.

This patch switches our sdiv isel to use MOVSX to allow these improvements. I'm extending all the way to i32 because that's one byte shorter to encode. It also avoids a partial register dependency on bits 31:16 of the output register on recent Intel CPUs. Unfortunately, this prevents X86MCInstLowering from being able to turn the MOVSX back into CBW if the register allocation works out that the input is in AL already. CBW is immune to the partial read issue on recent CPUs. But I think CBW is 3 bytes to encode and MOVSX i8->i32 is also 3 bytes so maybe this loss doesn't matter. If its important we could probably emit i8->i16 from isel and teach FixupBWInst to turn i8->i16 into i8->i32 when it won't interfere with CBW formation.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision.Sep 4 2019, 1:12 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 4 2019, 1:12 PM
Herald added a subscriber: hiraditya. · View Herald Transcript

Actually CBW is 2 bytes. So that would be better than MOVSX

I'm not clear on why we can't peephole replace MOVSX with CBW at a later stage.

You can't peephole replace MOVSX %al, %eax with CBW without proving that the upper 48-bits of %rax are dead. CBW is equivalent to MOVSX %al, %ax which leaves the upper 48-bits unmodified.

Ok I've recovered CBW formation by using MOVSX16rr8 out of isel and fixing it with FixupBWInsts. To avoid complicating the isel code to do something different for udiv/idiv, I've also changed to MOVZX16rr8. I'm also using the 16-bit form for the loads since they might get unfolded later. So FixupBWInsts has to handle all 4 instructions.

Realistically there 3 patches in here if we agree this is the way to go.
-Update FixupBWInsts to handle extends. This by itself will affect the fast-isel test.
-Change udiv to use MOVZX16rr8
-Change sdiv to use MOVSX16rr8

New codegen looks nice

Realistically there 3 patches

Maybe review this as one patch and then commit as three patches?

andreadb accepted this revision.Sep 6 2019, 3:50 AM

Looks good to me.
Thanks for the changes in FixupBWInsts. Personally I was already happy with the previous version patch. But this one is looks even better.

I also agree that this change should be split into three commits.

llvm/lib/Target/X86/X86FixupBWInsts.cpp
242 ↗(On Diff #218973)

Extra '//' between 'something' and 'else'

This revision is now accepted and ready to land.Sep 6 2019, 3:50 AM

Please also update the patch summary before commiting

This revision was automatically updated to reflect the committed changes.