This is an archive of the discontinued LLVM Phabricator instance.

[TwoAddressInstructionPass] Create register mapping for registers with multiple uses in the current MBB
ClosedPublic

Authored by Carrot on Nov 4 2021, 8:40 AM.

Details

Summary

Currently we create register mappings only for registers that are used once in the current MBB. For registers with multiple uses, when all the uses are in the current MBB, we can create mappings for them in the same way, based on the last use. For example:

%reg101 = ...
                = ... %reg101
%reg103 = ADD %reg101, %reg102

We can create a mapping between %reg101 and %reg103.
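The idea in the summary can be pictured with a minimal, self-contained sketch. It is not the code from this patch: `Instr`, `TiedFirstUse`, and `mapThroughLastUse` are hypothetical names, registers are plain strings, and the caller is assumed to have already checked that every use of the register is in this block.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified model of one instruction in a basic block.
struct Instr {
  std::string Def;                // register defined, empty if none
  std::vector<std::string> Uses;  // registers read, in operand order
  bool TiedFirstUse;              // assumption: Def is tied to Uses[0]
};

// Scan the block in program order and remember the last instruction that
// reads Reg. If that last use is a two-address instruction whose tied
// source is Reg, return the register it defines as the mapping target.
std::optional<std::string> mapThroughLastUse(const std::vector<Instr> &Block,
                                             const std::string &Reg) {
  const Instr *LastUse = nullptr;
  for (const Instr &I : Block)
    for (const std::string &U : I.Uses)
      if (U == Reg)
        LastUse = &I;  // later instructions overwrite earlier ones

  if (!LastUse || !LastUse->TiedFirstUse || LastUse->Uses.empty() ||
      LastUse->Uses[0] != Reg)
    return std::nullopt;
  return LastUse->Def;
}
```

For the three-instruction example above, this sketch would map %reg101 to %reg103, because the ADD is the last use and %reg101 is its tied source.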

Event Timeline

Carrot created this revision. Nov 4 2021, 8:40 AM
Carrot requested review of this revision. Nov 4 2021, 8:40 AM
Herald added a project: Restricted Project. Nov 4 2021, 8:40 AM
Carrot updated this revision to Diff 385992. Nov 9 2021, 3:25 PM

Rebase.

pengfei added inline comments. Nov 12 2021, 6:18 PM
llvm/test/CodeGen/X86/uadd_sat_vec.ll
1079

The tests in this file all seem to get worse. Do you have any idea how to optimize for them?

Carrot added inline comments. Nov 15 2021, 12:04 PM
llvm/test/CodeGen/X86/uadd_sat_vec.ll
1079

Let's take the function v4i32 as an example.

With this patch

liveins: $xmm0, $xmm1
%1:vr128 = COPY killed $xmm1
%0:vr128 = COPY killed $xmm0
%2:vr128 = MOVAPSrm $rip, 1, $noreg, %const.0, $noreg :: (load (s128) from constant-pool)
%3:vr128 = COPY %0:vr128
%3:vr128 = PXORrr %3:vr128(tied-def 0), %2:vr128
%4:vr128 = COPY killed %0:vr128
%4:vr128 = PADDDrr %4:vr128(tied-def 0), killed %1:vr128
%5:vr128 = COPY killed %2:vr128
%5:vr128 = PXORrr %5:vr128(tied-def 0), %4:vr128
%6:vr128 = COPY killed %3:vr128
%6:vr128 = PCMPGTDrr %6:vr128(tied-def 0), killed %5:vr128
%7:vr128 = COPY killed %4:vr128
%7:vr128 = PORrr %7:vr128(tied-def 0), killed %6:vr128
$xmm0 = COPY killed %7:vr128
RET 0, killed $xmm0

The final extra movdqa comes from

%3:vr128 = COPY %0:vr128

It can't be deleted because %0 is not killed at this point. All other COPY instructions can be coalesced and deleted.

Without this patch

liveins: $xmm0, $xmm1
%1:vr128 = COPY killed $xmm1
%0:vr128 = COPY killed $xmm0
%2:vr128 = MOVAPSrm $rip, 1, $noreg, %const.0, $noreg :: (load (s128) from constant-pool)
%3:vr128 = COPY %0:vr128
%3:vr128 = PXORrr %3:vr128(tied-def 0), %2:vr128
%4:vr128 = COPY killed %1:vr128
%4:vr128 = PADDDrr %4:vr128(tied-def 0), killed %0:vr128
%5:vr128 = COPY killed %2:vr128
%5:vr128 = PXORrr %5:vr128(tied-def 0), %4:vr128
%6:vr128 = COPY killed %3:vr128
%6:vr128 = PCMPGTDrr %6:vr128(tied-def 0), killed %5:vr128
%7:vr128 = COPY killed %6:vr128
%7:vr128 = PORrr %7:vr128(tied-def 0), killed %4:vr128
$xmm0 = COPY killed %7:vr128
RET 0, killed $xmm0

It also contains

%3:vr128 = COPY %0:vr128

and %0 is not killed, so %3 is not xmm0 and this COPY is a real MOV at this point. The SrcRegMap contains %7 -> %6 -> %3, so %7 can't be coalesced with xmm0, and the last COPY is also a real MOV.

This can be seen clearly after coalescing:

0B      bb.0 (%ir-block.0):
          liveins: $xmm0, $xmm1
16B       %4:vr128 = COPY $xmm1
32B       %0:vr128 = COPY $xmm0
48B       %5:vr128 = MOVAPSrm $rip, 1, $noreg, %const.0, $noreg :: (load (s128) from constant-pool)
64B       %6:vr128 = COPY %0:vr128
80B       %6:vr128 = PXORrr %6:vr128(tied-def 0), %5:vr128
112B      %4:vr128 = PADDDrr %4:vr128(tied-def 0), %0:vr128
144B      %5:vr128 = PXORrr %5:vr128(tied-def 0), %4:vr128
176B      %6:vr128 = PCMPGTDrr %6:vr128(tied-def 0), %5:vr128
208B      %6:vr128 = PORrr %6:vr128(tied-def 0), %4:vr128
224B      $xmm0 = COPY %6:vr128
240B      RET 0, $xmm0

So the two-address conversion result is actually better with my new patch.

But after scheduling, the %0 in the COPY instruction becomes a killed operand, so now %6, %0, and xmm0 can be coalesced, and both COPY instructions can be deleted.

0B      bb.0 (%ir-block.0):
          liveins: $xmm0, $xmm1
16B       %4:vr128 = COPY $xmm1
32B       %0:vr128 = COPY $xmm0
48B       %5:vr128 = MOVAPSrm $rip, 1, $noreg, %const.0, $noreg :: (load (s128) from constant-pool)
112B      %4:vr128 = PADDDrr %4:vr128(tied-def 0), %0:vr128
116B      %6:vr128 = COPY %0:vr128
120B      %6:vr128 = PXORrr %6:vr128(tied-def 0), %5:vr128
144B      %5:vr128 = PXORrr %5:vr128(tied-def 0), %4:vr128
176B      %6:vr128 = PCMPGTDrr %6:vr128(tied-def 0), %5:vr128
208B      %6:vr128 = PORrr %6:vr128(tied-def 0), %4:vr128
224B      $xmm0 = COPY %6:vr128
240B      RET 0, $xmm0

So this is another pass-ordering problem between scheduling and TwoAddressInstructionPass.
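The SrcRegMap chain mentioned above (%7 -> %6 -> %3) can be pictured as a lookup that follows map entries from a virtual register until it either reaches a physical register or runs out of entries. The following is a rough sketch under stated assumptions, not the pass's actual code: registers are modeled as strings ('%' prefix for virtual, '$' for physical), and `RegMap` and `resolveToPhysical` are hypothetical names.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a source-register map: virtual register -> the
// register it was copied or tied from.
using RegMap = std::unordered_map<std::string, std::string>;

// Assumption for this sketch: virtual register names start with '%'.
inline bool isVirtual(const std::string &Reg) {
  return !Reg.empty() && Reg[0] == '%';
}

// Follow the chain starting at Start. Returns the physical register the
// chain ends in, or nullopt if the chain stops at a virtual register.
std::optional<std::string> resolveToPhysical(const std::string &Start,
                                             const RegMap &Map) {
  std::string Reg = Start;
  // The step bound guards against accidental cycles in the map.
  for (std::size_t Steps = 0; Steps <= Map.size(); ++Steps) {
    if (!isVirtual(Reg))
      return Reg;  // reached a physical register
    auto It = Map.find(Reg);
    if (It == Map.end())
      return std::nullopt;  // chain ends at a virtual register
    Reg = It->second;
  }
  return std::nullopt;
}
```

With only the entries %7 -> %6 and %6 -> %3, the lookup for %7 ends at the virtual register %3 and finds no physical register, which matches the observation that %7 cannot be tied to xmm0 in the version without the patch.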

pengfei accepted this revision. Nov 22 2021, 11:19 PM

LGTM.

This revision is now accepted and ready to land. Nov 22 2021, 11:19 PM
This revision was landed with ongoing or failed builds. Nov 29 2021, 7:05 PM
This revision was automatically updated to reflect the committed changes.
dongAxis1944 added inline comments.
llvm/lib/CodeGen/TwoAddressInstructionPass.cpp
389

Hi, I have a question: why does the code not break here directly?

pengfei added inline comments. Feb 15 2022, 5:59 AM
llvm/lib/CodeGen/TwoAddressInstructionPass.cpp
389

It finds the last use in the BB; see the comments above.
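To illustrate the difference between breaking early and scanning every use, here is a small sketch. It is not the code at the line under discussion: `UseSite`, `firstUseSeen`, and `lastUseInBlock` are hypothetical, and the sketch assumes the list of uses is not guaranteed to be in program order.

```cpp
#include <optional>
#include <vector>

// Hypothetical record of one use of a register.
struct UseSite {
  int Block;  // id of the containing block
  int Order;  // position of the using instruction inside that block
};

// Breaks on the first use found in CurBlock. The result depends on the
// order of the list, and later uses are never inspected.
std::optional<UseSite> firstUseSeen(const std::vector<UseSite> &Uses,
                                    int CurBlock) {
  for (const UseSite &U : Uses)
    if (U.Block == CurBlock)
      return U;  // early exit
  return std::nullopt;
}

// Visits every use: gives up if any use is outside CurBlock, otherwise
// keeps the use with the highest position, i.e. the last use in the block.
std::optional<UseSite> lastUseInBlock(const std::vector<UseSite> &Uses,
                                      int CurBlock) {
  std::optional<UseSite> Last;
  for (const UseSite &U : Uses) {
    if (U.Block != CurBlock)
      return std::nullopt;
    if (!Last || U.Order > Last->Order)
      Last = U;
  }
  return Last;
}
```

In this model, an early break can return a use that is not the last one, and it can also miss a use in another block that appears later in the list.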