This is an archive of the discontinued LLVM Phabricator instance.

[MachineCopyPropagation] Check CrossCopyRegClass for cross-class copys
ClosedPublic

Authored by vangthao on Aug 12 2021, 9:39 PM.

Details

Summary

On some AMDGPU subtargets, copying to and from AGPR registers using another
AGPR register is not possible. A intermediate VGPR register is needed for AGPR
to AGPR copy. This is an issue when machine copy propagation forwards a
COPY $agpr, replacing a COPY $vgpr which results in $agpr = COPY $agpr. It is
removing a cross class copy that may have been optimized by previous passes and
potentially creating an unoptimized cross class copy later on.

To avoid this issue, check CrossCopyRegClass if a different register class will
be needed for the copy. If so then avoid forwarding the copy when the
destination does not match the desired register class and if the original copy
already matches the desired register class.

Issue seen while attempting to optimize another AGPR to AGPR issue:

Live-ins: $agpr0
$vgpr0 = COPY $agpr0
$agpr1 = V_ACCVGPR_WRITE_B32 $vgpr0
$agpr2 = COPY $vgpr0
$agpr3 = COPY $vgpr0
$agpr4 = COPY $vgpr0

After machine-cp:

$vgpr0 = COPY $agpr0
$agpr1 = V_ACCVGPR_WRITE_B32 $vgpr0
$agpr2 = COPY $agpr0
$agpr3 = COPY $agpr0
$agpr4 = COPY $agpr0

Machine-cp propagated COPY $agpr0 to replace $vgpr0 creating 3 AGPR to AGPR
copys. Later this creates a cross-register copy from AGPR->VGPR->AGPR for each
copy when the prior VGPR->AGPR copy was already optimal.

Diff Detail

Event Timeline

vangthao created this revision.Aug 12 2021, 9:39 PM
vangthao requested review of this revision.Aug 12 2021, 9:39 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 12 2021, 9:39 PM
vangthao updated this revision to Diff 366182.Aug 12 2021, 9:45 PM

Fix test.

rampitec accepted this revision.Aug 13 2021, 9:42 AM

LGTM, but please wait for others to reviews it too.

This revision is now accepted and ready to land.Aug 13 2021, 9:42 AM
vangthao updated this revision to Diff 368545.Aug 24 2021, 8:52 PM

Clang format and rebased

lkail accepted this revision.Aug 24 2021, 9:19 PM

LGTM.

arsenm added inline comments.Aug 26 2021, 7:30 PM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
428

I suspect you don't really want an exact class match, and probably should be checking if one is a subclass of the other

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
806

Should comment why this is done

llvm/lib/Target/AMDGPU/SIRegisterInfo.h
115–118

I think copying the comment from the generic implementation just leads to documentation rot

vangthao added inline comments.Aug 31 2021, 10:15 AM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
428

Would it be needed in this case? If a different register class is not needed for the copy then getCrossCopyRegClass() should return the same exact class and we can check for the exact match.

Although I think UseDstRC and CopyDstRC can be a subclass of CrossCopyRC or vice versa.