This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Avoid copying dead subregisters in copyPhysReg
Needs Review · Public

Authored by foad on Nov 2 2021, 6:56 AM.
This revision needs review, but there are no reviewers specified.

Details

Reviewers
None
Summary

When SIInstrInfo::copyPhysReg splits a multi-dword (superregister) copy
into individual dword (subregister) copies, use LivePhysRegs info to
avoid copying dead subregisters.

This fixes a liveness problem where the superregister copy source may
have been only partially defined (which is allowed) but one of the
resulting subregister copy sources would be completely undefined (which
is not allowed by the machine verifier).
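
As an illustration of the idea only, here is a minimal sketch in plain C++. It is not the patch and uses no LLVM APIs. `SubCopy`, `splitCopy`, and the per-dword `LiveSrc` set (a stand-in for a LivePhysRegs query) are hypothetical names, and the sketch assumes liveness is tracked on the source dwords.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy model only: registers are plain names, not LLVM MCRegisters.
struct SubCopy {
  std::string Dst;
  std::string Src;
};

// Split a superregister copy into per-dword copies. LiveSrc stands in for
// the LivePhysRegs query (assumption: liveness is known per source dword).
std::vector<SubCopy> splitCopy(const std::vector<std::string> &DstSubs,
                               const std::vector<std::string> &SrcSubs,
                               const std::set<std::string> &LiveSrc) {
  assert(DstSubs.size() == SrcSubs.size() && "mismatched register widths");
  std::vector<SubCopy> Copies;
  for (std::size_t I = 0; I < SrcSubs.size(); ++I) {
    // A dead source dword holds no value that will be read, so no copy
    // is emitted and no undefined register is ever used as a source.
    if (LiveSrc.count(SrcSubs[I]) == 0)
      continue;
    Copies.push_back(SubCopy{DstSubs[I], SrcSubs[I]});
  }
  return Copies;
}
```

For a four-dword copy from v0..v3 to v4..v7 where only v0 and v2 are live, this yields two copies, v4 from v0 and v6 from v2. The real copyPhysReg must also choose a copy order that is safe for overlapping registers, which this sketch ignores.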

This replaces the previous workaround, which was to add implicit
superregister use/def operands to all the subregister copy instructions,
which caused false dependencies between them and restricted the freedom
of post-RA scheduling and other late codegen passes.
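
The false-dependency effect can be sketched with a toy hazard check, again in plain C++ rather than LLVM code. `Inst`, `mustStayOrdered`, `makeSubCopy`, and the names `super_src` and `super_dst` are all hypothetical. The model treats the superregister as one extra register name, which is a simplification of how real register aliasing works.

```cpp
#include <set>
#include <string>

// Toy model only: an instruction is the set of register names it reads
// and the set it writes.
struct Inst {
  std::set<std::string> Reads;
  std::set<std::string> Writes;
};

static bool overlaps(const std::set<std::string> &A,
                     const std::set<std::string> &B) {
  for (const std::string &R : A)
    if (B.count(R) != 0)
      return true;
  return false;
}

// True if the two instructions cannot be reordered (RAW, WAR, or WAW hazard).
bool mustStayOrdered(const Inst &First, const Inst &Second) {
  return overlaps(First.Writes, Second.Reads) ||
         overlaps(First.Reads, Second.Writes) ||
         overlaps(First.Writes, Second.Writes);
}

// Build one dword copy. With ImplicitSuper set, the copy also carries the
// superregister names as an extra read and write, mimicking the workaround.
Inst makeSubCopy(const std::string &Dst, const std::string &Src,
                 bool ImplicitSuper) {
  Inst I;
  I.Reads.insert(Src);
  I.Writes.insert(Dst);
  if (ImplicitSuper) {
    I.Reads.insert("super_src");  // hypothetical source superregister name
    I.Writes.insert("super_dst"); // hypothetical destination superregister name
  }
  return I;
}
```

In this model, two dword copies built with `ImplicitSuper` set both write `super_dst`, so `mustStayOrdered` reports a hazard between them. Built without it, the same two copies touch disjoint registers and can be reordered freely.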

Event Timeline

foad created this revision. Nov 2 2021, 6:56 AM
foad requested review of this revision. Nov 2 2021, 6:56 AM
foad added a comment. Nov 5 2021, 8:49 AM

Some stats from compiling a corpus of 10,000 graphics shaders:

  • 57 of them (mostly compute shaders) actually have some redundant mov instructions removed
  • 2,000 of them have other codegen differences due to post-RA scheduling being less constrained
  • average compile time is 0.7% longer

I wonder if we would be better off trying to break these copies into individual subregister pieces before we discard liveness information. Recomputing physreg liveness all over the place isn't exactly cheap.

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
909–910

I don't understand how you can simply skip here.

foad added inline comments. Jul 7 2022, 7:00 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
909–910

It's copying a dead subreg, so there's no need to insert any copy instruction.