This is an archive of the discontinued LLVM Phabricator instance.

[amdgpu] Handle the case where there is no scavenged register.
ClosedPublic

Authored by hliao on Jul 21 2021, 8:07 AM.

Details

Summary
  • When an unconditional branch is expanded into an indirect branch, if there is no scavenged register, an SGPR pair needs spilling to enable the destination PC calculation. In addition, before jumping into the destination, that clobbered SGPR pair need restoring.
  • As SGPR cannot be spilled to or restored from memory directly, the spilling/restoring of that SGPR pair reuses the regular SGPR spilling support but without spilling it into memory. As that spilling and restoring points are fully controlled, we only need to spill that SGPR into the temporary VGPR, which needs spilling into its emergency slot.
  • The target-specific hook is revised to take additional restore block, where the restoring code is filled. After that, the relaxation will place that restore block directly before the destination block and insert an unconditional branch in any fall-through block into the destination block.

Diff Detail

Event Timeline

hliao created this revision.Jul 21 2021, 8:07 AM
hliao requested review of this revision.Jul 21 2021, 8:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 21 2021, 8:07 AM
hliao updated this revision to Diff 361744.Jul 26 2021, 12:00 PM

Rebase and kindly ping for review.

hliao updated this revision to Diff 364125.Aug 4 2021, 8:25 AM

Rebase and kindly PING for review.

arsenm added inline comments.Aug 6 2021, 6:26 AM
llvm/include/llvm/CodeGen/TargetInstrInfo.h
585–586

typo Optiionally

llvm/lib/CodeGen/BranchRelaxation.cpp
484–485

I'm pretty sure this is illegal, you can't have a branch to the entry block

llvm/test/CodeGen/AMDGPU/branch-relax-spill.ll
331

Where did this frame index come from?

hliao updated this revision to Diff 376091.Sep 29 2021, 7:23 PM

Fix typos and remove dead code.

hliao marked 2 inline comments as done.Sep 29 2021, 7:31 PM
hliao added inline comments.
llvm/lib/CodeGen/BranchRelaxation.cpp
484–485

yeah, you are absolutely right!

llvm/test/CodeGen/AMDGPU/branch-relax-spill.ll
331

That's the scavenge frame index previously added in https://reviews.llvm.org/D96336. Here, we need to spill an SGPR into a VGPR, which needs spilling into a frame slow when no VGPR could be scavenged.

hliao marked an inline comment as done.Sep 30 2021, 9:43 AM
hliao updated this revision to Diff 380056.Oct 15 2021, 10:36 AM

Rebase and kindly PING for review.

hliao added a comment.Oct 25 2021, 8:22 AM

Kingly PING for review

hliao updated this revision to Diff 382010.Oct 25 2021, 8:29 AM

Rebase and kingly PING for review.

arsenm added inline comments.Oct 26 2021, 3:05 PM
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
721–722

Should turn this into an assert

722

Can delete the return

llvm/test/CodeGen/AMDGPU/branch-relax-spill.ll
2

Can you switch this to an amdhsa triple? I want to be sure the use of the sgpr0_sgpr1 ordinarily used for the scratch buffer is tested. I think you would need to add a test variant that is a non-kernel function too

paklui added a subscriber: paklui.Oct 26 2021, 7:53 PM
hliao updated this revision to Diff 382752.Oct 27 2021, 12:38 PM

Revise following reviewers' comment.

arsenm accepted this revision.Oct 27 2021, 12:41 PM

LGTM. Should still add a non-kernel case to the same test so that s[0:3] really is the SRD

This revision is now accepted and ready to land.Oct 27 2021, 12:41 PM
hliao updated this revision to Diff 382756.Oct 27 2021, 12:51 PM
hliao marked 3 inline comments as done.

Add a non-kernel function case.

hliao added inline comments.Oct 27 2021, 12:56 PM
llvm/test/CodeGen/AMDGPU/branch-relax-spill.ll
2

that none-kernel case is added. however, due to calling convention, v0 is reused for spilling SGPR and is always available for spilling SGPRs. We cannot test the no-scavenged register case. Do you have suggestions to fabricate the test case for that purpose?

arsenm added inline comments.Oct 27 2021, 12:58 PM
llvm/test/CodeGen/AMDGPU/branch-relax-spill.ll
2

It's not available for spilling if there's a call that clobbers VGPRs

hliao updated this revision to Diff 382812.Oct 27 2021, 3:08 PM

Update non-kernel test case

  • It's turned out that we need a uniform condition to prevent vcc being *really* used.
arsenm accepted this revision.Oct 27 2021, 3:21 PM
This revision was landed with ongoing or failed builds.Oct 27 2021, 3:37 PM
This revision was automatically updated to reflect the committed changes.