This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][GFX908] Only consider explicit defs of src reg in indirect agpr copy
AbandonedPublic

Authored by jrbyrnes on Dec 27 2022, 3:08 PM.

Details

Reviewers
arsenm
rampitec
Summary

When doing indirect copy to AGPR, we parse all previous instructions in attempt to find an accvgpr_write which has a dest that holds the src of the copy -- we can use this instead of reconstructing a separate accvgpr_write. However, after https://github.com/llvm/llvm-project/commit/663f16684d1a5f129874b7be9e1f486682977bfa we add implicit defs of the wide AGPR to AGPR spills. Thus, we previously flag accvgpr_writes as holding some particular subreg of the src of the copy, when, in fact, they hold some other subreg of the src. Ideally, we would not need to attach implicit_defs for each agpr spill. This patch resolves the bug until we have a better way of doing things. It modifies indirectCopyToAGPR s.t. it only considers accvgpr_writes that explicitly define the vgpr.

Diff Detail

Unit TestsFailed

Event Timeline

jrbyrnes created this revision.Dec 27 2022, 3:08 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 27 2022, 3:08 PM
jrbyrnes requested review of this revision.Dec 27 2022, 3:08 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 27 2022, 3:08 PM
jrbyrnes edited the summary of this revision. (Show Details)Dec 27 2022, 3:10 PM
arsenm added inline comments.Dec 27 2022, 3:21 PM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
577

Assuming the intent of implicit defs makes me nervous. They're there for liveness representation we don't want to just ignore. What does this solve exactly? I thought we always reserved a register for this now, so you don't need to actually scan for anything

jrbyrnes added inline comments.Dec 27 2022, 3:42 PM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
577

We don't need to scan for anything -- it is just an optimization. When copying agprs on gfx908, this code is used to pick up the case where we already have an accvgpr_write that does the same thing as one we are about to create. However, it is incorrectly picking up accvgpr_writes because we have attached to spills implicit def of the entire wide agpr.

So when lowering
agpr0_agpr1 = copy agpr2_agpr3,
we will find
vgprx = v_accvgpr_write agpr2, implicit-def agpr2_agpr3
and incorrectly use vgprx as holding agpr3 due to implicit-def.

If we don't find any such accvgpr_write, then we use a temp register to copy over. The register we use for this takes into account the reserved registers to find an unused one. The other patch was to mark vgpr used for agpr spills as reserved so the temp didn't clobber.

I think this case is niche enough to be able to assume the intent of implicit-def. With this patch we at least err on the side of caution in that we only apply optimization if the found instruction explicitly defines our reg. Furthermore, if we do find such a reg, we check instructions in between that def and our use for any modifications to this reg (including implicit defs).

This isn’t the right place to apply optimizations. I’ve wanted to delete all of this code.

Copy lowering should be as straightforward as possible. What we have now is doing a liveness scan for each copy, which is crazy. We should either avoid this situation in the first place by applying implicit VGPR virtual register defs so the allocator ensures one is free at this point, or optimize all the AGPR copies at once after the fact

This isn’t the right place to apply optimizations. I’ve wanted to delete all of this code.

Copy lowering should be as straightforward as possible. What we have now is doing a liveness scan for each copy, which is crazy. We should either avoid this situation in the first place by applying implicit VGPR virtual register defs so the allocator ensures one is free at this point, or optimize all the AGPR copies at once after the fact

indirectCopyToAGPR performs two optimizations when lowering copies.

  1. reuse existing accvgpr_write instead of creating a new one. If we find a match, we can reuse accvgpr_write into our dest, rather than accvpgr_read the source into vgpr, then accvgpr_write that vgpr int our dest. This involves scanning to find accvgpr_writes then to check if candidate duplicate accvpgr_writes are modified.
  2. allocate vgprs used in the copy via round-robin of a triplet. This is to avoid a hazard. This involves scanning via RS.

Since the goal of expand-postra-pseudos is to lower pseudos and not apply these optimizations, it makes sense to move these elsewhere. Perhaps it makes the most sense to have a pass to do pseudo expansion cleanup (i.e. handle both of these, and more if desired)? Attaching implicit virtual vgpr defs may enable 2., but I think there would be no guarantee that they would all be allocated in the desired fashion. I think this would allow us to get rid of VGPRForAGPR copy though.

Regardless, in facilitating 1., even in a separate pass, I think that having implicit-defs of agprs attached to instructions that don't define them will pose problems, and we may end up wanting the explicitlyDefines functionality.

I propose to use this patch to provide fix for relatively high priority correctness issue, and introduce explicitlyDefines. Then, in separate work, clean up the indirectCopyToAGPR (potentially via new pass). Otherwise, I can handle above concerns here.