Differential D123525
[AMDGPU] On gfx908, reserve VGPR for AGPR copy based on register budget.
Closed, Public. Authored by hsmhsm on Apr 11 2022, 10:43 AM.

Summary
Based on the available register budget, reserve the highest available VGPR for the AGPR copy before RA. After RA, shift it to the lowest unused VGPR, if one exists.
Fixes SWDEV-330006.
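The two-phase scheme can be illustrated with a short standalone C++ sketch (a model only, not the actual LLVM implementation; the function names and the 256-entry register set are hypothetical):

#include <bitset>

// Phase 1 (before RA): nothing has been allocated yet, so the highest
// VGPR inside the budget is always available to reserve for the AGPR copy.
int reserveHighestVGPR(int NumVGPRBudget) {
  return NumVGPRBudget - 1; // e.g. a budget of 64 reserves v63
}

// Phase 2 (after RA): shift the reservation down to the lowest VGPR the
// allocator left untouched, so the top of the range is freed again.
int shiftToLowestUnused(int Reserved, const std::bitset<256> &UsedByRA) {
  for (int R = 0; R < Reserved; ++R)
    if (!UsedByRA.test(R))
      return R; // lowest unused VGPR becomes the new reservation
  return Reserved; // nothing lower is free; keep the original choice
}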
Event Timeline
rampitec added inline comments. This revision now requires changes to proceed. (Apr 11 2022, 11:22 AM)
Comment Actions
Refine the logic within indirectCopyToAGPR() while choosing the lowest available VGPR.

Comment Actions
Here is the update where we always reserve the highest available VGPR irrespective of the register budget. With this update, the two lit tests below fail to compile because RegAlloc fails.

spill-agpr.ll
-------------
define amdgpu_kernel void @max_5regs_used_8a(<4 x float> addrspace(1)* %arg) #4 {
  %tid = call i32 @llvm.amdgcn.workitem.id.x()
  %v0 = call float asm sideeffect "; def $0", "=v"()
  %a4 = call <4 x float> asm sideeffect "; def $0", "=a"()
  %gep = getelementptr inbounds <4 x float>, <4 x float> addrspace(1)* %arg, i32 %tid
  %mai.in = load <4 x float>, <4 x float> addrspace(1)* %gep
  %mai.out = tail call <4 x float> @llvm.amdgcn.mfma.f32.4x4x1f32(float 1.0, float 1.0, <4 x float> %mai.in, i32 0, i32 0, i32 0)
  store <4 x float> %mai.out, <4 x float> addrspace(1)* %gep
  store volatile <4 x float> %a4, <4 x float> addrspace(1)* undef
  call void asm sideeffect "; use $0", "v"(float %v0)
  ret void
}

declare i32 @llvm.amdgcn.workitem.id.x()
declare <4 x float> @llvm.amdgcn.mfma.f32.4x4x1f32(float, float, <4 x float>, i32, i32, i32)

attributes #4 = { nounwind "amdgpu-num-vgpr"="5" }

spill-vgpr-to-agpr.ll
---------------------
define amdgpu_kernel void @max_10_vgprs_used_1a_partial_spill(i64 addrspace(1)* %p) #0 {
  %tid = load volatile i32, i32 addrspace(1)* undef
  call void asm sideeffect "", "a"(i32 1)
  %p1 = getelementptr inbounds i64, i64 addrspace(1)* %p, i32 %tid
  %p2 = getelementptr inbounds i64, i64 addrspace(1)* %p1, i32 8
  %p3 = getelementptr inbounds i64, i64 addrspace(1)* %p2, i32 16
  %p4 = getelementptr inbounds i64, i64 addrspace(1)* %p3, i32 24
  %p5 = getelementptr inbounds i64, i64 addrspace(1)* %p4, i32 32
  %v1 = load volatile i64, i64 addrspace(1)* %p1
  %v2 = load volatile i64, i64 addrspace(1)* %p2
  %v3 = load volatile i64, i64 addrspace(1)* %p3
  %v4 = load volatile i64, i64 addrspace(1)* %p4
  %v5 = load volatile i64, i64 addrspace(1)* %p5
  call void asm sideeffect "", "v,v,v,v,v"(i64 %v1, i64 %v2, i64 %v3, i64 %v4, i64 %v5)
  store volatile i64 %v1, i64 addrspace(1)* %p2
  store volatile i64 %v2, i64 addrspace(1)* %p3
  store volatile i64 %v3, i64 addrspace(1)* %p4
  store volatile i64 %v4, i64 addrspace(1)* %p5
  store volatile i64 %v5, i64 addrspace(1)* %p1
  ret void
}

declare i32 @llvm.amdgcn.workitem.id.x()

attributes #0 = { nounwind "amdgpu-num-vgpr"="10" }

Comment Actions
Let's begin with the question: what problem are you trying to solve?
Comment Actions
The problem I am basically trying to solve is the following situation: there is a vector ALU operation which requires a 1024-bit-wide VGPR tuple, and the register budget in this case is 64. Since v32 is reserved, all the 1024-bit tuples starting from v1..v32 up to v32..v63 are NOT usable. The only possible 1024-bit register is v0..v31. But, unfortunately, there are SGPR spills to v0 happening, and hence v0..v31 is also NOT usable, so RA fails. Hence, we cannot always choose v32. Probably the better way of handling it is: (1) while reserving the registers before RA, reserve the highest available VGPR based on the register budget, and then (2) after RA, shift it to the lowest unused VGPR. Now, we have a couple of questions to answer. Q1: Is it safe to always reserve the highest available VGPR irrespective of a constrained register budget? The answer to Q1 is: we have no other choice at the moment.
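To make the failure concrete, here is a small self-contained C++ model (illustrative only; the constants mirror the scenario above) that enumerates which 32-wide VGPR tuples survive the two reservations:

#include <cstdio>

int main() {
  const int Budget = 64;       // VGPR budget: v0..v63
  const int TupleSize = 32;    // a 1024-bit value needs 32 consecutive VGPRs
  const int ReservedVGPR = 32; // v32 reserved for the AGPR copy
  const int SGPRSpillVGPR = 0; // SGPR spills land in v0

  int Usable = 0;
  for (int Base = 0; Base + TupleSize <= Budget; ++Base) {
    bool Free = true;
    for (int R = Base; R < Base + TupleSize; ++R)
      if (R == ReservedVGPR || R == SGPRSpillVGPR)
        Free = false;
    if (Free) {
      std::printf("v[%d:%d] is usable\n", Base, Base + TupleSize - 1);
      ++Usable;
    }
  }
  if (Usable == 0)
    std::printf("no 1024-bit tuple fits; RA fails\n");
  return 0;
}

Every candidate tuple contains either v0 (the SGPR spill slot) or v32 (the reservation), so the count comes out zero, which is exactly the RA failure described above.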
Comment Actions
Move the code related to shifting the reserved VGPR to a lower range into FrameLowering.
hsmhsm added a parent revision: D123809: [AMDGPU] Pre-checkin updated lit tests for D123525. (Apr 14 2022, 11:45 AM)

Comment Actions
Rebase to latest trunk and to D123809. Since the lit test for the function @max_5regs_used_8a() within spill-agpr.ll asserts for
hsmhsm added a parent revision: D123973: [AMDGPU] Split the lit test spill-vgpr-to-agpr.ll to different tests. (Apr 18 2022, 7:24 PM)
This revision is now accepted and ready to land. (Apr 20 2022, 1:47 PM)
This revision was landed with ongoing or failed builds. (Apr 20 2022, 7:28 PM)
Closed by commit rG5bd87350a5ae: [AMDGPU] On gfx908, reserve VGPR for AGPR copy based on register budget. (authored by hsmhsm)
This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 422794
llvm/lib/Target/AMDGPU/SIFrameLowering.h
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
llvm/test/CodeGen/AMDGPU/accvgpr-copy.mir
llvm/test/CodeGen/AMDGPU/accvgpr-spill-scc-clobber.mir
llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll
llvm/test/CodeGen/AMDGPU/agpr-copy-no-vgprs.mir
llvm/test/CodeGen/AMDGPU/agpr-copy-sgpr-no-vgprs.mir
llvm/test/CodeGen/AMDGPU/agpr-remat.ll
llvm/test/CodeGen/AMDGPU/agpr-usage-should-fail-on-gfx900.ll
llvm/test/CodeGen/AMDGPU/alloc-aligned-tuples-gfx908.mir
llvm/test/CodeGen/AMDGPU/av_spill_cross_bb_usage.mir
llvm/test/CodeGen/AMDGPU/pei-build-av-spill.mir
llvm/test/CodeGen/AMDGPU/pei-build-spill.mir
llvm/test/CodeGen/AMDGPU/regalloc-introduces-copy-sgpr-to-agpr.mir
llvm/test/CodeGen/AMDGPU/sgpr-spill-vmem-large-frame.mir
llvm/test/CodeGen/AMDGPU/spill-agpr-gfx908.ll
llvm/test/CodeGen/AMDGPU/spill-agpr-gfx90a.ll
llvm/test/CodeGen/AMDGPU/spill-agpr-partially-undef.mir
llvm/test/CodeGen/AMDGPU/spill-agpr.ll
llvm/test/CodeGen/AMDGPU/spill-agpr.mir
llvm/test/CodeGen/AMDGPU/spill-vgpr-on-gfx900.ll
llvm/test/CodeGen/AMDGPU/spill-vgpr-to-agpr-gfx908.ll
Inline comment (rampitec): Just use Register instead of auto for these.
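For illustration (a generic fragment, not the actual hunk from this diff; pickUnusedVGPR is a hypothetical helper), the nit asks for the explicit llvm::Register type in place of auto:

#include "llvm/CodeGen/Register.h"
using llvm::Register;

Register pickUnusedVGPR(); // hypothetical helper, for illustration only

void example() {
  auto R1 = pickUnusedVGPR();     // the deduced type is hidden at the use site
  Register R2 = pickUnusedVGPR(); // requested style: the type is spelled out
  (void)R1;
  (void)R2;
}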