NFCI. The motivation for this is avoid problems in future if we add new
classes containing only a subset of all VGPRs, or a subset of all SGPRs.
getMinimalPhysRegClass would favour these smaller classes, which is not
what we want here.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
2551 | As an alternative, I did try implementing a getMaximalAllocatablePhysRegClass and using it here. That seemed to cause other problems, but I can pursue it further if you think it's the right way to go. |
I think a switch case over known RegClasses that we want to use is the way to go. The only question is do we need to handle more cases. Is this passing testing? When I put assert(VT.getSizeInBits() == 32) here I get massive test failures. So what happens to wider types, they all become VGPR_32?
I'm assuming your failures are on 16-bit types which still use a 32-bit register class
Yes.
When I put assert(VT.getSizeInBits() == 32) here I get massive test failures. So what happens to wider types, they all become VGPR_32?
See the calling conv definitions in AMDGPUCallingConv.td. All supported types (including i16 and f16) are passed in 32-bit GPRs, either VGPRn or SGPRn.
Do you prefer this (cribbed from SITargetLowering::insertCopiesSplitCSR)?
const TargetRegisterClass *RC = nullptr; if (AMDGPU::SGPR_32RegClass.contains(Reg)) RC = &AMDGPU::SGPR_32RegClass; else if (AMDGPU::VGPR_32RegClass.contains(Reg)) RC = &AMDGPU::VGPR_32RegClass; else llvm_unreachable("Unexpected register class in LowerFormalArguments!");
As you say, it seems guaranteed to be SGPR or VGPR by the calling convention. They compile to the same thing under optimization right? If so I slightly prefer the llvm_unreachable version because the intent is locally clear.
As an alternative, I did try implementing a getMaximalAllocatablePhysRegClass and using it here. That seemed to cause other problems, but I can pursue it further if you think it's the right way to go.