diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -4562,7 +4562,7 @@
 If the kernel has function calls it must set up the ABI stack pointer described
 in :ref:`amdgpu-amdhsa-function-call-convention-non-kernel-functions` by setting
-SGPR32 to the unswizzled scratch offset of the address past the last local
+SGPR35 to the unswizzled scratch offset of the address past the last local
 allocation.
 
 .. _amdgpu-amdhsa-kernel-prolog-frame-pointer:
 
@@ -4571,7 +4571,7 @@
 +++++++++++++
 
 If the kernel needs a frame pointer for the reasons defined in
-``SIFrameLowering`` then SGPR33 is used and is always set to ``0`` in the
+``SIFrameLowering`` then SGPR40 is used and is always set to ``0`` in the
 kernel prolog. If a frame pointer is not required then all uses of the frame
 pointer are replaced with immediate ``0`` offsets.
 
@@ -4688,7 +4688,7 @@
 follows:
 
 - If it is known during instruction selection that there is stack usage,
-  SGPR0-3 is reserved for use as the scratch V#. Stack usage is assumed if
+  SGPR24-27 is reserved for use as the scratch V#. Stack usage is assumed if
   optimizations are disabled (``-O0``), if stack objects already exist (for
   locals, etc.), or if there are any function calls.
 
@@ -10745,29 +10745,29 @@
 
 On entry to a function:
 
-1. SGPR0-3 contain a V# with the following properties (see
+1. The FLAT_SCRATCH register pair is setup. See
+   :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`.
+2. GFX6-GFX8: M0 register set to the size of LDS in bytes. See
+   :ref:`amdgpu-amdhsa-kernel-prolog-m0`.
+3. The EXEC register is set to the lanes active on entry to the function.
+4. MODE register: *TBD*
+5. VGPR0-31 and SGPR0-23 are used to pass function input arguments as described
+   below.
+6. SGPR24-27 contain a V# with the following properties (see
    :ref:`amdgpu-amdhsa-kernel-prolog-private-segment-buffer`):
 
    * Base address pointing to the beginning of the wavefront scratch backing
      memory.
   * Swizzled with dword element size and stride of wavefront size elements.
 
-2. The FLAT_SCRATCH register pair is setup. See
-   :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`.
-3. GFX6-GFX8: M0 register set to the size of LDS in bytes. See
-   :ref:`amdgpu-amdhsa-kernel-prolog-m0`.
-4. The EXEC register is set to the lanes active on entry to the function.
-5. MODE register: *TBD*
-6. VGPR0-31 and SGPR4-29 are used to pass function input arguments as described
-   below.
-7. SGPR30-31 return address (RA). The code address that the function must
+7. SGPR28-29 return address (RA). The code address that the function must
    return to when it completes. The value is undefined if the function is *no
    return*.
-8. SGPR32 is used for the stack pointer (SP). It is an unswizzled scratch
+8. SGPR35 is used for the stack pointer (SP). It is an unswizzled scratch
    offset relative to the beginning of the wavefront scratch backing memory.
 
    The unswizzled SP can be used with buffer instructions as an unswizzled SGPR
-   offset with the scratch V# in SGPR0-3 to access the stack in a swizzled
+   offset with the scratch V# in SGPR24-27 to access the stack in a swizzled
    manner.
 
    The unswizzled SP value can be converted into the swizzled SP value by:
 
@@ -10797,7 +10797,7 @@
    ``alloca`` local allocations.
 
    If the function calls another function, it will place any stack allocated
-   arguments after the last local allocation and adjust SGPR32 to the address
+   arguments after the last local allocation and adjust SGPR35 to the address
    after the last local allocation.
 
 9. All other registers are unspecified.
@@ -10806,14 +10806,14 @@
 
 On exit from a function:
 
-1. VGPR0-31 and SGPR4-29 are used to pass function result arguments as
+1. VGPR0-31 and SGPR0-23 are used to pass function result arguments as
    described below. Any registers used are considered clobbered registers.
 2. The following registers are preserved and have the same value as on entry:
 
    * FLAT_SCRATCH
    * EXEC
    * GFX6-GFX8: M0
-   * All SGPR registers except the clobbered registers of SGPR4-31.
+   * All SGPR registers except the clobbered registers of SGPR0-23.
    * VGPR40-47
    * VGPR56-63
    * VGPR72-79
@@ -11005,7 +11005,7 @@
 How are overly aligned structures allocated on the stack?
 
 * SGPR arguments are assigned to consecutive SGPRs starting at SGPR0 up to
-  SGPR29.
+  SGPR23.
 
 If there are more arguments than will fit in these registers, the remaining
 arguments are allocated on the stack in order on naturally aligned
@@ -11024,10 +11024,10 @@
 The following is not part of the AMDGPU function calling convention but
 describes how the AMDGPU implements function calls:
 
-1. SGPR33 is used as a frame pointer (FP) if necessary. Like the SP it is an
+1. SGPR40 is used as a frame pointer (FP) if necessary. Like the SP it is an
    unswizzled scratch address. It is only needed if runtime sized ``alloca``
    are used, or for the reasons defined in ``SIFrameLowering``.
-2. Runtime stack alignment is supported. SGPR34 is used as a base pointer (BP)
+2. Runtime stack alignment is supported. SGPR41 is used as a base pointer (BP)
    to access the incoming stack arguments in the function. The BP is needed
    only when the function requires the runtime stack alignment.
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
--- a/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
@@ -1115,12 +1115,13 @@
     const GCNSubtarget &ST, const SIMachineFunctionInfo &FuncInfo,
     ArrayRef<std::pair<MCRegister, Register>> ImplicitArgRegs) const {
   if (!ST.enableFlatScratch()) {
+    const SIRegisterInfo *TRI = ST.getRegisterInfo();
     // Insert copies for the SRD. In the HSA case, this should be an identity
     // copy.
     auto ScratchRSrcReg = MIRBuilder.buildCopy(LLT::fixed_vector(4, 32),
                                                FuncInfo.getScratchRSrcReg());
-    MIRBuilder.buildCopy(AMDGPU::SGPR0_SGPR1_SGPR2_SGPR3, ScratchRSrcReg);
-    CallInst.addReg(AMDGPU::SGPR0_SGPR1_SGPR2_SGPR3, RegState::Implicit);
+    MIRBuilder.buildCopy(TRI->getScratchRSrcReg(), ScratchRSrcReg);
+    CallInst.addReg(TRI->getScratchRSrcReg(), RegState::Implicit);
   }
 
   for (std::pair<MCRegister, Register> ArgReg : ImplicitArgRegs) {
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td b/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td
--- a/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td
@@ -17,14 +17,11 @@
 // Calling convention for SI
 def CC_SI_Gfx : CallingConv<[
-  // 0-3 are reserved for the stack buffer descriptor
-  // 30-31 are reserved for the return address
-  // 32 is reserved for the stack pointer
+  // SGPR24 onwards is reserved for the stack pointer, return address, etc.
   CCIfInReg>>,
   CCIfNotInReg>,
   CCIfType<[i1, i16], CCIfExtend>>,
-  // 0-3 are reserved for the stack buffer descriptor
-  // 32 is reserved for the stack pointer
+  // SGPR24 onwards is reserved for the stack pointer, return address, etc.
   CCIfInReg>>,
   CCIfNotInReghasFP(MF)) {
-    Info.setFrameOffsetReg(AMDGPU::SGPR33);
+    Info.setFrameOffsetReg(AMDGPU::SGPR40);
   }
 }
 
@@ -3134,7 +3134,8 @@
     // In the HSA case, this should be an identity copy.
     SDValue ScratchRSrcReg =
         DAG.getCopyFromReg(Chain, DL, Info->getScratchRSrcReg(), MVT::v4i32);
-    RegsToPass.emplace_back(AMDGPU::SGPR0_SGPR1_SGPR2_SGPR3, ScratchRSrcReg);
+    const SIRegisterInfo *TRI = getSubtarget()->getRegisterInfo();
+    RegsToPass.emplace_back(TRI->getScratchRSrcReg(), ScratchRSrcReg);
     CopyFromChains.push_back(ScratchRSrcReg.getValue(1));
     Chain = DAG.getTokenFactor(DL, CopyFromChains);
   }
diff --git a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
--- a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
@@ -51,6 +51,7 @@
     HighBitsOf32BitAddress(0),
     GDSSize(0) {
   const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
+  const SIRegisterInfo *TRI = ST.getRegisterInfo();
   const Function &F = MF.getFunction();
   FlatWorkGroupSizes = ST.getFlatWorkGroupSizes(F);
   WavesPerEU = ST.getWavesPerEU(F);
@@ -84,13 +85,13 @@
     ArgInfo = AMDGPUArgumentUsageInfo::FixedABIFunctionInfo;
 
     // TODO: Pick a high register, and shift down, similar to a kernel.
-    FrameOffsetReg = AMDGPU::SGPR33;
-    StackPtrOffsetReg = AMDGPU::SGPR32;
+    FrameOffsetReg = AMDGPU::SGPR40;
+    StackPtrOffsetReg = AMDGPU::SGPR35;
 
     if (!ST.enableFlatScratch()) {
       // Non-entry functions have no special inputs for now, other registers
       // required for scratch access.
-      ScratchRSrcReg = AMDGPU::SGPR0_SGPR1_SGPR2_SGPR3;
+      ScratchRSrcReg = TRI->getScratchRSrcReg();
 
       ArgInfo.PrivateSegmentBuffer =
         ArgDescriptor::createRegister(ScratchRSrcReg);
 
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.h b/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
@@ -276,6 +276,8 @@
 
   MCRegister getReturnAddressReg(const MachineFunction &MF) const;
 
+  Register getScratchRSrcReg() const;
+
   const TargetRegisterClass *
   getRegClassForSizeOnBank(unsigned Size,
                            const RegisterBank &Bank,
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
@@ -406,7 +406,7 @@
   return MFI.getNumFixedObjects() && shouldRealignStack(MF);
 }
 
-Register SIRegisterInfo::getBaseRegister() const { return AMDGPU::SGPR34; }
+Register SIRegisterInfo::getBaseRegister() const { return AMDGPU::SGPR41; }
 
 const uint32_t *SIRegisterInfo::getAllVGPRRegMask() const {
   return CSR_AMDGPU_AllVGPRs_RegMask;
@@ -2387,7 +2387,11 @@
 MCRegister
 SIRegisterInfo::getReturnAddressReg(const MachineFunction &MF) const {
   // Not a callee saved register.
-  return AMDGPU::SGPR30_SGPR31;
+  return AMDGPU::SGPR28_SGPR29;
+}
+
+Register SIRegisterInfo::getScratchRSrcReg() const {
+  return AMDGPU::SGPR24_SGPR25_SGPR26_SGPR27;
 }
 
 const TargetRegisterClass *
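
The register assignments described in the AMDGPUUsage.rst hunks of this patch can be sanity-checked with a small standalone sketch. It does not use LLVM: `SGPRRange`, `ABIRegs`, `overlaps`, `allDisjoint`, and `checkABILayout` are hypothetical names introduced only for illustration, and the register numbers are copied from the documentation changes (argument SGPRs 0-23, scratch V# 24-27, return address 28-29, SP 35, FP 40, BP 41). A check along these lines would reject a layout in which two roles, for example the frame pointer and the base pointer, were given the same SGPR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of an inclusive SGPR range; this is not an LLVM type.
struct SGPRRange {
  const char *Name;
  unsigned First;
  unsigned Last;
};

// Register roles as listed in the documentation hunks of this patch.
constexpr std::array<SGPRRange, 6> ABIRegs = {{
    {"argument SGPRs", 0, 23},
    {"scratch V#", 24, 27},
    {"return address", 28, 29},
    {"stack pointer", 35, 35},
    {"frame pointer", 40, 40},
    {"base pointer", 41, 41},
}};

// Two inclusive ranges overlap if each one starts before the other ends.
constexpr bool overlaps(const SGPRRange &A, const SGPRRange &B) {
  return A.First <= B.Last && B.First <= A.Last;
}

// Returns true if no two roles in ABIRegs share an SGPR.
constexpr bool allDisjoint() {
  for (std::size_t I = 0; I < ABIRegs.size(); ++I)
    for (std::size_t J = I + 1; J < ABIRegs.size(); ++J)
      if (overlaps(ABIRegs[I], ABIRegs[J]))
        return false;
  return true;
}

// Compile-time check of the documented layout.
static_assert(allDisjoint(), "ABI register roles must not share an SGPR");

// Runtime variant of the same check, for use from a test driver.
inline void checkABILayout() { assert(allDisjoint()); }
```

Because the check is a `static_assert`, an overlapping assignment in the table fails at compile time rather than at run time; `checkABILayout()` is only a convenience for callers that prefer a runtime assertion.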