diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -4562,7 +4562,7 @@
 
 If the kernel has function calls it must set up the ABI stack pointer described
 in :ref:`amdgpu-amdhsa-function-call-convention-non-kernel-functions` by setting
-SGPR32 to the unswizzled scratch offset of the address past the last local
+SGPR35 to the unswizzled scratch offset of the address past the last local
 allocation.
 
 .. _amdgpu-amdhsa-kernel-prolog-frame-pointer:
@@ -4571,7 +4571,7 @@
 +++++++++++++
 
 If the kernel needs a frame pointer for the reasons defined in
-``SIFrameLowering`` then SGPR33 is used and is always set to ``0`` in the
+``SIFrameLowering`` then SGPR40 is used and is always set to ``0`` in the
 kernel prolog. If a frame pointer is not required then all uses of the frame
 pointer are replaced with immediate ``0`` offsets.
 
@@ -4688,7 +4688,7 @@
 follows:
 
   - If it is known during instruction selection that there is stack usage,
-    SGPR0-3 is reserved for use as the scratch V#.  Stack usage is assumed if
+    SGPR24-27 is reserved for use as the scratch V#.  Stack usage is assumed if
     optimizations are disabled (``-O0``), if stack objects already exist (for
     locals, etc.), or if there are any function calls.
 
@@ -10745,29 +10745,29 @@
 
 On entry to a function:
 
-1.  SGPR0-3 contain a V# with the following properties (see
+1.  The FLAT_SCRATCH register pair is set up. See
+    :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`.
+2.  GFX6-GFX8: M0 register set to the size of LDS in bytes. See
+    :ref:`amdgpu-amdhsa-kernel-prolog-m0`.
+3.  The EXEC register is set to the lanes active on entry to the function.
+4.  MODE register: *TBD*
+5.  VGPR0-31 and SGPR0-23 are used to pass function input arguments as described
+    below.
+6.  SGPR24-27 contain a V# with the following properties (see
     :ref:`amdgpu-amdhsa-kernel-prolog-private-segment-buffer`):
 
     * Base address pointing to the beginning of the wavefront scratch backing
       memory.
     * Swizzled with dword element size and stride of wavefront size elements.
 
-2.  The FLAT_SCRATCH register pair is setup. See
-    :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`.
-3.  GFX6-GFX8: M0 register set to the size of LDS in bytes. See
-    :ref:`amdgpu-amdhsa-kernel-prolog-m0`.
-4.  The EXEC register is set to the lanes active on entry to the function.
-5.  MODE register: *TBD*
-6.  VGPR0-31 and SGPR4-29 are used to pass function input arguments as described
-    below.
-7.  SGPR30-31 return address (RA). The code address that the function must
+7.  SGPR28-29 return address (RA). The code address that the function must
     return to when it completes. The value is undefined if the function is *no
     return*.
-8.  SGPR32 is used for the stack pointer (SP). It is an unswizzled scratch
+8.  SGPR35 is used for the stack pointer (SP). It is an unswizzled scratch
     offset relative to the beginning of the wavefront scratch backing memory.
 
     The unswizzled SP can be used with buffer instructions as an unswizzled SGPR
-    offset with the scratch V# in SGPR0-3 to access the stack in a swizzled
+    offset with the scratch V# in SGPR24-27 to access the stack in a swizzled
     manner.
 
     The unswizzled SP value can be converted into the swizzled SP value by:
@@ -10797,7 +10797,7 @@
     ``alloca`` local allocations.
 
     If the function calls another function, it will place any stack allocated
-    arguments after the last local allocation and adjust SGPR32 to the address
+    arguments after the last local allocation and adjust SGPR35 to the address
     after the last local allocation.
 
 9.  All other registers are unspecified.
@@ -10806,14 +10806,14 @@
 
 On exit from a function:
 
-1.  VGPR0-31 and SGPR4-29 are used to pass function result arguments as
+1.  VGPR0-31 and SGPR0-23 are used to pass function result arguments as
     described below. Any registers used are considered clobbered registers.
 2.  The following registers are preserved and have the same value as on entry:
 
     * FLAT_SCRATCH
     * EXEC
     * GFX6-GFX8: M0
-    * All SGPR registers except the clobbered registers of SGPR4-31.
+    * All SGPRs except the clobbered registers of SGPR0-23 and SGPR28-29.
     * VGPR40-47
     * VGPR56-63
     * VGPR72-79
@@ -11005,7 +11005,7 @@
     How are overly aligned structures allocated on the stack?
 
 * SGPR arguments are assigned to consecutive SGPRs starting at SGPR0 up to
-  SGPR29.
+  SGPR23.
 
   If there are more arguments than will fit in these registers, the remaining
   arguments are allocated on the stack in order on naturally aligned
@@ -11024,10 +11024,10 @@
 The following is not part of the AMDGPU function calling convention but
 describes how the AMDGPU implements function calls:
 
-1.  SGPR33 is used as a frame pointer (FP) if necessary. Like the SP it is an
+1.  SGPR40 is used as a frame pointer (FP) if necessary. Like the SP it is an
     unswizzled scratch address. It is only needed if runtime sized ``alloca``
     are used, or for the reasons defined in ``SIFrameLowering``.
-2.  Runtime stack alignment is supported. SGPR34 is used as a base pointer (BP)
+2.  Runtime stack alignment is supported. SGPR41 is used as a base pointer (BP)
     to access the incoming stack arguments in the function. The BP is needed
     only when the function requires the runtime stack alignment.
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp b/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
--- a/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
@@ -1115,12 +1115,13 @@
     const GCNSubtarget &ST, const SIMachineFunctionInfo &FuncInfo,
     ArrayRef<std::pair<MCRegister, Register>> ImplicitArgRegs) const {
   if (!ST.enableFlatScratch()) {
+    const SIRegisterInfo *TRI = ST.getRegisterInfo();
     // Insert copies for the SRD. In the HSA case, this should be an identity
     // copy.
     auto ScratchRSrcReg = MIRBuilder.buildCopy(LLT::fixed_vector(4, 32),
                                                FuncInfo.getScratchRSrcReg());
-    MIRBuilder.buildCopy(AMDGPU::SGPR0_SGPR1_SGPR2_SGPR3, ScratchRSrcReg);
-    CallInst.addReg(AMDGPU::SGPR0_SGPR1_SGPR2_SGPR3, RegState::Implicit);
+    MIRBuilder.buildCopy(TRI->getScratchRSrcReg(), ScratchRSrcReg);
+    CallInst.addReg(TRI->getScratchRSrcReg(), RegState::Implicit);
   }
 
   for (std::pair<MCRegister, Register> ArgReg : ImplicitArgRegs) {
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td b/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td
--- a/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td
@@ -17,14 +17,11 @@
 
 // Calling convention for SI
 def CC_SI_Gfx : CallingConv<[
-  // 0-3 are reserved for the stack buffer descriptor
-  // 30-31 are reserved for the return address
-  // 32 is reserved for the stack pointer
+  // SGPR24 and above are reserved for the scratch V#, RA, SP, FP and BP.
   CCIfInReg<CCIfType<[f32, i32, f16, i16, v2i16, v2f16] , CCAssignToReg<[
-    SGPR4, SGPR5, SGPR6, SGPR7,
+    SGPR0, SGPR1, SGPR2, SGPR3, SGPR4, SGPR5, SGPR6, SGPR7,
     SGPR8, SGPR9, SGPR10, SGPR11, SGPR12, SGPR13, SGPR14, SGPR15,
     SGPR16, SGPR17, SGPR18, SGPR19, SGPR20, SGPR21, SGPR22, SGPR23,
-    SGPR24, SGPR25, SGPR26, SGPR27, SGPR28, SGPR29,
   ]>>>,
 
   CCIfNotInReg<CCIfType<[f32, i32, f16, i16, v2i16, v2f16] , CCAssignToReg<[
@@ -41,15 +38,11 @@
   CCIfType<[i1], CCPromoteToType<i32>>,
   CCIfType<[i1, i16], CCIfExtend<CCPromoteToType<i32>>>,
 
-  // 0-3 are reserved for the stack buffer descriptor
-  // 32 is reserved for the stack pointer
+  // SGPR24 and above are reserved for the scratch V#, RA, SP, FP and BP.
   CCIfInReg<CCIfType<[f32, i32, f16, i16, v2i16, v2f16] , CCAssignToReg<[
-    SGPR4, SGPR5, SGPR6, SGPR7,
+    SGPR0, SGPR1, SGPR2, SGPR3, SGPR4, SGPR5, SGPR6, SGPR7,
     SGPR8, SGPR9, SGPR10, SGPR11, SGPR12, SGPR13, SGPR14, SGPR15,
     SGPR16, SGPR17, SGPR18, SGPR19, SGPR20, SGPR21, SGPR22, SGPR23,
-    SGPR24, SGPR25, SGPR26, SGPR27, SGPR28, SGPR29, SGPR30, SGPR31,
-    SGPR33, SGPR34, SGPR35, SGPR36, SGPR37, SGPR38, SGPR39,
-    SGPR40, SGPR41, SGPR42, SGPR43
   ]>>>,
 
   CCIfNotInReg<CCIfType<[f32, i32, f16, i16, v2i16, v2f16] , CCAssignToReg<[
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -892,7 +892,7 @@
   setTargetDAGCombine(ISD::INTRINSIC_W_CHAIN);
 
   // FIXME: In other contexts we pretend this is a per-function property.
-  setStackPointerRegisterToSaveRestore(AMDGPU::SGPR32);
+  setStackPointerRegisterToSaveRestore(AMDGPU::SGPR35);
 
   setSchedulingPreference(Sched::RegPressure);
 }
@@ -2242,13 +2242,13 @@
-  // only ever use S32 as the call ABI stack pointer, and so using it does not
+  // only ever use S35 as the call ABI stack pointer, and so using it does not
   // imply we need a separate frame pointer.
   //
-  // Try to use s32 as the SP, but move it if it would interfere with input
+  // Try to use s35 as the SP, but move it if it would interfere with input
   // arguments. This won't work with calls though.
   //
   // FIXME: Move SP to avoid any possible inputs, or find a way to spill input
   // registers.
-  if (!MRI.isLiveIn(AMDGPU::SGPR32)) {
-    Info.setStackPtrOffsetReg(AMDGPU::SGPR32);
+  if (!MRI.isLiveIn(AMDGPU::SGPR35)) {
+    Info.setStackPtrOffsetReg(AMDGPU::SGPR35);
   } else {
     assert(AMDGPU::isShader(MF.getFunction().getCallingConv()));
 
@@ -2270,7 +2270,7 @@
   // finalized, because it does not rely on the known stack size, only
   // properties like whether variable sized objects are present.
   if (ST.getFrameLowering()->hasFP(MF)) {
-    Info.setFrameOffsetReg(AMDGPU::SGPR33);
+    Info.setFrameOffsetReg(AMDGPU::SGPR40);
   }
 }
 
@@ -3134,7 +3134,8 @@
       // In the HSA case, this should be an identity copy.
       SDValue ScratchRSrcReg
         = DAG.getCopyFromReg(Chain, DL, Info->getScratchRSrcReg(), MVT::v4i32);
-      RegsToPass.emplace_back(AMDGPU::SGPR0_SGPR1_SGPR2_SGPR3, ScratchRSrcReg);
+      const SIRegisterInfo *TRI = getSubtarget()->getRegisterInfo();
+      RegsToPass.emplace_back(TRI->getScratchRSrcReg(), ScratchRSrcReg);
       CopyFromChains.push_back(ScratchRSrcReg.getValue(1));
       Chain = DAG.getTokenFactor(DL, CopyFromChains);
     }
diff --git a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
--- a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
@@ -51,6 +51,7 @@
     HighBitsOf32BitAddress(0),
     GDSSize(0) {
   const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
+  const SIRegisterInfo *TRI = ST.getRegisterInfo();
   const Function &F = MF.getFunction();
   FlatWorkGroupSizes = ST.getFlatWorkGroupSizes(F);
   WavesPerEU = ST.getWavesPerEU(F);
@@ -84,13 +85,13 @@
       ArgInfo = AMDGPUArgumentUsageInfo::FixedABIFunctionInfo;
 
     // TODO: Pick a high register, and shift down, similar to a kernel.
-    FrameOffsetReg = AMDGPU::SGPR33;
-    StackPtrOffsetReg = AMDGPU::SGPR32;
+    FrameOffsetReg = AMDGPU::SGPR40;
+    StackPtrOffsetReg = AMDGPU::SGPR35;
 
     if (!ST.enableFlatScratch()) {
       // Non-entry functions have no special inputs for now, other registers
       // required for scratch access.
-      ScratchRSrcReg = AMDGPU::SGPR0_SGPR1_SGPR2_SGPR3;
+      ScratchRSrcReg = TRI->getScratchRSrcReg();
 
       ArgInfo.PrivateSegmentBuffer =
         ArgDescriptor::createRegister(ScratchRSrcReg);
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.h b/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
@@ -276,6 +276,8 @@
 
   MCRegister getReturnAddressReg(const MachineFunction &MF) const;
 
+  Register getScratchRSrcReg() const;
+
   const TargetRegisterClass *
   getRegClassForSizeOnBank(unsigned Size,
                            const RegisterBank &Bank,
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
@@ -406,7 +406,7 @@
   return MFI.getNumFixedObjects() && shouldRealignStack(MF);
 }
 
-Register SIRegisterInfo::getBaseRegister() const { return AMDGPU::SGPR34; }
+Register SIRegisterInfo::getBaseRegister() const { return AMDGPU::SGPR41; }
 
 const uint32_t *SIRegisterInfo::getAllVGPRRegMask() const {
   return CSR_AMDGPU_AllVGPRs_RegMask;
@@ -2387,7 +2387,11 @@
 
 MCRegister SIRegisterInfo::getReturnAddressReg(const MachineFunction &MF) const {
   // Not a callee saved register.
-  return AMDGPU::SGPR30_SGPR31;
+  return AMDGPU::SGPR28_SGPR29;
+}
+
+Register SIRegisterInfo::getScratchRSrcReg() const {
+  return AMDGPU::SGPR24_SGPR25_SGPR26_SGPR27;
 }
 
 const TargetRegisterClass *