This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][RFC] Improve sgpr function arguments
Needs Revision · Public

Authored by sebastian-ne on May 10 2021, 9:18 AM.

Details

Reviewers

t-tye
arsenm
madhur13490

Summary

The SGPR layout on function calls currently looks like this:

s[0:3] SRD

arguments...

s[30:31] return address

s32 stack pointer

s33 frame pointer

s34 base pointer

The return address and stack pointer each occupy a different 4-aligned
block of SGPRs (s[28:31] and s[32:35]).
Large scalar memory reads require a 4-aligned block of SGPRs, so the fewer
such blocks are available, the more difficult register allocation becomes.

The stack resource descriptor (SRD) occupies s[0:3]. If we want to pass
user-data SGPRs to a function, the SGPRs need to be moved from s[0:...]
to s[4:...] before the call.
This is also the case when flat scratch is used instead of the SRD: s[0:3]
is unused then, but the same calling convention applies.
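
To make the cost concrete, here is a minimal sketch of the copy count. It assumes user-data value i arrives in s[i] and the callee expects argument i in s[firstArgSgpr + i]. The helper name and this simplified model are illustrative assumptions, not code from the patch; a real lowering also has to order the copies so that no source is overwritten before it is read.

```cpp
#include <cassert>

// Hypothetical helper (not part of the patch): counts the scalar copies
// needed before a call when user-data value i lives in s[i] and the callee
// expects argument i in s[firstArgSgpr + i].
constexpr unsigned countArgCopies(unsigned numUserSgprs,
                                  unsigned firstArgSgpr) {
  unsigned copies = 0;
  for (unsigned i = 0; i < numUserSgprs; ++i) {
    const unsigned src = i;
    const unsigned dst = firstArgSgpr + i;
    if (src != dst)
      ++copies; // one copy per value that has to change register
  }
  return copies;
}

// Arguments starting at s4 (current layout): every value has to move.
static_assert(countArgCopies(8, 4) == 8, "all eight values are copied");
// Arguments starting at s0: every value is already in place.
static_assert(countArgCopies(8, 0) == 0, "no copies are needed");
```

In this model the number of copies equals the number of user-data SGPRs whenever the first argument register is not s0, which is the overhead the summary describes.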

To improve this, I propose the following layout:

arguments...

s[24:27] SRD

s[28:29] return address

...

s35 stack pointer

...

s40 frame pointer

s41 base pointer

To free s[0:3] for arguments, the SRD is moved to s[24:27]. As a
result, all of s[0:23] can be used for arguments.

The base pointer is not used in the general case, so it is moved to
s41.
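
The proposed assignments can be written down as a small table with compile-time checks. This is only a sketch: the `SgprRange` struct and the helper names are assumptions made for illustration, and the alignment rule used here (a register tuple starts at a multiple of its own size) is a simplification that holds for the 2- and 4-register tuples checked below. The register numbers are taken from the layout above.

```cpp
#include <cassert>

// Illustrative model of the proposed fixed SGPR assignments. The struct and
// names are assumptions for this sketch; the numbers come from the summary.
struct SgprRange {
  unsigned first; // index of the first SGPR in the range
  unsigned count; // number of consecutive SGPRs
};

constexpr SgprRange Arguments{0, 24};     // s[0:23]
constexpr SgprRange ScratchSRD{24, 4};    // s[24:27]
constexpr SgprRange ReturnAddress{28, 2}; // s[28:29]
constexpr SgprRange StackPointer{35, 1};  // s35
constexpr SgprRange FramePointer{40, 1};  // s40
constexpr SgprRange BasePointer{41, 1};   // s41

// True if the range starts at a multiple of its own size.
constexpr bool isAligned(SgprRange r) { return r.first % r.count == 0; }

// True if the two half-open ranges share at least one register.
constexpr bool overlaps(SgprRange a, SgprRange b) {
  return a.first < b.first + b.count && b.first < a.first + a.count;
}

static_assert(isAligned(ScratchSRD), "SRD must start on a 4-aligned SGPR");
static_assert(isAligned(ReturnAddress), "64-bit pair must start on an even SGPR");
static_assert(!overlaps(Arguments, ScratchSRD), "arguments end before the SRD");
static_assert(!overlaps(ScratchSRD, ReturnAddress), "SRD and RA are disjoint");
static_assert(!overlaps(FramePointer, BasePointer), "FP and BP are disjoint");
```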

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	200 ms	x64 debian > LLVM.CodeGen/AMDGPU::abi-attribute-hints-undefined-behavior.ll
	460 ms	x64 debian > LLVM.CodeGen/AMDGPU::addrspacecast.ll
	100 ms	x64 debian > LLVM.CodeGen/AMDGPU::agpr-remat.ll
	330 ms	x64 debian > LLVM.CodeGen/AMDGPU::amdgpu-codegenprepare-fold-binop-select.ll
	360 ms	x64 debian > LLVM.CodeGen/AMDGPU::amdpal-callable.ll
		View Full Test Results (253 Failed)

Event Timeline

sebastian-ne created this revision. May 10 2021, 9:18 AM

Herald added subscribers: kerbowa, hiraditya, tpr and 5 others. May 10 2021, 9:18 AM

sebastian-ne requested review of this revision. May 10 2021, 9:18 AM

Herald added a project: Restricted Project. May 10 2021, 9:18 AM

Herald added subscribers: llvm-commits, wdng.

Harbormaster completed remote builds in B103508: Diff 344085. May 10 2021, 9:32 AM

We have another proposal we were working on to rearrange these a bit differently. We need to account for a few more inputs in the layout.

In D102177#2748278, @arsenm wrote:

We have another proposal we were working on to rearrange these a bit differently. We need to account for a few more inputs in the layout.

As long as this remains in GFX land, we should be fine with it because our new proposal is for compute only (as of now).

In D102177#2750094, @madhur13490 wrote:

As long as this remains in GFX land, we should be fine with it because our new proposal is for compute only (as of now).

I would like to keep the same calling convention in compute and graphics, at least regarding the stack pointer and others, because I don’t see a compelling reason to diverge even more. Actually, I’d like it if they had more in common than they do now, because we implement some things twice at the moment.
The compute proposal should work just fine; if we move the stack and frame pointer, we end up with the same benefits as in this patch. I commented on the internal proposal for this (I hope I found the right one?).

In D102177#2750256, @sebastian-ne wrote:

I would like to keep the same calling convention in compute and graphics, at least regarding the stack pointer and others, because I don’t see a compelling reason to diverge even more. Actually, I’d like it if they had more in common than they do now, because we implement some things twice at the moment.
The compute proposal should work just fine; if we move the stack and frame pointer, we end up with the same benefits as in this patch. I commented on the internal proposal for this (I hope I found the right one?).

Well, then you're proposing to unify both ABIs, and that has not been discussed thoroughly internally. The layout needs to be documented and reviewed internally before we can proceed with this patch.

In D102177#2750461, @madhur13490 wrote:

Well, then you're proposing to unify both ABIs, and that has not been discussed thoroughly internally. The layout needs to be documented and reviewed internally before we can proceed with this patch.

I think the reason that compute was considering putting FP and BP at the high registers is that they can often be optimized away. Putting them there allows them to be contiguous with other callee user registers rather than leaving an unused hole that is harder for the register allocator to use. Is that reasonable? How does that work for gfx?

I am all for trying to unify the calling convention for compute and gfx if possible :-)

Move registers to be more in line with other plans.

Herald added a subscriber: foad. Oct 12 2021, 4:29 AM

sebastian-ne edited the summary of this revision. Oct 12 2021, 4:30 AM

Harbormaster completed remote builds in B128326: Diff 378969. Oct 12 2021, 5:11 AM

sebastian-ne mentioned this in D111637: [AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not going to change the argument values. Oct 20 2021, 12:16 AM

The current proposal uses different values.

This revision now requires changes to proceed. Nov 16 2022, 4:11 PM

Herald added a project: Restricted Project. Nov 16 2022, 4:11 PM

Herald added a subscriber: kosarev.

Revision Contents

	Path	Size
	llvm/docs/AMDGPUUsage.rst	42 lines
	llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp	5 lines
	llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td	15 lines
	llvm/lib/Target/AMDGPU/SIISelLowering.cpp	13 lines
	llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp	7 lines
	llvm/lib/Target/AMDGPU/SIRegisterInfo.h	2 lines
	llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp	8 lines

Diff 378969

llvm/docs/AMDGPUUsage.rst

Show First 20 Lines • Show All 4,556 Lines • ▼ Show 20 Lines

.. _amdgpu-amdhsa-kernel-prolog-stack-pointer:		.. _amdgpu-amdhsa-kernel-prolog-stack-pointer:

Stack Pointer		Stack Pointer
+++++++++++++		+++++++++++++

If the kernel has function calls it must set up the ABI stack pointer described		If the kernel has function calls it must set up the ABI stack pointer described
in :ref:`amdgpu-amdhsa-function-call-convention-non-kernel-functions` by setting		in :ref:`amdgpu-amdhsa-function-call-convention-non-kernel-functions` by setting
SGPR32 to the unswizzled scratch offset of the address past the last local		SGPR35 to the unswizzled scratch offset of the address past the last local
allocation.		allocation.

.. _amdgpu-amdhsa-kernel-prolog-frame-pointer:		.. _amdgpu-amdhsa-kernel-prolog-frame-pointer:

Frame Pointer		Frame Pointer
+++++++++++++		+++++++++++++

If the kernel needs a frame pointer for the reasons defined in		If the kernel needs a frame pointer for the reasons defined in
``SIFrameLowering`` then SGPR33 is used and is always set to ``0`` in the		``SIFrameLowering`` then SGPR40 is used and is always set to ``0`` in the
kernel prolog. If a frame pointer is not required then all uses of the frame		kernel prolog. If a frame pointer is not required then all uses of the frame
pointer are replaced with immediate ``0`` offsets.		pointer are replaced with immediate ``0`` offsets.

.. _amdgpu-amdhsa-kernel-prolog-flat-scratch:		.. _amdgpu-amdhsa-kernel-prolog-flat-scratch:

Flat Scratch		Flat Scratch
++++++++++++		++++++++++++

▲ Show 20 Lines • Show All 100 Lines • ▼ Show 20 Lines
runtime. It is used, together with Scratch Wavefront Offset as an offset, to		runtime. It is used, together with Scratch Wavefront Offset as an offset, to
access the private memory space using a segment address. See		access the private memory space using a segment address. See
:ref:`amdgpu-amdhsa-initial-kernel-execution-state`.		:ref:`amdgpu-amdhsa-initial-kernel-execution-state`.

The scratch V# is a four-aligned SGPR and always selected for the kernel as		The scratch V# is a four-aligned SGPR and always selected for the kernel as
follows:		follows:

- If it is known during instruction selection that there is stack usage,		- If it is known during instruction selection that there is stack usage,
SGPR0-3 is reserved for use as the scratch V#. Stack usage is assumed if		SGPR24-27 is reserved for use as the scratch V#. Stack usage is assumed if
optimizations are disabled (``-O0``), if stack objects already exist (for		optimizations are disabled (``-O0``), if stack objects already exist (for
locals, etc.), or if there are any function calls.		locals, etc.), or if there are any function calls.

- Otherwise, four high numbered SGPRs beginning at a four-aligned SGPR index		- Otherwise, four high numbered SGPRs beginning at a four-aligned SGPR index
are reserved for the tentative scratch V#. These will be used if it is		are reserved for the tentative scratch V#. These will be used if it is
determined that spilling is needed.		determined that spilling is needed.

- If no use is made of the tentative scratch V#, then it is unreserved,		- If no use is made of the tentative scratch V#, then it is unreserved,
▲ Show 20 Lines • Show All 6,040 Lines • ▼ Show 20 Lines
outer kernel function.		outer kernel function.

If a kernel has function calls then scratch is always allocated and used for		If a kernel has function calls then scratch is always allocated and used for
the call stack which grows from low address to high address using the swizzled		the call stack which grows from low address to high address using the swizzled
scratch address space.		scratch address space.

On entry to a function:		On entry to a function:

1. SGPR0-3 contain a V# with the following properties (see		1. The FLAT_SCRATCH register pair is setup. See
		:ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`.
		2. GFX6-GFX8: M0 register set to the size of LDS in bytes. See
		:ref:`amdgpu-amdhsa-kernel-prolog-m0`.
		3. The EXEC register is set to the lanes active on entry to the function.
		4. MODE register: TBD
		5. VGPR0-31 and SGPR0-23 are used to pass function input arguments as described
		below.
		6. SGPR24-27 contain a V# with the following properties (see
:ref:`amdgpu-amdhsa-kernel-prolog-private-segment-buffer`):		:ref:`amdgpu-amdhsa-kernel-prolog-private-segment-buffer`):

* Base address pointing to the beginning of the wavefront scratch backing		* Base address pointing to the beginning of the wavefront scratch backing
memory.		memory.
* Swizzled with dword element size and stride of wavefront size elements.		* Swizzled with dword element size and stride of wavefront size elements.

2. The FLAT_SCRATCH register pair is setup. See		7. SGPR28-29 return address (RA). The code address that the function must
:ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`.
3. GFX6-GFX8: M0 register set to the size of LDS in bytes. See
:ref:`amdgpu-amdhsa-kernel-prolog-m0`.
4. The EXEC register is set to the lanes active on entry to the function.
5. MODE register: TBD
6. VGPR0-31 and SGPR4-29 are used to pass function input arguments as described
below.
7. SGPR30-31 return address (RA). The code address that the function must
return to when it completes. The value is undefined if the function is *no		return to when it completes. The value is undefined if the function is *no
return*.		return*.
8. SGPR32 is used for the stack pointer (SP). It is an unswizzled scratch		8. SGPR35 is used for the stack pointer (SP). It is an unswizzled scratch
offset relative to the beginning of the wavefront scratch backing memory.		offset relative to the beginning of the wavefront scratch backing memory.

The unswizzled SP can be used with buffer instructions as an unswizzled SGPR		The unswizzled SP can be used with buffer instructions as an unswizzled SGPR
offset with the scratch V# in SGPR0-3 to access the stack in a swizzled		offset with the scratch V# in SGPR24-27 to access the stack in a swizzled
manner.		manner.

The unswizzled SP value can be converted into the swizzled SP value by:		The unswizzled SP value can be converted into the swizzled SP value by:

\| swizzled SP = unswizzled SP / wavefront size		\| swizzled SP = unswizzled SP / wavefront size

This may be used to obtain the private address space address of stack		This may be used to obtain the private address space address of stack
objects and to convert this address to a flat address by adding the flat		objects and to convert this address to a flat address by adding the flat
Show All 13 Lines	8. SGPR35 is used for the stack pointer (SP). It is an unswizzled scratch

The function may use positive offsets beyond the last stack passed argument		The function may use positive offsets beyond the last stack passed argument
for stack allocated local variables and register spill slots. If necessary,		for stack allocated local variables and register spill slots. If necessary,
the function may align these to greater alignment than 16 bytes. After these		the function may align these to greater alignment than 16 bytes. After these
the function may dynamically allocate space for such things as runtime sized		the function may dynamically allocate space for such things as runtime sized
``alloca`` local allocations.		``alloca`` local allocations.

If the function calls another function, it will place any stack allocated		If the function calls another function, it will place any stack allocated
arguments after the last local allocation and adjust SGPR32 to the address		arguments after the last local allocation and adjust SGPR35 to the address
after the last local allocation.		after the last local allocation.

9. All other registers are unspecified.		9. All other registers are unspecified.
10. Any necessary ``s_waitcnt`` has been performed to ensure memory is available		10. Any necessary ``s_waitcnt`` has been performed to ensure memory is available
to the function.		to the function.

On exit from a function:		On exit from a function:

1. VGPR0-31 and SGPR4-29 are used to pass function result arguments as		1. VGPR0-31 and SGPR0-23 are used to pass function result arguments as
described below. Any registers used are considered clobbered registers.		described below. Any registers used are considered clobbered registers.
2. The following registers are preserved and have the same value as on entry:		2. The following registers are preserved and have the same value as on entry:

* FLAT_SCRATCH		* FLAT_SCRATCH
* EXEC		* EXEC
* GFX6-GFX8: M0		* GFX6-GFX8: M0
* All SGPR registers except the clobbered registers of SGPR4-31.		* All SGPR registers except the clobbered registers of SGPR0-23.
* VGPR40-47		* VGPR40-47
* VGPR56-63		* VGPR56-63
* VGPR72-79		* VGPR72-79
* VGPR88-95		* VGPR88-95
* VGPR104-111		* VGPR104-111
* VGPR120-127		* VGPR120-127
* VGPR136-143		* VGPR136-143
* VGPR152-159		* VGPR152-159
▲ Show 20 Lines • Show All 175 Lines • ▼ Show 20 Lines	* VGPR arguments are assigned to consecutive VGPRs starting at VGPR0 up to
arguments are allocated on the stack in order on naturally aligned		arguments are allocated on the stack in order on naturally aligned
addresses.		addresses.

.. TODO::		.. TODO::

How are overly aligned structures allocated on the stack?		How are overly aligned structures allocated on the stack?

* SGPR arguments are assigned to consecutive SGPRs starting at SGPR0 up to		* SGPR arguments are assigned to consecutive SGPRs starting at SGPR0 up to
SGPR29.		SGPR23.

If there are more arguments than will fit in these registers, the remaining		If there are more arguments than will fit in these registers, the remaining
arguments are allocated on the stack in order on naturally aligned		arguments are allocated on the stack in order on naturally aligned
addresses.		addresses.

Note that decomposed struct type arguments may have some fields passed in		Note that decomposed struct type arguments may have some fields passed in
registers and some in memory.		registers and some in memory.

.. TODO::		.. TODO::

So, a struct which can pass some fields as decomposed register arguments, will		So, a struct which can pass some fields as decomposed register arguments, will
pass the rest as decomposed stack elements? But an argument that will not start		pass the rest as decomposed stack elements? But an argument that will not start
in registers will not be decomposed and will be passed as a non-decomposed		in registers will not be decomposed and will be passed as a non-decomposed
stack value?		stack value?

The following is not part of the AMDGPU function calling convention but		The following is not part of the AMDGPU function calling convention but
describes how the AMDGPU implements function calls:		describes how the AMDGPU implements function calls:

1. SGPR33 is used as a frame pointer (FP) if necessary. Like the SP it is an		1. SGPR40 is used as a frame pointer (FP) if necessary. Like the SP it is an
unswizzled scratch address. It is only needed if runtime sized ``alloca``		unswizzled scratch address. It is only needed if runtime sized ``alloca``
are used, or for the reasons defined in ``SIFrameLowering``.		are used, or for the reasons defined in ``SIFrameLowering``.
2. Runtime stack alignment is supported. SGPR34 is used as a base pointer (BP)		2. Runtime stack alignment is supported. SGPR41 is used as a base pointer (BP)
to access the incoming stack arguments in the function. The BP is needed		to access the incoming stack arguments in the function. The BP is needed
only when the function requires the runtime stack alignment.		only when the function requires the runtime stack alignment.

3. Allocating SGPR arguments on the stack are not supported.		3. Allocating SGPR arguments on the stack are not supported.

4. No CFI is currently generated. See		4. No CFI is currently generated. See
:ref:`amdgpu-dwarf-call-frame-information`.		:ref:`amdgpu-dwarf-call-frame-information`.

▲ Show 20 Lines • Show All 1,344 Lines • Show Last 20 Lines
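
The conversion quoted in the documentation hunk above (swizzled SP = unswizzled SP / wavefront size) can be sketched as follows. The function name is hypothetical, and the wavefront sizes 32 and 64 used in the checks are assumptions about the target rather than values taken from this diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: converts an unswizzled scratch offset, as held in the
// SP register, into the swizzled (per-lane) SP value by dividing by the
// wavefront size, as stated in the documentation text.
constexpr std::uint32_t toSwizzledSP(std::uint32_t unswizzledSP,
                                     std::uint32_t wavefrontSize) {
  return unswizzledSP / wavefrontSize;
}

// 1024 bytes of unswizzled stack correspond to 16 bytes per lane in wave64
// and to 32 bytes per lane in wave32.
static_assert(toSwizzledSP(1024, 64) == 16, "wave64 example");
static_assert(toSwizzledSP(1024, 32) == 32, "wave32 example");
```

Which SGPR holds the SP does not affect this relationship; the patch changes only the register number (SGPR32 to SGPR35), not the arithmetic.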

llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp

	Show First 20 Lines • Show All 1,109 Lines • ▼ Show 20 Lines
	// Insert outgoing implicit arguments for a call, by inserting copies to the			// Insert outgoing implicit arguments for a call, by inserting copies to the
	// implicit argument registers and adding the necessary implicit uses to the			// implicit argument registers and adding the necessary implicit uses to the
	// call instruction.			// call instruction.
	void AMDGPUCallLowering::handleImplicitCallArguments(			void AMDGPUCallLowering::handleImplicitCallArguments(
	MachineIRBuilder &MIRBuilder, MachineInstrBuilder &CallInst,			MachineIRBuilder &MIRBuilder, MachineInstrBuilder &CallInst,
	const GCNSubtarget &ST, const SIMachineFunctionInfo &FuncInfo,			const GCNSubtarget &ST, const SIMachineFunctionInfo &FuncInfo,
	ArrayRef<std::pair<MCRegister, Register>> ImplicitArgRegs) const {			ArrayRef<std::pair<MCRegister, Register>> ImplicitArgRegs) const {
	if (!ST.enableFlatScratch()) {			if (!ST.enableFlatScratch()) {
				const SIRegisterInfo *TRI = ST.getRegisterInfo();
	// Insert copies for the SRD. In the HSA case, this should be an identity			// Insert copies for the SRD. In the HSA case, this should be an identity
	// copy.			// copy.
	auto ScratchRSrcReg = MIRBuilder.buildCopy(LLT::fixed_vector(4, 32),			auto ScratchRSrcReg = MIRBuilder.buildCopy(LLT::fixed_vector(4, 32),
	FuncInfo.getScratchRSrcReg());			FuncInfo.getScratchRSrcReg());
	MIRBuilder.buildCopy(AMDGPU::SGPR0_SGPR1_SGPR2_SGPR3, ScratchRSrcReg);			MIRBuilder.buildCopy(TRI->getScratchRSrcReg(), ScratchRSrcReg);
	CallInst.addReg(AMDGPU::SGPR0_SGPR1_SGPR2_SGPR3, RegState::Implicit);			CallInst.addReg(TRI->getScratchRSrcReg(), RegState::Implicit);
	}			}

	for (std::pair<MCRegister, Register> ArgReg : ImplicitArgRegs) {			for (std::pair<MCRegister, Register> ArgReg : ImplicitArgRegs) {
	MIRBuilder.buildCopy((Register)ArgReg.first, ArgReg.second);			MIRBuilder.buildCopy((Register)ArgReg.first, ArgReg.second);
	CallInst.addReg(ArgReg.first, RegState::Implicit);			CallInst.addReg(ArgReg.first, RegState::Implicit);
	}			}
	}			}

	▲ Show 20 Lines • Show All 280 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td

	Show All 11 Lines

	// Inversion of CCIfInReg			// Inversion of CCIfInReg
	class CCIfNotInReg<CCAction A> : CCIf<"!ArgFlags.isInReg()", A> {}			class CCIfNotInReg<CCAction A> : CCIf<"!ArgFlags.isInReg()", A> {}
	class CCIfExtend<CCAction A>			class CCIfExtend<CCAction A>
	: CCIf<"ArgFlags.isSExt() \|\| ArgFlags.isZExt()", A>;			: CCIf<"ArgFlags.isSExt() \|\| ArgFlags.isZExt()", A>;

	// Calling convention for SI			// Calling convention for SI
	def CC_SI_Gfx : CallingConv<[			def CC_SI_Gfx : CallingConv<[
	// 0-3 are reserved for the stack buffer descriptor			// SGPR24 onwards is reserved for the stack pointer, return address, etc.
	// 30-31 are reserved for the return address
	// 32 is reserved for the stack pointer
	CCIfInReg<CCIfType<[f32, i32, f16, i16, v2i16, v2f16] , CCAssignToReg<[			CCIfInReg<CCIfType<[f32, i32, f16, i16, v2i16, v2f16] , CCAssignToReg<[
	SGPR4, SGPR5, SGPR6, SGPR7,			SGPR0, SGPR1, SGPR2, SGPR3, SGPR4, SGPR5, SGPR6, SGPR7,
	SGPR8, SGPR9, SGPR10, SGPR11, SGPR12, SGPR13, SGPR14, SGPR15,			SGPR8, SGPR9, SGPR10, SGPR11, SGPR12, SGPR13, SGPR14, SGPR15,
	SGPR16, SGPR17, SGPR18, SGPR19, SGPR20, SGPR21, SGPR22, SGPR23,			SGPR16, SGPR17, SGPR18, SGPR19, SGPR20, SGPR21, SGPR22, SGPR23,
	SGPR24, SGPR25, SGPR26, SGPR27, SGPR28, SGPR29,
	]>>>,			]>>>,

	CCIfNotInReg<CCIfType<[f32, i32, f16, i16, v2i16, v2f16] , CCAssignToReg<[			CCIfNotInReg<CCIfType<[f32, i32, f16, i16, v2i16, v2f16] , CCAssignToReg<[
	VGPR0, VGPR1, VGPR2, VGPR3, VGPR4, VGPR5, VGPR6, VGPR7,			VGPR0, VGPR1, VGPR2, VGPR3, VGPR4, VGPR5, VGPR6, VGPR7,
	VGPR8, VGPR9, VGPR10, VGPR11, VGPR12, VGPR13, VGPR14, VGPR15,			VGPR8, VGPR9, VGPR10, VGPR11, VGPR12, VGPR13, VGPR14, VGPR15,
	VGPR16, VGPR17, VGPR18, VGPR19, VGPR20, VGPR21, VGPR22, VGPR23,			VGPR16, VGPR17, VGPR18, VGPR19, VGPR20, VGPR21, VGPR22, VGPR23,
	VGPR24, VGPR25, VGPR26, VGPR27, VGPR28, VGPR29, VGPR30, VGPR31			VGPR24, VGPR25, VGPR26, VGPR27, VGPR28, VGPR29, VGPR30, VGPR31
	]>>>,			]>>>,

	CCIfType<[i32, f32, v2i16, v2f16, i16, f16, i1], CCAssignToStack<4, 4>>			CCIfType<[i32, f32, v2i16, v2f16, i16, f16, i1], CCAssignToStack<4, 4>>
	]>;			]>;

	def RetCC_SI_Gfx : CallingConv<[			def RetCC_SI_Gfx : CallingConv<[
	CCIfType<[i1], CCPromoteToType<i32>>,			CCIfType<[i1], CCPromoteToType<i32>>,
	CCIfType<[i1, i16], CCIfExtend<CCPromoteToType<i32>>>,			CCIfType<[i1, i16], CCIfExtend<CCPromoteToType<i32>>>,

	// 0-3 are reserved for the stack buffer descriptor			// SGPR24 onwards is reserved for the stack pointer, return address, etc.
	// 32 is reserved for the stack pointer
	CCIfInReg<CCIfType<[f32, i32, f16, i16, v2i16, v2f16] , CCAssignToReg<[			CCIfInReg<CCIfType<[f32, i32, f16, i16, v2i16, v2f16] , CCAssignToReg<[
	SGPR4, SGPR5, SGPR6, SGPR7,			SGPR0, SGPR1, SGPR2, SGPR3, SGPR4, SGPR5, SGPR6, SGPR7,
	SGPR8, SGPR9, SGPR10, SGPR11, SGPR12, SGPR13, SGPR14, SGPR15,			SGPR8, SGPR9, SGPR10, SGPR11, SGPR12, SGPR13, SGPR14, SGPR15,
	SGPR16, SGPR17, SGPR18, SGPR19, SGPR20, SGPR21, SGPR22, SGPR23,			SGPR16, SGPR17, SGPR18, SGPR19, SGPR20, SGPR21, SGPR22, SGPR23,
	SGPR24, SGPR25, SGPR26, SGPR27, SGPR28, SGPR29, SGPR30, SGPR31,
	SGPR33, SGPR34, SGPR35, SGPR36, SGPR37, SGPR38, SGPR39,
	SGPR40, SGPR41, SGPR42, SGPR43
	]>>>,			]>>>,

	CCIfNotInReg<CCIfType<[f32, i32, f16, i16, v2i16, v2f16] , CCAssignToReg<[			CCIfNotInReg<CCIfType<[f32, i32, f16, i16, v2i16, v2f16] , CCAssignToReg<[
	VGPR0, VGPR1, VGPR2, VGPR3, VGPR4, VGPR5, VGPR6, VGPR7,			VGPR0, VGPR1, VGPR2, VGPR3, VGPR4, VGPR5, VGPR6, VGPR7,
	VGPR8, VGPR9, VGPR10, VGPR11, VGPR12, VGPR13, VGPR14, VGPR15,			VGPR8, VGPR9, VGPR10, VGPR11, VGPR12, VGPR13, VGPR14, VGPR15,
	VGPR16, VGPR17, VGPR18, VGPR19, VGPR20, VGPR21, VGPR22, VGPR23,			VGPR16, VGPR17, VGPR18, VGPR19, VGPR20, VGPR21, VGPR22, VGPR23,
	VGPR24, VGPR25, VGPR26, VGPR27, VGPR28, VGPR29, VGPR30, VGPR31,			VGPR24, VGPR25, VGPR26, VGPR27, VGPR28, VGPR29, VGPR30, VGPR31,
	VGPR32, VGPR33, VGPR34, VGPR35, VGPR36, VGPR37, VGPR38, VGPR39,			VGPR32, VGPR33, VGPR34, VGPR35, VGPR36, VGPR37, VGPR38, VGPR39,
	▲ Show 20 Lines • Show All 176 Lines • Show Last 20 Lines
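
As a rough illustration of how the patched `CC_SI_Gfx` rules order their choices for 32-bit `inreg` arguments, here is a toy model: registers SGPR0 to SGPR23 are tried first, then 4-byte stack slots, mirroring `CCAssignToReg` followed by `CCAssignToStack<4, 4>`. The types and the function are assumptions made for this sketch and are not LLVM APIs. Note that the documentation hunk states that allocating SGPR arguments on the stack is not supported in the implementation, so the stack branch only mirrors the rule order in the TableGen file.

```cpp
#include <cassert>
#include <vector>

// Where one 32-bit `inreg` argument ends up in this toy model.
struct ArgLocation {
  bool inRegister; // true: passed in an SGPR, false: passed on the stack
  unsigned value;  // SGPR index, or byte offset into the stack argument area
};

// Hypothetical model of the rule order: assign SGPR0..SGPR23 first, then
// fall back to consecutive 4-byte, 4-aligned stack slots.
std::vector<ArgLocation> assignInRegArgs(unsigned numArgs) {
  constexpr unsigned NumArgSgprs = 24; // SGPR0..SGPR23 in the patched list
  constexpr unsigned SlotSize = 4;     // size and alignment of a stack slot
  std::vector<ArgLocation> locations;
  unsigned stackOffset = 0;
  for (unsigned i = 0; i < numArgs; ++i) {
    if (i < NumArgSgprs) {
      locations.push_back({true, i});
    } else {
      locations.push_back({false, stackOffset});
      stackOffset += SlotSize;
    }
  }
  return locations;
}
```

With 26 arguments, this model places the first 24 in SGPR0 to SGPR23 and the last two at stack offsets 0 and 4.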

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

Show First 20 Lines • Show All 886 Lines • ▼ Show 20 Lines	#endif
setTargetDAGCombine(ISD::ATOMIC_LOAD_MAX);		setTargetDAGCombine(ISD::ATOMIC_LOAD_MAX);
setTargetDAGCombine(ISD::ATOMIC_LOAD_UMIN);		setTargetDAGCombine(ISD::ATOMIC_LOAD_UMIN);
setTargetDAGCombine(ISD::ATOMIC_LOAD_UMAX);		setTargetDAGCombine(ISD::ATOMIC_LOAD_UMAX);
setTargetDAGCombine(ISD::ATOMIC_LOAD_FADD);		setTargetDAGCombine(ISD::ATOMIC_LOAD_FADD);
setTargetDAGCombine(ISD::INTRINSIC_VOID);		setTargetDAGCombine(ISD::INTRINSIC_VOID);
setTargetDAGCombine(ISD::INTRINSIC_W_CHAIN);		setTargetDAGCombine(ISD::INTRINSIC_W_CHAIN);

// FIXME: In other contexts we pretend this is a per-function property.		// FIXME: In other contexts we pretend this is a per-function property.
setStackPointerRegisterToSaveRestore(AMDGPU::SGPR32);		setStackPointerRegisterToSaveRestore(AMDGPU::SGPR35);

setSchedulingPreference(Sched::RegPressure);		setSchedulingPreference(Sched::RegPressure);
}		}

const GCNSubtarget *SITargetLowering::getSubtarget() const {		const GCNSubtarget *SITargetLowering::getSubtarget() const {
return Subtarget;		return Subtarget;
}		}

▲ Show 20 Lines • Show All 1,333 Lines • ▼ Show 20 Lines	static void reservePrivateMemoryRegs(const TargetMachine &TM,
// For entry functions we have to set up the stack pointer if we use it,		// For entry functions we have to set up the stack pointer if we use it,
// whereas non-entry functions get this "for free". This means there is no		// whereas non-entry functions get this "for free". This means there is no
// intrinsic advantage to using S32 over S34 in cases where we do not have		// intrinsic advantage to using S32 over S34 in cases where we do not have
// calls but do need a frame pointer (i.e. if we are requested to have one		// calls but do need a frame pointer (i.e. if we are requested to have one
// because frame pointer elimination is disabled). To keep things simple we		// because frame pointer elimination is disabled). To keep things simple we
// only ever use S32 as the call ABI stack pointer, and so using it does not		// only ever use S32 as the call ABI stack pointer, and so using it does not
// imply we need a separate frame pointer.		// imply we need a separate frame pointer.
//		//
// Try to use s32 as the SP, but move it if it would interfere with input		// Try to use s35 as the SP, but move it if it would interfere with input
// arguments. This won't work with calls though.		// arguments. This won't work with calls though.
//		//
// FIXME: Move SP to avoid any possible inputs, or find a way to spill input		// FIXME: Move SP to avoid any possible inputs, or find a way to spill input
// registers.		// registers.
if (!MRI.isLiveIn(AMDGPU::SGPR32)) {		if (!MRI.isLiveIn(AMDGPU::SGPR35)) {
Info.setStackPtrOffsetReg(AMDGPU::SGPR32);		Info.setStackPtrOffsetReg(AMDGPU::SGPR35);
} else {		} else {
assert(AMDGPU::isShader(MF.getFunction().getCallingConv()));		assert(AMDGPU::isShader(MF.getFunction().getCallingConv()));

if (MFI.hasCalls())		if (MFI.hasCalls())
report_fatal_error("call in graphics shader with too many input SGPRs");		report_fatal_error("call in graphics shader with too many input SGPRs");

for (unsigned Reg : AMDGPU::SGPR_32RegClass) {		for (unsigned Reg : AMDGPU::SGPR_32RegClass) {
if (!MRI.isLiveIn(Reg)) {		if (!MRI.isLiveIn(Reg)) {
Info.setStackPtrOffsetReg(Reg);		Info.setStackPtrOffsetReg(Reg);
break;		break;
}		}
}		}

if (Info.getStackPtrOffsetReg() == AMDGPU::SP_REG)		if (Info.getStackPtrOffsetReg() == AMDGPU::SP_REG)
report_fatal_error("failed to find register for SP");		report_fatal_error("failed to find register for SP");
}		}

// hasFP should be accurate for entry functions even before the frame is		// hasFP should be accurate for entry functions even before the frame is
// finalized, because it does not rely on the known stack size, only		// finalized, because it does not rely on the known stack size, only
// properties like whether variable sized objects are present.		// properties like whether variable sized objects are present.
if (ST.getFrameLowering()->hasFP(MF)) {		if (ST.getFrameLowering()->hasFP(MF)) {
Info.setFrameOffsetReg(AMDGPU::SGPR33);		Info.setFrameOffsetReg(AMDGPU::SGPR40);
}		}
}		}

bool SITargetLowering::supportSplitCSR(MachineFunction *MF) const {		bool SITargetLowering::supportSplitCSR(MachineFunction *MF) const {
const SIMachineFunctionInfo *Info = MF->getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *Info = MF->getInfo<SIMachineFunctionInfo>();
return !Info->isEntryFunction();		return !Info->isEntryFunction();
}		}

▲ Show 20 Lines • Show All 847 Lines • ▼ Show 20 Lines	if (!IsSibCall) {
Chain = DAG.getCALLSEQ_START(Chain, 0, 0, DL);		Chain = DAG.getCALLSEQ_START(Chain, 0, 0, DL);

if (!Subtarget->enableFlatScratch()) {		if (!Subtarget->enableFlatScratch()) {
SmallVector<SDValue, 4> CopyFromChains;		SmallVector<SDValue, 4> CopyFromChains;

// In the HSA case, this should be an identity copy.		// In the HSA case, this should be an identity copy.
SDValue ScratchRSrcReg		SDValue ScratchRSrcReg
= DAG.getCopyFromReg(Chain, DL, Info->getScratchRSrcReg(), MVT::v4i32);		= DAG.getCopyFromReg(Chain, DL, Info->getScratchRSrcReg(), MVT::v4i32);
RegsToPass.emplace_back(AMDGPU::SGPR0_SGPR1_SGPR2_SGPR3, ScratchRSrcReg);		const SIRegisterInfo *TRI = getSubtarget()->getRegisterInfo();
		RegsToPass.emplace_back(TRI->getScratchRSrcReg(), ScratchRSrcReg);
CopyFromChains.push_back(ScratchRSrcReg.getValue(1));		CopyFromChains.push_back(ScratchRSrcReg.getValue(1));
Chain = DAG.getTokenFactor(DL, CopyFromChains);		Chain = DAG.getTokenFactor(DL, CopyFromChains);
}		}
}		}

MVT PtrVT = MVT::i32;		MVT PtrVT = MVT::i32;

// Walk the register/memloc assignments, inserting copies/loads.		// Walk the register/memloc assignments, inserting copies/loads.
▲ Show 20 Lines • Show All 9,264 Lines • Show Last 20 Lines
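
The stack pointer selection for entry functions shown in the `reservePrivateMemoryRegs` hunk above can be modelled in isolation. This is a simplified stand-in, not the LLVM code: the function name, the use of plain register indices, the `-1` failure value (where the real code calls `report_fatal_error`), and the default of 104 SGPRs are all assumptions made for the sketch.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for the selection logic: prefer the ABI stack
// pointer register; if it is a live-in, take the lowest SGPR that is not.
// Returns -1 in the cases where the real code reports a fatal error.
int pickStackPointerSgpr(const std::set<unsigned> &liveIns, bool hasCalls,
                         unsigned preferred = 35, unsigned numSgprs = 104) {
  if (liveIns.count(preferred) == 0)
    return static_cast<int>(preferred);

  // The preferred register is taken by an input. A callee expects the SP in
  // the fixed ABI register, so moving it cannot work when there are calls.
  if (hasCalls)
    return -1; // "call in graphics shader with too many input SGPRs"

  for (unsigned reg = 0; reg < numSgprs; ++reg)
    if (liveIns.count(reg) == 0)
      return static_cast<int>(reg);

  return -1; // "failed to find register for SP"
}
```

For example, a shader whose inputs occupy s0 to s35 and that makes no calls would get s36 in this model, while the same shader with a call would hit the error path.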

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

Show First 20 Lines • Show All 45 Lines • ▼ Show 20 Lines	: AMDGPUMachineFunction(MF),
WorkItemIDY(false),		WorkItemIDY(false),
WorkItemIDZ(false),		WorkItemIDZ(false),
ImplicitBufferPtr(false),		ImplicitBufferPtr(false),
ImplicitArgPtr(false),		ImplicitArgPtr(false),
GITPtrHigh(0xffffffff),		GITPtrHigh(0xffffffff),
HighBitsOf32BitAddress(0),		HighBitsOf32BitAddress(0),
GDSSize(0) {		GDSSize(0) {
const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
		const SIRegisterInfo *TRI = ST.getRegisterInfo();
const Function &F = MF.getFunction();		const Function &F = MF.getFunction();
FlatWorkGroupSizes = ST.getFlatWorkGroupSizes(F);		FlatWorkGroupSizes = ST.getFlatWorkGroupSizes(F);
WavesPerEU = ST.getWavesPerEU(F);		WavesPerEU = ST.getWavesPerEU(F);

Occupancy = ST.computeOccupancy(F, getLDSSize());		Occupancy = ST.computeOccupancy(F, getLDSSize());
CallingConv::ID CC = F.getCallingConv();		CallingConv::ID CC = F.getCallingConv();

// FIXME: Should have analysis or something rather than attribute to detect		// FIXME: Should have analysis or something rather than attribute to detect
[17 lines not shown]	if (IsKernel) {
PSInputAddr = AMDGPU::getInitialPSInputAddr(F);		PSInputAddr = AMDGPU::getInitialPSInputAddr(F);
}		}

if (!isEntryFunction()) {		if (!isEntryFunction()) {
if (UseFixedABI)		if (UseFixedABI)
ArgInfo = AMDGPUArgumentUsageInfo::FixedABIFunctionInfo;		ArgInfo = AMDGPUArgumentUsageInfo::FixedABIFunctionInfo;

// TODO: Pick a high register, and shift down, similar to a kernel.		// TODO: Pick a high register, and shift down, similar to a kernel.
FrameOffsetReg = AMDGPU::SGPR33;		FrameOffsetReg = AMDGPU::SGPR40;
StackPtrOffsetReg = AMDGPU::SGPR32;		StackPtrOffsetReg = AMDGPU::SGPR35;

if (!ST.enableFlatScratch()) {		if (!ST.enableFlatScratch()) {
// Non-entry functions have no special inputs for now, other registers		// Non-entry functions have no special inputs for now, other registers
// required for scratch access.		// required for scratch access.
ScratchRSrcReg = AMDGPU::SGPR0_SGPR1_SGPR2_SGPR3;		ScratchRSrcReg = TRI->getScratchRSrcReg();

ArgInfo.PrivateSegmentBuffer =		ArgInfo.PrivateSegmentBuffer =
ArgDescriptor::createRegister(ScratchRSrcReg);		ArgDescriptor::createRegister(ScratchRSrcReg);
}		}

if (!F.hasFnAttribute("amdgpu-no-implicitarg-ptr"))		if (!F.hasFnAttribute("amdgpu-no-implicitarg-ptr"))
ImplicitArgPtr = true;		ImplicitArgPtr = true;
} else {		} else {
[552 lines not shown]

llvm/lib/Target/AMDGPU/SIRegisterInfo.h

[270 lines not shown]	public:

unsigned getRegPressureSetLimit(const MachineFunction &MF,		unsigned getRegPressureSetLimit(const MachineFunction &MF,
unsigned Idx) const override;		unsigned Idx) const override;

const int *getRegUnitPressureSets(unsigned RegUnit) const override;		const int *getRegUnitPressureSets(unsigned RegUnit) const override;

MCRegister getReturnAddressReg(const MachineFunction &MF) const;		MCRegister getReturnAddressReg(const MachineFunction &MF) const;

		Register getScratchRSrcReg() const;

const TargetRegisterClass *		const TargetRegisterClass *
getRegClassForSizeOnBank(unsigned Size,		getRegClassForSizeOnBank(unsigned Size,
const RegisterBank &Bank,		const RegisterBank &Bank,
const MachineRegisterInfo &MRI) const;		const MachineRegisterInfo &MRI) const;

const TargetRegisterClass *		const TargetRegisterClass *
getRegClassForTypeOnBank(LLT Ty,		getRegClassForTypeOnBank(LLT Ty,
const RegisterBank &Bank,		const RegisterBank &Bank,
[93 lines not shown]

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

[400 lines not shown]

bool SIRegisterInfo::hasBasePointer(const MachineFunction &MF) const {		bool SIRegisterInfo::hasBasePointer(const MachineFunction &MF) const {
// When we need stack realignment, we can't reference off of the		// When we need stack realignment, we can't reference off of the
// stack pointer, so we reserve a base pointer.		// stack pointer, so we reserve a base pointer.
const MachineFrameInfo &MFI = MF.getFrameInfo();		const MachineFrameInfo &MFI = MF.getFrameInfo();
return MFI.getNumFixedObjects() && shouldRealignStack(MF);		return MFI.getNumFixedObjects() && shouldRealignStack(MF);
}		}

Register SIRegisterInfo::getBaseRegister() const { return AMDGPU::SGPR34; }		Register SIRegisterInfo::getBaseRegister() const { return AMDGPU::SGPR41; }

const uint32_t *SIRegisterInfo::getAllVGPRRegMask() const {		const uint32_t *SIRegisterInfo::getAllVGPRRegMask() const {
return CSR_AMDGPU_AllVGPRs_RegMask;		return CSR_AMDGPU_AllVGPRs_RegMask;
}		}

const uint32_t *SIRegisterInfo::getAllAGPRRegMask() const {		const uint32_t *SIRegisterInfo::getAllAGPRRegMask() const {
return CSR_AMDGPU_AllAGPRs_RegMask;		return CSR_AMDGPU_AllAGPRs_RegMask;
}		}
[1,964 lines not shown]	const int *SIRegisterInfo::getRegUnitPressureSets(unsigned RegUnit) const {
if (RegPressureIgnoredUnits[RegUnit])		if (RegPressureIgnoredUnits[RegUnit])
return Empty;		return Empty;

return AMDGPUGenRegisterInfo::getRegUnitPressureSets(RegUnit);		return AMDGPUGenRegisterInfo::getRegUnitPressureSets(RegUnit);
}		}

MCRegister SIRegisterInfo::getReturnAddressReg(const MachineFunction &MF) const {		MCRegister SIRegisterInfo::getReturnAddressReg(const MachineFunction &MF) const {
// Not a callee saved register.		// Not a callee saved register.
return AMDGPU::SGPR30_SGPR31;		return AMDGPU::SGPR28_SGPR29;
		}

		Register SIRegisterInfo::getScratchRSrcReg() const {
		return AMDGPU::SGPR24_SGPR25_SGPR26_SGPR27;
}		}

const TargetRegisterClass *		const TargetRegisterClass *
SIRegisterInfo::getRegClassForSizeOnBank(unsigned Size,		SIRegisterInfo::getRegClassForSizeOnBank(unsigned Size,
const RegisterBank &RB,		const RegisterBank &RB,
const MachineRegisterInfo &MRI) const {		const MachineRegisterInfo &MRI) const {
switch (RB.getID()) {		switch (RB.getID()) {
case AMDGPU::VGPRRegBankID:		case AMDGPU::VGPRRegBankID:
[163 lines not shown]
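
The fixed registers in this revision (SRD, return address, stack pointer, frame pointer, base pointer) are set in several separate places, so two of them could end up on the same SGPR without anything noticing. Below is a minimal sketch of a compile-time check for that, using the register numbers from the revision summary. SgprRange, overlaps, isDisjoint and ProposedLayout are hypothetical names made up for this illustration; none of them exist in LLVM, and the sketch is not part of the patch.

```cpp
#include <array>
#include <cstddef>

// Hypothetical model (not an LLVM type): an inclusive range of SGPR
// indices reserved for one purpose.
struct SgprRange {
  const char *Name;
  unsigned First;
  unsigned Last;
};

// True if the two inclusive ranges share at least one SGPR.
constexpr bool overlaps(const SgprRange &A, const SgprRange &B) {
  return A.First <= B.Last && B.First <= A.Last;
}

// True if no pair of ranges in the layout overlaps.
template <std::size_t N>
constexpr bool isDisjoint(const std::array<SgprRange, N> &Layout) {
  for (std::size_t I = 0; I < N; ++I)
    for (std::size_t J = I + 1; J < N; ++J)
      if (overlaps(Layout[I], Layout[J]))
        return false;
  return true;
}

// Register numbers taken from the layout proposed in the revision summary.
constexpr std::array<SgprRange, 5> ProposedLayout = {{
    {"scratch resource descriptor", 24, 27},
    {"return address", 28, 29},
    {"stack pointer", 35, 35},
    {"frame pointer", 40, 40},
    {"base pointer", 41, 41},
}};

// Fails to compile if any two fixed registers collide.
static_assert(isDisjoint(ProposedLayout), "fixed SGPRs must not overlap");
```

A check like this only covers the registers listed in the table; it says nothing about argument or callee-saved ranges, which would need their own entries.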