This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add legalization case for PTR_ADD on buffer pointers
AbandonedPublic

Authored by krzysz00 on Feb 13 2023, 1:29 PM.

Details

Reviewers
arsenm
Group Reviewers
Restricted Project
Summary

On buffer pointers (address space 7), we want PTR_ADD to take 32-bit
offsets, and so, unlike all other address spaces, we do not want to
enforce the condition that those G_PTR_ADD instructions have an offset
of the same size as their pointer operand.

(Also, fix some comments and missing pattern fragment definitions
while we're here.)

Depends on D143526

Diff Detail

Event Timeline

krzysz00 created this revision.Feb 13 2023, 1:29 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 13 2023, 1:29 PM
krzysz00 requested review of this revision.Feb 13 2023, 1:29 PM
krzysz00 added reviewers: arsenm, Restricted Project.Feb 13 2023, 1:30 PM

I think we might want to consider whether we should be using G_PTR_ADD at all, or if a different instruction would be more appropriate for a fat pointer.

@arsenm Yeah, I think there's room for a broader design conversation here. My theory was that we could pattern-match G_PTR_ADD and G_{LOAD,STORE,...} into the relevant buffer instructions at some point in the lowering process, but I didn't have a good sense of where - I figured it'd be after the legalization pass, though.

My sense of the goal is to try and keep the fat pointers as p7 values as long as we can manage it, although that does mean we'd need to do things like the voffset/imm split somewhere further down the codegen pipeline than I think they currently are.

The other option would be to lower these G_LOAD operations into the relevant intrinsic instructions pretty early on and then to pattern-match all the address calculation onto them, which might have the downside that it loses alias information?

(I'm also suspecting that the definitions of a lot of the buffer ops might need to be loosened from s4i32/v4i32 to SReg_128/VReg_128, since that's the actual constraint, but that's not this patch.)

arsenm accepted this revision.Feb 13 2023, 4:52 PM

I guess we can go with this for now. I think this doesn't really work the same, though - e.g. what happens on overflow of the 48-bit pointer part?

We do need to be able to handle these standalone, independent of a buffer instruction, which may require emulating any bounds checking the buffer instruction performs.

This revision is now accepted and ready to land.Feb 13 2023, 4:52 PM
piotr added inline comments.
llvm/lib/Target/AMDGPU/AMDGPU.h
382

This is now in conflict with the description in AMDGPUUsage.rst, which says 160-bit.

In fact, making it 160-bit (including offset) was a deliberate choice, see discussion in https://reviews.llvm.org/D58957?id=189289#inline-522534. But perhaps now it's good time to revisit the discussion, fyi @nhaehnle.

@piotr @nhaehnle @sheredom My proposal here - and in the patch stack in general - is that we do revisit that 160-bit idea.

How I imagine this working is that, if you GEP a buffer descriptor, the operation that that lowers to - here G_PTR_ADD - does not modify the input buffer descriptor value at all. Instead, all these offsets we add are accumulated into the voffset and imm fields of the relevant MUBUF instruction.

That means that

%y = gep i32, ptr addrspace(7) %x, i32 1
%z = load i32, ptr addrspace(7) %y

lowers to

[%z] = BUFFER_LOAD_DWORD [%x], offset:4

and any pattern of code that can't be translated into offsets on a load/store/atomic is unsupported.
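To make the folding concrete, here is a minimal Python sketch of the proposed model (the names and the tuple-style representation are illustrative only, not actual LLVM code): the descriptor flows through GEPs unchanged, and only the accumulated byte offset reaches the buffer instruction.

```python
# Hypothetical model of the proposal above: GEPs on an addrspace(7)
# pointer do not change the descriptor value; their byte offsets are
# folded into the offset operand of the final buffer load/store.

def fold_geps_into_load(descriptor, gep_byte_offsets):
    """Return the (descriptor, offset) pair a BUFFER_LOAD would use."""
    # The descriptor flows through untouched; only the accumulated
    # byte offset reaches the MUBUF instruction.
    total = sum(gep_byte_offsets)
    return descriptor, total

# gep i32, ptr addrspace(7) %x, i32 1 adds 4 bytes (sizeof(i32)).
desc, off = fold_geps_into_load("rsrc", [4])
```

Under this model, a chain of GEPs simply contributes a sum of offsets to the one memory instruction that consumes the pointer.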

Now, this does mean that, for example

%y = gep i32, ptr addrspace(7) %x, i32 1
%z = ptrtoint ptr addrspace(7) %y to i128

produces ... who the heck knows what ... in %z, but buffer descriptors are a very weird kind of pointer and anyone emitting such code either knows exactly what they're doing or is Wrong. (My gut tells me that, if I _had_ to pick a behavior, ptrtoint %y == ptrtoint %x)

Now, in the case that the buffer descriptor itself is non-uniform, I'm willing to initially declare that case an "instruction selection failed" case and perhaps add emulation in the future.

@piotr @nhaehnle @sheredom My proposal here - and in the patch stack in general - is that we do revisit that 160-bit idea.

How I imagine this working is that, if you GEP a buffer descriptor, the operation that that lowers to - here G_PTR_ADD - does not modify the input buffer descriptor value at all. Instead, all these offsets we add are accumulated into the voffset and imm fields of the relevant MUBUF instruction.

Yes. In other words, the GEP doesn't modify the 128 bits of buffer descriptor that are in the pointer, but it does modify the 32 bits of offset that are in the pointer.
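As a toy illustration of this 160-bit view (the packed integer layout here is an assumption made for the sketch, not the real in-register format), GEP arithmetic touches only the 32 offset bits and wraps modulo 2^32:

```python
# Toy model of a 160-bit buffer fat pointer: 128 descriptor bits
# packed above a 32-bit offset. The layout is illustrative only.

MASK32 = (1 << 32) - 1

def gep_p7(ptr160, byte_offset):
    """GEP leaves the descriptor bits alone and wraps the 32-bit offset."""
    desc = ptr160 >> 32           # 128 descriptor bits, untouched
    off = ptr160 & MASK32         # 32 offset bits, modified by GEP
    off = (off + byte_offset) & MASK32
    return (desc << 32) | off
```

This also gives ptrtoint/inttoptr an obvious meaning: they just expose or consume the full 160-bit quantity.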

That means that

%y = gep i32, ptr addrspace(7) %x, i32 1
%z = load i32, ptr addrspace(7) %y

lowers to

[%z] = BUFFER_LOAD_DWORD [%x], offset:4

That doesn't quite work, though, because %x itself might already have an offset. It really needs to translate to something like:

%y_offset = G_ADD %x_offset, 4
%z = G_AMDGPU_BUFFER_LOAD_DWORD %y_offset, %x_buffer

and %y_offset can then be split into a voffset and an inst_offset part later, during instruction selection.
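A rough sketch of that later split, assuming a small unsigned immediate field in the instruction (the 12-bit width below is an assumption for illustration; the exact field width varies by hardware generation):

```python
# Hypothetical voffset/inst_offset split during instruction
# selection: only the low bits of a known constant offset fit in the
# instruction's immediate field; the rest must go in a register.

IMM_BITS = 12  # assumed width of the MUBUF immediate offset field

def split_offset(const_offset):
    """Split a known byte offset into (voffset, inst_offset)."""
    inst_offset = const_offset & ((1 << IMM_BITS) - 1)
    voffset = const_offset - inst_offset  # remainder carried in a VGPR
    return voffset, inst_offset
```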

Now, this does mean that, for example

%y = gep i32, ptr addrspace(7) %x, i32 1
%z = ptrtoint ptr addrspace(7) %y to i128

produces ... who the heck knows what ... in %z, but buffer descriptors are a very weird kind of pointer and anyone emitting such code either knows exactly what they're doing or is Wrong. (My gut tells me that, if I _had_ to pick a behavior, ptrtoint %y == ptrtoint %x)

This problem is very easily solved by making the pointers into 160-bit quantities as initially intended.

Note that @lgc.late.launder.fat.pointer is not a full inttoptr for fat pointers. It so happens that in LLPC we never need an initial offset different from 0, and so that's why this lgc "intrinsic" doesn't have the offset part as input even though it sort of plays the role of inttoptr. But if you look at how the operation is lowered in our PatchBufferOp pass, you'll see clearly that it produces a 160-bit quantity: 128 bits of buffer descriptor + 32 bits of offset (which just happen to always be 0).

Now, in the case that the buffer descriptor itself is non-uniform, I'm willing to initially declare that case an "instruction selection failed" case and perhaps add emulation in the future.

That's not acceptable for the graphics use cases.

(quick notes from phone)

I'm not convinced you need to store the offset in the actual value of the pointer itself. That is, because address space 7 is non-integral, the only operations that can modify the address a buffer descriptor points to (unless the user really wants to do weird things) are GEPs.

This means that, to get the offset argument to the load operation, all we have to do is trace back through the G_PTR_ADD operations until we reach a G_PTR_ADD input that didn't come from a GEP (ok, there is the complexity that we'd need to replicate conditionals if we really wanted to do this trace right), take the sum of those offset values, and use it as the offset to the load/store.
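As a sketch of that back-trace (using a made-up tuple encoding of instructions, and ignoring control flow entirely, which is exactly where the idea runs into trouble):

```python
# Sketch of the back-trace idea: walk G_PTR_ADD definitions from a
# pointer register back to a non-GEP root, summing constant offsets.
# `defs` maps a register name to a ("G_PTR_ADD", base, offset) tuple;
# this encoding is illustrative, not real MIR.

def trace_offset(defs, reg):
    """Sum G_PTR_ADD offsets from `reg` back to its non-GEP root."""
    total = 0
    while reg in defs and defs[reg][0] == "G_PTR_ADD":
        _, base, off = defs[reg]
        total += off
        reg = base
    return reg, total

defs = {"y": ("G_PTR_ADD", "x", 4), "z": ("G_PTR_ADD", "y", 8)}
```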

What worries me about the i160 approach is that it'll mess up things like allocating function arguments: if I want to pass a buffer descriptor to a function, that takes four registers and not five.

foad added a comment.Feb 17 2023, 1:44 AM

I'm not convinced you need to store the offset in the actual value of the pointer itself. That is, because address space 7 is non-integral, the only operations that can modify the address a buffer descriptor points to (unless the user really wants to do weird things) are GEPs.

This means that, to get the offset argument to the load operation, all we have to do is trace through the G_PTR_ADD operations and add those together until we reach a G_PTR_ADD input that didn't come from a GEP

That doesn't work for a couple of reasons. First, the G_PTR_ADDs could be in a loop or other complex control flow, so you can't just statically trace them back through the program. Second, there are other things you can do with arbitrary pointers in IR that the backend needs to be able to implement: you can store them to memory and load them later, you can pass them into and out of function calls, you can use them in phi and select instructions, and so on.

(ok, there is the complexity that we'd need to replicate conditionals if we really wanted to do this trace right)

Good point, I'd forgotten about the conditionals :)

Ok, to elaborate: the proposal - which is wonky enough that going i160 might be the right call here, even though we don't have an s_load_dwordx5 and so we'd need something like that buffer pointer launder intrinsic - is that (given that we replace all G_LOAD with a G_OFFSET_LOAD) we replace

%y = G_PTR_ADD p7 %x, s32 %a
%v = G_OFFSET_LOAD p7 %y, s32 %b

with

%newOff = G_ADD s32 %a, s32 %b
%v = G_OFFSET_LOAD p7 %x, s32 %newOff

If we get to a select/phi, we duplicate it into a select/phi on the offset and a select/phi on the base pointer, and if we start storing these pointers to memory, we materialize the offset by incrementing the base address in the descriptor and then decrementing the buffer extent by the same amount.
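A toy version of that rewrite, with instructions encoded as illustrative tuples (this is a model of the proposed combine, not actual combiner code):

```python
# Proposed combine: a G_OFFSET_LOAD through a G_PTR_ADD result is
# re-based onto the GEP's input pointer, and the two offsets are
# added. Tuple encodings of the instructions are illustrative only.

def combine_load(ptr_add, offset_load):
    """(G_PTR_ADD x, a) feeding (G_OFFSET_LOAD y, b) -> G_OFFSET_LOAD x, a+b."""
    _, base, a = ptr_add         # %y = G_PTR_ADD %x, %a
    _, _ptr, b = offset_load     # %v = G_OFFSET_LOAD %y, %b
    return ("G_OFFSET_LOAD", base, a + b)
```

Applied repeatedly, this drains all GEP offsets into the load, leaving the base pointer untouched.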

But, having written that out, i160 and having a <4 x i32>/i128/... to address space 7 intrinsic - or even not needing one, since inttoptr exists - may well be the better approach here.

The catch, I think, will be *when* we split away the offset part and the pointer part during ISel - and whether then it'll be too late to run uniformity analysis on the pointer part.

... on the third hand, the "increment address, decrement extent" fallback for GEPs that don't merge into a load or store isn't _that_ absurd

krzysz00 abandoned this revision.Apr 5 2023, 6:17 PM

This isn't a path we're going down anymore, closing.