This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add an experimental buffer fat pointer address space.
ClosedPublic

Authored by sheredom on Mar 5 2019, 3:07 AM.

Details

Summary

Add an experimental buffer fat pointer address space that is currently unhandled in the backend. This commit reserves address space 7 as a non-integral address space representing a 160-bit fat pointer (128-bit buffer descriptor + 32-bit offset) that is heavily used in graphics workloads targeting the AMDGPU backend.
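
For illustration, a minimal IR sketch of what using the reserved address space looks like; the datalayout string here is illustrative only (the non-integral property is expressed through the "ni" specifier), not the exact string added by this patch:

    ; address space 7 marked non-integral via the "ni" datalayout specifier
    target datalayout = "e-p:64:64-ni:7"

    define float @read_fat(float addrspace(7)* %fat, i32 %idx) {
      %gep = getelementptr float, float addrspace(7)* %fat, i32 %idx
      %val = load float, float addrspace(7)* %gep
      ret float %val
    }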

Diff Detail

Event Timeline

sheredom created this revision. Mar 5 2019, 3:07 AM
Herald added a project: Restricted Project. Mar 5 2019, 3:07 AM
arsenm added a comment. Mar 5 2019, 8:16 AM

I think this will be accepted right now as a no-op addrspacecast to any of the global-like address spaces

docs/AMDGPUUsage.rst
294–303

Might as well reserve more for 256-bit descriptors. We'll probably need several of these eventually

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
297

Shouldn't this add p7:128?

lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
257

This probably shouldn't be included now, at least without a test

lib/Target/AMDGPU/SIISelLowering.cpp
1050

Ditto

sheredom marked 3 inline comments as done. Mar 5 2019, 8:20 AM
sheredom added inline comments.
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
297

It'd need to be p7:160 - but I'm entirely unsure whether LLVM will drop a lung on a non-power-of-2 pointer size.
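
For reference, a sketch of the datalayout pointer specification being discussed; the alignment and index-width values are illustrative only, and whether LLVM copes with the 160-bit (non-power-of-2) size is exactly the open question here:

    ; p<as>:<size>:<abi>:<pref>:<idx> - 160-bit pointer in AS 7 with a 32-bit index (offset) width
    target datalayout = "e-p:64:64-p7:160:256:256:32-ni:7"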

lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
257

It's needed so we can run the middle-end optimizations with fat pointers in the module, though. I'll work out how to test this.

lib/Target/AMDGPU/SIISelLowering.cpp
1050

TLI calls into this (which is why it is needed). I can work out some pass that queries TLI and add a test, though.

sheredom updated this revision to Diff 189498. Mar 6 2019, 6:20 AM
sheredom marked an inline comment as done.

Add a test case that triggers the target transform info code path.

sheredom marked 3 inline comments as done. Mar 6 2019, 6:22 AM
sheredom added inline comments.
docs/AMDGPUUsage.rst
294–303

I'd rather not do that now - we'll need at least 3 more (1 for image descriptors, 1 for structured buffer descriptors, 1 for samplers), and I don't want to go through all the steps of adding them everywhere when we've got no immediate need for them.

arsenm added inline comments. Mar 6 2019, 8:45 AM
docs/AMDGPUUsage.rst
299–300

I don't understand why you would blend these. You just need the 128-bit pointer, and then the intrinsic accessing it will have a 32-bit offset operand that isn't part of the pointer

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
297

Why does it need to be 160? It should be 128 like the descriptor

sheredom marked an inline comment as done. Mar 6 2019, 8:49 AM
sheredom added inline comments.
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
297

So for non-swizzable buffer descriptors we want to model accesses using normal LLVM load/store/atomic instructions, so that no intrinsics are required for them at all. To model this we need a 160-bit pointer: the 128-bit descriptor + a 32-bit offset. This is super important because it means these 160-bit pointers partake in all the normal load/store optimizations without us having to add special cases for whatever new intrinsics we'd otherwise have to introduce.

We're trying our best to avoid the need for any new intrinsics wherever possible (it won't always be possible, though).
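
A sketch of the kind of IR this enables, where generic middle-end passes (EarlyCSE, GVN, and friends) can reason about the accesses with no target intrinsics involved:

    define i32 @sum_twice(i32 addrspace(7)* %fat) {
      ; two identical loads through the fat pointer; ordinary redundancy
      ; elimination can merge them, no buffer intrinsics required
      %a = load i32, i32 addrspace(7)* %fat
      %b = load i32, i32 addrspace(7)* %fat
      %sum = add i32 %a, %b
      ret i32 %sum
    }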

arsenm added inline comments. Mar 6 2019, 8:55 AM
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
297

I still don't see how the offset is part of the pointer itself. You could always use an offset of 0, so it would be a matter of changing the GEP index type to be different from the bit width of the pointer, and an optimization during codegen to fold it in

sheredom marked an inline comment as done. Mar 6 2019, 9:40 AM
sheredom added inline comments.
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
297

So if we explicitly laid out the pointer as p7:128, what would happen if you stored that pointer into memory (say, an alloca)? This is fine if you are storing the original pointer (just the 128-bit buffer descriptor), but if you had GEP'ed into this pointer, then when you store it into memory you lose the offset information as part of that store. It seems dangerous to me that we'd pretend the pointer is 128-bit when actually, for all intermediate uses of the pointer, it'll contain 160 bits of valid information.

I think I'd rather just leave it as non-integral, which solves all these issues perfectly well and also has the added benefit of ruling out uses of the pointer that we don't want to support.
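
A sketch of the spill scenario described above (illustrative IR): the GEP'ed pointer is stored to an alloca and reloaded, so the declared pointer size has to cover both the descriptor and the accumulated offset:

    define i32 @spill_fat(i32 addrspace(7)* %fat, i32 %idx) {
      ; for brevity the alloca uses the default address space; on AMDGPU it would be private
      %slot = alloca i32 addrspace(7)*
      %gep = getelementptr i32, i32 addrspace(7)* %fat, i32 %idx
      ; storing the GEP'ed pointer must preserve the 32-bit offset as well as the 128-bit descriptor
      store i32 addrspace(7)* %gep, i32 addrspace(7)** %slot
      %reload = load i32 addrspace(7)*, i32 addrspace(7)** %slot
      %val = load i32, i32 addrspace(7)* %reload
      ret i32 %val
    }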

arsenm added inline comments. Mar 6 2019, 11:26 AM
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
297

I don't follow this. Of course it's invalid to store one value and expect a different one to be there afterwards. You seem to be implying getelementptr is invalid to use at all. The getelementptr is producing a new 128-bit value; it isn't packing the 32-bit offset into some merged value.

sheredom marked an inline comment as done. Mar 7 2019, 12:57 AM
sheredom added inline comments.
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
297

So your pitch is that GEP produces a new 128-bit value, where we've modified the base addr + the num_records within the GEP to record the new addr + upper bound?

This is really not what we want when we actually consume the fat pointer in an MUBUF instruction: there we really want the original descriptor to be unmodified and the offset to be passed in a separate VGPR.

nhaehnle added inline comments. Mar 7 2019, 1:29 AM
lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
65

It may be more pragmatic to have NoAlias with Constant 32-bit.

The intention and current practice is for Constant 32-bit to be used with descriptor tables, and those really shouldn't ever alias buffer fat pointers.

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
297

Definitely agree with @sheredom here, as the way I would think about these pointers is similar to the "pointer + bounds" representations you could use on a CPU.

Two very immediate practical problems that would come up with 128 bits are:

  • No good way to support non-uniform GEPs. In practice, we'll want to create a buffer fat pointer from a uniform 128-bit descriptor using some trivial amdgcn intrinsic, but GEPs will very often be non-uniform; see the sketch after this list. In order to be able to use MUBUF instructions, we need the 128+32 bit representation. (Of course, we'll also need a fallback for the case where the descriptor *isn't* uniform, but in actual practice it will be uniform almost always)
  • Inability to support GEPs with negative offsets (because the required bounds-check information is lost)
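
A sketch of the pattern from the first bullet; @make_buffer_fat_ptr is a placeholder for whatever trivial amdgcn intrinsic would eventually build the fat pointer from a descriptor, not an existing intrinsic:

    ; placeholder for the descriptor-to-fat-pointer intrinsic mentioned above
    declare i8 addrspace(7)* @make_buffer_fat_ptr(<4 x i32>)

    define float @per_lane_access(<4 x i32> %desc, i32 %lane_offset) {
      ; %desc is typically uniform (SGPRs), while %lane_offset is typically non-uniform (per-lane VGPR)
      %fat = call i8 addrspace(7)* @make_buffer_fat_ptr(<4 x i32> %desc)
      %gep = getelementptr i8, i8 addrspace(7)* %fat, i32 %lane_offset
      %cast = bitcast i8 addrspace(7)* %gep to float addrspace(7)*
      %val = load float, float addrspace(7)* %cast
      ret float %val
    }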
sheredom marked an inline comment as done. Mar 7 2019, 1:45 AM
sheredom added inline comments.
lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
65

Should I then actually change all the Constant 32-bit aliasing rules so that it only aliases with itself, do you think?

nhaehnle accepted this revision. Mar 15 2019, 1:32 PM

LGTM, whether you do the NoAlias between constant 32-bit and buffer address space here or separately. (Making NoAlias between constant 32-bit and the other existing address spaces should definitely be a different patch.)

lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
65

I think that's a good idea, but it should be a separate patch.

This revision is now accepted and ready to land. Mar 15 2019, 1:32 PM
arsenm added inline comments. Mar 15 2019, 2:57 PM
lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
65

It also seems to me like it should be renamed, since there's a more specific purpose in mind than just a 32-bit pointer for constant memory.

sheredom marked 10 inline comments as done. Mar 18 2019, 6:48 AM
sheredom added inline comments.
lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
65

Agreed on both your comments; I'll submit that change as a separate patch.

This revision was automatically updated to reflect the committed changes.
sheredom marked an inline comment as done.