This is an archive of the discontinued LLVM Phabricator instance.

Differential D148957

[AMDGPU] Add intrinsic for converting global pointers to resources
Closed, Public

Authored by krzysz00 on Apr 21 2023, 1:46 PM.

Details

Reviewers

arsenm
foad
nhaehnle
piotr
rampitec
fhahn
jdoerfert

Commits

rG23098bd4542e: [AMDGPU] Add intrinsic for converting global pointers to resources

Summary

Define the function @llvm.amdgcn.make.buffer.rsrc, which takes a 64-bit
pointer, the 16-bit stride/swizzling constant that replaces the high 16
bits of an address in a buffer resource, the 32-bit extent/number of
elements, and the 32-bit flags (the latter two being the 3rd and 4th
words of the resource), and combines them into a ptr addrspace(8).
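
For readers unfamiliar with the descriptor layout, the packing described above can be sketched in plain C++ on host integers. This is only an illustration of the bit arithmetic: the helper name `makeBufferRsrcWords` and its parameter names are hypothetical, and the actual lowering in this patch operates on virtual registers and DAG nodes rather than on `uint32_t` values.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper (not part of LLVM): models the four 32-bit words of a
// buffer resource as described in the summary.
std::array<uint32_t, 4> makeBufferRsrcWords(uint64_t base, uint16_t stride,
                                            uint32_t numRecords,
                                            uint32_t flags) {
  // Word 0: low 32 bits of the base pointer.
  uint32_t lowHalf = static_cast<uint32_t>(base);
  // Word 1: keep the low 16 bits of the pointer's upper half...
  uint32_t highHalf = static_cast<uint32_t>(base >> 32) & 0x0000ffffu;
  // ...and place the stride/swizzle field in the top 16 bits.
  uint32_t shiftedStride = static_cast<uint32_t>(stride) << 16;
  // Words 2 and 3: NumRecords/extent and flags, passed through unchanged.
  return {lowHalf, highHalf | shiftedStride, numRecords, flags};
}
```

For a raw buffer (stride 0), the second word is simply the pointer's upper half with its top 16 bits cleared.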

This intrinsic is lowered during the early phases of the backend.

This intrinsic is needed so that alias analysis can correctly infer
that a certain buffer resource points to the same memory as some
global pointer. Previous methods of constructing buffer resources,
which relied on ptrtoint, would not allow for such an inference.

Depends on D148184

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

krzysz00 created this revision. (Apr 21 2023, 1:46 PM)

Herald added a project: Restricted Project. (Apr 21 2023, 1:46 PM)

Herald added subscribers: kosarev, StephenFan, kerbowa and 8 others.

krzysz00 requested review of this revision. (Apr 21 2023, 1:46 PM)

Herald added subscribers: llvm-commits, wdng.

Harbormaster completed remote builds in B227317: Diff 515904. (Apr 21 2023, 1:46 PM)

gandhi21299 added a subscriber: gandhi21299. (Apr 24 2023, 9:50 AM)

gandhi21299 added inline comments.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4432	Might be helpful to have an assertion on the number of operands of `MI`. Is it possible that any of the operands is not a register?

arsenm added inline comments. (Apr 24 2023, 11:55 AM)

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1009–1010	There's no reason to have different intrinsics for different source address spaces. Just accept a type mangling operand for the input pointer

arsenm added inline comments. (Apr 24 2023, 11:57 AM)

llvm/lib/Analysis/ValueTracking.cpp
5804	Typo necassarily

arsenm added inline comments. (Apr 24 2023, 12:02 PM)

llvm/lib/Analysis/ValueTracking.cpp
5810–5811	This handling needs a test (I'm assuming that was the intent of ptr-buffer-alias-scheduling.ll, but I think we also need a pure IR one that doesn't depend on codegen)
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4460–4461	You can combine all of these createGenericVirtualRegister calls like: auto ExtStride = B.buildAnyExt(S32, Stride)

krzysz00 added inline comments. (Apr 24 2023, 1:28 PM)

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1009–1010	1. Does "any pointer" work for different address spaces? The documentation's a bit fuzzy. 2. If we're accepting arbitrary pointers, will we then need to, during legalization, reject pointer types that don't make sense (ex. LDS)?
llvm/lib/Analysis/ValueTracking.cpp
5810–5811	The test over in LICM is handling this, though there might be a more straightforward way to do it.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4432	From what I can tell of all the surrounding code ... no?

arsenm added inline comments. (Apr 25 2023, 3:48 AM)

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1009–1010	1. Yes. 2. Yes. Ideally we would have a target IR verifier for these sorts of things. In general we just get selection errors for weird things like this. If you just handle any 64-bit pointer I think it will work out that way without having to do anything special

arsenm added inline comments. (Apr 25 2023, 1:53 PM)

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4432	Only if any operands are immarg, which they aren't (IIRC this was a MachineVerifier check which is missing)

Merge intrinsics into one variadic one, add negative test for 32-bit pointers

Harbormaster completed remote builds in B228090: Diff 516896. (Apr 25 2023, 2:11 PM)

One more rebasing commit

Harbormaster completed remote builds in B228155: Diff 516981. (Apr 25 2023, 5:17 PM)

Code simplifications - thanks for the info, Matt!

krzysz00 marked an inline comment as done. (Apr 26 2023, 9:00 AM)

krzysz00 added inline comments.

llvm/lib/Analysis/ValueTracking.cpp
5810–5811	Unless you can think of a "purer" way to check this
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4432	Ok so this is basically fine as is, sounds like

Harbormaster completed remote builds in B228314: Diff 517203. (Apr 26 2023, 9:40 AM)

What should happen if I do something like:

  %alloca = ...
  %cast = addrspacecast ptr addrspace(5) %alloca to ptr
  %buffer = call ptr addrspace(8) @llvm.amdgcn.as.buffer.rsrc.p0(ptr %cast ...)

Now you can do evil things like access other lanes' private stack items. Should we define this to poison or something?

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
998	Drop this comment, the i8 reference is vestigial
999	make_buffer_rsrc? as makes it sound like a regular cast?

Re the evil code: you're right, but I could also write that as

%alloca = ...
%cast = addrspacecast ptr addrspace(5) %alloca to ptr
%as.int = ptrtoint ptr %cast to i64
%ext = zext i64 %as.int to i128
%buffer.int = or i128 %ext, [...]
; Either
%buffer = inttoptr i128 %buffer.int to ptr addrspace(8)
; or
%buffer = bitcast i128 %buffer.int to <4 x i32>
; Muck around in everyone else's stack, etc etc.
%buffer = call ptr addrspace(8) @llvm.amdgcn.as.buffer.rsrc.p0(ptr %cast ...)

As to what we should *do* about it ... I'm not 100% sure, but it seems to me that, from an IR perspective, this sort of cracking open the stack is allowed, but using it to step on something that's not in bounds of your pointer isn't.
So the conversion itself is valid, but you'd get undef (conceptually) from trying to read things you shouldn't.

In other words, I'd argue IR that takes alloca() results and builds the buffer manually is the same kind of syntactically unavoidable but semantically forbidden behavior as

%alloca = alloca [1 x i8]
%poke = getelementptr i8, ptr %alloca, i64 1024
store i64 ..., ptr %poke

even though creating %poke itself is allowed (if I understand the semantics of LLVM IR correctly).

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
999	I'm thinking of it as an `addrspacecast` that takes arguments, hence the naming, but I'm not too tied to the name.

arsenm added inline comments. (May 1 2023, 2:03 PM)

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4442	I thought you had to do B.buildUnmerge({S32, S32}, Pointer)?
4450–4451	Do you really need the version that returns APInt and the register, or can you use the one that returns int64_t?
4452	Can do !StrideConst
4454	StrideConst
4455	can you just get out of APInt?
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
8678	Should this be *ConstStride != 0?
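
The `*ConstStride != 0` question comes down to how an optional converts to `bool`: the conversion reports whether a value is present, not whether that value is nonzero. A minimal standalone sketch follows, assuming `ConstStride` is a `std::optional`-like value (as `StrideConst` is in the legalizer); `needsStrideOr` and `holdsValue` are hypothetical names used only for illustration and are not LLVM functions.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the lowering's decision: returns true when the
// stride bits must be OR'd into the descriptor, i.e. when the stride is
// unknown or is known to be nonzero.
bool needsStrideOr(const std::optional<uint64_t> &strideConst) {
  // `!strideConst` only tests for absence; the value needs its own check.
  return !strideConst || *strideConst != 0;
}

// The pitfall: an optional holding 0 is still "true" in a boolean context.
bool holdsValue(const std::optional<uint64_t> &strideConst) {
  return static_cast<bool>(strideConst);
}
```

With this shape, a known-zero stride skips the extra OR, while an unknown or nonzero stride takes the general path.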

Adding people who most recently touched/reviewed the isIntrinsicReturningPointerAliasingArgumentWithoutCapturing function for their input on whether my patch to it is correct. ( @fhahn in particular since it looks like MustPreserveNullness is yours)

Update address space documentation, address some review comments

krzysz00 marked an inline comment as not done. (May 3 2023, 5:06 PM)

krzysz00 added inline comments.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4450–4451	Having looked around, the `APInt` version seems to do things like look through chains of sext/trunc/copy/... and otherwise does that sort of constant folding. That might be worth it?

Harbormaster completed remote builds in B229859: Diff 519308. (May 3 2023, 6:32 PM)

Rename Extent to NumRecords to match ISA docs.

Harbormaster completed remote builds in B232083: Diff 522297. (May 15 2023, 3:01 PM)

arsenm added inline comments. (May 16 2023, 7:38 AM)

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
999	I think "make" or "create" would be better
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.as.buffer.rsrc.ll
2 ↗	(On Diff #522297)	Why split the dag and globalisel versions of the tests? The dag version should also use an explicit -global-isel=0 run line

Rename intrinsic per review comments

Harbormaster completed remote builds in B232619: Diff 523060. (May 17 2023, 9:41 AM)

arsenm accepted this revision. (May 23 2023, 3:32 AM)

arsenm added inline comments.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
8686	Hardcoding this to an i32 constant is fine instead of going through DAG.getShiftAmountConstant
8693–8694	can fold to direct return

This revision is now accepted and ready to land. (May 23 2023, 3:32 AM)

loveme00835 added a subscriber: loveme00835. (May 31 2023, 4:42 PM)

Closed by commit rG23098bd4542e: [AMDGPU] Add intrinsic for converting global pointers to resources (authored by krzysz00). (Jun 5 2023, 10:08 AM)

This revision was automatically updated to reflect the committed changes.

krzysz00 added a commit: rG23098bd4542e: [AMDGPU] Add intrinsic for converting global pointers to resources.

chapuni added a subscriber: chapuni. (Jun 5 2023, 2:13 PM)

chapuni added inline comments.

llvm/test/CodeGen/AMDGPU/make-buffer-rsrc-lds-fails.ll
2–3	Would they really crash? I guess they require +asserts.

Revision Contents

Path	Size
llvm/docs/AMDGPUUsage.rst	21 lines
llvm/include/llvm/IR/IntrinsicsAMDGPU.td	10 lines
llvm/lib/Analysis/ValueTracking.cpp	11 lines
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h	3 lines
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp	47 lines
llvm/lib/Target/AMDGPU/SIISelLowering.h	4 lines
llvm/lib/Target/AMDGPU/SIISelLowering.cpp	45 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.make.buffer.rsrc.ll	232 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.make.buffer.rsrc.ll	163 lines
llvm/test/CodeGen/AMDGPU/make-buffer-rsrc-lds-fails.ll	8 lines
llvm/test/CodeGen/AMDGPU/ptr-buffer-alias-scheduling.ll	61 lines
llvm/test/Transforms/LICM/AMDGPU/buffer-rsrc-ptrs.ll	43 lines

Diff 523060

llvm/docs/AMDGPUUsage.rst

@@ 767 lines not shown @@ Private
 :ref:`amdgpu-amdhsa-initial-kernel-execution-state`).

 Scratch memory can be accessed in an interleaved manner using buffer
 instructions with the scratch buffer descriptor and per wavefront scratch
 offset, by the scratch instructions, or by flat instructions. Multi-dword
 access is not supported except by flat and scratch instructions in
 GFX9-GFX11.

+Code that manipulates the stack values in other lanes of a wavefront,
+such as by `addrspacecast`ing stack pointers to generic ones and taking offsets
+that reach other lanes or by explicitly constructing the scratch buffer descriptor,
+triggers undefined behavior when it modifies the scratch values of other lanes.
+The compiler may assume that such modifications do not occur.
+
 Constant 32-bit
 TODO

 Buffer Fat Pointer
 The buffer fat pointer is an experimental address space that is currently
 unsupported in the backend. It exposes a non-integral pointer that is in
 the future intended to support the modelling of 128-bit buffer descriptors
 plus a 32-bit offset into the buffer (in total encapsulating a 160-bit
 pointer), allowing normal LLVM load/store/atomic operations to be used to
 model the buffer descriptors used heavily in graphics workloads targeting
 the backend.

 The buffer descriptor used to construct a buffer fat pointer must be raw:
 the stride must be 0, the "add tid" flag bust be 0, the swizzle enable bits
 must be off, and the extent must be measured in bytes. (On subtargets where
 bounds checking may be disabled, buffer fat pointers may choose to enable
 it or not).

 Buffer Resource
-The buffer resource is an experimental address space that is currently unsupported
-in the backend. It exposes a non-integral pointer that will represent a 128-bit
-buffer descriptor resource.
+The buffer resource pointer, in address space 8, is the newer form
+for representing buffer descriptors in AMDGPU IR, replacing their
+previous representation as `<4 x i32>`. It is a non-integral pointer
+that represents a 128-bit buffer descriptor resource (`V#`).

 Since, in general, a buffer resource supports complex addressing modes that cannot
 be easily represented in LLVM (such as implicit swizzled access to structured
 buffers), it is illegal to perform non-trivial address computations, such as
 ``getelementptr`` operations, on buffer resources. They may be passed to
 AMDGPU buffer intrinsics, and they may be converted to and from ``i128``.

 Casting a buffer resource to a bufer fat pointer is permitted and adds an offset
 of 0.

+Buffer resources can be created from 64-bit pointers (which should be either
+generic or global) using the `llvm.amdgcn.make.buffer.rsrc` intrinsic, which
+takes the pointer, which becomes the base of the resource,
+the 16-bit stride (and swzizzle control) field stored in bits `63:48` of a `V#`,
+the 32-bit NumRecords/extent field (bits `95:64`), and the 32-bit flags field
+(bits `127:96`). The specific interpretation of these fields varies by the
+target architecture and is detailed in the ISA descriptions.
+
 Streamout Registers
 Dedicated registers used by the GS NGG Streamout Instructions. The register
 file is modelled as a memory in a distinct address space because it is indexed
 by an address-like offset in place of named registers, and because register
 accesses affect LGKMcnt. This is an internal address space used only by the
 compiler. Do not use this address space for IR pointers.

 .. _amdgpu-memory-scopes:
@@ 14,402 lines not shown @@

llvm/include/llvm/IR/IntrinsicsAMDGPU.td

@@ 989 lines not shown @@

 // Data type for buffer resources (V#). Maybe, in the future, we can create a
 // similar one for textures (T#).
 class AMDGPUBufferRsrcTy<LLVMType data_ty = llvm_any_ty>
   : LLVMQualPointerType<data_ty, 8>;

 let TargetPrefix = "amdgcn" in {

+def int_amdgcn_make_buffer_rsrc : DefaultAttrsIntrinsic <
+  [AMDGPUBufferRsrcTy<llvm_i8_ty>],
+  [llvm_anyptr_ty, // base
+   llvm_i16_ty,    // stride (and swizzle control)
+   llvm_i32_ty,    // NumRecords / extent
+   llvm_i32_ty],   // flags
+  // Attributes lifted from ptrmask + some extra argument attributes.
+  [IntrNoMem, NoCapture<ArgIndex<0>>, ReadNone<ArgIndex<0>>,
+   IntrSpeculatable, IntrWillReturn]>;
+
 defset list<AMDGPURsrcIntrinsic> AMDGPUBufferIntrinsics = {

 class AMDGPUBufferLoad<LLVMType data_ty = llvm_any_ty> : DefaultAttrsIntrinsic <
   [data_ty],
   [llvm_v4i32_ty, // rsrc(SGPR)
    llvm_i32_ty,   // vindex(VGPR)
    llvm_i32_ty,   // offset(SGPR/VGPR/imm)
    llvm_i1_ty,    // glc(imm)
    llvm_i1_ty],   // slc(imm)
   [IntrReadMem, ImmArg<ArgIndex<3>>, ImmArg<ArgIndex<4>>], "", [SDNPMemOperand]>,
   AMDGPURsrcIntrinsic<0>;
@@ 1,726 lines not shown @@

llvm/lib/Analysis/ValueTracking.cpp

@@ 48 lines not shown @@
 #include "llvm/IR/GlobalValue.h"
 #include "llvm/IR/GlobalVariable.h"
 #include "llvm/IR/InstrTypes.h"
 #include "llvm/IR/Instruction.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/Intrinsics.h"
 #include "llvm/IR/IntrinsicsAArch64.h"
+#include "llvm/IR/IntrinsicsAMDGPU.h"
 #include "llvm/IR/IntrinsicsRISCV.h"
 #include "llvm/IR/IntrinsicsX86.h"
 #include "llvm/IR/LLVMContext.h"
 #include "llvm/IR/Metadata.h"
 #include "llvm/IR/Module.h"
 #include "llvm/IR/Operator.h"
 #include "llvm/IR/PatternMatch.h"
 #include "llvm/IR/Type.h"
@@ 5,727 lines not shown @@

 bool llvm::isIntrinsicReturningPointerAliasingArgumentWithoutCapturing(
     const CallBase *Call, bool MustPreserveNullness) {
   switch (Call->getIntrinsicID()) {
   case Intrinsic::launder_invariant_group:
   case Intrinsic::strip_invariant_group:
   case Intrinsic::aarch64_irg:
   case Intrinsic::aarch64_tagp:
+  // The amdgcn_make_buffer_rsrc function does not alter the address of the
+  // input pointer (and thus preserve null-ness for the purposes of escape
+  // analysis, which is where the MustPreserveNullness flag comes in to play).
+  // However, it will not necessarily map ptr addrspace(N) null to ptr
+  // addrspace(8) null, aka the "null descriptor", which has "all loads return
+  // 0, all stores are dropped" semantics. Given the context of this intrinsic
+  // list, no one should be relying on such a strict interpretation of
+  // MustPreserveNullness (and, at time of writing, they are not), but we
+  // document this fact out of an abundance of caution.
+  case Intrinsic::amdgcn_make_buffer_rsrc:
     return true;
   case Intrinsic::ptrmask:
     return !MustPreserveNullness;
   default:
     return false;
   }
 }

 /// \p PN defines a loop-variant pointer to an object. Check if the
@@ 3,001 lines not shown @@

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.h

@@ 96 lines not shown @@ bool legalizeCTLZ_CTTZ(MachineInstr &MI, MachineRegisterInfo &MRI,
                          MachineIRBuilder &B) const;

   bool loadInputValue(Register DstReg, MachineIRBuilder &B,
                       const ArgDescriptor *Arg,
                       const TargetRegisterClass *ArgRC, LLT ArgTy) const;
   bool loadInputValue(Register DstReg, MachineIRBuilder &B,
                       AMDGPUFunctionArgInfo::PreloadedValue ArgType) const;

+  bool legalizePointerAsRsrcIntrin(MachineInstr &MI, MachineRegisterInfo &MRI,
+                                   MachineIRBuilder &B) const;
+
   bool legalizePreloadedArgIntrin(
       MachineInstr &MI, MachineRegisterInfo &MRI, MachineIRBuilder &B,
       AMDGPUFunctionArgInfo::PreloadedValue ArgType) const;
   bool legalizeWorkitemIDIntrinsic(
       MachineInstr &MI, MachineRegisterInfo &MRI, MachineIRBuilder &B,
       unsigned Dim, AMDGPUFunctionArgInfo::PreloadedValue ArgType) const;

   Register getKernargParameterPtr(MachineIRBuilder &B, int64_t Offset) const;
@@ 101 lines not shown @@

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp

Show All 18 Lines
#include "AMDGPUTargetMachine.h"		#include "AMDGPUTargetMachine.h"
#include "SIMachineFunctionInfo.h"		#include "SIMachineFunctionInfo.h"
#include "Utils/AMDGPUBaseInfo.h"		#include "Utils/AMDGPUBaseInfo.h"
#include "llvm/ADT/ScopeExit.h"		#include "llvm/ADT/ScopeExit.h"
#include "llvm/BinaryFormat/ELF.h"		#include "llvm/BinaryFormat/ELF.h"
#include "llvm/CodeGen/GlobalISel/LegalizerHelper.h"		#include "llvm/CodeGen/GlobalISel/LegalizerHelper.h"
#include "llvm/CodeGen/GlobalISel/MIPatternMatch.h"		#include "llvm/CodeGen/GlobalISel/MIPatternMatch.h"
#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"		#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
		#include "llvm/CodeGen/GlobalISel/Utils.h"
#include "llvm/IR/DiagnosticInfo.h"		#include "llvm/IR/DiagnosticInfo.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"		#include "llvm/IR/IntrinsicsAMDGPU.h"
#include "llvm/IR/IntrinsicsR600.h"		#include "llvm/IR/IntrinsicsR600.h"

#define DEBUG_TYPE "amdgpu-legalinfo"		#define DEBUG_TYPE "amdgpu-legalinfo"

using namespace llvm;		using namespace llvm;
using namespace LegalizeActions;		using namespace LegalizeActions;
▲ Show 20 Lines • Show All 4,383 Lines • ▼ Show 20 Lines	if (!loadInputValue(KernargPtrReg, B,
AMDGPUFunctionArgInfo::KERNARG_SEGMENT_PTR))		AMDGPUFunctionArgInfo::KERNARG_SEGMENT_PTR))
return false;		return false;

// FIXME: This should be nuw		// FIXME: This should be nuw
B.buildPtrAdd(DstReg, KernargPtrReg, B.buildConstant(IdxTy, Offset).getReg(0));		B.buildPtrAdd(DstReg, KernargPtrReg, B.buildConstant(IdxTy, Offset).getReg(0));
return true;		return true;
}		}

		/// To create a buffer resource from a 64-bit pointer, mask off the upper 32
		/// bits of the pointer and replace them with the stride argument, then
		/// merge_values everything together. In the common case of a raw buffer (the
		/// stride component is 0), we can just AND off the upper half.
		bool AMDGPULegalizerInfo::legalizePointerAsRsrcIntrin(
		MachineInstr &MI, MachineRegisterInfo &MRI, MachineIRBuilder &B) const {
		gandhi21299Unsubmitted Done Reply Inline Actions Might be helpful to have an assertion on the number of operands of `MI`. Is it possible that any of the operands is not a register? gandhi21299: Might be helpful to have an assertion on the number of operands of `MI`. Is it possible that…
		krzysz00AuthorUnsubmitted Done Reply Inline Actions From what I can tell of all the surrounding code ... no? krzysz00: From what I can tell of all the surrounding code ... no?
		arsenmUnsubmitted Done Reply Inline Actions Only if any operands are immarg, which they aren't (IIRC this was a MachineVerifier check which is missing) arsenm: Only if any operands are immarg, which they aren't (IIRC this was a MachineVerifier check which…
		krzysz00AuthorUnsubmitted Done Reply Inline Actions Ok so this is basically fine as is, sounds like krzysz00: Ok so this is basically fine as is, sounds like
		Register Result = MI.getOperand(0).getReg();
		Register Pointer = MI.getOperand(2).getReg();
		Register Stride = MI.getOperand(3).getReg();
		Register NumRecords = MI.getOperand(4).getReg();
		Register Flags = MI.getOperand(5).getReg();

		LLT S32 = LLT::scalar(32);

		B.setInsertPt(B.getMBB(), ++B.getInsertPt());
		auto Unmerge = B.buildUnmerge(S32, Pointer);
		arsenmUnsubmitted Done Reply Inline Actions I thought you had to do B.buildUnmerge({S32, S32}, Pointer)? arsenm: I thought you had to do B.buildUnmerge({S32, S32}, Pointer)?
		Register LowHalf = Unmerge.getReg(0);
		Register HighHalf = Unmerge.getReg(1);

		auto AndMask = B.buildConstant(S32, 0x0000ffff);
		auto Masked = B.buildAnd(S32, HighHalf, AndMask);

		MachineInstrBuilder NewHighHalf = Masked;
		std::optional<ValueAndVReg> StrideConst =
		getIConstantVRegValWithLookThrough(Stride, MRI);
		arsenmUnsubmitted Not Done Reply Inline Actions Do you need really need the version that returns APInt and the register, or can you use the one that returns int64_t? arsenm: Do you need really need the version that returns APInt and the register, or can you use the one…
		krzysz00AuthorUnsubmitted Not Done Reply Inline Actions Having looked around, the `APInt` version seems to do things like look through chains of sext/trunc/copy/... and otherwise does that sort of constant folding. That might be worth it? krzysz00: Having looked around, the `APInt` version seems to do things like look through chains of…
		if (!StrideConst \|\| !StrideConst->Value.isZero()) {
		arsenmUnsubmitted Done Reply Inline Actions Can do !StrideConst arsenm: Can do !StrideConst
		MachineInstrBuilder ShiftedStride;
		if (StrideConst) {
		arsenmUnsubmitted Done Reply Inline Actions StrideConst arsenm: StrideConst
		uint32_t StrideVal = StrideConst->Value.getZExtValue();
		arsenmUnsubmitted Done Reply Inline Actions can you just get out of APInt? arsenm: can you just get out of APInt?
		uint32_t ShiftedStrideVal = StrideVal << 16;
		ShiftedStride = B.buildConstant(S32, ShiftedStrideVal);
		} else {
		auto ExtStride = B.buildAnyExt(S32, Stride);
		auto ShiftConst = B.buildConstant(S32, 16);
		ShiftedStride = B.buildShl(S32, ExtStride, ShiftConst);
		arsenmUnsubmitted Done Reply Inline Actions You can combine all of these createGenericVirtualRegister calls like: auto ExtStride = B.buildAnyExt(S32, Stride) arsenm: You can combine all of these createGenericVirtualRegister calls like: ``` auto ExtStride = B.
		}
		NewHighHalf = B.buildOr(S32, Masked, ShiftedStride);
		}
		Register NewHighHalfReg = NewHighHalf.getReg(0);
		B.buildMergeValues(Result, {LowHalf, NewHighHalfReg, NumRecords, Flags});
		MI.eraseFromParent();
		return true;
		}

bool AMDGPULegalizerInfo::legalizeImplicitArgPtr(MachineInstr &MI,		bool AMDGPULegalizerInfo::legalizeImplicitArgPtr(MachineInstr &MI,
MachineRegisterInfo &MRI,		MachineRegisterInfo &MRI,
MachineIRBuilder &B) const {		MachineIRBuilder &B) const {
const SIMachineFunctionInfo *MFI = B.getMF().getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *MFI = B.getMF().getInfo<SIMachineFunctionInfo>();
if (!MFI->isEntryFunction()) {		if (!MFI->isEntryFunction()) {
return legalizePreloadedArgIntrin(MI, MRI, B,		return legalizePreloadedArgIntrin(MI, MRI, B,
AMDGPUFunctionArgInfo::IMPLICIT_ARG_PTR);		AMDGPUFunctionArgInfo::IMPLICIT_ARG_PTR);
}		}
▲ Show 20 Lines • Show All 1,520 Lines • ▼ Show 20 Lines	if (MachineInstr *BrCond =
MI.eraseFromParent();		MI.eraseFromParent();
BrCond->eraseFromParent();		BrCond->eraseFromParent();
MRI.setRegClass(Reg, TRI->getWaveMaskRegClass());		MRI.setRegClass(Reg, TRI->getWaveMaskRegClass());
return true;		return true;
}		}

return false;		return false;
}		}
		case Intrinsic::amdgcn_make_buffer_rsrc:
		return legalizePointerAsRsrcIntrin(MI, MRI, B);
case Intrinsic::amdgcn_kernarg_segment_ptr:		case Intrinsic::amdgcn_kernarg_segment_ptr:
if (!AMDGPU::isKernel(B.getMF().getFunction().getCallingConv())) {		if (!AMDGPU::isKernel(B.getMF().getFunction().getCallingConv())) {
// This only makes sense to call in a kernel, so just lower to null.		// This only makes sense to call in a kernel, so just lower to null.
B.buildConstant(MI.getOperand(0).getReg(), 0);		B.buildConstant(MI.getOperand(0).getReg(), 0);
MI.eraseFromParent();		MI.eraseFromParent();
return true;		return true;
}		}


llvm/lib/Target/AMDGPU/SIISelLowering.h

Show First 20 Lines • Show All 252 Lines • ▼ Show 20 Lines	private:

// Convert the i128 that an addrspace(8) pointer is natively represented as		// Convert the i128 that an addrspace(8) pointer is natively represented as
// into the v4i32 that all the buffer intrinsics expect to receive. We can't		// into the v4i32 that all the buffer intrinsics expect to receive. We can't
// add register classes for i128 on pain of the promotion logic going haywire,		// add register classes for i128 on pain of the promotion logic going haywire,
// so this slightly ugly hack is what we've got. If passed a non-pointer		// so this slightly ugly hack is what we've got. If passed a non-pointer
// argument (as would be seen in older buffer intrinsics), does nothing.		// argument (as would be seen in older buffer intrinsics), does nothing.
SDValue bufferRsrcPtrToVector(SDValue MaybePointer, SelectionDAG &DAG) const;		SDValue bufferRsrcPtrToVector(SDValue MaybePointer, SelectionDAG &DAG) const;

		// Wrap a 64-bit pointer into a v4i32 (which is how all SelectionDAG code
		// represents ptr addrspace(8)) using the flags specified in the intrinsic.
		SDValue lowerPointerAsRsrcIntrin(SDNode *Op, SelectionDAG &DAG) const;

// Handle 8 bit and 16 bit buffer loads		// Handle 8 bit and 16 bit buffer loads
SDValue handleByteShortBufferLoads(SelectionDAG &DAG, EVT LoadVT, SDLoc DL,		SDValue handleByteShortBufferLoads(SelectionDAG &DAG, EVT LoadVT, SDLoc DL,
ArrayRef<SDValue> Ops, MemSDNode *M) const;		ArrayRef<SDValue> Ops, MemSDNode *M) const;

// Handle 8 bit and 16 bit buffer stores		// Handle 8 bit and 16 bit buffer stores
SDValue handleByteShortBufferStores(SelectionDAG &DAG, EVT VDataType,		SDValue handleByteShortBufferStores(SelectionDAG &DAG, EVT VDataType,
SDLoc DL, SDValue Ops[],		SDLoc DL, SDValue Ops[],
MemSDNode *M) const;		MemSDNode *M) const;

llvm/lib/Target/AMDGPU/SIISelLowering.cpp


Show All 9 Lines
/// Custom DAG lowering for SI		/// Custom DAG lowering for SI
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "SIISelLowering.h"		#include "SIISelLowering.h"
#include "AMDGPU.h"		#include "AMDGPU.h"
#include "AMDGPUInstrInfo.h"		#include "AMDGPUInstrInfo.h"
#include "AMDGPUTargetMachine.h"		#include "AMDGPUTargetMachine.h"
		#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
#include "SIMachineFunctionInfo.h"		#include "SIMachineFunctionInfo.h"
#include "SIRegisterInfo.h"		#include "SIRegisterInfo.h"
		#include "llvm/ADT/APInt.h"
#include "llvm/ADT/FloatingPointMode.h"		#include "llvm/ADT/FloatingPointMode.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"		#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/Analysis/UniformityAnalysis.h"		#include "llvm/Analysis/UniformityAnalysis.h"
#include "llvm/BinaryFormat/ELF.h"		#include "llvm/BinaryFormat/ELF.h"
#include "llvm/CodeGen/Analysis.h"		#include "llvm/CodeGen/Analysis.h"
#include "llvm/CodeGen/FunctionLoweringInfo.h"		#include "llvm/CodeGen/FunctionLoweringInfo.h"
#include "llvm/CodeGen/GlobalISel/GISelKnownBits.h"		#include "llvm/CodeGen/GlobalISel/GISelKnownBits.h"
▲ Show 20 Lines • Show All 5,063 Lines • ▼ Show 20 Lines	void SITargetLowering::ReplaceNodeResults(SDNode *N,
case ISD::EXTRACT_VECTOR_ELT: {		case ISD::EXTRACT_VECTOR_ELT: {
if (SDValue Res = lowerEXTRACT_VECTOR_ELT(SDValue(N, 0), DAG))		if (SDValue Res = lowerEXTRACT_VECTOR_ELT(SDValue(N, 0), DAG))
Results.push_back(Res);		Results.push_back(Res);
return;		return;
}		}
case ISD::INTRINSIC_WO_CHAIN: {		case ISD::INTRINSIC_WO_CHAIN: {
unsigned IID = cast<ConstantSDNode>(N->getOperand(0))->getZExtValue();		unsigned IID = cast<ConstantSDNode>(N->getOperand(0))->getZExtValue();
switch (IID) {		switch (IID) {
		case Intrinsic::amdgcn_make_buffer_rsrc:
		Results.push_back(lowerPointerAsRsrcIntrin(N, DAG));
		return;
case Intrinsic::amdgcn_cvt_pkrtz: {		case Intrinsic::amdgcn_cvt_pkrtz: {
SDValue Src0 = N->getOperand(1);		SDValue Src0 = N->getOperand(1);
SDValue Src1 = N->getOperand(2);		SDValue Src1 = N->getOperand(2);
SDLoc SL(N);		SDLoc SL(N);
SDValue Cvt = DAG.getNode(AMDGPUISD::CVT_PKRTZ_F16_F32, SL, MVT::i32,		SDValue Cvt = DAG.getNode(AMDGPUISD::CVT_PKRTZ_F16_F32, SL, MVT::i32,
Src0, Src1);		Src0, Src1);
Results.push_back(DAG.getNode(ISD::BITCAST, SL, MVT::v2f16, Cvt));		Results.push_back(DAG.getNode(ISD::BITCAST, SL, MVT::v2f16, Cvt));
return;		return;
▲ Show 20 Lines • Show All 3,500 Lines • ▼ Show 20 Lines
// Analyze a combined offset from an amdgcn_buffer_ intrinsic and store the		// Analyze a combined offset from an amdgcn_buffer_ intrinsic and store the
// three offsets (voffset, soffset and instoffset) into the SDValue[3] array		// three offsets (voffset, soffset and instoffset) into the SDValue[3] array
// pointed to by Offsets.		// pointed to by Offsets.
void SITargetLowering::setBufferOffsets(SDValue CombinedOffset,		void SITargetLowering::setBufferOffsets(SDValue CombinedOffset,
SelectionDAG &DAG, SDValue *Offsets,		SelectionDAG &DAG, SDValue *Offsets,
Align Alignment) const {		Align Alignment) const {
const SIInstrInfo *TII = getSubtarget()->getInstrInfo();		const SIInstrInfo *TII = getSubtarget()->getInstrInfo();
SDLoc DL(CombinedOffset);		SDLoc DL(CombinedOffset);
if (auto C = dyn_cast<ConstantSDNode>(CombinedOffset)) {		if (auto *C = dyn_cast<ConstantSDNode>(CombinedOffset)) {
uint32_t Imm = C->getZExtValue();		uint32_t Imm = C->getZExtValue();
uint32_t SOffset, ImmOffset;		uint32_t SOffset, ImmOffset;
if (TII->splitMUBUFOffset(Imm, SOffset, ImmOffset, Alignment)) {		if (TII->splitMUBUFOffset(Imm, SOffset, ImmOffset, Alignment)) {
Offsets[0] = DAG.getConstant(0, DL, MVT::i32);		Offsets[0] = DAG.getConstant(0, DL, MVT::i32);
Offsets[1] = DAG.getConstant(SOffset, DL, MVT::i32);		Offsets[1] = DAG.getConstant(SOffset, DL, MVT::i32);
Offsets[2] = DAG.getTargetConstant(ImmOffset, DL, MVT::i32);		Offsets[2] = DAG.getTargetConstant(ImmOffset, DL, MVT::i32);
return;		return;
}		}
Show All 22 Lines	if (!MaybePointer.getValueType().isScalarInteger())
return MaybePointer;		return MaybePointer;

SDLoc DL(MaybePointer);		SDLoc DL(MaybePointer);

SDValue Rsrc = DAG.getBitcast(MVT::v4i32, MaybePointer);		SDValue Rsrc = DAG.getBitcast(MVT::v4i32, MaybePointer);
return Rsrc;		return Rsrc;
}		}

		// Wrap a global or flat pointer into a buffer resource using the stride,
		// number of records, and flags specified in the intrinsic.
		SDValue SITargetLowering::lowerPointerAsRsrcIntrin(SDNode *Op,
		SelectionDAG &DAG) const {
		SDLoc Loc(Op);

		SDValue Pointer = Op->getOperand(1);
		SDValue Stride = Op->getOperand(2);
		SDValue NumRecords = Op->getOperand(3);
		SDValue Flags = Op->getOperand(4);

		auto [LowHalf, HighHalf] = DAG.SplitScalar(Pointer, Loc, MVT::i32, MVT::i32);
		SDValue Mask = DAG.getConstant(0x0000ffff, Loc, MVT::i32);
		SDValue Masked = DAG.getNode(ISD::AND, Loc, MVT::i32, HighHalf, Mask);
		std::optional<uint32_t> ConstStride = std::nullopt;
		if (auto *ConstNode = dyn_cast<ConstantSDNode>(Stride))
		ConstStride = ConstNode->getZExtValue();

		SDValue NewHighHalf = Masked;
		if (!ConstStride || *ConstStride != 0) {
		arsenm (Done): Should this be *ConstStride != 0?
		SDValue ShiftedStride;
		if (ConstStride) {
		ShiftedStride = DAG.getConstant(*ConstStride << 16, Loc, MVT::i32);
		} else {
		SDValue ExtStride = DAG.getAnyExtOrTrunc(Stride, Loc, MVT::i32);
		ShiftedStride =
		DAG.getNode(ISD::SHL, Loc, MVT::i32, ExtStride,
		DAG.getShiftAmountConstant(16, MVT::i32, Loc));
		arsenm (Not Done): Hardcoding this to an i32 constant is fine instead of going through DAG.getShiftAmountConstant
		}
		NewHighHalf = DAG.getNode(ISD::OR, Loc, MVT::i32, Masked, ShiftedStride);
		}

		SDValue Rsrc = DAG.getNode(ISD::BUILD_VECTOR, Loc, MVT::v4i32, LowHalf,
		NewHighHalf, NumRecords, Flags);
		SDValue RsrcPtr = DAG.getNode(ISD::BITCAST, Loc, MVT::i128, Rsrc);
		return RsrcPtr;
		arsenm (Not Done): can fold to direct return
		}
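
The SelectionDAG path builds a v4i32 and bitcasts it to i128, while `bufferRsrcPtrToVector` goes the other way. As a rough illustration of what those bitcasts mean on a little-endian target, the sketch below maps four 32-bit words to a 128-bit value and back. `I128`, `wordsToI128`, and `i128ToWords` are hypothetical names introduced here, and the two-field struct is only a stand-in for a real 128-bit integer.

```cpp
#include <array>
#include <cstdint>

// Stand-in for an i128 value (hypothetical type, for illustration only).
struct I128 {
  uint64_t Lo; // bits 63:0
  uint64_t Hi; // bits 127:64
};

// Little-endian bitcast v4i32 -> i128: element i occupies bits [32*i, 32*i+31].
constexpr I128 wordsToI128(const std::array<uint32_t, 4> &W) {
  return {W[0] | (static_cast<uint64_t>(W[1]) << 32),
          W[2] | (static_cast<uint64_t>(W[3]) << 32)};
}

// Inverse direction, i128 -> v4i32, as buffer intrinsics expect to receive it.
constexpr std::array<uint32_t, 4> i128ToWords(I128 V) {
  return {static_cast<uint32_t>(V.Lo), static_cast<uint32_t>(V.Lo >> 32),
          static_cast<uint32_t>(V.Hi), static_cast<uint32_t>(V.Hi >> 32)};
}
```

Under this layout the pointer's low half and the masked high half with the stride land in the low 64 bits, and NumRecords and Flags land in the high 64 bits.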

// Handle 8 bit and 16 bit buffer loads		// Handle 8 bit and 16 bit buffer loads
SDValue SITargetLowering::handleByteShortBufferLoads(SelectionDAG &DAG,		SDValue SITargetLowering::handleByteShortBufferLoads(SelectionDAG &DAG,
EVT LoadVT, SDLoc DL,		EVT LoadVT, SDLoc DL,
ArrayRef<SDValue> Ops,		ArrayRef<SDValue> Ops,
MemSDNode *M) const {		MemSDNode *M) const {
EVT IntVT = LoadVT.changeTypeToInteger();		EVT IntVT = LoadVT.changeTypeToInteger();
unsigned Opc = (LoadVT.getScalarType() == MVT::i8) ?		unsigned Opc = (LoadVT.getScalarType() == MVT::i8) ?
AMDGPUISD::BUFFER_LOAD_UBYTE : AMDGPUISD::BUFFER_LOAD_USHORT;		AMDGPUISD::BUFFER_LOAD_UBYTE : AMDGPUISD::BUFFER_LOAD_USHORT;

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.make.buffer.rsrc.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 2
				; RUN: llc -global-isel -march=amdgcn -mcpu=gfx900 -verify-machineinstrs -stop-after=instruction-select < %s | FileCheck %s

				define amdgpu_ps ptr addrspace(8) @basic_raw_buffer(ptr inreg %p) {
				; CHECK-LABEL: name: basic_raw_buffer
				; CHECK: bb.1 (%ir-block.0):
				; CHECK-NEXT: liveins: $sgpr0, $sgpr1
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr0
				; CHECK-NEXT: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr1
				; CHECK-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 5678
				; CHECK-NEXT: [[S_MOV_B32_1:%[0-9]+]]:sreg_32 = S_MOV_B32 1234
				; CHECK-NEXT: [[S_MOV_B32_2:%[0-9]+]]:sreg_32 = S_MOV_B32 65535
				; CHECK-NEXT: [[S_AND_B32_:%[0-9]+]]:sreg_32 = S_AND_B32 [[COPY1]], [[S_MOV_B32_2]], implicit-def $scc
				; CHECK-NEXT: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]]
				; CHECK-NEXT: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32 = V_READFIRSTLANE_B32 [[COPY2]], implicit $exec
				; CHECK-NEXT: $sgpr0 = COPY [[V_READFIRSTLANE_B32_]]
				; CHECK-NEXT: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[S_AND_B32_]]
				; CHECK-NEXT: [[V_READFIRSTLANE_B32_1:%[0-9]+]]:sreg_32 = V_READFIRSTLANE_B32 [[COPY3]], implicit $exec
				; CHECK-NEXT: $sgpr1 = COPY [[V_READFIRSTLANE_B32_1]]
				; CHECK-NEXT: [[COPY4:%[0-9]+]]:vgpr_32 = COPY [[S_MOV_B32_1]]
				; CHECK-NEXT: [[V_READFIRSTLANE_B32_2:%[0-9]+]]:sreg_32 = V_READFIRSTLANE_B32 [[COPY4]], implicit $exec
				; CHECK-NEXT: $sgpr2 = COPY [[V_READFIRSTLANE_B32_2]]
				; CHECK-NEXT: [[COPY5:%[0-9]+]]:vgpr_32 = COPY [[S_MOV_B32_]]
				; CHECK-NEXT: [[V_READFIRSTLANE_B32_3:%[0-9]+]]:sreg_32 = V_READFIRSTLANE_B32 [[COPY5]], implicit $exec
				; CHECK-NEXT: $sgpr3 = COPY [[V_READFIRSTLANE_B32_3]]
				; CHECK-NEXT: SI_RETURN_TO_EPILOG implicit $sgpr0, implicit $sgpr1, implicit $sgpr2, implicit $sgpr3
				%rsrc = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr %p, i16 0, i32 1234, i32 5678)
				ret ptr addrspace(8) %rsrc
				}

				define amdgpu_ps float @read_raw_buffer(ptr addrspace(1) inreg %p) {
				; CHECK-LABEL: name: read_raw_buffer
				; CHECK: bb.1 (%ir-block.0):
				; CHECK-NEXT: liveins: $sgpr0, $sgpr1
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr0
				; CHECK-NEXT: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr1
				; CHECK-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 0
				; CHECK-NEXT: [[S_MOV_B32_1:%[0-9]+]]:sreg_32 = S_MOV_B32 65535
				; CHECK-NEXT: [[S_AND_B32_:%[0-9]+]]:sreg_32 = S_AND_B32 [[COPY1]], [[S_MOV_B32_1]], implicit-def $scc
				; CHECK-NEXT: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY]], %subreg.sub0, [[S_AND_B32_]], %subreg.sub1, [[S_MOV_B32_]], %subreg.sub2, [[S_MOV_B32_]], %subreg.sub3
				; CHECK-NEXT: [[BUFFER_LOAD_DWORD_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET [[REG_SEQUENCE]], [[S_MOV_B32_]], 4, 0, 0, implicit $exec :: (dereferenceable load (s32) from %ir.rsrc, align 1, addrspace 8)
				; CHECK-NEXT: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFSET]]
				; CHECK-NEXT: SI_RETURN_TO_EPILOG implicit $vgpr0
				%rsrc = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p1(ptr addrspace(1) %p, i16 0, i32 0, i32 0)
				%loaded = call float @llvm.amdgcn.raw.ptr.buffer.load(ptr addrspace(8) %rsrc, i32 4, i32 0, i32 0)
				ret float %loaded
				}

				define amdgpu_ps ptr addrspace(8) @basic_struct_buffer(ptr inreg %p) {
				; CHECK-LABEL: name: basic_struct_buffer
				; CHECK: bb.1 (%ir-block.0):
				; CHECK-NEXT: liveins: $sgpr0, $sgpr1
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr0
				; CHECK-NEXT: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr1
				; CHECK-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 5678
				; CHECK-NEXT: [[S_MOV_B32_1:%[0-9]+]]:sreg_32 = S_MOV_B32 1234
				; CHECK-NEXT: [[S_MOV_B32_2:%[0-9]+]]:sreg_32 = S_MOV_B32 65535
				; CHECK-NEXT: [[S_AND_B32_:%[0-9]+]]:sreg_32 = S_AND_B32 [[COPY1]], [[S_MOV_B32_2]], implicit-def $scc
				; CHECK-NEXT: [[S_MOV_B32_3:%[0-9]+]]:sreg_32 = S_MOV_B32 262144
				; CHECK-NEXT: [[S_OR_B32_:%[0-9]+]]:sreg_32 = S_OR_B32 [[S_AND_B32_]], [[S_MOV_B32_3]], implicit-def $scc
				; CHECK-NEXT: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]]
				; CHECK-NEXT: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32 = V_READFIRSTLANE_B32 [[COPY2]], implicit $exec
				; CHECK-NEXT: $sgpr0 = COPY [[V_READFIRSTLANE_B32_]]
				; CHECK-NEXT: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[S_OR_B32_]]
				; CHECK-NEXT: [[V_READFIRSTLANE_B32_1:%[0-9]+]]:sreg_32 = V_READFIRSTLANE_B32 [[COPY3]], implicit $exec
				; CHECK-NEXT: $sgpr1 = COPY [[V_READFIRSTLANE_B32_1]]
				; CHECK-NEXT: [[COPY4:%[0-9]+]]:vgpr_32 = COPY [[S_MOV_B32_1]]
				; CHECK-NEXT: [[V_READFIRSTLANE_B32_2:%[0-9]+]]:sreg_32 = V_READFIRSTLANE_B32 [[COPY4]], implicit $exec
				; CHECK-NEXT: $sgpr2 = COPY [[V_READFIRSTLANE_B32_2]]
				; CHECK-NEXT: [[COPY5:%[0-9]+]]:vgpr_32 = COPY [[S_MOV_B32_]]
				; CHECK-NEXT: [[V_READFIRSTLANE_B32_3:%[0-9]+]]:sreg_32 = V_READFIRSTLANE_B32 [[COPY5]], implicit $exec
				; CHECK-NEXT: $sgpr3 = COPY [[V_READFIRSTLANE_B32_3]]
				; CHECK-NEXT: SI_RETURN_TO_EPILOG implicit $sgpr0, implicit $sgpr1, implicit $sgpr2, implicit $sgpr3
				%rsrc = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr %p, i16 4, i32 1234, i32 5678)
				ret ptr addrspace(8) %rsrc
				}

				define amdgpu_ps ptr addrspace(8) @variable_top_half(ptr inreg %p, i32 inreg %numVals, i32 inreg %flags) {
				; CHECK-LABEL: name: variable_top_half
				; CHECK: bb.1 (%ir-block.0):
				; CHECK-NEXT: liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr0
				; CHECK-NEXT: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr1
				; CHECK-NEXT: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr2
				; CHECK-NEXT: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr3
				; CHECK-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 65535
				; CHECK-NEXT: [[S_AND_B32_:%[0-9]+]]:sreg_32 = S_AND_B32 [[COPY1]], [[S_MOV_B32_]], implicit-def $scc
				; CHECK-NEXT: [[S_MOV_B32_1:%[0-9]+]]:sreg_32 = S_MOV_B32 262144
				; CHECK-NEXT: [[S_OR_B32_:%[0-9]+]]:sreg_32 = S_OR_B32 [[S_AND_B32_]], [[S_MOV_B32_1]], implicit-def $scc
				; CHECK-NEXT: [[COPY4:%[0-9]+]]:vgpr_32 = COPY [[COPY]]
				; CHECK-NEXT: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32 = V_READFIRSTLANE_B32 [[COPY4]], implicit $exec
				; CHECK-NEXT: $sgpr0 = COPY [[V_READFIRSTLANE_B32_]]
				; CHECK-NEXT: [[COPY5:%[0-9]+]]:vgpr_32 = COPY [[S_OR_B32_]]
				; CHECK-NEXT: [[V_READFIRSTLANE_B32_1:%[0-9]+]]:sreg_32 = V_READFIRSTLANE_B32 [[COPY5]], implicit $exec
				; CHECK-NEXT: $sgpr1 = COPY [[V_READFIRSTLANE_B32_1]]
				; CHECK-NEXT: [[COPY6:%[0-9]+]]:vgpr_32 = COPY [[COPY2]]
				; CHECK-NEXT: [[V_READFIRSTLANE_B32_2:%[0-9]+]]:sreg_32 = V_READFIRSTLANE_B32 [[COPY6]], implicit $exec
				; CHECK-NEXT: $sgpr2 = COPY [[V_READFIRSTLANE_B32_2]]
				; CHECK-NEXT: [[COPY7:%[0-9]+]]:vgpr_32 = COPY [[COPY3]]
				; CHECK-NEXT: [[V_READFIRSTLANE_B32_3:%[0-9]+]]:sreg_32 = V_READFIRSTLANE_B32 [[COPY7]], implicit $exec
				; CHECK-NEXT: $sgpr3 = COPY [[V_READFIRSTLANE_B32_3]]
				; CHECK-NEXT: SI_RETURN_TO_EPILOG implicit $sgpr0, implicit $sgpr1, implicit $sgpr2, implicit $sgpr3
				%rsrc = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr %p, i16 4, i32 %numVals, i32 %flags)
				ret ptr addrspace(8) %rsrc
				}

				define amdgpu_ps ptr addrspace(8) @general_case(ptr inreg %p, i16 inreg %stride, i32 inreg %numVals, i32 inreg %flags) {
				; CHECK-LABEL: name: general_case
				; CHECK: bb.1 (%ir-block.0):
				; CHECK-NEXT: liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3, $sgpr4
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr0
				; CHECK-NEXT: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr1
				; CHECK-NEXT: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr2
				; CHECK-NEXT: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr3
				; CHECK-NEXT: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr4
				; CHECK-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 65535
				; CHECK-NEXT: [[S_AND_B32_:%[0-9]+]]:sreg_32 = S_AND_B32 [[COPY1]], [[S_MOV_B32_]], implicit-def $scc
				; CHECK-NEXT: [[S_MOV_B32_1:%[0-9]+]]:sreg_32 = S_MOV_B32 16
				; CHECK-NEXT: [[S_LSHL_B32_:%[0-9]+]]:sreg_32 = S_LSHL_B32 [[COPY2]], [[S_MOV_B32_1]], implicit-def $scc
				; CHECK-NEXT: [[S_OR_B32_:%[0-9]+]]:sreg_32 = S_OR_B32 [[S_AND_B32_]], [[S_LSHL_B32_]], implicit-def $scc
				; CHECK-NEXT: [[COPY5:%[0-9]+]]:vgpr_32 = COPY [[COPY]]
				; CHECK-NEXT: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32 = V_READFIRSTLANE_B32 [[COPY5]], implicit $exec
				; CHECK-NEXT: $sgpr0 = COPY [[V_READFIRSTLANE_B32_]]
				; CHECK-NEXT: [[COPY6:%[0-9]+]]:vgpr_32 = COPY [[S_OR_B32_]]
				; CHECK-NEXT: [[V_READFIRSTLANE_B32_1:%[0-9]+]]:sreg_32 = V_READFIRSTLANE_B32 [[COPY6]], implicit $exec
				; CHECK-NEXT: $sgpr1 = COPY [[V_READFIRSTLANE_B32_1]]
				; CHECK-NEXT: [[COPY7:%[0-9]+]]:vgpr_32 = COPY [[COPY3]]
				; CHECK-NEXT: [[V_READFIRSTLANE_B32_2:%[0-9]+]]:sreg_32 = V_READFIRSTLANE_B32 [[COPY7]], implicit $exec
				; CHECK-NEXT: $sgpr2 = COPY [[V_READFIRSTLANE_B32_2]]
				; CHECK-NEXT: [[COPY8:%[0-9]+]]:vgpr_32 = COPY [[COPY4]]
				; CHECK-NEXT: [[V_READFIRSTLANE_B32_3:%[0-9]+]]:sreg_32 = V_READFIRSTLANE_B32 [[COPY8]], implicit $exec
				; CHECK-NEXT: $sgpr3 = COPY [[V_READFIRSTLANE_B32_3]]
				; CHECK-NEXT: SI_RETURN_TO_EPILOG implicit $sgpr0, implicit $sgpr1, implicit $sgpr2, implicit $sgpr3
				%rsrc = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr %p, i16 %stride, i32 %numVals, i32 %flags)
				ret ptr addrspace(8) %rsrc
				}

				define amdgpu_ps float @general_case_load(ptr inreg %p, i16 inreg %stride, i32 inreg %numVals, i32 inreg %flags) {
				; CHECK-LABEL: name: general_case_load
				; CHECK: bb.1 (%ir-block.0):
				; CHECK-NEXT: liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3, $sgpr4
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr0
				; CHECK-NEXT: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr1
				; CHECK-NEXT: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr2
				; CHECK-NEXT: [[COPY3:%[0-9]+]]:sreg_32 = COPY $sgpr3
				; CHECK-NEXT: [[COPY4:%[0-9]+]]:sreg_32 = COPY $sgpr4
				; CHECK-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 65535
				; CHECK-NEXT: [[S_AND_B32_:%[0-9]+]]:sreg_32 = S_AND_B32 [[COPY1]], [[S_MOV_B32_]], implicit-def $scc
				; CHECK-NEXT: [[S_MOV_B32_1:%[0-9]+]]:sreg_32 = S_MOV_B32 16
				; CHECK-NEXT: [[S_LSHL_B32_:%[0-9]+]]:sreg_32 = S_LSHL_B32 [[COPY2]], [[S_MOV_B32_1]], implicit-def $scc
				; CHECK-NEXT: [[S_OR_B32_:%[0-9]+]]:sreg_32 = S_OR_B32 [[S_AND_B32_]], [[S_LSHL_B32_]], implicit-def $scc
				; CHECK-NEXT: [[S_MOV_B32_2:%[0-9]+]]:sreg_32 = S_MOV_B32 0
				; CHECK-NEXT: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY]], %subreg.sub0, [[S_OR_B32_]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3
				; CHECK-NEXT: [[COPY5:%[0-9]+]]:vgpr_32 = COPY [[S_MOV_B32_2]]
				; CHECK-NEXT: [[BUFFER_LOAD_DWORD_IDXEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN [[COPY5]], [[REG_SEQUENCE]], [[S_MOV_B32_2]], 0, 0, 0, implicit $exec :: (dereferenceable load (s32) from %ir.rsrc, align 1, addrspace 8)
				; CHECK-NEXT: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_IDXEN]]
				; CHECK-NEXT: SI_RETURN_TO_EPILOG implicit $vgpr0
				%rsrc = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr %p, i16 %stride, i32 %numVals, i32 %flags)
				%value = call float @llvm.amdgcn.struct.ptr.buffer.load(ptr addrspace(8) %rsrc, i32 0, i32 0, i32 0, i32 0)
				ret float %value
				}

				; None of the components are uniform due to the lack of an inreg
				define amdgpu_ps float @general_case_load_with_waterfall(ptr %p, i16 %stride, i32 %numVals, i32 %flags) {
				; CHECK-LABEL: name: general_case_load_with_waterfall
				; CHECK: bb.1 (%ir-block.0):
				; CHECK-NEXT: successors: %bb.2(0x80000000)
				; CHECK-NEXT: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; CHECK-NEXT: [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr2
				; CHECK-NEXT: [[COPY3:%[0-9]+]]:vgpr_32 = COPY $vgpr3
				; CHECK-NEXT: [[COPY4:%[0-9]+]]:vgpr_32 = COPY $vgpr4
				; CHECK-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 65535
				; CHECK-NEXT: [[COPY5:%[0-9]+]]:vgpr_32 = COPY [[S_MOV_B32_]]
				; CHECK-NEXT: [[S_MOV_B32_1:%[0-9]+]]:sreg_32 = S_MOV_B32 16
				; CHECK-NEXT: [[COPY6:%[0-9]+]]:vgpr_32 = COPY [[S_MOV_B32_1]]
				; CHECK-NEXT: [[V_LSHLREV_B32_e64_:%[0-9]+]]:vgpr_32 = V_LSHLREV_B32_e64 [[COPY6]], [[COPY2]], implicit $exec
				; CHECK-NEXT: [[V_AND_OR_B32_e64_:%[0-9]+]]:vgpr_32 = V_AND_OR_B32_e64 [[COPY1]], [[COPY5]], [[V_LSHLREV_B32_e64_]], implicit $exec
				; CHECK-NEXT: [[S_MOV_B32_2:%[0-9]+]]:sreg_32 = S_MOV_B32 0
				; CHECK-NEXT: [[REG_SEQUENCE:%[0-9]+]]:vreg_128 = REG_SEQUENCE [[COPY]], %subreg.sub0, [[V_AND_OR_B32_e64_]], %subreg.sub1, [[COPY3]], %subreg.sub2, [[COPY4]], %subreg.sub3
				; CHECK-NEXT: [[COPY7:%[0-9]+]]:vgpr_32 = COPY [[S_MOV_B32_2]]
				; CHECK-NEXT: [[S_MOV_B64_:%[0-9]+]]:sreg_64_xexec = S_MOV_B64 $exec
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: bb.2:
				; CHECK-NEXT: successors: %bb.3(0x80000000)
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32 = V_READFIRSTLANE_B32 [[COPY]], implicit $exec
				; CHECK-NEXT: [[V_READFIRSTLANE_B32_1:%[0-9]+]]:sreg_32 = V_READFIRSTLANE_B32 [[V_AND_OR_B32_e64_]], implicit $exec
				; CHECK-NEXT: [[V_READFIRSTLANE_B32_2:%[0-9]+]]:sreg_32 = V_READFIRSTLANE_B32 [[COPY3]], implicit $exec
				; CHECK-NEXT: [[V_READFIRSTLANE_B32_3:%[0-9]+]]:sreg_32 = V_READFIRSTLANE_B32 [[COPY4]], implicit $exec
				; CHECK-NEXT: [[REG_SEQUENCE1:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[V_READFIRSTLANE_B32_]], %subreg.sub0, [[V_READFIRSTLANE_B32_1]], %subreg.sub1, [[V_READFIRSTLANE_B32_2]], %subreg.sub2, [[V_READFIRSTLANE_B32_3]], %subreg.sub3
				; CHECK-NEXT: [[COPY8:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub0_sub1
				; CHECK-NEXT: [[COPY9:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]].sub2_sub3
				; CHECK-NEXT: [[COPY10:%[0-9]+]]:sreg_64 = COPY [[REG_SEQUENCE1]].sub0_sub1
				; CHECK-NEXT: [[COPY11:%[0-9]+]]:sreg_64 = COPY [[REG_SEQUENCE1]].sub2_sub3
				; CHECK-NEXT: [[V_CMP_EQ_U64_e64_:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U64_e64 [[COPY10]], [[COPY8]], implicit $exec
				; CHECK-NEXT: [[V_CMP_EQ_U64_e64_1:%[0-9]+]]:sreg_64_xexec = V_CMP_EQ_U64_e64 [[COPY11]], [[COPY9]], implicit $exec
				; CHECK-NEXT: [[S_AND_B64_:%[0-9]+]]:sreg_64_xexec = S_AND_B64 [[V_CMP_EQ_U64_e64_]], [[V_CMP_EQ_U64_e64_1]], implicit-def $scc
				; CHECK-NEXT: [[S_AND_SAVEEXEC_B64_:%[0-9]+]]:sreg_64_xexec = S_AND_SAVEEXEC_B64 killed [[S_AND_B64_]], implicit-def $exec, implicit-def $scc, implicit $exec
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: bb.3:
				; CHECK-NEXT: successors: %bb.4(0x40000000), %bb.2(0x40000000)
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: [[BUFFER_LOAD_DWORD_IDXEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN [[COPY7]], [[REG_SEQUENCE1]], [[S_MOV_B32_2]], 0, 0, 0, implicit $exec :: (dereferenceable load (s32) from %ir.rsrc, align 1, addrspace 8)
				; CHECK-NEXT: $exec = S_XOR_B64_term $exec, [[S_AND_SAVEEXEC_B64_]], implicit-def $scc
				; CHECK-NEXT: SI_WATERFALL_LOOP %bb.2, implicit $exec
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: bb.4:
				; CHECK-NEXT: successors: %bb.5(0x80000000)
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: $exec = S_MOV_B64_term [[S_MOV_B64_]]
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: bb.5:
				; CHECK-NEXT: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_IDXEN]]
				; CHECK-NEXT: SI_RETURN_TO_EPILOG implicit $vgpr0
				%rsrc = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr %p, i16 %stride, i32 %numVals, i32 %flags)
				%value = call float @llvm.amdgcn.struct.ptr.buffer.load(ptr addrspace(8) %rsrc, i32 0, i32 0, i32 0, i32 0)
				ret float %value
				}

				declare ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr nocapture readnone, i16, i32, i32)
				declare ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p1(ptr addrspace(1) nocapture readnone, i16, i32, i32)
				declare float @llvm.amdgcn.raw.ptr.buffer.load(ptr addrspace(8) nocapture readonly, i32, i32, i32 immarg)
				declare float @llvm.amdgcn.struct.ptr.buffer.load(ptr addrspace(8) nocapture readonly, i32, i32, i32, i32 immarg)
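
The `general_case_load_with_waterfall` checks above show the usual pattern for a divergent resource: read the first active lane's value, compare it against every lane, run the load for the matching lanes, then repeat for the rest. The following is a simplified host-side simulation of that loop, not the compiler's implementation. The function name `waterfall` is hypothetical, and a single 64-bit value per lane stands in for the 128-bit descriptor, which the generated code compares as two 64-bit halves.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Simulates a waterfall loop over one wave of up to 64 lanes.
// Values[i] is lane i's (possibly divergent) descriptor stand-in and Exec is
// the mask of active lanes. Body is called once per distinct value with the
// mask of lanes that share it. Returns the number of loop iterations.
std::size_t waterfall(
    const std::vector<uint64_t> &Values, uint64_t Exec,
    const std::function<void(uint64_t Uniform, uint64_t Lanes)> &Body) {
  // Ignore mask bits for lanes that have no value.
  if (Values.size() < 64)
    Exec &= (uint64_t{1} << Values.size()) - 1;

  std::size_t Iterations = 0;
  while (Exec != 0) {
    // Like V_READFIRSTLANE_B32: take the value from the lowest active lane.
    unsigned First = 0;
    while (((Exec >> First) & 1) == 0)
      ++First;
    uint64_t Uniform = Values[First];

    // Like V_CMP_EQ: find the active lanes holding the same value.
    uint64_t Match = 0;
    for (unsigned Lane = 0; Lane < Values.size() && Lane < 64; ++Lane)
      if (((Exec >> Lane) & 1) != 0 && Values[Lane] == Uniform)
        Match |= uint64_t{1} << Lane;

    // Run the operation with a now-uniform operand for the matching lanes.
    Body(Uniform, Match);

    // Retire the handled lanes, analogous to the exec update after the load.
    Exec &= ~Match;
    ++Iterations;
  }
  return Iterations;
}
```

With `inreg` arguments the operands are already uniform, so the other tests in this file need no such loop.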

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.make.buffer.rsrc.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 2
				; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs -stop-after=amdgpu-isel < %s | FileCheck %s

				define amdgpu_ps ptr addrspace(8) @basic_raw_buffer(ptr inreg %p) {
				; CHECK-LABEL: name: basic_raw_buffer
				; CHECK: bb.0 (%ir-block.0):
				; CHECK-NEXT: liveins: $sgpr0, $sgpr1
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr_32 = COPY $sgpr1
				; CHECK-NEXT: [[COPY1:%[0-9]+]]:sgpr_32 = COPY $sgpr0
				; CHECK-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 65535
				; CHECK-NEXT: [[S_AND_B32_:%[0-9]+]]:sreg_32 = S_AND_B32 [[COPY]], killed [[S_MOV_B32_]], implicit-def dead $scc
				; CHECK-NEXT: [[S_MOV_B32_1:%[0-9]+]]:sreg_32 = S_MOV_B32 1234
				; CHECK-NEXT: [[S_MOV_B32_2:%[0-9]+]]:sreg_32 = S_MOV_B32 5678
				; CHECK-NEXT: $sgpr0 = COPY [[COPY1]]
				; CHECK-NEXT: $sgpr1 = COPY [[S_AND_B32_]]
				; CHECK-NEXT: $sgpr2 = COPY [[S_MOV_B32_1]]
				; CHECK-NEXT: $sgpr3 = COPY [[S_MOV_B32_2]]
				; CHECK-NEXT: SI_RETURN_TO_EPILOG $sgpr0, $sgpr1, $sgpr2, $sgpr3
				%rsrc = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr %p, i16 0, i32 1234, i32 5678)
				ret ptr addrspace(8) %rsrc
				}

				define amdgpu_ps float @read_raw_buffer(ptr addrspace(1) inreg %p) {
				; CHECK-LABEL: name: read_raw_buffer
				; CHECK: bb.0 (%ir-block.0):
				; CHECK-NEXT: liveins: $sgpr0, $sgpr1
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr_32 = COPY $sgpr1
				; CHECK-NEXT: [[COPY1:%[0-9]+]]:sgpr_32 = COPY $sgpr0
				; CHECK-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 65535
				; CHECK-NEXT: [[S_AND_B32_:%[0-9]+]]:sreg_32 = S_AND_B32 [[COPY]], killed [[S_MOV_B32_]], implicit-def dead $scc
				; CHECK-NEXT: [[S_MOV_B32_1:%[0-9]+]]:sreg_32 = S_MOV_B32 0
				; CHECK-NEXT: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY1]], %subreg.sub0, killed [[S_AND_B32_]], %subreg.sub1, [[S_MOV_B32_1]], %subreg.sub2, [[S_MOV_B32_1]], %subreg.sub3
				; CHECK-NEXT: [[BUFFER_LOAD_DWORD_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET killed [[REG_SEQUENCE]], [[S_MOV_B32_1]], 4, 0, 0, implicit $exec :: (dereferenceable load (s32) from %ir.rsrc, align 1, addrspace 8)
				; CHECK-NEXT: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFSET]]
				; CHECK-NEXT: SI_RETURN_TO_EPILOG $vgpr0
				%rsrc = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p1(ptr addrspace(1) %p, i16 0, i32 0, i32 0)
				%loaded = call float @llvm.amdgcn.raw.ptr.buffer.load(ptr addrspace(8) %rsrc, i32 4, i32 0, i32 0)
				ret float %loaded
				}

				define amdgpu_ps ptr addrspace(8) @basic_struct_buffer(ptr inreg %p) {
				; CHECK-LABEL: name: basic_struct_buffer
				; CHECK: bb.0 (%ir-block.0):
				; CHECK-NEXT: liveins: $sgpr0, $sgpr1
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr_32 = COPY $sgpr1
				; CHECK-NEXT: [[COPY1:%[0-9]+]]:sgpr_32 = COPY $sgpr0
				; CHECK-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 65535
				; CHECK-NEXT: [[S_AND_B32_:%[0-9]+]]:sreg_32 = S_AND_B32 [[COPY]], killed [[S_MOV_B32_]], implicit-def dead $scc
				; CHECK-NEXT: [[S_MOV_B32_1:%[0-9]+]]:sreg_32 = S_MOV_B32 262144
				; CHECK-NEXT: [[S_OR_B32_:%[0-9]+]]:sreg_32 = S_OR_B32 killed [[S_AND_B32_]], killed [[S_MOV_B32_1]], implicit-def dead $scc
				; CHECK-NEXT: [[S_MOV_B32_2:%[0-9]+]]:sreg_32 = S_MOV_B32 1234
				; CHECK-NEXT: [[S_MOV_B32_3:%[0-9]+]]:sreg_32 = S_MOV_B32 5678
				; CHECK-NEXT: $sgpr0 = COPY [[COPY1]]
				; CHECK-NEXT: $sgpr1 = COPY [[S_OR_B32_]]
				; CHECK-NEXT: $sgpr2 = COPY [[S_MOV_B32_2]]
				; CHECK-NEXT: $sgpr3 = COPY [[S_MOV_B32_3]]
				; CHECK-NEXT: SI_RETURN_TO_EPILOG $sgpr0, $sgpr1, $sgpr2, $sgpr3
				%rsrc = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr %p, i16 4, i32 1234, i32 5678)
				ret ptr addrspace(8) %rsrc
				}

				define amdgpu_ps ptr addrspace(8) @variable_top_half(ptr inreg %p, i32 inreg %numVals, i32 inreg %flags) {
				; CHECK-LABEL: name: variable_top_half
				; CHECK: bb.0 (%ir-block.0):
				; CHECK-NEXT: liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr_32 = COPY $sgpr3
				; CHECK-NEXT: [[COPY1:%[0-9]+]]:sgpr_32 = COPY $sgpr2
				; CHECK-NEXT: [[COPY2:%[0-9]+]]:sgpr_32 = COPY $sgpr1
				; CHECK-NEXT: [[COPY3:%[0-9]+]]:sgpr_32 = COPY $sgpr0
				; CHECK-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 65535
				; CHECK-NEXT: [[S_AND_B32_:%[0-9]+]]:sreg_32 = S_AND_B32 [[COPY2]], killed [[S_MOV_B32_]], implicit-def dead $scc
				; CHECK-NEXT: [[S_MOV_B32_1:%[0-9]+]]:sreg_32 = S_MOV_B32 262144
				; CHECK-NEXT: [[S_OR_B32_:%[0-9]+]]:sreg_32 = S_OR_B32 killed [[S_AND_B32_]], killed [[S_MOV_B32_1]], implicit-def dead $scc
				; CHECK-NEXT: $sgpr0 = COPY [[COPY3]]
				; CHECK-NEXT: $sgpr1 = COPY [[S_OR_B32_]]
				; CHECK-NEXT: $sgpr2 = COPY [[COPY1]]
				; CHECK-NEXT: $sgpr3 = COPY [[COPY]]
				; CHECK-NEXT: SI_RETURN_TO_EPILOG $sgpr0, $sgpr1, $sgpr2, $sgpr3
				%rsrc = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr %p, i16 4, i32 %numVals, i32 %flags)
				ret ptr addrspace(8) %rsrc
				}

				define amdgpu_ps ptr addrspace(8) @general_case(ptr inreg %p, i16 inreg %stride, i32 inreg %numVals, i32 inreg %flags) {
				; CHECK-LABEL: name: general_case
				; CHECK: bb.0 (%ir-block.0):
				; CHECK-NEXT: liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3, $sgpr4
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr_32 = COPY $sgpr4
				; CHECK-NEXT: [[COPY1:%[0-9]+]]:sgpr_32 = COPY $sgpr3
				; CHECK-NEXT: [[COPY2:%[0-9]+]]:sgpr_32 = COPY $sgpr2
				; CHECK-NEXT: [[COPY3:%[0-9]+]]:sgpr_32 = COPY $sgpr1
				; CHECK-NEXT: [[COPY4:%[0-9]+]]:sgpr_32 = COPY $sgpr0
				; CHECK-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 65535
				; CHECK-NEXT: [[S_AND_B32_:%[0-9]+]]:sreg_32 = S_AND_B32 [[COPY3]], killed [[S_MOV_B32_]], implicit-def dead $scc
				; CHECK-NEXT: [[S_LSHL_B32_:%[0-9]+]]:sreg_32 = S_LSHL_B32 [[COPY2]], 16, implicit-def dead $scc
				; CHECK-NEXT: [[S_OR_B32_:%[0-9]+]]:sreg_32 = S_OR_B32 killed [[S_AND_B32_]], killed [[S_LSHL_B32_]], implicit-def dead $scc
				; CHECK-NEXT: $sgpr0 = COPY [[COPY4]]
				; CHECK-NEXT: $sgpr1 = COPY [[S_OR_B32_]]
				; CHECK-NEXT: $sgpr2 = COPY [[COPY1]]
				; CHECK-NEXT: $sgpr3 = COPY [[COPY]]
				; CHECK-NEXT: SI_RETURN_TO_EPILOG $sgpr0, $sgpr1, $sgpr2, $sgpr3
				%rsrc = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr %p, i16 %stride, i32 %numVals, i32 %flags)
				ret ptr addrspace(8) %rsrc
				}

				define amdgpu_ps float @general_case_load(ptr inreg %p, i16 inreg %stride, i32 inreg %numVals, i32 inreg %flags) {
				; CHECK-LABEL: name: general_case_load
				; CHECK: bb.0 (%ir-block.0):
				; CHECK-NEXT: liveins: $sgpr0, $sgpr1, $sgpr2, $sgpr3, $sgpr4
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: [[COPY:%[0-9]+]]:sgpr_32 = COPY $sgpr4
				; CHECK-NEXT: [[COPY1:%[0-9]+]]:sgpr_32 = COPY $sgpr3
				; CHECK-NEXT: [[COPY2:%[0-9]+]]:sgpr_32 = COPY $sgpr2
				; CHECK-NEXT: [[COPY3:%[0-9]+]]:sgpr_32 = COPY $sgpr1
				; CHECK-NEXT: [[COPY4:%[0-9]+]]:sgpr_32 = COPY $sgpr0
				; CHECK-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 65535
				; CHECK-NEXT: [[S_AND_B32_:%[0-9]+]]:sreg_32 = S_AND_B32 [[COPY3]], killed [[S_MOV_B32_]], implicit-def dead $scc
				; CHECK-NEXT: [[S_LSHL_B32_:%[0-9]+]]:sreg_32 = S_LSHL_B32 [[COPY2]], 16, implicit-def dead $scc
				; CHECK-NEXT: [[S_OR_B32_:%[0-9]+]]:sreg_32 = S_OR_B32 killed [[S_AND_B32_]], killed [[S_LSHL_B32_]], implicit-def dead $scc
				; CHECK-NEXT: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY4]], %subreg.sub0, killed [[S_OR_B32_]], %subreg.sub1, [[COPY1]], %subreg.sub2, [[COPY]], %subreg.sub3
				; CHECK-NEXT: [[S_MOV_B32_1:%[0-9]+]]:sreg_32 = S_MOV_B32 0
				; CHECK-NEXT: [[COPY5:%[0-9]+]]:vgpr_32 = COPY [[S_MOV_B32_1]]
				; CHECK-NEXT: [[BUFFER_LOAD_DWORD_IDXEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN [[COPY5]], killed [[REG_SEQUENCE]], [[S_MOV_B32_1]], 0, 0, 0, implicit $exec :: (dereferenceable load (s32) from %ir.rsrc, align 1, addrspace 8)
				; CHECK-NEXT: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_IDXEN]]
				; CHECK-NEXT: SI_RETURN_TO_EPILOG $vgpr0
				%rsrc = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr %p, i16 %stride, i32 %numVals, i32 %flags)
				%value = call float @llvm.amdgcn.struct.ptr.buffer.load(ptr addrspace(8) %rsrc, i32 0, i32 0, i32 0, i32 0)
				ret float %value
				}

				; None of the components are uniform due to the lack of an inreg
				define amdgpu_ps float @general_case_load_with_waterfall(ptr %p, i16 %stride, i32 %numVals, i32 %flags) {
				; CHECK-LABEL: name: general_case_load_with_waterfall
				; CHECK: bb.0 (%ir-block.0):
				; CHECK-NEXT: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4
				; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr4
				; CHECK-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr3
				; CHECK-NEXT: [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr2
				; CHECK-NEXT: [[COPY3:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; CHECK-NEXT: [[COPY4:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; CHECK-NEXT: [[V_LSHLREV_B32_e64_:%[0-9]+]]:vgpr_32 = V_LSHLREV_B32_e64 16, [[COPY2]], implicit $exec
				; CHECK-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 65535
				; CHECK-NEXT: [[V_AND_OR_B32_e64_:%[0-9]+]]:vgpr_32 = V_AND_OR_B32_e64 [[COPY3]], killed [[S_MOV_B32_]], killed [[V_LSHLREV_B32_e64_]], implicit $exec
				; CHECK-NEXT: [[REG_SEQUENCE:%[0-9]+]]:sgpr_128 = REG_SEQUENCE [[COPY4]], %subreg.sub0, killed [[V_AND_OR_B32_e64_]], %subreg.sub1, [[COPY1]], %subreg.sub2, [[COPY]], %subreg.sub3
				; CHECK-NEXT: [[S_MOV_B32_1:%[0-9]+]]:sreg_32 = S_MOV_B32 0
				; CHECK-NEXT: [[COPY5:%[0-9]+]]:vgpr_32 = COPY [[S_MOV_B32_1]]
				; CHECK-NEXT: [[BUFFER_LOAD_DWORD_IDXEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN [[COPY5]], killed [[REG_SEQUENCE]], [[S_MOV_B32_1]], 0, 0, 0, implicit $exec :: (dereferenceable load (s32) from %ir.rsrc, align 1, addrspace 8)
				; CHECK-NEXT: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_IDXEN]]
				; CHECK-NEXT: SI_RETURN_TO_EPILOG $vgpr0
				%rsrc = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr %p, i16 %stride, i32 %numVals, i32 %flags)
				%value = call float @llvm.amdgcn.struct.ptr.buffer.load(ptr addrspace(8) %rsrc, i32 0, i32 0, i32 0, i32 0)
				ret float %value
				}

				declare ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr nocapture readnone, i16, i32, i32)
				declare ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p1(ptr addrspace(1) nocapture readnone, i16, i32, i32)
				declare float @llvm.amdgcn.raw.ptr.buffer.load(ptr addrspace(8) nocapture readonly, i32, i32, i32 immarg)
				declare float @llvm.amdgcn.struct.ptr.buffer.load(ptr addrspace(8) nocapture readonly, i32, i32, i32, i32 immarg)
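
Note on the expected output above: the CHECK lines for variable_top_half and general_case imply a simple word layout for the resource. The sketch below models that layout in plain C++ so the constants are easier to follow. It is only an illustration read off the CHECK lines, not the backend's lowering code; the helper name packBufferRsrc and its parameter names are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper, not part of LLVM. It mirrors the four 32-bit words
// that the CHECK lines expect in $sgpr0..$sgpr3.
constexpr std::array<uint32_t, 4> packBufferRsrc(uint64_t ptr, uint16_t stride,
                                                 uint32_t numRecords,
                                                 uint32_t flags) {
  // Word 0: low 32 bits of the pointer (copied through unchanged).
  uint32_t lo = static_cast<uint32_t>(ptr);
  // Word 1: low 16 bits of the pointer's high half (S_AND_B32 with 65535),
  // OR'd with the stride shifted into the top 16 bits (S_LSHL_B32 by 16).
  uint32_t hi = static_cast<uint32_t>(ptr >> 32);
  uint32_t word1 = (hi & 0xFFFFu) | (static_cast<uint32_t>(stride) << 16);
  // Words 2 and 3: extent and flags, passed through as-is.
  return {lo, word1, numRecords, flags};
}

// A constant stride of 4 yields the S_MOV_B32 262144 seen in variable_top_half.
static_assert(packBufferRsrc(0, 4, 0, 0)[1] == 262144u, "4 << 16 == 262144");
```

With a constant stride the shift folds into an immediate, which is why variable_top_half has no S_LSHL_B32 while general_case does.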

llvm/test/CodeGen/AMDGPU/make-buffer-rsrc-lds-fails.ll

This file was added.

				; RUN: not --crash llc -march=amdgcn -mcpu=gfx900 < %s
				; RUN: not --crash llc -global-isel -march=amdgcn -mcpu=gfx900 < %s

chapuni: Would they really crash? I guess they require +asserts.
				define amdgpu_ps ptr addrspace(8) @basic_raw_buffer(ptr addrspace(3) inreg %p) {
				%rsrc = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p3(ptr addrspace(3) %p, i16 0, i32 1234, i32 5678)
				ret ptr addrspace(8) %rsrc
				}
				declare ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p3(ptr addrspace(3) nocapture readnone, i16, i32, i32)

llvm/test/CodeGen/AMDGPU/ptr-buffer-alias-scheduling.ll

[...]

		%l3 = call float @llvm.amdgcn.raw.ptr.buffer.load.f32(ptr addrspace(8) %a, i32 12, i32 0, i32 0)
		%s3 = fmul float %l3, %l3
		call void @llvm.amdgcn.raw.ptr.buffer.store.f32(float %s3, ptr addrspace(8) %b, i32 12, i32 0, i32 0)

		ret void
		}

		define amdgpu_kernel void @buffers_from_flat_dont_alias(ptr noalias %a.flat, ptr noalias %b.flat) {
		; SDAG-LABEL: buffers_from_flat_dont_alias:
		; SDAG: ; %bb.0:
		; SDAG-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
		; SDAG-NEXT: s_mov_b32 s7, 0
		; SDAG-NEXT: s_mov_b32 s6, 16
		; SDAG-NEXT: s_waitcnt lgkmcnt(0)
		; SDAG-NEXT: s_and_b32 s5, s1, 0xffff
		; SDAG-NEXT: s_mov_b32 s4, s0
		; SDAG-NEXT: buffer_load_dwordx4 v[0:3], off, s[4:7], 0
		; SDAG-NEXT: s_and_b32 s5, s3, 0xffff
		; SDAG-NEXT: s_mov_b32 s4, s2
		; SDAG-NEXT: s_waitcnt vmcnt(0)
		; SDAG-NEXT: v_mul_f32_e32 v0, v0, v0
		; SDAG-NEXT: v_mul_f32_e32 v1, v1, v1
		; SDAG-NEXT: v_mul_f32_e32 v2, v2, v2
		; SDAG-NEXT: v_mul_f32_e32 v3, v3, v3
		; SDAG-NEXT: buffer_store_dwordx4 v[0:3], off, s[4:7], 0
		; SDAG-NEXT: s_endpgm
		;
		; GISEL-LABEL: buffers_from_flat_dont_alias:
		; GISEL: ; %bb.0:
		; GISEL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
		; GISEL-NEXT: s_mov_b32 s7, 0
		; GISEL-NEXT: s_mov_b32 s6, 16
		; GISEL-NEXT: s_waitcnt lgkmcnt(0)
		; GISEL-NEXT: s_and_b32 s5, s1, 0xffff
		; GISEL-NEXT: s_mov_b32 s4, s0
		; GISEL-NEXT: buffer_load_dwordx4 v[0:3], off, s[4:7], 0
		; GISEL-NEXT: s_and_b32 s5, s3, 0xffff
		; GISEL-NEXT: s_mov_b32 s4, s2
		; GISEL-NEXT: s_waitcnt vmcnt(0)
		; GISEL-NEXT: v_mul_f32_e32 v0, v0, v0
		; GISEL-NEXT: v_mul_f32_e32 v1, v1, v1
		; GISEL-NEXT: v_mul_f32_e32 v2, v2, v2
		; GISEL-NEXT: v_mul_f32_e32 v3, v3, v3
		; GISEL-NEXT: buffer_store_dwordx4 v[0:3], off, s[4:7], 0
		; GISEL-NEXT: s_endpgm
		%a = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr %a.flat, i16 0, i32 16, i32 0)
		%b = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr %b.flat, i16 0, i32 16, i32 0)

		%l0 = call float @llvm.amdgcn.raw.ptr.buffer.load.f32(ptr addrspace(8) %a, i32 0, i32 0, i32 0)
		%s0 = fmul float %l0, %l0
		call void @llvm.amdgcn.raw.ptr.buffer.store.f32(float %s0, ptr addrspace(8) %b, i32 0, i32 0, i32 0)

		%l1 = call float @llvm.amdgcn.raw.ptr.buffer.load.f32(ptr addrspace(8) %a, i32 4, i32 0, i32 0)
		%s1 = fmul float %l1, %l1
		call void @llvm.amdgcn.raw.ptr.buffer.store.f32(float %s1, ptr addrspace(8) %b, i32 4, i32 0, i32 0)

		%l2 = call float @llvm.amdgcn.raw.ptr.buffer.load.f32(ptr addrspace(8) %a, i32 8, i32 0, i32 0)
		%s2 = fmul float %l2, %l2
		call void @llvm.amdgcn.raw.ptr.buffer.store.f32(float %s2, ptr addrspace(8) %b, i32 8, i32 0, i32 0)

		%l3 = call float @llvm.amdgcn.raw.ptr.buffer.load.f32(ptr addrspace(8) %a, i32 12, i32 0, i32 0)
		%s3 = fmul float %l3, %l3
		call void @llvm.amdgcn.raw.ptr.buffer.store.f32(float %s3, ptr addrspace(8) %b, i32 12, i32 0, i32 0)

		ret void
		}

		define amdgpu_kernel void @buffers_might_alias(ptr addrspace(8) %a, ptr addrspace(8) %b) {
		; SDAG-LABEL: buffers_might_alias:
		; SDAG: ; %bb.0:
		; SDAG-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x24
		; SDAG-NEXT: s_waitcnt lgkmcnt(0)
		; SDAG-NEXT: buffer_load_dword v0, off, s[0:3], 0
		; SDAG-NEXT: s_waitcnt vmcnt(0)
		; SDAG-NEXT: v_mul_f32_e32 v0, v0, v0
[...]

		ret void
		}

		declare i32 @llvm.amdgcn.workitem.id.x()

		declare float @llvm.amdgcn.raw.ptr.buffer.load.f32(ptr addrspace(8), i32, i32, i32)
		declare void @llvm.amdgcn.raw.ptr.buffer.store.f32(float, ptr addrspace(8), i32, i32, i32 immarg)
		declare ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p0(ptr readnone nocapture, i16, i32, i32)

llvm/test/Transforms/LICM/AMDGPU/buffer-rsrc-ptrs.ll

[...]

		%next = add i32 %i, 1
		%cond = icmp ult i32 %next, %bound
		br i1 %cond, label %loop, label %tail
		tail:
		ret void
		}

		define void @hoistable_buffer_construction_intrinsic(ptr addrspace(1) noalias %p.global, ptr addrspace(1) noalias %q.global, i32 %bound) {
		; CHECK-LABEL: define void @hoistable_buffer_construction_intrinsic
		; CHECK-SAME: (ptr addrspace(1) noalias [[P_GLOBAL:%.*]], ptr addrspace(1) noalias [[Q_GLOBAL:%.*]], i32 [[BOUND:%.*]]) {
		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[P:%.*]] = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p1(ptr addrspace(1) [[P_GLOBAL]], i16 0, i32 0, i32 0)
		; CHECK-NEXT: [[Q:%.*]] = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p1(ptr addrspace(1) [[Q_GLOBAL]], i16 0, i32 0, i32 0)
		; CHECK-NEXT: [[HOISTABLE:%.*]] = call i32 @llvm.amdgcn.struct.ptr.buffer.load.i32(ptr addrspace(8) [[Q]], i32 0, i32 0, i32 0, i32 0)
		; CHECK-NEXT: br label [[LOOP:%.*]]
		; CHECK: loop:
		; CHECK-NEXT: [[I:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[NEXT:%.*]], [[LOOP]] ]
		; CHECK-NEXT: [[ORIG:%.*]] = call i32 @llvm.amdgcn.raw.ptr.buffer.load.i32(ptr addrspace(8) [[P]], i32 [[I]], i32 0, i32 0)
		; CHECK-NEXT: [[INC:%.*]] = add i32 [[HOISTABLE]], [[ORIG]]
		; CHECK-NEXT: call void @llvm.amdgcn.raw.ptr.buffer.store.i32(i32 [[INC]], ptr addrspace(8) [[P]], i32 [[I]], i32 0, i32 0)
		; CHECK-NEXT: [[NEXT]] = add i32 [[I]], 1
		; CHECK-NEXT: [[COND:%.*]] = icmp ult i32 [[NEXT]], [[BOUND]]
		; CHECK-NEXT: br i1 [[COND]], label [[LOOP]], label [[TAIL:%.*]]
		; CHECK: tail:
		; CHECK-NEXT: ret void
		;
		entry:
		%p = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p1(ptr addrspace(1) %p.global, i16 0, i32 0, i32 0)
		%q = call ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p1(ptr addrspace(1) %q.global, i16 0, i32 0, i32 0)
		br label %loop
		loop:
		%i = phi i32 [0, %entry], [%next, %loop]

		%hoistable = call i32 @llvm.amdgcn.struct.ptr.buffer.load.i32(ptr addrspace(8) %q, i32 0, i32 0, i32 0, i32 0)
		%orig = call i32 @llvm.amdgcn.raw.ptr.buffer.load.i32(ptr addrspace(8) %p, i32 %i, i32 0, i32 0)
		%inc = add i32 %hoistable, %orig
		call void @llvm.amdgcn.raw.ptr.buffer.store.i32(i32 %inc, ptr addrspace(8) %p, i32 %i, i32 0, i32 0)

		%next = add i32 %i, 1
		%cond = icmp ult i32 %next, %bound
		br i1 %cond, label %loop, label %tail
		tail:
		ret void
		}


		define void @hoistable_buffer_construction_alias_scope(ptr addrspace(1) %p.global, ptr addrspace(1) %q.global, i32 %bound) {
		; CHECK-LABEL: define void @hoistable_buffer_construction_alias_scope
		; CHECK-SAME: (ptr addrspace(1) [[P_GLOBAL:%.*]], ptr addrspace(1) [[Q_GLOBAL:%.*]], i32 [[BOUND:%.*]]) {
		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[P_GLOBAL_INT:%.*]] = ptrtoint ptr addrspace(1) [[P_GLOBAL]] to i64
		; CHECK-NEXT: [[Q_GLOBAL_INT:%.*]] = ptrtoint ptr addrspace(1) [[Q_GLOBAL]] to i64
		; CHECK-NEXT: [[P_TRUNC:%.*]] = trunc i64 [[P_GLOBAL_INT]] to i48
		; CHECK-NEXT: [[Q_TRUNC:%.*]] = trunc i64 [[Q_GLOBAL_INT]] to i48
[...]
		}

		; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: read)
		declare i32 @llvm.amdgcn.raw.ptr.buffer.load.i32(ptr addrspace(8) nocapture readonly, i32, i32, i32 immarg) #0
		; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: read)
		declare i32 @llvm.amdgcn.struct.ptr.buffer.load.i32(ptr addrspace(8) nocapture readonly, i32, i32, i32, i32 immarg) #0
		; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: write)
		declare void @llvm.amdgcn.raw.ptr.buffer.store.i32(i32, ptr addrspace(8) nocapture writeonly, i32, i32, i32 immarg) #1
		; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)declare ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p1(ptr addrspace(1) nocapture readnone, i16, i32, i32) #2
		declare ptr addrspace(8) @llvm.amdgcn.make.buffer.rsrc.p1(ptr addrspace(1) readnone nocapture, i16, i32, i32)
		attributes #0 = { nocallback nofree nosync nounwind willreturn memory(argmem: read) }
		attributes #1 = { nocallback nofree nosync nounwind willreturn memory(argmem: write) }
		attributes #2 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }

Diff 523060