This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] intrinsics for byte/short load/store
ClosedPublic

Authored by rtaylor on Feb 3 2018, 10:35 AM.

Details

Reviewers
mareko
arsenm
nhaehnle
timcorringham
Group Reviewers
Restricted Project
Summary

Added intrinsics for the instructions:

  • buffer_load_ubyte
  • buffer_load_ushort
  • buffer_store_byte
  • buffer_store_short

Added test cases to existing buffer load/store tests.
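
For reference, a minimal sketch of how the new overloads are meant to be used from IR. The raw-buffer names and the (rsrc, voffset, soffset, cachepolicy) argument list are assumptions based on the other llvm.amdgcn.raw.buffer.* intrinsics, not text from this review:

declare i8 @llvm.amdgcn.raw.buffer.load.i8(<4 x i32>, i32, i32, i32)
declare void @llvm.amdgcn.raw.buffer.store.i8(i8, <4 x i32>, i32, i32, i32)

; Load one byte (selects buffer_load_ubyte, zero-extended in the destination
; VGPR) and store it back (buffer_store_byte).
define void @byte_roundtrip(<4 x i32> %rsrc, i32 %voffset) {
  %b = call i8 @llvm.amdgcn.raw.buffer.load.i8(<4 x i32> %rsrc, i32 %voffset, i32 0, i32 0)
  call void @llvm.amdgcn.raw.buffer.store.i8(i8 %b, <4 x i32> %rsrc, i32 %voffset, i32 0, i32 0)
  ret void
}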

Diff Detail

Event Timeline

timcorringham created this revision. Feb 3 2018, 10:35 AM
timcorringham added a reviewer: Restricted Project. Feb 3 2018, 10:44 AM
arsenm requested changes to this revision. Feb 3 2018, 11:01 AM

I don't think we need intrinsics for these. At most we should add a mangled type to the existing buffer intrinsics.

include/llvm/IR/IntrinsicsAMDGPU.td
822

float return type doesn't make sense

This revision now requires changes to proceed. Feb 3 2018, 11:01 AM
tpr added a comment. Feb 5 2018, 1:03 AM

Matt, the instructions zero-extend the data to i32, so the return types of the int, ushort and ubyte variants are the same, and overloading would not work.

Matt, we do actually need these intrinsics, as we have an urgent requirement for them in Open Vulkan (which is of course my motivation for implementing them).

As Tim commented, the load ubyte and load short instructions extend to 32 bits. While float is a little odd, it does match the behavior of the other buffer_load instructions. Also I think changing it would require a disproportionate amount of effort.

mareko added a comment. Feb 5 2018, 7:18 AM

It should be easy to change the return type to i32.

mareko added a comment. Feb 5 2018, 7:18 AM

and, of course, the vdata type.

arsenm added a comment. Feb 5 2018, 8:04 AM

Couldn't we also optimize the loads at least based on used bits like a normal load?

> Matt, we do actually need these intrinsics, as we have an urgent requirement for them in Open Vulkan (which is of course my motivation for implementing them).
>
> As Tim commented, the load ubyte and load short instructions extend to 32 bits. While float is a little odd, it does match the behavior of the other buffer_load instructions. Also I think changing it would require a disproportionate amount of effort.

Yes, they do, but the intrinsic doesn't need to care.

tpr added a comment. Feb 6 2018, 2:19 PM

If we define an overloaded intrinsic with a return type of i8, and the IR using it wants the value zero-extended to i32, the frontend would then have to emit a separate zext. I guess we could optimize that to the zero-extending instruction in instruction selection, but wouldn't it be better to have the intrinsic match what the ISA instruction does by returning the zero-extended i32?
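
To make the extra zext concrete, here is a sketch of the i8-returning shape tpr describes; the overload name and the legacy (rsrc, vindex, offset, glc, slc) argument list are assumed for illustration:

declare i8 @llvm.amdgcn.buffer.load.i8(<4 x i32>, i32, i32, i1, i1)

define i32 @ubyte_as_i32(<4 x i32> %rsrc) {
  %b = call i8 @llvm.amdgcn.buffer.load.i8(<4 x i32> %rsrc, i32 0, i32 0, i1 false, i1 false)
  ; the separate zero extend the frontend would have to emit:
  %v = zext i8 %b to i32
  ret i32 %v
}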

Maybe a dumb question - but why can't we just use the tbuffer load/store instead of these? It already upcasts for you (the zext/sext is built in depending on the nfmt I believe).

> Maybe a dumb question - but why can't we just use the tbuffer load/store instead of these? It already upcasts for you (the zext/sext is built in depending on the nfmt I believe).

Well we can, but using the loads without format conversion can be faster. See https://gpuopen.com/gcn-memory-coalescing/:

  • 32-bit (or smaller) single-channel buffer loads / wave = 4 clocks (under specific cases)
  • 32-bit (or smaller) filtered texels / wave = 16 clocks
rtaylor updated this revision to Diff 187824. Feb 21 2019, 10:53 AM
rtaylor added a subscriber: rtaylor.

Updating to include requested changes

Herald added a project: Restricted Project. Feb 21 2019, 10:53 AM
arsenm added inline comments. Feb 21 2019, 5:15 PM
lib/Target/AMDGPU/SIISelLowering.cpp
5611–5619

You shouldn't be inspecting the users. You can just unconditionally use one or the other. You're going to have to insert a truncate back to the original type at the end anyway. You can then add a separate optimization to fold the sext_inreg or mask into the buffer load, as is done for ordinary loads.

rtaylor added inline comments. Feb 22 2019, 7:30 AM
lib/Target/AMDGPU/SIISelLowering.cpp
5611–5619

There are four potential options, so what do you mean by one or the other? There is BUFFER_LOAD_ubyte/ushort/short/byte for the Opc.

arsenm added inline comments. Feb 22 2019, 7:49 AM
lib/Target/AMDGPU/SIISelLowering.cpp
5611–5619

You can just unconditionally use load_ubyte/load_ushort. Folding the sign extend in is then a separate optimization on a sext (or more likely a sext_inreg)
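
In other words (a sketch; the i8 overload and its arguments are assumed, as above), IR like this would first select the zero-extending load, and a later DAG combine could fold the sign extension into buffer_load_sbyte:

declare i8 @llvm.amdgcn.buffer.load.i8(<4 x i32>, i32, i32, i1, i1)

define i32 @sbyte_as_i32(<4 x i32> %rsrc) {
  %b = call i8 @llvm.amdgcn.buffer.load.i8(<4 x i32> %rsrc, i32 0, i32 0, i1 false, i1 false)
  ; legalizes to a sext_inreg of the i32 load result; the combine can then
  ; replace ubyte load + sext_inreg with the signed load
  %v = sext i8 %b to i32
  ret i32 %v
}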

rtaylor added inline comments. Feb 22 2019, 8:02 AM
lib/Target/AMDGPU/SIISelLowering.cpp
5611–5619

So outputting byte/short based on sign_extend in the TableGen pattern? This won't allow re-use of the existing multiclass without changes.

I think I remember there being a reason Nicolai and I decided not to do this.

arsenm added inline comments. Feb 26 2019, 7:56 AM
lib/Target/AMDGPU/SIISelLowering.cpp
5611–5619

This doesn't change the selection. This is an optimization done in the DAGCombiner

rtaylor updated this revision to Diff 188597. Feb 27 2019, 12:09 PM

Request changes

rtaylor updated this revision to Diff 188598. Feb 27 2019, 12:17 PM

Rename function to better reflect what it does

arsenm added inline comments. Mar 5 2019, 8:08 AM
lib/Target/AMDGPU/SIISelLowering.cpp
5605–5609

Ternary operator

5635

Capitalize

5636–5637

This looks identical to the other part, which is kind of surprising to me, but it should be factored into something common.

5639

This comment can be removed

5670

Repeated again

7782

Formatting

7785

Should have a hasOneUse check

7786

Leftover debugging

7787

This is missing a check on the source type. If you want to be fancier, you can split out the remainder bits into a new sign extend, but there probably isn't much reason to.

7792–7794

"will be set by" part doesn't make sense here

test/CodeGen/AMDGPU/llvm.amdgcn.raw.buffer.load.ll
275

The base test case shouldn't have an extend of the use; it should use the value directly. You should also have one with an explicit zext.
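
For example (sketched against the raw-buffer overload this test file exercises; the signature is an assumption, as above):

declare i8 @llvm.amdgcn.raw.buffer.load.i8(<4 x i32>, i32, i32, i32)

; base case: no extend, return the loaded value directly
define i8 @raw_buffer_load_i8(<4 x i32> %rsrc) {
  %v = call i8 @llvm.amdgcn.raw.buffer.load.i8(<4 x i32> %rsrc, i32 0, i32 0, i32 0)
  ret i8 %v
}

; variant with an explicit zext
define i32 @raw_buffer_load_i8_zext(<4 x i32> %rsrc) {
  %v = call i8 @llvm.amdgcn.raw.buffer.load.i8(<4 x i32> %rsrc, i32 0, i32 0, i32 0)
  %z = zext i8 %v to i32
  ret i32 %z
}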

307

Could use a testcase with a second non-extended use

rtaylor added inline comments. Mar 5 2019, 8:44 AM
lib/Target/AMDGPU/SIISelLowering.cpp
5636–5637

Do you mean that there is similar/same code in the different cases of buffer_load (i.e. struct, raw, normal), and that these should be factored into something common (i.e. a function call)?

7785

You mean there should be a hasOneUse check on the SIGN_EXTEND_INREG, right?

7787

Src is the BUFFER_LOAD_XXX. The only way this code is executed is if the Src is a BUFFER_LOAD_XXX. I'm not sure we need a redundant check here, do we?

arsenm added inline comments. Mar 5 2019, 9:02 AM
lib/Target/AMDGPU/SIISelLowering.cpp
7785

No, the buffer operation. If there are multiple uses you will end up creating multiple loads

7787

The number of bits in the sext_inreg may not match the load's 8/16-bit source width. You can test this with something like

%load = call i8 @llvm.amdgcn.buffer.load.i8(...)
%ext = zext i8 %load to i32
%shl = shl i32 %ext, 27
%shr = ashr i32 %shl, 27

More shifts will be needed to clear the extra bits in the loaded value.

rtaylor added inline comments. Mar 11 2019, 10:05 AM
lib/Target/AMDGPU/SIISelLowering.cpp
7787

This should produce a buffer_load_sbyte, right? That is what it does currently.

arsenm added inline comments. Mar 11 2019, 1:50 PM
lib/Target/AMDGPU/SIISelLowering.cpp
7787

But it needs additional shifts even after that. Right now you're not clearing the extra bits in the low 8 that need to be cleared.

arsenm added inline comments. Mar 11 2019, 1:53 PM
lib/Target/AMDGPU/SIISelLowering.cpp
7787

You can either leave it as
x = buffer_load_ubyte
sext_inreg x, i5

or
x = buffer_load_sbyte
sext_inreg x, i5

I'm not sure there's much practical difference between them

rtaylor added inline comments. Mar 11 2019, 1:57 PM
lib/Target/AMDGPU/SIISelLowering.cpp
7787

Right, I don't think there is; I'm working on doing the former. Thanks.

arsenm added inline comments. Mar 11 2019, 2:04 PM
lib/Target/AMDGPU/SIISelLowering.cpp
7787

There might be a small advantage to canonicalizing the sext_inreg sizes. The constant masks that will be emitted for BFI might be more likely to be reusable.

rtaylor commandeered this revision. Mar 12 2019, 9:50 AM
rtaylor added a reviewer: timcorringham.

Changing ownership

rtaylor updated this revision to Diff 190286. Mar 12 2019, 9:50 AM

Add requested changes

Mostly LGTM except some nits. The major one is avoiding the repeated lowering code for each of these cases

lib/Target/AMDGPU/SIISelLowering.cpp
5601

Capitalize

5608

You can just hardcode this to MVT::Other

5635

Capitalize

5636–5637

Yes

5640–5643

Ternary operator

6283–6286

Ternary operator

6309

Capitalize

6315–6318

Ternary operator

7789

Extra space before ==

7791

Extra space before ==

rtaylor updated this revision to Diff 190349. Mar 12 2019, 3:29 PM

Requested Changes

arsenm accepted this revision. Mar 18 2019, 7:37 AM

LGTM except formatting

lib/Target/AMDGPU/SIISelLowering.cpp
6421

Brace placement

6440

Brace placement

This revision is now accepted and ready to land. Mar 18 2019, 7:37 AM
rtaylor closed this revision. Mar 20 2019, 7:11 AM