This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Lower buffer store and atomic intrinsics manually
ClosedPublic

Authored by mareko on Oct 18 2017, 9:49 AM.

Details

Reviewers

arsenm
nhaehnle

Commits

rG5cec64195cee: AMDGPU: Lower buffer store and atomic intrinsics manually
rL317754: AMDGPU: Lower buffer store and atomic intrinsics manually

Summary

Without this, SIMemoryLegalizer inserts s_waitcnt vmcnt(0) before every
buffer store and atomic instruction.

Diff Detail

Build Status

Buildable 11689
Build 11689: arc lint + arc unit

Event Timeline

mareko created this revision. Oct 18 2017, 9:49 AM

Herald added subscribers: t-tye, tpr, dstuttard and 3 others. Oct 18 2017, 9:49 AM

nhaehnle added inline comments. Oct 23 2017, 8:58 AM

lib/Target/AMDGPU/SIISelLowering.cpp
4566–4568	Unfortunately, there's not a lot of documentation on MOVolatile, but I suspect this should not be set, at least when GLC == SLC == 0. And I imagine that would fix the issue with D39012 as well... (which means the order of patches should be reversed).

arsenm added inline comments. Oct 23 2017, 9:40 AM

lib/Target/AMDGPU/SIISelLowering.cpp
4264–4265	Move to a separate function?
4566–4568	You should not be setting MOVolatile out of nowhere. Adding that defeats what you are trying to accomplish. I also don't think we map volatile directly to GLC; the memory legalizer pass is supposed to set GLC.

Can the same be achieved by implementing getTgtMemIntrinsic?
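
For illustration, the "Move to a separate function?" suggestion above could be sketched as a small helper that maps an intrinsic ID to a node opcode. This is a minimal standalone sketch, not the code in the patch: `ToyIntrinsic`, `ToyOpcode`, and `getBufferAtomicOpcode` are hypothetical stand-ins for the `Intrinsic::amdgcn_buffer_atomic_*` IDs and `AMDGPUISD::BUFFER_ATOMIC_*` opcodes, chosen so that the snippet compiles without LLVM headers.

```cpp
#include <cassert>

// Hypothetical stand-ins for the Intrinsic::amdgcn_buffer_atomic_* IDs.
// Store is included only to exercise the "not an atomic" path.
enum class ToyIntrinsic {
  Swap, Add, Sub, SMin, UMin, SMax, UMax, And, Or, Xor, Store
};

// Hypothetical stand-ins for the AMDGPUISD::BUFFER_ATOMIC_* node opcodes.
enum class ToyOpcode {
  Invalid, AtomicSwap, AtomicAdd, AtomicSub, AtomicSMin, AtomicUMin,
  AtomicSMax, AtomicUMax, AtomicAnd, AtomicOr, AtomicXor
};

// Maps a buffer atomic intrinsic to its node opcode. Anything that is not
// a buffer atomic yields ToyOpcode::Invalid; the inline switch in the
// patch uses llvm_unreachable for that case instead.
ToyOpcode getBufferAtomicOpcode(ToyIntrinsic ID) {
  switch (ID) {
  case ToyIntrinsic::Swap: return ToyOpcode::AtomicSwap;
  case ToyIntrinsic::Add:  return ToyOpcode::AtomicAdd;
  case ToyIntrinsic::Sub:  return ToyOpcode::AtomicSub;
  case ToyIntrinsic::SMin: return ToyOpcode::AtomicSMin;
  case ToyIntrinsic::UMin: return ToyOpcode::AtomicUMin;
  case ToyIntrinsic::SMax: return ToyOpcode::AtomicSMax;
  case ToyIntrinsic::UMax: return ToyOpcode::AtomicUMax;
  case ToyIntrinsic::And:  return ToyOpcode::AtomicAnd;
  case ToyIntrinsic::Or:   return ToyOpcode::AtomicOr;
  case ToyIntrinsic::Xor:  return ToyOpcode::AtomicXor;
  default:                 return ToyOpcode::Invalid;
  }
}

// Usage example: the lowering code would call the helper once and bail
// out (or assert) on an invalid result.
void checkOpcodeExamples() {
  assert(getBufferAtomicOpcode(ToyIntrinsic::Add) == ToyOpcode::AtomicAdd);
  assert(getBufferAtomicOpcode(ToyIntrinsic::Store) == ToyOpcode::Invalid);
}
```

With a single call site, as the author notes later in the thread, the helper mostly buys readability rather than reuse.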

nhaehnle added inline comments. Oct 23 2017, 10:18 AM

lib/Target/AMDGPU/SIISelLowering.cpp
4566–4568	You're right, MOVolatile should be unnecessary even with GLC. I was thinking of GLSL writes to coherent buffer objects, but those still need memoryBarrier()s for guaranteed ordering. So I agree, buffer stores should never be MOVolatile.

t-tye added inline comments. Oct 23 2017, 8:44 PM

lib/Target/AMDGPU/SIISelLowering.cpp
4566–4568	MMOs now have an atomic memory ordering and a memory scope that convey how atomics are required to be coherent. The memory legalizer pass uses this information to set the glc bit and to generate the appropriate waitcnt and cache invalidate instructions. These are separate from the volatile property, which has a different purpose. So if the goal is to request atomic coherence (release/acquire memory model semantics), shouldn't the MMO memory ordering/scope be set correctly?
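
The distinction drawn above, that ordering rather than volatility drives the synchronization decision, can be illustrated with a toy model. This is a rough sketch and not LLVM code: `ToyOrdering`, `ToyMemOperand`, and `needsWaitBefore` are hypothetical names, memory scope is left out entirely, and the rule shown (wait before operations with release semantics) is a simplification of the AMDGPU memory model rather than the legalizer's actual logic.

```cpp
#include <cassert>

// Toy model only: these types are hypothetical and are not LLVM's
// MachineMemOperand or AtomicOrdering.
enum class ToyOrdering {
  NotAtomic, Monotonic, Acquire, Release, AcquireRelease,
  SequentiallyConsistent
};

struct ToyMemOperand {
  bool IsVolatile;      // volatile property, deliberately unused below
  ToyOrdering Ordering; // atomic memory ordering
};

// Returns true when earlier memory operations must complete before this
// one issues, i.e. when the ordering has release semantics. The decision
// reads only the ordering and never the volatile bit.
bool needsWaitBefore(const ToyMemOperand &MMO) {
  switch (MMO.Ordering) {
  case ToyOrdering::Release:
  case ToyOrdering::AcquireRelease:
  case ToyOrdering::SequentiallyConsistent:
    return true;
  default:
    return false;
  }
}

// Usage example: a volatile but non-atomic access needs no wait in this
// model, while a non-volatile release access does.
void checkOrderingExamples() {
  assert(!needsWaitBefore({true, ToyOrdering::NotAtomic}));
  assert(needsWaitBefore({false, ToyOrdering::Release}));
}
```

In this model, marking a plain buffer store volatile changes nothing about waits; only a stronger ordering would.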

nhaehnle added inline comments. Oct 24 2017, 5:37 PM

lib/Target/AMDGPU/SIISelLowering.cpp
4566–4568	This is mostly about stores, not atomics. (GLSL) buffer stores don't imply any ordering by themselves. As far as GLSL is concerned, some buffer stores (those to "coherent" buffers) can be combined with memoryBarrier builtins, in which case there are some guarantees about ordering with respect to other shader invocations, but the stores themselves provide no such guarantee. Maybe the "coherent" flag can be modeled with those memory scopes - where are they documented? In general, I definitely agree that we should use the MMO machinery correctly :)

nhaehnle added inline comments. Oct 24 2017, 5:40 PM

lib/Target/AMDGPU/SIISelLowering.cpp
4566–4568	That said, it of course makes sense to talk about how to set the MMO for buffer_atomic intrinsics as well. "relaxed" (or "monotonic", in LLVM speak) ordering might actually be sufficient for those for GLSL semantics (again, because GLSL kind of wants you to add explicit memoryBarrier() builtin function calls), but I haven't fully thought this through.

t-tye added inline comments. Oct 24 2017, 6:20 PM

lib/Target/AMDGPU/SIISelLowering.cpp
4566–4568	For the AMDGPU target the memory model currently implemented is documented in https://llvm.org/docs/AMDGPUUsage.html#memory-model . It does include the LLVM IR fence. Feel free to ping me if you want to discuss what settings you should use to achieve the GLSL memory model semantics as we worked through doing the OpenCL/HSA memory model mapping.

mareko added inline comments. Oct 31 2017, 1:49 PM

lib/Target/AMDGPU/SIISelLowering.cpp
4264–4265	I would if it had more uses.
4566–4568	We could do what amdgcn_atomic_inc/dec intrinsics do: have "i1 volatile" as an intrinsic parameter. In the meantime, I'll just remove MOVolatile.
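
The "i1 volatile as an intrinsic parameter" idea above could be sketched as follows: the lowering starts from the flags the patch uses for buffer stores and adds the volatile bit only when the caller asks for it. This is a minimal standalone sketch under stated assumptions: the `Toy*` flag constants and `getBufferStoreFlags` are hypothetical, and the bit values are illustrative rather than those of `MachineMemOperand::Flags`.

```cpp
#include <cassert>

// Hypothetical flag bits; the values are illustrative and do not match
// MachineMemOperand::Flags.
constexpr unsigned ToyMOStore = 1u << 0;
constexpr unsigned ToyMOVolatile = 1u << 1;
constexpr unsigned ToyMODereferenceable = 1u << 2;

// Builds the memory operand flags for a buffer store. The volatile bit is
// set only when the (hypothetical) i1 volatile operand is true, instead
// of being set unconditionally.
unsigned getBufferStoreFlags(bool IsVolatileOperand) {
  unsigned Flags = ToyMOStore | ToyMODereferenceable;
  if (IsVolatileOperand)
    Flags |= ToyMOVolatile;
  return Flags;
}

// Usage example: the default store carries no volatile bit.
void checkFlagExamples() {
  assert((getBufferStoreFlags(false) & ToyMOVolatile) == 0);
  assert((getBufferStoreFlags(true) & ToyMOVolatile) != 0);
}
```

Passing `false` reproduces the flag set the revised patch uses for stores (store plus dereferenceable, no volatile).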

don't set MOVolatile
this precedes the buffer store merging patch

t-tye added inline comments. Oct 31 2017, 3:49 PM

lib/Target/AMDGPU/SIISelLowering.cpp
4566–4568	Note that using volatile is not the same thing as using the atomic memory_order. So if intrinsics are relying on volatile to indicate an atomic operation without setting the memory_order correctly, then that sounds like a bug that would be good to fix :-)

Closed by commit rL317754: AMDGPU: Lower buffer store and atomic intrinsics manually (authored by mareko). Nov 8 2017, 5:53 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                    Size
lib/Target/AMDGPU/AMDGPUISelLowering.h                  13 lines
lib/Target/AMDGPU/AMDGPUISelLowering.cpp                13 lines
lib/Target/AMDGPU/BUFInstructions.td                    40 lines
lib/Target/AMDGPU/SIISelLowering.cpp                    113 lines
lib/Target/AMDGPU/SIInstrInfo.td                        47 lines
test/CodeGen/AMDGPU/llvm.amdgcn.buffer.atomic.ll        3 lines
test/CodeGen/AMDGPU/llvm.amdgcn.buffer.store.format.ll  9 lines
test/CodeGen/AMDGPU/llvm.amdgcn.buffer.store.ll         9 lines
test/CodeGen/AMDGPU/llvm.amdgcn.image.atomic.ll         7 lines
test/CodeGen/AMDGPU/llvm.amdgcn.image.ll                14 lines
test/CodeGen/AMDGPU/llvm.amdgcn.s.waitcnt.ll            2 lines

Diff 121056

lib/Target/AMDGPU/AMDGPUISelLowering.h

[... 436 lines not shown ...]	enum NodeType : unsigned {
TBUFFER_STORE_FORMAT,		TBUFFER_STORE_FORMAT,
TBUFFER_STORE_FORMAT_X3,		TBUFFER_STORE_FORMAT_X3,
TBUFFER_LOAD_FORMAT,		TBUFFER_LOAD_FORMAT,
ATOMIC_CMP_SWAP,		ATOMIC_CMP_SWAP,
ATOMIC_INC,		ATOMIC_INC,
ATOMIC_DEC,		ATOMIC_DEC,
BUFFER_LOAD,		BUFFER_LOAD,
BUFFER_LOAD_FORMAT,		BUFFER_LOAD_FORMAT,
		BUFFER_STORE,
		BUFFER_STORE_FORMAT,
		BUFFER_ATOMIC_SWAP,
		BUFFER_ATOMIC_ADD,
		BUFFER_ATOMIC_SUB,
		BUFFER_ATOMIC_SMIN,
		BUFFER_ATOMIC_UMIN,
		BUFFER_ATOMIC_SMAX,
		BUFFER_ATOMIC_UMAX,
		BUFFER_ATOMIC_AND,
		BUFFER_ATOMIC_OR,
		BUFFER_ATOMIC_XOR,
		BUFFER_ATOMIC_CMPSWAP,
LAST_AMDGPU_ISD_NUMBER		LAST_AMDGPU_ISD_NUMBER
};		};


} // End namespace AMDGPUISD		} // End namespace AMDGPUISD

} // End namespace llvm		} // End namespace llvm

#endif		#endif

lib/Target/AMDGPU/AMDGPUISelLowering.cpp

[... 3,981 lines not shown ...]	const char* AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const {
NODE_NAME_CASE(TBUFFER_STORE_FORMAT)		NODE_NAME_CASE(TBUFFER_STORE_FORMAT)
NODE_NAME_CASE(TBUFFER_STORE_FORMAT_X3)		NODE_NAME_CASE(TBUFFER_STORE_FORMAT_X3)
NODE_NAME_CASE(TBUFFER_LOAD_FORMAT)		NODE_NAME_CASE(TBUFFER_LOAD_FORMAT)
NODE_NAME_CASE(ATOMIC_CMP_SWAP)		NODE_NAME_CASE(ATOMIC_CMP_SWAP)
NODE_NAME_CASE(ATOMIC_INC)		NODE_NAME_CASE(ATOMIC_INC)
NODE_NAME_CASE(ATOMIC_DEC)		NODE_NAME_CASE(ATOMIC_DEC)
NODE_NAME_CASE(BUFFER_LOAD)		NODE_NAME_CASE(BUFFER_LOAD)
NODE_NAME_CASE(BUFFER_LOAD_FORMAT)		NODE_NAME_CASE(BUFFER_LOAD_FORMAT)
		NODE_NAME_CASE(BUFFER_STORE)
		NODE_NAME_CASE(BUFFER_STORE_FORMAT)
		NODE_NAME_CASE(BUFFER_ATOMIC_SWAP)
		NODE_NAME_CASE(BUFFER_ATOMIC_ADD)
		NODE_NAME_CASE(BUFFER_ATOMIC_SUB)
		NODE_NAME_CASE(BUFFER_ATOMIC_SMIN)
		NODE_NAME_CASE(BUFFER_ATOMIC_UMIN)
		NODE_NAME_CASE(BUFFER_ATOMIC_SMAX)
		NODE_NAME_CASE(BUFFER_ATOMIC_UMAX)
		NODE_NAME_CASE(BUFFER_ATOMIC_AND)
		NODE_NAME_CASE(BUFFER_ATOMIC_OR)
		NODE_NAME_CASE(BUFFER_ATOMIC_XOR)
		NODE_NAME_CASE(BUFFER_ATOMIC_CMPSWAP)
case AMDGPUISD::LAST_AMDGPU_ISD_NUMBER: break;		case AMDGPUISD::LAST_AMDGPU_ISD_NUMBER: break;
}		}
return nullptr;		return nullptr;
}		}

SDValue AMDGPUTargetLowering::getSqrtEstimate(SDValue Operand,		SDValue AMDGPUTargetLowering::getSqrtEstimate(SDValue Operand,
SelectionDAG &DAG, int Enabled,		SelectionDAG &DAG, int Enabled,
int &RefinementSteps,		int &RefinementSteps,
[... 141 lines not shown ...]

lib/Target/AMDGPU/BUFInstructions.td

[... 960 lines not shown ...]	def : GCNPat<
(!cast<MUBUF_Pseudo>(opcode # _BOTHEN_exact)		(!cast<MUBUF_Pseudo>(opcode # _BOTHEN_exact)
$vdata,		$vdata,
(REG_SEQUENCE VReg_64, $vindex, sub0, $voffset, sub1),		(REG_SEQUENCE VReg_64, $vindex, sub0, $voffset, sub1),
$rsrc, $soffset, (as_i16imm $offset),		$rsrc, $soffset, (as_i16imm $offset),
(as_i1imm $glc), (as_i1imm $slc), 0)		(as_i1imm $glc), (as_i1imm $slc), 0)
>;		>;
}		}

defm : MUBUF_StoreIntrinsicPat<int_amdgcn_buffer_store_format, f32, "BUFFER_STORE_FORMAT_X">;		defm : MUBUF_StoreIntrinsicPat<SIbuffer_store_format, f32, "BUFFER_STORE_FORMAT_X">;
defm : MUBUF_StoreIntrinsicPat<int_amdgcn_buffer_store_format, v2f32, "BUFFER_STORE_FORMAT_XY">;		defm : MUBUF_StoreIntrinsicPat<SIbuffer_store_format, v2f32, "BUFFER_STORE_FORMAT_XY">;
defm : MUBUF_StoreIntrinsicPat<int_amdgcn_buffer_store_format, v4f32, "BUFFER_STORE_FORMAT_XYZW">;		defm : MUBUF_StoreIntrinsicPat<SIbuffer_store_format, v4f32, "BUFFER_STORE_FORMAT_XYZW">;
defm : MUBUF_StoreIntrinsicPat<int_amdgcn_buffer_store, f32, "BUFFER_STORE_DWORD">;		defm : MUBUF_StoreIntrinsicPat<SIbuffer_store, f32, "BUFFER_STORE_DWORD">;
defm : MUBUF_StoreIntrinsicPat<int_amdgcn_buffer_store, v2f32, "BUFFER_STORE_DWORDX2">;		defm : MUBUF_StoreIntrinsicPat<SIbuffer_store, v2f32, "BUFFER_STORE_DWORDX2">;
defm : MUBUF_StoreIntrinsicPat<int_amdgcn_buffer_store, v4f32, "BUFFER_STORE_DWORDX4">;		defm : MUBUF_StoreIntrinsicPat<SIbuffer_store, v4f32, "BUFFER_STORE_DWORDX4">;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// buffer_atomic patterns		// buffer_atomic patterns
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

multiclass BufferAtomicPatterns<SDPatternOperator name, string opcode> {		multiclass BufferAtomicPatterns<SDPatternOperator name, string opcode> {
def : GCNPat<		def : GCNPat<
(name i32:$vdata_in, v4i32:$rsrc, 0,		(name i32:$vdata_in, v4i32:$rsrc, 0,
[... 25 lines not shown ...]	(name i32:$vdata_in, v4i32:$rsrc, i32:$vindex,
imm:$slc),		imm:$slc),
(!cast<MUBUF_Pseudo>(opcode # _BOTHEN_RTN)		(!cast<MUBUF_Pseudo>(opcode # _BOTHEN_RTN)
$vdata_in,		$vdata_in,
(REG_SEQUENCE VReg_64, $vindex, sub0, $voffset, sub1),		(REG_SEQUENCE VReg_64, $vindex, sub0, $voffset, sub1),
$rsrc, $soffset, (as_i16imm $offset), (as_i1imm $slc))		$rsrc, $soffset, (as_i16imm $offset), (as_i1imm $slc))
>;		>;
}		}

defm : BufferAtomicPatterns<int_amdgcn_buffer_atomic_swap, "BUFFER_ATOMIC_SWAP">;		defm : BufferAtomicPatterns<SIbuffer_atomic_swap, "BUFFER_ATOMIC_SWAP">;
defm : BufferAtomicPatterns<int_amdgcn_buffer_atomic_add, "BUFFER_ATOMIC_ADD">;		defm : BufferAtomicPatterns<SIbuffer_atomic_add, "BUFFER_ATOMIC_ADD">;
defm : BufferAtomicPatterns<int_amdgcn_buffer_atomic_sub, "BUFFER_ATOMIC_SUB">;		defm : BufferAtomicPatterns<SIbuffer_atomic_sub, "BUFFER_ATOMIC_SUB">;
defm : BufferAtomicPatterns<int_amdgcn_buffer_atomic_smin, "BUFFER_ATOMIC_SMIN">;		defm : BufferAtomicPatterns<SIbuffer_atomic_smin, "BUFFER_ATOMIC_SMIN">;
defm : BufferAtomicPatterns<int_amdgcn_buffer_atomic_umin, "BUFFER_ATOMIC_UMIN">;		defm : BufferAtomicPatterns<SIbuffer_atomic_umin, "BUFFER_ATOMIC_UMIN">;
defm : BufferAtomicPatterns<int_amdgcn_buffer_atomic_smax, "BUFFER_ATOMIC_SMAX">;		defm : BufferAtomicPatterns<SIbuffer_atomic_smax, "BUFFER_ATOMIC_SMAX">;
defm : BufferAtomicPatterns<int_amdgcn_buffer_atomic_umax, "BUFFER_ATOMIC_UMAX">;		defm : BufferAtomicPatterns<SIbuffer_atomic_umax, "BUFFER_ATOMIC_UMAX">;
defm : BufferAtomicPatterns<int_amdgcn_buffer_atomic_and, "BUFFER_ATOMIC_AND">;		defm : BufferAtomicPatterns<SIbuffer_atomic_and, "BUFFER_ATOMIC_AND">;
defm : BufferAtomicPatterns<int_amdgcn_buffer_atomic_or, "BUFFER_ATOMIC_OR">;		defm : BufferAtomicPatterns<SIbuffer_atomic_or, "BUFFER_ATOMIC_OR">;
defm : BufferAtomicPatterns<int_amdgcn_buffer_atomic_xor, "BUFFER_ATOMIC_XOR">;		defm : BufferAtomicPatterns<SIbuffer_atomic_xor, "BUFFER_ATOMIC_XOR">;

def : GCNPat<		def : GCNPat<
(int_amdgcn_buffer_atomic_cmpswap		(SIbuffer_atomic_cmpswap
i32:$data, i32:$cmp, v4i32:$rsrc, 0,		i32:$data, i32:$cmp, v4i32:$rsrc, 0,
(MUBUFIntrinsicOffset i32:$soffset, i16:$offset),		(MUBUFIntrinsicOffset i32:$soffset, i16:$offset),
imm:$slc),		imm:$slc),
(EXTRACT_SUBREG		(EXTRACT_SUBREG
(BUFFER_ATOMIC_CMPSWAP_OFFSET_RTN		(BUFFER_ATOMIC_CMPSWAP_OFFSET_RTN
(REG_SEQUENCE VReg_64, $data, sub0, $cmp, sub1),		(REG_SEQUENCE VReg_64, $data, sub0, $cmp, sub1),
$rsrc, $soffset, (as_i16imm $offset), (as_i1imm $slc)),		$rsrc, $soffset, (as_i16imm $offset), (as_i1imm $slc)),
sub0)		sub0)
>;		>;

def : GCNPat<		def : GCNPat<
(int_amdgcn_buffer_atomic_cmpswap		(SIbuffer_atomic_cmpswap
i32:$data, i32:$cmp, v4i32:$rsrc, i32:$vindex,		i32:$data, i32:$cmp, v4i32:$rsrc, i32:$vindex,
(MUBUFIntrinsicOffset i32:$soffset, i16:$offset),		(MUBUFIntrinsicOffset i32:$soffset, i16:$offset),
imm:$slc),		imm:$slc),
(EXTRACT_SUBREG		(EXTRACT_SUBREG
(BUFFER_ATOMIC_CMPSWAP_IDXEN_RTN		(BUFFER_ATOMIC_CMPSWAP_IDXEN_RTN
(REG_SEQUENCE VReg_64, $data, sub0, $cmp, sub1),		(REG_SEQUENCE VReg_64, $data, sub0, $cmp, sub1),
$vindex, $rsrc, $soffset, (as_i16imm $offset), (as_i1imm $slc)),		$vindex, $rsrc, $soffset, (as_i16imm $offset), (as_i1imm $slc)),
sub0)		sub0)
>;		>;

def : GCNPat<		def : GCNPat<
(int_amdgcn_buffer_atomic_cmpswap		(SIbuffer_atomic_cmpswap
i32:$data, i32:$cmp, v4i32:$rsrc, 0,		i32:$data, i32:$cmp, v4i32:$rsrc, 0,
(MUBUFIntrinsicVOffset i32:$soffset, i16:$offset, i32:$voffset),		(MUBUFIntrinsicVOffset i32:$soffset, i16:$offset, i32:$voffset),
imm:$slc),		imm:$slc),
(EXTRACT_SUBREG		(EXTRACT_SUBREG
(BUFFER_ATOMIC_CMPSWAP_OFFEN_RTN		(BUFFER_ATOMIC_CMPSWAP_OFFEN_RTN
(REG_SEQUENCE VReg_64, $data, sub0, $cmp, sub1),		(REG_SEQUENCE VReg_64, $data, sub0, $cmp, sub1),
$voffset, $rsrc, $soffset, (as_i16imm $offset), (as_i1imm $slc)),		$voffset, $rsrc, $soffset, (as_i16imm $offset), (as_i1imm $slc)),
sub0)		sub0)
>;		>;

def : GCNPat<		def : GCNPat<
(int_amdgcn_buffer_atomic_cmpswap		(SIbuffer_atomic_cmpswap
i32:$data, i32:$cmp, v4i32:$rsrc, i32:$vindex,		i32:$data, i32:$cmp, v4i32:$rsrc, i32:$vindex,
(MUBUFIntrinsicVOffset i32:$soffset, i16:$offset, i32:$voffset),		(MUBUFIntrinsicVOffset i32:$soffset, i16:$offset, i32:$voffset),
imm:$slc),		imm:$slc),
(EXTRACT_SUBREG		(EXTRACT_SUBREG
(BUFFER_ATOMIC_CMPSWAP_BOTHEN_RTN		(BUFFER_ATOMIC_CMPSWAP_BOTHEN_RTN
(REG_SEQUENCE VReg_64, $data, sub0, $cmp, sub1),		(REG_SEQUENCE VReg_64, $data, sub0, $cmp, sub1),
(REG_SEQUENCE VReg_64, $vindex, sub0, $voffset, sub1),		(REG_SEQUENCE VReg_64, $vindex, sub0, $voffset, sub1),
$rsrc, $soffset, (as_i16imm $offset), (as_i1imm $slc)),		$rsrc, $soffset, (as_i16imm $offset), (as_i1imm $slc)),
[... 633 lines not shown ...]

lib/Target/AMDGPU/SIISelLowering.cpp

[... 4,227 lines not shown ...]	case Intrinsic::amdgcn_tbuffer_load: {

MachineMemOperand *MMO = MF.getMachineMemOperand(		MachineMemOperand *MMO = MF.getMachineMemOperand(
MachinePointerInfo(),		MachinePointerInfo(),
MachineMemOperand::MOLoad,		MachineMemOperand::MOLoad,
VT.getStoreSize(), VT.getStoreSize());		VT.getStoreSize(), VT.getStoreSize());
return DAG.getMemIntrinsicNode(AMDGPUISD::TBUFFER_LOAD_FORMAT, DL,		return DAG.getMemIntrinsicNode(AMDGPUISD::TBUFFER_LOAD_FORMAT, DL,
Op->getVTList(), Ops, VT, MMO);		Op->getVTList(), Ops, VT, MMO);
}		}
		case Intrinsic::amdgcn_buffer_atomic_swap:
		case Intrinsic::amdgcn_buffer_atomic_add:
		case Intrinsic::amdgcn_buffer_atomic_sub:
		case Intrinsic::amdgcn_buffer_atomic_smin:
		case Intrinsic::amdgcn_buffer_atomic_umin:
		case Intrinsic::amdgcn_buffer_atomic_smax:
		case Intrinsic::amdgcn_buffer_atomic_umax:
		case Intrinsic::amdgcn_buffer_atomic_and:
		case Intrinsic::amdgcn_buffer_atomic_or:
		case Intrinsic::amdgcn_buffer_atomic_xor: {
		SDValue Ops[] = {
		Op.getOperand(0), // Chain
		Op.getOperand(2), // vdata
		Op.getOperand(3), // rsrc
		Op.getOperand(4), // vindex
		Op.getOperand(5), // offset
		Op.getOperand(6) // slc
		};
		EVT VT = Op.getOperand(3).getValueType();
		MachineMemOperand *MMO = MF.getMachineMemOperand(
		MachinePointerInfo(),
		MachineMemOperand::MOLoad |
		MachineMemOperand::MOStore |
		MachineMemOperand::MODereferenceable |
		MachineMemOperand::MOVolatile,
		VT.getStoreSize(), 4);
		unsigned Opcode = 0;

		switch (IntrID) {
		case Intrinsic::amdgcn_buffer_atomic_swap:
		Opcode = AMDGPUISD::BUFFER_ATOMIC_SWAP;
		break;
		case Intrinsic::amdgcn_buffer_atomic_add:
		Opcode = AMDGPUISD::BUFFER_ATOMIC_ADD;
		break;
		case Intrinsic::amdgcn_buffer_atomic_sub:
		Opcode = AMDGPUISD::BUFFER_ATOMIC_SUB;
		break;
		case Intrinsic::amdgcn_buffer_atomic_smin:
		Opcode = AMDGPUISD::BUFFER_ATOMIC_SMIN;
		break;
		case Intrinsic::amdgcn_buffer_atomic_umin:
		Opcode = AMDGPUISD::BUFFER_ATOMIC_UMIN;
		break;
		case Intrinsic::amdgcn_buffer_atomic_smax:
		Opcode = AMDGPUISD::BUFFER_ATOMIC_SMAX;
		break;
		case Intrinsic::amdgcn_buffer_atomic_umax:
		Opcode = AMDGPUISD::BUFFER_ATOMIC_UMAX;
		break;
		case Intrinsic::amdgcn_buffer_atomic_and:
		Opcode = AMDGPUISD::BUFFER_ATOMIC_AND;
		break;
		case Intrinsic::amdgcn_buffer_atomic_or:
		Opcode = AMDGPUISD::BUFFER_ATOMIC_OR;
		break;
		case Intrinsic::amdgcn_buffer_atomic_xor:
		Opcode = AMDGPUISD::BUFFER_ATOMIC_XOR;
		break;
		default:
		llvm_unreachable("unhandled atomic opcode");
		}

		return DAG.getMemIntrinsicNode(Opcode, DL, Op->getVTList(), Ops, VT, MMO);
		}

		case Intrinsic::amdgcn_buffer_atomic_cmpswap: {
		SDValue Ops[] = {
		Op.getOperand(0), // Chain
		Op.getOperand(2), // src
		Op.getOperand(3), // cmp
		Op.getOperand(4), // rsrc
		Op.getOperand(5), // vindex
		Op.getOperand(6), // offset
		Op.getOperand(7) // slc
		};
		EVT VT = Op.getOperand(4).getValueType();
		MachineMemOperand *MMO = MF.getMachineMemOperand(
		MachinePointerInfo(),
		MachineMemOperand::MOLoad |
		MachineMemOperand::MOStore |
		MachineMemOperand::MODereferenceable |
		MachineMemOperand::MOVolatile,
		VT.getStoreSize(), 4);

		return DAG.getMemIntrinsicNode(AMDGPUISD::BUFFER_ATOMIC_CMPSWAP, DL,
		Op->getVTList(), Ops, VT, MMO);
		}

// Basic sample.		// Basic sample.
case Intrinsic::amdgcn_image_sample:		case Intrinsic::amdgcn_image_sample:
case Intrinsic::amdgcn_image_sample_cl:		case Intrinsic::amdgcn_image_sample_cl:
case Intrinsic::amdgcn_image_sample_d:		case Intrinsic::amdgcn_image_sample_d:
case Intrinsic::amdgcn_image_sample_d_cl:		case Intrinsic::amdgcn_image_sample_d_cl:
case Intrinsic::amdgcn_image_sample_l:		case Intrinsic::amdgcn_image_sample_l:
case Intrinsic::amdgcn_image_sample_b:		case Intrinsic::amdgcn_image_sample_b:
case Intrinsic::amdgcn_image_sample_b_cl:		case Intrinsic::amdgcn_image_sample_b_cl:
[... 211 lines not shown ...]	case Intrinsic::amdgcn_tbuffer_store: {
MachineMemOperand *MMO = MF.getMachineMemOperand(		MachineMemOperand *MMO = MF.getMachineMemOperand(
MachinePointerInfo(),		MachinePointerInfo(),
MachineMemOperand::MOStore,		MachineMemOperand::MOStore,
VT.getStoreSize(), 4);		VT.getStoreSize(), 4);
return DAG.getMemIntrinsicNode(AMDGPUISD::TBUFFER_STORE_FORMAT, DL,		return DAG.getMemIntrinsicNode(AMDGPUISD::TBUFFER_STORE_FORMAT, DL,
Op->getVTList(), Ops, VT, MMO);		Op->getVTList(), Ops, VT, MMO);
}		}

		case Intrinsic::amdgcn_buffer_store:
		case Intrinsic::amdgcn_buffer_store_format: {
		SDValue Ops[] = {
		Chain,
		Op.getOperand(2), // vdata
		Op.getOperand(3), // rsrc
		Op.getOperand(4), // vindex
		Op.getOperand(5), // offset
		Op.getOperand(6), // glc
		Op.getOperand(7) // slc
		};
		EVT VT = Op.getOperand(3).getValueType();
		MachineMemOperand *MMO = MF.getMachineMemOperand(
		MachinePointerInfo(),
		MachineMemOperand::MOStore |
		MachineMemOperand::MODereferenceable,
		VT.getStoreSize(), 4);

		unsigned Opcode = IntrinsicID == Intrinsic::amdgcn_buffer_store ?
		AMDGPUISD::BUFFER_STORE :
		AMDGPUISD::BUFFER_STORE_FORMAT;
		return DAG.getMemIntrinsicNode(Opcode, DL, Op->getVTList(), Ops, VT, MMO);
		}

default:		default:
return Op;		return Op;
}		}
}		}

SDValue SITargetLowering::LowerLOAD(SDValue Op, SelectionDAG &DAG) const {		SDValue SITargetLowering::LowerLOAD(SDValue Op, SelectionDAG &DAG) const {
SDLoc DL(Op);		SDLoc DL(Op);
LoadSDNode *Load = cast<LoadSDNode>(Op);		LoadSDNode *Load = cast<LoadSDNode>(Op);
[... 2,384 lines not shown ...]

lib/Target/AMDGPU/SIInstrInfo.td

[... 87 lines not shown ...]	def SDTBufferLoad : SDTypeProfile<1, 5,
SDTCisVT<4, i1>, // glc		SDTCisVT<4, i1>, // glc
SDTCisVT<5, i1>]>; // slc		SDTCisVT<5, i1>]>; // slc

def SIbuffer_load : SDNode <"AMDGPUISD::BUFFER_LOAD", SDTBufferLoad,		def SIbuffer_load : SDNode <"AMDGPUISD::BUFFER_LOAD", SDTBufferLoad,
[SDNPMemOperand, SDNPHasChain, SDNPMayLoad]>;		[SDNPMemOperand, SDNPHasChain, SDNPMayLoad]>;
def SIbuffer_load_format : SDNode <"AMDGPUISD::BUFFER_LOAD_FORMAT", SDTBufferLoad,		def SIbuffer_load_format : SDNode <"AMDGPUISD::BUFFER_LOAD_FORMAT", SDTBufferLoad,
[SDNPMemOperand, SDNPHasChain, SDNPMayLoad]>;		[SDNPMemOperand, SDNPHasChain, SDNPMayLoad]>;

		def SDTBufferStore : SDTypeProfile<0, 6,
		[ // vdata
		SDTCisVT<1, v4i32>, // rsrc
		SDTCisVT<2, i32>, // vindex
		SDTCisVT<3, i32>, // offset
		SDTCisVT<4, i1>, // glc
		SDTCisVT<5, i1>]>; // slc

		def SIbuffer_store : SDNode <"AMDGPUISD::BUFFER_STORE", SDTBufferStore,
		[SDNPMemOperand, SDNPHasChain, SDNPMayStore]>;
		def SIbuffer_store_format : SDNode <"AMDGPUISD::BUFFER_STORE_FORMAT", SDTBufferStore,
		[SDNPMemOperand, SDNPHasChain, SDNPMayStore]>;

		class SDBufferAtomic<string opcode> : SDNode <opcode,
		SDTypeProfile<1, 5,
		[SDTCisVT<0, i32>, // dst
		SDTCisVT<1, i32>, // vdata
		SDTCisVT<2, v4i32>, // rsrc
		SDTCisVT<3, i32>, // vindex
		SDTCisVT<4, i32>, // offset
		SDTCisVT<5, i1>]>, // slc
		[SDNPMemOperand, SDNPHasChain, SDNPMayLoad, SDNPMayStore]
		>;

		def SIbuffer_atomic_swap : SDBufferAtomic <"AMDGPUISD::BUFFER_ATOMIC_SWAP">;
		def SIbuffer_atomic_add : SDBufferAtomic <"AMDGPUISD::BUFFER_ATOMIC_ADD">;
		def SIbuffer_atomic_sub : SDBufferAtomic <"AMDGPUISD::BUFFER_ATOMIC_SUB">;
		def SIbuffer_atomic_smin : SDBufferAtomic <"AMDGPUISD::BUFFER_ATOMIC_SMIN">;
		def SIbuffer_atomic_umin : SDBufferAtomic <"AMDGPUISD::BUFFER_ATOMIC_UMIN">;
		def SIbuffer_atomic_smax : SDBufferAtomic <"AMDGPUISD::BUFFER_ATOMIC_SMAX">;
		def SIbuffer_atomic_umax : SDBufferAtomic <"AMDGPUISD::BUFFER_ATOMIC_UMAX">;
		def SIbuffer_atomic_and : SDBufferAtomic <"AMDGPUISD::BUFFER_ATOMIC_AND">;
		def SIbuffer_atomic_or : SDBufferAtomic <"AMDGPUISD::BUFFER_ATOMIC_OR">;
		def SIbuffer_atomic_xor : SDBufferAtomic <"AMDGPUISD::BUFFER_ATOMIC_XOR">;

		def SIbuffer_atomic_cmpswap : SDNode <"AMDGPUISD::BUFFER_ATOMIC_CMPSWAP",
		SDTypeProfile<1, 6,
		[SDTCisVT<0, i32>, // dst
		SDTCisVT<1, i32>, // src
		SDTCisVT<2, i32>, // cmp
		SDTCisVT<3, v4i32>, // rsrc
		SDTCisVT<4, i32>, // vindex
		SDTCisVT<5, i32>, // offset
		SDTCisVT<6, i1>]>, // slc
		[SDNPMemOperand, SDNPHasChain, SDNPMayLoad, SDNPMayStore]
		>;

class SDSample<string opcode> : SDNode <opcode,		class SDSample<string opcode> : SDNode <opcode,
SDTypeProfile<1, 4, [SDTCisVT<0, v4f32>, SDTCisVT<2, v8i32>,		SDTypeProfile<1, 4, [SDTCisVT<0, v4f32>, SDTCisVT<2, v8i32>,
SDTCisVT<3, v4i32>, SDTCisVT<4, i32>]>		SDTCisVT<3, v4i32>, SDTCisVT<4, i32>]>
>;		>;

def SIsample : SDSample<"AMDGPUISD::SAMPLE">;		def SIsample : SDSample<"AMDGPUISD::SAMPLE">;
def SIsampleb : SDSample<"AMDGPUISD::SAMPLEB">;		def SIsampleb : SDSample<"AMDGPUISD::SAMPLEB">;
def SIsampled : SDSample<"AMDGPUISD::SAMPLED">;		def SIsampled : SDSample<"AMDGPUISD::SAMPLED">;
[... 1,746 lines not shown ...]

test/CodeGen/AMDGPU/llvm.amdgcn.buffer.atomic.ll

;RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck %s -check-prefix=CHECK -check-prefix=SICI		;RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck %s -check-prefix=CHECK -check-prefix=SICI
;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck %s -check-prefix=CHECK -check-prefix=VI		;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck %s -check-prefix=CHECK -check-prefix=VI

;CHECK-LABEL: {{^}}test1:		;CHECK-LABEL: {{^}}test1:
		;CHECK-NOT: s_waitcnt
;CHECK: buffer_atomic_swap v0, off, s[0:3], 0 glc		;CHECK: buffer_atomic_swap v0, off, s[0:3], 0 glc
;VI: s_movk_i32 [[SOFS:s[0-9]+]], 0x1ffc		;VI: s_movk_i32 [[SOFS:s[0-9]+]], 0x1ffc
;CHECK: s_waitcnt vmcnt(0)		;CHECK: s_waitcnt vmcnt(0)
;CHECK: buffer_atomic_swap v0, v1, s[0:3], 0 idxen glc		;CHECK: buffer_atomic_swap v0, v1, s[0:3], 0 idxen glc
;CHECK: s_waitcnt vmcnt(0)		;CHECK: s_waitcnt vmcnt(0)
;CHECK: buffer_atomic_swap v0, v2, s[0:3], 0 offen glc		;CHECK: buffer_atomic_swap v0, v2, s[0:3], 0 offen glc
;CHECK: s_waitcnt vmcnt(0)		;CHECK: s_waitcnt vmcnt(0)
;CHECK: buffer_atomic_swap v0, v[1:2], s[0:3], 0 idxen offen glc		;CHECK: buffer_atomic_swap v0, v[1:2], s[0:3], 0 idxen offen glc
[... 14 lines not shown ...]	main_body:
%o5 = call i32 @llvm.amdgcn.buffer.atomic.swap(i32 %o4, <4 x i32> %rsrc, i32 0, i32 %ofs.5, i1 0)		%o5 = call i32 @llvm.amdgcn.buffer.atomic.swap(i32 %o4, <4 x i32> %rsrc, i32 0, i32 %ofs.5, i1 0)
%o6 = call i32 @llvm.amdgcn.buffer.atomic.swap(i32 %o5, <4 x i32> %rsrc, i32 0, i32 8192, i1 0)		%o6 = call i32 @llvm.amdgcn.buffer.atomic.swap(i32 %o5, <4 x i32> %rsrc, i32 0, i32 8192, i1 0)
%unused = call i32 @llvm.amdgcn.buffer.atomic.swap(i32 %o6, <4 x i32> %rsrc, i32 0, i32 0, i1 0)		%unused = call i32 @llvm.amdgcn.buffer.atomic.swap(i32 %o6, <4 x i32> %rsrc, i32 0, i32 0, i1 0)
%out = bitcast i32 %o6 to float		%out = bitcast i32 %o6 to float
ret float %out		ret float %out
}		}

;CHECK-LABEL: {{^}}test2:		;CHECK-LABEL: {{^}}test2:
		;CHECK-NOT: s_waitcnt
;CHECK: buffer_atomic_add v0, v1, s[0:3], 0 idxen glc		;CHECK: buffer_atomic_add v0, v1, s[0:3], 0 idxen glc
;CHECK: s_waitcnt vmcnt(0)		;CHECK: s_waitcnt vmcnt(0)
;CHECK: buffer_atomic_sub v0, v1, s[0:3], 0 idxen glc		;CHECK: buffer_atomic_sub v0, v1, s[0:3], 0 idxen glc
;CHECK: s_waitcnt vmcnt(0)		;CHECK: s_waitcnt vmcnt(0)
;CHECK: buffer_atomic_smin v0, v1, s[0:3], 0 idxen glc		;CHECK: buffer_atomic_smin v0, v1, s[0:3], 0 idxen glc
;CHECK: s_waitcnt vmcnt(0)		;CHECK: s_waitcnt vmcnt(0)
;CHECK: buffer_atomic_umin v0, v1, s[0:3], 0 idxen glc		;CHECK: buffer_atomic_umin v0, v1, s[0:3], 0 idxen glc
;CHECK: s_waitcnt vmcnt(0)		;CHECK: s_waitcnt vmcnt(0)
[... 21 lines not shown ...]	main_body:
ret float %out		ret float %out
}		}

; Ideally, we would teach tablegen & friends that cmpswap only modifies the		; Ideally, we would teach tablegen & friends that cmpswap only modifies the
; first vgpr. Since we don't do that yet, the register allocator will have to		; first vgpr. Since we don't do that yet, the register allocator will have to
; create copies which we don't bother to track here.		; create copies which we don't bother to track here.
;		;
;CHECK-LABEL: {{^}}test3:		;CHECK-LABEL: {{^}}test3:
		;CHECK-NOT: s_waitcnt
;CHECK: buffer_atomic_cmpswap {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 glc		;CHECK: buffer_atomic_cmpswap {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 glc
;CHECK: s_waitcnt vmcnt(0)		;CHECK: s_waitcnt vmcnt(0)
;VI: s_movk_i32 [[SOFS:s[0-9]+]], 0x1ffc		;VI: s_movk_i32 [[SOFS:s[0-9]+]], 0x1ffc
;CHECK: buffer_atomic_cmpswap {{v\[[0-9]+:[0-9]+\]}}, v2, s[0:3], 0 idxen glc		;CHECK: buffer_atomic_cmpswap {{v\[[0-9]+:[0-9]+\]}}, v2, s[0:3], 0 idxen glc
;CHECK: s_waitcnt vmcnt(0)		;CHECK: s_waitcnt vmcnt(0)
;CHECK: buffer_atomic_cmpswap {{v\[[0-9]+:[0-9]+\]}}, v3, s[0:3], 0 offen glc		;CHECK: buffer_atomic_cmpswap {{v\[[0-9]+:[0-9]+\]}}, v3, s[0:3], 0 offen glc
;CHECK: s_waitcnt vmcnt(0)		;CHECK: s_waitcnt vmcnt(0)
;CHECK: buffer_atomic_cmpswap {{v\[[0-9]+:[0-9]+\]}}, v[2:3], s[0:3], 0 idxen offen glc		;CHECK: buffer_atomic_cmpswap {{v\[[0-9]+:[0-9]+\]}}, v[2:3], s[0:3], 0 idxen offen glc
[47 lines not shown]

test/CodeGen/AMDGPU/llvm.amdgcn.buffer.store.format.ll

	;RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck %s			;RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck %s
	;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck %s			;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck %s

	;CHECK-LABEL: {{^}}buffer_store:			;CHECK-LABEL: {{^}}buffer_store:
				;CHECK-NOT: s_waitcnt
	;CHECK: buffer_store_format_xyzw v[0:3], off, s[0:3], 0			;CHECK: buffer_store_format_xyzw v[0:3], off, s[0:3], 0
	;CHECK: buffer_store_format_xyzw v[4:7], off, s[0:3], 0 glc			;CHECK: buffer_store_format_xyzw v[4:7], off, s[0:3], 0 glc
	;CHECK: buffer_store_format_xyzw v[8:11], off, s[0:3], 0 slc			;CHECK: buffer_store_format_xyzw v[8:11], off, s[0:3], 0 slc
	define amdgpu_ps void @buffer_store(<4 x i32> inreg, <4 x float>, <4 x float>, <4 x float>) {			define amdgpu_ps void @buffer_store(<4 x i32> inreg, <4 x float>, <4 x float>, <4 x float>) {
	main_body:			main_body:
	call void @llvm.amdgcn.buffer.store.format.v4f32(<4 x float> %1, <4 x i32> %0, i32 0, i32 0, i1 0, i1 0)			call void @llvm.amdgcn.buffer.store.format.v4f32(<4 x float> %1, <4 x i32> %0, i32 0, i32 0, i1 0, i1 0)
	call void @llvm.amdgcn.buffer.store.format.v4f32(<4 x float> %2, <4 x i32> %0, i32 0, i32 0, i1 1, i1 0)			call void @llvm.amdgcn.buffer.store.format.v4f32(<4 x float> %2, <4 x i32> %0, i32 0, i32 0, i1 1, i1 0)
	call void @llvm.amdgcn.buffer.store.format.v4f32(<4 x float> %3, <4 x i32> %0, i32 0, i32 0, i1 0, i1 1)			call void @llvm.amdgcn.buffer.store.format.v4f32(<4 x float> %3, <4 x i32> %0, i32 0, i32 0, i1 0, i1 1)
	ret void			ret void
	}			}

	;CHECK-LABEL: {{^}}buffer_store_immoffs:			;CHECK-LABEL: {{^}}buffer_store_immoffs:
				;CHECK-NOT: s_waitcnt
	;CHECK: buffer_store_format_xyzw v[0:3], off, s[0:3], 0 offset:42			;CHECK: buffer_store_format_xyzw v[0:3], off, s[0:3], 0 offset:42
	define amdgpu_ps void @buffer_store_immoffs(<4 x i32> inreg, <4 x float>) {			define amdgpu_ps void @buffer_store_immoffs(<4 x i32> inreg, <4 x float>) {
	main_body:			main_body:
	call void @llvm.amdgcn.buffer.store.format.v4f32(<4 x float> %1, <4 x i32> %0, i32 0, i32 42, i1 0, i1 0)			call void @llvm.amdgcn.buffer.store.format.v4f32(<4 x float> %1, <4 x i32> %0, i32 0, i32 42, i1 0, i1 0)
	ret void			ret void
	}			}

	;CHECK-LABEL: {{^}}buffer_store_idx:			;CHECK-LABEL: {{^}}buffer_store_idx:
				;CHECK-NOT: s_waitcnt
	;CHECK: buffer_store_format_xyzw v[0:3], v4, s[0:3], 0 idxen			;CHECK: buffer_store_format_xyzw v[0:3], v4, s[0:3], 0 idxen
	define amdgpu_ps void @buffer_store_idx(<4 x i32> inreg, <4 x float>, i32) {			define amdgpu_ps void @buffer_store_idx(<4 x i32> inreg, <4 x float>, i32) {
	main_body:			main_body:
	call void @llvm.amdgcn.buffer.store.format.v4f32(<4 x float> %1, <4 x i32> %0, i32 %2, i32 0, i1 0, i1 0)			call void @llvm.amdgcn.buffer.store.format.v4f32(<4 x float> %1, <4 x i32> %0, i32 %2, i32 0, i1 0, i1 0)
	ret void			ret void
	}			}

	;CHECK-LABEL: {{^}}buffer_store_ofs:			;CHECK-LABEL: {{^}}buffer_store_ofs:
				;CHECK-NOT: s_waitcnt
	;CHECK: buffer_store_format_xyzw v[0:3], v4, s[0:3], 0 offen			;CHECK: buffer_store_format_xyzw v[0:3], v4, s[0:3], 0 offen
	define amdgpu_ps void @buffer_store_ofs(<4 x i32> inreg, <4 x float>, i32) {			define amdgpu_ps void @buffer_store_ofs(<4 x i32> inreg, <4 x float>, i32) {
	main_body:			main_body:
	call void @llvm.amdgcn.buffer.store.format.v4f32(<4 x float> %1, <4 x i32> %0, i32 0, i32 %2, i1 0, i1 0)			call void @llvm.amdgcn.buffer.store.format.v4f32(<4 x float> %1, <4 x i32> %0, i32 0, i32 %2, i1 0, i1 0)
	ret void			ret void
	}			}

	;CHECK-LABEL: {{^}}buffer_store_both:			;CHECK-LABEL: {{^}}buffer_store_both:
				;CHECK-NOT: s_waitcnt
	;CHECK: buffer_store_format_xyzw v[0:3], v[4:5], s[0:3], 0 idxen offen			;CHECK: buffer_store_format_xyzw v[0:3], v[4:5], s[0:3], 0 idxen offen
	define amdgpu_ps void @buffer_store_both(<4 x i32> inreg, <4 x float>, i32, i32) {			define amdgpu_ps void @buffer_store_both(<4 x i32> inreg, <4 x float>, i32, i32) {
	main_body:			main_body:
	call void @llvm.amdgcn.buffer.store.format.v4f32(<4 x float> %1, <4 x i32> %0, i32 %2, i32 %3, i1 0, i1 0)			call void @llvm.amdgcn.buffer.store.format.v4f32(<4 x float> %1, <4 x i32> %0, i32 %2, i32 %3, i1 0, i1 0)
	ret void			ret void
	}			}

	;CHECK-LABEL: {{^}}buffer_store_both_reversed:			;CHECK-LABEL: {{^}}buffer_store_both_reversed:
	;CHECK: v_mov_b32_e32 v6, v4			;CHECK: v_mov_b32_e32 v6, v4
				;CHECK-NOT: s_waitcnt
	;CHECK: buffer_store_format_xyzw v[0:3], v[5:6], s[0:3], 0 idxen offen			;CHECK: buffer_store_format_xyzw v[0:3], v[5:6], s[0:3], 0 idxen offen
	define amdgpu_ps void @buffer_store_both_reversed(<4 x i32> inreg, <4 x float>, i32, i32) {			define amdgpu_ps void @buffer_store_both_reversed(<4 x i32> inreg, <4 x float>, i32, i32) {
	main_body:			main_body:
	call void @llvm.amdgcn.buffer.store.format.v4f32(<4 x float> %1, <4 x i32> %0, i32 %3, i32 %2, i1 0, i1 0)			call void @llvm.amdgcn.buffer.store.format.v4f32(<4 x float> %1, <4 x i32> %0, i32 %3, i32 %2, i1 0, i1 0)
	ret void			ret void
	}			}

	; Ideally, the register allocator would avoid the wait here			; Ideally, the register allocator would avoid the wait here
	;			;
	;CHECK-LABEL: {{^}}buffer_store_wait:			;CHECK-LABEL: {{^}}buffer_store_wait:
				;CHECK-NOT: s_waitcnt
	;CHECK: buffer_store_format_xyzw v[0:3], v4, s[0:3], 0 idxen			;CHECK: buffer_store_format_xyzw v[0:3], v4, s[0:3], 0 idxen
	;CHECK: s_waitcnt expcnt(0)			;CHECK: s_waitcnt expcnt(0)
	;CHECK: buffer_load_format_xyzw v[0:3], v5, s[0:3], 0 idxen			;CHECK: buffer_load_format_xyzw v[0:3], v5, s[0:3], 0 idxen
	;CHECK: s_waitcnt vmcnt(0)			;CHECK: s_waitcnt vmcnt(0)
	;CHECK: buffer_store_format_xyzw v[0:3], v6, s[0:3], 0 idxen			;CHECK: buffer_store_format_xyzw v[0:3], v6, s[0:3], 0 idxen
	define amdgpu_ps void @buffer_store_wait(<4 x i32> inreg, <4 x float>, i32, i32, i32) {			define amdgpu_ps void @buffer_store_wait(<4 x i32> inreg, <4 x float>, i32, i32, i32) {
	main_body:			main_body:
	call void @llvm.amdgcn.buffer.store.format.v4f32(<4 x float> %1, <4 x i32> %0, i32 %2, i32 0, i1 0, i1 0)			call void @llvm.amdgcn.buffer.store.format.v4f32(<4 x float> %1, <4 x i32> %0, i32 %2, i32 0, i1 0, i1 0)
	%data = call <4 x float> @llvm.amdgcn.buffer.load.format.v4f32(<4 x i32> %0, i32 %3, i32 0, i1 0, i1 0)			%data = call <4 x float> @llvm.amdgcn.buffer.load.format.v4f32(<4 x i32> %0, i32 %3, i32 0, i1 0, i1 0)
	call void @llvm.amdgcn.buffer.store.format.v4f32(<4 x float> %data, <4 x i32> %0, i32 %4, i32 0, i1 0, i1 0)			call void @llvm.amdgcn.buffer.store.format.v4f32(<4 x float> %data, <4 x i32> %0, i32 %4, i32 0, i1 0, i1 0)
	ret void			ret void
	}			}

	;CHECK-LABEL: {{^}}buffer_store_x1:			;CHECK-LABEL: {{^}}buffer_store_x1:
				;CHECK-NOT: s_waitcnt
	;CHECK: buffer_store_format_x v0, v1, s[0:3], 0 idxen			;CHECK: buffer_store_format_x v0, v1, s[0:3], 0 idxen
	define amdgpu_ps void @buffer_store_x1(<4 x i32> inreg %rsrc, float %data, i32 %index) {			define amdgpu_ps void @buffer_store_x1(<4 x i32> inreg %rsrc, float %data, i32 %index) {
	main_body:			main_body:
	call void @llvm.amdgcn.buffer.store.format.f32(float %data, <4 x i32> %rsrc, i32 %index, i32 0, i1 0, i1 0)			call void @llvm.amdgcn.buffer.store.format.f32(float %data, <4 x i32> %rsrc, i32 %index, i32 0, i1 0, i1 0)
	ret void			ret void
	}			}

	;CHECK-LABEL: {{^}}buffer_store_x2:			;CHECK-LABEL: {{^}}buffer_store_x2:
				;CHECK-NOT: s_waitcnt
	;CHECK: buffer_store_format_xy v[0:1], v2, s[0:3], 0 idxen			;CHECK: buffer_store_format_xy v[0:1], v2, s[0:3], 0 idxen
	define amdgpu_ps void @buffer_store_x2(<4 x i32> inreg %rsrc, <2 x float> %data, i32 %index) {			define amdgpu_ps void @buffer_store_x2(<4 x i32> inreg %rsrc, <2 x float> %data, i32 %index) {
	main_body:			main_body:
	call void @llvm.amdgcn.buffer.store.format.v2f32(<2 x float> %data, <4 x i32> %rsrc, i32 %index, i32 0, i1 0, i1 0)			call void @llvm.amdgcn.buffer.store.format.v2f32(<2 x float> %data, <4 x i32> %rsrc, i32 %index, i32 0, i1 0, i1 0)
	ret void			ret void
	}			}

	declare void @llvm.amdgcn.buffer.store.format.f32(float, <4 x i32>, i32, i32, i1, i1) #0			declare void @llvm.amdgcn.buffer.store.format.f32(float, <4 x i32>, i32, i32, i1, i1) #0
	declare void @llvm.amdgcn.buffer.store.format.v2f32(<2 x float>, <4 x i32>, i32, i32, i1, i1) #0			declare void @llvm.amdgcn.buffer.store.format.v2f32(<2 x float>, <4 x i32>, i32, i32, i1, i1) #0
	declare void @llvm.amdgcn.buffer.store.format.v4f32(<4 x float>, <4 x i32>, i32, i32, i1, i1) #0			declare void @llvm.amdgcn.buffer.store.format.v4f32(<4 x float>, <4 x i32>, i32, i32, i1, i1) #0
	declare <4 x float> @llvm.amdgcn.buffer.load.format.v4f32(<4 x i32>, i32, i32, i1, i1) #1			declare <4 x float> @llvm.amdgcn.buffer.load.format.v4f32(<4 x i32>, i32, i32, i1, i1) #1

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
	attributes #1 = { nounwind readonly }			attributes #1 = { nounwind readonly }

test/CodeGen/AMDGPU/llvm.amdgcn.buffer.store.ll

	;RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck %s			;RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck %s
	;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck %s			;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck %s

	;CHECK-LABEL: {{^}}buffer_store:			;CHECK-LABEL: {{^}}buffer_store:
				;CHECK-NOT: s_waitcnt
	;CHECK: buffer_store_dwordx4 v[0:3], off, s[0:3], 0			;CHECK: buffer_store_dwordx4 v[0:3], off, s[0:3], 0
	;CHECK: buffer_store_dwordx4 v[4:7], off, s[0:3], 0 glc			;CHECK: buffer_store_dwordx4 v[4:7], off, s[0:3], 0 glc
	;CHECK: buffer_store_dwordx4 v[8:11], off, s[0:3], 0 slc			;CHECK: buffer_store_dwordx4 v[8:11], off, s[0:3], 0 slc
	define amdgpu_ps void @buffer_store(<4 x i32> inreg, <4 x float>, <4 x float>, <4 x float>) {			define amdgpu_ps void @buffer_store(<4 x i32> inreg, <4 x float>, <4 x float>, <4 x float>) {
	main_body:			main_body:
	call void @llvm.amdgcn.buffer.store.v4f32(<4 x float> %1, <4 x i32> %0, i32 0, i32 0, i1 0, i1 0)			call void @llvm.amdgcn.buffer.store.v4f32(<4 x float> %1, <4 x i32> %0, i32 0, i32 0, i1 0, i1 0)
	call void @llvm.amdgcn.buffer.store.v4f32(<4 x float> %2, <4 x i32> %0, i32 0, i32 0, i1 1, i1 0)			call void @llvm.amdgcn.buffer.store.v4f32(<4 x float> %2, <4 x i32> %0, i32 0, i32 0, i1 1, i1 0)
	call void @llvm.amdgcn.buffer.store.v4f32(<4 x float> %3, <4 x i32> %0, i32 0, i32 0, i1 0, i1 1)			call void @llvm.amdgcn.buffer.store.v4f32(<4 x float> %3, <4 x i32> %0, i32 0, i32 0, i1 0, i1 1)
	ret void			ret void
	}			}

	;CHECK-LABEL: {{^}}buffer_store_immoffs:			;CHECK-LABEL: {{^}}buffer_store_immoffs:
				;CHECK-NOT: s_waitcnt
	;CHECK: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:42			;CHECK: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:42
	define amdgpu_ps void @buffer_store_immoffs(<4 x i32> inreg, <4 x float>) {			define amdgpu_ps void @buffer_store_immoffs(<4 x i32> inreg, <4 x float>) {
	main_body:			main_body:
	call void @llvm.amdgcn.buffer.store.v4f32(<4 x float> %1, <4 x i32> %0, i32 0, i32 42, i1 0, i1 0)			call void @llvm.amdgcn.buffer.store.v4f32(<4 x float> %1, <4 x i32> %0, i32 0, i32 42, i1 0, i1 0)
	ret void			ret void
	}			}

	;CHECK-LABEL: {{^}}buffer_store_idx:			;CHECK-LABEL: {{^}}buffer_store_idx:
				;CHECK-NOT: s_waitcnt
	;CHECK: buffer_store_dwordx4 v[0:3], v4, s[0:3], 0 idxen			;CHECK: buffer_store_dwordx4 v[0:3], v4, s[0:3], 0 idxen
	define amdgpu_ps void @buffer_store_idx(<4 x i32> inreg, <4 x float>, i32) {			define amdgpu_ps void @buffer_store_idx(<4 x i32> inreg, <4 x float>, i32) {
	main_body:			main_body:
	call void @llvm.amdgcn.buffer.store.v4f32(<4 x float> %1, <4 x i32> %0, i32 %2, i32 0, i1 0, i1 0)			call void @llvm.amdgcn.buffer.store.v4f32(<4 x float> %1, <4 x i32> %0, i32 %2, i32 0, i1 0, i1 0)
	ret void			ret void
	}			}

	;CHECK-LABEL: {{^}}buffer_store_ofs:			;CHECK-LABEL: {{^}}buffer_store_ofs:
				;CHECK-NOT: s_waitcnt
	;CHECK: buffer_store_dwordx4 v[0:3], v4, s[0:3], 0 offen			;CHECK: buffer_store_dwordx4 v[0:3], v4, s[0:3], 0 offen
	define amdgpu_ps void @buffer_store_ofs(<4 x i32> inreg, <4 x float>, i32) {			define amdgpu_ps void @buffer_store_ofs(<4 x i32> inreg, <4 x float>, i32) {
	main_body:			main_body:
	call void @llvm.amdgcn.buffer.store.v4f32(<4 x float> %1, <4 x i32> %0, i32 0, i32 %2, i1 0, i1 0)			call void @llvm.amdgcn.buffer.store.v4f32(<4 x float> %1, <4 x i32> %0, i32 0, i32 %2, i1 0, i1 0)
	ret void			ret void
	}			}

	;CHECK-LABEL: {{^}}buffer_store_both:			;CHECK-LABEL: {{^}}buffer_store_both:
				;CHECK-NOT: s_waitcnt
	;CHECK: buffer_store_dwordx4 v[0:3], v[4:5], s[0:3], 0 idxen offen			;CHECK: buffer_store_dwordx4 v[0:3], v[4:5], s[0:3], 0 idxen offen
	define amdgpu_ps void @buffer_store_both(<4 x i32> inreg, <4 x float>, i32, i32) {			define amdgpu_ps void @buffer_store_both(<4 x i32> inreg, <4 x float>, i32, i32) {
	main_body:			main_body:
	call void @llvm.amdgcn.buffer.store.v4f32(<4 x float> %1, <4 x i32> %0, i32 %2, i32 %3, i1 0, i1 0)			call void @llvm.amdgcn.buffer.store.v4f32(<4 x float> %1, <4 x i32> %0, i32 %2, i32 %3, i1 0, i1 0)
	ret void			ret void
	}			}

	;CHECK-LABEL: {{^}}buffer_store_both_reversed:			;CHECK-LABEL: {{^}}buffer_store_both_reversed:
	;CHECK: v_mov_b32_e32 v6, v4			;CHECK: v_mov_b32_e32 v6, v4
				;CHECK-NOT: s_waitcnt
	;CHECK: buffer_store_dwordx4 v[0:3], v[5:6], s[0:3], 0 idxen offen			;CHECK: buffer_store_dwordx4 v[0:3], v[5:6], s[0:3], 0 idxen offen
	define amdgpu_ps void @buffer_store_both_reversed(<4 x i32> inreg, <4 x float>, i32, i32) {			define amdgpu_ps void @buffer_store_both_reversed(<4 x i32> inreg, <4 x float>, i32, i32) {
	main_body:			main_body:
	call void @llvm.amdgcn.buffer.store.v4f32(<4 x float> %1, <4 x i32> %0, i32 %3, i32 %2, i1 0, i1 0)			call void @llvm.amdgcn.buffer.store.v4f32(<4 x float> %1, <4 x i32> %0, i32 %3, i32 %2, i1 0, i1 0)
	ret void			ret void
	}			}

	; Ideally, the register allocator would avoid the wait here			; Ideally, the register allocator would avoid the wait here
	;			;
	;CHECK-LABEL: {{^}}buffer_store_wait:			;CHECK-LABEL: {{^}}buffer_store_wait:
				;CHECK-NOT: s_waitcnt
	;CHECK: buffer_store_dwordx4 v[0:3], v4, s[0:3], 0 idxen			;CHECK: buffer_store_dwordx4 v[0:3], v4, s[0:3], 0 idxen
	;CHECK: s_waitcnt expcnt(0)			;CHECK: s_waitcnt expcnt(0)
	;CHECK: buffer_load_dwordx4 v[0:3], v5, s[0:3], 0 idxen			;CHECK: buffer_load_dwordx4 v[0:3], v5, s[0:3], 0 idxen
	;CHECK: s_waitcnt vmcnt(0)			;CHECK: s_waitcnt vmcnt(0)
	;CHECK: buffer_store_dwordx4 v[0:3], v6, s[0:3], 0 idxen			;CHECK: buffer_store_dwordx4 v[0:3], v6, s[0:3], 0 idxen
	define amdgpu_ps void @buffer_store_wait(<4 x i32> inreg, <4 x float>, i32, i32, i32) {			define amdgpu_ps void @buffer_store_wait(<4 x i32> inreg, <4 x float>, i32, i32, i32) {
	main_body:			main_body:
	call void @llvm.amdgcn.buffer.store.v4f32(<4 x float> %1, <4 x i32> %0, i32 %2, i32 0, i1 0, i1 0)			call void @llvm.amdgcn.buffer.store.v4f32(<4 x float> %1, <4 x i32> %0, i32 %2, i32 0, i1 0, i1 0)
	%data = call <4 x float> @llvm.amdgcn.buffer.load.v4f32(<4 x i32> %0, i32 %3, i32 0, i1 0, i1 0)			%data = call <4 x float> @llvm.amdgcn.buffer.load.v4f32(<4 x i32> %0, i32 %3, i32 0, i1 0, i1 0)
	call void @llvm.amdgcn.buffer.store.v4f32(<4 x float> %data, <4 x i32> %0, i32 %4, i32 0, i1 0, i1 0)			call void @llvm.amdgcn.buffer.store.v4f32(<4 x float> %data, <4 x i32> %0, i32 %4, i32 0, i1 0, i1 0)
	ret void			ret void
	}			}

	;CHECK-LABEL: {{^}}buffer_store_x1:			;CHECK-LABEL: {{^}}buffer_store_x1:
				;CHECK-NOT: s_waitcnt
	;CHECK: buffer_store_dword v0, v1, s[0:3], 0 idxen			;CHECK: buffer_store_dword v0, v1, s[0:3], 0 idxen
	define amdgpu_ps void @buffer_store_x1(<4 x i32> inreg %rsrc, float %data, i32 %index) {			define amdgpu_ps void @buffer_store_x1(<4 x i32> inreg %rsrc, float %data, i32 %index) {
	main_body:			main_body:
	call void @llvm.amdgcn.buffer.store.f32(float %data, <4 x i32> %rsrc, i32 %index, i32 0, i1 0, i1 0)			call void @llvm.amdgcn.buffer.store.f32(float %data, <4 x i32> %rsrc, i32 %index, i32 0, i1 0, i1 0)
	ret void			ret void
	}			}

	;CHECK-LABEL: {{^}}buffer_store_x2:			;CHECK-LABEL: {{^}}buffer_store_x2:
				;CHECK-NOT: s_waitcnt
	;CHECK: buffer_store_dwordx2 v[0:1], v2, s[0:3], 0 idxen			;CHECK: buffer_store_dwordx2 v[0:1], v2, s[0:3], 0 idxen
	define amdgpu_ps void @buffer_store_x2(<4 x i32> inreg %rsrc, <2 x float> %data, i32 %index) #0 {			define amdgpu_ps void @buffer_store_x2(<4 x i32> inreg %rsrc, <2 x float> %data, i32 %index) #0 {
	main_body:			main_body:
	call void @llvm.amdgcn.buffer.store.v2f32(<2 x float> %data, <4 x i32> %rsrc, i32 %index, i32 0, i1 0, i1 0)			call void @llvm.amdgcn.buffer.store.v2f32(<2 x float> %data, <4 x i32> %rsrc, i32 %index, i32 0, i1 0, i1 0)
	ret void			ret void
	}			}

	declare void @llvm.amdgcn.buffer.store.f32(float, <4 x i32>, i32, i32, i1, i1) #0			declare void @llvm.amdgcn.buffer.store.f32(float, <4 x i32>, i32, i32, i1, i1) #0
	declare void @llvm.amdgcn.buffer.store.v2f32(<2 x float>, <4 x i32>, i32, i32, i1, i1) #0			declare void @llvm.amdgcn.buffer.store.v2f32(<2 x float>, <4 x i32>, i32, i32, i1, i1) #0
	declare void @llvm.amdgcn.buffer.store.v4f32(<4 x float>, <4 x i32>, i32, i32, i1, i1) #0			declare void @llvm.amdgcn.buffer.store.v4f32(<4 x float>, <4 x i32>, i32, i32, i1, i1) #0
	declare <4 x float> @llvm.amdgcn.buffer.load.v4f32(<4 x i32>, i32, i32, i1, i1) #1			declare <4 x float> @llvm.amdgcn.buffer.load.v4f32(<4 x i32>, i32, i32, i1, i1) #1

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
	attributes #1 = { nounwind readonly }			attributes #1 = { nounwind readonly }

test/CodeGen/AMDGPU/llvm.amdgcn.image.atomic.ll

	;RUN: llc < %s -march=amdgcn -mcpu=verde -show-mc-encoding -verify-machineinstrs | FileCheck %s --check-prefix=CHECK --check-prefix=SI			;RUN: llc < %s -march=amdgcn -mcpu=verde -show-mc-encoding -verify-machineinstrs | FileCheck %s --check-prefix=CHECK --check-prefix=SI
	;RUN: llc < %s -march=amdgcn -mcpu=tonga -show-mc-encoding -verify-machineinstrs | FileCheck %s --check-prefix=CHECK --check-prefix=VI			;RUN: llc < %s -march=amdgcn -mcpu=tonga -show-mc-encoding -verify-machineinstrs | FileCheck %s --check-prefix=CHECK --check-prefix=VI

	;CHECK-LABEL: {{^}}image_atomic_swap:			;CHECK-LABEL: {{^}}image_atomic_swap:
				;CHECK-NOT: s_waitcnt
	;SI: image_atomic_swap v4, v[0:3], s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x3c,0xf0,0x00,0x04,0x00,0x00]			;SI: image_atomic_swap v4, v[0:3], s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x3c,0xf0,0x00,0x04,0x00,0x00]
	;VI: image_atomic_swap v4, v[0:3], s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x40,0xf0,0x00,0x04,0x00,0x00]			;VI: image_atomic_swap v4, v[0:3], s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x40,0xf0,0x00,0x04,0x00,0x00]
	;CHECK: s_waitcnt vmcnt(0)			;CHECK: s_waitcnt vmcnt(0)
	define amdgpu_ps float @image_atomic_swap(<8 x i32> inreg, <4 x i32>, i32) {			define amdgpu_ps float @image_atomic_swap(<8 x i32> inreg, <4 x i32>, i32) {
	main_body:			main_body:
	%orig = call i32 @llvm.amdgcn.image.atomic.swap.v4i32(i32 %2, <4 x i32> %1, <8 x i32> %0, i1 0, i1 0, i1 0)			%orig = call i32 @llvm.amdgcn.image.atomic.swap.v4i32(i32 %2, <4 x i32> %1, <8 x i32> %0, i1 0, i1 0, i1 0)
	%orig.f = bitcast i32 %orig to float			%orig.f = bitcast i32 %orig to float
	ret float %orig.f			ret float %orig.f
	}			}

	;CHECK-LABEL: {{^}}image_atomic_swap_v2i32:			;CHECK-LABEL: {{^}}image_atomic_swap_v2i32:
				;CHECK-NOT: s_waitcnt
	;SI: image_atomic_swap v2, v[0:1], s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x3c,0xf0,0x00,0x02,0x00,0x00]			;SI: image_atomic_swap v2, v[0:1], s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x3c,0xf0,0x00,0x02,0x00,0x00]
	;VI: image_atomic_swap v2, v[0:1], s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x40,0xf0,0x00,0x02,0x00,0x00]			;VI: image_atomic_swap v2, v[0:1], s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x40,0xf0,0x00,0x02,0x00,0x00]
	;CHECK: s_waitcnt vmcnt(0)			;CHECK: s_waitcnt vmcnt(0)
	define amdgpu_ps float @image_atomic_swap_v2i32(<8 x i32> inreg, <2 x i32>, i32) {			define amdgpu_ps float @image_atomic_swap_v2i32(<8 x i32> inreg, <2 x i32>, i32) {
	main_body:			main_body:
	%orig = call i32 @llvm.amdgcn.image.atomic.swap.v2i32(i32 %2, <2 x i32> %1, <8 x i32> %0, i1 0, i1 0, i1 0)			%orig = call i32 @llvm.amdgcn.image.atomic.swap.v2i32(i32 %2, <2 x i32> %1, <8 x i32> %0, i1 0, i1 0, i1 0)
	%orig.f = bitcast i32 %orig to float			%orig.f = bitcast i32 %orig to float
	ret float %orig.f			ret float %orig.f
	}			}

	;CHECK-LABEL: {{^}}image_atomic_swap_i32:			;CHECK-LABEL: {{^}}image_atomic_swap_i32:
				;CHECK-NOT: s_waitcnt
	;SI: image_atomic_swap v1, v0, s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x3c,0xf0,0x00,0x01,0x00,0x00]			;SI: image_atomic_swap v1, v0, s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x3c,0xf0,0x00,0x01,0x00,0x00]
	;VI: image_atomic_swap v1, v0, s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x40,0xf0,0x00,0x01,0x00,0x00]			;VI: image_atomic_swap v1, v0, s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x40,0xf0,0x00,0x01,0x00,0x00]
	;CHECK: s_waitcnt vmcnt(0)			;CHECK: s_waitcnt vmcnt(0)
	define amdgpu_ps float @image_atomic_swap_i32(<8 x i32> inreg, i32, i32) {			define amdgpu_ps float @image_atomic_swap_i32(<8 x i32> inreg, i32, i32) {
	main_body:			main_body:
	%orig = call i32 @llvm.amdgcn.image.atomic.swap.i32(i32 %2, i32 %1, <8 x i32> %0, i1 0, i1 0, i1 0)			%orig = call i32 @llvm.amdgcn.image.atomic.swap.i32(i32 %2, i32 %1, <8 x i32> %0, i1 0, i1 0, i1 0)
	%orig.f = bitcast i32 %orig to float			%orig.f = bitcast i32 %orig to float
	ret float %orig.f			ret float %orig.f
	}			}

	;CHECK-LABEL: {{^}}image_atomic_cmpswap:			;CHECK-LABEL: {{^}}image_atomic_cmpswap:
				;CHECK-NOT: s_waitcnt
	;SI: image_atomic_cmpswap v[4:5], v[0:3], s[0:7] dmask:0x3 unorm glc ; encoding: [0x00,0x33,0x40,0xf0,0x00,0x04,0x00,0x00]			;SI: image_atomic_cmpswap v[4:5], v[0:3], s[0:7] dmask:0x3 unorm glc ; encoding: [0x00,0x33,0x40,0xf0,0x00,0x04,0x00,0x00]
	;VI: image_atomic_cmpswap v[4:5], v[0:3], s[0:7] dmask:0x3 unorm glc ; encoding: [0x00,0x33,0x44,0xf0,0x00,0x04,0x00,0x00]			;VI: image_atomic_cmpswap v[4:5], v[0:3], s[0:7] dmask:0x3 unorm glc ; encoding: [0x00,0x33,0x44,0xf0,0x00,0x04,0x00,0x00]
	;CHECK: s_waitcnt vmcnt(0)			;CHECK: s_waitcnt vmcnt(0)
	;CHECK: v_mov_b32_e32 v0, v4			;CHECK: v_mov_b32_e32 v0, v4
	define amdgpu_ps float @image_atomic_cmpswap(<8 x i32> inreg, <4 x i32>, i32, i32) {			define amdgpu_ps float @image_atomic_cmpswap(<8 x i32> inreg, <4 x i32>, i32, i32) {
	main_body:			main_body:
	%orig = call i32 @llvm.amdgcn.image.atomic.cmpswap.v4i32(i32 %2, i32 %3, <4 x i32> %1, <8 x i32> %0, i1 0, i1 0, i1 0)			%orig = call i32 @llvm.amdgcn.image.atomic.cmpswap.v4i32(i32 %2, i32 %3, <4 x i32> %1, <8 x i32> %0, i1 0, i1 0, i1 0)
	%orig.f = bitcast i32 %orig to float			%orig.f = bitcast i32 %orig to float
	ret float %orig.f			ret float %orig.f
	}			}

	;CHECK-LABEL: {{^}}image_atomic_add:			;CHECK-LABEL: {{^}}image_atomic_add:
				;CHECK-NOT: s_waitcnt
	;SI: image_atomic_add v4, v[0:3], s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x44,0xf0,0x00,0x04,0x00,0x00]			;SI: image_atomic_add v4, v[0:3], s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x44,0xf0,0x00,0x04,0x00,0x00]
	;VI: image_atomic_add v4, v[0:3], s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x48,0xf0,0x00,0x04,0x00,0x00]			;VI: image_atomic_add v4, v[0:3], s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x48,0xf0,0x00,0x04,0x00,0x00]
	;CHECK: s_waitcnt vmcnt(0)			;CHECK: s_waitcnt vmcnt(0)
	define amdgpu_ps float @image_atomic_add(<8 x i32> inreg, <4 x i32>, i32) {			define amdgpu_ps float @image_atomic_add(<8 x i32> inreg, <4 x i32>, i32) {
	main_body:			main_body:
	%orig = call i32 @llvm.amdgcn.image.atomic.add.v4i32(i32 %2, <4 x i32> %1, <8 x i32> %0, i1 0, i1 0, i1 0)			%orig = call i32 @llvm.amdgcn.image.atomic.add.v4i32(i32 %2, <4 x i32> %1, <8 x i32> %0, i1 0, i1 0, i1 0)
	%orig.f = bitcast i32 %orig to float			%orig.f = bitcast i32 %orig to float
	ret float %orig.f			ret float %orig.f
	}			}

	;CHECK-LABEL: {{^}}image_atomic_sub:			;CHECK-LABEL: {{^}}image_atomic_sub:
				;CHECK-NOT: s_waitcnt
	;SI: image_atomic_sub v4, v[0:3], s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x48,0xf0,0x00,0x04,0x00,0x00]			;SI: image_atomic_sub v4, v[0:3], s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x48,0xf0,0x00,0x04,0x00,0x00]
	;VI: image_atomic_sub v4, v[0:3], s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x4c,0xf0,0x00,0x04,0x00,0x00]			;VI: image_atomic_sub v4, v[0:3], s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x4c,0xf0,0x00,0x04,0x00,0x00]
	;CHECK: s_waitcnt vmcnt(0)			;CHECK: s_waitcnt vmcnt(0)
	define amdgpu_ps float @image_atomic_sub(<8 x i32> inreg, <4 x i32>, i32) {			define amdgpu_ps float @image_atomic_sub(<8 x i32> inreg, <4 x i32>, i32) {
	main_body:			main_body:
	%orig = call i32 @llvm.amdgcn.image.atomic.sub.v4i32(i32 %2, <4 x i32> %1, <8 x i32> %0, i1 0, i1 0, i1 0)			%orig = call i32 @llvm.amdgcn.image.atomic.sub.v4i32(i32 %2, <4 x i32> %1, <8 x i32> %0, i1 0, i1 0, i1 0)
	%orig.f = bitcast i32 %orig to float			%orig.f = bitcast i32 %orig to float
	ret float %orig.f			ret float %orig.f
	}			}

	;CHECK-LABEL: {{^}}image_atomic_unchanged:			;CHECK-LABEL: {{^}}image_atomic_unchanged:
				;CHECK-NOT: s_waitcnt
	;CHECK: image_atomic_smin v4, v[0:3], s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x50,0xf0,0x00,0x04,0x00,0x00]			;CHECK: image_atomic_smin v4, v[0:3], s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x50,0xf0,0x00,0x04,0x00,0x00]
	;CHECK: s_waitcnt vmcnt(0)			;CHECK: s_waitcnt vmcnt(0)
	;CHECK: image_atomic_umin v4, v[0:3], s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x54,0xf0,0x00,0x04,0x00,0x00]			;CHECK: image_atomic_umin v4, v[0:3], s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x54,0xf0,0x00,0x04,0x00,0x00]
	;CHECK: s_waitcnt vmcnt(0)			;CHECK: s_waitcnt vmcnt(0)
	;CHECK: image_atomic_smax v4, v[0:3], s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x58,0xf0,0x00,0x04,0x00,0x00]			;CHECK: image_atomic_smax v4, v[0:3], s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x58,0xf0,0x00,0x04,0x00,0x00]
	;CHECK: s_waitcnt vmcnt(0)			;CHECK: s_waitcnt vmcnt(0)
	;CHECK: image_atomic_umax v4, v[0:3], s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x5c,0xf0,0x00,0x04,0x00,0x00]			;CHECK: image_atomic_umax v4, v[0:3], s[0:7] dmask:0x1 unorm glc ; encoding: [0x00,0x31,0x5c,0xf0,0x00,0x04,0x00,0x00]
	;CHECK: s_waitcnt vmcnt(0)			;CHECK: s_waitcnt vmcnt(0)
	[44 lines not shown]

test/CodeGen/AMDGPU/llvm.amdgcn.image.ll

	; RUN: llc -march=amdgcn -mcpu=verde -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -march=amdgcn -mcpu=verde -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
	; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,VI %s			; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,VI %s

	; GCN-LABEL: {{^}}image_load_v4i32:			; GCN-LABEL: {{^}}image_load_v4i32:
				; GCN-NOT: s_waitcnt
	; GCN: image_load v[0:3], v[0:3], s[0:7] dmask:0xf unorm			; GCN: image_load v[0:3], v[0:3], s[0:7] dmask:0xf unorm
	; GCN: s_waitcnt vmcnt(0)			; GCN: s_waitcnt vmcnt(0)
	define amdgpu_ps <4 x float> @image_load_v4i32(<8 x i32> inreg %rsrc, <4 x i32> %c) #0 {			define amdgpu_ps <4 x float> @image_load_v4i32(<8 x i32> inreg %rsrc, <4 x i32> %c) #0 {
	main_body:			main_body:
	%tex = call <4 x float> @llvm.amdgcn.image.load.v4f32.v4i32.v8i32(<4 x i32> %c, <8 x i32> %rsrc, i32 15, i1 false, i1 false, i1 false, i1 false)			%tex = call <4 x float> @llvm.amdgcn.image.load.v4f32.v4i32.v8i32(<4 x i32> %c, <8 x i32> %rsrc, i32 15, i1 false, i1 false, i1 false, i1 false)
	ret <4 x float> %tex			ret <4 x float> %tex
	}			}

	; GCN-LABEL: {{^}}image_load_v2i32:			; GCN-LABEL: {{^}}image_load_v2i32:
				; GCN-NOT: s_waitcnt
	; GCN: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm			; GCN: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm
	; GCN: s_waitcnt vmcnt(0)			; GCN: s_waitcnt vmcnt(0)
	define amdgpu_ps <4 x float> @image_load_v2i32(<8 x i32> inreg %rsrc, <2 x i32> %c) #0 {			define amdgpu_ps <4 x float> @image_load_v2i32(<8 x i32> inreg %rsrc, <2 x i32> %c) #0 {
	main_body:			main_body:
	%tex = call <4 x float> @llvm.amdgcn.image.load.v4f32.v2i32.v8i32(<2 x i32> %c, <8 x i32> %rsrc, i32 15, i1 false, i1 false, i1 false, i1 false)			%tex = call <4 x float> @llvm.amdgcn.image.load.v4f32.v2i32.v8i32(<2 x i32> %c, <8 x i32> %rsrc, i32 15, i1 false, i1 false, i1 false, i1 false)
	ret <4 x float> %tex			ret <4 x float> %tex
	}			}

	; GCN-LABEL: {{^}}image_load_i32:			; GCN-LABEL: {{^}}image_load_i32:
				; GCN-NOT: s_waitcnt
	; GCN: image_load v[0:3], v0, s[0:7] dmask:0xf unorm			; GCN: image_load v[0:3], v0, s[0:7] dmask:0xf unorm
	; GCN: s_waitcnt vmcnt(0)			; GCN: s_waitcnt vmcnt(0)
	define amdgpu_ps <4 x float> @image_load_i32(<8 x i32> inreg %rsrc, i32 %c) #0 {			define amdgpu_ps <4 x float> @image_load_i32(<8 x i32> inreg %rsrc, i32 %c) #0 {
	main_body:			main_body:
	%tex = call <4 x float> @llvm.amdgcn.image.load.v4f32.i32.v8i32(i32 %c, <8 x i32> %rsrc, i32 15, i1 false, i1 false, i1 false, i1 false)			%tex = call <4 x float> @llvm.amdgcn.image.load.v4f32.i32.v8i32(i32 %c, <8 x i32> %rsrc, i32 15, i1 false, i1 false, i1 false, i1 false)
	ret <4 x float> %tex			ret <4 x float> %tex
	}			}

	; GCN-LABEL: {{^}}image_load_mip:			; GCN-LABEL: {{^}}image_load_mip:
				; GCN-NOT: s_waitcnt
	; GCN: image_load_mip v[0:3], v[0:3], s[0:7] dmask:0xf unorm			; GCN: image_load_mip v[0:3], v[0:3], s[0:7] dmask:0xf unorm
	; GCN: s_waitcnt vmcnt(0)			; GCN: s_waitcnt vmcnt(0)
	define amdgpu_ps <4 x float> @image_load_mip(<8 x i32> inreg %rsrc, <4 x i32> %c) #0 {			define amdgpu_ps <4 x float> @image_load_mip(<8 x i32> inreg %rsrc, <4 x i32> %c) #0 {
	main_body:			main_body:
	%tex = call <4 x float> @llvm.amdgcn.image.load.mip.v4f32.v4i32.v8i32(<4 x i32> %c, <8 x i32> %rsrc, i32 15, i1 false, i1 false, i1 false, i1 false)			%tex = call <4 x float> @llvm.amdgcn.image.load.mip.v4f32.v4i32.v8i32(<4 x i32> %c, <8 x i32> %rsrc, i32 15, i1 false, i1 false, i1 false, i1 false)
	ret <4 x float> %tex			ret <4 x float> %tex
	}			}

	; GCN-LABEL: {{^}}image_load_1:			; GCN-LABEL: {{^}}image_load_1:
				; GCN-NOT: s_waitcnt
	; GCN: image_load v0, v[0:3], s[0:7] dmask:0x1 unorm			; GCN: image_load v0, v[0:3], s[0:7] dmask:0x1 unorm
	; GCN: s_waitcnt vmcnt(0)			; GCN: s_waitcnt vmcnt(0)
	define amdgpu_ps float @image_load_1(<8 x i32> inreg %rsrc, <4 x i32> %c) #0 {			define amdgpu_ps float @image_load_1(<8 x i32> inreg %rsrc, <4 x i32> %c) #0 {
	main_body:			main_body:
	%tex = call <4 x float> @llvm.amdgcn.image.load.v4f32.v4i32.v8i32(<4 x i32> %c, <8 x i32> %rsrc, i32 15, i1 false, i1 false, i1 false, i1 false)			%tex = call <4 x float> @llvm.amdgcn.image.load.v4f32.v4i32.v8i32(<4 x i32> %c, <8 x i32> %rsrc, i32 15, i1 false, i1 false, i1 false, i1 false)
	%elt = extractelement <4 x float> %tex, i32 0			%elt = extractelement <4 x float> %tex, i32 0
	ret float %elt			ret float %elt
	}			}

	; GCN-LABEL: {{^}}image_load_f32_v2i32:			; GCN-LABEL: {{^}}image_load_f32_v2i32:
				; GCN-NOT: s_waitcnt
	; GCN: image_load {{v[0-9]+}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 unorm			; GCN: image_load {{v[0-9]+}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 unorm
	; GCN: s_waitcnt vmcnt(0)			; GCN: s_waitcnt vmcnt(0)
	define amdgpu_ps float @image_load_f32_v2i32(<8 x i32> inreg %rsrc, <2 x i32> %c) #0 {			define amdgpu_ps float @image_load_f32_v2i32(<8 x i32> inreg %rsrc, <2 x i32> %c) #0 {
	main_body:			main_body:
	%tex = call float @llvm.amdgcn.image.load.f32.v2i32.v8i32(<2 x i32> %c, <8 x i32> %rsrc, i32 1, i1 false, i1 false, i1 false, i1 false)			%tex = call float @llvm.amdgcn.image.load.f32.v2i32.v8i32(<2 x i32> %c, <8 x i32> %rsrc, i32 1, i1 false, i1 false, i1 false, i1 false)
	ret float %tex			ret float %tex
	}			}

	; GCN-LABEL: {{^}}image_load_v2f32_v4i32:			; GCN-LABEL: {{^}}image_load_v2f32_v4i32:
				; GCN-NOT: s_waitcnt
	; GCN: image_load {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x3 unorm			; GCN: image_load {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x3 unorm
	; GCN: s_waitcnt vmcnt(0)			; GCN: s_waitcnt vmcnt(0)
	define amdgpu_ps <2 x float> @image_load_v2f32_v4i32(<8 x i32> inreg %rsrc, <4 x i32> %c) #0 {			define amdgpu_ps <2 x float> @image_load_v2f32_v4i32(<8 x i32> inreg %rsrc, <4 x i32> %c) #0 {
	main_body:			main_body:
	%tex = call <2 x float> @llvm.amdgcn.image.load.v2f32.v4i32.v8i32(<4 x i32> %c, <8 x i32> %rsrc, i32 3, i1 false, i1 false, i1 false, i1 false)			%tex = call <2 x float> @llvm.amdgcn.image.load.v2f32.v4i32.v8i32(<4 x i32> %c, <8 x i32> %rsrc, i32 3, i1 false, i1 false, i1 false, i1 false)
	ret <2 x float> %tex			ret <2 x float> %tex
	}			}

	; GCN-LABEL: {{^}}image_store_v4i32:			; GCN-LABEL: {{^}}image_store_v4i32:
				; GCN-NOT: s_waitcnt
	; GCN: image_store v[0:3], v[4:7], s[0:7] dmask:0xf unorm			; GCN: image_store v[0:3], v[4:7], s[0:7] dmask:0xf unorm
	define amdgpu_ps void @image_store_v4i32(<8 x i32> inreg %rsrc, <4 x float> %data, <4 x i32> %coords) #0 {			define amdgpu_ps void @image_store_v4i32(<8 x i32> inreg %rsrc, <4 x float> %data, <4 x i32> %coords) #0 {
	main_body:			main_body:
	call void @llvm.amdgcn.image.store.v4f32.v4i32.v8i32(<4 x float> %data, <4 x i32> %coords, <8 x i32> %rsrc, i32 15, i1 false, i1 false, i1 false, i1 false)			call void @llvm.amdgcn.image.store.v4f32.v4i32.v8i32(<4 x float> %data, <4 x i32> %coords, <8 x i32> %rsrc, i32 15, i1 false, i1 false, i1 false, i1 false)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}image_store_v2i32:			; GCN-LABEL: {{^}}image_store_v2i32:
				; GCN-NOT: s_waitcnt
	; GCN: image_store v[0:3], v[4:5], s[0:7] dmask:0xf unorm			; GCN: image_store v[0:3], v[4:5], s[0:7] dmask:0xf unorm
	define amdgpu_ps void @image_store_v2i32(<8 x i32> inreg %rsrc, <4 x float> %data, <2 x i32> %coords) #0 {			define amdgpu_ps void @image_store_v2i32(<8 x i32> inreg %rsrc, <4 x float> %data, <2 x i32> %coords) #0 {
	main_body:			main_body:
	call void @llvm.amdgcn.image.store.v4f32.v2i32.v8i32(<4 x float> %data, <2 x i32> %coords, <8 x i32> %rsrc, i32 15, i1 false, i1 false, i1 false, i1 false)			call void @llvm.amdgcn.image.store.v4f32.v2i32.v8i32(<4 x float> %data, <2 x i32> %coords, <8 x i32> %rsrc, i32 15, i1 false, i1 false, i1 false, i1 false)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}image_store_i32:			; GCN-LABEL: {{^}}image_store_i32:
				; GCN-NOT: s_waitcnt
	; GCN: image_store v[0:3], v4, s[0:7] dmask:0xf unorm			; GCN: image_store v[0:3], v4, s[0:7] dmask:0xf unorm
	define amdgpu_ps void @image_store_i32(<8 x i32> inreg %rsrc, <4 x float> %data, i32 %coords) #0 {			define amdgpu_ps void @image_store_i32(<8 x i32> inreg %rsrc, <4 x float> %data, i32 %coords) #0 {
	main_body:			main_body:
	call void @llvm.amdgcn.image.store.v4f32.i32.v8i32(<4 x float> %data, i32 %coords, <8 x i32> %rsrc, i32 15, i1 false, i1 false, i1 false, i1 false)			call void @llvm.amdgcn.image.store.v4f32.i32.v8i32(<4 x float> %data, i32 %coords, <8 x i32> %rsrc, i32 15, i1 false, i1 false, i1 false, i1 false)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}image_store_f32_i32:			; GCN-LABEL: {{^}}image_store_f32_i32:
				; GCN-NOT: s_waitcnt
	; GCN: image_store {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 unorm			; GCN: image_store {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 unorm
	define amdgpu_ps void @image_store_f32_i32(<8 x i32> inreg %rsrc, float %data, i32 %coords) #0 {			define amdgpu_ps void @image_store_f32_i32(<8 x i32> inreg %rsrc, float %data, i32 %coords) #0 {
	main_body:			main_body:
	call void @llvm.amdgcn.image.store.f32.i32.v8i32(float %data, i32 %coords, <8 x i32> %rsrc, i32 1, i1 false, i1 false, i1 false, i1 false)			call void @llvm.amdgcn.image.store.f32.i32.v8i32(float %data, i32 %coords, <8 x i32> %rsrc, i32 1, i1 false, i1 false, i1 false, i1 false)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}image_store_v2f32_v4i32:			; GCN-LABEL: {{^}}image_store_v2f32_v4i32:
				; GCN-NOT: s_waitcnt
	; GCN: image_store {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x3 unorm			; GCN: image_store {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x3 unorm
	define amdgpu_ps void @image_store_v2f32_v4i32(<8 x i32> inreg %rsrc, <2 x float> %data, <4 x i32> %coords) #0 {			define amdgpu_ps void @image_store_v2f32_v4i32(<8 x i32> inreg %rsrc, <2 x float> %data, <4 x i32> %coords) #0 {
	main_body:			main_body:
	call void @llvm.amdgcn.image.store.v2f32.v4i32.v8i32(<2 x float> %data, <4 x i32> %coords, <8 x i32> %rsrc, i32 3, i1 false, i1 false, i1 false, i1 false)			call void @llvm.amdgcn.image.store.v2f32.v4i32.v8i32(<2 x float> %data, <4 x i32> %coords, <8 x i32> %rsrc, i32 3, i1 false, i1 false, i1 false, i1 false)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}image_store_mip:			; GCN-LABEL: {{^}}image_store_mip:
				; GCN-NOT: s_waitcnt
	; GCN: image_store_mip v[0:3], v[4:7], s[0:7] dmask:0xf unorm			; GCN: image_store_mip v[0:3], v[4:7], s[0:7] dmask:0xf unorm
	define amdgpu_ps void @image_store_mip(<8 x i32> inreg %rsrc, <4 x float> %data, <4 x i32> %coords) #0 {			define amdgpu_ps void @image_store_mip(<8 x i32> inreg %rsrc, <4 x float> %data, <4 x i32> %coords) #0 {
	main_body:			main_body:
	call void @llvm.amdgcn.image.store.mip.v4f32.v4i32.v8i32(<4 x float> %data, <4 x i32> %coords, <8 x i32> %rsrc, i32 15, i1 false, i1 false, i1 false, i1 false)			call void @llvm.amdgcn.image.store.mip.v4f32.v4i32.v8i32(<4 x float> %data, <4 x i32> %coords, <8 x i32> %rsrc, i32 15, i1 false, i1 false, i1 false, i1 false)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}getresinfo:			; GCN-LABEL: {{^}}getresinfo:
				; GCN-NOT: s_waitcnt
	; GCN: image_get_resinfo {{v\[[0-9]+:[0-9]+\]}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf			; GCN: image_get_resinfo {{v\[[0-9]+:[0-9]+\]}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
	define amdgpu_ps void @getresinfo() #0 {			define amdgpu_ps void @getresinfo() #0 {
	main_body:			main_body:
	%r = call <4 x float> @llvm.amdgcn.image.getresinfo.v4f32.i32.v8i32(i32 undef, <8 x i32> undef, i32 15, i1 false, i1 false, i1 false, i1 false)			%r = call <4 x float> @llvm.amdgcn.image.getresinfo.v4f32.i32.v8i32(i32 undef, <8 x i32> undef, i32 15, i1 false, i1 false, i1 false, i1 false)
	%r0 = extractelement <4 x float> %r, i32 0			%r0 = extractelement <4 x float> %r, i32 0
	%r1 = extractelement <4 x float> %r, i32 1			%r1 = extractelement <4 x float> %r, i32 1
	%r2 = extractelement <4 x float> %r, i32 2			%r2 = extractelement <4 x float> %r, i32 2
	%r3 = extractelement <4 x float> %r, i32 3			%r3 = extractelement <4 x float> %r, i32 3

test/CodeGen/AMDGPU/llvm.amdgcn.s.waitcnt.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck %s
	; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck %s

	; CHECK-LABEL: {{^}}test1:			; CHECK-LABEL: {{^}}test1:
				; CHECK-NOT: s_waitcnt
	; CHECK: image_store			; CHECK: image_store
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0){{$}}			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0){{$}}
	; CHECK-NEXT: image_store			; CHECK-NEXT: image_store
	; CHECK-NEXT: s_endpgm			; CHECK-NEXT: s_endpgm
	define amdgpu_ps void @test1(<8 x i32> inreg %rsrc, <4 x float> %d0, <4 x float> %d1, i32 %c0, i32 %c1) {			define amdgpu_ps void @test1(<8 x i32> inreg %rsrc, <4 x float> %d0, <4 x float> %d1, i32 %c0, i32 %c1) {
	call void @llvm.amdgcn.image.store.v4f32.i32.v8i32(<4 x float> %d0, i32 %c0, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 1, i1 0)			call void @llvm.amdgcn.image.store.v4f32.i32.v8i32(<4 x float> %d0, i32 %c0, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 1, i1 0)
	call void @llvm.amdgcn.s.waitcnt(i32 3840) ; 0xf00			call void @llvm.amdgcn.s.waitcnt(i32 3840) ; 0xf00
	call void @llvm.amdgcn.image.store.v4f32.i32.v8i32(<4 x float> %d1, i32 %c1, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 1, i1 0)			call void @llvm.amdgcn.image.store.v4f32.i32.v8i32(<4 x float> %d1, i32 %c1, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 1, i1 0)
	ret void			ret void
	}			}
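
The test above passes the immediate 3840 (0xf00) to llvm.amdgcn.s.waitcnt and expects `s_waitcnt vmcnt(0) expcnt(0)` in the output. The sketch below shows how that immediate might be decoded. It assumes the pre-GFX9 field layout (vmcnt in bits 3:0, expcnt in bits 6:4, lgkmcnt in bits 11:8) and that a counter left at its maximum value is omitted from the printed instruction. The names `WaitCounts`, `decodeWaitcnt` and `formatWaitcnt` are hypothetical helpers for illustration, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper type: the three counters packed into an s_waitcnt
// immediate, assuming the pre-GFX9 layout.
struct WaitCounts {
  uint32_t vmcnt;   // bits 3:0
  uint32_t expcnt;  // bits 6:4
  uint32_t lgkmcnt; // bits 11:8
};

// Split the immediate into its counter fields.
WaitCounts decodeWaitcnt(uint32_t imm) {
  return {imm & 0xFu, (imm >> 4) & 0x7u, (imm >> 8) & 0xFu};
}

// Render the immediate the way the test's CHECK lines expect it.
// Assumption: a field holding its maximum value means "do not wait"
// and is left out of the printed form.
std::string formatWaitcnt(uint32_t imm) {
  assert(imm <= 0xFFFFu && "s_waitcnt takes a 16-bit immediate");
  const WaitCounts c = decodeWaitcnt(imm);
  std::string out = "s_waitcnt";
  if (c.vmcnt != 0xFu)
    out += " vmcnt(" + std::to_string(c.vmcnt) + ")";
  if (c.expcnt != 0x7u)
    out += " expcnt(" + std::to_string(c.expcnt) + ")";
  if (c.lgkmcnt != 0xFu)
    out += " lgkmcnt(" + std::to_string(c.lgkmcnt) + ")";
  return out;
}
```

Under these assumptions, 0xf00 decodes to vmcnt = 0, expcnt = 0 and lgkmcnt = 15. Only the first two are printed, which matches the `s_waitcnt vmcnt(0) expcnt(0){{$}}` check in test1.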

	; Test that the intrinsic is merged with automatically generated waits and			; Test that the intrinsic is merged with automatically generated waits and
	; emitted as late as possible.			; emitted as late as possible.
	;			;
	; CHECK-LABEL: {{^}}test2:			; CHECK-LABEL: {{^}}test2:
				; CHECK-NOT: s_waitcnt
	; CHECK: image_load			; CHECK: image_load
	; CHECK-NEXT: s_waitcnt			; CHECK-NEXT: s_waitcnt
	; CHECK: s_waitcnt vmcnt(0){{$}}			; CHECK: s_waitcnt vmcnt(0){{$}}
	; CHECK-NEXT: image_store			; CHECK-NEXT: image_store
	define amdgpu_ps void @test2(<8 x i32> inreg %rsrc, i32 %c) {			define amdgpu_ps void @test2(<8 x i32> inreg %rsrc, i32 %c) {
	%t = call <4 x float> @llvm.amdgcn.image.load.v4f32.i32.v8i32(i32 %c, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)			%t = call <4 x float> @llvm.amdgcn.image.load.v4f32.i32.v8i32(i32 %c, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)
	call void @llvm.amdgcn.s.waitcnt(i32 3840) ; 0xf00			call void @llvm.amdgcn.s.waitcnt(i32 3840) ; 0xf00
	%c.1 = mul i32 %c, 2			%c.1 = mul i32 %c, 2