Diff 66039

include/llvm/IR/IntrinsicsAMDGPU.td

Context not available.
	def int_amdgcn_atomic_inc : Intrinsic<[llvm_anyint_ty],	def int_amdgcn_atomic_inc : Intrinsic<[llvm_anyint_ty],
	[llvm_anyptr_ty, LLVMMatchType<0>],	[llvm_anyptr_ty, LLVMMatchType<0>],
	[IntrArgMemOnly, NoCapture<0>]	[IntrArgMemOnly, NoCapture<0>]
	>;	>;

	def int_amdgcn_atomic_dec : Intrinsic<[llvm_anyint_ty],	def int_amdgcn_atomic_dec : Intrinsic<[llvm_anyint_ty],
	[llvm_anyptr_ty, LLVMMatchType<0>],	[llvm_anyptr_ty, LLVMMatchType<0>],
	[IntrArgMemOnly, NoCapture<0>]	[IntrArgMemOnly, NoCapture<0>]
	>;	>;

		//=======================================================================
		// flags is a 32-bit immediate to encode the flags for MIMG instructions.
		// UNORM = flags[0]
		// GLC = flags[1]
		// SLC = flags[2]
		// R128 = flags[3]
		nhaehnle: Can we just not do this kind of change, please? I don't see how it improves anything, it's inconsistent with the ISA description, which has the flags separate, and it'll require an annoying flag-day synchronization with Mesa.
		nhaustov: I agree. Separate flags also play nicely with the assembler.
		arsenm: The assembler has nothing to do with the intrinsic definition. This isn't changing the MachineInstr's operand structure.
		cfang: We plan to add and expose the d16 bit, so the intrinsics for Mesa have to be updated anyway. For future chips we may add more flag bits, and all applications (including Mesa) would have to be updated every time a new flag is added. One advantage of using a mask parameter is that we don't have to update the application if the new bit is not exposed.
		tstellarAMD (author):
		> Can we just not do this kind of change, please? I don't see how it improves anything, it's inconsistent with the ISA description which has the flags separate, and it'll require an annoying flag day synchronization with Mesa.
		I really prefer using a mask over having to create an entire new set of intrinsics each time we have to add a new bit. I think we have two solutions here:
		1. As Matt has suggested, keep Mesa the same, and have the auto-upgrader in LLVM change the old intrinsics to the new mask-style intrinsics. With this solution, though, I think we'd still eventually want or need Mesa to start using the mask version.
		2. Define the intrinsics as var_arg. This would allow us to add i1 operands without breaking the existing operands. I'm just not sure how well var_arg intrinsics are supported.
		// TFE = flags[4]
		// LWE = flags[5]
		// DA = flags[6]
		// D16 = flags[7]
		// RSVD = flags[8]
		arsenm: The full instruction name is image_get_resinfo, so the intrinsic should be int_amdgcn_image_getresinfo.
		//=======================================================================
	class AMDGPUImageLoad : Intrinsic <	class AMDGPUImageLoad : Intrinsic <
	[llvm_v4f32_ty], // vdata(VGPR)	[llvm_v4f32_ty], // vdata(VGPR)
		tstellarAMD (author): This should go in a separate patch. This patch should be only the sampler changes.
	[llvm_anyint_ty, // vaddr(VGPR)	[llvm_anyint_ty, // vaddr(VGPR)
	llvm_v8i32_ty, // rsrc(SGPR)	llvm_v8i32_ty, // rsrc(SGPR)
	llvm_i32_ty, // dmask(imm)	llvm_i32_ty, // dmask(imm)
	llvm_i1_ty, // r128(imm)	llvm_i32_ty], // flags (imm)
	llvm_i1_ty, // da(imm)
	llvm_i1_ty, // glc(imm)
	llvm_i1_ty], // slc(imm)
	tstellarAMD (author): Changing this intrinsic will break Mesa; we will need to update Mesa before we can commit this.
	cfang: We will have to add the d16 bit, so Mesa will have to be updated anyway.
	[IntrReadMem]>;	[IntrReadMem]>;
		arsenm: This requires a descriptive comment (including the values for which bits).

	def int_amdgcn_image_load : AMDGPUImageLoad;
	def int_amdgcn_image_load_mip : AMDGPUImageLoad;

	class AMDGPUImageStore : Intrinsic <	class AMDGPUImageStore : Intrinsic <
	[],	[],
	[llvm_v4f32_ty, // vdata(VGPR)	[llvm_v4f32_ty, // vdata(VGPR)
	llvm_anyint_ty, // vaddr(VGPR)	llvm_anyint_ty, // vaddr(VGPR)
	llvm_v8i32_ty, // rsrc(SGPR)	llvm_v8i32_ty, // rsrc(SGPR)
	llvm_i32_ty, // dmask(imm)	llvm_i32_ty, // dmask(imm)
		tstellarAMD (author): This is an unrelated change.
	llvm_i1_ty, // r128(imm)	llvm_i32_ty], // flags (imm)
	llvm_i1_ty, // da(imm)
	llvm_i1_ty, // glc(imm)
	llvm_i1_ty], // slc(imm)
	[]>;	[]>;

		class AMDGPUImageSample : Intrinsic <
		[llvm_v4f32_ty], // vdata(VGPR)
		[llvm_anyint_ty, // vaddr(VGPR)
		tstellarAMD (author): I'm thinking vdata should be llvm_anyfloat_ty, so we can have it return <4 x half> for the d16 operations. It's going to be weird, though, that some <4 x half> values take four registers and some only take two. Another thing I'm not sure of is whether image samplers always return floating-point values and never integers.
		llvm_v8i32_ty, // rsrc(SGPR)
		llvm_v4i32_ty, // sampler(SGPR)
		tstellarAMD (author): This should be changed to llvm_anyint_ty, so that we can infer the r128 bit.
		llvm_i32_ty, // dmask(imm)
		llvm_i32_ty], // flags(imm)
		[IntrNoMem]>;
		tstellarAMD (author): Moving the sample intrinsics to this file is unrelated to the AMDGPUImageLoad/AMDGPUImageStore changes, so this should be done in a separate patch.
		cfang: The patch implements the amdgcn image intrinsics, which fall into three categories: AMDGPUImageLoad, AMDGPUImageStore, and AMDGPUImageSample. While AMDGPUImageSample is newly defined and the other two are modified, they all use the same mechanism, i.e. the mask parameter. I think it is better to keep them together in one patch.

		def int_amdgcn_image_load : AMDGPUImageLoad;
		def int_amdgcn_image_load_mip : AMDGPUImageLoad;
		tstellarAMD (author): This r128 bit should be dropped.
		def int_amdgcn_image_getresinfo : AMDGPUImageLoad;
		nhaehnle: tfe should be dropped. AFAIU it changes the return type (5 return values instead of 4).

	def int_amdgcn_image_store : AMDGPUImageStore;	def int_amdgcn_image_store : AMDGPUImageStore;
	def int_amdgcn_image_store_mip : AMDGPUImageStore;	def int_amdgcn_image_store_mip : AMDGPUImageStore;

		// Basic sample
		def int_amdgcn_image_sample : AMDGPUImageSample;
		def int_amdgcn_image_sample_cl : AMDGPUImageSample;
		def int_amdgcn_image_sample_d : AMDGPUImageSample;
		def int_amdgcn_image_sample_d_cl : AMDGPUImageSample;
		def int_amdgcn_image_sample_l : AMDGPUImageSample;
		def int_amdgcn_image_sample_b : AMDGPUImageSample;
		def int_amdgcn_image_sample_b_cl : AMDGPUImageSample;
		def int_amdgcn_image_sample_lz : AMDGPUImageSample;
		def int_amdgcn_image_sample_cd : AMDGPUImageSample;
		def int_amdgcn_image_sample_cd_cl : AMDGPUImageSample;

		// Sample with comparison
		def int_amdgcn_image_sample_c : AMDGPUImageSample;
		def int_amdgcn_image_sample_c_cl : AMDGPUImageSample;
		def int_amdgcn_image_sample_c_d : AMDGPUImageSample;
		def int_amdgcn_image_sample_c_d_cl : AMDGPUImageSample;
		def int_amdgcn_image_sample_c_l : AMDGPUImageSample;
		def int_amdgcn_image_sample_c_b : AMDGPUImageSample;
		def int_amdgcn_image_sample_c_b_cl : AMDGPUImageSample;
		def int_amdgcn_image_sample_c_lz : AMDGPUImageSample;
		def int_amdgcn_image_sample_c_cd : AMDGPUImageSample;
		def int_amdgcn_image_sample_c_cd_cl : AMDGPUImageSample;

		// Sample with offsets
		def int_amdgcn_image_sample_o : AMDGPUImageSample;
		def int_amdgcn_image_sample_cl_o : AMDGPUImageSample;
		def int_amdgcn_image_sample_d_o : AMDGPUImageSample;
		def int_amdgcn_image_sample_d_cl_o : AMDGPUImageSample;
		def int_amdgcn_image_sample_l_o : AMDGPUImageSample;
		def int_amdgcn_image_sample_b_o : AMDGPUImageSample;
		def int_amdgcn_image_sample_b_cl_o : AMDGPUImageSample;
		def int_amdgcn_image_sample_lz_o : AMDGPUImageSample;
		def int_amdgcn_image_sample_cd_o : AMDGPUImageSample;
		def int_amdgcn_image_sample_cd_cl_o : AMDGPUImageSample;

		// Sample with comparison and offsets
		def int_amdgcn_image_sample_c_o : AMDGPUImageSample;
		def int_amdgcn_image_sample_c_cl_o : AMDGPUImageSample;
		def int_amdgcn_image_sample_c_d_o : AMDGPUImageSample;
		def int_amdgcn_image_sample_c_d_cl_o : AMDGPUImageSample;
		def int_amdgcn_image_sample_c_l_o : AMDGPUImageSample;
		def int_amdgcn_image_sample_c_b_o : AMDGPUImageSample;
		def int_amdgcn_image_sample_c_b_cl_o : AMDGPUImageSample;
		def int_amdgcn_image_sample_c_lz_o : AMDGPUImageSample;
		def int_amdgcn_image_sample_c_cd_o : AMDGPUImageSample;
		def int_amdgcn_image_sample_c_cd_cl_o : AMDGPUImageSample;

		// Basic gather4
		def int_amdgcn_image_gather4 : AMDGPUImageSample;
		def int_amdgcn_image_gather4_cl : AMDGPUImageSample;
		def int_amdgcn_image_gather4_l : AMDGPUImageSample;
		def int_amdgcn_image_gather4_b : AMDGPUImageSample;
		def int_amdgcn_image_gather4_b_cl : AMDGPUImageSample;
		def int_amdgcn_image_gather4_lz : AMDGPUImageSample;

		// Gather4 with comparison
		def int_amdgcn_image_gather4_c : AMDGPUImageSample;
		def int_amdgcn_image_gather4_c_cl : AMDGPUImageSample;
		def int_amdgcn_image_gather4_c_l : AMDGPUImageSample;
		def int_amdgcn_image_gather4_c_b : AMDGPUImageSample;
		def int_amdgcn_image_gather4_c_b_cl : AMDGPUImageSample;
		def int_amdgcn_image_gather4_c_lz : AMDGPUImageSample;

		// Gather4 with offsets
		def int_amdgcn_image_gather4_o : AMDGPUImageSample;
		def int_amdgcn_image_gather4_cl_o : AMDGPUImageSample;
		def int_amdgcn_image_gather4_l_o : AMDGPUImageSample;
		def int_amdgcn_image_gather4_b_o : AMDGPUImageSample;
		def int_amdgcn_image_gather4_b_cl_o : AMDGPUImageSample;
		arsenm: These are also missing the image part of the name.
		def int_amdgcn_image_gather4_lz_o : AMDGPUImageSample;

		// Gather4 with comparison and offsets
		def int_amdgcn_image_gather4_c_o : AMDGPUImageSample;
		def int_amdgcn_image_gather4_c_cl_o : AMDGPUImageSample;
		def int_amdgcn_image_gather4_c_l_o : AMDGPUImageSample;
		def int_amdgcn_image_gather4_c_b_o : AMDGPUImageSample;
		def int_amdgcn_image_gather4_c_b_cl_o : AMDGPUImageSample;
		def int_amdgcn_image_gather4_c_lz_o : AMDGPUImageSample;

		def int_amdgcn_image_getlod : AMDGPUImageSample;



	class AMDGPUImageAtomic : Intrinsic <	class AMDGPUImageAtomic : Intrinsic <
	[llvm_i32_ty],	[llvm_i32_ty],
	[llvm_i32_ty, // vdata(VGPR)	[llvm_i32_ty, // vdata(VGPR)
	llvm_anyint_ty, // vaddr(VGPR)	llvm_anyint_ty, // vaddr(VGPR)
	llvm_v8i32_ty, // rsrc(SGPR)	llvm_v8i32_ty, // rsrc(SGPR)
	llvm_i1_ty, // r128(imm)	llvm_i1_ty, // r128(imm)
	llvm_i1_ty, // da(imm)	llvm_i1_ty, // da(imm)
	llvm_i1_ty], // slc(imm)	llvm_i1_ty], // slc(imm)
	[]>;	[]>;

Context not available.
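
The auto-upgrade option raised in the review (keep Mesa on the old intrinsics and rewrite calls to the mask form) comes down to folding the four old i1 operands into the proposed flags layout. The following is only a minimal sketch of that folding in plain C++, under stated assumptions: `packLegacyImageFlags` is a hypothetical name that appears neither in this patch nor in LLVM, and the enum is a local copy of the bit layout from the flags comment above. UNORM is set unconditionally because the pre-patch ImageLoadPattern and ImageStorePattern hard-code unorm = 1 and tfe = lwe = 0.

```cpp
#include <cassert>
#include <cstdint>

// Local copy of the bit layout proposed in this patch (flags[0]..flags[7]).
namespace MIMGFlags {
enum : uint32_t {
  UNORM = 1u << 0,
  GLC   = 1u << 1,
  SLC   = 1u << 2,
  R128  = 1u << 3,
  TFE   = 1u << 4,
  LWE   = 1u << 5,
  DA    = 1u << 6,
  D16   = 1u << 7
};
} // namespace MIMGFlags

// Hypothetical helper, not part of the patch: fold the (r128, da, glc, slc)
// operands of the old image load/store intrinsics into one flags immediate.
// UNORM is always set and TFE/LWE stay clear, matching what the old
// patterns passed to the MIMG instruction.
constexpr uint32_t packLegacyImageFlags(bool IsR128, bool IsDA, bool IsGLC,
                                        bool IsSLC) {
  return MIMGFlags::UNORM |
         (IsGLC ? MIMGFlags::GLC : 0u) |
         (IsSLC ? MIMGFlags::SLC : 0u) |
         (IsR128 ? MIMGFlags::R128 : 0u) |
         (IsDA ? MIMGFlags::DA : 0u);
}

// Usage: an old call with r128 = 1 and the other operands 0 maps to
// UNORM | R128 in the mask form.
inline void legacyFlagsExample() {
  assert(packLegacyImageFlags(true, false, false, false) ==
         (MIMGFlags::UNORM | MIMGFlags::R128));
}
```

A real upgrader would also have to rewrite the call's operand list; this sketch covers only the value of the new immediate.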

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

Context not available.
	void SelectADD_SUB_I64(SDNode *N);	void SelectADD_SUB_I64(SDNode *N);
	void SelectDIV_SCALE(SDNode *N);	void SelectDIV_SCALE(SDNode *N);

	SDNode *getS_BFE(unsigned Opcode, const SDLoc &DL, SDValue Val,	SDNode *getS_BFE(unsigned Opcode, const SDLoc &DL, SDValue Val,
	uint32_t Offset, uint32_t Width);	uint32_t Offset, uint32_t Width);
	void SelectS_BFEFromShifts(SDNode *N);	void SelectS_BFEFromShifts(SDNode *N);
	void SelectS_BFE(SDNode *N);	void SelectS_BFE(SDNode *N);
	void SelectBRCOND(SDNode *N);	void SelectBRCOND(SDNode *N);
	void SelectATOMIC_CMP_SWAP(SDNode *N);	void SelectATOMIC_CMP_SWAP(SDNode *N);

		bool SelectImageFlagBits(SDValue ArgNode, SDValue &UNORM,
		SDValue &GLC,
		SDValue &SLC,
		SDValue &R128,
		SDValue &TFE,
		SDValue &LWE,
		SDValue &DA) const;
		arsenm: Param names should be capitalized; the function name itself should start with a lowercase letter.


	// Include the pieces autogenerated from the target description.	// Include the pieces autogenerated from the target description.
	#include "AMDGPUGenDAGISel.inc"	#include "AMDGPUGenDAGISel.inc"
	};	};
	} // end anonymous namespace	} // end anonymous namespace

	/// \brief This pass converts a legalized DAG into a AMDGPU-specific	/// \brief This pass converts a legalized DAG into a AMDGPU-specific
	// DAG, ready for instruction scheduling.	// DAG, ready for instruction scheduling.
	FunctionPass *llvm::createAMDGPUISelDag(TargetMachine &TM) {	FunctionPass *llvm::createAMDGPUISelDag(TargetMachine &TM) {
	return new AMDGPUDAGToDAGISel(TM);	return new AMDGPUDAGToDAGISel(TM);
	}	}
Context not available.

	bool AMDGPUDAGToDAGISel::SelectFlat(SDValue Addr,	bool AMDGPUDAGToDAGISel::SelectFlat(SDValue Addr,
	SDValue &VAddr,	SDValue &VAddr,
	SDValue &SLC,	SDValue &SLC,
	SDValue &TFE) const {	SDValue &TFE) const {
	VAddr = Addr;	VAddr = Addr;
	TFE = SLC = CurDAG->getTargetConstant(0, SDLoc(), MVT::i1);	TFE = SLC = CurDAG->getTargetConstant(0, SDLoc(), MVT::i1);
	return true;	return true;
	}	}


		// TODO: add D16 bit support.
		bool AMDGPUDAGToDAGISel::SelectImageFlagBits(SDValue ArgNode,
		SDValue &UNORM,
		SDValue &GLC,
		SDValue &SLC,
		SDValue &R128,
		SDValue &TFE,
		SDValue &LWE,
		SDValue &DA) const {
		ConstantSDNode *C = dyn_cast<ConstantSDNode>(ArgNode);
		if (!C)
		return false;

		SDLoc SL(ArgNode);
		int64_t ArgImm = C->getZExtValue();

		UNORM = CurDAG->getTargetConstant((ArgImm & llvm::MIMGFlags::UNORM) ? 1 : 0,
		SL, MVT::i1);
		GLC = CurDAG->getTargetConstant((ArgImm & llvm::MIMGFlags::GLC) ? 1 : 0,
		SL, MVT::i1);
		SLC = CurDAG->getTargetConstant((ArgImm & llvm::MIMGFlags::SLC) ? 1 : 0,
		SL, MVT::i1);
		R128 = CurDAG->getTargetConstant((ArgImm & llvm::MIMGFlags::R128) ? 1 : 0,
		SL, MVT::i1);
		TFE = CurDAG->getTargetConstant((ArgImm & llvm::MIMGFlags::TFE) ? 1 : 0,
		SL, MVT::i1);
		LWE = CurDAG->getTargetConstant((ArgImm & llvm::MIMGFlags::LWE) ? 1 : 0,
		SL, MVT::i1);
		DA = CurDAG->getTargetConstant((ArgImm & llvm::MIMGFlags::DA) ? 1 : 0,
		arsenm: You don't need these intermediate variables; you can just check the bit as the argument to getTargetConstant directly.
		SL, MVT::i1);
		return true;
		}



	///	///
	/// \param EncodedOffset This is the immediate value that will be encoded	/// \param EncodedOffset This is the immediate value that will be encoded
	/// directly into the instruction. On SI/CI the \p EncodedOffset	/// directly into the instruction. On SI/CI the \p EncodedOffset
	/// will be in units of dwords and on VI+ it will be units of bytes.	/// will be in units of dwords and on VI+ it will be units of bytes.
	static bool isLegalSMRDImmOffset(const AMDGPUSubtarget *ST,	static bool isLegalSMRDImmOffset(const AMDGPUSubtarget *ST,
	int64_t EncodedOffset) {	int64_t EncodedOffset) {
	return ST->getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS ?	return ST->getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS ?
	isUInt<8>(EncodedOffset) : isUInt<20>(EncodedOffset);	isUInt<8>(EncodedOffset) : isUInt<20>(EncodedOffset);
	}	}

Context not available.
		arsenm: Commented out code
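
For readers who want to check the decoding done by SelectImageFlagBits without an LLVM build, the bit tests can be reproduced in plain C++. This is a sketch under assumptions, not the patch's code: `ImageFlagBits` and `decodeImageFlags` are hypothetical names, the struct stands in for the seven SDValue out-parameters, and the enum is a local copy of the layout added to SIDefines.h. Each field is assigned the bit test directly, in the spirit of the review comment about avoiding intermediate variables.

```cpp
#include <cassert>
#include <cstdint>

// Local copy of the MIMGFlags layout added to SIDefines.h in this patch.
namespace MIMGFlags {
enum : uint32_t {
  UNORM = 1u << 0,
  GLC   = 1u << 1,
  SLC   = 1u << 2,
  R128  = 1u << 3,
  TFE   = 1u << 4,
  LWE   = 1u << 5,
  DA    = 1u << 6,
  D16   = 1u << 7
};
} // namespace MIMGFlags

// Hypothetical plain-data stand-in for the seven SDValue out-parameters.
struct ImageFlagBits {
  bool UNORM, GLC, SLC, R128, TFE, LWE, DA;
};

// Decode a flags immediate into one bool per MIMG operand. D16 is not
// decoded, mirroring the TODO on SelectImageFlagBits.
inline ImageFlagBits decodeImageFlags(uint32_t Flags) {
  ImageFlagBits Bits;
  Bits.UNORM = (Flags & MIMGFlags::UNORM) != 0;
  Bits.GLC   = (Flags & MIMGFlags::GLC) != 0;
  Bits.SLC   = (Flags & MIMGFlags::SLC) != 0;
  Bits.R128  = (Flags & MIMGFlags::R128) != 0;
  Bits.TFE   = (Flags & MIMGFlags::TFE) != 0;
  Bits.LWE   = (Flags & MIMGFlags::LWE) != 0;
  Bits.DA    = (Flags & MIMGFlags::DA) != 0;
  return Bits;
}

// Usage: 0x49 has bits 0, 3 and 6 set, i.e. UNORM, R128 and DA.
inline void decodeExample() {
  ImageFlagBits Bits = decodeImageFlags(0x49u);
  assert(Bits.UNORM && Bits.R128 && Bits.DA);
  assert(!Bits.GLC && !Bits.SLC && !Bits.TFE && !Bits.LWE);
}
```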

lib/Target/AMDGPU/SIDefines.h

Context not available.
	namespace SIOutMods {	namespace SIOutMods {
	enum {	enum {
	NONE = 0,	NONE = 0,
	MUL2 = 1,	MUL2 = 1,
	MUL4 = 2,	MUL4 = 2,
	DIV2 = 3	DIV2 = 3
	};	};
	}	}

	namespace llvm {	namespace llvm {
		namespace MIMGFlags {
		enum {
		UNORM = 1 << 0,
		GLC = 1 << 1,
		SLC = 1 << 2,
		R128 = 1 << 3,
		TFE = 1 << 4,
		arsenm: Not all of these bits should be exposed. GLC, SLC, and TFE definitely should not be. TFE changes the register class of the output. This should strictly be the ones for controlling the sampling.
		LWE = 1 << 5,
		DA = 1 << 6,
		D16 = 1 << 7,
		RSVD = 1 << 8
		};
		}
		}

		namespace llvm {
	namespace AMDGPU {	namespace AMDGPU {
	namespace EncValues { // Encoding values of enum9/8/7 operands	namespace EncValues { // Encoding values of enum9/8/7 operands

	enum {	enum {
	SGPR_MIN = 0,	SGPR_MIN = 0,
	SGPR_MAX = 101,	SGPR_MAX = 101,
	TTMP_MIN = 112,	TTMP_MIN = 112,
	TTMP_MAX = 123,	TTMP_MAX = 123,
	INLINE_INTEGER_C_MIN = 128,	INLINE_INTEGER_C_MIN = 128,
	INLINE_INTEGER_C_POSITIVE_MAX = 192, // 64	INLINE_INTEGER_C_POSITIVE_MAX = 192, // 64
Context not available.
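
One way to act on the review comment that not every MIMG bit should be exposed through the intrinsic would be to validate the immediate against an allow-list before decoding it. The sketch below is an illustration only. `HiddenImageFlags`, `ExposedImageFlags`, and `hasOnlyExposedImageFlags` are hypothetical names, and the choice of hidden bits (GLC, SLC, TFE) is an assumption taken from that single comment; the review also suggests inferring R128 rather than passing it, which this sketch does not model.

```cpp
#include <cassert>
#include <cstdint>

// Local copy of the MIMGFlags layout from SIDefines.h in this patch.
namespace MIMGFlags {
enum : uint32_t {
  UNORM = 1u << 0,
  GLC   = 1u << 1,
  SLC   = 1u << 2,
  R128  = 1u << 3,
  TFE   = 1u << 4,
  LWE   = 1u << 5,
  DA    = 1u << 6,
  D16   = 1u << 7,
  RSVD  = 1u << 8
};
} // namespace MIMGFlags

// Assumption: the bits the review asked to keep out of the intrinsic.
constexpr uint32_t HiddenImageFlags =
    MIMGFlags::GLC | MIMGFlags::SLC | MIMGFlags::TFE;

// Every defined bit below RSVD, minus the hidden ones.
constexpr uint32_t ExposedImageFlags =
    (MIMGFlags::RSVD - 1u) & ~HiddenImageFlags;

// True if Flags sets no hidden, reserved, or undefined bit.
constexpr bool hasOnlyExposedImageFlags(uint32_t Flags) {
  return (Flags & ~ExposedImageFlags) == 0;
}

// Usage: UNORM | DA is accepted, anything containing GLC is rejected.
inline void exposedFlagsExample() {
  assert(hasOnlyExposedImageFlags(MIMGFlags::UNORM | MIMGFlags::DA));
  assert(!hasOnlyExposedImageFlags(MIMGFlags::UNORM | MIMGFlags::GLC));
}
```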

lib/Target/AMDGPU/SIInstrInfo.td

Context not available.

	def MOVRELOffset : ComplexPattern<i32, 2, "SelectMOVRELOffset">;	def MOVRELOffset : ComplexPattern<i32, 2, "SelectMOVRELOffset">;

	def VOP3Mods0 : ComplexPattern<untyped, 4, "SelectVOP3Mods0">;	def VOP3Mods0 : ComplexPattern<untyped, 4, "SelectVOP3Mods0">;
	def VOP3NoMods0 : ComplexPattern<untyped, 4, "SelectVOP3NoMods0">;	def VOP3NoMods0 : ComplexPattern<untyped, 4, "SelectVOP3NoMods0">;
	def VOP3Mods0Clamp : ComplexPattern<untyped, 3, "SelectVOP3Mods0Clamp">;	def VOP3Mods0Clamp : ComplexPattern<untyped, 3, "SelectVOP3Mods0Clamp">;
	def VOP3Mods0Clamp0OMod : ComplexPattern<untyped, 4, "SelectVOP3Mods0Clamp0OMod">;	def VOP3Mods0Clamp0OMod : ComplexPattern<untyped, 4, "SelectVOP3Mods0Clamp0OMod">;
	def VOP3Mods : ComplexPattern<untyped, 2, "SelectVOP3Mods">;	def VOP3Mods : ComplexPattern<untyped, 2, "SelectVOP3Mods">;
	def VOP3NoMods : ComplexPattern<untyped, 2, "SelectVOP3NoMods">;	def VOP3NoMods : ComplexPattern<untyped, 2, "SelectVOP3NoMods">;

		def ImageFlagParameters: ComplexPattern<i32, 7, "SelectImageFlagBits">;


	//===----------------------------------------------------------------------===//	//===----------------------------------------------------------------------===//
	// SI assembler operands	// SI assembler operands
	//===----------------------------------------------------------------------===//	//===----------------------------------------------------------------------===//

	def SIOperand {	def SIOperand {
	int ZERO = 0x80;	int ZERO = 0x80;
	int VCC = 0x6A;	int VCC = 0x6A;
	int FLAT_SCR = 0x68;	int FLAT_SCR = 0x68;
	}	}

Context not available.

lib/Target/AMDGPU/SIInstructions.td

Context not available.
	>;	>;

	multiclass SampleRawPatterns<SDPatternOperator name, string opcode> {	multiclass SampleRawPatterns<SDPatternOperator name, string opcode> {
	def : SampleRawPattern<name, !cast<MIMG>(opcode # _V4_V1), i32>;	def : SampleRawPattern<name, !cast<MIMG>(opcode # _V4_V1), i32>;
	def : SampleRawPattern<name, !cast<MIMG>(opcode # _V4_V2), v2i32>;	def : SampleRawPattern<name, !cast<MIMG>(opcode # _V4_V2), v2i32>;
	def : SampleRawPattern<name, !cast<MIMG>(opcode # _V4_V4), v4i32>;	def : SampleRawPattern<name, !cast<MIMG>(opcode # _V4_V4), v4i32>;
	def : SampleRawPattern<name, !cast<MIMG>(opcode # _V4_V8), v8i32>;	def : SampleRawPattern<name, !cast<MIMG>(opcode # _V4_V8), v8i32>;
	def : SampleRawPattern<name, !cast<MIMG>(opcode # _V4_V16), v16i32>;	def : SampleRawPattern<name, !cast<MIMG>(opcode # _V4_V16), v16i32>;
	}	}


		// Image + sampler
		class AMDGCNSamplePattern<SDPatternOperator name, MIMG opcode, ValueType vt> : Pat <
		(name vt:$addr, v8i32:$rsrc, v4i32:$sampler, i32:$dmask, (ImageFlagParameters
		i1:$unorm, i1:$glc, i1:$slc, i1:$r128, i1:$tfe, i1:$lwe, i1:$da)),
		(opcode $addr, $rsrc, $sampler,
		(as_i32imm $dmask), (as_i1imm $unorm), (as_i1imm $glc), (as_i1imm $slc),
		(as_i1imm $r128), (as_i1imm $tfe), (as_i1imm $lwe), (as_i1imm $da))
		>;

		multiclass AMDGCNSamplePatterns<SDPatternOperator name, string opcode> {
		def : AMDGCNSamplePattern<name, !cast<MIMG>(opcode # _V4_V1), i32>;
		def : AMDGCNSamplePattern<name, !cast<MIMG>(opcode # _V4_V2), v2i32>;
		def : AMDGCNSamplePattern<name, !cast<MIMG>(opcode # _V4_V4), v4i32>;
		def : AMDGCNSamplePattern<name, !cast<MIMG>(opcode # _V4_V8), v8i32>;
		def : AMDGCNSamplePattern<name, !cast<MIMG>(opcode # _V4_V16), v16i32>;
		}


	// Image only	// Image only
	class ImagePattern<SDPatternOperator name, MIMG opcode, ValueType vt> : Pat <	class ImagePattern<SDPatternOperator name, MIMG opcode, ValueType vt> : Pat <
	(name vt:$addr, v8i32:$rsrc, imm:$dmask, imm:$unorm,	(name vt:$addr, v8i32:$rsrc, imm:$dmask, imm:$unorm,
	imm:$r128, imm:$da, imm:$glc, imm:$slc, imm:$tfe, imm:$lwe),	imm:$r128, imm:$da, imm:$glc, imm:$slc, imm:$tfe, imm:$lwe),
	(opcode $addr, $rsrc,	(opcode $addr, $rsrc,
	(as_i32imm $dmask), (as_i1imm $unorm), (as_i1imm $glc), (as_i1imm $slc),	(as_i32imm $dmask), (as_i1imm $unorm), (as_i1imm $glc), (as_i1imm $slc),
	(as_i1imm $r128), (as_i1imm $tfe), (as_i1imm $lwe), (as_i1imm $da))	(as_i1imm $r128), (as_i1imm $tfe), (as_i1imm $lwe), (as_i1imm $da))
	>;	>;

	multiclass ImagePatterns<SDPatternOperator name, string opcode> {	multiclass ImagePatterns<SDPatternOperator name, string opcode> {
	def : ImagePattern<name, !cast<MIMG>(opcode # _V4_V1), i32>;	def : ImagePattern<name, !cast<MIMG>(opcode # _V4_V1), i32>;
	def : ImagePattern<name, !cast<MIMG>(opcode # _V4_V2), v2i32>;	def : ImagePattern<name, !cast<MIMG>(opcode # _V4_V2), v2i32>;
	def : ImagePattern<name, !cast<MIMG>(opcode # _V4_V4), v4i32>;	def : ImagePattern<name, !cast<MIMG>(opcode # _V4_V4), v4i32>;
	}	}

	class ImageLoadPattern<SDPatternOperator name, MIMG opcode, ValueType vt> : Pat <	class ImageLoadPattern<SDPatternOperator name, MIMG opcode, ValueType vt> : Pat <
	(name vt:$addr, v8i32:$rsrc, imm:$dmask, imm:$r128, imm:$da, imm:$glc,	(name vt:$addr, v8i32:$rsrc, imm:$dmask, (ImageFlagParameters i1:$unorm, i1:$glc,
	imm:$slc),	i1:$slc, i1:$r128, i1:$tfe, i1:$lwe, i1:$da)),
	(opcode $addr, $rsrc,	(opcode $addr, $rsrc,
	(as_i32imm $dmask), 1, (as_i1imm $glc), (as_i1imm $slc),	(as_i32imm $dmask), (as_i1imm $unorm), (as_i1imm $glc), (as_i1imm $slc),
	(as_i1imm $r128), 0, 0, (as_i1imm $da))	(as_i1imm $r128), (as_i1imm $tfe), (as_i1imm $lwe), (as_i1imm $da))
	>;	>;

	multiclass ImageLoadPatterns<SDPatternOperator name, string opcode> {	multiclass ImageLoadPatterns<SDPatternOperator name, string opcode> {
	def : ImageLoadPattern<name, !cast<MIMG>(opcode # _V4_V1), i32>;	def : ImageLoadPattern<name, !cast<MIMG>(opcode # _V4_V1), i32>;
	def : ImageLoadPattern<name, !cast<MIMG>(opcode # _V4_V2), v2i32>;	def : ImageLoadPattern<name, !cast<MIMG>(opcode # _V4_V2), v2i32>;
	def : ImageLoadPattern<name, !cast<MIMG>(opcode # _V4_V4), v4i32>;	def : ImageLoadPattern<name, !cast<MIMG>(opcode # _V4_V4), v4i32>;
	}	}

	class ImageStorePattern<SDPatternOperator name, MIMG opcode, ValueType vt> : Pat <	class ImageStorePattern<SDPatternOperator name, MIMG opcode, ValueType vt> : Pat <
	(name v4f32:$data, vt:$addr, v8i32:$rsrc, i32:$dmask, imm:$r128, imm:$da,	(name v4f32:$data, vt:$addr, v8i32:$rsrc, i32:$dmask, (ImageFlagParameters i1:$unorm,
	imm:$glc, imm:$slc),	i1:$glc, i1:$slc, i1:$r128, i1:$tfe, i1:$lwe, i1:$da)),
	(opcode $data, $addr, $rsrc,	(opcode $data, $addr, $rsrc,
	(as_i32imm $dmask), 1, (as_i1imm $glc), (as_i1imm $slc),	(as_i32imm $dmask), (as_i1imm $unorm), (as_i1imm $glc), (as_i1imm $slc),
	(as_i1imm $r128), 0, 0, (as_i1imm $da))	(as_i1imm $r128), (as_i1imm $tfe), (as_i1imm $lwe), (as_i1imm $da))
	>;	>;

	multiclass ImageStorePatterns<SDPatternOperator name, string opcode> {	multiclass ImageStorePatterns<SDPatternOperator name, string opcode> {
	def : ImageStorePattern<name, !cast<MIMG>(opcode # _V4_V1), i32>;	def : ImageStorePattern<name, !cast<MIMG>(opcode # _V4_V1), i32>;
	def : ImageStorePattern<name, !cast<MIMG>(opcode # _V4_V2), v2i32>;	def : ImageStorePattern<name, !cast<MIMG>(opcode # _V4_V2), v2i32>;
	def : ImageStorePattern<name, !cast<MIMG>(opcode # _V4_V4), v4i32>;	def : ImageStorePattern<name, !cast<MIMG>(opcode # _V4_V4), v4i32>;
	}	}

	class ImageAtomicPattern<SDPatternOperator name, MIMG opcode, ValueType vt> : Pat <	class ImageAtomicPattern<SDPatternOperator name, MIMG opcode, ValueType vt> : Pat <
	(name i32:$vdata, vt:$addr, v8i32:$rsrc, imm:$r128, imm:$da, imm:$slc),	(name i32:$vdata, vt:$addr, v8i32:$rsrc, imm:$r128, imm:$da, imm:$slc),
Context not available.

	class ImageAtomicCmpSwapPattern<MIMG opcode, ValueType vt> : Pat <	class ImageAtomicCmpSwapPattern<MIMG opcode, ValueType vt> : Pat <
	(int_amdgcn_image_atomic_cmpswap i32:$vsrc, i32:$vcmp, vt:$addr, v8i32:$rsrc,	(int_amdgcn_image_atomic_cmpswap i32:$vsrc, i32:$vcmp, vt:$addr, v8i32:$rsrc,
	imm:$r128, imm:$da, imm:$slc),	imm:$r128, imm:$da, imm:$slc),
	(EXTRACT_SUBREG	(EXTRACT_SUBREG
	(opcode (REG_SEQUENCE VReg_64, $vsrc, sub0, $vcmp, sub1),	(opcode (REG_SEQUENCE VReg_64, $vsrc, sub0, $vcmp, sub1),
	$addr, $rsrc, 3, 1, 1, (as_i1imm $slc), (as_i1imm $r128), 0, 0, (as_i1imm $da)),	$addr, $rsrc, 3, 1, 1, (as_i1imm $slc), (as_i1imm $r128), 0, 0, (as_i1imm $da)),
	sub0)	sub0)
	>;	>;

		// ======= SI Image Intrinsics ================

		// Image load
		defm : ImagePatterns<int_SI_image_load, "IMAGE_LOAD">;
		defm : ImagePatterns<int_SI_image_load_mip, "IMAGE_LOAD_MIP">;
		def : ImagePattern<int_SI_getresinfo, IMAGE_GET_RESINFO_V4_V1, i32>;

	// Basic sample	// Basic sample
	defm : SampleRawPatterns<int_SI_image_sample, "IMAGE_SAMPLE">;	defm : SampleRawPatterns<int_SI_image_sample, "IMAGE_SAMPLE">;
	defm : SampleRawPatterns<int_SI_image_sample_cl, "IMAGE_SAMPLE_CL">;	defm : SampleRawPatterns<int_SI_image_sample_cl, "IMAGE_SAMPLE_CL">;
	defm : SampleRawPatterns<int_SI_image_sample_d, "IMAGE_SAMPLE_D">;	defm : SampleRawPatterns<int_SI_image_sample_d, "IMAGE_SAMPLE_D">;
	defm : SampleRawPatterns<int_SI_image_sample_d_cl, "IMAGE_SAMPLE_D_CL">;	defm : SampleRawPatterns<int_SI_image_sample_d_cl, "IMAGE_SAMPLE_D_CL">;
	defm : SampleRawPatterns<int_SI_image_sample_l, "IMAGE_SAMPLE_L">;	defm : SampleRawPatterns<int_SI_image_sample_l, "IMAGE_SAMPLE_L">;
	defm : SampleRawPatterns<int_SI_image_sample_b, "IMAGE_SAMPLE_B">;	defm : SampleRawPatterns<int_SI_image_sample_b, "IMAGE_SAMPLE_B">;
	defm : SampleRawPatterns<int_SI_image_sample_b_cl, "IMAGE_SAMPLE_B_CL">;	defm : SampleRawPatterns<int_SI_image_sample_b_cl, "IMAGE_SAMPLE_B_CL">;
	defm : SampleRawPatterns<int_SI_image_sample_lz, "IMAGE_SAMPLE_LZ">;	defm : SampleRawPatterns<int_SI_image_sample_lz, "IMAGE_SAMPLE_LZ">;
	defm : SampleRawPatterns<int_SI_image_sample_cd, "IMAGE_SAMPLE_CD">;	defm : SampleRawPatterns<int_SI_image_sample_cd, "IMAGE_SAMPLE_CD">;
Context not available.
	def : SampleRawPattern<int_SI_gather4_c_l_o, IMAGE_GATHER4_C_L_O_V4_V8, v8i32>;	def : SampleRawPattern<int_SI_gather4_c_l_o, IMAGE_GATHER4_C_L_O_V4_V8, v8i32>;
	def : SampleRawPattern<int_SI_gather4_c_b_o, IMAGE_GATHER4_C_B_O_V4_V8, v8i32>;	def : SampleRawPattern<int_SI_gather4_c_b_o, IMAGE_GATHER4_C_B_O_V4_V8, v8i32>;
	def : SampleRawPattern<int_SI_gather4_c_b_cl_o, IMAGE_GATHER4_C_B_CL_O_V4_V8, v8i32>;	def : SampleRawPattern<int_SI_gather4_c_b_cl_o, IMAGE_GATHER4_C_B_CL_O_V4_V8, v8i32>;
	def : SampleRawPattern<int_SI_gather4_c_lz_o, IMAGE_GATHER4_C_LZ_O_V4_V4, v4i32>;	def : SampleRawPattern<int_SI_gather4_c_lz_o, IMAGE_GATHER4_C_LZ_O_V4_V4, v4i32>;
	def : SampleRawPattern<int_SI_gather4_c_lz_o, IMAGE_GATHER4_C_LZ_O_V4_V8, v8i32>;	def : SampleRawPattern<int_SI_gather4_c_lz_o, IMAGE_GATHER4_C_LZ_O_V4_V8, v8i32>;

	def : SampleRawPattern<int_SI_getlod, IMAGE_GET_LOD_V4_V1, i32>;	def : SampleRawPattern<int_SI_getlod, IMAGE_GET_LOD_V4_V1, i32>;
	def : SampleRawPattern<int_SI_getlod, IMAGE_GET_LOD_V4_V2, v2i32>;	def : SampleRawPattern<int_SI_getlod, IMAGE_GET_LOD_V4_V2, v2i32>;
	def : SampleRawPattern<int_SI_getlod, IMAGE_GET_LOD_V4_V4, v4i32>;	def : SampleRawPattern<int_SI_getlod, IMAGE_GET_LOD_V4_V4, v4i32>;

	def : ImagePattern<int_SI_getresinfo, IMAGE_GET_RESINFO_V4_V1, i32>;
	defm : ImagePatterns<int_SI_image_load, "IMAGE_LOAD">;	// ======= amdgcn Image Intrinsics ==============
	defm : ImagePatterns<int_SI_image_load_mip, "IMAGE_LOAD_MIP">;
		// Image load
	defm : ImageLoadPatterns<int_amdgcn_image_load, "IMAGE_LOAD">;	defm : ImageLoadPatterns<int_amdgcn_image_load, "IMAGE_LOAD">;
	defm : ImageLoadPatterns<int_amdgcn_image_load_mip, "IMAGE_LOAD_MIP">;	defm : ImageLoadPatterns<int_amdgcn_image_load_mip, "IMAGE_LOAD_MIP">;
		def : ImageLoadPattern<int_amdgcn_image_getresinfo, IMAGE_GET_RESINFO_V4_V1, i32>;

		// Image store
		tstellarAMD (author): These image load/store changes should go in a separate patch.
	defm : ImageStorePatterns<int_amdgcn_image_store, "IMAGE_STORE">;	defm : ImageStorePatterns<int_amdgcn_image_store, "IMAGE_STORE">;
	defm : ImageStorePatterns<int_amdgcn_image_store_mip, "IMAGE_STORE_MIP">;	defm : ImageStorePatterns<int_amdgcn_image_store_mip, "IMAGE_STORE_MIP">;

		// Basic sample
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample, "IMAGE_SAMPLE">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_cl, "IMAGE_SAMPLE_CL">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_d, "IMAGE_SAMPLE_D">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_d_cl, "IMAGE_SAMPLE_D_CL">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_l, "IMAGE_SAMPLE_L">;
		arsenm: Not sure what this means
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_b, "IMAGE_SAMPLE_B">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_b_cl, "IMAGE_SAMPLE_B_CL">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_lz, "IMAGE_SAMPLE_LZ">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_cd, "IMAGE_SAMPLE_CD">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_cd_cl, "IMAGE_SAMPLE_CD_CL">;

		// Sample with comparison
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_c, "IMAGE_SAMPLE_C">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_c_cl, "IMAGE_SAMPLE_C_CL">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_c_d, "IMAGE_SAMPLE_C_D">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_c_d_cl, "IMAGE_SAMPLE_C_D_CL">;
		arsenm: I think you can reduce the number of repeated lines by passing in the suffix, and then concat + cast to the intrinsic (similar to how MUBUF_LoadIntrinsicPat does it).
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_c_l, "IMAGE_SAMPLE_C_L">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_c_b, "IMAGE_SAMPLE_C_B">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_c_b_cl, "IMAGE_SAMPLE_C_B_CL">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_c_lz, "IMAGE_SAMPLE_C_LZ">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_c_cd, "IMAGE_SAMPLE_C_CD">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_c_cd_cl, "IMAGE_SAMPLE_C_CD_CL">;

		// Sample with offsets
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_o, "IMAGE_SAMPLE_O">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_cl_o, "IMAGE_SAMPLE_CL_O">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_d_o, "IMAGE_SAMPLE_D_O">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_d_cl_o, "IMAGE_SAMPLE_D_CL_O">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_l_o, "IMAGE_SAMPLE_L_O">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_b_o, "IMAGE_SAMPLE_B_O">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_b_cl_o, "IMAGE_SAMPLE_B_CL_O">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_lz_o, "IMAGE_SAMPLE_LZ_O">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_cd_o, "IMAGE_SAMPLE_CD_O">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_cd_cl_o, "IMAGE_SAMPLE_CD_CL_O">;

		// Sample with comparison and offsets
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_c_o, "IMAGE_SAMPLE_C_O">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_c_cl_o, "IMAGE_SAMPLE_C_CL_O">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_c_d_o, "IMAGE_SAMPLE_C_D_O">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_c_d_cl_o, "IMAGE_SAMPLE_C_D_CL_O">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_c_l_o, "IMAGE_SAMPLE_C_L_O">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_c_b_o, "IMAGE_SAMPLE_C_B_O">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_c_b_cl_o, "IMAGE_SAMPLE_C_B_CL_O">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_c_lz_o, "IMAGE_SAMPLE_C_LZ_O">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_c_cd_o, "IMAGE_SAMPLE_C_CD_O">;
		defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_c_cd_cl_o, "IMAGE_SAMPLE_C_CD_CL_O">;

		// Gather opcodes
		// Only the variants which make sense are defined.
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4, IMAGE_GATHER4_V4_V2, v2i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4, IMAGE_GATHER4_V4_V4, v4i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_cl, IMAGE_GATHER4_CL_V4_V4, v4i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_l, IMAGE_GATHER4_L_V4_V4, v4i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_b, IMAGE_GATHER4_B_V4_V4, v4i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_b_cl, IMAGE_GATHER4_B_CL_V4_V4, v4i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_b_cl, IMAGE_GATHER4_B_CL_V4_V8, v8i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_lz, IMAGE_GATHER4_LZ_V4_V2, v2i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_lz, IMAGE_GATHER4_LZ_V4_V4, v4i32>;

		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_c, IMAGE_GATHER4_C_V4_V4, v4i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_c_cl, IMAGE_GATHER4_C_CL_V4_V4, v4i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_c_cl, IMAGE_GATHER4_C_CL_V4_V8, v8i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_c_l, IMAGE_GATHER4_C_L_V4_V4, v4i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_c_l, IMAGE_GATHER4_C_L_V4_V8, v8i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_c_b, IMAGE_GATHER4_C_B_V4_V4, v4i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_c_b, IMAGE_GATHER4_C_B_V4_V8, v8i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_c_b_cl, IMAGE_GATHER4_C_B_CL_V4_V8, v8i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_c_lz, IMAGE_GATHER4_C_LZ_V4_V4, v4i32>;

		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_o, IMAGE_GATHER4_O_V4_V4, v4i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_cl_o, IMAGE_GATHER4_CL_O_V4_V4, v4i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_cl_o, IMAGE_GATHER4_CL_O_V4_V8, v8i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_l_o, IMAGE_GATHER4_L_O_V4_V4, v4i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_l_o, IMAGE_GATHER4_L_O_V4_V8, v8i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_b_o, IMAGE_GATHER4_B_O_V4_V4, v4i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_b_o, IMAGE_GATHER4_B_O_V4_V8, v8i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_b_cl_o, IMAGE_GATHER4_B_CL_O_V4_V8, v8i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_lz_o, IMAGE_GATHER4_LZ_O_V4_V4, v4i32>;

		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_c_o, IMAGE_GATHER4_C_O_V4_V4, v4i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_c_o, IMAGE_GATHER4_C_O_V4_V8, v8i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_c_cl_o, IMAGE_GATHER4_C_CL_O_V4_V8, v8i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_c_l_o, IMAGE_GATHER4_C_L_O_V4_V8, v8i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_c_b_o, IMAGE_GATHER4_C_B_O_V4_V8, v8i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_c_b_cl_o, IMAGE_GATHER4_C_B_CL_O_V4_V8, v8i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_c_lz_o, IMAGE_GATHER4_C_LZ_O_V4_V4, v4i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_gather4_c_lz_o, IMAGE_GATHER4_C_LZ_O_V4_V8, v8i32>;

		def : AMDGCNSamplePattern<int_amdgcn_image_getlod, IMAGE_GET_LOD_V4_V1, i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_getlod, IMAGE_GET_LOD_V4_V2, v2i32>;
		def : AMDGCNSamplePattern<int_amdgcn_image_getlod, IMAGE_GET_LOD_V4_V4, v4i32>;

		// Image atomics
	defm : ImageAtomicPatterns<int_amdgcn_image_atomic_swap, "IMAGE_ATOMIC_SWAP">;
	def : ImageAtomicCmpSwapPattern<IMAGE_ATOMIC_CMPSWAP_V1, i32>;
	def : ImageAtomicCmpSwapPattern<IMAGE_ATOMIC_CMPSWAP_V2, v2i32>;
	def : ImageAtomicCmpSwapPattern<IMAGE_ATOMIC_CMPSWAP_V4, v4i32>;
	defm : ImageAtomicPatterns<int_amdgcn_image_atomic_add, "IMAGE_ATOMIC_ADD">;
	defm : ImageAtomicPatterns<int_amdgcn_image_atomic_sub, "IMAGE_ATOMIC_SUB">;
	defm : ImageAtomicPatterns<int_amdgcn_image_atomic_smin, "IMAGE_ATOMIC_SMIN">;
	defm : ImageAtomicPatterns<int_amdgcn_image_atomic_umin, "IMAGE_ATOMIC_UMIN">;
	defm : ImageAtomicPatterns<int_amdgcn_image_atomic_smax, "IMAGE_ATOMIC_SMAX">;
	defm : ImageAtomicPatterns<int_amdgcn_image_atomic_umax, "IMAGE_ATOMIC_UMAX">;
	defm : ImageAtomicPatterns<int_amdgcn_image_atomic_and, "IMAGE_ATOMIC_AND">;
	defm : ImageAtomicPatterns<int_amdgcn_image_atomic_or, "IMAGE_ATOMIC_OR">;
	defm : ImageAtomicPatterns<int_amdgcn_image_atomic_xor, "IMAGE_ATOMIC_XOR">;
	defm : ImageAtomicPatterns<int_amdgcn_image_atomic_inc, "IMAGE_ATOMIC_INC">;
	defm : ImageAtomicPatterns<int_amdgcn_image_atomic_dec, "IMAGE_ATOMIC_DEC">;


	/* SIsample for simple 1D texture lookup */
	def : Pat <
	(SIsample i32:$addr, v8i32:$rsrc, v4i32:$sampler, imm),
	(IMAGE_SAMPLE_V4_V1 $addr, $rsrc, $sampler, 0xf, 0, 0, 0, 0, 0, 0, 0)
	>;

	class SamplePattern<SDNode name, MIMG opcode, ValueType vt> : Pat <
	(name vt:$addr, v8i32:$rsrc, v4i32:$sampler, imm),
	(opcode $addr, $rsrc, $sampler, 0xf, 0, 0, 0, 0, 0, 0, 0)
	>;
Context not available.
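
arsenm's inline comment above on the repeated AMDGCNSamplePatterns lines suggests passing in a suffix and then concatenating and casting to the intrinsic. A minimal sketch of that idea follows. It is not the patch's code: `SamplePatternStub` and `SampleVariant` are hypothetical names, the `Intrinsic` class and the three `int_amdgcn_image_sample_c*` records are redeclared as empty stand-ins so the fragment parses on its own with llvm-tblgen, and the real multiclass would presumably forward to the existing AMDGCNSamplePatterns instead of a stub.

```tablegen
// Stand-in declarations (hypothetical) so this parses on its own; in the
// patch these records come from IntrinsicsAMDGPU.td.
class Intrinsic;
def int_amdgcn_image_sample_c    : Intrinsic;
def int_amdgcn_image_sample_c_cl : Intrinsic;
def int_amdgcn_image_sample_c_d  : Intrinsic;

// Simplified placeholder for the real pattern class: it only records which
// intrinsic is paired with which opcode name.
class SamplePatternStub<Intrinsic intr, string opcode> {
  Intrinsic Intr = intr;
  string Opcode = opcode;
}

// Build both names from a pair of suffixes: paste the suffix onto the common
// prefix, then !cast the resulting string to the intrinsic record.
multiclass SampleVariant<string isuffix, string osuffix> {
  def : SamplePatternStub<
    !cast<Intrinsic>("int_amdgcn_image_sample" # isuffix),
    "IMAGE_SAMPLE" # osuffix>;
}

defm : SampleVariant<"_c",    "_C">;
defm : SampleVariant<"_c_cl", "_C_CL">;
defm : SampleVariant<"_c_d",  "_C_D">;
```

The sketch passes the lower-case and upper-case suffixes separately so that it does not depend on a case-conversion operator being available in the TableGen version in use.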

test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.ll

This file was added.

				; RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck --check-prefix=GCN %s
				; RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck --check-prefix=GCN %s

				; GCN-LABEL: {{^}}gather4_v2:
				; GCN: image_gather4 {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_v2(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.v2i32(<2 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4:
				; GCN: image_gather4 {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_cl:
				; GCN: image_gather4_cl {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_cl(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.cl.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_l:
				; GCN: image_gather4_l {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_l(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.l.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_b:
				; GCN: image_gather4_b {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_b(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.b.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_b_cl:
				; GCN: image_gather4_b_cl {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_b_cl(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.b.cl.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_b_cl_v8:
				; GCN: image_gather4_b_cl {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_b_cl_v8(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.b.cl.v8i32(<8 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_lz_v2:
				; GCN: image_gather4_lz {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_lz_v2(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.lz.v2i32(<2 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_lz:
				; GCN: image_gather4_lz {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_lz(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.lz.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}



				; GCN-LABEL: {{^}}gather4_o:
				; GCN: image_gather4_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_o(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_cl_o:
				; GCN: image_gather4_cl_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_cl_o(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.cl.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_cl_o_v8:
				; GCN: image_gather4_cl_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_cl_o_v8(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.cl.o.v8i32(<8 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_l_o:
				; GCN: image_gather4_l_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_l_o(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.l.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_l_o_v8:
				; GCN: image_gather4_l_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_l_o_v8(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.l.o.v8i32(<8 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_b_o:
				; GCN: image_gather4_b_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_b_o(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.b.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_b_o_v8:
				; GCN: image_gather4_b_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_b_o_v8(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.b.o.v8i32(<8 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_b_cl_o:
				; GCN: image_gather4_b_cl_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_b_cl_o(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.b.cl.o.v8i32(<8 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_lz_o:
				; GCN: image_gather4_lz_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_lz_o(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.lz.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}


				; GCN-LABEL: {{^}}gather4_c:
				; GCN: image_gather4_c {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_c(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.c.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_c_cl:
				; GCN: image_gather4_c_cl {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_c_cl(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.c.cl.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_c_cl_v8:
				; GCN: image_gather4_c_cl {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_c_cl_v8(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.c.cl.v8i32(<8 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_c_l:
				; GCN: image_gather4_c_l {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_c_l(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.c.l.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_c_l_v8:
				; GCN: image_gather4_c_l {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_c_l_v8(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.c.l.v8i32(<8 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_c_b:
				; GCN: image_gather4_c_b {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_c_b(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.c.b.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_c_b_v8:
				; GCN: image_gather4_c_b {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_c_b_v8(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.c.b.v8i32(<8 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_c_b_cl:
				; GCN: image_gather4_c_b_cl {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_c_b_cl(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.v8i32(<8 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_c_lz:
				; GCN: image_gather4_c_lz {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_c_lz(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.c.lz.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}



				; GCN-LABEL: {{^}}gather4_c_o:
				; GCN: image_gather4_c_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_c_o(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.c.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_c_o_v8:
				; GCN: image_gather4_c_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_c_o_v8(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.c.o.v8i32(<8 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_c_cl_o:
				; GCN: image_gather4_c_cl_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_c_cl_o(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.c.cl.o.v8i32(<8 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_c_l_o:
				; GCN: image_gather4_c_l_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_c_l_o(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.c.l.o.v8i32(<8 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_c_b_o:
				; GCN: image_gather4_c_b_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_c_b_o(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.c.b.o.v8i32(<8 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_c_b_cl_o:
				; GCN: image_gather4_c_b_cl_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_c_b_cl_o(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.o.v8i32(<8 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_c_lz_o:
				; GCN: image_gather4_c_lz_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_c_lz_o(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.c.lz.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}gather4_c_lz_o_v8:
				; GCN: image_gather4_c_lz_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x1 da
				define void @gather4_c_lz_o_v8(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.gather4.c.lz.o.v8i32(<8 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 1, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}


				declare <4 x float> @llvm.amdgcn.image.gather4.v2i32(<2 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.cl.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.l.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.b.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.b.cl.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.b.cl.v8i32(<8 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.lz.v2i32(<2 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.lz.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0

				declare <4 x float> @llvm.amdgcn.image.gather4.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.cl.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.cl.o.v8i32(<8 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.l.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.l.o.v8i32(<8 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.b.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.b.o.v8i32(<8 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.b.cl.o.v8i32(<8 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.lz.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0

				declare <4 x float> @llvm.amdgcn.image.gather4.c.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.c.cl.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.c.cl.v8i32(<8 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.c.l.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.c.l.v8i32(<8 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.c.b.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.c.b.v8i32(<8 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.v8i32(<8 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.c.lz.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0

				declare <4 x float> @llvm.amdgcn.image.gather4.c.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.c.o.v8i32(<8 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.c.cl.o.v8i32(<8 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.c.l.o.v8i32(<8 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.c.b.o.v8i32(<8 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.o.v8i32(<8 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.c.lz.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.gather4.c.lz.o.v8i32(<8 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0


				attributes #0 = { nounwind readnone }

test/CodeGen/AMDGPU/llvm.amdgcn.image.getlod.ll

This file was added.

				; RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck --check-prefix=GCN %s
				; RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck --check-prefix=GCN %s

				; GCN-LABEL: {{^}}getlod:
				; GCN: image_get_lod {{v\[[0-9]+:[0-9]+\]}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf da
				define void @getlod(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.getlod.i32(i32 undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 64) ; flag = da
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}getlod_v2:
				; GCN: image_get_lod {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf da
				define void @getlod_v2(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.getlod.v2i32(<2 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 64) ; flag = da
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}getlod_v4:
				; GCN: image_get_lod {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf da
				define void @getlod_v4(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.getlod.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 64)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}


				declare <4 x float> @llvm.amdgcn.image.getlod.i32(i32, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.getlod.v2i32(<2 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.getlod.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0


				attributes #0 = { nounwind readnone }
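
The tests above pass `i32 64` as the last operand and annotate it `flag = da`, while the header comment in IntrinsicsAMDGPU.td lists UNORM, GLC, SLC and R128 as flags[0] through flags[3]. The sketch below spells out how such a mask operand could be composed. `MIMGFlagBit`, the `FLAG_*` records and `ExampleFlags` are hypothetical names used only for illustration, and placing da at bit 6 is an assumption inferred from the `i32 64` annotation rather than something the header comment states.

```tablegen
// Hypothetical helper: one record per flag bit, with the mask derived from
// the bit position.
class MIMGFlagBit<int bit> {
  int Bit = bit;
  int Mask = !shl(1, bit);
}

def FLAG_UNORM : MIMGFlagBit<0>;  // flags[0]
def FLAG_GLC   : MIMGFlagBit<1>;  // flags[1]
def FLAG_SLC   : MIMGFlagBit<2>;  // flags[2]
def FLAG_R128  : MIMGFlagBit<3>;  // flags[3]
// Assumption: da sits at bit 6, inferred from `i32 64` annotated "flag = da".
def FLAG_DA    : MIMGFlagBit<6>;

// Example: the operand value that would request both unorm and da (1 | 64).
def ExampleFlags {
  int Value = !or(FLAG_UNORM.Mask, FLAG_DA.Mask);
}
```

Under these assumptions a caller wanting only da passes 64, as the getlod and gather4 tests do, and a caller wanting unorm plus da would pass 65.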

test/CodeGen/AMDGPU/llvm.amdgcn.image.ll

	;RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck %s			; RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck --check-prefix=GCN %s
	;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck %s			; RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck --check-prefix=GCN %s

	;CHECK-LABEL: {{^}}image_load_v4i32:			; GCN-LABEL: {{^}}image_load_v4i32:
	;CHECK: image_load v[0:3], v[0:3], s[0:7] dmask:0xf unorm			; GCN: image_load v[0:3], v[0:3], s[0:7] dmask:0xf unorm
	;CHECK: s_waitcnt vmcnt(0)			; GCN: s_waitcnt vmcnt(0)
	define amdgpu_ps <4 x float> @image_load_v4i32(<8 x i32> inreg %rsrc, <4 x i32> %c) {			define amdgpu_ps <4 x float> @image_load_v4i32(<8 x i32> inreg %rsrc, <4 x i32> %c) {
	main_body:			main_body:
	%tex = call <4 x float> @llvm.amdgcn.image.load.v4i32(<4 x i32> %c, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)			%tex = call <4 x float> @llvm.amdgcn.image.load.v4i32(<4 x i32> %c, <8 x i32> %rsrc, i32 15, i32 1) ; 1<<0 unorm
	ret <4 x float> %tex			ret <4 x float> %tex
	}			}

	;CHECK-LABEL: {{^}}image_load_v2i32:			; GCN-LABEL: {{^}}image_load_v2i32:
	;CHECK: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm			; GCN: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm
	;CHECK: s_waitcnt vmcnt(0)			; GCN: s_waitcnt vmcnt(0)
	define amdgpu_ps <4 x float> @image_load_v2i32(<8 x i32> inreg %rsrc, <2 x i32> %c) {			define amdgpu_ps <4 x float> @image_load_v2i32(<8 x i32> inreg %rsrc, <2 x i32> %c) {
	main_body:			main_body:
	%tex = call <4 x float> @llvm.amdgcn.image.load.v2i32(<2 x i32> %c, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)			%tex = call <4 x float> @llvm.amdgcn.image.load.v2i32(<2 x i32> %c, <8 x i32> %rsrc, i32 15, i32 1) ; 1<<0 unorm
	ret <4 x float> %tex			ret <4 x float> %tex
	}			}

	;CHECK-LABEL: {{^}}image_load_i32:			; GCN-LABEL: {{^}}image_load_i32:
	;CHECK: image_load v[0:3], v0, s[0:7] dmask:0xf unorm			; GCN: image_load v[0:3], v0, s[0:7] dmask:0xf unorm
	;CHECK: s_waitcnt vmcnt(0)			; GCN: s_waitcnt vmcnt(0)
	define amdgpu_ps <4 x float> @image_load_i32(<8 x i32> inreg %rsrc, i32 %c) {			define amdgpu_ps <4 x float> @image_load_i32(<8 x i32> inreg %rsrc, i32 %c) {
	main_body:			main_body:
	%tex = call <4 x float> @llvm.amdgcn.image.load.i32(i32 %c, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)			%tex = call <4 x float> @llvm.amdgcn.image.load.i32(i32 %c, <8 x i32> %rsrc, i32 15, i32 1) ; 1<<0 unorm
	ret <4 x float> %tex			ret <4 x float> %tex
	}			}

	;CHECK-LABEL: {{^}}image_load_mip:			; GCN-LABEL: {{^}}image_load_mip:
	;CHECK: image_load_mip v[0:3], v[0:3], s[0:7] dmask:0xf unorm			; GCN: image_load_mip v[0:3], v[0:3], s[0:7] dmask:0xf unorm
	;CHECK: s_waitcnt vmcnt(0)			; GCN: s_waitcnt vmcnt(0)
	define amdgpu_ps <4 x float> @image_load_mip(<8 x i32> inreg %rsrc, <4 x i32> %c) {			define amdgpu_ps <4 x float> @image_load_mip(<8 x i32> inreg %rsrc, <4 x i32> %c) {
	main_body:			main_body:
	%tex = call <4 x float> @llvm.amdgcn.image.load.mip.v4i32(<4 x i32> %c, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)			%tex = call <4 x float> @llvm.amdgcn.image.load.mip.v4i32(<4 x i32> %c, <8 x i32> %rsrc, i32 15, i32 1) ; 1<<0 unorm
	ret <4 x float> %tex			ret <4 x float> %tex
	}			}

	;CHECK-LABEL: {{^}}image_load_1:			; GCN-LABEL: {{^}}image_load_1:
	;CHECK: image_load v0, v[0:3], s[0:7] dmask:0x1 unorm			; GCN: image_load v0, v[0:3], s[0:7] dmask:0x1 unorm
	;CHECK: s_waitcnt vmcnt(0)			; GCN: s_waitcnt vmcnt(0)
	define amdgpu_ps float @image_load_1(<8 x i32> inreg %rsrc, <4 x i32> %c) {			define amdgpu_ps float @image_load_1(<8 x i32> inreg %rsrc, <4 x i32> %c) {
	main_body:			main_body:
	%tex = call <4 x float> @llvm.amdgcn.image.load.v4i32(<4 x i32> %c, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)			%tex = call <4 x float> @llvm.amdgcn.image.load.v4i32(<4 x i32> %c, <8 x i32> %rsrc, i32 15, i32 1) ; 1<<0 unorm
	%elt = extractelement <4 x float> %tex, i32 0			%elt = extractelement <4 x float> %tex, i32 0
	; Only first component used, test that dmask etc. is changed accordingly			; Only first component used, test that dmask etc. is changed accordingly
	ret float %elt			ret float %elt
	}			}

	;CHECK-LABEL: {{^}}image_store_v4i32:			; GCN-LABEL: {{^}}image_store_v4i32:
	;CHECK: image_store v[0:3], v[4:7], s[0:7] dmask:0xf unorm			; GCN: image_store v[0:3], v[4:7], s[0:7] dmask:0xf unorm
	define amdgpu_ps void @image_store_v4i32(<8 x i32> inreg %rsrc, <4 x float> %data, <4 x i32> %coords) {			define amdgpu_ps void @image_store_v4i32(<8 x i32> inreg %rsrc, <4 x float> %data, <4 x i32> %coords) {
	main_body:			main_body:
	call void @llvm.amdgcn.image.store.v4i32(<4 x float> %data, <4 x i32> %coords, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)			call void @llvm.amdgcn.image.store.v4i32(<4 x float> %data, <4 x i32> %coords, <8 x i32> %rsrc, i32 15, i32 1) ; 1<<0 unorm
	ret void			ret void
	}			}

	;CHECK-LABEL: {{^}}image_store_v2i32:			; GCN-LABEL: {{^}}image_store_v2i32:
	;CHECK: image_store v[0:3], v[4:5], s[0:7] dmask:0xf unorm			; GCN: image_store v[0:3], v[4:5], s[0:7] dmask:0xf unorm
	define amdgpu_ps void @image_store_v2i32(<8 x i32> inreg %rsrc, <4 x float> %data, <2 x i32> %coords) {			define amdgpu_ps void @image_store_v2i32(<8 x i32> inreg %rsrc, <4 x float> %data, <2 x i32> %coords) {
	main_body:			main_body:
	call void @llvm.amdgcn.image.store.v2i32(<4 x float> %data, <2 x i32> %coords, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)			call void @llvm.amdgcn.image.store.v2i32(<4 x float> %data, <2 x i32> %coords, <8 x i32> %rsrc, i32 15, i32 1) ; 1<<0 unorm
	ret void			ret void
	}			}

	;CHECK-LABEL: {{^}}image_store_i32:			; GCN-LABEL: {{^}}image_store_i32:
	;CHECK: image_store v[0:3], v4, s[0:7] dmask:0xf unorm			; GCN: image_store v[0:3], v4, s[0:7] dmask:0xf unorm
	define amdgpu_ps void @image_store_i32(<8 x i32> inreg %rsrc, <4 x float> %data, i32 %coords) {			define amdgpu_ps void @image_store_i32(<8 x i32> inreg %rsrc, <4 x float> %data, i32 %coords) {
	main_body:			main_body:
	call void @llvm.amdgcn.image.store.i32(<4 x float> %data, i32 %coords, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)			call void @llvm.amdgcn.image.store.i32(<4 x float> %data, i32 %coords, <8 x i32> %rsrc, i32 15, i32 1) ; 1<<0 unorm
	ret void			ret void
	}			}

	;CHECK-LABEL: {{^}}image_store_mip:			; GCN-LABEL: {{^}}image_store_mip:
	;CHECK: image_store_mip v[0:3], v[4:7], s[0:7] dmask:0xf unorm			; GCN: image_store_mip v[0:3], v[4:7], s[0:7] dmask:0xf unorm
	define amdgpu_ps void @image_store_mip(<8 x i32> inreg %rsrc, <4 x float> %data, <4 x i32> %coords) {			define amdgpu_ps void @image_store_mip(<8 x i32> inreg %rsrc, <4 x float> %data, <4 x i32> %coords) {
	main_body:			main_body:
	call void @llvm.amdgcn.image.store.mip.v4i32(<4 x float> %data, <4 x i32> %coords, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)			call void @llvm.amdgcn.image.store.mip.v4i32(<4 x float> %data, <4 x i32> %coords, <8 x i32> %rsrc, i32 15, i32 1) ; 1<<0 unorm
	ret void			ret void
	}			}

	; Ideally, the register allocator would avoid the wait here			; Ideally, the register allocator would avoid the wait here
	;			;
	;CHECK-LABEL: {{^}}image_store_wait:			; GCN-LABEL: {{^}}image_store_wait:
	;CHECK: image_store v[0:3], v4, s[0:7] dmask:0xf unorm			; GCN: image_store v[0:3], v4, s[0:7] dmask:0xf unorm
	;CHECK: s_waitcnt vmcnt(0) expcnt(0)			; GCN: s_waitcnt vmcnt(0) expcnt(0)
	;CHECK: image_load v[0:3], v4, s[8:15] dmask:0xf unorm			; GCN: image_load v[0:3], v4, s[8:15] dmask:0xf unorm
	;CHECK: s_waitcnt vmcnt(0)			; GCN: s_waitcnt vmcnt(0)
	;CHECK: image_store v[0:3], v4, s[16:23] dmask:0xf unorm			; GCN: image_store v[0:3], v4, s[16:23] dmask:0xf unorm
	define amdgpu_ps void @image_store_wait(<8 x i32> inreg, <8 x i32> inreg, <8 x i32> inreg, <4 x float>, i32) {			define amdgpu_ps void @image_store_wait(<8 x i32> inreg, <8 x i32> inreg, <8 x i32> inreg, <4 x float>, i32) {
	main_body:			main_body:
	call void @llvm.amdgcn.image.store.i32(<4 x float> %3, i32 %4, <8 x i32> %0, i32 15, i1 0, i1 0, i1 0, i1 0)			call void @llvm.amdgcn.image.store.i32(<4 x float> %3, i32 %4, <8 x i32> %0, i32 15, i32 1) ; 1<<0 unorm
	%data = call <4 x float> @llvm.amdgcn.image.load.i32(i32 %4, <8 x i32> %1, i32 15, i1 0, i1 0, i1 0, i1 0)			%data = call <4 x float> @llvm.amdgcn.image.load.i32(i32 %4, <8 x i32> %1, i32 15, i32 1) ; 1<<0 unorm
	call void @llvm.amdgcn.image.store.i32(<4 x float> %data, i32 %4, <8 x i32> %2, i32 15, i1 0, i1 0, i1 0, i1 0)			call void @llvm.amdgcn.image.store.i32(<4 x float> %data, i32 %4, <8 x i32> %2, i32 15, i32 1) ; 1<<0 unorm
	ret void			ret void
	}			}

	declare void @llvm.amdgcn.image.store.i32(<4 x float>, i32, <8 x i32>, i32, i1, i1, i1, i1) #0			; GCN-LABEL: {{^}}getresinfo:
	declare void @llvm.amdgcn.image.store.v2i32(<4 x float>, <2 x i32>, <8 x i32>, i32, i1, i1, i1, i1) #0			; GCN: image_get_resinfo {{v\[[0-9]+:[0-9]+\]}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
	declare void @llvm.amdgcn.image.store.v4i32(<4 x float>, <4 x i32>, <8 x i32>, i32, i1, i1, i1, i1) #0			define amdgpu_ps void @getresinfo() {
	declare void @llvm.amdgcn.image.store.mip.v4i32(<4 x float>, <4 x i32>, <8 x i32>, i32, i1, i1, i1, i1) #0			main_body:
				%r = call <4 x float> @llvm.amdgcn.image.getresinfo.i32(i32 undef, <8 x i32> undef, i32 15, i32 0)
				%r0 = extractelement <4 x float> %r, i32 0
				%r1 = extractelement <4 x float> %r, i32 1
				%r2 = extractelement <4 x float> %r, i32 2
				%r3 = extractelement <4 x float> %r, i32 3
				call void @llvm.SI.export(i32 15, i32 1, i32 1, i32 0, i32 1, float %r0, float %r1, float %r2, float %r3)
				ret void
				}


				declare void @llvm.amdgcn.image.store.i32(<4 x float>, i32, <8 x i32>, i32, i32) #0
				declare void @llvm.amdgcn.image.store.v2i32(<4 x float>, <2 x i32>, <8 x i32>, i32, i32) #0
				declare void @llvm.amdgcn.image.store.v4i32(<4 x float>, <4 x i32>, <8 x i32>, i32, i32) #0
				declare void @llvm.amdgcn.image.store.mip.v4i32(<4 x float>, <4 x i32>, <8 x i32>, i32, i32) #0

				declare <4 x float> @llvm.amdgcn.image.load.i32(i32, <8 x i32>, i32, i32) #1
				declare <4 x float> @llvm.amdgcn.image.load.v2i32(<2 x i32>, <8 x i32>, i32, i32) #1
				declare <4 x float> @llvm.amdgcn.image.load.v4i32(<4 x i32>, <8 x i32>, i32, i32) #1
				declare <4 x float> @llvm.amdgcn.image.load.mip.v4i32(<4 x i32>, <8 x i32>, i32, i32) #1

	declare <4 x float> @llvm.amdgcn.image.load.i32(i32, <8 x i32>, i32, i1, i1, i1, i1) #1			declare <4 x float> @llvm.amdgcn.image.getresinfo.i32(i32, <8 x i32>, i32, i32)
	declare <4 x float> @llvm.amdgcn.image.load.v2i32(<2 x i32>, <8 x i32>, i32, i1, i1, i1, i1) #1			declare void @llvm.SI.export(i32, i32, i32, i32, i32, float, float, float, float)
	declare <4 x float> @llvm.amdgcn.image.load.v4i32(<4 x i32>, <8 x i32>, i32, i1, i1, i1, i1) #1
	declare <4 x float> @llvm.amdgcn.image.load.mip.v4i32(<4 x i32>, <8 x i32>, i32, i1, i1, i1, i1) #1

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
	attributes #1 = { nounwind readonly }			attributes #1 = { nounwind readonly }
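
The updated calls above replace the trailing i1 operands with a single i32 flags immediate (for example `i32 1) ; 1<<0 unorm`). As a rough illustration of that encoding, the sketch below packs flag names into the immediate. The helper names `pack_flags` and `unpack_flags` are hypothetical and not part of LLVM. Bits 0 to 3 follow the comment in IntrinsicsAMDGPU.td; placing `da` at bit 6 is an assumption inferred from the getlod tests, where `i32 64` produces `da`. Bits 4 and 5 are not shown in this part of the diff and are left out.

```python
# Known bit positions for the packed i32 flags operand.
# Bits 0-3 come from the comment in IntrinsicsAMDGPU.td.
# "da" at bit 6 is an assumption inferred from the getlod tests (i32 64 -> da).
# Bits 4 and 5 are intentionally omitted: this diff excerpt does not name them.
FLAG_BITS = {"unorm": 0, "glc": 1, "slc": 2, "r128": 3, "da": 6}


def pack_flags(*names):
    """Return the i32 immediate with the given flags set (hypothetical helper)."""
    mask = 0
    for name in names:
        mask |= 1 << FLAG_BITS[name]
    return mask


def unpack_flags(mask):
    """Return the known flag names set in mask, sorted alphabetically."""
    return sorted(name for name, bit in FLAG_BITS.items() if mask & (1 << bit))


if __name__ == "__main__":
    print(pack_flags("unorm"))  # value passed by the image_load/image_store tests
    print(pack_flags("da"))     # value passed by the getlod tests
    print(unpack_flags(65))
```

With this layout, a front end that does not care about a newly added bit keeps passing the same immediate, which is the advantage argued for in the review thread.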

test/CodeGen/AMDGPU/llvm.amdgcn.image.sample-masked.ll

This file was added.

				; RUN: llc < %s -march=amdgcn -mcpu=verde | FileCheck --check-prefix=GCN %s
				; RUN: llc < %s -march=amdgcn -mcpu=tonga | FileCheck --check-prefix=GCN %s

				; GCN-LABEL: {{^}}v1:
				; GCN: image_sample {{v\[[0-9]+:[0-9]+\]}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xd
				define amdgpu_ps void @v1(i32 %a1) {
				entry:
				%0 = insertelement <1 x i32> undef, i32 %a1, i32 0
				%1 = call <4 x float> @llvm.amdgcn.image.sample.v1i32(<1 x i32> %0, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				%2 = extractelement <4 x float> %1, i32 0
				%3 = extractelement <4 x float> %1, i32 2
				%4 = extractelement <4 x float> %1, i32 3
				call void @llvm.SI.export(i32 15, i32 0, i32 1, i32 12, i32 0, float %2, float %3, float %4, float %4)
				ret void
				}

				; GCN-LABEL: {{^}}v2:
				; GCN: image_sample {{v\[[0-9]+:[0-9]+\]}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xb
				define amdgpu_ps void @v2(i32 %a1) {
				entry:
				%0 = insertelement <1 x i32> undef, i32 %a1, i32 0
				%1 = call <4 x float> @llvm.amdgcn.image.sample.v1i32(<1 x i32> %0, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				%2 = extractelement <4 x float> %1, i32 0
				%3 = extractelement <4 x float> %1, i32 1
				%4 = extractelement <4 x float> %1, i32 3
				call void @llvm.SI.export(i32 15, i32 0, i32 1, i32 12, i32 0, float %2, float %3, float %4, float %4)
				ret void
				}

				; GCN-LABEL: {{^}}v3:
				; GCN: image_sample {{v\[[0-9]+:[0-9]+\]}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xe
				define amdgpu_ps void @v3(i32 %a1) {
				entry:
				%0 = insertelement <1 x i32> undef, i32 %a1, i32 0
				%1 = call <4 x float> @llvm.amdgcn.image.sample.v1i32(<1 x i32> %0, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				%2 = extractelement <4 x float> %1, i32 1
				%3 = extractelement <4 x float> %1, i32 2
				%4 = extractelement <4 x float> %1, i32 3
				call void @llvm.SI.export(i32 15, i32 0, i32 1, i32 12, i32 0, float %2, float %3, float %4, float %4)
				ret void
				}

				; GCN-LABEL: {{^}}v4:
				; GCN: image_sample {{v\[[0-9]+:[0-9]+\]}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x7
				define amdgpu_ps void @v4(i32 %a1) {
				entry:
				%0 = insertelement <1 x i32> undef, i32 %a1, i32 0
				%1 = call <4 x float> @llvm.amdgcn.image.sample.v1i32(<1 x i32> %0, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				%2 = extractelement <4 x float> %1, i32 0
				%3 = extractelement <4 x float> %1, i32 1
				%4 = extractelement <4 x float> %1, i32 2
				call void @llvm.SI.export(i32 15, i32 0, i32 1, i32 12, i32 0, float %2, float %3, float %4, float %4)
				ret void
				}

				; GCN-LABEL: {{^}}v5:
				; GCN: image_sample {{v\[[0-9]+:[0-9]+\]}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xa
				define amdgpu_ps void @v5(i32 %a1) {
				entry:
				%0 = insertelement <1 x i32> undef, i32 %a1, i32 0
				%1 = call <4 x float> @llvm.amdgcn.image.sample.v1i32(<1 x i32> %0, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				%2 = extractelement <4 x float> %1, i32 1
				%3 = extractelement <4 x float> %1, i32 3
				call void @llvm.SI.export(i32 15, i32 0, i32 1, i32 12, i32 0, float %2, float %3, float %3, float %3)
				ret void
				}

				; GCN-LABEL: {{^}}v6:
				; GCN: image_sample {{v\[[0-9]+:[0-9]+\]}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x6
				define amdgpu_ps void @v6(i32 %a1) {
				entry:
				%0 = insertelement <1 x i32> undef, i32 %a1, i32 0
				%1 = call <4 x float> @llvm.amdgcn.image.sample.v1i32(<1 x i32> %0, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				%2 = extractelement <4 x float> %1, i32 1
				%3 = extractelement <4 x float> %1, i32 2
				call void @llvm.SI.export(i32 15, i32 0, i32 1, i32 12, i32 0, float %2, float %3, float %3, float %3)
				ret void
				}

				; GCN-LABEL: {{^}}v7:
				; GCN: image_sample {{v\[[0-9]+:[0-9]+\]}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0x9
				define amdgpu_ps void @v7(i32 %a1) {
				entry:
				%0 = insertelement <1 x i32> undef, i32 %a1, i32 0
				%1 = call <4 x float> @llvm.amdgcn.image.sample.v1i32(<1 x i32> %0, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				%2 = extractelement <4 x float> %1, i32 0
				%3 = extractelement <4 x float> %1, i32 3
				call void @llvm.SI.export(i32 15, i32 0, i32 1, i32 12, i32 0, float %2, float %3, float %3, float %3)
				ret void
				}

				declare <4 x float> @llvm.amdgcn.image.sample.v1i32(<1 x i32>, <8 x i32>, <4 x i32>, i32, i32) readnone

				declare void @llvm.SI.export(i32, i32, i32, i32, i32, float, float, float, float)
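
Each test in this file requests dmask 15 but reads only some components of the result, and the expected dmask in the GCN line has exactly the bits of the components that are extracted. The sketch below recomputes those expected values from the `extractelement` indices used in v1 through v7. The function name `dmask_for` is hypothetical; it only illustrates the arithmetic and is not how the backend is implemented.

```python
def dmask_for(components):
    """Return the 4-bit dmask with one bit set per used result component."""
    mask = 0
    for c in components:
        if not 0 <= c <= 3:
            raise ValueError("component index must be in 0..3")
        mask |= 1 << c
    return mask


# Component indices read by extractelement in each test function above.
USED = {
    "v1": (0, 2, 3),
    "v2": (0, 1, 3),
    "v3": (1, 2, 3),
    "v4": (0, 1, 2),
    "v5": (1, 3),
    "v6": (1, 2),
    "v7": (0, 3),
}

if __name__ == "__main__":
    for name, comps in USED.items():
        print(name, hex(dmask_for(comps)))
```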

test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.ll

This file was added.

				; RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck --check-prefix=GCN %s
				; RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck --check-prefix=GCN %s

				; GCN-LABEL: {{^}}sample:
				; GCN: image_sample {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_cl:
				; GCN: image_sample_cl {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_cl(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.cl.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_d:
				; GCN: image_sample_d {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_d(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.d.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_d_cl:
				; GCN: image_sample_d_cl {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_d_cl(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.d.cl.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_l:
				; GCN: image_sample_l {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_l(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.l.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_b:
				; GCN: image_sample_b {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_b(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.b.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_b_cl:
				; GCN: image_sample_b_cl {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_b_cl(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.b.cl.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_lz:
				; GCN: image_sample_lz {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_lz(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.lz.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_cd:
				; GCN: image_sample_cd {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_cd(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.cd.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_cd_cl:
				; GCN: image_sample_cd_cl {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_cd_cl(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_c:
				; GCN: image_sample_c {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_c(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.c.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_c_cl:
				; GCN: image_sample_c_cl {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_c_cl(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.c.cl.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_c_d:
				; GCN: image_sample_c_d {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_c_d(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.c.d.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_c_d_cl:
				; GCN: image_sample_c_d_cl {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_c_d_cl(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_c_l:
				; GCN: image_sample_c_l {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_c_l(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.c.l.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_c_b:
				; GCN: image_sample_c_b {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_c_b(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.c.b.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_c_b_cl:
				; GCN: image_sample_c_b_cl {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_c_b_cl(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_c_lz:
				; GCN: image_sample_c_lz {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_c_lz(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.c.lz.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_c_cd:
				; GCN: image_sample_c_cd {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_c_cd(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.c.cd.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_c_cd_cl:
				; GCN: image_sample_c_cd_cl {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_c_cd_cl(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}


				declare <4 x float> @llvm.amdgcn.image.sample.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.cl.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.d.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.d.cl.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.l.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.b.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.b.cl.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.lz.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.cd.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.cd.cl.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0

				declare <4 x float> @llvm.amdgcn.image.sample.c.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.c.cl.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.c.d.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.c.d.cl.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.c.l.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.c.b.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.c.b.cl.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.c.lz.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.c.cd.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0


				attributes #0 = { nounwind readnone }
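
The GCN lines in this file avoid hard-coding register numbers by wrapping regular expressions in `{{...}}`. As a rough approximation of what such a line accepts, the sketch below rebuilds the same pattern with Python's `re` module. The function `matches_sample_check` and the sample assembly strings are hypothetical; real FileCheck has additional behaviour that this does not model.

```python
import re

# Same sub-patterns as inside the {{...}} blocks of the GCN lines above.
VREG_RANGE = r"v\[[0-9]+:[0-9]+\]"
SREG_RANGE = r"s\[[0-9]+:[0-9]+\]"


def matches_sample_check(asm_line, mnemonic="image_sample"):
    """Return True if asm_line contains the pattern checked for `mnemonic`."""
    pattern = (
        re.escape(mnemonic) + " "
        + VREG_RANGE + ", " + VREG_RANGE + ", "
        + SREG_RANGE + ", " + SREG_RANGE
        + " dmask:0xf"
    )
    # Like a CHECK line, accept the pattern anywhere within the line.
    return re.search(pattern, asm_line) is not None


if __name__ == "__main__":
    # Hypothetical output lines, not taken from an actual llc run.
    print(matches_sample_check("image_sample v[0:3], v[0:3], s[0:7], s[0:3] dmask:0xf"))
    print(matches_sample_check("image_sample v[0:3], v0, s[0:7], s[0:3] dmask:0xf"))
```

The second line is rejected because the address operand is a single register, which is why the sample-masked tests use `{{v[0-9]+}}` for that operand instead.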

test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.o.ll

This file was added.

				; RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck --check-prefix=GCN %s
				; RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck --check-prefix=GCN %s

				; GCN-LABEL: {{^}}sample:
				; GCN: image_sample_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_cl:
				; GCN: image_sample_cl_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_cl(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.cl.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_d:
				; GCN: image_sample_d_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_d(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.d.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_d_cl:
				; GCN: image_sample_d_cl_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_d_cl(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.d.cl.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_l:
				; GCN: image_sample_l_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_l(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.l.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_b:
				; GCN: image_sample_b_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_b(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.b.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_b_cl:
				; GCN: image_sample_b_cl_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_b_cl(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.b.cl.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_lz:
				; GCN: image_sample_lz_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_lz(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.lz.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_cd:
				; GCN: image_sample_cd_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_cd(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.cd.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_cd_cl:
				; GCN: image_sample_cd_cl_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_cd_cl(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_c:
				; GCN: image_sample_c_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_c(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.c.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_c_cl:
				; GCN: image_sample_c_cl_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_c_cl(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.c.cl.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_c_d:
				; GCN: image_sample_c_d_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_c_d(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.c.d.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_c_d_cl:
				; GCN: image_sample_c_d_cl_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_c_d_cl(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_c_l:
				; GCN: image_sample_c_l_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_c_l(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.c.l.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_c_b:
				; GCN: image_sample_c_b_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_c_b(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.c.b.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_c_b_cl:
				; GCN: image_sample_c_b_cl_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_c_b_cl(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_c_lz:
				; GCN: image_sample_c_lz_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_c_lz(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.c.lz.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_c_cd:
				; GCN: image_sample_c_cd_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_c_cd(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.c.cd.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}sample_c_cd_cl:
				; GCN: image_sample_c_cd_cl_o {{v\[[0-9]+:[0-9]+\]}}, {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define void @sample_c_cd_cl(<4 x float> addrspace(1)* %out) {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.o.v4i32(<4 x i32> undef, <8 x i32> undef, <4 x i32> undef, i32 15, i32 0)
				store <4 x float> %r, <4 x float> addrspace(1)* %out
				ret void
				}


				declare <4 x float> @llvm.amdgcn.image.sample.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.cl.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.d.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.d.cl.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.l.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.b.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.b.cl.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.lz.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.cd.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.cd.cl.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0

				declare <4 x float> @llvm.amdgcn.image.sample.c.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.c.cl.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.c.d.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.c.d.cl.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.c.l.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.c.b.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.c.b.cl.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.c.lz.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.c.cd.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0
				declare <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.o.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #0


				attributes #0 = { nounwind readnone }

test/CodeGen/AMDGPU/llvm.amdgcn.s.waitcnt.ll

	; RUN: llc -march=amdgcn -mcpu=SI -verify-machineinstrs < %s | FileCheck -check-prefix=CHECK %s			; RUN: llc -march=amdgcn -mcpu=SI -verify-machineinstrs < %s | FileCheck -check-prefix=CHECK %s
	; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=CHECK %s			; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=CHECK %s

	; CHECK-LABEL: {{^}}test1:			; CHECK-LABEL: {{^}}test1:
	; CHECK: image_store			; CHECK: image_store
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0){{$}}			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0){{$}}
	; CHECK-NEXT: image_store			; CHECK-NEXT: image_store
	; CHECK-NEXT: s_endpgm			; CHECK-NEXT: s_endpgm
	define amdgpu_ps void @test1(<8 x i32> inreg %rsrc, <4 x float> %d0, <4 x float> %d1, i32 %c0, i32 %c1) {			define amdgpu_ps void @test1(<8 x i32> inreg %rsrc, <4 x float> %d0, <4 x float> %d1, i32 %c0, i32 %c1) {
	call void @llvm.amdgcn.image.store.i32(<4 x float> %d0, i32 %c0, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 1, i1 0)			call void @llvm.amdgcn.image.store.i32(<4 x float> %d0, i32 %c0, <8 x i32> %rsrc, i32 15, i32 3) ; (1<<0 | 1<<1) unorm + glc
	call void @llvm.amdgcn.s.waitcnt(i32 3840) ; 0xf00			call void @llvm.amdgcn.s.waitcnt(i32 3840) ; 0xf00
	call void @llvm.amdgcn.image.store.i32(<4 x float> %d1, i32 %c1, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 1, i1 0)			call void @llvm.amdgcn.image.store.i32(<4 x float> %d1, i32 %c1, <8 x i32> %rsrc, i32 15, i32 3) ; (1<<0 | 1<<1) unorm + glc
	ret void			ret void
	}			}

	; Test that the intrinsic is merged with automatically generated waits and			; Test that the intrinsic is merged with automatically generated waits and
	; emitted as late as possible.			; emitted as late as possible.
	;			;
	; CHECK-LABEL: {{^}}test2:			; CHECK-LABEL: {{^}}test2:
	; CHECK: image_load			; CHECK: image_load
	; CHECK-NOT: s_waitcnt vmcnt(0){{$}}			; CHECK-NOT: s_waitcnt vmcnt(0){{$}}
	; CHECK: s_waitcnt			; CHECK: s_waitcnt
	; CHECK-NEXT: image_store			; CHECK-NEXT: image_store
	define amdgpu_ps void @test2(<8 x i32> inreg %rsrc, i32 %c) {			define amdgpu_ps void @test2(<8 x i32> inreg %rsrc, i32 %c) {
	%t = call <4 x float> @llvm.amdgcn.image.load.i32(i32 %c, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)			%t = call <4 x float> @llvm.amdgcn.image.load.i32(i32 %c, <8 x i32> %rsrc, i32 15, i32 1) ; unorm
	call void @llvm.amdgcn.s.waitcnt(i32 3840) ; 0xf00			call void @llvm.amdgcn.s.waitcnt(i32 3840) ; 0xf00
	%c.1 = mul i32 %c, 2			%c.1 = mul i32 %c, 2
	call void @llvm.amdgcn.image.store.i32(<4 x float> %t, i32 %c.1, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)			call void @llvm.amdgcn.image.store.i32(<4 x float> %t, i32 %c.1, <8 x i32> %rsrc, i32 15, i32 1) ; unorm
	ret void			ret void
	}			}

	declare void @llvm.amdgcn.s.waitcnt(i32) #0			declare void @llvm.amdgcn.s.waitcnt(i32) #0

	declare <4 x float> @llvm.amdgcn.image.load.i32(i32, <8 x i32>, i32, i1, i1, i1, i1) #1			declare <4 x float> @llvm.amdgcn.image.load.i32(i32, <8 x i32>, i32, i32) #1
	declare void @llvm.amdgcn.image.store.i32(<4 x float>, i32, <8 x i32>, i32, i1, i1, i1, i1) #0			declare void @llvm.amdgcn.image.store.i32(<4 x float>, i32, <8 x i32>, i32, i32) #0

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
	attributes #1 = { nounwind readonly }			attributes #1 = { nounwind readonly }
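
The updated calls above pass a single `i32` flags immediate (`i32 3` annotated as unorm + glc, `i32 1` as unorm) in place of the separate `i1` operands. A minimal sketch of how a front end might build that immediate, assuming the bit positions given in the flags comment in IntrinsicsAMDGPU.td (UNORM = flags[0], GLC = flags[1], SLC = flags[2], R128 = flags[3]). The names `MIMGFlag`, `packMIMGFlags`, and `hasMIMGFlag` are hypothetical and are not taken from the patch:

```cpp
#include <cstdint>

// Hypothetical names for illustration only. The bit positions follow the
// flags comment in IntrinsicsAMDGPU.td:
//   UNORM = flags[0], GLC = flags[1], SLC = flags[2], R128 = flags[3]
enum MIMGFlag : uint32_t {
  MIMG_UNORM = 1u << 0,
  MIMG_GLC   = 1u << 1,
  MIMG_SLC   = 1u << 2,
  MIMG_R128  = 1u << 3,
};

// Build the single i32 flags immediate from individual booleans.
constexpr uint32_t packMIMGFlags(bool Unorm, bool Glc, bool Slc, bool R128) {
  return (Unorm ? MIMG_UNORM : 0u) | (Glc ? MIMG_GLC : 0u) |
         (Slc ? MIMG_SLC : 0u) | (R128 ? MIMG_R128 : 0u);
}

// Query one flag in a packed immediate.
constexpr bool hasMIMGFlag(uint32_t Flags, MIMGFlag F) {
  return (Flags & F) != 0;
}

// The two values used by the tests above.
static_assert(packMIMGFlags(true, true, false, false) == 3, "unorm + glc");
static_assert(packMIMGFlags(true, false, false, false) == 1, "unorm only");
```

With this layout a new flag would take the next free bit, so callers that never set it could keep passing the same immediate, which is the advantage cfang argues for in the review thread.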

test/CodeGen/AMDGPU/wqm.ll

	;RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck %s --check-prefix=CHECK --check-prefix=SI	; RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck %s --check-prefix=CHECK --check-prefix=SI
	;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck %s --check-prefix=CHECK --check-prefix=VI	; RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck %s --check-prefix=CHECK --check-prefix=VI

	; Check that WQM isn't triggered by image load/store intrinsics.	; Check that WQM isn't triggered by image load/store intrinsics.
	;	;
	;CHECK-LABEL: {{^}}test1:	;CHECK-LABEL: {{^}}test1:
	;CHECK-NOT: s_wqm	;CHECK-NOT: s_wqm
	define amdgpu_ps <4 x float> @test1(<8 x i32> inreg %rsrc, <4 x i32> %c) {	define amdgpu_ps <4 x float> @test1(<8 x i32> inreg %rsrc, <4 x i32> %c) {
	main_body:	main_body:
	%tex = call <4 x float> @llvm.amdgcn.image.load.v4i32(<4 x i32> %c, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)	%tex = call <4 x float> @llvm.amdgcn.image.load.v4i32(<4 x i32> %c, <8 x i32> %rsrc, i32 15, i32 1) ; flag = unorm
	call void @llvm.amdgcn.image.store.v4i32(<4 x float> %tex, <4 x i32> %c, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)	call void @llvm.amdgcn.image.store.v4i32(<4 x float> %tex, <4 x i32> %c, <8 x i32> %rsrc, i32 15, i32 1) ; flag = unorm
	ret <4 x float> %tex	ret <4 x float> %tex
	}	}

	; Check that WQM is triggered by image samples and left untouched for loads...	; Check that WQM is triggered by image samples and left untouched for loads...
	;	;
	;CHECK-LABEL: {{^}}test2:	;CHECK-LABEL: {{^}}test2:
	;CHECK-NEXT: ; %main_body	;CHECK-NEXT: ; %main_body
	;CHECK-NEXT: s_wqm_b64 exec, exec	;CHECK-NEXT: s_wqm_b64 exec, exec
	;CHECK: image_sample	;CHECK: image_sample
	;CHECK-NOT: exec	;CHECK-NOT: exec
	;CHECK: _load_dword v0,	;CHECK: _load_dword v0,
	define amdgpu_ps float @test2(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, float addrspace(1)* inreg %ptr, <4 x i32> %c) {	define amdgpu_ps float @test2(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, float addrspace(1)* inreg %ptr, <4 x i32> %c) {
	main_body:	main_body:
	%c.1 = call <4 x float> @llvm.SI.image.sample.v4i32(<4 x i32> %c, <8 x i32> %rsrc, <4 x i32> %sampler, i32 15, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0)	%c.1 = call <4 x float> @llvm.amdgcn.image.sample.v4i32(<4 x i32> %c, <8 x i32> %rsrc, <4 x i32> %sampler, i32 15, i32 0) ; flags = 0
	%c.2 = bitcast <4 x float> %c.1 to <4 x i32>	%c.2 = bitcast <4 x float> %c.1 to <4 x i32>
	%c.3 = extractelement <4 x i32> %c.2, i32 0	%c.3 = extractelement <4 x i32> %c.2, i32 0
	%gep = getelementptr float, float addrspace(1)* %ptr, i32 %c.3	%gep = getelementptr float, float addrspace(1)* %ptr, i32 %c.3
	%data = load float, float addrspace(1)* %gep	%data = load float, float addrspace(1)* %gep
	ret float %data	ret float %data
	}	}

	; ... but disabled for stores (and, in this simple case, not re-enabled).	; ... but disabled for stores (and, in this simple case, not re-enabled).
	;	;
	;CHECK-LABEL: {{^}}test3:	;CHECK-LABEL: {{^}}test3:
	;CHECK-NEXT: ; %main_body	;CHECK-NEXT: ; %main_body
	;CHECK-NEXT: s_mov_b64 [[ORIG:s\[[0-9]+:[0-9]+\]]], exec	;CHECK-NEXT: s_mov_b64 [[ORIG:s\[[0-9]+:[0-9]+\]]], exec
	;CHECK-NEXT: s_wqm_b64 exec, exec	;CHECK-NEXT: s_wqm_b64 exec, exec
	;CHECK: image_sample	;CHECK: image_sample
	;CHECK: s_and_b64 exec, exec, [[ORIG]]	;CHECK: s_and_b64 exec, exec, [[ORIG]]
	;CHECK: store	;CHECK: store
	;CHECK-NOT: exec	;CHECK-NOT: exec
	;CHECK: .size test3	;CHECK: .size test3
	define amdgpu_ps <4 x float> @test3(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, float addrspace(1)* inreg %ptr, <4 x i32> %c) {	define amdgpu_ps <4 x float> @test3(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, float addrspace(1)* inreg %ptr, <4 x i32> %c) {
	main_body:	main_body:
	%tex = call <4 x float> @llvm.SI.image.sample.v4i32(<4 x i32> %c, <8 x i32> %rsrc, <4 x i32> %sampler, i32 15, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0)	%tex = call <4 x float> @llvm.amdgcn.image.sample.v4i32(<4 x i32> %c, <8 x i32> %rsrc, <4 x i32> %sampler, i32 15, i32 0)
	%tex.1 = bitcast <4 x float> %tex to <4 x i32>	%tex.1 = bitcast <4 x float> %tex to <4 x i32>
	%tex.2 = extractelement <4 x i32> %tex.1, i32 0	%tex.2 = extractelement <4 x i32> %tex.1, i32 0
	%gep = getelementptr float, float addrspace(1)* %ptr, i32 %tex.2	%gep = getelementptr float, float addrspace(1)* %ptr, i32 %tex.2
	%wr = extractelement <4 x float> %tex, i32 1	%wr = extractelement <4 x float> %tex, i32 1
	store float %wr, float addrspace(1)* %gep	store float %wr, float addrspace(1)* %gep
	ret <4 x float> %tex	ret <4 x float> %tex
	}	}

	; Check that WQM is re-enabled when required.	; Check that WQM is re-enabled when required.
	;	;
Context not available.
	;CHECK: v_mul_lo_i32 [[MUL:v[0-9]+]], v0, v1	;CHECK: v_mul_lo_i32 [[MUL:v[0-9]+]], v0, v1
	;CHECK: s_and_b64 exec, exec, [[ORIG]]	;CHECK: s_and_b64 exec, exec, [[ORIG]]
	;CHECK: store	;CHECK: store
	;CHECK: s_wqm_b64 exec, exec	;CHECK: s_wqm_b64 exec, exec
	;CHECK: image_sample v[0:3], [[MUL]], s[0:7], s[8:11] dmask:0xf	;CHECK: image_sample v[0:3], [[MUL]], s[0:7], s[8:11] dmask:0xf
	define amdgpu_ps <4 x float> @test4(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, float addrspace(1)* inreg %ptr, i32 %c, i32 %d, float %data) {	define amdgpu_ps <4 x float> @test4(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, float addrspace(1)* inreg %ptr, i32 %c, i32 %d, float %data) {
	main_body:	main_body:
	%c.1 = mul i32 %c, %d	%c.1 = mul i32 %c, %d
	%gep = getelementptr float, float addrspace(1)* %ptr, i32 %c.1	%gep = getelementptr float, float addrspace(1)* %ptr, i32 %c.1
	store float %data, float addrspace(1)* %gep	store float %data, float addrspace(1)* %gep
	%tex = call <4 x float> @llvm.SI.image.sample.i32(i32 %c.1, <8 x i32> %rsrc, <4 x i32> %sampler, i32 15, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0)	%tex = call <4 x float> @llvm.amdgcn.image.sample.i32(i32 %c.1, <8 x i32> %rsrc, <4 x i32> %sampler, i32 15, i32 0)
	ret <4 x float> %tex	ret <4 x float> %tex
	}	}

	; Check a case of one branch of an if-else requiring WQM, the other requiring	; Check a case of one branch of an if-else requiring WQM, the other requiring
	; exact.	; exact.
	;	;
	; Note: In this particular case, the save-and-restore could be avoided if the	; Note: In this particular case, the save-and-restore could be avoided if the
	; analysis understood that the two branches of the if-else are mutually	; analysis understood that the two branches of the if-else are mutually
	; exclusive.	; exclusive.
	;	;
Context not available.
	;CHECK: store	;CHECK: store
	;CHECK: s_mov_b64 exec, [[SAVED]]	;CHECK: s_mov_b64 exec, [[SAVED]]
	;CHECK: %IF	;CHECK: %IF
	;CHECK: image_sample	;CHECK: image_sample
	define amdgpu_ps float @test_control_flow_0(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, float addrspace(1)* inreg %ptr, i32 %c, i32 %z, float %data) {	define amdgpu_ps float @test_control_flow_0(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, float addrspace(1)* inreg %ptr, i32 %c, i32 %z, float %data) {
	main_body:	main_body:
	%cmp = icmp eq i32 %z, 0	%cmp = icmp eq i32 %z, 0
	br i1 %cmp, label %IF, label %ELSE	br i1 %cmp, label %IF, label %ELSE

	IF:	IF:
	%tex = call <4 x float> @llvm.SI.image.sample.i32(i32 %c, <8 x i32> %rsrc, <4 x i32> %sampler, i32 15, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0)	%tex = call <4 x float> @llvm.amdgcn.image.sample.i32(i32 %c, <8 x i32> %rsrc, <4 x i32> %sampler, i32 15, i32 0)
	%data.if = extractelement <4 x float> %tex, i32 0	%data.if = extractelement <4 x float> %tex, i32 0
	br label %END	br label %END

	ELSE:	ELSE:
	%gep = getelementptr float, float addrspace(1)* %ptr, i32 %c	%gep = getelementptr float, float addrspace(1)* %ptr, i32 %c
	store float %data, float addrspace(1)* %gep	store float %data, float addrspace(1)* %gep
	br label %END	br label %END

	END:	END:
	%r = phi float [ %data.if, %IF ], [ %data, %ELSE ]	%r = phi float [ %data.if, %IF ], [ %data, %ELSE ]
Context not available.
	;CHECK: [[END_BB]]: ; %END	;CHECK: [[END_BB]]: ; %END
	;CHECK: s_or_b64 exec, exec,	;CHECK: s_or_b64 exec, exec,
	;CHECK: v_mov_b32_e32 v0	;CHECK: v_mov_b32_e32 v0
	;CHECK: ; return	;CHECK: ; return
	define amdgpu_ps float @test_control_flow_1(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, float addrspace(1)* inreg %ptr, i32 %c, i32 %z, float %data) {	define amdgpu_ps float @test_control_flow_1(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, float addrspace(1)* inreg %ptr, i32 %c, i32 %z, float %data) {
	main_body:	main_body:
	%cmp = icmp eq i32 %z, 0	%cmp = icmp eq i32 %z, 0
	br i1 %cmp, label %ELSE, label %IF	br i1 %cmp, label %ELSE, label %IF

	IF:	IF:
	%tex = call <4 x float> @llvm.SI.image.sample.i32(i32 %c, <8 x i32> %rsrc, <4 x i32> %sampler, i32 15, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0)	%tex = call <4 x float> @llvm.amdgcn.image.sample.i32(i32 %c, <8 x i32> %rsrc, <4 x i32> %sampler, i32 15, i32 0)
	%data.if = extractelement <4 x float> %tex, i32 0	%data.if = extractelement <4 x float> %tex, i32 0
	br label %END	br label %END

	ELSE:	ELSE:
	%gep = getelementptr float, float addrspace(1)* %ptr, i32 %c	%gep = getelementptr float, float addrspace(1)* %ptr, i32 %c
	store float %data, float addrspace(1)* %gep	store float %data, float addrspace(1)* %gep
	br label %END	br label %END

	END:	END:
	%r = phi float [ %data.if, %IF ], [ %data, %ELSE ]	%r = phi float [ %data.if, %IF ], [ %data, %ELSE ]
Context not available.
	IF:	IF:
	%coord.IF = mul i32 %coord, 3	%coord.IF = mul i32 %coord, 3
	br label %END	br label %END

	ELSE:	ELSE:
	%coord.ELSE = mul i32 %coord, 4	%coord.ELSE = mul i32 %coord, 4
	br label %END	br label %END

	END:	END:
	%coord.END = phi i32 [ %coord.IF, %IF ], [ %coord.ELSE, %ELSE ]	%coord.END = phi i32 [ %coord.IF, %IF ], [ %coord.ELSE, %ELSE ]
	%tex = call <4 x float> @llvm.SI.image.sample.i32(i32 %coord.END, <8 x i32> %rsrc, <4 x i32> %sampler, i32 15, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0)	%tex = call <4 x float> @llvm.amdgcn.image.sample.i32(i32 %coord.END, <8 x i32> %rsrc, <4 x i32> %sampler, i32 15, i32 0)
	ret <4 x float> %tex	ret <4 x float> %tex
	}	}

	; ... but only if they really do need it.	; ... but only if they really do need it.
	;	;
	;CHECK-LABEL: {{^}}test_control_flow_3:	;CHECK-LABEL: {{^}}test_control_flow_3:
	;CHECK-NEXT: ; %main_body	;CHECK-NEXT: ; %main_body
	;CHECK-NEXT: s_mov_b64 [[ORIG:s\[[0-9]+:[0-9]+\]]], exec	;CHECK-NEXT: s_mov_b64 [[ORIG:s\[[0-9]+:[0-9]+\]]], exec
	;CHECK-NEXT: s_wqm_b64 exec, exec	;CHECK-NEXT: s_wqm_b64 exec, exec
	;CHECK: image_sample	;CHECK: image_sample
	;CHECK: s_and_b64 exec, exec, [[ORIG]]	;CHECK: s_and_b64 exec, exec, [[ORIG]]
	;CHECK: store	;CHECK: store
	;CHECK: load	;CHECK: load
	;CHECK: store	;CHECK: store
	;CHECK: v_cmp	;CHECK: v_cmp
	define amdgpu_ps float @test_control_flow_3(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, float addrspace(1)* inreg %ptr, <3 x i32> %idx, <2 x float> %data, i32 %coord) {	define amdgpu_ps float @test_control_flow_3(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, float addrspace(1)* inreg %ptr, <3 x i32> %idx, <2 x float> %data, i32 %coord) {
	main_body:	main_body:
	%tex = call <4 x float> @llvm.SI.image.sample.i32(i32 %coord, <8 x i32> %rsrc, <4 x i32> %sampler, i32 15, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0)	%tex = call <4 x float> @llvm.amdgcn.image.sample.i32(i32 %coord, <8 x i32> %rsrc, <4 x i32> %sampler, i32 15, i32 0)
	%tex.1 = extractelement <4 x float> %tex, i32 0	%tex.1 = extractelement <4 x float> %tex, i32 0

	%idx.1 = extractelement <3 x i32> %idx, i32 0	%idx.1 = extractelement <3 x i32> %idx, i32 0
	%gep.1 = getelementptr float, float addrspace(1)* %ptr, i32 %idx.1	%gep.1 = getelementptr float, float addrspace(1)* %ptr, i32 %idx.1
	%data.1 = extractelement <2 x float> %data, i32 0	%data.1 = extractelement <2 x float> %data, i32 0
	store float %data.1, float addrspace(1)* %gep.1	store float %data.1, float addrspace(1)* %gep.1

	%idx.2 = extractelement <3 x i32> %idx, i32 1	%idx.2 = extractelement <3 x i32> %idx, i32 1
	%gep.2 = getelementptr float, float addrspace(1)* %ptr, i32 %idx.2	%gep.2 = getelementptr float, float addrspace(1)* %ptr, i32 %idx.2
	%z = load float, float addrspace(1)* %gep.2	%z = load float, float addrspace(1)* %gep.2
Context not available.
	%cond = icmp eq i32 %y, 0	%cond = icmp eq i32 %y, 0
	br i1 %cond, label %IF, label %END	br i1 %cond, label %IF, label %END

	IF:	IF:
	%data = load float, float addrspace(1)* %ptr	%data = load float, float addrspace(1)* %ptr
	%gep = getelementptr float, float addrspace(1)* %ptr, i32 1	%gep = getelementptr float, float addrspace(1)* %ptr, i32 1
	store float %data, float addrspace(1)* %gep	store float %data, float addrspace(1)* %gep
	br label %END	br label %END

	END:	END:
	%tex = call <4 x float> @llvm.SI.image.sample.i32(i32 %coord, <8 x i32> %rsrc, <4 x i32> %sampler, i32 15, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0)	%tex = call <4 x float> @llvm.amdgcn.image.sample.i32(i32 %coord, <8 x i32> %rsrc, <4 x i32> %sampler, i32 15, i32 0)
	ret <4 x float> %tex	ret <4 x float> %tex
	}	}

	; Kill is performed in WQM mode so that uniform kill behaves correctly ...	; Kill is performed in WQM mode so that uniform kill behaves correctly ...
	;	;
	;CHECK-LABEL: {{^}}test_kill_0:	;CHECK-LABEL: {{^}}test_kill_0:
	;CHECK-NEXT: ; %main_body	;CHECK-NEXT: ; %main_body
	;CHECK-NEXT: s_mov_b64 [[ORIG:s\[[0-9]+:[0-9]+\]]], exec	;CHECK-NEXT: s_mov_b64 [[ORIG:s\[[0-9]+:[0-9]+\]]], exec
	;CHECK-NEXT: s_wqm_b64 exec, exec	;CHECK-NEXT: s_wqm_b64 exec, exec
	;CHECK: image_sample	;CHECK: image_sample
Context not available.
	;VI: flat_store_dword	;VI: flat_store_dword
	;CHECK: s_wqm_b64 exec, exec	;CHECK: s_wqm_b64 exec, exec
	;CHECK: v_cmpx_	;CHECK: v_cmpx_
	;CHECK: s_and_saveexec_b64 [[SAVE:s\[[0-9]+:[0-9]+\]]], [[ORIG]]	;CHECK: s_and_saveexec_b64 [[SAVE:s\[[0-9]+:[0-9]+\]]], [[ORIG]]
	;SI: buffer_store_dword	;SI: buffer_store_dword
	;VI: flat_store_dword	;VI: flat_store_dword
	;CHECK: s_mov_b64 exec, [[SAVE]]	;CHECK: s_mov_b64 exec, [[SAVE]]
	;CHECK: image_sample	;CHECK: image_sample
	define amdgpu_ps <4 x float> @test_kill_0(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, float addrspace(1)* inreg %ptr, <2 x i32> %idx, <2 x float> %data, i32 %coord, i32 %coord2, float %z) {	define amdgpu_ps <4 x float> @test_kill_0(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, float addrspace(1)* inreg %ptr, <2 x i32> %idx, <2 x float> %data, i32 %coord, i32 %coord2, float %z) {
	main_body:	main_body:
	%tex = call <4 x float> @llvm.SI.image.sample.i32(i32 %coord, <8 x i32> %rsrc, <4 x i32> %sampler, i32 15, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0)	%tex = call <4 x float> @llvm.amdgcn.image.sample.i32(i32 %coord, <8 x i32> %rsrc, <4 x i32> %sampler, i32 15, i32 0)

	%idx.0 = extractelement <2 x i32> %idx, i32 0	%idx.0 = extractelement <2 x i32> %idx, i32 0
	%gep.0 = getelementptr float, float addrspace(1)* %ptr, i32 %idx.0	%gep.0 = getelementptr float, float addrspace(1)* %ptr, i32 %idx.0
	%data.0 = extractelement <2 x float> %data, i32 0	%data.0 = extractelement <2 x float> %data, i32 0
	store float %data.0, float addrspace(1)* %gep.0	store float %data.0, float addrspace(1)* %gep.0

	call void @llvm.AMDGPU.kill(float %z)	call void @llvm.AMDGPU.kill(float %z)

	%idx.1 = extractelement <2 x i32> %idx, i32 1	%idx.1 = extractelement <2 x i32> %idx, i32 1
	%gep.1 = getelementptr float, float addrspace(1)* %ptr, i32 %idx.1	%gep.1 = getelementptr float, float addrspace(1)* %ptr, i32 %idx.1
	%data.1 = extractelement <2 x float> %data, i32 1	%data.1 = extractelement <2 x float> %data, i32 1
	store float %data.1, float addrspace(1)* %gep.1	store float %data.1, float addrspace(1)* %gep.1

	%tex2 = call <4 x float> @llvm.SI.image.sample.i32(i32 %coord2, <8 x i32> %rsrc, <4 x i32> %sampler, i32 15, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0)	%tex2 = call <4 x float> @llvm.amdgcn.image.sample.i32(i32 %coord2, <8 x i32> %rsrc, <4 x i32> %sampler, i32 15, i32 0)
	%out = fadd <4 x float> %tex, %tex2	%out = fadd <4 x float> %tex, %tex2

	ret <4 x float> %out	ret <4 x float> %out
	}	}

	; ... but only if WQM is necessary.	; ... but only if WQM is necessary.
	;	;
	; CHECK-LABEL: {{^}}test_kill_1:	; CHECK-LABEL: {{^}}test_kill_1:
	; CHECK-NEXT: ; %main_body	; CHECK-NEXT: ; %main_body
	; CHECK: s_mov_b64 [[ORIG:s\[[0-9]+:[0-9]+\]]], exec	; CHECK: s_mov_b64 [[ORIG:s\[[0-9]+:[0-9]+\]]], exec
	; CHECK: s_wqm_b64 exec, exec	; CHECK: s_wqm_b64 exec, exec
	; CHECK: image_sample	; CHECK: image_sample
	; CHECK: s_and_b64 exec, exec, [[ORIG]]	; CHECK: s_and_b64 exec, exec, [[ORIG]]
	; SI: buffer_store_dword	; SI: buffer_store_dword
	; VI: flat_store_dword	; VI: flat_store_dword
	; CHECK-NOT: wqm	; CHECK-NOT: wqm
	; CHECK: v_cmpx_	; CHECK: v_cmpx_
	define amdgpu_ps <4 x float> @test_kill_1(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, float addrspace(1)* inreg %ptr, i32 %idx, float %data, i32 %coord, i32 %coord2, float %z) {	define amdgpu_ps <4 x float> @test_kill_1(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, float addrspace(1)* inreg %ptr, i32 %idx, float %data, i32 %coord, i32 %coord2, float %z) {
	main_body:	main_body:
	%tex = call <4 x float> @llvm.SI.image.sample.i32(i32 %coord, <8 x i32> %rsrc, <4 x i32> %sampler, i32 15, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0)	%tex = call <4 x float> @llvm.amdgcn.image.sample.i32(i32 %coord, <8 x i32> %rsrc, <4 x i32> %sampler, i32 15, i32 0)

	%gep = getelementptr float, float addrspace(1)* %ptr, i32 %idx	%gep = getelementptr float, float addrspace(1)* %ptr, i32 %idx
	store float %data, float addrspace(1)* %gep	store float %data, float addrspace(1)* %gep

	call void @llvm.AMDGPU.kill(float %z)	call void @llvm.AMDGPU.kill(float %z)

	ret <4 x float> %tex	ret <4 x float> %tex
	}	}

	; Check prolog shaders.	; Check prolog shaders.
Context not available.
	; CHECK: s_mov_b64 [[ORIG:s\[[0-9]+:[0-9]+\]]], exec	; CHECK: s_mov_b64 [[ORIG:s\[[0-9]+:[0-9]+\]]], exec
	; CHECK: s_wqm_b64 exec, exec	; CHECK: s_wqm_b64 exec, exec
	; CHECK: v_add_f32_e32 v0,	; CHECK: v_add_f32_e32 v0,
	; CHECK: s_and_b64 exec, exec, [[ORIG]]	; CHECK: s_and_b64 exec, exec, [[ORIG]]
	define amdgpu_ps float @test_prolog_1(float %a, float %b) #4 {	define amdgpu_ps float @test_prolog_1(float %a, float %b) #4 {
	main_body:	main_body:
	%s = fadd float %a, %b	%s = fadd float %a, %b
	ret float %s	ret float %s
	}	}

	declare void @llvm.amdgcn.image.store.v4i32(<4 x float>, <4 x i32>, <8 x i32>, i32, i1, i1, i1, i1) #1	declare void @llvm.amdgcn.image.store.v4i32(<4 x float>, <4 x i32>, <8 x i32>, i32, i32) #1

	declare <4 x float> @llvm.amdgcn.image.load.v4i32(<4 x i32>, <8 x i32>, i32, i1, i1, i1, i1) #2	declare <4 x float> @llvm.amdgcn.image.load.v4i32(<4 x i32>, <8 x i32>, i32, i32) #2

	declare <4 x float> @llvm.SI.image.sample.i32(i32, <8 x i32>, <4 x i32>, i32, i32, i32, i32, i32, i32, i32, i32) #3	declare <4 x float> @llvm.amdgcn.image.sample.i32(i32, <8 x i32>, <4 x i32>, i32, i32) #3
	declare <4 x float> @llvm.SI.image.sample.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32, i32, i32, i32, i32, i32, i32) #3	declare <4 x float> @llvm.amdgcn.image.sample.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32) #3

	declare void @llvm.AMDGPU.kill(float)	declare void @llvm.AMDGPU.kill(float)
	declare void @llvm.SI.export(i32, i32, i32, i32, i32, float, float, float, float)	declare void @llvm.SI.export(i32, i32, i32, i32, i32, float, float, float, float)

	attributes #1 = { nounwind }	attributes #1 = { nounwind }
	attributes #2 = { nounwind readonly }	attributes #2 = { nounwind readonly }
	attributes #3 = { nounwind readnone }	attributes #3 = { nounwind readnone }
	attributes #4 = { "amdgpu-ps-wqm-outputs" }	attributes #4 = { "amdgpu-ps-wqm-outputs" }
Context not available.

This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Implement amdgcn image intrinsics
ClosedPublic


Diff 66039

include/llvm/IR/IntrinsicsAMDGPU.td

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

lib/Target/AMDGPU/SIDefines.h

lib/Target/AMDGPU/SIInstrInfo.td

lib/Target/AMDGPU/SIInstructions.td

test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.ll

test/CodeGen/AMDGPU/llvm.amdgcn.image.getlod.ll

test/CodeGen/AMDGPU/llvm.amdgcn.image.ll

test/CodeGen/AMDGPU/llvm.amdgcn.image.sample-masked.ll

test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.ll

test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.o.ll

test/CodeGen/AMDGPU/llvm.amdgcn.s.waitcnt.ll

test/CodeGen/AMDGPU/wqm.ll
