Diff 74300

include/llvm/IR/IntrinsicsAMDGPU.td

	// and always uses rtz, so is not suitable for implementing the OpenCL			// and always uses rtz, so is not suitable for implementing the OpenCL
	// fract function. It should be ok on VI.			// fract function. It should be ok on VI.
	def int_amdgcn_fract : Intrinsic<			def int_amdgcn_fract : Intrinsic<
	[llvm_anyfloat_ty], [LLVMMatchType<0>], [IntrNoMem]			[llvm_anyfloat_ty], [LLVMMatchType<0>], [IntrNoMem]
	>;			>;

	def int_amdgcn_class : Intrinsic<			def int_amdgcn_class : Intrinsic<
	[llvm_i1_ty], [llvm_anyfloat_ty, llvm_i32_ty], [IntrNoMem]			[llvm_i1_ty], [llvm_anyfloat_ty, llvm_i32_ty], [IntrNoMem]
	>;			>;

	def int_amdgcn_cubeid : GCCBuiltin<"__builtin_amdgcn_cubeid">,			def int_amdgcn_cubeid : GCCBuiltin<"__builtin_amdgcn_cubeid">,
	Intrinsic<[llvm_float_ty],			Intrinsic<[llvm_float_ty],
	[llvm_float_ty, llvm_float_ty, llvm_float_ty], [IntrNoMem]			[llvm_float_ty, llvm_float_ty, llvm_float_ty], [IntrNoMem]
	>;			>;
				nhaehnle: Can we just not do this kind of change, please? I don't see how it improves anything, it's inconsistent with the ISA description which has the flags separate, and it'll require an annoying flag day synchronization with Mesa.
				nhaustov: I agree. Separate flags also play nicely with the assembler.
				arsenm: The assembler has nothing to do with the intrinsic definition. This isn't changing the MachineInstr's operand structure.
				cfang: We plan to add and expose the d16 bit, so the intrinsics for Mesa have to be updated anyway. For future chips, we may add more flags (bits), and all applications (including Mesa) would have to be updated every time a new flag is added. One advantage of using a mask parameter is that we don't have to update the applications if the new bit is not exposed.
				tstellarAMD: > Can we just not do this kind of change, please? I don't see how it improves anything, it's inconsistent with the ISA description which has the flags separate, and it'll require an annoying flag day synchronization with Mesa.
				I really prefer using a mask over having to create an entire new set of intrinsics each time we have to add a new bit. I think we have two solutions here:
				1. As Matt has suggested, keep Mesa the same, and have the auto-upgrader in LLVM change the old intrinsics to the new mask-style intrinsics. With this solution, though, I think we'd still eventually want or need Mesa to start using the mask version.
				2. Define the intrinsics as var_arg. This would allow us to add i1 operands without breaking the existing operands. I'm just not sure how well var_arg intrinsics are supported.
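				The mask-versus-separate-flags trade-off in the comments above can be made concrete with a small C++ sketch. It is only an illustration: the bit positions and the names `ImageFlag`, `packImageFlags` and `hasImageFlag` are hypothetical and do not come from the patch, which would have to document its own layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit positions, chosen only for illustration. The patch under
// review would have to define and document the real layout.
enum ImageFlag : uint32_t {
  FlagGLC = 1u << 0,
  FlagSLC = 1u << 1,
  FlagLWE = 1u << 2,
  FlagDA  = 1u << 3,
};

// Pack the four separate i1 operands into one i32 mask operand.
constexpr uint32_t packImageFlags(bool glc, bool slc, bool lwe, bool da) {
  return (glc ? FlagGLC : 0u) | (slc ? FlagSLC : 0u) |
         (lwe ? FlagLWE : 0u) | (da ? FlagDA : 0u);
}

// Test one flag. Bits that this caller does not know about are ignored,
// which is why a new, unexposed bit would not force a caller update.
constexpr bool hasImageFlag(uint32_t mask, ImageFlag flag) {
  return (mask & flag) != 0;
}
```

				With separate i1 operands, adding a flag changes the intrinsic signature for every caller; with a mask, the signature stays fixed and only callers that need the new bit change.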

	def int_amdgcn_cubema : GCCBuiltin<"__builtin_amdgcn_cubema">,			def int_amdgcn_cubema : GCCBuiltin<"__builtin_amdgcn_cubema">,
	Intrinsic<[llvm_float_ty],			Intrinsic<[llvm_float_ty],
	tstellarAMD: Changing this intrinsic will break Mesa; we will need to update Mesa before we can commit this.
	cfang: We will have to add the d16 bit, so Mesa will have to be updated anyway.
	[llvm_float_ty, llvm_float_ty, llvm_float_ty], [IntrNoMem]			[llvm_float_ty, llvm_float_ty, llvm_float_ty], [IntrNoMem]
	>;			>;
				arsenm: The full instruction name is image_get_resinfo, so the intrinsic should be int_amdgcn_image_getresinfo.

	def int_amdgcn_cubesc : GCCBuiltin<"__builtin_amdgcn_cubesc">,			def int_amdgcn_cubesc : GCCBuiltin<"__builtin_amdgcn_cubesc">,
	Intrinsic<[llvm_float_ty],			Intrinsic<[llvm_float_ty],
				tstellarAMD: This should go in a separate patch. This patch should be only the sampler changes.
	[llvm_float_ty, llvm_float_ty, llvm_float_ty], [IntrNoMem]			[llvm_float_ty, llvm_float_ty, llvm_float_ty], [IntrNoMem]
	>;			>;

	def int_amdgcn_cubetc : GCCBuiltin<"__builtin_amdgcn_cubetc">,			def int_amdgcn_cubetc : GCCBuiltin<"__builtin_amdgcn_cubetc">,
	Intrinsic<[llvm_float_ty],			Intrinsic<[llvm_float_ty],
				arsenm: This requires a descriptive comment (including the values for which bits).
	[llvm_float_ty, llvm_float_ty, llvm_float_ty], [IntrNoMem]			[llvm_float_ty, llvm_float_ty, llvm_float_ty], [IntrNoMem]
	>;			>;

	// v_ffbh_i32, as opposed to v_ffbh_u32. For v_ffbh_u32, llvm.ctlz			// v_ffbh_i32, as opposed to v_ffbh_u32. For v_ffbh_u32, llvm.ctlz
	// should be used.			// should be used.
	def int_amdgcn_sffbh :			def int_amdgcn_sffbh :
	Intrinsic<[llvm_anyint_ty], [LLVMMatchType<0>], [IntrNoMem]>;			Intrinsic<[llvm_anyint_ty], [LLVMMatchType<0>], [IntrNoMem]>;
				tstellarAMD: This is an unrelated change.

	// TODO: Do we want an ordering for these?			// TODO: Do we want an ordering for these?
	def int_amdgcn_atomic_inc : Intrinsic<[llvm_anyint_ty],			def int_amdgcn_atomic_inc : Intrinsic<[llvm_anyint_ty],
	[llvm_anyptr_ty, LLVMMatchType<0>],			[llvm_anyptr_ty, LLVMMatchType<0>],
	[IntrArgMemOnly, NoCapture<0>]			[IntrArgMemOnly, NoCapture<0>]
	>;			>;
				tstellarAMD: I'm thinking vdata should be llvm_anyfloat_ty, so we can have it return <4 x half> for the d16 operations, though it's going to be weird that some <4 x half> values take four registers and some take only two. Another thing I'm not sure of is whether image samplers always return floating-point values and never integers.

	def int_amdgcn_atomic_dec : Intrinsic<[llvm_anyint_ty],			def int_amdgcn_atomic_dec : Intrinsic<[llvm_anyint_ty],
				tstellarAMD: This should be changed to llvm_anyint_ty, so that we can infer the r128 bit.
	[llvm_anyptr_ty, LLVMMatchType<0>],			[llvm_anyptr_ty, LLVMMatchType<0>],
	[IntrArgMemOnly, NoCapture<0>]			[IntrArgMemOnly, NoCapture<0>]
				tstellarAMD: Moving the sample intrinsics to this file is unrelated to the AMDGPUImageLoad/AMDGPUImageStore changes, so this should be done in a separate patch.
				cfang: The patch implements the amdgcn image intrinsics, which fall into three categories: AMDGPUImageLoad, AMDGPUImageStore and AMDGPUImageSample. While AMDGPUImageSample is newly defined and the other two are modified, they all use the same mechanism, i.e. the mask parameter. I think it is better to keep them together in one patch.
	>;			>;

	class AMDGPUImageLoad : Intrinsic <			class AMDGPUImageLoad : Intrinsic <
	[llvm_v4f32_ty], // vdata(VGPR)			[llvm_anyfloat_ty], // vdata(VGPR)
				tstellarAMD: This r128 bit should be dropped.
	[llvm_anyint_ty, // vaddr(VGPR)			[llvm_anyint_ty, // vaddr(VGPR)
				nhaehnle: tfe should be dropped. AFAIU it changes the return type (5 return values instead of 4).
	llvm_v8i32_ty, // rsrc(SGPR)			llvm_anyint_ty, // rsrc(SGPR)
	llvm_i32_ty, // dmask(imm)			llvm_i32_ty, // dmask(imm)
	llvm_i1_ty, // r128(imm)
	llvm_i1_ty, // da(imm)
	llvm_i1_ty, // glc(imm)			llvm_i1_ty, // glc(imm)
	llvm_i1_ty], // slc(imm)			llvm_i1_ty, // slc(imm)
				llvm_i1_ty, // lwe(imm)
				llvm_i1_ty], // da(imm)
	[IntrReadMem]>;			[IntrReadMem]>;

	def int_amdgcn_image_load : AMDGPUImageLoad;			def int_amdgcn_image_load : AMDGPUImageLoad;
	def int_amdgcn_image_load_mip : AMDGPUImageLoad;			def int_amdgcn_image_load_mip : AMDGPUImageLoad;
				def int_amdgcn_image_getresinfo : AMDGPUImageLoad;

	class AMDGPUImageStore : Intrinsic <			class AMDGPUImageStore : Intrinsic <
	[],			[],
	[llvm_v4f32_ty, // vdata(VGPR)			[llvm_anyfloat_ty, // vdata(VGPR)
	llvm_anyint_ty, // vaddr(VGPR)			llvm_anyint_ty, // vaddr(VGPR)
	llvm_v8i32_ty, // rsrc(SGPR)			llvm_anyint_ty, // rsrc(SGPR)
	llvm_i32_ty, // dmask(imm)			llvm_i32_ty, // dmask(imm)
	llvm_i1_ty, // r128(imm)
	llvm_i1_ty, // da(imm)
	llvm_i1_ty, // glc(imm)			llvm_i1_ty, // glc(imm)
	llvm_i1_ty], // slc(imm)			llvm_i1_ty, // slc(imm)
				llvm_i1_ty, // lwe(imm)
				llvm_i1_ty], // da(imm)
	[]>;			[]>;

	def int_amdgcn_image_store : AMDGPUImageStore;			def int_amdgcn_image_store : AMDGPUImageStore;
	def int_amdgcn_image_store_mip : AMDGPUImageStore;			def int_amdgcn_image_store_mip : AMDGPUImageStore;

	class AMDGPUImageSample : Intrinsic <			class AMDGPUImageSample : Intrinsic <
	[llvm_anyfloat_ty], // vdata(VGPR)			[llvm_anyfloat_ty], // vdata(VGPR)
	[llvm_anyfloat_ty, // vaddr(VGPR)			[llvm_anyfloat_ty, // vaddr(VGPR)
	llvm_anyint_ty, // rsrc(SGPR)			llvm_anyint_ty, // rsrc(SGPR)
	llvm_v4i32_ty, // sampler(SGPR)			llvm_v4i32_ty, // sampler(SGPR)
	llvm_i32_ty, // dmask(imm)			llvm_i32_ty, // dmask(imm)
	llvm_i1_ty, // unorm(imm)			llvm_i1_ty, // unorm(imm)
	llvm_i1_ty, // glc(imm)			llvm_i1_ty, // glc(imm)
	llvm_i1_ty, // slc(imm)			llvm_i1_ty, // slc(imm)
	llvm_i1_ty, // lwe(imm)			llvm_i1_ty, // lwe(imm)
	llvm_i1_ty], // da(imm)			llvm_i1_ty], // da(imm)
	[IntrReadMem]>;			[IntrReadMem]>;

	// Basic sample			// Basic sample
	def int_amdgcn_image_sample : AMDGPUImageSample;			def int_amdgcn_image_sample : AMDGPUImageSample;
	def int_amdgcn_image_sample_cl : AMDGPUImageSample;			def int_amdgcn_image_sample_cl : AMDGPUImageSample;
	def int_amdgcn_image_sample_d : AMDGPUImageSample;			def int_amdgcn_image_sample_d : AMDGPUImageSample;
	def int_amdgcn_image_sample_d_cl : AMDGPUImageSample;			def int_amdgcn_image_sample_d_cl : AMDGPUImageSample;
	def int_amdgcn_image_sample_l : AMDGPUImageSample;			def int_amdgcn_image_sample_l : AMDGPUImageSample;
	def int_amdgcn_image_sample_b : AMDGPUImageSample;			def int_amdgcn_image_sample_b : AMDGPUImageSample;
	def int_amdgcn_image_sample_b_cl : AMDGPUImageSample;			def int_amdgcn_image_sample_b_cl : AMDGPUImageSample;
	def int_amdgcn_image_sample_lz : AMDGPUImageSample;			def int_amdgcn_image_sample_lz : AMDGPUImageSample;
	def int_amdgcn_image_sample_cd : AMDGPUImageSample;			def int_amdgcn_image_sample_cd : AMDGPUImageSample;
	def int_amdgcn_image_sample_cd_cl : AMDGPUImageSample;			def int_amdgcn_image_sample_cd_cl : AMDGPUImageSample;

	// Sample with comparison			// Sample with comparison
	def int_amdgcn_image_sample_c : AMDGPUImageSample;			def int_amdgcn_image_sample_c : AMDGPUImageSample;
	def int_amdgcn_image_sample_c_cl : AMDGPUImageSample;			def int_amdgcn_image_sample_c_cl : AMDGPUImageSample;
	def int_amdgcn_image_sample_c_d : AMDGPUImageSample;			def int_amdgcn_image_sample_c_d : AMDGPUImageSample;
	def int_amdgcn_image_sample_c_d_cl : AMDGPUImageSample;			def int_amdgcn_image_sample_c_d_cl : AMDGPUImageSample;
	def int_amdgcn_image_sample_c_l : AMDGPUImageSample;			def int_amdgcn_image_sample_c_l : AMDGPUImageSample;
	def int_amdgcn_image_sample_c_b : AMDGPUImageSample;			def int_amdgcn_image_sample_c_b : AMDGPUImageSample;
	def int_amdgcn_image_sample_c_b_cl : AMDGPUImageSample;			def int_amdgcn_image_sample_c_b_cl : AMDGPUImageSample;
	def int_amdgcn_image_sample_c_lz : AMDGPUImageSample;			def int_amdgcn_image_sample_c_lz : AMDGPUImageSample;
	def int_amdgcn_image_sample_c_cd : AMDGPUImageSample;			def int_amdgcn_image_sample_c_cd : AMDGPUImageSample;
	def int_amdgcn_image_sample_c_cd_cl : AMDGPUImageSample;			def int_amdgcn_image_sample_c_cd_cl : AMDGPUImageSample;

	// Sample with offsets			// Sample with offsets
	def int_amdgcn_image_sample_o : AMDGPUImageSample;			def int_amdgcn_image_sample_o : AMDGPUImageSample;
	def int_amdgcn_image_sample_cl_o : AMDGPUImageSample;			def int_amdgcn_image_sample_cl_o : AMDGPUImageSample;
	def int_amdgcn_image_sample_d_o : AMDGPUImageSample;			def int_amdgcn_image_sample_d_o : AMDGPUImageSample;
	def int_amdgcn_image_sample_d_cl_o : AMDGPUImageSample;			def int_amdgcn_image_sample_d_cl_o : AMDGPUImageSample;
	def int_amdgcn_image_sample_l_o : AMDGPUImageSample;			def int_amdgcn_image_sample_l_o : AMDGPUImageSample;
	def int_amdgcn_image_sample_b_o : AMDGPUImageSample;			def int_amdgcn_image_sample_b_o : AMDGPUImageSample;
	def int_amdgcn_image_sample_b_cl_o : AMDGPUImageSample;			def int_amdgcn_image_sample_b_cl_o : AMDGPUImageSample;
	def int_amdgcn_image_sample_lz_o : AMDGPUImageSample;			def int_amdgcn_image_sample_lz_o : AMDGPUImageSample;
	def int_amdgcn_image_sample_cd_o : AMDGPUImageSample;			def int_amdgcn_image_sample_cd_o : AMDGPUImageSample;
				arsenm: These are also missing the image part of the name.
	def int_amdgcn_image_sample_cd_cl_o : AMDGPUImageSample;			def int_amdgcn_image_sample_cd_cl_o : AMDGPUImageSample;

	// Sample with comparison and offsets			// Sample with comparison and offsets
	def int_amdgcn_image_sample_c_o : AMDGPUImageSample;			def int_amdgcn_image_sample_c_o : AMDGPUImageSample;
	def int_amdgcn_image_sample_c_cl_o : AMDGPUImageSample;			def int_amdgcn_image_sample_c_cl_o : AMDGPUImageSample;
	def int_amdgcn_image_sample_c_d_o : AMDGPUImageSample;			def int_amdgcn_image_sample_c_d_o : AMDGPUImageSample;
	def int_amdgcn_image_sample_c_d_cl_o : AMDGPUImageSample;			def int_amdgcn_image_sample_c_d_cl_o : AMDGPUImageSample;
	def int_amdgcn_image_sample_c_l_o : AMDGPUImageSample;			def int_amdgcn_image_sample_c_l_o : AMDGPUImageSample;

lib/Target/AMDGPU/MIMGInstructions.td

	>;			>;

	multiclass ImagePatterns<SDPatternOperator name, string opcode> {			multiclass ImagePatterns<SDPatternOperator name, string opcode> {
	def : ImagePattern<name, !cast<MIMG>(opcode # _V4_V1), i32>;			def : ImagePattern<name, !cast<MIMG>(opcode # _V4_V1), i32>;
	def : ImagePattern<name, !cast<MIMG>(opcode # _V4_V2), v2i32>;			def : ImagePattern<name, !cast<MIMG>(opcode # _V4_V2), v2i32>;
	def : ImagePattern<name, !cast<MIMG>(opcode # _V4_V4), v4i32>;			def : ImagePattern<name, !cast<MIMG>(opcode # _V4_V4), v4i32>;
	}			}

	class ImageLoadPattern<SDPatternOperator name, MIMG opcode, ValueType vt> : Pat <			multiclass ImageLoadPattern<SDPatternOperator name, MIMG opcode, ValueType vt> {
	(name vt:$addr, v8i32:$rsrc, imm:$dmask, imm:$r128, imm:$da, imm:$glc,			def : Pat <
	imm:$slc),			(v4f32 (name vt:$addr, v8i32:$rsrc, i32:$dmask, i1:$glc, i1:$slc, i1:$lwe,
				i1:$da)),
	(opcode $addr, $rsrc,			(opcode $addr, $rsrc,
	(as_i32imm $dmask), 1, (as_i1imm $glc), (as_i1imm $slc),			(as_i32imm $dmask), 1, (as_i1imm $glc), (as_i1imm $slc),
	(as_i1imm $r128), 0, 0, (as_i1imm $da))			0, 0, (as_i1imm $lwe), (as_i1imm $da))
	>;			>;
				}

	multiclass ImageLoadPatterns<SDPatternOperator name, string opcode> {			multiclass ImageLoadPatterns<SDPatternOperator name, string opcode> {
	def : ImageLoadPattern<name, !cast<MIMG>(opcode # _V4_V1), i32>;			defm : ImageLoadPattern<name, !cast<MIMG>(opcode # _V4_V1), i32>;
	def : ImageLoadPattern<name, !cast<MIMG>(opcode # _V4_V2), v2i32>;			defm : ImageLoadPattern<name, !cast<MIMG>(opcode # _V4_V2), v2i32>;
	def : ImageLoadPattern<name, !cast<MIMG>(opcode # _V4_V4), v4i32>;			defm : ImageLoadPattern<name, !cast<MIMG>(opcode # _V4_V4), v4i32>;
	}			}

	class ImageStorePattern<SDPatternOperator name, MIMG opcode, ValueType vt> : Pat <			multiclass ImageStorePattern<SDPatternOperator name, MIMG opcode, ValueType vt> {
	(name v4f32:$data, vt:$addr, v8i32:$rsrc, i32:$dmask, imm:$r128, imm:$da,			def : Pat <
	imm:$glc, imm:$slc),			(name v4f32:$data, vt:$addr, v8i32:$rsrc, i32:$dmask, i1:$glc, i1:$slc,
				i1:$lwe, i1:$da),
	(opcode $data, $addr, $rsrc,			(opcode $data, $addr, $rsrc,
	(as_i32imm $dmask), 1, (as_i1imm $glc), (as_i1imm $slc),			(as_i32imm $dmask), 1, (as_i1imm $glc), (as_i1imm $slc),
	(as_i1imm $r128), 0, 0, (as_i1imm $da))			0, 0, (as_i1imm $lwe), (as_i1imm $da))
	>;			>;
				}

	multiclass ImageStorePatterns<SDPatternOperator name, string opcode> {			multiclass ImageStorePatterns<SDPatternOperator name, string opcode> {
	def : ImageStorePattern<name, !cast<MIMG>(opcode # _V4_V1), i32>;			defm : ImageStorePattern<name, !cast<MIMG>(opcode # _V4_V1), i32>;
	def : ImageStorePattern<name, !cast<MIMG>(opcode # _V4_V2), v2i32>;			defm : ImageStorePattern<name, !cast<MIMG>(opcode # _V4_V2), v2i32>;
	def : ImageStorePattern<name, !cast<MIMG>(opcode # _V4_V4), v4i32>;			defm : ImageStorePattern<name, !cast<MIMG>(opcode # _V4_V4), v4i32>;
	}			}

	class ImageAtomicPattern<SDPatternOperator name, MIMG opcode, ValueType vt> : Pat <			class ImageAtomicPattern<SDPatternOperator name, MIMG opcode, ValueType vt> : Pat <
	(name i32:$vdata, vt:$addr, v8i32:$rsrc, imm:$r128, imm:$da, imm:$slc),			(name i32:$vdata, vt:$addr, v8i32:$rsrc, imm:$r128, imm:$da, imm:$slc),
	(opcode $vdata, $addr, $rsrc, 1, 1, 1, (as_i1imm $slc), (as_i1imm $r128), 0, 0, (as_i1imm $da))			(opcode $vdata, $addr, $rsrc, 1, 1, 1, (as_i1imm $slc), (as_i1imm $r128), 0, 0, (as_i1imm $da))
	>;			>;

	multiclass ImageAtomicPatterns<SDPatternOperator name, string opcode> {			multiclass ImageAtomicPatterns<SDPatternOperator name, string opcode> {
	▲ Show 20 Lines • Show All 111 Lines • ▼ Show 20 Lines
	def : SampleRawPattern<int_SI_getlod, IMAGE_GET_LOD_V4_V2, v2i32>;			def : SampleRawPattern<int_SI_getlod, IMAGE_GET_LOD_V4_V2, v2i32>;
	def : SampleRawPattern<int_SI_getlod, IMAGE_GET_LOD_V4_V4, v4i32>;			def : SampleRawPattern<int_SI_getlod, IMAGE_GET_LOD_V4_V4, v4i32>;

	// ======= amdgcn Image Intrinsics ==============			// ======= amdgcn Image Intrinsics ==============

	// Image load			// Image load
	defm : ImageLoadPatterns<int_amdgcn_image_load, "IMAGE_LOAD">;			defm : ImageLoadPatterns<int_amdgcn_image_load, "IMAGE_LOAD">;
	defm : ImageLoadPatterns<int_amdgcn_image_load_mip, "IMAGE_LOAD_MIP">;			defm : ImageLoadPatterns<int_amdgcn_image_load_mip, "IMAGE_LOAD_MIP">;
				defm : ImageLoadPattern<int_amdgcn_image_getresinfo, IMAGE_GET_RESINFO_V4_V1, i32>;

	// Image store			// Image store
	defm : ImageStorePatterns<int_amdgcn_image_store, "IMAGE_STORE">;			defm : ImageStorePatterns<int_amdgcn_image_store, "IMAGE_STORE">;
	defm : ImageStorePatterns<int_amdgcn_image_store_mip, "IMAGE_STORE_MIP">;			defm : ImageStorePatterns<int_amdgcn_image_store_mip, "IMAGE_STORE_MIP">;

	// Basic sample			// Basic sample
	defm : AMDGCNSamplePatterns<int_amdgcn_image_sample, "IMAGE_SAMPLE">;			defm : AMDGCNSamplePatterns<int_amdgcn_image_sample, "IMAGE_SAMPLE">;
	defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_cl, "IMAGE_SAMPLE_CL">;			defm : AMDGCNSamplePatterns<int_amdgcn_image_sample_cl, "IMAGE_SAMPLE_CL">;

test/CodeGen/AMDGPU/llvm.amdgcn.image.ll

	;RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck %s			;RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck %s
	;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck %s			;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck %s

	;CHECK-LABEL: {{^}}image_load_v4i32:			;CHECK-LABEL: {{^}}image_load_v4i32:
	;CHECK: image_load v[0:3], v[0:3], s[0:7] dmask:0xf unorm			;CHECK: image_load v[0:3], v[0:3], s[0:7] dmask:0xf unorm
	;CHECK: s_waitcnt vmcnt(0)			;CHECK: s_waitcnt vmcnt(0)
	define amdgpu_ps <4 x float> @image_load_v4i32(<8 x i32> inreg %rsrc, <4 x i32> %c) {			define amdgpu_ps <4 x float> @image_load_v4i32(<8 x i32> inreg %rsrc, <4 x i32> %c) {
	main_body:			main_body:
	%tex = call <4 x float> @llvm.amdgcn.image.load.v4i32(<4 x i32> %c, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)			%tex = call <4 x float> @llvm.amdgcn.image.load.v4f32.v4i32.v8i32(<4 x i32> %c, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)
	ret <4 x float> %tex			ret <4 x float> %tex
	}			}
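	The renamed calls in this test follow from the intrinsic classes now having three overloaded types (vdata, vaddr, rsrc) instead of one, so the name carries one suffix per overloaded type. A minimal C++ sketch of how such a name is assembled follows; `mangleIntrinsicName` is a hypothetical helper written for illustration, not LLVM's own API, and the type suffixes are passed in as plain strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: append one ".<type>" suffix per overloaded type, in
// the order the overloaded types appear (result first, then arguments).
std::string mangleIntrinsicName(const std::string &base,
                                const std::vector<std::string> &overloadedTypes) {
  std::string name = base;
  for (const std::string &ty : overloadedTypes) {
    name += '.';
    name += ty;
  }
  return name;
}
```

	For the call above, the base name `llvm.amdgcn.image.load` with the types `v4f32`, `v4i32` and `v8i32` gives the name used on the right-hand side of the diff.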

	;CHECK-LABEL: {{^}}image_load_v2i32:			;CHECK-LABEL: {{^}}image_load_v2i32:
	;CHECK: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm			;CHECK: image_load v[0:3], v[0:1], s[0:7] dmask:0xf unorm
	;CHECK: s_waitcnt vmcnt(0)			;CHECK: s_waitcnt vmcnt(0)
	define amdgpu_ps <4 x float> @image_load_v2i32(<8 x i32> inreg %rsrc, <2 x i32> %c) {			define amdgpu_ps <4 x float> @image_load_v2i32(<8 x i32> inreg %rsrc, <2 x i32> %c) {
	main_body:			main_body:
	%tex = call <4 x float> @llvm.amdgcn.image.load.v2i32(<2 x i32> %c, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)			%tex = call <4 x float> @llvm.amdgcn.image.load.v4f32.v2i32.v8i32(<2 x i32> %c, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)
	ret <4 x float> %tex			ret <4 x float> %tex
	}			}

	;CHECK-LABEL: {{^}}image_load_i32:			;CHECK-LABEL: {{^}}image_load_i32:
	;CHECK: image_load v[0:3], v0, s[0:7] dmask:0xf unorm			;CHECK: image_load v[0:3], v0, s[0:7] dmask:0xf unorm
	;CHECK: s_waitcnt vmcnt(0)			;CHECK: s_waitcnt vmcnt(0)
	define amdgpu_ps <4 x float> @image_load_i32(<8 x i32> inreg %rsrc, i32 %c) {			define amdgpu_ps <4 x float> @image_load_i32(<8 x i32> inreg %rsrc, i32 %c) {
	main_body:			main_body:
	%tex = call <4 x float> @llvm.amdgcn.image.load.i32(i32 %c, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)			%tex = call <4 x float> @llvm.amdgcn.image.load.v4f32.i32.v8i32(i32 %c, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)
	ret <4 x float> %tex			ret <4 x float> %tex
	}			}

	;CHECK-LABEL: {{^}}image_load_mip:			;CHECK-LABEL: {{^}}image_load_mip:
	;CHECK: image_load_mip v[0:3], v[0:3], s[0:7] dmask:0xf unorm			;CHECK: image_load_mip v[0:3], v[0:3], s[0:7] dmask:0xf unorm
	;CHECK: s_waitcnt vmcnt(0)			;CHECK: s_waitcnt vmcnt(0)
	define amdgpu_ps <4 x float> @image_load_mip(<8 x i32> inreg %rsrc, <4 x i32> %c) {			define amdgpu_ps <4 x float> @image_load_mip(<8 x i32> inreg %rsrc, <4 x i32> %c) {
	main_body:			main_body:
	%tex = call <4 x float> @llvm.amdgcn.image.load.mip.v4i32(<4 x i32> %c, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)			%tex = call <4 x float> @llvm.amdgcn.image.load.mip.v4f32.v4i32.v8i32(<4 x i32> %c, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)
	ret <4 x float> %tex			ret <4 x float> %tex
	}			}

	;CHECK-LABEL: {{^}}image_load_1:			;CHECK-LABEL: {{^}}image_load_1:
	;CHECK: image_load v0, v[0:3], s[0:7] dmask:0x1 unorm			;CHECK: image_load v0, v[0:3], s[0:7] dmask:0x1 unorm
	;CHECK: s_waitcnt vmcnt(0)			;CHECK: s_waitcnt vmcnt(0)
	define amdgpu_ps float @image_load_1(<8 x i32> inreg %rsrc, <4 x i32> %c) {			define amdgpu_ps float @image_load_1(<8 x i32> inreg %rsrc, <4 x i32> %c) {
	main_body:			main_body:
	%tex = call <4 x float> @llvm.amdgcn.image.load.v4i32(<4 x i32> %c, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)			%tex = call <4 x float> @llvm.amdgcn.image.load.v4f32.v4i32.v8i32(<4 x i32> %c, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)
	%elt = extractelement <4 x float> %tex, i32 0			%elt = extractelement <4 x float> %tex, i32 0
	; Only first component used, test that dmask etc. is changed accordingly			; Only first component used, test that dmask etc. is changed accordingly
	ret float %elt			ret float %elt
	}			}
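	The image_load_1 test expects the dmask to shrink from 0xf to 0x1 because only the first component of the result is used. The following C++ sketch shows one way such a narrowing could be computed. It is not the code LLVM uses: `dmaskChannelCount` and `shrinkDmask` are hypothetical names, and the sketch assumes 32-bit data where each enabled dmask bit yields one result register.

```cpp
#include <cassert>
#include <cstdint>

// Count the channels enabled in a 4-bit dmask. Under the assumption stated
// above, this is also the number of result registers.
inline unsigned dmaskChannelCount(uint32_t dmask) {
  unsigned count = 0;
  for (uint32_t m = dmask & 0xfu; m != 0; m &= m - 1)
    ++count;
  return count;
}

// Keep only the enabled channels whose results are actually used.
// Bit i of usedResults refers to the i-th enabled channel of dmask.
inline uint32_t shrinkDmask(uint32_t dmask, uint32_t usedResults) {
  uint32_t narrowed = 0;
  unsigned resultIdx = 0;
  for (unsigned ch = 0; ch < 4; ++ch) {
    if ((dmask & (1u << ch)) == 0)
      continue; // channel not enabled, so it produces no result
    if (usedResults & (1u << resultIdx))
      narrowed |= 1u << ch;
    ++resultIdx;
  }
  return narrowed;
}
```

	For the test above, `shrinkDmask(0xf, 0x1)` yields 0x1, and the channel count drops from four to one, which matches the single destination register `v0` in the CHECK line.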

	;CHECK-LABEL: {{^}}image_store_v4i32:			;CHECK-LABEL: {{^}}image_store_v4i32:
	;CHECK: image_store v[0:3], v[4:7], s[0:7] dmask:0xf unorm			;CHECK: image_store v[0:3], v[4:7], s[0:7] dmask:0xf unorm
	define amdgpu_ps void @image_store_v4i32(<8 x i32> inreg %rsrc, <4 x float> %data, <4 x i32> %coords) {			define amdgpu_ps void @image_store_v4i32(<8 x i32> inreg %rsrc, <4 x float> %data, <4 x i32> %coords) {
	main_body:			main_body:
	call void @llvm.amdgcn.image.store.v4i32(<4 x float> %data, <4 x i32> %coords, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)			call void @llvm.amdgcn.image.store.v4f32.v4i32.v8i32(<4 x float> %data, <4 x i32> %coords, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)
	ret void			ret void
	}			}

	;CHECK-LABEL: {{^}}image_store_v2i32:			;CHECK-LABEL: {{^}}image_store_v2i32:
	;CHECK: image_store v[0:3], v[4:5], s[0:7] dmask:0xf unorm			;CHECK: image_store v[0:3], v[4:5], s[0:7] dmask:0xf unorm
	define amdgpu_ps void @image_store_v2i32(<8 x i32> inreg %rsrc, <4 x float> %data, <2 x i32> %coords) {			define amdgpu_ps void @image_store_v2i32(<8 x i32> inreg %rsrc, <4 x float> %data, <2 x i32> %coords) {
	main_body:			main_body:
	call void @llvm.amdgcn.image.store.v2i32(<4 x float> %data, <2 x i32> %coords, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)			call void @llvm.amdgcn.image.store.v4f32.v2i32.v8i32(<4 x float> %data, <2 x i32> %coords, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)
	ret void			ret void
	}			}

	;CHECK-LABEL: {{^}}image_store_i32:			;CHECK-LABEL: {{^}}image_store_i32:
	;CHECK: image_store v[0:3], v4, s[0:7] dmask:0xf unorm			;CHECK: image_store v[0:3], v4, s[0:7] dmask:0xf unorm
	define amdgpu_ps void @image_store_i32(<8 x i32> inreg %rsrc, <4 x float> %data, i32 %coords) {			define amdgpu_ps void @image_store_i32(<8 x i32> inreg %rsrc, <4 x float> %data, i32 %coords) {
	main_body:			main_body:
	call void @llvm.amdgcn.image.store.i32(<4 x float> %data, i32 %coords, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)			call void @llvm.amdgcn.image.store.v4f32.i32.v8i32(<4 x float> %data, i32 %coords, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)
	ret void			ret void
	}			}

	;CHECK-LABEL: {{^}}image_store_mip:			;CHECK-LABEL: {{^}}image_store_mip:
	;CHECK: image_store_mip v[0:3], v[4:7], s[0:7] dmask:0xf unorm			;CHECK: image_store_mip v[0:3], v[4:7], s[0:7] dmask:0xf unorm
	define amdgpu_ps void @image_store_mip(<8 x i32> inreg %rsrc, <4 x float> %data, <4 x i32> %coords) {			define amdgpu_ps void @image_store_mip(<8 x i32> inreg %rsrc, <4 x float> %data, <4 x i32> %coords) {
	main_body:			main_body:
	call void @llvm.amdgcn.image.store.mip.v4i32(<4 x float> %data, <4 x i32> %coords, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)			call void @llvm.amdgcn.image.store.mip.v4f32.v4i32.v8i32(<4 x float> %data, <4 x i32> %coords, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)
	ret void			ret void
	}			}

				;CHECK-LABEL: {{^}}getresinfo:
				;CHECK: image_get_resinfo {{v\[[0-9]+:[0-9]+\]}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}} dmask:0xf
				define amdgpu_ps void @getresinfo() {
				main_body:
				%r = call <4 x float> @llvm.amdgcn.image.getresinfo.v4f32.i32.v8i32(i32 undef, <8 x i32> undef, i32 15, i1 0, i1 0, i1 0, i1 0)
				%r0 = extractelement <4 x float> %r, i32 0
				%r1 = extractelement <4 x float> %r, i32 1
				%r2 = extractelement <4 x float> %r, i32 2
				%r3 = extractelement <4 x float> %r, i32 3
				call void @llvm.SI.export(i32 15, i32 1, i32 1, i32 0, i32 1, float %r0, float %r1, float %r2, float %r3)
				ret void
				}


	; Ideally, the register allocator would avoid the wait here			; Ideally, the register allocator would avoid the wait here
	;			;
	;CHECK-LABEL: {{^}}image_store_wait:			;CHECK-LABEL: {{^}}image_store_wait:
	;CHECK: image_store v[0:3], v4, s[0:7] dmask:0xf unorm			;CHECK: image_store v[0:3], v4, s[0:7] dmask:0xf unorm
	;CHECK: s_waitcnt vmcnt(0) expcnt(0)			;CHECK: s_waitcnt vmcnt(0) expcnt(0)
	;CHECK: image_load v[0:3], v4, s[8:15] dmask:0xf unorm			;CHECK: image_load v[0:3], v4, s[8:15] dmask:0xf unorm
	;CHECK: s_waitcnt vmcnt(0)			;CHECK: s_waitcnt vmcnt(0)
	;CHECK: image_store v[0:3], v4, s[16:23] dmask:0xf unorm			;CHECK: image_store v[0:3], v4, s[16:23] dmask:0xf unorm
	define amdgpu_ps void @image_store_wait(<8 x i32> inreg, <8 x i32> inreg, <8 x i32> inreg, <4 x float>, i32) {			define amdgpu_ps void @image_store_wait(<8 x i32> inreg, <8 x i32> inreg, <8 x i32> inreg, <4 x float>, i32) {
	main_body:			main_body:
	call void @llvm.amdgcn.image.store.i32(<4 x float> %3, i32 %4, <8 x i32> %0, i32 15, i1 0, i1 0, i1 0, i1 0)			call void @llvm.amdgcn.image.store.v4f32.i32.v8i32(<4 x float> %3, i32 %4, <8 x i32> %0, i32 15, i1 0, i1 0, i1 0, i1 0)
	%data = call <4 x float> @llvm.amdgcn.image.load.i32(i32 %4, <8 x i32> %1, i32 15, i1 0, i1 0, i1 0, i1 0)			%data = call <4 x float> @llvm.amdgcn.image.load.v4f32.i32.v8i32(i32 %4, <8 x i32> %1, i32 15, i1 0, i1 0, i1 0, i1 0)
	call void @llvm.amdgcn.image.store.i32(<4 x float> %data, i32 %4, <8 x i32> %2, i32 15, i1 0, i1 0, i1 0, i1 0)			call void @llvm.amdgcn.image.store.v4f32.i32.v8i32(<4 x float> %data, i32 %4, <8 x i32> %2, i32 15, i1 0, i1 0, i1 0, i1 0)
	ret void			ret void
	}			}

	declare void @llvm.amdgcn.image.store.i32(<4 x float>, i32, <8 x i32>, i32, i1, i1, i1, i1) #0			declare void @llvm.amdgcn.image.store.v4f32.i32.v8i32(<4 x float>, i32, <8 x i32>, i32, i1, i1, i1, i1) #0
	declare void @llvm.amdgcn.image.store.v2i32(<4 x float>, <2 x i32>, <8 x i32>, i32, i1, i1, i1, i1) #0			declare void @llvm.amdgcn.image.store.v4f32.v2i32.v8i32(<4 x float>, <2 x i32>, <8 x i32>, i32, i1, i1, i1, i1) #0
	declare void @llvm.amdgcn.image.store.v4i32(<4 x float>, <4 x i32>, <8 x i32>, i32, i1, i1, i1, i1) #0			declare void @llvm.amdgcn.image.store.v4f32.v4i32.v8i32(<4 x float>, <4 x i32>, <8 x i32>, i32, i1, i1, i1, i1) #0
	declare void @llvm.amdgcn.image.store.mip.v4i32(<4 x float>, <4 x i32>, <8 x i32>, i32, i1, i1, i1, i1) #0			declare void @llvm.amdgcn.image.store.mip.v4f32.v4i32.v8i32(<4 x float>, <4 x i32>, <8 x i32>, i32, i1, i1, i1, i1) #0

				declare <4 x float> @llvm.amdgcn.image.load.v4f32.i32.v8i32(i32, <8 x i32>, i32, i1, i1, i1, i1) #1
				declare <4 x float> @llvm.amdgcn.image.load.v4f32.v2i32.v8i32(<2 x i32>, <8 x i32>, i32, i1, i1, i1, i1) #1
				declare <4 x float> @llvm.amdgcn.image.load.v4f32.v4i32.v8i32(<4 x i32>, <8 x i32>, i32, i1, i1, i1, i1) #1
				declare <4 x float> @llvm.amdgcn.image.load.mip.v4f32.v4i32.v8i32(<4 x i32>, <8 x i32>, i32, i1, i1, i1, i1) #1

				declare <4 x float> @llvm.amdgcn.image.getresinfo.v4f32.i32.v8i32(i32, <8 x i32>, i32, i1, i1, i1, i1) #0

	declare <4 x float> @llvm.amdgcn.image.load.i32(i32, <8 x i32>, i32, i1, i1, i1, i1) #1			declare void @llvm.SI.export(i32, i32, i32, i32, i32, float, float, float, float)
	declare <4 x float> @llvm.amdgcn.image.load.v2i32(<2 x i32>, <8 x i32>, i32, i1, i1, i1, i1) #1
	declare <4 x float> @llvm.amdgcn.image.load.v4i32(<4 x i32>, <8 x i32>, i32, i1, i1, i1, i1) #1
	declare <4 x float> @llvm.amdgcn.image.load.mip.v4i32(<4 x i32>, <8 x i32>, i32, i1, i1, i1, i1) #1

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
	attributes #1 = { nounwind readonly }			attributes #1 = { nounwind readonly }

test/CodeGen/AMDGPU/llvm.amdgcn.s.waitcnt.ll

	; RUN: llc -march=amdgcn -mcpu=SI -verify-machineinstrs < %s | FileCheck -check-prefix=CHECK %s			; RUN: llc -march=amdgcn -mcpu=SI -verify-machineinstrs < %s | FileCheck -check-prefix=CHECK %s
	; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=CHECK %s			; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=CHECK %s

	; CHECK-LABEL: {{^}}test1:			; CHECK-LABEL: {{^}}test1:
	; CHECK: image_store			; CHECK: image_store
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0){{$}}			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0){{$}}
	; CHECK-NEXT: image_store			; CHECK-NEXT: image_store
	; CHECK-NEXT: s_endpgm			; CHECK-NEXT: s_endpgm
	define amdgpu_ps void @test1(<8 x i32> inreg %rsrc, <4 x float> %d0, <4 x float> %d1, i32 %c0, i32 %c1) {			define amdgpu_ps void @test1(<8 x i32> inreg %rsrc, <4 x float> %d0, <4 x float> %d1, i32 %c0, i32 %c1) {
	call void @llvm.amdgcn.image.store.i32(<4 x float> %d0, i32 %c0, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 1, i1 0)			call void @llvm.amdgcn.image.store.v4f32.i32.v8i32(<4 x float> %d0, i32 %c0, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 1, i1 0)
	call void @llvm.amdgcn.s.waitcnt(i32 3840) ; 0xf00			call void @llvm.amdgcn.s.waitcnt(i32 3840) ; 0xf00
	call void @llvm.amdgcn.image.store.i32(<4 x float> %d1, i32 %c1, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 1, i1 0)			call void @llvm.amdgcn.image.store.v4f32.i32.v8i32(<4 x float> %d1, i32 %c1, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 1, i1 0)
	ret void			ret void
	}			}

	; Test that the intrinsic is merged with automatically generated waits and			; Test that the intrinsic is merged with automatically generated waits and
	; emitted as late as possible.			; emitted as late as possible.
	;			;
	; CHECK-LABEL: {{^}}test2:			; CHECK-LABEL: {{^}}test2:
	; CHECK: image_load			; CHECK: image_load
	; CHECK-NOT: s_waitcnt vmcnt(0){{$}}			; CHECK-NOT: s_waitcnt vmcnt(0){{$}}
	; CHECK: s_waitcnt			; CHECK: s_waitcnt
	; CHECK-NEXT: image_store			; CHECK-NEXT: image_store
	define amdgpu_ps void @test2(<8 x i32> inreg %rsrc, i32 %c) {			define amdgpu_ps void @test2(<8 x i32> inreg %rsrc, i32 %c) {
	%t = call <4 x float> @llvm.amdgcn.image.load.i32(i32 %c, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)			%t = call <4 x float> @llvm.amdgcn.image.load.v4f32.i32.v8i32(i32 %c, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)
	call void @llvm.amdgcn.s.waitcnt(i32 3840) ; 0xf00			call void @llvm.amdgcn.s.waitcnt(i32 3840) ; 0xf00
	%c.1 = mul i32 %c, 2			%c.1 = mul i32 %c, 2
	call void @llvm.amdgcn.image.store.i32(<4 x float> %t, i32 %c.1, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)			call void @llvm.amdgcn.image.store.v4f32.i32.v8i32(<4 x float> %t, i32 %c.1, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)
	ret void			ret void
	}			}

	declare void @llvm.amdgcn.s.waitcnt(i32) #0			declare void @llvm.amdgcn.s.waitcnt(i32) #0

	declare <4 x float> @llvm.amdgcn.image.load.i32(i32, <8 x i32>, i32, i1, i1, i1, i1) #1			declare <4 x float> @llvm.amdgcn.image.load.v4f32.i32.v8i32(i32, <8 x i32>, i32, i1, i1, i1, i1) #1
	declare void @llvm.amdgcn.image.store.i32(<4 x float>, i32, <8 x i32>, i32, i1, i1, i1, i1) #0			declare void @llvm.amdgcn.image.store.v4f32.i32.v8i32(<4 x float>, i32, <8 x i32>, i32, i1, i1, i1, i1) #0

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
	attributes #1 = { nounwind readonly }			attributes #1 = { nounwind readonly }

test/CodeGen/AMDGPU/wqm.ll

;RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck %s --check-prefix=CHECK --check-prefix=SI		;RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck %s --check-prefix=CHECK --check-prefix=SI
;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck %s --check-prefix=CHECK --check-prefix=VI		;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck %s --check-prefix=CHECK --check-prefix=VI

; Check that WQM isn't triggered by image load/store intrinsics.		; Check that WQM isn't triggered by image load/store intrinsics.
;		;
;CHECK-LABEL: {{^}}test1:		;CHECK-LABEL: {{^}}test1:
;CHECK-NOT: s_wqm		;CHECK-NOT: s_wqm
define amdgpu_ps <4 x float> @test1(<8 x i32> inreg %rsrc, <4 x i32> %c) {		define amdgpu_ps <4 x float> @test1(<8 x i32> inreg %rsrc, <4 x i32> %c) {
main_body:		main_body:
%tex = call <4 x float> @llvm.amdgcn.image.load.v4i32(<4 x i32> %c, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)		%tex = call <4 x float> @llvm.amdgcn.image.load.v4f32.v4i32.v8i32(<4 x i32> %c, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)
call void @llvm.amdgcn.image.store.v4i32(<4 x float> %tex, <4 x i32> %c, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)		call void @llvm.amdgcn.image.store.v4f32.v4i32.v8i32(<4 x float> %tex, <4 x i32> %c, <8 x i32> %rsrc, i32 15, i1 0, i1 0, i1 0, i1 0)
ret <4 x float> %tex		ret <4 x float> %tex
}		}

; Check that WQM is triggered by image samples and left untouched for loads...		; Check that WQM is triggered by image samples and left untouched for loads...
;		;
;CHECK-LABEL: {{^}}test2:		;CHECK-LABEL: {{^}}test2:
;CHECK-NEXT: ; %main_body		;CHECK-NEXT: ; %main_body
;CHECK-NEXT: s_wqm_b64 exec, exec		;CHECK-NEXT: s_wqm_b64 exec, exec
[341 lines not shown]
; CHECK: [[LOOPHDR]]: ; %loop		; CHECK: [[LOOPHDR]]: ; %loop
; CHECK: v_cmp_lt_f32_e32 vcc, [[SEVEN]], [[CTR]]		; CHECK: v_cmp_lt_f32_e32 vcc, [[SEVEN]], [[CTR]]
; CHECK: s_cbranch_vccz		; CHECK: s_cbranch_vccz
; CHECK: ; %break		; CHECK: ; %break

; CHECK: ; return		; CHECK: ; return
define amdgpu_ps <4 x float> @test_loop_vcc(<4 x float> %in) nounwind {		define amdgpu_ps <4 x float> @test_loop_vcc(<4 x float> %in) nounwind {
entry:		entry:
call void @llvm.amdgcn.image.store.v4i32(<4 x float> %in, <4 x i32> undef, <8 x i32> undef, i32 15, i1 0, i1 0, i1 0, i1 0)		call void @llvm.amdgcn.image.store.v4f32.v4i32.v8i32(<4 x float> %in, <4 x i32> undef, <8 x i32> undef, i32 15, i1 0, i1 0, i1 0, i1 0)
br label %loop		br label %loop

loop:		loop:
%ctr.iv = phi float [ 0.0, %entry ], [ %ctr.next, %body ]		%ctr.iv = phi float [ 0.0, %entry ], [ %ctr.next, %body ]
%c.iv = phi <4 x float> [ %in, %entry ], [ %c.next, %body ]		%c.iv = phi <4 x float> [ %in, %entry ], [ %c.next, %body ]
%cc = fcmp ogt float %ctr.iv, 7.0		%cc = fcmp ogt float %ctr.iv, 7.0
br i1 %cc, label %break, label %body		br i1 %cc, label %break, label %body

[119 lines not shown]
end:
%r = phi <4 x float> [ %r.if, %if ], [ %r.else, %else ]		%r = phi <4 x float> [ %r.if, %if ], [ %r.else, %else ]

call void @llvm.amdgcn.buffer.store.f32(float 1.0, <4 x i32> undef, i32 %idx, i32 0, i1 0, i1 0)		call void @llvm.amdgcn.buffer.store.f32(float 1.0, <4 x i32> undef, i32 %idx, i32 0, i1 0, i1 0)

ret <4 x float> %r		ret <4 x float> %r
}		}


declare void @llvm.amdgcn.image.store.v4i32(<4 x float>, <4 x i32>, <8 x i32>, i32, i1, i1, i1, i1) #1		declare void @llvm.amdgcn.image.store.v4f32.v4i32.v8i32(<4 x float>, <4 x i32>, <8 x i32>, i32, i1, i1, i1, i1) #1
declare void @llvm.amdgcn.buffer.store.f32(float, <4 x i32>, i32, i32, i1, i1) #1		declare void @llvm.amdgcn.buffer.store.f32(float, <4 x i32>, i32, i32, i1, i1) #1
declare void @llvm.amdgcn.buffer.store.v4f32(<4 x float>, <4 x i32>, i32, i32, i1, i1) #1		declare void @llvm.amdgcn.buffer.store.v4f32(<4 x float>, <4 x i32>, i32, i32, i1, i1) #1

declare <4 x float> @llvm.amdgcn.image.load.v4i32(<4 x i32>, <8 x i32>, i32, i1, i1, i1, i1) #2		declare <4 x float> @llvm.amdgcn.image.load.v4f32.v4i32.v8i32(<4 x i32>, <8 x i32>, i32, i1, i1, i1, i1) #2
declare float @llvm.amdgcn.buffer.load.f32(<4 x i32>, i32, i32, i1, i1) #2		declare float @llvm.amdgcn.buffer.load.f32(<4 x i32>, i32, i32, i1, i1) #2

declare <4 x float> @llvm.SI.image.sample.i32(i32, <8 x i32>, <4 x i32>, i32, i32, i32, i32, i32, i32, i32, i32) #3		declare <4 x float> @llvm.SI.image.sample.i32(i32, <8 x i32>, <4 x i32>, i32, i32, i32, i32, i32, i32, i32, i32) #3
declare <4 x float> @llvm.SI.image.sample.v2i32(<2 x i32>, <8 x i32>, <4 x i32>, i32, i32, i32, i32, i32, i32, i32, i32) #3		declare <4 x float> @llvm.SI.image.sample.v2i32(<2 x i32>, <8 x i32>, <4 x i32>, i32, i32, i32, i32, i32, i32, i32, i32) #3
declare <4 x float> @llvm.SI.image.sample.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32, i32, i32, i32, i32, i32, i32) #3		declare <4 x float> @llvm.SI.image.sample.v4i32(<4 x i32>, <8 x i32>, <4 x i32>, i32, i32, i32, i32, i32, i32, i32, i32) #3

declare void @llvm.AMDGPU.kill(float)		declare void @llvm.AMDGPU.kill(float)
declare void @llvm.SI.export(i32, i32, i32, i32, i32, float, float, float, float)		declare void @llvm.SI.export(i32, i32, i32, i32, i32, float, float, float, float)

attributes #1 = { nounwind }		attributes #1 = { nounwind }
attributes #2 = { nounwind readonly }		attributes #2 = { nounwind readonly }
attributes #3 = { nounwind readnone }		attributes #3 = { nounwind readnone }
attributes #4 = { "amdgpu-ps-wqm-outputs" }		attributes #4 = { "amdgpu-ps-wqm-outputs" }

This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Implement amdgcn image intrinsics
Closed, Public

Diff 74300

include/llvm/IR/IntrinsicsAMDGPU.td

lib/Target/AMDGPU/MIMGInstructions.td

test/CodeGen/AMDGPU/llvm.amdgcn.image.ll

test/CodeGen/AMDGPU/llvm.amdgcn.s.waitcnt.ll

test/CodeGen/AMDGPU/wqm.ll