Diff 437872

llvm/include/llvm/IR/IntrinsicsAMDGPU.td

	// high selects whether high or low 16-bits are loaded from LDS			// high selects whether high or low 16-bits are loaded from LDS
	def int_amdgcn_interp_p2_f16 :			def int_amdgcn_interp_p2_f16 :
	GCCBuiltin<"__builtin_amdgcn_interp_p2_f16">,			GCCBuiltin<"__builtin_amdgcn_interp_p2_f16">,
	Intrinsic<[llvm_half_ty],			Intrinsic<[llvm_half_ty],
	[llvm_float_ty, llvm_float_ty, llvm_i32_ty, llvm_i32_ty, llvm_i1_ty, llvm_i32_ty],			[llvm_float_ty, llvm_float_ty, llvm_i32_ty, llvm_i32_ty, llvm_i1_ty, llvm_i32_ty],
	[IntrNoMem, IntrSpeculatable, IntrWillReturn,			[IntrNoMem, IntrSpeculatable, IntrWillReturn,
	ImmArg<ArgIndex<2>>, ImmArg<ArgIndex<3>>, ImmArg<ArgIndex<4>>]>;			ImmArg<ArgIndex<2>>, ImmArg<ArgIndex<3>>, ImmArg<ArgIndex<4>>]>;

				// __int_amdgcn_lds_direct_load <m0>
				foad: The __int_* prefix doesn't make much sense. I would suggest either using the tablegen name (int_amdgcn_lds_direct_load) or preferably the LLVM IR name (llvm.amdgcn.lds.direct.load).
				Joe_Nash (author): Done, see 75378d432fda5408b7210fd3627db884561db650
				// The input argument is m0, which contains a packed combination of address
				// offset and flags describing the data type.
				def int_amdgcn_lds_direct_load :
				arsenm: Why is the return type hardcoded to float instead of mangled for a load?
				Intrinsic<[llvm_any_ty], // overloaded for types u8, u16, i32/f32, i8, i16
				arsenm: Also would expect this to be an addrspace 3 pointer
				critson: This is setting M0, the same as parameter loads. Construction of M0 values is handled by the front-end because it is not a pure address pointer but rather a combination of address offset and flags describing the data type. I guess we could rework this to form M0 in the backend based on an address 3 pointer and a return type.
				nhaehnle: I think it makes sense to have an intrinsic that closely corresponds to the instruction itself. Maybe add a comment explaining this fact about M0? I agree with Matt that the return value should be mangled.
				Joe_Nash (author): I have commented on the input argument, and changed the return type. I am not familiar with the type mangling here, does this look correct?
				[llvm_i32_ty],
				[IntrReadMem, IntrSpeculatable, IntrWillReturn]>;

				// __int_amdgcn_lds_param_load <attr_chan>, <attr>, <m0>
				// Like interp intrinsics, this reads from lds, but the memory values are constant,
				// so it behaves like IntrNoMem.
				def int_amdgcn_lds_param_load :
				Intrinsic<[llvm_float_ty],
				[llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
				[IntrNoMem, IntrSpeculatable, IntrWillReturn,
				ImmArg<ArgIndex<0>>, ImmArg<ArgIndex<1>>]>;

	// Deprecated: use llvm.amdgcn.live.mask instead.			// Deprecated: use llvm.amdgcn.live.mask instead.
	def int_amdgcn_ps_live : Intrinsic <			def int_amdgcn_ps_live : Intrinsic <
	[llvm_i1_ty],			[llvm_i1_ty],
	[],			[],
	[IntrNoMem, IntrWillReturn]>;			[IntrNoMem, IntrWillReturn]>;

	// Query currently live lanes.			// Query currently live lanes.
	// Returns true if lane is live (and not a helper lane).			// Returns true if lane is live (and not a helper lane).

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp

case Intrinsic::amdgcn_writelane: {
constrainOpWithReadfirstlane(MI, MRI, 2); // Source value		constrainOpWithReadfirstlane(MI, MRI, 2); // Source value
constrainOpWithReadfirstlane(MI, MRI, 3); // Index		constrainOpWithReadfirstlane(MI, MRI, 3); // Index
return;		return;
}		}
case Intrinsic::amdgcn_interp_p1:		case Intrinsic::amdgcn_interp_p1:
case Intrinsic::amdgcn_interp_p2:		case Intrinsic::amdgcn_interp_p2:
case Intrinsic::amdgcn_interp_mov:		case Intrinsic::amdgcn_interp_mov:
case Intrinsic::amdgcn_interp_p1_f16:		case Intrinsic::amdgcn_interp_p1_f16:
case Intrinsic::amdgcn_interp_p2_f16: {		case Intrinsic::amdgcn_interp_p2_f16:
		case Intrinsic::amdgcn_lds_param_load: {
applyDefaultMapping(OpdMapper);		applyDefaultMapping(OpdMapper);

// Readlane for m0 value, which is always the last operand.		// Readlane for m0 value, which is always the last operand.
// FIXME: Should this be a waterfall loop instead?		// FIXME: Should this be a waterfall loop instead?
constrainOpWithReadfirstlane(MI, MRI, MI.getNumOperands() - 1); // Index		constrainOpWithReadfirstlane(MI, MRI, MI.getNumOperands() - 1); // Index
return;		return;
}		}
case Intrinsic::amdgcn_permlane16:		case Intrinsic::amdgcn_permlane16:
[... 95 lines hidden ...]
case Intrinsic::amdgcn_global_load_lds: {
applyDefaultMapping(OpdMapper);		applyDefaultMapping(OpdMapper);
constrainOpWithReadfirstlane(MI, MRI, 2);		constrainOpWithReadfirstlane(MI, MRI, 2);
return;		return;
}		}
case Intrinsic::amdgcn_exp_row:		case Intrinsic::amdgcn_exp_row:
applyDefaultMapping(OpdMapper);		applyDefaultMapping(OpdMapper);
constrainOpWithReadfirstlane(MI, MRI, 8); // M0		constrainOpWithReadfirstlane(MI, MRI, 8); // M0
return;		return;
		case Intrinsic::amdgcn_lds_direct_load: {
		applyDefaultMapping(OpdMapper);
		// Readlane for m0 value, which is always the last operand.
		constrainOpWithReadfirstlane(MI, MRI, MI.getNumOperands() - 1); // Index
		return;
		}
default: {		default: {
if (const AMDGPU::RsrcIntrinsic *RSrcIntrin =		if (const AMDGPU::RsrcIntrinsic *RSrcIntrin =
AMDGPU::lookupRsrcIntrinsic(IntrID)) {		AMDGPU::lookupRsrcIntrinsic(IntrID)) {
// Non-images can have complications from operands that allow both SGPR		// Non-images can have complications from operands that allow both SGPR
// and VGPR. For now it's too complicated to figure out the final opcode		// and VGPR. For now it's too complicated to figure out the final opcode
// to derive the register bank from the MCInstrDesc.		// to derive the register bank from the MCInstrDesc.
if (RSrcIntrin->IsImage) {		if (RSrcIntrin->IsImage) {
applyMappingImage(MI, OpdMapper, MRI, RSrcIntrin->RsrcArg);		applyMappingImage(MI, OpdMapper, MRI, RSrcIntrin->RsrcArg);
[... 1,310 lines hidden ...]
case Intrinsic::amdgcn_smfmac_i32_32x32x32_i8: {
OpdsMapping[4] = getAGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);		OpdsMapping[4] = getAGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
OpdsMapping[5] = getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);		OpdsMapping[5] = getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
break;		break;
}		}
case Intrinsic::amdgcn_interp_p1:		case Intrinsic::amdgcn_interp_p1:
case Intrinsic::amdgcn_interp_p2:		case Intrinsic::amdgcn_interp_p2:
case Intrinsic::amdgcn_interp_mov:		case Intrinsic::amdgcn_interp_mov:
case Intrinsic::amdgcn_interp_p1_f16:		case Intrinsic::amdgcn_interp_p1_f16:
case Intrinsic::amdgcn_interp_p2_f16: {		case Intrinsic::amdgcn_interp_p2_f16:
		case Intrinsic::amdgcn_lds_param_load: {
const int M0Idx = MI.getNumOperands() - 1;		const int M0Idx = MI.getNumOperands() - 1;
Register M0Reg = MI.getOperand(M0Idx).getReg();		Register M0Reg = MI.getOperand(M0Idx).getReg();
unsigned M0Bank = getRegBankID(M0Reg, MRI, AMDGPU::SGPRRegBankID);		unsigned M0Bank = getRegBankID(M0Reg, MRI, AMDGPU::SGPRRegBankID);
unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();		unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();

OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);		OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
for (int I = 2; I != M0Idx && MI.getOperand(I).isReg(); ++I)		for (int I = 2; I != M0Idx && MI.getOperand(I).isReg(); ++I)
OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);		OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
[... 215 lines hidden ...]
case Intrinsic::amdgcn_ds_gws_sema_release_all: {
OpdsMapping[1] = AMDGPU::getValueMapping(Bank, 32);		OpdsMapping[1] = AMDGPU::getValueMapping(Bank, 32);
break;		break;
}		}
case Intrinsic::amdgcn_global_load_lds: {		case Intrinsic::amdgcn_global_load_lds: {
OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);		OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);		OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
break;		break;
}		}
		case Intrinsic::amdgcn_lds_direct_load: {
		const int M0Idx = MI.getNumOperands() - 1;
		Register M0Reg = MI.getOperand(M0Idx).getReg();
		unsigned M0Bank = getRegBankID(M0Reg, MRI, AMDGPU::SGPRRegBankID);
		unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();

		OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
		for (int I = 2; I != M0Idx && MI.getOperand(I).isReg(); ++I)
		OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);

		// Must be SGPR, but we must take whatever the original bank is and fix it
		// later.
		OpdsMapping[M0Idx] = AMDGPU::getValueMapping(M0Bank, 32);
		break;
		}
case Intrinsic::amdgcn_ds_add_gs_reg_rtn:		case Intrinsic::amdgcn_ds_add_gs_reg_rtn:
case Intrinsic::amdgcn_ds_sub_gs_reg_rtn:		case Intrinsic::amdgcn_ds_sub_gs_reg_rtn:
OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);		OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);		OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
break;		break;
default:		default:
return getInvalidInstructionMapping();		return getInvalidInstructionMapping();
}		}
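
The two hunks above work as a pair: the mapping code records whatever bank the M0 operand currently has ("we must take whatever the original bank is and fix it later"), and the apply step then calls constrainOpWithReadfirstlane on the last operand. The following is a minimal sketch of that idea as a toy model. It is not LLVM code: `Bank`, `Operand`, `readFirstLane`, and `constrainWithReadfirstlane` are hypothetical names invented for illustration, and the wave size and exec-mask handling are simplifying assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model only: these types are illustrative and are not LLVM classes.
enum class Bank { SGPR, VGPR };

constexpr int WaveSize = 32; // assumption: a 32-lane wave

struct Operand {
  Bank bank;
  // Per-lane values; a uniform (SGPR) operand only uses lanes[0].
  std::array<uint32_t, WaveSize> lanes;
};

// Returns the value held by the first lane whose bit is set in `exec`.
// If no lane is active, this toy model falls back to lane 0.
uint32_t readFirstLane(const Operand &op, uint32_t exec) {
  for (int lane = 0; lane < WaveSize; ++lane)
    if (exec & (1u << lane))
      return op.lanes[lane];
  return op.lanes[0];
}

// An operand that is already scalar is left alone. A vector operand is
// replaced by a scalar copy of its first active lane, which is the role
// V_READFIRSTLANE_B32 plays in the MIR tests for these intrinsics.
Operand constrainWithReadfirstlane(const Operand &op, uint32_t exec) {
  if (op.bank == Bank::SGPR)
    return op;
  Operand scalar{Bank::SGPR, {}};
  scalar.lanes[0] = readFirstLane(op, exec);
  return scalar;
}
```

With a VGPR operand whose lane 3 is the first active lane, the result is an SGPR operand carrying lane 3's value; an SGPR operand passes through unchanged.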

llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td

	def : SourceOfDivergence<int_amdgcn_workitem_id_x>;			def : SourceOfDivergence<int_amdgcn_workitem_id_x>;
	def : SourceOfDivergence<int_amdgcn_workitem_id_y>;			def : SourceOfDivergence<int_amdgcn_workitem_id_y>;
	def : SourceOfDivergence<int_amdgcn_workitem_id_z>;			def : SourceOfDivergence<int_amdgcn_workitem_id_z>;
	def : SourceOfDivergence<int_amdgcn_interp_mov>;			def : SourceOfDivergence<int_amdgcn_interp_mov>;
	def : SourceOfDivergence<int_amdgcn_interp_p1>;			def : SourceOfDivergence<int_amdgcn_interp_p1>;
	def : SourceOfDivergence<int_amdgcn_interp_p2>;			def : SourceOfDivergence<int_amdgcn_interp_p2>;
	def : SourceOfDivergence<int_amdgcn_interp_p1_f16>;			def : SourceOfDivergence<int_amdgcn_interp_p1_f16>;
	def : SourceOfDivergence<int_amdgcn_interp_p2_f16>;			def : SourceOfDivergence<int_amdgcn_interp_p2_f16>;
				def : SourceOfDivergence<int_amdgcn_lds_direct_load>;
				def : SourceOfDivergence<int_amdgcn_lds_param_load>;
	def : SourceOfDivergence<int_amdgcn_mbcnt_hi>;			def : SourceOfDivergence<int_amdgcn_mbcnt_hi>;
	def : SourceOfDivergence<int_amdgcn_mbcnt_lo>;			def : SourceOfDivergence<int_amdgcn_mbcnt_lo>;
	def : SourceOfDivergence<int_r600_read_tidig_x>;			def : SourceOfDivergence<int_r600_read_tidig_x>;
	def : SourceOfDivergence<int_r600_read_tidig_y>;			def : SourceOfDivergence<int_r600_read_tidig_y>;
	def : SourceOfDivergence<int_r600_read_tidig_z>;			def : SourceOfDivergence<int_r600_read_tidig_z>;
	def : SourceOfDivergence<int_amdgcn_atomic_inc>;			def : SourceOfDivergence<int_amdgcn_atomic_inc>;
	def : SourceOfDivergence<int_amdgcn_atomic_dec>;			def : SourceOfDivergence<int_amdgcn_atomic_dec>;
	def : SourceOfDivergence<int_amdgcn_global_atomic_csub>;			def : SourceOfDivergence<int_amdgcn_global_atomic_csub>;

llvm/lib/Target/AMDGPU/LDSDIRInstructions.td


	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// LDS Direct Instructions			// LDS Direct Instructions
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	def LDS_DIRECT_LOAD : LDSDIR_Pseudo<"lds_direct_load", 1>;			def LDS_DIRECT_LOAD : LDSDIR_Pseudo<"lds_direct_load", 1>;
	def LDS_PARAM_LOAD : LDSDIR_Pseudo<"lds_param_load", 0>;			def LDS_PARAM_LOAD : LDSDIR_Pseudo<"lds_param_load", 0>;

				def : GCNPat <
				(f32 (int_amdgcn_lds_direct_load M0)),
				(LDS_DIRECT_LOAD 0)
				rampitec: How does that work? It seems to ignore M0 argument.
				Joe_Nash (author): Because LDS_DIRECT_LOAD has an implicit use of M0. In llvm.amdgcn.lds.direct.load.ll it seems to work as it should given that.
				rampitec: Right, it uses M0, but where is a link from the call argument and actual store of that value into M0?
				Joe_Nash (author): I don't know, but maybe this helps to explain? Dump from AMDGPUGenGlobalISel.inc
				    // Label 2092: @108525
				    GIM_Try, /*On fail goto*//*Label 2093*/ 108577, // Rule ID 3516 //
				      GIM_CheckIntrinsicID, /*MI*/0, /*Op*/1, Intrinsic::amdgcn_lds_direct_load,
				      GIM_CheckType, /*MI*/0, /*Op*/0, /*Type*/GILLT_s32,
				      GIM_CheckType, /*MI*/0, /*Op*/2, /*Type*/GILLT_s32,
				      GIM_CheckRegBankForClass, /*MI*/0, /*Op*/0, /*RC*/AMDGPU::VGPR_32RegClassID,
				      GIM_CheckRegBankForClass, /*MI*/0, /*Op*/2, /*RC*/AMDGPU::M0_CLASSRegClassID,
				      // (intrinsic_w_chain:{ *:[f32] } 1864:{ *:[iPTR] }, M0:{ *:[i32] }) => (LDS_DIRECT_LOAD:{ *:[f32] } 0:{ *:[i8] })
				      GIR_BuildMI, /*InsnID*/1, /*Opcode*/TargetOpcode::COPY,
				      GIR_AddRegister, /*InsnID*/1, AMDGPU::M0, /*AddRegisterRegFlags*/RegState::Define,
				      GIR_Copy, /*NewInsnID*/1, /*OldInsnID*/0, /*OpIdx*/2, // M0
				      GIR_BuildMI, /*InsnID*/0, /*Opcode*/AMDGPU::LDS_DIRECT_LOAD,
				      GIR_Copy, /*NewInsnID*/0, /*OldInsnID*/0, /*OpIdx*/0, // vdst
				      GIR_AddImm, /*InsnID*/0, /*Imm*/0,
				      GIR_MergeMemOperands, /*InsnID*/0, /*MergeInsnID's*/0, GIU_MergeMemOperands_EndOfList,
				      GIR_EraseFromParent, /*InsnID*/0,
				      GIR_ConstrainSelectedInstOperands, /*InsnID*/0,
				      // GIR_Coverage, 3516,
				      GIR_Done,
				    // Label 2093: @108577
				    GIM_Reject,
				rampitec: OK, I must admit I still do not understand how does it work, but it obviously does.
				arsenm: The physical register takes the place in the pattern of what normally would be a named virtual input, like VReg_32:$src1
				>;

				def : GCNPat <
				(f32 (int_amdgcn_lds_param_load timm:$attrchan, timm:$attr, M0)),
				(LDS_PARAM_LOAD timm:$attr, timm:$attrchan, 0)
				>;

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// GFX11+			// GFX11+
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	multiclass LDSDIR_Real_gfx11<bits<2> op, LDSDIR_Pseudo lds = !cast<LDSDIR_Pseudo>(NAME)> {			multiclass LDSDIR_Real_gfx11<bits<2> op, LDSDIR_Pseudo lds = !cast<LDSDIR_Pseudo>(NAME)> {
	def _gfx11 : LDSDIR_Real<op, lds, SIEncodingFamily.GFX11> {			def _gfx11 : LDSDIR_Real<op, lds, SIEncodingFamily.GFX11> {
	let AssemblerPredicate = isGFX11Plus;			let AssemblerPredicate = isGFX11Plus;
	let DecoderNamespace = "GFX11";			let DecoderNamespace = "GFX11";
	}			}
	}			}

	defm LDS_PARAM_LOAD : LDSDIR_Real_gfx11<0x0>;			defm LDS_PARAM_LOAD : LDSDIR_Real_gfx11<0x0>;
	defm LDS_DIRECT_LOAD : LDSDIR_Real_gfx11<0x1>;			defm LDS_DIRECT_LOAD : LDSDIR_Real_gfx11<0x1>;
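
The inline discussion on the LDS_DIRECT_LOAD pattern asks where the intrinsic's argument is actually written to M0. The match-table dump quoted there answers it: the selector first builds a COPY that defines M0 from the intrinsic's operand, then builds LDS_DIRECT_LOAD with only the destination and the immediate 0. The sketch below replays that two-step emission in a toy form where instructions are plain strings. `selectLdsDirectLoad` is a hypothetical helper, the textual instruction syntax is illustrative only, and none of this is LLVM API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model: an "instruction" is just a line of text. Not LLVM API.
std::vector<std::string> selectLdsDirectLoad(const std::string &dst,
                                             const std::string &m0Src) {
  std::vector<std::string> out;
  // A physical register named in the pattern's source means the matched
  // operand is copied into that register before the selected instruction.
  out.push_back("$m0 = COPY " + m0Src);
  // The selected instruction carries no explicit M0 operand; it reads $m0
  // implicitly. The trailing 0 is the immediate from the pattern's result.
  out.push_back(dst + " = LDS_DIRECT_LOAD 0, implicit $m0");
  return out;
}
```

Calling `selectLdsDirectLoad("%2", "%1")` yields the copy into `$m0` followed by the load, which lines up with the `s_mov_b32 m0` / `lds_direct_load` pairs that the .ll test checks for.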

llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.lds.direct.load.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -march=amdgcn -mcpu=gfx1100 -run-pass=regbankselect -regbankselect-fast -verify-machineinstrs -o - %s | FileCheck %s
				# RUN: llc -march=amdgcn -mcpu=gfx1100 -run-pass=regbankselect -regbankselect-greedy -verify-machineinstrs -o - %s | FileCheck %s

				---
				name: lds_direct_load_s
				legalized: true
				tracksRegLiveness: true

				body: |
				bb.0:
				liveins: $sgpr0
				; CHECK-LABEL: name: lds_direct_load_s
				; CHECK: liveins: $sgpr0
				; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
				; CHECK: [[INT:%[0-9]+]]:vgpr(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.lds.direct.load), [[COPY]](s32)
				%0:_(s32) = COPY $sgpr0
				%1:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.lds.direct.load), %0
				...

				---
				name: lds_direct_load_v
				legalized: true
				tracksRegLiveness: true

				body: |
				bb.0:
				liveins: $vgpr0
				; CHECK-LABEL: name: lds_direct_load_v
				; CHECK: liveins: $vgpr0
				; CHECK: [[COPY:%[0-9]+]]:vgpr_32(s32) = COPY $vgpr0
				; CHECK: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32(s32) = V_READFIRSTLANE_B32 [[COPY]](s32), implicit $exec
				; CHECK: [[INT:%[0-9]+]]:vgpr(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.lds.direct.load), [[V_READFIRSTLANE_B32_]](s32)
				%0:_(s32) = COPY $vgpr0
				%1:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.lds.direct.load), %0
				...

llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.lds.param.load.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -march=amdgcn -mcpu=gfx1100 -run-pass=regbankselect -regbankselect-fast -verify-machineinstrs -o - %s | FileCheck %s
				# RUN: llc -march=amdgcn -mcpu=gfx1100 -run-pass=regbankselect -regbankselect-greedy -verify-machineinstrs -o - %s | FileCheck %s

				---
				name: lds_param_load_s
				legalized: true
				tracksRegLiveness: true

				body: |
				bb.0:
				liveins: $sgpr0
				; CHECK-LABEL: name: lds_param_load_s
				; CHECK: liveins: $sgpr0
				; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
				; CHECK: [[INT:%[0-9]+]]:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.lds.param.load), 1, 1, [[COPY]](s32)
				%0:_(s32) = COPY $sgpr0
				%1:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.lds.param.load), 1, 1, %0
				...

				---
				name: lds_param_load_v
				legalized: true
				tracksRegLiveness: true

				body: |
				bb.0:
				liveins: $vgpr0
				; CHECK-LABEL: name: lds_param_load_v
				; CHECK: liveins: $vgpr0
				; CHECK: [[COPY:%[0-9]+]]:vgpr_32(s32) = COPY $vgpr0
				; CHECK: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32(s32) = V_READFIRSTLANE_B32 [[COPY]](s32), implicit $exec
				; CHECK: [[INT:%[0-9]+]]:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.lds.param.load), 1, 1, [[V_READFIRSTLANE_B32_]](s32)
				%0:_(s32) = COPY $vgpr0
				%1:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.lds.param.load), 1, 1, %0
				...

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.lds.direct.load.ll

This file was added.

				; RUN: llc -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX11 %s
				; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX11 %s

				arsenm: Missing globalisel codegen tests
				; GFX11-LABEL: {{^}}lds_direct_load:
				; GFX11: s_mov_b32 m0
				; GFX11: lds_direct_load v{{[0-9]+}}
				; GFX11: s_mov_b32 m0
				; GFX11: lds_direct_load v{{[0-9]+}}
				; GFX11: s_mov_b32 m0
				; GFX11: lds_direct_load v{{[0-9]+}}
				; GFX11: v_add_f32
				; GFX11: buffer_store_b32
				; GFX11: buffer_store_b32
				; GFX11: buffer_store_b32
				; GFX11: buffer_store_b32
				; GFX11: buffer_store_b32
				; GFX11: buffer_store_b32
				define amdgpu_ps void @lds_direct_load(<4 x i32> inreg %buf, i32 inreg %arg0,
				i32 inreg %arg1, i32 inreg %arg2) #0 {
				main_body:
				%p0 = call float @llvm.amdgcn.lds.direct.load(i32 %arg0)
				; Ensure memory clustering is occurring for lds_direct_load
				%p5 = fadd float %p0, 1.0
				%p1 = call float @llvm.amdgcn.lds.direct.load(i32 %arg1)
				%p2 = call float @llvm.amdgcn.lds.direct.load(i32 %arg2)
				%p3 = call float @llvm.amdgcn.lds.direct.load(i32 %arg1)
				%p4 = call float @llvm.amdgcn.lds.direct.load(i32 %arg2)
				call void @llvm.amdgcn.raw.buffer.store.f32(float %p5, <4 x i32> %buf, i32 4, i32 0, i32 0)
				call void @llvm.amdgcn.raw.buffer.store.f32(float %p1, <4 x i32> %buf, i32 4, i32 1, i32 0)
				call void @llvm.amdgcn.raw.buffer.store.f32(float %p2, <4 x i32> %buf, i32 4, i32 2, i32 0)
				call void @llvm.amdgcn.raw.buffer.store.f32(float %p3, <4 x i32> %buf, i32 4, i32 3, i32 0)
				call void @llvm.amdgcn.raw.buffer.store.f32(float %p4, <4 x i32> %buf, i32 4, i32 4, i32 0)
				call void @llvm.amdgcn.raw.buffer.store.f32(float %p0, <4 x i32> %buf, i32 4, i32 5, i32 0)
				ret void
				}

				declare float @llvm.amdgcn.lds.direct.load(i32) #1
				declare void @llvm.amdgcn.raw.buffer.store.f32(float, <4 x i32>, i32, i32, i32)

				attributes #0 = { nounwind }
				attributes #1 = { nounwind readonly }

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.lds.param.load.ll

This file was added.

				; RUN: llc -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX11 %s
				; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefix=GFX11 %s

				arsenm: Missing globalisel codegen tests
				; GFX11-LABEL: {{^}}lds_param_load:
				critson: There should probably be a move to m0 here?
				; GFX11: s_mov_b32 m0
				; GFX11-DAG: lds_param_load v{{[0-9]+}}, attr0.x
				; GFX11-DAG: lds_param_load v{{[0-9]+}}, attr0.y
				; GFX11-DAG: lds_param_load v{{[0-9]+}}, attr0.z
				; GFX11-DAG: lds_param_load v{{[0-9]+}}, attr0.w
				; GFX11-DAG: lds_param_load v{{[0-9]+}}, attr1.x
				; GFX11: v_add_f32
				; GFX11: buffer_store_b32
				; GFX11: buffer_store_b32
				; GFX11: buffer_store_b32
				; GFX11: buffer_store_b32
				; GFX11: buffer_store_b32
				; GFX11: buffer_store_b32
				define amdgpu_ps void @lds_param_load(<4 x i32> inreg %buf, i32 inreg %arg) #0 {
				main_body:
				%p0 = call float @llvm.amdgcn.lds.param.load(i32 0, i32 0, i32 %arg)
				; Ensure memory clustering is occurring for lds_param_load
				%p5 = fadd float %p0, 1.0
				%p1 = call float @llvm.amdgcn.lds.param.load(i32 1, i32 0, i32 %arg)
				%p2 = call float @llvm.amdgcn.lds.param.load(i32 2, i32 0, i32 %arg)
				%p3 = call float @llvm.amdgcn.lds.param.load(i32 3, i32 0, i32 %arg)
				%p4 = call float @llvm.amdgcn.lds.param.load(i32 0, i32 1, i32 %arg)
				call void @llvm.amdgcn.raw.buffer.store.f32(float %p5, <4 x i32> %buf, i32 4, i32 0, i32 0)
				call void @llvm.amdgcn.raw.buffer.store.f32(float %p1, <4 x i32> %buf, i32 4, i32 1, i32 0)
				call void @llvm.amdgcn.raw.buffer.store.f32(float %p2, <4 x i32> %buf, i32 4, i32 2, i32 0)
				call void @llvm.amdgcn.raw.buffer.store.f32(float %p3, <4 x i32> %buf, i32 4, i32 3, i32 0)
				call void @llvm.amdgcn.raw.buffer.store.f32(float %p4, <4 x i32> %buf, i32 4, i32 4, i32 0)
				call void @llvm.amdgcn.raw.buffer.store.f32(float %p0, <4 x i32> %buf, i32 4, i32 5, i32 0)
				ret void
				}

				declare float @llvm.amdgcn.lds.param.load(i32, i32, i32) #1
				declare void @llvm.amdgcn.raw.buffer.store.f32(float, <4 x i32>, i32, i32, i32)

				attributes #0 = { nounwind }
				attributes #1 = { nounwind readnone }
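
The CHECK lines in this test imply how the two immediate operands of llvm.amdgcn.lds.param.load surface in the assembly: the first operand (attr_chan) selects the component x/y/z/w, and the second (attr) selects the attribute number, so the call with (1, 0) is checked as attr0.y and the call with (0, 1) as attr1.x. A small sketch of that mapping follows. `formatAttrOperand` is a hypothetical helper written for illustration, not the AMDGPU instruction printer, and the 0..3 channel range is an assumption drawn from the four component names; no bound on attr is enforced because the diff does not state one.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: renders the operand text seen in the CHECK lines,
// e.g. (attrChan = 1, attr = 0) -> "attr0.y".
std::string formatAttrOperand(unsigned attrChan, unsigned attr) {
  static const char Chan[] = {'x', 'y', 'z', 'w'};
  // Assumption: only four channels exist, matching x/y/z/w in the test.
  if (attrChan > 3)
    throw std::out_of_range("attr_chan must be in 0..3");
  return "attr" + std::to_string(attr) + "." + Chan[attrChan];
}
```

Feeding in the five (attr_chan, attr) pairs from the test body reproduces the five operand strings that the GFX11-DAG lines look for.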

This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] gfx11 ldsdir intrinsics and ISel
Closed, Public
