
Details

Reviewers

foad
arsenm
critson
piotr
rampitec

Group Reviewers

Restricted Project

Commits

rG6d5d8b131300: [AMDGPU] gfx11 ldsdir intrinsics and ISel

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Joe_Nash created this revision. Jun 13 2022, 8:50 AM

Herald added a project: Restricted Project. Jun 13 2022, 8:50 AM

Herald added subscribers: kosarev, jsilvanus, hsmhsm and 9 others.

Joe_Nash requested review of this revision. Jun 13 2022, 8:50 AM

Herald added a project: Restricted Project. Jun 13 2022, 8:50 AM

Herald added subscribers: llvm-commits, wdng.

Joe_Nash added a reviewer: Restricted Project. Jun 13 2022, 8:51 AM

Can we also add something like test/CodeGen/AMDGPU/llvm.amdgcn.lds.direct.load.ll and test/CodeGen/AMDGPU/llvm.amdgcn.lds.param.load.ll to test codegen?

added ISA codegen test.

In D127664#3578415, @foad wrote:

Can we also add something like test/CodeGen/AMDGPU/llvm.amdgcn.lds.direct.load.ll and test/CodeGen/AMDGPU/llvm.amdgcn.lds.param.load.ll to test codegen?

I have added something. Just to note, changes to SIInsertWaitcnts are coming that will affect these tests.

Harbormaster completed remote builds in B169484: Diff 436442. Jun 13 2022, 11:59 AM

Joe_Nash added a child revision: D127756: [AMDGPU] gfx11 VINTERP intrinsics and ISel support. Jun 14 2022, 8:53 AM

Joe_Nash added a reviewer: piotr. Jun 14 2022, 8:55 AM

JonChesterfield added a subscriber: JonChesterfield. Jun 14 2022, 9:36 AM

arsenm added inline comments. Jun 14 2022, 9:43 AM

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1464	Why is the return type hardcoded to float instead of mangled for a load?
1465	Also would expect this to be an addrspace 3 pointer

critson added inline comments. Jun 14 2022, 8:11 PM

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1465	This is setting M0, the same as parameter loads. Construction of M0 values is handled by the front-end because it is not a pure address pointer but rather a combination of address offset and flags describing the data type. I guess we could rework this to form M0 in the backend based on an address space 3 pointer and a return type.
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.lds.param.load.ll
3 ↗	(On Diff #436442)	There should probably be a move to m0 here?

nhaehnle added inline comments. Jun 15 2022, 12:54 AM

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1465	I think it makes sense to have an intrinsic that closely corresponds to the instruction itself. Maybe add a comment explaining this fact about M0? I agree with Matt that the return value should be mangled.

changed return type of lds_direct_load and commented on input argument type

Joe_Nash added inline comments. Jun 15 2022, 7:07 AM

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1465	I have commented on the input argument, and changed the return type. I am not familiar with the type mangling here, does this look correct?

added s_mov m0 to param.load test

Harbormaster completed remote builds in B169983: Diff 437154. Jun 15 2022, 8:20 AM

removed builtin from intrinsic definition

I think all outstanding issues are addressed; please take another look.

arsenm added inline comments. Jun 15 2022, 1:52 PM

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.lds.direct.load.ll
2 ↗	(On Diff #437327)	Missing globalisel codegen tests
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.lds.param.load.ll
2 ↗	(On Diff #437327)	Missing globalisel codegen tests

Harbormaster completed remote builds in B170105: Diff 437327. Jun 15 2022, 4:05 PM

added globalisel runlines to codegen test

Joe_Nash marked 2 inline comments as done. Jun 16 2022, 6:18 AM

foad added a child revision: D127963: [AMDGPU] Add support for GFX11 LDSDIR hazards. Jun 16 2022, 7:18 AM

Harbormaster completed remote builds in B170247: Diff 437522. Jun 16 2022, 7:21 AM

Joe_Nash added a reviewer: rampitec. Jun 16 2022, 10:21 AM

rampitec added inline comments. Jun 16 2022, 11:23 AM

llvm/lib/Target/AMDGPU/LDSDIRInstructions.td
96	How does that work? It seems to ignore the M0 argument.

Joe_Nash added inline comments. Jun 16 2022, 11:35 AM

llvm/lib/Target/AMDGPU/LDSDIRInstructions.td
96	Because LDS_DIRECT_LOAD has an implicit use of M0. In llvm.amdgcn.lds.direct.load.ll it seems to work as it should given that.

rampitec added inline comments. Jun 16 2022, 11:42 AM

llvm/lib/Target/AMDGPU/LDSDIRInstructions.td
96	Right, it uses M0, but where is the link between the call argument and the actual store of that value into M0?

Joe_Nash added inline comments. Jun 16 2022, 1:44 PM

llvm/lib/Target/AMDGPU/LDSDIRInstructions.td

I don't know, but maybe this helps to explain? Dump from AMDGPUGenGlobalISel.inc

// Label 2092: @108525
GIM_Try, /*On fail goto*//*Label 2093*/ 108577, // Rule ID 3516 //
  GIM_CheckIntrinsicID, /*MI*/0, /*Op*/1, Intrinsic::amdgcn_lds_direct_load,
  GIM_CheckType, /*MI*/0, /*Op*/0, /*Type*/GILLT_s32,
  GIM_CheckType, /*MI*/0, /*Op*/2, /*Type*/GILLT_s32,
  GIM_CheckRegBankForClass, /*MI*/0, /*Op*/0, /*RC*/AMDGPU::VGPR_32RegClassID,
  GIM_CheckRegBankForClass, /*MI*/0, /*Op*/2, /*RC*/AMDGPU::M0_CLASSRegClassID,
  // (intrinsic_w_chain:{ *:[f32] } 1864:{ *:[iPTR] }, M0:{ *:[i32] })  =>  (LDS_DIRECT_LOAD:{ *:[f32] } 0:{ *:[i8] })
  GIR_BuildMI, /*InsnID*/1, /*Opcode*/TargetOpcode::COPY,
  GIR_AddRegister, /*InsnID*/1, AMDGPU::M0, /*AddRegisterRegFlags*/RegState::Define,
  GIR_Copy, /*NewInsnID*/1, /*OldInsnID*/0, /*OpIdx*/2, // M0
  GIR_BuildMI, /*InsnID*/0, /*Opcode*/AMDGPU::LDS_DIRECT_LOAD,
  GIR_Copy, /*NewInsnID*/0, /*OldInsnID*/0, /*OpIdx*/0, // vdst
  GIR_AddImm, /*InsnID*/0, /*Imm*/0,
  GIR_MergeMemOperands, /*InsnID*/0, /*MergeInsnID's*/0, GIU_MergeMemOperands_EndOfList,
  GIR_EraseFromParent, /*InsnID*/0,
  GIR_ConstrainSelectedInstOperands, /*InsnID*/0,
  // GIR_Coverage, 3516,
  GIR_Done,
// Label 2093: @108577
GIM_Reject,

LGTM

llvm/lib/Target/AMDGPU/LDSDIRInstructions.td
96	OK, I must admit I still do not understand how it works, but it obviously does.

This revision is now accepted and ready to land. Jun 16 2022, 1:55 PM

arsenm added inline comments. Jun 16 2022, 2:00 PM

llvm/lib/Target/AMDGPU/LDSDIRInstructions.td
96	The physical register takes the place in the pattern of what normally would be a named virtual input, like VReg_32:$src1
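To make arsenm's explanation concrete, here is a toy C++ sketch that mimics the emitter steps in the AMDGPUGenGlobalISel.inc dump above. It is a hypothetical illustration only: `ToyInst`, `selectLdsDirectLoad`, and `toString` are invented names and are not LLVM's MachineInstr or GlobalISel API. Because M0 appears in the source pattern where a named virtual input would normally be, the generated code builds a COPY that defines $m0 from the intrinsic's operand 2, and LDS_DIRECT_LOAD then reads $m0 through its implicit use.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy instruction; NOT llvm::MachineInstr.
struct ToyInst {
  std::string Opcode;
  std::vector<std::string> Defs;
  std::vector<std::string> Uses;
};

// Models the match-table emitter steps for
//   (f32 (int_amdgcn_lds_direct_load M0)) => (LDS_DIRECT_LOAD 0)
// 1. GIR_BuildMI COPY: define $m0 from operand 2 (the intrinsic's M0 argument).
// 2. GIR_BuildMI LDS_DIRECT_LOAD: copy operand 0 (vdst), add immediate 0.
// The implicit $m0 use on LDS_DIRECT_LOAD is what links the two instructions.
std::vector<ToyInst> selectLdsDirectLoad(const std::string &Dst,
                                         const std::string &M0Arg) {
  assert(!Dst.empty() && !M0Arg.empty() && "operands must be named");
  return {
      {"COPY", {"$m0"}, {M0Arg}},
      {"LDS_DIRECT_LOAD", {Dst}, {"0", "implicit $m0"}},
  };
}

// Renders a ToyInst in a MIR-like "defs = OPCODE uses" form.
std::string toString(const ToyInst &I) {
  std::string S;
  for (const std::string &D : I.Defs)
    S += D + " ";
  S += "= " + I.Opcode;
  for (std::size_t K = 0; K < I.Uses.size(); ++K)
    S += (K == 0 ? " " : ", ") + I.Uses[K];
  return S;
}
```

With `Dst = "%1"` and `M0Arg = "%0"`, the two rendered instructions are `$m0 = COPY %0` followed by `%1 = LDS_DIRECT_LOAD 0, implicit $m0`. The SelectionDAG path is not modeled here.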

This revision was landed with ongoing or failed builds. Jun 17 2022, 6:32 AM

Closed by commit rG6d5d8b131300: [AMDGPU] gfx11 ldsdir intrinsics and ISel (authored by Joe_Nash).

This revision was automatically updated to reflect the committed changes.

Joe_Nash added a commit: rG6d5d8b131300: [AMDGPU] gfx11 ldsdir intrinsics and ISel.

foad added inline comments. Jun 17 2022, 7:39 AM

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1461	The __int_* prefix doesn't make much sense. I would suggest either using the tablegen name (int_amdgcn_lds_direct_load) or preferably the LLVM IR name (llvm.amdgcn.lds.direct.load).

Joe_Nash added inline comments. Jun 17 2022, 9:59 AM

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1461	Done, see 75378d432fda5408b7210fd3627db884561db650

Diff 436428

llvm/include/llvm/IR/IntrinsicsAMDGPU.td

...
 // high selects whether high or low 16-bits are loaded from LDS
 def int_amdgcn_interp_p2_f16 :
   GCCBuiltin<"__builtin_amdgcn_interp_p2_f16">,
   Intrinsic<[llvm_half_ty],
   [llvm_float_ty, llvm_float_ty, llvm_i32_ty, llvm_i32_ty, llvm_i1_ty, llvm_i32_ty],
   [IntrNoMem, IntrSpeculatable, IntrWillReturn,
    ImmArg<ArgIndex<2>>, ImmArg<ArgIndex<3>>, ImmArg<ArgIndex<4>>]>;

+// __builtin_amdgcn_lds_direct_load <m0>
+def int_amdgcn_lds_direct_load :
+  GCCBuiltin<"__builtin_amdgcn_lds_direct_load">,
+  Intrinsic<[llvm_float_ty],
+  [llvm_i32_ty],
+  [IntrReadMem, IntrSpeculatable, IntrWillReturn]>;

+// __builtin_amdgcn_lds_param_load <attr_chan>, <attr>, <m0>
+// Like interp intrinsics, this reads from lds, but the memory values are constant,
+// so it behaves like IntrNoMem.
+def int_amdgcn_lds_param_load :
+  GCCBuiltin<"__builtin_amdgcn_lds_param_load">,
+  Intrinsic<[llvm_float_ty],
+  [llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
+  [IntrNoMem, IntrSpeculatable, IntrWillReturn,
+   ImmArg<ArgIndex<0>>, ImmArg<ArgIndex<1>>]>;

 // Deprecated: use llvm.amdgcn.live.mask instead.
 def int_amdgcn_ps_live : Intrinsic <
   [llvm_i1_ty],
   [],
   [IntrNoMem, IntrWillReturn]>;

 // Query currently live lanes.
 // Returns true if lane is live (and not a helper lane).
...

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp

... (within case Intrinsic::amdgcn_writelane)
       constrainOpWithReadfirstlane(MI, MRI, 2); // Source value
       constrainOpWithReadfirstlane(MI, MRI, 3); // Index
       return;
     }
     case Intrinsic::amdgcn_interp_p1:
     case Intrinsic::amdgcn_interp_p2:
     case Intrinsic::amdgcn_interp_mov:
     case Intrinsic::amdgcn_interp_p1_f16:
-    case Intrinsic::amdgcn_interp_p2_f16: {
+    case Intrinsic::amdgcn_interp_p2_f16:
+    case Intrinsic::amdgcn_lds_param_load: {
       applyDefaultMapping(OpdMapper);

       // Readlane for m0 value, which is always the last operand.
       // FIXME: Should this be a waterfall loop instead?
       constrainOpWithReadfirstlane(MI, MRI, MI.getNumOperands() - 1); // Index
       return;
     }
     case Intrinsic::amdgcn_permlane16:
... (within case Intrinsic::amdgcn_struct_buffer_load_lds)
       constrainOpWithReadfirstlane(MI, MRI, 6); // soffset
       return;
     }
     case Intrinsic::amdgcn_global_load_lds: {
       applyDefaultMapping(OpdMapper);
       constrainOpWithReadfirstlane(MI, MRI, 2);
       return;
     }
+    case Intrinsic::amdgcn_lds_direct_load: {
+      applyDefaultMapping(OpdMapper);
+      // Readlane for m0 value, which is always the last operand.
+      constrainOpWithReadfirstlane(MI, MRI, MI.getNumOperands() - 1); // Index
+      return;
+    }
     default: {
       if (const AMDGPU::RsrcIntrinsic *RSrcIntrin =
               AMDGPU::lookupRsrcIntrinsic(IntrID)) {
         // Non-images can have complications from operands that allow both SGPR
         // and VGPR. For now it's too complicated to figure out the final opcode
         // to derive the register bank from the MCInstrDesc.
         if (RSrcIntrin->IsImage) {
           applyMappingImage(MI, OpdMapper, MRI, RSrcIntrin->RsrcArg);
... (within case Intrinsic::amdgcn_smfmac_i32_32x32x32_i8)
       OpdsMapping[4] = getAGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
       OpdsMapping[5] = getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
       break;
     }
     case Intrinsic::amdgcn_interp_p1:
     case Intrinsic::amdgcn_interp_p2:
     case Intrinsic::amdgcn_interp_mov:
     case Intrinsic::amdgcn_interp_p1_f16:
-    case Intrinsic::amdgcn_interp_p2_f16: {
+    case Intrinsic::amdgcn_interp_p2_f16:
+    case Intrinsic::amdgcn_lds_param_load: {
       const int M0Idx = MI.getNumOperands() - 1;
       Register M0Reg = MI.getOperand(M0Idx).getReg();
       unsigned M0Bank = getRegBankID(M0Reg, MRI, AMDGPU::SGPRRegBankID);
       unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();

       OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
       for (int I = 2; I != M0Idx && MI.getOperand(I).isReg(); ++I)
         OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
... (within case Intrinsic::amdgcn_ds_gws_sema_release_all)
       OpdsMapping[1] = AMDGPU::getValueMapping(Bank, 32);
       break;
     }
     case Intrinsic::amdgcn_global_load_lds: {
       OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
       OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
       break;
     }
+    case Intrinsic::amdgcn_lds_direct_load: {
+      const int M0Idx = MI.getNumOperands() - 1;
+      Register M0Reg = MI.getOperand(M0Idx).getReg();
+      unsigned M0Bank = getRegBankID(M0Reg, MRI, AMDGPU::SGPRRegBankID);
+      unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
+
+      OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
+      for (int I = 2; I != M0Idx && MI.getOperand(I).isReg(); ++I)
+        OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
+
+      // Must be SGPR, but we must take whatever the original bank is and fix it
+      // later.
+      OpdsMapping[M0Idx] = AMDGPU::getValueMapping(M0Bank, 32);
+      break;
+    }
     default:
       return getInvalidInstructionMapping();
     }
     break;
   }
   case AMDGPU::G_SELECT: {
     unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
     unsigned Op2Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
...

llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td

...
 def : SourceOfDivergence<int_amdgcn_workitem_id_x>;
 def : SourceOfDivergence<int_amdgcn_workitem_id_y>;
 def : SourceOfDivergence<int_amdgcn_workitem_id_z>;
 def : SourceOfDivergence<int_amdgcn_interp_mov>;
 def : SourceOfDivergence<int_amdgcn_interp_p1>;
 def : SourceOfDivergence<int_amdgcn_interp_p2>;
 def : SourceOfDivergence<int_amdgcn_interp_p1_f16>;
 def : SourceOfDivergence<int_amdgcn_interp_p2_f16>;
+def : SourceOfDivergence<int_amdgcn_lds_direct_load>;
+def : SourceOfDivergence<int_amdgcn_lds_param_load>;
 def : SourceOfDivergence<int_amdgcn_mbcnt_hi>;
 def : SourceOfDivergence<int_amdgcn_mbcnt_lo>;
 def : SourceOfDivergence<int_r600_read_tidig_x>;
 def : SourceOfDivergence<int_r600_read_tidig_y>;
 def : SourceOfDivergence<int_r600_read_tidig_z>;
 def : SourceOfDivergence<int_amdgcn_atomic_inc>;
 def : SourceOfDivergence<int_amdgcn_atomic_dec>;
 def : SourceOfDivergence<int_amdgcn_global_atomic_csub>;
...

llvm/lib/Target/AMDGPU/LDSDIRInstructions.td

...

 //===----------------------------------------------------------------------===//
 // LDS Direct Instructions
 //===----------------------------------------------------------------------===//

 def LDS_DIRECT_LOAD : LDSDIR_Pseudo<"lds_direct_load", 1>;
 def LDS_PARAM_LOAD : LDSDIR_Pseudo<"lds_param_load", 0>;

+def : GCNPat <
+  (f32 (int_amdgcn_lds_direct_load M0)),
+  (LDS_DIRECT_LOAD 0)
+>;

+def : GCNPat <
+  (f32 (int_amdgcn_lds_param_load timm:$attrchan, timm:$attr, M0)),
+  (LDS_PARAM_LOAD timm:$attr, timm:$attrchan, 0)
+>;

 //===----------------------------------------------------------------------===//
 // GFX11+
 //===----------------------------------------------------------------------===//

 multiclass LDSDIR_Real_gfx11<bits<2> op, LDSDIR_Pseudo lds = !cast<LDSDIR_Pseudo>(NAME)> {
   def _gfx11 : LDSDIR_Real<op, lds, SIEncodingFamily.GFX11> {
     let AssemblerPredicate = isGFX11Plus;
     let DecoderNamespace = "GFX11";
   }
 }

 defm LDS_PARAM_LOAD : LDSDIR_Real_gfx11<0x0>;
 defm LDS_DIRECT_LOAD : LDSDIR_Real_gfx11<0x1>;

llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.lds.direct.load.mir

This file was added.

# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -march=amdgcn -mcpu=gfx1100 -run-pass=regbankselect -regbankselect-fast -verify-machineinstrs -o - %s | FileCheck %s
# RUN: llc -march=amdgcn -mcpu=gfx1100 -run-pass=regbankselect -regbankselect-greedy -verify-machineinstrs -o - %s | FileCheck %s

---
name: lds_direct_load_s
legalized: true
tracksRegLiveness: true

body: |
  bb.0:
    liveins: $sgpr0
    ; CHECK-LABEL: name: lds_direct_load_s
    ; CHECK: liveins: $sgpr0
    ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
    ; CHECK: [[INT:%[0-9]+]]:vgpr(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.lds.direct.load), [[COPY]](s32)
    %0:_(s32) = COPY $sgpr0
    %1:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.lds.direct.load), %0
...

---
name: lds_direct_load_v
legalized: true
tracksRegLiveness: true

body: |
  bb.0:
    liveins: $vgpr0
    ; CHECK-LABEL: name: lds_direct_load_v
    ; CHECK: liveins: $vgpr0
    ; CHECK: [[COPY:%[0-9]+]]:vgpr_32(s32) = COPY $vgpr0
    ; CHECK: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32(s32) = V_READFIRSTLANE_B32 [[COPY]](s32), implicit $exec
    ; CHECK: [[INT:%[0-9]+]]:vgpr(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.lds.direct.load), [[V_READFIRSTLANE_B32_]](s32)
    %0:_(s32) = COPY $vgpr0
    %1:_(s32) = G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.lds.direct.load), %0
...

llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.lds.param.load.mir

This file was added.

# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -march=amdgcn -mcpu=gfx1100 -run-pass=regbankselect -regbankselect-fast -verify-machineinstrs -o - %s | FileCheck %s
# RUN: llc -march=amdgcn -mcpu=gfx1100 -run-pass=regbankselect -regbankselect-greedy -verify-machineinstrs -o - %s | FileCheck %s

---
name: lds_param_load_s
legalized: true
tracksRegLiveness: true

body: |
  bb.0:
    liveins: $sgpr0
    ; CHECK-LABEL: name: lds_param_load_s
    ; CHECK: liveins: $sgpr0
    ; CHECK: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
    ; CHECK: [[INT:%[0-9]+]]:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.lds.param.load), 1, 1, [[COPY]](s32)
    %0:_(s32) = COPY $sgpr0
    %1:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.lds.param.load), 1, 1, %0
...

---
name: lds_param_load_v
legalized: true
tracksRegLiveness: true

body: |
  bb.0:
    liveins: $vgpr0
    ; CHECK-LABEL: name: lds_param_load_v
    ; CHECK: liveins: $vgpr0
    ; CHECK: [[COPY:%[0-9]+]]:vgpr_32(s32) = COPY $vgpr0
    ; CHECK: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32(s32) = V_READFIRSTLANE_B32 [[COPY]](s32), implicit $exec
    ; CHECK: [[INT:%[0-9]+]]:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.lds.param.load), 1, 1, [[V_READFIRSTLANE_B32_]](s32)
    %0:_(s32) = COPY $vgpr0
    %1:_(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.lds.param.load), 1, 1, %0
				...
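The two `_v` tests above pin down one behavior of the new RegisterBankInfo cases: an M0 operand that is already in the SGPR bank is used directly, while a VGPR operand is first made uniform with V_READFIRSTLANE_B32. The hypothetical C++ sketch below models only that decision as string rewriting; `Bank`, `ToyOperand`, and `legalizeM0Operand` are invented names, not LLVM API. As the FIXME in AMDGPURegisterBankInfo.cpp hints, readfirstlane takes the value from the first active lane, so this is only correct when the M0 value is actually uniform; a waterfall loop is the alternative that FIXME raises.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for register banks; NOT llvm::RegisterBank.
enum class Bank { SGPR, VGPR };

struct ToyOperand {
  std::string Name; // virtual register name, e.g. "%0"
  Bank B;           // bank assigned before the fixup
};

// Returns {instructions to insert before the intrinsic, register to use as M0}.
// SGPR inputs pass through unchanged. VGPR inputs are read from the first
// active lane into a new 32-bit SGPR, mirroring the V_READFIRSTLANE_B32 in the
// lds_direct_load_v / lds_param_load_v CHECK lines.
std::pair<std::vector<std::string>, std::string>
legalizeM0Operand(const ToyOperand &M0Src, const std::string &NewReg) {
  std::vector<std::string> Pre;
  if (M0Src.B == Bank::SGPR)
    return {Pre, M0Src.Name};
  assert(!NewReg.empty() && "a fresh SGPR name is required");
  Pre.push_back(NewReg + ":sreg_32(s32) = V_READFIRSTLANE_B32 " + M0Src.Name +
                "(s32), implicit $exec");
  return {Pre, NewReg};
}
```

For an SGPR input nothing is inserted and the original register is returned; for a VGPR input `%0` with fresh name `%2`, one readfirstlane line is produced and `%2` becomes the M0 source, matching the shape of the `lds_*_v` CHECK lines.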

This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] gfx11 ldsdir intrinsics and ISel
Closed · Public
