Details

Reviewers

tstellarAMD
arsenm

Commits

rGd48445d51392: AMDGPU/SI: Implement sendmsghalt intrinsic
rL290977: AMDGPU/SI: Implement sendmsghalt intrinsic

Diff Detail

Repository: rL LLVM

Event Timeline

jvesely updated this revision to Diff 68039. Aug 15 2016, 8:19 AM

jvesely retitled this revision to AMDGPU/SI: Implement sendmsghalt intrinsic.

jvesely updated this object.

jvesely added a reviewer: tstellarAMD.

jvesely set the repository for this revision to rL LLVM.

Herald added subscribers: kzhuravl, arsenm. Aug 15 2016, 8:19 AM

New intrinsics should go in include/llvm/IR/IntrinsicsAMDGPU.td, and have an amdgcn prefix. I also think it probably should not have a separate parameter for m0, and instead rely on llvm.write_register setting m0

In D23511#515604, @arsenm wrote:

New intrinsics should go in include/llvm/IR/IntrinsicsAMDGPU.td, and have an amdgcn prefix. I also think it probably should not have a separate parameter for m0, and instead rely on llvm.write_register setting m0

I'd like to keep it as close to sendmsg as possible. Do you know if there are compatibility issues if I rename SI_sendmsg to amdgcn_sendmsg?

In D23511#515722, @jvesely wrote:

I'd like to keep it as close to sendmsg as possible. Do you know if there are compatibility issues if I rename SI_sendmsg to amdgcn_sendmsg?

SI.sendmsg also needs to be changed and replaced (as well as the rest of the intrinsics in the backend). The goal is to eventually fix any intrinsic design issues and fix the names when moving them to the public intrinsics

rename and expose both sendmsg intrinsics

In D23511#515739, @arsenm wrote:

SI.sendmsg also needs to be changed and replaced (as well as the rest of the intrinsics in the backend). The goal is to eventually fix any intrinsic design issues and fix the names when moving them to the public intrinsics

I understand that, although having a list of approaches considered deprecated would reduce some wasted effort.
My question was whether there are users of the old sendmsg intrinsic name that would break.

I'm not sure how to unbundle writing m0 if we want to expose sendmsg as a __builtin function. Is there a generic way to write to m0 from a high-level language?

In D23511#516775, @jvesely wrote:

I understand that, although having a list of approaches considered deprecated would reduce some wasted effort.
My question was whether there are users of the old sendmsg intrinsic name that would break.

I'm not sure how to unbundle writing m0 if we want to expose sendmsg as a __builtin function. Is there a generic way to write to m0 from a high-level language?

We can keep the old intrinsic working while adding the new one. A builtin would be needed for emitting the write to m0 since read/write_register are not directly exposed. My concern about this is what happens if you have something like:

llvm.write_register(m0)
%foo = load i32, i32 addrspace(3)*
llvm.amdgcn.s.sendmsg()

The lowering of the LDS access will insert an initialization of m0 to -1, clobbering the old value. I'm not sure whether it would be better to switch the M0 initialization lowering to copy the pre-existing value and restore it afterwards. I'm also not sure if we should keep considering m0 as an allocatable register.
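The concern above can be illustrated with a small toy model. This is only a sketch: the `Op` enum, the `Inst` struct, and both functions are hypothetical names made up for this illustration and are not part of LLVM. It assumes, as the comment describes, that every LDS access gets an "m0 = -1" write inserted directly before it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model only: these opcodes are illustrative, not LLVM's MachineInstr API.
enum class Op { WriteM0, LdsLoad, SendMsg };

struct Inst {
  Op op;
  int32_t value; // Only meaningful for WriteM0.
};

// Mimics the lowering described in the comment: every LDS access gets an
// "m0 = -1" initialization inserted directly before it.
std::vector<Inst> lowerLdsAccesses(const std::vector<Inst> &in) {
  std::vector<Inst> out;
  for (const Inst &i : in) {
    if (i.op == Op::LdsLoad)
      out.push_back({Op::WriteM0, -1});
    out.push_back(i);
  }
  return out;
}

// Returns the m0 value observed by the first SendMsg in the program.
// The program is required to contain at least one SendMsg.
int32_t m0SeenBySendMsg(const std::vector<Inst> &prog) {
  int32_t m0 = 0;
  for (const Inst &i : prog) {
    if (i.op == Op::WriteM0)
      m0 = i.value;
    else if (i.op == Op::SendMsg)
      return m0;
  }
  assert(false && "program contains no SendMsg");
  return 0;
}
```

In this model, a program that writes 42 to m0, performs an LDS load, and then sends a message observes 42 before the lowering and -1 after it, which is the clobbering scenario raised in the comment.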

In D23511#517122, @arsenm wrote:

We can keep the old intrinsic working while adding the new one.

If we don't know of any users now, we probably won't know more in the future. I don't mind updating mine.
Let me know your preference, I have an alternative patch that keeps the old name ready.

A builtin would be needed for emitting the write to m0 since read/write_register are not directly exposed. My concern about this is what happens if you have something like:

llvm.write_register(m0)
%foo = load i32, i32 addrspace(3)*
llvm.amdgcn.s.sendmsg()

The lowering of the LDS access will insert an initialization of m0 to -1, clobbering the old value. I'm not sure whether it would be better to switch the M0 initialization lowering to copy the pre-existing value and restore it afterwards. I'm also not sure if we should keep considering m0 as an allocatable register.

Shouldn't register allocation (picking the same physical m0) and instruction scheduling figure out the hazard and either schedule the instructions correctly or spill?

Anyway, it looks like proper handling of m0 would need a separate patch. I'd prefer to leave that for another time. This change just adds a halting copy of sendmsg, so both can be modified at the same time if necessary.

keep the old intrinsic around

Herald added subscribers: nhaehnle, wdng. Sep 14 2016, 8:21 PM

arsenm added inline comments. Dec 20 2016, 9:27 AM

include/llvm/IR/IntrinsicsAMDGPU.td
107–110	These should have comments explaining the arguments, at least mentioning that the implicit m0 argument is the last one.
test/CodeGen/AMDGPU/amdgcn.sendmsg.ll
23	A better name would be test_sendmsg or something.
31–35	I would split each of these sets into its own function

add comment
improve test

Herald edited edge metadata. Dec 23 2016, 11:36 AM

Herald added subscribers: tony-tye, yaxunl.

jvesely marked 3 inline comments as done. Dec 23 2016, 11:37 AM

LGTM

test/CodeGen/AMDGPU/amdgcn.sendmsg.ll
2–3	Can you change these to use GCN as the check prefix?

This revision is now accepted and ready to land. Jan 4 2017, 8:35 AM

Closed by commit rL290977: AMDGPU/SI: Implement sendmsghalt intrinsic (authored by jvesely). Jan 4 2017, 10:17 AM

This revision was automatically updated to reflect the committed changes.

Diff 71473

include/llvm/IR/IntrinsicsAMDGPU.td

Context not available.
 // Instruction Intrinsics
 //===----------------------------------------------------------------------===//

+def int_amdgcn_s_sendmsg : GCCBuiltin<"__builtin_amdgcn_s_sendmsg">,
+  Intrinsic <[], [llvm_i32_ty, llvm_i32_ty], []>;
+def int_amdgcn_s_sendmsghalt : GCCBuiltin<"__builtin_amdgcn_s_sendmsghalt">,
+  Intrinsic <[], [llvm_i32_ty, llvm_i32_ty], []>;
+
 def int_amdgcn_s_barrier : GCCBuiltin<"__builtin_amdgcn_s_barrier">,
   Intrinsic<[], [], [IntrConvergent]>;

Context not available.

lib/Target/AMDGPU/AMDGPUISelLowering.h

Context not available.
   /// Pointer to the start of the shader's constant data.
   CONST_DATA_PTR,
   SENDMSG,
+  SENDMSGHALT,
   INTERP_MOV,
   INTERP_P1,
   INTERP_P2,
Context not available.

lib/Target/AMDGPU/AMDGPUISelLowering.cpp

Context not available.
   NODE_NAME_CASE(KILL)
   case AMDGPUISD::FIRST_MEM_OPCODE_NUMBER: break;
   NODE_NAME_CASE(SENDMSG)
+  NODE_NAME_CASE(SENDMSGHALT)
   NODE_NAME_CASE(INTERP_MOV)
   NODE_NAME_CASE(INTERP_P1)
   NODE_NAME_CASE(INTERP_P2)
Context not available.

lib/Target/AMDGPU/AMDGPUInstrInfo.td

Context not available.
   SDTypeProfile<0, 1, [SDTCisInt<0>]>,
   [SDNPHasChain, SDNPInGlue]>;

+def AMDGPUsendmsghalt : SDNode<"AMDGPUISD::SENDMSGHALT",
+  SDTypeProfile<0, 1, [SDTCisInt<0>]>,
+  [SDNPHasChain, SDNPInGlue]>;
+
 def AMDGPUinterp_mov : SDNode<"AMDGPUISD::INTERP_MOV",
   SDTypeProfile<1, 3, [SDTCisFP<0>]>,
   [SDNPInGlue]>;
Context not available.

lib/Target/AMDGPU/SIISelLowering.cpp

Context not available.
 unsigned IntrinsicID = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();

 switch (IntrinsicID) {
-case AMDGPUIntrinsic::SI_sendmsg: {
+case AMDGPUIntrinsic::SI_sendmsg:
+case Intrinsic::amdgcn_s_sendmsg: {
   Chain = copyToM0(DAG, Chain, DL, Op.getOperand(3));
   SDValue Glue = Chain.getValue(1);
   return DAG.getNode(AMDGPUISD::SENDMSG, DL, MVT::Other, Chain,
                      Op.getOperand(2), Glue);
 }
+case Intrinsic::amdgcn_s_sendmsghalt: {
+  Chain = copyToM0(DAG, Chain, DL, Op.getOperand(3));
+  SDValue Glue = Chain.getValue(1);
+  return DAG.getNode(AMDGPUISD::SENDMSGHALT, DL, MVT::Other, Chain,
+                     Op.getOperand(2), Glue);
+}
 case AMDGPUIntrinsic::SI_tbuffer_store: {
   SDValue Ops[] = {
     Chain,
Context not available.

lib/Target/AMDGPU/SIInsertWaits.cpp

Context not available.
   return;

 // There must be "S_NOP 0" between an instruction writing M0 and S_SENDMSG.
-if (LastInstWritesM0 && I->getOpcode() == AMDGPU::S_SENDMSG) {
+if (LastInstWritesM0 && (I->getOpcode() == AMDGPU::S_SENDMSG || I->getOpcode() == AMDGPU::S_SENDMSGHALT)) {
   BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::S_NOP)).addImm(0);
   LastInstWritesM0 = false;
   return;
Context not available.
 // but we also want to wait for any other outstanding transfers before
 // signalling other hardware blocks
 if (I->getOpcode() == AMDGPU::S_BARRIER ||
-    I->getOpcode() == AMDGPU::S_SENDMSG)
+    I->getOpcode() == AMDGPU::S_SENDMSG ||
+    I->getOpcode() == AMDGPU::S_SENDMSGHALT)
   Required = LastIssued;
 else
   Required = handleOperands(*I);
Context not available.
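As a rough illustration of the first hunk above, the following self-contained sketch models the "S_NOP 0 between an M0 write and a sendmsg" rule on a flat list of opcodes. It is an assumption-laden toy: the `Opcode` enum, `isSendMsg`, and `insertM0Nops` are hypothetical names, the real pass works on `MachineInstr` iterators, and any subtarget gating is ignored here (the tests in this revision expect `s_nop 0` only under the VI prefix).

```cpp
#include <vector>

// Hypothetical opcode set for illustration; not the AMDGPU opcode enum.
enum class Opcode { S_MOV_M0, S_NOP, S_SENDMSG, S_SENDMSGHALT, OTHER };

// After this change both message instructions are treated the same way.
bool isSendMsg(Opcode op) {
  return op == Opcode::S_SENDMSG || op == Opcode::S_SENDMSGHALT;
}

// Inserts an S_NOP whenever a sendmsg-style instruction immediately
// follows an instruction that writes M0.
std::vector<Opcode> insertM0Nops(const std::vector<Opcode> &in) {
  std::vector<Opcode> out;
  bool lastInstWritesM0 = false;
  for (Opcode op : in) {
    if (lastInstWritesM0 && isSendMsg(op))
      out.push_back(Opcode::S_NOP);
    out.push_back(op);
    lastInstWritesM0 = (op == Opcode::S_MOV_M0);
  }
  return out;
}
```

Feeding it the sequence S_MOV_M0, S_SENDMSGHALT yields S_MOV_M0, S_NOP, S_SENDMSGHALT, while a sequence with an unrelated instruction between the two is left unchanged.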

lib/Target/AMDGPU/SOPInstructions.td

Context not available.
 def S_SENDMSG : SOPP <0x00000010, (ins SendMsgImm:$simm16), "s_sendmsg $simm16",
   [(AMDGPUsendmsg (i32 imm:$simm16))]
 >;
+
+def S_SENDMSGHALT : SOPP <0x00000011, (ins SendMsgImm:$simm16), "s_sendmsghalt $simm16",
+  [(AMDGPUsendmsghalt (i32 imm:$simm16))]
+>;
 } // End Uses = [EXEC, M0]

-def S_SENDMSGHALT : SOPP <0x00000011, (ins SendMsgImm:$simm16), "s_sendmsghalt $simm16">;
 def S_TRAP : SOPP <0x00000012, (ins i16imm:$simm16), "s_trap $simm16">;
 def S_ICACHE_INV : SOPP <0x00000013, (ins), "s_icache_inv"> {
   let simm16 = 0;
Context not available.

test/CodeGen/AMDGPU/amdgcn.sendmsg-m0.ll

This file was added.

; RUN: llc -march=amdgcn -mcpu=verde -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=GCN %s
; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=VI -check-prefix=GCN %s

; GCN-LABEL: {{^}}main:
; GCN: s_mov_b32 m0, s0
; VI-NEXT: s_nop 0
; GCN-NEXT: sendmsg(MSG_GS_DONE, GS_OP_NOP)
; GCN-NEXT: s_endpgm

define amdgpu_gs void @main(i32 inreg %a) #0 {
  call void @llvm.amdgcn.s.sendmsg(i32 3, i32 %a)
  ret void
}

; GCN-LABEL: {{^}}main_halt:
; GCN: s_mov_b32 m0, s0
; VI-NEXT: s_nop 0
; GCN-NEXT: s_sendmsghalt sendmsg(MSG_INTERRUPT)
; GCN-NEXT: s_endpgm

define void @main_halt(i32 inreg %a) #0 {
  call void @llvm.amdgcn.s.sendmsghalt(i32 1, i32 %a)
  ret void
}

; GCN-LABEL: {{^}}legacy:
; GCN: s_mov_b32 m0, s0
; VI-NEXT: s_nop 0
; GCN-NEXT: sendmsg(MSG_GS_DONE, GS_OP_NOP)
; GCN-NEXT: s_endpgm

define amdgpu_gs void @legacy(i32 inreg %a) #0 {
  call void @llvm.SI.sendmsg(i32 3, i32 %a)
  ret void
}

declare void @llvm.amdgcn.s.sendmsg(i32, i32) #0
declare void @llvm.amdgcn.s.sendmsghalt(i32, i32) #0
declare void @llvm.SI.sendmsg(i32, i32) #0

attributes #0 = { nounwind }

test/CodeGen/AMDGPU/amdgcn.sendmsg.ll

This file was added.

;RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck %s
;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck %s

; CHECK-LABEL: {{^}}main:
; CHECK: s_mov_b32 m0, 0
; CHECK-NOT: s_mov_b32 m0
; CHECK: s_sendmsg sendmsg(MSG_INTERRUPT)
; CHECK: s_sendmsg sendmsg(MSG_GS, GS_OP_EMIT, 0)
; CHECK: s_sendmsg sendmsg(MSG_GS, GS_OP_CUT, 1)
; CHECK: s_sendmsg sendmsg(MSG_GS, GS_OP_EMIT_CUT, 2)
; CHECK: s_sendmsg sendmsg(MSG_GS_DONE, GS_OP_NOP)
; CHECK: s_sendmsghalt sendmsg(MSG_INTERRUPT)
; CHECK: s_sendmsghalt sendmsg(MSG_GS, GS_OP_EMIT, 0)
; CHECK: s_sendmsghalt sendmsg(MSG_GS, GS_OP_CUT, 1)
; CHECK: s_sendmsghalt sendmsg(MSG_GS, GS_OP_EMIT_CUT, 2)
; CHECK: s_sendmsghalt sendmsg(MSG_GS_DONE, GS_OP_NOP)

; Legacy
; CHECK: s_sendmsg sendmsg(MSG_GS, GS_OP_EMIT, 0)
; CHECK: s_sendmsg sendmsg(MSG_GS, GS_OP_CUT, 1)
; CHECK: s_sendmsg sendmsg(MSG_GS, GS_OP_EMIT_CUT, 2)
; CHECK: s_sendmsg sendmsg(MSG_GS_DONE, GS_OP_NOP)
define void @main() {
main_body:
  call void @llvm.amdgcn.s.sendmsg(i32 1, i32 0);
  call void @llvm.amdgcn.s.sendmsg(i32 34, i32 0);
  call void @llvm.amdgcn.s.sendmsg(i32 274, i32 0);
  call void @llvm.amdgcn.s.sendmsg(i32 562, i32 0);
  call void @llvm.amdgcn.s.sendmsg(i32 3, i32 0);

  call void @llvm.amdgcn.s.sendmsghalt(i32 1, i32 0);
  call void @llvm.amdgcn.s.sendmsghalt(i32 34, i32 0);
  call void @llvm.amdgcn.s.sendmsghalt(i32 274, i32 0);
  call void @llvm.amdgcn.s.sendmsghalt(i32 562, i32 0);
  call void @llvm.amdgcn.s.sendmsghalt(i32 3, i32 0);

  call void @llvm.SI.sendmsg(i32 34, i32 0);
  call void @llvm.SI.sendmsg(i32 274, i32 0);
  call void @llvm.SI.sendmsg(i32 562, i32 0);
  call void @llvm.SI.sendmsg(i32 3, i32 0);
  ret void
}

; Function Attrs: nounwind
declare void @llvm.amdgcn.s.sendmsg(i32, i32) #0
declare void @llvm.amdgcn.s.sendmsghalt(i32, i32) #0
declare void @llvm.SI.sendmsg(i32, i32) #0

attributes #0 = { nounwind }
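
For readers wondering how the immediates 1, 34, 274, 562 and 3 in this test map to the expected `sendmsg(...)` operands, here is a small decoding sketch. The bit layout (message id in bits 3:0, GS operation in bits 5:4, stream id in bits 9:8) is inferred from the CHECK lines above rather than taken from the backend sources, and `SendMsgFields`, `decodeSendMsg`, and `formatSendMsg` are hypothetical helpers, not the LLVM instruction printer.

```cpp
#include <cstdint>
#include <string>

// Field layout assumed from the test expectations in this revision:
// bits [3:0] message id, bits [5:4] GS operation, bits [9:8] stream id.
struct SendMsgFields {
  uint32_t msgId;
  uint32_t gsOp;
  uint32_t streamId;
};

SendMsgFields decodeSendMsg(uint32_t imm) {
  return {imm & 0xFu, (imm >> 4) & 0x3u, (imm >> 8) & 0x3u};
}

// Renders the operand the way the CHECK lines spell it. Only the message
// ids exercised by the test (1, 2, 3) are given symbolic names.
std::string formatSendMsg(uint32_t imm) {
  static const char *const GsOps[] = {"GS_OP_NOP", "GS_OP_CUT", "GS_OP_EMIT",
                                      "GS_OP_EMIT_CUT"};
  const SendMsgFields f = decodeSendMsg(imm);
  switch (f.msgId) {
  case 1:
    return "sendmsg(MSG_INTERRUPT)";
  case 2:
    return "sendmsg(MSG_GS, " + std::string(GsOps[f.gsOp]) + ", " +
           std::to_string(f.streamId) + ")";
  case 3:
    // The test only covers the NOP form, which carries no stream id.
    if (f.gsOp == 0)
      return "sendmsg(MSG_GS_DONE, GS_OP_NOP)";
    return "sendmsg(MSG_GS_DONE, " + std::string(GsOps[f.gsOp]) + ", " +
           std::to_string(f.streamId) + ")";
  default:
    // Unknown ids fall back to the raw immediate in this sketch.
    return "sendmsg(" + std::to_string(imm) + ")";
  }
}
```

For example, 274 is 0x112, which splits into message id 2, operation 1 and stream 1, matching the `sendmsg(MSG_GS, GS_OP_CUT, 1)` line in the test.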

test/CodeGen/AMDGPU/llvm.SI.sendmsg-m0.ll

This file was deleted.

; RUN: llc -march=amdgcn -mcpu=verde -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=GCN %s
; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=VI -check-prefix=GCN %s

; GCN-LABEL: {{^}}main:
; GCN: s_mov_b32 m0, s0
; VI-NEXT: s_nop 0
; GCN-NEXT: sendmsg(MSG_GS_DONE, GS_OP_NOP)
; GCN-NEXT: s_endpgm

define amdgpu_gs void @main(i32 inreg %a) #0 {
  call void @llvm.SI.sendmsg(i32 3, i32 %a)
  ret void
}

declare void @llvm.SI.sendmsg(i32, i32) #0

attributes #0 = { nounwind }

test/CodeGen/AMDGPU/llvm.SI.sendmsg.ll

This file was deleted.

;RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck %s
;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck %s

; CHECK-LABEL: {{^}}main:
; CHECK: s_mov_b32 m0, 0
; CHECK-NOT: s_mov_b32 m0
; CHECK: s_sendmsg sendmsg(MSG_GS, GS_OP_EMIT, 0)
; CHECK: s_sendmsg sendmsg(MSG_GS, GS_OP_CUT, 1)
; CHECK: s_sendmsg sendmsg(MSG_GS, GS_OP_EMIT_CUT, 2)
; CHECK: s_sendmsg sendmsg(MSG_GS_DONE, GS_OP_NOP)

define void @main() {
main_body:
  call void @llvm.SI.sendmsg(i32 34, i32 0);
  call void @llvm.SI.sendmsg(i32 274, i32 0);
  call void @llvm.SI.sendmsg(i32 562, i32 0);
  call void @llvm.SI.sendmsg(i32 3, i32 0);
  ret void
}

; Function Attrs: nounwind
declare void @llvm.SI.sendmsg(i32, i32) #0

attributes #0 = { nounwind }

This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Implement sendmsghalt intrinsic
ClosedPublic
