Details

Reviewers

tstellarAMD
arsenm

Commits

rGd48445d51392: AMDGPU/SI: Implement sendmsghalt intrinsic
rL290977: AMDGPU/SI: Implement sendmsghalt intrinsic

Diff Detail

Repository: rL LLVM

Event Timeline

jvesely updated this revision to Diff 68039. Aug 15 2016, 8:19 AM

jvesely retitled this revision to AMDGPU/SI: Implement sendmsghalt intrinsic.

jvesely updated this object.

jvesely added a reviewer: tstellarAMD.

jvesely set the repository for this revision to rL LLVM.

Herald added subscribers: kzhuravl, arsenm. Aug 15 2016, 8:19 AM

New intrinsics should go in include/llvm/IR/IntrinsicsAMDGPU.td, and have an amdgcn prefix. I also think it probably should not have a separate parameter for m0, and instead rely on llvm.write_register setting m0

In D23511#515604, @arsenm wrote:

New intrinsics should go in include/llvm/IR/IntrinsicsAMDGPU.td, and have an amdgcn prefix. I also think it probably should not have a separate parameter for m0, and instead rely on llvm.write_register setting m0

I'd like to keep it as close to sendmsg as possible. Do you know if there are compatibility issues if I rename SI_sendmsg to amdgcn_sendmsg?

In D23511#515722, @jvesely wrote:

I'd like to keep it as close to sendmsg as possible. Do you know if there are compatibility issues if I rename SI_sendmsg to amdgcn_sendmsg?

SI.sendmsg also needs to be changed and replaced (as well as the rest of the intrinsics in the backend). The goal is to eventually fix any intrinsic design issues and fix the names when moving them to the public intrinsics

rename and expose both sendmsg intrinsics

In D23511#515739, @arsenm wrote:

SI.sendmsg also needs to be changed and replaced (as well as the rest of the intrinsics in the backend). The goal is to eventually fix any intrinsic design issues and fix the names when moving them to the public intrinsics

I understand that, although having a list of approaches considered deprecated would reduce some wasted effort.
My question was whether there are users of the old sendmsg intrinsic name that would break.

I'm not sure how to unbundle writing m0 if we want to expose sendmsg as __builtin.function. Is there a generic way to write to m0 from a high-level language?

In D23511#516775, @jvesely wrote:

I understand that, although having a list of approaches considered deprecated would reduce some wasted effort.
My question was whether there are users of the old sendmsg intrinsic name that would break.

I'm not sure how to unbundle writing m0 if we want to expose sendmsg as __builtin.function. Is there a generic way to write to m0 from a high-level language?

We can keep the old intrinsic working while adding the new one. A builtin would be needed for emitting the write to m0 since read/write_register are not directly exposed. My concern about this is what happens if you have something like:

llvm.write_register(m0)
%foo = load i32, i32 addrspace(3)*
llvm.amdgcn.s.sendmsg()

The lowering of the LDS access will insert an initialization of m0 to -1, clobbering the old value. I'm not sure whether it would be better to switch the M0 initialization lowering to copy the pre-existing value and restore it afterwards. I'm also not sure if we should keep considering m0 an allocatable register.
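
To make the concern concrete, the sequence above can be modelled with a toy C++ sketch. Nothing here is an LLVM API: `Wave`, `ldsLoad`, `ldsLoadPreservingM0` and `sendmsgPayload` are hypothetical names standing in for the register and the lowering. The sketch assumes, as described in the comment, that the LDS lowering sets m0 to -1, and contrasts that with the save-and-restore alternative.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: a single special register shared by LDS access and sendmsg.
struct Wave {
    std::uint32_t m0 = 0;
};

// Models the lowering described above: an LDS access first sets m0 to -1.
inline void ldsLoad(Wave &w) { w.m0 = 0xFFFFFFFFu; }

// Hypothetical alternative: save the caller's m0 and restore it afterwards.
inline void ldsLoadPreservingM0(Wave &w) {
    const std::uint32_t saved = w.m0;
    ldsLoad(w);
    w.m0 = saved;
}

// Models s_sendmsg reading m0 implicitly; returns the value it would see.
inline std::uint32_t sendmsgPayload(const Wave &w) { return w.m0; }

// write m0, LDS load, sendmsg: the payload is lost to the -1 initialization.
inline std::uint32_t runClobbering(std::uint32_t value) {
    Wave w;
    w.m0 = value;
    ldsLoad(w);
    return sendmsgPayload(w);
}

// Same sequence with the save-and-restore lowering: the payload survives.
inline std::uint32_t runPreserving(std::uint32_t value) {
    Wave w;
    w.m0 = value;
    ldsLoadPreservingM0(w);
    return sendmsgPayload(w);
}

// Small self-check of the two behaviours.
inline void selfCheck() {
    assert(runClobbering(0x42u) == 0xFFFFFFFFu);
    assert(runPreserving(0x42u) == 0x42u);
}
```

Passing the m0 value as an explicit intrinsic operand, as the patch under review does, sidesteps this ordering problem because the copy to m0 is glued to the sendmsg node.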

In D23511#517122, @arsenm wrote:

We can keep the old intrinsic working while adding the new one.

If we don't know of any users now, we probably won't know of more in the future. I don't mind updating mine.
Let me know your preference; I have an alternative patch ready that keeps the old name.

A builtin would be needed for emitting the write to m0 since read/write_register are not directly exposed. My concern about this is what happens if you have something like:

llvm.write_register(m0)
%foo = load i32, i32 addrspace(3)*
llvm.amdgcn.s.sendmsg()

The lowering of the LDS access will insert an initialization of m0 to -1, clobbering the old value. I'm not sure whether it would be better to switch the M0 initialization lowering to copy the pre-existing value and restore it afterwards. I'm also not sure if we should keep considering m0 an allocatable register.

Shouldn't register allocation (picking the same phys m0) and instruction scheduling figure out the hazard and either schedule the instructions correctly or spill?

Anyway, it looks like proper handling of m0 would need a separate patch. I'd prefer to leave that for another time. This change just adds a halting copy of sendmsg, so both can be modified at the same time if necessary.

keep the old intrinsic around

Herald added subscribers: nhaehnle, wdng. Sep 14 2016, 8:21 PM

arsenm added inline comments. Dec 20 2016, 9:27 AM

include/llvm/IR/IntrinsicsAMDGPU.td
107–110 ↗	(On Diff #71473)	These should have comments explaining the arguments, at least mentioning that the implicit m0 argument is the last one.
test/CodeGen/AMDGPU/amdgcn.sendmsg.ll
23 ↗	(On Diff #71473)	better name would be test_sendmsg or something
31–35 ↗	(On Diff #71473)	I would split each of these sets into its own function

add comment
improve test

Herald edited edge metadata. Dec 23 2016, 11:36 AM

Herald added subscribers: tony-tye, yaxunl.

jvesely marked 3 inline comments as done. Dec 23 2016, 11:37 AM

LGTM

test/CodeGen/AMDGPU/amdgcn.sendmsg.ll
1–2 ↗	(On Diff #82416)	Can you change these to use GCN as the check prefix?

This revision is now accepted and ready to land. Jan 4 2017, 8:35 AM

Closed by commit rL290977: AMDGPU/SI: Implement sendmsghalt intrinsic (authored by jvesely). Jan 4 2017, 10:17 AM

This revision was automatically updated to reflect the committed changes.

Diff 83075

llvm/trunk/include/llvm/IR/IntrinsicsAMDGPU.td

	Show First 20 Lines • Show All 98 Lines • ▼ Show 20 Lines
	def int_amdgcn_dispatch_id :			def int_amdgcn_dispatch_id :
	GCCBuiltin<"__builtin_amdgcn_dispatch_id">,			GCCBuiltin<"__builtin_amdgcn_dispatch_id">,
	Intrinsic<[llvm_i64_ty], [], [IntrNoMem]>;			Intrinsic<[llvm_i64_ty], [], [IntrNoMem]>;

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Instruction Intrinsics			// Instruction Intrinsics
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

				// The first parameter is s_sendmsg immediate (i16),
				// the second one is copied to m0
				def int_amdgcn_s_sendmsg : GCCBuiltin<"__builtin_amdgcn_s_sendmsg">,
				Intrinsic <[], [llvm_i32_ty, llvm_i32_ty], []>;
				def int_amdgcn_s_sendmsghalt : GCCBuiltin<"__builtin_amdgcn_s_sendmsghalt">,
				Intrinsic <[], [llvm_i32_ty, llvm_i32_ty], []>;

	def int_amdgcn_s_barrier : GCCBuiltin<"__builtin_amdgcn_s_barrier">,			def int_amdgcn_s_barrier : GCCBuiltin<"__builtin_amdgcn_s_barrier">,
	Intrinsic<[], [], [IntrConvergent]>;			Intrinsic<[], [], [IntrConvergent]>;

	def int_amdgcn_wave_barrier : GCCBuiltin<"__builtin_amdgcn_wave_barrier">,			def int_amdgcn_wave_barrier : GCCBuiltin<"__builtin_amdgcn_wave_barrier">,
	Intrinsic<[], [], [IntrConvergent]>;			Intrinsic<[], [], [IntrConvergent]>;

	def int_amdgcn_s_waitcnt : Intrinsic<[], [llvm_i32_ty], []>;			def int_amdgcn_s_waitcnt : Intrinsic<[], [llvm_i32_ty], []>;

	▲ Show 20 Lines • Show All 503 Lines • Show Last 20 Lines

llvm/trunk/lib/Target/AMDGPU/AMDGPUISelLowering.h

Show First 20 Lines • Show All 307 Lines • ▼ Show 20 Lines	enum NodeType : unsigned {
/// T0|v.x| | | |		/// T0|v.x| | | |
/// T1|v.y| | | |		/// T1|v.y| | | |
/// T2|v.z| | | |		/// T2|v.z| | | |
/// T3|v.w| | | |		/// T3|v.w| | | |
BUILD_VERTICAL_VECTOR,		BUILD_VERTICAL_VECTOR,
/// Pointer to the start of the shader's constant data.		/// Pointer to the start of the shader's constant data.
CONST_DATA_PTR,		CONST_DATA_PTR,
SENDMSG,		SENDMSG,
		SENDMSGHALT,
INTERP_MOV,		INTERP_MOV,
INTERP_P1,		INTERP_P1,
INTERP_P2,		INTERP_P2,
PC_ADD_REL_OFFSET,		PC_ADD_REL_OFFSET,
KILL,		KILL,
FIRST_MEM_OPCODE_NUMBER = ISD::FIRST_TARGET_MEMORY_OPCODE,		FIRST_MEM_OPCODE_NUMBER = ISD::FIRST_TARGET_MEMORY_OPCODE,
STORE_MSKOR,		STORE_MSKOR,
LOAD_CONSTANT,		LOAD_CONSTANT,
Show All 15 Lines

llvm/trunk/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

Show First 20 Lines • Show All 3,042 Lines • ▼ Show 20 Lines	const char* AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const {
NODE_NAME_CASE(CVT_F32_UBYTE2)		NODE_NAME_CASE(CVT_F32_UBYTE2)
NODE_NAME_CASE(CVT_F32_UBYTE3)		NODE_NAME_CASE(CVT_F32_UBYTE3)
NODE_NAME_CASE(BUILD_VERTICAL_VECTOR)		NODE_NAME_CASE(BUILD_VERTICAL_VECTOR)
NODE_NAME_CASE(CONST_DATA_PTR)		NODE_NAME_CASE(CONST_DATA_PTR)
NODE_NAME_CASE(PC_ADD_REL_OFFSET)		NODE_NAME_CASE(PC_ADD_REL_OFFSET)
NODE_NAME_CASE(KILL)		NODE_NAME_CASE(KILL)
case AMDGPUISD::FIRST_MEM_OPCODE_NUMBER: break;		case AMDGPUISD::FIRST_MEM_OPCODE_NUMBER: break;
NODE_NAME_CASE(SENDMSG)		NODE_NAME_CASE(SENDMSG)
		NODE_NAME_CASE(SENDMSGHALT)
NODE_NAME_CASE(INTERP_MOV)		NODE_NAME_CASE(INTERP_MOV)
NODE_NAME_CASE(INTERP_P1)		NODE_NAME_CASE(INTERP_P1)
NODE_NAME_CASE(INTERP_P2)		NODE_NAME_CASE(INTERP_P2)
NODE_NAME_CASE(STORE_MSKOR)		NODE_NAME_CASE(STORE_MSKOR)
NODE_NAME_CASE(LOAD_CONSTANT)		NODE_NAME_CASE(LOAD_CONSTANT)
NODE_NAME_CASE(TBUFFER_STORE_FORMAT)		NODE_NAME_CASE(TBUFFER_STORE_FORMAT)
NODE_NAME_CASE(ATOMIC_CMP_SWAP)		NODE_NAME_CASE(ATOMIC_CMP_SWAP)
NODE_NAME_CASE(ATOMIC_INC)		NODE_NAME_CASE(ATOMIC_INC)
▲ Show 20 Lines • Show All 118 Lines • Show Last 20 Lines

llvm/trunk/lib/Target/AMDGPU/AMDGPUInstrInfo.td

	Show First 20 Lines • Show All 260 Lines • ▼ Show 20 Lines
	>;			>;

	def AMDGPUfmed3 : SDNode<"AMDGPUISD::FMED3", SDTFPTernaryOp, []>;			def AMDGPUfmed3 : SDNode<"AMDGPUISD::FMED3", SDTFPTernaryOp, []>;

	def AMDGPUsendmsg : SDNode<"AMDGPUISD::SENDMSG",			def AMDGPUsendmsg : SDNode<"AMDGPUISD::SENDMSG",
	SDTypeProfile<0, 1, [SDTCisInt<0>]>,			SDTypeProfile<0, 1, [SDTCisInt<0>]>,
	[SDNPHasChain, SDNPInGlue]>;			[SDNPHasChain, SDNPInGlue]>;

				def AMDGPUsendmsghalt : SDNode<"AMDGPUISD::SENDMSGHALT",
				SDTypeProfile<0, 1, [SDTCisInt<0>]>,
				[SDNPHasChain, SDNPInGlue]>;

	def AMDGPUinterp_mov : SDNode<"AMDGPUISD::INTERP_MOV",			def AMDGPUinterp_mov : SDNode<"AMDGPUISD::INTERP_MOV",
	SDTypeProfile<1, 3, [SDTCisFP<0>]>,			SDTypeProfile<1, 3, [SDTCisFP<0>]>,
	[SDNPInGlue]>;			[SDNPInGlue]>;

	def AMDGPUinterp_p1 : SDNode<"AMDGPUISD::INTERP_P1",			def AMDGPUinterp_p1 : SDNode<"AMDGPUISD::INTERP_P1",
	SDTypeProfile<1, 3, [SDTCisFP<0>]>,			SDTypeProfile<1, 3, [SDTCisFP<0>]>,
	[SDNPInGlue, SDNPOutGlue]>;			[SDNPInGlue, SDNPOutGlue]>;

	▲ Show 20 Lines • Show All 54 Lines • Show Last 20 Lines

llvm/trunk/lib/Target/AMDGPU/SIISelLowering.cpp

	Show First 20 Lines • Show All 2,700 Lines • ▼ Show 20 Lines
	SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,			SDValue SITargetLowering::LowerINTRINSIC_VOID(SDValue Op,
	SelectionDAG &DAG) const {			SelectionDAG &DAG) const {
	MachineFunction &MF = DAG.getMachineFunction();			MachineFunction &MF = DAG.getMachineFunction();
	SDLoc DL(Op);			SDLoc DL(Op);
	SDValue Chain = Op.getOperand(0);			SDValue Chain = Op.getOperand(0);
	unsigned IntrinsicID = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();			unsigned IntrinsicID = cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();

	switch (IntrinsicID) {			switch (IntrinsicID) {
	case AMDGPUIntrinsic::SI_sendmsg: {			case AMDGPUIntrinsic::SI_sendmsg:
				case Intrinsic::amdgcn_s_sendmsg: {
	Chain = copyToM0(DAG, Chain, DL, Op.getOperand(3));			Chain = copyToM0(DAG, Chain, DL, Op.getOperand(3));
	SDValue Glue = Chain.getValue(1);			SDValue Glue = Chain.getValue(1);
	return DAG.getNode(AMDGPUISD::SENDMSG, DL, MVT::Other, Chain,			return DAG.getNode(AMDGPUISD::SENDMSG, DL, MVT::Other, Chain,
	Op.getOperand(2), Glue);			Op.getOperand(2), Glue);
	}			}
				case Intrinsic::amdgcn_s_sendmsghalt: {
				Chain = copyToM0(DAG, Chain, DL, Op.getOperand(3));
				SDValue Glue = Chain.getValue(1);
				return DAG.getNode(AMDGPUISD::SENDMSGHALT, DL, MVT::Other, Chain,
				Op.getOperand(2), Glue);
				}
	case AMDGPUIntrinsic::SI_tbuffer_store: {			case AMDGPUIntrinsic::SI_tbuffer_store: {
	SDValue Ops[] = {			SDValue Ops[] = {
	Chain,			Chain,
	Op.getOperand(2),			Op.getOperand(2),
	Op.getOperand(3),			Op.getOperand(3),
	Op.getOperand(4),			Op.getOperand(4),
	Op.getOperand(5),			Op.getOperand(5),
	Op.getOperand(6),			Op.getOperand(6),
	▲ Show 20 Lines • Show All 1,797 Lines • Show Last 20 Lines

llvm/trunk/lib/Target/AMDGPU/SIInsertWaits.cpp

Show First 20 Lines • Show All 498 Lines • ▼ Show 20 Lines
}		}

void SIInsertWaits::handleSendMsg(MachineBasicBlock &MBB,		void SIInsertWaits::handleSendMsg(MachineBasicBlock &MBB,
MachineBasicBlock::iterator I) {		MachineBasicBlock::iterator I) {
if (ST->getGeneration() < SISubtarget::VOLCANIC_ISLANDS)		if (ST->getGeneration() < SISubtarget::VOLCANIC_ISLANDS)
return;		return;

// There must be "S_NOP 0" between an instruction writing M0 and S_SENDMSG.		// There must be "S_NOP 0" between an instruction writing M0 and S_SENDMSG.
if (LastInstWritesM0 && I->getOpcode() == AMDGPU::S_SENDMSG) {		if (LastInstWritesM0 && (I->getOpcode() == AMDGPU::S_SENDMSG || I->getOpcode() == AMDGPU::S_SENDMSGHALT)) {
BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::S_NOP)).addImm(0);		BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::S_NOP)).addImm(0);
LastInstWritesM0 = false;		LastInstWritesM0 = false;
return;		return;
}		}

// Set whether this instruction sets M0		// Set whether this instruction sets M0
LastInstWritesM0 = false;		LastInstWritesM0 = false;

▲ Show 20 Lines • Show All 98 Lines • ▼ Show 20 Lines	for (MachineBasicBlock::iterator I = MBB.begin(), E = MBB.end();

// Wait for everything before a barrier.		// Wait for everything before a barrier.
//		//
// S_SENDMSG implicitly waits for all outstanding LGKM transfers to finish,		// S_SENDMSG implicitly waits for all outstanding LGKM transfers to finish,
// but we also want to wait for any other outstanding transfers before		// but we also want to wait for any other outstanding transfers before
// signalling other hardware blocks		// signalling other hardware blocks
if ((I->getOpcode() == AMDGPU::S_BARRIER &&		if ((I->getOpcode() == AMDGPU::S_BARRIER &&
ST->needWaitcntBeforeBarrier()) ||		ST->needWaitcntBeforeBarrier()) ||
I->getOpcode() == AMDGPU::S_SENDMSG)		I->getOpcode() == AMDGPU::S_SENDMSG ||
		I->getOpcode() == AMDGPU::S_SENDMSGHALT)
Required = LastIssued;		Required = LastIssued;
else		else
Required = handleOperands(*I);		Required = handleOperands(*I);

Counters Increment = getHwCounts(*I);		Counters Increment = getHwCounts(*I);

if (countersNonZero(Required) || countersNonZero(Increment))		if (countersNonZero(Required) || countersNonZero(Increment))
increaseCounters(Required, DelayedWaitOn);		increaseCounters(Required, DelayedWaitOn);
▲ Show 20 Lines • Show All 49 Lines • Show Last 20 Lines
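
The handleSendMsg change above can be summarised with a toy C++ sketch of the same rule: when an instruction that writes M0 is immediately followed by a sendmsg or sendmsghalt, a no-op is placed between them. This is a simplified model under stated assumptions, not the pass itself. The `Op` enum and `insertSendMsgNops` are hypothetical names, and the `needsNop` flag stands in for the subtarget generation check.

```cpp
#include <cassert>
#include <vector>

// Toy opcodes standing in for machine instructions; not LLVM's real enums.
enum class Op { WriteM0, SendMsg, SendMsgHalt, Nop, Other };

// Returns a copy of `in` with a Nop inserted between an M0 write and a
// directly following SendMsg or SendMsgHalt. `needsNop` models the
// subtarget check; when it is false the stream is returned unchanged.
inline std::vector<Op> insertSendMsgNops(const std::vector<Op> &in,
                                         bool needsNop) {
    std::vector<Op> out;
    bool lastWritesM0 = false;
    for (Op op : in) {
        const bool isSend = (op == Op::SendMsg || op == Op::SendMsgHalt);
        if (needsNop && lastWritesM0 && isSend)
            out.push_back(Op::Nop);
        out.push_back(op);
        // Only the immediately preceding instruction matters for the hazard.
        lastWritesM0 = (op == Op::WriteM0);
    }
    return out;
}

// Small self-check: both message opcodes get the padding, others do not.
inline void selfCheck() {
    assert(insertSendMsgNops({Op::WriteM0, Op::SendMsg}, true).size() == 3);
    assert(insertSendMsgNops({Op::WriteM0, Op::SendMsgHalt}, true).size() == 3);
    assert(insertSendMsgNops({Op::WriteM0, Op::Other, Op::SendMsg}, true).size() == 3);
}
```

Treating both opcodes through one `isSend` predicate mirrors the intent of the patch, where S_SENDMSGHALT is added next to every existing S_SENDMSG check.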

llvm/trunk/lib/Target/AMDGPU/SOPInstructions.td

	Show First 20 Lines • Show All 822 Lines • ▼ Show 20 Lines

	def S_SETPRIO : SOPP <0x0000000f, (ins i16imm:$simm16), "s_setprio $simm16">;			def S_SETPRIO : SOPP <0x0000000f, (ins i16imm:$simm16), "s_setprio $simm16">;

	let Uses = [EXEC, M0] in {			let Uses = [EXEC, M0] in {
	// FIXME: Should this be mayLoad+mayStore?			// FIXME: Should this be mayLoad+mayStore?
	def S_SENDMSG : SOPP <0x00000010, (ins SendMsgImm:$simm16), "s_sendmsg $simm16",			def S_SENDMSG : SOPP <0x00000010, (ins SendMsgImm:$simm16), "s_sendmsg $simm16",
	[(AMDGPUsendmsg (i32 imm:$simm16))]			[(AMDGPUsendmsg (i32 imm:$simm16))]
	>;			>;

				def S_SENDMSGHALT : SOPP <0x00000011, (ins SendMsgImm:$simm16), "s_sendmsghalt $simm16",
				[(AMDGPUsendmsghalt (i32 imm:$simm16))]
				>;
	} // End Uses = [EXEC, M0]			} // End Uses = [EXEC, M0]

	def S_SENDMSGHALT : SOPP <0x00000011, (ins SendMsgImm:$simm16), "s_sendmsghalt $simm16">;
	def S_TRAP : SOPP <0x00000012, (ins i16imm:$simm16), "s_trap $simm16">;			def S_TRAP : SOPP <0x00000012, (ins i16imm:$simm16), "s_trap $simm16">;
	def S_ICACHE_INV : SOPP <0x00000013, (ins), "s_icache_inv"> {			def S_ICACHE_INV : SOPP <0x00000013, (ins), "s_icache_inv"> {
	let simm16 = 0;			let simm16 = 0;
	}			}
	def S_INCPERFLEVEL : SOPP <0x00000014, (ins i32imm:$simm16), "s_incperflevel $simm16",			def S_INCPERFLEVEL : SOPP <0x00000014, (ins i32imm:$simm16), "s_incperflevel $simm16",
	[(int_amdgcn_s_incperflevel SIMM16bit:$simm16)]> {			[(int_amdgcn_s_incperflevel SIMM16bit:$simm16)]> {
	let hasSideEffects = 1;			let hasSideEffects = 1;
	let mayLoad = 1;			let mayLoad = 1;
	▲ Show 20 Lines • Show All 388 Lines • Show Last 20 Lines

llvm/trunk/test/CodeGen/AMDGPU/amdgcn.sendmsg-m0.ll

				; RUN: llc -march=amdgcn -mcpu=verde -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=GCN %s
				; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=VI -check-prefix=GCN %s

				; GCN-LABEL: {{^}}main:
				; GCN: s_mov_b32 m0, s0
				; VI-NEXT: s_nop 0
				; GCN-NEXT: sendmsg(MSG_GS_DONE, GS_OP_NOP)
				; GCN-NEXT: s_endpgm

				define amdgpu_gs void @main(i32 inreg %a) #0 {
				call void @llvm.amdgcn.s.sendmsg(i32 3, i32 %a)
				ret void
				}

				; GCN-LABEL: {{^}}main_halt:
				; GCN: s_mov_b32 m0, s0
				; VI-NEXT: s_nop 0
				; GCN-NEXT: s_sendmsghalt sendmsg(MSG_INTERRUPT)
				; GCN-NEXT: s_endpgm

				define void @main_halt(i32 inreg %a) #0 {
				call void @llvm.amdgcn.s.sendmsghalt(i32 1, i32 %a)
				ret void
				}

				; GCN-LABEL: {{^}}legacy:
				; GCN: s_mov_b32 m0, s0
				; VI-NEXT: s_nop 0
				; GCN-NEXT: sendmsg(MSG_GS_DONE, GS_OP_NOP)
				; GCN-NEXT: s_endpgm

				define amdgpu_gs void @legacy(i32 inreg %a) #0 {
				call void @llvm.SI.sendmsg(i32 3, i32 %a)
				ret void
				}

				declare void @llvm.amdgcn.s.sendmsg(i32, i32) #0
				declare void @llvm.amdgcn.s.sendmsghalt(i32, i32) #0
				declare void @llvm.SI.sendmsg(i32, i32) #0

				attributes #0 = { nounwind }

llvm/trunk/test/CodeGen/AMDGPU/amdgcn.sendmsg.ll

				;RUN: llc -march=amdgcn -mcpu=verde -verify-machineinstrs < %s | FileCheck %s
				;RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck %s

				; CHECK-LABEL: {{^}}test_interrupt:
				; CHECK: s_mov_b32 m0, 0
				; CHECK-NOT: s_mov_b32 m0
				; CHECK: s_sendmsg sendmsg(MSG_INTERRUPT)
				define void @test_interrupt() {
				body:
				call void @llvm.amdgcn.s.sendmsg(i32 1, i32 0);
				ret void
				}

				; CHECK-LABEL: {{^}}test_gs_emit:
				; CHECK: s_mov_b32 m0, 0
				; CHECK-NOT: s_mov_b32 m0
				; CHECK: s_sendmsg sendmsg(MSG_GS, GS_OP_EMIT, 0)
				define void @test_gs_emit() {
				body:
				call void @llvm.amdgcn.s.sendmsg(i32 34, i32 0);
				ret void
				}

				; CHECK-LABEL: {{^}}test_gs_cut:
				; CHECK: s_mov_b32 m0, 0
				; CHECK-NOT: s_mov_b32 m0
				; CHECK: s_sendmsg sendmsg(MSG_GS, GS_OP_CUT, 1)
				define void @test_gs_cut() {
				body:
				call void @llvm.amdgcn.s.sendmsg(i32 274, i32 0);
				ret void
				}

				; CHECK-LABEL: {{^}}test_gs_emit_cut:
				; CHECK: s_mov_b32 m0, 0
				; CHECK-NOT: s_mov_b32 m0
				; CHECK: s_sendmsg sendmsg(MSG_GS, GS_OP_EMIT_CUT, 2)
				define void @test_gs_emit_cut() {
				body:
				call void @llvm.amdgcn.s.sendmsg(i32 562, i32 0)
				ret void
				}

				; CHECK-LABEL: {{^}}test_gs_done:
				; CHECK: s_mov_b32 m0, 0
				; CHECK-NOT: s_mov_b32 m0
				; CHECK: s_sendmsg sendmsg(MSG_GS_DONE, GS_OP_NOP)
				define void @test_gs_done() {
				body:
				call void @llvm.amdgcn.s.sendmsg(i32 3, i32 0)
				ret void
				}


				; CHECK-LABEL: {{^}}test_interrupt_halt:
				; CHECK: s_mov_b32 m0, 0
				; CHECK-NOT: s_mov_b32 m0
				; CHECK: s_sendmsghalt sendmsg(MSG_INTERRUPT)
				define void @test_interrupt_halt() {
				body:
				call void @llvm.amdgcn.s.sendmsghalt(i32 1, i32 0)
				ret void
				}

				; CHECK-LABEL: {{^}}test_gs_emit_halt:
				; CHECK: s_mov_b32 m0, 0
				; CHECK-NOT: s_mov_b32 m0
				; CHECK: s_sendmsghalt sendmsg(MSG_GS, GS_OP_EMIT, 0)
				define void @test_gs_emit_halt() {
				body:
				call void @llvm.amdgcn.s.sendmsghalt(i32 34, i32 0)
				ret void
				}

				; CHECK-LABEL: {{^}}test_gs_cut_halt:
				; CHECK: s_mov_b32 m0, 0
				; CHECK-NOT: s_mov_b32 m0
				; CHECK: s_sendmsghalt sendmsg(MSG_GS, GS_OP_CUT, 1)
				define void @test_gs_cut_halt() {
				body:
				call void @llvm.amdgcn.s.sendmsghalt(i32 274, i32 0)
				ret void
				}

				; CHECK-LABEL: {{^}}test_gs_emit_cut_halt:
				; CHECK: s_mov_b32 m0, 0
				; CHECK-NOT: s_mov_b32 m0
				; CHECK: s_sendmsghalt sendmsg(MSG_GS, GS_OP_EMIT_CUT, 2)
				define void @test_gs_emit_cut_halt() {
				body:
				call void @llvm.amdgcn.s.sendmsghalt(i32 562, i32 0)
				ret void
				}

				; CHECK-LABEL: {{^}}test_gs_done_halt:
				; CHECK: s_mov_b32 m0, 0
				; CHECK-NOT: s_mov_b32 m0
				; CHECK: s_sendmsghalt sendmsg(MSG_GS_DONE, GS_OP_NOP)
				define void @test_gs_done_halt() {
				body:
				call void @llvm.amdgcn.s.sendmsghalt(i32 3, i32 0)
				ret void
				}

				; Legacy
				; CHECK-LABEL: {{^}}test_legacy_interrupt:
				; CHECK: s_mov_b32 m0, 0
				; CHECK-NOT: s_mov_b32 m0
				; CHECK: s_sendmsg sendmsg(MSG_INTERRUPT)
				define void @test_legacy_interrupt() {
				body:
				call void @llvm.SI.sendmsg(i32 1, i32 0)
				ret void
				}

				; CHECK-LABEL: {{^}}test_legacy_gs_emit:
				; CHECK: s_mov_b32 m0, 0
				; CHECK-NOT: s_mov_b32 m0
				; CHECK: s_sendmsg sendmsg(MSG_GS, GS_OP_EMIT, 0)
				define void @test_legacy_gs_emit() {
				body:
				call void @llvm.SI.sendmsg(i32 34, i32 0)
				ret void
				}

				; CHECK-LABEL: {{^}}test_legacy_gs_cut:
				; CHECK: s_mov_b32 m0, 0
				; CHECK-NOT: s_mov_b32 m0
				; CHECK: s_sendmsg sendmsg(MSG_GS, GS_OP_CUT, 1)
				define void @test_legacy_gs_cut() {
				body:
				call void @llvm.SI.sendmsg(i32 274, i32 0)
				ret void
				}

				; CHECK-LABEL: {{^}}test_legacy_gs_emit_cut:
				; CHECK: s_mov_b32 m0, 0
				; CHECK-NOT: s_mov_b32 m0
				; CHECK: s_sendmsg sendmsg(MSG_GS, GS_OP_EMIT_CUT, 2)
				define void @test_legacy_gs_emit_cut() {
				body:
				call void @llvm.SI.sendmsg(i32 562, i32 0)
				ret void
				}

				; CHECK-LABEL: {{^}}test_legacy_gs_done:
				; CHECK: s_mov_b32 m0, 0
				; CHECK-NOT: s_mov_b32 m0
				; CHECK: s_sendmsg sendmsg(MSG_GS_DONE, GS_OP_NOP)
				define void @test_legacy_gs_done() {
				body:
				call void @llvm.SI.sendmsg(i32 3, i32 0)
				ret void
				}

				; Function Attrs: nounwind
				declare void @llvm.amdgcn.s.sendmsg(i32, i32) #0
				declare void @llvm.amdgcn.s.sendmsghalt(i32, i32) #0
				declare void @llvm.SI.sendmsg(i32, i32) #0

				attributes #0 = { nounwind }
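
The immediates used in the test above (1, 34, 274, 562 and 3) pack a message id, a GS operation and a stream id into one value. The following C++ sketch shows one packing that reproduces them. The field positions are inferred from the CHECK lines of this test rather than taken from a header, the enumerator names copy the assembler's symbolic names, and `encodeSendMsg` is a hypothetical helper that performs no range checking.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout, inferred from the test expectations:
// message id in the low bits, GS operation at bit 4, stream id at bit 8.
enum MsgId : std::uint32_t { MSG_INTERRUPT = 1, MSG_GS = 2, MSG_GS_DONE = 3 };
enum GsOp : std::uint32_t {
    GS_OP_NOP = 0,
    GS_OP_CUT = 1,
    GS_OP_EMIT = 2,
    GS_OP_EMIT_CUT = 3
};

// Hypothetical helper: builds the first argument of the sendmsg intrinsics.
constexpr std::uint32_t encodeSendMsg(MsgId id, GsOp op = GS_OP_NOP,
                                      std::uint32_t stream = 0) {
    return static_cast<std::uint32_t>(id) |
           (static_cast<std::uint32_t>(op) << 4) | (stream << 8);
}

// Reproduces the immediates that appear in the test functions.
inline void checkTestImmediates() {
    assert(encodeSendMsg(MSG_INTERRUPT) == 1u);
    assert(encodeSendMsg(MSG_GS, GS_OP_EMIT, 0) == 34u);
    assert(encodeSendMsg(MSG_GS, GS_OP_CUT, 1) == 274u);
    assert(encodeSendMsg(MSG_GS, GS_OP_EMIT_CUT, 2) == 562u);
    assert(encodeSendMsg(MSG_GS_DONE, GS_OP_NOP) == 3u);
}
```

The second intrinsic argument is not part of this immediate; per the comment in IntrinsicsAMDGPU.td it is copied to m0.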

llvm/trunk/test/CodeGen/AMDGPU/llvm.SI.sendmsg-m0.ll

	; RUN: llc -march=amdgcn -mcpu=verde -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=GCN %s
	; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=VI -check-prefix=GCN %s

	; GCN-LABEL: {{^}}main:
	; GCN: s_mov_b32 m0, s0
	; VI-NEXT: s_nop 0
	; GCN-NEXT: sendmsg(MSG_GS_DONE, GS_OP_NOP)
	; GCN-NEXT: s_endpgm

	define amdgpu_gs void @main(i32 inreg %a) #0 {
	call void @llvm.SI.sendmsg(i32 3, i32 %a)
	ret void
	}

	declare void @llvm.SI.sendmsg(i32, i32) #0

	attributes #0 = { nounwind }

llvm/trunk/test/CodeGen/AMDGPU/llvm.SI.sendmsg.ll

	;RUN: llc < %s -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck %s
	;RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck %s

	; CHECK-LABEL: {{^}}main:
	; CHECK: s_mov_b32 m0, 0
	; CHECK-NOT: s_mov_b32 m0
	; CHECK: s_sendmsg sendmsg(MSG_GS, GS_OP_EMIT, 0)
	; CHECK: s_sendmsg sendmsg(MSG_GS, GS_OP_CUT, 1)
	; CHECK: s_sendmsg sendmsg(MSG_GS, GS_OP_EMIT_CUT, 2)
	; CHECK: s_sendmsg sendmsg(MSG_GS_DONE, GS_OP_NOP)

	define void @main() {
	main_body:
	call void @llvm.SI.sendmsg(i32 34, i32 0);
	call void @llvm.SI.sendmsg(i32 274, i32 0);
	call void @llvm.SI.sendmsg(i32 562, i32 0);
	call void @llvm.SI.sendmsg(i32 3, i32 0);
	ret void
	}

	; Function Attrs: nounwind
	declare void @llvm.SI.sendmsg(i32, i32) #0

	attributes #0 = { nounwind }

This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Implement sendmsghalt intrinsic
Closed, Public