This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Extend f32 support for llvm.amdgcn.update.dpp intrinsic
Closed · Public

Authored by pravinjagtap on Jul 31 2023, 12:04 AM.


Details

Reviewers

arsenm
foad
b-sumner
yassingh
cdevadas

Group Reviewers

Restricted Project

Commits

rGaf5fd142d352: [AMDGPU] Extend f32 support for llvm.amdgcn.update.dpp intrinsic

Summary

This avoids the bitcast noise that would otherwise be
needed to support floating-point operations in the
atomic optimizer for DPP (D156301).
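
To make the "bit-casting noise" concrete: with an i32-only intrinsic, a float caller has to reinterpret its operands as integers and then reinterpret the result back as a float. The Python sketch below only models that round trip; it is not LLVM code. The helper names (`f32_to_bits`, `bits_to_f32`, `move32`, `via_i32_path`) are hypothetical, and `move32` is an assumed stand-in for a move that copies 32 bits without interpreting them.

```python
import struct


def f32_to_bits(x: float) -> int:
    """Reinterpret a float as its IEEE-754 binary32 bit pattern (models a bitcast to i32)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a float (models a bitcast back to float)."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def move32(bits: int) -> int:
    """Hypothetical stand-in for a type-agnostic 32-bit move: copies bits unchanged."""
    return bits & 0xFFFFFFFF


def via_i32_path(x: float) -> float:
    """The i32-only route: bitcast in, move, bitcast out."""
    return bits_to_f32(move32(f32_to_bits(x)))


print(hex(f32_to_bits(1.5)))  # 0x3fc00000
print(via_i32_path(1.5))      # 1.5
```

Since the move never inspects the payload, the two reinterpretation steps add nothing for ordinary values, which is the overhead an f32 overload lets callers skip.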

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

pravinjagtap created this revision. · Jul 31 2023, 12:04 AM

Herald added a project: Restricted Project. · Jul 31 2023, 12:04 AM

Herald added subscribers: foad, kerbowa, hiraditya and 5 others.

pravinjagtap requested review of this revision. · Jul 31 2023, 12:04 AM

Herald added a project: Restricted Project. · Jul 31 2023, 12:04 AM

Herald added subscribers: llvm-commits, wdng.

pravinjagtap edited the summary of this revision. · Jul 31 2023, 12:07 AM

pravinjagtap added reviewers: foad, b-sumner, yassingh.

Herald added a subscriber: StephenFan. · Jul 31 2023, 12:07 AM

Harbormaster completed remote builds in B249115: Diff 545519. · Jul 31 2023, 1:09 AM

Did you run the clang tests? I expect this would break the builtin test

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.update.dpp.ll
114	Add tests with the different immediate operands exercised

This revision now requires changes to proceed. · Jul 31 2023, 1:45 PM

arsenm added inline comments. · Jul 31 2023, 1:46 PM

llvm/lib/Target/AMDGPU/VOP1Instructions.td
1215–1221	Can factor this into a pattern class and just instantiate twice

pravinjagtap mentioned this in D156301: [AMDGPU] Support FAdd/FSub global atomics in AMDGPUAtomicOptimizer. · Aug 2 2023, 12:26 AM

In D156647#4548252, @arsenm wrote:

Did you run the clang tests? I expect this would break the builtin test

Yes, Clang tests are clean with this change.

Addressed review comments and added a few more test points.

pravinjagtap added a reviewer: cdevadas. · Aug 2 2023, 11:06 PM

Harbormaster completed remote builds in B249955: Diff 546704. · Aug 3 2023, 12:36 AM

arsenm accepted this revision. · Aug 15 2023, 4:46 PM

arsenm added inline comments.

llvm/lib/Target/AMDGPU/VOP1Instructions.td
1215–1216	In a follow on, can/should handle all the legal types (v2i16 and v2f16 are easy, i16/f16 are potentially a little more work)
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.update.dpp.ll
219	Nit: drop the call site attributes and use true/false instead of i1 0/1

This revision is now accepted and ready to land. · Aug 15 2023, 4:46 PM

As a follow-on, AMDGPUInstCombineIntrinsic could try to fold bitcasts in.

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
2180	this shouldn't break any existing clients, a name mangling param was needed anyway

pravinjagtap added inline comments. · Aug 15 2023, 8:58 PM

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
2180	Are you referring to clients like LLPC? Is there any way to make sure that it's not breaking any of the clients?

If there are no objections, I will go ahead and push this patch.

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
2180	I ran the LLPC pipeline; it's clean. @foad

pravinjagtap added a reviewer: Restricted Project. · Aug 16 2023, 10:53 PM

arsenm accepted this revision. · Aug 17 2023, 6:50 AM

arsenm added inline comments.

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.update.dpp.ll
219	test nit not done

Addressed review comments

This revision was landed with ongoing or failed builds. · Aug 17 2023, 7:45 AM

Closed by commit rGaf5fd142d352: [AMDGPU] Extend f32 support for llvm.amdgcn.update.dpp intrinsic (authored by pravinjagtap).

This revision was automatically updated to reflect the committed changes.

pravinjagtap added a commit: rGaf5fd142d352: [AMDGPU] Extend f32 support for llvm.amdgcn.update.dpp intrinsic.

Harbormaster completed remote builds in B253216: Diff 551130. · Aug 17 2023, 8:39 AM

pravinjagtap added a child revision: D156301: [AMDGPU] Support FAdd/FSub global atomics in AMDGPUAtomicOptimizer. · Aug 22 2023, 9:07 PM

Revision Contents

Path	Size
llvm/include/llvm/IR/IntrinsicsAMDGPU.td	2 lines
llvm/lib/Target/AMDGPU/VOP1Instructions.td	7 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.update.dpp.ll	129 lines

Diff 551133

llvm/include/llvm/IR/IntrinsicsAMDGPU.td

Intrinsic<[llvm_anyint_ty],
ImmArg<ArgIndex<1>>, ImmArg<ArgIndex<2>>,		ImmArg<ArgIndex<1>>, ImmArg<ArgIndex<2>>,
ImmArg<ArgIndex<3>>, ImmArg<ArgIndex<4>>, IntrNoCallback, IntrNoFree]>;		ImmArg<ArgIndex<3>>, ImmArg<ArgIndex<4>>, IntrNoCallback, IntrNoFree]>;

// llvm.amdgcn.update.dpp.i32 <old> <src> <dpp_ctrl> <row_mask> <bank_mask> <bound_ctrl>		// llvm.amdgcn.update.dpp.i32 <old> <src> <dpp_ctrl> <row_mask> <bank_mask> <bound_ctrl>
// Should be equivalent to:		// Should be equivalent to:
// v_mov_b32 <dest> <old>		// v_mov_b32 <dest> <old>
// v_mov_b32 <dest> <src> <dpp_ctrl> <row_mask> <bank_mask> <bound_ctrl>		// v_mov_b32 <dest> <src> <dpp_ctrl> <row_mask> <bank_mask> <bound_ctrl>
def int_amdgcn_update_dpp :		def int_amdgcn_update_dpp :
Intrinsic<[llvm_anyint_ty],		Intrinsic<[llvm_any_ty],
[LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty,		[LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty,
llvm_i32_ty, llvm_i32_ty, llvm_i1_ty],		llvm_i32_ty, llvm_i32_ty, llvm_i1_ty],
[IntrNoMem, IntrConvergent, IntrWillReturn,		[IntrNoMem, IntrConvergent, IntrWillReturn,
ImmArg<ArgIndex<2>>, ImmArg<ArgIndex<3>>,		ImmArg<ArgIndex<2>>, ImmArg<ArgIndex<3>>,
ImmArg<ArgIndex<4>>, ImmArg<ArgIndex<5>>, IntrNoCallback, IntrNoFree]>;		ImmArg<ArgIndex<4>>, ImmArg<ArgIndex<5>>, IntrNoCallback, IntrNoFree]>;

def int_amdgcn_s_dcache_wb :		def int_amdgcn_s_dcache_wb :
ClangBuiltin<"__builtin_amdgcn_s_dcache_wb">,		ClangBuiltin<"__builtin_amdgcn_s_dcache_wb">,

llvm/lib/Target/AMDGPU/VOP1Instructions.td

	def : GCNPat <			def : GCNPat <
	(i32 (int_amdgcn_mov_dpp i32:$src, timm:$dpp_ctrl, timm:$row_mask,			(i32 (int_amdgcn_mov_dpp i32:$src, timm:$dpp_ctrl, timm:$row_mask,
	timm:$bank_mask, timm:$bound_ctrl)),			timm:$bank_mask, timm:$bound_ctrl)),
	(V_MOV_B32_dpp VGPR_32:$src, VGPR_32:$src, (as_i32timm $dpp_ctrl),			(V_MOV_B32_dpp VGPR_32:$src, VGPR_32:$src, (as_i32timm $dpp_ctrl),
	(as_i32timm $row_mask), (as_i32timm $bank_mask),			(as_i32timm $row_mask), (as_i32timm $bank_mask),
	(as_i1timm $bound_ctrl))			(as_i1timm $bound_ctrl))
	>;			>;

	def : GCNPat <			class UpdateDPPPat<ValueType vt> : GCNPat <
	(i32 (int_amdgcn_update_dpp i32:$old, i32:$src, timm:$dpp_ctrl,			(vt (int_amdgcn_update_dpp vt:$old, vt:$src, timm:$dpp_ctrl,
	timm:$row_mask, timm:$bank_mask,			timm:$row_mask, timm:$bank_mask,
	timm:$bound_ctrl)),			timm:$bound_ctrl)),
	(V_MOV_B32_dpp VGPR_32:$old, VGPR_32:$src, (as_i32timm $dpp_ctrl),			(V_MOV_B32_dpp VGPR_32:$old, VGPR_32:$src, (as_i32timm $dpp_ctrl),
	(as_i32timm $row_mask), (as_i32timm $bank_mask),			(as_i32timm $row_mask), (as_i32timm $bank_mask),
	(as_i1timm $bound_ctrl))			(as_i1timm $bound_ctrl))
	>;			>;

				def : UpdateDPPPat<i32>;
				def : UpdateDPPPat<f32>;

	} // End OtherPredicates = [isGFX8Plus]			} // End OtherPredicates = [isGFX8Plus]

	let OtherPredicates = [isGFX8Plus] in {			let OtherPredicates = [isGFX8Plus] in {
	def : GCNPat<			def : GCNPat<
	(i32 (anyext i16:$src)),			(i32 (anyext i16:$src)),
	(COPY $src)			(COPY $src)
	>;			>;

	def : GCNPat<			def : GCNPat<
	(i64 (anyext i16:$src)),			(i64 (anyext i16:$src)),
	(REG_SEQUENCE VReg_64,			(REG_SEQUENCE VReg_64,
	(i32 (COPY $src)), sub0,			(i32 (COPY $src)), sub0,

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.update.dpp.ll

	; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -amdgpu-dpp-combine=false -verify-machineinstrs < %s \| FileCheck --check-prefixes=GCN,GFX8,GFX8-OPT,GCN-OPT %s			; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -amdgpu-dpp-combine=false -verify-machineinstrs < %s \| FileCheck --check-prefixes=GCN,GFX8,GFX8-OPT,GCN-OPT %s
	; RUN: llc -march=amdgcn -mcpu=tonga -O0 -mattr=-flat-for-global -amdgpu-dpp-combine=false -verify-machineinstrs < %s \| FileCheck --check-prefixes=GCN,GFX8,GFX8-NOOPT %s			; RUN: llc -march=amdgcn -mcpu=tonga -O0 -mattr=-flat-for-global -amdgpu-dpp-combine=false -verify-machineinstrs < %s \| FileCheck --check-prefixes=GCN,GFX8,GFX8-NOOPT %s
	; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=-flat-for-global -amdgpu-dpp-combine=false -verify-machineinstrs < %s \| FileCheck --check-prefixes=GCN,GFX10,GCN-OPT %s			; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=-flat-for-global -amdgpu-dpp-combine=false -verify-machineinstrs < %s \| FileCheck --check-prefixes=GCN,GFX10,GCN-OPT %s
	; RUN: llc -march=amdgcn -mcpu=gfx1100 -mattr=-flat-for-global -amdgpu-enable-vopd=0 -amdgpu-dpp-combine=false -verify-machineinstrs < %s \| FileCheck --check-prefixes=GCN,GFX11,GCN-OPT %s			; RUN: llc -march=amdgcn -mcpu=gfx1100 -mattr=-flat-for-global -amdgpu-enable-vopd=0 -amdgpu-dpp-combine=false -verify-machineinstrs < %s \| FileCheck --check-prefixes=GCN,GFX11,GCN-OPT %s

	; GCN-LABEL: {{^}}dpp_test:			; GCN-LABEL: {{^}}dpp_test:
	; GCN: v_mov_b32_e32 [[DST:v[0-9]+]], s{{[0-9]+}}			; GCN: v_mov_b32_e32 [[DST:v[0-9]+]], s{{[0-9]+}}
	; GCN: v_mov_b32_e32 [[SRC:v[0-9]+]], s{{[0-9]+}}			; GCN: v_mov_b32_e32 [[SRC:v[0-9]+]], s{{[0-9]+}}
	; GFX8-OPT: s_mov			; GFX8-OPT: s_mov
	; GFX8-OPT: s_mov			; GFX8-OPT: s_mov
	; GFX8-NOOPT: s_nop 1			; GFX8-NOOPT: s_nop 1
	; GCN: v_mov_b32_dpp [[DST]], [[SRC]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1{{$}}			; GCN: v_mov_b32_dpp [[DST]], [[SRC]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1{{$}}
	define amdgpu_kernel void @dpp_test(ptr addrspace(1) %out, i32 %in1, i32 %in2) {			define amdgpu_kernel void @dpp_test(ptr addrspace(1) %out, i32 %in1, i32 %in2) {
	%tmp0 = call i32 @llvm.amdgcn.update.dpp.i32(i32 %in1, i32 %in2, i32 1, i32 1, i32 1, i1 0) #0			%tmp0 = call i32 @llvm.amdgcn.update.dpp.i32(i32 %in1, i32 %in2, i32 1, i32 1, i32 1, i1 false) #0
	store i32 %tmp0, ptr addrspace(1) %out			store i32 %tmp0, ptr addrspace(1) %out
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}dpp_test_bc:			; GCN-LABEL: {{^}}dpp_test_bc:
	; GCN: v_mov_b32_e32 [[DST:v[0-9]+]], s{{[0-9]+}}			; GCN: v_mov_b32_e32 [[DST:v[0-9]+]], s{{[0-9]+}}
	; GCN: v_mov_b32_e32 [[SRC:v[0-9]+]], s{{[0-9]+}}			; GCN: v_mov_b32_e32 [[SRC:v[0-9]+]], s{{[0-9]+}}
	; GFX8-OPT: s_mov			; GFX8-OPT: s_mov
	; GFX8-OPT: s_mov			; GFX8-OPT: s_mov
	; GFX8-NOOPT: s_nop 1			; GFX8-NOOPT: s_nop 1
	; GCN: v_mov_b32_dpp [[DST]], [[SRC]] quad_perm:[2,0,0,0] row_mask:0x1 bank_mask:0x1 bound_ctrl:1{{$}}			; GCN: v_mov_b32_dpp [[DST]], [[SRC]] quad_perm:[2,0,0,0] row_mask:0x1 bank_mask:0x1 bound_ctrl:1{{$}}
	define amdgpu_kernel void @dpp_test_bc(ptr addrspace(1) %out, i32 %in1, i32 %in2) {			define amdgpu_kernel void @dpp_test_bc(ptr addrspace(1) %out, i32 %in1, i32 %in2) {
	%tmp0 = call i32 @llvm.amdgcn.update.dpp.i32(i32 %in1, i32 %in2, i32 2, i32 1, i32 1, i1 1) #0			%tmp0 = call i32 @llvm.amdgcn.update.dpp.i32(i32 %in1, i32 %in2, i32 2, i32 1, i32 1, i1 true) #0
	store i32 %tmp0, ptr addrspace(1) %out			store i32 %tmp0, ptr addrspace(1) %out
	ret void			ret void
	}			}


	; GCN-LABEL: {{^}}dpp_test1:			; GCN-LABEL: {{^}}dpp_test1:
	; GFX10,GFX11: v_add_nc_u32_e32 [[REG:v[0-9]+]], v{{[0-9]+}}, v{{[0-9]+}}			; GFX10,GFX11: v_add_nc_u32_e32 [[REG:v[0-9]+]], v{{[0-9]+}}, v{{[0-9]+}}
	; GFX8-OPT: v_add_u32_e32 [[REG:v[0-9]+]], vcc, v{{[0-9]+}}, v{{[0-9]+}}			; GFX8-OPT: v_add_u32_e32 [[REG:v[0-9]+]], vcc, v{{[0-9]+}}, v{{[0-9]+}}
	[22 lines hidden]
	; GCN-LABEL: {{^}}update_dpp64_test:			; GCN-LABEL: {{^}}update_dpp64_test:
	; GCN: load_{{dwordx2\|b64}} v[[[SRC_LO:[0-9]+]]:[[SRC_HI:[0-9]+]]]			; GCN: load_{{dwordx2\|b64}} v[[[SRC_LO:[0-9]+]]:[[SRC_HI:[0-9]+]]]
	; GCN-DAG: v_mov_b32_dpp v{{[0-9]+}}, v[[SRC_LO]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1{{$}}			; GCN-DAG: v_mov_b32_dpp v{{[0-9]+}}, v[[SRC_LO]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1{{$}}
	; GCN-DAG: v_mov_b32_dpp v{{[0-9]+}}, v[[SRC_HI]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1{{$}}			; GCN-DAG: v_mov_b32_dpp v{{[0-9]+}}, v[[SRC_HI]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1{{$}}
	define amdgpu_kernel void @update_dpp64_test(ptr addrspace(1) %arg, i64 %in1, i64 %in2) {			define amdgpu_kernel void @update_dpp64_test(ptr addrspace(1) %arg, i64 %in1, i64 %in2) {
	%id = tail call i32 @llvm.amdgcn.workitem.id.x()			%id = tail call i32 @llvm.amdgcn.workitem.id.x()
	%gep = getelementptr inbounds i64, ptr addrspace(1) %arg, i32 %id			%gep = getelementptr inbounds i64, ptr addrspace(1) %arg, i32 %id
	%load = load i64, ptr addrspace(1) %gep			%load = load i64, ptr addrspace(1) %gep
	%tmp0 = call i64 @llvm.amdgcn.update.dpp.i64(i64 %in1, i64 %load, i32 1, i32 1, i32 1, i1 0) #0			%tmp0 = call i64 @llvm.amdgcn.update.dpp.i64(i64 %in1, i64 %load, i32 1, i32 1, i32 1, i1 false) #0
	store i64 %tmp0, ptr addrspace(1) %gep			store i64 %tmp0, ptr addrspace(1) %gep
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}update_dpp64_imm_old_test:			; GCN-LABEL: {{^}}update_dpp64_imm_old_test:
	; GCN-OPT-DAG: v_mov_b32_e32 v[[OLD_LO:[0-9]+]], 0x3afaedd9			; GCN-OPT-DAG: v_mov_b32_e32 v[[OLD_LO:[0-9]+]], 0x3afaedd9
	; GFX8-OPT-DAG,GFX10-DAG: v_mov_b32_e32 v[[OLD_HI:[0-9]+]], 0x7047			; GFX8-OPT-DAG,GFX10-DAG: v_mov_b32_e32 v[[OLD_HI:[0-9]+]], 0x7047
	; GFX11-DAG: v_mov_b32_e32 v[[OLD_HI:[0-9]+]], 0x7047			; GFX11-DAG: v_mov_b32_e32 v[[OLD_HI:[0-9]+]], 0x7047
	; GFX8-NOOPT-DAG: s_mov_b32 s[[SOLD_LO:[0-9]+]], 0x3afaedd9			; GFX8-NOOPT-DAG: s_mov_b32 s[[SOLD_LO:[0-9]+]], 0x3afaedd9
	; GFX8-NOOPT-DAG: s_mov_b32 s[[SOLD_HI:[0-9]+]], 0x7047			; GFX8-NOOPT-DAG: s_mov_b32 s[[SOLD_HI:[0-9]+]], 0x7047
	; GCN-DAG: load_{{dwordx2\|b64}} v[[[SRC_LO:[0-9]+]]:[[SRC_HI:[0-9]+]]]			; GCN-DAG: load_{{dwordx2\|b64}} v[[[SRC_LO:[0-9]+]]:[[SRC_HI:[0-9]+]]]
	; GCN-OPT-DAG: v_mov_b32_dpp v[[OLD_LO]], v[[SRC_LO]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1{{$}}			; GCN-OPT-DAG: v_mov_b32_dpp v[[OLD_LO]], v[[SRC_LO]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1{{$}}
	; GFX8-OPT-DAG,GFX10-DAG,GFX11-DAG: v_mov_b32_dpp v[[OLD_HI]], v[[SRC_HI]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1{{$}}			; GFX8-OPT-DAG,GFX10-DAG,GFX11-DAG: v_mov_b32_dpp v[[OLD_HI]], v[[SRC_HI]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1{{$}}
	; GCN-NOOPT-DAG: v_mov_b32_dpp v{{[0-9]+}}, v[[SRC_LO]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1{{$}}			; GCN-NOOPT-DAG: v_mov_b32_dpp v{{[0-9]+}}, v[[SRC_LO]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1{{$}}
	; GCN-NOOPT-DAG: v_mov_b32_dpp v{{[0-9]+}}, v[[SRC_HI]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1{{$}}			; GCN-NOOPT-DAG: v_mov_b32_dpp v{{[0-9]+}}, v[[SRC_HI]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1{{$}}
	define amdgpu_kernel void @update_dpp64_imm_old_test(ptr addrspace(1) %arg, i64 %in2) {			define amdgpu_kernel void @update_dpp64_imm_old_test(ptr addrspace(1) %arg, i64 %in2) {
	%id = tail call i32 @llvm.amdgcn.workitem.id.x()			%id = tail call i32 @llvm.amdgcn.workitem.id.x()
	%gep = getelementptr inbounds i64, ptr addrspace(1) %arg, i32 %id			%gep = getelementptr inbounds i64, ptr addrspace(1) %arg, i32 %id
	%load = load i64, ptr addrspace(1) %gep			%load = load i64, ptr addrspace(1) %gep
	%tmp0 = call i64 @llvm.amdgcn.update.dpp.i64(i64 123451234512345, i64 %load, i32 1, i32 1, i32 1, i1 0) #0			%tmp0 = call i64 @llvm.amdgcn.update.dpp.i64(i64 123451234512345, i64 %load, i32 1, i32 1, i32 1, i1 false) #0
	store i64 %tmp0, ptr addrspace(1) %gep			store i64 %tmp0, ptr addrspace(1) %gep
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}update_dpp64_imm_src_test:			; GCN-LABEL: {{^}}update_dpp64_imm_src_test:
	; GCN-OPT-DAG: v_mov_b32_e32 v[[OLD_LO:[0-9]+]], 0x3afaedd9			; GCN-OPT-DAG: v_mov_b32_e32 v[[OLD_LO:[0-9]+]], 0x3afaedd9
	; GCN-OPT-DAG: v_mov_b32_e32 v[[OLD_HI:[0-9]+]], 0x7047			; GCN-OPT-DAG: v_mov_b32_e32 v[[OLD_HI:[0-9]+]], 0x7047
	; GFX8-NOOPT-DAG: s_mov_b32 s[[SOLD_LO:[0-9]+]], 0x3afaedd9			; GFX8-NOOPT-DAG: s_mov_b32 s[[SOLD_LO:[0-9]+]], 0x3afaedd9
	; GFX8-NOOPT-DAG: s_mov_b32 s[[SOLD_HI:[0-9]+]], 0x7047			; GFX8-NOOPT-DAG: s_mov_b32 s[[SOLD_HI:[0-9]+]], 0x7047
	; GCN-OPT-DAG: v_mov_b32_dpp v{{[0-9]+}}, v[[OLD_LO]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1{{$}}			; GCN-OPT-DAG: v_mov_b32_dpp v{{[0-9]+}}, v[[OLD_LO]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1{{$}}
	; GCN-OPT-DAG: v_mov_b32_dpp v{{[0-9]+}}, v[[OLD_HI]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1{{$}}			; GCN-OPT-DAG: v_mov_b32_dpp v{{[0-9]+}}, v[[OLD_HI]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1{{$}}
	; GCN-NOOPT-DAG: v_mov_b32_dpp v{{[0-9]+}}, v[[SRC_LO]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1{{$}}			; GCN-NOOPT-DAG: v_mov_b32_dpp v{{[0-9]+}}, v[[SRC_LO]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1{{$}}
	; GCN-NOOPT-DAG: v_mov_b32_dpp v{{[0-9]+}}, v[[SRC_HI]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1{{$}}			; GCN-NOOPT-DAG: v_mov_b32_dpp v{{[0-9]+}}, v[[SRC_HI]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1{{$}}
	define amdgpu_kernel void @update_dpp64_imm_src_test(ptr addrspace(1) %out, i64 %in1) {			define amdgpu_kernel void @update_dpp64_imm_src_test(ptr addrspace(1) %out, i64 %in1) {
	%tmp0 = call i64 @llvm.amdgcn.update.dpp.i64(i64 %in1, i64 123451234512345, i32 1, i32 1, i32 1, i1 0) #0			%tmp0 = call i64 @llvm.amdgcn.update.dpp.i64(i64 %in1, i64 123451234512345, i32 1, i32 1, i32 1, i1 false) #0
	store i64 %tmp0, ptr addrspace(1) %out			store i64 %tmp0, ptr addrspace(1) %out
	ret void			ret void
	}			}
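
The CHECK lines in the two immediate tests above expect the 64-bit constant 123451234512345 to show up as two 32-bit halves, 0x3afaedd9 and 0x7047, each paired with its own v_mov_b32_dpp. A small Python sketch of that split follows; `split_i64` is an illustrative name, not something from the patch, and the code models only the arithmetic, not the backend's lowering.

```python
def split_i64(value: int) -> tuple:
    """Split a 64-bit value into (low 32 bits, high 32 bits)."""
    lo = value & 0xFFFFFFFF          # bits 0..31
    hi = (value >> 32) & 0xFFFFFFFF  # bits 32..63
    return lo, hi


lo, hi = split_i64(123451234512345)
print(hex(lo), hex(hi))  # 0x3afaedd9 0x7047
```

These are the same two literals the OLD_LO/OLD_HI and SOLD_LO/SOLD_HI checks match.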

				; GCN-LABEL: {{^}}dpp_test_f32:
				; GCN: v_mov_b32_e32 [[DST:v[0-9]+]], s{{[0-9]+}}
				; GCN: v_mov_b32_e32 [[SRC:v[0-9]+]], s{{[0-9]+}}
				; GFX8-OPT: s_mov
				; GFX8-OPT: s_mov
				; GFX8-NOOPT: s_nop 1
				; GCN: v_mov_b32_dpp [[DST]], [[SRC]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1{{$}}
				define amdgpu_kernel void @dpp_test_f32(ptr addrspace(1) %out, float %in1, float %in2) {
				%tmp0 = call float @llvm.amdgcn.update.dpp.f32(float %in1, float %in2, i32 1, i32 1, i32 1, i1 false)
				store float %tmp0, ptr addrspace(1) %out
				ret void
				}

				; GCN-LABEL: {{^}}dpp_test_f32_imm_comb1:
				; GCN: v_mov_b32_e32 [[DST:v[0-9]+]], s{{[0-9]+}}
				; GCN: v_mov_b32_e32 [[SRC:v[0-9]+]], s{{[0-9]+}}
				; GFX8-OPT: s_mov
				; GFX8-OPT: s_mov
				; GFX8-NOOPT: s_nop 1
				; GCN: v_mov_b32_dpp [[DST]], [[SRC]] quad_perm:[0,0,0,0] row_mask:0x0 bank_mask:0x0{{$}}
				define amdgpu_kernel void @dpp_test_f32_imm_comb1(ptr addrspace(1) %out, float %in1, float %in2) {
				%tmp0 = call float @llvm.amdgcn.update.dpp.f32(float %in1, float %in2, i32 0, i32 0, i32 0, i1 false)
				store float %tmp0, ptr addrspace(1) %out
				ret void
				}

				; GCN-LABEL: {{^}}dpp_test_f32_imm_comb2:
				; GCN: v_mov_b32_e32 [[DST:v[0-9]+]], s{{[0-9]+}}
				; GCN: v_mov_b32_e32 [[SRC:v[0-9]+]], s{{[0-9]+}}
				; GFX8-OPT: s_mov
				; GFX8-OPT: s_mov
				; GFX8-NOOPT: s_nop 1
				; GCN: v_mov_b32_dpp [[DST]], [[SRC]] quad_perm:[3,0,0,0] row_mask:0x3 bank_mask:0x3{{$}}
				define amdgpu_kernel void @dpp_test_f32_imm_comb2(ptr addrspace(1) %out, float %in1, float %in2) {
				%tmp0 = call float @llvm.amdgcn.update.dpp.f32(float %in1, float %in2, i32 3, i32 3, i32 3, i1 false)
				store float %tmp0, ptr addrspace(1) %out
				ret void
				}

				; GCN-LABEL: {{^}}dpp_test_f32_imm_comb3:
				; GCN: v_mov_b32_e32 [[DST:v[0-9]+]], s{{[0-9]+}}
				; GCN: v_mov_b32_e32 [[SRC:v[0-9]+]], s{{[0-9]+}}
				; GFX8-OPT: s_mov
				; GFX8-OPT: s_mov
				; GFX8-NOOPT: s_nop 1
				; GCN: v_mov_b32_dpp [[DST]], [[SRC]] quad_perm:[1,0,0,0] row_mask:0x2 bank_mask:0x3 bound_ctrl:1{{$}}
				define amdgpu_kernel void @dpp_test_f32_imm_comb3(ptr addrspace(1) %out, float %in1, float %in2) {
				%tmp0 = call float @llvm.amdgcn.update.dpp.f32(float %in1, float %in2, i32 1, i32 2, i32 3, i1 true)
				store float %tmp0, ptr addrspace(1) %out
				ret void
				}

				; GCN-LABEL: {{^}}dpp_test_f32_imm_comb4:
				; GCN: v_mov_b32_e32 [[DST:v[0-9]+]], s{{[0-9]+}}
				; GCN: v_mov_b32_e32 [[SRC:v[0-9]+]], s{{[0-9]+}}
				; GFX8-OPT: s_mov
				; GFX8-OPT: s_mov
				; GFX8-NOOPT: s_nop 1
				; GCN: v_mov_b32_dpp [[DST]], [[SRC]] quad_perm:[0,1,0,0] row_mask:0x3 bank_mask:0x2 bound_ctrl:1{{$}}
				define amdgpu_kernel void @dpp_test_f32_imm_comb4(ptr addrspace(1) %out, float %in1, float %in2) {
				%tmp0 = call float @llvm.amdgcn.update.dpp.f32(float %in1, float %in2, i32 4, i32 3, i32 2, i1 true)
				store float %tmp0, ptr addrspace(1) %out
				ret void
				}

				; GCN-LABEL: {{^}}dpp_test_f32_imm_comb5:
				; GCN: v_mov_b32_e32 [[DST:v[0-9]+]], s{{[0-9]+}}
				; GCN: v_mov_b32_e32 [[SRC:v[0-9]+]], s{{[0-9]+}}
				; GFX8-OPT: s_mov
				; GFX8-OPT: s_mov
				; GFX8-NOOPT: s_nop 1
				; GCN: v_mov_b32_dpp [[DST]], [[SRC]] quad_perm:[3,3,3,0] row_mask:0xe bank_mask:0xd bound_ctrl:1{{$}}
				define amdgpu_kernel void @dpp_test_f32_imm_comb5(ptr addrspace(1) %out, float %in1, float %in2) {
				%tmp0 = call float @llvm.amdgcn.update.dpp.f32(float %in1, float %in2, i32 63, i32 62, i32 61, i1 true)
				store float %tmp0, ptr addrspace(1) %out
				ret void
				}

				; GCN-LABEL: {{^}}dpp_test_f32_imm_comb6:
				; GCN: v_mov_b32_e32 [[DST:v[0-9]+]], s{{[0-9]+}}
				; GCN: v_mov_b32_e32 [[SRC:v[0-9]+]], s{{[0-9]+}}
				; GFX8-OPT: s_mov
				; GFX8-OPT: s_mov
				; GFX8-NOOPT: s_nop 1
				; GCN: v_mov_b32_dpp [[DST]], [[SRC]] quad_perm:[3,3,3,0] row_mask:0xf bank_mask:0xf bound_ctrl:1{{$}}
				define amdgpu_kernel void @dpp_test_f32_imm_comb6(ptr addrspace(1) %out, float %in1, float %in2) {
				%tmp0 = call float @llvm.amdgcn.update.dpp.f32(float %in1, float %in2, i32 63, i32 63, i32 63, i1 true)
				store float %tmp0, ptr addrspace(1) %out
				ret void
				}


				; GCN-LABEL: {{^}}dpp_test_f32_imm_comb7:
				; GCN: v_mov_b32_e32 [[DST:v[0-9]+]], s{{[0-9]+}}
				; GCN: v_mov_b32_e32 [[SRC:v[0-9]+]], s{{[0-9]+}}
				; GFX8-OPT: s_mov
				; GFX8-OPT: s_mov
				; GFX8-NOOPT: s_nop 1
				; GCN: v_mov_b32_dpp [[DST]], [[SRC]] quad_perm:[0,0,0,1] row_mask:0x0 bank_mask:0x0 bound_ctrl:1{{$}}
				define amdgpu_kernel void @dpp_test_f32_imm_comb7(ptr addrspace(1) %out, float %in1, float %in2) {
				%tmp0 = call float @llvm.amdgcn.update.dpp.f32(float %in1, float %in2, i32 64, i32 64, i32 64, i1 true)
				store float %tmp0, ptr addrspace(1) %out
				ret void
				}

				; GCN-LABEL: {{^}}dpp_test_f32_imm_comb8:
				; GCN: v_mov_b32_e32 [[DST:v[0-9]+]], s{{[0-9]+}}
				; GCN: v_mov_b32_e32 [[SRC:v[0-9]+]], s{{[0-9]+}}
				; GFX8-OPT: s_mov
				; GFX8-OPT: s_mov
				; GFX8-NOOPT: s_nop 1
				; GCN: v_mov_b32_dpp [[DST]], [[SRC]] quad_perm:[3,3,1,0] row_mask:0xf bank_mask:0x0 bound_ctrl:1{{$}}
				define amdgpu_kernel void @dpp_test_f32_imm_comb8(ptr addrspace(1) %out, float %in1, float %in2) {
				%tmp0 = call float @llvm.amdgcn.update.dpp.f32(float %in1, float %in2, i32 31, i32 63, i32 128, i1 true)
				store float %tmp0, ptr addrspace(1) %out
				ret void
				}
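
The dpp_test_f32_imm_comb* cases above pair each set of immediates with an expected operand string. The Python sketch below reproduces that mapping as inferred from the CHECK lines alone. It assumes dpp_ctrl values 0..255 encode a quad_perm with two bits per selector, and that only the low four bits of row_mask and bank_mask are printed. It is a reading aid, not the backend's implementation, and both function names are hypothetical.

```python
def decode_quad_perm(dpp_ctrl: int) -> list:
    """Split an 8-bit quad_perm control into four 2-bit lane selectors."""
    if not 0 <= dpp_ctrl <= 0xFF:
        # Other DPP controls exist above this range; they are not modelled here.
        raise ValueError("only the quad_perm range (0..255) is modelled")
    return [(dpp_ctrl >> (2 * lane)) & 0x3 for lane in range(4)]


def format_dpp_operands(dpp_ctrl: int, row_mask: int, bank_mask: int,
                        bound_ctrl: bool) -> str:
    """Build the operand text the CHECK lines expect after the registers."""
    selectors = ",".join(str(s) for s in decode_quad_perm(dpp_ctrl))
    # Assumption taken from the tests: masks appear truncated to 4 bits
    # (for example 62 is printed as 0xe and 128 as 0x0).
    text = (f"quad_perm:[{selectors}] "
            f"row_mask:{row_mask & 0xF:#x} bank_mask:{bank_mask & 0xF:#x}")
    if bound_ctrl:
        text += " bound_ctrl:1"
    return text


print(format_dpp_operands(31, 63, 128, True))
# quad_perm:[3,3,1,0] row_mask:0xf bank_mask:0x0 bound_ctrl:1
```

The printed line matches the expectation in dpp_test_f32_imm_comb8; the other comb cases can be checked the same way.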

	declare i32 @llvm.amdgcn.workitem.id.x()			declare i32 @llvm.amdgcn.workitem.id.x()
	declare void @llvm.amdgcn.s.barrier()			declare void @llvm.amdgcn.s.barrier()
	declare i32 @llvm.amdgcn.update.dpp.i32(i32, i32, i32, i32, i32, i1) #0			declare i32 @llvm.amdgcn.update.dpp.i32(i32, i32, i32, i32, i32, i1) #0
				declare float @llvm.amdgcn.update.dpp.f32(float, float, i32, i32, i32, i1) #0
	declare i64 @llvm.amdgcn.update.dpp.i64(i64, i64, i32, i32, i32, i1) #0			declare i64 @llvm.amdgcn.update.dpp.i64(i64, i64, i32, i32, i32, i1) #0

	attributes #0 = { nounwind readnone convergent }			attributes #0 = { nounwind readnone convergent }