This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and Intrinsics.
Closed, Public

Authored by cfang on Feb 25 2016, 10:02 AM.

Details

Reviewers

tstellarAMD
arsenm

Commits

rG24f035af32ef: AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and…
rL262356: AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and…

Summary

These instructions were introduced in VI+ chips. This patch defines them and introduces the intrinsics
llvm.amdgcn.ds.permute and llvm.amdgcn.ds.bpermute to expose them.
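
For orientation, the difference between the two instructions can be modelled on the host. The sketch below is not taken from the patch. It assumes a 64-lane wavefront with every lane active and a zero instruction offset, and the function names are hypothetical. Following AMD's published GCN3 ISA description, the address operand is treated as a byte address, so lane N is selected with N * 4: ds_bpermute pulls a value from the addressed lane, while ds_permute pushes a value to it.

```cpp
#include <array>
#include <cstdint>

constexpr uint32_t kWaveSize = 64;  // assumed wavefront size
using Wave = std::array<uint32_t, kWaveSize>;

// Backward permute (pull): each lane reads src from the lane it addresses.
// Assumption: all lanes active, instruction offset == 0.
Wave ds_bpermute_model(const Wave &addr, const Wave &src) {
  Wave dst{};
  for (uint32_t lane = 0; lane < kWaveSize; ++lane) {
    uint32_t from = (addr[lane] / 4) % kWaveSize;  // byte address -> lane id
    dst[lane] = src[from];
  }
  return dst;
}

// Forward permute (push): each lane writes its src to the lane it addresses.
// Lanes that nobody writes to keep 0; on a collision the higher lane wins
// in this model because it is processed last.
Wave ds_permute_model(const Wave &addr, const Wave &src) {
  Wave dst{};
  for (uint32_t lane = 0; lane < kWaveSize; ++lane) {
    uint32_t to = (addr[lane] / 4) % kWaveSize;  // byte address -> lane id
    dst[to] = src[lane];
  }
  return dst;
}
```

With addr[i] = ((i + 1) % 64) * 4, the pull model gives each lane its right-hand neighbour's value, and the push model sends each lane's value to its right-hand neighbour.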

Diff Detail

Event Timeline

cfang updated this revision to Diff 49084. Feb 25 2016, 10:02 AM

cfang retitled this revision to AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and Intrinsics.

cfang updated this object.

cfang added reviewers: arsenm, tstellarAMD.

cfang added subscribers: arsenm, llvm-commits.

arsenm added inline comments. Feb 25 2016, 11:13 AM

lib/Target/AMDGPU/SIInstrInfo.cpp
230–231	Checking just offset0 should be sufficient, and can be moved above
lib/Target/AMDGPU/VIInstructions.td
131	Are we sure these don't read M0?
test/CodeGen/AMDGPU/llvm.amdgcn.ds.permute.ll
4–5	I would prefer splitting the 2 intrinsics into separate patches. These are also missing readnone (which should also use attribute groups).

tstellarAMD added inline comments. Feb 25 2016, 2:39 PM

lib/Target/AMDGPU/VIInstructions.td
131	It reads M0, but it is supposed to ignore its value, so for our purposes we can treat it as if it doesn't read M0.

Updated the patch based on Matt's review:

Check only Offset0Imm and move the check one line earlier.
It is safe to remove M0 from the Uses list for ds_permute/ds_bpermute.
Split the LIT test for ds_permute and ds_bpermute into separate files.

cfang added inline comments. Feb 26 2016, 9:28 AM

test/CodeGen/AMDGPU/llvm.amdgcn.ds.permute.ll
5–6	I just split the test case. If you want to split the intrinsics and/or instruction definitions, I can do it in the integration. Thanks.

arsenm added inline comments. Feb 27 2016, 12:36 AM

lib/Target/AMDGPU/VIInstructions.td
136–139	Why can't these be patterns on the instruction definition instead of standalone Pats?

Move the pattern for ds_permute intrinsic code generation into
the instruction definition, based on Matt's comment.

LGTM

test/CodeGen/AMDGPU/llvm.amdgcn.ds.bpermute.ll
6	Might want to check that there are 2 VGPR operands

This revision is now accepted and ready to land. Feb 29 2016, 3:13 PM

Closed by commit rL262356: AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and… (authored by chfang). Mar 1 2016, 9:56 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                              Size
include/llvm/IR/IntrinsicsAMDGPU.td               8 lines
lib/Target/AMDGPU/SIInstrInfo.cpp                 4 lines
lib/Target/AMDGPU/SIInstrInfo.td                  17 lines
lib/Target/AMDGPU/VIInstructions.td               11 lines
test/CodeGen/AMDGPU/llvm.amdgcn.ds.bpermute.ll    13 lines
test/CodeGen/AMDGPU/llvm.amdgcn.ds.permute.ll     13 lines

Diff 49427

include/llvm/IR/IntrinsicsAMDGPU.td

Context not available.
	GCCBuiltin<"__builtin_amdgcn_s_dcache_wb_vol">,
	Intrinsic<[], [], []>;

		// llvm.amdgcn.ds.permute <index> <src>
		def int_amdgcn_ds_permute :
		Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty], [IntrNoMem, IntrConvergent]>;

		// llvm.amdgcn.ds.bpermute <index> <src>
		def int_amdgcn_ds_bpermute :
		Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty], [IntrNoMem, IntrConvergent]>;

	}
Context not available.

lib/Target/AMDGPU/SIInstrInfo.cpp

Context not available.
	// will use this for some partially aligned loads.
	const MachineOperand *Offset0Imm = getNamedOperand(*LdSt,
	AMDGPU::OpName::offset0);
		// DS_PERMUTE does not have Offset0Imm (and Offset1Imm).
		if (!Offset0Imm)
		return false;

	const MachineOperand *Offset1Imm = getNamedOperand(*LdSt,
	AMDGPU::OpName::offset1);

Context not available.
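
The early return above can be illustrated outside of LLVM. The following is a minimal sketch, not the backend's code: ToyInstr, the string-keyed getNamedOperand, and getConsecutiveOffset are hypothetical stand-ins, and the "consecutive offsets" test is an assumption based on the surrounding comment about partially aligned loads. It shows why a lookup that may return null has to be checked before the two-offset logic runs on an instruction such as ds_permute that carries no offset0 operand.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a machine instruction: operand name -> immediate.
struct ToyInstr {
  std::map<std::string, int64_t> operands;
};

// Returns a pointer to the named operand, or nullptr if the instruction
// does not have one (the situation the patch guards against).
const int64_t *getNamedOperand(const ToyInstr &mi, const std::string &name) {
  auto it = mi.operands.find(name);
  return it == mi.operands.end() ? nullptr : &it->second;
}

// Returns true and sets `offset` when offset0/offset1 are consecutive.
bool getConsecutiveOffset(const ToyInstr &mi, int64_t &offset) {
  const int64_t *off0 = getNamedOperand(mi, "offset0");
  if (!off0)
    return false;  // e.g. a ds_permute-like instruction: no offsets at all

  const int64_t *off1 = getNamedOperand(mi, "offset1");
  if (!off1)
    return false;  // defensive in this toy only; the review relies on
                   // offset1 being present whenever offset0 is

  if (*off1 != *off0 + 1)
    return false;  // not consecutive, cannot be treated as one access

  offset = *off0;
  return true;
}
```

An instruction with offset0 = 4 and offset1 = 5 is accepted with offset 4, while one without any offset operands is rejected before offset1 is ever looked up.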

lib/Target/AMDGPU/SIInstrInfo.td

Context not available.
	}
	}

		multiclass DS_1A1D_PERMUTE <bits<8> op, string opName, RegisterClass rc,
		SDPatternOperator node = null_frag,
		dag outs = (outs rc:$vdst),
		dag ins = (ins VGPR_32:$addr, rc:$data0),
		string asm = opName#" $vdst, $addr, $data0"> {

		let mayLoad = 0, mayStore = 0, isConvergent = 1 in {
		def "" : DS_Pseudo <opName, outs, ins,
		[(set (i32 rc:$vdst),
		(node (i32 VGPR_32:$addr), (i32 rc:$data0)))]>;

		let data1 = 0, offset0 = 0, offset1 = 0, gds = 0 in {
		def "_vi" : DS_Real_vi <op, opName, outs, ins, asm>;
		}
		}
		}

	multiclass DS_1A2D_RET_m <bits<8> op, string opName, RegisterClass rc,
	string noRetOp = "", dag ins,
	dag outs = (outs rc:$vdst),
Context not available.

lib/Target/AMDGPU/VIInstructions.td

Context not available.
	(as_i32imm $bank_mask), (as_i32imm $row_mask))
	>;

		//===----------------------------------------------------------------------===//
		// DS_PERMUTE/DS_BPERMUTE Instructions.
		//===----------------------------------------------------------------------===//

		let Uses = [EXEC] in {
		defm DS_PERMUTE_B32 : DS_1A1D_PERMUTE <0x3e, "ds_permute_b32", VGPR_32,
		int_amdgcn_ds_permute>;
		defm DS_BPERMUTE_B32 : DS_1A1D_PERMUTE <0x3f, "ds_bpermute_b32", VGPR_32,
		int_amdgcn_ds_bpermute>;
		}

	} // End Predicates = [isVI]
Context not available.

test/CodeGen/AMDGPU/llvm.amdgcn.ds.bpermute.ll

This file was added.

				; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck %s

				declare i32 @llvm.amdgcn.ds.bpermute(i32, i32) #0

				; FUNC-LABEL: {{^}}ds_bpermute:
				; CHECK: ds_bpermute
				define void @ds_bpermute(i32 addrspace(1)* %out, i32 %index, i32 %src) nounwind {
				%bpermute = call i32 @llvm.amdgcn.ds.bpermute(i32 %index, i32 %src) #0
				store i32 %bpermute, i32 addrspace(1)* %out, align 4
				ret void
				}

				attributes #0 = { nounwind readnone convergent }

test/CodeGen/AMDGPU/llvm.amdgcn.ds.permute.ll

This file was added.

				; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck %s

				declare i32 @llvm.amdgcn.ds.permute(i32, i32) #0

				; FUNC-LABEL: {{^}}ds_permute:
				; CHECK: ds_permute
				define void @ds_permute(i32 addrspace(1)* %out, i32 %index, i32 %src) nounwind {
				%bpermute = call i32 @llvm.amdgcn.ds.permute(i32 %index, i32 %src) #0
				store i32 %bpermute, i32 addrspace(1)* %out, align 4
				ret void
				}

				attributes #0 = { nounwind readnone convergent }