This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and Intrinsics.
ClosedPublic

Authored by cfang on Feb 25 2016, 10:02 AM.


Details

Reviewers

tstellarAMD
arsenm

Commits

rG24f035af32ef: AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and…
rL262356: AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and…

Summary

These instructions were introduced in VI+ chips. This patch defines the instructions and introduces the intrinsics
llvm.amdgcn.ds.permute and llvm.amdgcn.ds.bpermute to expose them.
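
For readers unfamiliar with the two instructions, the following C++ sketch models their lane-crossing behaviour on the host. It is an illustration, not code from this patch: the names kWaveSize, laneFromAddr, dsBpermute, and dsPermute are hypothetical, and the sketch assumes a 64-lane wavefront, all lanes active (EXEC is not modelled), and per-lane addresses given as byte offsets (lane index times 4).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumed wavefront width for this sketch.
constexpr unsigned kWaveSize = 64;
using Lanes = std::array<uint32_t, kWaveSize>;

// Hypothetical helper: turn a per-lane byte address into a lane index.
// Assumption: lane N is addressed as N * 4 and the index wraps around.
inline unsigned laneFromAddr(uint32_t addr) { return (addr / 4) % kWaveSize; }

// Backward permute ("pull"): lane i reads the source value of the lane
// selected by its own address.
Lanes dsBpermute(const Lanes &addr, const Lanes &src) {
  Lanes dst{};
  for (unsigned i = 0; i < kWaveSize; ++i)
    dst[i] = src[laneFromAddr(addr[i])];
  return dst;
}

// Forward permute ("push"): lane i sends its source value to the lane
// selected by its own address. Lanes that receive nothing stay 0 here.
// If several lanes target the same destination, the last loop iteration
// wins; that is an artifact of this model, not a claim about hardware.
Lanes dsPermute(const Lanes &addr, const Lanes &src) {
  Lanes dst{};
  for (unsigned i = 0; i < kWaveSize; ++i)
    dst[laneFromAddr(addr[i])] = src[i];
  return dst;
}
```

With addr[i] = ((i + 1) % 64) * 4, dsBpermute makes lane i read the value held by lane i + 1, while dsPermute makes lane i send its value to lane i + 1.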

Diff Detail

Event Timeline

cfang updated this revision to Diff 49084. Feb 25 2016, 10:02 AM

cfang retitled this revision to AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and Intrinsics.

cfang updated this object.

cfang added reviewers: arsenm, tstellarAMD.

cfang added subscribers: arsenm, llvm-commits.

arsenm added inline comments. Feb 25 2016, 11:13 AM

lib/Target/AMDGPU/SIInstrInfo.cpp
230–231	Checking just offset0 should be sufficient, and can be moved above
lib/Target/AMDGPU/VIInstructions.td
131	Are we sure these don't read M0?
test/CodeGen/AMDGPU/llvm.amdgcn.ds.permute.ll
3–4	I would prefer splitting the 2 separate intrinsics into separate patches. These are also missing the readnone (which should also use attribute groups)

tstellarAMD added inline comments. Feb 25 2016, 2:39 PM

lib/Target/AMDGPU/VIInstructions.td
131	It reads M0, but it is supposed to ignore its value, so for our purposes we can treat it as if it doesn't read M0.

Update the patch based on Matt's review:

Check only Offset0Imm and move the check one line ahead.
It is safe to remove M0 from the Uses list for ds_permute/ds_bpermute.
Split the LIT test into separate tests for ds_permute and ds_bpermute.
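
The first item can be sketched in isolation. This is a simplified model rather than the code that landed: MockDSInst and getOffsetPair are hypothetical stand-ins for the real MachineInstr operand lookup, and the sketch assumes, as the review suggests, that any DS instruction carrying offset0 also carries offset1.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for an instruction's named immediate operands.
// A DS_PERMUTE-style instruction leaves both offsets empty.
struct MockDSInst {
  std::optional<int64_t> offset0;
  std::optional<int64_t> offset1;
};

// Returns true and fills both offsets when the instruction has them.
bool getOffsetPair(const MockDSInst &inst, uint8_t &off0, uint8_t &off1) {
  // Checking offset0 alone is enough to reject instructions without offsets,
  // and doing it first avoids looking at offset1 at all.
  if (!inst.offset0)
    return false;

  // Assumption taken from the review: offset0 present implies offset1 present.
  assert(inst.offset1 && "offset0 present implies offset1 present");

  off0 = static_cast<uint8_t>(*inst.offset0);
  off1 = static_cast<uint8_t>(*inst.offset1);
  return true;
}
```

The early return keeps the later dereferences safe without a second null check.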

cfang added inline comments. Feb 26 2016, 9:28 AM

test/CodeGen/AMDGPU/llvm.amdgcn.ds.permute.ll
5–6	I just split the test case. If you want to split the intrinsics and/or instruction definitions, I can do it in the integration. Thanks.

arsenm added inline comments. Feb 27 2016, 12:36 AM

lib/Target/AMDGPU/VIInstructions.td
136–139	Why can't these be patterns on the instruction definition instead of standalone Pats?

Move the pattern for ds_permute intrinsic code generation into
the instruction definition, based on Matt's comment.

LGTM

test/CodeGen/AMDGPU/llvm.amdgcn.ds.bpermute.ll
6 ↗	(On Diff #49427)	Might want to check that there are 2 VGPR operands

This revision is now accepted and ready to land. Feb 29 2016, 3:13 PM

Closed by commit rL262356: AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and… (authored by chfang). Mar 1 2016, 9:56 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
include/llvm/IR/IntrinsicsAMDGPU.td	8 lines
lib/Target/AMDGPU/SIInstrInfo.cpp	4 lines
lib/Target/AMDGPU/SIInstrInfo.td	14 lines
lib/Target/AMDGPU/VIInstructions.td	19 lines
test/CodeGen/AMDGPU/llvm.amdgcn.ds.permute.ll	23 lines

Diff 49084

include/llvm/IR/IntrinsicsAMDGPU.td

Context not available.
	GCCBuiltin<"__builtin_amdgcn_s_dcache_wb_vol">,
	Intrinsic<[], [], []>;

		// llvm.amdgcn.ds.permute <index> <src>
		def int_amdgcn_ds_permute :
		Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty], [IntrNoMem, IntrConvergent]>;

		// llvm.amdgcn.ds.bpermute <index> <src>
		def int_amdgcn_ds_bpermute :
		Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty], [IntrNoMem, IntrConvergent]>;

	}
Context not available.

lib/Target/AMDGPU/SIInstrInfo.cpp

Context not available.
	const MachineOperand *Offset1Imm = getNamedOperand(*LdSt,
	AMDGPU::OpName::offset1);

		// DS_PERMUTE has no offset0 and offset1.
		if (!Offset0Imm || !Offset1Imm)
		return false;

	uint8_t Offset0 = Offset0Imm->getImm();
	uint8_t Offset1 = Offset1Imm->getImm();

Context not available.

lib/Target/AMDGPU/SIInstrInfo.td

Context not available.
	}
	}

		multiclass DS_1A1D_PERMUTE <bits<8> op, string opName, RegisterClass rc,
		dag outs = (outs rc:$vdst),
		dag ins = (ins VGPR_32:$addr, rc:$data0),
		string asm = opName#" $vdst, $addr, $data0"> {

		let mayLoad = 0, mayStore = 0, isConvergent = 1 in {
		def "" : DS_Pseudo <opName, outs, ins, []>;

		let data1 = 0, offset0 = 0, offset1 = 0, gds = 0 in {
		def "_vi" : DS_Real_vi <op, opName, outs, ins, asm>;
		}
		}
		}

	multiclass DS_1A2D_RET_m <bits<8> op, string opName, RegisterClass rc,
	string noRetOp = "", dag ins,
	dag outs = (outs rc:$vdst),
Context not available.

lib/Target/AMDGPU/VIInstructions.td

Context not available.
	(as_i32imm $bank_mask), (as_i32imm $row_mask))
	>;

		//===----------------------------------------------------------------------===//
		// DS_PERMUTE/DS_BPERMUTE Instructions.
		//===----------------------------------------------------------------------===//

		let Uses = [EXEC] in {
		defm DS_PERMUTE_B32 : DS_1A1D_PERMUTE < 0x3e, "ds_permute_b32", VGPR_32>;
		defm DS_BPERMUTE_B32 : DS_1A1D_PERMUTE < 0x3f, "ds_bpermute_b32", VGPR_32>;
		}

		def : Pat <
		(int_amdgcn_ds_permute i32:$addr, i32:$data0),
		(DS_PERMUTE_B32 $addr, $data0)
		>;

		def : Pat <
		(int_amdgcn_ds_bpermute i32:$addr, i32:$data0),
		(DS_BPERMUTE_B32 $addr, $data0)
		>;

	} // End Predicates = [isVI]
Context not available.

test/CodeGen/AMDGPU/llvm.amdgcn.ds.permute.ll

This file was added.

				; RUN: llc -march=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck %s

				declare i32 @llvm.amdgcn.ds.permute(i32, i32) convergent
				declare i32 @llvm.amdgcn.ds.bpermute(i32, i32) convergent

				; FUNC-LABEL: {{^}}ds_permute:
				; CHECK: ds_permute
				define void @ds_permute(i32 addrspace(1)* %out, i32 %index, i32 %src) nounwind {
				%bpermute = call i32 @llvm.amdgcn.ds.permute(i32 %index, i32 %src) convergent
				store i32 %bpermute, i32 addrspace(1)* %out, align 4
				ret void
				}


				; FUNC-LABEL: {{^}}ds_bpermute:
				; CHECK: ds_bpermute
				define void @ds_bpermute(i32 addrspace(1)* %out, i32 %index, i32 %src) nounwind {
				%bpermute = call i32 @llvm.amdgcn.ds.bpermute(i32 %index, i32 %src) convergent
				store i32 %bpermute, i32 addrspace(1)* %out, align 4
				ret void
				}