This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and Intrinsics.
Closed, Public

Authored by cfang on Feb 25 2016, 10:02 AM.


Details

Reviewers

tstellarAMD
arsenm

Commits

rG24f035af32ef: AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and…
rL262356: AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and…

Summary

These instructions were introduced in VI+ chips. This patch defines the instructions and introduces the intrinsics
llvm.amdgcn.ds.permute/llvm.amdgcn.ds.bpermute to expose them.
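The summary does not spell out what the two instructions do. The C++ model below is a sketch based on a reading of the GCN3 (VI) ISA description, not on anything in this patch. It assumes a 64-lane wave, an index operand that is a per-lane byte address (lane = (index / 4) mod 64, with the instruction offset fixed at 0 as the multiclass does), and that active lanes receiving no data read 0. The names `Wave`, `dsPermute`, and `dsBpermute` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumption: a wave64 machine, as on VI/GCN3 hardware.
constexpr unsigned kWaveSize = 64;

// One 32-bit VGPR viewed across every lane of the wave.
using Wave = std::array<uint32_t, kWaveSize>;

inline bool laneActive(uint64_t exec, unsigned lane) {
  return (exec >> lane) & 1;
}

// The index operand is a byte address: divide by 4, ignore the high bits.
inline unsigned laneFromIndex(uint32_t index) {
  return (index / 4) % kWaveSize;
}

// ds_permute_b32 ("forward" permute): every active lane i writes src[i]
// into lane laneFromIndex(index[i]). Lanes are visited in ascending order,
// so if several sources pick the same destination, the highest lane wins.
// Active lanes nobody writes to receive 0; inactive lanes keep oldDst.
inline Wave dsPermute(const Wave &index, const Wave &src, uint64_t exec,
                      const Wave &oldDst) {
  Wave tmp{};  // zero-filled staging buffer
  for (unsigned i = 0; i < kWaveSize; ++i)
    if (laneActive(exec, i))
      tmp[laneFromIndex(index[i])] = src[i];

  Wave dst = oldDst;
  for (unsigned i = 0; i < kWaveSize; ++i)
    if (laneActive(exec, i))
      dst[i] = tmp[i];
  return dst;
}

// ds_bpermute_b32 ("backward" permute): every active lane i reads src from
// lane laneFromIndex(index[i]); reading from an inactive lane yields 0.
// Inactive lanes keep oldDst.
inline Wave dsBpermute(const Wave &index, const Wave &src, uint64_t exec,
                       const Wave &oldDst) {
  Wave dst = oldDst;
  for (unsigned i = 0; i < kWaveSize; ++i) {
    if (!laneActive(exec, i))
      continue;
    unsigned from = laneFromIndex(index[i]);
    dst[i] = laneActive(exec, from) ? src[from] : 0;
  }
  return dst;
}
```

For example, if every lane i passes index ((i + 1) % 64) * 4, dsPermute moves lane i's value to lane i + 1, while dsBpermute gives lane i the value held by lane i + 1. Because each result depends on which other lanes are active, moving a call across divergent control flow could change what it returns, which is consistent with the patch marking the intrinsics IntrConvergent and the pseudo instruction isConvergent = 1.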

Diff Detail

Repository: rL LLVM

Event Timeline

cfang updated this revision to Diff 49084. Feb 25 2016, 10:02 AM

cfang retitled this revision to AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and Intrinsics.

cfang updated this object.

cfang added reviewers: arsenm, tstellarAMD.

cfang added subscribers: arsenm, llvm-commits.

arsenm added inline comments. Feb 25 2016, 11:13 AM

lib/Target/AMDGPU/SIInstrInfo.cpp
Lines 230–231 (on Diff #49084): Checking just offset0 should be sufficient, and can be moved above.
lib/Target/AMDGPU/VIInstructions.td
Line 131 (on Diff #49084): Are we sure these don't read M0?
test/CodeGen/AMDGPU/llvm.amdgcn.ds.permute.ll
Lines 3–4 (on Diff #49084): I would prefer splitting the 2 separate intrinsics into separate patches. These are also missing readnone (which should also use attribute groups).

tstellarAMD added inline comments. Feb 25 2016, 2:39 PM

lib/Target/AMDGPU/VIInstructions.td
Line 131 (on Diff #49084): It reads M0, but it is supposed to ignore its value, so for our purposes we can treat it as if it doesn't read M0.

Updated the patch based on Matt's review:

Check only Offset0Imm and move the check one line up.
It is safe to remove M0 from the Uses list for ds_permute/ds_bpermute.
Split the lit tests for ds_permute and ds_bpermute into separate files.

cfang added inline comments. Feb 26 2016, 9:28 AM

test/CodeGen/AMDGPU/llvm.amdgcn.ds.permute.ll
Lines 4–5 (on Diff #49202): I just split the test case. If you want to split the intrinsics and/or instruction definitions, I can do it in the integration. Thanks.

arsenm added inline comments. Feb 27 2016, 12:36 AM

lib/Target/AMDGPU/VIInstructions.td
Lines 136–139 (on Diff #49202): Why can't these be patterns on the instruction definition instead of standalone Pats?

Move the pattern for ds_permute intrinsic code generation into
the instruction definition, based on Matt's comment.

LGTM

test/CodeGen/AMDGPU/llvm.amdgcn.ds.bpermute.ll
Line 6 (on Diff #49427): Might want to check that there are 2 VGPR operands.

This revision is now accepted and ready to land. Feb 29 2016, 3:13 PM

Closed by commit rL262356: AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and… (authored by chfang). Mar 1 2016, 9:56 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path (size)

llvm/trunk/include/llvm/IR/IntrinsicsAMDGPU.td (9 lines)
llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.cpp (4 lines)
llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.td (17 lines)
llvm/trunk/lib/Target/AMDGPU/VIInstructions.td (11 lines)
llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.ds.bpermute.ll (13 lines)
llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.ds.permute.ll (13 lines)

Diff 49507

llvm/trunk/include/llvm/IR/IntrinsicsAMDGPU.td

@@ ... @@
 def int_amdgcn_s_dcache_wb_vol :
   GCCBuiltin<"__builtin_amdgcn_s_dcache_wb_vol">,
   Intrinsic<[], [], []>;

 def int_amdgcn_s_memrealtime :
   GCCBuiltin<"__builtin_amdgcn_s_memrealtime">,
   Intrinsic<[llvm_i64_ty], [], []>;

+// llvm.amdgcn.ds.permute <index> <src>
+def int_amdgcn_ds_permute :
+  Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty], [IntrNoMem, IntrConvergent]>;
+
+// llvm.amdgcn.ds.bpermute <index> <src>
+def int_amdgcn_ds_bpermute :
+  Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty], [IntrNoMem, IntrConvergent]>;
+
 }

llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.cpp

@@ ... @@ if (OffsetImm) {
       return true;
     }

     // The 2 offset instructions use offset0 and offset1 instead. We can treat
     // these as a load with a single offset if the 2 offsets are consecutive. We
     // will use this for some partially aligned loads.
     const MachineOperand *Offset0Imm = getNamedOperand(LdSt,
                                                        AMDGPU::OpName::offset0);
+    // DS_PERMUTE does not have Offset0Imm (and Offset1Imm).
+    if (!Offset0Imm)
+      return false;
+
     const MachineOperand *Offset1Imm = getNamedOperand(LdSt,
                                                        AMDGPU::OpName::offset1);

     uint8_t Offset0 = Offset0Imm->getImm();
     uint8_t Offset1 = Offset1Imm->getImm();

     if (Offset1 > Offset0 && Offset1 - Offset0 == 1) {
       // Each of these offsets is in element sized units, so we need to convert
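The early return can be illustrated outside LLVM with a small standalone function. This is a hypothetical sketch, not the SIInstrInfo code: `TwoOffsetFields` stands in for the offset0/offset1 operands that getNamedOperand() would return (std::nullopt meaning the operand does not exist, as for DS_PERMUTE_B32), and the element size is passed in as a parameter because the excerpt cuts off before showing how it is computed. Scaling offset0 to bytes follows the truncated comment about element-sized units.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the offset0/offset1 immediates; std::nullopt
// means "operand absent", which is the case for the permute instructions.
struct TwoOffsetFields {
  std::optional<uint8_t> offset0;
  std::optional<uint8_t> offset1;
};

// Returns a single byte offset for a two-offset DS access whose offsets are
// consecutive, or std::nullopt when no such offset can be formed.
// eltSize is the size of one element in bytes, supplied by the caller.
inline std::optional<int64_t> singleByteOffset(const TwoOffsetFields &f,
                                               unsigned eltSize) {
  // Per the review: checking offset0 alone is enough, and it has to happen
  // before either immediate is read.
  if (!f.offset0)
    return std::nullopt;
  assert(f.offset1 && "offset1 is expected whenever offset0 exists");

  uint8_t off0 = *f.offset0;
  uint8_t off1 = *f.offset1;
  if (off1 > off0 && off1 - off0 == 1)
    // Offsets are in element-sized units; convert to bytes.
    return static_cast<int64_t>(eltSize) * off0;
  return std::nullopt;
}
```

With eltSize = 4, offsets (2, 3) combine into byte offset 8, while (2, 4), (3, 2), or an instruction with no offset0 at all yield std::nullopt. Testing offset0 first, as suggested in review, means the permute instructions never reach the immediate reads.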

llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.td

@@ ... @@ let hasPostISelHook = 1 in {
     let data1 = 0 in {
       def _si : DS_Off16_Real_si <op, opName, outs, ins, asm>;
       def _vi : DS_Off16_Real_vi <op, opName, outs, ins, asm>;
     }
   }
 }

+multiclass DS_1A1D_PERMUTE <bits<8> op, string opName, RegisterClass rc,
+                            SDPatternOperator node = null_frag,
+                            dag outs = (outs rc:$vdst),
+                            dag ins = (ins VGPR_32:$addr, rc:$data0),
+                            string asm = opName#" $vdst, $addr, $data0"> {
+
+  let mayLoad = 0, mayStore = 0, isConvergent = 1 in {
+    def "" : DS_Pseudo <opName, outs, ins,
+      [(set (i32 rc:$vdst),
+            (node (i32 VGPR_32:$addr), (i32 rc:$data0)))]>;
+
+    let data1 = 0, offset0 = 0, offset1 = 0, gds = 0 in {
+      def "_vi" : DS_Real_vi <op, opName, outs, ins, asm>;
+    }
+  }
+}
+
 multiclass DS_1A2D_RET_m <bits<8> op, string opName, RegisterClass rc,
                           string noRetOp = "", dag ins,
                           dag outs = (outs rc:$vdst),
                           string asm = opName#" $vdst, $addr, $data0, $data1"#"$offset"#"$gds"> {

   let hasPostISelHook = 1 in {
     def "" : DS_Pseudo <opName, outs, ins, []>,
              AtomicNoRet<noRetOp, 1>;

llvm/trunk/lib/Target/AMDGPU/VIInstructions.td

@@ ... @@
 // Misc Patterns
 //===----------------------------------------------------------------------===//

 def : Pat <
   (i64 (readcyclecounter)),
   (S_MEMREALTIME)
 >;

+//===----------------------------------------------------------------------===//
+// DS_PERMUTE/DS_BPERMUTE Instructions.
+//===----------------------------------------------------------------------===//
+
+let Uses = [EXEC] in {
+defm DS_PERMUTE_B32 : DS_1A1D_PERMUTE <0x3e, "ds_permute_b32", VGPR_32,
+                                       int_amdgcn_ds_permute>;
+defm DS_BPERMUTE_B32 : DS_1A1D_PERMUTE <0x3f, "ds_bpermute_b32", VGPR_32,
+                                        int_amdgcn_ds_bpermute>;
+}
+
 } // End Predicates = [isVI]
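The `let data1 = 0, offset0 = 0, offset1 = 0, gds = 0` in DS_1A1D_PERMUTE pins every DS field the permutes do not use, so only the opcode (0x3e or 0x3f) and the three VGPR numbers vary. The sketch below packs those fields into the two 32-bit words of a VI DS instruction. The bit layout in the comments is an assumption taken from the GCN3 ISA manual's DS microcode format, not something this patch states, and `encodeDsVI`/`encodePermuteVI` are hypothetical helpers.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumed VI (GCN3) DS layout, not taken from the patch:
//   Dword 0: OFFSET0[7:0] OFFSET1[15:8] GDS[16] OP[24:17] ENCODING[31:26]
//   Dword 1: ADDR[7:0] DATA0[15:8] DATA1[23:16] VDST[31:24]
constexpr uint32_t kDsEncodingVI = 0x36;  // 0b110110

struct DsFieldsVI {
  uint8_t op = 0;
  uint8_t offset0 = 0, offset1 = 0;
  bool gds = false;
  uint8_t addr = 0, data0 = 0, data1 = 0, vdst = 0;  // VGPR numbers
};

inline std::array<uint32_t, 2> encodeDsVI(const DsFieldsVI &f) {
  uint32_t w0 = uint32_t{f.offset0} | (uint32_t{f.offset1} << 8) |
                (static_cast<uint32_t>(f.gds) << 16) |
                (uint32_t{f.op} << 17) | (kDsEncodingVI << 26);
  uint32_t w1 = uint32_t{f.addr} | (uint32_t{f.data0} << 8) |
                (uint32_t{f.data1} << 16) | (uint32_t{f.vdst} << 24);
  return {w0, w1};
}

// The permutes leave data1, both offsets and gds at 0, mirroring the
// `let` block in DS_1A1D_PERMUTE.
inline std::array<uint32_t, 2> encodePermuteVI(uint8_t op, uint8_t vdst,
                                               uint8_t addr, uint8_t data0) {
  DsFieldsVI f;
  f.op = op;
  f.vdst = vdst;
  f.addr = addr;
  f.data0 = data0;
  return encodeDsVI(f);
}
```

Under that assumed layout, `ds_permute_b32 v1, v2, v3` would encode as 0xD87C0000 0x01000302 (bytes 00 00 7c d8 02 03 00 01 in little-endian order); comparing against `llvm-mc -show-encoding` output for a VI target such as fiji would confirm or refute the layout.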

llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.ds.bpermute.ll

+; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck %s
+
+declare i32 @llvm.amdgcn.ds.bpermute(i32, i32) #0
+
+; FUNC-LABEL: {{^}}ds_bpermute:
+; CHECK: ds_bpermute_b32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
+define void @ds_bpermute(i32 addrspace(1)* %out, i32 %index, i32 %src) nounwind {
+  %bpermute = call i32 @llvm.amdgcn.ds.bpermute(i32 %index, i32 %src) #0
+  store i32 %bpermute, i32 addrspace(1)* %out, align 4
+  ret void
+}
+
+attributes #0 = { nounwind readnone convergent }

llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.ds.permute.ll

+; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck %s
+
+declare i32 @llvm.amdgcn.ds.permute(i32, i32) #0
+
+; FUNC-LABEL: {{^}}ds_permute:
+; CHECK: ds_permute_b32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
+define void @ds_permute(i32 addrspace(1)* %out, i32 %index, i32 %src) nounwind {
+  %bpermute = call i32 @llvm.amdgcn.ds.permute(i32 %index, i32 %src) #0
+  store i32 %bpermute, i32 addrspace(1)* %out, align 4
+  ret void
+}
+
+attributes #0 = { nounwind readnone convergent }