This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}
Closed · Public

Authored by mareko on Jan 2 2018, 4:25 AM.

Details

Reviewers

arsenm
nhaehnle

Commits

rG0f84e3ee0094: Merging r323908: …
rG13e4741275a1: AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}
rL324103: Merging r323908:
rL323908: AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}

Diff Detail

Build Status

Buildable 13467
Build 13467: arc lint + arc unit

Event Timeline

mareko created this revision. · Jan 2 2018, 4:25 AM

Herald added subscribers: t-tye, tpr, dstuttard and 3 others. · Jan 2 2018, 4:25 AM

arsenm added inline comments. · Jan 2 2018, 7:55 AM

include/llvm/IR/IntrinsicsAMDGPU.td
241–244	Since on some subtargets v2i16/v2f16 are legal, they should probably use that return type directly. This will require a little more work for the other subtargets in the custom lowering. Alternatively, aren't these all just pairs of convert (x / constant)? Can we just directly match that?

mareko added inline comments. · Jan 2 2018, 8:43 AM

include/llvm/IR/IntrinsicsAMDGPU.td
241–244	Not sure how to add support for v2i16, but Mesa will never need v2i16 from these intrinsics. The intrinsics are non-trivial. We are talking about 10 or so instructions when emulated.
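To give a sense of why emulating the pknorm conversions takes several operations, here is a minimal reference model in C++. It is a sketch and not code from LLVM or Mesa. All function names are hypothetical, and the rounding mode (ties away from zero) and the NaN handling (mapped to 0) are assumptions that may differ from the hardware. The only detail taken from this revision is that the first operand lands in the low half, as the test names `fneg_lo` and `fneg_hi` suggest.

```cpp
#include <cstdint>

// Hypothetical reference model, not LLVM or Mesa code.
// Assumption: NaN maps to 0, everything else is clamped to [lo, hi].
constexpr float clampf(float x, float lo, float hi) {
  return x != x ? 0.0f : (x < lo ? lo : (x > hi ? hi : x));
}

// Assumption: round to nearest with ties away from zero. The cast truncates
// toward zero, so adding or subtracting 0.5 first gives that rounding.
constexpr int32_t round_to_i32(float v) {
  return static_cast<int32_t>(v < 0.0f ? v - 0.5f : v + 0.5f);
}

// Signed normalized: clamp to [-1, 1], scale by 32767.
constexpr uint16_t snorm16(float x) {
  return static_cast<uint16_t>(round_to_i32(clampf(x, -1.0f, 1.0f) * 32767.0f));
}

// Unsigned normalized: clamp to [0, 1], scale by 65535.
constexpr uint16_t unorm16(float x) {
  return static_cast<uint16_t>(round_to_i32(clampf(x, 0.0f, 1.0f) * 65535.0f));
}

// The first operand goes to the low 16 bits, the second to the high 16 bits.
constexpr uint32_t pack16(uint16_t lo, uint16_t hi) {
  return static_cast<uint32_t>(lo) | (static_cast<uint32_t>(hi) << 16);
}

constexpr uint32_t cvt_pknorm_i16(float a, float b) {
  return pack16(snorm16(a), snorm16(b));
}

constexpr uint32_t cvt_pknorm_u16(float a, float b) {
  return pack16(unorm16(a), unorm16(b));
}
```

Each lane needs a clamp, a multiply, a rounding conversion, and a mask, and the two lanes still have to be shifted and merged, which is roughly where an instruction count of this size comes from.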

arsenm added inline comments. · Jan 2 2018, 9:01 AM

include/llvm/IR/IntrinsicsAMDGPU.td
241–244	To add support you do the same thing that ReplaceNodeResults does for amdgcn_cvt_pkrtz. For targets without legal packed types, it just replaces it with i32 and casts back
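The "replace with i32 and cast back" idea can be pictured outside of LLVM as a plain reinterpretation of the same 32 bits. The sketch below is a standalone C++ model, not SelectionDAG code. `V2I16` and both helper functions are hypothetical names, and the lane order assumes a little-endian layout where lane 0 sits in the low bits.

```cpp
#include <cstdint>

// Hypothetical stand-in for a <2 x i16> value. Lane 0 is assumed to occupy
// the low 16 bits of the equivalent i32 (little-endian layout).
struct V2I16 {
  uint16_t lane0;
  uint16_t lane1;
};

// Models the cast from the i32 node result back to the packed vector type.
constexpr V2I16 bitcast_i32_to_v2i16(uint32_t packed) {
  return V2I16{static_cast<uint16_t>(packed & 0xFFFFu),
               static_cast<uint16_t>(packed >> 16)};
}

// Models the opposite direction; no bits are changed either way.
constexpr uint32_t bitcast_v2i16_to_i32(V2I16 v) {
  return static_cast<uint32_t>(v.lane0) |
         (static_cast<uint32_t>(v.lane1) << 16);
}
```

Because the round trip is lossless, a target without legal packed types can compute the result as an i32 and only change the type that users of the value see.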

Switched the return type to v2i16.

LGTM

This revision is now accepted and ready to land. · Jan 3 2018, 10:23 AM

Closed by commit rL323908: AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16} (authored by mareko). · Jan 31 2018, 12:20 PM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: llvm-commits. · Jan 31 2018, 12:20 PM

Revision Contents

Path	Size
include/llvm/IR/IntrinsicsAMDGPU.td	20 lines
lib/Target/AMDGPU/AMDGPUISelLowering.h	4 lines
lib/Target/AMDGPU/AMDGPUISelLowering.cpp	4 lines
lib/Target/AMDGPU/AMDGPUInstrInfo.td	8 lines
lib/Target/AMDGPU/SIISelLowering.cpp	12 lines
lib/Target/AMDGPU/VOP2Instructions.td	8 lines
lib/Transforms/InstCombine/InstCombineCalls.cpp	12 lines
test/CodeGen/AMDGPU/llvm.amdgcn.cvt.pk.i16.ll	79 lines
test/CodeGen/AMDGPU/llvm.amdgcn.cvt.pk.u16.ll	79 lines
test/CodeGen/AMDGPU/llvm.amdgcn.cvt.pknorm.i16.ll	155 lines
test/CodeGen/AMDGPU/llvm.amdgcn.cvt.pknorm.u16.ll	155 lines
test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll	108 lines

Diff 128410

include/llvm/IR/IntrinsicsAMDGPU.td

@@ def int_amdgcn_fract : Intrinsic<
   [llvm_anyfloat_ty], [LLVMMatchType<0>], [IntrNoMem, IntrSpeculatable]
 >;

 def int_amdgcn_cvt_pkrtz : Intrinsic<
   [llvm_v2f16_ty], [llvm_float_ty, llvm_float_ty],
   [IntrNoMem, IntrSpeculatable]
 >;

+def int_amdgcn_cvt_pknorm_i16 : Intrinsic<
+  [llvm_i32_ty], [llvm_float_ty, llvm_float_ty],
+  [IntrNoMem, IntrSpeculatable]
+>;
+
+def int_amdgcn_cvt_pknorm_u16 : Intrinsic<
+  [llvm_i32_ty], [llvm_float_ty, llvm_float_ty],
+  [IntrNoMem, IntrSpeculatable]
+>;
+
+def int_amdgcn_cvt_pk_i16 : Intrinsic<
+  [llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty],
+  [IntrNoMem, IntrSpeculatable]
+>;
+
+def int_amdgcn_cvt_pk_u16 : Intrinsic<
+  [llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty],
+  [IntrNoMem, IntrSpeculatable]
+>;
+
 def int_amdgcn_class : Intrinsic<
   [llvm_i1_ty], [llvm_anyfloat_ty, llvm_i32_ty],
   [IntrNoMem, IntrSpeculatable]
 >;

 def int_amdgcn_fmed3 : GCCBuiltin<"__builtin_amdgcn_fmed3">,
   Intrinsic<[llvm_anyfloat_ty],
     [LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>],
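The two integer intrinsics in this file take two i32 operands and return a single i32 holding two 16-bit lanes. As a rough illustration of that shape, here is a standalone C++ model. It is not LLVM code and the function names are hypothetical. It assumes that out-of-range inputs saturate to the 16-bit range and that the first operand occupies the low half; neither detail is stated in this revision.

```cpp
#include <cstdint>

// Assumption: signed inputs saturate to [-32768, 32767] before packing.
constexpr uint16_t sat_i16(int32_t x) {
  return static_cast<uint16_t>(x < -32768 ? -32768 : (x > 32767 ? 32767 : x));
}

// Assumption: unsigned inputs saturate to [0, 65535] before packing.
constexpr uint16_t sat_u16(uint32_t x) {
  return static_cast<uint16_t>(x > 65535u ? 65535u : x);
}

// Assumption: first operand in the low 16 bits, second in the high 16 bits.
constexpr uint32_t pack_lanes(uint16_t lo, uint16_t hi) {
  return static_cast<uint32_t>(lo) | (static_cast<uint32_t>(hi) << 16);
}

// Hypothetical model of llvm.amdgcn.cvt.pk.i16(i32, i32).
constexpr uint32_t cvt_pk_i16(int32_t a, int32_t b) {
  return pack_lanes(sat_i16(a), sat_i16(b));
}

// Hypothetical model of llvm.amdgcn.cvt.pk.u16(i32, i32).
constexpr uint32_t cvt_pk_u16(uint32_t a, uint32_t b) {
  return pack_lanes(sat_u16(a), sat_u16(b));
}
```

For example, under these assumptions `cvt_pk_i16(1, -1)` yields `0xFFFF0001`, with 1 in the low lane and -1 (as `0xFFFF`) in the high lane.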

lib/Target/AMDGPU/AMDGPUISelLowering.h

@@ enum NodeType : unsigned {
   CVT_F32_UBYTE0,
   CVT_F32_UBYTE1,
   CVT_F32_UBYTE2,
   CVT_F32_UBYTE3,

   // Convert two float 32 numbers into a single register holding two packed f16
   // with round to zero.
   CVT_PKRTZ_F16_F32,
+  CVT_PKNORM_I16_F32,
+  CVT_PKNORM_U16_F32,
+  CVT_PK_I16_I32,
+  CVT_PK_U16_U32,

   // Same as the standard node, except the high bits of the resulting integer
   // are known 0.
   FP_TO_FP16,

   // Wrapper around fp16 results that are known to zero the high bits.
   FP16_ZEXT,

lib/Target/AMDGPU/AMDGPUISelLowering.cpp

@@ const char* AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const {
   NODE_NAME_CASE(SAMPLEB)
   NODE_NAME_CASE(SAMPLED)
   NODE_NAME_CASE(SAMPLEL)
   NODE_NAME_CASE(CVT_F32_UBYTE0)
   NODE_NAME_CASE(CVT_F32_UBYTE1)
   NODE_NAME_CASE(CVT_F32_UBYTE2)
   NODE_NAME_CASE(CVT_F32_UBYTE3)
   NODE_NAME_CASE(CVT_PKRTZ_F16_F32)
+  NODE_NAME_CASE(CVT_PKNORM_I16_F32)
+  NODE_NAME_CASE(CVT_PKNORM_U16_F32)
+  NODE_NAME_CASE(CVT_PK_I16_I32)
+  NODE_NAME_CASE(CVT_PK_U16_U32)
   NODE_NAME_CASE(FP_TO_FP16)
   NODE_NAME_CASE(FP16_ZEXT)
   NODE_NAME_CASE(BUILD_VERTICAL_VECTOR)
   NODE_NAME_CASE(CONST_DATA_PTR)
   NODE_NAME_CASE(PC_ADD_REL_OFFSET)
   NODE_NAME_CASE(KILL)
   NODE_NAME_CASE(DUMMY_CHAIN)
   case AMDGPUISD::FIRST_MEM_OPCODE_NUMBER: break;

lib/Target/AMDGPU/AMDGPUInstrInfo.td

@@
 def AMDGPUFPClassOp : SDTypeProfile<1, 2,
   [SDTCisInt<0>, SDTCisFP<1>, SDTCisInt<2>]
 >;

 def AMDGPUFPPackOp : SDTypeProfile<1, 2,
   [SDTCisFP<1>, SDTCisSameAs<1, 2>]
 >;

+def AMDGPUIntPackOp : SDTypeProfile<1, 2,
+  [SDTCisInt<1>, SDTCisSameAs<1, 2>]
+>;
+
 def AMDGPUDivScaleOp : SDTypeProfile<2, 3,
   [SDTCisFP<0>, SDTCisInt<1>, SDTCisSameAs<0, 2>, SDTCisSameAs<0, 3>, SDTCisSameAs<0, 4>]
 >;

 // float, float, float, vcc
 def AMDGPUFmasOp : SDTypeProfile<1, 4,
   [SDTCisFP<0>, SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisSameAs<0, 3>, SDTCisInt<4>]
 >;
@@
 def AMDGPUrsq_legacy : SDNode<"AMDGPUISD::RSQ_LEGACY", SDTFPUnaryOp>;

 // out = 1.0 / sqrt(a) result clamped to +/- max_float.
 def AMDGPUrsq_clamp : SDNode<"AMDGPUISD::RSQ_CLAMP", SDTFPUnaryOp>;

 def AMDGPUldexp : SDNode<"AMDGPUISD::LDEXP", AMDGPULdExpOp>;

 def AMDGPUpkrtz_f16_f32 : SDNode<"AMDGPUISD::CVT_PKRTZ_F16_F32", AMDGPUFPPackOp>;
+def AMDGPUpknorm_i16_f32 : SDNode<"AMDGPUISD::CVT_PKNORM_I16_F32", AMDGPUFPPackOp>;
+def AMDGPUpknorm_u16_f32 : SDNode<"AMDGPUISD::CVT_PKNORM_U16_F32", AMDGPUFPPackOp>;
+def AMDGPUpk_i16_i32 : SDNode<"AMDGPUISD::CVT_PK_I16_I32", AMDGPUIntPackOp>;
+def AMDGPUpk_u16_u32 : SDNode<"AMDGPUISD::CVT_PK_U16_U32", AMDGPUIntPackOp>;
 def AMDGPUfp_to_f16 : SDNode<"AMDGPUISD::FP_TO_FP16" , SDTFPToIntOp>;
 def AMDGPUfp16_zext : SDNode<"AMDGPUISD::FP16_ZEXT" , SDTFPToIntOp>;


 def AMDGPUfp_class : SDNode<"AMDGPUISD::FP_CLASS", AMDGPUFPClassOp>;

 // out = max(a, b) a and b are floats, where a nan comparison fails.
 // This is not commutative because this gives the second operand:

lib/Target/AMDGPU/SIISelLowering.cpp

@@ return DAG.getNode(AMDGPUISD::BFE_U32, DL, VT,
                        Op.getOperand(1), Op.getOperand(2), Op.getOperand(3));
   case Intrinsic::amdgcn_cvt_pkrtz: {
     // FIXME: Stop adding cast if v2f16 legal.
     EVT VT = Op.getValueType();
     SDValue Node = DAG.getNode(AMDGPUISD::CVT_PKRTZ_F16_F32, DL, MVT::i32,
                                Op.getOperand(1), Op.getOperand(2));
     return DAG.getNode(ISD::BITCAST, DL, VT, Node);
   }
+  case Intrinsic::amdgcn_cvt_pknorm_i16:
+    return DAG.getNode(AMDGPUISD::CVT_PKNORM_I16_F32, DL, MVT::i32,
+                       Op.getOperand(1), Op.getOperand(2));
+  case Intrinsic::amdgcn_cvt_pknorm_u16:
+    return DAG.getNode(AMDGPUISD::CVT_PKNORM_U16_F32, DL, MVT::i32,
+                       Op.getOperand(1), Op.getOperand(2));
+  case Intrinsic::amdgcn_cvt_pk_i16:
+    return DAG.getNode(AMDGPUISD::CVT_PK_I16_I32, DL, MVT::i32,
+                       Op.getOperand(1), Op.getOperand(2));
+  case Intrinsic::amdgcn_cvt_pk_u16:
+    return DAG.getNode(AMDGPUISD::CVT_PK_U16_U32, DL, MVT::i32,
+                       Op.getOperand(1), Op.getOperand(2));
   case Intrinsic::amdgcn_wqm: {
     SDValue Src = Op.getOperand(1);
     return SDValue(DAG.getMachineNode(AMDGPU::WQM, DL, Src.getValueType(), Src),
                    0);
   }
   case Intrinsic::amdgcn_wwm: {
     SDValue Src = Op.getOperand(1);
     return SDValue(DAG.getMachineNode(AMDGPU::WWM, DL, Src.getValueType(), Src),

lib/Target/AMDGPU/VOP2Instructions.td

@@
 } // End isConvergent = 1

 defm V_BFM_B32 : VOP2Inst <"v_bfm_b32", VOP_NO_EXT<VOP_I32_I32_I32>>;
 defm V_BCNT_U32_B32 : VOP2Inst <"v_bcnt_u32_b32", VOP_NO_EXT<VOP_I32_I32_I32>>;
 defm V_MBCNT_LO_U32_B32 : VOP2Inst <"v_mbcnt_lo_u32_b32", VOP_NO_EXT<VOP_I32_I32_I32>, int_amdgcn_mbcnt_lo>;
 defm V_MBCNT_HI_U32_B32 : VOP2Inst <"v_mbcnt_hi_u32_b32", VOP_NO_EXT<VOP_I32_I32_I32>, int_amdgcn_mbcnt_hi>;
 defm V_LDEXP_F32 : VOP2Inst <"v_ldexp_f32", VOP_NO_EXT<VOP_F32_F32_I32>, AMDGPUldexp>;
 defm V_CVT_PKACCUM_U8_F32 : VOP2Inst <"v_cvt_pkaccum_u8_f32", VOP_NO_EXT<VOP_I32_F32_I32>>; // TODO: set "Uses = dst"
-defm V_CVT_PKNORM_I16_F32 : VOP2Inst <"v_cvt_pknorm_i16_f32", VOP_NO_EXT<VOP_I32_F32_F32>>;
-defm V_CVT_PKNORM_U16_F32 : VOP2Inst <"v_cvt_pknorm_u16_f32", VOP_NO_EXT<VOP_I32_F32_F32>>;
+defm V_CVT_PKNORM_I16_F32 : VOP2Inst <"v_cvt_pknorm_i16_f32", VOP_NO_EXT<VOP_I32_F32_F32>, AMDGPUpknorm_i16_f32>;
+defm V_CVT_PKNORM_U16_F32 : VOP2Inst <"v_cvt_pknorm_u16_f32", VOP_NO_EXT<VOP_I32_F32_F32>, AMDGPUpknorm_u16_f32>;
 defm V_CVT_PKRTZ_F16_F32 : VOP2Inst <"v_cvt_pkrtz_f16_f32", VOP_NO_EXT<VOP_I32_F32_F32>, AMDGPUpkrtz_f16_f32>;
-defm V_CVT_PK_U16_U32 : VOP2Inst <"v_cvt_pk_u16_u32", VOP_NO_EXT<VOP_I32_I32_I32>>;
-defm V_CVT_PK_I16_I32 : VOP2Inst <"v_cvt_pk_i16_i32", VOP_NO_EXT<VOP_I32_I32_I32>>;
+defm V_CVT_PK_U16_U32 : VOP2Inst <"v_cvt_pk_u16_u32", VOP_NO_EXT<VOP_I32_I32_I32>, AMDGPUpk_u16_u32>;
+defm V_CVT_PK_I16_I32 : VOP2Inst <"v_cvt_pk_i16_i32", VOP_NO_EXT<VOP_I32_I32_I32>, AMDGPUpk_i16_i32>;

 } // End SubtargetPredicate = isGCN

 def : GCNPat<
   (AMDGPUadde i32:$src0, i32:$src1, i1:$src2),
   (V_ADDC_U32_e64 $src0, $src1, $src2)
 >;

lib/Transforms/InstCombine/InstCombineCalls.cpp

@@ if (const ConstantFP *C0 = dyn_cast<ConstantFP>(Src0)) {
       }
     }

     if (isa<UndefValue>(Src0) && isa<UndefValue>(Src1))
       return replaceInstUsesWith(*II, UndefValue::get(II->getType()));

     break;
   }
+  case Intrinsic::amdgcn_cvt_pknorm_i16:
+  case Intrinsic::amdgcn_cvt_pknorm_u16:
+  case Intrinsic::amdgcn_cvt_pk_i16:
+  case Intrinsic::amdgcn_cvt_pk_u16: {
+    Value *Src0 = II->getArgOperand(0);
+    Value *Src1 = II->getArgOperand(1);
+
+    if (isa<UndefValue>(Src0) && isa<UndefValue>(Src1))
+      return replaceInstUsesWith(*II, UndefValue::get(II->getType()));
+
+    break;
+  }
   case Intrinsic::amdgcn_ubfe:
   case Intrinsic::amdgcn_sbfe: {
     // Decompose simple cases into standard shifts.
     Value *Src = II->getArgOperand(0);
     if (isa<UndefValue>(Src))
       return replaceInstUsesWith(*II, Src);

     unsigned Width;

test/CodeGen/AMDGPU/llvm.amdgcn.cvt.pk.i16.ll

This file was added.

				; RUN: llc -march=amdgcn -mcpu=tahiti -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=SI %s
				; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=SI %s
				; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=VI %s
				; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=VI %s

				; GCN-LABEL: {{^}}s_cvt_pk_i16_i32:
				; GCN-DAG: s_load_dword [[X:s[0-9]+]], s[0:1], 0x{{b|2c}}
				; GCN-DAG: s_load_dword [[SY:s[0-9]+]], s[0:1], 0x{{c|30}}
				; GCN: v_mov_b32_e32 [[VY:v[0-9]+]], [[SY]]
				; SI: v_cvt_pk_i16_i32_e32 v{{[0-9]+}}, [[X]], [[VY]]
				; VI: v_cvt_pk_i16_i32 v{{[0-9]+}}, [[X]], [[VY]]
				define amdgpu_kernel void @s_cvt_pk_i16_i32(i32 addrspace(1)* %out, i32 %x, i32 %y) #0 {
				%result = call i32 @llvm.amdgcn.cvt.pk.i16(i32 %x, i32 %y)
				store i32 %result, i32 addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}s_cvt_pk_i16_samereg_i32:
				; GCN: s_load_dword [[X:s[0-9]+]]
				; GCN: v_cvt_pk_i16_i32{{(_e64)*}} v{{[0-9]+}}, [[X]], [[X]]
				define amdgpu_kernel void @s_cvt_pk_i16_samereg_i32(i32 addrspace(1)* %out, i32 %x) #0 {
				%result = call i32 @llvm.amdgcn.cvt.pk.i16(i32 %x, i32 %x)
				store i32 %result, i32 addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}v_cvt_pk_i16_i32:
				; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
				; GCN: {{buffer|flat|global}}_load_dword [[B:v[0-9]+]]
				; SI: v_cvt_pk_i16_i32_e32 v{{[0-9]+}}, [[A]], [[B]]
				; VI: v_cvt_pk_i16_i32 v{{[0-9]+}}, [[A]], [[B]]
				define amdgpu_kernel void @v_cvt_pk_i16_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %a.ptr, i32 addrspace(1)* %b.ptr) #0 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%tid.ext = sext i32 %tid to i64
				%a.gep = getelementptr inbounds i32, i32 addrspace(1)* %a.ptr, i64 %tid.ext
				%b.gep = getelementptr inbounds i32, i32 addrspace(1)* %b.ptr, i64 %tid.ext
				%out.gep = getelementptr inbounds i32, i32 addrspace(1)* %out, i64 %tid.ext
				%a = load volatile i32, i32 addrspace(1)* %a.gep
				%b = load volatile i32, i32 addrspace(1)* %b.gep
				%cvt = call i32 @llvm.amdgcn.cvt.pk.i16(i32 %a, i32 %b)
				store i32 %cvt, i32 addrspace(1)* %out.gep
				ret void
				}

				; GCN-LABEL: {{^}}v_cvt_pk_i16_i32_reg_imm:
				; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
				; GCN: v_cvt_pk_i16_i32{{(_e64)*}} v{{[0-9]+}}, [[A]], 1
				define amdgpu_kernel void @v_cvt_pk_i16_i32_reg_imm(i32 addrspace(1)* %out, i32 addrspace(1)* %a.ptr) #0 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%tid.ext = sext i32 %tid to i64
				%a.gep = getelementptr inbounds i32, i32 addrspace(1)* %a.ptr, i64 %tid.ext
				%out.gep = getelementptr inbounds i32, i32 addrspace(1)* %out, i64 %tid.ext
				%a = load volatile i32, i32 addrspace(1)* %a.gep
				%cvt = call i32 @llvm.amdgcn.cvt.pk.i16(i32 %a, i32 1)
				store i32 %cvt, i32 addrspace(1)* %out.gep
				ret void
				}

				; GCN-LABEL: {{^}}v_cvt_pk_i16_i32_imm_reg:
				; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
				; SI: v_cvt_pk_i16_i32_e32 v{{[0-9]+}}, 1, [[A]]
				; VI: v_cvt_pk_i16_i32 v{{[0-9]+}}, 1, [[A]]
				define amdgpu_kernel void @v_cvt_pk_i16_i32_imm_reg(i32 addrspace(1)* %out, i32 addrspace(1)* %a.ptr) #0 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%tid.ext = sext i32 %tid to i64
				%a.gep = getelementptr inbounds i32, i32 addrspace(1)* %a.ptr, i64 %tid.ext
				%out.gep = getelementptr inbounds i32, i32 addrspace(1)* %out, i64 %tid.ext
				%a = load volatile i32, i32 addrspace(1)* %a.gep
				%cvt = call i32 @llvm.amdgcn.cvt.pk.i16(i32 1, i32 %a)
				store i32 %cvt, i32 addrspace(1)* %out.gep
				ret void
				}

				declare i32 @llvm.amdgcn.cvt.pk.i16(i32, i32) #1
				declare i32 @llvm.amdgcn.workitem.id.x() #1


				attributes #0 = { nounwind }
				attributes #1 = { nounwind readnone }

test/CodeGen/AMDGPU/llvm.amdgcn.cvt.pk.u16.ll

This file was added.

				; RUN: llc -march=amdgcn -mcpu=tahiti -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=SI %s
				; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=SI %s
				; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=VI %s
				; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=VI %s

				; GCN-LABEL: {{^}}s_cvt_pk_u16_u32:
				; GCN-DAG: s_load_dword [[X:s[0-9]+]], s[0:1], 0x{{b|2c}}
				; GCN-DAG: s_load_dword [[SY:s[0-9]+]], s[0:1], 0x{{c|30}}
				; GCN: v_mov_b32_e32 [[VY:v[0-9]+]], [[SY]]
				; SI: v_cvt_pk_u16_u32_e32 v{{[0-9]+}}, [[X]], [[VY]]
				; VI: v_cvt_pk_u16_u32 v{{[0-9]+}}, [[X]], [[VY]]
				define amdgpu_kernel void @s_cvt_pk_u16_u32(i32 addrspace(1)* %out, i32 %x, i32 %y) #0 {
				%result = call i32 @llvm.amdgcn.cvt.pk.u16(i32 %x, i32 %y)
				store i32 %result, i32 addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}s_cvt_pk_u16_samereg_i32:
				; GCN: s_load_dword [[X:s[0-9]+]]
				; GCN: v_cvt_pk_u16_u32{{(_e64)*}} v{{[0-9]+}}, [[X]], [[X]]
				define amdgpu_kernel void @s_cvt_pk_u16_samereg_i32(i32 addrspace(1)* %out, i32 %x) #0 {
				%result = call i32 @llvm.amdgcn.cvt.pk.u16(i32 %x, i32 %x)
				store i32 %result, i32 addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}v_cvt_pk_u16_u32:
				; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
				; GCN: {{buffer|flat|global}}_load_dword [[B:v[0-9]+]]
				; SI: v_cvt_pk_u16_u32_e32 v{{[0-9]+}}, [[A]], [[B]]
				; VI: v_cvt_pk_u16_u32 v{{[0-9]+}}, [[A]], [[B]]
				define amdgpu_kernel void @v_cvt_pk_u16_u32(i32 addrspace(1)* %out, i32 addrspace(1)* %a.ptr, i32 addrspace(1)* %b.ptr) #0 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%tid.ext = sext i32 %tid to i64
				%a.gep = getelementptr inbounds i32, i32 addrspace(1)* %a.ptr, i64 %tid.ext
				%b.gep = getelementptr inbounds i32, i32 addrspace(1)* %b.ptr, i64 %tid.ext
				%out.gep = getelementptr inbounds i32, i32 addrspace(1)* %out, i64 %tid.ext
				%a = load volatile i32, i32 addrspace(1)* %a.gep
				%b = load volatile i32, i32 addrspace(1)* %b.gep
				%cvt = call i32 @llvm.amdgcn.cvt.pk.u16(i32 %a, i32 %b)
				store i32 %cvt, i32 addrspace(1)* %out.gep
				ret void
				}

				; GCN-LABEL: {{^}}v_cvt_pk_u16_u32_reg_imm:
				; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
				; GCN: v_cvt_pk_u16_u32{{(_e64)*}} v{{[0-9]+}}, [[A]], 1
				define amdgpu_kernel void @v_cvt_pk_u16_u32_reg_imm(i32 addrspace(1)* %out, i32 addrspace(1)* %a.ptr) #0 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%tid.ext = sext i32 %tid to i64
				%a.gep = getelementptr inbounds i32, i32 addrspace(1)* %a.ptr, i64 %tid.ext
				%out.gep = getelementptr inbounds i32, i32 addrspace(1)* %out, i64 %tid.ext
				%a = load volatile i32, i32 addrspace(1)* %a.gep
				%cvt = call i32 @llvm.amdgcn.cvt.pk.u16(i32 %a, i32 1)
				store i32 %cvt, i32 addrspace(1)* %out.gep
				ret void
				}

				; GCN-LABEL: {{^}}v_cvt_pk_u16_u32_imm_reg:
				; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
				; SI: v_cvt_pk_u16_u32_e32 v{{[0-9]+}}, 1, [[A]]
				; VI: v_cvt_pk_u16_u32 v{{[0-9]+}}, 1, [[A]]
				define amdgpu_kernel void @v_cvt_pk_u16_u32_imm_reg(i32 addrspace(1)* %out, i32 addrspace(1)* %a.ptr) #0 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%tid.ext = sext i32 %tid to i64
				%a.gep = getelementptr inbounds i32, i32 addrspace(1)* %a.ptr, i64 %tid.ext
				%out.gep = getelementptr inbounds i32, i32 addrspace(1)* %out, i64 %tid.ext
				%a = load volatile i32, i32 addrspace(1)* %a.gep
				%cvt = call i32 @llvm.amdgcn.cvt.pk.u16(i32 1, i32 %a)
				store i32 %cvt, i32 addrspace(1)* %out.gep
				ret void
				}

				declare i32 @llvm.amdgcn.cvt.pk.u16(i32, i32) #1
				declare i32 @llvm.amdgcn.workitem.id.x() #1


				attributes #0 = { nounwind }
				attributes #1 = { nounwind readnone }

test/CodeGen/AMDGPU/llvm.amdgcn.cvt.pknorm.i16.ll

This file was added.

				; RUN: llc -march=amdgcn -mcpu=tahiti -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=SI %s
				; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=SI %s
				; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=VI %s
				; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=VI %s

				; GCN-LABEL: {{^}}s_cvt_pknorm_i16_f32:
				; GCN-DAG: s_load_dword [[X:s[0-9]+]], s[0:1], 0x{{b|2c}}
				; GCN-DAG: s_load_dword [[SY:s[0-9]+]], s[0:1], 0x{{c|30}}
				; GCN: v_mov_b32_e32 [[VY:v[0-9]+]], [[SY]]
				; SI: v_cvt_pknorm_i16_f32_e32 v{{[0-9]+}}, [[X]], [[VY]]
				; VI: v_cvt_pknorm_i16_f32 v{{[0-9]+}}, [[X]], [[VY]]
				define amdgpu_kernel void @s_cvt_pknorm_i16_f32(i32 addrspace(1)* %out, float %x, float %y) #0 {
				%result = call i32 @llvm.amdgcn.cvt.pknorm.i16(float %x, float %y)
				store i32 %result, i32 addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}s_cvt_pknorm_i16_samereg_f32:
				; GCN: s_load_dword [[X:s[0-9]+]]
				; GCN: v_cvt_pknorm_i16_f32{{(_e64)*}} v{{[0-9]+}}, [[X]], [[X]]
				define amdgpu_kernel void @s_cvt_pknorm_i16_samereg_f32(i32 addrspace(1)* %out, float %x) #0 {
				%result = call i32 @llvm.amdgcn.cvt.pknorm.i16(float %x, float %x)
				store i32 %result, i32 addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}v_cvt_pknorm_i16_f32:
				; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
				; GCN: {{buffer|flat|global}}_load_dword [[B:v[0-9]+]]
				; SI: v_cvt_pknorm_i16_f32_e32 v{{[0-9]+}}, [[A]], [[B]]
				; VI: v_cvt_pknorm_i16_f32 v{{[0-9]+}}, [[A]], [[B]]
				define amdgpu_kernel void @v_cvt_pknorm_i16_f32(i32 addrspace(1)* %out, float addrspace(1)* %a.ptr, float addrspace(1)* %b.ptr) #0 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%tid.ext = sext i32 %tid to i64
				%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
				%b.gep = getelementptr inbounds float, float addrspace(1)* %b.ptr, i64 %tid.ext
				%out.gep = getelementptr inbounds i32, i32 addrspace(1)* %out, i64 %tid.ext
				%a = load volatile float, float addrspace(1)* %a.gep
				%b = load volatile float, float addrspace(1)* %b.gep
				%cvt = call i32 @llvm.amdgcn.cvt.pknorm.i16(float %a, float %b)
				store i32 %cvt, i32 addrspace(1)* %out.gep
				ret void
				}

				; GCN-LABEL: {{^}}v_cvt_pknorm_i16_f32_reg_imm:
				; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
				; GCN: v_cvt_pknorm_i16_f32{{(_e64)*}} v{{[0-9]+}}, [[A]], 1.0
				define amdgpu_kernel void @v_cvt_pknorm_i16_f32_reg_imm(i32 addrspace(1)* %out, float addrspace(1)* %a.ptr) #0 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%tid.ext = sext i32 %tid to i64
				%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
				%out.gep = getelementptr inbounds i32, i32 addrspace(1)* %out, i64 %tid.ext
				%a = load volatile float, float addrspace(1)* %a.gep
				%cvt = call i32 @llvm.amdgcn.cvt.pknorm.i16(float %a, float 1.0)
				store i32 %cvt, i32 addrspace(1)* %out.gep
				ret void
				}

				; GCN-LABEL: {{^}}v_cvt_pknorm_i16_f32_imm_reg:
				; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
				; SI: v_cvt_pknorm_i16_f32_e32 v{{[0-9]+}}, 1.0, [[A]]
				; VI: v_cvt_pknorm_i16_f32 v{{[0-9]+}}, 1.0, [[A]]
				define amdgpu_kernel void @v_cvt_pknorm_i16_f32_imm_reg(i32 addrspace(1)* %out, float addrspace(1)* %a.ptr) #0 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%tid.ext = sext i32 %tid to i64
				%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
				%out.gep = getelementptr inbounds i32, i32 addrspace(1)* %out, i64 %tid.ext
				%a = load volatile float, float addrspace(1)* %a.gep
				%cvt = call i32 @llvm.amdgcn.cvt.pknorm.i16(float 1.0, float %a)
				store i32 %cvt, i32 addrspace(1)* %out.gep
				ret void
				}

				; GCN-LABEL: {{^}}v_cvt_pknorm_i16_f32_fneg_lo:
				; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
				; GCN: {{buffer|flat|global}}_load_dword [[B:v[0-9]+]]
				; GCN: v_cvt_pknorm_i16_f32{{(_e64)*}} v{{[0-9]+}}, -[[A]], [[B]]
				define amdgpu_kernel void @v_cvt_pknorm_i16_f32_fneg_lo(i32 addrspace(1)* %out, float addrspace(1)* %a.ptr, float addrspace(1)* %b.ptr) #0 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%tid.ext = sext i32 %tid to i64
				%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
				%b.gep = getelementptr inbounds float, float addrspace(1)* %b.ptr, i64 %tid.ext
				%out.gep = getelementptr inbounds i32, i32 addrspace(1)* %out, i64 %tid.ext
				%a = load volatile float, float addrspace(1)* %a.gep
				%b = load volatile float, float addrspace(1)* %b.gep
				%neg.a = fsub float -0.0, %a
				%cvt = call i32 @llvm.amdgcn.cvt.pknorm.i16(float %neg.a, float %b)
				store i32 %cvt, i32 addrspace(1)* %out.gep
				ret void
				}

				; GCN-LABEL: {{^}}v_cvt_pknorm_i16_f32_fneg_hi:
				; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
				; GCN: {{buffer|flat|global}}_load_dword [[B:v[0-9]+]]
				; GCN: v_cvt_pknorm_i16_f32{{(_e64)*}} v{{[0-9]+}}, [[A]], -[[B]]
				define amdgpu_kernel void @v_cvt_pknorm_i16_f32_fneg_hi(i32 addrspace(1)* %out, float addrspace(1)* %a.ptr, float addrspace(1)* %b.ptr) #0 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%tid.ext = sext i32 %tid to i64
				%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
				%b.gep = getelementptr inbounds float, float addrspace(1)* %b.ptr, i64 %tid.ext
				%out.gep = getelementptr inbounds i32, i32 addrspace(1)* %out, i64 %tid.ext
				%a = load volatile float, float addrspace(1)* %a.gep
				%b = load volatile float, float addrspace(1)* %b.gep
				%neg.b = fsub float -0.0, %b
				%cvt = call i32 @llvm.amdgcn.cvt.pknorm.i16(float %a, float %neg.b)
				store i32 %cvt, i32 addrspace(1)* %out.gep
				ret void
				}

				; GCN-LABEL: {{^}}v_cvt_pknorm_i16_f32_fneg_lo_hi:
				; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
				; GCN: {{buffer|flat|global}}_load_dword [[B:v[0-9]+]]
				; GCN: v_cvt_pknorm_i16_f32{{(_e64)*}} v{{[0-9]+}}, -[[A]], -[[B]]
				define amdgpu_kernel void @v_cvt_pknorm_i16_f32_fneg_lo_hi(i32 addrspace(1)* %out, float addrspace(1)* %a.ptr, float addrspace(1)* %b.ptr) #0 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%tid.ext = sext i32 %tid to i64
				%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
				%b.gep = getelementptr inbounds float, float addrspace(1)* %b.ptr, i64 %tid.ext
				%out.gep = getelementptr inbounds i32, i32 addrspace(1)* %out, i64 %tid.ext
				%a = load volatile float, float addrspace(1)* %a.gep
				%b = load volatile float, float addrspace(1)* %b.gep
				%neg.a = fsub float -0.0, %a
				%neg.b = fsub float -0.0, %b
				%cvt = call i32 @llvm.amdgcn.cvt.pknorm.i16(float %neg.a, float %neg.b)
				store i32 %cvt, i32 addrspace(1)* %out.gep
				ret void
				}

				; GCN-LABEL: {{^}}v_cvt_pknorm_i16_f32_fneg_fabs_lo_fneg_hi:
				; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
				; GCN: {{buffer|flat|global}}_load_dword [[B:v[0-9]+]]
				; GCN: v_cvt_pknorm_i16_f32{{(_e64)*}} v{{[0-9]+}}, -|[[A]]|, -[[B]]
				define amdgpu_kernel void @v_cvt_pknorm_i16_f32_fneg_fabs_lo_fneg_hi(i32 addrspace(1)* %out, float addrspace(1)* %a.ptr, float addrspace(1)* %b.ptr) #0 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%tid.ext = sext i32 %tid to i64
				%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
				%b.gep = getelementptr inbounds float, float addrspace(1)* %b.ptr, i64 %tid.ext
				%out.gep = getelementptr inbounds i32, i32 addrspace(1)* %out, i64 %tid.ext
				%a = load volatile float, float addrspace(1)* %a.gep
				%b = load volatile float, float addrspace(1)* %b.gep
				%fabs.a = call float @llvm.fabs.f32(float %a)
				%neg.fabs.a = fsub float -0.0, %fabs.a
				%neg.b = fsub float -0.0, %b
				%cvt = call i32 @llvm.amdgcn.cvt.pknorm.i16(float %neg.fabs.a, float %neg.b)
				store i32 %cvt, i32 addrspace(1)* %out.gep
				ret void
				}

				declare i32 @llvm.amdgcn.cvt.pknorm.i16(float, float) #1
				declare float @llvm.fabs.f32(float) #1
				declare i32 @llvm.amdgcn.workitem.id.x() #1


				attributes #0 = { nounwind }
				attributes #1 = { nounwind readnone }

test/CodeGen/AMDGPU/llvm.amdgcn.cvt.pknorm.u16.ll

This file was added.

				; RUN: llc -march=amdgcn -mcpu=tahiti -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=SI %s
				; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=SI %s
				; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=VI %s
				; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=VI %s

				; GCN-LABEL: {{^}}s_cvt_pknorm_u16_f32:
				; GCN-DAG: s_load_dword [[X:s[0-9]+]], s[0:1], 0x{{b|2c}}
				; GCN-DAG: s_load_dword [[SY:s[0-9]+]], s[0:1], 0x{{c|30}}
				; GCN: v_mov_b32_e32 [[VY:v[0-9]+]], [[SY]]
				; SI: v_cvt_pknorm_u16_f32_e32 v{{[0-9]+}}, [[X]], [[VY]]
				; VI: v_cvt_pknorm_u16_f32 v{{[0-9]+}}, [[X]], [[VY]]
				define amdgpu_kernel void @s_cvt_pknorm_u16_f32(i32 addrspace(1)* %out, float %x, float %y) #0 {
				%result = call i32 @llvm.amdgcn.cvt.pknorm.u16(float %x, float %y)
				store i32 %result, i32 addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}s_cvt_pknorm_u16_samereg_f32:
				; GCN: s_load_dword [[X:s[0-9]+]]
				; GCN: v_cvt_pknorm_u16_f32{{(_e64)*}} v{{[0-9]+}}, [[X]], [[X]]
				define amdgpu_kernel void @s_cvt_pknorm_u16_samereg_f32(i32 addrspace(1)* %out, float %x) #0 {
				%result = call i32 @llvm.amdgcn.cvt.pknorm.u16(float %x, float %x)
				store i32 %result, i32 addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}v_cvt_pknorm_u16_f32:
				; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
				; GCN: {{buffer|flat|global}}_load_dword [[B:v[0-9]+]]
				; SI: v_cvt_pknorm_u16_f32_e32 v{{[0-9]+}}, [[A]], [[B]]
				; VI: v_cvt_pknorm_u16_f32 v{{[0-9]+}}, [[A]], [[B]]
				define amdgpu_kernel void @v_cvt_pknorm_u16_f32(i32 addrspace(1)* %out, float addrspace(1)* %a.ptr, float addrspace(1)* %b.ptr) #0 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%tid.ext = sext i32 %tid to i64
				%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
				%b.gep = getelementptr inbounds float, float addrspace(1)* %b.ptr, i64 %tid.ext
				%out.gep = getelementptr inbounds i32, i32 addrspace(1)* %out, i64 %tid.ext
				%a = load volatile float, float addrspace(1)* %a.gep
				%b = load volatile float, float addrspace(1)* %b.gep
				%cvt = call i32 @llvm.amdgcn.cvt.pknorm.u16(float %a, float %b)
				store i32 %cvt, i32 addrspace(1)* %out.gep
				ret void
				}

				; GCN-LABEL: {{^}}v_cvt_pknorm_u16_f32_reg_imm:
				; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
				; GCN: v_cvt_pknorm_u16_f32{{(_e64)*}} v{{[0-9]+}}, [[A]], 1.0
				define amdgpu_kernel void @v_cvt_pknorm_u16_f32_reg_imm(i32 addrspace(1)* %out, float addrspace(1)* %a.ptr) #0 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%tid.ext = sext i32 %tid to i64
				%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
				%out.gep = getelementptr inbounds i32, i32 addrspace(1)* %out, i64 %tid.ext
				%a = load volatile float, float addrspace(1)* %a.gep
				%cvt = call i32 @llvm.amdgcn.cvt.pknorm.u16(float %a, float 1.0)
				store i32 %cvt, i32 addrspace(1)* %out.gep
				ret void
				}

				; GCN-LABEL: {{^}}v_cvt_pknorm_u16_f32_imm_reg:
				; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
				; SI: v_cvt_pknorm_u16_f32_e32 v{{[0-9]+}}, 1.0, [[A]]
				; VI: v_cvt_pknorm_u16_f32 v{{[0-9]+}}, 1.0, [[A]]
				define amdgpu_kernel void @v_cvt_pknorm_u16_f32_imm_reg(i32 addrspace(1)* %out, float addrspace(1)* %a.ptr) #0 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%tid.ext = sext i32 %tid to i64
				%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
				%out.gep = getelementptr inbounds i32, i32 addrspace(1)* %out, i64 %tid.ext
				%a = load volatile float, float addrspace(1)* %a.gep
				%cvt = call i32 @llvm.amdgcn.cvt.pknorm.u16(float 1.0, float %a)
				store i32 %cvt, i32 addrspace(1)* %out.gep
				ret void
				}

				; GCN-LABEL: {{^}}v_cvt_pknorm_u16_f32_fneg_lo:
				; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
				; GCN: {{buffer|flat|global}}_load_dword [[B:v[0-9]+]]
				; GCN: v_cvt_pknorm_u16_f32{{(_e64)*}} v{{[0-9]+}}, -[[A]], [[B]]
				define amdgpu_kernel void @v_cvt_pknorm_u16_f32_fneg_lo(i32 addrspace(1)* %out, float addrspace(1)* %a.ptr, float addrspace(1)* %b.ptr) #0 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%tid.ext = sext i32 %tid to i64
				%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
				%b.gep = getelementptr inbounds float, float addrspace(1)* %b.ptr, i64 %tid.ext
				%out.gep = getelementptr inbounds i32, i32 addrspace(1)* %out, i64 %tid.ext
				%a = load volatile float, float addrspace(1)* %a.gep
				%b = load volatile float, float addrspace(1)* %b.gep
				%neg.a = fsub float -0.0, %a
				%cvt = call i32 @llvm.amdgcn.cvt.pknorm.u16(float %neg.a, float %b)
				store i32 %cvt, i32 addrspace(1)* %out.gep
				ret void
				}

				; GCN-LABEL: {{^}}v_cvt_pknorm_u16_f32_fneg_hi:
				; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
				; GCN: {{buffer|flat|global}}_load_dword [[B:v[0-9]+]]
				; GCN: v_cvt_pknorm_u16_f32{{(_e64)*}} v{{[0-9]+}}, [[A]], -[[B]]
				define amdgpu_kernel void @v_cvt_pknorm_u16_f32_fneg_hi(i32 addrspace(1)* %out, float addrspace(1)* %a.ptr, float addrspace(1)* %b.ptr) #0 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%tid.ext = sext i32 %tid to i64
				%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
				%b.gep = getelementptr inbounds float, float addrspace(1)* %b.ptr, i64 %tid.ext
				%out.gep = getelementptr inbounds i32, i32 addrspace(1)* %out, i64 %tid.ext
				%a = load volatile float, float addrspace(1)* %a.gep
				%b = load volatile float, float addrspace(1)* %b.gep
				%neg.b = fsub float -0.0, %b
				%cvt = call i32 @llvm.amdgcn.cvt.pknorm.u16(float %a, float %neg.b)
				store i32 %cvt, i32 addrspace(1)* %out.gep
				ret void
				}

				; GCN-LABEL: {{^}}v_cvt_pknorm_u16_f32_fneg_lo_hi:
				; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
				; GCN: {{buffer|flat|global}}_load_dword [[B:v[0-9]+]]
				; GCN: v_cvt_pknorm_u16_f32{{(_e64)*}} v{{[0-9]+}}, -[[A]], -[[B]]
				define amdgpu_kernel void @v_cvt_pknorm_u16_f32_fneg_lo_hi(i32 addrspace(1)* %out, float addrspace(1)* %a.ptr, float addrspace(1)* %b.ptr) #0 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%tid.ext = sext i32 %tid to i64
				%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
				%b.gep = getelementptr inbounds float, float addrspace(1)* %b.ptr, i64 %tid.ext
				%out.gep = getelementptr inbounds i32, i32 addrspace(1)* %out, i64 %tid.ext
				%a = load volatile float, float addrspace(1)* %a.gep
				%b = load volatile float, float addrspace(1)* %b.gep
				%neg.a = fsub float -0.0, %a
				%neg.b = fsub float -0.0, %b
				%cvt = call i32 @llvm.amdgcn.cvt.pknorm.u16(float %neg.a, float %neg.b)
				store i32 %cvt, i32 addrspace(1)* %out.gep
				ret void
				}

				; GCN-LABEL: {{^}}v_cvt_pknorm_u16_f32_fneg_fabs_lo_fneg_hi:
				; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
				; GCN: {{buffer|flat|global}}_load_dword [[B:v[0-9]+]]
				; GCN: v_cvt_pknorm_u16_f32{{(_e64)*}} v{{[0-9]+}}, -|[[A]]|, -[[B]]
				define amdgpu_kernel void @v_cvt_pknorm_u16_f32_fneg_fabs_lo_fneg_hi(i32 addrspace(1)* %out, float addrspace(1)* %a.ptr, float addrspace(1)* %b.ptr) #0 {
				%tid = call i32 @llvm.amdgcn.workitem.id.x()
				%tid.ext = sext i32 %tid to i64
				%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
				%b.gep = getelementptr inbounds float, float addrspace(1)* %b.ptr, i64 %tid.ext
				%out.gep = getelementptr inbounds i32, i32 addrspace(1)* %out, i64 %tid.ext
				%a = load volatile float, float addrspace(1)* %a.gep
				%b = load volatile float, float addrspace(1)* %b.gep
				%fabs.a = call float @llvm.fabs.f32(float %a)
				%neg.fabs.a = fsub float -0.0, %fabs.a
				%neg.b = fsub float -0.0, %b
				%cvt = call i32 @llvm.amdgcn.cvt.pknorm.u16(float %neg.fabs.a, float %neg.b)
				store i32 %cvt, i32 addrspace(1)* %out.gep
				ret void
				}

				declare i32 @llvm.amdgcn.cvt.pknorm.u16(float, float) #1
				declare float @llvm.fabs.f32(float) #1
				declare i32 @llvm.amdgcn.workitem.id.x() #1


				attributes #0 = { nounwind }
				attributes #1 = { nounwind readnone }
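
The tests above only check which instruction is selected, not the value it produces. As a rough illustration of what the packed result holds, here is a minimal reference model in C++. It is not LLVM code: the helper names `toUnorm16` and `cvtPknormU16Model` are hypothetical, and the conversion assumes the usual UNORM16 definition (clamp to [0, 1], scale by 65535, round to nearest, NaN mapped to 0). The hardware's exact rounding and NaN behavior are not stated in this diff. The only detail taken from the tests is that the first operand is the "lo" half and the second the "hi" half.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical reference model, not part of the patch.
// Assumption: UNORM16 conversion clamps to [0, 1], scales by 65535 and
// rounds to nearest; NaN is mapped to 0.
static uint16_t toUnorm16(float v) {
  if (std::isnan(v))
    return 0;
  float clamped = std::min(std::max(v, 0.0f), 1.0f);
  return static_cast<uint16_t>(std::lround(clamped * 65535.0f));
}

// Packs both converted values into one 32-bit result. The first operand
// goes to the low 16 bits, matching the lo/hi naming used by the tests.
static uint32_t cvtPknormU16Model(float lo, float hi) {
  return static_cast<uint32_t>(toUnorm16(lo)) |
         (static_cast<uint32_t>(toUnorm16(hi)) << 16);
}
```

Under these assumptions, negated inputs such as those in the `fneg` tests clamp to 0, so those tests are only meaningful as checks that the source modifiers are folded into the instruction.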

test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll

	; CHECK-LABEL: @constant_rtz_pkrtz(			; CHECK-LABEL: @constant_rtz_pkrtz(
	; CHECK: ret <2 x half> <half 0xH7BFF, half 0xH7BFF>			; CHECK: ret <2 x half> <half 0xH7BFF, half 0xH7BFF>
	define <2 x half> @constant_rtz_pkrtz() {			define <2 x half> @constant_rtz_pkrtz() {
	%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float 65535.0, float 65535.0)			%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float 65535.0, float 65535.0)
	ret <2 x half> %cvt			ret <2 x half> %cvt
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
				; llvm.amdgcn.cvt.pknorm.i16
				; --------------------------------------------------------------------

				declare i32 @llvm.amdgcn.cvt.pknorm.i16(float, float) nounwind readnone

				; CHECK-LABEL: @undef_lhs_cvt_pknorm_i16(
				; CHECK: %cvt = call i32 @llvm.amdgcn.cvt.pknorm.i16(float undef, float %y)
				define i32 @undef_lhs_cvt_pknorm_i16(float %y) {
				%cvt = call i32 @llvm.amdgcn.cvt.pknorm.i16(float undef, float %y)
				ret i32 %cvt
				}

				; CHECK-LABEL: @undef_rhs_cvt_pknorm_i16(
				; CHECK: %cvt = call i32 @llvm.amdgcn.cvt.pknorm.i16(float %x, float undef)
				define i32 @undef_rhs_cvt_pknorm_i16(float %x) {
				%cvt = call i32 @llvm.amdgcn.cvt.pknorm.i16(float %x, float undef)
				ret i32 %cvt
				}

				; CHECK-LABEL: @undef_cvt_pknorm_i16(
				; CHECK: ret i32 undef
				define i32 @undef_cvt_pknorm_i16() {
				%cvt = call i32 @llvm.amdgcn.cvt.pknorm.i16(float undef, float undef)
				ret i32 %cvt
				}

				; --------------------------------------------------------------------
				; llvm.amdgcn.cvt.pknorm.u16
				; --------------------------------------------------------------------

				declare i32 @llvm.amdgcn.cvt.pknorm.u16(float, float) nounwind readnone

				; CHECK-LABEL: @undef_lhs_cvt_pknorm_u16(
				; CHECK: %cvt = call i32 @llvm.amdgcn.cvt.pknorm.u16(float undef, float %y)
				define i32 @undef_lhs_cvt_pknorm_u16(float %y) {
				%cvt = call i32 @llvm.amdgcn.cvt.pknorm.u16(float undef, float %y)
				ret i32 %cvt
				}

				; CHECK-LABEL: @undef_rhs_cvt_pknorm_u16(
				; CHECK: %cvt = call i32 @llvm.amdgcn.cvt.pknorm.u16(float %x, float undef)
				define i32 @undef_rhs_cvt_pknorm_u16(float %x) {
				%cvt = call i32 @llvm.amdgcn.cvt.pknorm.u16(float %x, float undef)
				ret i32 %cvt
				}

				; CHECK-LABEL: @undef_cvt_pknorm_u16(
				; CHECK: ret i32 undef
				define i32 @undef_cvt_pknorm_u16() {
				%cvt = call i32 @llvm.amdgcn.cvt.pknorm.u16(float undef, float undef)
				ret i32 %cvt
				}

				; --------------------------------------------------------------------
				; llvm.amdgcn.cvt.pk.i16
				; --------------------------------------------------------------------

				declare i32 @llvm.amdgcn.cvt.pk.i16(i32, i32) nounwind readnone

				; CHECK-LABEL: @undef_lhs_cvt_pk_i16(
				; CHECK: %cvt = call i32 @llvm.amdgcn.cvt.pk.i16(i32 undef, i32 %y)
				define i32 @undef_lhs_cvt_pk_i16(i32 %y) {
				%cvt = call i32 @llvm.amdgcn.cvt.pk.i16(i32 undef, i32 %y)
				ret i32 %cvt
				}

				; CHECK-LABEL: @undef_rhs_cvt_pk_i16(
				; CHECK: %cvt = call i32 @llvm.amdgcn.cvt.pk.i16(i32 %x, i32 undef)
				define i32 @undef_rhs_cvt_pk_i16(i32 %x) {
				%cvt = call i32 @llvm.amdgcn.cvt.pk.i16(i32 %x, i32 undef)
				ret i32 %cvt
				}

				; CHECK-LABEL: @undef_cvt_pk_i16(
				; CHECK: ret i32 undef
				define i32 @undef_cvt_pk_i16() {
				%cvt = call i32 @llvm.amdgcn.cvt.pk.i16(i32 undef, i32 undef)
				ret i32 %cvt
				}

				; --------------------------------------------------------------------
				; llvm.amdgcn.cvt.pk.u16
				; --------------------------------------------------------------------

				declare i32 @llvm.amdgcn.cvt.pk.u16(i32, i32) nounwind readnone

				; CHECK-LABEL: @undef_lhs_cvt_pk_u16(
				; CHECK: %cvt = call i32 @llvm.amdgcn.cvt.pk.u16(i32 undef, i32 %y)
				define i32 @undef_lhs_cvt_pk_u16(i32 %y) {
				%cvt = call i32 @llvm.amdgcn.cvt.pk.u16(i32 undef, i32 %y)
				ret i32 %cvt
				}

				; CHECK-LABEL: @undef_rhs_cvt_pk_u16(
				; CHECK: %cvt = call i32 @llvm.amdgcn.cvt.pk.u16(i32 %x, i32 undef)
				define i32 @undef_rhs_cvt_pk_u16(i32 %x) {
				%cvt = call i32 @llvm.amdgcn.cvt.pk.u16(i32 %x, i32 undef)
				ret i32 %cvt
				}

				; CHECK-LABEL: @undef_cvt_pk_u16(
				; CHECK: ret i32 undef
				define i32 @undef_cvt_pk_u16() {
				%cvt = call i32 @llvm.amdgcn.cvt.pk.u16(i32 undef, i32 undef)
				ret i32 %cvt
				}

				; --------------------------------------------------------------------
	; llvm.amdgcn.ubfe			; llvm.amdgcn.ubfe
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare i32 @llvm.amdgcn.ubfe.i32(i32, i32, i32) nounwind readnone			declare i32 @llvm.amdgcn.ubfe.i32(i32, i32, i32) nounwind readnone
	declare i64 @llvm.amdgcn.ubfe.i64(i64, i32, i32) nounwind readnone			declare i64 @llvm.amdgcn.ubfe.i64(i64, i32, i32) nounwind readnone

	; CHECK-LABEL: @ubfe_var_i32(			; CHECK-LABEL: @ubfe_var_i32(
	; CHECK-NEXT: %bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 %offset, i32 %width)			; CHECK-NEXT: %bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 %offset, i32 %width)
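
The `undef_*` tests for the four new intrinsics all expect the same pattern: the call is left alone when only one operand is `undef`, and folds to `undef` when both are. A minimal sketch of that decision in standalone C++ follows. The `Operand` and `FoldResult` types and the `foldPackedCvt` function are hypothetical stand-ins, not the LLVM API, and the sketch mirrors only what the CHECK lines expect, not the actual change in InstCombineCalls.cpp.

```cpp
// Hypothetical stand-in for an IR operand: all we track is whether it is undef.
struct Operand {
  bool isUndef;
};

enum class FoldResult { KeepCall, ReplaceWithUndef };

// Mirrors the expectations in the tests: fold only when both sources are undef.
static FoldResult foldPackedCvt(const Operand &src0, const Operand &src1) {
  if (src0.isUndef && src1.isUndef)
    return FoldResult::ReplaceWithUndef;
  return FoldResult::KeepCall;
}
```

Keeping the call when a single operand is `undef` is the conservative choice, since the other half of the packed result still depends on a defined input.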

Revision Contents

Diff 128410

include/llvm/IR/IntrinsicsAMDGPU.td

lib/Target/AMDGPU/AMDGPUISelLowering.h

lib/Target/AMDGPU/AMDGPUISelLowering.cpp

lib/Target/AMDGPU/AMDGPUInstrInfo.td

lib/Target/AMDGPU/SIISelLowering.cpp

lib/Target/AMDGPU/VOP2Instructions.td

lib/Transforms/InstCombine/InstCombineCalls.cpp

test/CodeGen/AMDGPU/llvm.amdgcn.cvt.pk.i16.ll

test/CodeGen/AMDGPU/llvm.amdgcn.cvt.pk.u16.ll

test/CodeGen/AMDGPU/llvm.amdgcn.cvt.pknorm.i16.ll

test/CodeGen/AMDGPU/llvm.amdgcn.cvt.pknorm.u16.ll

test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
