Details

Reviewers

dstuttard
arsenm
tpr

Group Reviewers

Restricted Project

Commits

rG824ca3f3dd85: [AMDGPU] Add intrinsics for 16 bit interpolation
rL352357: [AMDGPU] Add intrinsics for 16 bit interpolation

Summary

Added the intrinsics llvm.amdgcn.interp.p1.f16() and
llvm.amdgcn.interp.p2.f16() and related LIT test.

The p1 intrinsic generates code appropriate for both 16 and 32
bank LDS.
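The bank-count dispatch described in the summary can be modelled outside LLVM. The following is a minimal sketch, not the committed lowering: `lowerInterpP1F16` is a hypothetical function that returns only the names of the nodes emitted, and the node names and the `== 16` test are taken from the SIISelLowering.cpp hunk in this revision.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: returns the names of the DAG nodes that the p1 f16
// lowering in this revision creates for a given LDS bank count.
std::vector<std::string> lowerInterpP1F16(unsigned ldsBankCount) {
  // The review only discusses 16 and 32 bank parts.
  assert(ldsBankCount == 16 || ldsBankCount == 32);
  if (ldsBankCount == 16) {
    // 16 bank LDS: an INTERP_MOV of P0 is emitted first and its result is
    // passed to the "lv" node as the additional src2 operand.
    return {"INTERP_MOV", "INTERP_P1LV_F16"};
  }
  // 32 bank LDS: a single "ll" node with no src2 operand.
  return {"INTERP_P1LL_F16"};
}
```

Calling `lowerInterpP1F16(16)` yields the two-node sequence and `lowerInterpP1F16(32)` the single node, which mirrors the `v_interp_mov_f32` plus `v_interp_p1lv_f16` pair checked for gfx810 and the lone `v_interp_p1ll_f16` checked for fiji and gfx900 in the CodeGen test.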

Diff Detail

Repository: rL LLVM

Event Timeline

timcorringham created this revision. May 11 2018, 6:48 AM

Herald added subscribers: llvm-commits, t-tye, tpr and 6 others. May 11 2018, 6:48 AM

timcorringham added reviewers: Restricted Project, dstuttard, arsenm, tpr. May 11 2018, 6:51 AM

arsenm requested changes to this revision. May 14 2018, 4:26 AM

arsenm added inline comments.

include/llvm/IR/IntrinsicsAMDGPU.td
1070 ↗	(On Diff #146317)	You should add name mangling to the existing intrinsics rather than adding new intrinsics. The builtin declaration needs to be done in clang for the GCCBuiltin.

This revision now requires changes to proceed. May 14 2018, 4:26 AM

timcorringham added inline comments. May 15 2018, 4:56 AM

include/llvm/IR/IntrinsicsAMDGPU.td
1070 ↗	(On Diff #146317)	I now have the clang changes in D46871 (I have added the 32 bit interp builtins too as they were missing). I don't believe it is possible to overload these intrinsics as they have an extra operand compared to the 32 bit versions. Also, apart from the extra operand, the signature of the 16 bit p1 intrinsic is identical to the 32 bit one, so there isn't any type difference to overload on.

Corrected the ordering of operands to interp_p2_f16, added lowered
intrinsics to the list of those that are a source of divergence, and
amended LIT test.

I have not overloaded the intrinsics as I don't believe it is possible
in this case as they have an additional operand, and apart from that
additional operand the interp_p1_f16 has the same types as the 32 bit
version, so there are no type differences to provide disambiguation.

Is the extra parameter you're referring to the high parameter, which selects whether the register is read from the high or low bits? That shouldn't be exposed in the intrinsic at all. Eliminating the high bit extraction is a codegen optimization pattern.

Or is this bit controlling the weird load from memory? The manual isn't particularly clear to me. I see mention of LDS loads, but also op_sel control of destination bits.

arsenm added inline comments. May 18 2018, 10:55 AM

lib/Target/AMDGPU/AMDGPUSearchableTables.td
47–48 ↗	(On Diff #147544)	Should get a test in test/DivergenceAnalysis

Even without the high operand I don't think it is possible to overload interp_p1 and interp_p1_f16 as they would have identical types - there is nothing to disambiguate them.

Yes, the high bit controls the LDS access. As all the operands to interp_p1_f16 are the same types as for the 32 bit variant, I don't know of any way to deduce the value of the high bit if it isn't specified explicitly.
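The overloading argument can be made concrete. An overloaded intrinsic is told apart from its siblings by the types in its signature, so two variants with identical result and parameter types leave nothing to distinguish them. A minimal sketch, not LLVM code: `Signature`, `sameTypes` and `withoutParam` are hypothetical helpers, the f16 type list is copied from the declaration in the committed test, and the 32 bit list is inferred from the statement above that the two signatures differ only by the extra operand.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of an intrinsic signature as type names.
struct Signature {
  std::string result;
  std::vector<std::string> params;
};

// Two signatures are indistinguishable for overloading purposes when the
// result type and every parameter type match.
bool sameTypes(const Signature &a, const Signature &b) {
  return a.result == b.result && a.params == b.params;
}

// Returns a copy of the signature with the parameter at `index` removed.
Signature withoutParam(Signature s, std::size_t index) {
  assert(index < s.params.size());
  s.params.erase(s.params.begin() + static_cast<std::ptrdiff_t>(index));
  return s;
}

// float @llvm.amdgcn.interp.p1.f16(i, attrchan, attr, high, m0)
Signature interpP1F16() {
  return {"float", {"float", "i32", "i32", "i1", "i32"}};
}

// Assumed 32 bit p1 signature: the same list without the i1 `high` operand.
Signature interpP1F32() {
  return {"float", {"float", "i32", "i32", "i32"}};
}
```

Dropping the `high` operand (index 3) from the f16 signature makes `sameTypes` return true against the 32 bit one, which is the situation the comment describes: without `high`, no type is left to select the 16 bit variant.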

Added a divergence LIT test for the 16 bit interp intrinsics.

tpr added inline comments. May 22 2018, 11:34 AM

lib/Target/AMDGPU/VOP3Instructions.td
459 ↗	(On Diff #147796)	Don't forget to fix the problem found with this i1 in testing.

Change the omod operand type to be i32 rather than i1, to avoid
a build failure when building using a debug TableGen.

Harbormaster completed remote builds in B18472: Diff 148075. May 22 2018, 12:30 PM

[AMDGPU] Add intrinsics for 16 bit interpolation

Added a new pass to ensure that the 16 bit interpolation
instructions use the round to zero rounding mode.
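In the committed CodeGen test this mode switch appears as `s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 3` before the interp instructions and the same instruction with value 0 after them. Its effect can be modelled as a bit-field update. This is a minimal sketch under two assumptions: that `hwreg(reg, offset, width)` selects `width` bits starting at bit `offset`, and that the field value 3 encodes round toward zero. `setRegField` is a hypothetical helper name, not an LLVM or hardware API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a setreg-style write: replaces `width` bits of `reg`
// starting at bit `offset` with the low bits of `value`, leaving the rest
// of the register untouched.
std::uint32_t setRegField(std::uint32_t reg, unsigned offset, unsigned width,
                          std::uint32_t value) {
  assert(width >= 1 && offset + width <= 32);
  const std::uint32_t field =
      (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
  const std::uint32_t mask = field << offset;
  return (reg & ~mask) | ((value << offset) & mask);
}
```

With these assumptions, `setRegField(mode, 2, 2, 3)` sets bits 2 and 3 of the mode value before the interp sequence and `setRegField(mode, 2, 2, 0)` clears them again afterwards, without disturbing bits 0 and 1.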

Herald added a subscriber: mgorny. Jul 3 2018, 7:31 AM

Harbormaster completed remote builds in B19984: Diff 153913. Jul 3 2018, 7:31 AM

A slightly more performant implementation of the pass to add any
required changes to the double precision rounding mode.

Harbormaster completed remote builds in B20138: Diff 154558. Jul 9 2018, 3:04 AM

Refactored pass to insert rounding mode to use a style more in line
with other LLVM passes. This fails to optimize a few corner cases,
but they are expected to occur very rarely if at all.

Harbormaster completed remote builds in B20196: Diff 154774. Jul 10 2018, 4:12 AM

Changed mode register pass to use an explicit stack instead of recursion.

Removed the mode register pass, as that will be introduced as a
separate change.

Harbormaster completed remote builds in B20653: Diff 157007. Jul 24 2018, 4:51 AM

arsenm added inline comments. Jul 27 2018, 1:53 AM

test/CodeGen/AMDGPU/llvm.amdgcn.interp.f16.ll
1–3 ↗	(On Diff #157007)	Use -'s instead of _'s in the check prefixes
5–7 ↗	(On Diff #157007)	Might as well just use update_llc_test_checks at this point?

Updated the LIT test as per review comments.

Rebased, and amended LIT test now that the required mode register
pass has been committed.

Herald added a subscriber: jvesely. Dec 18 2018, 1:53 AM

Harbormaster completed remote builds in B26098: Diff 178625. Dec 18 2018, 1:53 AM

arsenm added inline comments. Jan 22 2019, 2:14 PM

test/CodeGen/AMDGPU/llvm.amdgcn.interp.f16.ll
57 ↗	(On Diff #178625)	Can you add a test case with LDS usage to make sure m0 is properly restored after?

arsenm added inline comments. Jan 22 2019, 2:54 PM

lib/Target/AMDGPU/AMDGPUSearchableTables.td
47–48 ↗	(On Diff #147544)	Test still missing

Extended llvm.amdgcn.interp.f16.ll to check that m0 is set before
each interp instruction if necessary, and added a new LIT test
to check that the interp f16 intrinsics are identified as being
divergent.

Harbormaster completed remote builds in B27248: Diff 183328. Jan 24 2019, 9:31 AM

timcorringham marked 2 inline comments as done. Jan 24 2019, 9:33 AM

timcorringham added inline comments.

test/CodeGen/AMDGPU/llvm.amdgcn.interp.f16.ll
57 ↗	(On Diff #178625)	I have added test cases to check that m0 is set up before each of the interp f16 instructions if necessary. I have done this by explicitly writing to m0 rather than using LDS as I couldn't see a way to do the latter, and other tests use the technique of writing to m0.

LGTM

This revision is now accepted and ready to land. Jan 25 2019, 9:13 AM

Closed by commit rL352357: [AMDGPU] Add intrinsics for 16 bit interpolation (authored by timcorringham). Jan 28 2019, 5:48 AM

This revision was automatically updated to reflect the committed changes.

Diff 183836

llvm/trunk/include/llvm/IR/IntrinsicsAMDGPU.td

 // __builtin_amdgcn_interp_p2 <p1>, <j>, <attr_chan>, <attr>, <m0>
 def int_amdgcn_interp_p2 :
   GCCBuiltin<"__builtin_amdgcn_interp_p2">,
   Intrinsic<[llvm_float_ty],
             [llvm_float_ty, llvm_float_ty, llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
             [IntrNoMem, IntrSpeculatable]>;
             // See int_amdgcn_v_interp_p1 for why this is IntrNoMem.
 
+// __builtin_amdgcn_interp_p1_f16 <i>, <attr_chan>, <attr>, <high>, <m0>
+def int_amdgcn_interp_p1_f16 :
+  GCCBuiltin<"__builtin_amdgcn_interp_p1_f16">,
+  Intrinsic<[llvm_float_ty],
+            [llvm_float_ty, llvm_i32_ty, llvm_i32_ty, llvm_i1_ty, llvm_i32_ty],
+            [IntrNoMem, IntrSpeculatable]>;
+
+// __builtin_amdgcn_interp_p2_f16 <p1>, <j>, <attr_chan>, <attr>, <high>, <m0>
+def int_amdgcn_interp_p2_f16 :
+  GCCBuiltin<"__builtin_amdgcn_interp_p2_f16">,
+  Intrinsic<[llvm_half_ty],
+            [llvm_float_ty, llvm_float_ty, llvm_i32_ty, llvm_i32_ty, llvm_i1_ty, llvm_i32_ty],
+            [IntrNoMem, IntrSpeculatable]>;
+
 // Pixel shaders only: whether the current pixel is live (i.e. not a helper
 // invocation for derivative computation).
 def int_amdgcn_ps_live : Intrinsic <
   [llvm_i1_ty],
   [],
   [IntrNoMem]>;
 
 def int_amdgcn_mbcnt_lo :

llvm/trunk/lib/Target/AMDGPU/AMDGPUISelLowering.h

@@ enum NodeType : unsigned {
   CONST_DATA_PTR,
   INIT_EXEC,
   INIT_EXEC_FROM_INPUT,
   SENDMSG,
   SENDMSGHALT,
   INTERP_MOV,
   INTERP_P1,
   INTERP_P2,
+  INTERP_P1LL_F16,
+  INTERP_P1LV_F16,
+  INTERP_P2_F16,
   PC_ADD_REL_OFFSET,
   KILL,
   DUMMY_CHAIN,
   FIRST_MEM_OPCODE_NUMBER = ISD::FIRST_TARGET_MEMORY_OPCODE,
   STORE_MSKOR,
   LOAD_CONSTANT,
   TBUFFER_STORE_FORMAT,
   TBUFFER_STORE_FORMAT_X3,

llvm/trunk/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

@@ const char* AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const {
   case AMDGPUISD::FIRST_MEM_OPCODE_NUMBER: break;
   NODE_NAME_CASE(INIT_EXEC)
   NODE_NAME_CASE(INIT_EXEC_FROM_INPUT)
   NODE_NAME_CASE(SENDMSG)
   NODE_NAME_CASE(SENDMSGHALT)
   NODE_NAME_CASE(INTERP_MOV)
   NODE_NAME_CASE(INTERP_P1)
   NODE_NAME_CASE(INTERP_P2)
+  NODE_NAME_CASE(INTERP_P1LL_F16)
+  NODE_NAME_CASE(INTERP_P1LV_F16)
+  NODE_NAME_CASE(INTERP_P2_F16)
   NODE_NAME_CASE(STORE_MSKOR)
   NODE_NAME_CASE(LOAD_CONSTANT)
   NODE_NAME_CASE(TBUFFER_STORE_FORMAT)
   NODE_NAME_CASE(TBUFFER_STORE_FORMAT_X3)
   NODE_NAME_CASE(TBUFFER_STORE_FORMAT_D16)
   NODE_NAME_CASE(TBUFFER_LOAD_FORMAT)
   NODE_NAME_CASE(TBUFFER_LOAD_FORMAT_D16)
   NODE_NAME_CASE(DS_ORDERED_COUNT)

llvm/trunk/lib/Target/AMDGPU/AMDGPUInstrInfo.td

 def AMDGPUinterp_p1 : SDNode<"AMDGPUISD::INTERP_P1",
   SDTypeProfile<1, 3, [SDTCisFP<0>]>,
   [SDNPInGlue, SDNPOutGlue]>;
 
 def AMDGPUinterp_p2 : SDNode<"AMDGPUISD::INTERP_P2",
   SDTypeProfile<1, 4, [SDTCisFP<0>]>,
   [SDNPInGlue]>;
 
+def AMDGPUinterp_p1ll_f16 : SDNode<"AMDGPUISD::INTERP_P1LL_F16",
+  SDTypeProfile<1, 7, [SDTCisFP<0>]>,
+  [SDNPInGlue, SDNPOutGlue]>;
+
+def AMDGPUinterp_p1lv_f16 : SDNode<"AMDGPUISD::INTERP_P1LV_F16",
+  SDTypeProfile<1, 9, [SDTCisFP<0>]>,
+  [SDNPInGlue, SDNPOutGlue]>;
+
+def AMDGPUinterp_p2_f16 : SDNode<"AMDGPUISD::INTERP_P2_F16",
+  SDTypeProfile<1, 8, [SDTCisFP<0>]>,
+  [SDNPInGlue]>;
+
 def AMDGPUkill : SDNode<"AMDGPUISD::KILL", AMDGPUKillSDT,
   [SDNPHasChain, SDNPSideEffect]>;
 
 // SI+ export
 def AMDGPUExportOp : SDTypeProfile<0, 8, [
   SDTCisInt<0>, // i8 tgt
   SDTCisInt<1>, // i8 en

llvm/trunk/lib/Target/AMDGPU/AMDGPUSearchableTables.td

 }
 
 def : SourceOfDivergence<int_amdgcn_workitem_id_x>;
 def : SourceOfDivergence<int_amdgcn_workitem_id_y>;
 def : SourceOfDivergence<int_amdgcn_workitem_id_z>;
 def : SourceOfDivergence<int_amdgcn_interp_mov>;
 def : SourceOfDivergence<int_amdgcn_interp_p1>;
 def : SourceOfDivergence<int_amdgcn_interp_p2>;
+def : SourceOfDivergence<int_amdgcn_interp_p1_f16>;
+def : SourceOfDivergence<int_amdgcn_interp_p2_f16>;
 def : SourceOfDivergence<int_amdgcn_mbcnt_hi>;
 def : SourceOfDivergence<int_amdgcn_mbcnt_lo>;
 def : SourceOfDivergence<int_r600_read_tidig_x>;
 def : SourceOfDivergence<int_r600_read_tidig_y>;
 def : SourceOfDivergence<int_r600_read_tidig_z>;
 def : SourceOfDivergence<int_amdgcn_atomic_inc>;
 def : SourceOfDivergence<int_amdgcn_atomic_dec>;
 def : SourceOfDivergence<int_amdgcn_ds_fadd>;

llvm/trunk/lib/Target/AMDGPU/SIISelLowering.cpp

@@ SDValue SITargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
   }
   case Intrinsic::amdgcn_interp_p2: {
     SDValue M0 = copyToM0(DAG, DAG.getEntryNode(), DL, Op.getOperand(5));
     SDValue Glue = SDValue(M0.getNode(), 1);
     return DAG.getNode(AMDGPUISD::INTERP_P2, DL, MVT::f32, Op.getOperand(1),
                        Op.getOperand(2), Op.getOperand(3), Op.getOperand(4),
                        Glue);
   }
+  case Intrinsic::amdgcn_interp_p1_f16: {
+    SDValue M0 = copyToM0(DAG, DAG.getEntryNode(), DL, Op.getOperand(5));
+    SDValue Glue = M0.getValue(1);
+    if (getSubtarget()->getLDSBankCount() == 16) {
+      // 16 bank LDS
+      SDValue S = DAG.getNode(AMDGPUISD::INTERP_MOV, DL, MVT::f32,
+                              DAG.getConstant(2, DL, MVT::i32), // P0
+                              Op.getOperand(2), // Attrchan
+                              Op.getOperand(3), // Attr
+                              Glue);
+      SDValue Ops[] = {
+        Op.getOperand(1), // Src0
+        Op.getOperand(2), // Attrchan
+        Op.getOperand(3), // Attr
+        DAG.getConstant(0, DL, MVT::i32), // $src0_modifiers
+        S, // Src2 - holds two f16 values selected by high
+        DAG.getConstant(0, DL, MVT::i32), // $src2_modifiers
+        Op.getOperand(4), // high
+        DAG.getConstant(0, DL, MVT::i1), // $clamp
+        DAG.getConstant(0, DL, MVT::i32) // $omod
+      };
+      return DAG.getNode(AMDGPUISD::INTERP_P1LV_F16, DL, MVT::f32, Ops);
+    } else {
+      // 32 bank LDS
+      SDValue Ops[] = {
+        Op.getOperand(1), // Src0
+        Op.getOperand(2), // Attrchan
+        Op.getOperand(3), // Attr
+        DAG.getConstant(0, DL, MVT::i32), // $src0_modifiers
+        Op.getOperand(4), // high
+        DAG.getConstant(0, DL, MVT::i1), // $clamp
+        DAG.getConstant(0, DL, MVT::i32), // $omod
+        Glue
+      };
+      return DAG.getNode(AMDGPUISD::INTERP_P1LL_F16, DL, MVT::f32, Ops);
+    }
+  }
+  case Intrinsic::amdgcn_interp_p2_f16: {
+    SDValue M0 = copyToM0(DAG, DAG.getEntryNode(), DL, Op.getOperand(6));
+    SDValue Glue = SDValue(M0.getNode(), 1);
+    SDValue Ops[] = {
+      Op.getOperand(2), // Src0
+      Op.getOperand(3), // Attrchan
+      Op.getOperand(4), // Attr
+      DAG.getConstant(0, DL, MVT::i32), // $src0_modifiers
+      Op.getOperand(1), // Src2
+      DAG.getConstant(0, DL, MVT::i32), // $src2_modifiers
+      Op.getOperand(5), // high
+      DAG.getConstant(0, DL, MVT::i1), // $clamp
+      Glue
+    };
+    return DAG.getNode(AMDGPUISD::INTERP_P2_F16, DL, MVT::f16, Ops);
+  }
   case Intrinsic::amdgcn_sin:
     return DAG.getNode(AMDGPUISD::SIN_HW, DL, VT, Op.getOperand(1));
 
   case Intrinsic::amdgcn_cos:
     return DAG.getNode(AMDGPUISD::COS_HW, DL, VT, Op.getOperand(1));
 
   case Intrinsic::amdgcn_log_clamp: {
     if (Subtarget->getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS)

llvm/trunk/lib/Target/AMDGPU/VOP3Instructions.td

 let SubtargetPredicate = Has16BitInsts, isCommutable = 1 in {
 
 let renamedInGFX9 = 1 in {
 def V_MAD_U16 : VOP3Inst <"v_mad_u16", VOP3_Profile<VOP_I16_I16_I16_I16, VOP3_CLAMP>>;
 def V_MAD_I16 : VOP3Inst <"v_mad_i16", VOP3_Profile<VOP_I16_I16_I16_I16, VOP3_CLAMP>>;
 let FPDPRounding = 1 in {
 def V_MAD_F16 : VOP3Inst <"v_mad_f16", VOP3_Profile<VOP_F16_F16_F16_F16>, fmad>;
 let Uses = [M0, EXEC] in {
-def V_INTERP_P2_F16 : VOP3Interp <"v_interp_p2_f16", VOP3_INTERP16<[f16, f32, i32, f32]>>;
+def V_INTERP_P2_F16 : VOP3Interp <"v_interp_p2_f16", VOP3_INTERP16<[f16, f32, i32, f32]>,
+  [(set f16:$vdst, (AMDGPUinterp_p2_f16 f32:$src0, (i32 imm:$attrchan),
+                                        (i32 imm:$attr),
+                                        (i32 imm:$src0_modifiers),
+                                        (f32 VRegSrc_32:$src2),
+                                        (i32 imm:$src2_modifiers),
+                                        (i1 imm:$high),
+                                        (i1 imm:$clamp)))]>;
 } // End Uses = [M0, EXEC]
 } // End FPDPRounding = 1
 } // End renamedInGFX9 = 1
 
 let SubtargetPredicate = isGFX9 in {
 def V_MAD_F16_gfx9 : VOP3Inst <"v_mad_f16_gfx9", VOP3_Profile<VOP_F16_F16_F16_F16, VOP3_OPSEL>> {
   let FPDPRounding = 1;
 }
 def V_MAD_U16_gfx9 : VOP3Inst <"v_mad_u16_gfx9", VOP3_Profile<VOP_I16_I16_I16_I16, VOP3_OPSEL>>;
 def V_MAD_I16_gfx9 : VOP3Inst <"v_mad_i16_gfx9", VOP3_Profile<VOP_I16_I16_I16_I16, VOP3_OPSEL>>;
 def V_INTERP_P2_F16_gfx9 : VOP3Interp <"v_interp_p2_f16_gfx9", VOP3_INTERP16<[f16, f32, i32, f32]>>;
 } // End SubtargetPredicate = isGFX9
 
 let Uses = [M0, EXEC], FPDPRounding = 1 in {
-def V_INTERP_P1LL_F16 : VOP3Interp <"v_interp_p1ll_f16", VOP3_INTERP16<[f32, f32, i32, untyped]>>;
+def V_INTERP_P1LL_F16 : VOP3Interp <"v_interp_p1ll_f16", VOP3_INTERP16<[f32, f32, i32, untyped]>,
+  [(set f32:$vdst, (AMDGPUinterp_p1ll_f16 f32:$src0, (i32 imm:$attrchan),
+                                          (i32 imm:$attr),
+                                          (i32 imm:$src0_modifiers),
+                                          (i1 imm:$high),
+                                          (i1 imm:$clamp),
+                                          (i32 imm:$omod)))]>;
-def V_INTERP_P1LV_F16 : VOP3Interp <"v_interp_p1lv_f16", VOP3_INTERP16<[f32, f32, i32, f16]>>;
+def V_INTERP_P1LV_F16 : VOP3Interp <"v_interp_p1lv_f16", VOP3_INTERP16<[f32, f32, i32, f16]>,
+  [(set f32:$vdst, (AMDGPUinterp_p1lv_f16 f32:$src0, (i32 imm:$attrchan),
+                                          (i32 imm:$attr),
+                                          (i32 imm:$src0_modifiers),
+                                          (f32 VRegSrc_32:$src2),
+                                          (i32 imm:$src2_modifiers),
+                                          (i1 imm:$high),
+                                          (i1 imm:$clamp),
+                                          (i32 imm:$omod)))]>;
 } // End Uses = [M0, EXEC], FPDPRounding = 1
 
 } // End SubtargetPredicate = Has16BitInsts, isCommutable = 1
 
 let SubtargetPredicate = isVI in {
 def V_INTERP_P1_F32_e64 : VOP3Interp <"v_interp_p1_f32", VOP3_INTERP>;
 def V_INTERP_P2_F32_e64 : VOP3Interp <"v_interp_p2_f32", VOP3_INTERP>;
 def V_INTERP_MOV_F32_e64 : VOP3Interp <"v_interp_mov_f32", VOP3_INTERP_MOV>;

llvm/trunk/test/Analysis/DivergenceAnalysis/AMDGPU/interp_f16.ll

				; RUN: opt -mtriple=amdgcn-- -analyze -divergence -use-gpu-divergence-analysis %s | FileCheck %s

				; CHECK: for function 'interp_p1_f16'
				; CHECK: DIVERGENT: %p1 = call float @llvm.amdgcn.interp.p1.f16
				define amdgpu_ps float @interp_p1_f16(float inreg %i, float inreg %j, i32 inreg %m0) #0 {
				main_body:
				%p1 = call float @llvm.amdgcn.interp.p1.f16(float %i, i32 1, i32 2, i1 0, i32 %m0)
				ret float %p1
				}

				; CHECK: for function 'interp_p2_f16'
				; CHECK: DIVERGENT: %p2 = call half @llvm.amdgcn.interp.p2.f16
				define amdgpu_ps half @interp_p2_f16(float inreg %i, float inreg %j, i32 inreg %m0) #0 {
				main_body:
				%p2 = call half @llvm.amdgcn.interp.p2.f16(float %i, float %j, i32 1, i32 2, i1 0, i32 %m0)
				ret half %p2
				}

				; float @llvm.amdgcn.interp.p1.f16(i, attrchan, attr, high, m0)
				declare float @llvm.amdgcn.interp.p1.f16(float, i32, i32, i1, i32) #0
				; half @llvm.amdgcn.interp.p2.f16(p1, j, attrchan, attr, high, m0)
				declare half @llvm.amdgcn.interp.p2.f16(float, float, i32, i32, i1, i32) #0
				declare float @llvm.amdgcn.interp.mov(i32, i32, i32, i32) #0

				attributes #0 = { nounwind readnone }

llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.interp.f16.ll

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9-32BANK %s
				; RUN: llc -mtriple=amdgcn -mcpu=fiji -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX8-32BANK %s
				; RUN: llc -mtriple=amdgcn -mcpu=gfx810 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX8-16BANK %s

				define amdgpu_ps half @interp_f16(float inreg %i, float inreg %j, i32 inreg %m0) #0 {
				; GFX9-32BANK-LABEL: interp_f16:
				; GFX9-32BANK: ; %bb.0: ; %main_body
				; GFX9-32BANK-NEXT: v_mov_b32_e32 v0, s0
				; GFX9-32BANK-NEXT: s_mov_b32 m0, s2
				; GFX9-32BANK-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 3
				; GFX9-32BANK-NEXT: v_interp_p1ll_f16 v1, v0, attr2.y
				; GFX9-32BANK-NEXT: v_mov_b32_e32 v2, s1
				; GFX9-32BANK-NEXT: v_interp_p1ll_f16 v0, v0, attr2.y high
				; GFX9-32BANK-NEXT: v_interp_p2_legacy_f16 v1, v2, attr2.y, v1
				; GFX9-32BANK-NEXT: v_interp_p2_legacy_f16 v0, v2, attr2.y, v0 high
				; GFX9-32BANK-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0
				; GFX9-32BANK-NEXT: v_add_f16_e32 v0, v1, v0
				; GFX9-32BANK-NEXT: ; return to shader part epilog
				;
				; GFX8-32BANK-LABEL: interp_f16:
				; GFX8-32BANK: ; %bb.0: ; %main_body
				; GFX8-32BANK-NEXT: v_mov_b32_e32 v0, s0
				; GFX8-32BANK-NEXT: s_mov_b32 m0, s2
				; GFX8-32BANK-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 3
				; GFX8-32BANK-NEXT: v_interp_p1ll_f16 v1, v0, attr2.y
				; GFX8-32BANK-NEXT: v_mov_b32_e32 v2, s1
				; GFX8-32BANK-NEXT: v_interp_p1ll_f16 v0, v0, attr2.y high
				; GFX8-32BANK-NEXT: v_interp_p2_f16 v1, v2, attr2.y, v1
				; GFX8-32BANK-NEXT: v_interp_p2_f16 v0, v2, attr2.y, v0 high
				; GFX8-32BANK-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0
				; GFX8-32BANK-NEXT: v_add_f16_e32 v0, v1, v0
				; GFX8-32BANK-NEXT: ; return to shader part epilog
				;
				; GFX8-16BANK-LABEL: interp_f16:
				; GFX8-16BANK: ; %bb.0: ; %main_body
				; GFX8-16BANK-NEXT: s_mov_b32 m0, s2
				; GFX8-16BANK-NEXT: v_interp_mov_f32_e32 v0, p0, attr2.y
				; GFX8-16BANK-NEXT: v_mov_b32_e32 v1, s0
				; GFX8-16BANK-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 3
				; GFX8-16BANK-NEXT: v_interp_p1lv_f16 v2, v1, attr2.y, v0
				; GFX8-16BANK-NEXT: v_mov_b32_e32 v3, s1
				; GFX8-16BANK-NEXT: v_interp_p1lv_f16 v0, v1, attr2.y, v0 high
				; GFX8-16BANK-NEXT: v_interp_p2_f16 v2, v3, attr2.y, v2
				; GFX8-16BANK-NEXT: v_interp_p2_f16 v0, v3, attr2.y, v0 high
				; GFX8-16BANK-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0
				; GFX8-16BANK-NEXT: v_add_f16_e32 v0, v2, v0
				; GFX8-16BANK-NEXT: ; return to shader part epilog
				main_body:
				%p1_0 = call float @llvm.amdgcn.interp.p1.f16(float %i, i32 1, i32 2, i1 0, i32 %m0)
				%p2_0 = call half @llvm.amdgcn.interp.p2.f16(float %p1_0, float %j, i32 1, i32 2, i1 0, i32 %m0)
				%p1_1 = call float @llvm.amdgcn.interp.p1.f16(float %i, i32 1, i32 2, i1 1, i32 %m0)
				%p2_1 = call half @llvm.amdgcn.interp.p2.f16(float %p1_1, float %j, i32 1, i32 2, i1 1, i32 %m0)
				%res = fadd half %p2_0, %p2_1
				ret half %res
				}

				; check that m0 is setup correctly before the interp p1 instruction
				define amdgpu_ps half @interp_p1_m0_setup(float inreg %i, float inreg %j, i32 inreg %m0) #0 {
				; GFX9-32BANK-LABEL: interp_p1_m0_setup:
				; GFX9-32BANK: ; %bb.0: ; %main_body
				; GFX9-32BANK-NEXT: ;;#ASMSTART
				; GFX9-32BANK-NEXT: s_mov_b32 m0, 0
				; GFX9-32BANK-NEXT: ;;#ASMEND
				; GFX9-32BANK-NEXT: s_mov_b32 s3, m0
				; GFX9-32BANK-NEXT: v_mov_b32_e32 v0, s0
				; GFX9-32BANK-NEXT: s_mov_b32 m0, s2
				; GFX9-32BANK-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 3
				; GFX9-32BANK-NEXT: v_interp_p1ll_f16 v0, v0, attr2.y
				; GFX9-32BANK-NEXT: v_mov_b32_e32 v1, s1
				; GFX9-32BANK-NEXT: v_interp_p2_legacy_f16 v0, v1, attr2.y, v0
				; GFX9-32BANK-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0
				; GFX9-32BANK-NEXT: v_add_f16_e32 v0, s3, v0
				; GFX9-32BANK-NEXT: ; return to shader part epilog
				;
				; GFX8-32BANK-LABEL: interp_p1_m0_setup:
				; GFX8-32BANK: ; %bb.0: ; %main_body
				; GFX8-32BANK-NEXT: ;;#ASMSTART
				; GFX8-32BANK-NEXT: s_mov_b32 m0, 0
				; GFX8-32BANK-NEXT: ;;#ASMEND
				; GFX8-32BANK-NEXT: s_mov_b32 s3, m0
				; GFX8-32BANK-NEXT: v_mov_b32_e32 v0, s0
				; GFX8-32BANK-NEXT: s_mov_b32 m0, s2
				; GFX8-32BANK-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 3
				; GFX8-32BANK-NEXT: v_interp_p1ll_f16 v0, v0, attr2.y
				; GFX8-32BANK-NEXT: v_mov_b32_e32 v1, s1
				; GFX8-32BANK-NEXT: v_interp_p2_f16 v0, v1, attr2.y, v0
				; GFX8-32BANK-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0
				; GFX8-32BANK-NEXT: v_add_f16_e32 v0, s3, v0
				; GFX8-32BANK-NEXT: ; return to shader part epilog
				;
				; GFX8-16BANK-LABEL: interp_p1_m0_setup:
				; GFX8-16BANK: ; %bb.0: ; %main_body
				; GFX8-16BANK-NEXT: ;;#ASMSTART
				; GFX8-16BANK-NEXT: s_mov_b32 m0, 0
				; GFX8-16BANK-NEXT: ;;#ASMEND
				; GFX8-16BANK-NEXT: s_mov_b32 s3, m0
				; GFX8-16BANK-NEXT: s_mov_b32 m0, s2
				; GFX8-16BANK-NEXT: v_interp_mov_f32_e32 v0, p0, attr2.y
				; GFX8-16BANK-NEXT: v_mov_b32_e32 v1, s0
				; GFX8-16BANK-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 3
				; GFX8-16BANK-NEXT: v_interp_p1lv_f16 v0, v1, attr2.y, v0
				; GFX8-16BANK-NEXT: v_mov_b32_e32 v1, s1
				; GFX8-16BANK-NEXT: v_interp_p2_f16 v0, v1, attr2.y, v0
				; GFX8-16BANK-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0
				; GFX8-16BANK-NEXT: v_add_f16_e32 v0, s3, v0
				; GFX8-16BANK-NEXT: ; return to shader part epilog
				main_body:
				%mx = call i32 asm sideeffect "s_mov_b32 m0, 0", "={M0}"() #0
				%p1_0 = call float @llvm.amdgcn.interp.p1.f16(float %i, i32 1, i32 2, i1 0, i32 %m0)
				%p2_0 = call half @llvm.amdgcn.interp.p2.f16(float %p1_0, float %j, i32 1, i32 2, i1 0, i32 %m0)
				%my = trunc i32 %mx to i16
				%mh = bitcast i16 %my to half
				%res = fadd half %p2_0, %mh
				ret half %res
				}

				; check that m0 is setup correctly before the interp p2 instruction
				define amdgpu_ps half @interp_p2_m0_setup(float inreg %i, float inreg %j, i32 inreg %m0) #0 {
				; GFX9-32BANK-LABEL: interp_p2_m0_setup:
				; GFX9-32BANK: ; %bb.0: ; %main_body
				; GFX9-32BANK-NEXT: v_mov_b32_e32 v0, s0
				; GFX9-32BANK-NEXT: s_mov_b32 m0, s2
				; GFX9-32BANK-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 3
				; GFX9-32BANK-NEXT: v_interp_p1ll_f16 v0, v0, attr2.y
				; GFX9-32BANK-NEXT: ;;#ASMSTART
				; GFX9-32BANK-NEXT: s_mov_b32 m0, 0
				; GFX9-32BANK-NEXT: ;;#ASMEND
				; GFX9-32BANK-NEXT: s_mov_b32 s0, m0
				; GFX9-32BANK-NEXT: v_mov_b32_e32 v1, s1
				; GFX9-32BANK-NEXT: s_mov_b32 m0, s2
				; GFX9-32BANK-NEXT: v_interp_p2_legacy_f16 v0, v1, attr2.y, v0
				; GFX9-32BANK-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0
				; GFX9-32BANK-NEXT: v_add_f16_e32 v0, s0, v0
				; GFX9-32BANK-NEXT: ; return to shader part epilog
				;
				; GFX8-32BANK-LABEL: interp_p2_m0_setup:
				; GFX8-32BANK: ; %bb.0: ; %main_body
				; GFX8-32BANK-NEXT: v_mov_b32_e32 v0, s0
				; GFX8-32BANK-NEXT: s_mov_b32 m0, s2
				; GFX8-32BANK-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 3
				; GFX8-32BANK-NEXT: v_interp_p1ll_f16 v0, v0, attr2.y
				; GFX8-32BANK-NEXT: ;;#ASMSTART
				; GFX8-32BANK-NEXT: s_mov_b32 m0, 0
				; GFX8-32BANK-NEXT: ;;#ASMEND
				; GFX8-32BANK-NEXT: s_mov_b32 s0, m0
				; GFX8-32BANK-NEXT: v_mov_b32_e32 v1, s1
				; GFX8-32BANK-NEXT: s_mov_b32 m0, s2
				; GFX8-32BANK-NEXT: v_interp_p2_f16 v0, v1, attr2.y, v0
				; GFX8-32BANK-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0
				; GFX8-32BANK-NEXT: v_add_f16_e32 v0, s0, v0
				; GFX8-32BANK-NEXT: ; return to shader part epilog
				;
				; GFX8-16BANK-LABEL: interp_p2_m0_setup:
				; GFX8-16BANK: ; %bb.0: ; %main_body
				; GFX8-16BANK-NEXT: s_mov_b32 m0, s2
				; GFX8-16BANK-NEXT: v_interp_mov_f32_e32 v0, p0, attr2.y
				; GFX8-16BANK-NEXT: v_mov_b32_e32 v1, s0
				; GFX8-16BANK-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 3
				; GFX8-16BANK-NEXT: v_interp_p1lv_f16 v0, v1, attr2.y, v0
				; GFX8-16BANK-NEXT: ;;#ASMSTART
				; GFX8-16BANK-NEXT: s_mov_b32 m0, 0
				; GFX8-16BANK-NEXT: ;;#ASMEND
				; GFX8-16BANK-NEXT: s_mov_b32 s0, m0
				; GFX8-16BANK-NEXT: v_mov_b32_e32 v1, s1
				; GFX8-16BANK-NEXT: s_mov_b32 m0, s2
				; GFX8-16BANK-NEXT: v_interp_p2_f16 v0, v1, attr2.y, v0
				; GFX8-16BANK-NEXT: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0
				; GFX8-16BANK-NEXT: v_add_f16_e32 v0, s0, v0
				; GFX8-16BANK-NEXT: ; return to shader part epilog
				main_body:
				%p1_0 = call float @llvm.amdgcn.interp.p1.f16(float %i, i32 1, i32 2, i1 0, i32 %m0)
				%mx = call i32 asm sideeffect "s_mov_b32 m0, 0", "={M0}"() #0
				%p2_0 = call half @llvm.amdgcn.interp.p2.f16(float %p1_0, float %j, i32 1, i32 2, i1 0, i32 %m0)
				%my = trunc i32 %mx to i16
				%mh = bitcast i16 %my to half
				%res = fadd half %p2_0, %mh
				ret half %res
				}

				; float @llvm.amdgcn.interp.p1.f16(i, attrchan, attr, high, m0)
				declare float @llvm.amdgcn.interp.p1.f16(float, i32, i32, i1, i32) #0
				; half @llvm.amdgcn.interp.p2.f16(p1, j, attrchan, attr, high, m0)
				declare half @llvm.amdgcn.interp.p2.f16(float, float, i32, i32, i1, i32) #0
				declare float @llvm.amdgcn.interp.mov(i32, i32, i32, i32) #0

				attributes #0 = { nounwind readnone }

This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add intrinsics for 16 bit interpolation
ClosedPublic
