Details

Reviewers

• tstellarAMD
arsenm

Commits

rG07e03712d3cd: AMDGPU : Add intrinsics for compare with the full wavefront result
rL276998: AMDGPU : Add intrinsics for compare with the full wavefront result

Summary

Add an LLVM intrinsic / Clang Builtin to expose the v_cmp_ne_i32 instruction.

Diff Detail

Repository: rL LLVM

Event Timeline

wdng updated this revision to Diff 64393. Jul 18 2016, 2:57 PM

wdng retitled this revision to AMDGPU : Add an LLVM intrinsic / Clang Builtin to expose the v_cmp_ne_i32 instruction..

wdng updated this object.

wdng added reviewers: • tstellarAMD, arsenm.

wdng set the repository for this revision to rL LLVM.

Herald added subscribers: kzhuravl, arsenm. Jul 18 2016, 2:57 PM

We should only do a general icmp, not an intrinsic for specific compare opcodes

This revision now requires changes to proceed. Jul 18 2016, 2:58 PM

In D22482#487548, @arsenm wrote:

We should only do a general icmp, not an intrinsic for specific compare opcodes

Do you mean an LLVM IR icmp instruction or a generic intrinsic that takes a condition code as its third input?

In D22482#488452, @tstellarAMD wrote:

In D22482#487548, @arsenm wrote:

We should only do a general icmp, not an intrinsic for specific compare opcodes

Do you mean an LLVM IR icmp instruction or a generic intrinsic that takes a condition code as its third input?

Yes

Use a general way to implement v_cmp_ne_i32.

Fixed a data type issue for fcmp intrinsic definition.

arsenm added inline comments. Jul 21 2016, 4:12 PM

include/llvm/IR/IntrinsicsAMDGPU.td
394	Remove the GCCBuiltins, they don't work with overloaded intrinsics
400	the 3rd parameter should be i32
lib/Target/AMDGPU/AMDGPUISelLowering.h
231	Needs a comment that this is setcc with the full mask result
lib/Target/AMDGPU/SIISelLowering.cpp
1655–1656	These should be put towards the end of the cases
1657	Variables should be capitalized and camel case. What happens if the cond code is out of range? There should probably be a clamp
1658	Extra spaces between type and name
1659	This looks like it goes over 80 characters
lib/Target/AMDGPU/SIInstructions.td
2365–2366	This can just be a class. You can also try adding the pattern dag to the v_cmp instruction definition patterns list (although I'm not 100% sure if the multiple patterns actually work). A multiclass might help if you don't want to repeat for i32/i64
2372–2373	All compare types should be defined. Additionally i64 and the FP ones are missing
test/CodeGen/AMDGPU/llvm.amdgcn.icmp.ne.ll
6–12 ↗	(On Diff #64938)	There should be a test for every condition code and i32/i64
8 ↗	(On Diff #64938)	nounwind should also be an attribute group
9 ↗	(On Diff #64938)	Call site doesn't need the attributes

• tstellarAMD added inline comments. Jul 22 2016, 6:20 AM

include/llvm/IR/IntrinsicsAMDGPU.td
400	Also, should the return type be i64 instead of double?

Code changes based on Matt's comments.

include/llvm/IR/IntrinsicsAMDGPU.td
400	Yes, code has been changed accordingly. Thanks!

arsenm added inline comments. Jul 25 2016, 12:09 PM

lib/Target/AMDGPU/AMDGPUISelLowering.h
231	Should have space after the //, and it should be capitalized and punctuated. Maybe clearer would be a compare with a result bit per item in the wavefront or something, mask result sounds more ambiguous maybe
lib/Target/AMDGPU/SIISelLowering.cpp
1909–1910	You should do the range check before the static_cast since I think it is undefined behavior to have an out of bounds enum value inserted. This also won't work for fcmp, each should be handled in its own case with its own range check for the specific compare types' range
lib/Target/AMDGPU/SIInstructions.td
2374–2375	The unsigned should use the _U32 compare
2383–2384	Ditto
2385–2386	Ditto
2399–2411	This also needs to be done for the unordered compares
test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.ll
102–104	Missing unordered compares
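
The point about checking the range before the static_cast can be sketched as follows. This is a minimal standalone illustration, not the code from the patch: the enum `IntPredicate`, its `IP_*` enumerators, the `kFirst`/`kLast` constants, and the helper `toIntPredicate` are hypothetical names. The values 32 through 41 are the integer-compare condition codes used by the icmp tests in this revision.

```cpp
#include <cstdint>

// Hypothetical stand-in for an integer-compare predicate enum.
// The values 32..41 match the condition codes used in the icmp tests.
enum IntPredicate {
  IP_EQ = 32, IP_NE, IP_UGT, IP_UGE, IP_ULT, IP_ULE,
  IP_SGT, IP_SGE, IP_SLT, IP_SLE
};

constexpr uint64_t kFirstIntPredicate = IP_EQ;
constexpr uint64_t kLastIntPredicate = IP_SLE;

// Validate the raw operand first and convert it to the enum only when it is
// in range, so an out-of-range value is never stored in the enum type.
// Returns false for an invalid code and leaves 'out' untouched.
inline bool toIntPredicate(uint64_t raw, IntPredicate &out) {
  if (raw < kFirstIntPredicate || raw > kLastIntPredicate)
    return false;
  out = static_cast<IntPredicate>(raw);
  return true;
}
```

A floating-point variant would need its own helper with its own bounds, since the two predicate families occupy different numeric ranges.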

Changes based on Matt's comments.

arsenm added inline comments. Jul 25 2016, 3:14 PM

include/llvm/IR/IntrinsicsAMDGPU.td
392	You should remove this comment
397	And this one
lib/Target/AMDGPU/SIISelLowering.cpp
1907–1909	Instead of an assert, how about returning undef? this should also have a test. Same if the operand isn't really constant, you'll need to do the dyn_cast yourself
1918–1922	Should refer to FCmpInst
lib/Target/AMDGPU/SIInstructions.td
2413–2425	These are not the correct unordered comparison instructions, refer to the existing set of fcmp patterns for which to use

• tstellarAMD added inline comments. Jul 25 2016, 3:15 PM

lib/Target/AMDGPU/SIInstructions.td
2413–2425	Unordered compares should select the V_CMP_N* instructions. Take a look at the instruction definitions to see which condition matches to which instruction.
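
One way to see why unordered predicates pair with negated compares: an unordered predicate is true when either operand is NaN or the relation holds, which is exactly the logical negation of the opposite ordered relation. The sketch below shows this for "unordered or less than" on the host in plain C++. The helper names `cmpULT` and `cmpNotGE` are hypothetical, and the mapping to specific V_CMP_N* opcodes is not shown here; it should be taken from the instruction definitions, as the comment above says.

```cpp
#include <cmath>

// "Unordered or less than": true if either input is NaN or a < b.
inline bool cmpULT(float a, float b) {
  return std::isnan(a) || std::isnan(b) || a < b;
}

// The same predicate written as the negation of the ordered ">=" compare.
// Every ordered comparison involving NaN is false, so the negation is true
// whenever an input is NaN.
inline bool cmpNotGE(float a, float b) {
  return !(a >= b);
}
```

Both helpers agree for ordinary values and for NaN inputs, assuming default IEEE semantics (no fast-math style flags).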

arsenm added inline comments. Jul 25 2016, 3:18 PM

test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.ll
200	You should also add a test that uses fabs on the inputs to make sure that source modifiers are folded

Added dyn_cast for type conversion.
Fixed the code to use the correct FCmpInst type.
Fixed incorrect use of unordered instructions.
Added fabs as input for fcmp test.

Upload correct diff with cached LIT tests update.

Upload correct diff file.

The title of the commit is also inaccurate, it should be intrinsics for compare with the full wavefront result

test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.ll
201	Still missing these tests
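
To make the phrase "compare with the full wavefront result" from the retitle suggestion concrete, here is a minimal host-side C++ model. It assumes one vector element per lane, at most 64 lanes, and all lanes active; `wavefrontCompareNe` is a hypothetical name, and this is only an illustration of the result shape (one bit per lane in an i64), not how the backend implements the intrinsics.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: element i of each vector is the operand held by lane i.
// Bit i of the returned mask is set when lane i's operands differ.
inline uint64_t wavefrontCompareNe(const std::vector<int32_t> &a,
                                   const std::vector<int32_t> &b) {
  assert(a.size() == b.size() && a.size() <= 64);
  uint64_t mask = 0;
  for (std::size_t lane = 0; lane < a.size(); ++lane) {
    if (a[lane] != b[lane])
      mask |= uint64_t{1} << lane;  // Record this lane's result bit.
  }
  return mask;
}
```

Every lane sees the same 64-bit mask, which is why the intrinsics in this revision return i64 rather than a per-lane i1.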

wdng added inline comments. Jul 26 2016, 4:13 PM

test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.ll
201	I have just created one "define void @v_fcmp_f32_oeq_with_fabs(i64 addrspace(1)* %out, float %src, float %a) #1" and put it at the top of the tests. Should I write fabs tests for all fcmp comparisons?

wdng retitled this revision from AMDGPU : Add an LLVM intrinsic / Clang Builtin to expose the v_cmp_ne_i32 instruction. to AMDGPU : Add intrinsics for compare with the full wavefront result, such as v_cmp_ne_i32, etc... Jul 26 2016, 4:14 PM

arsenm added inline comments. Jul 26 2016, 4:21 PM

test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.ll
7	Use the attribute group
11–16	The call site does not need the attribute specified. Can you also test the other operand? The check line should check the actual operands, this currently does not actually check much

Modified one LIT test to check operands.

Fixed corrupted diff patch.

arsenm added inline comments. Jul 26 2016, 6:09 PM

lib/Target/AMDGPU/SIISelLowering.cpp
1907	Space before =
1921	Ditto
test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.ll
8	Missing test for invalid condition code value
10	You can move the \|s outside of the regex and then you don't have to escape them
12	Don't need call site attributes
12–17	Still should test that both operands can have the source modifiers folded
30	It doesn't really matter, but there's no reason this test needs to under-align the stores. Fix these to be align 8 or remove the aligns
test/CodeGen/AMDGPU/llvm.amdgcn.icmp.ll
7	Missing test for invalid condition code value

Add illegal cond code LIT tests.

arsenm added inline comments. Jul 27 2016, 12:01 PM

lib/Target/AMDGPU/SIInstructions.td
2391–2392	Spaces before the types and the next (
test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.ll
29	You should drop the suffix here to strengthen the test. It would be best to reduce to just v_cmp because something could commute the instruction
test/CodeGen/AMDGPU/llvm.amdgcn.icmp.ll
17	Ditto

Rename LIT test function names.
Add space between type and parenthesis.

LGTM, you can drop the "such as" part from your commit message

This revision is now accepted and ready to land. Jul 27 2016, 4:00 PM

Closed by commit rL276998: AMDGPU : Add intrinsics for compare with the full wavefront result (authored by wdng). Jul 28 2016, 9:50 AM

This revision was automatically updated to reflect the committed changes.

Diff 65358

include/llvm/IR/IntrinsicsAMDGPU.td

Context not available.
	GCCBuiltin<"__builtin_amdgcn_lerp">,
	Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty, llvm_i32_ty], [IntrNoMem]>;

		// llvm.amdgcn.icmp
		arsenm (Done): You should remove this comment
		def int_amdgcn_icmp :
		Intrinsic<[llvm_i64_ty], [llvm_anyint_ty, LLVMMatchType<0>, llvm_i32_ty],
		arsenm (Done): Remove the GCCBuiltins, they don't work with overloaded intrinsics
		[IntrNoMem, IntrConvergent]>;

		// llvm.amdgcn.fcmp
		arsenm (Done): And this one
		def int_amdgcn_fcmp :
		Intrinsic<[llvm_i64_ty], [llvm_anyfloat_ty, LLVMMatchType<0>, llvm_i32_ty],
		[IntrNoMem, IntrConvergent]>;
		arsenm (Done): the 3rd parameter should be i32
		tstellarAMD (Done): Also, should the return type be i64 instead of double?
		wdng (Not Done): Yes, code has been changed accordingly. Thanks!

	//===----------------------------------------------------------------------===//
	// CI+ Intrinsics
	//===----------------------------------------------------------------------===//
Context not available.

lib/Target/AMDGPU/AMDGPUISelLowering.h

Context not available.
	DWORDADDR,
	FRACT,
	CLAMP,
		SETCC, //this is setcc with the full mask result
		arsenm (Done): Needs a comment that this is setcc with the full mask result
		arsenm (Done): Should have space after the //, and it should be capitalized and punctuated. Maybe clearer would be a compare with a result bit per item in the wavefront or something, mask result sounds more ambiguous maybe

	// SIN_HW, COS_HW - f32 for SI, 1 ULP max error, valid from -100 pi to 100 pi.
	// Denormals handled on some parts.
Context not available.

lib/Target/AMDGPU/AMDGPUISelLowering.cpp

Context not available.
	NODE_NAME_CASE(RETURN)
	NODE_NAME_CASE(DWORDADDR)
	NODE_NAME_CASE(FRACT)
		NODE_NAME_CASE(SETCC)
	NODE_NAME_CASE(CLAMP)
	NODE_NAME_CASE(COS_HW)
	NODE_NAME_CASE(SIN_HW)
Context not available.

lib/Target/AMDGPU/AMDGPUInstrInfo.td

Context not available.
	// out = (src1 > src0) ? 1 : 0
	def AMDGPUborrow : SDNode<"AMDGPUISD::BORROW", SDTIntBinOp, []>;

		def AMDGPUSetCCOp : SDTypeProfile<1, 3, [ // setcc
		SDTCisVT<0, i64>, SDTCisSameAs<1, 2>, SDTCisVT<3, OtherVT>
		]>;

		def AMDGPUsetcc : SDNode<"AMDGPUISD::SETCC", AMDGPUSetCCOp>;

	def AMDGPUcvt_f32_ubyte0 : SDNode<"AMDGPUISD::CVT_F32_UBYTE0",
	SDTIntToFPOp, []>;
Context not available.

lib/Target/AMDGPU/SIISelLowering.cpp

Context not available.
	#include "llvm/CodeGen/MachineInstrBuilder.h"
	#include "llvm/CodeGen/MachineRegisterInfo.h"
	#include "llvm/CodeGen/SelectionDAG.h"
		#include "llvm/CodeGen/Analysis.h"
	#include "llvm/IR/DiagnosticInfo.h"
	#include "llvm/IR/Function.h"

		arsenm (Done): Variables should be capitalized and camel case. What happens if the cond code is out of range? There should probably be a clamp
		arsenm (Done): This looks like it goes over 80 characters
		arsenm (Done): Extra spaces between type and name
		arsenm (Done): These should be put towards the end of the cases
Context not available.
	return DAG.getNode(AMDGPUISD::DIV_SCALE, DL, Op->getVTList(), Src0,
	Denominator, Numerator);
	}
		case Intrinsic::amdgcn_icmp:
		case Intrinsic::amdgcn_fcmp: {
		ICmpInst::Predicate IcInput =
		arsenm (Done): Space before =
		static_cast<ICmpInst::Predicate>(Op.getConstantOperandVal(3));
		assert(ICmpInst::Predicate::FIRST_ICMP_PREDICATE <= IcInput &&
		arsenm (Done): Instead of an assert, how about returning undef? this should also have a test. Same if the operand isn't really constant, you'll need to do the dyn_cast yourself
		IcInput < ICmpInst::Predicate::BAD_ICMP_PREDICATE);
		arsenm (Done): You should do the range check before the static_cast since I think it is undefined behavior to have an out of bounds enum value inserted. This also won't work for fcmp, each should be handled in its own case with its own range check for the specific compare types' range
		ISD::CondCode CCOpcode = getICmpCondCode(IcInput);
		return DAG.getNode(AMDGPUISD::SETCC, DL, VT, Op.getOperand(1),
		Op.getOperand(2), DAG.getCondCode(CCOpcode));
		}
	default:
	return AMDGPUTargetLowering::LowerOperation(Op, DAG);
	}
Context not available.
		arsenm (Done): Should refer to FCmpInst
		arsenm (Done): Ditto

lib/Target/AMDGPU/SIInstructions.td

Context not available.
	(DS_SWIZZLE_B32 $src, (as_i16imm $offset16), (i1 0))
	>;


		//===----------------------------------------------------------------------===//
		// V_ICMPIntrinsic Pattern.
		//===----------------------------------------------------------------------===//

		class ICMP_Pattern <PatLeaf cond, Instruction inst, ValueType vt> : Pat <
		(AMDGPUsetcc vt:$src0, vt:$src1, cond),
		arsenm (Done): This can just be a class. You can also try adding the pattern dag to the v_cmp instruction definition patterns list (although I'm not 100% sure if the multiple patterns actually work). A multiclass might help if you don't want to repeat for i32/i64
		(inst $src0, $src1)
		>;

		def : ICMP_Pattern <COND_EQ, V_CMP_EQ_I32_e64, i32>;
		def : ICMP_Pattern <COND_NE, V_CMP_NE_I32_e64, i32>;
		def : ICMP_Pattern <COND_UGT, V_CMP_GT_I32_e64, i32>;
		def : ICMP_Pattern <COND_UGE, V_CMP_GE_I32_e64, i32>;
		arsenm (Done): All compare types should be defined. Additionally i64 and the FP ones are missing
		def : ICMP_Pattern <COND_ULT, V_CMP_LT_I32_e64, i32>;
		def : ICMP_Pattern <COND_ULE, V_CMP_LE_I32_e64, i32>;
		arsenm (Done): The unsigned should use the _U32 compare
		def : ICMP_Pattern <COND_SGT, V_CMP_GT_I32_e64, i32>;
		def : ICMP_Pattern <COND_SGE, V_CMP_GE_I32_e64, i32>;
		def : ICMP_Pattern <COND_SLT, V_CMP_LT_I32_e64, i32>;
		def : ICMP_Pattern <COND_SLE, V_CMP_LE_I32_e64, i32>;

		def : ICMP_Pattern <COND_EQ, V_CMP_EQ_I64_e64, i64>;
		def : ICMP_Pattern <COND_NE, V_CMP_NE_I64_e64, i64>;
		def : ICMP_Pattern <COND_UGT, V_CMP_GT_I64_e64, i64>;
		def : ICMP_Pattern <COND_UGE, V_CMP_GE_I64_e64, i64>;
		arsenm (Done): Ditto
		def : ICMP_Pattern <COND_ULT, V_CMP_LT_I64_e64, i64>;
		def : ICMP_Pattern <COND_ULE, V_CMP_LE_I64_e64, i64>;
		arsenm (Done): Ditto
		def : ICMP_Pattern <COND_SGT, V_CMP_GT_I64_e64, i64>;
		def : ICMP_Pattern <COND_SGE, V_CMP_GE_I64_e64, i64>;
		def : ICMP_Pattern <COND_SLT, V_CMP_LT_I64_e64, i64>;
		def : ICMP_Pattern <COND_SLE, V_CMP_LE_I64_e64, i64>;

		class FCMP_Pattern <PatLeaf cond, Instruction inst, ValueType vt> : Pat <
		arsenm (Not Done): Spaces before the types and the next (
		(i64(AMDGPUsetcc (vt(VOP3Mods vt:$src0, i32:$src0_modifiers)),
		(vt(VOP3Mods vt:$src1, i32:$src1_modifiers)), cond)),
		(inst $src0_modifiers, $src0, $src1_modifiers, $src1,
		DSTCLAMP.NONE, DSTOMOD.NONE)
		>;

		def : FCMP_Pattern <COND_OEQ, V_CMP_EQ_F32_e64, f32>;
		def : FCMP_Pattern <COND_ONE, V_CMP_NEQ_F32_e64, f32>;
		def : FCMP_Pattern <COND_OGT, V_CMP_GT_F32_e64, f32>;
		def : FCMP_Pattern <COND_OGE, V_CMP_GE_F32_e64, f32>;
		def : FCMP_Pattern <COND_OLT, V_CMP_LT_F32_e64, f32>;
		def : FCMP_Pattern <COND_OLE, V_CMP_LE_F32_e64, f32>;

		def : FCMP_Pattern <COND_OEQ, V_CMP_EQ_F64_e64, f64>;
		def : FCMP_Pattern <COND_ONE, V_CMP_NEQ_F64_e64, f64>;
		def : FCMP_Pattern <COND_OGT, V_CMP_GT_F64_e64, f64>;
		def : FCMP_Pattern <COND_OGE, V_CMP_GE_F64_e64, f64>;
		def : FCMP_Pattern <COND_OLT, V_CMP_LT_F64_e64, f64>;
		def : FCMP_Pattern <COND_OLE, V_CMP_LE_F64_e64, f64>;
		arsenm (Done): This also needs to be done for the unordered compares

	//===----------------------------------------------------------------------===//
	// SMRD Patterns
	//===----------------------------------------------------------------------===//
Context not available.

test/CodeGen/AMDGPU/llvm.amdgcn.fcmp.ll

This file was added.

				; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
				; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

				declare i64 @llvm.amdgcn.fcmp.f32(float, float, i32) #0
				declare i64 @llvm.amdgcn.fcmp.f64(double, double, i32) #0

				; GCN-LABEL: {{^}}v_fcmp_f32_eq:
				arsenm (Not Done): Use the attribute group
				; GCN: v_cmp_eq_f32_e64
				arsenm (Not Done): Missing test for invalid condition code value
				define void @v_fcmp_f32_eq(i64 addrspace(1)* %out, float %src) #1 {
				%result = call i64 @llvm.amdgcn.fcmp.f32(float %src, float 100.00, i32 32)
				arsenm (Done): You can move the \|s outside of the regex and then you don't have to escape them
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				arsenm (Done): Don't need call site attributes
				}

				; GCN-LABEL: {{^}}v_fcmp_f32_ne:
				; GCN: v_cmp_neq_f32_e64
				arsenm (Done): The call site does not need the attribute specified. Can you also test the other operand? The check line should check the actual operands, this currently does not actually check much
				define void @v_fcmp_f32_ne(i64 addrspace(1)* %out, float %src) #1 {
				arsenm (Done): Still should test that both operands can have the source modifiers folded
				%result = call i64 @llvm.amdgcn.fcmp.f32(float %src, float 100.00, i32 33)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				; GCN-LABEL: {{^}}v_fcmp_f32_ogt:
				; GCN: v_cmp_gt_f32_e64
				define void @v_fcmp_f32_ogt(i64 addrspace(1)* %out, float %src) #1 {
				%result = call i64 @llvm.amdgcn.fcmp.f32(float %src, float 100.00, i32 38)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}
				arsenm (Done): You should drop the suffix here to strengthen the test. It would be best to reduce to just v_cmp because something could commute the instruction

				arsenm (Done): It doesn't really matter, but there's no reason this test needs to under-align the stores. Fix these to be align 8 or remove the aligns
				; GCN-LABEL: {{^}}v_fcmp_f32_oge:
				; GCN: v_cmp_ge_f32_e64
				define void @v_fcmp_f32_oge(i64 addrspace(1)* %out, float %src) #1 {
				%result = call i64 @llvm.amdgcn.fcmp.f32(float %src, float 100.00, i32 39)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				; GCN-LABEL: {{^}}v_fcmp_f32_olt:
				; GCN: v_cmp_lt_f32_e64
				define void @v_fcmp_f32_olt(i64 addrspace(1)* %out, float %src) #1 {
				%result = call i64 @llvm.amdgcn.fcmp.f32(float %src, float 100.00, i32 40)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				; GCN-LABEL: {{^}}v_fcmp_f32_ole:
				; GCN: v_cmp_le_f32_e64
				define void @v_fcmp_f32_ole(i64 addrspace(1)* %out, float %src) #1 {
				%result = call i64 @llvm.amdgcn.fcmp.f32(float %src, float 100.00, i32 41)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				; GCN-LABEL: {{^}}v_fcmp_f64_eq:
				; GCN: v_cmp_eq_f64_e64
				define void @v_fcmp_f64_eq(i64 addrspace(1)* %out, double %src) #1 {
				%result = call i64 @llvm.amdgcn.fcmp.f64(double %src, double 100.00, i32 32)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				; GCN-LABEL: {{^}}v_fcmp_f64_ne:
				; GCN: v_cmp_neq_f64_e64
				define void @v_fcmp_f64_ne(i64 addrspace(1)* %out, double %src) #1 {
				%result = call i64 @llvm.amdgcn.fcmp.f64(double %src, double 100.00, i32 33)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				; GCN-LABEL: {{^}}v_fcmp_f64_ogt:
				; GCN: v_cmp_gt_f64_e64
				define void @v_fcmp_f64_ogt(i64 addrspace(1)* %out, double %src) #1 {
				%result = call i64 @llvm.amdgcn.fcmp.f64(double %src, double 100.00, i32 38)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				; GCN-LABEL: {{^}}v_fcmp_f64_oge:
				; GCN: v_cmp_ge_f64_e64
				define void @v_fcmp_f64_oge(i64 addrspace(1)* %out, double %src) #1 {
				%result = call i64 @llvm.amdgcn.fcmp.f64(double %src, double 100.00, i32 39)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				; GCN-LABEL: {{^}}v_fcmp_f64_olt:
				; GCN: v_cmp_lt_f64_e64
				define void @v_fcmp_f64_olt(i64 addrspace(1)* %out, double %src) #1 {
				%result = call i64 @llvm.amdgcn.fcmp.f64(double %src, double 100.00, i32 40)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				; GCN-LABEL: {{^}}v_fcmp_f64_ole:
				; GCN: v_cmp_le_f64_e64
				define void @v_fcmp_f64_ole(i64 addrspace(1)* %out, double %src) #1 {
				%result = call i64 @llvm.amdgcn.fcmp.f64(double %src, double 100.00, i32 41)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}



				arsenm (Done): Missing unordered compares

				attributes #0 = { nounwind readnone convergent }
				attributes #1 = { nounwind }
				arsenm (Not Done): You should also add a test that uses fabs on the inputs to make sure that source modifiers are folded
				arsenm (Not Done): Still missing these tests
				wdng (Not Done): I have just created one "define void @v_fcmp_f32_oeq_with_fabs(i64 addrspace(1)* %out, float %src, float %a) #1" and put it at the top of the tests. Should I write fabs tests for all fcmp comparisons?

test/CodeGen/AMDGPU/llvm.amdgcn.icmp.ll

This file was added.

				; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
				; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

				declare i64 @llvm.amdgcn.icmp.i32(i32, i32, i32) #0
				declare i64 @llvm.amdgcn.icmp.i64(i64, i64, i32) #0

				; GCN-LABEL: {{^}}v_icmp_i32_eq:
				arsenm (Done): Missing test for invalid condition code value
				; GCN: v_cmp_eq_i32_e64
				define void @v_icmp_i32_eq(i64 addrspace(1)* %out, i32 %src) #1 {
				%result = call i64 @llvm.amdgcn.icmp.i32(i32 %src, i32 100, i32 32)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				; GCN-LABEL: {{^}}v_icmp_i32_ne:
				; GCN: v_cmp_ne_i32_e64
				define void @v_icmp_i32_ne(i64 addrspace(1)* %out, i32 %src) #1 {
				arsenm (Done): Ditto
				%result = call i64 @llvm.amdgcn.icmp.i32(i32 %src, i32 100, i32 33)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				; GCN-LABEL: {{^}}v_icmp_i32_ugt:
				; GCN: v_cmp_gt_i32_e64
				define void @v_icmp_i32_ugt(i64 addrspace(1)* %out, i32 %src) #1 {
				%result = call i64 @llvm.amdgcn.icmp.i32(i32 %src, i32 100, i32 34)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				; GCN-LABEL: {{^}}v_icmp_i32_uge:
				; GCN: v_cmp_ge_i32_e64
				define void @v_icmp_i32_uge(i64 addrspace(1)* %out, i32 %src) #1 {
				%result = call i64 @llvm.amdgcn.icmp.i32(i32 %src, i32 100, i32 35)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				; GCN-LABEL: {{^}}v_icmp_i32_ult:
				; GCN: v_cmp_lt_i32_e64
				define void @v_icmp_i32_ult(i64 addrspace(1)* %out, i32 %src) #1 {
				%result = call i64 @llvm.amdgcn.icmp.i32(i32 %src, i32 100, i32 36)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				; GCN-LABEL: {{^}}v_icmp_i32_ule:
				; GCN: v_cmp_le_i32_e64
				define void @v_icmp_i32_ule(i64 addrspace(1)* %out, i32 %src) #1 {
				%result = call i64 @llvm.amdgcn.icmp.i32(i32 %src, i32 100, i32 37)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				; GCN-LABEL: {{^}}v_icmp_i32_sgt:
				; GCN: v_cmp_gt_i32_e64
				define void @v_icmp_i32_sgt(i64 addrspace(1)* %out, i32 %src) #1 {
				%result = call i64 @llvm.amdgcn.icmp.i32(i32 %src, i32 100, i32 38)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				; GCN-LABEL: {{^}}v_icmp_i32_sge:
				; GCN: v_cmp_ge_i32_e64
				define void @v_icmp_i32_sge(i64 addrspace(1)* %out, i32 %src) #1 {
				%result = call i64 @llvm.amdgcn.icmp.i32(i32 %src, i32 100, i32 39)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				; GCN-LABEL: {{^}}v_icmp_i32_slt:
				; GCN: v_cmp_lt_i32_e64
				define void @v_icmp_i32_slt(i64 addrspace(1)* %out, i32 %src) #1 {
				%result = call i64 @llvm.amdgcn.icmp.i32(i32 %src, i32 100, i32 40)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}
				; GCN-LABEL: {{^}}v_icmp_i32_sle:
				; GCN: v_cmp_le_i32_e64
				define void @v_icmp_i32_sle(i64 addrspace(1)* %out, i32 %src) #1 {
				%result = call i64 @llvm.amdgcn.icmp.i32(i32 %src, i32 100, i32 41)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				; GCN-LABEL: {{^}}v_icmp_i64_eq:
				; GCN: v_cmp_eq_i64_e64
				define void @v_icmp_i64_eq(i64 addrspace(1)* %out, i64 %src) #1 {
				%result = call i64 @llvm.amdgcn.icmp.i64(i64 %src, i64 100, i32 32)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				; GCN-LABEL: {{^}}v_icmp_i64_ne:
				; GCN: v_cmp_ne_i64_e64
				define void @v_icmp_i64_ne(i64 addrspace(1)* %out, i64 %src) #1 {
				%result = call i64 @llvm.amdgcn.icmp.i64(i64 %src, i64 100, i32 33)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				; GCN-LABEL: {{^}}v_icmp_i64_ugt:
				; GCN: v_cmp_gt_i64_e64
				define void @v_icmp_i64_ugt(i64 addrspace(1)* %out, i64 %src) #1 {
				%result = call i64 @llvm.amdgcn.icmp.i64(i64 %src, i64 100, i32 34)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				; GCN-LABEL: {{^}}v_icmp_i64_uge:
				; GCN: v_cmp_ge_i64_e64
				define void @v_icmp_i64_uge(i64 addrspace(1)* %out, i64 %src) #1 {
				%result = call i64 @llvm.amdgcn.icmp.i64(i64 %src, i64 100, i32 35)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				; GCN-LABEL: {{^}}v_icmp_i64_ult:
				; GCN: v_cmp_lt_i64_e64
				define void @v_icmp_i64_ult(i64 addrspace(1)* %out, i64 %src) #1 {
				%result = call i64 @llvm.amdgcn.icmp.i64(i64 %src, i64 100, i32 36)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				; GCN-LABEL: {{^}}v_icmp_i64_ule:
				; GCN: v_cmp_le_i64_e64
				define void @v_icmp_i64_ule(i64 addrspace(1)* %out, i64 %src) #1 {
				%result = call i64 @llvm.amdgcn.icmp.i64(i64 %src, i64 100, i32 37)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				; GCN-LABEL: {{^}}v_icmp_i64_sgt:
				; GCN: v_cmp_gt_i64_e64
				define void @v_icmp_i64_sgt(i64 addrspace(1)* %out, i64 %src) #1 {
				%result = call i64 @llvm.amdgcn.icmp.i64(i64 %src, i64 100, i32 38)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				; GCN-LABEL: {{^}}v_icmp_i64_sge:
				; GCN: v_cmp_ge_i64_e64
				define void @v_icmp_i64_sge(i64 addrspace(1)* %out, i64 %src) #1 {
				%result = call i64 @llvm.amdgcn.icmp.i64(i64 %src, i64 100, i32 39)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				; GCN-LABEL: {{^}}v_icmp_i64_slt:
				; GCN: v_cmp_lt_i64_e64
				define void @v_icmp_i64_slt(i64 addrspace(1)* %out, i64 %src) #1 {
				%result = call i64 @llvm.amdgcn.icmp.i64(i64 %src, i64 100, i32 40)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}
				; GCN-LABEL: {{^}}v_icmp_i64_sle:
				; GCN: v_cmp_le_i64_e64
				define void @v_icmp_i64_sle(i64 addrspace(1)* %out, i64 %src) #1 {
				%result = call i64 @llvm.amdgcn.icmp.i64(i64 %src, i64 100, i32 41)
				store i64 %result, i64 addrspace(1)* %out, align 4
				ret void
				}

				attributes #0 = { nounwind readnone convergent }
				attributes #1 = { nounwind }

This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU : Add intrinsics for compare with the full wavefront result, such as v_cmp_ne_i32, etc..
ClosedPublic
