Diff 106298

lib/Target/AMDGPU/SIISelLowering.cpp

SDValue SITargetLowering::performClassCombine(SDNode *N,
  [...]
  }

  if (N->getOperand(0).isUndef())
    return DAG.getUNDEF(MVT::i1);

  return SDValue();
}

static bool isKnownNeverSNan(SelectionDAG &DAG, SDValue Op) {
  if (!DAG.getTargetLoweringInfo().hasFloatingPointExceptions())
    return true;

  return DAG.isKnownNeverNaN(Op);
}

static bool isCanonicalized(SDValue Op, const SISubtarget *ST,
                            unsigned MaxDepth=5) {
  // If source is a result of another standard FP operation it is already in
  // canonical form.

  switch (Op.getOpcode()) {
  default:
    break;

  // These will flush denorms if required.
  case ISD::FADD:
arsenm (Not Done): Since FMAD always flushes I don't think it's OK to handle it.
rampitec (Author, Not Done): FMAD - Perform a * b + c, while getting the same result as the separately rounded operations. It is essentially v_mac_f32, which always flushes, but if denorms are enabled it is lowered as fma. So handling it here is correct, and there is a test, test_fold_canonicalize_fmuladd_value_f32, for it.
  case ISD::FSUB:
  case ISD::FMUL:
arsenm (Not Done): fcanonicalize should have the idempotent optimization applied so there shouldn't be a need to handle it.
rampitec (Author, Not Done): It is not:
    v_mul_f32_e32 v2, 1.0, v2
    v_mul_f32_e32 v2, 1.0, v2
rampitec (Author, Not Done): And I do not think there are idempotent SDNodes.
arsenm (Not Done): I don't think it's done now, but fcanonicalize (fcanonicalize x) should fold to just one canonicalize.
rampitec (Author, Not Done): ...and that is what I am doing here.
rampitec (Author, Not Done): In fact, even if a general folding is implemented in DAGCombiner, I believe it shall stay here as well. It is the same idea as with constants: something else which also normalizes can be combined with it.
  case ISD::FSQRT:
  case ISD::FCEIL:
  case ISD::FFLOOR:
  case ISD::FMA:
  case ISD::FMAD:

  case ISD::FCANONICALIZE:
    return true;

  case ISD::FP_ROUND:
    return Op.getValueType().getScalarType() != MVT::f16 ||
           ST->hasFP16Denormals();

  case ISD::FP_EXTEND:
    return Op.getOperand(0).getValueType().getScalarType() != MVT::f16 ||
           ST->hasFP16Denormals();

  case ISD::FP16_TO_FP:
  case ISD::FP_TO_FP16:
    return ST->hasFP16Denormals();

  // It can/will be lowered or combined as a bit operation.
  // Need to check their input recursively to handle.
  case ISD::FNEG:
arsenm (Not Done): We don't handle this.
rampitec (Author, Not Done): We do not handle this now, but may handle it later. The idea is still the same as with FSIN and FCOS.
  case ISD::FABS:
    return (MaxDepth > 0) &&
           isCanonicalized(Op.getOperand(0), ST, MaxDepth - 1);

  case ISD::FSIN:
  case ISD::FCOS:
arsenm (Done): The output is never flushed anywhere?
rampitec (Author, Done): GFX9 flushes.
arsenm (Done): That is broken. If that is the case we probably shouldn't be using the regular minnum/maxnum intrinsics without denormals, and then it's an optimization to fold canonicalize (min/max) to the weird target behavior.
rampitec (Author, Done): The library is common for different targets.
arsenm (Done): We need to fix that then. If it's not returning exactly one of the inputs it's not implementing the IEEE min/max.
rampitec (Author, Done): In fact IEEE 754-2008 (as described for maxnum/minnum in LLVM IR) says to return canonicalized numbers, so it is the opposite: GFX9 is finally IEEE compliant.
rampitec (Author, Not Done): This part is split off from the patch.
  case ISD::FSINCOS:
    return Op.getValueType().getScalarType() != MVT::f16;
arsenm (Not Done): We don't handle these.
rampitec (Author, Not Done): This again is only for now. I had a use of it in HSAIL and may actually port it.

  // In pre-GFX9 targets V_MIN_F32 and others do not flush denorms.
arsenm (Done): I don't think this is true, but it should have a named check in the subtarget.
rampitec (Author, Done): I would rather think about a denorm support flag in TD for every single instruction wrt. subtarget. Why add just a single one?
arsenm (Done): We don't need a full-fledged subtarget feature; just put the generation check in a function with a name/description rather than adding more random-looking generation checks.
rampitec (Author, Done): I'm not sure I follow. Could you please suggest a name for such a check?
arsenm (Done): hasAddr64() or hasMed3_16()
rampitec (Author, Done): hasNormalizingMinMax()? It returns us to the initial point.
arsenm (Done): The SC name was SupportsMinMaxDenormModes. Either way works.
rampitec (Author, Not Done): This code part is dropped for now.
  // For such targets need to check their input recursively.
  // TODO: on GFX9+ we could return true without checking provided no-nan
  // mode, since canonicalization is also used to quiet sNaNs.
  case ISD::FMINNUM:
  case ISD::FMAXNUM:
  case ISD::FMINNAN:
  case ISD::FMAXNAN:

arsenm (Not Done): We constant fold this already so this shouldn't be necessary.
rampitec (Author, Not Done): This is necessary. We may constant fold it to remove fcanonicalize on a constant, but consider: canonicalize(fmax(x, 0.0f)); Without the above code it would not be handled, because we have to look through the arguments of min/max/fneg/fabs. I have a test for it actually. In particular it is needed to fold an extremely common case like max(max, ...).
    return (MaxDepth > 0) &&
           isCanonicalized(Op.getOperand(0), ST, MaxDepth - 1) &&
           isCanonicalized(Op.getOperand(1), ST, MaxDepth - 1);

  case ISD::ConstantFP: {
    auto F = cast<ConstantFPSDNode>(Op)->getValueAPF();
    return !F.isDenormal() && !(F.isNaN() && F.isSignaling());
  }
  }
  return false;
		}
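
For readers following the thread, the depth-limited walk in isCanonicalized can be modelled outside LLVM. The sketch below is a hypothetical toy, not the patch's code: `ToyOp`, `ToyNode` and `isCanonicalizedToy` are made-up names, the opcode set is reduced, and subtarget checks such as hasFP16Denormals() are left out. It only illustrates the idea that arithmetic nodes count as canonical on their own, while fneg/fabs and min/max are accepted only when their operands are, within MaxDepth levels.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for SelectionDAG opcodes (not LLVM's real types).
enum class ToyOp {
  Load, FAdd, FMul, FNeg, FAbs, FMinNum, FMaxNum, ConstNormal, ConstDenormal
};

// A toy expression node: an opcode plus non-owning pointers to its operands.
struct ToyNode {
  ToyOp Opcode;
  std::vector<const ToyNode *> Operands;
};

// Returns true if the value is assumed to be already canonical.
bool isCanonicalizedToy(const ToyNode &N, unsigned MaxDepth = 5) {
  switch (N.Opcode) {
  // Arithmetic results and normal constants are treated as canonical.
  case ToyOp::FAdd:
  case ToyOp::FMul:
  case ToyOp::ConstNormal:
    return true;

  // Bit-level operations: canonical only if the single input is.
  case ToyOp::FNeg:
  case ToyOp::FAbs:
    return MaxDepth > 0 && isCanonicalizedToy(*N.Operands[0], MaxDepth - 1);

  // min/max return one of the inputs, so both inputs must be canonical.
  case ToyOp::FMinNum:
  case ToyOp::FMaxNum:
    return MaxDepth > 0 &&
           isCanonicalizedToy(*N.Operands[0], MaxDepth - 1) &&
           isCanonicalizedToy(*N.Operands[1], MaxDepth - 1);

  // Loaded values and denormal constants are unknown or non-canonical.
  case ToyOp::Load:
  case ToyOp::ConstDenormal:
    return false;
  }
  return false;
}
```

In this model fneg(load) is rejected while fneg(fadd(load, 0.0)) is accepted, which corresponds to the test_no_fold_canonicalize_fneg_value_f32 / test_fold_canonicalize_fneg_value_f32 pair in the test file. The depth limit means a chain of more than five fneg/fabs/min/max nodes is conservatively rejected even if its leaf is canonical.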

// Constant fold canonicalize.
SDValue SITargetLowering::performFCanonicalizeCombine(
  SDNode *N,
  DAGCombinerInfo &DCI) const {
  SelectionDAG &DAG = DCI.DAG;
  ConstantFPSDNode *CFP = isConstOrConstSplatFP(N->getOperand(0));
  if (!CFP) {
    SDValue N0 = N->getOperand(0);

    bool IsIEEEMode = Subtarget->enableIEEEBit(DAG.getMachineFunction());

    if ((IsIEEEMode || isKnownNeverSNan(DAG, N0)) &&
        isCanonicalized(N0, getSubtarget()))
arsenm (Done): This is really a check for whether SNaNs are enabled. We have the isKnownNeverSNan check for this already. I think we need to fix having separate FPExceptions and enableIEEEBit subtarget checks.
arsenm (Done): This would read more accurately as IsIEEEMode || isKnownNeverSNan.
      return N0;

    return SDValue();
  }

  const APFloat &C = CFP->getValueAPF();

  // Flush denormals to 0 if not enabled.
  if (C.isDenormal()) {
    EVT VT = N->getValueType(0);
    EVT SVT = VT.getScalarType();
    if (SVT == MVT::f32 && !Subtarget->hasFP32Denormals())
      return DAG.getConstantFP(0.0, SDLoc(N), VT);
  [...]

SDValue SITargetLowering::performIntMed3ImmCombine(
  [...]
  SDValue Tmp1 = DAG.getNode(ExtOp, SL, NVT, Op0->getOperand(0));
  SDValue Tmp2 = DAG.getNode(ExtOp, SL, NVT, Op0->getOperand(1));
  SDValue Tmp3 = DAG.getNode(ExtOp, SL, NVT, Op1);

  SDValue Med3 = DAG.getNode(Med3Opc, SL, NVT, Tmp1, Tmp2, Tmp3);
  return DAG.getNode(ISD::TRUNCATE, SL, VT, Med3);
}

SDValue SITargetLowering::performFPMed3ImmCombine(SelectionDAG &DAG,
                                                  const SDLoc &SL,
                                                  SDValue Op0,
                                                  SDValue Op1) const {
  ConstantFPSDNode *K1 = dyn_cast<ConstantFPSDNode>(Op1);
  if (!K1)
    return SDValue();

  [...]

test/CodeGen/AMDGPU/fcanonicalize-elimination.ll

This file was added.

				; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs -mattr=-fp32-denormals < %s | FileCheck -check-prefix=GCN -check-prefix=VI -check-prefix=GCN-FLUSH %s
				; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs -mattr=-fp32-denormals,+fp-exceptions < %s | FileCheck -check-prefix=GCN -check-prefix=GCN-EXCEPT -check-prefix=VI -check-prefix=GCN-FLUSH %s
				; RUN: llc -march=amdgcn -mcpu=gfx901 -verify-machineinstrs -mattr=+fp32-denormals < %s | FileCheck -check-prefix=GCN -check-prefix=GFX9 -check-prefix=GFX9-DENORM %s
arsenm (Done): This needs a run line for gfx9 with denormals disabled too.
				; RUN: llc -march=amdgcn -mcpu=gfx901 -verify-machineinstrs -mattr=-fp32-denormals < %s | FileCheck -check-prefix=GCN -check-prefix=GFX9 -check-prefix=GCN-FLUSH %s

				; GCN-LABEL: {{^}}test_no_fold_canonicalize_loaded_value_f32:
				; GCN: v_mul_f32_e32 v{{[0-9]+}}, 1.0, v{{[0-9]+}}
				define amdgpu_kernel void @test_no_fold_canonicalize_loaded_value_f32(float addrspace(1)* %arg) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
				%v = load float, float addrspace(1)* %gep, align 4
				%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
				store float %canonicalized, float addrspace(1)* %gep, align 4
				ret void
				}

				; GCN-LABEL: {{^}}test_fold_canonicalize_fmul_value_f32:
				; GCN: v_mul_f32_e32 [[V:v[0-9]+]], 0x41700000, v{{[0-9]+}}
				; GCN: flat_store_dword v[{{[0-9:]+}}], [[V]]
				; GCN-NOT: 1.0
				define amdgpu_kernel void @test_fold_canonicalize_fmul_value_f32(float addrspace(1)* %arg) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
				%load = load float, float addrspace(1)* %gep, align 4
				%v = fmul float %load, 15.0
				%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
				store float %canonicalized, float addrspace(1)* %gep, align 4
				ret void
				}

				; GCN-LABEL: {{^}}test_fold_canonicalize_sub_value_f32:
				; GCN: v_sub_f32_e32 [[V:v[0-9]+]], 0x41700000, v{{[0-9]+}}
				; GCN: flat_store_dword v[{{[0-9:]+}}], [[V]]
				; GCN-NOT: 1.0
				define amdgpu_kernel void @test_fold_canonicalize_sub_value_f32(float addrspace(1)* %arg) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
				%load = load float, float addrspace(1)* %gep, align 4
				%v = fsub float 15.0, %load
				%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
				store float %canonicalized, float addrspace(1)* %gep, align 4
				ret void
				}

				; GCN-LABEL: {{^}}test_fold_canonicalize_add_value_f32:
				; GCN: v_add_f32_e32 [[V:v[0-9]+]], 0x41700000, v{{[0-9]+}}
				; GCN: flat_store_dword v[{{[0-9:]+}}], [[V]]
				; GCN-NOT: 1.0
				define amdgpu_kernel void @test_fold_canonicalize_add_value_f32(float addrspace(1)* %arg) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
				%load = load float, float addrspace(1)* %gep, align 4
				%v = fadd float %load, 15.0
				%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
				store float %canonicalized, float addrspace(1)* %gep, align 4
				ret void
				}

				; GCN-LABEL: {{^}}test_fold_canonicalize_sqrt_value_f32:
				; GCN: v_sqrt_f32_e32 [[V:v[0-9]+]], v{{[0-9]+}}
				; GCN: flat_store_dword v[{{[0-9:]+}}], [[V]]
				; GCN-NOT: 1.0
				define amdgpu_kernel void @test_fold_canonicalize_sqrt_value_f32(float addrspace(1)* %arg) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
				%load = load float, float addrspace(1)* %gep, align 4
				%v = call float @llvm.sqrt.f32(float %load)
				%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
				store float %canonicalized, float addrspace(1)* %gep, align 4
				ret void
				}

				; GCN-LABEL: test_fold_canonicalize_fceil_value_f32:
				; GCN: v_ceil_f32_e32 [[V:v[0-9]+]], v{{[0-9]+}}
				; GCN: flat_store_dword v[{{[0-9:]+}}], [[V]]
				; GCN-NOT: 1.0
				define amdgpu_kernel void @test_fold_canonicalize_fceil_value_f32(float addrspace(1)* %arg) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
				%load = load float, float addrspace(1)* %gep, align 4
				%v = call float @llvm.ceil.f32(float %load)
				%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
				store float %canonicalized, float addrspace(1)* %gep, align 4
				ret void
				}

				; GCN-LABEL: test_fold_canonicalize_floor_value_f32:
				; GCN: v_floor_f32_e32 [[V:v[0-9]+]], v{{[0-9]+}}
				; GCN: flat_store_dword v[{{[0-9:]+}}], [[V]]
				; GCN-NOT: 1.0
				define amdgpu_kernel void @test_fold_canonicalize_floor_value_f32(float addrspace(1)* %arg) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
				%load = load float, float addrspace(1)* %gep, align 4
				%v = call float @llvm.floor.f32(float %load)
				%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
				store float %canonicalized, float addrspace(1)* %gep, align 4
				ret void
				}

				; GCN-LABEL: test_fold_canonicalize_fma_value_f32:
				; GCN: v_fma_f32 [[V:v[0-9]+]], v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
				; GCN: flat_store_dword v[{{[0-9:]+}}], [[V]]
				; GCN-NOT: 1.0
				define amdgpu_kernel void @test_fold_canonicalize_fma_value_f32(float addrspace(1)* %arg) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
				%load = load float, float addrspace(1)* %gep, align 4
				%v = call float @llvm.fma.f32(float %load, float 15.0, float 15.0)
				%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
				store float %canonicalized, float addrspace(1)* %gep, align 4
				ret void
				}

				; GCN-LABEL: test_fold_canonicalize_fmuladd_value_f32:
				; GCN-FLUSH: v_mac_f32_e32 [[V:v[0-9]+]], v{{[0-9]+}}, v{{[0-9]+}}
				; GFX9-DENORM: v_fma_f32 [[V:v[0-9]+]], v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
				; GCN: flat_store_dword v[{{[0-9:]+}}], [[V]]
				; GCN-NOT: 1.0
				define amdgpu_kernel void @test_fold_canonicalize_fmuladd_value_f32(float addrspace(1)* %arg) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
				%load = load float, float addrspace(1)* %gep, align 4
				%v = call float @llvm.fmuladd.f32(float %load, float 15.0, float 15.0)
				%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
				store float %canonicalized, float addrspace(1)* %gep, align 4
				ret void
				}

				; GCN-LABEL: test_fold_canonicalize_canonicalize_value_f32:
				; GCN: flat_load_dword [[LOAD:v[0-9]+]],
				; GCN: v_mul_f32_e32 [[V:v[0-9]+]], 1.0, [[LOAD]]
				; GCN: flat_store_dword v[{{[0-9:]+}}], [[V]]
				; GCN-NOT: 1.0
				define amdgpu_kernel void @test_fold_canonicalize_canonicalize_value_f32(float addrspace(1)* %arg) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
				%load = load float, float addrspace(1)* %gep, align 4
				%v = call float @llvm.canonicalize.f32(float %load)
				%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
				store float %canonicalized, float addrspace(1)* %gep, align 4
				ret void
				}

				; GCN-LABEL: test_fold_canonicalize_fpextend_value_f64_f32:
				; GCN: v_cvt_f64_f32_e32 [[V:v\[[0-9]+:[0-9]+\]]], v{{[0-9]+}}
				; GCN: flat_store_dwordx2 v[{{[0-9:]+}}], [[V]]
				; GCN-NOT: 1.0
				define amdgpu_kernel void @test_fold_canonicalize_fpextend_value_f64_f32(float addrspace(1)* %arg, double addrspace(1)* %out) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
				%load = load float, float addrspace(1)* %gep, align 4
				%v = fpext float %load to double
				%canonicalized = tail call double @llvm.canonicalize.f64(double %v)
				%gep2 = getelementptr inbounds double, double addrspace(1)* %out, i32 %id
				store double %canonicalized, double addrspace(1)* %gep2, align 8
				ret void
				}

				; GCN-LABEL: test_fold_canonicalize_fpextend_value_f32_f16:
				; GCN: v_cvt_f32_f16_e32 [[V:v[0-9]+]], v{{[0-9]+}}
				; GCN: flat_store_dword v[{{[0-9:]+}}], [[V]]
				; GCN-NOT: 1.0
				define amdgpu_kernel void @test_fold_canonicalize_fpextend_value_f32_f16(half addrspace(1)* %arg, float addrspace(1)* %out) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds half, half addrspace(1)* %arg, i32 %id
				%load = load half, half addrspace(1)* %gep, align 2
				%v = fpext half %load to float
				%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
				%gep2 = getelementptr inbounds float, float addrspace(1)* %out, i32 %id
				store float %canonicalized, float addrspace(1)* %gep2, align 4
				ret void
				}

				; GCN-LABEL: test_fold_canonicalize_fpround_value_f32_f64:
				; GCN: v_cvt_f32_f64_e32 [[V:v[0-9]+]], v[{{[0-9:]+}}]
				; GCN: flat_store_dword v[{{[0-9:]+}}], [[V]]
				; GCN-NOT: 1.0
				define amdgpu_kernel void @test_fold_canonicalize_fpround_value_f32_f64(double addrspace(1)* %arg, float addrspace(1)* %out) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds double, double addrspace(1)* %arg, i32 %id
				%load = load double, double addrspace(1)* %gep, align 8
				%v = fptrunc double %load to float
				%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
				%gep2 = getelementptr inbounds float, float addrspace(1)* %out, i32 %id
				store float %canonicalized, float addrspace(1)* %gep2, align 4
				ret void
				}

				; GCN-LABEL: test_fold_canonicalize_fpround_value_f16_f32:
				; GCN: v_cvt_f16_f32_e32 [[V:v[0-9]+]], v{{[0-9]+}}
				; GCN: flat_store_short v[{{[0-9:]+}}], [[V]]
				; GCN-NOT: 1.0
				define amdgpu_kernel void @test_fold_canonicalize_fpround_value_f16_f32(float addrspace(1)* %arg, half addrspace(1)* %out) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
				%load = load float, float addrspace(1)* %gep, align 4
				%v = fptrunc float %load to half
				%canonicalized = tail call half @llvm.canonicalize.f16(half %v)
				%gep2 = getelementptr inbounds half, half addrspace(1)* %out, i32 %id
				store half %canonicalized, half addrspace(1)* %gep2, align 2
				ret void
				}

				; GCN-LABEL: test_fold_canonicalize_fpround_value_v2f16_v2f32:
				; GCN-DAG: v_cvt_f16_f32_e32 [[V0:v[0-9]+]], v{{[0-9]+}}
				; VI-DAG: v_cvt_f16_f32_sdwa [[V1:v[0-9]+]], v{{[0-9]+}}
				; VI: v_or_b32_e32 [[V:v[0-9]+]], [[V0]], [[V1]]
				; GFX9: v_cvt_f16_f32_e32 [[V1:v[0-9]+]], v{{[0-9]+}}
				; GFX9: v_and_b32_e32 [[V0_16:v[0-9]+]], 0xffff, [[V0]]
				; GFX9: v_lshl_or_b32 [[V:v[0-9]+]], [[V1]], 16, [[V0_16]]
				; GCN: flat_store_dword v[{{[0-9:]+}}], [[V]]
				; GCN-NOT: 1.0
				define amdgpu_kernel void @test_fold_canonicalize_fpround_value_v2f16_v2f32(<2 x float> addrspace(1)* %arg, <2 x half> addrspace(1)* %out) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds <2 x float>, <2 x float> addrspace(1)* %arg, i32 %id
				%load = load <2 x float>, <2 x float> addrspace(1)* %gep, align 8
				%v = fptrunc <2 x float> %load to <2 x half>
				%canonicalized = tail call <2 x half> @llvm.canonicalize.v2f16(<2 x half> %v)
				%gep2 = getelementptr inbounds <2 x half>, <2 x half> addrspace(1)* %out, i32 %id
				store <2 x half> %canonicalized, <2 x half> addrspace(1)* %gep2, align 4
				ret void
				}

				; GCN-LABEL: test_no_fold_canonicalize_fneg_value_f32:
				; GCN: v_mul_f32_e64 v{{[0-9]+}}, 1.0, -v{{[0-9]+}}
				define amdgpu_kernel void @test_no_fold_canonicalize_fneg_value_f32(float addrspace(1)* %arg) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
				%load = load float, float addrspace(1)* %gep, align 4
				%v = fsub float -0.0, %load
				%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
				store float %canonicalized, float addrspace(1)* %gep, align 4
				ret void
				}

				; GCN-LABEL: test_fold_canonicalize_fneg_value_f32:
				; GCN: v_xor_b32_e32 [[V:v[0-9]+]], 0x80000000, v{{[0-9]+}}
				; GCN: flat_store_dword v[{{[0-9:]+}}], [[V]]
				; GCN-NOT: 1.0
				define amdgpu_kernel void @test_fold_canonicalize_fneg_value_f32(float addrspace(1)* %arg) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
				%load = load float, float addrspace(1)* %gep, align 4
				%v0 = fadd float %load, 0.0
				%v = fsub float -0.0, %v0
				%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
				store float %canonicalized, float addrspace(1)* %gep, align 4
				ret void
				}

				; GCN-LABEL: test_no_fold_canonicalize_fabs_value_f32:
				; GCN: v_mul_f32_e64 v{{[0-9]+}}, 1.0, |v{{[0-9]+}}|
				define amdgpu_kernel void @test_no_fold_canonicalize_fabs_value_f32(float addrspace(1)* %arg) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
				%load = load float, float addrspace(1)* %gep, align 4
				%v = tail call float @llvm.fabs.f32(float %load)
				%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
				store float %canonicalized, float addrspace(1)* %gep, align 4
				ret void
				}

				; GCN-LABEL: test_fold_canonicalize_fabs_value_f32:
				; GCN: v_and_b32_e32 [[V:v[0-9]+]], 0x7fffffff, v{{[0-9]+}}
				; GCN: flat_store_dword v[{{[0-9:]+}}], [[V]]
				; GCN-NOT: 1.0
				define amdgpu_kernel void @test_fold_canonicalize_fabs_value_f32(float addrspace(1)* %arg) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
				%load = load float, float addrspace(1)* %gep, align 4
				%v0 = fadd float %load, 0.0
				%v = tail call float @llvm.fabs.f32(float %v0)
				%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
				store float %canonicalized, float addrspace(1)* %gep, align 4
				ret void
				}

				; GCN-LABEL: test_fold_canonicalize_sin_value_f32:
				; GCN: v_sin_f32_e32 [[V:v[0-9]+]], v{{[0-9]+}}
				; GCN: flat_store_dword v[{{[0-9:]+}}], [[V]]
				; GCN-NOT: 1.0
				define amdgpu_kernel void @test_fold_canonicalize_sin_value_f32(float addrspace(1)* %arg) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
				%load = load float, float addrspace(1)* %gep, align 4
				%v = tail call float @llvm.sin.f32(float %load)
				%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
				store float %canonicalized, float addrspace(1)* %gep, align 4
				ret void
				}

				; GCN-LABEL: test_fold_canonicalize_cos_value_f32:
				; GCN: v_cos_f32_e32 [[V:v[0-9]+]], v{{[0-9]+}}
				; GCN: flat_store_dword v[{{[0-9:]+}}], [[V]]
				; GCN-NOT: 1.0
				define amdgpu_kernel void @test_fold_canonicalize_cos_value_f32(float addrspace(1)* %arg) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
				%load = load float, float addrspace(1)* %gep, align 4
				%v = tail call float @llvm.cos.f32(float %load)
				%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
				store float %canonicalized, float addrspace(1)* %gep, align 4
				ret void
				}

				; GCN-LABEL: test_fold_canonicalize_sin_value_f16:
				; GCN: v_sin_f32_e32 [[V0:v[0-9]+]], v{{[0-9]+}}
				; GCN: v_cvt_f16_f32_e32 [[V:v[0-9]+]], [[V0]]
				; GCN: flat_store_short v[{{[0-9:]+}}], [[V]]
				; GCN-NOT: 1.0
				define amdgpu_kernel void @test_fold_canonicalize_sin_value_f16(half addrspace(1)* %arg) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds half, half addrspace(1)* %arg, i32 %id
				%load = load half, half addrspace(1)* %gep, align 2
				%v = tail call half @llvm.sin.f16(half %load)
				%canonicalized = tail call half @llvm.canonicalize.f16(half %v)
				store half %canonicalized, half addrspace(1)* %gep, align 2
				ret void
				}

				; GCN-LABEL: test_fold_canonicalize_cos_value_f16:
				; GCN: v_cos_f32_e32 [[V0:v[0-9]+]], v{{[0-9]+}}
				; GCN: v_cvt_f16_f32_e32 [[V:v[0-9]+]], [[V0]]
				; GCN: flat_store_short v[{{[0-9:]+}}], [[V]]
				; GCN-NOT: 1.0
				define amdgpu_kernel void @test_fold_canonicalize_cos_value_f16(half addrspace(1)* %arg) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds half, half addrspace(1)* %arg, i32 %id
				%load = load half, half addrspace(1)* %gep, align 2
				%v = tail call half @llvm.cos.f16(half %load)
				%canonicalized = tail call half @llvm.canonicalize.f16(half %v)
				store half %canonicalized, half addrspace(1)* %gep, align 2
				ret void
				}

				; GCN-LABEL: test_fold_canonicalize_qNaN_value_f32:
				; GCN: v_mov_b32_e32 [[V:v[0-9]+]], 0x7fc00000
				; GCN: flat_store_dword v[{{[0-9:]+}}], [[V]]
				; GCN-NOT: 1.0
				define amdgpu_kernel void @test_fold_canonicalize_qNaN_value_f32(float addrspace(1)* %arg) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
				%canonicalized = tail call float @llvm.canonicalize.f32(float 0x7FF8000000000000)
				store float %canonicalized, float addrspace(1)* %gep, align 4
				ret void
				}

				; GCN-LABEL: test_fold_canonicalize_minnum_value_from_load_f32:
				; GCN: v_mul_f32_e32 v{{[0-9]+}}, 1.0, v{{[0-9]+}}
				define amdgpu_kernel void @test_fold_canonicalize_minnum_value_from_load_f32(float addrspace(1)* %arg) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
				%load = load float, float addrspace(1)* %gep, align 4
				%v = tail call float @llvm.minnum.f32(float %load, float 0.0)
				%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
				store float %canonicalized, float addrspace(1)* %gep, align 4
				ret void
				}

				; GCN-LABEL: test_fold_canonicalize_minnum_value_f32:
				; GCN: v_min_f32_e32 [[V:v[0-9]+]], 0, v{{[0-9]+}}
				; GCN: flat_store_dword v[{{[0-9:]+}}], [[V]]
				; GCN-NOT: 1.0
				define amdgpu_kernel void @test_fold_canonicalize_minnum_value_f32(float addrspace(1)* %arg) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
				%load = load float, float addrspace(1)* %gep, align 4
				%v0 = fadd float %load, 0.0
				%v = tail call float @llvm.minnum.f32(float %v0, float 0.0)
				%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
				store float %canonicalized, float addrspace(1)* %gep, align 4
				ret void
				}

				; GCN-LABEL: test_fold_canonicalize_sNaN_value_f32:
				; GCN: v_min_f32_e32 [[V0:v[0-9]+]], 0x7f800001, v{{[0-9]+}}
				; GCN: v_mul_f32_e32 v{{[0-9]+}}, 1.0, [[V0]]
				; GCN: flat_store_dword v[{{[0-9:]+}}], [[V]]
				define amdgpu_kernel void @test_fold_canonicalize_sNaN_value_f32(float addrspace(1)* %arg) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
				%load = load float, float addrspace(1)* %gep, align 4
				%v = tail call float @llvm.minnum.f32(float %load, float bitcast (i32 2139095041 to float))
				%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
				store float %canonicalized, float addrspace(1)* %gep, align 4
				ret void
				}

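				; 0x7fffff (i32 8388607) is the largest f32 denormal, so the canonicalize is
				; expected to remain as a multiply by 1.0.
				; GCN-LABEL: test_fold_canonicalize_denorm_value_f32:
				; GCN: v_min_f32_e32 [[V0:v[0-9]+]], 0x7fffff, v{{[0-9]+}}
				; GCN: v_mul_f32_e32 [[V:v[0-9]+]], 1.0, [[V0]]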
				; GCN: flat_store_dword v[{{[0-9:]+}}], [[V]]
				define amdgpu_kernel void @test_fold_canonicalize_denorm_value_f32(float addrspace(1)* %arg) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
				%load = load float, float addrspace(1)* %gep, align 4
				%v = tail call float @llvm.minnum.f32(float %load, float bitcast (i32 8388607 to float))
				%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
				store float %canonicalized, float addrspace(1)* %gep, align 4
				ret void
				}

				; GCN-LABEL: test_fold_canonicalize_maxnum_value_from_load_f32:
				; GCN: v_max_f32_e32 [[V0:v[0-9]+]], 0, v{{[0-9]+}}
				; GCN: v_mul_f32_e32 [[V:v[0-9]+]], 1.0, [[V0]]
				; GCN: flat_store_dword v[{{[0-9:]+}}], [[V]]
				define amdgpu_kernel void @test_fold_canonicalize_maxnum_value_from_load_f32(float addrspace(1)* %arg) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
				%load = load float, float addrspace(1)* %gep, align 4
				%v = tail call float @llvm.maxnum.f32(float %load, float 0.0)
				%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
				store float %canonicalized, float addrspace(1)* %gep, align 4
				ret void
				}

				; GCN-LABEL: test_fold_canonicalize_maxnum_value_f32:
				; GCN: v_max_f32_e32 [[V:v[0-9]+]], 0, v{{[0-9]+}}
				; GCN: flat_store_dword v[{{[0-9:]+}}], [[V]]
				; GCN-NOT: 1.0
				define amdgpu_kernel void @test_fold_canonicalize_maxnum_value_f32(float addrspace(1)* %arg) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
				%load = load float, float addrspace(1)* %gep, align 4
				%v0 = fadd float %load, 0.0
				%v = tail call float @llvm.maxnum.f32(float %v0, float 0.0)
				%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
				store float %canonicalized, float addrspace(1)* %gep, align 4
				ret void
				}

				; GCN-LABEL: test_fold_canonicalize_maxnum_value_f64:
				; GCN: v_max_f64 [[V:v\[[0-9]+:[0-9]+\]]], v[{{[0-9:]+}}], 0
				; GCN: flat_store_dwordx2 v[{{[0-9:]+}}], [[V]]
				; GCN-NOT: 1.0
				define amdgpu_kernel void @test_fold_canonicalize_maxnum_value_f64(double addrspace(1)* %arg) {
				%id = tail call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds double, double addrspace(1)* %arg, i32 %id
				%load = load double, double addrspace(1)* %gep, align 8
				%v0 = fadd double %load, 0.0
				%v = tail call double @llvm.maxnum.f64(double %v0, double 0.0)
				%canonicalized = tail call double @llvm.canonicalize.f64(double %v)
				store double %canonicalized, double addrspace(1)* %gep, align 8
				ret void
				}

				; GCN-LABEL: test_no_fold_canonicalize_fmul_value_f32_no_ieee:
				; GCN: v_mul_f32_e32 v{{[0-9]+}}, 1.0, v{{[0-9]+}}
				define amdgpu_ps float @test_no_fold_canonicalize_fmul_value_f32_no_ieee(float %arg) {
				entry:
				%v = fmul float %arg, 15.0
				%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
				ret float %canonicalized
				}

				; GCN-LABEL: test_fold_canonicalize_fmul_nnan_value_f32_no_ieee:
				; GCN: v_mul_f32_e32 [[V:v[0-9]+]], 0x41700000, v{{[0-9]+}}
				; GCN-NEXT: ; return
				; GCN-NOT: 1.0
				define amdgpu_ps float @test_fold_canonicalize_fmul_nnan_value_f32_no_ieee(float %arg) {
				entry:
				%v = fmul nnan float %arg, 15.0
				%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
				ret float %canonicalized
				}

				declare float @llvm.canonicalize.f32(float) #0
				declare double @llvm.canonicalize.f64(double) #0
				declare half @llvm.canonicalize.f16(half) #0
				declare <2 x half> @llvm.canonicalize.v2f16(<2 x half>) #0
				declare i32 @llvm.amdgcn.workitem.id.x() #0
				declare float @llvm.sqrt.f32(float) #0
				declare float @llvm.ceil.f32(float) #0
				declare float @llvm.floor.f32(float) #0
				declare float @llvm.fma.f32(float, float, float) #0
				declare float @llvm.fmuladd.f32(float, float, float) #0
				declare float @llvm.fabs.f32(float) #0
				declare float @llvm.sin.f32(float) #0
				declare float @llvm.cos.f32(float) #0
				declare half @llvm.sin.f16(half) #0
				declare half @llvm.cos.f16(half) #0
				declare float @llvm.minnum.f32(float, float) #0
				declare float @llvm.maxnum.f32(float, float) #0
				declare double @llvm.maxnum.f64(double, double) #0

				attributes #0 = { nounwind readnone }

This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] fcanonicalize elimination optimization
ClosedPublic

Diff 106298

lib/Target/AMDGPU/SIISelLowering.cpp

test/CodeGen/AMDGPU/fcanonicalize-elimination.ll
