This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Try to make isKnownNeverSNan more accurate
AbandonedPublic

Authored by arsenm on Jul 19 2018, 11:14 AM.


Details

Reviewers

scanon
rampitec
b-sumner

Summary

If I'm interpreting the standard correctly, essentially everything
here was incorrect.

It is incorrect to say that no value is an sNaN when FP exceptions are not enabled.
A loaded value could still be an sNaN. If FP exceptions
are enabled, certain operations will produce an sNaN.
Common operations won't produce an sNaN, and will quiet
an sNaN input.

Try to make the set of operations that would raise an exception if
they were to produce an sNaN more correct.

Diff Detail

Event Timeline

arsenm created this revision. Jul 19 2018, 11:14 AM

Herald added subscribers: t-tye, tpr, dstuttard and 4 others. Jul 19 2018, 11:14 AM

I would tend to say isKnownNeverSNan here basically tells not that it cannot be sNaN, but rather that we do not care even if it is. At least we should not care if FP exceptions are off.

rampitec added inline comments. Jul 19 2018, 12:48 PM

lib/Target/AMDGPU/SIISelLowering.cpp
6733	Do these quiet incoming sNaNs?

In D49561#1168643, @rampitec wrote:

I would tend to say isKnownNeverSNan here basically tells not that it cannot be sNaN, but rather that we do not care even if it is. At least we should not care if FP exceptions are off.

Except that's not true, since sNaNs can still be loaded, so even if exceptions aren't handled they need to be dealt with

lib/Target/AMDGPU/SIISelLowering.cpp
6733	That's my understanding of how the basic operations work

scanon added inline comments. Jul 19 2018, 2:41 PM

lib/Target/AMDGPU/SIISelLowering.cpp
6733	Yes, all computational operations quiet sNaNs. The only things that produce sNaN are fcopysign, fabs, (fneg would if we had it), and things like loads and bitcasts.

scanon added inline comments. Jul 19 2018, 2:42 PM

lib/Target/AMDGPU/SIISelLowering.cpp
6733	(and to be clear, fcopysign and fabs can only produce sNaN if their input is sNaN.)
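
The distinction drawn in the two comments above can be illustrated with a minimal bit-level sketch in C++. It assumes IEEE-754 binary32 and the common convention that a set top mantissa bit marks a quiet NaN; all helper names are hypothetical and are not LLVM APIs. Sign-bit operations only touch bit 31, so an sNaN input stays an sNaN, while `quietBits` models what a computational operation does to a NaN input.

```cpp
#include <cassert>
#include <cstdint>

// IEEE-754 binary32 layout: 1 sign bit, 8 exponent bits, 23 mantissa bits.
constexpr uint32_t kSignMask = 0x80000000u;
constexpr uint32_t kExpMask  = 0x7F800000u;
constexpr uint32_t kMantMask = 0x007FFFFFu;
// Assumption: top mantissa bit set means "quiet" (the common convention).
constexpr uint32_t kQuietBit = 0x00400000u;

// NaN: exponent all ones and a nonzero mantissa.
bool isNaNBits(uint32_t b) {
  return (b & kExpMask) == kExpMask && (b & kMantMask) != 0;
}

// Signaling NaN: a NaN whose quiet bit is clear.
bool isSNaNBits(uint32_t b) {
  return isNaNBits(b) && (b & kQuietBit) == 0;
}

// Sign-bit operations are pure bit manipulation, so an sNaN passes through.
uint32_t fabsBits(uint32_t b) { return b & ~kSignMask; }
uint32_t fnegBits(uint32_t b) { return b ^ kSignMask; }
uint32_t copysignBits(uint32_t mag, uint32_t sgn) {
  return (mag & ~kSignMask) | (sgn & kSignMask);
}

// Hypothetical model of a computational operation's effect on a NaN input:
// the NaN is quieted; non-NaN values are left alone here.
uint32_t quietBits(uint32_t b) {
  return isNaNBits(b) ? (b | kQuietBit) : b;
}
```

For example, `isSNaNBits(fabsBits(0xFFA00000u))` holds, while `isSNaNBits(quietBits(0x7FA00000u))` does not.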

rampitec added a reviewer: b-sumner. Jul 19 2018, 4:05 PM

In D49561#1168791, @arsenm wrote:

Except that's not true, since sNaNs can still be loaded, so even if exceptions aren't handled they need to be dealt with

Do you mean we shall quiet snans with an fmin/fmax even if exception handling is off? My understanding was different. In fact I thought we can completely ignore them with say fast math.

At the very least it is knownNeverNan, then it cannot be a signaling nan as well.

In D49561#1169015, @rampitec wrote:

Do you mean we shall quiet snans with an fmin/fmax even if exception handling is off? My understanding was different. In fact I thought we can completely ignore them with say fast math.

Yes. My understanding is the exception handling aspect is different from the existence of sNaNs, and also orthogonal to enabling fast math. If exception handling is enabled, operations that can produce sNaNs will, but there can still be sNaNs loaded from memory, constants etc. I think if the function is marked no-nans, we can maybe assume this won't happen? The fast math flags aren't sufficient there because arbitrary operations like a load or function argument don't have an associated flag.

With ieee_mode enabled (which is currently the default), v_min_f32 et al. return a qNaN if either input is an sNaN. If ieee_mode is off, they return the non-NaN operand, treating the sNaN as if it were a qNaN. I think for what we actually want, ieee_mode is harmful since it requires the library implementation of OpenCL's fmin to insert canonicalizes to quiet the inputs. Since LLVM doesn't properly handle sNaNs anywhere, I think enabling this is a bit pointless. However, whether it's on or not, I think we need to get sNaN behavior correct at least for this one operation in order to be able to optimize out redundant canonicalizes.
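
As a rough illustration of the two modes described in this comment, here is a small C++ model. It is only a sketch of the behavior as stated here, not taken from the ISA documentation; `modelMinF32` and the other helpers are hypothetical names, and the quiet-bit convention (top mantissa bit set means quiet) is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr uint32_t kExpMask  = 0x7F800000u;
constexpr uint32_t kMantMask = 0x007FFFFFu;
constexpr uint32_t kQuietBit = 0x00400000u; // assumed quiet-bit convention

uint32_t toBits(float f) { uint32_t b; std::memcpy(&b, &f, sizeof b); return b; }
float fromBits(uint32_t b) { float f; std::memcpy(&f, &b, sizeof f); return f; }

bool isNaNBits(uint32_t b) {
  return (b & kExpMask) == kExpMask && (b & kMantMask) != 0;
}
bool isSNaNBits(uint32_t b) {
  return isNaNBits(b) && (b & kQuietBit) == 0;
}

// Hypothetical model of a min instruction, following the comment above.
uint32_t modelMinF32(uint32_t a, uint32_t b, bool ieeeMode) {
  if (ieeeMode) {
    // ieee_mode on: an sNaN input yields a quieted NaN result.
    if (isSNaNBits(a)) return a | kQuietBit;
    if (isSNaNBits(b)) return b | kQuietBit;
  }
  // Otherwise every NaN is treated like a qNaN: the other operand is
  // returned as-is. If that operand is itself an sNaN, it is NOT quieted.
  if (isNaNBits(a)) return b;
  if (isNaNBits(b)) return a;
  return fromBits(a) < fromBits(b) ? a : b;
}
```

Under this model, `modelMinF32(sNaN, 1.0f, true)` returns a quiet NaN, while the same call with `ieeeMode == false` returns `1.0f`; with both operands NaN and ieee_mode off, a signaling NaN can come back unquieted, which is why a later canonicalize cannot always be dropped.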

Check ieee_mode in minnum/maxnum handling, although I think it is totally broken that this is necessary

First, I think the wrong diff is attached: the left side of the diff is not what is in trunk now.

In D49561#1169653, @arsenm wrote:

Yes. My understanding is the exception handling aspect is different from the existence of sNaNs, and also orthogonal to enabling fast math. If exception handling is enabled, operations that can produce sNaNs will, but there can still be sNaNs loaded from memory, constants etc. I think if the function is marked no-nans, we can maybe assume this won't happen? The fast math flags aren't sufficient there because arbitrary operations like a load or function argument don't have an associated flag.

With ieee_mode enabled (which is currently the default), v_min_f32 et al. return a qNaN if either input is an sNaN. If ieee_mode is off, they return the non-NaN operand, treating the sNaN as if it were a qNaN. I think for what we actually want, ieee_mode is harmful since it requires the library implementation of OpenCL's fmin to insert canonicalizes to quiet the inputs. Since LLVM doesn't properly handle sNaNs anywhere, I think enabling this is a bit pointless. However, whether it's on or not, I think we need to get sNaN behavior correct at least for this one operation in order to be able to optimize out redundant canonicalizes.

More generally, we have several questions:

Should we care about sNaNs with FP exceptions disabled?
Should we care about sNaNs if a node is known never NaN?
Should we care about sNaNs if no-nans is enabled?
Should we care about sNaNs if fast-math is enabled?
Should we turn ieee_mode off?

Let’s see:

Description of setHasFloatingPointExceptions():

/// Tells the code generator that this target supports floating point
/// exceptions and cares about preserving floating point exception behavior.

I read it this way: if we say no here, we do not care about preserving floating point exception behavior. I.e. we may assume there are no sNaNs even if they are present.

sNaN is a signaling NaN, so it is a NaN. If a node is known to be not a NaN, it cannot be its signaling form as well.
-enable-no-nans-fp-math and derivatives:

/// NoNaNsFPMath - This flag is enabled when the
/// -enable-no-nans-fp-math flag is specified on the command line. When
/// this flag is off (the default), the code generator is not allowed to
/// assume the FP arithmetic arguments and results are never NaNs.

So then if that flag is on we are allowed to assume not only FP arithmetic results are never NaN, but also arguments. Even if they are and even if they are loaded or constant sNaN/NaN.

UnsafeAlgebra includes NoNaNs and thus all of the above applies.
I do not believe we have to disable ieee_mode.
- Even min/max handling was changed a few times, with and without the ieee_mode bit, in different generations of our GPUs, so that is not a universal solution.
- Other fp operations will not quiet sNaNs as required if you disable it.

I believe we explored the idea of disabling ieee in the past and found we would have more problems without it. With ieee we only have the min/max problem.

arsenm added a child revision: D49605: AMDGPU: Fix implementation of isCanonicalized. Jul 20 2018, 12:37 PM

In D49561#1170203, @rampitec wrote:

More generally, we have several questions:

Should we care about sNaNs with FP exceptions disabled?

Yes. This has to work. The OpenCL conformance tests check for this

Should we care about sNaNs if a node is known never NaN?

Knowing it's never nan is how you know you don't need to care about it

Should we care about sNaNs if no-nans is enabled?

I think this is a grey area, but probably not.

Should we care about sNaNs if fast-math is enabled?

Should we turn ieee_mode off?

Let’s see:

Description of setHasFloatingPointExceptions():
/// Tells the code generator that this target supports floating point
/// exceptions and cares about preserving floating point exception behavior.
I read it this way: if we say no here, we do not care about preserving floating point exception behavior. I.e. we may assume there are no sNaNs even if they are present.

This is not true. Assuming this will break the fmin/fmax conformance tests. This is now just something we always turn on. We don't enable trap on FP exception, but that doesn't mean that there aren't sNaNs somewhere that need to be handled correctly. This property also should probably be removed, since things are moving to relying on the STRICT_* versions of operations to get this behavior

sNaN is a signaling NaN, so it is a NaN. If a node is known to be not a NaN, it cannot be its signaling form as well.

Yes

-enable-no-nans-fp-math and derivatives:
/// NoNaNsFPMath - This flag is enabled when the
/// -enable-no-nans-fp-math flag is specified on the command line. When
/// this flag is off (the default), the code generator is not allowed to
/// assume the FP arithmetic arguments and results are never NaNs.
So then if that flag is on we are allowed to assume not only FP arithmetic results are never NaN, but also arguments. Even if they are and even if they are loaded or constant sNaN/NaN.

Yes

UnsafeAlgebra includes NoNaNs and thus all of the above applies.

I do not agree with this. The per-instruction flags have decoupled the unsafe algebra properties from no-nans, and the per-function/global flags should follow suit.

Attach right diff

Should we care about sNaNs with FP exceptions disabled?

Yes. This has to work. The OpenCL conformance tests check for this

OK. Makes sense. Just maybe we need to tell llvm that we actually support exceptions, which should not necessarily mean we trap on them.

UnsafeAlgebra includes NoNaNs and thus all of the above applies.

I do not agree with this. The per-instruction flags have decoupled the unsafe algebra properties from no-nans, and the per-function/global flags should follow suit.

OK, UnsafeAlgebra is probably not the right condition. FastMathFlags::isFast() definitely is.

UnsafeAlgebra includes NoNaNs and thus all of the above applies.

I do not agree with this. The per-instruction flags have decoupled the unsafe algebra properties from no-nans, and the per-function/global flags should follow suit.

OK, UnsafeAlgebra is probably not the right condition. FastMathFlags::isFast() definitely is.

isFast is just all the flags enabled. No-nans is just one bit of it
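
The point about isFast versus the no-nans bit can be sketched with a tiny stand-in for a fast-math flag set. This is a hypothetical model in C++, not `llvm::FastMathFlags` itself; the flag names mirror LLVM's per-instruction flags, but the bit values and the `FastFlagsModel` type are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical bit assignments, for illustration only.
constexpr unsigned kAllowReassoc    = 1u << 0;
constexpr unsigned kNoNaNs          = 1u << 1;
constexpr unsigned kNoInfs          = 1u << 2;
constexpr unsigned kNoSignedZeros   = 1u << 3;
constexpr unsigned kAllowReciprocal = 1u << 4;
constexpr unsigned kAllowContract   = 1u << 5;
constexpr unsigned kApproxFunc      = 1u << 6;
constexpr unsigned kAllFlags        = (1u << 7) - 1;

// Minimal stand-in for a fast-math flag set.
struct FastFlagsModel {
  unsigned bits;
  explicit FastFlagsModel(unsigned b) : bits(b) {}

  // True when the single no-NaNs bit is set, whatever else is enabled.
  bool noNaNs() const { return (bits & kNoNaNs) != 0; }

  // True only when every flag is set.
  bool isFast() const { return bits == kAllFlags; }
};
```

A value carrying only the no-NaNs bit satisfies `noNaNs()` but not `isFast()`, so a NaN-related query would presumably want to test the former rather than require the latter.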

So, given the discussion, you seem to be missing a DAG.isKnownNeverNaN(Op) condition.

rampitec added inline comments. Jul 20 2018, 1:32 PM

lib/Target/AMDGPU/SIISelLowering.h
317	It does not belong to this patch.

arsenm mentioned this in D49662: DAG: Enhance isKnownNeverNaN. Jul 23 2018, 4:12 AM

Superseded by D49662 and D49841

Revision Contents

Path	Size
lib/Target/AMDGPU/SIISelLowering.h	5 lines
lib/Target/AMDGPU/SIISelLowering.cpp	102 lines
test/CodeGen/AMDGPU/fcanonicalize-elimination.ll	243 lines

Diff 156523

lib/Target/AMDGPU/SIISelLowering.h

(304 lines not shown)	public:
void computeKnownBitsForFrameIndex(const SDValue Op,		void computeKnownBitsForFrameIndex(const SDValue Op,
KnownBits &Known,		KnownBits &Known,
const APInt &DemandedElts,		const APInt &DemandedElts,
const SelectionDAG &DAG,		const SelectionDAG &DAG,
unsigned Depth = 0) const override;		unsigned Depth = 0) const override;

bool isSDNodeSourceOfDivergence(const SDNode *N,		bool isSDNodeSourceOfDivergence(const SDNode *N,
FunctionLoweringInfo FLI, DivergenceAnalysis DA) const override;		FunctionLoweringInfo FLI, DivergenceAnalysis DA) const override;

		bool isCanonicalized(SelectionDAG &DAG, SDValue Op,
		unsigned MaxDepth = 5) const;

		bool denormalsEnabledForType(EVT VT) const;
		rampitec: It does not belong to this patch.
};		};

} // End namespace llvm		} // End namespace llvm

#endif		#endif

lib/Target/AMDGPU/SIISelLowering.cpp


(6,724 lines not shown)	static bool isKnownNeverSNan(SelectionDAG &DAG, SDValue Op,
switch (Op.getOpcode()) {		switch (Op.getOpcode()) {
case ISD::ConstantFP: {		case ISD::ConstantFP: {
ConstantFPSDNode *C = cast<ConstantFPSDNode>(Op);		ConstantFPSDNode *C = cast<ConstantFPSDNode>(Op);
return !C->getValueAPF().isNaN() ||		return !C->getValueAPF().isNaN() ||
!C->getValueAPF().isSignaling();		!C->getValueAPF().isSignaling();
}		}
case ISD::FADD:		case ISD::FADD:
case ISD::FSUB:		case ISD::FSUB:
case ISD::FMUL:		case ISD::FMUL:
		rampitec: Do these quiet incoming sNaNs?
		arsenm: That's my understanding of how the basic operations work
		scanon: Yes, all computational operations quiet sNaNs. The only things that produce sNaN are fcopysign, fabs, (fneg would if we had it), and things like loads and bitcasts.
		scanon: (and to be clear, fcopysign and fabs can only produce sNaN if their input is sNaN.)
case ISD::FMAD:		case ISD::FMAD:
case ISD::FMINNUM:		case ISD::FMINNUM:
case ISD::FMAXNUM:		case ISD::FMAXNUM:
case ISD::FCANONICALIZE:		case ISD::FCANONICALIZE:
case AMDGPUISD::FMED3:		case AMDGPUISD::FMED3:
case AMDGPUISD::FMIN3:		case AMDGPUISD::FMIN3:
case AMDGPUISD::FMAX3:		case AMDGPUISD::FMAX3:
case AMDGPUISD::FMIN_LEGACY:		case AMDGPUISD::FMIN_LEGACY:
(44 lines not shown)	static bool isKnownNeverSNan(SelectionDAG &DAG, SDValue Op,
case ISD::SELECT:		case ISD::SELECT:
return isKnownNeverSNan(DAG, Op.getOperand(1), Depth + 1) &&		return isKnownNeverSNan(DAG, Op.getOperand(1), Depth + 1) &&
isKnownNeverSNan(DAG, Op.getOperand(2), Depth + 1);		isKnownNeverSNan(DAG, Op.getOperand(2), Depth + 1);

case ISD::FMAXNAN:		case ISD::FMAXNAN:
case ISD::FMINNAN:		case ISD::FMINNAN:
// TODO: What do these do for snans?		// TODO: What do these do for snans?
default:		default:
return false;		return DAG.getTarget().Options.NoNaNsFPMath;
}		}
}		}

static bool isCanonicalized(SelectionDAG &DAG, SDValue Op,		bool SITargetLowering::isCanonicalized(SelectionDAG &DAG, SDValue Op,
const GCNSubtarget *ST, unsigned MaxDepth=5) {		unsigned MaxDepth) const {
// If source is a result of another standard FP operation it is already in		// If source is a result of another standard FP operation it is already in
// canonical form.		// canonical form.

switch (Op.getOpcode()) {		switch (Op.getOpcode()) {
default:
break;

// These will flush denorms if required.		// These will flush denorms if required.
case ISD::FADD:		case ISD::FADD:
case ISD::FSUB:		case ISD::FSUB:
case ISD::FMUL:		case ISD::FMUL:
case ISD::FSQRT:
case ISD::FCEIL:		case ISD::FCEIL:
case ISD::FFLOOR:		case ISD::FFLOOR:
case ISD::FMA:		case ISD::FMA:
case ISD::FMAD:		case ISD::FMAD:

case ISD::FCANONICALIZE:		case ISD::FCANONICALIZE:
		case AMDGPUISD::FMUL_LEGACY:
return true;		return true;
		case ISD::FSQRT:
		case ISD::FDIV:
		case ISD::FREM:
		return !hasFloatingPointExceptions();
case ISD::FP_ROUND:		case ISD::FP_ROUND:
return Op.getValueType().getScalarType() != MVT::f16 ||		return Op.getValueType().getScalarType() != MVT::f16 ||
ST->hasFP16Denormals();		Subtarget->hasFP16Denormals();

case ISD::FP_EXTEND:		case ISD::FP_EXTEND:
return Op.getOperand(0).getValueType().getScalarType() != MVT::f16 ||		return Op.getOperand(0).getValueType().getScalarType() != MVT::f16 ||
ST->hasFP16Denormals();		Subtarget->hasFP16Denormals();

case ISD::FP16_TO_FP:		case ISD::FP16_TO_FP:
case ISD::FP_TO_FP16:		case ISD::FP_TO_FP16:
return ST->hasFP16Denormals();		return Subtarget->hasFP16Denormals();

// It can/will be lowered or combined as a bit operation.		// It can/will be lowered or combined as a bit operation.
// Need to check their input recursively to handle.		// Need to check their input recursively to handle.
case ISD::FNEG:		case ISD::FNEG:
case ISD::FABS:		case ISD::FABS:
		case ISD::FCOPYSIGN:
return (MaxDepth > 0) &&		return (MaxDepth > 0) &&
isCanonicalized(DAG, Op.getOperand(0), ST, MaxDepth - 1);		isCanonicalized(DAG, Op.getOperand(0), MaxDepth - 1);

case ISD::FSIN:		case ISD::FSIN:
case ISD::FCOS:		case ISD::FCOS:
case ISD::FSINCOS:		case ISD::FSINCOS:
return Op.getValueType().getScalarType() != MVT::f16;		return Op.getValueType().getScalarType() != MVT::f16;

		case ISD::FMINNUM:
		case ISD::FMAXNUM: {
		// Returns quieted sNaNs
		bool IsIEEEMode = Subtarget->enableIEEEBit(DAG.getMachineFunction());
		if (IsIEEEMode && Subtarget->supportsMinMaxDenormModes()) {
// In pre-GFX9 targets V_MIN_F32 and others do not flush denorms.		// In pre-GFX9 targets V_MIN_F32 and others do not flush denorms.
// For such targets need to check their input recursively.		// For such targets need to check their input recursively.
case ISD::FMINNUM:		// FIXME: Shouldn't treat the generic operations different based on this.
case ISD::FMAXNUM:
case ISD::FMINNAN:
case ISD::FMAXNAN:

if (ST->supportsMinMaxDenormModes() &&
DAG.isKnownNeverNaN(Op.getOperand(0)) &&
DAG.isKnownNeverNaN(Op.getOperand(1)))
return true;		return true;
		}

return (MaxDepth > 0) &&		// With ieee_mode off, the nan is returned as-is, so if it is an sNaN it
isCanonicalized(DAG, Op.getOperand(0), ST, MaxDepth - 1) &&		// needs to be quieted.
isCanonicalized(DAG, Op.getOperand(1), ST, MaxDepth - 1);		if (denormalsEnabledForType(Op.getValueType())) {
		// No flushing required, so we just need to care about snans.
		return isKnownNeverSNan(DAG, Op.getOperand(0)) &&
		isKnownNeverSNan(DAG, Op.getOperand(1));
		}

		// Flushing or quieting may be necessary.
		return (MaxDepth > 0) &&
		isCanonicalized(DAG, Op.getOperand(0), MaxDepth - 1) &&
		isCanonicalized(DAG, Op.getOperand(1), MaxDepth - 1);
		}
case ISD::ConstantFP: {		case ISD::ConstantFP: {
auto F = cast<ConstantFPSDNode>(Op)->getValueAPF();		auto F = cast<ConstantFPSDNode>(Op)->getValueAPF();
return !F.isDenormal() && !(F.isNaN() && F.isSignaling());		if (F.isNaN() && F.isSignaling())
		return false;
		return !F.isDenormal() || denormalsEnabledForType(Op.getValueType());
}		}
		case ISD::SELECT: {
		return isCanonicalized(DAG, Op.getOperand(1), MaxDepth - 1) &&
		isCanonicalized(DAG, Op.getOperand(2), MaxDepth - 1);
}		}
return false;		default:
		return denormalsEnabledForType(Op.getValueType()) &&
		DAG.isKnownNeverNaN(Op);
		}

		llvm_unreachable("invalid operation");
}		}

// Constant fold canonicalize.		// Constant fold canonicalize.
SDValue SITargetLowering::performFCanonicalizeCombine(		SDValue SITargetLowering::performFCanonicalizeCombine(
SDNode *N,		SDNode *N,
DAGCombinerInfo &DCI) const {		DAGCombinerInfo &DCI) const {
SelectionDAG &DAG = DCI.DAG;		SelectionDAG &DAG = DCI.DAG;
ConstantFPSDNode *CFP = isConstOrConstSplatFP(N->getOperand(0));		ConstantFPSDNode *CFP = isConstOrConstSplatFP(N->getOperand(0));

if (!CFP) {		if (!CFP) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
EVT VT = N0.getValueType().getScalarType();		return isCanonicalized(DAG, N0) ? N0 : SDValue();
auto ST = getSubtarget();

if (((VT == MVT::f32 && ST->hasFP32Denormals()) ||
(VT == MVT::f64 && ST->hasFP64Denormals()) ||
(VT == MVT::f16 && ST->hasFP16Denormals())) &&
DAG.isKnownNeverNaN(N0))
return N0;

bool IsIEEEMode = Subtarget->enableIEEEBit(DAG.getMachineFunction());

if ((IsIEEEMode || isKnownNeverSNan(DAG, N0)) &&
isCanonicalized(DAG, N0, ST))
return N0;

return SDValue();
}		}

const APFloat &C = CFP->getValueAPF();		const APFloat &C = CFP->getValueAPF();

// Flush denormals to 0 if not enabled.		// Flush denormals to 0 if not enabled.
if (C.isDenormal()) {		if (C.isDenormal()) {
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
EVT SVT = VT.getScalarType();		EVT SVT = VT.getScalarType();
(1,600 lines not shown)	switch (N->getOpcode()) {
// lowered to AMDGPUISD so we also need to check those too.		// lowered to AMDGPUISD so we also need to check those too.
case AMDGPUISD::INTERP_MOV:		case AMDGPUISD::INTERP_MOV:
case AMDGPUISD::INTERP_P1:		case AMDGPUISD::INTERP_P1:
case AMDGPUISD::INTERP_P2:		case AMDGPUISD::INTERP_P2:
return true;		return true;
}		}
return false;		return false;
}		}

		bool SITargetLowering::denormalsEnabledForType(EVT VT) const {
		switch (VT.getScalarType().getSimpleVT().SimpleTy) {
		case MVT::f32:
		return Subtarget->hasFP32Denormals();
		case MVT::f64:
		return Subtarget->hasFP64Denormals();
		case MVT::f16:
		return Subtarget->hasFP16Denormals();
		default:
		return false;
		}
		}

test/CodeGen/AMDGPU/fcanonicalize-elimination.ll

; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs -mattr=-fp32-denormals < %s | FileCheck -enable-var-scope -check-prefixes=GCN,VI,GCN-FLUSH %s		; RUN: llc -march=amdgcn -mcpu=gfx801 -verify-machineinstrs -mattr=-fp32-denormals < %s | FileCheck -enable-var-scope -check-prefixes=GCN,VI,VI-FLUSH,GCN-FLUSH,GCN-NOEXCEPT %s
; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs -mattr=-fp32-denormals,+fp-exceptions < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN-EXCEPT,VI,GCN-FLUSH %s		; RUN: llc -march=amdgcn -mcpu=gfx801 -verify-machineinstrs -mattr=-fp32-denormals,+fp-exceptions < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN-EXCEPT,VI,VI-FLUSH,GCN-FLUSH %s
; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs -mattr=+fp32-denormals < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9,GFX9-DENORM,GCN-DENORM %s		; RUN: llc -march=amdgcn -mcpu=gfx801 -verify-machineinstrs -mattr=+fp32-denormals < %s | FileCheck -enable-var-scope -check-prefixes=GCN,VI,VI-DENORM,GCN-DENORM,GCN-NOEXCEPT %s
; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs -mattr=-fp32-denormals < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9,GFX9-FLUSH,GCN-FLUSH %s		; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs -mattr=+fp32-denormals < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9,GFX9-DENORM,GCN-DENORM,GCN-NOEXCEPT %s
		; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs -mattr=-fp32-denormals < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9,GFX9-FLUSH,GCN-FLUSH,GCN-NOEXCEPT %s

; GCN-LABEL: {{^}}test_no_fold_canonicalize_loaded_value_f32:		; GCN-LABEL: {{^}}test_no_fold_canonicalize_loaded_value_f32:
; GCN-FLUSH: v_mul_f32_e32 v{{[0-9]+}}, 1.0, v{{[0-9]+}}		; GCN-FLUSH: v_mul_f32_e32 v{{[0-9]+}}, 1.0, v{{[0-9]+}}
; GFX9-DENORM: v_max_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}		; GFX9-DENORM: v_max_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
define amdgpu_kernel void @test_no_fold_canonicalize_loaded_value_f32(float addrspace(1)* %arg) {		define amdgpu_kernel void @test_no_fold_canonicalize_loaded_value_f32(float addrspace(1)* %arg) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
%v = load float, float addrspace(1)* %gep, align 4		%v = load float, float addrspace(1)* %gep, align 4
(11 lines not shown)	define amdgpu_kernel void @test_fold_canonicalize_fmul_value_f32(float addrspace(1)* %arg) {
%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
%load = load float, float addrspace(1)* %gep, align 4		%load = load float, float addrspace(1)* %gep, align 4
%v = fmul float %load, 15.0		%v = fmul float %load, 15.0
%canonicalized = tail call float @llvm.canonicalize.f32(float %v)		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
store float %canonicalized, float addrspace(1)* %gep, align 4		store float %canonicalized, float addrspace(1)* %gep, align 4
ret void		ret void
}		}

		; GCN-LABEL: {{^}}test_fold_canonicalize_fmul_legacy_value_f32:
		; GCN: v_mul_legacy_f32_e32 [[V:v[0-9]+]], 0x41700000, v{{[0-9]+}}
		; GCN-NOT: v_mul
		; GCN-NOT: v_max
		; GCN: {{flat|global}}_store_dword v[{{[0-9:]+}}], [[V]]
		define amdgpu_kernel void @test_fold_canonicalize_fmul_legacy_value_f32(float addrspace(1)* %arg) {
		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
		%load = load float, float addrspace(1)* %gep, align 4
		%v = call float @llvm.amdgcn.fmul.legacy(float %load, float 15.0)
		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
		store float %canonicalized, float addrspace(1)* %gep, align 4
		ret void
		}

; GCN-LABEL: {{^}}test_fold_canonicalize_sub_value_f32:		; GCN-LABEL: {{^}}test_fold_canonicalize_sub_value_f32:
; GCN: v_sub_f32_e32 [[V:v[0-9]+]], 0x41700000, v{{[0-9]+}}		; GCN: v_sub_f32_e32 [[V:v[0-9]+]], 0x41700000, v{{[0-9]+}}
		; GCN-NOT: v_mul
		; GCN-NOT: v_max
; GCN: {{flat|global}}_store_dword v[{{[0-9:]+}}], [[V]]		; GCN: {{flat|global}}_store_dword v[{{[0-9:]+}}], [[V]]
; GCN-NOT: 1.0
define amdgpu_kernel void @test_fold_canonicalize_sub_value_f32(float addrspace(1)* %arg) {		define amdgpu_kernel void @test_fold_canonicalize_sub_value_f32(float addrspace(1)* %arg) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
%load = load float, float addrspace(1)* %gep, align 4		%load = load float, float addrspace(1)* %gep, align 4
%v = fsub float 15.0, %load		%v = fsub float 15.0, %load
%canonicalized = tail call float @llvm.canonicalize.f32(float %v)		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
store float %canonicalized, float addrspace(1)* %gep, align 4		store float %canonicalized, float addrspace(1)* %gep, align 4
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_fold_canonicalize_add_value_f32:		; GCN-LABEL: {{^}}test_fold_canonicalize_add_value_f32:
; GCN: v_add_f32_e32 [[V:v[0-9]+]], 0x41700000, v{{[0-9]+}}		; GCN: v_add_f32_e32 [[V:v[0-9]+]], 0x41700000, v{{[0-9]+}}
		; GCN-NOT: v_mul
		; GCN-NOT: v_max
; GCN: {{flat|global}}_store_dword v[{{[0-9:]+}}], [[V]]		; GCN: {{flat|global}}_store_dword v[{{[0-9:]+}}], [[V]]
; GCN-NOT: 1.0
define amdgpu_kernel void @test_fold_canonicalize_add_value_f32(float addrspace(1)* %arg) {		define amdgpu_kernel void @test_fold_canonicalize_add_value_f32(float addrspace(1)* %arg) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
%load = load float, float addrspace(1)* %gep, align 4		%load = load float, float addrspace(1)* %gep, align 4
%v = fadd float %load, 15.0		%v = fadd float %load, 15.0
%canonicalized = tail call float @llvm.canonicalize.f32(float %v)		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
store float %canonicalized, float addrspace(1)* %gep, align 4		store float %canonicalized, float addrspace(1)* %gep, align 4
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_fold_canonicalize_sqrt_value_f32:		; GCN-LABEL: {{^}}test_fold_canonicalize_sqrt_value_f32:
; GCN: v_sqrt_f32_e32 [[V:v[0-9]+]], v{{[0-9]+}}		; GCN: v_sqrt_f32_e32 [[V:v[0-9]+]], v{{[0-9]+}}
		; GCN-NOEXCEPT-NOT: v_mul
		; GCN-NOEXCEPT-NOT: v_max
		; GCN-EXCEPT: v_mul_f32_e32 [[V:v[0-9]+]], 1.0, [[V]]
; GCN: {{flat|global}}_store_dword v[{{[0-9:]+}}], [[V]]		; GCN: {{flat|global}}_store_dword v[{{[0-9:]+}}], [[V]]
; GCN-NOT: 1.0
define amdgpu_kernel void @test_fold_canonicalize_sqrt_value_f32(float addrspace(1)* %arg) {		define amdgpu_kernel void @test_fold_canonicalize_sqrt_value_f32(float addrspace(1)* %arg) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
%load = load float, float addrspace(1)* %gep, align 4		%load = load float, float addrspace(1)* %gep, align 4
%v = call float @llvm.sqrt.f32(float %load)		%v = call float @llvm.sqrt.f32(float %load)
%canonicalized = tail call float @llvm.canonicalize.f32(float %v)		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
store float %canonicalized, float addrspace(1)* %gep, align 4		store float %canonicalized, float addrspace(1)* %gep, align 4
ret void		ret void
}		}

; GCN-LABEL: test_fold_canonicalize_fceil_value_f32:		; GCN-LABEL: test_fold_canonicalize_fceil_value_f32:
; GCN: v_ceil_f32_e32 [[V:v[0-9]+]], v{{[0-9]+}}		; GCN: v_ceil_f32_e32 [[V:v[0-9]+]], v{{[0-9]+}}
		; GCN-NOT: v_mul
		; GCN-NOT: v_max
; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]		; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]
; GCN-NOT: 1.0
define amdgpu_kernel void @test_fold_canonicalize_fceil_value_f32(float addrspace(1)* %arg) {		define amdgpu_kernel void @test_fold_canonicalize_fceil_value_f32(float addrspace(1)* %arg) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
%load = load float, float addrspace(1)* %gep, align 4		%load = load float, float addrspace(1)* %gep, align 4
%v = call float @llvm.ceil.f32(float %load)		%v = call float @llvm.ceil.f32(float %load)
%canonicalized = tail call float @llvm.canonicalize.f32(float %v)		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
store float %canonicalized, float addrspace(1)* %gep, align 4		store float %canonicalized, float addrspace(1)* %gep, align 4
ret void		ret void
}		}

; GCN-LABEL: test_fold_canonicalize_floor_value_f32:		; GCN-LABEL: test_fold_canonicalize_floor_value_f32:
; GCN: v_floor_f32_e32 [[V:v[0-9]+]], v{{[0-9]+}}		; GCN: v_floor_f32_e32 [[V:v[0-9]+]], v{{[0-9]+}}
		; GCN-NOT: v_mul
		; GCN-NOT: v_max
; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]		; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]
; GCN-NOT: 1.0
define amdgpu_kernel void @test_fold_canonicalize_floor_value_f32(float addrspace(1)* %arg) {		define amdgpu_kernel void @test_fold_canonicalize_floor_value_f32(float addrspace(1)* %arg) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
%load = load float, float addrspace(1)* %gep, align 4		%load = load float, float addrspace(1)* %gep, align 4
%v = call float @llvm.floor.f32(float %load)		%v = call float @llvm.floor.f32(float %load)
%canonicalized = tail call float @llvm.canonicalize.f32(float %v)		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
store float %canonicalized, float addrspace(1)* %gep, align 4		store float %canonicalized, float addrspace(1)* %gep, align 4
ret void		ret void
}		}

; GCN-LABEL: test_fold_canonicalize_fma_value_f32:		; GCN-LABEL: test_fold_canonicalize_fma_value_f32:
; GCN: v_fma_f32 [[V:v[0-9]+]], v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}		; GCN: v_fma_f32 [[V:v[0-9]+]], v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
		; GCN-NOT: v_mul
		; GCN-NOT: v_max
; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]		; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]
; GCN-NOT: 1.0
define amdgpu_kernel void @test_fold_canonicalize_fma_value_f32(float addrspace(1)* %arg) {		define amdgpu_kernel void @test_fold_canonicalize_fma_value_f32(float addrspace(1)* %arg) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
%load = load float, float addrspace(1)* %gep, align 4		%load = load float, float addrspace(1)* %gep, align 4
%v = call float @llvm.fma.f32(float %load, float 15.0, float 15.0)		%v = call float @llvm.fma.f32(float %load, float 15.0, float 15.0)
%canonicalized = tail call float @llvm.canonicalize.f32(float %v)		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
store float %canonicalized, float addrspace(1)* %gep, align 4		store float %canonicalized, float addrspace(1)* %gep, align 4
ret void		ret void
}		}

; GCN-LABEL: test_fold_canonicalize_fmuladd_value_f32:		; GCN-LABEL: test_fold_canonicalize_fmuladd_value_f32:
; GCN-FLUSH: v_mac_f32_e32 [[V:v[0-9]+]], v{{[0-9]+}}, v{{[0-9]+}}		; GCN-FLUSH: v_mac_f32_e32 [[V:v[0-9]+]], v{{[0-9]+}}, v{{[0-9]+}}
; GFX9-DENORM: v_fma_f32 [[V:v[0-9]+]], v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}		; GFX9-DENORM: v_fma_f32 [[V:v[0-9]+]], v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]		; GCN-NOT: v_mul
; GCN-NOT: 1.0		; GCN-NOT: v_max
		; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}],
define amdgpu_kernel void @test_fold_canonicalize_fmuladd_value_f32(float addrspace(1)* %arg) {		define amdgpu_kernel void @test_fold_canonicalize_fmuladd_value_f32(float addrspace(1)* %arg) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
%load = load float, float addrspace(1)* %gep, align 4		%load = load float, float addrspace(1)* %gep, align 4
%v = call float @llvm.fmuladd.f32(float %load, float 15.0, float 15.0)		%v = call float @llvm.fmuladd.f32(float %load, float 15.0, float 15.0)
%canonicalized = tail call float @llvm.canonicalize.f32(float %v)		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
store float %canonicalized, float addrspace(1)* %gep, align 4		store float %canonicalized, float addrspace(1)* %gep, align 4
ret void		ret void
}		}

; GCN-LABEL: test_fold_canonicalize_canonicalize_value_f32:		; GCN-LABEL: test_fold_canonicalize_canonicalize_value_f32:
; GCN: {{flat\|global}}_load_dword [[LOAD:v[0-9]+]],		; GCN: {{flat\|global}}_load_dword [[LOAD:v[0-9]+]],
; GCN-FLUSH: v_mul_f32_e32 [[V:v[0-9]+]], 1.0, [[LOAD]]		; GCN-FLUSH: v_mul_f32_e32 [[V:v[0-9]+]], 1.0, [[LOAD]]
; GCN-DENORM: v_max_f32_e32 [[V:v[0-9]+]], [[LOAD]], [[LOAD]]		; GCN-DENORM: v_max_f32_e32 [[V:v[0-9]+]], [[LOAD]], [[LOAD]]
		; GCN-NOT: v_mul
		; GCN-NOT: v_max
; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]		; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]
; GCN-NOT: 1.0
define amdgpu_kernel void @test_fold_canonicalize_canonicalize_value_f32(float addrspace(1)* %arg) {		define amdgpu_kernel void @test_fold_canonicalize_canonicalize_value_f32(float addrspace(1)* %arg) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
%load = load float, float addrspace(1)* %gep, align 4		%load = load float, float addrspace(1)* %gep, align 4
%v = call float @llvm.canonicalize.f32(float %load)		%v = call float @llvm.canonicalize.f32(float %load)
%canonicalized = tail call float @llvm.canonicalize.f32(float %v)		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
store float %canonicalized, float addrspace(1)* %gep, align 4		store float %canonicalized, float addrspace(1)* %gep, align 4
ret void		ret void
}		}

; GCN-LABEL: test_fold_canonicalize_fpextend_value_f64_f32:		; GCN-LABEL: test_fold_canonicalize_fpextend_value_f64_f32:
; GCN: v_cvt_f64_f32_e32 [[V:v\[[0-9]+:[0-9]+\]]], v{{[0-9]+}}		; GCN: v_cvt_f64_f32_e32 [[V:v\[[0-9]+:[0-9]+\]]], v{{[0-9]+}}
		; GCN-NOT: v_mul
		; GCN-NOT: v_max
; GCN: {{flat\|global}}_store_dwordx2 v[{{[0-9:]+}}], [[V]]		; GCN: {{flat\|global}}_store_dwordx2 v[{{[0-9:]+}}], [[V]]
; GCN-NOT: 1.0
define amdgpu_kernel void @test_fold_canonicalize_fpextend_value_f64_f32(float addrspace(1)* %arg, double addrspace(1)* %out) {		define amdgpu_kernel void @test_fold_canonicalize_fpextend_value_f64_f32(float addrspace(1)* %arg, double addrspace(1)* %out) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
%load = load float, float addrspace(1)* %gep, align 4		%load = load float, float addrspace(1)* %gep, align 4
%v = fpext float %load to double		%v = fpext float %load to double
%canonicalized = tail call double @llvm.canonicalize.f64(double %v)		%canonicalized = tail call double @llvm.canonicalize.f64(double %v)
%gep2 = getelementptr inbounds double, double addrspace(1)* %out, i32 %id		%gep2 = getelementptr inbounds double, double addrspace(1)* %out, i32 %id
store double %canonicalized, double addrspace(1)* %gep2, align 8		store double %canonicalized, double addrspace(1)* %gep2, align 8
ret void		ret void
}		}

; GCN-LABEL: test_fold_canonicalize_fpextend_value_f32_f16:		; GCN-LABEL: test_fold_canonicalize_fpextend_value_f32_f16:
; GCN: v_cvt_f32_f16_e32 [[V:v[0-9]+]], v{{[0-9]+}}		; GCN: v_cvt_f32_f16_e32 [[V:v[0-9]+]], v{{[0-9]+}}
		; GCN-NOT: v_mul
		; GCN-NOT: v_max
; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]		; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]
; GCN-NOT: 1.0
define amdgpu_kernel void @test_fold_canonicalize_fpextend_value_f32_f16(half addrspace(1)* %arg, float addrspace(1)* %out) {		define amdgpu_kernel void @test_fold_canonicalize_fpextend_value_f32_f16(half addrspace(1)* %arg, float addrspace(1)* %out) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds half, half addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds half, half addrspace(1)* %arg, i32 %id
%load = load half, half addrspace(1)* %gep, align 2		%load = load half, half addrspace(1)* %gep, align 2
%v = fpext half %load to float		%v = fpext half %load to float
%canonicalized = tail call float @llvm.canonicalize.f32(float %v)		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
%gep2 = getelementptr inbounds float, float addrspace(1)* %out, i32 %id		%gep2 = getelementptr inbounds float, float addrspace(1)* %out, i32 %id
store float %canonicalized, float addrspace(1)* %gep2, align 4		store float %canonicalized, float addrspace(1)* %gep2, align 4
ret void		ret void
}		}

; GCN-LABEL: test_fold_canonicalize_fpround_value_f32_f64:		; GCN-LABEL: test_fold_canonicalize_fpround_value_f32_f64:
; GCN: v_cvt_f32_f64_e32 [[V:v[0-9]+]], v[{{[0-9:]+}}]		; GCN: v_cvt_f32_f64_e32 [[V:v[0-9]+]], v[{{[0-9:]+}}]
		; GCN-NOT: v_mul
		; GCN-NOT: v_max
; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]		; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]
; GCN-NOT: 1.0
define amdgpu_kernel void @test_fold_canonicalize_fpround_value_f32_f64(double addrspace(1)* %arg, float addrspace(1)* %out) {		define amdgpu_kernel void @test_fold_canonicalize_fpround_value_f32_f64(double addrspace(1)* %arg, float addrspace(1)* %out) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds double, double addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds double, double addrspace(1)* %arg, i32 %id
%load = load double, double addrspace(1)* %gep, align 8		%load = load double, double addrspace(1)* %gep, align 8
%v = fptrunc double %load to float		%v = fptrunc double %load to float
%canonicalized = tail call float @llvm.canonicalize.f32(float %v)		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
%gep2 = getelementptr inbounds float, float addrspace(1)* %out, i32 %id		%gep2 = getelementptr inbounds float, float addrspace(1)* %out, i32 %id
store float %canonicalized, float addrspace(1)* %gep2, align 4		store float %canonicalized, float addrspace(1)* %gep2, align 4
; ... (17 unchanged lines not shown) ...

; GCN-LABEL: test_fold_canonicalize_fpround_value_v2f16_v2f32:		; GCN-LABEL: test_fold_canonicalize_fpround_value_v2f16_v2f32:
; GCN-DAG: v_cvt_f16_f32_e32 [[V0:v[0-9]+]], v{{[0-9]+}}		; GCN-DAG: v_cvt_f16_f32_e32 [[V0:v[0-9]+]], v{{[0-9]+}}
; VI-DAG: v_cvt_f16_f32_sdwa [[V1:v[0-9]+]], v{{[0-9]+}}		; VI-DAG: v_cvt_f16_f32_sdwa [[V1:v[0-9]+]], v{{[0-9]+}}
; VI: v_or_b32_e32 [[V:v[0-9]+]], [[V0]], [[V1]]		; VI: v_or_b32_e32 [[V:v[0-9]+]], [[V0]], [[V1]]
; GFX9: v_cvt_f16_f32_e32 [[V1:v[0-9]+]], v{{[0-9]+}}		; GFX9: v_cvt_f16_f32_e32 [[V1:v[0-9]+]], v{{[0-9]+}}
; GFX9: v_and_b32_e32 [[V0_16:v[0-9]+]], 0xffff, [[V0]]		; GFX9: v_and_b32_e32 [[V0_16:v[0-9]+]], 0xffff, [[V0]]
; GFX9: v_lshl_or_b32 [[V:v[0-9]+]], [[V1]], 16, [[V0_16]]		; GFX9: v_lshl_or_b32 [[V:v[0-9]+]], [[V1]], 16, [[V0_16]]
		; GCN-NOT: v_mul
		; GCN-NOT: v_max
; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]		; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]
; GCN-NOT: 1.0
define amdgpu_kernel void @test_fold_canonicalize_fpround_value_v2f16_v2f32(<2 x float> addrspace(1)* %arg, <2 x half> addrspace(1)* %out) {		define amdgpu_kernel void @test_fold_canonicalize_fpround_value_v2f16_v2f32(<2 x float> addrspace(1)* %arg, <2 x half> addrspace(1)* %out) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds <2 x float>, <2 x float> addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds <2 x float>, <2 x float> addrspace(1)* %arg, i32 %id
%load = load <2 x float>, <2 x float> addrspace(1)* %gep, align 8		%load = load <2 x float>, <2 x float> addrspace(1)* %gep, align 8
%v = fptrunc <2 x float> %load to <2 x half>		%v = fptrunc <2 x float> %load to <2 x half>
%canonicalized = tail call <2 x half> @llvm.canonicalize.v2f16(<2 x half> %v)		%canonicalized = tail call <2 x half> @llvm.canonicalize.v2f16(<2 x half> %v)
%gep2 = getelementptr inbounds <2 x half>, <2 x half> addrspace(1)* %out, i32 %id		%gep2 = getelementptr inbounds <2 x half>, <2 x half> addrspace(1)* %out, i32 %id
store <2 x half> %canonicalized, <2 x half> addrspace(1)* %gep2, align 4		store <2 x half> %canonicalized, <2 x half> addrspace(1)* %gep2, align 4
; ... (10 unchanged lines not shown; enclosing function: @test_no_fold_canonicalize_fneg_value_f32) ...
%v = fsub float -0.0, %load		%v = fsub float -0.0, %load
%canonicalized = tail call float @llvm.canonicalize.f32(float %v)		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
store float %canonicalized, float addrspace(1)* %gep, align 4		store float %canonicalized, float addrspace(1)* %gep, align 4
ret void		ret void
}		}

; GCN-LABEL: test_fold_canonicalize_fneg_value_f32:		; GCN-LABEL: test_fold_canonicalize_fneg_value_f32:
; GCN: v_xor_b32_e32 [[V:v[0-9]+]], 0x80000000, v{{[0-9]+}}		; GCN: v_xor_b32_e32 [[V:v[0-9]+]], 0x80000000, v{{[0-9]+}}
		; GCN-NOT: v_mul
		; GCN-NOT: v_max
; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]		; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]
; GCN-NOT: 1.0
define amdgpu_kernel void @test_fold_canonicalize_fneg_value_f32(float addrspace(1)* %arg) {		define amdgpu_kernel void @test_fold_canonicalize_fneg_value_f32(float addrspace(1)* %arg) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
%load = load float, float addrspace(1)* %gep, align 4		%load = load float, float addrspace(1)* %gep, align 4
%v0 = fadd float %load, 0.0		%v0 = fadd float %load, 0.0
%v = fsub float -0.0, %v0		%v = fsub float -0.0, %v0
%canonicalized = tail call float @llvm.canonicalize.f32(float %v)		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
store float %canonicalized, float addrspace(1)* %gep, align 4		store float %canonicalized, float addrspace(1)* %gep, align 4
ret void		ret void
}		}

; GCN-LABEL: test_no_fold_canonicalize_fabs_value_f32:		; GCN-LABEL: test_no_fold_canonicalize_fabs_value_f32:
; GCN-FLUSH: v_mul_f32_e64 v{{[0-9]+}}, 1.0, \|v{{[0-9]+}}\|		; GCN-FLUSH: v_mul_f32_e64 v{{[0-9]+}}, 1.0, \|v{{[0-9]+}}\|
; GCN-DENORM: v_max_f32_e64 v{{[0-9]+}}, \|v{{[0-9]+}}\|, \|v{{[0-9]+}}\|		; GCN-DENORM: v_max_f32_e64 v{{[0-9]+}}, \|v{{[0-9]+}}\|, \|v{{[0-9]+}}\|
define amdgpu_kernel void @test_no_fold_canonicalize_fabs_value_f32(float addrspace(1)* %arg) {		define amdgpu_kernel void @test_no_fold_canonicalize_fabs_value_f32(float addrspace(1)* %arg) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
%load = load float, float addrspace(1)* %gep, align 4		%load = load float, float addrspace(1)* %gep, align 4
%v = tail call float @llvm.fabs.f32(float %load)		%v = tail call float @llvm.fabs.f32(float %load)
%canonicalized = tail call float @llvm.canonicalize.f32(float %v)		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
store float %canonicalized, float addrspace(1)* %gep, align 4		store float %canonicalized, float addrspace(1)* %gep, align 4
ret void		ret void
}		}

		; GCN-LABEL: test_no_fold_canonicalize_fcopysign_value_f32:
		; GCN-FLUSH: v_mul_f32_e64 v{{[0-9]+}}, 1.0, \|v{{[0-9]+}}\|
		; GCN-DENORM: v_max_f32_e64 v{{[0-9]+}}, \|v{{[0-9]+}}\|, \|v{{[0-9]+}}\|
		; GCN-NOT: v_mul_
		; GCN-NOT: v_max_
		define amdgpu_kernel void @test_no_fold_canonicalize_fcopysign_value_f32(float addrspace(1)* %arg, float %sign) {
		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
		%load = load float, float addrspace(1)* %gep, align 4
		%canon.load = tail call float @llvm.canonicalize.f32(float %load)
		%copysign = call float @llvm.copysign.f32(float %canon.load, float %sign)
		%v = tail call float @llvm.fabs.f32(float %load)
		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
		store float %canonicalized, float addrspace(1)* %gep, align 4
		ret void
		}

; GCN-LABEL: test_fold_canonicalize_fabs_value_f32:		; GCN-LABEL: test_fold_canonicalize_fabs_value_f32:
; GCN: v_and_b32_e32 [[V:v[0-9]+]], 0x7fffffff, v{{[0-9]+}}		; GCN: v_and_b32_e32 [[V:v[0-9]+]], 0x7fffffff, v{{[0-9]+}}
		; GCN-NOT: v_mul
		; GCN-NOT: v_max
; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]		; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]
; GCN-NOT: 1.0
define amdgpu_kernel void @test_fold_canonicalize_fabs_value_f32(float addrspace(1)* %arg) {		define amdgpu_kernel void @test_fold_canonicalize_fabs_value_f32(float addrspace(1)* %arg) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
%load = load float, float addrspace(1)* %gep, align 4		%load = load float, float addrspace(1)* %gep, align 4
%v0 = fadd float %load, 0.0		%v0 = fadd float %load, 0.0
%v = tail call float @llvm.fabs.f32(float %v0)		%v = tail call float @llvm.fabs.f32(float %v0)
%canonicalized = tail call float @llvm.canonicalize.f32(float %v)		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
store float %canonicalized, float addrspace(1)* %gep, align 4		store float %canonicalized, float addrspace(1)* %gep, align 4
ret void		ret void
}		}

; GCN-LABEL: test_fold_canonicalize_sin_value_f32:		; GCN-LABEL: test_fold_canonicalize_sin_value_f32:
; GCN: v_sin_f32_e32 [[V:v[0-9]+]], v{{[0-9]+}}		; GCN: v_sin_f32_e32 [[V:v[0-9]+]], v{{[0-9]+}}
		; GCN-NOT: v_mul
		; GCN-NOT: v_max
; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]		; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]
; GCN-NOT: 1.0
define amdgpu_kernel void @test_fold_canonicalize_sin_value_f32(float addrspace(1)* %arg) {		define amdgpu_kernel void @test_fold_canonicalize_sin_value_f32(float addrspace(1)* %arg) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
%load = load float, float addrspace(1)* %gep, align 4		%load = load float, float addrspace(1)* %gep, align 4
%v = tail call float @llvm.sin.f32(float %load)		%v = tail call float @llvm.sin.f32(float %load)
%canonicalized = tail call float @llvm.canonicalize.f32(float %v)		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
store float %canonicalized, float addrspace(1)* %gep, align 4		store float %canonicalized, float addrspace(1)* %gep, align 4
ret void		ret void
}		}

; GCN-LABEL: test_fold_canonicalize_cos_value_f32:		; GCN-LABEL: test_fold_canonicalize_cos_value_f32:
; GCN: v_cos_f32_e32 [[V:v[0-9]+]], v{{[0-9]+}}		; GCN: v_cos_f32_e32 [[V:v[0-9]+]], v{{[0-9]+}}
		; GCN-NOT: v_mul
		; GCN-NOT: v_max
; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]		; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]
; GCN-NOT: 1.0
define amdgpu_kernel void @test_fold_canonicalize_cos_value_f32(float addrspace(1)* %arg) {		define amdgpu_kernel void @test_fold_canonicalize_cos_value_f32(float addrspace(1)* %arg) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
%load = load float, float addrspace(1)* %gep, align 4		%load = load float, float addrspace(1)* %gep, align 4
%v = tail call float @llvm.cos.f32(float %load)		%v = tail call float @llvm.cos.f32(float %load)
%canonicalized = tail call float @llvm.canonicalize.f32(float %v)		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
store float %canonicalized, float addrspace(1)* %gep, align 4		store float %canonicalized, float addrspace(1)* %gep, align 4
ret void		ret void
}		}

; GCN-LABEL: test_fold_canonicalize_sin_value_f16:		; GCN-LABEL: test_fold_canonicalize_sin_value_f16:
; GCN: v_sin_f32_e32 [[V0:v[0-9]+]], v{{[0-9]+}}		; GCN: v_sin_f32_e32 [[V0:v[0-9]+]], v{{[0-9]+}}
; GCN: v_cvt_f16_f32_e32 [[V:v[0-9]+]], [[V0]]		; GCN: v_cvt_f16_f32_e32 [[V:v[0-9]+]], [[V0]]
		; GCN-NOT: v_mul
		; GCN-NOT: v_max
; GCN: {{flat\|global}}_store_short v[{{[0-9:]+}}], [[V]]		; GCN: {{flat\|global}}_store_short v[{{[0-9:]+}}], [[V]]
; GCN-NOT: 1.0
define amdgpu_kernel void @test_fold_canonicalize_sin_value_f16(half addrspace(1)* %arg) {		define amdgpu_kernel void @test_fold_canonicalize_sin_value_f16(half addrspace(1)* %arg) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds half, half addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds half, half addrspace(1)* %arg, i32 %id
%load = load half, half addrspace(1)* %gep, align 2		%load = load half, half addrspace(1)* %gep, align 2
%v = tail call half @llvm.sin.f16(half %load)		%v = tail call half @llvm.sin.f16(half %load)
%canonicalized = tail call half @llvm.canonicalize.f16(half %v)		%canonicalized = tail call half @llvm.canonicalize.f16(half %v)
store half %canonicalized, half addrspace(1)* %gep, align 2		store half %canonicalized, half addrspace(1)* %gep, align 2
ret void		ret void
}		}

; GCN-LABEL: test_fold_canonicalize_cos_value_f16:		; GCN-LABEL: test_fold_canonicalize_cos_value_f16:
; GCN: v_cos_f32_e32 [[V0:v[0-9]+]], v{{[0-9]+}}		; GCN: v_cos_f32_e32 [[V0:v[0-9]+]], v{{[0-9]+}}
; GCN: v_cvt_f16_f32_e32 [[V:v[0-9]+]], [[V0]]		; GCN: v_cvt_f16_f32_e32 [[V:v[0-9]+]], [[V0]]
		; GCN-NOT: v_mul
		; GCN-NOT: v_max
; GCN: {{flat\|global}}_store_short v[{{[0-9:]+}}], [[V]]		; GCN: {{flat\|global}}_store_short v[{{[0-9:]+}}], [[V]]
; GCN-NOT: 1.0
define amdgpu_kernel void @test_fold_canonicalize_cos_value_f16(half addrspace(1)* %arg) {		define amdgpu_kernel void @test_fold_canonicalize_cos_value_f16(half addrspace(1)* %arg) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds half, half addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds half, half addrspace(1)* %arg, i32 %id
%load = load half, half addrspace(1)* %gep, align 2		%load = load half, half addrspace(1)* %gep, align 2
%v = tail call half @llvm.cos.f16(half %load)		%v = tail call half @llvm.cos.f16(half %load)
%canonicalized = tail call half @llvm.canonicalize.f16(half %v)		%canonicalized = tail call half @llvm.canonicalize.f16(half %v)
store half %canonicalized, half addrspace(1)* %gep, align 2		store half %canonicalized, half addrspace(1)* %gep, align 2
ret void		ret void
}		}

; GCN-LABEL: test_fold_canonicalize_qNaN_value_f32:		; GCN-LABEL: test_fold_canonicalize_qNaN_value_f32:
; GCN: v_mov_b32_e32 [[V:v[0-9]+]], 0x7fc00000		; GCN: v_mov_b32_e32 [[V:v[0-9]+]], 0x7fc00000
		; GCN-NOT: v_mul
		; GCN-NOT: v_max
; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]		; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]
; GCN-NOT: 1.0
define amdgpu_kernel void @test_fold_canonicalize_qNaN_value_f32(float addrspace(1)* %arg) {		define amdgpu_kernel void @test_fold_canonicalize_qNaN_value_f32(float addrspace(1)* %arg) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
%canonicalized = tail call float @llvm.canonicalize.f32(float 0x7FF8000000000000)		%canonicalized = tail call float @llvm.canonicalize.f32(float 0x7FF8000000000000)
store float %canonicalized, float addrspace(1)* %gep, align 4		store float %canonicalized, float addrspace(1)* %gep, align 4
ret void		ret void
}		}

; GCN-LABEL: test_fold_canonicalize_minnum_value_from_load_f32:		; GCN-LABEL: test_fold_canonicalize_minnum_value_from_load_f32:
; VI: v_mul_f32_e32 v{{[0-9]+}}, 1.0, v{{[0-9]+}}		; GCN: v_min_f32_e32 [[V:v[0-9]+]], 0, v{{[0-9]+}}
; GFX9: v_min_f32_e32 [[V:v[0-9]+]], 0, v{{[0-9]+}}		; GFX9-NOT: v_max
		; GFX9-NOT: v_mul

		; VI-DENORM: v_max_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
		; VI-FLUSH: v_mul_f32_e32 v{{[0-9]+}}, 1.0, v{{[0-9]+}}

; GFX9: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]		; GFX9: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]
define amdgpu_kernel void @test_fold_canonicalize_minnum_value_from_load_f32(float addrspace(1)* %arg) {		define amdgpu_kernel void @test_fold_canonicalize_minnum_value_from_load_f32(float addrspace(1)* %arg) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
%load = load float, float addrspace(1)* %gep, align 4		%load = load float, float addrspace(1)* %gep, align 4
%v = tail call float @llvm.minnum.f32(float %load, float 0.0)		%v = tail call float @llvm.minnum.f32(float %load, float 0.0)
%canonicalized = tail call float @llvm.canonicalize.f32(float %v)		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
store float %canonicalized, float addrspace(1)* %gep, align 4		store float %canonicalized, float addrspace(1)* %gep, align 4
ret void		ret void
}		}

		; GCN-LABEL: test_fold_canonicalize_minnum_value_from_load_f32_nnan:
		; GCN: v_min_f32_e32 v{{[0-9]+}}, 0, v{{[0-9]+}}
		; GCN-DENORM-NOT: v_max
		; GCN-DENORM-NOT: v_mul
		; VI-FLUSH: v_mul_f32_e32 v{{[0-9]+}}, 1.0, v{{[0-9]+}}

		; GFX9: {{flat\|global}}_store_dword v[{{[0-9:]+}}]
		define amdgpu_kernel void @test_fold_canonicalize_minnum_value_from_load_f32_nnan(float addrspace(1)* %arg) #1 {
		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
		%load = load float, float addrspace(1)* %gep, align 4
		%v = tail call float @llvm.minnum.f32(float %load, float 0.0)
		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
		store float %canonicalized, float addrspace(1)* %gep, align 4
		ret void
		}

; GCN-LABEL: test_fold_canonicalize_minnum_value_f32:		; GCN-LABEL: test_fold_canonicalize_minnum_value_f32:
; GCN: v_min_f32_e32 [[V:v[0-9]+]], 0, v{{[0-9]+}}		; GCN: v_min_f32_e32 [[V:v[0-9]+]], 0, v{{[0-9]+}}
		; GCN-NOT: v_mul
		; GCN-NOT: v_max
; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]		; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]
; GCN-NOT: 1.0
define amdgpu_kernel void @test_fold_canonicalize_minnum_value_f32(float addrspace(1)* %arg) {		define amdgpu_kernel void @test_fold_canonicalize_minnum_value_f32(float addrspace(1)* %arg) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
%load = load float, float addrspace(1)* %gep, align 4		%load = load float, float addrspace(1)* %gep, align 4
%v0 = fadd float %load, 0.0		%v0 = fadd float %load, 0.0
%v = tail call float @llvm.minnum.f32(float %v0, float 0.0)		%v = tail call float @llvm.minnum.f32(float %v0, float 0.0)
%canonicalized = tail call float @llvm.canonicalize.f32(float %v)		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
store float %canonicalized, float addrspace(1)* %gep, align 4		store float %canonicalized, float addrspace(1)* %gep, align 4
ret void		ret void
}		}

; FIXME: Should there be more checks here? minnum with NaN operand is simplified away.		; FIXME: Should there be more checks here? minnum with NaN operand is simplified away.

; GCN-LABEL: test_fold_canonicalize_sNaN_value_f32:		; GCN-LABEL: test_fold_canonicalize_sNaN_value_f32:
; VI: v_add_u32_e32 v{{[0-9]+}}		; GCN: {{flat\|global}}_load_dword [[LOAD:v[0-9]+]]
; GFX9: v_add_co_u32_e32 v{{[0-9]+}}		; GCN-FLUSH: v_mul_f32_e32 v{{[0-9]+}}, 1.0, [[LOAD]]
; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}]		; GCN-DENORM: v_max_f32_e32 v{{[0-9]+}}, [[LOAD]], [[LOAD]]
define amdgpu_kernel void @test_fold_canonicalize_sNaN_value_f32(float addrspace(1)* %arg) {		define amdgpu_kernel void @test_fold_canonicalize_sNaN_value_f32(float addrspace(1)* %arg) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
%load = load float, float addrspace(1)* %gep, align 4		%load = load float, float addrspace(1)* %gep, align 4
%v = tail call float @llvm.minnum.f32(float %load, float bitcast (i32 2139095041 to float))		%v = tail call float @llvm.minnum.f32(float %load, float bitcast (i32 2139095041 to float))
%canonicalized = tail call float @llvm.canonicalize.f32(float %v)		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
store float %canonicalized, float addrspace(1)* %gep, align 4		store float %canonicalized, float addrspace(1)* %gep, align 4
ret void		ret void
}		}

; GCN-LABEL: test_fold_canonicalize_denorm_value_f32:		; GCN-LABEL: test_fold_canonicalize_denorm_value_f32:
; GFX9: v_min_f32_e32 [[RESULT:v[0-9]+]], 0x7fffff, v{{[0-9]+}}		; GFX9: v_min_f32_e32 [[RESULT:v[0-9]+]], 0x7fffff, v{{[0-9]+}}
; VI: v_min_f32_e32 [[V0:v[0-9]+]], 0x7fffff, v{{[0-9]+}}
; VI: v_mul_f32_e32 [[RESULT:v[0-9]+]], 1.0, [[V0]]		; VI-FLUSH: v_min_f32_e32 [[V0:v[0-9]+]], 0x7fffff, v{{[0-9]+}}
		; VI-FLUSH: v_mul_f32_e32 [[RESULT:v[0-9]+]], 1.0, [[V0]]

		; VI-DENORM: v_min_f32_e32 [[V0:v[0-9]+]], 0x7fffff, v{{[0-9]+}}
		; VI-DENORM: v_max_f32_e32 [[RESULT:v[0-9]+]], [[V0]], [[V0]]


		; GCN-NOT: v_mul
		; GCN-NOT: v_max
; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[RESULT]]		; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[RESULT]]
; GFX9-NOT: 1.0
define amdgpu_kernel void @test_fold_canonicalize_denorm_value_f32(float addrspace(1)* %arg) {		define amdgpu_kernel void @test_fold_canonicalize_denorm_value_f32(float addrspace(1)* %arg) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
%load = load float, float addrspace(1)* %gep, align 4		%load = load float, float addrspace(1)* %gep, align 4
%v = tail call float @llvm.minnum.f32(float %load, float bitcast (i32 8388607 to float))		%v = tail call float @llvm.minnum.f32(float %load, float bitcast (i32 8388607 to float))
%canonicalized = tail call float @llvm.canonicalize.f32(float %v)		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
store float %canonicalized, float addrspace(1)* %gep, align 4		store float %canonicalized, float addrspace(1)* %gep, align 4
ret void		ret void
}		}

; GCN-LABEL: test_fold_canonicalize_maxnum_value_from_load_f32:		; GCN-LABEL: test_fold_canonicalize_maxnum_value_from_load_f32:
; GFX9: v_max_f32_e32 [[RESULT:v[0-9]+]], 0, v{{[0-9]+}}		; GFX9: v_max_f32_e32 [[RESULT:v[0-9]+]], 0, v{{[0-9]+}}
; VI: v_max_f32_e32 [[V0:v[0-9]+]], 0, v{{[0-9]+}}		; VI-FLUSH: v_max_f32_e32 [[V0:v[0-9]+]], 0, v{{[0-9]+}}
; VI: v_mul_f32_e32 [[RESULT:v[0-9]+]], 1.0, [[V0]]		; VI-FLUSH: v_mul_f32_e32 [[RESULT:v[0-9]+]], 1.0, [[V0]]

		; VI-DENORM: v_max_f32_e32 [[V0:v[0-9]+]], 0, v{{[0-9]+}}
		; VI-DENORM: v_max_f32_e32 [[RESULT:v[0-9]+]], [[V0]], [[V0]]

		; GCN-NOT: v_mul
		; GCN-NOT: v_max
; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[RESULT]]		; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[RESULT]]
; GFX9-NOT: 1.0
define amdgpu_kernel void @test_fold_canonicalize_maxnum_value_from_load_f32(float addrspace(1)* %arg) {		define amdgpu_kernel void @test_fold_canonicalize_maxnum_value_from_load_f32(float addrspace(1)* %arg) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
%load = load float, float addrspace(1)* %gep, align 4		%load = load float, float addrspace(1)* %gep, align 4
%v = tail call float @llvm.maxnum.f32(float %load, float 0.0)		%v = tail call float @llvm.maxnum.f32(float %load, float 0.0)
%canonicalized = tail call float @llvm.canonicalize.f32(float %v)		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
store float %canonicalized, float addrspace(1)* %gep, align 4		store float %canonicalized, float addrspace(1)* %gep, align 4
ret void		ret void
}		}

; GCN-LABEL: test_fold_canonicalize_maxnum_value_f32:		; GCN-LABEL: test_fold_canonicalize_maxnum_value_f32:
; GCN: v_max_f32_e32 [[V:v[0-9]+]], 0, v{{[0-9]+}}		; GCN: v_max_f32_e32 [[V:v[0-9]+]], 0, v{{[0-9]+}}
		; GCN-NOT: v_max
		; GCN-NOT: v_mul
; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]		; GCN: {{flat\|global}}_store_dword v[{{[0-9:]+}}], [[V]]
; GCN-NOT: 1.0
define amdgpu_kernel void @test_fold_canonicalize_maxnum_value_f32(float addrspace(1)* %arg) {		define amdgpu_kernel void @test_fold_canonicalize_maxnum_value_f32(float addrspace(1)* %arg) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
%load = load float, float addrspace(1)* %gep, align 4		%load = load float, float addrspace(1)* %gep, align 4
%v0 = fadd float %load, 0.0		%v0 = fadd float %load, 0.0
%v = tail call float @llvm.maxnum.f32(float %v0, float 0.0)		%v = tail call float @llvm.maxnum.f32(float %v0, float 0.0)
%canonicalized = tail call float @llvm.canonicalize.f32(float %v)		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
store float %canonicalized, float addrspace(1)* %gep, align 4		store float %canonicalized, float addrspace(1)* %gep, align 4
ret void		ret void
}		}

; GCN-LABEL: test_fold_canonicalize_maxnum_value_f64:		; GCN-LABEL: test_fold_canonicalize_maxnum_value_f64:
; GCN: v_max_f64 [[V:v\[[0-9]+:[0-9]+\]]], v[{{[0-9:]+}}], 0		; GCN: v_max_f64 [[V:v\[[0-9]+:[0-9]+\]]], v[{{[0-9:]+}}], 0
		; GCN-NOT: v_mul
		; GCN-NOT: v_max
; GCN: {{flat\|global}}_store_dwordx2 v[{{[0-9:]+}}], [[V]]		; GCN: {{flat\|global}}_store_dwordx2 v[{{[0-9:]+}}], [[V]]
; GCN-NOT: 1.0
define amdgpu_kernel void @test_fold_canonicalize_maxnum_value_f64(double addrspace(1)* %arg) {		define amdgpu_kernel void @test_fold_canonicalize_maxnum_value_f64(double addrspace(1)* %arg) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds double, double addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds double, double addrspace(1)* %arg, i32 %id
%load = load double, double addrspace(1)* %gep, align 8		%load = load double, double addrspace(1)* %gep, align 8
%v0 = fadd double %load, 0.0		%v0 = fadd double %load, 0.0
%v = tail call double @llvm.maxnum.f64(double %v0, double 0.0)		%v = tail call double @llvm.maxnum.f64(double %v0, double 0.0)
%canonicalized = tail call double @llvm.canonicalize.f64(double %v)		%canonicalized = tail call double @llvm.canonicalize.f64(double %v)
store double %canonicalized, double addrspace(1)* %gep, align 8		store double %canonicalized, double addrspace(1)* %gep, align 8
ret void		ret void
}		}

; GCN-LABEL: test_no_fold_canonicalize_fdiv_value_f32_no_ieee:		; GCN-LABEL: test_no_fold_canonicalize_fdiv_value_f32_no_ieee:
; GCN-EXCEPT: v_mul_f32_e32 v{{[0-9]+}}, 1.0, v{{[0-9]+}}		; GCN-EXCEPT: v_mul_f32_e32 v{{[0-9]+}}, 1.0, v{{[0-9]+}}
		; GCN-NOEXCEPT-NOT: v_mul_f32_e32 v{{[0-9]+}}, 1.0, v{{[0-9]+}}
define amdgpu_ps float @test_no_fold_canonicalize_fdiv_value_f32_no_ieee(float %arg0) {		define amdgpu_ps float @test_no_fold_canonicalize_fdiv_value_f32_no_ieee(float %arg0) {
entry:		entry:
%v = fdiv float %arg0, 15.0		%v = fdiv float %arg0, 15.0
%canonicalized = tail call float @llvm.canonicalize.f32(float %v)		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
ret float %canonicalized		ret float %canonicalized
}		}

; GCN-LABEL: test_fold_canonicalize_fmul_nnan_value_f32_no_ieee:		; GCN-LABEL: test_fold_canonicalize_fmul_nnan_value_f32_no_ieee:
; GCN: v_mul_f32_e32 [[V:v[0-9]+]], 0x41700000, v{{[0-9]+}}		; GCN: v_mul_f32_e32 [[V:v[0-9]+]], 0x41700000, v{{[0-9]+}}
		; GCN-NOT: v_mul
		; GCN-NOT: v_max
; GCN-NEXT: ; return		; GCN-NEXT: ; return
; GCN-NOT: 1.0
define amdgpu_ps float @test_fold_canonicalize_fmul_nnan_value_f32_no_ieee(float %arg) {		define amdgpu_ps float @test_fold_canonicalize_fmul_nnan_value_f32_no_ieee(float %arg) {
entry:		entry:
%v = fmul nnan float %arg, 15.0		%v = fmul nnan float %arg, 15.0
%canonicalized = tail call float @llvm.canonicalize.f32(float %v)		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
ret float %canonicalized		ret float %canonicalized
}		}

; GCN-LABEL: {{^}}test_fold_canonicalize_load_nnan_value_f32		; GCN-LABEL: {{^}}test_fold_canonicalize_load_nnan_value_f32
Show All 9 Lines	define amdgpu_kernel void @test_fold_canonicalize_load_nnan_value_f32(float addrspace(1)* %arg, float addrspace(1)* %out) #1 {
%gep2 = getelementptr inbounds float, float addrspace(1)* %out, i32 %id		%gep2 = getelementptr inbounds float, float addrspace(1)* %out, i32 %id
store float %canonicalized, float addrspace(1)* %gep2, align 4		store float %canonicalized, float addrspace(1)* %gep2, align 4
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_fold_canonicalize_load_nnan_value_f64		; GCN-LABEL: {{^}}test_fold_canonicalize_load_nnan_value_f64
; GCN: {{flat\|global}}_load_dwordx2 [[V:v\[[0-9:]+\]]],		; GCN: {{flat\|global}}_load_dwordx2 [[V:v\[[0-9:]+\]]],
; GCN: {{flat\|global}}_store_dwordx2 v[{{[0-9:]+}}], [[V]]		; GCN: {{flat\|global}}_store_dwordx2 v[{{[0-9:]+}}], [[V]]
; GCN-NOT: 1.0		; GCN-NOT: v_mul_
		; GCN-NOT: v_max_
define amdgpu_kernel void @test_fold_canonicalize_load_nnan_value_f64(double addrspace(1)* %arg, double addrspace(1)* %out) #1 {		define amdgpu_kernel void @test_fold_canonicalize_load_nnan_value_f64(double addrspace(1)* %arg, double addrspace(1)* %out) #1 {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds double, double addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds double, double addrspace(1)* %arg, i32 %id
%v = load double, double addrspace(1)* %gep, align 8		%v = load double, double addrspace(1)* %gep, align 8
%canonicalized = tail call double @llvm.canonicalize.f64(double %v)		%canonicalized = tail call double @llvm.canonicalize.f64(double %v)
%gep2 = getelementptr inbounds double, double addrspace(1)* %out, i32 %id		%gep2 = getelementptr inbounds double, double addrspace(1)* %out, i32 %id
store double %canonicalized, double addrspace(1)* %gep2, align 8		store double %canonicalized, double addrspace(1)* %gep2, align 8
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_fold_canonicalize_load_nnan_value_f16		; GCN-LABEL: {{^}}test_fold_canonicalize_load_nnan_value_f16
; GCN: {{flat\|global}}_load_ushort [[V:v[0-9]+]],		; GCN: {{flat\|global}}_load_ushort [[V:v[0-9]+]],
; GCN: {{flat\|global}}_store_short v[{{[0-9:]+}}], [[V]]		; GCN-NOT: v_mul
; GCN-NOT: 1.0		; GCN-NOT: v_max
		; GCN: {{flat\|global}}_store_short v{{\[[0-9]+:[0-9]+\]}}, [[V]]
define amdgpu_kernel void @test_fold_canonicalize_load_nnan_value_f16(half addrspace(1)* %arg, half addrspace(1)* %out) #1 {		define amdgpu_kernel void @test_fold_canonicalize_load_nnan_value_f16(half addrspace(1)* %arg, half addrspace(1)* %out) #1 {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds half, half addrspace(1)* %arg, i32 %id		%gep = getelementptr inbounds half, half addrspace(1)* %arg, i32 %id
%v = load half, half addrspace(1)* %gep, align 2		%v = load half, half addrspace(1)* %gep, align 2
%canonicalized = tail call half @llvm.canonicalize.f16(half %v)		%canonicalized = tail call half @llvm.canonicalize.f16(half %v)
%gep2 = getelementptr inbounds half, half addrspace(1)* %out, i32 %id		%gep2 = getelementptr inbounds half, half addrspace(1)* %out, i32 %id
store half %canonicalized, half addrspace(1)* %gep2, align 2		store half %canonicalized, half addrspace(1)* %gep2, align 2
ret void		ret void
}		}

		; GCN-LABEL: {{^}}test_fold_canonicalize_select_value_f32:
		; GCN: v_add_f32
		; GCN: v_add_f32
		; GCN: v_cndmask_b32
		; GCN-NOT: v_mul_
		; GCN-NOT: v_max_
		define amdgpu_kernel void @test_fold_canonicalize_select_value_f32(float addrspace(1)* %arg) {
		%id = tail call i32 @llvm.amdgcn.workitem.id.x()
		%gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
		%load0 = load volatile float, float addrspace(1)* %gep, align 4
		%load1 = load volatile float, float addrspace(1)* %gep, align 4
		%load2 = load volatile i32, i32 addrspace(1)* undef, align 4
		%v0 = fadd float %load0, 15.0
		%v1 = fadd float %load1, 32.0
		%cond = icmp eq i32 %load2, 0
		%select = select i1 %cond, float %v0, float %v1
		%canonicalized = tail call float @llvm.canonicalize.f32(float %select)
		store float %canonicalized, float addrspace(1)* %gep, align 4
		ret void
		}

		; Need to quiet the nan with a separate instruction since it will be
		; passed through the minnum.

		; GCN-LABEL: {{^}}test_fold_canonicalize_minnum_value_no_ieee_mode:
		; GFX9: v_min_f32_e32 v0, v0, v1
		; GFX9-FLUSH-NEXT: v_mul_f32_e32 v0, 1.0, v0
		; GFX9-DENORM-NEXT: v_max_f32_e32 v0, v0, v0
		; GFX9-NEXT: ; return to shader

		; VI: v_min_f32_e32 v0, v0, v1
		; VI-FLUSH: v_mul_f32_e32 v0, 1.0, v0
		; VI-DENORM: v_max_f32_e32 v0, v0, v0
		define amdgpu_ps float @test_fold_canonicalize_minnum_value_no_ieee_mode(float %arg0, float %arg1) {
		%v = tail call float @llvm.minnum.f32(float %arg0, float %arg1)
		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
		ret float %canonicalized
		}

		; GCN-LABEL: {{^}}test_fold_canonicalize_minnum_value_ieee_mode:
		; GFX9: v_min_f32_e32 v0, v0, v1
		; GFX9-NEXT: s_setpc_b64

		; VI: v_min_f32_e32 v0, v0, v1
		; VI-FLUSH-NEXT: v_mul_f32_e32 v0, 1.0, v0
		; VI-DENORM-NEXT: v_max_f32_e32 v0, v0, v0
		; VI-NEXT: s_setpc_b64
		define float @test_fold_canonicalize_minnum_value_ieee_mode(float %arg0, float %arg1) {
		%v = tail call float @llvm.minnum.f32(float %arg0, float %arg1)
		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
		ret float %canonicalized
		}

		; GCN-LABEL: {{^}}test_fold_canonicalize_minnum_value_no_ieee_mode_nnan:
		; GCN: v_min_f32_e32 v0, v0, v1
		; GCN-FLUSH-NEXT: v_mul_f32_e32 v0, 1.0, v0
		; GCN-NEXT: ; return
		define amdgpu_ps float @test_fold_canonicalize_minnum_value_no_ieee_mode_nnan(float %arg0, float %arg1) #1 {
		%v = tail call float @llvm.minnum.f32(float %arg0, float %arg1)
		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
		ret float %canonicalized
		}

; Avoid failing the test on FreeBSD11.0 which will match the GCN-NOT: 1.0		; Avoid failing the test on FreeBSD11.0 which will match the GCN-NOT: 1.0
; in the .amd_amdgpu_isa "amdgcn-unknown-freebsd11.0--gfx802" directive		; in the .amd_amdgpu_isa "amdgcn-unknown-freebsd11.0--gfx802" directive
; CHECK: .amd_amdgpu_isa		; CHECK: .amd_amdgpu_isa

declare float @llvm.canonicalize.f32(float) #0		declare float @llvm.canonicalize.f32(float) #0
		declare float @llvm.copysign.f32(float, float) #0
		declare float @llvm.amdgcn.fmul.legacy(float, float) #0
declare double @llvm.canonicalize.f64(double) #0		declare double @llvm.canonicalize.f64(double) #0
declare half @llvm.canonicalize.f16(half) #0		declare half @llvm.canonicalize.f16(half) #0
declare <2 x half> @llvm.canonicalize.v2f16(<2 x half>) #0		declare <2 x half> @llvm.canonicalize.v2f16(<2 x half>) #0
declare i32 @llvm.amdgcn.workitem.id.x() #0		declare i32 @llvm.amdgcn.workitem.id.x() #0
declare float @llvm.sqrt.f32(float) #0		declare float @llvm.sqrt.f32(float) #0
declare float @llvm.ceil.f32(float) #0		declare float @llvm.ceil.f32(float) #0
declare float @llvm.floor.f32(float) #0		declare float @llvm.floor.f32(float) #0
declare float @llvm.fma.f32(float, float, float) #0		declare float @llvm.fma.f32(float, float, float) #0