This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Try to make isKnownNeverSNan more accurate
Abandoned · Public

Authored by arsenm on Jul 19 2018, 11:14 AM.


Details

Reviewers

scanon
rampitec
b-sumner

Summary

If I'm interpreting the standard correctly, everything
here was essentially incorrect.

It is incorrect to say that no value is an sNaN when FP exceptions
are not enabled: a loaded value could still be an sNaN. If FP exceptions
are enabled, certain operations will produce an sNaN.
Common operations won't produce an sNaN, and will quiet
an sNaN input.

Try to make the set of operations that would raise an exception if
they were to produce an sNaN more correct.
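As an illustration of why a loaded value can be an sNaN regardless of exception settings: whether a value is signaling is purely a property of its bit pattern. The sketch below assumes the IEEE-754 binary32 encoding in which the most significant mantissa bit is the quiet bit; the helper names are hypothetical and are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// IEEE-754 binary32 layout: 1 sign bit, 8 exponent bits, 23 mantissa bits.
constexpr std::uint32_t kExpMask = 0x7f800000u;
constexpr std::uint32_t kMantMask = 0x007fffffu;
constexpr std::uint32_t kQuietBit = 0x00400000u; // most significant mantissa bit

// A NaN has an all-ones exponent and a nonzero mantissa.
constexpr bool isNaNBits(std::uint32_t bits) {
  return (bits & kExpMask) == kExpMask && (bits & kMantMask) != 0;
}

// A signaling NaN is a NaN whose quiet bit is clear.
constexpr bool isSignalingNaNBits(std::uint32_t bits) {
  return isNaNBits(bits) && (bits & kQuietBit) == 0;
}

// Quieting sets the quiet bit and keeps the rest of the payload.
constexpr std::uint32_t quietBits(std::uint32_t bits) {
  return isNaNBits(bits) ? (bits | kQuietBit) : bits;
}

// Small self-check of the helpers above.
inline void checkNaNClassification() {
  assert(isSignalingNaNBits(0x7f800001u));  // an sNaN pattern
  assert(!isSignalingNaNBits(0x7fc00000u)); // the usual default qNaN
  assert(!isNaNBits(0x7f800000u));          // +infinity is not a NaN
  assert(!isSignalingNaNBits(quietBits(0x7f800001u)));
}
```

Any 32-bit word read from memory can match the sNaN pattern, which is the case the summary says the old code ignored.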

Diff Detail

Event Timeline

arsenm created this revision. Jul 19 2018, 11:14 AM

Herald added subscribers: t-tye, tpr, dstuttard and 4 others. Jul 19 2018, 11:14 AM

I would tend to say that isKnownNeverSNan here basically tells us not that the value cannot be an sNaN, but rather that we do not care even if it is. At least, we should not care if FP exceptions are off.

rampitec added inline comments. Jul 19 2018, 12:48 PM

lib/Target/AMDGPU/SIISelLowering.cpp
6733	Do these quiet incoming sNaNs?

In D49561#1168643, @rampitec wrote:

I would tend to say that isKnownNeverSNan here basically tells us not that the value cannot be an sNaN, but rather that we do not care even if it is. At least, we should not care if FP exceptions are off.

Except that's not true, since sNaNs can be loaded; even if exceptions aren't handled, they still need to be dealt with.

lib/Target/AMDGPU/SIISelLowering.cpp
6733	That's my understanding of how the basic operations work

scanon added inline comments. Jul 19 2018, 2:41 PM

lib/Target/AMDGPU/SIISelLowering.cpp
6733	Yes, all computational operations quiet sNaNs. The only things that produce sNaN are fcopysign, fabs, (fneg would if we had it), and things like loads and bitcasts.

scanon added inline comments. Jul 19 2018, 2:42 PM

lib/Target/AMDGPU/SIISelLowering.cpp
6733	(and to be clear, fcopysign and fabs can only produce sNaN if their input is sNaN.)
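A minimal sketch of the point about fcopysign, fabs and fneg: modeled as pure sign-bit manipulations on the binary32 encoding, they leave the quiet bit and payload untouched, so an sNaN input stays an sNaN. The function names are hypothetical and the bit-level model is an assumption made for illustration, not the AMDGPU lowering.

```cpp
#include <cassert>
#include <cstdint>

// True if `bits` encodes a binary32 NaN with the quiet bit (bit 22) clear.
constexpr bool isSNaN(std::uint32_t bits) {
  return (bits & 0x7f800000u) == 0x7f800000u && // exponent all ones
         (bits & 0x007fffffu) != 0 &&           // nonzero mantissa: a NaN
         (bits & 0x00400000u) == 0;             // quiet bit clear
}

// Sign-bit operations, modeled directly on the encoding.
constexpr std::uint32_t fabsBits(std::uint32_t x) { return x & 0x7fffffffu; }
constexpr std::uint32_t fnegBits(std::uint32_t x) { return x ^ 0x80000000u; }
constexpr std::uint32_t copysignBits(std::uint32_t mag, std::uint32_t sgn) {
  return (mag & 0x7fffffffu) | (sgn & 0x80000000u);
}

inline void checkSignOpsPreserveSNaN() {
  const std::uint32_t snan = 0xff800001u; // negative sNaN
  const std::uint32_t one = 0x3f800000u;  // 1.0f
  assert(isSNaN(fabsBits(snan)));
  assert(isSNaN(fnegBits(snan)));
  assert(isSNaN(copysignBits(snan, one)));
  // Only the magnitude operand contributes exponent and mantissa bits.
  assert(!isSNaN(copysignBits(one, snan)));
}
```

In this model only the magnitude operand of copysign can carry an sNaN through, which is consistent with the patch checking operand 0 alone for ISD::FCOPYSIGN.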

rampitec added a reviewer: b-sumner. Jul 19 2018, 4:05 PM

In D49561#1168791, @arsenm wrote:

Except that's not true, since sNaNs can be loaded; even if exceptions aren't handled, they still need to be dealt with.

Do you mean we shall quiet snans with an fmin/fmax even if exception handling is off? My understanding was different. In fact I thought we can completely ignore them with say fast math.

At the very least, if it is knownNeverNan, then it cannot be a signaling NaN either.

In D49561#1169015, @rampitec wrote:

Do you mean we shall quiet snans with an fmin/fmax even if exception handling is off? My understanding was different. In fact I thought we can completely ignore them with say fast math.

Yes. My understanding is the exception handling aspect is different from the existence of sNaNs, and also orthogonal to enabling fast math. If exception handling is enabled, operations that can produce sNaNs will, but there can still be sNaNs loaded from memory, constants etc. I think if the function is marked no-nans, we can maybe assume this won't happen? The fast math flags aren't sufficient there because arbitrary operations like a load or function argument don't have an associated flag.

With ieee_mode enabled (which is currently the default), v_min_f32 et al. return a qNaN if either input is an sNaN. If ieee_mode is off, it returns the non-NaN operand, as if the sNaN were a qNaN. I think for what we actually want, ieee_mode is harmful since it requires the library implementation of OpenCL's fmin to insert canonicalizes to quiet the inputs. Since LLVM doesn't properly handle sNaNs anywhere, I think enabling this is a bit pointless. However, whether it's on or not, I think we need to get sNaN behavior correct at least for this one operation in order to be able to optimize out redundant canonicalizes.
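The two min behaviors described in this comment can be sketched as a toy model on binary32 bit patterns. This is an assumption-laden illustration of the comment's wording, not a description of the hardware: `toyMin` and its helpers are hypothetical, signed zeros are ignored, and the case where both inputs are NaN is an arbitrary choice of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr std::uint32_t kQuietBit = 0x00400000u; // top mantissa bit of binary32

inline bool isNaNBits(std::uint32_t b) {
  return (b & 0x7f800000u) == 0x7f800000u && (b & 0x007fffffu) != 0;
}
inline bool isSNaNBits(std::uint32_t b) {
  return isNaNBits(b) && (b & kQuietBit) == 0;
}
inline float toFloat(std::uint32_t b) {
  float f;
  std::memcpy(&f, &b, sizeof f);
  return f;
}

// Toy min: with ieeeMode, an sNaN input yields a quieted NaN; otherwise any
// NaN is treated like a qNaN and the other operand is returned.
inline std::uint32_t toyMin(std::uint32_t a, std::uint32_t b, bool ieeeMode) {
  if (ieeeMode) {
    if (isSNaNBits(a)) return a | kQuietBit;
    if (isSNaNBits(b)) return b | kQuietBit;
  }
  if (isNaNBits(a)) return b; // may itself be a NaN (assumption of this model)
  if (isNaNBits(b)) return a;
  return toFloat(a) < toFloat(b) ? a : b;
}

inline void checkToyMin() {
  const std::uint32_t snan = 0x7f800001u, one = 0x3f800000u, two = 0x40000000u;
  assert(toyMin(two, one, true) == one);
  assert(toyMin(snan, one, true) == 0x7fc00001u); // quieted NaN
  assert(toyMin(snan, one, false) == one);        // non-NaN operand
  assert(isSNaNBits(toyMin(snan, snan, false)));  // sNaN can pass through
}
```

Under this model the result is never an sNaN when ieeeMode is on, while with it off the result is only sNaN-free if both operands are. That is the shape of the `IsIEEEMode || (operand 0 && operand 1)` check in the diff.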

Check ieee_mode in minnum/maxnum handling, although I think it is totally broken that this is necessary

First, I think the wrong diff is attached: the left side of the diff is not what is in trunk now.

In D49561#1169653, @arsenm wrote:

Yes. My understanding is the exception handling aspect is different from the existence of sNaNs, and also orthogonal to enabling fast math. If exception handling is enabled, operations that can produce sNaNs will, but there can still be sNaNs loaded from memory, constants etc. I think if the function is marked no-nans, we can maybe assume this won't happen? The fast math flags aren't sufficient there because arbitrary operations like a load or function argument don't have an associated flag.

With ieee_mode enabled (which is currently the default), v_min_f32 et al. return a qNaN if either input is an sNaN. If ieee_mode is off, it returns the non-NaN operand, as if the sNaN were a qNaN. I think for what we actually want, ieee_mode is harmful since it requires the library implementation of OpenCL's fmin to insert canonicalizes to quiet the inputs. Since LLVM doesn't properly handle sNaNs anywhere, I think enabling this is a bit pointless. However, whether it's on or not, I think we need to get sNaN behavior correct at least for this one operation in order to be able to optimize out redundant canonicalizes.

More generally, we have several questions:

Should we care about sNaNs with FP exceptions disabled?
Should we care about sNaNs if a node is known never NaN?
Should we care about sNaNs if no-nans is enabled?
Should we care about sNaNs if fast-math is enabled?
Should we turn ieee_mode off?

Let’s see:

Description of setHasFloatingPointExceptions():

/// Tells the code generator that this target supports floating point
/// exceptions and cares about preserving floating point exception behavior.

I read it this way: if we say no here, we do not care about preserving floating point exception behavior. I.e. we may assume there are no sNaNs even if they are present.

sNaN is a signaling NaN, so it is a NaN. If a node is known not to be a NaN, it cannot be the signaling form either.

-enable-no-nans-fp-math and derivatives:

/// NoNaNsFPMath - This flag is enabled when the
/// -enable-no-nans-fp-math flag is specified on the command line. When
/// this flag is off (the default), the code generator is not allowed to
/// assume the FP arithmetic arguments and results are never NaNs.

So if that flag is on, we are allowed to assume that not only FP arithmetic results but also arguments are never NaN, even if they actually are, and even if they are loaded or constant sNaN/NaN.

UnsafeAlgebra includes NoNaNs and thus all of the above applies.

I do not believe we have to disable ieee_mode.
- Even min/max handling was changed a few times, with and without the ieee_mode bit, in different generations of our GPU, so that is not a universal solution.
- Other FP operations will not quiet sNaNs as required if you disable it.

I believe we explored the idea of disabling ieee in the past and found we would have more problems without it. With ieee we only have the min/max problem.

arsenm added a child revision: D49605: AMDGPU: Fix implementation of isCanonicalized. Jul 20 2018, 12:37 PM

In D49561#1170203, @rampitec wrote:

More generally, we have several questions:

Should we care about sNaNs with FP exceptions disabled?

Yes. This has to work. The OpenCL conformance tests check for this

Should we care about sNaNs if a node is known never NaN?

Knowing it's never nan is how you know you don't need to care about it

Should we care about sNaNs if no-nans is enabled?

I think this is a grey area, but probably not.

Should we care about sNaNs if fast-math is enabled?

Should we turn ieee_mode off?

Let’s see:

Description of setHasFloatingPointExceptions():
/// Tells the code generator that this target supports floating point
/// exceptions and cares about preserving floating point exception behavior.
I read it this way: if we say no here, we do not care about preserving floating point exception behavior. I.e. we may assume there are no sNaNs even if they are present.

This is not true. Assuming this will break the fmin/fmax conformance tests. This is now just something we always turn on. We don't enable trap on FP exception, but that doesn't mean that there aren't sNaNs somewhere that need to be handled correctly. This property also should probably be removed, since things are moving to relying on the STRICT_* versions of operations to get this behavior

sNaN is a signaling NaN, so it is a NaN. If a node is known to be not a NaN, it cannot be its signaling form as well.

Yes

-enable-no-nans-fp-math and derivatives:
/// NoNaNsFPMath - This flag is enabled when the
/// -enable-no-nans-fp-math flag is specified on the command line. When
/// this flag is off (the default), the code generator is not allowed to
/// assume the FP arithmetic arguments and results are never NaNs.
So if that flag is on, we are allowed to assume that not only FP arithmetic results but also arguments are never NaN, even if they actually are, and even if they are loaded or constant sNaN/NaN.

Yes

UnsafeAlgebra includes NoNaNs and thus all of the above applies.

I do not agree with this. The per-instruction flags have decoupled the unsafe algebra properties from no-nans, and the per-function/global flags should follow suit.

Attach the right diff

Should we care about sNaNs with FP exceptions disabled?

Yes. This has to work. The OpenCL conformance tests check for this

OK, makes sense. Maybe we just need to tell LLVM that we actually support exceptions, which does not necessarily mean we trap on them.

UnsafeAlgebra includes NoNaNs and thus all of the above applies.

I do not agree with this. The per-instruction flags have decoupled the unsafe algebra properties from no-nans, and the per-function/global flags should follow suit.

OK, UnsafeAlgebra is probably not the right condition. FastMathFlags::isFast() definitely is.

isFast is just all the flags enabled. No-nans is just one bit of it.
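The distinction drawn here between "all flags set" and the single no-nans bit can be sketched with a toy flag set. `ToyFastMathFlags` is a hypothetical stand-in with an arbitrary subset of flags, assumed only for illustration; it is not LLVM's FastMathFlags class.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag set; each property is one independent bit.
struct ToyFastMathFlags {
  enum : std::uint8_t {
    NoNaNs = 1u << 0,
    NoInfs = 1u << 1,
    NoSignedZeros = 1u << 2,
    AllowReassoc = 1u << 3,
    AllFlags = NoNaNs | NoInfs | NoSignedZeros | AllowReassoc
  };
  std::uint8_t bits;

  // The only property an sNaN query would need to consult.
  bool noNaNs() const { return (bits & NoNaNs) != 0; }
  // "Fast" in this toy means every flag is set at once.
  bool isFast() const { return bits == AllFlags; }
};

inline void checkToyFlags() {
  ToyFastMathFlags onlyNoNaNs{ToyFastMathFlags::NoNaNs};
  ToyFastMathFlags all{ToyFastMathFlags::AllFlags};
  assert(onlyNoNaNs.noNaNs() && !onlyNoNaNs.isFast());
  assert(all.noNaNs() && all.isFast());
}
```

In this sketch a check keyed on isFast() would miss an operation that carries only the no-nans bit, so querying that bit directly is the narrower condition.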

So, given the discussion, you seem to be missing a DAG.isKnownNeverNaN(Op) condition.

rampitec added inline comments. Jul 20 2018, 1:32 PM

lib/Target/AMDGPU/SIISelLowering.h
317	It does not belong to this patch.

arsenm mentioned this in D49662: DAG: Enhance isKnownNeverNaN. Jul 23 2018, 4:12 AM

Superseded by D49662 and D49841

Revision Contents

Path                                                Size

lib/Target/AMDGPU/SIISelLowering.h                  5 lines
lib/Target/AMDGPU/SIISelLowering.cpp                81 lines
test/CodeGen/AMDGPU/clamp.ll                        12 lines
test/CodeGen/AMDGPU/fcanonicalize-elimination.ll    6 lines
test/CodeGen/AMDGPU/fmed3.ll                        19 lines

Diff 156576

lib/Target/AMDGPU/SIISelLowering.h

Show First 20 Lines • Show All 304 Lines • ▼ Show 20 Lines	public:
void computeKnownBitsForFrameIndex(const SDValue Op,		void computeKnownBitsForFrameIndex(const SDValue Op,
KnownBits &Known,		KnownBits &Known,
const APInt &DemandedElts,		const APInt &DemandedElts,
const SelectionDAG &DAG,		const SelectionDAG &DAG,
unsigned Depth = 0) const override;		unsigned Depth = 0) const override;

bool isSDNodeSourceOfDivergence(const SDNode *N,		bool isSDNodeSourceOfDivergence(const SDNode *N,
FunctionLoweringInfo FLI, DivergenceAnalysis DA) const override;		FunctionLoweringInfo FLI, DivergenceAnalysis DA) const override;

		bool isKnownNeverSNan(SelectionDAG &DAG, SDValue Op,
		unsigned Depth = 0) const;

		bool denormalsEnabledForType(EVT VT) const;
};		};

} // End namespace llvm		} // End namespace llvm

#endif		#endif

lib/Target/AMDGPU/SIISelLowering.cpp


Show First 20 Lines • Show All 6,711 Lines • ▼ Show 20 Lines	if (VT == MVT::f32 && (N0.getOpcode() == ISD::UINT_TO_FP ||
N0.getOpcode() == ISD::SINT_TO_FP)) {		N0.getOpcode() == ISD::SINT_TO_FP)) {
return DCI.DAG.getNode(AMDGPUISD::RCP_IFLAG, SDLoc(N), VT, N0,		return DCI.DAG.getNode(AMDGPUISD::RCP_IFLAG, SDLoc(N), VT, N0,
N->getFlags());		N->getFlags());
}		}

return AMDGPUTargetLowering::performRcpCombine(N, DCI);		return AMDGPUTargetLowering::performRcpCombine(N, DCI);
}		}

static bool isKnownNeverSNan(SelectionDAG &DAG, SDValue Op) {		bool SITargetLowering::isKnownNeverSNan(SelectionDAG &DAG, SDValue Op,
if (!DAG.getTargetLoweringInfo().hasFloatingPointExceptions())		unsigned Depth) const {
		if (Depth >= 6)
		return false;

		switch (Op.getOpcode()) {
		case ISD::ConstantFP: {
		ConstantFPSDNode *C = cast<ConstantFPSDNode>(Op);
		return !C->getValueAPF().isNaN() ||
		!C->getValueAPF().isSignaling();
		}
		case ISD::FMINNUM:
		case ISD::FMAXNUM: {
		bool IsIEEEMode = Subtarget->enableIEEEBit(DAG.getMachineFunction());
		return IsIEEEMode || (isKnownNeverSNan(DAG, Op.getOperand(0), Depth + 1) &&
		isKnownNeverSNan(DAG, Op.getOperand(1), Depth + 1));
		}
		case ISD::FADD:
		case ISD::FSUB:
		case ISD::FMUL:
		case ISD::FMAD:
		case ISD::FCANONICALIZE:
		case AMDGPUISD::FMED3:
		case AMDGPUISD::FMIN3:
		case AMDGPUISD::FMAX3:
		case AMDGPUISD::FMIN_LEGACY:
		case AMDGPUISD::FMAX_LEGACY:
		case AMDGPUISD::CLAMP:
return true;		return true;
		case ISD::FDIV:
		case ISD::FREM:
		case ISD::FMA:
		case ISD::SINT_TO_FP:
		case ISD::UINT_TO_FP:
		case ISD::FSQRT:
		case ISD::FSIN:
		case ISD::FCOS:
		case ISD::FPOWI:
		case ISD::FPOW:
		case ISD::FLOG:
		case ISD::FLOG2:
		case ISD::FLOG10:
		case ISD::FEXP:
		case ISD::FEXP2:
		case ISD::FCEIL:
		case ISD::FTRUNC:
		case ISD::FRINT:
		case ISD::FNEARBYINT:
		case ISD::FROUND:
		case ISD::FFLOOR:
		case AMDGPUISD::RCP:
		case AMDGPUISD::RSQ:
		case AMDGPUISD::RSQ_CLAMP:
		case AMDGPUISD::CVT_F32_UBYTE0:
		case AMDGPUISD::CVT_F32_UBYTE1:
		case AMDGPUISD::CVT_F32_UBYTE2:
		case AMDGPUISD::CVT_F32_UBYTE3:
		// TODO: This could be refined based on operands.
		return !DAG.getTargetLoweringInfo().hasFloatingPointExceptions() ||
		Op->getFlags().hasNoNaNs() ||
		DAG.getTarget().Options.NoNaNsFPMath;
		case ISD::FNEG:
		case ISD::FABS:
		case ISD::FCOPYSIGN:
		case ISD::FP_EXTEND:
		case AMDGPUISD::FP16_ZEXT:
		case AMDGPUISD::FP_TO_FP16:
		case AMDGPUISD::CVT_PKRTZ_F16_F32:
		return isKnownNeverSNan(DAG, Op.getOperand(0), Depth + 1);

return DAG.isKnownNeverNaN(Op);		case ISD::SELECT:
		return isKnownNeverSNan(DAG, Op.getOperand(1), Depth + 1) &&
		isKnownNeverSNan(DAG, Op.getOperand(2), Depth + 1);

		case ISD::FMAXNAN:
		case ISD::FMINNAN:
		// TODO: What do these do for snans?
		default:
		return false;
		}
}		}

static bool isCanonicalized(SelectionDAG &DAG, SDValue Op,		static bool isCanonicalized(SelectionDAG &DAG, SDValue Op,
const GCNSubtarget *ST, unsigned MaxDepth=5) {		const GCNSubtarget *ST, unsigned MaxDepth=5) {
// If source is a result of another standard FP operation it is already in		// If source is a result of another standard FP operation it is already in
// canonical form.		// canonical form.

switch (Op.getOpcode()) {		switch (Op.getOpcode()) {
▲ Show 20 Lines • Show All 1,705 Lines • Show Last 20 Lines
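The recursive shape of the proposed isKnownNeverSNan can be illustrated on a toy expression tree. The sketch below is not LLVM code: `Op`, `Node` and `toyIsKnownNeverSNaN` are hypothetical names, only a handful of opcodes are modeled, and the handling of each opcode is a simplified reading of the diff above. It keeps the depth cutoff of 6 and the conservative `false` default.

```cpp
#include <cassert>
#include <vector>

enum class Op { Load, Const, FAdd, FAbs, FNeg, Select, MinNum };

struct Node {
  Op op;
  bool constIsSNaN;                   // only meaningful for Op::Const
  std::vector<const Node *> operands; // Select: {cond, trueVal, falseVal}
};

inline bool toyIsKnownNeverSNaN(const Node &n, bool ieeeMode,
                                unsigned depth = 0) {
  if (depth >= 6)
    return false; // give up on deep chains, as the patch does
  switch (n.op) {
  case Op::Const:
    return !n.constIsSNaN;
  case Op::FAdd:
    return true; // computational operation: quiets any sNaN input
  case Op::MinNum:
    assert(n.operands.size() == 2);
    return ieeeMode ||
           (toyIsKnownNeverSNaN(*n.operands[0], ieeeMode, depth + 1) &&
            toyIsKnownNeverSNaN(*n.operands[1], ieeeMode, depth + 1));
  case Op::FAbs:
  case Op::FNeg:
    assert(n.operands.size() == 1);
    return toyIsKnownNeverSNaN(*n.operands[0], ieeeMode, depth + 1);
  case Op::Select:
    assert(n.operands.size() == 3);
    return toyIsKnownNeverSNaN(*n.operands[1], ieeeMode, depth + 1) &&
           toyIsKnownNeverSNaN(*n.operands[2], ieeeMode, depth + 1);
  case Op::Load:
    return false; // arbitrary bits from memory may encode an sNaN
  }
  return false;
}
```

For example, `fabs(load)` is reported as possibly sNaN, while `fabs(fadd(load, c))` is not, because the fadd in between quiets its input.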

test/CodeGen/AMDGPU/clamp.ll

Show First 20 Lines • Show All 48 Lines • ▼ Show 20 Lines	define amdgpu_kernel void @v_clamp_negabs_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #0 {

store float %med, float addrspace(1)* %out.gep		store float %med, float addrspace(1)* %out.gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_clamp_negzero_f32:		; GCN-LABEL: {{^}}v_clamp_negzero_f32:
; GCN-DAG: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]		; GCN-DAG: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
; GCN-DAG: v_bfrev_b32_e32 [[SIGNBIT:v[0-9]+]], 1		; GCN-DAG: v_bfrev_b32_e32 [[SIGNBIT:v[0-9]+]], 1
; GCN: v_med3_f32 v{{[0-9]+}}, [[A]], [[SIGNBIT]], 1.0		; GCN-DAG: v_add_f32_e32 [[QUIET:v[0-9]+]], 0, [[A]]
		; GCN: v_med3_f32 v{{[0-9]+}}, [[QUIET]], [[SIGNBIT]], 1.0
define amdgpu_kernel void @v_clamp_negzero_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #0 {		define amdgpu_kernel void @v_clamp_negzero_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #0 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid		%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid
%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid		%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid
%a = load float, float addrspace(1)* %gep0		%a = load float, float addrspace(1)* %gep0
%max = call float @llvm.maxnum.f32(float %a, float -0.0)		%quiet = fadd float %a, 0.0
		%max = call float @llvm.maxnum.f32(float %quiet, float -0.0)
%med = call float @llvm.minnum.f32(float %max, float 1.0)		%med = call float @llvm.minnum.f32(float %max, float 1.0)

store float %med, float addrspace(1)* %out.gep		store float %med, float addrspace(1)* %out.gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_clamp_multi_use_max_f32:		; GCN-LABEL: {{^}}v_clamp_multi_use_max_f32:
; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]		; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
▲ Show 20 Lines • Show All 275 Lines • ▼ Show 20 Lines
}		}

; ---------------------------------------------------------------------		; ---------------------------------------------------------------------
; Test non-default behaviors enabling snans and disabling dx10_clamp		; Test non-default behaviors enabling snans and disabling dx10_clamp
; ---------------------------------------------------------------------		; ---------------------------------------------------------------------

; GCN-LABEL: {{^}}v_clamp_f32_no_dx10_clamp:		; GCN-LABEL: {{^}}v_clamp_f32_no_dx10_clamp:
; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]		; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
; GCN: v_med3_f32 v{{[0-9]+}}, [[A]], 0, 1.0		; GCN: v_add_f32_e32 [[QUIET:v[0-9]+]], 0, [[A]]
		; GCN: v_med3_f32 v{{[0-9]+}}, [[QUIET]], 0, 1.0
define amdgpu_kernel void @v_clamp_f32_no_dx10_clamp(float addrspace(1)* %out, float addrspace(1)* %aptr) #2 {		define amdgpu_kernel void @v_clamp_f32_no_dx10_clamp(float addrspace(1)* %out, float addrspace(1)* %aptr) #2 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid		%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid
%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid		%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid
%a = load float, float addrspace(1)* %gep0		%a = load float, float addrspace(1)* %gep0
%max = call float @llvm.maxnum.f32(float %a, float 0.0)		%quiet = fadd float %a, 0.0
		%max = call float @llvm.maxnum.f32(float %quiet, float 0.0)
%med = call float @llvm.minnum.f32(float %max, float 1.0)		%med = call float @llvm.minnum.f32(float %max, float 1.0)

store float %med, float addrspace(1)* %out.gep		store float %med, float addrspace(1)* %out.gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_clamp_f32_snan_dx10clamp:		; GCN-LABEL: {{^}}v_clamp_f32_snan_dx10clamp:
; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]		; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
▲ Show 20 Lines • Show All 339 Lines • Show Last 20 Lines
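The clamp tests above check that a maxnum followed by a minnum is selected as a single v_med3_f32. A minimal sketch of why that rewrite is value-preserving for non-NaN inputs is given below; `med3` and `clampMaxMin` are hypothetical helpers, and the sketch deliberately says nothing about NaN inputs, which is the case the sNaN analysis in this patch is concerned with.

```cpp
#include <algorithm>
#include <cassert>

// Median of three values, valid for non-NaN inputs only.
inline float med3(float a, float b, float c) {
  return std::max(std::min(a, b), std::min(std::max(a, b), c));
}

// Clamp expressed as max followed by min, the pattern used in the tests.
inline float clampMaxMin(float x, float lo, float hi) {
  assert(lo <= hi); // the equivalence with med3 relies on ordered bounds
  return std::min(std::max(x, lo), hi);
}

inline void checkClampIsMed3() {
  const float inputs[] = {-3.0f, 0.0f, 0.5f, 1.0f, 5.0f};
  for (float x : inputs)
    assert(med3(x, 0.0f, 1.0f) == clampMaxMin(x, 0.0f, 1.0f));
}
```

Once a NaN can reach the inputs the two forms need not agree, which would explain why the updated tests insert an `fadd float %a, 0.0` to quiet the loaded value before the med3 pattern is expected.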

test/CodeGen/AMDGPU/fcanonicalize-elimination.ll

Show First 20 Lines • Show All 451 Lines • ▼ Show 20 Lines	define amdgpu_kernel void @test_fold_canonicalize_maxnum_value_f64(double addrspace(1)* %arg) {
%load = load double, double addrspace(1)* %gep, align 8		%load = load double, double addrspace(1)* %gep, align 8
%v0 = fadd double %load, 0.0		%v0 = fadd double %load, 0.0
%v = tail call double @llvm.maxnum.f64(double %v0, double 0.0)		%v = tail call double @llvm.maxnum.f64(double %v0, double 0.0)
%canonicalized = tail call double @llvm.canonicalize.f64(double %v)		%canonicalized = tail call double @llvm.canonicalize.f64(double %v)
store double %canonicalized, double addrspace(1)* %gep, align 8		store double %canonicalized, double addrspace(1)* %gep, align 8
ret void		ret void
}		}

; GCN-LABEL: test_no_fold_canonicalize_fmul_value_f32_no_ieee:		; GCN-LABEL: test_no_fold_canonicalize_fdiv_value_f32_no_ieee:
; GCN-EXCEPT: v_mul_f32_e32 v{{[0-9]+}}, 1.0, v{{[0-9]+}}		; GCN-EXCEPT: v_mul_f32_e32 v{{[0-9]+}}, 1.0, v{{[0-9]+}}
define amdgpu_ps float @test_no_fold_canonicalize_fmul_value_f32_no_ieee(float %arg) {		define amdgpu_ps float @test_no_fold_canonicalize_fdiv_value_f32_no_ieee(float %arg0) {
entry:		entry:
%v = fmul float %arg, 15.0		%v = fdiv float %arg0, 15.0
%canonicalized = tail call float @llvm.canonicalize.f32(float %v)		%canonicalized = tail call float @llvm.canonicalize.f32(float %v)
ret float %canonicalized		ret float %canonicalized
}		}

; GCN-LABEL: test_fold_canonicalize_fmul_nnan_value_f32_no_ieee:		; GCN-LABEL: test_fold_canonicalize_fmul_nnan_value_f32_no_ieee:
; GCN: v_mul_f32_e32 [[V:v[0-9]+]], 0x41700000, v{{[0-9]+}}		; GCN: v_mul_f32_e32 [[V:v[0-9]+]], 0x41700000, v{{[0-9]+}}
; GCN-NEXT: ; return		; GCN-NEXT: ; return
; GCN-NOT: 1.0		; GCN-NOT: 1.0
▲ Show 20 Lines • Show All 75 Lines • Show Last 20 Lines

test/CodeGen/AMDGPU/fmed3.ll

Show All 26 Lines

; SNAN: v_max_f32_e32 v{{[0-9]+}}, 2.0, v{{[0-9]+}}		; SNAN: v_max_f32_e32 v{{[0-9]+}}, 2.0, v{{[0-9]+}}
; SNAN: v_min_f32_e32 v{{[0-9]+}}, 4.0, v{{[0-9]+}}		; SNAN: v_min_f32_e32 v{{[0-9]+}}, 4.0, v{{[0-9]+}}
define amdgpu_kernel void @v_test_fmed3_r_i_i_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #1 {		define amdgpu_kernel void @v_test_fmed3_r_i_i_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #1 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid		%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid
%outgep = getelementptr float, float addrspace(1)* %out, i32 %tid		%outgep = getelementptr float, float addrspace(1)* %out, i32 %tid
%a = load float, float addrspace(1)* %gep0		%a = load float, float addrspace(1)* %gep0
		%may.be.snan = call float @llvm.sqrt.f32(float %a)

%max = call float @llvm.maxnum.f32(float %a, float 2.0)		%max = call float @llvm.maxnum.f32(float %may.be.snan, float 2.0)
%med = call float @llvm.minnum.f32(float %max, float 4.0)		%med = call float @llvm.minnum.f32(float %max, float 4.0)

store float %med, float addrspace(1)* %outgep		store float %med, float addrspace(1)* %outgep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_test_fmed3_r_i_i_commute0_f32:		; GCN-LABEL: {{^}}v_test_fmed3_r_i_i_commute0_f32:
; NOSNAN: v_med3_f32 v{{[0-9]+}}, v{{[0-9]+}}, 2.0, 4.0		; NOSNAN: v_med3_f32 v{{[0-9]+}}, v{{[0-9]+}}, 2.0, 4.0

; SNAN: v_max_f32_e32 v{{[0-9]+}}, 2.0, v{{[0-9]+}}		; SNAN: v_max_f32_e32 v{{[0-9]+}}, 2.0, v{{[0-9]+}}
; SNAN: v_min_f32_e32 v{{[0-9]+}}, 4.0, v{{[0-9]+}}		; SNAN: v_min_f32_e32 v{{[0-9]+}}, 4.0, v{{[0-9]+}}
define amdgpu_kernel void @v_test_fmed3_r_i_i_commute0_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #1 {		define amdgpu_kernel void @v_test_fmed3_r_i_i_commute0_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #1 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid		%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid
%outgep = getelementptr float, float addrspace(1)* %out, i32 %tid		%outgep = getelementptr float, float addrspace(1)* %out, i32 %tid
%a = load float, float addrspace(1)* %gep0		%a = load float, float addrspace(1)* %gep0
		%may.be.snan = call float @llvm.sqrt.f32(float %a)
%max = call float @llvm.maxnum.f32(float 2.0, float %a)		%max = call float @llvm.maxnum.f32(float 2.0, float %may.be.snan)
%med = call float @llvm.minnum.f32(float 4.0, float %max)		%med = call float @llvm.minnum.f32(float 4.0, float %max)

store float %med, float addrspace(1)* %outgep		store float %med, float addrspace(1)* %outgep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_test_fmed3_r_i_i_commute1_f32:		; GCN-LABEL: {{^}}v_test_fmed3_r_i_i_commute1_f32:
; NOSNAN: v_med3_f32 v{{[0-9]+}}, v{{[0-9]+}}, 2.0, 4.0		; NOSNAN: v_med3_f32 v{{[0-9]+}}, v{{[0-9]+}}, 2.0, 4.0

; SNAN: v_max_f32_e32 v{{[0-9]+}}, 2.0, v{{[0-9]+}}		; SNAN: v_max_f32_e32 v{{[0-9]+}}, 2.0, v{{[0-9]+}}
; SNAN: v_min_f32_e32 v{{[0-9]+}}, 4.0, v{{[0-9]+}}		; SNAN: v_min_f32_e32 v{{[0-9]+}}, 4.0, v{{[0-9]+}}
define amdgpu_kernel void @v_test_fmed3_r_i_i_commute1_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #1 {		define amdgpu_kernel void @v_test_fmed3_r_i_i_commute1_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #1 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid		%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid
%outgep = getelementptr float, float addrspace(1)* %out, i32 %tid		%outgep = getelementptr float, float addrspace(1)* %out, i32 %tid
%a = load float, float addrspace(1)* %gep0		%a = load float, float addrspace(1)* %gep0
		%may.be.snan = call float @llvm.sqrt.f32(float %a)

%max = call float @llvm.maxnum.f32(float %a, float 2.0)		%max = call float @llvm.maxnum.f32(float %may.be.snan, float 2.0)
%med = call float @llvm.minnum.f32(float 4.0, float %max)		%med = call float @llvm.minnum.f32(float 4.0, float %max)

store float %med, float addrspace(1)* %outgep		store float %med, float addrspace(1)* %outgep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_test_fmed3_r_i_i_constant_order_f32:		; GCN-LABEL: {{^}}v_test_fmed3_r_i_i_constant_order_f32:
; GCN: v_max_f32_e32 v{{[0-9]+}}, 4.0, v{{[0-9]+}}		; GCN: v_max_f32_e32 v{{[0-9]+}}, 4.0, v{{[0-9]+}}
▲ Show 20 Lines • Show All 47 Lines • ▼ Show 20 Lines

; GCN-LABEL: {{^}}v_test_fmed3_r_i_i_no_nans_f32:		; GCN-LABEL: {{^}}v_test_fmed3_r_i_i_no_nans_f32:
; GCN: v_med3_f32 v{{[0-9]+}}, v{{[0-9]+}}, 2.0, 4.0		; GCN: v_med3_f32 v{{[0-9]+}}, v{{[0-9]+}}, 2.0, 4.0
define amdgpu_kernel void @v_test_fmed3_r_i_i_no_nans_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #2 {		define amdgpu_kernel void @v_test_fmed3_r_i_i_no_nans_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #2 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid		%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid
%outgep = getelementptr float, float addrspace(1)* %out, i32 %tid		%outgep = getelementptr float, float addrspace(1)* %out, i32 %tid
%a = load float, float addrspace(1)* %gep0		%a = load float, float addrspace(1)* %gep0
		%may.be.snan = call float @llvm.sqrt.f32(float %a)

%max = call float @llvm.maxnum.f32(float %a, float 2.0)		%max = call float @llvm.maxnum.f32(float %may.be.snan, float 2.0)
%med = call float @llvm.minnum.f32(float %max, float 4.0)		%med = call float @llvm.minnum.f32(float %max, float 4.0)

store float %med, float addrspace(1)* %outgep		store float %med, float addrspace(1)* %outgep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_test_legacy_fmed3_r_i_i_f32:		; GCN-LABEL: {{^}}v_test_legacy_fmed3_r_i_i_f32:
; NOSNAN: v_med3_f32 v{{[0-9]+}}, v{{[0-9]+}}, 2.0, 4.0		; NOSNAN: v_med3_f32 v{{[0-9]+}}, v{{[0-9]+}}, 2.0, 4.0

; SNAN: v_max_f32_e32 v{{[0-9]+}}, 2.0, v{{[0-9]+}}		; SNAN: v_max_f32_e32 v{{[0-9]+}}, 2.0, v{{[0-9]+}}
; SNAN: v_min_f32_e32 v{{[0-9]+}}, 4.0, v{{[0-9]+}}		; SNAN: v_min_f32_e32 v{{[0-9]+}}, 4.0, v{{[0-9]+}}
define amdgpu_kernel void @v_test_legacy_fmed3_r_i_i_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #1 {		define amdgpu_kernel void @v_test_legacy_fmed3_r_i_i_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #1 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid		%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid
%outgep = getelementptr float, float addrspace(1)* %out, i32 %tid		%outgep = getelementptr float, float addrspace(1)* %out, i32 %tid
%a = load float, float addrspace(1)* %gep0		%a = load float, float addrspace(1)* %gep0
		%may.be.snan = call float @llvm.sqrt.f32(float %a)

; fmax_legacy		; fmax_legacy
%cmp0 = fcmp ule float %a, 2.0		%cmp0 = fcmp ule float %may.be.snan, 2.0
%max = select i1 %cmp0, float 2.0, float %a		%max = select i1 %cmp0, float 2.0, float %may.be.snan

; fmin_legacy		; fmin_legacy
%cmp1 = fcmp uge float %max, 4.0		%cmp1 = fcmp uge float %max, 4.0
%med = select i1 %cmp1, float 4.0, float %max		%med = select i1 %cmp1, float 4.0, float %max

store float %med, float addrspace(1)* %outgep		store float %med, float addrspace(1)* %outgep
ret void		ret void
}		}
▲ Show 20 Lines • Show All 791 Lines • ▼ Show 20 Lines	define amdgpu_kernel void @v_nnan_inputs_med3_f16_pat0(half addrspace(1)* %out, half addrspace(1)* %aptr, half addrspace(1)* %bptr, half addrspace(1)* %cptr) #1 {
%tmp2 = call half @llvm.minnum.f16(half %tmp1, half %c.nnan)		%tmp2 = call half @llvm.minnum.f16(half %tmp1, half %c.nnan)
%med3 = call half @llvm.maxnum.f16(half %tmp0, half %tmp2)		%med3 = call half @llvm.maxnum.f16(half %tmp0, half %tmp2)
store half %med3, half addrspace(1)* %outgep		store half %med3, half addrspace(1)* %outgep
ret void		ret void
}		}

declare i32 @llvm.amdgcn.workitem.id.x() #0		declare i32 @llvm.amdgcn.workitem.id.x() #0
declare float @llvm.fabs.f32(float) #0		declare float @llvm.fabs.f32(float) #0
		declare float @llvm.sqrt.f32(float) #0
declare float @llvm.minnum.f32(float, float) #0		declare float @llvm.minnum.f32(float, float) #0
declare float @llvm.maxnum.f32(float, float) #0		declare float @llvm.maxnum.f32(float, float) #0
declare double @llvm.minnum.f64(double, double) #0		declare double @llvm.minnum.f64(double, double) #0
declare double @llvm.maxnum.f64(double, double) #0		declare double @llvm.maxnum.f64(double, double) #0
declare half @llvm.fabs.f16(half) #0		declare half @llvm.fabs.f16(half) #0
declare half @llvm.minnum.f16(half, half) #0		declare half @llvm.minnum.f16(half, half) #0
declare half @llvm.maxnum.f16(half, half) #0		declare half @llvm.maxnum.f16(half, half) #0

attributes #0 = { nounwind readnone }		attributes #0 = { nounwind readnone }
attributes #1 = { nounwind "unsafe-fp-math"="false" "no-nans-fp-math"="false" }		attributes #1 = { nounwind "unsafe-fp-math"="false" "no-nans-fp-math"="false" }
attributes #2 = { nounwind "unsafe-fp-math"="false" "no-nans-fp-math"="true" }		attributes #2 = { nounwind "unsafe-fp-math"="false" "no-nans-fp-math"="true" }