This is an archive of the discontinued LLVM Phabricator instance.


Differential D88572

AMDGPU/SelectionDAG Check for NaN, DX10Clamp and IEEE in fmed3 combine
Needs Review · Public

Authored by Petar.Avramovic on Sep 30 2020, 7:39 AM.

Details

Reviewers

foad
arsenm

Summary

Add more checks for potential SNaN or QNaN input together with DX10Clamp
and IEEE flags in combines that involve floating point min, max, med3 and
clamp in order to get equivalent instruction for each input.
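
As a reading aid, the 'is safe to clamp' condition that the patched performFMed3Combine ends up with can be written as a standalone predicate. This is only a sketch: `ModeModel` and `isSafeToFoldFMed3ToClamp` are hypothetical names, and the two `KnownNever...` booleans stand in for the results of DAG.isKnownNeverSNaN(Val) and DAG.isKnownNeverNaN(Val).

```cpp
#include <cassert>

// Hypothetical stand-in for the function mode bits (Info->getMode() in the patch).
struct ModeModel {
  bool IEEE;
  bool DX10Clamp;
};

// Mirrors the condition in the patched combine for folding
// fmed3(Val, 0.0, 1.0), in any operand order, into clamp(Val).
// ZeroIdx is the operand index (0..2) that holds the constant 0.0.
bool isSafeToFoldFMed3ToClamp(ModeModel Mode, unsigned ZeroIdx,
                              bool KnownNeverSNaN, bool KnownNeverNaN) {
  assert(ZeroIdx < 3 && "fmed3 has exactly three operands");
  // Without a NaN input, fmed3 and clamp agree regardless of mode bits.
  if (KnownNeverNaN)
    return true;
  // With DX10Clamp off, clamp lets NaN through, which never matches fmed3.
  if (!Mode.DX10Clamp)
    return false;
  // With DX10Clamp on, clamp(NaN) is 0.0, so fmed3 must also produce 0.0.
  return !Mode.IEEE || ZeroIdx == 2 || KnownNeverSNaN;
}
```

For example, with IEEE=true and DX10Clamp=true the predicate accepts a possibly-SNaN input only when 0.0 is the last operand.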

Event Timeline

Petar.Avramovic created this revision. Sep 30 2020, 7:39 AM

Herald added a project: Restricted Project. Sep 30 2020, 7:39 AM

Herald added subscribers: llvm-commits, kerbowa, hiraditya and 7 others.

Petar.Avramovic requested review of this revision. Sep 30 2020, 7:39 AM

Herald added a subscriber: wdng. Sep 30 2020, 7:39 AM

Petar.Avramovic added a child revision: D88574: AMDGPU/SelectionDAG Include fast-math-flags for fmed3 intrinsic. Sep 30 2020, 7:41 AM

Petar.Avramovic added a child revision: D88573: [SelectionDAG] Add check for BUILD_VECTOR in isKnownNeverNaN.

I think it's futile to expect correct behavior with IEEE=0. No other operation behaves correctly with respect to snan quieting, and we're not going to implement all of the necessary manual quieting (i.e. isKnownNeverSNaN is always wrong, since the operations that are supposed to quiet don't really do so).

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
9816	Missing the
9822–9823	Should not need to do both of these checks
9825	Should short circuit expensive check based on IEEE mode

Simplify logic in 'is safe to clamp' checks.

foad added inline comments. Oct 13 2020, 5:15 AM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
9819	Is there a good reason to keep flags from the inner node, here, but not on line 9775?
9824–9825	You're assuming that Info->getMode().DX10Clamp affects the behaviour of AMDGPUISD::CLAMP which I think is OK, though it would be nice if that was clearly documented (I can't understand the comment about dx10_enable in AMDGPUISelLowering.h). You're also assuming that Info->getMode().IEEE affects the behaviour of the Op0 node which could be ISD::FMAXNUM or ISD::FMAXNUM_IEEE or FMAX_LEGACY. This seems wrong to me.
9917	Typo "is NaN".
9933	Typo "remaining".
9959	You're assuming that the behaviour of AMDGPUISD::FMED3 depends on Info->getMode().IEEE, which seems reasonable.
9961	A constant could be a NaN, but isKnownNeverNaN should already have handled the constant case. So I don't think you need the check for isa<ConstantFPSDNode>(X) at all.
9961	Why do you keep flags here but not on line 9925?
9961	Typo "depending".

Fix typos.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
9824–9825	Check for Info->getMode().IEEE only for ISD::FMAXNUM. This node is known never SNaN when IEEE is off.
llvm/test/CodeGen/AMDGPU/clamp.ll
420	isa<ConstantFPSDNode>(X) check in min max combine is there to cover existing test where X is NaN constant, min max will be folded to clamp and clamp will become constant in performClampCombine.

foad added inline comments. Oct 14 2020, 7:36 AM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
9824–9825	As I understand it, the behaviour of ISD::FMAXNUM should not depend on the IEEE mode. (Yes we currently codegen it to an instruction that behaves differently depending on the IEEE mode bit, but that's a bug that could be fixed, if anyone cared enough to fix it.)
llvm/test/CodeGen/AMDGPU/clamp.ll
420	This test is wrong then, isn't it? It's checking that fmed3(0,1,snan) -> 0, with DX10Clamp=1 and IEEE=1. But that's the wrong answer. fmed3(0,1,snan) is qnan in IEEE mode. This test is relying on an incorrect folding from fmed3 to clamp. Maybe the test should be changed to put the 0 last, i.e. fmed3(snan,1,0) ?
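
To make the operand-order point concrete, here is a small value-level model. It is a sketch under the assumptions spelled out in the patch comments, not a description of the hardware: with a NaN input fmed3(a, b, c) is taken to behave like min(min(a, b), c); in IEEE mode min with an SNaN operand gives QNaN; min with a QNaN operand gives the other operand. All names (`Val`, `minModel`, `fmed3Model`, `clampModel`) are hypothetical, and the result for two NaN operands is an assumption of the model.

```cpp
#include <algorithm>
#include <cassert>

enum class Kind { Num, QNaN, SNaN };

struct Val {
  Kind K;
  float X; // only meaningful when K == Kind::Num
};

Val num(float X) { return {Kind::Num, X}; }
Val qnan() { return {Kind::QNaN, 0.0f}; }
Val snan() { return {Kind::SNaN, 0.0f}; }
bool isNaN(Val V) { return V.K != Kind::Num; }

// min as the patch comments describe it for each IEEE setting.
Val minModel(Val A, Val B, bool IEEE) {
  if (IEEE && (A.K == Kind::SNaN || B.K == Kind::SNaN))
    return qnan(); // IEEE mode: an SNaN operand is quieted and returned
  if (isNaN(A))
    return isNaN(B) ? qnan() : B; // both-NaN result is a model assumption
  if (isNaN(B))
    return A;
  return num(std::min(A.X, B.X));
}

// With a NaN input, fmed3(a, b, c) is modeled as min(min(a, b), c);
// otherwise it is the ordinary median of three.
Val fmed3Model(Val A, Val B, Val C, bool IEEE) {
  if (isNaN(A) || isNaN(B) || isNaN(C))
    return minModel(minModel(A, B, IEEE), C, IEEE);
  float Lo = std::min(A.X, B.X), Hi = std::max(A.X, B.X);
  return num(std::max(Lo, std::min(Hi, C.X)));
}

// clamp: DX10Clamp=true turns NaN into 0.0, otherwise NaN passes through.
Val clampModel(Val V, bool DX10Clamp) {
  if (isNaN(V))
    return DX10Clamp ? num(0.0f) : V;
  return num(std::min(1.0f, std::max(0.0f, V.X)));
}
```

Under this model, fmed3Model(num(0), num(1), snan(), true) yields QNaN while clampModel(snan(), true) yields 0.0, so that fold is not equivalent. fmed3Model(snan(), num(1), num(0), true) yields 0.0 and does match the clamp.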

Respect description of ISD::FMAXNUM and ISD::FMAXNUM_IEEE (their behavior is not affected by IEEE mode).
Legalizer will deal with use of FMAXNUM when IEEE=true (makes sure that potential SNaN input gets quieted) and we can try to fold FMAXNUM_IEEE later.
Most of the test changes come from clamp/fmed on top of fcanonicalize with IEEE=true.
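
The effect of quieting the input first can be illustrated with a small model of the min(max(Val, 0.0), 1.0) pattern. This is only a sketch of the semantics stated in the patch comment (FMAXNUM_IEEE(SNaN, 0.0) = QNaN; FMINNUM_IEEE(QNaN, 1.0) = 1.0); `Val`, `minMaxIEEE`, `canonicalizeModel`, `minMaxPattern` and `clampDX10` are hypothetical names, not LLVM APIs.

```cpp
#include <algorithm>
#include <cassert>

enum class Kind { Num, QNaN, SNaN };

struct Val {
  Kind K;
  float X; // only meaningful when K == Kind::Num
};

Val num(float X) { return {Kind::Num, X}; }
Val qnan() { return {Kind::QNaN, 0.0f}; }
Val snan() { return {Kind::SNaN, 0.0f}; }

// Model of FMAXNUM_IEEE / FMINNUM_IEEE as the patch comment describes them:
// an SNaN operand gives QNaN, a QNaN operand gives the other operand.
Val minMaxIEEE(Val A, Val B, bool IsMax) {
  if (A.K == Kind::SNaN || B.K == Kind::SNaN)
    return qnan();
  if (A.K == Kind::QNaN)
    return B;
  if (B.K == Kind::QNaN)
    return A;
  return num(IsMax ? std::max(A.X, B.X) : std::min(A.X, B.X));
}

// Model of fcanonicalize: quiets a signaling NaN, leaves other values alone.
Val canonicalizeModel(Val V) { return V.K == Kind::SNaN ? qnan() : V; }

// Unfolded source pattern: min(max(V, 0.0), 1.0) on the *_IEEE nodes.
Val minMaxPattern(Val V) {
  return minMaxIEEE(minMaxIEEE(V, num(0.0f), true), num(1.0f), false);
}

// Model of clamp with DX10Clamp=true: NaN clamps to 0.0.
Val clampDX10(Val V) {
  if (V.K != Kind::Num)
    return num(0.0f);
  return num(std::min(1.0f, std::max(0.0f, V.X)));
}
```

In this model a raw SNaN input gives 1.0 from the min/max pattern but 0.0 from clamp, whereas after canonicalizeModel both give 0.0. That is why the fold is only checked against inputs known not to be SNaN.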

We should probably check if ISD::FMAX... matches IEEE mode when folding min3 and max3 since these also depend on IEEE mode.
However, this brings many regressions. For some reason, in the combine round after legalization, nodes are not visited starting from the end of the function but in a seemingly random order. For example, in min(max(max(.., ..), 0.0), 1.0) the middle max gets visited first and folds into max3, so we miss the clamp fold since min was not visited first.
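
The ordering problem can be reproduced on a toy expression tree. The sketch below is not the SelectionDAG API: `Node`, `tryFoldMax3` and `tryFoldClamp` are hypothetical stand-ins for the two combines, and the NaN and mode checks are deliberately left out so that only the visit order matters.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy expression node; a stand-in for an SDNode, not the real class.
struct Node {
  std::string Op;
  std::vector<Node> Ops;
};

Node leaf(const std::string &Name) { return Node{Name, {}}; }

// max(max(a, b), c) -> max3(a, b, c)
bool tryFoldMax3(Node &N) {
  if (N.Op != "max" || N.Ops.size() != 2)
    return false;
  if (N.Ops[0].Op != "max" || N.Ops[0].Ops.size() != 2)
    return false;
  Node Inner = N.Ops[0];
  Node C = N.Ops[1];
  N = Node{"max3", {Inner.Ops[0], Inner.Ops[1], C}};
  return true;
}

// min(max(v, 0.0), 1.0) -> clamp(v)
bool tryFoldClamp(Node &N) {
  if (N.Op != "min" || N.Ops.size() != 2 || N.Ops[1].Op != "1.0")
    return false;
  const Node &Max = N.Ops[0];
  if (Max.Op != "max" || Max.Ops.size() != 2 || Max.Ops[1].Op != "0.0")
    return false;
  Node V = Max.Ops[0];
  N = Node{"clamp", {V}};
  return true;
}

// Builds min(max(max(x, y), 0.0), 1.0).
Node buildExample() {
  Node Inner{"max", {leaf("x"), leaf("y")}};
  Node Middle{"max", {Inner, leaf("0.0")}};
  return Node{"min", {Middle, leaf("1.0")}};
}
```

Visiting the root first, tryFoldClamp succeeds and leaves clamp(max(x, y)). Visiting the middle max first turns it into max3(x, y, 0.0), after which tryFoldClamp on the root no longer matches.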

All of this could be fixed with some test changes.
Currently, there is no clean way to represent ISD::FMAXNUM_IEEE in IR. It can only be legalized from ISD::FMAXNUM by quieting its input.
I assume that the producer of a function with IEEE=true would want llvm.maxnum_ieee.* (TODO: introduce this in IR) instead of llvm.maxnum.* for IR floating point max.
This would make the intention of tests that make use of isKnownNeverSNaN clearer, since currently ISD::FMAXNUM_IEEE can't have SNaN input because it can only be produced from ISD::FMAXNUM.

Petar.Avramovic added inline comments. Oct 20 2020, 4:14 AM

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
9847	This patch fixes clamp and fmed3 folds that could be incorrect with IEEE=true (potential not-silenced SNaN inputs). @arsenm Folding ISD::FMAXNUM with IEEE=true into max3 here could be incorrect when one of the inputs is SNaN. There are no correctness issues with IEEE=false. Thoughts on moving this combine after the legalizer? We could also potentially move min3/max3 and min/max patterns with constants to a td file so that global-isel can use them as well.

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/SIISelLowering.cpp	119 lines
llvm/test/CodeGen/AMDGPU/clamp-modifier.ll	6 lines
llvm/test/CodeGen/AMDGPU/clamp.ll	93 lines
llvm/test/CodeGen/AMDGPU/known-never-snan.ll	14 lines

Diff 298622

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

[9,766 lines hidden]	static ConstantFPSDNode *getSplatConstantFP(SDValue Op) {

return nullptr;		return nullptr;
}		}

SDValue SITargetLowering::performFPMed3ImmCombine(SelectionDAG &DAG,		SDValue SITargetLowering::performFPMed3ImmCombine(SelectionDAG &DAG,
const SDLoc &SL,		const SDLoc &SL,
SDValue Op0,		SDValue Op0,
SDValue Op1) const {		SDValue Op1) const {
		const MachineFunction &MF = DAG.getMachineFunction();
		const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
		unsigned Opc = Op0.getOpcode();
		bool IEEE = Info->getMode().IEEE;
		// Based on IEEE setting false/true, v_max_f instruction behaves like
		// ISD::FMAXNUM/ISD::FMAXNUM_IEEE respectively. Skip complicated checks when
		// IEEE setting and opcode don't match. Retry after legalization when they do.
		if ((Opc == ISD::FMAXNUM && IEEE) || (Opc == ISD::FMAXNUM_IEEE && !IEEE))
		return SDValue();

ConstantFPSDNode *K1 = getSplatConstantFP(Op1);		ConstantFPSDNode *K1 = getSplatConstantFP(Op1);
if (!K1)		if (!K1)
return SDValue();		return SDValue();

ConstantFPSDNode *K0 = getSplatConstantFP(Op0.getOperand(1));		ConstantFPSDNode *K0 = getSplatConstantFP(Op0.getOperand(1));
if (!K0)		if (!K0)
return SDValue();		return SDValue();

// Ordered >= (although NaN inputs should have folded away by now).		// Ordered >= (although NaN inputs should have folded away by now).
if (K0->getValueAPF() > K1->getValueAPF())		if (K0->getValueAPF() > K1->getValueAPF())
return SDValue();		return SDValue();

const MachineFunction &MF = DAG.getMachineFunction();		// Folding min(max(Val, 0.0), 1.0) into clamp(Val) is safe for non-NaN input.
const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();		// FMAXNUM_IEEE(SNaN, 0.0) = QNaN; FMINNUM_IEEE(QNaN, 1.0) = 1.0.
		// FMAXNUM(NaN, 0.0) = FMAXNUM_IEEE(QNaN, 0.0) = 0.0 (returns non-NaN input)
// TODO: Check IEEE bit enabled?		// FMAX_LEGACY(NaN, 0.0) -> NaN >= 0.0 ? NaN : 0.0 = 0.0; min(0.0, 1.0) = 0.0.
		// For the source, check for SNaN input to FMAXNUM_IEEE since only it doesn't
		// evaluate to 0.0. For the destination we want to clamp NaNs to 0.0.
		// When inner node is FMAXNUM_IEEE check if its result is known non-SNaN. This
		// check for no-NaN flags first and then if input(Val) is known non-SNaN.
EVT VT = Op0.getValueType();		EVT VT = Op0.getValueType();
		SDValue Val = Op0.getOperand(0);
if (Info->getMode().DX10Clamp) {		if (Info->getMode().DX10Clamp) {
// If dx10_clamp is enabled, NaNs clamp to 0.0. This is the same as the		// If dx10_clamp is enabled, NaNs clamp to 0.0. This is the same as the
// hardware fmed3 behavior converting to a min.		// hardware fmed3 behavior converting to a min.
// FIXME: Should this be allowing -0.0?		// FIXME: Should this be allowing -0.0?
if (K1->isExactlyValue(1.0) && K0->isExactlyValue(0.0))		if (K1->isExactlyValue(1.0) && K0->isExactlyValue(0.0))
return DAG.getNode(AMDGPUISD::CLAMP, SL, VT, Op0.getOperand(0));		if (Opc != ISD::FMAXNUM_IEEE || DAG.isKnownNeverSNaN(Op0))
		return DAG.getNode(AMDGPUISD::CLAMP, SL, VT, Val, Op0->getFlags());
}		}

// med3 for f16 is only available on gfx9+, and not available for v2f16.		// med3 for f16 is only available on gfx9+, and not available for v2f16.
		arsenm: Missing the
if (VT == MVT::f32 || (VT == MVT::f16 && Subtarget->hasMed3_16())) {		if (VT == MVT::f32 || (VT == MVT::f16 && Subtarget->hasMed3_16())) {
// This isn't safe with signaling NaNs because in IEEE mode, min/max on a		// Folding min(max(Val, K0), K1) into fmed3(Val, K0, K1) is safe for non-NaN
// signaling NaN gives a quiet NaN. The quiet NaN input to the min would		// input. fmed3(NaN, K0, K1) is equivalent to min(min(NaN, K0), K1), since
		foad: Is there a good reason to keep flags from the inner node, here, but not on line 9775?
// then give the other result, which is different from med3 with a NaN		// inner nodes(max/min) have same behavior for 'NaN input as first operand'
// input.		// this is safe to fold for all inputs.
SDValue Var = Op0.getOperand(0);
if (!DAG.isKnownNeverSNaN(Var))
return SDValue();

const SIInstrInfo *TII = getSubtarget()->getInstrInfo();		const SIInstrInfo *TII = getSubtarget()->getInstrInfo();

		arsenm: Should not need to do both of these checks
if ((!K0->hasOneUse() ||		if ((!K0->hasOneUse() ||
TII->isInlineConstant(K0->getValueAPF().bitcastToAPInt())) &&		TII->isInlineConstant(K0->getValueAPF().bitcastToAPInt())) &&
		arsenm: Should short circuit expensive check based on IEEE mode
		foad: You're assuming that Info->getMode().DX10Clamp affects the behaviour of AMDGPUISD::CLAMP which I think is OK, though it would be nice if that was clearly documented (I can't understand the comment about dx10_enable in AMDGPUISelLowering.h). You're also assuming that Info->getMode().IEEE affects the behaviour of the Op0 node which could be ISD::FMAXNUM or ISD::FMAXNUM_IEEE or FMAX_LEGACY. This seems wrong to me.
		Petar.Avramovic: Check for Info->getMode().IEEE only for ISD::FMAXNUM. This node is known never SNaN when IEEE is off.
		foad: As I understand it, the behaviour of ISD::FMAXNUM should not depend on the IEEE mode. (Yes we currently codegen it to an instruction that behaves differently depending on the IEEE mode bit, but that's a bug that could be fixed, if anyone cared enough to fix it.)
(!K1->hasOneUse() ||		(!K1->hasOneUse() ||
TII->isInlineConstant(K1->getValueAPF().bitcastToAPInt()))) {		TII->isInlineConstant(K1->getValueAPF().bitcastToAPInt()))) {
return DAG.getNode(AMDGPUISD::FMED3, SL, K0->getValueType(0),		return DAG.getNode(AMDGPUISD::FMED3, SL, K0->getValueType(0),
Var, SDValue(K0, 0), SDValue(K1, 0));		Val, SDValue(K0, 0), SDValue(K1, 0), Op0->getFlags());
}		}
}		}

return SDValue();		return SDValue();
}		}

SDValue SITargetLowering::performMinMaxCombine(SDNode *N,		SDValue SITargetLowering::performMinMaxCombine(SDNode *N,
DAGCombinerInfo &DCI) const {		DAGCombinerInfo &DCI) const {
SelectionDAG &DAG = DCI.DAG;		SelectionDAG &DAG = DCI.DAG;

EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
unsigned Opc = N->getOpcode();		unsigned Opc = N->getOpcode();
SDValue Op0 = N->getOperand(0);		SDValue Op0 = N->getOperand(0);
SDValue Op1 = N->getOperand(1);		SDValue Op1 = N->getOperand(1);

// Only do this if the inner op has one use since this will just increases		// Only do this if the inner op has one use since this will just increases
// register pressure for no benefit.		// register pressure for no benefit.

		Petar.Avramovic: This patch fixes clamp and fmed3 folds that could be incorrect with IEEE=true (potential not-silenced SNaN inputs). @arsenm Folding ISD::FMAXNUM with IEEE=true into max3 here could be incorrect when one of the inputs is SNaN. There are no correctness issues with IEEE=false. Thoughts on moving this combine after the legalizer? We could also potentially move min3/max3 and min/max patterns with constants to a td file so that global-isel can use them as well.
if (Opc != AMDGPUISD::FMIN_LEGACY && Opc != AMDGPUISD::FMAX_LEGACY &&		if (Opc != AMDGPUISD::FMIN_LEGACY && Opc != AMDGPUISD::FMAX_LEGACY &&
!VT.isVector() &&		!VT.isVector() &&
(VT == MVT::i32 || VT == MVT::f32 ||		(VT == MVT::i32 || VT == MVT::f32 ||
((VT == MVT::f16 || VT == MVT::i16) && Subtarget->hasMin3Max3_16()))) {		((VT == MVT::f16 || VT == MVT::i16) && Subtarget->hasMin3Max3_16()))) {
// max(max(a, b), c) -> max3(a, b, c)		// max(max(a, b), c) -> max3(a, b, c)
// min(min(a, b), c) -> min3(a, b, c)		// min(min(a, b), c) -> min3(a, b, c)
if (Op0.getOpcode() == Opc && Op0.hasOneUse()) {		if (Op0.getOpcode() == Opc && Op0.hasOneUse()) {
SDLoc DL(N);		SDLoc DL(N);
[41 lines hidden]	if (((Opc == ISD::FMINNUM && Op0.getOpcode() == ISD::FMAXNUM) ||
Op0.hasOneUse()) {		Op0.hasOneUse()) {
if (SDValue Res = performFPMed3ImmCombine(DAG, SDLoc(N), Op0, Op1))		if (SDValue Res = performFPMed3ImmCombine(DAG, SDLoc(N), Op0, Op1))
return Res;		return Res;
}		}

return SDValue();		return SDValue();
}		}

static bool isClampZeroToOne(SDValue A, SDValue B) {		static bool isOperandExactlyValue(SDNode *N, unsigned Idx, double Value) {
if (ConstantFPSDNode *CA = dyn_cast<ConstantFPSDNode>(A)) {		if (ConstantFPSDNode *C = dyn_cast<ConstantFPSDNode>(N->getOperand(Idx)))
if (ConstantFPSDNode *CB = dyn_cast<ConstantFPSDNode>(B)) {		return C->isExactlyValue(Value);
// FIXME: Should this be allowing -0.0?
return (CA->isExactlyValue(0.0) && CB->isExactlyValue(1.0)) \|\|
(CA->isExactlyValue(1.0) && CB->isExactlyValue(0.0));
}
}

return false;		return false;
}		}

// FIXME: Should only worry about snans for version with chain.		// FIXME: Should only worry about snans for version with chain.
SDValue SITargetLowering::performFMed3Combine(SDNode *N,		SDValue SITargetLowering::performFMed3Combine(SDNode *N,
DAGCombinerInfo &DCI) const {		DAGCombinerInfo &DCI) const {
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
// v_med3_f32 and v_max_f32 behave identically wrt denorms, exceptions and		// v_med3_f32 and v_max_f32 behave identically wrt denorms, exceptions and
// NaNs. With a NaN input, the order of the operands may change the result.		// NaNs. With a SNaN input in IEEE mode, the order of the operands may change
		// the result because then fmed3(a, b, c) is equivalent to min(min(a, b), c).
		foad: Typo "is NaN".

SelectionDAG &DAG = DCI.DAG;		SelectionDAG &DAG = DCI.DAG;
SDLoc SL(N);		SDLoc SL(N);

SDValue Src0 = N->getOperand(0);		// Two, out of the three, operands need to be 0.0 and 1.0.
SDValue Src1 = N->getOperand(1);		unsigned ValIdx = 0, ZeroIdx = 1, OneIdx = 2; // Initial guess.
SDValue Src2 = N->getOperand(2);		// Find index of the operand with 0.0.
		if (!isOperandExactlyValue(N, ZeroIdx, 0.0)) {
if (isClampZeroToOne(Src0, Src1)) {		std::swap(ZeroIdx, ValIdx);
// const_a, const_b, x -> clamp is safe in all cases including signaling		if (!isOperandExactlyValue(N, ZeroIdx, 0.0)) {
// nans.		std::swap(ZeroIdx, OneIdx);
// FIXME: Should this be allowing -0.0?		if (!isOperandExactlyValue(N, ZeroIdx, 0.0))
return DAG.getNode(AMDGPUISD::CLAMP, SL, VT, Src2);		return SDValue();
		}
		}
		// Find index of the operand with 1.0, remaining index is Val.
		foad: Typo "remaining".
		if (!isOperandExactlyValue(N, OneIdx, 1.0)) {
		std::swap(OneIdx, ValIdx);
		if (!isOperandExactlyValue(N, OneIdx, 1.0))
		return SDValue();
}		}

		SDValue Val = N->getOperand(ValIdx);
		// If we're told that NaNs won't happen assume that it is safe to clamp.
		if (N->getFlags().hasNoNaNs() || DAG.getTarget().Options.NoNaNsFPMath)
		return DAG.getNode(AMDGPUISD::CLAMP, SL, VT, Val, N->getFlags());

		// Folding fmed3(Val, 0.0, 1.0) into clamp(Val). Consider all 6 operand
		// permutations for fmed3. It is safe to clamp when Val is not NaN.
		// For NaN input we consider result of min(min(a, b), c) based on min wrt IEEE
		// mode and clamp(Val) wrt DX10Clamp. Val can be 0.0 or 1.0. Result depends on
		// the result of the inner min and value of the last operand(named c above).
		// SNaN input and IEEE=true: min(SNaN, Val) -> QNaN, min(QNaN, Val) -> Val
		// fmed3 returns either QNaN, 1.0 or 0.0(when 0.0 is last operand(c)).
		// X=QNaN or (X=SNaN and IEEE=false): min(NaN, Val) -> Val
		// min with NaN will return the other operand thus fmed3 returns 0.0.
		// It is not safe to fold with DX10Clamp=false since that lets NaN through and
		// never matches fmed3 result. DX10Clamp=true clamps NaN to 0.0 and is safe to
		// fold when fmed3 returns 0.0.
const MachineFunction &MF = DAG.getMachineFunction();		const MachineFunction &MF = DAG.getMachineFunction();
const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
		if ((Info->getMode().DX10Clamp &&
		foad: You're assuming that the behaviour of AMDGPUISD::FMED3 depends on Info->getMode().IEEE, which seems reasonable.
// FIXME: dx10_clamp behavior assumed in instcombine. Should we really bother		(!Info->getMode().IEEE || ZeroIdx == 2 || DAG.isKnownNeverSNaN(Val))) ||
// handling no dx10-clamp?		DAG.isKnownNeverNaN(Val))
		foad: A constant could be a NaN, but isKnownNeverNaN should already have handled the constant case. So I don't think you need the check for isa<ConstantFPSDNode>(X) at all.
		foad: Why do you keep flags here but not on line 9925?
		foad: Typo "depending".
if (Info->getMode().DX10Clamp) {		return DAG.getNode(AMDGPUISD::CLAMP, SL, VT, Val, N->getFlags());
// If NaNs is clamped to 0, we are free to reorder the inputs.

if (isa<ConstantFPSDNode>(Src0) && !isa<ConstantFPSDNode>(Src1))
std::swap(Src0, Src1);

if (isa<ConstantFPSDNode>(Src1) && !isa<ConstantFPSDNode>(Src2))
std::swap(Src1, Src2);

if (isa<ConstantFPSDNode>(Src0) && !isa<ConstantFPSDNode>(Src1))
std::swap(Src0, Src1);

if (isClampZeroToOne(Src1, Src2))
return DAG.getNode(AMDGPUISD::CLAMP, SL, VT, Src0);
}

return SDValue();		return SDValue();
}		}

SDValue SITargetLowering::performCvtPkRTZCombine(SDNode *N,		SDValue SITargetLowering::performCvtPkRTZCombine(SDNode *N,
DAGCombinerInfo &DCI) const {		DAGCombinerInfo &DCI) const {
SDValue Src0 = N->getOperand(0);		SDValue Src0 = N->getOperand(0);
SDValue Src1 = N->getOperand(1);		SDValue Src1 = N->getOperand(1);
[1,985 lines hidden]

llvm/test/CodeGen/AMDGPU/clamp-modifier.ll

	[353 lines hidden]
	; GFX9-NEXT: s_setpc_b64			; GFX9-NEXT: s_setpc_b64

	; VI: v_cvt_pkrtz_f16_f32 v0, v0, v1{{$}}			; VI: v_cvt_pkrtz_f16_f32 v0, v0, v1{{$}}
	; VI: v_max_f16_sdwa			; VI: v_max_f16_sdwa
	; VI: v_max_f16_e64			; VI: v_max_f16_e64
	; VI: v_or_b32			; VI: v_or_b32

	; SI: v_cvt_pkrtz_f16_f32_e32 v0, v0, v1{{$}}			; SI: v_cvt_pkrtz_f16_f32_e32 v0, v0, v1{{$}}
	; SI-DAG: v_cvt_f32_f16_e64 v0, v0 clamp			; SI-DAG: v_cvt_f32_f16_e32 v0, v0
	; SI-DAG: v_cvt_f32_f16_e64 v1, v1 clamp			; SI-DAG: v_cvt_f32_f16_e32 v1, v1
				; SI-DAG: v_mul_f32_e64 v1, 1.0, v1 clamp
				; SI-DAG: v_mul_f32_e64 v0, 1.0, v0 clamp
	define <2 x half> @v_clamp_cvt_pkrtz_src_v2f16_denorm(float %a, float %b) #0 {			define <2 x half> @v_clamp_cvt_pkrtz_src_v2f16_denorm(float %a, float %b) #0 {
	%add = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %a, float %b)			%add = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %a, float %b)
	%max = call <2 x half> @llvm.maxnum.v2f16(<2 x half> %add, <2 x half> zeroinitializer)			%max = call <2 x half> @llvm.maxnum.v2f16(<2 x half> %add, <2 x half> zeroinitializer)
	%clamp = call <2 x half> @llvm.minnum.v2f16(<2 x half> %max, <2 x half> <half 1.0, half 1.0>)			%clamp = call <2 x half> @llvm.minnum.v2f16(<2 x half> %max, <2 x half> <half 1.0, half 1.0>)
	ret <2 x half> %clamp			ret <2 x half> %clamp
	}			}

	declare i32 @llvm.amdgcn.workitem.id.x() #1			declare i32 @llvm.amdgcn.workitem.id.x() #1
	[39 lines hidden]

llvm/test/CodeGen/AMDGPU/clamp.ll

; RUN: llc -march=amdgcn -mcpu=tahiti -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,SI,GFX678 %s		; RUN: llc -march=amdgcn -mcpu=tahiti -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,SI,GFX678 %s
; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX89,VI,GFX678 %s		; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX89,VI,GFX678 %s
; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX89,GFX9 %s		; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX89,GFX9 %s

; GCN-LABEL: {{^}}v_clamp_f32:		; GCN-LABEL: {{^}}v_clamp_f32:
; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]		; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
; GCN: v_max_f32_e64 v{{[0-9]+}}, [[A]], [[A]] clamp{{$}}		; GFX678: v_mul_f32_e64 v{{[0-9]+}}, 1.0, [[A]] clamp{{$}}
		; GFX9: v_max_f32_e64 v{{[0-9]+}}, [[A]], [[A]] clamp{{$}}
define amdgpu_kernel void @v_clamp_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #0 {		define amdgpu_kernel void @v_clamp_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #0 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid		%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid
%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid		%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid
%a = load float, float addrspace(1)* %gep0		%a = load float, float addrspace(1)* %gep0
%max = call float @llvm.maxnum.f32(float %a, float 0.0)		%max = call float @llvm.maxnum.f32(float %a, float 0.0)
%med = call float @llvm.minnum.f32(float %max, float 1.0)		%med = call float @llvm.minnum.f32(float %max, float 1.0)

store float %med, float addrspace(1)* %out.gep		store float %med, float addrspace(1)* %out.gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_clamp_neg_f32:		; GCN-LABEL: {{^}}v_clamp_neg_f32:
; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]		; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
; GCN: v_max_f32_e64 v{{[0-9]+}}, -[[A]], -[[A]] clamp{{$}}		; GFX678: v_mul_f32_e64 v{{[0-9]+}}, -1.0, [[A]] clamp{{$}}
		; GFX9: v_max_f32_e64 v{{[0-9]+}}, -[[A]], -[[A]] clamp{{$}}
define amdgpu_kernel void @v_clamp_neg_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #0 {		define amdgpu_kernel void @v_clamp_neg_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #0 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid		%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid
%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid		%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid
%a = load float, float addrspace(1)* %gep0		%a = load float, float addrspace(1)* %gep0
%fneg.a = fsub float -0.0, %a		%fneg.a = fsub float -0.0, %a
%max = call float @llvm.maxnum.f32(float %fneg.a, float 0.0)		%max = call float @llvm.maxnum.f32(float %fneg.a, float 0.0)
%med = call float @llvm.minnum.f32(float %max, float 1.0)		%med = call float @llvm.minnum.f32(float %max, float 1.0)

store float %med, float addrspace(1)* %out.gep		store float %med, float addrspace(1)* %out.gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_clamp_negabs_f32:		; GCN-LABEL: {{^}}v_clamp_negabs_f32:
; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]		; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
; GCN: v_max_f32_e64 v{{[0-9]+}}, -|[[A]]|, -|[[A]]| clamp{{$}}		; GFX678: v_mul_f32_e64 v{{[0-9]+}}, -1.0, |[[A]]| clamp{{$}}
		; GFX9: v_max_f32_e64 v{{[0-9]+}}, -|[[A]]|, -|[[A]]| clamp{{$}}
define amdgpu_kernel void @v_clamp_negabs_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #0 {		define amdgpu_kernel void @v_clamp_negabs_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #0 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid		%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid
%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid		%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid
%a = load float, float addrspace(1)* %gep0		%a = load float, float addrspace(1)* %gep0
%fabs.a = call float @llvm.fabs.f32(float %a)		%fabs.a = call float @llvm.fabs.f32(float %a)
%fneg.fabs.a = fsub float -0.0, %fabs.a		%fneg.fabs.a = fsub float -0.0, %fabs.a

[69 lines hidden]	define amdgpu_kernel void @v_clamp_multi_use_max_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #0 {
store volatile float %max, float addrspace(1)* undef		store volatile float %max, float addrspace(1)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_clamp_f16:		; GCN-LABEL: {{^}}v_clamp_f16:
; GCN: {{buffer|flat|global}}_load_ushort [[A:v[0-9]+]]		; GCN: {{buffer|flat|global}}_load_ushort [[A:v[0-9]+]]
; GFX89: v_max_f16_e64 v{{[0-9]+}}, [[A]], [[A]] clamp{{$}}		; GFX89: v_max_f16_e64 v{{[0-9]+}}, [[A]], [[A]] clamp{{$}}

; SI: v_cvt_f32_f16_e64 [[CVT:v[0-9]+]], [[A]] clamp{{$}}		; SI: v_cvt_f32_f16_e32 [[CVT:v[0-9]+]], [[A]]
; SI: v_cvt_f16_f32_e32 v{{[0-9]+}}, [[CVT]]		; SI: v_mul_f32_e64 [[FCANON:v[0-9]+]], 1.0, [[CVT]] clamp{{$}}
		; SI: v_cvt_f16_f32_e32 v{{[0-9]+}}, [[FCANON]]
define amdgpu_kernel void @v_clamp_f16(half addrspace(1)* %out, half addrspace(1)* %aptr) #0 {		define amdgpu_kernel void @v_clamp_f16(half addrspace(1)* %out, half addrspace(1)* %aptr) #0 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr half, half addrspace(1)* %aptr, i32 %tid		%gep0 = getelementptr half, half addrspace(1)* %aptr, i32 %tid
%out.gep = getelementptr half, half addrspace(1)* %out, i32 %tid		%out.gep = getelementptr half, half addrspace(1)* %out, i32 %tid
%a = load half, half addrspace(1)* %gep0		%a = load half, half addrspace(1)* %gep0
%max = call half @llvm.maxnum.f16(half %a, half 0.0)		%max = call half @llvm.maxnum.f16(half %a, half 0.0)
%med = call half @llvm.minnum.f16(half %max, half 1.0)		%med = call half @llvm.minnum.f16(half %max, half 1.0)

store half %med, half addrspace(1)* %out.gep		store half %med, half addrspace(1)* %out.gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_clamp_neg_f16:		; GCN-LABEL: {{^}}v_clamp_neg_f16:
; GCN: {{buffer|flat|global}}_load_ushort [[A:v[0-9]+]]		; GCN: {{buffer|flat|global}}_load_ushort [[A:v[0-9]+]]
; GFX89: v_max_f16_e64 v{{[0-9]+}}, -[[A]], -[[A]] clamp{{$}}		; GFX89: v_max_f16_e64 v{{[0-9]+}}, -[[A]], -[[A]] clamp{{$}}

; FIXME: Better to fold neg into max		; FIXME: Better to fold neg into max
; SI: v_cvt_f32_f16_e64 [[CVT:v[0-9]+]], -[[A]] clamp{{$}}		; SI: v_cvt_f32_f16_e64 [[CVT:v[0-9]+]], -[[A]]
; SI: v_cvt_f16_f32_e32 v{{[0-9]+}}, [[CVT]]		; SI: v_mul_f32_e64 [[FCANON:v[0-9]+]], 1.0, [[CVT]] clamp{{$}}
		; SI: v_cvt_f16_f32_e32 v{{[0-9]+}}, [[FCANON]]
define amdgpu_kernel void @v_clamp_neg_f16(half addrspace(1)* %out, half addrspace(1)* %aptr) #0 {		define amdgpu_kernel void @v_clamp_neg_f16(half addrspace(1)* %out, half addrspace(1)* %aptr) #0 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr half, half addrspace(1)* %aptr, i32 %tid		%gep0 = getelementptr half, half addrspace(1)* %aptr, i32 %tid
%out.gep = getelementptr half, half addrspace(1)* %out, i32 %tid		%out.gep = getelementptr half, half addrspace(1)* %out, i32 %tid
%a = load half, half addrspace(1)* %gep0		%a = load half, half addrspace(1)* %gep0
%fneg.a = fsub half -0.0, %a		%fneg.a = fsub half -0.0, %a
%max = call half @llvm.maxnum.f16(half %fneg.a, half 0.0)		%max = call half @llvm.maxnum.f16(half %fneg.a, half 0.0)
%med = call half @llvm.minnum.f16(half %max, half 1.0)		%med = call half @llvm.minnum.f16(half %max, half 1.0)

store half %med, half addrspace(1)* %out.gep		store half %med, half addrspace(1)* %out.gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_clamp_negabs_f16:		; GCN-LABEL: {{^}}v_clamp_negabs_f16:
; GCN: {{buffer|flat|global}}_load_ushort [[A:v[0-9]+]]		; GCN: {{buffer|flat|global}}_load_ushort [[A:v[0-9]+]]
; GFX89: v_max_f16_e64 v{{[0-9]+}}, -|[[A]]|, -|[[A]]| clamp{{$}}		; GFX89: v_max_f16_e64 v{{[0-9]+}}, -|[[A]]|, -|[[A]]| clamp{{$}}

; FIXME: Better to fold neg/abs into max		; FIXME: Better to fold neg/abs into max

; SI: v_cvt_f32_f16_e64 [[CVT:v[0-9]+]], -|[[A]]| clamp{{$}}		; SI: v_cvt_f32_f16_e64 [[CVT:v[0-9]+]], -|[[A]]|
; SI: v_cvt_f16_f32_e32 v{{[0-9]+}}, [[CVT]]		; SI: v_mul_f32_e64 [[FCANON:v[0-9]+]], 1.0, [[CVT]] clamp{{$}}
		; SI: v_cvt_f16_f32_e32 v{{[0-9]+}}, [[FCANON]]
define amdgpu_kernel void @v_clamp_negabs_f16(half addrspace(1)* %out, half addrspace(1)* %aptr) #0 {		define amdgpu_kernel void @v_clamp_negabs_f16(half addrspace(1)* %out, half addrspace(1)* %aptr) #0 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr half, half addrspace(1)* %aptr, i32 %tid		%gep0 = getelementptr half, half addrspace(1)* %aptr, i32 %tid
%out.gep = getelementptr half, half addrspace(1)* %out, i32 %tid		%out.gep = getelementptr half, half addrspace(1)* %out, i32 %tid
%a = load half, half addrspace(1)* %gep0		%a = load half, half addrspace(1)* %gep0
%fabs.a = call half @llvm.fabs.f16(half %a)		%fabs.a = call half @llvm.fabs.f16(half %a)
%fneg.fabs.a = fsub half -0.0, %fabs.a		%fneg.fabs.a = fsub half -0.0, %fabs.a

; ...
define amdgpu_kernel void @v_clamp_med3_aby_negzero_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #0 {
; ...
%a = load float, float addrspace(1)* %gep0		%a = load float, float addrspace(1)* %gep0
%med = call float @llvm.amdgcn.fmed3.f32(float -0.0, float 1.0, float %a)		%med = call float @llvm.amdgcn.fmed3.f32(float -0.0, float 1.0, float %a)
store float %med, float addrspace(1)* %out.gep		store float %med, float addrspace(1)* %out.gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_clamp_med3_aby_f32:		; GCN-LABEL: {{^}}v_clamp_med3_aby_f32:
; GCN: {{buffer\|flat\|global}}_load_dword [[A:v[0-9]+]]		; GCN: {{buffer\|flat\|global}}_load_dword [[A:v[0-9]+]]
; GCN: v_max_f32_e64 v{{[0-9]+}}, [[A]], [[A]] clamp{{$}}		; GCN: v_med3_f32 v{{[0-9]+}}, 0, 1.0, [[A]]
define amdgpu_kernel void @v_clamp_med3_aby_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #0 {		define amdgpu_kernel void @v_clamp_med3_aby_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #0 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid		%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid
%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid		%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid
%a = load float, float addrspace(1)* %gep0		%a = load float, float addrspace(1)* %gep0
%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float %a)		%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float %a)
store float %med, float addrspace(1)* %out.gep		store float %med, float addrspace(1)* %out.gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_clamp_med3_bay_f32:		; GCN-LABEL: {{^}}v_clamp_med3_bay_f32:
; GCN: {{buffer\|flat\|global}}_load_dword [[A:v[0-9]+]]		; GCN: {{buffer\|flat\|global}}_load_dword [[A:v[0-9]+]]
; GCN: v_max_f32_e64 v{{[0-9]+}}, [[A]], [[A]] clamp{{$}}		; GCN: v_med3_f32 v{{[0-9]+}}, 1.0, 0, [[A]]
define amdgpu_kernel void @v_clamp_med3_bay_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #0 {		define amdgpu_kernel void @v_clamp_med3_bay_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #0 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid		%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid
%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid		%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid
%a = load float, float addrspace(1)* %gep0		%a = load float, float addrspace(1)* %gep0
%med = call float @llvm.amdgcn.fmed3.f32(float 1.0, float 0.0, float %a)		%med = call float @llvm.amdgcn.fmed3.f32(float 1.0, float 0.0, float %a)
store float %med, float addrspace(1)* %out.gep		store float %med, float addrspace(1)* %out.gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_clamp_med3_yab_f32:		; GCN-LABEL: {{^}}v_clamp_med3_yab_f32:
; GCN: {{buffer\|flat\|global}}_load_dword [[A:v[0-9]+]]		; GCN: {{buffer\|flat\|global}}_load_dword [[A:v[0-9]+]]
; GCN: v_max_f32_e64 v{{[0-9]+}}, [[A]], [[A]] clamp{{$}}		; GCN: v_med3_f32 v{{[0-9]+}}, [[A]], 0, 1.0
define amdgpu_kernel void @v_clamp_med3_yab_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #0 {		define amdgpu_kernel void @v_clamp_med3_yab_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #0 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid		%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid
%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid		%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid
%a = load float, float addrspace(1)* %gep0		%a = load float, float addrspace(1)* %gep0
%med = call float @llvm.amdgcn.fmed3.f32(float %a, float 0.0, float 1.0)		%med = call float @llvm.amdgcn.fmed3.f32(float %a, float 0.0, float 1.0)
store float %med, float addrspace(1)* %out.gep		store float %med, float addrspace(1)* %out.gep
ret void		ret void
; ...
define amdgpu_kernel void @v_clamp_med3_yba_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #0 {
; ...
%a = load float, float addrspace(1)* %gep0		%a = load float, float addrspace(1)* %gep0
%med = call float @llvm.amdgcn.fmed3.f32(float %a, float 1.0, float 0.0)		%med = call float @llvm.amdgcn.fmed3.f32(float %a, float 1.0, float 0.0)
store float %med, float addrspace(1)* %out.gep		store float %med, float addrspace(1)* %out.gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_clamp_med3_ayb_f32:		; GCN-LABEL: {{^}}v_clamp_med3_ayb_f32:
; GCN: {{buffer\|flat\|global}}_load_dword [[A:v[0-9]+]]		; GCN: {{buffer\|flat\|global}}_load_dword [[A:v[0-9]+]]
; GCN: v_max_f32_e64 v{{[0-9]+}}, [[A]], [[A]] clamp{{$}}		; GCN: v_med3_f32 v{{[0-9]+}}, 0, [[A]], 1.0
define amdgpu_kernel void @v_clamp_med3_ayb_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #0 {		define amdgpu_kernel void @v_clamp_med3_ayb_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #0 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid		%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid
%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid		%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid
%a = load float, float addrspace(1)* %gep0		%a = load float, float addrspace(1)* %gep0
%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float %a, float 1.0)		%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float %a, float 1.0)
store float %med, float addrspace(1)* %out.gep		store float %med, float addrspace(1)* %out.gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_clamp_med3_bya_f32:		; GCN-LABEL: {{^}}v_clamp_med3_bya_f32:
; GCN: {{buffer\|flat\|global}}_load_dword [[A:v[0-9]+]]		; GCN: {{buffer\|flat\|global}}_load_dword [[A:v[0-9]+]]
; GCN: v_max_f32_e64 v{{[0-9]+}}, [[A]], [[A]] clamp{{$}}		; GCN: v_max_f32_e64 v{{[0-9]+}}, [[A]], [[A]] clamp{{$}}
define amdgpu_kernel void @v_clamp_med3_bya_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #0 {		define amdgpu_kernel void @v_clamp_med3_bya_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #0 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid		%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid
%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid		%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid
%a = load float, float addrspace(1)* %gep0		%a = load float, float addrspace(1)* %gep0
%med = call float @llvm.amdgcn.fmed3.f32(float 1.0, float %a, float 0.0)		%med = call float @llvm.amdgcn.fmed3.f32(float 1.0, float %a, float 0.0)
store float %med, float addrspace(1)* %out.gep		store float %med, float addrspace(1)* %out.gep
ret void		ret void
}		}

		; GCN-LABEL: {{^}}v_clamp_med3_ayb_f32_snan_no_ieee:
		; GCN: {{buffer\|flat\|global}}_load_dword [[A:v[0-9]+]]
		; GCN: v_max_f32_e64 v{{[0-9]+}}, [[A]], [[A]] clamp{{$}}
		define amdgpu_kernel void @v_clamp_med3_ayb_f32_snan_no_ieee(float addrspace(1)* %out, float addrspace(1)* %aptr) #5 {
		%tid = call i32 @llvm.amdgcn.workitem.id.x()
		%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid
		%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid
		%a = load float, float addrspace(1)* %gep0
		%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float %a, float 1.0)
		store float %med, float addrspace(1)* %out.gep
		ret void
		}

		; GCN-LABEL: {{^}}v_clamp_nnan_med3_ayb_f32:
		; GCN: {{buffer\|flat\|global}}_load_dword [[A:v[0-9]+]]
		; GCN: v_med3_f32 v{{[0-9]+}}, 0, [[A]], 1.0
		define amdgpu_kernel void @v_clamp_nnan_med3_ayb_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #0 {
		%tid = call i32 @llvm.amdgcn.workitem.id.x()
		%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid
		%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid
		%a = load float, float addrspace(1)* %gep0
		%med = call nnan float @llvm.amdgcn.fmed3.f32(float 0.0, float %a, float 1.0)
		store float %med, float addrspace(1)* %out.gep
		ret void
		}

		; GCN-LABEL: {{^}}v_clamp_med3_ayb_f32_no_nans_fp_math:
		; GCN: {{buffer\|flat\|global}}_load_dword [[A:v[0-9]+]]
		; GCN: v_max_f32_e64 v{{[0-9]+}}, [[A]], [[A]] clamp{{$}}
		define amdgpu_kernel void @v_clamp_med3_ayb_f32_no_nans_fp_math(float addrspace(1)* %out, float addrspace(1)* %aptr) #6 {
		%tid = call i32 @llvm.amdgcn.workitem.id.x()
		%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid
		%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid
		%a = load float, float addrspace(1)* %gep0
		%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float %a, float 1.0)
		store float %med, float addrspace(1)* %out.gep
		ret void
		}

; GCN-LABEL: {{^}}v_clamp_constants_to_one_f32:		; GCN-LABEL: {{^}}v_clamp_constants_to_one_f32:
; GCN: v_mov_b32_e32 v{{[0-9]+}}, 1.0		; GCN: v_mov_b32_e32 v{{[0-9]+}}, 1.0
define amdgpu_kernel void @v_clamp_constants_to_one_f32(float addrspace(1)* %out) #0 {		define amdgpu_kernel void @v_clamp_constants_to_one_f32(float addrspace(1)* %out) #0 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid		%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid
%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float 4.0)		%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float 4.0)
store float %med, float addrspace(1)* %out.gep		store float %med, float addrspace(1)* %out.gep
ret void		ret void
; ...
define amdgpu_kernel void @v_clamp_constant_qnan_f32(float addrspace(1)* %out) #0 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid		%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid
%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float 0x7FF8000000000000)		%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float 0x7FF8000000000000)
store float %med, float addrspace(1)* %out.gep		store float %med, float addrspace(1)* %out.gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_clamp_constant_snan_f32:		; GCN-LABEL: {{^}}v_clamp_constant_snan_f32:
; GCN: v_mov_b32_e32 v{{[0-9]+}}, 0{{$}}		; GCN: v_mov_b32_e32 [[A:v[0-9]+]], 0x7f800001{{$}}
		; GCN: v_med3_f32 v{{[0-9]+}}, 0, 1.0, [[A]]
define amdgpu_kernel void @v_clamp_constant_snan_f32(float addrspace(1)* %out) #0 {		define amdgpu_kernel void @v_clamp_constant_snan_f32(float addrspace(1)* %out) #0 {
Petar.Avramovic (Author): The isa<ConstantFPSDNode>(X) check in the min/max combine is there to cover the existing test where X is a NaN constant: min/max will be folded to clamp, and the clamp will become a constant in performClampCombine.

foad: This test is wrong then, isn't it? It checks that fmed3(0,1,snan) -> 0 with DX10Clamp=1 and IEEE=1, but that is the wrong answer: fmed3(0,1,snan) is qnan in IEEE mode. This test relies on an incorrect folding from fmed3 to clamp. Maybe the test should be changed to put the 0 last, i.e. fmed3(snan,1,0)?
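The disagreement described in the comment above can be sketched in a few lines of C++. This is only an illustrative model, not code from the patch: it encodes just the two behaviours stated in the thread (clamp with DX10Clamp=1 turns a NaN into 0, and fmed3 with an sNaN input yields a quiet NaN when IEEE=1). The helper names `clampDX10`, `fmed3ZeroOneIEEE` and `foldToClampIsExact` are hypothetical, and quiet NaN inputs are deliberately left unmodeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Bit pattern used by v_clamp_constant_snan_f32 (i32 2139095041).
constexpr uint32_t SNaNBits = 0x7f800001u;

// Reinterpret a 32-bit pattern as a float without changing any bits.
inline float bitsToFloat(uint32_t Bits) {
  float F;
  std::memcpy(&F, &Bits, sizeof(F));
  return F;
}

// A binary32 NaN whose quiet bit (bit 22) is clear is a signaling NaN.
inline bool isSignalingNaNBits(uint32_t Bits) {
  const bool ExpAllOnes = (Bits & 0x7f800000u) == 0x7f800000u;
  const bool MantissaNonZero = (Bits & 0x007fffffu) != 0;
  const bool QuietBit = (Bits & 0x00400000u) != 0;
  return ExpAllOnes && MantissaNonZero && !QuietBit;
}

// Hypothetical model of the clamp modifier with DX10Clamp=1:
// a NaN input becomes 0.0, anything else is clamped to [0.0, 1.0].
inline float clampDX10(uint32_t XBits) {
  const float X = bitsToFloat(XBits);
  if (std::isnan(X))
    return 0.0f;
  return std::min(std::max(X, 0.0f), 1.0f);
}

// Hypothetical model of fmed3(0.0, 1.0, X) with IEEE=1, restricted to the
// behaviour stated in the review: an sNaN input produces a quiet NaN.
// Quiet NaN inputs are an unmodeled case in this sketch.
inline float fmed3ZeroOneIEEE(uint32_t XBits) {
  if (isSignalingNaNBits(XBits))
    return std::numeric_limits<float>::quiet_NaN();
  const float X = bitsToFloat(XBits);
  assert(!std::isnan(X) && "quiet NaN inputs are not modeled in this sketch");
  // For non-NaN X, the median of {0.0, 1.0, X} is X clamped to [0.0, 1.0].
  return std::min(std::max(X, 0.0f), 1.0f);
}

// True when replacing fmed3(0.0, 1.0, X) by clamp(X) keeps the result.
inline bool foldToClampIsExact(uint32_t XBits) {
  const float Med = fmed3ZeroOneIEEE(XBits);
  const float Clamped = clampDX10(XBits);
  if (std::isnan(Med) || std::isnan(Clamped))
    return std::isnan(Med) && std::isnan(Clamped);
  return Med == Clamped;
}
```

Under this model, `foldToClampIsExact(SNaNBits)` is false while ordinary inputs such as 0.5 (`0x3f000000`) fold exactly, which is the reason the combine needs to know that the input is never an sNaN before it emits a clamp.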
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid		%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid
%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float bitcast (i32 2139095041 to float))		%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float bitcast (i32 2139095041 to float))
store float %med, float addrspace(1)* %out.gep		store float %med, float addrspace(1)* %out.gep
ret void		ret void
}		}

; ---------------------------------------------------------------------		; ---------------------------------------------------------------------
; Test non-default behaviors enabling snans and disabling dx10_clamp		; Test non-default behaviors enabling snans and disabling dx10_clamp
; ---------------------------------------------------------------------		; ---------------------------------------------------------------------

; GCN-LABEL: {{^}}v_clamp_f32_no_dx10_clamp:		; GCN-LABEL: {{^}}v_clamp_f32_no_dx10_clamp:
; GCN: {{buffer\|flat\|global}}_load_dword [[A:v[0-9]+]]		; GCN: {{buffer\|flat\|global}}_load_dword [[A:v[0-9]+]]
; GCN: v_add_f32_e32 [[ADD:v[0-9]+]], 0.5, [[A]]		; GCN: v_add_f32_e64 [[ADD:v[0-9]+]], [[A]], 0.5 clamp{{$}}
; GCN: v_med3_f32 v{{[0-9]+}}, [[ADD]], 0, 1.0
define amdgpu_kernel void @v_clamp_f32_no_dx10_clamp(float addrspace(1)* %out, float addrspace(1)* %aptr) #2 {		define amdgpu_kernel void @v_clamp_f32_no_dx10_clamp(float addrspace(1)* %out, float addrspace(1)* %aptr) #2 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid		%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid
%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid		%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid
%a = load float, float addrspace(1)* %gep0		%a = load float, float addrspace(1)* %gep0
%a.nnan = fadd nnan float %a, 0.5		%a.nnan = fadd nnan float %a, 0.5
%max = call float @llvm.maxnum.f32(float %a.nnan, float 0.0)		%max = call float @llvm.maxnum.f32(float %a.nnan, float 0.0)
%med = call float @llvm.minnum.f32(float %max, float 1.0)		%med = call float @llvm.minnum.f32(float %max, float 1.0)
; ...
define amdgpu_kernel void @v_clamp_f32_snan_dx10clamp(float addrspace(1)* %out, float addrspace(1)* %aptr) #3 {
; ...
%med = call float @llvm.minnum.f32(float %max, float 1.0)		%med = call float @llvm.minnum.f32(float %max, float 1.0)

store float %med, float addrspace(1)* %out.gep		store float %med, float addrspace(1)* %out.gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_clamp_f32_snan_no_dx10clamp:		; GCN-LABEL: {{^}}v_clamp_f32_snan_no_dx10clamp:
; GCN: {{buffer\|flat\|global}}_load_dword [[A:v[0-9]+]]		; GCN: {{buffer\|flat\|global}}_load_dword [[A:v[0-9]+]]
; GFX678: v_mul_f32_e32 [[QUIET_A:v[0-9]+]], 1.0, [[A]]		; GCN: v_med3_f32 {{v[0-9]+}}, [[A]], 0, 1.0
; GFX9: v_max_f32_e32 [[QUIET_A:v[0-9]+]], [[A]], [[A]]
; GCN: v_med3_f32 {{v[0-9]+}}, [[QUIET_A]], 0, 1.0
define amdgpu_kernel void @v_clamp_f32_snan_no_dx10clamp(float addrspace(1)* %out, float addrspace(1)* %aptr) #4 {		define amdgpu_kernel void @v_clamp_f32_snan_no_dx10clamp(float addrspace(1)* %out, float addrspace(1)* %aptr) #4 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid		%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid
%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid		%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid
%a = load float, float addrspace(1)* %gep0		%a = load float, float addrspace(1)* %gep0
%max = call float @llvm.maxnum.f32(float %a, float 0.0)		%max = call float @llvm.maxnum.f32(float %a, float 0.0)
%med = call float @llvm.minnum.f32(float %max, float 1.0)		%med = call float @llvm.minnum.f32(float %max, float 1.0)

store float %med, float addrspace(1)* %out.gep		store float %med, float addrspace(1)* %out.gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_clamp_f32_snan_no_dx10clamp_nnan_src:		; GCN-LABEL: {{^}}v_clamp_f32_snan_no_dx10clamp_nnan_src:
; GCN: {{buffer\|flat\|global}}_load_dword [[A:v[0-9]+]]		; GCN: {{buffer\|flat\|global}}_load_dword [[A:v[0-9]+]]
; GCN: v_add_f32_e32 [[ADD:v[0-9]+]], 1.0, [[A]]		; GCN: v_add_f32_e64 [[ADD:v[0-9]+]], [[A]], 1.0 clamp{{$}}
; GCN: v_med3_f32 v{{[0-9]+}}, [[ADD]], 0, 1.0
define amdgpu_kernel void @v_clamp_f32_snan_no_dx10clamp_nnan_src(float addrspace(1)* %out, float addrspace(1)* %aptr) #4 {		define amdgpu_kernel void @v_clamp_f32_snan_no_dx10clamp_nnan_src(float addrspace(1)* %out, float addrspace(1)* %aptr) #4 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid		%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid
%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid		%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid
%a = load float, float addrspace(1)* %gep0		%a = load float, float addrspace(1)* %gep0
%add = fadd nnan float %a, 1.0		%add = fadd nnan float %a, 1.0
%max = call float @llvm.maxnum.f32(float %add, float 0.0)		%max = call float @llvm.maxnum.f32(float %add, float 0.0)
%med = call float @llvm.minnum.f32(float %max, float 1.0)		%med = call float @llvm.minnum.f32(float %max, float 1.0)

store float %med, float addrspace(1)* %out.gep		store float %med, float addrspace(1)* %out.gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_clamp_med3_aby_f32_no_dx10_clamp:		; GCN-LABEL: {{^}}v_clamp_med3_aby_f32_no_dx10_clamp:
; GCN: {{buffer\|flat\|global}}_load_dword [[A:v[0-9]+]]		; GCN: {{buffer\|flat\|global}}_load_dword [[A:v[0-9]+]]
; GCN: v_max_f32_e64 v{{[0-9]+}}, [[A]], [[A]] clamp{{$}}		; GCN: v_med3_f32 v{{[0-9]+}}, 0, 1.0, [[A]]
define amdgpu_kernel void @v_clamp_med3_aby_f32_no_dx10_clamp(float addrspace(1)* %out, float addrspace(1)* %aptr) #2 {		define amdgpu_kernel void @v_clamp_med3_aby_f32_no_dx10_clamp(float addrspace(1)* %out, float addrspace(1)* %aptr) #2 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid		%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid
%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid		%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid
%a = load float, float addrspace(1)* %gep0		%a = load float, float addrspace(1)* %gep0
%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float %a)		%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float %a)
store float %med, float addrspace(1)* %out.gep		store float %med, float addrspace(1)* %out.gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_clamp_med3_bay_f32_no_dx10_clamp:		; GCN-LABEL: {{^}}v_clamp_med3_bay_f32_no_dx10_clamp:
; GCN: {{buffer\|flat\|global}}_load_dword [[A:v[0-9]+]]		; GCN: {{buffer\|flat\|global}}_load_dword [[A:v[0-9]+]]
; GCN: v_max_f32_e64 v{{[0-9]+}}, [[A]], [[A]] clamp{{$}}		; GCN: v_med3_f32 v{{[0-9]+}}, 1.0, 0, [[A]]
define amdgpu_kernel void @v_clamp_med3_bay_f32_no_dx10_clamp(float addrspace(1)* %out, float addrspace(1)* %aptr) #2 {		define amdgpu_kernel void @v_clamp_med3_bay_f32_no_dx10_clamp(float addrspace(1)* %out, float addrspace(1)* %aptr) #2 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid		%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 %tid
%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid		%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid
%a = load float, float addrspace(1)* %gep0		%a = load float, float addrspace(1)* %gep0
%med = call float @llvm.amdgcn.fmed3.f32(float 1.0, float 0.0, float %a)		%med = call float @llvm.amdgcn.fmed3.f32(float 1.0, float 0.0, float %a)
store float %med, float addrspace(1)* %out.gep		store float %med, float addrspace(1)* %out.gep
ret void		ret void
; ...
define amdgpu_kernel void @v_clamp_v2f16_undef_limit_elts1(<2 x half> addrspace(1)* %out, <2 x half> addrspace(1)* %aptr) #0 {
; ...

store <2 x half> %med, <2 x half> addrspace(1)* %out.gep		store <2 x half> %med, <2 x half> addrspace(1)* %out.gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}v_clamp_diff_source_f32:		; GCN-LABEL: {{^}}v_clamp_diff_source_f32:
; GCN: v_add_f32_e32 [[A:v[0-9]+]]		; GCN: v_add_f32_e32 [[A:v[0-9]+]]
; GCN: v_add_f32_e32 [[B:v[0-9]+]]		; GCN: v_add_f32_e32 [[B:v[0-9]+]]
; GCN: v_max_f32_e64 v{{[0-9]+}}, [[A]], [[B]] clamp{{$}}		; GCN: v_max3_f32 [[MAX3:v[0-9]+]], [[A]], [[B]], 0
		; GCN: v_min_f32_e32 v{{[0-9]+}}, 1.0, [[MAX3]]
define amdgpu_kernel void @v_clamp_diff_source_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #0		define amdgpu_kernel void @v_clamp_diff_source_f32(float addrspace(1)* %out, float addrspace(1)* %aptr) #0
{		{
%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 0		%gep0 = getelementptr float, float addrspace(1)* %aptr, i32 0
%gep1 = getelementptr float, float addrspace(1)* %aptr, i32 1		%gep1 = getelementptr float, float addrspace(1)* %aptr, i32 1
%gep2 = getelementptr float, float addrspace(1)* %aptr, i32 2		%gep2 = getelementptr float, float addrspace(1)* %aptr, i32 2
%l0 = load float, float addrspace(1)* %gep0		%l0 = load float, float addrspace(1)* %gep0
%l1 = load float, float addrspace(1)* %gep1		%l1 = load float, float addrspace(1)* %gep1
%l2 = load float, float addrspace(1)* %gep2		%l2 = load float, float addrspace(1)* %gep2
; ...
declare <2 x half> @llvm.minnum.v2f16(<2 x half>, <2 x half>) #1		declare <2 x half> @llvm.minnum.v2f16(<2 x half>, <2 x half>) #1
declare <2 x half> @llvm.maxnum.v2f16(<2 x half>, <2 x half>) #1		declare <2 x half> @llvm.maxnum.v2f16(<2 x half>, <2 x half>) #1

attributes #0 = { nounwind "denormal-fp-math-f32"="preserve-sign,preserve-sign" }		attributes #0 = { nounwind "denormal-fp-math-f32"="preserve-sign,preserve-sign" }
attributes #1 = { nounwind readnone }		attributes #1 = { nounwind readnone }
attributes #2 = { nounwind "amdgpu-dx10-clamp"="false" "denormal-fp-math-f32"="preserve-sign,preserve-sign" "no-nans-fp-math"="false" }		attributes #2 = { nounwind "amdgpu-dx10-clamp"="false" "denormal-fp-math-f32"="preserve-sign,preserve-sign" "no-nans-fp-math"="false" }
attributes #3 = { nounwind "amdgpu-dx10-clamp"="true" "denormal-fp-math-f32"="preserve-sign,preserve-sign" "no-nans-fp-math"="false" }		attributes #3 = { nounwind "amdgpu-dx10-clamp"="true" "denormal-fp-math-f32"="preserve-sign,preserve-sign" "no-nans-fp-math"="false" }
attributes #4 = { nounwind "amdgpu-dx10-clamp"="false" "denormal-fp-math-f32"="preserve-sign,preserve-sign" "no-nans-fp-math"="false" }		attributes #4 = { nounwind "amdgpu-dx10-clamp"="false" "denormal-fp-math-f32"="preserve-sign,preserve-sign" "no-nans-fp-math"="false" }
		attributes #5 = { nounwind "amdgpu-dx10-clamp"="true" "amdgpu-ieee"="false" }
		attributes #6 = { nounwind "no-nans-fp-math"="true" }

llvm/test/CodeGen/AMDGPU/known-never-snan.ll

; ...
	}			}

	define float @v_test_known_not_snan_maxnum_input_fmed3_r_i_i_f32(float %a, float %b) #0 {			define float @v_test_known_not_snan_maxnum_input_fmed3_r_i_i_f32(float %a, float %b) #0 {
	; GCN-LABEL: v_test_known_not_snan_maxnum_input_fmed3_r_i_i_f32:			; GCN-LABEL: v_test_known_not_snan_maxnum_input_fmed3_r_i_i_f32:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: v_rcp_f32_e32 v0, v0			; GCN-NEXT: v_rcp_f32_e32 v0, v0
	; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1			; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
	; GCN-NEXT: v_max_f32_e32 v0, v0, v1			; GCN-NEXT: v_max3_f32 v0, v0, v1, 2.0
	; GCN-NEXT: v_med3_f32 v0, v0, 2.0, 4.0			; GCN-NEXT: v_min_f32_e32 v0, 4.0, v0
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	%a.nnan.add = fdiv nnan float 1.0, %a, !fpmath !0			%a.nnan.add = fdiv nnan float 1.0, %a, !fpmath !0
	%b.nnan.add = fadd nnan float %b, 1.0			%b.nnan.add = fadd nnan float %b, 1.0
	%known.not.snan = call float @llvm.maxnum.f32(float %a.nnan.add, float %b.nnan.add)			%known.not.snan = call float @llvm.maxnum.f32(float %a.nnan.add, float %b.nnan.add)
	%max = call float @llvm.maxnum.f32(float %known.not.snan, float 2.0)			%max = call float @llvm.maxnum.f32(float %known.not.snan, float 2.0)
	%med = call float @llvm.minnum.f32(float %max, float 4.0)			%med = call float @llvm.minnum.f32(float %max, float 4.0)
	ret float %med			ret float %med
	}			}

	define float @v_maxnum_possible_nan_lhs_input_fmed3_r_i_i_f32(float %a, float %b) #0 {			define float @v_maxnum_possible_nan_lhs_input_fmed3_r_i_i_f32(float %a, float %b) #0 {
	; GCN-LABEL: v_maxnum_possible_nan_lhs_input_fmed3_r_i_i_f32:			; GCN-LABEL: v_maxnum_possible_nan_lhs_input_fmed3_r_i_i_f32:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1			; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
	; GCN-NEXT: v_mul_f32_e32 v0, 1.0, v0			; GCN-NEXT: v_max3_f32 v0, v0, v1, 2.0
	; GCN-NEXT: v_max_f32_e32 v0, v0, v1			; GCN-NEXT: v_min_f32_e32 v0, 4.0, v0
	; GCN-NEXT: v_med3_f32 v0, v0, 2.0, 4.0
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	%b.nnan.add = fadd nnan float %b, 1.0			%b.nnan.add = fadd nnan float %b, 1.0
	%known.not.snan = call float @llvm.maxnum.f32(float %a, float %b.nnan.add)			%known.not.snan = call float @llvm.maxnum.f32(float %a, float %b.nnan.add)
	%max = call float @llvm.maxnum.f32(float %known.not.snan, float 2.0)			%max = call float @llvm.maxnum.f32(float %known.not.snan, float 2.0)
	%med = call float @llvm.minnum.f32(float %max, float 4.0)			%med = call float @llvm.minnum.f32(float %max, float 4.0)
	ret float %med			ret float %med
	}			}

	define float @v_maxnum_possible_nan_rhs_input_fmed3_r_i_i_f32(float %a, float %b) #0 {			define float @v_maxnum_possible_nan_rhs_input_fmed3_r_i_i_f32(float %a, float %b) #0 {
	; GCN-LABEL: v_maxnum_possible_nan_rhs_input_fmed3_r_i_i_f32:			; GCN-LABEL: v_maxnum_possible_nan_rhs_input_fmed3_r_i_i_f32:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: v_rcp_f32_e32 v0, v0			; GCN-NEXT: v_rcp_f32_e32 v0, v0
	; GCN-NEXT: v_mul_f32_e32 v1, 1.0, v1			; GCN-NEXT: v_max3_f32 v0, v0, v1, 2.0
	; GCN-NEXT: v_max_f32_e32 v0, v0, v1			; GCN-NEXT: v_min_f32_e32 v0, 4.0, v0
	; GCN-NEXT: v_med3_f32 v0, v0, 2.0, 4.0
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	%a.nnan.add = fdiv nnan float 1.0, %a, !fpmath !0			%a.nnan.add = fdiv nnan float 1.0, %a, !fpmath !0
	%known.not.snan = call float @llvm.maxnum.f32(float %a.nnan.add, float %b)			%known.not.snan = call float @llvm.maxnum.f32(float %a.nnan.add, float %b)
	%max = call float @llvm.maxnum.f32(float %known.not.snan, float 2.0)			%max = call float @llvm.maxnum.f32(float %known.not.snan, float 2.0)
	%med = call float @llvm.minnum.f32(float %max, float 4.0)			%med = call float @llvm.minnum.f32(float %max, float 4.0)
	ret float %med			ret float %med
	}			}

; ...