This is an archive of the discontinued LLVM Phabricator instance.

Differential D94913

[SLP] match maxnum/minnum intrinsics as FP reduction ops
Closed · Public

Authored by spatel on Jan 18 2021, 6:59 AM.

Details

Reviewers

ABataev
RKSimon
craig.topper
fhahn
dmgreen

Commits

rG5b77ac32b115: [SLP] match maxnum/minnum intrinsics as FP reduction ops

Summary

After much refactoring over the last 2 weeks to the reduction matching code, I think this change is finally ready.
We effectively broke fmax/fmin vector reduction optimization when we started canonicalizing to intrinsics in instcombine, so this will hopefully restore that for SLP.
There are still FMF problems here as noted in the code comments, but we should be avoiding miscompiles on those for fmax/fmin by restricting to full 'fast' ops (negative tests are included). I am planning to look at fixing FMF propagation next.
There's also an open cost model question: should we prefer the getIntrinsicInstrCost() API (as is currently shown in the patch) or use getMinMaxReductionCost() (as is currently shown for the integer min/max ops)? There are no test differences with the current regression tests, but that will need to be examined in more detail to make sure we are getting accurate costs.
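
The FMF restriction described in the summary can be modelled outside LLVM. The sketch below is a standalone C++ toy, not the patch itself: `RecurKindModel`, `FastMathFlagsModel`, `makeFast`, and `mayTreatAsAssociative` are hypothetical names, and the flag list is an assumed model of the IR fast-math flags. It only illustrates the gating idea: integer min/max always qualify, while FP min/max qualify only when every flag is set.

```cpp
#include <cassert>

// Hypothetical stand-ins for the LLVM types used in the patch.
enum class RecurKindModel { None, SMax, SMin, UMax, UMin, FMax, FMin };

// Assumed model of the IR fast-math flags; 'fast' means all of them are set.
struct FastMathFlagsModel {
  bool Reassoc = false;
  bool NoNaNs = false;
  bool NoInfs = false;
  bool NoSignedZeros = false;
  bool AllowReciprocal = false;
  bool AllowContract = false;
  bool ApproxFunc = false;

  bool isFast() const {
    return Reassoc && NoNaNs && NoInfs && NoSignedZeros && AllowReciprocal &&
           AllowContract && ApproxFunc;
  }
};

// Builds a flag set with every flag enabled.
FastMathFlagsModel makeFast() {
  FastMathFlagsModel F;
  F.Reassoc = F.NoNaNs = F.NoInfs = F.NoSignedZeros = true;
  F.AllowReciprocal = F.AllowContract = F.ApproxFunc = true;
  return F;
}

// FP min/max are accepted only with full 'fast'; the integer min/max kinds
// in this model are always accepted.
bool mayTreatAsAssociative(RecurKindModel Kind, const FastMathFlagsModel &FMF) {
  assert(Kind != RecurKindModel::None && "Expected reduction operation.");
  switch (Kind) {
  case RecurKindModel::FMax:
  case RecurKindModel::FMin:
    return FMF.isFast();
  default:
    return true;
  }
}
```

Under this model a call that carries only `nnan` is rejected, which corresponds to the `reduction_v4f32_nnan` tests that are left unvectorized with a TODO.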

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

spatel created this revision. Jan 18 2021, 6:59 AM

Herald added subscribers: hiraditya, mcrosier. Jan 18 2021, 6:59 AM

spatel requested review of this revision. Jan 18 2021, 6:59 AM

Herald added a project: Restricted Project. Jan 18 2021, 6:59 AM

Patch updated: I changed some variable names with d1c4e85 and missed rebasing this diff.

ABataev added inline comments. Jan 18 2021, 8:51 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7106–7108	Probably better to use getMinMaxReductionCost() rather than the cost of the intrinsic call; it is more precise.

spatel added inline comments. Jan 18 2021, 9:19 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
7106–7108	Ok - I'll update. The current patch is actually wrong because I did not use the correct reduction IDs (`Intrinsic::vector_reduce_fmax`) for the vector cost.

Patch updated:

Change cost calc to use getMinMaxReductionCost() (this is copied from the existing integer min/max code).
Added test coverage/comments to demonstrate FMF constraints.
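
The cost comparison behind this update can be illustrated with a toy calculation. This is a sketch under stated assumptions, not the code in SLPVectorizer.cpp: the struct, the function names, the cost values, the `Width - 1` scaling of the scalar chain, and the threshold are all hypothetical. The only parts taken from the patch are that the vector side is a single min/max reduction cost and the scalar side is modelled as one compare plus one select.

```cpp
#include <cassert>

// Hypothetical cost inputs in arbitrary units; a real target supplies these.
struct ReductionCostsModel {
  int VectorReductionCost; // one vector min/max reduction
  int ScalarCmpCost;       // one scalar fcmp
  int ScalarSelectCost;    // one scalar select
};

// Assumption: a scalar chain over Width values needs Width - 1 min/max ops.
// A negative result means the vector form is estimated to be cheaper.
int reductionCostDelta(const ReductionCostsModel &C, unsigned Width) {
  assert(Width >= 2 && "A reduction needs at least two values.");
  int ScalarChainCost =
      (C.ScalarCmpCost + C.ScalarSelectCost) * static_cast<int>(Width - 1);
  return C.VectorReductionCost - ScalarChainCost;
}

// Assumption: the transform is taken only when the delta beats a threshold.
bool looksProfitable(const ReductionCostsModel &C, unsigned Width,
                     int Threshold = 0) {
  return reductionCostDelta(C, Width) < -Threshold;
}
```

With made-up costs of 3 for the vector reduction and 1 each for the compare and select, a 4-wide reduction comes out cheaper in vector form; raising the vector cost to 8 flips the decision.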

LGTM

This revision is now accepted and ready to land. Jan 18 2021, 11:02 AM

Closed by commit rG5b77ac32b115: [SLP] match maxnum/minnum intrinsics as FP reduction ops (authored by spatel). Jan 18 2021, 2:37 PM

This revision was automatically updated to reflect the committed changes.

spatel added a commit: rG5b77ac32b115: [SLP] match maxnum/minnum intrinsics as FP reduction ops.

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	42 lines
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-expanded.ll	21 lines
llvm/test/Transforms/SLPVectorizer/X86/fmaxnum.ll	50 lines
llvm/test/Transforms/SLPVectorizer/X86/fminnum.ll	50 lines

Diff 317420

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

[6,449 lines not shown]	Value *createOp(IRBuilder<> &Builder, Value *LHS, Value *RHS,
case RecurKind::Mul:		case RecurKind::Mul:
case RecurKind::Or:		case RecurKind::Or:
case RecurKind::And:		case RecurKind::And:
case RecurKind::Xor:		case RecurKind::Xor:
case RecurKind::FAdd:		case RecurKind::FAdd:
case RecurKind::FMul:		case RecurKind::FMul:
return Builder.CreateBinOp((Instruction::BinaryOps)RdxOpcode, LHS, RHS,		return Builder.CreateBinOp((Instruction::BinaryOps)RdxOpcode, LHS, RHS,
Name);		Name);
		case RecurKind::FMax:
		return Builder.CreateBinaryIntrinsic(Intrinsic::maxnum, LHS, RHS);
		case RecurKind::FMin:
		return Builder.CreateBinaryIntrinsic(Intrinsic::minnum, LHS, RHS);

case RecurKind::SMax: {		case RecurKind::SMax: {
Value *Cmp = Builder.CreateICmpSGT(LHS, RHS, Name);		Value *Cmp = Builder.CreateICmpSGT(LHS, RHS, Name);
return Builder.CreateSelect(Cmp, LHS, RHS, Name);		return Builder.CreateSelect(Cmp, LHS, RHS, Name);
}		}
case RecurKind::SMin: {		case RecurKind::SMin: {
Value *Cmp = Builder.CreateICmpSLT(LHS, RHS, Name);		Value *Cmp = Builder.CreateICmpSLT(LHS, RHS, Name);
return Builder.CreateSelect(Cmp, LHS, RHS, Name);		return Builder.CreateSelect(Cmp, LHS, RHS, Name);
[97 lines not shown]	public:
}		}

/// Checks if instruction is associative and can be vectorized.		/// Checks if instruction is associative and can be vectorized.
bool isAssociative(Instruction *I) const {		bool isAssociative(Instruction *I) const {
assert(Kind != RecurKind::None && "Expected reduction operation.");		assert(Kind != RecurKind::None && "Expected reduction operation.");
if (RecurrenceDescriptor::isIntMinMaxRecurrenceKind(Kind))		if (RecurrenceDescriptor::isIntMinMaxRecurrenceKind(Kind))
return true;		return true;

		if (Kind == RecurKind::FMax || Kind == RecurKind::FMin) {
		// FP min/max are associative except for NaN and -0.0. We do not
		// have to rule out -0.0 here because the intrinsic semantics do not
		// specify a fixed result for it.
		// TODO: This is artificially restricted to fast because the code that
		// creates reductions assumes/produces fast ops.
		return I->getFastMathFlags().isFast();
		}

return I->isAssociative();		return I->isAssociative();
}		}

/// Checks if the reduction operation can be vectorized.		/// Checks if the reduction operation can be vectorized.
bool isVectorizable(Instruction *I) const {		bool isVectorizable(Instruction *I) const {
return isVectorizable() && isAssociative(I);		return isVectorizable() && isAssociative(I);
}		}

[93 lines not shown]	if (match(I, m_Or(m_Value(), m_Value())))
return OperationData(RecurKind::Or);		return OperationData(RecurKind::Or);
if (match(I, m_Xor(m_Value(), m_Value())))		if (match(I, m_Xor(m_Value(), m_Value())))
return OperationData(RecurKind::Xor);		return OperationData(RecurKind::Xor);
if (match(I, m_FAdd(m_Value(), m_Value())))		if (match(I, m_FAdd(m_Value(), m_Value())))
return OperationData(RecurKind::FAdd);		return OperationData(RecurKind::FAdd);
if (match(I, m_FMul(m_Value(), m_Value())))		if (match(I, m_FMul(m_Value(), m_Value())))
return OperationData(RecurKind::FMul);		return OperationData(RecurKind::FMul);

		if (match(I, m_Intrinsic<Intrinsic::maxnum>(m_Value(), m_Value())))
		return OperationData(RecurKind::FMax);
		if (match(I, m_Intrinsic<Intrinsic::minnum>(m_Value(), m_Value())))
		return OperationData(RecurKind::FMin);

if (match(I, m_SMax(m_Value(), m_Value())))		if (match(I, m_SMax(m_Value(), m_Value())))
return OperationData(RecurKind::SMax);		return OperationData(RecurKind::SMax);
if (match(I, m_SMin(m_Value(), m_Value())))		if (match(I, m_SMin(m_Value(), m_Value())))
return OperationData(RecurKind::SMin);		return OperationData(RecurKind::SMin);
if (match(I, m_UMax(m_Value(), m_Value())))		if (match(I, m_UMax(m_Value(), m_Value())))
return OperationData(RecurKind::UMax);		return OperationData(RecurKind::UMax);
if (match(I, m_UMin(m_Value(), m_Value())))		if (match(I, m_UMin(m_Value(), m_Value())))
return OperationData(RecurKind::UMin);		return OperationData(RecurKind::UMin);
[383 lines not shown]	int getReductionCost(TargetTransformInfo *TTI, Value *FirstReducedVal,
case RecurKind::FAdd:		case RecurKind::FAdd:
case RecurKind::FMul: {		case RecurKind::FMul: {
unsigned RdxOpcode = RecurrenceDescriptor::getOpcode(Kind);		unsigned RdxOpcode = RecurrenceDescriptor::getOpcode(Kind);
VectorCost = TTI->getArithmeticReductionCost(RdxOpcode, VectorTy,		VectorCost = TTI->getArithmeticReductionCost(RdxOpcode, VectorTy,
/*IsPairwiseForm=*/false);		/*IsPairwiseForm=*/false);
ScalarCost = TTI->getArithmeticInstrCost(RdxOpcode, ScalarTy);		ScalarCost = TTI->getArithmeticInstrCost(RdxOpcode, ScalarTy);
break;		break;
}		}
		case RecurKind::FMax:
		case RecurKind::FMin: {
		auto *VecCondTy = cast<VectorType>(CmpInst::makeCmpResultType(VectorTy));
		VectorCost =
		TTI->getMinMaxReductionCost(VectorTy, VecCondTy,
		/*pairwise=*/false, /*unsigned=*/false);
		ScalarCost =
		TTI->getCmpSelInstrCost(Instruction::FCmp, ScalarTy) +
		TTI->getCmpSelInstrCost(Instruction::Select, ScalarTy,
		CmpInst::makeCmpResultType(ScalarTy));
		break;
		}
case RecurKind::SMax:		case RecurKind::SMax:
case RecurKind::SMin:		case RecurKind::SMin:
case RecurKind::UMax:		case RecurKind::UMax:
case RecurKind::UMin: {		case RecurKind::UMin: {
auto *VecCondTy = cast<VectorType>(CmpInst::makeCmpResultType(VectorTy));		auto *VecCondTy = cast<VectorType>(CmpInst::makeCmpResultType(VectorTy));
bool IsUnsigned = Kind == RecurKind::UMax || Kind == RecurKind::UMin;		bool IsUnsigned = Kind == RecurKind::UMax || Kind == RecurKind::UMin;
VectorCost =		VectorCost =
TTI->getMinMaxReductionCost(VectorTy, VecCondTy,		TTI->getMinMaxReductionCost(VectorTy, VecCondTy,
[215 lines not shown]	static Value *getReductionValue(const DominatorTree *DT, PHINode *P,
}		}

if (Rdx && DominatedReduxValue(Rdx))		if (Rdx && DominatedReduxValue(Rdx))
return Rdx;		return Rdx;

return nullptr;		return nullptr;
}		}

		static bool matchRdxBop(Instruction *I, Value *&V0, Value *&V1) {
		if (match(I, m_BinOp(m_Value(V0), m_Value(V1))))
		return true;
		if (match(I, m_Intrinsic<Intrinsic::maxnum>(m_Value(V0), m_Value(V1))))
		return true;
		if (match(I, m_Intrinsic<Intrinsic::minnum>(m_Value(V0), m_Value(V1))))
		return true;
		return false;
		}

/// Attempt to reduce a horizontal reduction.		/// Attempt to reduce a horizontal reduction.
/// If it is legal to match a horizontal reduction feeding the phi node \a P		/// If it is legal to match a horizontal reduction feeding the phi node \a P
/// with reduction operators \a Root (or one of its operands) in a basic block		/// with reduction operators \a Root (or one of its operands) in a basic block
/// \a BB, then check if it can be done. If horizontal reduction is not found		/// \a BB, then check if it can be done. If horizontal reduction is not found
/// and root instruction is a binary operation, vectorization of the operands is		/// and root instruction is a binary operation, vectorization of the operands is
/// attempted.		/// attempted.
/// \returns true if a horizontal reduction was matched and reduced or operands		/// \returns true if a horizontal reduction was matched and reduced or operands
/// of one of the binary instruction were vectorized.		/// of one of the binary instruction were vectorized.
[24 lines not shown]	static bool tryToVectorizeHorReductionOrInstOperands(
SmallVector<std::pair<Instruction *, unsigned>, 8> Stack(1, {Root, 0});		SmallVector<std::pair<Instruction *, unsigned>, 8> Stack(1, {Root, 0});
SmallPtrSet<Value *, 8> VisitedInstrs;		SmallPtrSet<Value *, 8> VisitedInstrs;
bool Res = false;		bool Res = false;
while (!Stack.empty()) {		while (!Stack.empty()) {
Instruction *Inst;		Instruction *Inst;
unsigned Level;		unsigned Level;
std::tie(Inst, Level) = Stack.pop_back_val();		std::tie(Inst, Level) = Stack.pop_back_val();
Value *B0, *B1;		Value *B0, *B1;
bool IsBinop = match(Inst, m_BinOp(m_Value(B0), m_Value(B1)));		bool IsBinop = matchRdxBop(Inst, B0, B1);
bool IsSelect = match(Inst, m_Select(m_Value(), m_Value(), m_Value()));		bool IsSelect = match(Inst, m_Select(m_Value(), m_Value(), m_Value()));
if (IsBinop || IsSelect) {		if (IsBinop || IsSelect) {
HorizontalReduction HorRdx;		HorizontalReduction HorRdx;
if (HorRdx.matchAssociativeReduction(P, Inst)) {		if (HorRdx.matchAssociativeReduction(P, Inst)) {
if (HorRdx.tryToReduce(R, TTI)) {		if (HorRdx.tryToReduce(R, TTI)) {
Res = true;		Res = true;
// Set P to nullptr to avoid re-analysis of phi node in		// Set P to nullptr to avoid re-analysis of phi node in
// matchAssociativeReduction function unless this is the root node.		// matchAssociativeReduction function unless this is the root node.
[394 lines not shown]

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-expanded.ll

	[324 lines not shown]

	for.end:			for.end:
	ret float %r.0			ret float %r.0
	}			}

	define float @fmin_v4i32(float* %p) #0 {			define float @fmin_v4i32(float* %p) #0 {
	; CHECK-LABEL: @fmin_v4i32(			; CHECK-LABEL: @fmin_v4i32(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[P:%.*]], align 4, [[TBAA7]]			; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[P:%.*]] to <4 x float>*
	; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[P]], i64 1			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4, [[TBAA7]]
	; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX_1]], align 4, [[TBAA7]]			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP1]], <4 x float> poison, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @llvm.minnum.f32(float [[TMP1]], float [[TMP0]])			; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = fcmp fast olt <4 x float> [[TMP1]], [[RDX_SHUF]]
	; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds float, float* [[P]], i64 2			; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select fast <4 x i1> [[RDX_MINMAX_CMP]], <4 x float> [[TMP1]], <4 x float> [[RDX_SHUF]]
	; CHECK-NEXT: [[TMP3:%.*]] = load float, float* [[ARRAYIDX_2]], align 4, [[TBAA7]]			; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <4 x float> [[RDX_MINMAX_SELECT]], <4 x float> poison, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP4:%.*]] = tail call fast float @llvm.minnum.f32(float [[TMP3]], float [[TMP2]])			; CHECK-NEXT: [[RDX_MINMAX_CMP4:%.*]] = fcmp fast olt <4 x float> [[RDX_MINMAX_SELECT]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds float, float* [[P]], i64 3			; CHECK-NEXT: [[RDX_MINMAX_SELECT5:%.*]] = select fast <4 x i1> [[RDX_MINMAX_CMP4]], <4 x float> [[RDX_MINMAX_SELECT]], <4 x float> [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP5:%.*]] = load float, float* [[ARRAYIDX_3]], align 4, [[TBAA7]]			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[RDX_MINMAX_SELECT5]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = tail call fast float @llvm.minnum.f32(float [[TMP5]], float [[TMP4]])			; CHECK-NEXT: ret float [[TMP2]]
	; CHECK-NEXT: ret float [[TMP6]]
	;			;
	entry:			entry:
	br label %for.cond			br label %for.cond

	for.cond:			for.cond:
	%r.0 = phi float [ 0x47EFFFFFE0000000, %entry ], [ %cond, %for.inc ]			%r.0 = phi float [ 0x47EFFFFFE0000000, %entry ], [ %cond, %for.inc ]
	%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]			%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
	%cmp = icmp slt i32 %i.0, 4			%cmp = icmp slt i32 %i.0, 4
	[48 lines not shown]
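
The expanded reduction in the updated CHECK lines (two shuffle, compare, select steps followed by an extract of lane 0) can be mimicked in plain C++. This is an illustrative model only: `Vec4`, `minChain`, `minStep`, and `minShuffleReduce` are hypothetical names, `std::fmin` stands in for `llvm.minnum`, and the lanes that are `undef` in the IR masks are simply left unchanged here.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Vec4 = std::array<float, 4>;

// Scalar chain in the order the original IR applies it:
// minnum(x3, minnum(x2, minnum(x1, x0))).
float minChain(const Vec4 &V) {
  float R = V[0];
  for (std::size_t I = 1; I < V.size(); ++I)
    R = std::fmin(V[I], R);
  return R;
}

// One shuffle + 'fcmp olt' + select step: lane I is compared with lane
// I + Offset and the smaller value is kept. Lanes >= Offset are untouched.
Vec4 minStep(const Vec4 &V, std::size_t Offset) {
  assert(Offset == 1 || Offset == 2);
  Vec4 Out = V;
  for (std::size_t I = 0; I < Offset; ++I) {
    float Shuffled = V[I + Offset];
    Out[I] = (V[I] < Shuffled) ? V[I] : Shuffled;
  }
  return Out;
}

// Two halving steps, then read lane 0.
float minShuffleReduce(const Vec4 &V) {
  Vec4 Step1 = minStep(V, 2);     // mask <2, 3, undef, undef>
  Vec4 Step2 = minStep(Step1, 1); // mask <1, undef, undef, undef>
  return Step2[0];
}
```

For inputs without NaN the two functions agree. When a lane holds NaN they can differ, because an ordered compare against NaN is false and the select then takes the shuffled lane; that is one reason the transform needs at least `nnan`.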

llvm/test/Transforms/SLPVectorizer/X86/fmaxnum.ll

[337 lines not shown]	;
ret void		ret void
}		}

define float @reduction_v4f32_fast(float* %p) {		define float @reduction_v4f32_fast(float* %p) {
; CHECK-LABEL: @reduction_v4f32_fast(		; CHECK-LABEL: @reduction_v4f32_fast(
; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds float, float* [[P:%.*]], i64 1		; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds float, float* [[P:%.*]], i64 1
; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds float, float* [[P]], i64 2		; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds float, float* [[P]], i64 2
; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds float, float* [[P]], i64 3		; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds float, float* [[P]], i64 3
; CHECK-NEXT: [[T0:%.*]] = load float, float* [[P]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[P]] to <4 x float>*
; CHECK-NEXT: [[T1:%.*]] = load float, float* [[G1]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
; CHECK-NEXT: [[T2:%.*]] = load float, float* [[G2]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = call fast float @llvm.vector.reduce.fmax.v4f32(<4 x float> [[TMP2]])
; CHECK-NEXT: [[T3:%.*]] = load float, float* [[G3]], align 4		; CHECK-NEXT: ret float [[TMP3]]
; CHECK-NEXT: [[M1:%.*]] = tail call fast float @llvm.maxnum.f32(float [[T1]], float [[T0]])
; CHECK-NEXT: [[M2:%.*]] = tail call fast float @llvm.maxnum.f32(float [[T2]], float [[M1]])
; CHECK-NEXT: [[M3:%.*]] = tail call fast float @llvm.maxnum.f32(float [[T3]], float [[M2]])
; CHECK-NEXT: ret float [[M3]]
;		;
%g1 = getelementptr inbounds float, float* %p, i64 1		%g1 = getelementptr inbounds float, float* %p, i64 1
%g2 = getelementptr inbounds float, float* %p, i64 2		%g2 = getelementptr inbounds float, float* %p, i64 2
%g3 = getelementptr inbounds float, float* %p, i64 3		%g3 = getelementptr inbounds float, float* %p, i64 3
%t0 = load float, float* %p, align 4		%t0 = load float, float* %p, align 4
%t1 = load float, float* %g1, align 4		%t1 = load float, float* %g1, align 4
%t2 = load float, float* %g2, align 4		%t2 = load float, float* %g2, align 4
%t3 = load float, float* %g3, align 4		%t3 = load float, float* %g3, align 4
%m1 = tail call fast float @llvm.maxnum.f32(float %t1, float %t0)		%m1 = tail call fast float @llvm.maxnum.f32(float %t1, float %t0)
%m2 = tail call fast float @llvm.maxnum.f32(float %t2, float %m1)		%m2 = tail call fast float @llvm.maxnum.f32(float %t2, float %m1)
%m3 = tail call fast float @llvm.maxnum.f32(float %t3, float %m2)		%m3 = tail call fast float @llvm.maxnum.f32(float %t3, float %m2)
ret float %m3		ret float %m3
}		}

		; TODO: This should become a reduce intrinsic.

define float @reduction_v4f32_nnan(float* %p) {		define float @reduction_v4f32_nnan(float* %p) {
; CHECK-LABEL: @reduction_v4f32_nnan(		; CHECK-LABEL: @reduction_v4f32_nnan(
; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds float, float* [[P:%.*]], i64 1		; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds float, float* [[P:%.*]], i64 1
; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds float, float* [[P]], i64 2		; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds float, float* [[P]], i64 2
; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds float, float* [[P]], i64 3		; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds float, float* [[P]], i64 3
; CHECK-NEXT: [[T0:%.*]] = load float, float* [[P]], align 4		; CHECK-NEXT: [[T0:%.*]] = load float, float* [[P]], align 4
; CHECK-NEXT: [[T1:%.*]] = load float, float* [[G1]], align 4		; CHECK-NEXT: [[T1:%.*]] = load float, float* [[G1]], align 4
; CHECK-NEXT: [[T2:%.*]] = load float, float* [[G2]], align 4		; CHECK-NEXT: [[T2:%.*]] = load float, float* [[G2]], align 4
[11 lines not shown]	;
%t2 = load float, float* %g2, align 4		%t2 = load float, float* %g2, align 4
%t3 = load float, float* %g3, align 4		%t3 = load float, float* %g3, align 4
%m1 = tail call nnan float @llvm.maxnum.f32(float %t1, float %t0)		%m1 = tail call nnan float @llvm.maxnum.f32(float %t1, float %t0)
%m2 = tail call nnan float @llvm.maxnum.f32(float %t2, float %m1)		%m2 = tail call nnan float @llvm.maxnum.f32(float %t2, float %m1)
%m3 = tail call nnan float @llvm.maxnum.f32(float %t3, float %m2)		%m3 = tail call nnan float @llvm.maxnum.f32(float %t3, float %m2)
ret float %m3		ret float %m3
}		}

		; Negative test - must have nnan.

define float @reduction_v4f32_not_fast(float* %p) {		define float @reduction_v4f32_not_fast(float* %p) {
; CHECK-LABEL: @reduction_v4f32_not_fast(		; CHECK-LABEL: @reduction_v4f32_not_fast(
; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds float, float* [[P:%.*]], i64 1		; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds float, float* [[P:%.*]], i64 1
; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds float, float* [[P]], i64 2		; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds float, float* [[P]], i64 2
; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds float, float* [[P]], i64 3		; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds float, float* [[P]], i64 3
; CHECK-NEXT: [[T0:%.*]] = load float, float* [[P]], align 4		; CHECK-NEXT: [[T0:%.*]] = load float, float* [[P]], align 4
; CHECK-NEXT: [[T1:%.*]] = load float, float* [[G1]], align 4		; CHECK-NEXT: [[T1:%.*]] = load float, float* [[G1]], align 4
; CHECK-NEXT: [[T2:%.*]] = load float, float* [[G2]], align 4		; CHECK-NEXT: [[T2:%.*]] = load float, float* [[G2]], align 4
[20 lines not shown]
; CHECK-LABEL: @reduction_v8f32_fast(		; CHECK-LABEL: @reduction_v8f32_fast(
; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds float, float* [[P:%.*]], i64 1		; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds float, float* [[P:%.*]], i64 1
; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds float, float* [[P]], i64 2		; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds float, float* [[P]], i64 2
; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds float, float* [[P]], i64 3		; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds float, float* [[P]], i64 3
; CHECK-NEXT: [[G4:%.*]] = getelementptr inbounds float, float* [[P]], i64 4		; CHECK-NEXT: [[G4:%.*]] = getelementptr inbounds float, float* [[P]], i64 4
; CHECK-NEXT: [[G5:%.*]] = getelementptr inbounds float, float* [[P]], i64 5		; CHECK-NEXT: [[G5:%.*]] = getelementptr inbounds float, float* [[P]], i64 5
; CHECK-NEXT: [[G6:%.*]] = getelementptr inbounds float, float* [[P]], i64 6		; CHECK-NEXT: [[G6:%.*]] = getelementptr inbounds float, float* [[P]], i64 6
; CHECK-NEXT: [[G7:%.*]] = getelementptr inbounds float, float* [[P]], i64 7		; CHECK-NEXT: [[G7:%.*]] = getelementptr inbounds float, float* [[P]], i64 7
; CHECK-NEXT: [[T0:%.*]] = load float, float* [[P]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[P]] to <8 x float>*
; CHECK-NEXT: [[T1:%.*]] = load float, float* [[G1]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* [[TMP1]], align 4
; CHECK-NEXT: [[T2:%.*]] = load float, float* [[G2]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = call fast float @llvm.vector.reduce.fmax.v8f32(<8 x float> [[TMP2]])
; CHECK-NEXT: [[T3:%.*]] = load float, float* [[G3]], align 4		; CHECK-NEXT: ret float [[TMP3]]
; CHECK-NEXT: [[T4:%.*]] = load float, float* [[G4]], align 4
; CHECK-NEXT: [[T5:%.*]] = load float, float* [[G5]], align 4
; CHECK-NEXT: [[T6:%.*]] = load float, float* [[G6]], align 4
; CHECK-NEXT: [[T7:%.*]] = load float, float* [[G7]], align 4
; CHECK-NEXT: [[M1:%.*]] = tail call fast float @llvm.maxnum.f32(float [[T1]], float [[T0]])
; CHECK-NEXT: [[M2:%.*]] = tail call fast float @llvm.maxnum.f32(float [[T2]], float [[M1]])
; CHECK-NEXT: [[M3:%.*]] = tail call fast float @llvm.maxnum.f32(float [[T3]], float [[M2]])
; CHECK-NEXT: [[M4:%.*]] = tail call fast float @llvm.maxnum.f32(float [[T4]], float [[M3]])
; CHECK-NEXT: [[M5:%.*]] = tail call fast float @llvm.maxnum.f32(float [[M4]], float [[T6]])
; CHECK-NEXT: [[M6:%.*]] = tail call fast float @llvm.maxnum.f32(float [[M5]], float [[T5]])
; CHECK-NEXT: [[M7:%.*]] = tail call fast float @llvm.maxnum.f32(float [[M6]], float [[T7]])
; CHECK-NEXT: ret float [[M7]]
;		;
%g1 = getelementptr inbounds float, float* %p, i64 1		%g1 = getelementptr inbounds float, float* %p, i64 1
%g2 = getelementptr inbounds float, float* %p, i64 2		%g2 = getelementptr inbounds float, float* %p, i64 2
%g3 = getelementptr inbounds float, float* %p, i64 3		%g3 = getelementptr inbounds float, float* %p, i64 3
%g4 = getelementptr inbounds float, float* %p, i64 4		%g4 = getelementptr inbounds float, float* %p, i64 4
%g5 = getelementptr inbounds float, float* %p, i64 5		%g5 = getelementptr inbounds float, float* %p, i64 5
%g6 = getelementptr inbounds float, float* %p, i64 6		%g6 = getelementptr inbounds float, float* %p, i64 6
%g7 = getelementptr inbounds float, float* %p, i64 7		%g7 = getelementptr inbounds float, float* %p, i64 7
[30 lines not shown]	;
ret double %m1		ret double %m1
}		}

define double @reduction_v4f64_fast(double* %p) {		define double @reduction_v4f64_fast(double* %p) {
; CHECK-LABEL: @reduction_v4f64_fast(		; CHECK-LABEL: @reduction_v4f64_fast(
; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds double, double* [[P:%.*]], i64 1		; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds double, double* [[P:%.*]], i64 1
; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds double, double* [[P]], i64 2		; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds double, double* [[P]], i64 2
; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds double, double* [[P]], i64 3		; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds double, double* [[P]], i64 3
; CHECK-NEXT: [[T0:%.*]] = load double, double* [[P]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[P]] to <4 x double>*
; CHECK-NEXT: [[T1:%.*]] = load double, double* [[G1]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* [[TMP1]], align 4
; CHECK-NEXT: [[T2:%.*]] = load double, double* [[G2]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = call fast double @llvm.vector.reduce.fmax.v4f64(<4 x double> [[TMP2]])
; CHECK-NEXT: [[T3:%.*]] = load double, double* [[G3]], align 4		; CHECK-NEXT: ret double [[TMP3]]
; CHECK-NEXT: [[M1:%.*]] = tail call fast double @llvm.maxnum.f64(double [[T1]], double [[T0]])
; CHECK-NEXT: [[M2:%.*]] = tail call fast double @llvm.maxnum.f64(double [[T2]], double [[M1]])
; CHECK-NEXT: [[M3:%.*]] = tail call fast double @llvm.maxnum.f64(double [[T3]], double [[M2]])
; CHECK-NEXT: ret double [[M3]]
;		;
%g1 = getelementptr inbounds double, double* %p, i64 1		%g1 = getelementptr inbounds double, double* %p, i64 1
%g2 = getelementptr inbounds double, double* %p, i64 2		%g2 = getelementptr inbounds double, double* %p, i64 2
%g3 = getelementptr inbounds double, double* %p, i64 3		%g3 = getelementptr inbounds double, double* %p, i64 3
%t0 = load double, double* %p, align 4		%t0 = load double, double* %p, align 4
%t1 = load double, double* %g1, align 4		%t1 = load double, double* %g1, align 4
%t2 = load double, double* %g2, align 4		%t2 = load double, double* %g2, align 4
%t3 = load double, double* %g3, align 4		%t3 = load double, double* %g3, align 4
%m1 = tail call fast double @llvm.maxnum.f64(double %t1, double %t0)		%m1 = tail call fast double @llvm.maxnum.f64(double %t1, double %t0)
%m2 = tail call fast double @llvm.maxnum.f64(double %t2, double %m1)		%m2 = tail call fast double @llvm.maxnum.f64(double %t2, double %m1)
%m3 = tail call fast double @llvm.maxnum.f64(double %t3, double %m2)		%m3 = tail call fast double @llvm.maxnum.f64(double %t3, double %m2)
ret double %m3		ret double %m3
}		}

		; Negative test - must have nnan.

define double @reduction_v4f64_wrong_fmf(double* %p) {		define double @reduction_v4f64_wrong_fmf(double* %p) {
; CHECK-LABEL: @reduction_v4f64_wrong_fmf(		; CHECK-LABEL: @reduction_v4f64_wrong_fmf(
; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds double, double* [[P:%.*]], i64 1		; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds double, double* [[P:%.*]], i64 1
; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds double, double* [[P]], i64 2		; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds double, double* [[P]], i64 2
; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds double, double* [[P]], i64 3		; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds double, double* [[P]], i64 3
; CHECK-NEXT: [[T0:%.*]] = load double, double* [[P]], align 4		; CHECK-NEXT: [[T0:%.*]] = load double, double* [[P]], align 4
; CHECK-NEXT: [[T1:%.*]] = load double, double* [[G1]], align 4		; CHECK-NEXT: [[T1:%.*]] = load double, double* [[G1]], align 4
; CHECK-NEXT: [[T2:%.*]] = load double, double* [[G2]], align 4		; CHECK-NEXT: [[T2:%.*]] = load double, double* [[G2]], align 4
[20 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/fminnum.ll

[337 lines not shown]	;
ret void		ret void
}		}

define float @reduction_v4f32_fast(float* %p) {		define float @reduction_v4f32_fast(float* %p) {
; CHECK-LABEL: @reduction_v4f32_fast(		; CHECK-LABEL: @reduction_v4f32_fast(
; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds float, float* [[P:%.*]], i64 1		; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds float, float* [[P:%.*]], i64 1
; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds float, float* [[P]], i64 2		; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds float, float* [[P]], i64 2
; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds float, float* [[P]], i64 3		; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds float, float* [[P]], i64 3
; CHECK-NEXT: [[T0:%.*]] = load float, float* [[P]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[P]] to <4 x float>*
; CHECK-NEXT: [[T1:%.*]] = load float, float* [[G1]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
; CHECK-NEXT: [[T2:%.*]] = load float, float* [[G2]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = call fast float @llvm.vector.reduce.fmin.v4f32(<4 x float> [[TMP2]])
; CHECK-NEXT: [[T3:%.*]] = load float, float* [[G3]], align 4		; CHECK-NEXT: ret float [[TMP3]]
; CHECK-NEXT: [[M1:%.*]] = tail call fast float @llvm.minnum.f32(float [[T1]], float [[T0]])
; CHECK-NEXT: [[M2:%.*]] = tail call fast float @llvm.minnum.f32(float [[T2]], float [[M1]])
; CHECK-NEXT: [[M3:%.*]] = tail call fast float @llvm.minnum.f32(float [[T3]], float [[M2]])
; CHECK-NEXT: ret float [[M3]]
;		;
%g1 = getelementptr inbounds float, float* %p, i64 1		%g1 = getelementptr inbounds float, float* %p, i64 1
%g2 = getelementptr inbounds float, float* %p, i64 2		%g2 = getelementptr inbounds float, float* %p, i64 2
%g3 = getelementptr inbounds float, float* %p, i64 3		%g3 = getelementptr inbounds float, float* %p, i64 3
%t0 = load float, float* %p, align 4		%t0 = load float, float* %p, align 4
%t1 = load float, float* %g1, align 4		%t1 = load float, float* %g1, align 4
%t2 = load float, float* %g2, align 4		%t2 = load float, float* %g2, align 4
%t3 = load float, float* %g3, align 4		%t3 = load float, float* %g3, align 4
%m1 = tail call fast float @llvm.minnum.f32(float %t1, float %t0)		%m1 = tail call fast float @llvm.minnum.f32(float %t1, float %t0)
%m2 = tail call fast float @llvm.minnum.f32(float %t2, float %m1)		%m2 = tail call fast float @llvm.minnum.f32(float %t2, float %m1)
%m3 = tail call fast float @llvm.minnum.f32(float %t3, float %m2)		%m3 = tail call fast float @llvm.minnum.f32(float %t3, float %m2)
ret float %m3		ret float %m3
}		}

		; TODO: This should become a reduce intrinsic.

define float @reduction_v4f32_nnan(float* %p) {		define float @reduction_v4f32_nnan(float* %p) {
; CHECK-LABEL: @reduction_v4f32_nnan(		; CHECK-LABEL: @reduction_v4f32_nnan(
; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds float, float* [[P:%.*]], i64 1		; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds float, float* [[P:%.*]], i64 1
; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds float, float* [[P]], i64 2		; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds float, float* [[P]], i64 2
; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds float, float* [[P]], i64 3		; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds float, float* [[P]], i64 3
; CHECK-NEXT: [[T0:%.*]] = load float, float* [[P]], align 4		; CHECK-NEXT: [[T0:%.*]] = load float, float* [[P]], align 4
; CHECK-NEXT: [[T1:%.*]] = load float, float* [[G1]], align 4		; CHECK-NEXT: [[T1:%.*]] = load float, float* [[G1]], align 4
; CHECK-NEXT: [[T2:%.*]] = load float, float* [[G2]], align 4		; CHECK-NEXT: [[T2:%.*]] = load float, float* [[G2]], align 4
[11 lines not shown]	;
%t2 = load float, float* %g2, align 4		%t2 = load float, float* %g2, align 4
%t3 = load float, float* %g3, align 4		%t3 = load float, float* %g3, align 4
%m1 = tail call nnan float @llvm.minnum.f32(float %t1, float %t0)		%m1 = tail call nnan float @llvm.minnum.f32(float %t1, float %t0)
%m2 = tail call nnan float @llvm.minnum.f32(float %t2, float %m1)		%m2 = tail call nnan float @llvm.minnum.f32(float %t2, float %m1)
%m3 = tail call nnan float @llvm.minnum.f32(float %t3, float %m2)		%m3 = tail call nnan float @llvm.minnum.f32(float %t3, float %m2)
ret float %m3		ret float %m3
}		}

		; Negative test - must have nnan.

define float @reduction_v4f32_wrong_fmf(float* %p) {		define float @reduction_v4f32_wrong_fmf(float* %p) {
; CHECK-LABEL: @reduction_v4f32_wrong_fmf(		; CHECK-LABEL: @reduction_v4f32_wrong_fmf(
; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds float, float* [[P:%.*]], i64 1		; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds float, float* [[P:%.*]], i64 1
; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds float, float* [[P]], i64 2		; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds float, float* [[P]], i64 2
; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds float, float* [[P]], i64 3		; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds float, float* [[P]], i64 3
; CHECK-NEXT: [[T0:%.*]] = load float, float* [[P]], align 4		; CHECK-NEXT: [[T0:%.*]] = load float, float* [[P]], align 4
; CHECK-NEXT: [[T1:%.*]] = load float, float* [[G1]], align 4		; CHECK-NEXT: [[T1:%.*]] = load float, float* [[G1]], align 4
; CHECK-NEXT: [[T2:%.*]] = load float, float* [[G2]], align 4		; CHECK-NEXT: [[T2:%.*]] = load float, float* [[G2]], align 4
Show All 20 Lines
; CHECK-LABEL: @reduction_v8f32_fast(		; CHECK-LABEL: @reduction_v8f32_fast(
; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds float, float* [[P:%.*]], i64 1		; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds float, float* [[P:%.*]], i64 1
; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds float, float* [[P]], i64 2		; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds float, float* [[P]], i64 2
; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds float, float* [[P]], i64 3		; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds float, float* [[P]], i64 3
; CHECK-NEXT: [[G4:%.*]] = getelementptr inbounds float, float* [[P]], i64 4		; CHECK-NEXT: [[G4:%.*]] = getelementptr inbounds float, float* [[P]], i64 4
; CHECK-NEXT: [[G5:%.*]] = getelementptr inbounds float, float* [[P]], i64 5		; CHECK-NEXT: [[G5:%.*]] = getelementptr inbounds float, float* [[P]], i64 5
; CHECK-NEXT: [[G6:%.*]] = getelementptr inbounds float, float* [[P]], i64 6		; CHECK-NEXT: [[G6:%.*]] = getelementptr inbounds float, float* [[P]], i64 6
; CHECK-NEXT: [[G7:%.*]] = getelementptr inbounds float, float* [[P]], i64 7		; CHECK-NEXT: [[G7:%.*]] = getelementptr inbounds float, float* [[P]], i64 7
; CHECK-NEXT: [[T0:%.*]] = load float, float* [[P]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[P]] to <8 x float>*
; CHECK-NEXT: [[T1:%.*]] = load float, float* [[G1]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* [[TMP1]], align 4
; CHECK-NEXT: [[T2:%.*]] = load float, float* [[G2]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = call fast float @llvm.vector.reduce.fmin.v8f32(<8 x float> [[TMP2]])
; CHECK-NEXT: [[T3:%.*]] = load float, float* [[G3]], align 4		; CHECK-NEXT: ret float [[TMP3]]
; CHECK-NEXT: [[T4:%.*]] = load float, float* [[G4]], align 4
; CHECK-NEXT: [[T5:%.*]] = load float, float* [[G5]], align 4
; CHECK-NEXT: [[T6:%.*]] = load float, float* [[G6]], align 4
; CHECK-NEXT: [[T7:%.*]] = load float, float* [[G7]], align 4
; CHECK-NEXT: [[M1:%.*]] = tail call fast float @llvm.minnum.f32(float [[T1]], float [[T0]])
; CHECK-NEXT: [[M2:%.*]] = tail call fast float @llvm.minnum.f32(float [[T2]], float [[M1]])
; CHECK-NEXT: [[M3:%.*]] = tail call fast float @llvm.minnum.f32(float [[T3]], float [[M2]])
; CHECK-NEXT: [[M4:%.*]] = tail call fast float @llvm.minnum.f32(float [[T4]], float [[M3]])
; CHECK-NEXT: [[M5:%.*]] = tail call fast float @llvm.minnum.f32(float [[M4]], float [[T6]])
; CHECK-NEXT: [[M6:%.*]] = tail call fast float @llvm.minnum.f32(float [[M5]], float [[T5]])
; CHECK-NEXT: [[M7:%.*]] = tail call fast float @llvm.minnum.f32(float [[M6]], float [[T7]])
; CHECK-NEXT: ret float [[M7]]
;		;
%g1 = getelementptr inbounds float, float* %p, i64 1		%g1 = getelementptr inbounds float, float* %p, i64 1
%g2 = getelementptr inbounds float, float* %p, i64 2		%g2 = getelementptr inbounds float, float* %p, i64 2
%g3 = getelementptr inbounds float, float* %p, i64 3		%g3 = getelementptr inbounds float, float* %p, i64 3
%g4 = getelementptr inbounds float, float* %p, i64 4		%g4 = getelementptr inbounds float, float* %p, i64 4
%g5 = getelementptr inbounds float, float* %p, i64 5		%g5 = getelementptr inbounds float, float* %p, i64 5
%g6 = getelementptr inbounds float, float* %p, i64 6		%g6 = getelementptr inbounds float, float* %p, i64 6
%g7 = getelementptr inbounds float, float* %p, i64 7		%g7 = getelementptr inbounds float, float* %p, i64 7
Show All 30 Lines	;
ret double %m1		ret double %m1
}		}

define double @reduction_v4f64_fast(double* %p) {		define double @reduction_v4f64_fast(double* %p) {
; CHECK-LABEL: @reduction_v4f64_fast(		; CHECK-LABEL: @reduction_v4f64_fast(
; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds double, double* [[P:%.*]], i64 1		; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds double, double* [[P:%.*]], i64 1
; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds double, double* [[P]], i64 2		; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds double, double* [[P]], i64 2
; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds double, double* [[P]], i64 3		; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds double, double* [[P]], i64 3
; CHECK-NEXT: [[T0:%.*]] = load double, double* [[P]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[P]] to <4 x double>*
; CHECK-NEXT: [[T1:%.*]] = load double, double* [[G1]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = load <4 x double>, <4 x double>* [[TMP1]], align 4
; CHECK-NEXT: [[T2:%.*]] = load double, double* [[G2]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = call fast double @llvm.vector.reduce.fmin.v4f64(<4 x double> [[TMP2]])
; CHECK-NEXT: [[T3:%.*]] = load double, double* [[G3]], align 4		; CHECK-NEXT: ret double [[TMP3]]
; CHECK-NEXT: [[M1:%.*]] = tail call fast double @llvm.minnum.f64(double [[T1]], double [[T0]])
; CHECK-NEXT: [[M2:%.*]] = tail call fast double @llvm.minnum.f64(double [[T2]], double [[M1]])
; CHECK-NEXT: [[M3:%.*]] = tail call fast double @llvm.minnum.f64(double [[T3]], double [[M2]])
; CHECK-NEXT: ret double [[M3]]
;		;
%g1 = getelementptr inbounds double, double* %p, i64 1		%g1 = getelementptr inbounds double, double* %p, i64 1
%g2 = getelementptr inbounds double, double* %p, i64 2		%g2 = getelementptr inbounds double, double* %p, i64 2
%g3 = getelementptr inbounds double, double* %p, i64 3		%g3 = getelementptr inbounds double, double* %p, i64 3
%t0 = load double, double* %p, align 4		%t0 = load double, double* %p, align 4
%t1 = load double, double* %g1, align 4		%t1 = load double, double* %g1, align 4
%t2 = load double, double* %g2, align 4		%t2 = load double, double* %g2, align 4
%t3 = load double, double* %g3, align 4		%t3 = load double, double* %g3, align 4
%m1 = tail call fast double @llvm.minnum.f64(double %t1, double %t0)		%m1 = tail call fast double @llvm.minnum.f64(double %t1, double %t0)
%m2 = tail call fast double @llvm.minnum.f64(double %t2, double %m1)		%m2 = tail call fast double @llvm.minnum.f64(double %t2, double %m1)
%m3 = tail call fast double @llvm.minnum.f64(double %t3, double %m2)		%m3 = tail call fast double @llvm.minnum.f64(double %t3, double %m2)
ret double %m3		ret double %m3
}		}

		; Negative test - must have nnan.

define double @reduction_v4f64_not_fast(double* %p) {		define double @reduction_v4f64_not_fast(double* %p) {
; CHECK-LABEL: @reduction_v4f64_not_fast(		; CHECK-LABEL: @reduction_v4f64_not_fast(
; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds double, double* [[P:%.*]], i64 1		; CHECK-NEXT: [[G1:%.*]] = getelementptr inbounds double, double* [[P:%.*]], i64 1
; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds double, double* [[P]], i64 2		; CHECK-NEXT: [[G2:%.*]] = getelementptr inbounds double, double* [[P]], i64 2
; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds double, double* [[P]], i64 3		; CHECK-NEXT: [[G3:%.*]] = getelementptr inbounds double, double* [[P]], i64 3
; CHECK-NEXT: [[T0:%.*]] = load double, double* [[P]], align 4		; CHECK-NEXT: [[T0:%.*]] = load double, double* [[P]], align 4
; CHECK-NEXT: [[T1:%.*]] = load double, double* [[G1]], align 4		; CHECK-NEXT: [[T1:%.*]] = load double, double* [[G1]], align 4
; CHECK-NEXT: [[T2:%.*]] = load double, double* [[G2]], align 4		; CHECK-NEXT: [[T2:%.*]] = load double, double* [[G2]], align 4
Show All 20 Lines