This is an archive of the discontinued LLVM Phabricator instance.

Differential D13121

Improve ISel across lane float min/max reduction
ClosedPublic

Authored by junbuml on Sep 23 2015, 3:40 PM.

Details

Reviewers

jmolloy
mcrosier

Summary

In vectorized float min/max reduction code, the final "reduce" step
is suboptimal. On AArch64, this change will combine:

svn0 = vector_shuffle t0, undef<2,3,u,u>
fmin = fminnum t0, svn0
svn1 = vector_shuffle fmin, undef<1,u,u,u>
cc = setcc fmin, svn1, ole
n0 = extract_vector_elt cc, #0
n1 = extract_vector_elt fmin, #0
n2 = extract_vector_elt fmin, #1
result = select n0, n1, n2

becomes:

result = llvm.aarch64.neon.fminnmv t0

This change extends r247575.
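The combine relies on the shuffle/select sequence and a single across-lane instruction computing the same value. The C++ sketch below checks this on scalars: it evaluates the pre-combine sequence from the summary lane by lane for a four-lane float vector and compares it with a plain minimum over all four lanes, which is what FMINNMV returns. It is not LLVM code, the helper names are hypothetical, and it assumes non-NaN inputs so that fminnum can be modelled with an ordinary `<` comparison.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Illustrative model only: a four-lane float vector as a plain array.
using V4 = std::array<float, 4>;

// Lane-wise minimum; stands in for fminnum on non-NaN inputs.
static V4 laneMin(const V4 &a, const V4 &b) {
  V4 r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = a[i] < b[i] ? a[i] : b[i];
  return r;
}

// The pre-combine sequence from the summary. Undef shuffle lanes are
// modelled as 0.0f; their value never matters because only lanes 0 and 1
// are read afterwards.
static float shuffleReduceMin(const V4 &t0) {
  V4 svn0 = {t0[2], t0[3], 0.0f, 0.0f};  // vector_shuffle t0, undef<2,3,u,u>
  V4 fmin = laneMin(t0, svn0);           // fminnum t0, svn0
  V4 svn1 = {fmin[1], 0.0f, 0.0f, 0.0f}; // vector_shuffle fmin, undef<1,u,u,u>
  bool cc0 = fmin[0] <= svn1[0];         // setcc ole, lane 0 only
  return cc0 ? fmin[0] : fmin[1];        // select n0, n1, n2
}

// What a single across-lane minimum (FMINNMV) yields for non-NaN inputs.
static float acrossLaneMin(const V4 &t0) {
  float m = t0[0];
  for (std::size_t i = 1; i < 4; ++i)
    m = t0[i] < m ? t0[i] : m;
  return m;
}
```

For example, both functions return 1.0f for the input {3, 1, 4, 2}, which is why the whole sequence can be replaced by one reduction instruction.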

Event Timeline

junbuml updated this revision to Diff 35564. Sep 23 2015, 3:40 PM

junbuml retitled this revision to Improve ISel across lane float min/max reduction.

junbuml updated this object.

Herald added a subscriber: aemerson. Sep 23 2015, 3:40 PM

junbuml added reviewers: jmolloy, mcrosier. Sep 23 2015, 3:42 PM

junbuml added a subscriber: llvm-commits.

junbuml updated this revision to Diff 35667. Sep 24 2015, 12:45 PM

jmolloy added inline comments. Sep 25 2015, 8:04 PM

lib/Target/AArch64/AArch64ISelLowering.cpp
8621–8627	What about FMAXNAN and FMINNAN (-> FMAXV, FMINV)?
8684	Why are you making this change? What's the rationale?

junbuml added inline comments. Sep 25 2015, 10:50 PM

lib/Target/AArch64/AArch64ISelLowering.cpp
8621–8627	I thought the same, but I wasn't able to generate FMAXNAN with a vector input. It appears that matchSelectPattern() cannot return SPNB_RETURNS_NAN with fcmp fast. And even without the fast-math flag, SPNB_RETURNS_NAN cannot be returned because both LHSSafe and RHSSafe in matchSelectPattern() are false for vector inputs. Please let me know if I missed something.
8684	The only reason I changed this to use intrinsics is to handle all the nodes consistently: FMAXNMV exists only as an intrinsic, with no corresponding SDNode. If an SDNode needs to be used, I will add an SDNode definition for FMAXNMV in the .td file. Please let me know.
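The FMAXNAN/FMINNAN question turns on NaN handling. As a rough scalar illustration (not LLVM code; the helper names are made up), ISD::FMINNUM and the FMINNM/FMINNMV instructions follow IEEE 754-2008 minNum and return the other operand when one input is a quiet NaN, whereas FMINNAN and the FMIN/FMINV instructions return NaN whenever either input is NaN. That difference is why FMINNUM maps to fminnmv here, while FMINNAN would need FMINV instead.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// minNum-style semantics (ISD::FMINNUM, FMINNM): a quiet NaN operand is
// ignored and the other operand is returned. std::fmin behaves this way.
static float minNumLike(float a, float b) { return std::fmin(a, b); }

// NaN-propagating semantics (ISD::FMINNAN, FMIN): any NaN operand yields NaN.
static float minNaNLike(float a, float b) {
  if (std::isnan(a) || std::isnan(b))
    return std::numeric_limits<float>::quiet_NaN();
  return a < b ? a : b;
}
```

The two helpers agree on ordinary numbers and differ only when a NaN is present, so a reduction of one kind cannot be lowered to the other kind's instruction.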

Hi Jun,

This looks fine if you address my last comment.

Thanks,

James

lib/Target/AArch64/AArch64ISelLowering.cpp
8621–8627	Yes, it does appear you're right. I'll have to go fix that.
8684	Given that we have these cross-lane nodes, it makes sense to use them in preference to the intrinsic version. As there are no such nodes for FMAXNMV and friends, the intrinsics make sense. I don't think there is any need to create ISDNodes for FMAXNMV and friends.

This revision is now accepted and ready to land. Oct 7 2015, 10:14 AM

Thanks, James, for the review. I will address your last comment.

junbuml updated this revision to Diff 36788. Oct 7 2015, 1:34 PM

junbuml edited edge metadata.

Just final minor changes.
I will commit this tomorrow unless there are any comments on these minor changes.

lib/Target/AArch64/AArch64ISelLowering.cpp
8684	Use the intrinsic for F[MAX|MIN]NMV, but use SDNodes for addv and [s|u][min|max]v.
8866	Add a minimum-size check in case the input vector is too small.
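Combined with the element-type and lane-count checks in the diff, the minimum-size guard narrows which vector types reach the shuffle matcher. The C++ sketch below restates only the type checks visible in this patch (at least 64 bits; f32 with exactly four lanes for FMAXNUM/FMINNUM; i8, i16, or i32 with 4, 8, or 16 lanes otherwise). `RedOp`, `VecType`, and `passesTypeGuards` are hypothetical names, and the DAG-shape and condition-code checks are not modelled.

```cpp
#include <cassert>

// Hypothetical stand-ins for the reduction operator and the vector type.
enum class RedOp { Add, SMax, UMax, SMin, UMin, FMaxNum, FMinNum };

struct VecType {
  bool isFloat;
  unsigned eltBits;
  unsigned numElts;
  unsigned sizeInBits() const { return eltBits * numElts; }
};

static bool isFPOp(RedOp op) {
  return op == RedOp::FMaxNum || op == RedOp::FMinNum;
}

// Type-level guards only, as read from the diff.
static bool passesTypeGuards(RedOp op, const VecType &vt) {
  if (vt.sizeInBits() < 64) // the minimum-size check added in this revision
    return false;
  if (isFPOp(op)) {
    if (!vt.isFloat || vt.eltBits != 32) // f32 elements only
      return false;
    return vt.numElts == 4;              // v4f32 only
  }
  if (vt.isFloat)
    return false;
  if (vt.eltBits != 8 && vt.eltBits != 16 && vt.eltBits != 32)
    return false;
  return vt.numElts == 4 || vt.numElts == 8 || vt.numElts == 16;
}
```

A v4i8 vector, for example, is only 32 bits wide: its element type and lane count would pass, so without the new check it would still reach the matcher.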

Committed in r249834.

junbuml closed this revision. Oct 9 2015, 7:15 AM

Hi Jun, I've just got around to looking at this, and you need to update the test names, please see below.

test/CodeGen/AArch64/aarch64-minmaxv.ll
289–291	This test needs to be renamed: the function body can be anything and the test will still pass. I've been caught out before by naming a function after the instruction I'm looking for. The `CHECK: fmaxnmv` will actually match the label in this case.
305–307	Same as above, please change the test name.
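The underlying pitfall is that a CHECK pattern only has to occur somewhere in the output as a substring, and here the function label itself contains the instruction name. The toy C++ model below reduces FileCheck to that substring search (it deliberately ignores CHECK-LABEL block splitting and match ordering, so it is not how FileCheck is implemented). The `toyCheck` helper and the `reduce_max` label are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of a single CHECK line: it passes if the pattern appears as a
// substring of any output line. Real FileCheck does much more; this only
// captures the substring behaviour behind the false positive.
static bool toyCheck(const std::vector<std::string> &asmLines,
                     const std::string &pattern) {
  for (const std::string &line : asmLines)
    if (line.find(pattern) != std::string::npos)
      return true;
  return false;
}
```

With a label named `f_fmaxnmv`, the check succeeds even if no `fmaxnmv` instruction is emitted; with a label that does not contain the mnemonic, it succeeds only when the instruction is really there.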

Thanks Charlie for finding this bug. I will fix it immediately!

Renamed the functions in r250052.

Revision Contents

Path                                          Size
lib/Target/AArch64/AArch64ISelLowering.cpp    59 lines
test/CodeGen/AArch64/aarch64-minmaxv.ll       33 lines

Diff 36788

lib/Target/AArch64/AArch64ISelLowering.cpp

[8,612 unchanged lines not shown]
 static SDValue tryMatchAcrossLaneShuffleForReduction(SDNode *N, SDValue OpV,
                                                      unsigned Op,
                                                      SelectionDAG &DAG) {
   EVT VTy = OpV->getOperand(0).getValueType();
   if (!VTy.isVector())
     return SDValue();

   int NumVecElts = VTy.getVectorNumElements();
-  if (NumVecElts != 4 && NumVecElts != 8 && NumVecElts != 16)
-    return SDValue();
+  if (Op == ISD::FMAXNUM || Op == ISD::FMINNUM) {
+    if (NumVecElts != 4)
+      return SDValue();
+  } else {
+    if (NumVecElts != 4 && NumVecElts != 8 && NumVecElts != 16)
+      return SDValue();
+  }

   int NumExpectedSteps = APInt(8, NumVecElts).logBase2();
   SDValue PreOp = OpV;
   // Iterate over each step of the across vector reduction.
   for (int CurStep = 0; CurStep != NumExpectedSteps; ++CurStep) {
     SDValue CurOp = PreOp.getOperand(0);
     SDValue Shuffle = PreOp.getOperand(1);
     if (Shuffle.getOpcode() != ISD::VECTOR_SHUFFLE) {
[34 unchanged lines not shown, inside: for (int CurStep = 0; CurStep != NumExpectedSteps; ++CurStep) {]
     for (int i = 0; i < NumVecElts; ++i)
       if ((i < NumMaskElts && Mask[i] != (NumMaskElts + i)) ||
           (i >= NumMaskElts && !(Mask[i] < 0)))
         return SDValue();

     PreOp = CurOp;
   }
   unsigned Opcode;
+  bool IsIntrinsic = false;

   switch (Op) {
   default:
     llvm_unreachable("Unexpected operator for across vector reduction");
   case ISD::ADD:
     Opcode = AArch64ISD::UADDV;
     break;
   case ISD::SMAX:
     Opcode = AArch64ISD::SMAXV;
     break;
   case ISD::UMAX:
     Opcode = AArch64ISD::UMAXV;
     break;
   case ISD::SMIN:
     Opcode = AArch64ISD::SMINV;
     break;
   case ISD::UMIN:
     Opcode = AArch64ISD::UMINV;
     break;
+  case ISD::FMAXNUM:
+    Opcode = Intrinsic::aarch64_neon_fmaxnmv;
+    IsIntrinsic = true;
+    break;
+  case ISD::FMINNUM:
+    Opcode = Intrinsic::aarch64_neon_fminnmv;
+    IsIntrinsic = true;
+    break;
   }
   SDLoc DL(N);
-  return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, N->getValueType(0),
-                     DAG.getNode(Opcode, DL, PreOp.getSimpleValueType(), PreOp),
-                     DAG.getConstant(0, DL, MVT::i64));
+  return IsIntrinsic
+             ? DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, N->getValueType(0),
+                           DAG.getConstant(Opcode, DL, MVT::i32), PreOp)
+             : DAG.getNode(
+                   ISD::EXTRACT_VECTOR_ELT, DL, N->getValueType(0),
+                   DAG.getNode(Opcode, DL, PreOp.getSimpleValueType(), PreOp),
+                   DAG.getConstant(0, DL, MVT::i64));
 }

 /// Target-specific DAG combine for the across vector min/max reductions.
 /// This function specifically handles the final clean-up step of the vector
 /// min/max reductions produced by the LoopVectorizer. It is the log2-shuffle
 /// pattern, which narrows down and finds the final min/max value from all
 /// elements of the vector.
 /// For example, for a <16 x i8> vector :
 /// svn0 = vector_shuffle %0, undef<8,9,10,11,12,13,14,15,u,u,u,u,u,u,u,u>
 /// %smax0 = smax %arr, svn0
 /// %svn1 = vector_shuffle %smax0, undef<4,5,6,7,u,u,u,u,u,u,u,u,u,u,u,u>
 /// %smax1 = smax %smax0, %svn1
 /// %svn2 = vector_shuffle %smax1, undef<2,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u>
 /// %smax2 = smax %smax1, svn2
 /// %svn3 = vector_shuffle %smax2, undef<1,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u>
 /// %sc = setcc %smax2, %svn3, gt
 /// %n0 = extract_vector_elt %sc, #0
 /// %n1 = extract_vector_elt %smax2, #0
 /// %n2 = extract_vector_elt $smax2, #1
 /// %result = select %n0, %n1, n2
 /// becomes :
 /// %1 = smaxv %0
 /// %result = extract_vector_elt %1, 0
-/// FIXME: Currently this function matches only SMAXV, UMAXV, SMINV, and UMINV.
-/// We could also support other types of across lane reduction available
-/// in AArch64, including FMAXNMV, FMAXV, FMINNMV, and FMINV.
 static SDValue
 performAcrossLaneMinMaxReductionCombine(SDNode *N, SelectionDAG &DAG,
                                         const AArch64Subtarget *Subtarget) {
   if (!Subtarget->hasNEON())
     return SDValue();

   SDValue N0 = N->getOperand(0);
   SDValue IfTrue = N->getOperand(1);
[11 unchanged lines not shown, inside: performAcrossLaneMinMaxReductionCombine(SDNode *N, SelectionDAG &DAG,]
   EVT SetCCVT = SetCC.getValueType();
   if (SetCC.getOpcode() != ISD::SETCC || !SetCCVT.isVector() ||
       SetCCVT.getVectorElementType() != MVT::i1)
     return SDValue();

   SDValue VectorOp = SetCC.getOperand(0);
   unsigned Op = VectorOp->getOpcode();
   // Check if the input vector is fed by the operator we want to handle.
-  if (Op != ISD::SMAX && Op != ISD::UMAX && Op != ISD::SMIN && Op != ISD::UMIN)
+  if (Op != ISD::SMAX && Op != ISD::UMAX && Op != ISD::SMIN &&
+      Op != ISD::UMIN && Op != ISD::FMAXNUM && Op != ISD::FMINNUM)
     return SDValue();

   EVT VTy = VectorOp.getValueType();
   if (!VTy.isVector())
     return SDValue();

+  if (VTy.getSizeInBits() < 64)
+    return SDValue();
+
   EVT EltTy = VTy.getVectorElementType();
-  if (EltTy != MVT::i32 && EltTy != MVT::i16 && EltTy != MVT::i8)
-    return SDValue();
+  if (Op == ISD::FMAXNUM || Op == ISD::FMINNUM) {
+    if (EltTy != MVT::f32)
+      return SDValue();
+  } else {
+    if (EltTy != MVT::i32 && EltTy != MVT::i16 && EltTy != MVT::i8)
+      return SDValue();
+  }

   // Check if extracting from the same vector.
   // For example,
   // %sc = setcc %vector, %svn1, gt
   // %n0 = extract_vector_elt %sc, #0
   // %n1 = extract_vector_elt %vector, #0
   // %n2 = extract_vector_elt $vector, #1
   if (!(VectorOp == IfTrue->getOperand(0) &&
         VectorOp == IfFalse->getOperand(0)))
     return SDValue();

   // Check if the condition code is matched with the operator type.
   ISD::CondCode CC = cast<CondCodeSDNode>(SetCC->getOperand(2))->get();
   if ((Op == ISD::SMAX && CC != ISD::SETGT && CC != ISD::SETGE) ||
       (Op == ISD::UMAX && CC != ISD::SETUGT && CC != ISD::SETUGE) ||
       (Op == ISD::SMIN && CC != ISD::SETLT && CC != ISD::SETLE) ||
-      (Op == ISD::UMIN && CC != ISD::SETULT && CC != ISD::SETULE))
+      (Op == ISD::UMIN && CC != ISD::SETULT && CC != ISD::SETULE) ||
+      (Op == ISD::FMAXNUM && CC != ISD::SETOGT && CC != ISD::SETOGE &&
+       CC != ISD::SETUGT && CC != ISD::SETUGE && CC != ISD::SETGT &&
+       CC != ISD::SETGE) ||
+      (Op == ISD::FMINNUM && CC != ISD::SETOLT && CC != ISD::SETOLE &&
+       CC != ISD::SETULT && CC != ISD::SETULE && CC != ISD::SETLT &&
+       CC != ISD::SETLE))
     return SDValue();

   // Expect to check only lane 0 from the vector SETCC.
   if (!isa<ConstantSDNode>(N0.getOperand(1)) ||
       cast<ConstantSDNode>(N0.getOperand(1))->getZExtValue() != 0)
     return SDValue();

   // Expect to extract the true value from lane 0.
[42 unchanged lines not shown, inside: performAcrossLaneAddReductionCombine(SDNode *N, SelectionDAG &DAG,]
   EVT VTy = N0.getValueType();
   if (!VTy.isVector())
     return SDValue();

   EVT EltTy = VTy.getVectorElementType();
   if (EltTy != MVT::i32 && EltTy != MVT::i16 && EltTy != MVT::i8)
     return SDValue();

+  if (VTy.getSizeInBits() < 64)
+    return SDValue();
+
   return tryMatchAcrossLaneShuffleForReduction(N, N0, ISD::ADD, DAG);
 }

 /// Target-specific DAG combine function for NEON load/store intrinsics
 /// to merge base address updates.
 static SDValue performNEONPostLDSTCombine(SDNode *N,
                                           TargetLowering::DAGCombinerInfo &DCI,
                                           SelectionDAG &DAG) {
[911 unchanged lines not shown]
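The mask test in tryMatchAcrossLaneShuffleForReduction walks the reduction backwards, starting from the final shuffle. The lines that compute NumMaskElts are hidden in the diff, so the C++ sketch below assumes, in line with the <16 x i8> doc-comment example, that step CurStep expects NumMaskElts = 1 << CurStep (step 0 is <1,u,u,...>, step 1 is <2,3,u,...>, and so on). It is standalone code with hypothetical names, not the LLVM implementation, and uses -1 for undef mask lanes, matching the `Mask[i] < 0` test in the diff.

```cpp
#include <cassert>
#include <vector>

// A shuffle mask; -1 marks an undef lane.
using Mask = std::vector<int>;

// Mask expected at a given step, assuming NumMaskElts == 1 << step.
static Mask expectedMask(int numVecElts, int step) {
  int numMaskElts = 1 << step;
  Mask m(numVecElts, -1);
  for (int i = 0; i < numMaskElts; ++i)
    m[i] = numMaskElts + i; // leading lanes pick up the upper part
  return m;
}

// Same condition as the loop body shown in the diff: the leading
// numMaskElts lanes must select lanes numMaskElts + i, the rest must be undef.
static bool maskMatchesStep(const Mask &mask, int step) {
  int numVecElts = static_cast<int>(mask.size());
  int numMaskElts = 1 << step;
  for (int i = 0; i < numVecElts; ++i)
    if ((i < numMaskElts && mask[i] != numMaskElts + i) ||
        (i >= numMaskElts && !(mask[i] < 0)))
      return false;
  return true;
}
```

For a <16 x i8> input, `expectedMask(16, 3)` reproduces the first shuffle in the doc comment, <8,...,15,u,...,u>. A mask such as <2,3,0,1> is rejected at step 1 because its trailing lanes are defined rather than undef.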

test/CodeGen/AArch64/aarch64-minmaxv.ll

[279 unchanged lines not shown, inside: define i64 @umin_D(<2 x i64>* nocapture readonly %arr) {]
   %rdx.shuf = shufflevector <2 x i64> %rdx.minmax.select, <2 x i64> undef, <2 x i32> <i32 1, i32 undef>
   %rdx.minmax.cmp18 = icmp ult <2 x i64> %rdx.minmax.select, %rdx.shuf
   %rdx.minmax.cmp18.elt = extractelement <2 x i1> %rdx.minmax.cmp18, i32 0
   %rdx.minmax.select.elt = extractelement <2 x i64> %rdx.minmax.select, i32 0
   %rdx.shuf.elt = extractelement <2 x i64> %rdx.minmax.select, i32 1
   %r = select i1 %rdx.minmax.cmp18.elt, i64 %rdx.minmax.select.elt, i64 %rdx.shuf.elt
   ret i64 %r
 }
+
+; CHECK-LABEL: f_fmaxnmv
+; CHECK: fmaxnmv
+define float @f_fmaxnmv(<4 x float>* nocapture readonly %arr) {
+  %rdx.minmax.select = load <4 x float>, <4 x float>* %arr
+  %rdx.shuf = shufflevector <4 x float> %rdx.minmax.select, <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
+  %rdx.minmax.cmp = fcmp fast oge <4 x float> %rdx.minmax.select, %rdx.shuf
+  %rdx.minmax.select1 = select <4 x i1> %rdx.minmax.cmp, <4 x float> %rdx.minmax.select, <4 x float> %rdx.shuf
+  %rdx.shuf1 = shufflevector <4 x float> %rdx.minmax.select1, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
+  %rdx.minmax.cmp1 = fcmp fast oge <4 x float> %rdx.minmax.select1, %rdx.shuf1
+  %rdx.minmax.cmp1.elt = extractelement <4 x i1> %rdx.minmax.cmp1, i32 0
+  %rdx.minmax.select1.elt = extractelement <4 x float> %rdx.minmax.select1, i32 0
+  %rdx.shuf1.elt = extractelement <4 x float> %rdx.minmax.select1, i32 1
+  %r = select i1 %rdx.minmax.cmp1.elt, float %rdx.minmax.select1.elt, float %rdx.shuf1.elt
+  ret float %r
+}
+
+; CHECK-LABEL: f_fminnmv
+; CHECK: fminnmv
+define float @f_fminnmv(<4 x float>* nocapture readonly %arr) {
+  %rdx.minmax.select = load <4 x float>, <4 x float>* %arr
+  %rdx.shuf = shufflevector <4 x float> %rdx.minmax.select, <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
+  %rdx.minmax.cmp = fcmp fast ole <4 x float> %rdx.minmax.select, %rdx.shuf
+  %rdx.minmax.select1 = select <4 x i1> %rdx.minmax.cmp, <4 x float> %rdx.minmax.select, <4 x float> %rdx.shuf
+  %rdx.shuf1 = shufflevector <4 x float> %rdx.minmax.select1, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
+  %rdx.minmax.cmp1 = fcmp fast ole <4 x float> %rdx.minmax.select1, %rdx.shuf1
+  %rdx.minmax.cmp1.elt = extractelement <4 x i1> %rdx.minmax.cmp1, i32 0
+  %rdx.minmax.select1.elt = extractelement <4 x float> %rdx.minmax.select1, i32 0
+  %rdx.shuf1.elt = extractelement <4 x float> %rdx.minmax.select1, i32 1
+  %r = select i1 %rdx.minmax.cmp1.elt, float %rdx.minmax.select1.elt, float %rdx.shuf1.elt
+  ret float %r
+}