This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Support for horizontal min/max reduction
Closed, Public

Authored by ABataev on Dec 16 2016, 5:47 AM.

Details

Reviewers

spatel
mzolotukhin
mkuper
hfinkel
RKSimon
chandlerc

Commits

rGccce7afee8ad: [SLP] Support for horizontal min/max reduction.
rG6dd29fccb881: [SLP] Support for horizontal min/max reduction.
rL314101: [SLP] Support for horizontal min/max reduction.
rL312791: [SLP] Support for horizontal min/max reduction.

Summary

The SLP vectorizer already supports horizontal reductions for Add/FAdd binary operations. This patch adds support for horizontal min/max reductions.
The function getReductionCost() is split into getArithmeticReductionCost() for binary operation reductions and getMinMaxReductionCost() for min/max reductions.
This patch fixes PR26956.
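
For readers unfamiliar with the term, a horizontal min/max reduction folds several values into one through a chain of compare + select operations. The sketch below is illustrative only: the function name hmax4 and its input layout are hypothetical and are not taken from the patch or from PR26956. It shows the kind of scalar source pattern that could become a reduction candidate once min/max reductions are recognized.

```cpp
#include <cassert>

// Hypothetical example: the maximum of four adjacent elements written as a
// chain of compare + select steps (each ternary is one compare and one select).
int hmax4(const int *a) {
  int m01 = a[0] > a[1] ? a[0] : a[1];
  int m012 = m01 > a[2] ? m01 : a[2];
  return m012 > a[3] ? m012 : a[3];
}
```

Whether such a chain is actually vectorized depends on the target cost model discussed in this review.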

Diff Detail

Build Status

Buildable 8984
Build 8984: arc lint + arc unit

Event Timeline

ABataev updated this revision to Diff 81744. Dec 16 2016, 5:47 AM

ABataev retitled this revision from to [SLP] Support for horizontal min/max reduction.

ABataev updated this object.

ABataev added reviewers: RKSimon, spatel, mkuper, mzolotukhin, hfinkel.

ABataev added a subscriber: llvm-commits.

Ping

mssimpso added a subscriber: mssimpso. Jan 3 2017, 7:21 AM

A few comments, but someone with more vectorizer knowledge needs to review as well.

include/llvm/CodeGen/BasicTTIImpl.h
1183	Missing assert message
1186	Move this comment out and just above the arithmetic/minmax functions?
1194	where only the first n/2 elements are meaningful,
1197	, not n,
lib/Target/X86/X86TargetTransformInfo.cpp
1888	Unnecessary?

ABataev marked 5 inline comments as done. Jan 18 2017, 11:03 AM

ABataev added inline comments.

lib/Target/X86/X86TargetTransformInfo.cpp
1888	Yes, missed it, thanks.

Address Simon's comments

mcrosier added a subscriber: mcrosier. Jan 18 2017, 11:19 AM

Ping

This is another example of what I was talking about re smaller patches.

You could do things like name changes (Reduction -> ArithmeticReduction) or moving comments around (from inside the body of getReductionCost to above the declaration) in separate NFC patches. In a lot of cases they're simple enough not to require pre-commit review, and it makes reviewing actual meaningful patches much much easier, because the diff only contains relevant things. You could also start with a patch that only supports one kind of min/max reduction, and add the rest of them as a follow-up.

I'm sorry if it seems like I'm being petty. In fact I'm really interested in getting all of this stuff in. But the SLP vectorizer is not the simplest or the clearest piece of code to begin with, and I'm not deeply familiar with its nuances, so it's really hard for me to meaningfully review SLP patches if they also contain noise, and if they do more than one thing. If there's someone else who's capable of reviewing and LGTMing these patches as is, I don't oppose it. But I can't.

In D27846#661215, @mkuper wrote:

This is another example of what I was talking about re smaller patches.

You could do things like name changes (Reduction -> ArithmeticReduction) or moving comments around (from inside the body of getReductionCost to above the declaration) in separate NFC patches. In a lot of cases they're simple enough not to require pre-commit review, and it makes reviewing actual meaningful patches much much easier, because the diff only contains relevant things. You could also start with a patch that only supports one kind of min/max reduction, and add the rest of them as a follow-up.

I'm sorry if it seems like I'm being petty. In fact I'm really interested in getting all of this stuff in. But the SLP vectorizer is not the simplest or the clearest piece of code to begin with, and I'm not deeply familiar with its nuances, so it's really hard for me to meaningfully review SLP patches if they also contain noise, and if they do more than one thing. If there's someone else who's capable of reviewing and LGTMing these patches as is, I don't oppose it. But I can't.

Michael, no problems at all. Will do my best to split this and other patches into several smaller pieces, but I can't guarantee that it will be possible to do in all cases.

RKSimon resigned from this revision. Feb 8 2017, 3:44 AM

RKSimon added a subscriber: RKSimon.

RKSimon added a reviewer: RKSimon. Jun 25 2017, 3:26 AM

This looks like it needs rebasing and making dependent on D29826

ABataev added a parent revision: D29826: [SLP] General improvements of SLP vectorization process. Aug 2 2017, 6:28 AM

Update to latest revision.

Harbormaster completed remote builds in B8984: Diff 109576. Aug 3 2017, 9:18 AM

Herald added a subscriber: javed.absar. Aug 3 2017, 9:18 AM

RKSimon added inline comments. Aug 3 2017, 9:36 AM

include/llvm/CodeGen/BasicTTIImpl.h
1242	This seems the same as the comment before getArithmeticReductionCost - in which case is it worth keeping?
1296	ConcreteTTI->getCmpSelInstrCost(

Update after review

Harbormaster completed remote builds in B9081: Diff 109990. Aug 7 2017, 7:27 AM

RKSimon added inline comments. Aug 7 2017, 8:09 AM

include/llvm/CodeGen/BasicTTIImpl.h
1182	ScalarTy->isFloatingPointTy()
lib/Analysis/CostModel.cpp
194	They're not public, but maybe keep to style guide (also, maybe drop the class?) enum ReductionKind { RK_None, /// Not a reduction. RK_Arithmetic, /// Binary reduction data. RK_MinMax, /// Min/max reduction data. };
210	Do you need the this == &RD? Won't it always match on (Kind == RD.Kind && Opcode == RD.Opcode)?
lib/Target/X86/X86TargetTransformInfo.cpp
1892	One cost entry per line
lib/Transforms/Vectorize/SLPVectorizer.cpp
4470	Same as above: enum ReductionKind { RK_Not, /// Not a reduction. RK_Arithmetic, /// Binary reduction data. RK_Min, /// Minimum reduction data. RK_UMin, /// Unsigned minimum reduction data. RK_Max, /// Maximum reduction data. RK_UMax, /// Unsigned maximum reduction data. };
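
The two CostModel.cpp suggestions above (an unscoped enum with RK_ prefixes, and dropping the identity check in hasSameData) can be sketched as follows. This is a minimal stand-in, not the committed code: the struct name ReductionDataSketch is hypothetical and the LHS/RHS members are omitted. It only illustrates why the `this == &RD` test is redundant, since an object always has the same Kind and Opcode as itself.

```cpp
#include <cassert>

// Unscoped enum in the RK_ style suggested in the review.
enum ReductionKind {
  RK_None,       // Not a reduction.
  RK_Arithmetic, // Binary reduction data.
  RK_MinMax      // Min/max reduction data.
};

// Hypothetical, reduced version of ReductionData (operands omitted).
struct ReductionDataSketch {
  ReductionKind Kind;
  unsigned Opcode;

  // No identity check needed: comparing an object with itself trivially
  // satisfies Kind == RD.Kind && Opcode == RD.Opcode.
  bool hasSameData(const ReductionDataSketch &RD) const {
    return Kind == RD.Kind && Opcode == RD.Opcode;
  }
};
```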

Update after review.

Harbormaster completed remote builds in B9150: Diff 110255. Aug 8 2017, 12:28 PM

I think it's almost there now; my only concern is that we're not discriminating between signed/unsigned in getMinMaxReductionCost calls - e.g. SSE is very inconsistent with support for these.

Update after review

Harbormaster completed remote builds in B9281: Diff 111056. Aug 14 2017, 1:22 PM

No more comments from me

@mkuper - any thoughts?

LGTM with one minor

lib/Transforms/Vectorize/SLPVectorizer.cpp
4464	Call this RK_None to match the other version?

This revision is now accepted and ready to land. Sep 8 2017, 4:25 AM

Closed by commit rL312791: [SLP] Support for horizontal min/max reduction. (authored by ABataev). Sep 8 2017, 6:50 AM

This revision was automatically updated to reflect the committed changes.

asbirlea mentioned this in D37616: [X86] PR34149 Suboptimal codegen for fast minnum and maxnum. Sep 12 2017, 2:39 PM

(Partially) reverted in rL313409 due to PR34635

This revision is now accepted and ready to land. Sep 17 2017, 7:02 AM

PR34635 needs addressing

This revision now requires changes to proceed. Sep 17 2017, 7:02 AM

Update after fixing PR34635

RKSimon added a reviewer: chandlerc. Sep 18 2017, 12:52 PM

Is there any chance that you can simplify the PR34635.ll test case that you committed? There's a lot of metadata/unnecessary code in there which is likely to make the testcase very brittle.

Update after test update

Harbormaster completed remote builds in B10412: Diff 115837. Sep 19 2017, 6:50 AM

In D27846#874984, @RKSimon wrote:

Is there any chance that you can simplify the PR34635.ll test case that you committed? There's a lot of metadata/unnecessary code in there which is likely to make the testcase very brittle.

Done

LGTM, thanks

This revision is now accepted and ready to land. Sep 24 2017, 7:19 AM

Closed by commit rL314101: [SLP] Support for horizontal min/max reduction. (authored by ABataev). Sep 25 2017, 6:36 AM

This revision was automatically updated to reflect the committed changes.

sabuasal added a subscriber: sabuasal. Mar 6 2018, 6:47 PM

sabuasal added inline comments.

llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
5065 ↗	(On Diff #116544)	Hi, why are we filling the NonNan flags for the "OperationData" object from the condition of the select instruction instead of from the select instruction itself? When we get to code gen we check that the value itself is not a NaN; am I missing something?

ABataev added inline comments. Mar 7 2018, 6:24 AM

llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
5065 ↗	(On Diff #116544)	SelectInst itself does not have any fp flags, only fcmp does. I don't fully understand your question, but it seems you're asking where the NaN checks are in the generated code, right? Nowhere; we do not emit these checks.
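
To illustrate why the no-NaN information matters for this pattern, here is a small sketch (the function name selectMin is hypothetical). A floating-point min written as compare + select is sensitive to operand order when a NaN is present, so a reduction that reorders operands is only safe when NaNs can be ruled out. As noted in the reply above, that information lives on the compare rather than on the select.

```cpp
#include <cassert>
#include <cmath>

// Min written as compare + select, the shape matched for min/max reductions.
// Any comparison with NaN is false, so the second operand is selected
// whenever either input is NaN.
float selectMin(float a, float b) { return a < b ? a : b; }
```

With a = NaN, b = 1.0f the result is 1.0f, while swapping the operands yields NaN. This assumes the code is compiled without fast-math style options that let the compiler assume NaNs are absent.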

Revision Contents

Path	Size
include/llvm/Analysis/TargetTransformInfo.h	7 lines
include/llvm/Analysis/TargetTransformInfoImpl.h	2 lines
include/llvm/CodeGen/BasicTTIImpl.h	127 lines
include/llvm/Transforms/Vectorize/SLPVectorizer.h	13 lines
lib/Analysis/CostModel.cpp	145 lines
lib/Analysis/TargetTransformInfo.cpp	7 lines
lib/Target/X86/X86TargetTransformInfo.h	2 lines
lib/Target/X86/X86TargetTransformInfo.cpp	105 lines
lib/Transforms/Vectorize/SLPVectorizer.cpp	507 lines
test/Transforms/SLPVectorizer/AArch64/gather-root.ll	100 lines
test/Transforms/SLPVectorizer/X86/horizontal-list.ll	198 lines
test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll	1373 lines
test/Transforms/SLPVectorizer/X86/horizontal.ll	64 lines
test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll	40 lines

Diff 109576

include/llvm/Analysis/TargetTransformInfo.h

[707 lines not shown]	public:
/// Pairwise:		/// Pairwise:
/// (v0, v1, v2, v3)		/// (v0, v1, v2, v3)
/// ((v0+v1), (v2+v3), undef, undef)		/// ((v0+v1), (v2+v3), undef, undef)
/// Split:		/// Split:
/// (v0, v1, v2, v3)		/// (v0, v1, v2, v3)
/// ((v0+v2), (v1+v3), undef, undef)		/// ((v0+v2), (v1+v3), undef, undef)
int getArithmeticReductionCost(unsigned Opcode, Type *Ty,		int getArithmeticReductionCost(unsigned Opcode, Type *Ty,
bool IsPairwiseForm) const;		bool IsPairwiseForm) const;
		int getMinMaxReductionCost(Type *Ty, Type *CondTy, bool IsPairwiseForm) const;

/// \returns The cost of Intrinsic instructions. Analyses the real arguments.		/// \returns The cost of Intrinsic instructions. Analyses the real arguments.
/// Three cases are handled: 1. scalar instruction 2. vector instruction		/// Three cases are handled: 1. scalar instruction 2. vector instruction
/// 3. scalar instruction which is to be vectorized with VF.		/// 3. scalar instruction which is to be vectorized with VF.
int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,		int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,
ArrayRef<Value *> Args, FastMathFlags FMF,		ArrayRef<Value *> Args, FastMathFlags FMF,
unsigned VF = 1) const;		unsigned VF = 1) const;

[249 lines not shown]	virtual int getGatherScatterOpCost(unsigned Opcode, Type *DataTy,
unsigned Alignment) = 0;		unsigned Alignment) = 0;
virtual int getInterleavedMemoryOpCost(unsigned Opcode, Type *VecTy,		virtual int getInterleavedMemoryOpCost(unsigned Opcode, Type *VecTy,
unsigned Factor,		unsigned Factor,
ArrayRef<unsigned> Indices,		ArrayRef<unsigned> Indices,
unsigned Alignment,		unsigned Alignment,
unsigned AddressSpace) = 0;		unsigned AddressSpace) = 0;
virtual int getArithmeticReductionCost(unsigned Opcode, Type *Ty,		virtual int getArithmeticReductionCost(unsigned Opcode, Type *Ty,
bool IsPairwiseForm) = 0;		bool IsPairwiseForm) = 0;
		virtual int getMinMaxReductionCost(Type *Ty, Type *CondTy,
		bool IsPairwiseForm) = 0;
virtual int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,		virtual int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,
ArrayRef<Type *> Tys, FastMathFlags FMF,		ArrayRef<Type *> Tys, FastMathFlags FMF,
unsigned ScalarizationCostPassed) = 0;		unsigned ScalarizationCostPassed) = 0;
virtual int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,		virtual int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,
ArrayRef<Value *> Args, FastMathFlags FMF, unsigned VF) = 0;		ArrayRef<Value *> Args, FastMathFlags FMF, unsigned VF) = 0;
virtual int getCallInstrCost(Function *F, Type *RetTy,		virtual int getCallInstrCost(Function *F, Type *RetTy,
ArrayRef<Type *> Tys) = 0;		ArrayRef<Type *> Tys) = 0;
virtual unsigned getNumberOfParts(Type *Tp) = 0;		virtual unsigned getNumberOfParts(Type *Tp) = 0;
[292 lines not shown]	int getInterleavedMemoryOpCost(unsigned Opcode, Type *VecTy, unsigned Factor,
unsigned AddressSpace) override {		unsigned AddressSpace) override {
return Impl.getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,		return Impl.getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,
Alignment, AddressSpace);		Alignment, AddressSpace);
}		}
int getArithmeticReductionCost(unsigned Opcode, Type *Ty,		int getArithmeticReductionCost(unsigned Opcode, Type *Ty,
bool IsPairwiseForm) override {		bool IsPairwiseForm) override {
return Impl.getArithmeticReductionCost(Opcode, Ty, IsPairwiseForm);		return Impl.getArithmeticReductionCost(Opcode, Ty, IsPairwiseForm);
}		}
		int getMinMaxReductionCost(Type *Ty, Type *CondTy,
		bool IsPairwiseForm) override {
		return Impl.getMinMaxReductionCost(Ty, CondTy, IsPairwiseForm);
		}
int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy, ArrayRef<Type *> Tys,		int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy, ArrayRef<Type *> Tys,
FastMathFlags FMF, unsigned ScalarizationCostPassed) override {		FastMathFlags FMF, unsigned ScalarizationCostPassed) override {
return Impl.getIntrinsicInstrCost(ID, RetTy, Tys, FMF,		return Impl.getIntrinsicInstrCost(ID, RetTy, Tys, FMF,
ScalarizationCostPassed);		ScalarizationCostPassed);
}		}
int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,		int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,
ArrayRef<Value *> Args, FastMathFlags FMF, unsigned VF) override {		ArrayRef<Value *> Args, FastMathFlags FMF, unsigned VF) override {
return Impl.getIntrinsicInstrCost(ID, RetTy, Args, FMF, VF);		return Impl.getIntrinsicInstrCost(ID, RetTy, Args, FMF, VF);
[183 lines not shown]
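
The "Pairwise" and "Split" forms named in the header comment of this file can be made concrete with a small sketch. The helper names pairwiseLevel and splitLevel are hypothetical, and addition stands in for whatever reduction operation (including min/max) is being costed. Each function performs one reduction level on a four-element vector and returns only the two meaningful lanes.

```cpp
#include <array>
#include <cassert>

// One level of a pairwise reduction: adjacent lanes are combined,
// (v0, v1, v2, v3) -> (v0+v1, v2+v3); the remaining lanes are don't-care.
std::array<int, 2> pairwiseLevel(const std::array<int, 4> &v) {
  return {v[0] + v[1], v[2] + v[3]};
}

// One level of a split reduction: the low half is combined with the high
// half, (v0, v1, v2, v3) -> (v0+v2, v1+v3).
std::array<int, 2> splitLevel(const std::array<int, 4> &v) {
  return {v[0] + v[2], v[1] + v[3]};
}
```

Both forms reach the same final scalar after log2(n) levels; they differ in the shuffles needed per level, which is why the cost interfaces take an IsPairwiseForm flag.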

include/llvm/Analysis/TargetTransformInfoImpl.h

[424 lines not shown]	public:

unsigned getAddressComputationCost(Type *Tp, ScalarEvolution *,		unsigned getAddressComputationCost(Type *Tp, ScalarEvolution *,
const SCEV *) {		const SCEV *) {
return 0;		return 0;
}		}

unsigned getArithmeticReductionCost(unsigned, Type *, bool) { return 1; }		unsigned getArithmeticReductionCost(unsigned, Type *, bool) { return 1; }

		unsigned getMinMaxReductionCost(Type *, Type *, bool) { return 1; }

unsigned getCostOfKeepingLiveOverCall(ArrayRef<Type *> Tys) { return 0; }		unsigned getCostOfKeepingLiveOverCall(ArrayRef<Type *> Tys) { return 0; }

bool getTgtMemIntrinsic(IntrinsicInst *Inst, MemIntrinsicInfo &Info) {		bool getTgtMemIntrinsic(IntrinsicInst *Inst, MemIntrinsicInfo &Info) {
return false;		return false;
}		}

unsigned getAtomicMemIntrinsicMaxElementSize() const {		unsigned getAtomicMemIntrinsicMaxElementSize() const {
// Note for overrides: You must ensure for all element unordered-atomic		// Note for overrides: You must ensure for all element unordered-atomic
[308 lines not shown]

include/llvm/CodeGen/BasicTTIImpl.h

[1,164 lines not shown]	unsigned getArithmeticReductionCost(unsigned Opcode, Type *Ty,
ShuffleCost += (NumReduxLevels - LongVectorCount) * (IsPairwise + 1) *		ShuffleCost += (NumReduxLevels - LongVectorCount) * (IsPairwise + 1) *
ConcreteTTI->getShuffleCost(TTI::SK_ExtractSubvector, Ty,		ConcreteTTI->getShuffleCost(TTI::SK_ExtractSubvector, Ty,
NumVecElts, Ty);		NumVecElts, Ty);
ArithCost += (NumReduxLevels - LongVectorCount) *		ArithCost += (NumReduxLevels - LongVectorCount) *
ConcreteTTI->getArithmeticInstrCost(Opcode, Ty);		ConcreteTTI->getArithmeticInstrCost(Opcode, Ty);
return ShuffleCost + ArithCost + getScalarizationOverhead(Ty, false, true);		return ShuffleCost + ArithCost + getScalarizationOverhead(Ty, false, true);
}		}

		/// Try to calculate arithmetic and shuffle op costs for reduction operations.
		/// Try to calculate arithmetic and shuffle op costs for reduction operations.
		/// We're assuming that reduction operation are performing the following way:
		/// 1. Non-pairwise reduction
		/// %val1 = shufflevector<n x t> %val, <n x t> %undef,
		/// <n x i32> <i32 n/2, i32 n/2 + 1, ..., i32 n, i32 undef, ..., i32 undef>
		/// \----------------v-------------/ \----------v------------/
		/// n/2 elements n/2 elements
		/// %red1 = op <n x t> %val, <n x t> val1
		/// After this operation we have a vector %red1 where only the first n/2
		RKSimon (Not Done): ScalarTy->isFloatingPointTy()
		/// elements are meaningful, the second n/2 elements are undefined and can be
		RKSimon (Done): Missing assert message
		/// dropped. All other operations are actually working with the vector of
		/// length n/2, not n, though the real vector length is still n.
		/// %val2 = shufflevector<n x t> %red1, <n x t> %undef,
		RKSimon (Done): Move this comment out and just above the arithmetic/minmax functions?
		/// <n x i32> <i32 n/4, i32 n/4 + 1, ..., i32 n/2, i32 undef, ..., i32 undef>
		/// \----------------v-------------/ \----------v------------/
		/// n/4 elements 3*n/4 elements
		/// %red2 = op <n x t> %red1, <n x t> val2 - working with the vector of
		/// length n/2, the resulting vector has length n/4 etc.
		/// 2. Pairwise reduction:
		/// Everything is the same except for an additional shuffle operation which
		/// is used to produce operands for pairwise kind of reductions.
		RKSimon (Done): where only the first n/2 elements are meaningful,
		/// %val1 = shufflevector<n x t> %val, <n x t> %undef,
		/// <n x i32> <i32 0, i32 2, ..., i32 n-2, i32 undef, ..., i32 undef>
		/// \-------------v----------/ \----------v------------/
		RKSimon (Done): , not n,
		/// n/2 elements n/2 elements
		/// %val2 = shufflevector<n x t> %val, <n x t> %undef,
		/// <n x i32> <i32 1, i32 3, ..., i32 n-1, i32 undef, ..., i32 undef>
		/// \-------------v----------/ \----------v------------/
		/// n/2 elements n/2 elements
		/// %red1 = op <n x t> %val1, <n x t> val2
		/// Again, the operation is performed on <n x t> vector, but the resulting
		/// vector %red1 is <n/2 x t> vector.
		///
		/// The cost model should take into account that the actual length of the
		/// vector is reduced on each iteration.
		/// We're assuming that reduction operation are performing the following way:
		/// 1. Non-pairwise reduction
		/// %val1 = shufflevector<n x t> %val, <n x t> %undef,
		/// <n x i32> <i32 n/2, i32 n/2 + 1, ..., i32 n, i32 undef, ..., i32 undef>
		/// \----------------v-------------/ \----------v------------/
		/// n/2 elements n/2 elements
		/// %red1 = op <n x t> %val, <n x t> val1
		/// After this operation we have a vector %red1 where only the first n/2
		/// elements are meaningful, the second n/2 elements are undefined and can be
		/// dropped. All other operations are actually working with the vector of
		/// length n/2, not n, though the real vector length is still n.
		/// %val2 = shufflevector<n x t> %red1, <n x t> %undef,
		/// <n x i32> <i32 n/4, i32 n/4 + 1, ..., i32 n/2, i32 undef, ..., i32 undef>
		/// \----------------v-------------/ \----------v------------/
		/// n/4 elements 3*n/4 elements
		/// %red2 = op <n x t> %red1, <n x t> val2 - working with the vector of
		/// length n/2, the resulting vector has length n/4 etc.
		/// 2. Pairwise reduction:
		/// Everything is the same except for an additional shuffle operation which
		/// is used to produce operands for pairwise kind of reductions.
		/// %val1 = shufflevector<n x t> %val, <n x t> %undef,
		/// <n x i32> <i32 0, i32 2, ..., i32 n-2, i32 undef, ..., i32 undef>
		/// \-------------v----------/ \----------v------------/
		/// n/2 elements n/2 elements
		/// %val2 = shufflevector<n x t> %val, <n x t> %undef,
		/// <n x i32> <i32 1, i32 3, ..., i32 n-1, i32 undef, ..., i32 undef>
		/// \-------------v----------/ \----------v------------/
		/// n/2 elements n/2 elements
		/// %red1 = op <n x t> %val1, <n x t> val2
		/// Again, the operation is performed on <n x t> vector, but the resulting
		/// vector %red1 is <n/2 x t> vector.
		///
		/// The cost model should take into account that the actual length of the
		/// vector is reduced on each iteration.
		RKSimon (Not Done): This seems the same as the comment before getArithmeticReductionCost - in which case is it worth keeping?
		unsigned getMinMaxReductionCost(Type *Ty, Type *CondTy, bool IsPairwise) {
		assert(Ty->isVectorTy() && "Expect a vector type");
		Type *ScalarTy = Ty->getVectorElementType();
		Type *ScalarCondTy = CondTy->getVectorElementType();
		unsigned NumVecElts = Ty->getVectorNumElements();
		unsigned NumReduxLevels = Log2_32(NumVecElts);
		unsigned CmpOpcode;
		if (Ty->getVectorElementType()->isFloatingPointTy())
		CmpOpcode = Instruction::FCmp;
		else {
		assert(Ty->isIntOrIntVectorTy() &&
		"expecting floating point or integer type for min/max reduction");
		CmpOpcode = Instruction::ICmp;
		}
		unsigned MinMaxCost = 0;
		unsigned ShuffleCost = 0;
		auto *ConcreteTTI = static_cast<T *>(this);
		std::pair<unsigned, MVT> LT =
		ConcreteTTI->getTLI()->getTypeLegalizationCost(DL, Ty);
		unsigned LongVectorCount = 0;
		unsigned MVTLen =
		LT.second.isVector() ? LT.second.getVectorNumElements() : 1;
		while (NumVecElts > MVTLen) {
		NumVecElts /= 2;
		// Assume the pairwise shuffles add a cost.
		ShuffleCost += (IsPairwise + 1) *
		ConcreteTTI->getShuffleCost(TTI::SK_ExtractSubvector, Ty,
		NumVecElts, Ty);
		MinMaxCost +=
		ConcreteTTI->getCmpSelInstrCost(CmpOpcode, Ty, CondTy, nullptr) +
		ConcreteTTI->getCmpSelInstrCost(Instruction::Select, Ty, CondTy,
		nullptr);
		Ty = VectorType::get(ScalarTy, NumVecElts);
		CondTy = VectorType::get(ScalarCondTy, NumVecElts);
		++LongVectorCount;
		}
		// The minimal length of the vector is limited by the real length of vector
		// operations performed on the current platform. That's why several final
		// reduction operations are performed on the vectors with the same
		// architecture-dependent length.
		ShuffleCost += (NumReduxLevels - LongVectorCount) * (IsPairwise + 1) *
		ConcreteTTI->getShuffleCost(TTI::SK_ExtractSubvector, Ty,
		NumVecElts, Ty);
		MinMaxCost +=
		(NumReduxLevels - LongVectorCount) *
		(ConcreteTTI->getCmpSelInstrCost(CmpOpcode, Ty, CondTy, nullptr) +
		ConcreteTTI->getCmpSelInstrCost(Instruction::Select, Ty, CondTy,
		nullptr));
		// Need 3 extractelement instructions for scalarization + an additional
		// scalar select instruction.
		return ShuffleCost + MinMaxCost +
		3 * getScalarizationOverhead(Ty, /*Insert=*/false,
		/*Extract=*/true) +
		static_cast<T *>(this)->getCmpSelInstrCost(
		RKSimon (Not Done): ConcreteTTI->getCmpSelInstrCost(
		Instruction::Select, ScalarTy, ScalarCondTy, nullptr);
		}

unsigned getVectorSplitCost() { return 1; }		unsigned getVectorSplitCost() { return 1; }

/// @}		/// @}
};		};

/// \brief Concrete BasicTTIImpl that can be used if no further customization		/// \brief Concrete BasicTTIImpl that can be used if no further customization
/// is needed.		/// is needed.
class BasicTTIImpl : public BasicTTIImplBase<BasicTTIImpl> {		class BasicTTIImpl : public BasicTTIImplBase<BasicTTIImpl> {
[16 lines not shown]
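
The control flow of getMinMaxReductionCost in the diff above can be followed more easily in a standalone form. The sketch below is a simplification under stated assumptions: the names UnitCosts, log2Floor and minMaxReductionCostSketch are hypothetical, every per-operation cost is a fixed placeholder (the real code queries the target for each vector type), and LegalElts stands in for the element count of the legalized vector type.

```cpp
#include <cassert>

// Placeholder per-operation costs; these are assumptions, not target values.
struct UnitCosts {
  unsigned Shuffle = 1; // one subvector-extract shuffle
  unsigned Cmp = 1;     // one vector compare
  unsigned Select = 1;  // one select (vector or scalar)
  unsigned Extract = 1; // scalarization (extractelement) overhead
};

static unsigned log2Floor(unsigned V) {
  unsigned L = 0;
  while (V >>= 1)
    ++L;
  return L;
}

unsigned minMaxReductionCostSketch(unsigned NumElts, unsigned LegalElts,
                                   bool IsPairwise, UnitCosts C = {}) {
  unsigned Levels = log2Floor(NumElts);
  unsigned LongLevels = 0, ShuffleCost = 0, MinMaxCost = 0;
  unsigned ShufflesPerLevel = IsPairwise ? 2u : 1u;
  // Levels performed while the vector is wider than the legal width.
  while (NumElts > LegalElts) {
    NumElts /= 2;
    ShuffleCost += ShufflesPerLevel * C.Shuffle;
    MinMaxCost += C.Cmp + C.Select;
    ++LongLevels;
  }
  // Remaining levels all operate at the legal vector width.
  unsigned Rest = Levels - LongLevels;
  ShuffleCost += Rest * ShufflesPerLevel * C.Shuffle;
  MinMaxCost += Rest * (C.Cmp + C.Select);
  // Mirrors the final term in the diff: three extracts plus a scalar select.
  return ShuffleCost + MinMaxCost + 3 * C.Extract + C.Select;
}
```

With unit costs the two phases collapse into a simple per-level sum; in the actual function they differ because the type passed to the cost queries shrinks on each iteration of the loop.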

include/llvm/Transforms/Vectorize/SLPVectorizer.h

[94 lines not shown]	private:
bool vectorizeGEPIndices(BasicBlock *BB, slpvectorizer::BoUpSLP &R);		bool vectorizeGEPIndices(BasicBlock *BB, slpvectorizer::BoUpSLP &R);

/// Try to find horizontal reduction or otherwise vectorize a chain of binary		/// Try to find horizontal reduction or otherwise vectorize a chain of binary
/// operators.		/// operators.
bool vectorizeRootInstruction(PHINode *P, Value *V, BasicBlock *BB,		bool vectorizeRootInstruction(PHINode *P, Value *V, BasicBlock *BB,
slpvectorizer::BoUpSLP &R,		slpvectorizer::BoUpSLP &R,
TargetTransformInfo *TTI);		TargetTransformInfo *TTI);

		/// Try to vectorize trees that start at insertvalue instructions.
		bool vectorizeInsertValueInst(InsertValueInst *IVI, BasicBlock *BB,
		slpvectorizer::BoUpSLP &R);
		/// Try to vectorize trees that start at insertelement instructions.
		bool vectorizeInsertElementInst(InsertElementInst *IEI, BasicBlock *BB,
		slpvectorizer::BoUpSLP &R);
		/// Try to vectorize trees that start at compare instructions.
		bool vectorizeCmpInst(CmpInst *CI, BasicBlock *BB, slpvectorizer::BoUpSLP &R);
		/// Tries to vectorize constructs started from CmpInst, InsertValueInst or
		/// InsertElementInst instructions.
		bool vectorizeSimpleInstructions(SmallVectorImpl<WeakVH> &Instructions,
		BasicBlock *BB, slpvectorizer::BoUpSLP &R);

/// \brief Scan the basic block and look for patterns that are likely to start		/// \brief Scan the basic block and look for patterns that are likely to start
/// a vectorization chain.		/// a vectorization chain.
bool vectorizeChainsInBlock(BasicBlock *BB, slpvectorizer::BoUpSLP &R);		bool vectorizeChainsInBlock(BasicBlock *BB, slpvectorizer::BoUpSLP &R);

bool vectorizeStoreChain(ArrayRef<Value *> Chain, slpvectorizer::BoUpSLP &R,		bool vectorizeStoreChain(ArrayRef<Value *> Chain, slpvectorizer::BoUpSLP &R,
unsigned VecRegSize);		unsigned VecRegSize);

bool vectorizeStores(ArrayRef<StoreInst *> Stores, slpvectorizer::BoUpSLP &R);		bool vectorizeStores(ArrayRef<StoreInst *> Stores, slpvectorizer::BoUpSLP &R);
[10 lines not shown]

lib/Analysis/CostModel.cpp

[180 lines not shown]	static bool matchPairwiseShuffleMask(ShuffleVectorInst *SI, bool IsLeft,
for (unsigned i = 0, e = (1 << Level), val = !IsLeft; i != e; ++i, val += 2)		for (unsigned i = 0, e = (1 << Level), val = !IsLeft; i != e; ++i, val += 2)
Mask[i] = val;		Mask[i] = val;

SmallVector<int, 16> ActualMask = SI->getShuffleMask();		SmallVector<int, 16> ActualMask = SI->getShuffleMask();
return Mask == ActualMask;		return Mask == ActualMask;
}		}

namespace {		namespace {
		/// Kind of the reduction data.
		enum class ReductionKind {
		NotReduction, /// Not a reduction.
		ArithmeticReduction, /// Binary reduction data.
		MinMaxReduction, /// Min/max reduction data.
		};
		RKSimon (Not Done): They're not public, but maybe keep to style guide (also, maybe drop the class?) enum ReductionKind { RK_None, /// Not a reduction. RK_Arithmetic, /// Binary reduction data. RK_MinMax, /// Min/max reduction data. };
/// Contains opcode + LHS/RHS parts of the reduction operations.		/// Contains opcode + LHS/RHS parts of the reduction operations.
struct ReductionData {		struct ReductionData {
explicit ReductionData() = default;		ReductionData() = delete;
ReductionData(unsigned Opcode, Value *LHS, Value *RHS)		ReductionData(ReductionKind Kind, unsigned Opcode, Value *LHS, Value *RHS)
: Opcode(Opcode), LHS(LHS), RHS(RHS) {}		: Opcode(Opcode), LHS(LHS), RHS(RHS), Kind(Kind) {
		assert(Kind != ReductionKind::NotReduction &&
		"expected binary or min/max reduction only.");
		}
unsigned Opcode = 0;		unsigned Opcode = 0;
Value *LHS = nullptr;		Value *LHS = nullptr;
Value *RHS = nullptr;		Value *RHS = nullptr;
		ReductionKind Kind = ReductionKind::NotReduction;
		bool isBinary() const { return Kind == ReductionKind::ArithmeticReduction; }
		bool isMinMax() const { return Kind == ReductionKind::MinMaxReduction; }
		bool hasSameData(ReductionData &RD) const {
		return this == &RD || (Kind == RD.Kind && Opcode == RD.Opcode);
		RKSimon (Not Done): Do you need the this == &RD? Won't it always match on (Kind == RD.Kind && Opcode == RD.Opcode)?
		}
};		};
} // namespace		} // namespace

static Optional<ReductionData> getReductionData(Instruction *I) {		static Optional<ReductionData> getReductionData(Instruction *I) {
Value *L, *R;		Value *L, *R;
if (m_BinOp(m_Value(L), m_Value(R)).match(I))		if (m_BinOp(m_Value(L), m_Value(R)).match(I)) {
return ReductionData(I->getOpcode(), L, R);		return ReductionData(ReductionKind::ArithmeticReduction, I->getOpcode(), L,
		R);
		}
		if (auto *SI = dyn_cast<SelectInst>(I)) {
		if (m_UMin(m_Value(L), m_Value(R)).match(SI) ||
		m_SMin(m_Value(L), m_Value(R)).match(SI) ||
		m_SMax(m_Value(L), m_Value(R)).match(SI) ||
		m_UMax(m_Value(L), m_Value(R)).match(SI) ||
		m_OrdFMin(m_Value(L), m_Value(R)).match(SI) ||
		m_OrdFMax(m_Value(L), m_Value(R)).match(SI) ||
		m_UnordFMin(m_Value(L), m_Value(R)).match(SI) ||
		m_UnordFMax(m_Value(L), m_Value(R)).match(SI)) {
		auto *CI = cast<CmpInst>(SI->getCondition());
		return ReductionData(ReductionKind::MinMaxReduction, CI->getOpcode(), L,
		R);
		}
		}
return llvm::None;		return llvm::None;
}		}

static bool matchPairwiseReductionAtLevel(Instruction *I, unsigned Level,		static ReductionKind matchPairwiseReductionAtLevel(Instruction *I,
		unsigned Level,
unsigned NumLevels) {		unsigned NumLevels) {
// Match one level of pairwise operations.		// Match one level of pairwise operations.
// %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,		// %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,
// <4 x i32> <i32 0, i32 2 , i32 undef, i32 undef>		// <4 x i32> <i32 0, i32 2 , i32 undef, i32 undef>
// %rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef,		// %rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef,
// <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>		// <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
// %bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1		// %bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
if (!I)		if (!I)
return false;		return ReductionKind::NotReduction;

assert(I->getType()->isVectorTy() && "Expecting a vector type");		assert(I->getType()->isVectorTy() && "Expecting a vector type");

Optional<ReductionData> RD = getReductionData(I);		Optional<ReductionData> RD = getReductionData(I);
if (!RD)		if (!RD)
return false;		return ReductionKind::NotReduction;

ShuffleVectorInst *LS = dyn_cast<ShuffleVectorInst>(RD->LHS);		ShuffleVectorInst *LS = dyn_cast<ShuffleVectorInst>(RD->LHS);
if (!LS && Level)		if (!LS && Level)
return false;		return ReductionKind::NotReduction;
ShuffleVectorInst *RS = dyn_cast<ShuffleVectorInst>(RD->RHS);		ShuffleVectorInst *RS = dyn_cast<ShuffleVectorInst>(RD->RHS);
if (!RS && Level)		if (!RS && Level)
return false;		return ReductionKind::NotReduction;

// On level 0 we can omit one shufflevector instruction.		// On level 0 we can omit one shufflevector instruction.
if (!Level && !RS && !LS)		if (!Level && !RS && !LS)
return false;		return ReductionKind::NotReduction;

// Shuffle inputs must match.		// Shuffle inputs must match.
Value *NextLevelOpL = LS ? LS->getOperand(0) : nullptr;		Value *NextLevelOpL = LS ? LS->getOperand(0) : nullptr;
Value *NextLevelOpR = RS ? RS->getOperand(0) : nullptr;		Value *NextLevelOpR = RS ? RS->getOperand(0) : nullptr;
Value *NextLevelOp = nullptr;		Value *NextLevelOp = nullptr;
if (NextLevelOpR && NextLevelOpL) {		if (NextLevelOpR && NextLevelOpL) {
// If we have two shuffles their operands must match.		// If we have two shuffles their operands must match.
if (NextLevelOpL != NextLevelOpR)		if (NextLevelOpL != NextLevelOpR)
return false;		return ReductionKind::NotReduction;

NextLevelOp = NextLevelOpL;		NextLevelOp = NextLevelOpL;
} else if (Level == 0 && (NextLevelOpR || NextLevelOpL)) {		} else if (Level == 0 && (NextLevelOpR || NextLevelOpL)) {
// On the first level we can omit the shufflevector <0, undef,...>. So the		// On the first level we can omit the shufflevector <0, undef,...>. So the
// input to the other shufflevector <1, undef> must match with one of the		// input to the other shufflevector <1, undef> must match with one of the
// inputs to the current binary operation.		// inputs to the current binary operation.
// Example:		// Example:
// %NextLevelOpL = shufflevector %R, <1, undef ...>		// %NextLevelOpL = shufflevector %R, <1, undef ...>
// %BinOp = fadd %NextLevelOpL, %R		// %BinOp = fadd %NextLevelOpL, %R
if (NextLevelOpL && NextLevelOpL != RD->RHS)		if (NextLevelOpL && NextLevelOpL != RD->RHS)
return false;		return ReductionKind::NotReduction;
else if (NextLevelOpR && NextLevelOpR != RD->LHS)		else if (NextLevelOpR && NextLevelOpR != RD->LHS)
return false;		return ReductionKind::NotReduction;

NextLevelOp = NextLevelOpL ? RD->RHS : RD->LHS;		NextLevelOp = NextLevelOpL ? RD->RHS : RD->LHS;
} else		} else
return false;		return ReductionKind::NotReduction;

// Check that the next levels binary operation exists and matches with the		// Check that the next levels binary operation exists and matches with the
// current one.		// current one.
if (Level + 1 != NumLevels) {		if (Level + 1 != NumLevels) {
Optional<ReductionData> NextLevelRD =		Optional<ReductionData> NextLevelRD =
getReductionData(cast<Instruction>(NextLevelOp));		getReductionData(cast<Instruction>(NextLevelOp));
if (!NextLevelRD || RD->Opcode != NextLevelRD->Opcode)		if (!NextLevelRD || !RD->hasSameData(*NextLevelRD))
return false;		return ReductionKind::NotReduction;
}		}

// Shuffle mask for pairwise operation must match.		// Shuffle mask for pairwise operation must match.
if (matchPairwiseShuffleMask(LS, /*IsLeft=*/true, Level)) {		if (matchPairwiseShuffleMask(LS, /*IsLeft=*/true, Level)) {
if (!matchPairwiseShuffleMask(RS, /*IsLeft=*/false, Level))		if (!matchPairwiseShuffleMask(RS, /*IsLeft=*/false, Level))
return false;		return ReductionKind::NotReduction;
} else if (matchPairwiseShuffleMask(RS, /*IsLeft=*/true, Level)) {		} else if (matchPairwiseShuffleMask(RS, /*IsLeft=*/true, Level)) {
if (!matchPairwiseShuffleMask(LS, /*IsLeft=*/false, Level))		if (!matchPairwiseShuffleMask(LS, /*IsLeft=*/false, Level))
return false;		return ReductionKind::NotReduction;
} else		} else
return false;		return ReductionKind::NotReduction;

if (++Level == NumLevels)		if (++Level == NumLevels)
return true;		return RD->Kind;

// Match next level.		// Match next level.
return matchPairwiseReductionAtLevel(cast<Instruction>(NextLevelOp), Level,		return matchPairwiseReductionAtLevel(cast<Instruction>(NextLevelOp), Level,
NumLevels);		NumLevels);
}		}

static bool matchPairwiseReduction(const ExtractElementInst *ReduxRoot,		static ReductionKind matchPairwiseReduction(const ExtractElementInst *ReduxRoot,
unsigned &Opcode, Type *&Ty) {		unsigned &Opcode, Type *&Ty) {
if (!EnableReduxCost)		if (!EnableReduxCost)
return false;		return ReductionKind::NotReduction;

// Need to extract the first element.		// Need to extract the first element.
ConstantInt *CI = dyn_cast<ConstantInt>(ReduxRoot->getOperand(1));		ConstantInt *CI = dyn_cast<ConstantInt>(ReduxRoot->getOperand(1));
unsigned Idx = ~0u;		unsigned Idx = ~0u;
if (CI)		if (CI)
Idx = CI->getZExtValue();		Idx = CI->getZExtValue();
if (Idx != 0)		if (Idx != 0)
return false;		return ReductionKind::NotReduction;

auto *RdxStart = dyn_cast<Instruction>(ReduxRoot->getOperand(0));		auto *RdxStart = dyn_cast<Instruction>(ReduxRoot->getOperand(0));
if (!RdxStart)		if (!RdxStart)
return false;		return ReductionKind::NotReduction;
Optional<ReductionData> RD = getReductionData(RdxStart);		Optional<ReductionData> RD = getReductionData(RdxStart);
if (!RD)		if (!RD)
return false;		return ReductionKind::NotReduction;

Type *VecTy = RdxStart->getType();		Type *VecTy = RdxStart->getType();
unsigned NumVecElems = VecTy->getVectorNumElements();		unsigned NumVecElems = VecTy->getVectorNumElements();
if (!isPowerOf2_32(NumVecElems))		if (!isPowerOf2_32(NumVecElems))
return false;		return ReductionKind::NotReduction;

// We look for a sequence of shuffle,shuffle,add triples like the following		// We look for a sequence of shuffle,shuffle,add triples like the following
// that builds a pairwise reduction tree.		// that builds a pairwise reduction tree.
//		//
// (X0, X1, X2, X3)		// (X0, X1, X2, X3)
// (X0 + X1, X2 + X3, undef, undef)		// (X0 + X1, X2 + X3, undef, undef)
// ((X0 + X1) + (X2 + X3), undef, undef, undef)		// ((X0 + X1) + (X2 + X3), undef, undef, undef)
//		//
// %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,		// %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,
// <4 x i32> <i32 0, i32 2 , i32 undef, i32 undef>		// <4 x i32> <i32 0, i32 2 , i32 undef, i32 undef>
// %rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef,		// %rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef,
// <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>		// <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
// %bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1		// %bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
// %rdx.shuf.1.0 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,		// %rdx.shuf.1.0 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
// <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>		// <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
// %rdx.shuf.1.1 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,		// %rdx.shuf.1.1 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
// <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		// <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
// %bin.rdx8 = fadd <4 x float> %rdx.shuf.1.0, %rdx.shuf.1.1		// %bin.rdx8 = fadd <4 x float> %rdx.shuf.1.0, %rdx.shuf.1.1
// %r = extractelement <4 x float> %bin.rdx8, i32 0		// %r = extractelement <4 x float> %bin.rdx8, i32 0
if (!matchPairwiseReductionAtLevel(RdxStart, 0, Log2_32(NumVecElems)))		if (matchPairwiseReductionAtLevel(RdxStart, 0, Log2_32(NumVecElems)) ==
return false;		ReductionKind::NotReduction)
		return ReductionKind::NotReduction;

Opcode = RD->Opcode;		Opcode = RD->Opcode;
Ty = VecTy;		Ty = VecTy;

return true;		return RD->Kind;
}		}

static std::pair<Value *, ShuffleVectorInst *>		static std::pair<Value *, ShuffleVectorInst *>
getShuffleAndOtherOprd(Value *L, Value *R) {		getShuffleAndOtherOprd(Value *L, Value *R) {
ShuffleVectorInst *S = nullptr;		ShuffleVectorInst *S = nullptr;

if ((S = dyn_cast<ShuffleVectorInst>(L)))		if ((S = dyn_cast<ShuffleVectorInst>(L)))
return std::make_pair(R, S);		return std::make_pair(R, S);

S = dyn_cast<ShuffleVectorInst>(R);		S = dyn_cast<ShuffleVectorInst>(R);
return std::make_pair(L, S);		return std::make_pair(L, S);
}		}

static bool matchVectorSplittingReduction(const ExtractElementInst *ReduxRoot,		static ReductionKind
		matchVectorSplittingReduction(const ExtractElementInst *ReduxRoot,
unsigned &Opcode, Type *&Ty) {		unsigned &Opcode, Type *&Ty) {
if (!EnableReduxCost)		if (!EnableReduxCost)
return false;		return ReductionKind::NotReduction;

// Need to extract the first element.		// Need to extract the first element.
ConstantInt *CI = dyn_cast<ConstantInt>(ReduxRoot->getOperand(1));		ConstantInt *CI = dyn_cast<ConstantInt>(ReduxRoot->getOperand(1));
unsigned Idx = ~0u;		unsigned Idx = ~0u;
if (CI)		if (CI)
Idx = CI->getZExtValue();		Idx = CI->getZExtValue();
if (Idx != 0)		if (Idx != 0)
return false;		return ReductionKind::NotReduction;

auto *RdxStart = dyn_cast<Instruction>(ReduxRoot->getOperand(0));		auto *RdxStart = dyn_cast<Instruction>(ReduxRoot->getOperand(0));
if (!RdxStart)		if (!RdxStart)
return false;		return ReductionKind::NotReduction;
Optional<ReductionData> RD = getReductionData(RdxStart);		Optional<ReductionData> RD = getReductionData(RdxStart);
if (!RD)		if (!RD)
return false;		return ReductionKind::NotReduction;

Type *VecTy = ReduxRoot->getOperand(0)->getType();		Type *VecTy = ReduxRoot->getOperand(0)->getType();
unsigned NumVecElems = VecTy->getVectorNumElements();		unsigned NumVecElems = VecTy->getVectorNumElements();
if (!isPowerOf2_32(NumVecElems))		if (!isPowerOf2_32(NumVecElems))
return false;		return ReductionKind::NotReduction;

// We look for a sequence of shuffles and adds like the following matching one		// We look for a sequence of shuffles and adds like the following matching one
// fadd, shuffle vector pair at a time.		// fadd, shuffle vector pair at a time.
//		//
// %rdx.shuf = shufflevector <4 x float> %rdx, <4 x float> undef,		// %rdx.shuf = shufflevector <4 x float> %rdx, <4 x float> undef,
// <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		// <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
// %bin.rdx = fadd <4 x float> %rdx, %rdx.shuf		// %bin.rdx = fadd <4 x float> %rdx, %rdx.shuf
// %rdx.shuf7 = shufflevector <4 x float> %bin.rdx, <4 x float> undef,		// %rdx.shuf7 = shufflevector <4 x float> %bin.rdx, <4 x float> undef,
// <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		// <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
// %bin.rdx8 = fadd <4 x float> %bin.rdx, %rdx.shuf7		// %bin.rdx8 = fadd <4 x float> %bin.rdx, %rdx.shuf7
// %r = extractelement <4 x float> %bin.rdx8, i32 0		// %r = extractelement <4 x float> %bin.rdx8, i32 0

unsigned MaskStart = 1;		unsigned MaskStart = 1;
Instruction *RdxOp = RdxStart;		Instruction *RdxOp = RdxStart;
SmallVector<int, 32> ShuffleMask(NumVecElems, 0);		SmallVector<int, 32> ShuffleMask(NumVecElems, 0);
unsigned NumVecElemsRemain = NumVecElems;		unsigned NumVecElemsRemain = NumVecElems;
while (NumVecElemsRemain - 1) {		while (NumVecElemsRemain - 1) {
// Check for the right reduction operation.		// Check for the right reduction operation.
if (!RdxOp)		if (!RdxOp)
return false;		return ReductionKind::NotReduction;
Optional<ReductionData> RDLevel = getReductionData(RdxOp);		Optional<ReductionData> RDLevel = getReductionData(RdxOp);
if (!RDLevel || RDLevel->Opcode != RD->Opcode)		if (!RDLevel || !RDLevel->hasSameData(*RD))
return false;		return ReductionKind::NotReduction;

Value *NextRdxOp;		Value *NextRdxOp;
ShuffleVectorInst *Shuffle;		ShuffleVectorInst *Shuffle;
std::tie(NextRdxOp, Shuffle) =		std::tie(NextRdxOp, Shuffle) =
getShuffleAndOtherOprd(RDLevel->LHS, RDLevel->RHS);		getShuffleAndOtherOprd(RDLevel->LHS, RDLevel->RHS);

// Check the current reduction operation and the shuffle use the same value.		// Check the current reduction operation and the shuffle use the same value.
if (Shuffle == nullptr)		if (Shuffle == nullptr)
return false;		return ReductionKind::NotReduction;
if (Shuffle->getOperand(0) != NextRdxOp)		if (Shuffle->getOperand(0) != NextRdxOp)
return false;		return ReductionKind::NotReduction;

// Check that shuffle masks matches.		// Check that shuffle masks matches.
for (unsigned j = 0; j != MaskStart; ++j)		for (unsigned j = 0; j != MaskStart; ++j)
ShuffleMask[j] = MaskStart + j;		ShuffleMask[j] = MaskStart + j;
// Fill the rest of the mask with -1 for undef.		// Fill the rest of the mask with -1 for undef.
std::fill(&ShuffleMask[MaskStart], ShuffleMask.end(), -1);		std::fill(&ShuffleMask[MaskStart], ShuffleMask.end(), -1);

SmallVector<int, 16> Mask = Shuffle->getShuffleMask();		SmallVector<int, 16> Mask = Shuffle->getShuffleMask();
if (ShuffleMask != Mask)		if (ShuffleMask != Mask)
return false;		return ReductionKind::NotReduction;

RdxOp = dyn_cast<Instruction>(NextRdxOp);		RdxOp = dyn_cast<Instruction>(NextRdxOp);
NumVecElemsRemain /= 2;		NumVecElemsRemain /= 2;
MaskStart *= 2;		MaskStart *= 2;
}		}

Opcode = RD->Opcode;		Opcode = RD->Opcode;
Ty = VecTy;		Ty = VecTy;
return true;		return RD->Kind;
}		}

unsigned CostModelAnalysis::getInstructionCost(const Instruction *I) const {		unsigned CostModelAnalysis::getInstructionCost(const Instruction *I) const {
if (!TTI)		if (!TTI)
return -1;		return -1;

switch (I->getOpcode()) {		switch (I->getOpcode()) {
case Instruction::GetElementPtr:		case Instruction::GetElementPtr:
[... 78 lines not shown; enclosing context: case Instruction::ExtractElement: { ...]
if (CI)		if (CI)
Idx = CI->getZExtValue();		Idx = CI->getZExtValue();

// Try to match a reduction sequence (series of shufflevector and vector		// Try to match a reduction sequence (series of shufflevector and vector
// adds followed by a extractelement).		// adds followed by a extractelement).
unsigned ReduxOpCode;		unsigned ReduxOpCode;
Type *ReduxType;		Type *ReduxType;

if (matchVectorSplittingReduction(EEI, ReduxOpCode, ReduxType)) {		switch (matchVectorSplittingReduction(EEI, ReduxOpCode, ReduxType)) {
		case ReductionKind::ArithmeticReduction:
return TTI->getArithmeticReductionCost(ReduxOpCode, ReduxType,		return TTI->getArithmeticReductionCost(ReduxOpCode, ReduxType,
/*IsPairwiseForm=*/false);		/*IsPairwiseForm=*/false);
		case ReductionKind::MinMaxReduction:
		return TTI->getMinMaxReductionCost(ReduxType,
		CmpInst::makeCmpResultType(ReduxType),
		/*IsPairwiseForm=*/false);
		case ReductionKind::NotReduction:
		break;
}		}
if (matchPairwiseReduction(EEI, ReduxOpCode, ReduxType)) {
		switch (matchPairwiseReduction(EEI, ReduxOpCode, ReduxType)) {
		case ReductionKind::ArithmeticReduction:
return TTI->getArithmeticReductionCost(ReduxOpCode, ReduxType,		return TTI->getArithmeticReductionCost(ReduxOpCode, ReduxType,
/*IsPairwiseForm=*/true);		/*IsPairwiseForm=*/true);
		case ReductionKind::MinMaxReduction:
		return TTI->getMinMaxReductionCost(ReduxType,
		CmpInst::makeCmpResultType(ReduxType),
		/*IsPairwiseForm=*/true);
		case ReductionKind::NotReduction:
		break;
}		}

return TTI->getVectorInstrCost(I->getOpcode(),		return TTI->getVectorInstrCost(I->getOpcode(),
EEI->getOperand(0)->getType(), Idx);		EEI->getOperand(0)->getType(), Idx);
}		}
case Instruction::InsertElement: {		case Instruction::InsertElement: {
const InsertElementInst * IE = cast<InsertElementInst>(I);		const InsertElementInst * IE = cast<InsertElementInst>(I);
ConstantInt *CI = dyn_cast<ConstantInt>(IE->getOperand(2));		ConstantInt *CI = dyn_cast<ConstantInt>(IE->getOperand(2));
[... 68 lines not shown, to end of file ...]
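
As a rough illustration of the mask check in matchVectorSplittingReduction above, the sketch below recomputes the shuffle masks that the loop expects, in the order it visits them (starting from the operand of the extractelement and walking back toward the input vector). The helper name expectedSplitMasks is hypothetical and not part of the patch; it uses std::vector in place of SmallVector and -1 for undef lanes, as the matcher does.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not part of the patch: returns the shuffle masks the
// splitting-reduction matcher would expect for a vector of NumVecElems lanes.
// Masks are listed from the reduction root back toward the input vector.
std::vector<std::vector<int>> expectedSplitMasks(unsigned NumVecElems) {
  // The matcher only handles power-of-two element counts.
  assert(NumVecElems != 0 && (NumVecElems & (NumVecElems - 1)) == 0 &&
         "Expecting a power-of-two number of elements");

  std::vector<std::vector<int>> Masks;
  unsigned MaskStart = 1;
  for (unsigned Remain = NumVecElems; Remain > 1; Remain /= 2) {
    // Every lane is undef (-1) except the first MaskStart lanes, which
    // select lanes MaskStart .. 2 * MaskStart - 1 of the source vector.
    std::vector<int> Mask(NumVecElems, -1);
    for (unsigned J = 0; J != MaskStart; ++J)
      Mask[J] = static_cast<int>(MaskStart + J);
    Masks.push_back(Mask);
    MaskStart *= 2;
  }
  return Masks;
}
```

For a <4 x float> input this yields <1, undef, undef, undef> followed by <2, 3, undef, undef>, which corresponds to %rdx.shuf7 and %rdx.shuf in the IR comment of the matcher.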

lib/Analysis/TargetTransformInfo.cpp

	[... 473 lines not shown ...]

	int TargetTransformInfo::getArithmeticReductionCost(unsigned Opcode, Type *Ty,			int TargetTransformInfo::getArithmeticReductionCost(unsigned Opcode, Type *Ty,
	bool IsPairwiseForm) const {			bool IsPairwiseForm) const {
	int Cost = TTIImpl->getArithmeticReductionCost(Opcode, Ty, IsPairwiseForm);			int Cost = TTIImpl->getArithmeticReductionCost(Opcode, Ty, IsPairwiseForm);
	assert(Cost >= 0 && "TTI should not produce negative costs!");			assert(Cost >= 0 && "TTI should not produce negative costs!");
	return Cost;			return Cost;
	}			}

				int TargetTransformInfo::getMinMaxReductionCost(Type *Ty, Type *CondTy,
				bool IsPairwiseForm) const {
				int Cost = TTIImpl->getMinMaxReductionCost(Ty, CondTy, IsPairwiseForm);
				assert(Cost >= 0 && "TTI should not produce negative costs!");
				return Cost;
				}

	unsigned			unsigned
	TargetTransformInfo::getCostOfKeepingLiveOverCall(ArrayRef<Type *> Tys) const {			TargetTransformInfo::getCostOfKeepingLiveOverCall(ArrayRef<Type *> Tys) const {
	return TTIImpl->getCostOfKeepingLiveOverCall(Tys);			return TTIImpl->getCostOfKeepingLiveOverCall(Tys);
	}			}

	bool TargetTransformInfo::getTgtMemIntrinsic(IntrinsicInst *Inst,			bool TargetTransformInfo::getTgtMemIntrinsic(IntrinsicInst *Inst,
	MemIntrinsicInfo &Info) const {			MemIntrinsicInfo &Info) const {
	return TTIImpl->getTgtMemIntrinsic(Inst, Info);			return TTIImpl->getTgtMemIntrinsic(Inst, Info);
	[... 131 lines not shown, to end of file ...]

lib/Target/X86/X86TargetTransformInfo.h

[... 82 lines not shown; enclosing context: int getIntrinsicInstrCost(Intrinsic::ID IID, Type *RetTy, ...]
unsigned ScalarizationCostPassed = UINT_MAX);		unsigned ScalarizationCostPassed = UINT_MAX);
int getIntrinsicInstrCost(Intrinsic::ID IID, Type *RetTy,		int getIntrinsicInstrCost(Intrinsic::ID IID, Type *RetTy,
ArrayRef<Value *> Args, FastMathFlags FMF,		ArrayRef<Value *> Args, FastMathFlags FMF,
unsigned VF = 1);		unsigned VF = 1);

int getArithmeticReductionCost(unsigned Opcode, Type *Ty,		int getArithmeticReductionCost(unsigned Opcode, Type *Ty,
bool IsPairwiseForm);		bool IsPairwiseForm);

		int getMinMaxReductionCost(Type *Ty, Type *CondTy, bool IsPairwiseForm);

int getInterleavedMemoryOpCost(unsigned Opcode, Type *VecTy,		int getInterleavedMemoryOpCost(unsigned Opcode, Type *VecTy,
unsigned Factor, ArrayRef<unsigned> Indices,		unsigned Factor, ArrayRef<unsigned> Indices,
unsigned Alignment, unsigned AddressSpace);		unsigned Alignment, unsigned AddressSpace);
int getInterleavedMemoryOpCostAVX512(unsigned Opcode, Type *VecTy,		int getInterleavedMemoryOpCostAVX512(unsigned Opcode, Type *VecTy,
unsigned Factor, ArrayRef<unsigned> Indices,		unsigned Factor, ArrayRef<unsigned> Indices,
unsigned Alignment, unsigned AddressSpace);		unsigned Alignment, unsigned AddressSpace);
int getInterleavedMemoryOpCostAVX2(unsigned Opcode, Type *VecTy,		int getInterleavedMemoryOpCostAVX2(unsigned Opcode, Type *VecTy,
unsigned Factor, ArrayRef<unsigned> Indices,		unsigned Factor, ArrayRef<unsigned> Indices,
[... 29 lines not shown, to end of file ...]

lib/Target/X86/X86TargetTransformInfo.cpp

[... 1,871 lines not shown; enclosing context: if (IsPairwise) { ...]
if (ST->hasSSE42())		if (ST->hasSSE42())
if (const auto *Entry = CostTableLookup(SSE42CostTblNoPairWise, ISD, MTy))		if (const auto *Entry = CostTableLookup(SSE42CostTblNoPairWise, ISD, MTy))
return LT.first * Entry->Cost;		return LT.first * Entry->Cost;
}		}

return BaseT::getArithmeticReductionCost(Opcode, ValTy, IsPairwise);		return BaseT::getArithmeticReductionCost(Opcode, ValTy, IsPairwise);
}		}

		int X86TTIImpl::getMinMaxReductionCost(Type *ValTy, Type *CondTy,
		bool IsPairwise) {
		std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, ValTy);

		MVT MTy = LT.second;

		int ISD = ValTy->isIntOrIntVectorTy() ? ISD::SMIN : ISD::FMINNUM;

		// We use the Intel Architecture Code Analyzer(IACA) to measure the throughput
		RKSimon: Unnecessary?
		ABataev: Yes, missed it, thanks.
		// and make it as the cost.

		static const CostTblEntry SSE42CostTblPairWise[] = {
		{ISD::FMINNUM, MVT::v2f64, 3}, {ISD::FMINNUM, MVT::v4f32, 2},
		RKSimon: One cost entry per line
		{ISD::SMIN, MVT::v2i64, 7}, // The data reported by the IACA is "6.8"
		{ISD::SMIN, MVT::v4i32, 1}, // The data reported by the IACA is "1.5"
		{ISD::SMIN, MVT::v8i16, 2},
		};

		static const CostTblEntry AVX1CostTblPairWise[] = {
		{ISD::FMINNUM, MVT::v4f32, 1}, {ISD::FMINNUM, MVT::v4f64, 1},
		{ISD::FMINNUM, MVT::v8f32, 2}, {ISD::SMIN, MVT::v2i64, 3},
		{ISD::SMIN, MVT::v4i32, 1}, {ISD::SMIN, MVT::v8i16, 1},
		{ISD::SMIN, MVT::v8i32, 3},
		};

		static const CostTblEntry AVX2CostTblPairWise[] = {
		{ISD::SMIN, MVT::v4i64, 2},
		{ISD::SMIN, MVT::v8i32, 1},
		{ISD::SMIN, MVT::v16i16, 1},
		{ISD::SMIN, MVT::v32i8, 2},
		};

		static const CostTblEntry AVX512CostTblPairWise[] = {
		{ISD::FMINNUM, MVT::v8f64, 1},
		{ISD::FMINNUM, MVT::v16f32, 2},
		{ISD::SMIN, MVT::v8i64, 2},
		{ISD::SMIN, MVT::v16i32, 1},
		};

		static const CostTblEntry SSE42CostTblNoPairWise[] = {
		{ISD::FMINNUM, MVT::v2f64, 3}, {ISD::FMINNUM, MVT::v4f32, 3},
		{ISD::SMIN, MVT::v2i64, 7}, // The data reported by the IACA is "6.8"
		{ISD::SMIN, MVT::v4i32, 1}, // The data reported by the IACA is "1.5"
		{ISD::SMIN, MVT::v8i16, 1}, // The data reported by the IACA is "1.5"
		};

		static const CostTblEntry AVX1CostTblNoPairWise[] = {
		{ISD::FMINNUM, MVT::v4f32, 1}, {ISD::FMINNUM, MVT::v4f64, 1},
		{ISD::FMINNUM, MVT::v8f32, 1}, {ISD::SMIN, MVT::v2i64, 3},
		{ISD::SMIN, MVT::v4i32, 1}, {ISD::SMIN, MVT::v8i16, 1},
		{ISD::SMIN, MVT::v8i32, 2},
		};

		static const CostTblEntry AVX2CostTblNoPairWise[] = {
		{ISD::SMIN, MVT::v4i64, 1},
		{ISD::SMIN, MVT::v8i32, 1},
		{ISD::SMIN, MVT::v16i16, 1},
		{ISD::SMIN, MVT::v32i8, 1},
		};

		static const CostTblEntry AVX512CostTblNoPairWise[] = {
		{ISD::FMINNUM, MVT::v8f64, 1},
		{ISD::FMINNUM, MVT::v16f32, 2},
		{ISD::SMIN, MVT::v8i64, 1},
		{ISD::SMIN, MVT::v16i32, 1},
		};

		if (IsPairwise) {
		if (ST->hasAVX512())
		if (const auto *Entry = CostTableLookup(AVX512CostTblPairWise, ISD, MTy))
		return LT.first * Entry->Cost;

		if (ST->hasAVX2())
		if (const auto *Entry = CostTableLookup(AVX2CostTblPairWise, ISD, MTy))
		return LT.first * Entry->Cost;

		if (ST->hasAVX())
		if (const auto *Entry = CostTableLookup(AVX1CostTblPairWise, ISD, MTy))
		return LT.first * Entry->Cost;

		if (ST->hasSSE42())
		if (const auto *Entry = CostTableLookup(SSE42CostTblPairWise, ISD, MTy))
		return LT.first * Entry->Cost;
		} else {
		if (ST->hasAVX512())
		if (const auto *Entry =
		CostTableLookup(AVX512CostTblNoPairWise, ISD, MTy))
		return LT.first * Entry->Cost;

		if (ST->hasAVX2())
		if (const auto *Entry = CostTableLookup(AVX2CostTblNoPairWise, ISD, MTy))
		return LT.first * Entry->Cost;

		if (ST->hasAVX())
		if (const auto *Entry = CostTableLookup(AVX1CostTblNoPairWise, ISD, MTy))
		return LT.first * Entry->Cost;

		if (ST->hasSSE42())
		if (const auto *Entry = CostTableLookup(SSE42CostTblNoPairWise, ISD, MTy))
		return LT.first * Entry->Cost;
		}

		return BaseT::getMinMaxReductionCost(ValTy, CondTy, IsPairwise);
		}

/// \brief Calculate the cost of materializing a 64-bit value. This helper		/// \brief Calculate the cost of materializing a 64-bit value. This helper
/// method might only calculate a fraction of a larger immediate. Therefore it		/// method might only calculate a fraction of a larger immediate. Therefore it
/// is valid to return a cost of ZERO.		/// is valid to return a cost of ZERO.
int X86TTIImpl::getIntImmCost(int64_t Val) {		int X86TTIImpl::getIntImmCost(int64_t Val) {
if (Val == 0)		if (Val == 0)
return TTI::TCC_Free;		return TTI::TCC_Free;

if (isInt<32>(Val))		if (isInt<32>(Val))
[... 588 lines not shown, to end of file ...]
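
The tiered lookup in X86TTIImpl::getMinMaxReductionCost above (most capable ISA first, then progressively older ones, then the generic fallback) can be sketched in isolation as follows. This is a simplified model under stated assumptions, not LLVM's API: Node, VecTy, CostEntry, Tier, lookupCost and tieredCost are hypothetical stand-ins for ISD opcodes, MVT, CostTblEntry, the ST->hasXXX() checks, CostTableLookup and the body of the function; NumLegalParts plays the role of LT.first.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for ISD opcodes and MVT value types.
enum class Node { SMin, FMinNum };
enum class VecTy { v4i32, v8i32, v4i64 };

// Hypothetical stand-in for CostTblEntry.
struct CostEntry {
  Node ISD;
  VecTy Ty;
  int Cost;
};

// Linear search for a matching entry; returns nullptr when nothing matches,
// mirroring how the result of a table lookup is tested in the patch.
const CostEntry *lookupCost(const std::vector<CostEntry> &Table, Node ISD,
                            VecTy Ty) {
  for (const CostEntry &E : Table)
    if (E.ISD == ISD && E.Ty == Ty)
      return &E;
  return nullptr;
}

// One ISA level: whether the subtarget has it, and its cost table.
struct Tier {
  bool Available;
  std::vector<CostEntry> Table;
};

// Tiers must be ordered from most to least capable ISA. The first available
// tier with a matching entry wins; otherwise the generic cost is returned.
int tieredCost(const std::vector<Tier> &Tiers, Node ISD, VecTy Ty,
               int NumLegalParts, int FallbackCost) {
  assert(NumLegalParts > 0 && "Expecting at least one legalized part");
  for (const Tier &T : Tiers) {
    if (!T.Available)
      continue;
    if (const CostEntry *E = lookupCost(T.Table, ISD, Ty))
      return NumLegalParts * E->Cost;
  }
  return FallbackCost;
}
```

With the pairwise entries from the diff, a v8i32 SMIN reduction would cost 1 when the AVX2 tier is available and 3 when only the AVX1 tier is, which is why the order of the checks matters.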

lib/Transforms/Vectorize/SLPVectorizer.cpp

[... 4,350 lines not shown; enclosing context: bool SLPVectorizerPass::tryToVectorizeList(ArrayRef<Value *> VL, BoUpSLP &R, ...]

return Changed;		return Changed;
}		}

bool SLPVectorizerPass::tryToVectorize(Instruction *I, BoUpSLP &R) {		bool SLPVectorizerPass::tryToVectorize(Instruction *I, BoUpSLP &R) {
if (!I)		if (!I)
return false;		return false;

if (!isa<BinaryOperator>(I))		if (!isa<BinaryOperator>(I) && !isa<CmpInst>(I))
return false;		return false;

Value *P = I->getParent();		Value *P = I->getParent();

// Vectorize in current basic block only.		// Vectorize in current basic block only.
auto *Op0 = dyn_cast<Instruction>(I->getOperand(0));		auto *Op0 = dyn_cast<Instruction>(I->getOperand(0));
auto *Op1 = dyn_cast<Instruction>(I->getOperand(1));		auto *Op1 = dyn_cast<Instruction>(I->getOperand(1));
if (!Op0 || !Op1 || Op0->getParent() != P || Op1->getParent() != P)		if (!Op0 || !Op1 || Op0->getParent() != P || Op1->getParent() != P)
[... 86 lines not shown ...]
/// *p =		/// *p =
///		///
class HorizontalReduction {		class HorizontalReduction {
SmallVector<Value *, 16> ReductionOps;		SmallVector<Value *, 16> ReductionOps;
SmallVector<Value *, 32> ReducedVals;		SmallVector<Value *, 32> ReducedVals;
// Use map vector to make stable output.		// Use map vector to make stable output.
MapVector<Instruction *, Value *> ExtraArgs;		MapVector<Instruction *, Value *> ExtraArgs;

		/// Kind of the reduction data.
		enum class ReductionKind {
		NotReduction, /// Not a reduction.
		RKSimon: Call this RK_None to match the other version?
		ArithmeticReduction, /// Binary reduction data.
		MinReduction, /// Minimum reduction data.
		UMinReduction, /// Unsigned minimum reduction data.
		MaxReduction, /// Maximum reduction data.
		UMaxReduction, /// Unsigned maximum reduction data.
		};
		RKSimon: Same as above:
		    enum ReductionKind {
		      RK_Not,        /// Not a reduction.
		      RK_Arithmetic, /// Binary reduction data.
		      RK_Min,        /// Minimum reduction data.
		      RK_UMin,       /// Unsigned minimum reduction data.
		      RK_Max,        /// Maximum reduction data.
		      RK_UMax,       /// Unsigned maximum reduction data.
		    };
/// Contains info about operation, like its opcode, left and right operands.		/// Contains info about operation, like its opcode, left and right operands.
struct OperationData {		class OperationData {
/// true if the operation is a reduced value, false if reduction operation.
bool IsReducedValue = false;
/// Opcode of the instruction.		/// Opcode of the instruction.
unsigned Opcode = 0;		unsigned Opcode = 0;
/// Left operand of the reduction operation.		/// Left operand of the reduction operation.
Value *LHS = nullptr;		Value *LHS = nullptr;
/// Right operand of the reduction operation.		/// Right operand of the reduction operation.
Value *RHS = nullptr;		Value *RHS = nullptr;
		/// Kind of the reduction operation.
		ReductionKind Kind = ReductionKind::NotReduction;
		/// True if float point min/max reduction has no NaNs.
		bool NoNaN = false;

/// Checks if the reduction operation can be vectorized.		/// Checks if the reduction operation can be vectorized.
bool isVectorizable() const {		bool isVectorizable() const {
return LHS && RHS &&		return LHS && RHS &&
// We currently only support adds.		// We currently only support adds && min/max reductions.
(Opcode == Instruction::Add || Opcode == Instruction::FAdd);		((Kind == ReductionKind::ArithmeticReduction &&
		(Opcode == Instruction::Add || Opcode == Instruction::FAdd)) ||
		((Opcode == Instruction::ICmp || Opcode == Instruction::FCmp) &&
		(Kind == ReductionKind::MinReduction ||
		Kind == ReductionKind::MaxReduction)) ||
		(Opcode == Instruction::ICmp &&
		(Kind == ReductionKind::UMinReduction ||
		Kind == ReductionKind::UMaxReduction)));
}		}

public:		public:
explicit OperationData() = default;		explicit OperationData() = default;
/// Construction for reduced values. They are identified by opcode only and		/// Construction for reduced values. They are identified by opcode only and
/// don't have associated LHS/RHS values.		/// don't have associated LHS/RHS values.
explicit OperationData(Value *V) : IsReducedValue(true) {		explicit OperationData(Value *V) : Kind(ReductionKind::NotReduction) {
if (auto *I = dyn_cast<Instruction>(V))		if (auto *I = dyn_cast<Instruction>(V))
Opcode = I->getOpcode();		Opcode = I->getOpcode();
}		}
/// Constructor for binary reduction operations with opcode and its left and		/// Constructor for reduction operations with opcode and its left and
/// right operands.		/// right operands.
OperationData(unsigned Opcode, Value *LHS, Value *RHS)		OperationData(unsigned Opcode, Value *LHS, Value *RHS, ReductionKind Kind,
: IsReducedValue(false), Opcode(Opcode), LHS(LHS), RHS(RHS) {}		bool NoNaN = false)
		: Opcode(Opcode), LHS(LHS), RHS(RHS), Kind(Kind), NoNaN(NoNaN) {
		assert(Kind != ReductionKind::NotReduction &&
		"One of the reduction operations is expected.");
		}
explicit operator bool() const { return Opcode; }		explicit operator bool() const { return Opcode; }
/// Get the index of the first operand.		/// Get the index of the first operand.
unsigned getFirstOperandIndex() const {		unsigned getFirstOperandIndex() const {
assert(!!*this && "The opcode is not set.");		assert(!!*this && "The opcode is not set.");
		switch (Kind) {
		case ReductionKind::MinReduction:
		case ReductionKind::UMinReduction:
		case ReductionKind::MaxReduction:
		case ReductionKind::UMaxReduction:
		return 1;
		case ReductionKind::ArithmeticReduction:
		case ReductionKind::NotReduction:
		break;
		}
return 0;		return 0;
}		}
/// Total number of operands in the reduction operation.		/// Total number of operands in the reduction operation.
unsigned getNumberOfOperands() const {		unsigned getNumberOfOperands() const {
assert(!IsReducedValue && !!*this && LHS && RHS &&		assert(Kind != ReductionKind::NotReduction && !!*this && LHS && RHS &&
"Expected reduction operation.");		"Expected reduction operation.");
		switch (Kind) {
		case ReductionKind::ArithmeticReduction:
return 2;		return 2;
		case ReductionKind::MinReduction:
		case ReductionKind::UMinReduction:
		case ReductionKind::MaxReduction:
		case ReductionKind::UMaxReduction:
		return 3;
		case ReductionKind::NotReduction:
		llvm_unreachable("Reduction kind is not set");
		}
}		}
/// Expected number of uses for reduction operations/reduced values.		/// Expected number of uses for reduction operations/reduced values.
unsigned getRequiredNumberOfUses() const {		unsigned getRequiredNumberOfUses() const {
assert(!IsReducedValue && !!*this && LHS && RHS &&		assert(Kind != ReductionKind::NotReduction && !!*this && LHS && RHS &&
"Expected reduction operation.");		"Expected reduction operation.");
		switch (Kind) {
		case ReductionKind::ArithmeticReduction:
return 1;		return 1;
		case ReductionKind::MinReduction:
		case ReductionKind::UMinReduction:
		case ReductionKind::MaxReduction:
		case ReductionKind::UMaxReduction:
		return 2;
		case ReductionKind::NotReduction:
		llvm_unreachable("Reduction kind is not set");
		}
}		}
/// Checks if instruction is associative and can be vectorized.		/// Checks if instruction is associative and can be vectorized.
bool isAssociative(Instruction *I) const {		bool isAssociative(Instruction *I) const {
assert(!IsReducedValue && *this && LHS && RHS &&		assert(Kind != ReductionKind::NotReduction && *this && LHS && RHS &&
"Expected reduction operation.");		"Expected reduction operation.");
		switch (Kind) {
		case ReductionKind::ArithmeticReduction:
return I->isAssociative();		return I->isAssociative();
		case ReductionKind::MinReduction:
		case ReductionKind::MaxReduction:
		return Opcode == Instruction::ICmp ||
		cast<Instruction>(I->getOperand(0))->hasUnsafeAlgebra();
		case ReductionKind::UMinReduction:
		case ReductionKind::UMaxReduction:
		assert(Opcode == Instruction::ICmp &&
		"Only integer compare operation is expected.");
		return true;
		case ReductionKind::NotReduction:
		break;
		}
		llvm_unreachable("Reduction kind is not set");
}		}
/// Checks if the reduction operation can be vectorized.		/// Checks if the reduction operation can be vectorized.
bool isVectorizable(Instruction *I) const {		bool isVectorizable(Instruction *I) const {
return isVectorizable() && isAssociative(I);		return isVectorizable() && isAssociative(I);
}		}

/// Checks if two operation data are both a reduction op or both a reduced		/// Checks if two operation data are both a reduction op or both a reduced
/// value.		/// value.
bool operator==(const OperationData &OD) {		bool operator==(const OperationData &OD) {
assert(((IsReducedValue != OD.IsReducedValue) ||		assert(((Kind != OD.Kind) || ((!LHS == !OD.LHS) && (!RHS == !OD.RHS))) &&
((!LHS == !OD.LHS) && (!RHS == !OD.RHS))) &&
"One of the comparing operations is incorrect.");		"One of the comparing operations is incorrect.");
return this == &OD ||		return this == &OD || (Kind == OD.Kind && Opcode == OD.Opcode);
(IsReducedValue == OD.IsReducedValue && Opcode == OD.Opcode);
}		}
bool operator!=(const OperationData &OD) { return !(*this == OD); }		bool operator!=(const OperationData &OD) { return !(*this == OD); }
void clear() {		void clear() {
IsReducedValue = false;
Opcode = 0;		Opcode = 0;
LHS = nullptr;		LHS = nullptr;
RHS = nullptr;		RHS = nullptr;
		Kind = ReductionKind::NotReduction;
		NoNaN = false;
}		}
/// Get the opcode of the reduction operation.		/// Get the opcode of the reduction operation.
unsigned getOpcode() const {		unsigned getOpcode() const {
assert(isVectorizable() && "Expected vectorizable operation.");		assert(isVectorizable() && "Expected vectorizable operation.");
return Opcode;		return Opcode;
}		}
		/// Get kind of reduction data.
		ReductionKind getKind() const { return Kind; }
Value *getLHS() const { return LHS; }		Value *getLHS() const { return LHS; }
Value *getRHS() const { return RHS; }		Value *getRHS() const { return RHS; }
		Type *getConditionType() const {
		switch (Kind) {
		case ReductionKind::ArithmeticReduction:
		return nullptr;
		case ReductionKind::MinReduction:
		case ReductionKind::MaxReduction:
		case ReductionKind::UMinReduction:
		case ReductionKind::UMaxReduction:
		return CmpInst::makeCmpResultType(LHS->getType());
		case ReductionKind::NotReduction:
		break;
		}
		llvm_unreachable("Reduction kind is not set");
		}
/// Creates reduction operation with the current opcode.		/// Creates reduction operation with the current opcode.
Value *createOp(IRBuilder<> &Builder, const Twine &Name = "") const {		Value *createOp(IRBuilder<> &Builder, const Twine &Name = "") const {
assert(!IsReducedValue &&		assert(isVectorizable() &&
(Opcode == Instruction::FAdd || Opcode == Instruction::Add) &&		"Expected add|fadd or min/max reduction operation.");
"Expected add|fadd reduction operation.");		Value *Cmp;
		switch (Kind) {
		case ReductionKind::ArithmeticReduction:
return Builder.CreateBinOp((Instruction::BinaryOps)Opcode, LHS, RHS,		return Builder.CreateBinOp((Instruction::BinaryOps)Opcode, LHS, RHS,
Name);		Name);
		case ReductionKind::MinReduction:
		Cmp = Opcode == Instruction::ICmp ? Builder.CreateICmpSLT(LHS, RHS)
		: Builder.CreateFCmpOLT(LHS, RHS);
		break;
		case ReductionKind::MaxReduction:
		Cmp = Opcode == Instruction::ICmp ? Builder.CreateICmpSGT(LHS, RHS)
		: Builder.CreateFCmpOGT(LHS, RHS);
		break;
		case ReductionKind::UMinReduction:
		assert(Opcode == Instruction::ICmp && "Expected integer types.");
		Cmp = Builder.CreateICmpULT(LHS, RHS);
		break;
		case ReductionKind::UMaxReduction:
		assert(Opcode == Instruction::ICmp && "Expected integer types.");
		Cmp = Builder.CreateICmpUGT(LHS, RHS);
		break;
		case ReductionKind::NotReduction:
		llvm_unreachable("Unknown reduction operation.");
		}
		return Builder.CreateSelect(Cmp, LHS, RHS, Name);
		}
		TargetTransformInfo::ReductionFlags getFlags() const {
		TargetTransformInfo::ReductionFlags Flags;
		Flags.NoNaN = NoNaN;
		switch (Kind) {
		case ReductionKind::ArithmeticReduction:
		break;
		case ReductionKind::MinReduction:
		Flags.IsSigned = Opcode == Instruction::ICmp;
		Flags.IsMaxOp = false;
		break;
		case ReductionKind::MaxReduction:
		Flags.IsSigned = Opcode == Instruction::ICmp;
		Flags.IsMaxOp = true;
		break;
		case ReductionKind::UMinReduction:
		Flags.IsSigned = false;
		Flags.IsMaxOp = false;
		break;
		case ReductionKind::UMaxReduction:
		Flags.IsSigned = false;
		Flags.IsMaxOp = true;
		break;
		case ReductionKind::NotReduction:
		llvm_unreachable("Reduction kind is not set");
		}
		return Flags;
}		}
};		};
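
As a rough standalone illustration of what createOp emits per reduction kind (a plain binary op for the arithmetic case, a compare followed by a select for min/max), here is a scalar sketch. It uses no LLVM types: `Kind`, `applyOp`, and the choice of integer add for the arithmetic case are assumptions made for this example, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the patch's ReductionKind, for illustration only.
enum class Kind { Arithmetic, Min, Max, UMin, UMax };

// Scalar analogue of createOp. Min/max kinds are a compare plus a select;
// the arithmetic case is fixed to integer add here (an assumption).
int32_t applyOp(Kind K, int32_t LHS, int32_t RHS) {
  bool Cmp = false;
  switch (K) {
  case Kind::Arithmetic:
    return LHS + RHS;
  case Kind::Min: // corresponds to icmp slt
    Cmp = LHS < RHS;
    break;
  case Kind::Max: // corresponds to icmp sgt
    Cmp = LHS > RHS;
    break;
  case Kind::UMin: // corresponds to icmp ult
    Cmp = static_cast<uint32_t>(LHS) < static_cast<uint32_t>(RHS);
    break;
  case Kind::UMax: // corresponds to icmp ugt
    Cmp = static_cast<uint32_t>(LHS) > static_cast<uint32_t>(RHS);
    break;
  }
  return Cmp ? LHS : RHS; // select Cmp, LHS, RHS
}
```

The signed and unsigned kinds differ only in the compare predicate, which is why `applyOp(Kind::Min, -1, 2)` and `applyOp(Kind::UMin, -1, 2)` pick different operands.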

Instruction *ReductionRoot = nullptr;		Instruction *ReductionRoot = nullptr;

/// The operation data of the reduction operation.		/// The operation data of the reduction operation.
OperationData ReductionData;		OperationData ReductionData;
/// The operation data of the values we perform a reduction on.		/// The operation data of the values we perform a reduction on.
[... 23 unchanged lines hidden ...]	class HorizontalReduction {
}		}

static OperationData getOperationData(Value *V) {		static OperationData getOperationData(Value *V) {
if (!V)		if (!V)
return OperationData();		return OperationData();

Value *LHS;		Value *LHS;
Value *RHS;		Value *RHS;
if (m_BinOp(m_Value(LHS), m_Value(RHS)).match(V))		if (m_BinOp(m_Value(LHS), m_Value(RHS)).match(V)) {
return OperationData(cast<BinaryOperator>(V)->getOpcode(), LHS, RHS);		return OperationData(cast<BinaryOperator>(V)->getOpcode(), LHS, RHS,
		ReductionKind::ArithmeticReduction);
		}
		if (auto *Select = dyn_cast<SelectInst>(V)) {
		// Look for a min/max pattern.
		if (m_UMin(m_Value(LHS), m_Value(RHS)).match(Select)) {
		return OperationData(Instruction::ICmp, LHS, RHS,
		ReductionKind::UMinReduction);
		} else if (m_SMin(m_Value(LHS), m_Value(RHS)).match(Select)) {
		return OperationData(Instruction::ICmp, LHS, RHS,
		ReductionKind::MinReduction);
		} else if (m_OrdFMin(m_Value(LHS), m_Value(RHS)).match(Select) ||
		m_UnordFMin(m_Value(LHS), m_Value(RHS)).match(Select)) {
		return OperationData(
		Instruction::FCmp, LHS, RHS, ReductionKind::MinReduction,
		cast<Instruction>(Select->getCondition())->hasNoNaNs());
		} else if (m_UMax(m_Value(LHS), m_Value(RHS)).match(Select)) {
		return OperationData(Instruction::ICmp, LHS, RHS,
		ReductionKind::UMaxReduction);
		} else if (m_SMax(m_Value(LHS), m_Value(RHS)).match(Select)) {
		return OperationData(Instruction::ICmp, LHS, RHS,
		ReductionKind::MaxReduction);
		} else if (m_OrdFMax(m_Value(LHS), m_Value(RHS)).match(Select) ||
		m_UnordFMax(m_Value(LHS), m_Value(RHS)).match(Select)) {
		return OperationData(
		Instruction::FCmp, LHS, RHS, ReductionKind::MaxReduction,
		cast<Instruction>(Select->getCondition())->hasNoNaNs());
		}
		}
return OperationData(V);		return OperationData(V);
}		}

public:		public:
HorizontalReduction() = default;		HorizontalReduction() = default;

/// \brief Try to find a reduction tree.		/// \brief Try to find a reduction tree.
bool matchAssociativeReduction(PHINode *Phi, Instruction *B) {		bool matchAssociativeReduction(PHINode *Phi, Instruction *B) {
[... 176 unchanged lines hidden ...]	while (i < NumReducedVals - ReduxWidth + 1 && ReduxWidth > 2) {
Value *VectorizedRoot = V.vectorizeTree(ExternallyUsedValues);		Value *VectorizedRoot = V.vectorizeTree(ExternallyUsedValues);

// Emit a reduction.		// Emit a reduction.
Value *ReducedSubTree =		Value *ReducedSubTree =
emitReduction(VectorizedRoot, Builder, ReduxWidth, ReductionOps, TTI);		emitReduction(VectorizedRoot, Builder, ReduxWidth, ReductionOps, TTI);
if (VectorizedTree) {		if (VectorizedTree) {
Builder.SetCurrentDebugLocation(Loc);		Builder.SetCurrentDebugLocation(Loc);
OperationData VectReductionData(ReductionData.getOpcode(),		OperationData VectReductionData(ReductionData.getOpcode(),
VectorizedTree, ReducedSubTree);		VectorizedTree, ReducedSubTree,
VectorizedTree = VectReductionData.createOp(Builder, "bin.rdx");		ReductionData.getKind());
		VectorizedTree = VectReductionData.createOp(Builder, "op.rdx");
propagateIRFlags(VectorizedTree, ReductionOps);		propagateIRFlags(VectorizedTree, ReductionOps);
} else		} else
VectorizedTree = ReducedSubTree;		VectorizedTree = ReducedSubTree;
i += ReduxWidth;		i += ReduxWidth;
ReduxWidth = PowerOf2Floor(NumReducedVals - i);		ReduxWidth = PowerOf2Floor(NumReducedVals - i);
}		}

if (VectorizedTree) {		if (VectorizedTree) {
// Finish the reduction.		// Finish the reduction.
for (; i < NumReducedVals; ++i) {		for (; i < NumReducedVals; ++i) {
auto *I = cast<Instruction>(ReducedVals[i]);		auto *I = cast<Instruction>(ReducedVals[i]);
Builder.SetCurrentDebugLocation(I->getDebugLoc());		Builder.SetCurrentDebugLocation(I->getDebugLoc());
OperationData VectReductionData(ReductionData.getOpcode(),		OperationData VectReductionData(ReductionData.getOpcode(),
VectorizedTree, I);		VectorizedTree, I,
		ReductionData.getKind());
VectorizedTree = VectReductionData.createOp(Builder);		VectorizedTree = VectReductionData.createOp(Builder);
propagateIRFlags(VectorizedTree, ReductionOps);		propagateIRFlags(VectorizedTree, ReductionOps);
}		}
for (auto &Pair : ExternallyUsedValues) {		for (auto &Pair : ExternallyUsedValues) {
assert(!Pair.second.empty() &&		assert(!Pair.second.empty() &&
"At least one DebugLoc must be inserted");		"At least one DebugLoc must be inserted");
// Add each externally used value to the final reduction.		// Add each externally used value to the final reduction.
for (auto *I : Pair.second) {		for (auto *I : Pair.second) {
Builder.SetCurrentDebugLocation(I->getDebugLoc());		Builder.SetCurrentDebugLocation(I->getDebugLoc());
OperationData VectReductionData(ReductionData.getOpcode(),		OperationData VectReductionData(ReductionData.getOpcode(),
VectorizedTree, Pair.first);		VectorizedTree, Pair.first,
VectorizedTree = VectReductionData.createOp(Builder, "bin.extra");		ReductionData.getKind());
		VectorizedTree = VectReductionData.createOp(Builder, "op.extra");
propagateIRFlags(VectorizedTree, I);		propagateIRFlags(VectorizedTree, I);
}		}
}		}
// Update users.		// Update users.
ReductionRoot->replaceAllUsesWith(VectorizedTree);		ReductionRoot->replaceAllUsesWith(VectorizedTree);
}		}
return VectorizedTree != nullptr;		return VectorizedTree != nullptr;
}		}

unsigned numReductionValues() const {		unsigned numReductionValues() const {
return ReducedVals.size();		return ReducedVals.size();
}		}

private:		private:
/// \brief Calculate the cost of a reduction.		/// \brief Calculate the cost of a reduction.
int getReductionCost(TargetTransformInfo *TTI, Value *FirstReducedVal,		int getReductionCost(TargetTransformInfo *TTI, Value *FirstReducedVal,
unsigned ReduxWidth) {		unsigned ReduxWidth) {
Type *ScalarTy = FirstReducedVal->getType();		Type *ScalarTy = FirstReducedVal->getType();
Type *VecTy = VectorType::get(ScalarTy, ReduxWidth);		Type *VecTy = VectorType::get(ScalarTy, ReduxWidth);

int PairwiseRdxCost =		int PairwiseRdxCost;
		int SplittingRdxCost;
		switch (ReductionData.getKind()) {
		case ReductionKind::ArithmeticReduction:
		PairwiseRdxCost =
TTI->getArithmeticReductionCost(ReductionData.getOpcode(), VecTy,		TTI->getArithmeticReductionCost(ReductionData.getOpcode(), VecTy,
/*IsPairwiseForm=*/true);		/*IsPairwiseForm=*/true);
int SplittingRdxCost =		SplittingRdxCost =
TTI->getArithmeticReductionCost(ReductionData.getOpcode(), VecTy,		TTI->getArithmeticReductionCost(ReductionData.getOpcode(), VecTy,
/*IsPairwiseForm=*/false);		/*IsPairwiseForm=*/false);
		break;
		case ReductionKind::MinReduction:
		case ReductionKind::MaxReduction:
		case ReductionKind::UMinReduction:
		case ReductionKind::UMaxReduction: {
		Type *VecCondTy = CmpInst::makeCmpResultType(VecTy);
		PairwiseRdxCost = TTI->getMinMaxReductionCost(VecTy, VecCondTy,
		/*IsPairwiseForm=*/true);
		SplittingRdxCost = TTI->getMinMaxReductionCost(VecTy, VecCondTy,
		/*IsPairwiseForm=*/false);
		break;
		}
		case ReductionKind::NotReduction:
		llvm_unreachable("Expected arithmetic or min/max reduction operation");
		}

IsPairwiseReduction = PairwiseRdxCost < SplittingRdxCost;		IsPairwiseReduction = PairwiseRdxCost < SplittingRdxCost;
int VecReduxCost = IsPairwiseReduction ? PairwiseRdxCost : SplittingRdxCost;		int VecReduxCost = IsPairwiseReduction ? PairwiseRdxCost : SplittingRdxCost;

int ScalarReduxCost =		int ScalarReduxCost;
(ReduxWidth - 1) *		switch (ReductionData.getKind()) {
		case ReductionKind::ArithmeticReduction:
		ScalarReduxCost =
TTI->getArithmeticInstrCost(ReductionData.getOpcode(), ScalarTy);		TTI->getArithmeticInstrCost(ReductionData.getOpcode(), ScalarTy);
		break;
		case ReductionKind::MinReduction:
		case ReductionKind::MaxReduction:
		case ReductionKind::UMinReduction:
		case ReductionKind::UMaxReduction:
		ScalarReduxCost =
		TTI->getCmpSelInstrCost(ReductionData.getOpcode(), ScalarTy) +
		TTI->getCmpSelInstrCost(Instruction::Select, ScalarTy,
		CmpInst::makeCmpResultType(ScalarTy));
		break;
		case ReductionKind::NotReduction:
		llvm_unreachable("Expected arithmetic or min/max reduction operation");
		}
		ScalarReduxCost *= (ReduxWidth - 1);

DEBUG(dbgs() << "SLP: Adding cost " << VecReduxCost - ScalarReduxCost		DEBUG(dbgs() << "SLP: Adding cost " << VecReduxCost - ScalarReduxCost
<< " for reduction that starts with " << *FirstReducedVal		<< " for reduction that starts with " << *FirstReducedVal
<< " (It is a "		<< " (It is a "
<< (IsPairwiseReduction ? "pairwise" : "splitting")		<< (IsPairwiseReduction ? "pairwise" : "splitting")
<< " reduction)\n");		<< " reduction)\n");

return VecReduxCost - ScalarReduxCost;		return VecReduxCost - ScalarReduxCost;
}		}
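
The cost comparison in getReductionCost for the min/max kinds can be sketched without LLVM as follows. The `Costs` struct, the function name, and the numeric costs in the usage note are hypothetical; in the patch the real values come from TargetTransformInfo queries.

```cpp
#include <cassert>

// Hypothetical cost inputs; in the patch these come from TTI queries.
struct Costs {
  int PairwiseRdx;  // vector reduction cost, pairwise form
  int SplittingRdx; // vector reduction cost, splitting form
  int ScalarCmp;    // cost of one scalar compare
  int ScalarSelect; // cost of one scalar select
};

// Returns vector cost minus scalar cost; a negative result means the
// vectorized reduction is estimated to be cheaper.
int minMaxReductionCostDelta(const Costs &C, unsigned ReduxWidth,
                             bool &IsPairwise) {
  assert(ReduxWidth >= 2 && "need at least two values to reduce");
  // Pick the cheaper of the two vector forms.
  IsPairwise = C.PairwiseRdx < C.SplittingRdx;
  int VecCost = IsPairwise ? C.PairwiseRdx : C.SplittingRdx;
  // Reducing N scalars takes N - 1 compare+select pairs.
  int ScalarCost =
      static_cast<int>(ReduxWidth - 1) * (C.ScalarCmp + C.ScalarSelect);
  return VecCost - ScalarCost;
}
```

With made-up costs of 6 (pairwise), 4 (splitting), and 1 each for compare and select, an 8-wide reduction gives 4 - 7 * 2 = -10, so the splitting form would be chosen and judged profitable.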

/// \brief Emit a horizontal reduction of the vectorized value.		/// \brief Emit a horizontal reduction of the vectorized value.
Value *emitReduction(Value *VectorizedValue, IRBuilder<> &Builder,		Value *emitReduction(Value *VectorizedValue, IRBuilder<> &Builder,
unsigned ReduxWidth, ArrayRef<Value *> RedOps,		unsigned ReduxWidth, ArrayRef<Value *> RedOps,
const TargetTransformInfo *TTI) {		const TargetTransformInfo *TTI) {
assert(VectorizedValue && "Need to have a vectorized tree node");		assert(VectorizedValue && "Need to have a vectorized tree node");
assert(isPowerOf2_32(ReduxWidth) &&		assert(isPowerOf2_32(ReduxWidth) &&
"We only handle power-of-two reductions for now");		"We only handle power-of-two reductions for now");

if (!IsPairwiseReduction)		if (!IsPairwiseReduction)
return createSimpleTargetReduction(		return createSimpleTargetReduction(
Builder, TTI, ReductionData.getOpcode(), VectorizedValue,		Builder, TTI, ReductionData.getOpcode(), VectorizedValue,
TargetTransformInfo::ReductionFlags(), RedOps);		ReductionData.getFlags(), RedOps);

Value *TmpVec = VectorizedValue;		Value *TmpVec = VectorizedValue;
for (unsigned i = ReduxWidth / 2; i != 0; i >>= 1) {		for (unsigned i = ReduxWidth / 2; i != 0; i >>= 1) {
Value *LeftMask =		Value *LeftMask =
createRdxShuffleMask(ReduxWidth, i, true, true, Builder);		createRdxShuffleMask(ReduxWidth, i, true, true, Builder);
Value *RightMask =		Value *RightMask =
createRdxShuffleMask(ReduxWidth, i, true, false, Builder);		createRdxShuffleMask(ReduxWidth, i, true, false, Builder);

Value *LeftShuf = Builder.CreateShuffleVector(		Value *LeftShuf = Builder.CreateShuffleVector(
TmpVec, UndefValue::get(TmpVec->getType()), LeftMask, "rdx.shuf.l");		TmpVec, UndefValue::get(TmpVec->getType()), LeftMask, "rdx.shuf.l");
Value *RightShuf = Builder.CreateShuffleVector(		Value *RightShuf = Builder.CreateShuffleVector(
TmpVec, UndefValue::get(TmpVec->getType()), (RightMask),		TmpVec, UndefValue::get(TmpVec->getType()), (RightMask),
"rdx.shuf.r");		"rdx.shuf.r");
OperationData VectReductionData(ReductionData.getOpcode(), LeftShuf,		OperationData VectReductionData(ReductionData.getOpcode(), LeftShuf,
RightShuf);		RightShuf, ReductionData.getKind());
TmpVec = VectReductionData.createOp(Builder, "bin.rdx");		TmpVec = VectReductionData.createOp(Builder, "op.rdx");
propagateIRFlags(TmpVec, RedOps);		propagateIRFlags(TmpVec, RedOps);
}		}

// The result is in the first element of the vector.		// The result is in the first element of the vector.
return Builder.CreateExtractElement(TmpVec, Builder.getInt32(0));		return Builder.CreateExtractElement(TmpVec, Builder.getInt32(0));
}		}
};		};
} // end anonymous namespace		} // end anonymous namespace
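
The pairwise branch of emitReduction above halves the live width each round until the result sits in element 0. A minimal scalar sketch of that loop for a signed min reduction follows. It assumes the pairwise shuffle masks pair lanes 2*j and 2*j+1 (the mask helper is not shown in this diff), and `pairwiseMinReduce` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pairwise signed-min reduction over a power-of-two number of lanes.
// Each round combines lanes 2*J and 2*J+1 into lane J, halving the live
// width. The lane pairing is an assumption about the shuffle masks.
int pairwiseMinReduce(std::vector<int> Vec) {
  std::size_t Width = Vec.size();
  assert(Width != 0 && (Width & (Width - 1)) == 0 &&
         "only power-of-two widths are handled");
  for (std::size_t I = Width / 2; I != 0; I >>= 1) {
    for (std::size_t J = 0; J < I; ++J) {
      int L = Vec[2 * J];     // lane taken by the "left" shuffle
      int R = Vec[2 * J + 1]; // lane taken by the "right" shuffle
      Vec[J] = L < R ? L : R; // compare + select
    }
  }
  // The result is in the first element, as in the original code.
  return Vec[0];
}
```

Updating in place is safe here because round J only reads lanes at index 2*J or higher, which no earlier step of the same round has written.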

/// \brief Recognize construction of vectors like		/// \brief Recognize construction of vectors like
/// %ra = insertelement <4 x float> undef, float %s0, i32 0		/// %ra = insertelement <4 x float> undef, float %s0, i32 0
/// %rb = insertelement <4 x float> %ra, float %s1, i32 1		/// %rb = insertelement <4 x float> %ra, float %s1, i32 1
/// %rc = insertelement <4 x float> %rb, float %s2, i32 2		/// %rc = insertelement <4 x float> %rb, float %s2, i32 2
/// %rd = insertelement <4 x float> %rc, float %s3, i32 3		/// %rd = insertelement <4 x float> %rc, float %s3, i32 3
		/// starting from the last insertelement instruction.
///		///
/// Returns true if it matches		/// Returns true if it matches
///		///
static bool findBuildVector(InsertElementInst *FirstInsertElem,		static bool findBuildVector(InsertElementInst *LastInsertElem,
SmallVectorImpl<Value *> &BuildVector,		SmallVectorImpl<Value *> &BuildVector,
SmallVectorImpl<Value *> &BuildVectorOpds) {		SmallVectorImpl<Value *> &BuildVectorOpds) {
if (!isa<UndefValue>(FirstInsertElem->getOperand(0)))		Value *V = nullptr;
return false;		do {
		BuildVector.push_back(LastInsertElem);
InsertElementInst *IE = FirstInsertElem;		BuildVectorOpds.push_back(LastInsertElem->getOperand(1));
while (true) {		V = LastInsertElem->getOperand(0);
BuildVector.push_back(IE);		if (isa<UndefValue>(V))
BuildVectorOpds.push_back(IE->getOperand(1));		break;
		LastInsertElem = dyn_cast<InsertElementInst>(V);
if (IE->use_empty())		if (!LastInsertElem || !LastInsertElem->hasOneUse())
return false;		return false;
		} while (true);
InsertElementInst *NextUse = dyn_cast<InsertElementInst>(IE->user_back());		std::reverse(BuildVector.begin(), BuildVector.end());
if (!NextUse)		std::reverse(BuildVectorOpds.begin(), BuildVectorOpds.end());
return true;		return true;

// If this isn't the final use, make sure the next insertelement is the only
// use. It's OK if the final constructed vector is used multiple times
if (!IE->hasOneUse())
return false;

IE = NextUse;
}		}

return false;		/// \brief Like findBuildVector, but looks for construction of aggregate.
}

/// \brief Like findBuildVector, but looks backwards for construction of aggregate.
///		///
/// \return true if it matches.		/// \return true if it matches.
static bool findBuildAggregate(InsertValueInst *IV,		static bool findBuildAggregate(InsertValueInst *IV,
SmallVectorImpl<Value *> &BuildVector,		SmallVectorImpl<Value *> &BuildVector,
SmallVectorImpl<Value *> &BuildVectorOpds) {		SmallVectorImpl<Value *> &BuildVectorOpds) {
Value *V;		Value *V;
do {		do {
BuildVector.push_back(IV);		BuildVector.push_back(IV);
[... 105 unchanged lines hidden ...]	while (!Stack.empty()) {
Value *V;		Value *V;
unsigned Level;		unsigned Level;
std::tie(V, Level) = Stack.pop_back_val();		std::tie(V, Level) = Stack.pop_back_val();
if (!V)		if (!V)
continue;		continue;
auto *Inst = dyn_cast<Instruction>(V);		auto *Inst = dyn_cast<Instruction>(V);
if (!Inst)		if (!Inst)
continue;		continue;
if (auto *BI = dyn_cast<BinaryOperator>(Inst)) {		auto *BI = dyn_cast<BinaryOperator>(Inst);
		auto *SI = dyn_cast<SelectInst>(Inst);
		if (BI || SI) {
HorizontalReduction HorRdx;		HorizontalReduction HorRdx;
if (HorRdx.matchAssociativeReduction(P, BI)) {		if (HorRdx.matchAssociativeReduction(P, Inst)) {
if (HorRdx.tryToReduce(R, TTI)) {		if (HorRdx.tryToReduce(R, TTI)) {
Res = true;		Res = true;
// Set P to nullptr to avoid re-analysis of phi node in		// Set P to nullptr to avoid re-analysis of phi node in
// matchAssociativeReduction function unless this is the root node.		// matchAssociativeReduction function unless this is the root node.
P = nullptr;		P = nullptr;
continue;		continue;
}		}
}		}
if (P) {		if (P && BI) {
Inst = dyn_cast<Instruction>(BI->getOperand(0));		Inst = dyn_cast<Instruction>(BI->getOperand(0));
if (Inst == P)		if (Inst == P)
Inst = dyn_cast<Instruction>(BI->getOperand(1));		Inst = dyn_cast<Instruction>(BI->getOperand(1));
if (!Inst) {		if (!Inst) {
// Set P to nullptr to avoid re-analysis of phi node in		// Set P to nullptr to avoid re-analysis of phi node in
// matchAssociativeReduction function unless this is the root node.		// matchAssociativeReduction function unless this is the root node.
P = nullptr;		P = nullptr;
continue;		continue;
[... 35 unchanged lines hidden ...]	bool SLPVectorizerPass::vectorizeRootInstruction(PHINode *P, Value *V,
// Try to match and vectorize a horizontal reduction.		// Try to match and vectorize a horizontal reduction.
auto &&ExtraVectorization = [this](Instruction *I, BoUpSLP &R) -> bool {		auto &&ExtraVectorization = [this](Instruction *I, BoUpSLP &R) -> bool {
return tryToVectorize(I, R);		return tryToVectorize(I, R);
};		};
return tryToVectorizeHorReductionOrInstOperands(P, I, BB, R, TTI,		return tryToVectorizeHorReductionOrInstOperands(P, I, BB, R, TTI,
ExtraVectorization);		ExtraVectorization);
}		}

		bool SLPVectorizerPass::vectorizeInsertValueInst(InsertValueInst *IVI,
		BasicBlock *BB, BoUpSLP &R) {
		const DataLayout &DL = BB->getModule()->getDataLayout();
		if (!R.canMapToVector(IVI->getType(), DL))
		return false;

		SmallVector<Value *, 16> BuildVector;
		SmallVector<Value *, 16> BuildVectorOpds;
		if (!findBuildAggregate(IVI, BuildVector, BuildVectorOpds))
		return false;

		DEBUG(dbgs() << "SLP: array mappable to vector: " << *IVI << "\n");
		return tryToVectorizeList(BuildVectorOpds, R, BuildVector, false);
		}

		bool SLPVectorizerPass::vectorizeInsertElementInst(InsertElementInst *IEI,
		BasicBlock *BB, BoUpSLP &R) {
		SmallVector<Value *, 16> BuildVector;
		SmallVector<Value *, 16> BuildVectorOpds;
		if (!findBuildVector(IEI, BuildVector, BuildVectorOpds))
		return false;

		// Vectorize starting with the build vector operands ignoring the BuildVector
		// instructions for the purpose of scheduling and user extraction.
		return tryToVectorizeList(BuildVectorOpds, R, BuildVector);
		}

		bool SLPVectorizerPass::vectorizeCmpInst(CmpInst *CI, BasicBlock *BB,
		BoUpSLP &R) {
		if (tryToVectorizePair(CI->getOperand(0), CI->getOperand(1), R))
		return true;

		bool OpsChanged = false;
		for (int Idx = 0; Idx < 2; ++Idx) {
		OpsChanged |=
		vectorizeRootInstruction(nullptr, CI->getOperand(Idx), BB, R, TTI);
		}
		return OpsChanged;
		}

		bool SLPVectorizerPass::vectorizeSimpleInstructions(
		SmallVectorImpl<WeakVH> &Instructions, BasicBlock *BB, BoUpSLP &R) {
		bool OpsChanged = false;
		for (auto &VH : reverse(Instructions)) {
		auto *I = dyn_cast_or_null<Instruction>(VH);
		if (!I)
		continue;
		if (auto *LastInsertValue = dyn_cast<InsertValueInst>(I))
		OpsChanged |= vectorizeInsertValueInst(LastInsertValue, BB, R);
		else if (auto *LastInsertElem = dyn_cast<InsertElementInst>(I))
		OpsChanged |= vectorizeInsertElementInst(LastInsertElem, BB, R);
		else if (auto *CI = dyn_cast<CmpInst>(I))
		OpsChanged |= vectorizeCmpInst(CI, BB, R);
		}
		Instructions.clear();
		return OpsChanged;
		}

bool SLPVectorizerPass::vectorizeChainsInBlock(BasicBlock *BB, BoUpSLP &R) {		bool SLPVectorizerPass::vectorizeChainsInBlock(BasicBlock *BB, BoUpSLP &R) {
bool Changed = false;		bool Changed = false;
SmallVector<Value *, 4> Incoming;		SmallVector<Value *, 4> Incoming;
SmallSet<Value *, 16> VisitedInstrs;		SmallSet<Value *, 16> VisitedInstrs;

bool HaveVectorizedPhiNodes = true;		bool HaveVectorizedPhiNodes = true;
while (HaveVectorizedPhiNodes) {		while (HaveVectorizedPhiNodes) {
HaveVectorizedPhiNodes = false;		HaveVectorizedPhiNodes = false;
[... 43 unchanged lines hidden ...]	for (SmallVector<Value *, 4>::iterator IncIt = Incoming.begin(),

// Start over at the next instruction of a different type (or the end).		// Start over at the next instruction of a different type (or the end).
IncIt = SameTypeIt;		IncIt = SameTypeIt;
}		}
}		}

VisitedInstrs.clear();		VisitedInstrs.clear();

		SmallVector<WeakVH, 8> PostProcessInstructions;
		SmallDenseSet<Instruction *, 4> KeyNodes;
for (BasicBlock::iterator it = BB->begin(), e = BB->end(); it != e; it++) {		for (BasicBlock::iterator it = BB->begin(), e = BB->end(); it != e; it++) {
// We may go through BB multiple times so skip the one we have checked.		// We may go through BB multiple times so skip the one we have checked.
if (!VisitedInstrs.insert(&*it).second)		if (!VisitedInstrs.insert(&*it).second) {
		if (it->use_empty() && KeyNodes.count(&*it) > 0 &&
		vectorizeSimpleInstructions(PostProcessInstructions, BB, R)) {
		// We would like to start over since some instructions are deleted
		// and the iterator may become invalid value.
		Changed = true;
		it = BB->begin();
		e = BB->end();
		}
continue;		continue;
		}

if (isa<DbgInfoIntrinsic>(it))		if (isa<DbgInfoIntrinsic>(it))
continue;		continue;

// Try to vectorize reductions that use PHINodes.		// Try to vectorize reductions that use PHINodes.
if (PHINode *P = dyn_cast<PHINode>(it)) {		if (PHINode *P = dyn_cast<PHINode>(it)) {
// Check that the PHI is a reduction PHI.		// Check that the PHI is a reduction PHI.
if (P->getNumIncomingValues() != 2)		if (P->getNumIncomingValues() != 2)
return Changed;		return Changed;

// Try to match and vectorize a horizontal reduction.		// Try to match and vectorize a horizontal reduction.
if (vectorizeRootInstruction(P, getReductionValue(DT, P, BB, LI), BB, R,		if (vectorizeRootInstruction(P, getReductionValue(DT, P, BB, LI), BB, R,
TTI)) {		TTI)) {
Changed = true;		Changed = true;
it = BB->begin();		it = BB->begin();
e = BB->end();		e = BB->end();
continue;		continue;
}		}
continue;		continue;
}		}

if (ShouldStartVectorizeHorAtStore) {		// Ran into an instruction without users, like terminator, or function call
if (StoreInst *SI = dyn_cast<StoreInst>(it)) {		// with ignored return value, store. Ignore unused instructions (basing on
		// instruction type, except for CallInst and InvokeInst).
		if (it->use_empty() && (it->getType()->isVoidTy() || isa<CallInst>(it) ||
		isa<InvokeInst>(it))) {
		KeyNodes.insert(&*it);
		bool OpsChanged = false;
		if (ShouldStartVectorizeHorAtStore || !isa<StoreInst>(it)) {
		for (auto *V : it->operand_values()) {
// Try to match and vectorize a horizontal reduction.		// Try to match and vectorize a horizontal reduction.
if (vectorizeRootInstruction(nullptr, SI->getValueOperand(), BB, R,		OpsChanged |= vectorizeRootInstruction(nullptr, V, BB, R, TTI);
TTI)) {
Changed = true;
it = BB->begin();
e = BB->end();
continue;
}
}		}
}		}
		// Start vectorization of post-process list of instructions from the
// Try to vectorize horizontal reductions feeding into a return.		// top-tree instructions to try to vectorize as many instructions as
if (ReturnInst *RI = dyn_cast<ReturnInst>(it)) {		// possible.
if (RI->getNumOperands() != 0) {		OpsChanged |= vectorizeSimpleInstructions(PostProcessInstructions, BB, R);
// Try to match and vectorize a horizontal reduction.		if (OpsChanged) {
if (vectorizeRootInstruction(nullptr, RI->getOperand(0), BB, R, TTI)) {
Changed = true;
it = BB->begin();
e = BB->end();
continue;
}
}
}

// Try to vectorize trees that start at compare instructions.
if (CmpInst *CI = dyn_cast<CmpInst>(it)) {
if (tryToVectorizePair(CI->getOperand(0), CI->getOperand(1), R)) {
Changed = true;
// We would like to start over since some instructions are deleted		// We would like to start over since some instructions are deleted
// and the iterator may become invalid value.		// and the iterator may become invalid value.
it = BB->begin();
e = BB->end();
continue;
}

for (int I = 0; I < 2; ++I) {
if (vectorizeRootInstruction(nullptr, CI->getOperand(I), BB, R, TTI)) {
Changed = true;		Changed = true;
// We would like to start over since some instructions are deleted
// and the iterator may become invalid value.
it = BB->begin();		it = BB->begin();
e = BB->end();		e = BB->end();
break;
}
}
continue;
}

// Try to vectorize trees that start at insertelement instructions.
if (InsertElementInst *FirstInsertElem = dyn_cast<InsertElementInst>(it)) {
SmallVector<Value *, 16> BuildVector;
SmallVector<Value *, 16> BuildVectorOpds;
if (!findBuildVector(FirstInsertElem, BuildVector, BuildVectorOpds))
continue;		continue;

// Vectorize starting with the build vector operands ignoring the
// BuildVector instructions for the purpose of scheduling and user
// extraction.
if (tryToVectorizeList(BuildVectorOpds, R, BuildVector)) {
Changed = true;
it = BB->begin();
e = BB->end();
}		}

continue;
}		}

// Try to vectorize trees that start at insertvalue instructions feeding into		if (isa<InsertElementInst>(it) || isa<CmpInst>(it) ||
// a store.		isa<InsertValueInst>(it))
if (StoreInst *SI = dyn_cast<StoreInst>(it)) {		PostProcessInstructions.push_back(&*it);
if (InsertValueInst *LastInsertValue = dyn_cast<InsertValueInst>(SI->getValueOperand())) {
const DataLayout &DL = BB->getModule()->getDataLayout();
if (R.canMapToVector(SI->getValueOperand()->getType(), DL)) {
SmallVector<Value *, 16> BuildVector;
SmallVector<Value *, 16> BuildVectorOpds;
if (!findBuildAggregate(LastInsertValue, BuildVector, BuildVectorOpds))
continue;

DEBUG(dbgs() << "SLP: store of array mappable to vector: " << *SI << "\n");
if (tryToVectorizeList(BuildVectorOpds, R, BuildVector, false)) {
Changed = true;
it = BB->begin();
e = BB->end();
}
continue;
}
}
}
}		}
		assert(PostProcessInstructions.empty());

return Changed;		return Changed;
}		}

bool SLPVectorizerPass::vectorizeGEPIndices(BasicBlock *BB, BoUpSLP &R) {		bool SLPVectorizerPass::vectorizeGEPIndices(BasicBlock *BB, BoUpSLP &R) {
auto Changed = false;		auto Changed = false;
for (auto &Entry : GEPs) {		for (auto &Entry : GEPs) {

[... 118 unchanged lines hidden ...]

test/Transforms/SLPVectorizer/AArch64/gather-root.ll

	[... 25 unchanged lines hidden ...]
	; DEFAULT-NEXT: [[TMP32:%.*]] = add i32 [[TMP30]], undef			; DEFAULT-NEXT: [[TMP32:%.*]] = add i32 [[TMP30]], undef
	; DEFAULT-NEXT: [[TMP3:%.*]] = call i32 @llvm.experimental.vector.reduce.add.i32.v8i32(<8 x i32> [[TMP2]])			; DEFAULT-NEXT: [[TMP3:%.*]] = call i32 @llvm.experimental.vector.reduce.add.i32.v8i32(<8 x i32> [[TMP2]])
	; DEFAULT-NEXT: [[BIN_EXTRA]] = add i32 [[TMP3]], [[TMP17]]			; DEFAULT-NEXT: [[BIN_EXTRA]] = add i32 [[TMP3]], [[TMP17]]
	; DEFAULT-NEXT: [[TMP34:%.*]] = add i32 [[TMP32]], undef			; DEFAULT-NEXT: [[TMP34:%.*]] = add i32 [[TMP32]], undef
	; DEFAULT-NEXT: br label [[FOR_BODY]]			; DEFAULT-NEXT: br label [[FOR_BODY]]
	;			;
	; GATHER-LABEL: @PR28330(			; GATHER-LABEL: @PR28330(
	; GATHER-NEXT: entry:			; GATHER-NEXT: entry:
	; GATHER-NEXT: [[TMP0:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1), align 1			; GATHER-NEXT: [[TMP0:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <2 x i8>*), align 1
	; GATHER-NEXT: [[TMP1:%.*]] = icmp eq i8 [[TMP0]], 0			; GATHER-NEXT: [[TMP1:%.*]] = icmp eq <2 x i8> [[TMP0]], zeroinitializer
	; GATHER-NEXT: [[TMP2:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 2), align 2
	; GATHER-NEXT: [[TMP3:%.*]] = icmp eq i8 [[TMP2]], 0
	; GATHER-NEXT: [[TMP4:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 3), align 1			; GATHER-NEXT: [[TMP4:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 3), align 1
	; GATHER-NEXT: [[TMP5:%.*]] = icmp eq i8 [[TMP4]], 0			; GATHER-NEXT: [[TMP5:%.*]] = icmp eq i8 [[TMP4]], 0
	; GATHER-NEXT: [[TMP6:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 4), align 4			; GATHER-NEXT: [[TMP6:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 4), align 4
	; GATHER-NEXT: [[TMP7:%.*]] = icmp eq i8 [[TMP6]], 0			; GATHER-NEXT: [[TMP7:%.*]] = icmp eq i8 [[TMP6]], 0
	; GATHER-NEXT: [[TMP8:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 5), align 1			; GATHER-NEXT: [[TMP8:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 5), align 1
	; GATHER-NEXT: [[TMP9:%.*]] = icmp eq i8 [[TMP8]], 0			; GATHER-NEXT: [[TMP9:%.*]] = icmp eq i8 [[TMP8]], 0
	; GATHER-NEXT: [[TMP10:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 6), align 2			; GATHER-NEXT: [[TMP10:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 6), align 2
	; GATHER-NEXT: [[TMP11:%.*]] = icmp eq i8 [[TMP10]], 0			; GATHER-NEXT: [[TMP11:%.*]] = icmp eq i8 [[TMP10]], 0
	; GATHER-NEXT: [[TMP12:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 7), align 1			; GATHER-NEXT: [[TMP12:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 7), align 1
	; GATHER-NEXT: [[TMP13:%.*]] = icmp eq i8 [[TMP12]], 0			; GATHER-NEXT: [[TMP13:%.*]] = icmp eq i8 [[TMP12]], 0
	; GATHER-NEXT: [[TMP14:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 8), align 8			; GATHER-NEXT: [[TMP14:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 8), align 8
	; GATHER-NEXT: [[TMP15:%.*]] = icmp eq i8 [[TMP14]], 0			; GATHER-NEXT: [[TMP15:%.*]] = icmp eq i8 [[TMP14]], 0
	; GATHER-NEXT: br label [[FOR_BODY:%.*]]			; GATHER-NEXT: br label [[FOR_BODY:%.*]]
	; GATHER: for.body:			; GATHER: for.body:
	; GATHER-NEXT: [[TMP17:%.*]] = phi i32 [ [[BIN_EXTRA:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; GATHER-NEXT: [[TMP17:%.*]] = phi i32 [ [[BIN_EXTRA:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
	; GATHER-NEXT: [[TMP19:%.*]] = select i1 [[TMP1]], i32 -720, i32 -80			; GATHER-NEXT: [[TMP2:%.*]] = select <2 x i1> [[TMP1]], <2 x i32> <i32 -720, i32 -720>, <2 x i32> <i32 -80, i32 -80>
	; GATHER-NEXT: [[TMP20:%.*]] = add i32 [[TMP17]], [[TMP19]]			; GATHER-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0
	; GATHER-NEXT: [[TMP21:%.*]] = select i1 [[TMP3]], i32 -720, i32 -80			; GATHER-NEXT: [[TMP20:%.*]] = add i32 [[TMP17]], [[TMP3]]
	; GATHER-NEXT: [[TMP22:%.*]] = add i32 [[TMP20]], [[TMP21]]			; GATHER-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1
				; GATHER-NEXT: [[TMP22:%.*]] = add i32 [[TMP20]], [[TMP4]]
	; GATHER-NEXT: [[TMP23:%.*]] = select i1 [[TMP5]], i32 -720, i32 -80			; GATHER-NEXT: [[TMP23:%.*]] = select i1 [[TMP5]], i32 -720, i32 -80
	; GATHER-NEXT: [[TMP24:%.*]] = add i32 [[TMP22]], [[TMP23]]			; GATHER-NEXT: [[TMP24:%.*]] = add i32 [[TMP22]], [[TMP23]]
	; GATHER-NEXT: [[TMP25:%.*]] = select i1 [[TMP7]], i32 -720, i32 -80			; GATHER-NEXT: [[TMP25:%.*]] = select i1 [[TMP7]], i32 -720, i32 -80
	; GATHER-NEXT: [[TMP26:%.*]] = add i32 [[TMP24]], [[TMP25]]			; GATHER-NEXT: [[TMP26:%.*]] = add i32 [[TMP24]], [[TMP25]]
	; GATHER-NEXT: [[TMP27:%.*]] = select i1 [[TMP9]], i32 -720, i32 -80			; GATHER-NEXT: [[TMP27:%.*]] = select i1 [[TMP9]], i32 -720, i32 -80
	; GATHER-NEXT: [[TMP28:%.*]] = add i32 [[TMP26]], [[TMP27]]			; GATHER-NEXT: [[TMP28:%.*]] = add i32 [[TMP26]], [[TMP27]]
	; GATHER-NEXT: [[TMP29:%.*]] = select i1 [[TMP11]], i32 -720, i32 -80			; GATHER-NEXT: [[TMP29:%.*]] = select i1 [[TMP11]], i32 -720, i32 -80
	; GATHER-NEXT: [[TMP30:%.*]] = add i32 [[TMP28]], [[TMP29]]			; GATHER-NEXT: [[TMP30:%.*]] = add i32 [[TMP28]], [[TMP29]]
	; GATHER-NEXT: [[TMP31:%.*]] = select i1 [[TMP13]], i32 -720, i32 -80			; GATHER-NEXT: [[TMP31:%.*]] = select i1 [[TMP13]], i32 -720, i32 -80
	; GATHER-NEXT: [[TMP32:%.*]] = add i32 [[TMP30]], [[TMP31]]			; GATHER-NEXT: [[TMP32:%.*]] = add i32 [[TMP30]], [[TMP31]]
	; GATHER-NEXT: [[TMP33:%.*]] = select i1 [[TMP15]], i32 -720, i32 -80			; GATHER-NEXT: [[TMP33:%.*]] = select i1 [[TMP15]], i32 -720, i32 -80
	; GATHER-NEXT: [[TMP0:%.*]] = insertelement <8 x i32> undef, i32 [[TMP19]], i32 0			; GATHER-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> undef, i32 [[TMP3]], i32 0
	; GATHER-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> [[TMP0]], i32 [[TMP21]], i32 1			; GATHER-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[TMP4]], i32 1
	; GATHER-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[TMP23]], i32 2			; GATHER-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[TMP23]], i32 2
	; GATHER-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[TMP25]], i32 3			; GATHER-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[TMP25]], i32 3
	; GATHER-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[TMP27]], i32 4			; GATHER-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[TMP27]], i32 4
	; GATHER-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[TMP29]], i32 5			; GATHER-NEXT: [[TMP10:%.*]] = insertelement <8 x i32> [[TMP9]], i32 [[TMP29]], i32 5
	; GATHER-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[TMP31]], i32 6			; GATHER-NEXT: [[TMP11:%.*]] = insertelement <8 x i32> [[TMP10]], i32 [[TMP31]], i32 6
	; GATHER-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[TMP33]], i32 7			; GATHER-NEXT: [[TMP12:%.*]] = insertelement <8 x i32> [[TMP11]], i32 [[TMP33]], i32 7
	; GATHER-NEXT: [[TMP8:%.*]] = call i32 @llvm.experimental.vector.reduce.add.i32.v8i32(<8 x i32> [[TMP7]])			; GATHER-NEXT: [[TMP13:%.*]] = call i32 @llvm.experimental.vector.reduce.add.i32.v8i32(<8 x i32> [[TMP12]])
	; GATHER-NEXT: [[BIN_EXTRA]] = add i32 [[TMP8]], [[TMP17]]			; GATHER-NEXT: [[BIN_EXTRA]] = add i32 [[TMP13]], [[TMP17]]
	; GATHER-NEXT: [[TMP34:%.*]] = add i32 [[TMP32]], [[TMP33]]			; GATHER-NEXT: [[TMP34:%.*]] = add i32 [[TMP32]], [[TMP33]]
	; GATHER-NEXT: br label [[FOR_BODY]]			; GATHER-NEXT: br label [[FOR_BODY]]
	;			;
	; MAX-COST-LABEL: @PR28330(			; MAX-COST-LABEL: @PR28330(
	; MAX-COST-NEXT: entry:			; MAX-COST-NEXT: entry:
	; MAX-COST-NEXT: [[TMP0:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1), align 1			; MAX-COST-NEXT: [[TMP0:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1), align 1
	; MAX-COST-NEXT: [[TMP1:%.*]] = icmp eq i8 [[TMP0]], 0			; MAX-COST-NEXT: [[TMP1:%.*]] = icmp eq i8 [[TMP0]], 0
	; MAX-COST-NEXT: [[TMP2:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 2), align 2			; MAX-COST-NEXT: [[TMP2:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 2), align 2
	; DEFAULT-NEXT: [[TMP32:%.*]] = add i32 [[TMP30]], undef			; DEFAULT-NEXT: [[TMP32:%.*]] = add i32 [[TMP30]], undef
	; DEFAULT-NEXT: [[TMP3:%.*]] = call i32 @llvm.experimental.vector.reduce.add.i32.v8i32(<8 x i32> [[TMP2]])			; DEFAULT-NEXT: [[TMP3:%.*]] = call i32 @llvm.experimental.vector.reduce.add.i32.v8i32(<8 x i32> [[TMP2]])
	; DEFAULT-NEXT: [[BIN_EXTRA]] = add i32 [[TMP3]], -5			; DEFAULT-NEXT: [[BIN_EXTRA]] = add i32 [[TMP3]], -5
	; DEFAULT-NEXT: [[TMP34:%.*]] = add i32 [[TMP32]], undef			; DEFAULT-NEXT: [[TMP34:%.*]] = add i32 [[TMP32]], undef
	; DEFAULT-NEXT: br label [[FOR_BODY]]			; DEFAULT-NEXT: br label [[FOR_BODY]]
	;			;
	; GATHER-LABEL: @PR32038(			; GATHER-LABEL: @PR32038(
	; GATHER-NEXT: entry:			; GATHER-NEXT: entry:
	; GATHER-NEXT: [[TMP0:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1), align 1			; GATHER-NEXT: [[TMP0:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <2 x i8>*), align 1
	; GATHER-NEXT: [[TMP1:%.*]] = icmp eq i8 [[TMP0]], 0			; GATHER-NEXT: [[TMP1:%.*]] = icmp eq <2 x i8> [[TMP0]], zeroinitializer
	; GATHER-NEXT: [[TMP2:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 2), align 2
	; GATHER-NEXT: [[TMP3:%.*]] = icmp eq i8 [[TMP2]], 0
	; GATHER-NEXT: [[TMP4:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 3), align 1			; GATHER-NEXT: [[TMP4:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 3), align 1
	; GATHER-NEXT: [[TMP5:%.*]] = icmp eq i8 [[TMP4]], 0			; GATHER-NEXT: [[TMP5:%.*]] = icmp eq i8 [[TMP4]], 0
	; GATHER-NEXT: [[TMP6:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 4), align 4			; GATHER-NEXT: [[TMP6:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 4), align 4
	; GATHER-NEXT: [[TMP7:%.*]] = icmp eq i8 [[TMP6]], 0			; GATHER-NEXT: [[TMP7:%.*]] = icmp eq i8 [[TMP6]], 0
	; GATHER-NEXT: [[TMP8:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 5), align 1			; GATHER-NEXT: [[TMP8:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 5), align 1
	; GATHER-NEXT: [[TMP9:%.*]] = icmp eq i8 [[TMP8]], 0			; GATHER-NEXT: [[TMP9:%.*]] = icmp eq i8 [[TMP8]], 0
	; GATHER-NEXT: [[TMP10:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 6), align 2			; GATHER-NEXT: [[TMP10:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 6), align 2
	; GATHER-NEXT: [[TMP11:%.*]] = icmp eq i8 [[TMP10]], 0			; GATHER-NEXT: [[TMP11:%.*]] = icmp eq i8 [[TMP10]], 0
	; GATHER-NEXT: [[TMP12:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 7), align 1			; GATHER-NEXT: [[TMP12:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 7), align 1
	; GATHER-NEXT: [[TMP13:%.*]] = icmp eq i8 [[TMP12]], 0			; GATHER-NEXT: [[TMP13:%.*]] = icmp eq i8 [[TMP12]], 0
	; GATHER-NEXT: [[TMP14:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 8), align 8			; GATHER-NEXT: [[TMP14:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 8), align 8
	; GATHER-NEXT: [[TMP15:%.*]] = icmp eq i8 [[TMP14]], 0			; GATHER-NEXT: [[TMP15:%.*]] = icmp eq i8 [[TMP14]], 0
	; GATHER-NEXT: br label [[FOR_BODY:%.*]]			; GATHER-NEXT: br label [[FOR_BODY:%.*]]
	; GATHER: for.body:			; GATHER: for.body:
	; GATHER-NEXT: [[TMP17:%.*]] = phi i32 [ [[BIN_EXTRA:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; GATHER-NEXT: [[TMP17:%.*]] = phi i32 [ [[BIN_EXTRA:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
	; GATHER-NEXT: [[TMP19:%.*]] = select i1 [[TMP1]], i32 -720, i32 -80			; GATHER-NEXT: [[TMP2:%.*]] = select <2 x i1> [[TMP1]], <2 x i32> <i32 -720, i32 -720>, <2 x i32> <i32 -80, i32 -80>
	; GATHER-NEXT: [[TMP20:%.*]] = add i32 -5, [[TMP19]]			; GATHER-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0
	; GATHER-NEXT: [[TMP21:%.*]] = select i1 [[TMP3]], i32 -720, i32 -80			; GATHER-NEXT: [[TMP20:%.*]] = add i32 -5, [[TMP3]]
	; GATHER-NEXT: [[TMP22:%.*]] = add i32 [[TMP20]], [[TMP21]]			; GATHER-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1
				; GATHER-NEXT: [[TMP22:%.*]] = add i32 [[TMP20]], [[TMP4]]
	; GATHER-NEXT: [[TMP23:%.*]] = select i1 [[TMP5]], i32 -720, i32 -80			; GATHER-NEXT: [[TMP23:%.*]] = select i1 [[TMP5]], i32 -720, i32 -80
	; GATHER-NEXT: [[TMP24:%.*]] = add i32 [[TMP22]], [[TMP23]]			; GATHER-NEXT: [[TMP24:%.*]] = add i32 [[TMP22]], [[TMP23]]
	; GATHER-NEXT: [[TMP25:%.*]] = select i1 [[TMP7]], i32 -720, i32 -80			; GATHER-NEXT: [[TMP25:%.*]] = select i1 [[TMP7]], i32 -720, i32 -80
	; GATHER-NEXT: [[TMP26:%.*]] = add i32 [[TMP24]], [[TMP25]]			; GATHER-NEXT: [[TMP26:%.*]] = add i32 [[TMP24]], [[TMP25]]
	; GATHER-NEXT: [[TMP27:%.*]] = select i1 [[TMP9]], i32 -720, i32 -80			; GATHER-NEXT: [[TMP27:%.*]] = select i1 [[TMP9]], i32 -720, i32 -80
	; GATHER-NEXT: [[TMP28:%.*]] = add i32 [[TMP26]], [[TMP27]]			; GATHER-NEXT: [[TMP28:%.*]] = add i32 [[TMP26]], [[TMP27]]
	; GATHER-NEXT: [[TMP29:%.*]] = select i1 [[TMP11]], i32 -720, i32 -80			; GATHER-NEXT: [[TMP29:%.*]] = select i1 [[TMP11]], i32 -720, i32 -80
	; GATHER-NEXT: [[TMP30:%.*]] = add i32 [[TMP28]], [[TMP29]]			; GATHER-NEXT: [[TMP30:%.*]] = add i32 [[TMP28]], [[TMP29]]
	; GATHER-NEXT: [[TMP31:%.*]] = select i1 [[TMP13]], i32 -720, i32 -80			; GATHER-NEXT: [[TMP31:%.*]] = select i1 [[TMP13]], i32 -720, i32 -80
	; GATHER-NEXT: [[TMP32:%.*]] = add i32 [[TMP30]], [[TMP31]]			; GATHER-NEXT: [[TMP32:%.*]] = add i32 [[TMP30]], [[TMP31]]
	; GATHER-NEXT: [[TMP33:%.*]] = select i1 [[TMP15]], i32 -720, i32 -80			; GATHER-NEXT: [[TMP33:%.*]] = select i1 [[TMP15]], i32 -720, i32 -80
	; GATHER-NEXT: [[TMP0:%.*]] = insertelement <8 x i32> undef, i32 [[TMP19]], i32 0			; GATHER-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> undef, i32 [[TMP3]], i32 0
	; GATHER-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> [[TMP0]], i32 [[TMP21]], i32 1			; GATHER-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[TMP4]], i32 1
	; GATHER-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[TMP23]], i32 2			; GATHER-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[TMP23]], i32 2
	; GATHER-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[TMP25]], i32 3			; GATHER-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[TMP25]], i32 3
	; GATHER-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[TMP27]], i32 4			; GATHER-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[TMP27]], i32 4
	; GATHER-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[TMP29]], i32 5			; GATHER-NEXT: [[TMP10:%.*]] = insertelement <8 x i32> [[TMP9]], i32 [[TMP29]], i32 5
	; GATHER-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[TMP31]], i32 6			; GATHER-NEXT: [[TMP11:%.*]] = insertelement <8 x i32> [[TMP10]], i32 [[TMP31]], i32 6
	; GATHER-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[TMP33]], i32 7			; GATHER-NEXT: [[TMP12:%.*]] = insertelement <8 x i32> [[TMP11]], i32 [[TMP33]], i32 7
	; GATHER-NEXT: [[TMP8:%.*]] = call i32 @llvm.experimental.vector.reduce.add.i32.v8i32(<8 x i32> [[TMP7]])			; GATHER-NEXT: [[TMP13:%.*]] = call i32 @llvm.experimental.vector.reduce.add.i32.v8i32(<8 x i32> [[TMP12]])
	; GATHER-NEXT: [[BIN_EXTRA]] = add i32 [[TMP8]], -5			; GATHER-NEXT: [[BIN_EXTRA]] = add i32 [[TMP13]], -5
	; GATHER-NEXT: [[TMP34:%.*]] = add i32 [[TMP32]], [[TMP33]]			; GATHER-NEXT: [[TMP34:%.*]] = add i32 [[TMP32]], [[TMP33]]
	; GATHER-NEXT: br label [[FOR_BODY]]			; GATHER-NEXT: br label [[FOR_BODY]]
	;			;
	; MAX-COST-LABEL: @PR32038(			; MAX-COST-LABEL: @PR32038(
	; MAX-COST-NEXT: entry:			; MAX-COST-NEXT: entry:
	; MAX-COST-NEXT: [[TMP0:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1), align 1			; MAX-COST-NEXT: [[TMP0:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <2 x i8>*), align 1
	; MAX-COST-NEXT: [[TMP1:%.*]] = icmp eq i8 [[TMP0]], 0			; MAX-COST-NEXT: [[TMP1:%.*]] = icmp eq <2 x i8> [[TMP0]], zeroinitializer
	; MAX-COST-NEXT: [[TMP2:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 2), align 2
	; MAX-COST-NEXT: [[TMP3:%.*]] = icmp eq i8 [[TMP2]], 0
	; MAX-COST-NEXT: [[TMP4:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 3), align 1			; MAX-COST-NEXT: [[TMP4:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 3), align 1
	; MAX-COST-NEXT: [[TMP5:%.*]] = icmp eq i8 [[TMP4]], 0			; MAX-COST-NEXT: [[TMPP5:%.*]] = icmp eq i8 [[TMP4]], 0
	; MAX-COST-NEXT: [[TMP6:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 4), align 4			; MAX-COST-NEXT: [[TMP6:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 4), align 4
	; MAX-COST-NEXT: [[TMP7:%.*]] = icmp eq i8 [[TMP6]], 0			; MAX-COST-NEXT: [[TMPP7:%.*]] = icmp eq i8 [[TMP6]], 0
	; MAX-COST-NEXT: [[TMP8:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 5), align 1			; MAX-COST-NEXT: [[TMP8:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 5), align 1
	; MAX-COST-NEXT: [[TMP9:%.*]] = icmp eq i8 [[TMP8]], 0			; MAX-COST-NEXT: [[TMP9:%.*]] = icmp eq i8 [[TMP8]], 0
	; MAX-COST-NEXT: [[TMP10:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 6), align 2			; MAX-COST-NEXT: [[TMP10:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 6), align 2
	; MAX-COST-NEXT: [[TMP11:%.*]] = icmp eq i8 [[TMP10]], 0			; MAX-COST-NEXT: [[TMP11:%.*]] = icmp eq i8 [[TMP10]], 0
	; MAX-COST-NEXT: [[TMP12:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 7), align 1			; MAX-COST-NEXT: [[TMP12:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 7), align 1
	; MAX-COST-NEXT: [[TMP13:%.*]] = icmp eq i8 [[TMP12]], 0			; MAX-COST-NEXT: [[TMP13:%.*]] = icmp eq i8 [[TMP12]], 0
	; MAX-COST-NEXT: [[TMP14:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 8), align 8			; MAX-COST-NEXT: [[TMP14:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 8), align 8
	; MAX-COST-NEXT: [[TMP15:%.*]] = icmp eq i8 [[TMP14]], 0			; MAX-COST-NEXT: [[TMP15:%.*]] = icmp eq i8 [[TMP14]], 0
	; MAX-COST-NEXT: [[TMP0:%.*]] = insertelement <4 x i1> undef, i1 [[TMP1]], i32 0
	; MAX-COST-NEXT: [[TMP1:%.*]] = insertelement <4 x i1> [[TMP0]], i1 [[TMP3]], i32 1
	; MAX-COST-NEXT: [[TMP2:%.*]] = insertelement <4 x i1> [[TMP1]], i1 [[TMP5]], i32 2
	; MAX-COST-NEXT: [[TMP3:%.*]] = insertelement <4 x i1> [[TMP2]], i1 [[TMP7]], i32 3
	; MAX-COST-NEXT: br label [[FOR_BODY:%.*]]			; MAX-COST-NEXT: br label [[FOR_BODY:%.*]]
	; MAX-COST: for.body:			; MAX-COST: for.body:
	; MAX-COST-NEXT: [[TMP17:%.*]] = phi i32 [ [[TMP34:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; MAX-COST-NEXT: [[TMP17:%.*]] = phi i32 [ [[TMP34:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
	; MAX-COST-NEXT: [[TMP4:%.*]] = select <4 x i1> [[TMP3]], <4 x i32> <i32 -720, i32 -720, i32 -720, i32 -720>, <4 x i32> <i32 -80, i32 -80, i32 -80, i32 -80>			; MAX-COST-NEXT: [[TMP2:%.*]] = extractelement <2 x i1> [[TMP1]], i32 0
				; MAX-COST-NEXT: [[TMP3:%.*]] = insertelement <4 x i1> undef, i1 [[TMP2]], i32 0
				; MAX-COST-NEXT: [[TMP4:%.*]] = extractelement <2 x i1> [[TMP1]], i32 1
				; MAX-COST-NEXT: [[TMP5:%.*]] = insertelement <4 x i1> [[TMP3]], i1 [[TMP4]], i32 1
				; MAX-COST-NEXT: [[TMP6:%.*]] = insertelement <4 x i1> [[TMP5]], i1 [[TMPP5]], i32 2
				; MAX-COST-NEXT: [[TMP7:%.*]] = insertelement <4 x i1> [[TMP6]], i1 [[TMPP7]], i32 3
				; MAX-COST-NEXT: [[TMP8:%.*]] = select <4 x i1> [[TMP7]], <4 x i32> <i32 -720, i32 -720, i32 -720, i32 -720>, <4 x i32> <i32 -80, i32 -80, i32 -80, i32 -80>
	; MAX-COST-NEXT: [[TMP20:%.*]] = add i32 -5, undef			; MAX-COST-NEXT: [[TMP20:%.*]] = add i32 -5, undef
	; MAX-COST-NEXT: [[TMP22:%.*]] = add i32 [[TMP20]], undef			; MAX-COST-NEXT: [[TMP22:%.*]] = add i32 [[TMP20]], undef
	; MAX-COST-NEXT: [[TMP24:%.*]] = add i32 [[TMP22]], undef			; MAX-COST-NEXT: [[TMP24:%.*]] = add i32 [[TMP22]], undef
	; MAX-COST-NEXT: [[TMP26:%.*]] = add i32 [[TMP24]], undef			; MAX-COST-NEXT: [[TMP26:%.*]] = add i32 [[TMP24]], undef
	; MAX-COST-NEXT: [[TMP27:%.*]] = select i1 [[TMP9]], i32 -720, i32 -80			; MAX-COST-NEXT: [[TMP27:%.*]] = select i1 [[TMP9]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[TMP28:%.*]] = add i32 [[TMP26]], [[TMP27]]			; MAX-COST-NEXT: [[TMP28:%.*]] = add i32 [[TMP26]], [[TMP27]]
	; MAX-COST-NEXT: [[TMP29:%.*]] = select i1 [[TMP11]], i32 -720, i32 -80			; MAX-COST-NEXT: [[TMP29:%.*]] = select i1 [[TMP11]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[TMP5:%.*]] = call i32 @llvm.experimental.vector.reduce.add.i32.v4i32(<4 x i32> [[TMP4]])			; MAX-COST-NEXT: [[TMP9:%.*]] = call i32 @llvm.experimental.vector.reduce.add.i32.v4i32(<4 x i32> [[TMP8]])
	; MAX-COST-NEXT: [[TMP6:%.*]] = add i32 [[TMP5]], [[TMP27]]			; MAX-COST-NEXT: [[TMP10:%.*]] = add i32 [[TMP9]], [[TMP27]]
	; MAX-COST-NEXT: [[TMP7:%.*]] = add i32 [[TMP6]], [[TMP29]]			; MAX-COST-NEXT: [[TMP11:%.*]] = add i32 [[TMP10]], [[TMP29]]
	; MAX-COST-NEXT: [[BIN_EXTRA:%.*]] = add i32 [[TMP7]], -5			; MAX-COST-NEXT: [[BIN_EXTRA:%.*]] = add i32 [[TMP11]], -5
	; MAX-COST-NEXT: [[TMP30:%.*]] = add i32 [[TMP28]], [[TMP29]]			; MAX-COST-NEXT: [[TMP30:%.*]] = add i32 [[TMP28]], [[TMP29]]
	; MAX-COST-NEXT: [[TMP31:%.*]] = select i1 [[TMP13]], i32 -720, i32 -80			; MAX-COST-NEXT: [[TMP31:%.*]] = select i1 [[TMP13]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[TMP32:%.*]] = add i32 [[BIN_EXTRA]], [[TMP31]]			; MAX-COST-NEXT: [[TMP32:%.*]] = add i32 [[BIN_EXTRA]], [[TMP31]]
	; MAX-COST-NEXT: [[TMP33:%.*]] = select i1 [[TMP15]], i32 -720, i32 -80			; MAX-COST-NEXT: [[TMP33:%.*]] = select i1 [[TMP15]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[TMP34]] = add i32 [[TMP32]], [[TMP33]]			; MAX-COST-NEXT: [[TMP34]] = add i32 [[TMP32]], [[TMP33]]
	; MAX-COST-NEXT: br label [[FOR_BODY]]			; MAX-COST-NEXT: br label [[FOR_BODY]]
	;			;
	entry:			entry:

test/Transforms/SLPVectorizer/X86/horizontal-list.ll

; CHECK-NEXT: [[ADD19_2:%.*]] = fadd fast float undef, [[ADD19_1]]		; CHECK-NEXT: [[ADD19_2:%.*]] = fadd fast float undef, [[ADD19_1]]
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP3]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP3]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP3]], [[RDX_SHUF]]		; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP3]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]		; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]		; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
; CHECK-NEXT: [[BIN_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]		; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]
; CHECK-NEXT: [[BIN_EXTRA5:%.*]] = fadd fast float [[BIN_EXTRA]], [[CONV6]]		; CHECK-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV6]]
; CHECK-NEXT: [[ADD19_3:%.*]] = fadd fast float undef, [[ADD19_2]]		; CHECK-NEXT: [[ADD19_3:%.*]] = fadd fast float undef, [[ADD19_2]]
; CHECK-NEXT: store float [[BIN_EXTRA5]], float* @res, align 4		; CHECK-NEXT: store float [[OP_EXTRA5]], float* @res, align 4
; CHECK-NEXT: ret float [[BIN_EXTRA5]]		; CHECK-NEXT: ret float [[OP_EXTRA5]]
;		;
; THRESHOLD-LABEL: @bazz(		; THRESHOLD-LABEL: @bazz(
; THRESHOLD-NEXT: entry:		; THRESHOLD-NEXT: entry:
; THRESHOLD-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4		; THRESHOLD-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3		; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3
; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float		; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr to <8 x float>*), align 16		; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr to <8 x float>*), align 16
; THRESHOLD-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr1 to <8 x float>*), align 16		; THRESHOLD-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr1 to <8 x float>*), align 16
; THRESHOLD-NEXT: [[ADD19_2:%.*]] = fadd fast float undef, [[ADD19_1]]		; THRESHOLD-NEXT: [[ADD19_2:%.*]] = fadd fast float undef, [[ADD19_1]]
; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP3]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP3]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP3]], [[RDX_SHUF]]		; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP3]], [[RDX_SHUF]]
; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]		; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]		; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; THRESHOLD-NEXT: [[TMP4:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0		; THRESHOLD-NEXT: [[TMP4:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
; THRESHOLD-NEXT: [[BIN_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]		; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]
; THRESHOLD-NEXT: [[BIN_EXTRA5:%.*]] = fadd fast float [[BIN_EXTRA]], [[CONV6]]		; THRESHOLD-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV6]]
; THRESHOLD-NEXT: [[ADD19_3:%.*]] = fadd fast float undef, [[ADD19_2]]		; THRESHOLD-NEXT: [[ADD19_3:%.*]] = fadd fast float undef, [[ADD19_2]]
; THRESHOLD-NEXT: store float [[BIN_EXTRA5]], float* @res, align 4		; THRESHOLD-NEXT: store float [[OP_EXTRA5]], float* @res, align 4
; THRESHOLD-NEXT: ret float [[BIN_EXTRA5]]		; THRESHOLD-NEXT: ret float [[OP_EXTRA5]]
;		;
entry:		entry:
%0 = load i32, i32* @n, align 4		%0 = load i32, i32* @n, align 4
%mul = mul nsw i32 %0, 3		%mul = mul nsw i32 %0, 3
%conv = sitofp i32 %mul to float		%conv = sitofp i32 %mul to float
%1 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16		%1 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
%2 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16		%2 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16
%mul4 = fmul fast float %2, %1		%mul4 = fmul fast float %2, %1
%conv4 = fptosi float %12 to i32		%conv4 = fptosi float %12 to i32
store i32 %conv4, i32* @n, align 4		store i32 %conv4, i32* @n, align 4
ret i32 %conv4		ret i32 %conv4
}		}

define float @bar() {		define float @bar() {
; CHECK-LABEL: @bar(		; CHECK-LABEL: @bar(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr to <2 x float>*), align 16		; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16
; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr1 to <2 x float>*), align 16		; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16
; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <2 x float> [[TMP1]], [[TMP0]]		; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <4 x float> [[TMP1]], [[TMP0]]
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP2]], i32 0		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[TMP2]], i32 0
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP2]], i32 1		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[TMP2]], i32 1
; CHECK-NEXT: [[CMP4:%.*]] = fcmp fast ogt float [[TMP3]], [[TMP4]]		; CHECK-NEXT: [[CMP4:%.*]] = fcmp fast ogt float [[TMP3]], [[TMP4]]
; CHECK-NEXT: [[MAX_0_MUL3:%.*]] = select i1 [[CMP4]], float [[TMP3]], float [[TMP4]]		; CHECK-NEXT: [[MAX_0_MUL3:%.*]] = select i1 [[CMP4]], float undef, float undef
; CHECK-NEXT: [[TMP5:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2), align 8		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP2]], i32 2
; CHECK-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2), align 8		; CHECK-NEXT: [[CMP4_1:%.*]] = fcmp fast ogt float [[MAX_0_MUL3]], [[TMP5]]
; CHECK-NEXT: [[MUL3_1:%.*]] = fmul fast float [[TMP6]], [[TMP5]]		; CHECK-NEXT: [[MAX_0_MUL3_1:%.*]] = select i1 [[CMP4_1]], float [[MAX_0_MUL3]], float undef
; CHECK-NEXT: [[CMP4_1:%.*]] = fcmp fast ogt float [[MAX_0_MUL3]], [[MUL3_1]]		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP2]], i32 3
; CHECK-NEXT: [[MAX_0_MUL3_1:%.*]] = select i1 [[CMP4_1]], float [[MAX_0_MUL3]], float [[MUL3_1]]		; CHECK-NEXT: [[CMP4_2:%.*]] = fcmp fast ogt float [[MAX_0_MUL3_1]], [[TMP6]]
; CHECK-NEXT: [[TMP7:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 3), align 4		; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP8:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 3), align 4		; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = fcmp fast ogt <4 x float> [[TMP2]], [[RDX_SHUF]]
; CHECK-NEXT: [[MUL3_2:%.*]] = fmul fast float [[TMP8]], [[TMP7]]		; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x float> [[TMP2]], <4 x float> [[RDX_SHUF]]
; CHECK-NEXT: [[CMP4_2:%.*]] = fcmp fast ogt float [[MAX_0_MUL3_1]], [[MUL3_2]]		; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[RDX_MINMAX_SELECT]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[MAX_0_MUL3_2:%.*]] = select i1 [[CMP4_2]], float [[MAX_0_MUL3_1]], float [[MUL3_2]]		; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = fcmp fast ogt <4 x float> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
; CHECK-NEXT: store float [[MAX_0_MUL3_2]], float* @res, align 4		; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x float> [[RDX_MINMAX_SELECT]], <4 x float> [[RDX_SHUF1]]
; CHECK-NEXT: ret float [[MAX_0_MUL3_2]]		; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[RDX_MINMAX_SELECT3]], i32 0
		; CHECK-NEXT: [[MAX_0_MUL3_2:%.*]] = select i1 [[CMP4_2]], float [[MAX_0_MUL3_1]], float undef
		; CHECK-NEXT: store float [[TMP7]], float* @res, align 4
		; CHECK-NEXT: ret float [[TMP7]]
;		;
; THRESHOLD-LABEL: @bar(		; THRESHOLD-LABEL: @bar(
; THRESHOLD-NEXT: entry:		; THRESHOLD-NEXT: entry:
; THRESHOLD-NEXT: [[TMP0:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr to <2 x float>*), align 16		; THRESHOLD-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16
; THRESHOLD-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr1 to <2 x float>*), align 16		; THRESHOLD-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16
; THRESHOLD-NEXT: [[TMP2:%.*]] = fmul fast <2 x float> [[TMP1]], [[TMP0]]		; THRESHOLD-NEXT: [[TMP2:%.*]] = fmul fast <4 x float> [[TMP1]], [[TMP0]]
; THRESHOLD-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP2]], i32 0		; THRESHOLD-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[TMP2]], i32 0
; THRESHOLD-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP2]], i32 1		; THRESHOLD-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[TMP2]], i32 1
; THRESHOLD-NEXT: [[CMP4:%.*]] = fcmp fast ogt float [[TMP3]], [[TMP4]]		; THRESHOLD-NEXT: [[CMP4:%.*]] = fcmp fast ogt float [[TMP3]], [[TMP4]]
; THRESHOLD-NEXT: [[MAX_0_MUL3:%.*]] = select i1 [[CMP4]], float [[TMP3]], float [[TMP4]]		; THRESHOLD-NEXT: [[MAX_0_MUL3:%.*]] = select i1 [[CMP4]], float undef, float undef
; THRESHOLD-NEXT: [[TMP5:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2), align 8		; THRESHOLD-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP2]], i32 2
; THRESHOLD-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2), align 8		; THRESHOLD-NEXT: [[CMP4_1:%.*]] = fcmp fast ogt float [[MAX_0_MUL3]], [[TMP5]]
; THRESHOLD-NEXT: [[MUL3_1:%.*]] = fmul fast float [[TMP6]], [[TMP5]]		; THRESHOLD-NEXT: [[MAX_0_MUL3_1:%.*]] = select i1 [[CMP4_1]], float [[MAX_0_MUL3]], float undef
; THRESHOLD-NEXT: [[CMP4_1:%.*]] = fcmp fast ogt float [[MAX_0_MUL3]], [[MUL3_1]]		; THRESHOLD-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP2]], i32 3
; THRESHOLD-NEXT: [[MAX_0_MUL3_1:%.*]] = select i1 [[CMP4_1]], float [[MAX_0_MUL3]], float [[MUL3_1]]		; THRESHOLD-NEXT: [[CMP4_2:%.*]] = fcmp fast ogt float [[MAX_0_MUL3_1]], [[TMP6]]
; THRESHOLD-NEXT: [[TMP7:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 3), align 4		; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[TMP8:%.*]] = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 3), align 4		; THRESHOLD-NEXT: [[RDX_MINMAX_CMP:%.*]] = fcmp fast ogt <4 x float> [[TMP2]], [[RDX_SHUF]]
; THRESHOLD-NEXT: [[MUL3_2:%.*]] = fmul fast float [[TMP8]], [[TMP7]]		; THRESHOLD-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x float> [[TMP2]], <4 x float> [[RDX_SHUF]]
; THRESHOLD-NEXT: [[CMP4_2:%.*]] = fcmp fast ogt float [[MAX_0_MUL3_1]], [[MUL3_2]]		; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[RDX_MINMAX_SELECT]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[MAX_0_MUL3_2:%.*]] = select i1 [[CMP4_2]], float [[MAX_0_MUL3_1]], float [[MUL3_2]]		; THRESHOLD-NEXT: [[RDX_MINMAX_CMP2:%.*]] = fcmp fast ogt <4 x float> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
; THRESHOLD-NEXT: store float [[MAX_0_MUL3_2]], float* @res, align 4		; THRESHOLD-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x float> [[RDX_MINMAX_SELECT]], <4 x float> [[RDX_SHUF1]]
; THRESHOLD-NEXT: ret float [[MAX_0_MUL3_2]]		; THRESHOLD-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[RDX_MINMAX_SELECT3]], i32 0
		; THRESHOLD-NEXT: [[MAX_0_MUL3_2:%.*]] = select i1 [[CMP4_2]], float [[MAX_0_MUL3_1]], float undef
		; THRESHOLD-NEXT: store float [[TMP7]], float* @res, align 4
		; THRESHOLD-NEXT: ret float [[TMP7]]
;		;
entry:		entry:
%0 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16		%0 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
%1 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16		%1 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16
%mul = fmul fast float %1, %0		%mul = fmul fast float %1, %0
%2 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 1), align 4		%2 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 1), align 4
%3 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 1), align 4		%3 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 1), align 4
%mul3 = fmul fast float %3, %2		%mul3 = fmul fast float %3, %2
; CHECK-NEXT: [[BIN_RDX10:%.*]] = fadd fast <16 x float> [[TMP1]], [[RDX_SHUF9]]		; CHECK-NEXT: [[BIN_RDX10:%.*]] = fadd fast <16 x float> [[TMP1]], [[RDX_SHUF9]]
; CHECK-NEXT: [[RDX_SHUF11:%.*]] = shufflevector <16 x float> [[BIN_RDX10]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF11:%.*]] = shufflevector <16 x float> [[BIN_RDX10]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX12:%.*]] = fadd fast <16 x float> [[BIN_RDX10]], [[RDX_SHUF11]]		; CHECK-NEXT: [[BIN_RDX12:%.*]] = fadd fast <16 x float> [[BIN_RDX10]], [[RDX_SHUF11]]
; CHECK-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <16 x float> [[BIN_RDX12]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <16 x float> [[BIN_RDX12]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX14:%.*]] = fadd fast <16 x float> [[BIN_RDX12]], [[RDX_SHUF13]]		; CHECK-NEXT: [[BIN_RDX14:%.*]] = fadd fast <16 x float> [[BIN_RDX12]], [[RDX_SHUF13]]
; CHECK-NEXT: [[RDX_SHUF15:%.*]] = shufflevector <16 x float> [[BIN_RDX14]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF15:%.*]] = shufflevector <16 x float> [[BIN_RDX14]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX16:%.*]] = fadd fast <16 x float> [[BIN_RDX14]], [[RDX_SHUF15]]		; CHECK-NEXT: [[BIN_RDX16:%.*]] = fadd fast <16 x float> [[BIN_RDX14]], [[RDX_SHUF15]]
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <16 x float> [[BIN_RDX16]], i32 0		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <16 x float> [[BIN_RDX16]], i32 0
; CHECK-NEXT: [[BIN_RDX17:%.*]] = fadd fast float [[TMP4]], [[TMP5]]		; CHECK-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP4]], [[TMP5]]
; CHECK-NEXT: [[ADD_47:%.*]] = fadd fast float undef, [[ADD_46]]		; CHECK-NEXT: [[ADD_47:%.*]] = fadd fast float undef, [[ADD_46]]
; CHECK-NEXT: ret float [[BIN_RDX17]]		; CHECK-NEXT: ret float [[OP_RDX]]
;		;
; THRESHOLD-LABEL: @f(		; THRESHOLD-LABEL: @f(
; THRESHOLD-NEXT: entry:		; THRESHOLD-NEXT: entry:
; THRESHOLD-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1		; THRESHOLD-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
; THRESHOLD-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 2		; THRESHOLD-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
; THRESHOLD-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 3		; THRESHOLD-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
; THRESHOLD-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 4		; THRESHOLD-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 4
; THRESHOLD-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 5		; THRESHOLD-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 5
; THRESHOLD-NEXT: [[BIN_RDX10:%.*]] = fadd fast <16 x float> [[TMP1]], [[RDX_SHUF9]]		; THRESHOLD-NEXT: [[BIN_RDX10:%.*]] = fadd fast <16 x float> [[TMP1]], [[RDX_SHUF9]]
; THRESHOLD-NEXT: [[RDX_SHUF11:%.*]] = shufflevector <16 x float> [[BIN_RDX10]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; THRESHOLD-NEXT: [[RDX_SHUF11:%.*]] = shufflevector <16 x float> [[BIN_RDX10]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[BIN_RDX12:%.*]] = fadd fast <16 x float> [[BIN_RDX10]], [[RDX_SHUF11]]		; THRESHOLD-NEXT: [[BIN_RDX12:%.*]] = fadd fast <16 x float> [[BIN_RDX10]], [[RDX_SHUF11]]
; THRESHOLD-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <16 x float> [[BIN_RDX12]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; THRESHOLD-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <16 x float> [[BIN_RDX12]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[BIN_RDX14:%.*]] = fadd fast <16 x float> [[BIN_RDX12]], [[RDX_SHUF13]]		; THRESHOLD-NEXT: [[BIN_RDX14:%.*]] = fadd fast <16 x float> [[BIN_RDX12]], [[RDX_SHUF13]]
; THRESHOLD-NEXT: [[RDX_SHUF15:%.*]] = shufflevector <16 x float> [[BIN_RDX14]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; THRESHOLD-NEXT: [[RDX_SHUF15:%.*]] = shufflevector <16 x float> [[BIN_RDX14]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[BIN_RDX16:%.*]] = fadd fast <16 x float> [[BIN_RDX14]], [[RDX_SHUF15]]		; THRESHOLD-NEXT: [[BIN_RDX16:%.*]] = fadd fast <16 x float> [[BIN_RDX14]], [[RDX_SHUF15]]
; THRESHOLD-NEXT: [[TMP5:%.*]] = extractelement <16 x float> [[BIN_RDX16]], i32 0		; THRESHOLD-NEXT: [[TMP5:%.*]] = extractelement <16 x float> [[BIN_RDX16]], i32 0
; THRESHOLD-NEXT: [[BIN_RDX17:%.*]] = fadd fast float [[TMP4]], [[TMP5]]		; THRESHOLD-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP4]], [[TMP5]]
; THRESHOLD-NEXT: [[ADD_47:%.*]] = fadd fast float undef, [[ADD_46]]		; THRESHOLD-NEXT: [[ADD_47:%.*]] = fadd fast float undef, [[ADD_46]]
; THRESHOLD-NEXT: ret float [[BIN_RDX17]]		; THRESHOLD-NEXT: ret float [[OP_RDX]]
;		;
entry:		entry:
%0 = load float, float* %x, align 4		%0 = load float, float* %x, align 4
%arrayidx.1 = getelementptr inbounds float, float* %x, i64 1		%arrayidx.1 = getelementptr inbounds float, float* %x, i64 1
%1 = load float, float* %arrayidx.1, align 4		%1 = load float, float* %arrayidx.1, align 4
%add.1 = fadd fast float %1, %0		%add.1 = fadd fast float %1, %0
%arrayidx.2 = getelementptr inbounds float, float* %x, i64 2		%arrayidx.2 = getelementptr inbounds float, float* %x, i64 2
%2 = load float, float* %arrayidx.2, align 4		%2 = load float, float* %arrayidx.2, align 4
; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <32 x float> [[BIN_RDX]], [[RDX_SHUF1]]		; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <32 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x float> [[BIN_RDX2]], <32 x float> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x float> [[BIN_RDX2]], <32 x float> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <32 x float> [[BIN_RDX2]], [[RDX_SHUF3]]		; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <32 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; CHECK-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x float> [[BIN_RDX4]], <32 x float> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x float> [[BIN_RDX4]], <32 x float> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX6:%.*]] = fadd fast <32 x float> [[BIN_RDX4]], [[RDX_SHUF5]]		; CHECK-NEXT: [[BIN_RDX6:%.*]] = fadd fast <32 x float> [[BIN_RDX4]], [[RDX_SHUF5]]
; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX8:%.*]] = fadd fast <32 x float> [[BIN_RDX6]], [[RDX_SHUF7]]		; CHECK-NEXT: [[BIN_RDX8:%.*]] = fadd fast <32 x float> [[BIN_RDX6]], [[RDX_SHUF7]]
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <32 x float> [[BIN_RDX8]], i32 0		; CHECK-NEXT: [[TMP2:%.*]] = extractelement <32 x float> [[BIN_RDX8]], i32 0
; CHECK-NEXT: [[BIN_EXTRA:%.*]] = fadd fast float [[TMP2]], [[CONV]]		; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[CONV]]
; CHECK-NEXT: [[ADD_31:%.*]] = fadd fast float undef, [[ADD_30]]		; CHECK-NEXT: [[ADD_31:%.*]] = fadd fast float undef, [[ADD_30]]
; CHECK-NEXT: ret float [[BIN_EXTRA]]		; CHECK-NEXT: ret float [[OP_EXTRA]]
;		;
; THRESHOLD-LABEL: @f1(		; THRESHOLD-LABEL: @f1(
; THRESHOLD-NEXT: entry:		; THRESHOLD-NEXT: entry:
; THRESHOLD-NEXT: [[REM:%.*]] = srem i32 [[A:%.*]], [[B:%.*]]		; THRESHOLD-NEXT: [[REM:%.*]] = srem i32 [[A:%.*]], [[B:%.*]]
; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[REM]] to float		; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[REM]] to float
; THRESHOLD-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1		; THRESHOLD-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
; THRESHOLD-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 2		; THRESHOLD-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
; THRESHOLD-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 3		; THRESHOLD-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <32 x float> [[BIN_RDX]], [[RDX_SHUF1]]		; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <32 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x float> [[BIN_RDX2]], <32 x float> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x float> [[BIN_RDX2]], <32 x float> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <32 x float> [[BIN_RDX2]], [[RDX_SHUF3]]		; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <32 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; THRESHOLD-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x float> [[BIN_RDX4]], <32 x float> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; THRESHOLD-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x float> [[BIN_RDX4]], <32 x float> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[BIN_RDX6:%.*]] = fadd fast <32 x float> [[BIN_RDX4]], [[RDX_SHUF5]]		; THRESHOLD-NEXT: [[BIN_RDX6:%.*]] = fadd fast <32 x float> [[BIN_RDX4]], [[RDX_SHUF5]]
; THRESHOLD-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; THRESHOLD-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[BIN_RDX8:%.*]] = fadd fast <32 x float> [[BIN_RDX6]], [[RDX_SHUF7]]		; THRESHOLD-NEXT: [[BIN_RDX8:%.*]] = fadd fast <32 x float> [[BIN_RDX6]], [[RDX_SHUF7]]
; THRESHOLD-NEXT: [[TMP2:%.*]] = extractelement <32 x float> [[BIN_RDX8]], i32 0		; THRESHOLD-NEXT: [[TMP2:%.*]] = extractelement <32 x float> [[BIN_RDX8]], i32 0
; THRESHOLD-NEXT: [[BIN_EXTRA:%.*]] = fadd fast float [[TMP2]], [[CONV]]		; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[CONV]]
; THRESHOLD-NEXT: [[ADD_31:%.*]] = fadd fast float undef, [[ADD_30]]		; THRESHOLD-NEXT: [[ADD_31:%.*]] = fadd fast float undef, [[ADD_30]]
; THRESHOLD-NEXT: ret float [[BIN_EXTRA]]		; THRESHOLD-NEXT: ret float [[OP_EXTRA]]
;		;
entry:		entry:
%rem = srem i32 %a, %b		%rem = srem i32 %a, %b
%conv = sitofp i32 %rem to float		%conv = sitofp i32 %rem to float
%0 = load float, float* %x, align 4		%0 = load float, float* %x, align 4
%add = fadd fast float %0, %conv		%add = fadd fast float %0, %conv
%arrayidx.1 = getelementptr inbounds float, float* %x, i64 1		%arrayidx.1 = getelementptr inbounds float, float* %x, i64 1
%1 = load float, float* %arrayidx.1, align 4		%1 = load float, float* %arrayidx.1, align 4
; CHECK-NEXT: [[TMP8:%.*]] = extractelement <16 x float> [[BIN_RDX6]], i32 0		; CHECK-NEXT: [[TMP8:%.*]] = extractelement <16 x float> [[BIN_RDX6]], i32 0
; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <8 x float> [[TMP5]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <8 x float> [[TMP5]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX8:%.*]] = fadd fast <8 x float> [[TMP5]], [[RDX_SHUF7]]		; CHECK-NEXT: [[BIN_RDX8:%.*]] = fadd fast <8 x float> [[TMP5]], [[RDX_SHUF7]]
; CHECK-NEXT: [[RDX_SHUF9:%.*]] = shufflevector <8 x float> [[BIN_RDX8]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF9:%.*]] = shufflevector <8 x float> [[BIN_RDX8]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX10:%.*]] = fadd fast <8 x float> [[BIN_RDX8]], [[RDX_SHUF9]]		; CHECK-NEXT: [[BIN_RDX10:%.*]] = fadd fast <8 x float> [[BIN_RDX8]], [[RDX_SHUF9]]
; CHECK-NEXT: [[RDX_SHUF11:%.*]] = shufflevector <8 x float> [[BIN_RDX10]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF11:%.*]] = shufflevector <8 x float> [[BIN_RDX10]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX12:%.*]] = fadd fast <8 x float> [[BIN_RDX10]], [[RDX_SHUF11]]		; CHECK-NEXT: [[BIN_RDX12:%.*]] = fadd fast <8 x float> [[BIN_RDX10]], [[RDX_SHUF11]]
; CHECK-NEXT: [[TMP9:%.*]] = extractelement <8 x float> [[BIN_RDX12]], i32 0		; CHECK-NEXT: [[TMP9:%.*]] = extractelement <8 x float> [[BIN_RDX12]], i32 0
; CHECK-NEXT: [[BIN_RDX13:%.*]] = fadd fast float [[TMP8]], [[TMP9]]		; CHECK-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP8]], [[TMP9]]
; CHECK-NEXT: [[RDX_SHUF14:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX15:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF14]]		; CHECK-NEXT: [[BIN_RDX14:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF13]]
; CHECK-NEXT: [[RDX_SHUF16:%.*]] = shufflevector <4 x float> [[BIN_RDX15]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF15:%.*]] = shufflevector <4 x float> [[BIN_RDX14]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX17:%.*]] = fadd fast <4 x float> [[BIN_RDX15]], [[RDX_SHUF16]]		; CHECK-NEXT: [[BIN_RDX16:%.*]] = fadd fast <4 x float> [[BIN_RDX14]], [[RDX_SHUF15]]
; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x float> [[BIN_RDX17]], i32 0		; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x float> [[BIN_RDX16]], i32 0
; CHECK-NEXT: [[BIN_RDX18:%.*]] = fadd fast float [[BIN_RDX13]], [[TMP10]]		; CHECK-NEXT: [[OP_RDX17:%.*]] = fadd fast float [[OP_RDX]], [[TMP10]]
; CHECK-NEXT: [[TMP11:%.*]] = fadd fast float [[BIN_RDX18]], [[TMP1]]		; CHECK-NEXT: [[TMP11:%.*]] = fadd fast float [[OP_RDX17]], [[TMP1]]
; CHECK-NEXT: [[TMP12:%.*]] = fadd fast float [[TMP11]], [[TMP0]]		; CHECK-NEXT: [[TMP12:%.*]] = fadd fast float [[TMP11]], [[TMP0]]
; CHECK-NEXT: [[ADD_29:%.*]] = fadd fast float undef, [[ADD_28]]		; CHECK-NEXT: [[ADD_29:%.*]] = fadd fast float undef, [[ADD_28]]
; CHECK-NEXT: ret float [[TMP12]]		; CHECK-NEXT: ret float [[TMP12]]
;		;
; THRESHOLD-LABEL: @loadadd31(		; THRESHOLD-LABEL: @loadadd31(
; THRESHOLD-NEXT: entry:		; THRESHOLD-NEXT: entry:
; THRESHOLD-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1		; THRESHOLD-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
; THRESHOLD-NEXT: [[TMP0:%.*]] = load float, float* [[ARRAYIDX]], align 4		; THRESHOLD-NEXT: [[TMP0:%.*]] = load float, float* [[ARRAYIDX]], align 4
; THRESHOLD-NEXT: [[TMP8:%.*]] = extractelement <16 x float> [[BIN_RDX6]], i32 0		; THRESHOLD-NEXT: [[TMP8:%.*]] = extractelement <16 x float> [[BIN_RDX6]], i32 0
; THRESHOLD-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <8 x float> [[TMP5]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; THRESHOLD-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <8 x float> [[TMP5]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[BIN_RDX8:%.*]] = fadd fast <8 x float> [[TMP5]], [[RDX_SHUF7]]		; THRESHOLD-NEXT: [[BIN_RDX8:%.*]] = fadd fast <8 x float> [[TMP5]], [[RDX_SHUF7]]
; THRESHOLD-NEXT: [[RDX_SHUF9:%.*]] = shufflevector <8 x float> [[BIN_RDX8]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; THRESHOLD-NEXT: [[RDX_SHUF9:%.*]] = shufflevector <8 x float> [[BIN_RDX8]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[BIN_RDX10:%.*]] = fadd fast <8 x float> [[BIN_RDX8]], [[RDX_SHUF9]]		; THRESHOLD-NEXT: [[BIN_RDX10:%.*]] = fadd fast <8 x float> [[BIN_RDX8]], [[RDX_SHUF9]]
; THRESHOLD-NEXT: [[RDX_SHUF11:%.*]] = shufflevector <8 x float> [[BIN_RDX10]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; THRESHOLD-NEXT: [[RDX_SHUF11:%.*]] = shufflevector <8 x float> [[BIN_RDX10]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[BIN_RDX12:%.*]] = fadd fast <8 x float> [[BIN_RDX10]], [[RDX_SHUF11]]		; THRESHOLD-NEXT: [[BIN_RDX12:%.*]] = fadd fast <8 x float> [[BIN_RDX10]], [[RDX_SHUF11]]
; THRESHOLD-NEXT: [[TMP9:%.*]] = extractelement <8 x float> [[BIN_RDX12]], i32 0		; THRESHOLD-NEXT: [[TMP9:%.*]] = extractelement <8 x float> [[BIN_RDX12]], i32 0
; THRESHOLD-NEXT: [[BIN_RDX13:%.*]] = fadd fast float [[TMP8]], [[TMP9]]		; THRESHOLD-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP8]], [[TMP9]]
; THRESHOLD-NEXT: [[RDX_SHUF14:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; THRESHOLD-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[BIN_RDX15:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF14]]		; THRESHOLD-NEXT: [[BIN_RDX14:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF13]]
; THRESHOLD-NEXT: [[RDX_SHUF16:%.*]] = shufflevector <4 x float> [[BIN_RDX15]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; THRESHOLD-NEXT: [[RDX_SHUF15:%.*]] = shufflevector <4 x float> [[BIN_RDX14]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[BIN_RDX17:%.*]] = fadd fast <4 x float> [[BIN_RDX15]], [[RDX_SHUF16]]		; THRESHOLD-NEXT: [[BIN_RDX16:%.*]] = fadd fast <4 x float> [[BIN_RDX14]], [[RDX_SHUF15]]
; THRESHOLD-NEXT: [[TMP10:%.*]] = extractelement <4 x float> [[BIN_RDX17]], i32 0		; THRESHOLD-NEXT: [[TMP10:%.*]] = extractelement <4 x float> [[BIN_RDX16]], i32 0
; THRESHOLD-NEXT: [[BIN_RDX18:%.*]] = fadd fast float [[BIN_RDX13]], [[TMP10]]		; THRESHOLD-NEXT: [[OP_RDX17:%.*]] = fadd fast float [[OP_RDX]], [[TMP10]]
; THRESHOLD-NEXT: [[TMP11:%.*]] = fadd fast float [[BIN_RDX18]], [[TMP1]]		; THRESHOLD-NEXT: [[TMP11:%.*]] = fadd fast float [[OP_RDX17]], [[TMP1]]
; THRESHOLD-NEXT: [[TMP12:%.*]] = fadd fast float [[TMP11]], [[TMP0]]		; THRESHOLD-NEXT: [[TMP12:%.*]] = fadd fast float [[TMP11]], [[TMP0]]
; THRESHOLD-NEXT: [[ADD_29:%.*]] = fadd fast float undef, [[ADD_28]]		; THRESHOLD-NEXT: [[ADD_29:%.*]] = fadd fast float undef, [[ADD_28]]
; THRESHOLD-NEXT: ret float [[TMP12]]		; THRESHOLD-NEXT: ret float [[TMP12]]
;		;
entry:		entry:
%arrayidx = getelementptr inbounds float, float* %x, i64 1		%arrayidx = getelementptr inbounds float, float* %x, i64 1
%0 = load float, float* %arrayidx, align 4		%0 = load float, float* %arrayidx, align 4
%arrayidx.1 = getelementptr inbounds float, float* %x, i64 2		%arrayidx.1 = getelementptr inbounds float, float* %x, i64 2
; CHECK-NEXT: [[ADD4_5:%.*]] = fadd fast float undef, [[ADD4_4]]		; CHECK-NEXT: [[ADD4_5:%.*]] = fadd fast float undef, [[ADD4_4]]
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]		; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]		; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]		; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0		; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
; CHECK-NEXT: [[BIN_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]		; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
; CHECK-NEXT: [[BIN_EXTRA5:%.*]] = fadd fast float [[BIN_EXTRA]], [[CONV]]		; CHECK-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]
; CHECK-NEXT: [[ADD4_6:%.*]] = fadd fast float undef, [[ADD4_5]]		; CHECK-NEXT: [[ADD4_6:%.*]] = fadd fast float undef, [[ADD4_5]]
; CHECK-NEXT: ret float [[BIN_EXTRA5]]		; CHECK-NEXT: ret float [[OP_EXTRA5]]
;		;
; THRESHOLD-LABEL: @extra_args(		; THRESHOLD-LABEL: @extra_args(
; THRESHOLD-NEXT: entry:		; THRESHOLD-NEXT: entry:
; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]		; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]
; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float		; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
; THRESHOLD-NEXT: [[ADD:%.*]] = fadd fast float [[CONV]], 3.000000e+00		; THRESHOLD-NEXT: [[ADD:%.*]] = fadd fast float [[CONV]], 3.000000e+00
; THRESHOLD-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1		; THRESHOLD-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
; THRESHOLD-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2		; THRESHOLD-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
; THRESHOLD-NEXT: [[ADD4_5:%.*]] = fadd fast float undef, [[ADD4_4]]		; THRESHOLD-NEXT: [[ADD4_5:%.*]] = fadd fast float undef, [[ADD4_4]]
; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]		; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]
; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]		; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]		; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; THRESHOLD-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0		; THRESHOLD-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
; THRESHOLD-NEXT: [[BIN_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]		; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
; THRESHOLD-NEXT: [[BIN_EXTRA5:%.*]] = fadd fast float [[BIN_EXTRA]], [[CONV]]		; THRESHOLD-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]
; THRESHOLD-NEXT: [[ADD4_6:%.*]] = fadd fast float undef, [[ADD4_5]]		; THRESHOLD-NEXT: [[ADD4_6:%.*]] = fadd fast float undef, [[ADD4_5]]
; THRESHOLD-NEXT: ret float [[BIN_EXTRA5]]		; THRESHOLD-NEXT: ret float [[OP_EXTRA5]]
;		;
entry:		entry:
%mul = mul nsw i32 %b, %a		%mul = mul nsw i32 %b, %a
%conv = sitofp i32 %mul to float		%conv = sitofp i32 %mul to float
%0 = load float, float* %x, align 4		%0 = load float, float* %x, align 4
%add = fadd fast float %conv, 3.000000e+00		%add = fadd fast float %conv, 3.000000e+00
%add1 = fadd fast float %0, %add		%add1 = fadd fast float %0, %add
%arrayidx3 = getelementptr inbounds float, float* %x, i64 1		%arrayidx3 = getelementptr inbounds float, float* %x, i64 1
; CHECK-NEXT: [[ADD4_5:%.*]] = fadd fast float undef, [[ADD4_4]]		; CHECK-NEXT: [[ADD4_5:%.*]] = fadd fast float undef, [[ADD4_4]]
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]		; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]		; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]		; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0		; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
; CHECK-NEXT: [[BIN_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]		; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
; CHECK-NEXT: [[BIN_EXTRA5:%.*]] = fadd fast float [[BIN_EXTRA]], 5.000000e+00		; CHECK-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], 5.000000e+00
; CHECK-NEXT: [[BIN_EXTRA6:%.*]] = fadd fast float [[BIN_EXTRA5]], 5.000000e+00		; CHECK-NEXT: [[OP_EXTRA6:%.*]] = fadd fast float [[OP_EXTRA5]], 5.000000e+00
; CHECK-NEXT: [[BIN_EXTRA7:%.*]] = fadd fast float [[BIN_EXTRA6]], [[CONV]]		; CHECK-NEXT: [[OP_EXTRA7:%.*]] = fadd fast float [[OP_EXTRA6]], [[CONV]]
; CHECK-NEXT: [[ADD4_6:%.*]] = fadd fast float undef, [[ADD4_5]]		; CHECK-NEXT: [[ADD4_6:%.*]] = fadd fast float undef, [[ADD4_5]]
; CHECK-NEXT: ret float [[BIN_EXTRA7]]		; CHECK-NEXT: ret float [[OP_EXTRA7]]
;		;
; THRESHOLD-LABEL: @extra_args_same_several_times(		; THRESHOLD-LABEL: @extra_args_same_several_times(
; THRESHOLD-NEXT: entry:		; THRESHOLD-NEXT: entry:
; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]		; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]
; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float		; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
; THRESHOLD-NEXT: [[ADD:%.*]] = fadd fast float [[CONV]], 3.000000e+00		; THRESHOLD-NEXT: [[ADD:%.*]] = fadd fast float [[CONV]], 3.000000e+00
; THRESHOLD-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1		; THRESHOLD-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
; THRESHOLD-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2		; THRESHOLD-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
; THRESHOLD-NEXT: [[ADD4_5:%.*]] = fadd fast float undef, [[ADD4_4]]		; THRESHOLD-NEXT: [[ADD4_5:%.*]] = fadd fast float undef, [[ADD4_4]]
; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]		; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]
; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]		; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]		; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; THRESHOLD-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0		; THRESHOLD-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
; THRESHOLD-NEXT: [[BIN_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]		; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
; THRESHOLD-NEXT: [[BIN_EXTRA5:%.*]] = fadd fast float [[BIN_EXTRA]], 5.000000e+00		; THRESHOLD-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], 5.000000e+00
; THRESHOLD-NEXT: [[BIN_EXTRA6:%.*]] = fadd fast float [[BIN_EXTRA5]], 5.000000e+00		; THRESHOLD-NEXT: [[OP_EXTRA6:%.*]] = fadd fast float [[OP_EXTRA5]], 5.000000e+00
; THRESHOLD-NEXT: [[BIN_EXTRA7:%.*]] = fadd fast float [[BIN_EXTRA6]], [[CONV]]		; THRESHOLD-NEXT: [[OP_EXTRA7:%.*]] = fadd fast float [[OP_EXTRA6]], [[CONV]]
; THRESHOLD-NEXT: [[ADD4_6:%.*]] = fadd fast float undef, [[ADD4_5]]		; THRESHOLD-NEXT: [[ADD4_6:%.*]] = fadd fast float undef, [[ADD4_5]]
; THRESHOLD-NEXT: ret float [[BIN_EXTRA7]]		; THRESHOLD-NEXT: ret float [[OP_EXTRA7]]
;		;
entry:		entry:
%mul = mul nsw i32 %b, %a		%mul = mul nsw i32 %b, %a
%conv = sitofp i32 %mul to float		%conv = sitofp i32 %mul to float
%0 = load float, float* %x, align 4		%0 = load float, float* %x, align 4
%add = fadd fast float %conv, 3.000000e+00		%add = fadd fast float %conv, 3.000000e+00
%add1 = fadd fast float %0, %add		%add1 = fadd fast float %0, %add
%arrayidx3 = getelementptr inbounds float, float* %x, i64 1		%arrayidx3 = getelementptr inbounds float, float* %x, i64 1
; CHECK-NEXT: [[ADD4_5:%.*]] = fadd fast float undef, [[ADD4_4]]		; CHECK-NEXT: [[ADD4_5:%.*]] = fadd fast float undef, [[ADD4_4]]
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]		; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]		; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]		; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0		; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
; CHECK-NEXT: [[BIN_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]		; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
; CHECK-NEXT: [[BIN_EXTRA5:%.*]] = fadd fast float [[BIN_EXTRA]], [[CONV]]		; CHECK-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]
; CHECK-NEXT: [[ADD4_6:%.*]] = fadd fast float undef, [[ADD4_5]]		; CHECK-NEXT: [[ADD4_6:%.*]] = fadd fast float undef, [[ADD4_5]]
; CHECK-NEXT: ret float [[BIN_EXTRA5]]		; CHECK-NEXT: ret float [[OP_EXTRA5]]
;		;
; THRESHOLD-LABEL: @extra_args_no_replace(		; THRESHOLD-LABEL: @extra_args_no_replace(
; THRESHOLD-NEXT: entry:		; THRESHOLD-NEXT: entry:
; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]		; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]
; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float		; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
; THRESHOLD-NEXT: [[CONVC:%.*]] = sitofp i32 [[C:%.*]] to float		; THRESHOLD-NEXT: [[CONVC:%.*]] = sitofp i32 [[C:%.*]] to float
; THRESHOLD-NEXT: [[ADDC:%.*]] = fadd fast float [[CONVC]], 3.000000e+00		; THRESHOLD-NEXT: [[ADDC:%.*]] = fadd fast float [[CONVC]], 3.000000e+00
; THRESHOLD-NEXT: [[ADD:%.*]] = fadd fast float [[CONV]], [[ADDC]]		; THRESHOLD-NEXT: [[ADD:%.*]] = fadd fast float [[CONV]], [[ADDC]]
; THRESHOLD-NEXT: [[ADD4_5:%.*]] = fadd fast float undef, [[ADD4_4]]		; THRESHOLD-NEXT: [[ADD4_5:%.*]] = fadd fast float undef, [[ADD4_4]]
; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]		; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]
; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]		; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]		; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; THRESHOLD-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0		; THRESHOLD-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
; THRESHOLD-NEXT: [[BIN_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]		; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
; THRESHOLD-NEXT: [[BIN_EXTRA5:%.*]] = fadd fast float [[BIN_EXTRA]], [[CONV]]		; THRESHOLD-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]
; THRESHOLD-NEXT: [[ADD4_6:%.*]] = fadd fast float undef, [[ADD4_5]]		; THRESHOLD-NEXT: [[ADD4_6:%.*]] = fadd fast float undef, [[ADD4_5]]
; THRESHOLD-NEXT: ret float [[BIN_EXTRA5]]		; THRESHOLD-NEXT: ret float [[OP_EXTRA5]]
;		;
entry:		entry:
%mul = mul nsw i32 %b, %a		%mul = mul nsw i32 %b, %a
%conv = sitofp i32 %mul to float		%conv = sitofp i32 %mul to float
%0 = load float, float* %x, align 4		%0 = load float, float* %x, align 4
%convc = sitofp i32 %c to float		%convc = sitofp i32 %c to float
%addc = fadd fast float %convc, 3.000000e+00		%addc = fadd fast float %convc, 3.000000e+00
%add = fadd fast float %conv, %addc		%add = fadd fast float %conv, %addc
; CHECK-NEXT: [[R2:%.*]] = add nsw i32 [[R1]], undef		; CHECK-NEXT: [[R2:%.*]] = add nsw i32 [[R1]], undef
; CHECK-NEXT: [[R3:%.*]] = add nsw i32 [[R2]], undef		; CHECK-NEXT: [[R3:%.*]] = add nsw i32 [[R2]], undef
; CHECK-NEXT: [[R4:%.*]] = add nsw i32 [[R3]], undef		; CHECK-NEXT: [[R4:%.*]] = add nsw i32 [[R3]], undef
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP11]], [[RDX_SHUF]]		; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP11]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]		; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0		; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0
; CHECK-NEXT: [[BIN_EXTRA:%.*]] = add nuw i32 [[TMP12]], [[ARG]]		; CHECK-NEXT: [[OP_EXTRA:%.*]] = add nuw i32 [[TMP12]], [[ARG]]
; CHECK-NEXT: [[BIN_EXTRA3:%.*]] = add nsw i32 [[BIN_EXTRA]], [[TMP9]]		; CHECK-NEXT: [[OP_EXTRA3:%.*]] = add nsw i32 [[OP_EXTRA]], [[TMP9]]
; CHECK-NEXT: [[R5:%.*]] = add nsw i32 [[R4]], undef		; CHECK-NEXT: [[R5:%.*]] = add nsw i32 [[R4]], undef
; CHECK-NEXT: ret i32 [[BIN_EXTRA3]]		; CHECK-NEXT: ret i32 [[OP_EXTRA3]]
;		;
; THRESHOLD-LABEL: @wobble(		; THRESHOLD-LABEL: @wobble(
; THRESHOLD-NEXT: bb:		; THRESHOLD-NEXT: bb:
; THRESHOLD-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> undef, i32 [[ARG:%.*]], i32 0		; THRESHOLD-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> undef, i32 [[ARG:%.*]], i32 0
; THRESHOLD-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> [[TMP0]], i32 [[ARG]], i32 1		; THRESHOLD-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> [[TMP0]], i32 [[ARG]], i32 1
; THRESHOLD-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[ARG]], i32 2		; THRESHOLD-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[ARG]], i32 2
; THRESHOLD-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[ARG]], i32 3		; THRESHOLD-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[ARG]], i32 3
; THRESHOLD-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> undef, i32 [[BAR:%.*]], i32 0		; THRESHOLD-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> undef, i32 [[BAR:%.*]], i32 0
; THRESHOLD-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[BAR]], i32 1		; THRESHOLD-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[BAR]], i32 1
; THRESHOLD-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[BAR]], i32 2		; THRESHOLD-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[BAR]], i32 2
; THRESHOLD-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[BAR]], i32 3		; THRESHOLD-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[BAR]], i32 3
; THRESHOLD-NEXT: [[TMP8:%.*]] = xor <4 x i32> [[TMP3]], [[TMP7]]		; THRESHOLD-NEXT: [[TMP8:%.*]] = xor <4 x i32> [[TMP3]], [[TMP7]]
; THRESHOLD-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[TMP8]], i32 3		; THRESHOLD-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[TMP8]], i32 3
; THRESHOLD-NEXT: [[TMP10:%.*]] = icmp eq <4 x i32> [[TMP8]], zeroinitializer		; THRESHOLD-NEXT: [[TMP10:%.*]] = icmp eq <4 x i32> [[TMP8]], zeroinitializer
; THRESHOLD-NEXT: [[TMP11:%.*]] = sext <4 x i1> [[TMP10]] to <4 x i32>		; THRESHOLD-NEXT: [[TMP11:%.*]] = sext <4 x i1> [[TMP10]] to <4 x i32>
; THRESHOLD-NEXT: [[R1:%.*]] = add nuw i32 [[ARG]], undef		; THRESHOLD-NEXT: [[R1:%.*]] = add nuw i32 [[ARG]], undef
; THRESHOLD-NEXT: [[R2:%.*]] = add nsw i32 [[R1]], undef		; THRESHOLD-NEXT: [[R2:%.*]] = add nsw i32 [[R1]], undef
; THRESHOLD-NEXT: [[R3:%.*]] = add nsw i32 [[R2]], undef		; THRESHOLD-NEXT: [[R3:%.*]] = add nsw i32 [[R2]], undef
; THRESHOLD-NEXT: [[R4:%.*]] = add nsw i32 [[R3]], undef		; THRESHOLD-NEXT: [[R4:%.*]] = add nsw i32 [[R3]], undef
; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP11]], [[RDX_SHUF]]		; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP11]], [[RDX_SHUF]]
; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = add <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]		; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = add <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; THRESHOLD-NEXT: [[TMP12:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0		; THRESHOLD-NEXT: [[TMP12:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0
; THRESHOLD-NEXT: [[BIN_EXTRA:%.*]] = add nuw i32 [[TMP12]], [[ARG]]		; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = add nuw i32 [[TMP12]], [[ARG]]
; THRESHOLD-NEXT: [[BIN_EXTRA3:%.*]] = add nsw i32 [[BIN_EXTRA]], [[TMP9]]		; THRESHOLD-NEXT: [[OP_EXTRA3:%.*]] = add nsw i32 [[OP_EXTRA]], [[TMP9]]
; THRESHOLD-NEXT: [[R5:%.*]] = add nsw i32 [[R4]], undef		; THRESHOLD-NEXT: [[R5:%.*]] = add nsw i32 [[R4]], undef
; THRESHOLD-NEXT: ret i32 [[BIN_EXTRA3]]		; THRESHOLD-NEXT: ret i32 [[OP_EXTRA3]]
;		;
bb:		bb:
%x1 = xor i32 %arg, %bar		%x1 = xor i32 %arg, %bar
%i1 = icmp eq i32 %x1, 0		%i1 = icmp eq i32 %x1, 0
%s1 = sext i1 %i1 to i32		%s1 = sext i1 %i1 to i32
%x2 = xor i32 %arg, %bar		%x2 = xor i32 %arg, %bar
%i2 = icmp eq i32 %x2, 0		%i2 = icmp eq i32 %x2, 0
%s2 = sext i1 %i2 to i32		%s2 = sext i1 %i2 to i32

test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll

; CHECK-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]		; CHECK-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]
; CHECK-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32 [[TMP18]]		; CHECK-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32 [[TMP18]]
; CHECK-NEXT: [[TMP21:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4		; CHECK-NEXT: [[TMP21:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
; CHECK-NEXT: [[TMP22:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]		; CHECK-NEXT: [[TMP22:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]
; CHECK-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], i32 [[TMP20]], i32 [[TMP21]]		; CHECK-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], i32 [[TMP20]], i32 [[TMP21]]
; CHECK-NEXT: ret i32 [[TMP23]]		; CHECK-NEXT: ret i32 [[TMP23]]
;		;
; AVX-LABEL: @maxi8(		; AVX-LABEL: @maxi8(
; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		; AVX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr to <8 x i32>*), align 16
; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		; AVX: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]		; AVX-NEXT: [[TMP24:%.*]] = icmp sgt <8 x i32> [[TMP2]], [[RDX_SHUF]]
; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]		; AVX-NEXT: [[BIN_RDX:%.*]] = select <8 x i1> [[TMP24]], <8 x i32> [[TMP2]], <8 x i32> [[RDX_SHUF]]
; AVX-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8		; AVX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP5]], [[TMP6]]		; AVX-NEXT: [[TMP25:%.*]] = icmp sgt <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; AVX-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP5]], i32 [[TMP6]]		; AVX-NEXT: [[BIN_RDX2:%.*]] = select <8 x i1> [[TMP25]], <8 x i32> [[BIN_RDX]], <8 x i32> [[RDX_SHUF1]]
; AVX-NEXT: [[TMP9:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 3), align 4		; AVX-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP8]], [[TMP9]]		; AVX-NEXT: [[TMP26:%.*]] = icmp sgt <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
; AVX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP8]], i32 [[TMP9]]		; AVX-NEXT: [[BIN_RDX4:%.*]] = select <8 x i1> [[TMP26]], <8 x i32> [[BIN_RDX2]], <8 x i32> [[RDX_SHUF3]]
; AVX-NEXT: [[TMP12:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 4), align 16		; AVX-NEXT: [[TMP27:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
; AVX-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]		; AVX: ret i32 [[TMP27]]
; AVX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]
; AVX-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 5), align 4
; AVX-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]
; AVX-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32 [[TMP15]]
; AVX-NEXT: [[TMP18:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
; AVX-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]
; AVX-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32 [[TMP18]]
; AVX-NEXT: [[TMP21:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
; AVX-NEXT: [[TMP22:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]
; AVX-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], i32 [[TMP20]], i32 [[TMP21]]
; AVX-NEXT: ret i32 [[TMP23]]
;		;
; AVX2-LABEL: @maxi8(		; AVX2-LABEL: @maxi8(
; AVX2-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		; AVX2-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr to <8 x i32>*), align 16
; AVX2-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		; AVX2: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]		; AVX2-NEXT: [[TMP24:%.*]] = icmp sgt <8 x i32> [[TMP2]], [[RDX_SHUF]]
; AVX2-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]		; AVX2-NEXT: [[BIN_RDX:%.*]] = select <8 x i1> [[TMP24]], <8 x i32> [[TMP2]], <8 x i32> [[RDX_SHUF]]
; AVX2-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8		; AVX2-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP5]], [[TMP6]]		; AVX2-NEXT: [[TMP25:%.*]] = icmp sgt <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; AVX2-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP5]], i32 [[TMP6]]		; AVX2-NEXT: [[BIN_RDX2:%.*]] = select <8 x i1> [[TMP25]], <8 x i32> [[BIN_RDX]], <8 x i32> [[RDX_SHUF1]]
; AVX2-NEXT: [[TMP9:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 3), align 4		; AVX2-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP8]], [[TMP9]]		; AVX2-NEXT: [[TMP26:%.*]] = icmp sgt <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
; AVX2-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP8]], i32 [[TMP9]]		; AVX2-NEXT: [[BIN_RDX4:%.*]] = select <8 x i1> [[TMP26]], <8 x i32> [[BIN_RDX2]], <8 x i32> [[RDX_SHUF3]]
; AVX2-NEXT: [[TMP12:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 4), align 16		; AVX2-NEXT: [[TMP27:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
; AVX2-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]		; AVX2: ret i32 [[TMP27]]
; AVX2-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]
; AVX2-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 5), align 4
; AVX2-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]
; AVX2-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32 [[TMP15]]
; AVX2-NEXT: [[TMP18:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
; AVX2-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]
; AVX2-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32 [[TMP18]]
; AVX2-NEXT: [[TMP21:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
; AVX2-NEXT: [[TMP22:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]
; AVX2-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], i32 [[TMP20]], i32 [[TMP21]]
; AVX2-NEXT: ret i32 [[TMP23]]
;		;
; SKX-LABEL: @maxi8(		; SKX-LABEL: @maxi8(
; SKX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		; SKX-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr to <8 x i32>*), align 16
; SKX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		; SKX: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; SKX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]		; SKX-NEXT: [[TMP24:%.*]] = icmp sgt <8 x i32> [[TMP2]], [[RDX_SHUF]]
; SKX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]		; SKX-NEXT: [[BIN_RDX:%.*]] = select <8 x i1> [[TMP24]], <8 x i32> [[TMP2]], <8 x i32> [[RDX_SHUF]]
; SKX-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8		; SKX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SKX-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP5]], [[TMP6]]		; SKX-NEXT: [[TMP25:%.*]] = icmp sgt <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; SKX-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP5]], i32 [[TMP6]]		; SKX-NEXT: [[BIN_RDX2:%.*]] = select <8 x i1> [[TMP25]], <8 x i32> [[BIN_RDX]], <8 x i32> [[RDX_SHUF1]]
; SKX-NEXT: [[TMP9:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 3), align 4		; SKX-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SKX-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP8]], [[TMP9]]		; SKX-NEXT: [[TMP26:%.*]] = icmp sgt <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
; SKX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP8]], i32 [[TMP9]]		; SKX-NEXT: [[BIN_RDX4:%.*]] = select <8 x i1> [[TMP26]], <8 x i32> [[BIN_RDX2]], <8 x i32> [[RDX_SHUF3]]
; SKX-NEXT: [[TMP12:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 4), align 16		; SKX-NEXT: [[TMP27:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
; SKX-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]		; SKX: ret i32 [[TMP27]]
; SKX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]
; SKX-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 5), align 4
; SKX-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]
; SKX-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32 [[TMP15]]
; SKX-NEXT: [[TMP18:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
; SKX-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]
; SKX-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32 [[TMP18]]
; SKX-NEXT: [[TMP21:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
; SKX-NEXT: [[TMP22:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]
; SKX-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], i32 [[TMP20]], i32 [[TMP21]]
; SKX-NEXT: ret i32 [[TMP23]]
;		;
%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
%4 = icmp sgt i32 %2, %3		%4 = icmp sgt i32 %2, %3
%5 = select i1 %4, i32 %2, i32 %3		%5 = select i1 %4, i32 %2, i32 %3
%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8		%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
%7 = icmp sgt i32 %5, %6		%7 = icmp sgt i32 %5, %6
%8 = select i1 %7, i32 %5, i32 %6		%8 = select i1 %7, i32 %5, i32 %6
; CHECK-NEXT: [[TMP43:%.*]] = icmp sgt i32 [[TMP41]], [[TMP42]]		; CHECK-NEXT: [[TMP43:%.*]] = icmp sgt i32 [[TMP41]], [[TMP42]]
; CHECK-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], i32 [[TMP41]], i32 [[TMP42]]		; CHECK-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], i32 [[TMP41]], i32 [[TMP42]]
; CHECK-NEXT: [[TMP45:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 15), align 4		; CHECK-NEXT: [[TMP45:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 15), align 4
; CHECK-NEXT: [[TMP46:%.*]] = icmp sgt i32 [[TMP44]], [[TMP45]]		; CHECK-NEXT: [[TMP46:%.*]] = icmp sgt i32 [[TMP44]], [[TMP45]]
; CHECK-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], i32 [[TMP44]], i32 [[TMP45]]		; CHECK-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], i32 [[TMP44]], i32 [[TMP45]]
; CHECK-NEXT: ret i32 [[TMP47]]		; CHECK-NEXT: ret i32 [[TMP47]]
;		;
; AVX-LABEL: @maxi16(		; AVX-LABEL: @maxi16(
; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		; AVX-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([32 x i32]* @arr to <16 x i32>*), align 16
; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		; AVX: [[RDX_SHUF:%.*]] = shufflevector <16 x i32> [[TMP2]], <16 x i32> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]		; AVX-NEXT: [[TMP48:%.*]] = icmp sgt <16 x i32> [[TMP2]], [[RDX_SHUF]]
; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]		; AVX-NEXT: [[BIN_RDX:%.*]] = select <16 x i1> [[TMP48]], <16 x i32> [[TMP2]], <16 x i32> [[RDX_SHUF]]
; AVX-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8		; AVX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x i32> [[BIN_RDX]], <16 x i32> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP5]], [[TMP6]]		; AVX-NEXT: [[TMP49:%.*]] = icmp sgt <16 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; AVX-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP5]], i32 [[TMP6]]		; AVX-NEXT: [[BIN_RDX2:%.*]] = select <16 x i1> [[TMP49]], <16 x i32> [[BIN_RDX]], <16 x i32> [[RDX_SHUF1]]
; AVX-NEXT: [[TMP9:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 3), align 4		; AVX-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x i32> [[BIN_RDX2]], <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP8]], [[TMP9]]		; AVX-NEXT: [[TMP50:%.*]] = icmp sgt <16 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
; AVX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP8]], i32 [[TMP9]]		; AVX-NEXT: [[BIN_RDX4:%.*]] = select <16 x i1> [[TMP50]], <16 x i32> [[BIN_RDX2]], <16 x i32> [[RDX_SHUF3]]
; AVX-NEXT: [[TMP12:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 4), align 16		; AVX-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x i32> [[BIN_RDX4]], <16 x i32> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]		; AVX-NEXT: [[TMP51:%.*]] = icmp sgt <16 x i32> [[BIN_RDX4]], [[RDX_SHUF5]]
; AVX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]		; AVX-NEXT: [[BIN_RDX6:%.*]] = select <16 x i1> [[TMP51]], <16 x i32> [[BIN_RDX4]], <16 x i32> [[RDX_SHUF5]]
; AVX-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 5), align 4		; AVX-NEXT: [[TMP52:%.*]] = extractelement <16 x i32> [[BIN_RDX6]], i32 0
; AVX-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]		; AVX: ret i32 [[TMP52]]
; AVX-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32 [[TMP15]]
; AVX-NEXT: [[TMP18:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
; AVX-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]
; AVX-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32 [[TMP18]]
; AVX-NEXT: [[TMP21:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
; AVX-NEXT: [[TMP22:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]
; AVX-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], i32 [[TMP20]], i32 [[TMP21]]
; AVX-NEXT: [[TMP24:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 8), align 16
; AVX-NEXT: [[TMP25:%.*]] = icmp sgt i32 [[TMP23]], [[TMP24]]
; AVX-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], i32 [[TMP23]], i32 [[TMP24]]
; AVX-NEXT: [[TMP27:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 9), align 4
; AVX-NEXT: [[TMP28:%.*]] = icmp sgt i32 [[TMP26]], [[TMP27]]
; AVX-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], i32 [[TMP26]], i32 [[TMP27]]
; AVX-NEXT: [[TMP30:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 10), align 8
; AVX-NEXT: [[TMP31:%.*]] = icmp sgt i32 [[TMP29]], [[TMP30]]
; AVX-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], i32 [[TMP29]], i32 [[TMP30]]
; AVX-NEXT: [[TMP33:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 11), align 4
; AVX-NEXT: [[TMP34:%.*]] = icmp sgt i32 [[TMP32]], [[TMP33]]
; AVX-NEXT: [[TMP35:%.*]] = select i1 [[TMP34]], i32 [[TMP32]], i32 [[TMP33]]
; AVX-NEXT: [[TMP36:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 12), align 16
; AVX-NEXT: [[TMP37:%.*]] = icmp sgt i32 [[TMP35]], [[TMP36]]
; AVX-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], i32 [[TMP35]], i32 [[TMP36]]
; AVX-NEXT: [[TMP39:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 13), align 4
; AVX-NEXT: [[TMP40:%.*]] = icmp sgt i32 [[TMP38]], [[TMP39]]
; AVX-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], i32 [[TMP38]], i32 [[TMP39]]
; AVX-NEXT: [[TMP42:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 14), align 8
; AVX-NEXT: [[TMP43:%.*]] = icmp sgt i32 [[TMP41]], [[TMP42]]
; AVX-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], i32 [[TMP41]], i32 [[TMP42]]
; AVX-NEXT: [[TMP45:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 15), align 4
; AVX-NEXT: [[TMP46:%.*]] = icmp sgt i32 [[TMP44]], [[TMP45]]
; AVX-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], i32 [[TMP44]], i32 [[TMP45]]
; AVX-NEXT: ret i32 [[TMP47]]
;		;
; AVX2-LABEL: @maxi16(		; AVX2-LABEL: @maxi16(
; AVX2-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		; AVX2-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([32 x i32]* @arr to <16 x i32>*), align 16
; AVX2-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		; AVX2: [[RDX_SHUF:%.*]] = shufflevector <16 x i32> [[TMP2]], <16 x i32> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]		; AVX2-NEXT: [[TMP48:%.*]] = icmp sgt <16 x i32> [[TMP2]], [[RDX_SHUF]]
; AVX2-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]		; AVX2-NEXT: [[BIN_RDX:%.*]] = select <16 x i1> [[TMP48]], <16 x i32> [[TMP2]], <16 x i32> [[RDX_SHUF]]
; AVX2-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8		; AVX2-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x i32> [[BIN_RDX]], <16 x i32> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP5]], [[TMP6]]		; AVX2-NEXT: [[TMP49:%.*]] = icmp sgt <16 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; AVX2-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP5]], i32 [[TMP6]]		; AVX2-NEXT: [[BIN_RDX2:%.*]] = select <16 x i1> [[TMP49]], <16 x i32> [[BIN_RDX]], <16 x i32> [[RDX_SHUF1]]
; AVX2-NEXT: [[TMP9:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 3), align 4		; AVX2-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x i32> [[BIN_RDX2]], <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP8]], [[TMP9]]		; AVX2-NEXT: [[TMP50:%.*]] = icmp sgt <16 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
; AVX2-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP8]], i32 [[TMP9]]		; AVX2-NEXT: [[BIN_RDX4:%.*]] = select <16 x i1> [[TMP50]], <16 x i32> [[BIN_RDX2]], <16 x i32> [[RDX_SHUF3]]
; AVX2-NEXT: [[TMP12:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 4), align 16		; AVX2-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x i32> [[BIN_RDX4]], <16 x i32> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]		; AVX2-NEXT: [[TMP51:%.*]] = icmp sgt <16 x i32> [[BIN_RDX4]], [[RDX_SHUF5]]
; AVX2-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]		; AVX2-NEXT: [[BIN_RDX6:%.*]] = select <16 x i1> [[TMP51]], <16 x i32> [[BIN_RDX4]], <16 x i32> [[RDX_SHUF5]]
; AVX2-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 5), align 4		; AVX2-NEXT: [[TMP52:%.*]] = extractelement <16 x i32> [[BIN_RDX6]], i32 0
; AVX2-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]		; AVX2: ret i32 [[TMP52]]
; AVX2-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32 [[TMP15]]
; AVX2-NEXT: [[TMP18:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
; AVX2-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]
; AVX2-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32 [[TMP18]]
; AVX2-NEXT: [[TMP21:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
; AVX2-NEXT: [[TMP22:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]
; AVX2-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], i32 [[TMP20]], i32 [[TMP21]]
; AVX2-NEXT: [[TMP24:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 8), align 16
; AVX2-NEXT: [[TMP25:%.*]] = icmp sgt i32 [[TMP23]], [[TMP24]]
; AVX2-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], i32 [[TMP23]], i32 [[TMP24]]
; AVX2-NEXT: [[TMP27:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 9), align 4
; AVX2-NEXT: [[TMP28:%.*]] = icmp sgt i32 [[TMP26]], [[TMP27]]
; AVX2-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], i32 [[TMP26]], i32 [[TMP27]]
; AVX2-NEXT: [[TMP30:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 10), align 8
; AVX2-NEXT: [[TMP31:%.*]] = icmp sgt i32 [[TMP29]], [[TMP30]]
; AVX2-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], i32 [[TMP29]], i32 [[TMP30]]
; AVX2-NEXT: [[TMP33:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 11), align 4
; AVX2-NEXT: [[TMP34:%.*]] = icmp sgt i32 [[TMP32]], [[TMP33]]
; AVX2-NEXT: [[TMP35:%.*]] = select i1 [[TMP34]], i32 [[TMP32]], i32 [[TMP33]]
; AVX2-NEXT: [[TMP36:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 12), align 16
; AVX2-NEXT: [[TMP37:%.*]] = icmp sgt i32 [[TMP35]], [[TMP36]]
; AVX2-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], i32 [[TMP35]], i32 [[TMP36]]
; AVX2-NEXT: [[TMP39:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 13), align 4
; AVX2-NEXT: [[TMP40:%.*]] = icmp sgt i32 [[TMP38]], [[TMP39]]
; AVX2-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], i32 [[TMP38]], i32 [[TMP39]]
; AVX2-NEXT: [[TMP42:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 14), align 8
; AVX2-NEXT: [[TMP43:%.*]] = icmp sgt i32 [[TMP41]], [[TMP42]]
; AVX2-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], i32 [[TMP41]], i32 [[TMP42]]
; AVX2-NEXT: [[TMP45:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 15), align 4
; AVX2-NEXT: [[TMP46:%.*]] = icmp sgt i32 [[TMP44]], [[TMP45]]
; AVX2-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], i32 [[TMP44]], i32 [[TMP45]]
; AVX2-NEXT: ret i32 [[TMP47]]
;		;
; SKX-LABEL: @maxi16(		; SKX-LABEL: @maxi16(
; SKX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		; SKX-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([32 x i32]* @arr to <16 x i32>*), align 16
; SKX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		; SKX: [[RDX_SHUF:%.*]] = shufflevector <16 x i32> [[TMP2]], <16 x i32> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SKX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]		; SKX-NEXT: [[TMP48:%.*]] = icmp sgt <16 x i32> [[TMP2]], [[RDX_SHUF]]
; SKX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]		; SKX-NEXT: [[BIN_RDX:%.*]] = select <16 x i1> [[TMP48]], <16 x i32> [[TMP2]], <16 x i32> [[RDX_SHUF]]
; SKX-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8		; SKX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x i32> [[BIN_RDX]], <16 x i32> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SKX-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP5]], [[TMP6]]		; SKX-NEXT: [[TMP49:%.*]] = icmp sgt <16 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; SKX-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP5]], i32 [[TMP6]]		; SKX-NEXT: [[BIN_RDX2:%.*]] = select <16 x i1> [[TMP49]], <16 x i32> [[BIN_RDX]], <16 x i32> [[RDX_SHUF1]]
; SKX-NEXT: [[TMP9:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 3), align 4		; SKX-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x i32> [[BIN_RDX2]], <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SKX-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP8]], [[TMP9]]		; SKX-NEXT: [[TMP50:%.*]] = icmp sgt <16 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
; SKX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP8]], i32 [[TMP9]]		; SKX-NEXT: [[BIN_RDX4:%.*]] = select <16 x i1> [[TMP50]], <16 x i32> [[BIN_RDX2]], <16 x i32> [[RDX_SHUF3]]
; SKX-NEXT: [[TMP12:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 4), align 16		; SKX-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x i32> [[BIN_RDX4]], <16 x i32> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SKX-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]		; SKX-NEXT: [[TMP51:%.*]] = icmp sgt <16 x i32> [[BIN_RDX4]], [[RDX_SHUF5]]
; SKX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]		; SKX-NEXT: [[BIN_RDX6:%.*]] = select <16 x i1> [[TMP51]], <16 x i32> [[BIN_RDX4]], <16 x i32> [[RDX_SHUF5]]
; SKX-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 5), align 4		; SKX-NEXT: [[TMP52:%.*]] = extractelement <16 x i32> [[BIN_RDX6]], i32 0
; SKX-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]		; SKX: ret i32 [[TMP52]]
; SKX-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32 [[TMP15]]
; SKX-NEXT: [[TMP18:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
; SKX-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]
; SKX-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32 [[TMP18]]
; SKX-NEXT: [[TMP21:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
; SKX-NEXT: [[TMP22:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]
; SKX-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], i32 [[TMP20]], i32 [[TMP21]]
; SKX-NEXT: [[TMP24:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 8), align 16
; SKX-NEXT: [[TMP25:%.*]] = icmp sgt i32 [[TMP23]], [[TMP24]]
; SKX-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], i32 [[TMP23]], i32 [[TMP24]]
; SKX-NEXT: [[TMP27:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 9), align 4
; SKX-NEXT: [[TMP28:%.*]] = icmp sgt i32 [[TMP26]], [[TMP27]]
; SKX-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], i32 [[TMP26]], i32 [[TMP27]]
; SKX-NEXT: [[TMP30:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 10), align 8
; SKX-NEXT: [[TMP31:%.*]] = icmp sgt i32 [[TMP29]], [[TMP30]]
; SKX-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], i32 [[TMP29]], i32 [[TMP30]]
; SKX-NEXT: [[TMP33:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 11), align 4
; SKX-NEXT: [[TMP34:%.*]] = icmp sgt i32 [[TMP32]], [[TMP33]]
; SKX-NEXT: [[TMP35:%.*]] = select i1 [[TMP34]], i32 [[TMP32]], i32 [[TMP33]]
; SKX-NEXT: [[TMP36:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 12), align 16
; SKX-NEXT: [[TMP37:%.*]] = icmp sgt i32 [[TMP35]], [[TMP36]]
; SKX-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], i32 [[TMP35]], i32 [[TMP36]]
; SKX-NEXT: [[TMP39:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 13), align 4
; SKX-NEXT: [[TMP40:%.*]] = icmp sgt i32 [[TMP38]], [[TMP39]]
; SKX-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], i32 [[TMP38]], i32 [[TMP39]]
; SKX-NEXT: [[TMP42:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 14), align 8
; SKX-NEXT: [[TMP43:%.*]] = icmp sgt i32 [[TMP41]], [[TMP42]]
; SKX-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], i32 [[TMP41]], i32 [[TMP42]]
; SKX-NEXT: [[TMP45:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 15), align 4
; SKX-NEXT: [[TMP46:%.*]] = icmp sgt i32 [[TMP44]], [[TMP45]]
; SKX-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], i32 [[TMP44]], i32 [[TMP45]]
; SKX-NEXT: ret i32 [[TMP47]]
;		;
%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
%4 = icmp sgt i32 %2, %3		%4 = icmp sgt i32 %2, %3
%5 = select i1 %4, i32 %2, i32 %3		%5 = select i1 %4, i32 %2, i32 %3
%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8		%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
%7 = icmp sgt i32 %5, %6		%7 = icmp sgt i32 %5, %6
%8 = select i1 %7, i32 %5, i32 %6		%8 = select i1 %7, i32 %5, i32 %6
%45 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 15), align 4		%45 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 15), align 4
%46 = icmp sgt i32 %44, %45		%46 = icmp sgt i32 %44, %45
%47 = select i1 %46, i32 %44, i32 %45		%47 = select i1 %46, i32 %44, i32 %45
ret i32 %47		ret i32 %47
}		}

define i32 @maxi32(i32) {		define i32 @maxi32(i32) {
; CHECK-LABEL: @maxi32(		; CHECK-LABEL: @maxi32(
; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		; CHECK-NEXT: [[TMP2:%.*]] = load <32 x i32>, <32 x i32>* bitcast ([32 x i32]* @arr to <32 x i32>*), align 16
; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		; CHECK: [[RDX_SHUF:%.*]] = shufflevector <32 x i32> [[TMP2]], <32 x i32> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]		; CHECK-NEXT: [[TMP96:%.*]] = icmp sgt <32 x i32> [[TMP2]], [[RDX_SHUF]]
; CHECK-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]		; CHECK-NEXT: [[BIN_RDX:%.*]] = select <32 x i1> [[TMP96]], <32 x i32> [[TMP2]], <32 x i32> [[RDX_SHUF]]
; CHECK-NEXT: [[TMP6:%.]] = load i32, i32 getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8		; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x i32> [[BIN_RDX]], <32 x i32> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP5]], [[TMP6]]		; CHECK-NEXT: [[TMP97:%.*]] = icmp sgt <32 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; CHECK-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP5]], i32 [[TMP6]]		; CHECK-NEXT: [[BIN_RDX2:%.*]] = select <32 x i1> [[TMP97]], <32 x i32> [[BIN_RDX]], <32 x i32> [[RDX_SHUF1]]
; CHECK-NEXT: [[TMP9:%.]] = load i32, i32 getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 3), align 4		; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x i32> [[BIN_RDX2]], <32 x i32> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP8]], [[TMP9]]		; CHECK-NEXT: [[TMP98:%.*]] = icmp sgt <32 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
; CHECK-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP8]], i32 [[TMP9]]		; CHECK-NEXT: [[BIN_RDX4:%.*]] = select <32 x i1> [[TMP98]], <32 x i32> [[BIN_RDX2]], <32 x i32> [[RDX_SHUF3]]
; CHECK-NEXT: [[TMP12:%.]] = load i32, i32 getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 4), align 16		; CHECK-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x i32> [[BIN_RDX4]], <32 x i32> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]		; CHECK-NEXT: [[TMP99:%.*]] = icmp sgt <32 x i32> [[BIN_RDX4]], [[RDX_SHUF5]]
; CHECK-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]		; CHECK-NEXT: [[BIN_RDX6:%.*]] = select <32 x i1> [[TMP99]], <32 x i32> [[BIN_RDX4]], <32 x i32> [[RDX_SHUF5]]
; CHECK-NEXT: [[TMP15:%.]] = load i32, i32 getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 5), align 4		; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x i32> [[BIN_RDX6]], <32 x i32> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]		; CHECK-NEXT: [[TMP100:%.*]] = icmp sgt <32 x i32> [[BIN_RDX6]], [[RDX_SHUF7]]
; CHECK-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32 [[TMP15]]		; CHECK-NEXT: [[BIN_RDX8:%.*]] = select <32 x i1> [[TMP100]], <32 x i32> [[BIN_RDX6]], <32 x i32> [[RDX_SHUF7]]
; CHECK-NEXT: [[TMP18:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8		; CHECK-NEXT: [[TMP101:%.*]] = extractelement <32 x i32> [[BIN_RDX8]], i32 0
; CHECK-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]		; CHECK: ret i32 [[TMP101]]
; CHECK-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32 [[TMP18]]
; CHECK-NEXT: [[TMP21:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
; CHECK-NEXT: [[TMP22:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]
; CHECK-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], i32 [[TMP20]], i32 [[TMP21]]
; CHECK-NEXT: [[TMP24:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 8), align 16
; CHECK-NEXT: [[TMP25:%.*]] = icmp sgt i32 [[TMP23]], [[TMP24]]
; CHECK-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], i32 [[TMP23]], i32 [[TMP24]]
; CHECK-NEXT: [[TMP27:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 9), align 4
; CHECK-NEXT: [[TMP28:%.*]] = icmp sgt i32 [[TMP26]], [[TMP27]]
; CHECK-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], i32 [[TMP26]], i32 [[TMP27]]
; CHECK-NEXT: [[TMP30:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 10), align 8
; CHECK-NEXT: [[TMP31:%.*]] = icmp sgt i32 [[TMP29]], [[TMP30]]
; CHECK-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], i32 [[TMP29]], i32 [[TMP30]]
; CHECK-NEXT: [[TMP33:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 11), align 4
; CHECK-NEXT: [[TMP34:%.*]] = icmp sgt i32 [[TMP32]], [[TMP33]]
; CHECK-NEXT: [[TMP35:%.*]] = select i1 [[TMP34]], i32 [[TMP32]], i32 [[TMP33]]
; CHECK-NEXT: [[TMP36:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 12), align 16
; CHECK-NEXT: [[TMP37:%.*]] = icmp sgt i32 [[TMP35]], [[TMP36]]
; CHECK-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], i32 [[TMP35]], i32 [[TMP36]]
; CHECK-NEXT: [[TMP39:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 13), align 4
; CHECK-NEXT: [[TMP40:%.*]] = icmp sgt i32 [[TMP38]], [[TMP39]]
; CHECK-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], i32 [[TMP38]], i32 [[TMP39]]
; CHECK-NEXT: [[TMP42:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 14), align 8
; CHECK-NEXT: [[TMP43:%.*]] = icmp sgt i32 [[TMP41]], [[TMP42]]
; CHECK-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], i32 [[TMP41]], i32 [[TMP42]]
; CHECK-NEXT: [[TMP45:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 15), align 4
; CHECK-NEXT: [[TMP46:%.*]] = icmp sgt i32 [[TMP44]], [[TMP45]]
; CHECK-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], i32 [[TMP44]], i32 [[TMP45]]
; CHECK-NEXT: [[TMP48:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 16), align 16
; CHECK-NEXT: [[TMP49:%.*]] = icmp sgt i32 [[TMP47]], [[TMP48]]
; CHECK-NEXT: [[TMP50:%.*]] = select i1 [[TMP49]], i32 [[TMP47]], i32 [[TMP48]]
; CHECK-NEXT: [[TMP51:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 17), align 4
; CHECK-NEXT: [[TMP52:%.*]] = icmp sgt i32 [[TMP50]], [[TMP51]]
; CHECK-NEXT: [[TMP53:%.*]] = select i1 [[TMP52]], i32 [[TMP50]], i32 [[TMP51]]
; CHECK-NEXT: [[TMP54:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 18), align 8
; CHECK-NEXT: [[TMP55:%.*]] = icmp sgt i32 [[TMP53]], [[TMP54]]
; CHECK-NEXT: [[TMP56:%.*]] = select i1 [[TMP55]], i32 [[TMP53]], i32 [[TMP54]]
; CHECK-NEXT: [[TMP57:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 19), align 4
; CHECK-NEXT: [[TMP58:%.*]] = icmp sgt i32 [[TMP56]], [[TMP57]]
; CHECK-NEXT: [[TMP59:%.*]] = select i1 [[TMP58]], i32 [[TMP56]], i32 [[TMP57]]
; CHECK-NEXT: [[TMP60:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 20), align 16
; CHECK-NEXT: [[TMP61:%.*]] = icmp sgt i32 [[TMP59]], [[TMP60]]
; CHECK-NEXT: [[TMP62:%.*]] = select i1 [[TMP61]], i32 [[TMP59]], i32 [[TMP60]]
; CHECK-NEXT: [[TMP63:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 21), align 4
; CHECK-NEXT: [[TMP64:%.*]] = icmp sgt i32 [[TMP62]], [[TMP63]]
; CHECK-NEXT: [[TMP65:%.*]] = select i1 [[TMP64]], i32 [[TMP62]], i32 [[TMP63]]
; CHECK-NEXT: [[TMP66:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 22), align 8
; CHECK-NEXT: [[TMP67:%.*]] = icmp sgt i32 [[TMP65]], [[TMP66]]
; CHECK-NEXT: [[TMP68:%.*]] = select i1 [[TMP67]], i32 [[TMP65]], i32 [[TMP66]]
; CHECK-NEXT: [[TMP69:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 23), align 4
; CHECK-NEXT: [[TMP70:%.*]] = icmp sgt i32 [[TMP68]], [[TMP69]]
; CHECK-NEXT: [[TMP71:%.*]] = select i1 [[TMP70]], i32 [[TMP68]], i32 [[TMP69]]
; CHECK-NEXT: [[TMP72:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 24), align 16
; CHECK-NEXT: [[TMP73:%.*]] = icmp sgt i32 [[TMP71]], [[TMP72]]
; CHECK-NEXT: [[TMP74:%.*]] = select i1 [[TMP73]], i32 [[TMP71]], i32 [[TMP72]]
; CHECK-NEXT: [[TMP75:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 25), align 4
; CHECK-NEXT: [[TMP76:%.*]] = icmp sgt i32 [[TMP74]], [[TMP75]]
; CHECK-NEXT: [[TMP77:%.*]] = select i1 [[TMP76]], i32 [[TMP74]], i32 [[TMP75]]
; CHECK-NEXT: [[TMP78:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 26), align 8
; CHECK-NEXT: [[TMP79:%.*]] = icmp sgt i32 [[TMP77]], [[TMP78]]
; CHECK-NEXT: [[TMP80:%.*]] = select i1 [[TMP79]], i32 [[TMP77]], i32 [[TMP78]]
; CHECK-NEXT: [[TMP81:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 27), align 4
; CHECK-NEXT: [[TMP82:%.*]] = icmp sgt i32 [[TMP80]], [[TMP81]]
; CHECK-NEXT: [[TMP83:%.*]] = select i1 [[TMP82]], i32 [[TMP80]], i32 [[TMP81]]
; CHECK-NEXT: [[TMP84:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 28), align 16
; CHECK-NEXT: [[TMP85:%.*]] = icmp sgt i32 [[TMP83]], [[TMP84]]
; CHECK-NEXT: [[TMP86:%.*]] = select i1 [[TMP85]], i32 [[TMP83]], i32 [[TMP84]]
; CHECK-NEXT: [[TMP87:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 29), align 4
; CHECK-NEXT: [[TMP88:%.*]] = icmp sgt i32 [[TMP86]], [[TMP87]]
; CHECK-NEXT: [[TMP89:%.*]] = select i1 [[TMP88]], i32 [[TMP86]], i32 [[TMP87]]
; CHECK-NEXT: [[TMP90:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 30), align 8
; CHECK-NEXT: [[TMP91:%.*]] = icmp sgt i32 [[TMP89]], [[TMP90]]
; CHECK-NEXT: [[TMP92:%.*]] = select i1 [[TMP91]], i32 [[TMP89]], i32 [[TMP90]]
; CHECK-NEXT: [[TMP93:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 31), align 4
; CHECK-NEXT: [[TMP94:%.*]] = icmp sgt i32 [[TMP92]], [[TMP93]]
; CHECK-NEXT: [[TMP95:%.*]] = select i1 [[TMP94]], i32 [[TMP92]], i32 [[TMP93]]
; CHECK-NEXT: ret i32 [[TMP95]]
;		;
; AVX-LABEL: @maxi32(		; AVX-LABEL: @maxi32(
; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		; AVX-NEXT: [[TMP2:%.*]] = load <32 x i32>, <32 x i32>* bitcast ([32 x i32]* @arr to <32 x i32>*), align 16
; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		; AVX: [[RDX_SHUF:%.*]] = shufflevector <32 x i32> [[TMP2]], <32 x i32> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]		; AVX-NEXT: [[TMP96:%.*]] = icmp sgt <32 x i32> [[TMP2]], [[RDX_SHUF]]
; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]		; AVX-NEXT: [[BIN_RDX:%.*]] = select <32 x i1> [[TMP96]], <32 x i32> [[TMP2]], <32 x i32> [[RDX_SHUF]]
; AVX-NEXT: [[TMP6:%.]] = load i32, i32 getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8		; AVX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x i32> [[BIN_RDX]], <32 x i32> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP5]], [[TMP6]]		; AVX-NEXT: [[TMP97:%.*]] = icmp sgt <32 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; AVX-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP5]], i32 [[TMP6]]		; AVX-NEXT: [[BIN_RDX2:%.*]] = select <32 x i1> [[TMP97]], <32 x i32> [[BIN_RDX]], <32 x i32> [[RDX_SHUF1]]
; AVX-NEXT: [[TMP9:%.]] = load i32, i32 getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 3), align 4		; AVX-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x i32> [[BIN_RDX2]], <32 x i32> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP8]], [[TMP9]]		; AVX-NEXT: [[TMP98:%.*]] = icmp sgt <32 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
; AVX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP8]], i32 [[TMP9]]		; AVX-NEXT: [[BIN_RDX4:%.*]] = select <32 x i1> [[TMP98]], <32 x i32> [[BIN_RDX2]], <32 x i32> [[RDX_SHUF3]]
; AVX-NEXT: [[TMP12:%.]] = load i32, i32 getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 4), align 16		; AVX-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x i32> [[BIN_RDX4]], <32 x i32> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]		; AVX-NEXT: [[TMP99:%.*]] = icmp sgt <32 x i32> [[BIN_RDX4]], [[RDX_SHUF5]]
; AVX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]		; AVX-NEXT: [[BIN_RDX6:%.*]] = select <32 x i1> [[TMP99]], <32 x i32> [[BIN_RDX4]], <32 x i32> [[RDX_SHUF5]]
; AVX-NEXT: [[TMP15:%.]] = load i32, i32 getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 5), align 4		; AVX-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x i32> [[BIN_RDX6]], <32 x i32> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]		; AVX-NEXT: [[TMP100:%.*]] = icmp sgt <32 x i32> [[BIN_RDX6]], [[RDX_SHUF7]]
; AVX-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32 [[TMP15]]		; AVX-NEXT: [[BIN_RDX8:%.*]] = select <32 x i1> [[TMP100]], <32 x i32> [[BIN_RDX6]], <32 x i32> [[RDX_SHUF7]]
; AVX-NEXT: [[TMP18:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8		; AVX-NEXT: [[TMP101:%.*]] = extractelement <32 x i32> [[BIN_RDX8]], i32 0
; AVX-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]		; AVX: ret i32 [[TMP101]]
; AVX-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32 [[TMP18]]
; AVX-NEXT: [[TMP21:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
; AVX-NEXT: [[TMP22:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]
; AVX-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], i32 [[TMP20]], i32 [[TMP21]]
; AVX-NEXT: [[TMP24:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 8), align 16
; AVX-NEXT: [[TMP25:%.*]] = icmp sgt i32 [[TMP23]], [[TMP24]]
; AVX-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], i32 [[TMP23]], i32 [[TMP24]]
; AVX-NEXT: [[TMP27:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 9), align 4
; AVX-NEXT: [[TMP28:%.*]] = icmp sgt i32 [[TMP26]], [[TMP27]]
; AVX-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], i32 [[TMP26]], i32 [[TMP27]]
; AVX-NEXT: [[TMP30:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 10), align 8
; AVX-NEXT: [[TMP31:%.*]] = icmp sgt i32 [[TMP29]], [[TMP30]]
; AVX-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], i32 [[TMP29]], i32 [[TMP30]]
; AVX-NEXT: [[TMP33:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 11), align 4
; AVX-NEXT: [[TMP34:%.*]] = icmp sgt i32 [[TMP32]], [[TMP33]]
; AVX-NEXT: [[TMP35:%.*]] = select i1 [[TMP34]], i32 [[TMP32]], i32 [[TMP33]]
; AVX-NEXT: [[TMP36:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 12), align 16
; AVX-NEXT: [[TMP37:%.*]] = icmp sgt i32 [[TMP35]], [[TMP36]]
; AVX-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], i32 [[TMP35]], i32 [[TMP36]]
; AVX-NEXT: [[TMP39:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 13), align 4
; AVX-NEXT: [[TMP40:%.*]] = icmp sgt i32 [[TMP38]], [[TMP39]]
; AVX-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], i32 [[TMP38]], i32 [[TMP39]]
; AVX-NEXT: [[TMP42:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 14), align 8
; AVX-NEXT: [[TMP43:%.*]] = icmp sgt i32 [[TMP41]], [[TMP42]]
; AVX-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], i32 [[TMP41]], i32 [[TMP42]]
; AVX-NEXT: [[TMP45:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 15), align 4
; AVX-NEXT: [[TMP46:%.*]] = icmp sgt i32 [[TMP44]], [[TMP45]]
; AVX-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], i32 [[TMP44]], i32 [[TMP45]]
; AVX-NEXT: [[TMP48:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 16), align 16
; AVX-NEXT: [[TMP49:%.*]] = icmp sgt i32 [[TMP47]], [[TMP48]]
; AVX-NEXT: [[TMP50:%.*]] = select i1 [[TMP49]], i32 [[TMP47]], i32 [[TMP48]]
; AVX-NEXT: [[TMP51:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 17), align 4
; AVX-NEXT: [[TMP52:%.*]] = icmp sgt i32 [[TMP50]], [[TMP51]]
; AVX-NEXT: [[TMP53:%.*]] = select i1 [[TMP52]], i32 [[TMP50]], i32 [[TMP51]]
; AVX-NEXT: [[TMP54:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 18), align 8
; AVX-NEXT: [[TMP55:%.*]] = icmp sgt i32 [[TMP53]], [[TMP54]]
; AVX-NEXT: [[TMP56:%.*]] = select i1 [[TMP55]], i32 [[TMP53]], i32 [[TMP54]]
; AVX-NEXT: [[TMP57:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 19), align 4
; AVX-NEXT: [[TMP58:%.*]] = icmp sgt i32 [[TMP56]], [[TMP57]]
; AVX-NEXT: [[TMP59:%.*]] = select i1 [[TMP58]], i32 [[TMP56]], i32 [[TMP57]]
; AVX-NEXT: [[TMP60:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 20), align 16
; AVX-NEXT: [[TMP61:%.*]] = icmp sgt i32 [[TMP59]], [[TMP60]]
; AVX-NEXT: [[TMP62:%.*]] = select i1 [[TMP61]], i32 [[TMP59]], i32 [[TMP60]]
; AVX-NEXT: [[TMP63:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 21), align 4
; AVX-NEXT: [[TMP64:%.*]] = icmp sgt i32 [[TMP62]], [[TMP63]]
; AVX-NEXT: [[TMP65:%.*]] = select i1 [[TMP64]], i32 [[TMP62]], i32 [[TMP63]]
; AVX-NEXT: [[TMP66:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 22), align 8
; AVX-NEXT: [[TMP67:%.*]] = icmp sgt i32 [[TMP65]], [[TMP66]]
; AVX-NEXT: [[TMP68:%.*]] = select i1 [[TMP67]], i32 [[TMP65]], i32 [[TMP66]]
; AVX-NEXT: [[TMP69:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 23), align 4
; AVX-NEXT: [[TMP70:%.*]] = icmp sgt i32 [[TMP68]], [[TMP69]]
; AVX-NEXT: [[TMP71:%.*]] = select i1 [[TMP70]], i32 [[TMP68]], i32 [[TMP69]]
; AVX-NEXT: [[TMP72:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 24), align 16
; AVX-NEXT: [[TMP73:%.*]] = icmp sgt i32 [[TMP71]], [[TMP72]]
; AVX-NEXT: [[TMP74:%.*]] = select i1 [[TMP73]], i32 [[TMP71]], i32 [[TMP72]]
; AVX-NEXT: [[TMP75:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 25), align 4
; AVX-NEXT: [[TMP76:%.*]] = icmp sgt i32 [[TMP74]], [[TMP75]]
; AVX-NEXT: [[TMP77:%.*]] = select i1 [[TMP76]], i32 [[TMP74]], i32 [[TMP75]]
; AVX-NEXT: [[TMP78:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 26), align 8
; AVX-NEXT: [[TMP79:%.*]] = icmp sgt i32 [[TMP77]], [[TMP78]]
; AVX-NEXT: [[TMP80:%.*]] = select i1 [[TMP79]], i32 [[TMP77]], i32 [[TMP78]]
; AVX-NEXT: [[TMP81:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 27), align 4
; AVX-NEXT: [[TMP82:%.*]] = icmp sgt i32 [[TMP80]], [[TMP81]]
; AVX-NEXT: [[TMP83:%.*]] = select i1 [[TMP82]], i32 [[TMP80]], i32 [[TMP81]]
; AVX-NEXT: [[TMP84:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 28), align 16
; AVX-NEXT: [[TMP85:%.*]] = icmp sgt i32 [[TMP83]], [[TMP84]]
; AVX-NEXT: [[TMP86:%.*]] = select i1 [[TMP85]], i32 [[TMP83]], i32 [[TMP84]]
; AVX-NEXT: [[TMP87:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 29), align 4
; AVX-NEXT: [[TMP88:%.*]] = icmp sgt i32 [[TMP86]], [[TMP87]]
; AVX-NEXT: [[TMP89:%.*]] = select i1 [[TMP88]], i32 [[TMP86]], i32 [[TMP87]]
; AVX-NEXT: [[TMP90:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 30), align 8
; AVX-NEXT: [[TMP91:%.*]] = icmp sgt i32 [[TMP89]], [[TMP90]]
; AVX-NEXT: [[TMP92:%.*]] = select i1 [[TMP91]], i32 [[TMP89]], i32 [[TMP90]]
; AVX-NEXT: [[TMP93:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 31), align 4
; AVX-NEXT: [[TMP94:%.*]] = icmp sgt i32 [[TMP92]], [[TMP93]]
; AVX-NEXT: [[TMP95:%.*]] = select i1 [[TMP94]], i32 [[TMP92]], i32 [[TMP93]]
; AVX-NEXT: ret i32 [[TMP95]]
;		;
; AVX2-LABEL: @maxi32(		; AVX2-LABEL: @maxi32(
; AVX2-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		; AVX2-NEXT: [[TMP2:%.*]] = load <32 x i32>, <32 x i32>* bitcast ([32 x i32]* @arr to <32 x i32>*), align 16
; AVX2-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		; AVX2: [[RDX_SHUF:%.*]] = shufflevector <32 x i32> [[TMP2]], <32 x i32> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]		; AVX2-NEXT: [[TMP96:%.*]] = icmp sgt <32 x i32> [[TMP2]], [[RDX_SHUF]]
; AVX2-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]		; AVX2-NEXT: [[BIN_RDX:%.*]] = select <32 x i1> [[TMP96]], <32 x i32> [[TMP2]], <32 x i32> [[RDX_SHUF]]
; AVX2-NEXT: [[TMP6:%.]] = load i32, i32 getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8		; AVX2-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x i32> [[BIN_RDX]], <32 x i32> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP5]], [[TMP6]]		; AVX2-NEXT: [[TMP97:%.*]] = icmp sgt <32 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; AVX2-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP5]], i32 [[TMP6]]		; AVX2-NEXT: [[BIN_RDX2:%.*]] = select <32 x i1> [[TMP97]], <32 x i32> [[BIN_RDX]], <32 x i32> [[RDX_SHUF1]]
; AVX2-NEXT: [[TMP9:%.]] = load i32, i32 getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 3), align 4		; AVX2-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x i32> [[BIN_RDX2]], <32 x i32> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP8]], [[TMP9]]		; AVX2-NEXT: [[TMP98:%.*]] = icmp sgt <32 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
; AVX2-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP8]], i32 [[TMP9]]		; AVX2-NEXT: [[BIN_RDX4:%.*]] = select <32 x i1> [[TMP98]], <32 x i32> [[BIN_RDX2]], <32 x i32> [[RDX_SHUF3]]
; AVX2-NEXT: [[TMP12:%.]] = load i32, i32 getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 4), align 16		; AVX2-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x i32> [[BIN_RDX4]], <32 x i32> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]		; AVX2-NEXT: [[TMP99:%.*]] = icmp sgt <32 x i32> [[BIN_RDX4]], [[RDX_SHUF5]]
; AVX2-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]		; AVX2-NEXT: [[BIN_RDX6:%.*]] = select <32 x i1> [[TMP99]], <32 x i32> [[BIN_RDX4]], <32 x i32> [[RDX_SHUF5]]
; AVX2-NEXT: [[TMP15:%.]] = load i32, i32 getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 5), align 4		; AVX2-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x i32> [[BIN_RDX6]], <32 x i32> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]		; AVX2-NEXT: [[TMP100:%.*]] = icmp sgt <32 x i32> [[BIN_RDX6]], [[RDX_SHUF7]]
; AVX2-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32 [[TMP15]]		; AVX2-NEXT: [[BIN_RDX8:%.*]] = select <32 x i1> [[TMP100]], <32 x i32> [[BIN_RDX6]], <32 x i32> [[RDX_SHUF7]]
; AVX2-NEXT: [[TMP18:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8		; AVX2-NEXT: [[TMP101:%.*]] = extractelement <32 x i32> [[BIN_RDX8]], i32 0
; AVX2-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]		; AVX2: ret i32 [[TMP101]]
; AVX2-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32 [[TMP18]]
; AVX2-NEXT: [[TMP21:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
; AVX2-NEXT: [[TMP22:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]
; AVX2-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], i32 [[TMP20]], i32 [[TMP21]]
; AVX2-NEXT: [[TMP24:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 8), align 16
; AVX2-NEXT: [[TMP25:%.*]] = icmp sgt i32 [[TMP23]], [[TMP24]]
; AVX2-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], i32 [[TMP23]], i32 [[TMP24]]
; AVX2-NEXT: [[TMP27:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 9), align 4
; AVX2-NEXT: [[TMP28:%.*]] = icmp sgt i32 [[TMP26]], [[TMP27]]
; AVX2-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], i32 [[TMP26]], i32 [[TMP27]]
; AVX2-NEXT: [[TMP30:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 10), align 8
; AVX2-NEXT: [[TMP31:%.*]] = icmp sgt i32 [[TMP29]], [[TMP30]]
; AVX2-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], i32 [[TMP29]], i32 [[TMP30]]
; AVX2-NEXT: [[TMP33:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 11), align 4
; AVX2-NEXT: [[TMP34:%.*]] = icmp sgt i32 [[TMP32]], [[TMP33]]
; AVX2-NEXT: [[TMP35:%.*]] = select i1 [[TMP34]], i32 [[TMP32]], i32 [[TMP33]]
; AVX2-NEXT: [[TMP36:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 12), align 16
; AVX2-NEXT: [[TMP37:%.*]] = icmp sgt i32 [[TMP35]], [[TMP36]]
; AVX2-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], i32 [[TMP35]], i32 [[TMP36]]
; AVX2-NEXT: [[TMP39:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 13), align 4
; AVX2-NEXT: [[TMP40:%.*]] = icmp sgt i32 [[TMP38]], [[TMP39]]
; AVX2-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], i32 [[TMP38]], i32 [[TMP39]]
; AVX2-NEXT: [[TMP42:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 14), align 8
; AVX2-NEXT: [[TMP43:%.*]] = icmp sgt i32 [[TMP41]], [[TMP42]]
; AVX2-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], i32 [[TMP41]], i32 [[TMP42]]
; AVX2-NEXT: [[TMP45:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 15), align 4
; AVX2-NEXT: [[TMP46:%.*]] = icmp sgt i32 [[TMP44]], [[TMP45]]
; AVX2-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], i32 [[TMP44]], i32 [[TMP45]]
; AVX2-NEXT: [[TMP48:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 16), align 16
; AVX2-NEXT: [[TMP49:%.*]] = icmp sgt i32 [[TMP47]], [[TMP48]]
; AVX2-NEXT: [[TMP50:%.*]] = select i1 [[TMP49]], i32 [[TMP47]], i32 [[TMP48]]
; AVX2-NEXT: [[TMP51:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 17), align 4
; AVX2-NEXT: [[TMP52:%.*]] = icmp sgt i32 [[TMP50]], [[TMP51]]
; AVX2-NEXT: [[TMP53:%.*]] = select i1 [[TMP52]], i32 [[TMP50]], i32 [[TMP51]]
; AVX2-NEXT: [[TMP54:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 18), align 8
; AVX2-NEXT: [[TMP55:%.*]] = icmp sgt i32 [[TMP53]], [[TMP54]]
; AVX2-NEXT: [[TMP56:%.*]] = select i1 [[TMP55]], i32 [[TMP53]], i32 [[TMP54]]
; AVX2-NEXT: [[TMP57:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 19), align 4
; AVX2-NEXT: [[TMP58:%.*]] = icmp sgt i32 [[TMP56]], [[TMP57]]
; AVX2-NEXT: [[TMP59:%.*]] = select i1 [[TMP58]], i32 [[TMP56]], i32 [[TMP57]]
; AVX2-NEXT: [[TMP60:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 20), align 16
; AVX2-NEXT: [[TMP61:%.*]] = icmp sgt i32 [[TMP59]], [[TMP60]]
; AVX2-NEXT: [[TMP62:%.*]] = select i1 [[TMP61]], i32 [[TMP59]], i32 [[TMP60]]
; AVX2-NEXT: [[TMP63:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 21), align 4
; AVX2-NEXT: [[TMP64:%.*]] = icmp sgt i32 [[TMP62]], [[TMP63]]
; AVX2-NEXT: [[TMP65:%.*]] = select i1 [[TMP64]], i32 [[TMP62]], i32 [[TMP63]]
; AVX2-NEXT: [[TMP66:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 22), align 8
; AVX2-NEXT: [[TMP67:%.*]] = icmp sgt i32 [[TMP65]], [[TMP66]]
; AVX2-NEXT: [[TMP68:%.*]] = select i1 [[TMP67]], i32 [[TMP65]], i32 [[TMP66]]
; AVX2-NEXT: [[TMP69:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 23), align 4
; AVX2-NEXT: [[TMP70:%.*]] = icmp sgt i32 [[TMP68]], [[TMP69]]
; AVX2-NEXT: [[TMP71:%.*]] = select i1 [[TMP70]], i32 [[TMP68]], i32 [[TMP69]]
; AVX2-NEXT: [[TMP72:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 24), align 16
; AVX2-NEXT: [[TMP73:%.*]] = icmp sgt i32 [[TMP71]], [[TMP72]]
; AVX2-NEXT: [[TMP74:%.*]] = select i1 [[TMP73]], i32 [[TMP71]], i32 [[TMP72]]
; AVX2-NEXT: [[TMP75:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 25), align 4
; AVX2-NEXT: [[TMP76:%.*]] = icmp sgt i32 [[TMP74]], [[TMP75]]
; AVX2-NEXT: [[TMP77:%.*]] = select i1 [[TMP76]], i32 [[TMP74]], i32 [[TMP75]]
; AVX2-NEXT: [[TMP78:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 26), align 8
; AVX2-NEXT: [[TMP79:%.*]] = icmp sgt i32 [[TMP77]], [[TMP78]]
; AVX2-NEXT: [[TMP80:%.*]] = select i1 [[TMP79]], i32 [[TMP77]], i32 [[TMP78]]
; AVX2-NEXT: [[TMP81:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 27), align 4
; AVX2-NEXT: [[TMP82:%.*]] = icmp sgt i32 [[TMP80]], [[TMP81]]
; AVX2-NEXT: [[TMP83:%.*]] = select i1 [[TMP82]], i32 [[TMP80]], i32 [[TMP81]]
; AVX2-NEXT: [[TMP84:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 28), align 16
; AVX2-NEXT: [[TMP85:%.*]] = icmp sgt i32 [[TMP83]], [[TMP84]]
; AVX2-NEXT: [[TMP86:%.*]] = select i1 [[TMP85]], i32 [[TMP83]], i32 [[TMP84]]
; AVX2-NEXT: [[TMP87:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 29), align 4
; AVX2-NEXT: [[TMP88:%.*]] = icmp sgt i32 [[TMP86]], [[TMP87]]
; AVX2-NEXT: [[TMP89:%.*]] = select i1 [[TMP88]], i32 [[TMP86]], i32 [[TMP87]]
; AVX2-NEXT: [[TMP90:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 30), align 8
; AVX2-NEXT: [[TMP91:%.*]] = icmp sgt i32 [[TMP89]], [[TMP90]]
; AVX2-NEXT: [[TMP92:%.*]] = select i1 [[TMP91]], i32 [[TMP89]], i32 [[TMP90]]
; AVX2-NEXT: [[TMP93:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 31), align 4
; AVX2-NEXT: [[TMP94:%.*]] = icmp sgt i32 [[TMP92]], [[TMP93]]
; AVX2-NEXT: [[TMP95:%.*]] = select i1 [[TMP94]], i32 [[TMP92]], i32 [[TMP93]]
; AVX2-NEXT: ret i32 [[TMP95]]
;		;
; SKX-LABEL: @maxi32(		; SKX-LABEL: @maxi32(
; SKX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		; SKX-NEXT: [[TMP2:%.*]] = load <32 x i32>, <32 x i32>* bitcast ([32 x i32]* @arr to <32 x i32>*), align 16
; SKX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		; SKX: [[RDX_SHUF:%.*]] = shufflevector <32 x i32> [[TMP2]], <32 x i32> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SKX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]		; SKX-NEXT: [[TMP96:%.*]] = icmp sgt <32 x i32> [[TMP2]], [[RDX_SHUF]]
; SKX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]		; SKX-NEXT: [[BIN_RDX:%.*]] = select <32 x i1> [[TMP96]], <32 x i32> [[TMP2]], <32 x i32> [[RDX_SHUF]]
; SKX-NEXT: [[TMP6:%.]] = load i32, i32 getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8		; SKX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x i32> [[BIN_RDX]], <32 x i32> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SKX-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP5]], [[TMP6]]		; SKX-NEXT: [[TMP97:%.*]] = icmp sgt <32 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; SKX-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP5]], i32 [[TMP6]]		; SKX-NEXT: [[BIN_RDX2:%.*]] = select <32 x i1> [[TMP97]], <32 x i32> [[BIN_RDX]], <32 x i32> [[RDX_SHUF1]]
; SKX-NEXT: [[TMP9:%.]] = load i32, i32 getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 3), align 4		; SKX-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x i32> [[BIN_RDX2]], <32 x i32> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SKX-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP8]], [[TMP9]]		; SKX-NEXT: [[TMP98:%.*]] = icmp sgt <32 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
; SKX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP8]], i32 [[TMP9]]		; SKX-NEXT: [[BIN_RDX4:%.*]] = select <32 x i1> [[TMP98]], <32 x i32> [[BIN_RDX2]], <32 x i32> [[RDX_SHUF3]]
; SKX-NEXT: [[TMP12:%.]] = load i32, i32 getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 4), align 16		; SKX-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x i32> [[BIN_RDX4]], <32 x i32> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SKX-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]		; SKX-NEXT: [[TMP99:%.*]] = icmp sgt <32 x i32> [[BIN_RDX4]], [[RDX_SHUF5]]
; SKX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]		; SKX-NEXT: [[BIN_RDX6:%.*]] = select <32 x i1> [[TMP99]], <32 x i32> [[BIN_RDX4]], <32 x i32> [[RDX_SHUF5]]
; SKX-NEXT: [[TMP15:%.]] = load i32, i32 getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 5), align 4		; SKX-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x i32> [[BIN_RDX6]], <32 x i32> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SKX-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]		; SKX-NEXT: [[TMP100:%.*]] = icmp sgt <32 x i32> [[BIN_RDX6]], [[RDX_SHUF7]]
; SKX-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32 [[TMP15]]		; SKX-NEXT: [[BIN_RDX8:%.*]] = select <32 x i1> [[TMP100]], <32 x i32> [[BIN_RDX6]], <32 x i32> [[RDX_SHUF7]]
; SKX-NEXT: [[TMP18:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8		; SKX-NEXT: [[TMP101:%.*]] = extractelement <32 x i32> [[BIN_RDX8]], i32 0
; SKX-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]		; SKX: ret i32 [[TMP101]]
; SKX-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32 [[TMP18]]
; SKX-NEXT: [[TMP21:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
; SKX-NEXT: [[TMP22:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]
; SKX-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], i32 [[TMP20]], i32 [[TMP21]]
; SKX-NEXT: [[TMP24:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 8), align 16
; SKX-NEXT: [[TMP25:%.*]] = icmp sgt i32 [[TMP23]], [[TMP24]]
; SKX-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], i32 [[TMP23]], i32 [[TMP24]]
; SKX-NEXT: [[TMP27:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 9), align 4
; SKX-NEXT: [[TMP28:%.*]] = icmp sgt i32 [[TMP26]], [[TMP27]]
; SKX-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], i32 [[TMP26]], i32 [[TMP27]]
; SKX-NEXT: [[TMP30:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 10), align 8
; SKX-NEXT: [[TMP31:%.*]] = icmp sgt i32 [[TMP29]], [[TMP30]]
; SKX-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], i32 [[TMP29]], i32 [[TMP30]]
; SKX-NEXT: [[TMP33:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 11), align 4
; SKX-NEXT: [[TMP34:%.*]] = icmp sgt i32 [[TMP32]], [[TMP33]]
; SKX-NEXT: [[TMP35:%.*]] = select i1 [[TMP34]], i32 [[TMP32]], i32 [[TMP33]]
; SKX-NEXT: [[TMP36:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 12), align 16
; SKX-NEXT: [[TMP37:%.*]] = icmp sgt i32 [[TMP35]], [[TMP36]]
; SKX-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], i32 [[TMP35]], i32 [[TMP36]]
; SKX-NEXT: [[TMP39:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 13), align 4
; SKX-NEXT: [[TMP40:%.*]] = icmp sgt i32 [[TMP38]], [[TMP39]]
; SKX-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], i32 [[TMP38]], i32 [[TMP39]]
; SKX-NEXT: [[TMP42:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 14), align 8
; SKX-NEXT: [[TMP43:%.*]] = icmp sgt i32 [[TMP41]], [[TMP42]]
; SKX-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], i32 [[TMP41]], i32 [[TMP42]]
; SKX-NEXT: [[TMP45:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 15), align 4
; SKX-NEXT: [[TMP46:%.*]] = icmp sgt i32 [[TMP44]], [[TMP45]]
; SKX-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], i32 [[TMP44]], i32 [[TMP45]]
; SKX-NEXT: [[TMP48:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 16), align 16
; SKX-NEXT: [[TMP49:%.*]] = icmp sgt i32 [[TMP47]], [[TMP48]]
; SKX-NEXT: [[TMP50:%.*]] = select i1 [[TMP49]], i32 [[TMP47]], i32 [[TMP48]]
; SKX-NEXT: [[TMP51:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 17), align 4
; SKX-NEXT: [[TMP52:%.*]] = icmp sgt i32 [[TMP50]], [[TMP51]]
; SKX-NEXT: [[TMP53:%.*]] = select i1 [[TMP52]], i32 [[TMP50]], i32 [[TMP51]]
; SKX-NEXT: [[TMP54:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 18), align 8
; SKX-NEXT: [[TMP55:%.*]] = icmp sgt i32 [[TMP53]], [[TMP54]]
; SKX-NEXT: [[TMP56:%.*]] = select i1 [[TMP55]], i32 [[TMP53]], i32 [[TMP54]]
; SKX-NEXT: [[TMP57:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 19), align 4
; SKX-NEXT: [[TMP58:%.*]] = icmp sgt i32 [[TMP56]], [[TMP57]]
; SKX-NEXT: [[TMP59:%.*]] = select i1 [[TMP58]], i32 [[TMP56]], i32 [[TMP57]]
; SKX-NEXT: [[TMP60:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 20), align 16
; SKX-NEXT: [[TMP61:%.*]] = icmp sgt i32 [[TMP59]], [[TMP60]]
; SKX-NEXT: [[TMP62:%.*]] = select i1 [[TMP61]], i32 [[TMP59]], i32 [[TMP60]]
; SKX-NEXT: [[TMP63:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 21), align 4
; SKX-NEXT: [[TMP64:%.*]] = icmp sgt i32 [[TMP62]], [[TMP63]]
; SKX-NEXT: [[TMP65:%.*]] = select i1 [[TMP64]], i32 [[TMP62]], i32 [[TMP63]]
; SKX-NEXT: [[TMP66:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 22), align 8
; SKX-NEXT: [[TMP67:%.*]] = icmp sgt i32 [[TMP65]], [[TMP66]]
; SKX-NEXT: [[TMP68:%.*]] = select i1 [[TMP67]], i32 [[TMP65]], i32 [[TMP66]]
; SKX-NEXT: [[TMP69:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 23), align 4
; SKX-NEXT: [[TMP70:%.*]] = icmp sgt i32 [[TMP68]], [[TMP69]]
; SKX-NEXT: [[TMP71:%.*]] = select i1 [[TMP70]], i32 [[TMP68]], i32 [[TMP69]]
; SKX-NEXT: [[TMP72:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 24), align 16
; SKX-NEXT: [[TMP73:%.*]] = icmp sgt i32 [[TMP71]], [[TMP72]]
; SKX-NEXT: [[TMP74:%.*]] = select i1 [[TMP73]], i32 [[TMP71]], i32 [[TMP72]]
; SKX-NEXT: [[TMP75:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 25), align 4
; SKX-NEXT: [[TMP76:%.*]] = icmp sgt i32 [[TMP74]], [[TMP75]]
; SKX-NEXT: [[TMP77:%.*]] = select i1 [[TMP76]], i32 [[TMP74]], i32 [[TMP75]]
; SKX-NEXT: [[TMP78:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 26), align 8
; SKX-NEXT: [[TMP79:%.*]] = icmp sgt i32 [[TMP77]], [[TMP78]]
; SKX-NEXT: [[TMP80:%.*]] = select i1 [[TMP79]], i32 [[TMP77]], i32 [[TMP78]]
; SKX-NEXT: [[TMP81:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 27), align 4
; SKX-NEXT: [[TMP82:%.*]] = icmp sgt i32 [[TMP80]], [[TMP81]]
; SKX-NEXT: [[TMP83:%.*]] = select i1 [[TMP82]], i32 [[TMP80]], i32 [[TMP81]]
; SKX-NEXT: [[TMP84:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 28), align 16
; SKX-NEXT: [[TMP85:%.*]] = icmp sgt i32 [[TMP83]], [[TMP84]]
; SKX-NEXT: [[TMP86:%.*]] = select i1 [[TMP85]], i32 [[TMP83]], i32 [[TMP84]]
; SKX-NEXT: [[TMP87:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 29), align 4
; SKX-NEXT: [[TMP88:%.*]] = icmp sgt i32 [[TMP86]], [[TMP87]]
; SKX-NEXT: [[TMP89:%.*]] = select i1 [[TMP88]], i32 [[TMP86]], i32 [[TMP87]]
; SKX-NEXT: [[TMP90:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 30), align 8
; SKX-NEXT: [[TMP91:%.*]] = icmp sgt i32 [[TMP89]], [[TMP90]]
; SKX-NEXT: [[TMP92:%.*]] = select i1 [[TMP91]], i32 [[TMP89]], i32 [[TMP90]]
; SKX-NEXT: [[TMP93:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 31), align 4
; SKX-NEXT: [[TMP94:%.*]] = icmp sgt i32 [[TMP92]], [[TMP93]]
; SKX-NEXT: [[TMP95:%.*]] = select i1 [[TMP94]], i32 [[TMP92]], i32 [[TMP93]]
; SKX-NEXT: ret i32 [[TMP95]]
;		;
%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
%4 = icmp sgt i32 %2, %3		%4 = icmp sgt i32 %2, %3
%5 = select i1 %4, i32 %2, i32 %3		%5 = select i1 %4, i32 %2, i32 %3
%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8		%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
%7 = icmp sgt i32 %5, %6		%7 = icmp sgt i32 %5, %6
%8 = select i1 %7, i32 %5, i32 %6		%8 = select i1 %7, i32 %5, i32 %6
; CHECK-NEXT: [[TMP19:%.*]] = fcmp fast ogt float [[TMP17]], [[TMP18]]		; CHECK-NEXT: [[TMP19:%.*]] = fcmp fast ogt float [[TMP17]], [[TMP18]]
; CHECK-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], float [[TMP17]], float [[TMP18]]		; CHECK-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], float [[TMP17]], float [[TMP18]]
; CHECK-NEXT: [[TMP21:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 7), align 4		; CHECK-NEXT: [[TMP21:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 7), align 4
; CHECK-NEXT: [[TMP22:%.*]] = fcmp fast ogt float [[TMP20]], [[TMP21]]		; CHECK-NEXT: [[TMP22:%.*]] = fcmp fast ogt float [[TMP20]], [[TMP21]]
; CHECK-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], float [[TMP20]], float [[TMP21]]		; CHECK-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], float [[TMP20]], float [[TMP21]]
; CHECK-NEXT: ret float [[TMP23]]		; CHECK-NEXT: ret float [[TMP23]]
;		;
; AVX-LABEL: @maxf8(		; AVX-LABEL: @maxf8(
; AVX-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16		; AVX-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([32 x float]* @arr1 to <8 x float>*), align 16
; AVX-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4		; AVX: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP2]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[TMP4:%.*]] = fcmp fast ogt float [[TMP2]], [[TMP3]]		; AVX-NEXT: [[TMP24:%.*]] = fcmp fast ogt <8 x float> [[TMP2]], [[RDX_SHUF]]
; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], float [[TMP2]], float [[TMP3]]		; AVX-NEXT: [[BIN_RDX:%.*]] = select <8 x i1> [[TMP24]], <8 x float> [[TMP2]], <8 x float> [[RDX_SHUF]]
; AVX-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8		; AVX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[TMP7:%.*]] = fcmp fast ogt float [[TMP5]], [[TMP6]]		; AVX-NEXT: [[TMP25:%.*]] = fcmp fast ogt <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; AVX-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], float [[TMP5]], float [[TMP6]]		; AVX-NEXT: [[BIN_RDX2:%.*]] = select <8 x i1> [[TMP25]], <8 x float> [[BIN_RDX]], <8 x float> [[RDX_SHUF1]]
; AVX-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 3), align 4		; AVX-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[TMP10:%.*]] = fcmp fast ogt float [[TMP8]], [[TMP9]]		; AVX-NEXT: [[TMP26:%.*]] = fcmp fast ogt <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; AVX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], float [[TMP8]], float [[TMP9]]		; AVX-NEXT: [[BIN_RDX4:%.*]] = select <8 x i1> [[TMP26]], <8 x float> [[BIN_RDX2]], <8 x float> [[RDX_SHUF3]]
; AVX-NEXT: [[TMP12:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 4), align 16		; AVX-NEXT: [[TMP27:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
; AVX-NEXT: [[TMP13:%.*]] = fcmp fast ogt float [[TMP11]], [[TMP12]]		; AVX: ret float [[TMP27]]
; AVX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], float [[TMP11]], float [[TMP12]]
; AVX-NEXT: [[TMP15:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 5), align 4
; AVX-NEXT: [[TMP16:%.*]] = fcmp fast ogt float [[TMP14]], [[TMP15]]
; AVX-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], float [[TMP14]], float [[TMP15]]
; AVX-NEXT: [[TMP18:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 6), align 8
; AVX-NEXT: [[TMP19:%.*]] = fcmp fast ogt float [[TMP17]], [[TMP18]]
; AVX-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], float [[TMP17]], float [[TMP18]]
; AVX-NEXT: [[TMP21:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 7), align 4
; AVX-NEXT: [[TMP22:%.*]] = fcmp fast ogt float [[TMP20]], [[TMP21]]
; AVX-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], float [[TMP20]], float [[TMP21]]
; AVX-NEXT: ret float [[TMP23]]
;		;
; AVX2-LABEL: @maxf8(		; AVX2-LABEL: @maxf8(
; AVX2-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16		; AVX2-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([32 x float]* @arr1 to <8 x float>*), align 16
; AVX2-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4		; AVX2: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP2]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP4:%.*]] = fcmp fast ogt float [[TMP2]], [[TMP3]]		; AVX2-NEXT: [[TMP24:%.*]] = fcmp fast ogt <8 x float> [[TMP2]], [[RDX_SHUF]]
; AVX2-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], float [[TMP2]], float [[TMP3]]		; AVX2-NEXT: [[BIN_RDX:%.*]] = select <8 x i1> [[TMP24]], <8 x float> [[TMP2]], <8 x float> [[RDX_SHUF]]
; AVX2-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8		; AVX2-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP7:%.*]] = fcmp fast ogt float [[TMP5]], [[TMP6]]		; AVX2-NEXT: [[TMP25:%.*]] = fcmp fast ogt <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; AVX2-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], float [[TMP5]], float [[TMP6]]		; AVX2-NEXT: [[BIN_RDX2:%.*]] = select <8 x i1> [[TMP25]], <8 x float> [[BIN_RDX]], <8 x float> [[RDX_SHUF1]]
; AVX2-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 3), align 4		; AVX2-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP10:%.*]] = fcmp fast ogt float [[TMP8]], [[TMP9]]		; AVX2-NEXT: [[TMP26:%.*]] = fcmp fast ogt <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; AVX2-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], float [[TMP8]], float [[TMP9]]		; AVX2-NEXT: [[BIN_RDX4:%.*]] = select <8 x i1> [[TMP26]], <8 x float> [[BIN_RDX2]], <8 x float> [[RDX_SHUF3]]
; AVX2-NEXT: [[TMP12:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 4), align 16		; AVX2-NEXT: [[TMP27:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
; AVX2-NEXT: [[TMP13:%.*]] = fcmp fast ogt float [[TMP11]], [[TMP12]]		; AVX2: ret float [[TMP27]]
; AVX2-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], float [[TMP11]], float [[TMP12]]
; AVX2-NEXT: [[TMP15:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 5), align 4
; AVX2-NEXT: [[TMP16:%.*]] = fcmp fast ogt float [[TMP14]], [[TMP15]]
; AVX2-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], float [[TMP14]], float [[TMP15]]
; AVX2-NEXT: [[TMP18:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 6), align 8
; AVX2-NEXT: [[TMP19:%.*]] = fcmp fast ogt float [[TMP17]], [[TMP18]]
; AVX2-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], float [[TMP17]], float [[TMP18]]
; AVX2-NEXT: [[TMP21:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 7), align 4
; AVX2-NEXT: [[TMP22:%.*]] = fcmp fast ogt float [[TMP20]], [[TMP21]]
; AVX2-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], float [[TMP20]], float [[TMP21]]
; AVX2-NEXT: ret float [[TMP23]]
;		;
; SKX-LABEL: @maxf8(		; SKX-LABEL: @maxf8(
; SKX-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16		; SKX-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([32 x float]* @arr1 to <8 x float>*), align 16
; SKX-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4		; SKX: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP2]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; SKX-NEXT: [[TMP4:%.*]] = fcmp fast ogt float [[TMP2]], [[TMP3]]		; SKX-NEXT: [[TMP24:%.*]] = fcmp fast ogt <8 x float> [[TMP2]], [[RDX_SHUF]]
; SKX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], float [[TMP2]], float [[TMP3]]		; SKX-NEXT: [[BIN_RDX:%.*]] = select <8 x i1> [[TMP24]], <8 x float> [[TMP2]], <8 x float> [[RDX_SHUF]]
; SKX-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8		; SKX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SKX-NEXT: [[TMP7:%.*]] = fcmp fast ogt float [[TMP5]], [[TMP6]]		; SKX-NEXT: [[TMP25:%.*]] = fcmp fast ogt <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; SKX-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], float [[TMP5]], float [[TMP6]]		; SKX-NEXT: [[BIN_RDX2:%.*]] = select <8 x i1> [[TMP25]], <8 x float> [[BIN_RDX]], <8 x float> [[RDX_SHUF1]]
; SKX-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 3), align 4		; SKX-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SKX-NEXT: [[TMP10:%.*]] = fcmp fast ogt float [[TMP8]], [[TMP9]]		; SKX-NEXT: [[TMP26:%.*]] = fcmp fast ogt <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; SKX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], float [[TMP8]], float [[TMP9]]		; SKX-NEXT: [[BIN_RDX4:%.*]] = select <8 x i1> [[TMP26]], <8 x float> [[BIN_RDX2]], <8 x float> [[RDX_SHUF3]]
; SKX-NEXT: [[TMP12:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 4), align 16		; SKX-NEXT: [[TMP27:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
; SKX-NEXT: [[TMP13:%.*]] = fcmp fast ogt float [[TMP11]], [[TMP12]]		; SKX: ret float [[TMP27]]
; SKX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], float [[TMP11]], float [[TMP12]]
; SKX-NEXT: [[TMP15:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 5), align 4
; SKX-NEXT: [[TMP16:%.*]] = fcmp fast ogt float [[TMP14]], [[TMP15]]
; SKX-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], float [[TMP14]], float [[TMP15]]
; SKX-NEXT: [[TMP18:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 6), align 8
; SKX-NEXT: [[TMP19:%.*]] = fcmp fast ogt float [[TMP17]], [[TMP18]]
; SKX-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], float [[TMP17]], float [[TMP18]]
; SKX-NEXT: [[TMP21:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 7), align 4
; SKX-NEXT: [[TMP22:%.*]] = fcmp fast ogt float [[TMP20]], [[TMP21]]
; SKX-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], float [[TMP20]], float [[TMP21]]
; SKX-NEXT: ret float [[TMP23]]
;		;
%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16		%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16
%3 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4		%3 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4
%4 = fcmp fast ogt float %2, %3		%4 = fcmp fast ogt float %2, %3
%5 = select i1 %4, float %2, float %3		%5 = select i1 %4, float %2, float %3
%6 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8		%6 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8
%7 = fcmp fast ogt float %5, %6		%7 = fcmp fast ogt float %5, %6
%8 = select i1 %7, float %5, float %6		%8 = select i1 %7, float %5, float %6
; CHECK-NEXT: [[TMP43:%.*]] = fcmp fast ogt float [[TMP41]], [[TMP42]]		; CHECK-NEXT: [[TMP43:%.*]] = fcmp fast ogt float [[TMP41]], [[TMP42]]
; CHECK-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], float [[TMP41]], float [[TMP42]]		; CHECK-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], float [[TMP41]], float [[TMP42]]
; CHECK-NEXT: [[TMP45:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 15), align 4		; CHECK-NEXT: [[TMP45:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 15), align 4
; CHECK-NEXT: [[TMP46:%.*]] = fcmp fast ogt float [[TMP44]], [[TMP45]]		; CHECK-NEXT: [[TMP46:%.*]] = fcmp fast ogt float [[TMP44]], [[TMP45]]
; CHECK-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], float [[TMP44]], float [[TMP45]]		; CHECK-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], float [[TMP44]], float [[TMP45]]
; CHECK-NEXT: ret float [[TMP47]]		; CHECK-NEXT: ret float [[TMP47]]
;		;
; AVX-LABEL: @maxf16(		; AVX-LABEL: @maxf16(
; AVX-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16		; AVX-NEXT: [[TMP2:%.*]] = load <16 x float>, <16 x float>* bitcast ([32 x float]* @arr1 to <16 x float>*), align 16
; AVX-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4		; AVX: [[RDX_SHUF:%.*]] = shufflevector <16 x float> [[TMP2]], <16 x float> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[TMP4:%.*]] = fcmp fast ogt float [[TMP2]], [[TMP3]]		; AVX-NEXT: [[TMP48:%.*]] = fcmp fast ogt <16 x float> [[TMP2]], [[RDX_SHUF]]
; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], float [[TMP2]], float [[TMP3]]		; AVX-NEXT: [[BIN_RDX:%.*]] = select <16 x i1> [[TMP48]], <16 x float> [[TMP2]], <16 x float> [[RDX_SHUF]]
; AVX-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8		; AVX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x float> [[BIN_RDX]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[TMP7:%.*]] = fcmp fast ogt float [[TMP5]], [[TMP6]]		; AVX-NEXT: [[TMP49:%.*]] = fcmp fast ogt <16 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; AVX-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], float [[TMP5]], float [[TMP6]]		; AVX-NEXT: [[BIN_RDX2:%.*]] = select <16 x i1> [[TMP49]], <16 x float> [[BIN_RDX]], <16 x float> [[RDX_SHUF1]]
; AVX-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 3), align 4		; AVX-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x float> [[BIN_RDX2]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[TMP10:%.*]] = fcmp fast ogt float [[TMP8]], [[TMP9]]		; AVX-NEXT: [[TMP50:%.*]] = fcmp fast ogt <16 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; AVX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], float [[TMP8]], float [[TMP9]]		; AVX-NEXT: [[BIN_RDX4:%.*]] = select <16 x i1> [[TMP50]], <16 x float> [[BIN_RDX2]], <16 x float> [[RDX_SHUF3]]
; AVX-NEXT: [[TMP12:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 4), align 16		; AVX-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x float> [[BIN_RDX4]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[TMP13:%.*]] = fcmp fast ogt float [[TMP11]], [[TMP12]]		; AVX-NEXT: [[TMP51:%.*]] = fcmp fast ogt <16 x float> [[BIN_RDX4]], [[RDX_SHUF5]]
; AVX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], float [[TMP11]], float [[TMP12]]		; AVX-NEXT: [[BIN_RDX6:%.*]] = select <16 x i1> [[TMP51]], <16 x float> [[BIN_RDX4]], <16 x float> [[RDX_SHUF5]]
; AVX-NEXT: [[TMP15:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 5), align 4		; AVX-NEXT: [[TMP52:%.*]] = extractelement <16 x float> [[BIN_RDX6]], i32 0
; AVX-NEXT: [[TMP16:%.*]] = fcmp fast ogt float [[TMP14]], [[TMP15]]		; AVX: ret float [[TMP52]]
; AVX-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], float [[TMP14]], float [[TMP15]]
; AVX-NEXT: [[TMP18:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 6), align 8
; AVX-NEXT: [[TMP19:%.*]] = fcmp fast ogt float [[TMP17]], [[TMP18]]
; AVX-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], float [[TMP17]], float [[TMP18]]
; AVX-NEXT: [[TMP21:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 7), align 4
; AVX-NEXT: [[TMP22:%.*]] = fcmp fast ogt float [[TMP20]], [[TMP21]]
; AVX-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], float [[TMP20]], float [[TMP21]]
; AVX-NEXT: [[TMP24:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 8), align 16
; AVX-NEXT: [[TMP25:%.*]] = fcmp fast ogt float [[TMP23]], [[TMP24]]
; AVX-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], float [[TMP23]], float [[TMP24]]
; AVX-NEXT: [[TMP27:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 9), align 4
; AVX-NEXT: [[TMP28:%.*]] = fcmp fast ogt float [[TMP26]], [[TMP27]]
; AVX-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], float [[TMP26]], float [[TMP27]]
; AVX-NEXT: [[TMP30:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 10), align 8
; AVX-NEXT: [[TMP31:%.*]] = fcmp fast ogt float [[TMP29]], [[TMP30]]
; AVX-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], float [[TMP29]], float [[TMP30]]
; AVX-NEXT: [[TMP33:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 11), align 4
; AVX-NEXT: [[TMP34:%.*]] = fcmp fast ogt float [[TMP32]], [[TMP33]]
; AVX-NEXT: [[TMP35:%.*]] = select i1 [[TMP34]], float [[TMP32]], float [[TMP33]]
; AVX-NEXT: [[TMP36:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 12), align 16
; AVX-NEXT: [[TMP37:%.*]] = fcmp fast ogt float [[TMP35]], [[TMP36]]
; AVX-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], float [[TMP35]], float [[TMP36]]
; AVX-NEXT: [[TMP39:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 13), align 4
; AVX-NEXT: [[TMP40:%.*]] = fcmp fast ogt float [[TMP38]], [[TMP39]]
; AVX-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], float [[TMP38]], float [[TMP39]]
; AVX-NEXT: [[TMP42:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 14), align 8
; AVX-NEXT: [[TMP43:%.*]] = fcmp fast ogt float [[TMP41]], [[TMP42]]
; AVX-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], float [[TMP41]], float [[TMP42]]
; AVX-NEXT: [[TMP45:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 15), align 4
; AVX-NEXT: [[TMP46:%.*]] = fcmp fast ogt float [[TMP44]], [[TMP45]]
; AVX-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], float [[TMP44]], float [[TMP45]]
; AVX-NEXT: ret float [[TMP47]]
;		;
; AVX2-LABEL: @maxf16(		; AVX2-LABEL: @maxf16(
; AVX2-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16		; AVX2-NEXT: [[TMP2:%.*]] = load <16 x float>, <16 x float>* bitcast ([32 x float]* @arr1 to <16 x float>*), align 16
; AVX2-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4		; AVX2: [[RDX_SHUF:%.*]] = shufflevector <16 x float> [[TMP2]], <16 x float> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP4:%.*]] = fcmp fast ogt float [[TMP2]], [[TMP3]]		; AVX2-NEXT: [[TMP48:%.*]] = fcmp fast ogt <16 x float> [[TMP2]], [[RDX_SHUF]]
; AVX2-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], float [[TMP2]], float [[TMP3]]		; AVX2-NEXT: [[BIN_RDX:%.*]] = select <16 x i1> [[TMP48]], <16 x float> [[TMP2]], <16 x float> [[RDX_SHUF]]
; AVX2-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8		; AVX2-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x float> [[BIN_RDX]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP7:%.*]] = fcmp fast ogt float [[TMP5]], [[TMP6]]		; AVX2-NEXT: [[TMP49:%.*]] = fcmp fast ogt <16 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; AVX2-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], float [[TMP5]], float [[TMP6]]		; AVX2-NEXT: [[BIN_RDX2:%.*]] = select <16 x i1> [[TMP49]], <16 x float> [[BIN_RDX]], <16 x float> [[RDX_SHUF1]]
; AVX2-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 3), align 4		; AVX2-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x float> [[BIN_RDX2]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP10:%.*]] = fcmp fast ogt float [[TMP8]], [[TMP9]]		; AVX2-NEXT: [[TMP50:%.*]] = fcmp fast ogt <16 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; AVX2-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], float [[TMP8]], float [[TMP9]]		; AVX2-NEXT: [[BIN_RDX4:%.*]] = select <16 x i1> [[TMP50]], <16 x float> [[BIN_RDX2]], <16 x float> [[RDX_SHUF3]]
; AVX2-NEXT: [[TMP12:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 4), align 16		; AVX2-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x float> [[BIN_RDX4]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP13:%.*]] = fcmp fast ogt float [[TMP11]], [[TMP12]]		; AVX2-NEXT: [[TMP51:%.*]] = fcmp fast ogt <16 x float> [[BIN_RDX4]], [[RDX_SHUF5]]
; AVX2-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], float [[TMP11]], float [[TMP12]]		; AVX2-NEXT: [[BIN_RDX6:%.*]] = select <16 x i1> [[TMP51]], <16 x float> [[BIN_RDX4]], <16 x float> [[RDX_SHUF5]]
; AVX2-NEXT: [[TMP15:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 5), align 4		; AVX2-NEXT: [[TMP52:%.*]] = extractelement <16 x float> [[BIN_RDX6]], i32 0
; AVX2-NEXT: [[TMP16:%.*]] = fcmp fast ogt float [[TMP14]], [[TMP15]]		; AVX2: ret float [[TMP52]]
; AVX2-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], float [[TMP14]], float [[TMP15]]
; AVX2-NEXT: [[TMP18:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 6), align 8
; AVX2-NEXT: [[TMP19:%.*]] = fcmp fast ogt float [[TMP17]], [[TMP18]]
; AVX2-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], float [[TMP17]], float [[TMP18]]
; AVX2-NEXT: [[TMP21:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 7), align 4
; AVX2-NEXT: [[TMP22:%.*]] = fcmp fast ogt float [[TMP20]], [[TMP21]]
; AVX2-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], float [[TMP20]], float [[TMP21]]
; AVX2-NEXT: [[TMP24:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 8), align 16
; AVX2-NEXT: [[TMP25:%.*]] = fcmp fast ogt float [[TMP23]], [[TMP24]]
; AVX2-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], float [[TMP23]], float [[TMP24]]
; AVX2-NEXT: [[TMP27:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 9), align 4
; AVX2-NEXT: [[TMP28:%.*]] = fcmp fast ogt float [[TMP26]], [[TMP27]]
; AVX2-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], float [[TMP26]], float [[TMP27]]
; AVX2-NEXT: [[TMP30:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 10), align 8
; AVX2-NEXT: [[TMP31:%.*]] = fcmp fast ogt float [[TMP29]], [[TMP30]]
; AVX2-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], float [[TMP29]], float [[TMP30]]
; AVX2-NEXT: [[TMP33:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 11), align 4
; AVX2-NEXT: [[TMP34:%.*]] = fcmp fast ogt float [[TMP32]], [[TMP33]]
; AVX2-NEXT: [[TMP35:%.*]] = select i1 [[TMP34]], float [[TMP32]], float [[TMP33]]
; AVX2-NEXT: [[TMP36:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 12), align 16
; AVX2-NEXT: [[TMP37:%.*]] = fcmp fast ogt float [[TMP35]], [[TMP36]]
; AVX2-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], float [[TMP35]], float [[TMP36]]
; AVX2-NEXT: [[TMP39:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 13), align 4
; AVX2-NEXT: [[TMP40:%.*]] = fcmp fast ogt float [[TMP38]], [[TMP39]]
; AVX2-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], float [[TMP38]], float [[TMP39]]
; AVX2-NEXT: [[TMP42:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 14), align 8
; AVX2-NEXT: [[TMP43:%.*]] = fcmp fast ogt float [[TMP41]], [[TMP42]]
; AVX2-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], float [[TMP41]], float [[TMP42]]
; AVX2-NEXT: [[TMP45:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 15), align 4
; AVX2-NEXT: [[TMP46:%.*]] = fcmp fast ogt float [[TMP44]], [[TMP45]]
; AVX2-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], float [[TMP44]], float [[TMP45]]
; AVX2-NEXT: ret float [[TMP47]]
;		;
; SKX-LABEL: @maxf16(		; SKX-LABEL: @maxf16(
; SKX-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16		; SKX-NEXT: [[TMP2:%.*]] = load <16 x float>, <16 x float>* bitcast ([32 x float]* @arr1 to <16 x float>*), align 16
; SKX-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4		; SKX: [[RDX_SHUF:%.*]] = shufflevector <16 x float> [[TMP2]], <16 x float> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SKX-NEXT: [[TMP4:%.*]] = fcmp fast ogt float [[TMP2]], [[TMP3]]		; SKX-NEXT: [[TMP48:%.*]] = fcmp fast ogt <16 x float> [[TMP2]], [[RDX_SHUF]]
; SKX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], float [[TMP2]], float [[TMP3]]		; SKX-NEXT: [[BIN_RDX:%.*]] = select <16 x i1> [[TMP48]], <16 x float> [[TMP2]], <16 x float> [[RDX_SHUF]]
; SKX-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8		; SKX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x float> [[BIN_RDX]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SKX-NEXT: [[TMP7:%.*]] = fcmp fast ogt float [[TMP5]], [[TMP6]]		; SKX-NEXT: [[TMP49:%.*]] = fcmp fast ogt <16 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; SKX-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], float [[TMP5]], float [[TMP6]]		; SKX-NEXT: [[BIN_RDX2:%.*]] = select <16 x i1> [[TMP49]], <16 x float> [[BIN_RDX]], <16 x float> [[RDX_SHUF1]]
; SKX-NEXT: [[TMP9:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 3), align 4		; SKX-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x float> [[BIN_RDX2]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SKX-NEXT: [[TMP10:%.*]] = fcmp fast ogt float [[TMP8]], [[TMP9]]		; SKX-NEXT: [[TMP50:%.*]] = fcmp fast ogt <16 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; SKX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], float [[TMP8]], float [[TMP9]]		; SKX-NEXT: [[BIN_RDX4:%.*]] = select <16 x i1> [[TMP50]], <16 x float> [[BIN_RDX2]], <16 x float> [[RDX_SHUF3]]
; SKX-NEXT: [[TMP12:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 4), align 16		; SKX-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x float> [[BIN_RDX4]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SKX-NEXT: [[TMP13:%.*]] = fcmp fast ogt float [[TMP11]], [[TMP12]]		; SKX-NEXT: [[TMP51:%.*]] = fcmp fast ogt <16 x float> [[BIN_RDX4]], [[RDX_SHUF5]]
; SKX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], float [[TMP11]], float [[TMP12]]		; SKX-NEXT: [[BIN_RDX6:%.*]] = select <16 x i1> [[TMP51]], <16 x float> [[BIN_RDX4]], <16 x float> [[RDX_SHUF5]]
; SKX-NEXT: [[TMP15:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 5), align 4		; SKX-NEXT: [[TMP52:%.*]] = extractelement <16 x float> [[BIN_RDX6]], i32 0
; SKX-NEXT: [[TMP16:%.*]] = fcmp fast ogt float [[TMP14]], [[TMP15]]		; SKX: ret float [[TMP52]]
; SKX-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], float [[TMP14]], float [[TMP15]]
; SKX-NEXT: [[TMP18:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 6), align 8
; SKX-NEXT: [[TMP19:%.*]] = fcmp fast ogt float [[TMP17]], [[TMP18]]
; SKX-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], float [[TMP17]], float [[TMP18]]
; SKX-NEXT: [[TMP21:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 7), align 4
; SKX-NEXT: [[TMP22:%.*]] = fcmp fast ogt float [[TMP20]], [[TMP21]]
; SKX-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], float [[TMP20]], float [[TMP21]]
; SKX-NEXT: [[TMP24:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 8), align 16
; SKX-NEXT: [[TMP25:%.*]] = fcmp fast ogt float [[TMP23]], [[TMP24]]
; SKX-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], float [[TMP23]], float [[TMP24]]
; SKX-NEXT: [[TMP27:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 9), align 4
; SKX-NEXT: [[TMP28:%.*]] = fcmp fast ogt float [[TMP26]], [[TMP27]]
; SKX-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], float [[TMP26]], float [[TMP27]]
; SKX-NEXT: [[TMP30:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 10), align 8
; SKX-NEXT: [[TMP31:%.*]] = fcmp fast ogt float [[TMP29]], [[TMP30]]
; SKX-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], float [[TMP29]], float [[TMP30]]
; SKX-NEXT: [[TMP33:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 11), align 4
; SKX-NEXT: [[TMP34:%.*]] = fcmp fast ogt float [[TMP32]], [[TMP33]]
; SKX-NEXT: [[TMP35:%.*]] = select i1 [[TMP34]], float [[TMP32]], float [[TMP33]]
; SKX-NEXT: [[TMP36:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 12), align 16
; SKX-NEXT: [[TMP37:%.*]] = fcmp fast ogt float [[TMP35]], [[TMP36]]
; SKX-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], float [[TMP35]], float [[TMP36]]
; SKX-NEXT: [[TMP39:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 13), align 4
; SKX-NEXT: [[TMP40:%.*]] = fcmp fast ogt float [[TMP38]], [[TMP39]]
; SKX-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], float [[TMP38]], float [[TMP39]]
; SKX-NEXT: [[TMP42:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 14), align 8
; SKX-NEXT: [[TMP43:%.*]] = fcmp fast ogt float [[TMP41]], [[TMP42]]
; SKX-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], float [[TMP41]], float [[TMP42]]
; SKX-NEXT: [[TMP45:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 15), align 4
; SKX-NEXT: [[TMP46:%.*]] = fcmp fast ogt float [[TMP44]], [[TMP45]]
; SKX-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], float [[TMP44]], float [[TMP45]]
; SKX-NEXT: ret float [[TMP47]]
;		;
%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16		%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16
%3 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4		%3 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4
%4 = fcmp fast ogt float %2, %3		%4 = fcmp fast ogt float %2, %3
%5 = select i1 %4, float %2, float %3		%5 = select i1 %4, float %2, float %3
%6 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8		%6 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8
%7 = fcmp fast ogt float %5, %6		%7 = fcmp fast ogt float %5, %6
%8 = select i1 %7, float %5, float %6		%8 = select i1 %7, float %5, float %6
; CHECK-NEXT: [[TMP91:%.*]] = fcmp fast ogt float [[TMP89]], [[TMP90]]		; CHECK-NEXT: [[TMP91:%.*]] = fcmp fast ogt float [[TMP89]], [[TMP90]]
; CHECK-NEXT: [[TMP92:%.*]] = select i1 [[TMP91]], float [[TMP89]], float [[TMP90]]		; CHECK-NEXT: [[TMP92:%.*]] = select i1 [[TMP91]], float [[TMP89]], float [[TMP90]]
; CHECK-NEXT: [[TMP93:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 31), align 4		; CHECK-NEXT: [[TMP93:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 31), align 4
; CHECK-NEXT: [[TMP94:%.*]] = fcmp fast ogt float [[TMP92]], [[TMP93]]		; CHECK-NEXT: [[TMP94:%.*]] = fcmp fast ogt float [[TMP92]], [[TMP93]]
; CHECK-NEXT: [[TMP95:%.*]] = select i1 [[TMP94]], float [[TMP92]], float [[TMP93]]		; CHECK-NEXT: [[TMP95:%.*]] = select i1 [[TMP94]], float [[TMP92]], float [[TMP93]]
; CHECK-NEXT: ret float [[TMP95]]		; CHECK-NEXT: ret float [[TMP95]]
;		;
; AVX-LABEL: @maxf32(		; AVX-LABEL: @maxf32(
; AVX-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16		; AVX-NEXT: [[TMP2:%.*]] = load <32 x float>, <32 x float>* bitcast ([32 x float]* @arr1 to <32 x float>*), align 16
; AVX-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4		; AVX: [[RDX_SHUF:%.*]] = shufflevector <32 x float> [[TMP2]], <32 x float> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[TMP4:%.*]] = fcmp fast ogt float [[TMP2]], [[TMP3]]		; AVX-NEXT: [[TMP96:%.*]] = fcmp fast ogt <32 x float> [[TMP2]], [[RDX_SHUF]]
; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], float [[TMP2]], float [[TMP3]]		; AVX-NEXT: [[BIN_RDX:%.*]] = select <32 x i1> [[TMP96]], <32 x float> [[TMP2]], <32 x float> [[RDX_SHUF]]
; AVX-NEXT: [[TMP6:%.]] = load float, float getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8		; AVX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x float> [[BIN_RDX]], <32 x float> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[TMP7:%.*]] = fcmp fast ogt float [[TMP5]], [[TMP6]]		; AVX-NEXT: [[TMP97:%.*]] = fcmp fast ogt <32 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; AVX-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], float [[TMP5]], float [[TMP6]]		; AVX-NEXT: [[BIN_RDX2:%.*]] = select <32 x i1> [[TMP97]], <32 x float> [[BIN_RDX]], <32 x float> [[RDX_SHUF1]]
; AVX-NEXT: [[TMP9:%.]] = load float, float getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 3), align 4		; AVX-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x float> [[BIN_RDX2]], <32 x float> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[TMP10:%.*]] = fcmp fast ogt float [[TMP8]], [[TMP9]]		; AVX-NEXT: [[TMP98:%.*]] = fcmp fast ogt <32 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; AVX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], float [[TMP8]], float [[TMP9]]		; AVX-NEXT: [[BIN_RDX4:%.*]] = select <32 x i1> [[TMP98]], <32 x float> [[BIN_RDX2]], <32 x float> [[RDX_SHUF3]]
; AVX-NEXT: [[TMP12:%.]] = load float, float getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 4), align 16		; AVX-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x float> [[BIN_RDX4]], <32 x float> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[TMP13:%.*]] = fcmp fast ogt float [[TMP11]], [[TMP12]]		; AVX-NEXT: [[TMP99:%.*]] = fcmp fast ogt <32 x float> [[BIN_RDX4]], [[RDX_SHUF5]]
; AVX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], float [[TMP11]], float [[TMP12]]		; AVX-NEXT: [[BIN_RDX6:%.*]] = select <32 x i1> [[TMP99]], <32 x float> [[BIN_RDX4]], <32 x float> [[RDX_SHUF5]]
; AVX-NEXT: [[TMP15:%.]] = load float, float getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 5), align 4		; AVX-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[TMP16:%.*]] = fcmp fast ogt float [[TMP14]], [[TMP15]]		; AVX-NEXT: [[TMP100:%.*]] = fcmp fast ogt <32 x float> [[BIN_RDX6]], [[RDX_SHUF7]]
; AVX-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], float [[TMP14]], float [[TMP15]]		; AVX-NEXT: [[BIN_RDX8:%.*]] = select <32 x i1> [[TMP100]], <32 x float> [[BIN_RDX6]], <32 x float> [[RDX_SHUF7]]
; AVX-NEXT: [[TMP18:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 6), align 8		; AVX-NEXT: [[TMP101:%.*]] = extractelement <32 x float> [[BIN_RDX8]], i32 0
; AVX-NEXT: [[TMP19:%.*]] = fcmp fast ogt float [[TMP17]], [[TMP18]]		; AVX: ret float [[TMP101]]
; AVX-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], float [[TMP17]], float [[TMP18]]
; AVX-NEXT: [[TMP21:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 7), align 4
; AVX-NEXT: [[TMP22:%.*]] = fcmp fast ogt float [[TMP20]], [[TMP21]]
; AVX-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], float [[TMP20]], float [[TMP21]]
; AVX-NEXT: [[TMP24:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 8), align 16
; AVX-NEXT: [[TMP25:%.*]] = fcmp fast ogt float [[TMP23]], [[TMP24]]
; AVX-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], float [[TMP23]], float [[TMP24]]
; AVX-NEXT: [[TMP27:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 9), align 4
; AVX-NEXT: [[TMP28:%.*]] = fcmp fast ogt float [[TMP26]], [[TMP27]]
; AVX-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], float [[TMP26]], float [[TMP27]]
; AVX-NEXT: [[TMP30:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 10), align 8
; AVX-NEXT: [[TMP31:%.*]] = fcmp fast ogt float [[TMP29]], [[TMP30]]
; AVX-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], float [[TMP29]], float [[TMP30]]
; AVX-NEXT: [[TMP33:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 11), align 4
; AVX-NEXT: [[TMP34:%.*]] = fcmp fast ogt float [[TMP32]], [[TMP33]]
; AVX-NEXT: [[TMP35:%.*]] = select i1 [[TMP34]], float [[TMP32]], float [[TMP33]]
; AVX-NEXT: [[TMP36:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 12), align 16
; AVX-NEXT: [[TMP37:%.*]] = fcmp fast ogt float [[TMP35]], [[TMP36]]
; AVX-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], float [[TMP35]], float [[TMP36]]
; AVX-NEXT: [[TMP39:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 13), align 4
; AVX-NEXT: [[TMP40:%.*]] = fcmp fast ogt float [[TMP38]], [[TMP39]]
; AVX-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], float [[TMP38]], float [[TMP39]]
; AVX-NEXT: [[TMP42:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 14), align 8
; AVX-NEXT: [[TMP43:%.*]] = fcmp fast ogt float [[TMP41]], [[TMP42]]
; AVX-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], float [[TMP41]], float [[TMP42]]
; AVX-NEXT: [[TMP45:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 15), align 4
; AVX-NEXT: [[TMP46:%.*]] = fcmp fast ogt float [[TMP44]], [[TMP45]]
; AVX-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], float [[TMP44]], float [[TMP45]]
; AVX-NEXT: [[TMP48:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 16), align 16
; AVX-NEXT: [[TMP49:%.*]] = fcmp fast ogt float [[TMP47]], [[TMP48]]
; AVX-NEXT: [[TMP50:%.*]] = select i1 [[TMP49]], float [[TMP47]], float [[TMP48]]
; AVX-NEXT: [[TMP51:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 17), align 4
; AVX-NEXT: [[TMP52:%.*]] = fcmp fast ogt float [[TMP50]], [[TMP51]]
; AVX-NEXT: [[TMP53:%.*]] = select i1 [[TMP52]], float [[TMP50]], float [[TMP51]]
; AVX-NEXT: [[TMP54:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 18), align 8
; AVX-NEXT: [[TMP55:%.*]] = fcmp fast ogt float [[TMP53]], [[TMP54]]
; AVX-NEXT: [[TMP56:%.*]] = select i1 [[TMP55]], float [[TMP53]], float [[TMP54]]
; AVX-NEXT: [[TMP57:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 19), align 4
; AVX-NEXT: [[TMP58:%.*]] = fcmp fast ogt float [[TMP56]], [[TMP57]]
; AVX-NEXT: [[TMP59:%.*]] = select i1 [[TMP58]], float [[TMP56]], float [[TMP57]]
; AVX-NEXT: [[TMP60:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 20), align 16
; AVX-NEXT: [[TMP61:%.*]] = fcmp fast ogt float [[TMP59]], [[TMP60]]
; AVX-NEXT: [[TMP62:%.*]] = select i1 [[TMP61]], float [[TMP59]], float [[TMP60]]
; AVX-NEXT: [[TMP63:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 21), align 4
; AVX-NEXT: [[TMP64:%.*]] = fcmp fast ogt float [[TMP62]], [[TMP63]]
; AVX-NEXT: [[TMP65:%.*]] = select i1 [[TMP64]], float [[TMP62]], float [[TMP63]]
; AVX-NEXT: [[TMP66:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 22), align 8
; AVX-NEXT: [[TMP67:%.*]] = fcmp fast ogt float [[TMP65]], [[TMP66]]
; AVX-NEXT: [[TMP68:%.*]] = select i1 [[TMP67]], float [[TMP65]], float [[TMP66]]
; AVX-NEXT: [[TMP69:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 23), align 4
; AVX-NEXT: [[TMP70:%.*]] = fcmp fast ogt float [[TMP68]], [[TMP69]]
; AVX-NEXT: [[TMP71:%.*]] = select i1 [[TMP70]], float [[TMP68]], float [[TMP69]]
; AVX-NEXT: [[TMP72:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 24), align 16
; AVX-NEXT: [[TMP73:%.*]] = fcmp fast ogt float [[TMP71]], [[TMP72]]
; AVX-NEXT: [[TMP74:%.*]] = select i1 [[TMP73]], float [[TMP71]], float [[TMP72]]
; AVX-NEXT: [[TMP75:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 25), align 4
; AVX-NEXT: [[TMP76:%.*]] = fcmp fast ogt float [[TMP74]], [[TMP75]]
; AVX-NEXT: [[TMP77:%.*]] = select i1 [[TMP76]], float [[TMP74]], float [[TMP75]]
; AVX-NEXT: [[TMP78:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 26), align 8
; AVX-NEXT: [[TMP79:%.*]] = fcmp fast ogt float [[TMP77]], [[TMP78]]
; AVX-NEXT: [[TMP80:%.*]] = select i1 [[TMP79]], float [[TMP77]], float [[TMP78]]
; AVX-NEXT: [[TMP81:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 27), align 4
; AVX-NEXT: [[TMP82:%.*]] = fcmp fast ogt float [[TMP80]], [[TMP81]]
; AVX-NEXT: [[TMP83:%.*]] = select i1 [[TMP82]], float [[TMP80]], float [[TMP81]]
; AVX-NEXT: [[TMP84:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 28), align 16
; AVX-NEXT: [[TMP85:%.*]] = fcmp fast ogt float [[TMP83]], [[TMP84]]
; AVX-NEXT: [[TMP86:%.*]] = select i1 [[TMP85]], float [[TMP83]], float [[TMP84]]
; AVX-NEXT: [[TMP87:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 29), align 4
; AVX-NEXT: [[TMP88:%.*]] = fcmp fast ogt float [[TMP86]], [[TMP87]]
; AVX-NEXT: [[TMP89:%.*]] = select i1 [[TMP88]], float [[TMP86]], float [[TMP87]]
; AVX-NEXT: [[TMP90:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 30), align 8
; AVX-NEXT: [[TMP91:%.*]] = fcmp fast ogt float [[TMP89]], [[TMP90]]
; AVX-NEXT: [[TMP92:%.*]] = select i1 [[TMP91]], float [[TMP89]], float [[TMP90]]
; AVX-NEXT: [[TMP93:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 31), align 4
; AVX-NEXT: [[TMP94:%.*]] = fcmp fast ogt float [[TMP92]], [[TMP93]]
; AVX-NEXT: [[TMP95:%.*]] = select i1 [[TMP94]], float [[TMP92]], float [[TMP93]]
; AVX-NEXT: ret float [[TMP95]]
;		;
; AVX2-LABEL: @maxf32(		; AVX2-LABEL: @maxf32(
; AVX2-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16		; AVX2-NEXT: [[TMP2:%.*]] = load <32 x float>, <32 x float>* bitcast ([32 x float]* @arr1 to <32 x float>*), align 16
; AVX2-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4		; AVX2: [[RDX_SHUF:%.*]] = shufflevector <32 x float> [[TMP2]], <32 x float> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP4:%.*]] = fcmp fast ogt float [[TMP2]], [[TMP3]]		; AVX2-NEXT: [[TMP96:%.*]] = fcmp fast ogt <32 x float> [[TMP2]], [[RDX_SHUF]]
; AVX2-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], float [[TMP2]], float [[TMP3]]		; AVX2-NEXT: [[BIN_RDX:%.*]] = select <32 x i1> [[TMP96]], <32 x float> [[TMP2]], <32 x float> [[RDX_SHUF]]
; AVX2-NEXT: [[TMP6:%.]] = load float, float getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8		; AVX2-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x float> [[BIN_RDX]], <32 x float> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP7:%.*]] = fcmp fast ogt float [[TMP5]], [[TMP6]]		; AVX2-NEXT: [[TMP97:%.*]] = fcmp fast ogt <32 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; AVX2-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], float [[TMP5]], float [[TMP6]]		; AVX2-NEXT: [[BIN_RDX2:%.*]] = select <32 x i1> [[TMP97]], <32 x float> [[BIN_RDX]], <32 x float> [[RDX_SHUF1]]
; AVX2-NEXT: [[TMP9:%.]] = load float, float getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 3), align 4		; AVX2-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x float> [[BIN_RDX2]], <32 x float> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP10:%.*]] = fcmp fast ogt float [[TMP8]], [[TMP9]]		; AVX2-NEXT: [[TMP98:%.*]] = fcmp fast ogt <32 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; AVX2-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], float [[TMP8]], float [[TMP9]]		; AVX2-NEXT: [[BIN_RDX4:%.*]] = select <32 x i1> [[TMP98]], <32 x float> [[BIN_RDX2]], <32 x float> [[RDX_SHUF3]]
; AVX2-NEXT: [[TMP12:%.]] = load float, float getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 4), align 16		; AVX2-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x float> [[BIN_RDX4]], <32 x float> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP13:%.*]] = fcmp fast ogt float [[TMP11]], [[TMP12]]		; AVX2-NEXT: [[TMP99:%.*]] = fcmp fast ogt <32 x float> [[BIN_RDX4]], [[RDX_SHUF5]]
; AVX2-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], float [[TMP11]], float [[TMP12]]		; AVX2-NEXT: [[BIN_RDX6:%.*]] = select <32 x i1> [[TMP99]], <32 x float> [[BIN_RDX4]], <32 x float> [[RDX_SHUF5]]
; AVX2-NEXT: [[TMP15:%.]] = load float, float getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 5), align 4		; AVX2-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[TMP16:%.*]] = fcmp fast ogt float [[TMP14]], [[TMP15]]		; AVX2-NEXT: [[TMP100:%.*]] = fcmp fast ogt <32 x float> [[BIN_RDX6]], [[RDX_SHUF7]]
; AVX2-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], float [[TMP14]], float [[TMP15]]		; AVX2-NEXT: [[BIN_RDX8:%.*]] = select <32 x i1> [[TMP100]], <32 x float> [[BIN_RDX6]], <32 x float> [[RDX_SHUF7]]
; AVX2-NEXT: [[TMP18:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 6), align 8		; AVX2-NEXT: [[TMP101:%.*]] = extractelement <32 x float> [[BIN_RDX8]], i32 0
; AVX2-NEXT: [[TMP19:%.*]] = fcmp fast ogt float [[TMP17]], [[TMP18]]		; AVX2: ret float [[TMP101]]
; AVX2-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], float [[TMP17]], float [[TMP18]]
; AVX2-NEXT: [[TMP21:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 7), align 4
; AVX2-NEXT: [[TMP22:%.*]] = fcmp fast ogt float [[TMP20]], [[TMP21]]
; AVX2-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], float [[TMP20]], float [[TMP21]]
; AVX2-NEXT: [[TMP24:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 8), align 16
; AVX2-NEXT: [[TMP25:%.*]] = fcmp fast ogt float [[TMP23]], [[TMP24]]
; AVX2-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], float [[TMP23]], float [[TMP24]]
; AVX2-NEXT: [[TMP27:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 9), align 4
; AVX2-NEXT: [[TMP28:%.*]] = fcmp fast ogt float [[TMP26]], [[TMP27]]
; AVX2-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], float [[TMP26]], float [[TMP27]]
; AVX2-NEXT: [[TMP30:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 10), align 8
; AVX2-NEXT: [[TMP31:%.*]] = fcmp fast ogt float [[TMP29]], [[TMP30]]
; AVX2-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], float [[TMP29]], float [[TMP30]]
; AVX2-NEXT: [[TMP33:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 11), align 4
; AVX2-NEXT: [[TMP34:%.*]] = fcmp fast ogt float [[TMP32]], [[TMP33]]
; AVX2-NEXT: [[TMP35:%.*]] = select i1 [[TMP34]], float [[TMP32]], float [[TMP33]]
; AVX2-NEXT: [[TMP36:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 12), align 16
; AVX2-NEXT: [[TMP37:%.*]] = fcmp fast ogt float [[TMP35]], [[TMP36]]
; AVX2-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], float [[TMP35]], float [[TMP36]]
; AVX2-NEXT: [[TMP39:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 13), align 4
; AVX2-NEXT: [[TMP40:%.*]] = fcmp fast ogt float [[TMP38]], [[TMP39]]
; AVX2-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], float [[TMP38]], float [[TMP39]]
; AVX2-NEXT: [[TMP42:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 14), align 8
; AVX2-NEXT: [[TMP43:%.*]] = fcmp fast ogt float [[TMP41]], [[TMP42]]
; AVX2-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], float [[TMP41]], float [[TMP42]]
; AVX2-NEXT: [[TMP45:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 15), align 4
; AVX2-NEXT: [[TMP46:%.*]] = fcmp fast ogt float [[TMP44]], [[TMP45]]
; AVX2-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], float [[TMP44]], float [[TMP45]]
; AVX2-NEXT: [[TMP48:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 16), align 16
; AVX2-NEXT: [[TMP49:%.*]] = fcmp fast ogt float [[TMP47]], [[TMP48]]
; AVX2-NEXT: [[TMP50:%.*]] = select i1 [[TMP49]], float [[TMP47]], float [[TMP48]]
; AVX2-NEXT: [[TMP51:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 17), align 4
; AVX2-NEXT: [[TMP52:%.*]] = fcmp fast ogt float [[TMP50]], [[TMP51]]
; AVX2-NEXT: [[TMP53:%.*]] = select i1 [[TMP52]], float [[TMP50]], float [[TMP51]]
; AVX2-NEXT: [[TMP54:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 18), align 8
; AVX2-NEXT: [[TMP55:%.*]] = fcmp fast ogt float [[TMP53]], [[TMP54]]
; AVX2-NEXT: [[TMP56:%.*]] = select i1 [[TMP55]], float [[TMP53]], float [[TMP54]]
; AVX2-NEXT: [[TMP57:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 19), align 4
; AVX2-NEXT: [[TMP58:%.*]] = fcmp fast ogt float [[TMP56]], [[TMP57]]
; AVX2-NEXT: [[TMP59:%.*]] = select i1 [[TMP58]], float [[TMP56]], float [[TMP57]]
; AVX2-NEXT: [[TMP60:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 20), align 16
; AVX2-NEXT: [[TMP61:%.*]] = fcmp fast ogt float [[TMP59]], [[TMP60]]
; AVX2-NEXT: [[TMP62:%.*]] = select i1 [[TMP61]], float [[TMP59]], float [[TMP60]]
; AVX2-NEXT: [[TMP63:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 21), align 4
; AVX2-NEXT: [[TMP64:%.*]] = fcmp fast ogt float [[TMP62]], [[TMP63]]
; AVX2-NEXT: [[TMP65:%.*]] = select i1 [[TMP64]], float [[TMP62]], float [[TMP63]]
; AVX2-NEXT: [[TMP66:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 22), align 8
; AVX2-NEXT: [[TMP67:%.*]] = fcmp fast ogt float [[TMP65]], [[TMP66]]
; AVX2-NEXT: [[TMP68:%.*]] = select i1 [[TMP67]], float [[TMP65]], float [[TMP66]]
; AVX2-NEXT: [[TMP69:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 23), align 4
; AVX2-NEXT: [[TMP70:%.*]] = fcmp fast ogt float [[TMP68]], [[TMP69]]
; AVX2-NEXT: [[TMP71:%.*]] = select i1 [[TMP70]], float [[TMP68]], float [[TMP69]]
; AVX2-NEXT: [[TMP72:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 24), align 16
; AVX2-NEXT: [[TMP73:%.*]] = fcmp fast ogt float [[TMP71]], [[TMP72]]
; AVX2-NEXT: [[TMP74:%.*]] = select i1 [[TMP73]], float [[TMP71]], float [[TMP72]]
; AVX2-NEXT: [[TMP75:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 25), align 4
; AVX2-NEXT: [[TMP76:%.*]] = fcmp fast ogt float [[TMP74]], [[TMP75]]
; AVX2-NEXT: [[TMP77:%.*]] = select i1 [[TMP76]], float [[TMP74]], float [[TMP75]]
; AVX2-NEXT: [[TMP78:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 26), align 8
; AVX2-NEXT: [[TMP79:%.*]] = fcmp fast ogt float [[TMP77]], [[TMP78]]
; AVX2-NEXT: [[TMP80:%.*]] = select i1 [[TMP79]], float [[TMP77]], float [[TMP78]]
; AVX2-NEXT: [[TMP81:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 27), align 4
; AVX2-NEXT: [[TMP82:%.*]] = fcmp fast ogt float [[TMP80]], [[TMP81]]
; AVX2-NEXT: [[TMP83:%.*]] = select i1 [[TMP82]], float [[TMP80]], float [[TMP81]]
; AVX2-NEXT: [[TMP84:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 28), align 16
; AVX2-NEXT: [[TMP85:%.*]] = fcmp fast ogt float [[TMP83]], [[TMP84]]
; AVX2-NEXT: [[TMP86:%.*]] = select i1 [[TMP85]], float [[TMP83]], float [[TMP84]]
; AVX2-NEXT: [[TMP87:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 29), align 4
; AVX2-NEXT: [[TMP88:%.*]] = fcmp fast ogt float [[TMP86]], [[TMP87]]
; AVX2-NEXT: [[TMP89:%.*]] = select i1 [[TMP88]], float [[TMP86]], float [[TMP87]]
; AVX2-NEXT: [[TMP90:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 30), align 8
; AVX2-NEXT: [[TMP91:%.*]] = fcmp fast ogt float [[TMP89]], [[TMP90]]
; AVX2-NEXT: [[TMP92:%.*]] = select i1 [[TMP91]], float [[TMP89]], float [[TMP90]]
; AVX2-NEXT: [[TMP93:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 31), align 4
; AVX2-NEXT: [[TMP94:%.*]] = fcmp fast ogt float [[TMP92]], [[TMP93]]
; AVX2-NEXT: [[TMP95:%.*]] = select i1 [[TMP94]], float [[TMP92]], float [[TMP93]]
; AVX2-NEXT: ret float [[TMP95]]
;		;
; SKX-LABEL: @maxf32(		; SKX-LABEL: @maxf32(
; SKX-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16		; SKX-NEXT: [[TMP2:%.*]] = load <32 x float>, <32 x float>* bitcast ([32 x float]* @arr1 to <32 x float>*), align 16
; SKX-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4		; SKX: [[RDX_SHUF:%.*]] = shufflevector <32 x float> [[TMP2]], <32 x float> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SKX-NEXT: [[TMP4:%.*]] = fcmp fast ogt float [[TMP2]], [[TMP3]]		; SKX-NEXT: [[TMP96:%.*]] = fcmp fast ogt <32 x float> [[TMP2]], [[RDX_SHUF]]
; SKX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], float [[TMP2]], float [[TMP3]]		; SKX-NEXT: [[BIN_RDX:%.*]] = select <32 x i1> [[TMP96]], <32 x float> [[TMP2]], <32 x float> [[RDX_SHUF]]
; SKX-NEXT: [[TMP6:%.]] = load float, float getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8		; SKX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x float> [[BIN_RDX]], <32 x float> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SKX-NEXT: [[TMP7:%.*]] = fcmp fast ogt float [[TMP5]], [[TMP6]]		; SKX-NEXT: [[TMP97:%.*]] = fcmp fast ogt <32 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; SKX-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], float [[TMP5]], float [[TMP6]]		; SKX-NEXT: [[BIN_RDX2:%.*]] = select <32 x i1> [[TMP97]], <32 x float> [[BIN_RDX]], <32 x float> [[RDX_SHUF1]]
; SKX-NEXT: [[TMP9:%.]] = load float, float getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 3), align 4		; SKX-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x float> [[BIN_RDX2]], <32 x float> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SKX-NEXT: [[TMP10:%.*]] = fcmp fast ogt float [[TMP8]], [[TMP9]]		; SKX-NEXT: [[TMP98:%.*]] = fcmp fast ogt <32 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; SKX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], float [[TMP8]], float [[TMP9]]		; SKX-NEXT: [[BIN_RDX4:%.*]] = select <32 x i1> [[TMP98]], <32 x float> [[BIN_RDX2]], <32 x float> [[RDX_SHUF3]]
; SKX-NEXT: [[TMP12:%.]] = load float, float getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 4), align 16		; SKX-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x float> [[BIN_RDX4]], <32 x float> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SKX-NEXT: [[TMP13:%.*]] = fcmp fast ogt float [[TMP11]], [[TMP12]]		; SKX-NEXT: [[TMP99:%.*]] = fcmp fast ogt <32 x float> [[BIN_RDX4]], [[RDX_SHUF5]]
; SKX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], float [[TMP11]], float [[TMP12]]		; SKX-NEXT: [[BIN_RDX6:%.*]] = select <32 x i1> [[TMP99]], <32 x float> [[BIN_RDX4]], <32 x float> [[RDX_SHUF5]]
; SKX-NEXT: [[TMP15:%.]] = load float, float getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 5), align 4		; SKX-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; SKX-NEXT: [[TMP16:%.*]] = fcmp fast ogt float [[TMP14]], [[TMP15]]		; SKX-NEXT: [[TMP100:%.*]] = fcmp fast ogt <32 x float> [[BIN_RDX6]], [[RDX_SHUF7]]
; SKX-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], float [[TMP14]], float [[TMP15]]		; SKX-NEXT: [[BIN_RDX8:%.*]] = select <32 x i1> [[TMP100]], <32 x float> [[BIN_RDX6]], <32 x float> [[RDX_SHUF7]]
; SKX-NEXT: [[TMP18:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 6), align 8		; SKX-NEXT: [[TMP101:%.*]] = extractelement <32 x float> [[BIN_RDX8]], i32 0
; SKX-NEXT: [[TMP19:%.*]] = fcmp fast ogt float [[TMP17]], [[TMP18]]		; SKX: ret float [[TMP101]]
; SKX-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], float [[TMP17]], float [[TMP18]]
; SKX-NEXT: [[TMP21:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 7), align 4
; SKX-NEXT: [[TMP22:%.*]] = fcmp fast ogt float [[TMP20]], [[TMP21]]
; SKX-NEXT: [[TMP23:%.*]] = select i1 [[TMP22]], float [[TMP20]], float [[TMP21]]
; SKX-NEXT: [[TMP24:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 8), align 16
; SKX-NEXT: [[TMP25:%.*]] = fcmp fast ogt float [[TMP23]], [[TMP24]]
; SKX-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], float [[TMP23]], float [[TMP24]]
; SKX-NEXT: [[TMP27:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 9), align 4
; SKX-NEXT: [[TMP28:%.*]] = fcmp fast ogt float [[TMP26]], [[TMP27]]
; SKX-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], float [[TMP26]], float [[TMP27]]
; SKX-NEXT: [[TMP30:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 10), align 8
; SKX-NEXT: [[TMP31:%.*]] = fcmp fast ogt float [[TMP29]], [[TMP30]]
; SKX-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], float [[TMP29]], float [[TMP30]]
; SKX-NEXT: [[TMP33:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 11), align 4
; SKX-NEXT: [[TMP34:%.*]] = fcmp fast ogt float [[TMP32]], [[TMP33]]
; SKX-NEXT: [[TMP35:%.*]] = select i1 [[TMP34]], float [[TMP32]], float [[TMP33]]
; SKX-NEXT: [[TMP36:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 12), align 16
; SKX-NEXT: [[TMP37:%.*]] = fcmp fast ogt float [[TMP35]], [[TMP36]]
; SKX-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], float [[TMP35]], float [[TMP36]]
; SKX-NEXT: [[TMP39:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 13), align 4
; SKX-NEXT: [[TMP40:%.*]] = fcmp fast ogt float [[TMP38]], [[TMP39]]
; SKX-NEXT: [[TMP41:%.*]] = select i1 [[TMP40]], float [[TMP38]], float [[TMP39]]
; SKX-NEXT: [[TMP42:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 14), align 8
; SKX-NEXT: [[TMP43:%.*]] = fcmp fast ogt float [[TMP41]], [[TMP42]]
; SKX-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], float [[TMP41]], float [[TMP42]]
; SKX-NEXT: [[TMP45:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 15), align 4
; SKX-NEXT: [[TMP46:%.*]] = fcmp fast ogt float [[TMP44]], [[TMP45]]
; SKX-NEXT: [[TMP47:%.*]] = select i1 [[TMP46]], float [[TMP44]], float [[TMP45]]
; SKX-NEXT: [[TMP48:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 16), align 16
; SKX-NEXT: [[TMP49:%.*]] = fcmp fast ogt float [[TMP47]], [[TMP48]]
; SKX-NEXT: [[TMP50:%.*]] = select i1 [[TMP49]], float [[TMP47]], float [[TMP48]]
; SKX-NEXT: [[TMP51:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 17), align 4
; SKX-NEXT: [[TMP52:%.*]] = fcmp fast ogt float [[TMP50]], [[TMP51]]
; SKX-NEXT: [[TMP53:%.*]] = select i1 [[TMP52]], float [[TMP50]], float [[TMP51]]
; SKX-NEXT: [[TMP54:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 18), align 8
; SKX-NEXT: [[TMP55:%.*]] = fcmp fast ogt float [[TMP53]], [[TMP54]]
; SKX-NEXT: [[TMP56:%.*]] = select i1 [[TMP55]], float [[TMP53]], float [[TMP54]]
; SKX-NEXT: [[TMP57:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 19), align 4
; SKX-NEXT: [[TMP58:%.*]] = fcmp fast ogt float [[TMP56]], [[TMP57]]
; SKX-NEXT: [[TMP59:%.*]] = select i1 [[TMP58]], float [[TMP56]], float [[TMP57]]
; SKX-NEXT: [[TMP60:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 20), align 16
; SKX-NEXT: [[TMP61:%.*]] = fcmp fast ogt float [[TMP59]], [[TMP60]]
; SKX-NEXT: [[TMP62:%.*]] = select i1 [[TMP61]], float [[TMP59]], float [[TMP60]]
; SKX-NEXT: [[TMP63:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 21), align 4
; SKX-NEXT: [[TMP64:%.*]] = fcmp fast ogt float [[TMP62]], [[TMP63]]
; SKX-NEXT: [[TMP65:%.*]] = select i1 [[TMP64]], float [[TMP62]], float [[TMP63]]
; SKX-NEXT: [[TMP66:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 22), align 8
; SKX-NEXT: [[TMP67:%.*]] = fcmp fast ogt float [[TMP65]], [[TMP66]]
; SKX-NEXT: [[TMP68:%.*]] = select i1 [[TMP67]], float [[TMP65]], float [[TMP66]]
; SKX-NEXT: [[TMP69:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 23), align 4
; SKX-NEXT: [[TMP70:%.*]] = fcmp fast ogt float [[TMP68]], [[TMP69]]
; SKX-NEXT: [[TMP71:%.*]] = select i1 [[TMP70]], float [[TMP68]], float [[TMP69]]
; SKX-NEXT: [[TMP72:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 24), align 16
; SKX-NEXT: [[TMP73:%.*]] = fcmp fast ogt float [[TMP71]], [[TMP72]]
; SKX-NEXT: [[TMP74:%.*]] = select i1 [[TMP73]], float [[TMP71]], float [[TMP72]]
; SKX-NEXT: [[TMP75:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 25), align 4
; SKX-NEXT: [[TMP76:%.*]] = fcmp fast ogt float [[TMP74]], [[TMP75]]
; SKX-NEXT: [[TMP77:%.*]] = select i1 [[TMP76]], float [[TMP74]], float [[TMP75]]
; SKX-NEXT: [[TMP78:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 26), align 8
; SKX-NEXT: [[TMP79:%.*]] = fcmp fast ogt float [[TMP77]], [[TMP78]]
; SKX-NEXT: [[TMP80:%.*]] = select i1 [[TMP79]], float [[TMP77]], float [[TMP78]]
; SKX-NEXT: [[TMP81:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 27), align 4
; SKX-NEXT: [[TMP82:%.*]] = fcmp fast ogt float [[TMP80]], [[TMP81]]
; SKX-NEXT: [[TMP83:%.*]] = select i1 [[TMP82]], float [[TMP80]], float [[TMP81]]
; SKX-NEXT: [[TMP84:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 28), align 16
; SKX-NEXT: [[TMP85:%.*]] = fcmp fast ogt float [[TMP83]], [[TMP84]]
; SKX-NEXT: [[TMP86:%.*]] = select i1 [[TMP85]], float [[TMP83]], float [[TMP84]]
; SKX-NEXT: [[TMP87:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 29), align 4
; SKX-NEXT: [[TMP88:%.*]] = fcmp fast ogt float [[TMP86]], [[TMP87]]
; SKX-NEXT: [[TMP89:%.*]] = select i1 [[TMP88]], float [[TMP86]], float [[TMP87]]
; SKX-NEXT: [[TMP90:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 30), align 8
; SKX-NEXT: [[TMP91:%.*]] = fcmp fast ogt float [[TMP89]], [[TMP90]]
; SKX-NEXT: [[TMP92:%.*]] = select i1 [[TMP91]], float [[TMP89]], float [[TMP90]]
; SKX-NEXT: [[TMP93:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 31), align 4
; SKX-NEXT: [[TMP94:%.*]] = fcmp fast ogt float [[TMP92]], [[TMP93]]
; SKX-NEXT: [[TMP95:%.*]] = select i1 [[TMP94]], float [[TMP92]], float [[TMP93]]
; SKX-NEXT: ret float [[TMP95]]
;		;
%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16		%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16
%3 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4		%3 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4
%4 = fcmp fast ogt float %2, %3		%4 = fcmp fast ogt float %2, %3
%5 = select i1 %4, float %2, float %3		%5 = select i1 %4, float %2, float %3
%6 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8		%6 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8
%7 = fcmp fast ogt float %5, %6		%7 = fcmp fast ogt float %5, %6
%8 = select i1 %7, float %5, float %6		%8 = select i1 %7, float %5, float %6
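The SKX check lines above replace the 31-step scalar compare/select chain with five shuffle, compare and select steps followed by an extract of lane 0. The following is a minimal C++ sketch of that tree on a plain array, not LLVM code: `horizontalMax` and `scalarMax` are hypothetical names introduced here for illustration, and the sketch only tracks the live lower lanes (the IR operates on all 32 lanes and leaves the upper ones undefined). It also assumes no NaNs are present, in line with the `fast` flag on the `fcmp` instructions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of the patch): models the RDX_SHUF /
// fcmp ogt / select tree from the SKX check lines on N lanes.
template <std::size_t N>
float horizontalMax(std::array<float, N> v) {
  static_assert(N != 0 && (N & (N - 1)) == 0,
                "lane count must be a power of two");
  // Each iteration corresponds to one shuffle + compare + select triple:
  // the upper half of the live lanes is moved down and compared lane by
  // lane with the lower half, halving the number of live lanes.
  for (std::size_t half = N / 2; half >= 1; half /= 2) {
    for (std::size_t i = 0; i < half; ++i) {
      float shuffled = v[i + half];              // lane i of RDX_SHUF
      v[i] = v[i] > shuffled ? v[i] : shuffled;  // fcmp ogt + select
    }
  }
  return v[0];  // extractelement ..., i32 0
}

// Hypothetical reference: the scalar chain from the unvectorized output.
template <std::size_t N>
float scalarMax(const std::array<float, N>& v) {
  float m = v[0];
  for (std::size_t i = 1; i < N; ++i)
    m = m > v[i] ? m : v[i];
  return m;
}
```

For 32 lanes the outer loop runs with `half` equal to 16, 8, 4, 2 and 1, which matches the five `RDX_SHUF` steps in the expected output.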

test/Transforms/SLPVectorizer/X86/horizontal.ll

ret void		ret void
}		}

declare i32 @foobar(i32)		declare i32 @foobar(i32)

define void @i32_red_call(i32 %val) {		define void @i32_red_call(i32 %val) {
; CHECK-LABEL: @i32_red_call(		; CHECK-LABEL: @i32_red_call(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 0), align 16		; CHECK-NEXT: [[TMP0:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr_i32 to <8 x i32>*), align 16
; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 1), align 4		; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 undef, undef
; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP1]], [[TMP0]]		; CHECK-NEXT: [[ADD_1:%.*]] = add nsw i32 undef, [[ADD]]
; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 2), align 8		; CHECK-NEXT: [[ADD_2:%.*]] = add nsw i32 undef, [[ADD_1]]
; CHECK-NEXT: [[ADD_1:%.*]] = add nsw i32 [[TMP2]], [[ADD]]		; CHECK-NEXT: [[ADD_3:%.*]] = add nsw i32 undef, [[ADD_2]]
; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 3), align 4		; CHECK-NEXT: [[ADD_4:%.*]] = add nsw i32 undef, [[ADD_3]]
; CHECK-NEXT: [[ADD_2:%.*]] = add nsw i32 [[TMP3]], [[ADD_1]]		; CHECK-NEXT: [[ADD_5:%.*]] = add nsw i32 undef, [[ADD_4]]
; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 4), align 16		; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[ADD_3:%.*]] = add nsw i32 [[TMP4]], [[ADD_2]]		; CHECK-NEXT: [[BIN_RDX:%.*]] = add nsw <8 x i32> [[TMP0]], [[RDX_SHUF]]
; CHECK-NEXT: [[TMP5:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 5), align 4		; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[ADD_4:%.*]] = add nsw i32 [[TMP5]], [[ADD_3]]		; CHECK-NEXT: [[BIN_RDX2:%.*]] = add nsw <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 6), align 8		; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[ADD_5:%.*]] = add nsw i32 [[TMP6]], [[ADD_4]]		; CHECK-NEXT: [[BIN_RDX4:%.*]] = add nsw <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
; CHECK-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 7), align 4		; CHECK-NEXT: [[TMP1:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
; CHECK-NEXT: [[ADD_6:%.*]] = add nsw i32 [[TMP7]], [[ADD_5]]		; CHECK-NEXT: [[ADD_6:%.*]] = add nsw i32 undef, [[ADD_5]]
; CHECK-NEXT: [[RES:%.*]] = call i32 @foobar(i32 [[ADD_6]])		; CHECK-NEXT: [[RES:%.*]] = call i32 @foobar(i32 [[TMP1]])
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%0 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 0), align 16		%0 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 0), align 16
%1 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 1), align 4		%1 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 1), align 4
%add = add nsw i32 %1, %0		%add = add nsw i32 %1, %0
%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 2), align 8		%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 2), align 8
%add.1 = add nsw i32 %2, %add		%add.1 = add nsw i32 %2, %add
...		...
%add.6 = add nsw i32 %7, %add.5		%add.6 = add nsw i32 %7, %add.5
%res = call i32 @foobar(i32 %add.6)		%res = call i32 @foobar(i32 %add.6)
ret void		ret void
}		}

define void @i32_red_invoke(i32 %val) personality i32 (...)* @__gxx_personality_v0 {		define void @i32_red_invoke(i32 %val) personality i32 (...)* @__gxx_personality_v0 {
; CHECK-LABEL: @i32_red_invoke(		; CHECK-LABEL: @i32_red_invoke(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 0), align 16		; CHECK-NEXT: [[TMP0:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr_i32 to <8 x i32>*), align 16
; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 1), align 4		; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 undef, undef
; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP1]], [[TMP0]]		; CHECK-NEXT: [[ADD_1:%.*]] = add nsw i32 undef, [[ADD]]
; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 2), align 8		; CHECK-NEXT: [[ADD_2:%.*]] = add nsw i32 undef, [[ADD_1]]
; CHECK-NEXT: [[ADD_1:%.*]] = add nsw i32 [[TMP2]], [[ADD]]		; CHECK-NEXT: [[ADD_3:%.*]] = add nsw i32 undef, [[ADD_2]]
; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 3), align 4		; CHECK-NEXT: [[ADD_4:%.*]] = add nsw i32 undef, [[ADD_3]]
; CHECK-NEXT: [[ADD_2:%.*]] = add nsw i32 [[TMP3]], [[ADD_1]]		; CHECK-NEXT: [[ADD_5:%.*]] = add nsw i32 undef, [[ADD_4]]
; CHECK-NEXT: [[TMP4:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 4), align 16		; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[ADD_3:%.*]] = add nsw i32 [[TMP4]], [[ADD_2]]		; CHECK-NEXT: [[BIN_RDX:%.*]] = add nsw <8 x i32> [[TMP0]], [[RDX_SHUF]]
; CHECK-NEXT: [[TMP5:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 5), align 4		; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[ADD_4:%.*]] = add nsw i32 [[TMP5]], [[ADD_3]]		; CHECK-NEXT: [[BIN_RDX2:%.*]] = add nsw <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 6), align 8		; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[ADD_5:%.*]] = add nsw i32 [[TMP6]], [[ADD_4]]		; CHECK-NEXT: [[BIN_RDX4:%.*]] = add nsw <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
; CHECK-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 7), align 4		; CHECK-NEXT: [[TMP1:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
; CHECK-NEXT: [[ADD_6:%.*]] = add nsw i32 [[TMP7]], [[ADD_5]]		; CHECK-NEXT: [[ADD_6:%.*]] = add nsw i32 undef, [[ADD_5]]
; CHECK-NEXT: [[RES:%.*]] = invoke i32 @foobar(i32 [[ADD_6]])		; CHECK-NEXT: [[RES:%.*]] = invoke i32 @foobar(i32 [[TMP1]])
; CHECK-NEXT: to label [[NORMAL:%.*]] unwind label [[EXCEPTION:%.*]]		; CHECK-NEXT: to label [[NORMAL:%.*]] unwind label [[EXCEPTION:%.*]]
; CHECK: exception:		; CHECK: exception:
; CHECK-NEXT: [[CLEANUP:%.*]] = landingpad i8		; CHECK-NEXT: [[CLEANUP:%.*]] = landingpad i8
; CHECK-NEXT: cleanup		; CHECK-NEXT: cleanup
; CHECK-NEXT: br label [[NORMAL]]		; CHECK-NEXT: br label [[NORMAL]]
; CHECK: normal:		; CHECK: normal:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;

test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll

	; CHECK-NEXT: [[A0:%.*]] = extractelement <4 x float> %a, i32 0			; CHECK-NEXT: [[A0:%.*]] = extractelement <4 x float> %a, i32 0
	; CHECK-NEXT: [[A1:%.*]] = extractelement <4 x float> %a, i32 1			; CHECK-NEXT: [[A1:%.*]] = extractelement <4 x float> %a, i32 1
	; CHECK-NEXT: [[A2:%.*]] = extractelement <4 x float> %a, i32 2			; CHECK-NEXT: [[A2:%.*]] = extractelement <4 x float> %a, i32 2
	; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x float> %a, i32 3			; CHECK-NEXT: [[A3:%.*]] = extractelement <4 x float> %a, i32 3
	; CHECK-NEXT: [[B0:%.*]] = extractelement <4 x float> %b, i32 0			; CHECK-NEXT: [[B0:%.*]] = extractelement <4 x float> %b, i32 0
	; CHECK-NEXT: [[B1:%.*]] = extractelement <4 x float> %b, i32 1			; CHECK-NEXT: [[B1:%.*]] = extractelement <4 x float> %b, i32 1
	; CHECK-NEXT: [[B2:%.*]] = extractelement <4 x float> %b, i32 2			; CHECK-NEXT: [[B2:%.*]] = extractelement <4 x float> %b, i32 2
	; CHECK-NEXT: [[B3:%.*]] = extractelement <4 x float> %b, i32 3			; CHECK-NEXT: [[B3:%.*]] = extractelement <4 x float> %b, i32 3
	; CHECK-NEXT: [[CMP0:%.*]] = icmp ne i32 [[C0]], 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> undef, i32 [[C0]], i32 0
	; CHECK-NEXT: [[CMP1:%.*]] = icmp ne i32 [[C1]], 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[C1]], i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> undef, i32 [[C2]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[C3]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = icmp ne <2 x i32> [[TMP2]], zeroinitializer			; CHECK-NEXT: [[TMP3:%.*]] = icmp ne <2 x i32> [[TMP2]], zeroinitializer
	; CHECK-NEXT: [[S0:%.*]] = select i1 [[CMP0]], float [[A0]], float [[B0]]			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> undef, i32 [[C2]], i32 0
	; CHECK-NEXT: [[S1:%.*]] = select i1 [[CMP1]], float [[A1]], float [[B1]]			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> [[TMP4]], i32 [[C3]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> undef, float [[A2]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = icmp ne <2 x i32> [[TMP5]], zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> [[TMP4]], float [[A3]], i32 1			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x float> undef, float [[A0]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> undef, float [[B2]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x float> [[TMP7]], float [[A1]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x float> [[TMP6]], float [[B3]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x float> undef, float [[B0]], i32 0
	; CHECK-NEXT: [[TMP8:%.*]] = select <2 x i1> [[TMP3]], <2 x float> [[TMP5]], <2 x float> [[TMP7]]			; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[B1]], i32 1
	; CHECK-NEXT: [[RA:%.*]] = insertelement <4 x float> undef, float [[S0]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = select <2 x i1> [[TMP3]], <2 x float> [[TMP8]], <2 x float> [[TMP10]]
	; CHECK-NEXT: [[RB:%.*]] = insertelement <4 x float> [[RA]], float [[S1]], i32 1			; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x float> undef, float [[A2]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0			; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x float> [[TMP12]], float [[A3]], i32 1
	; CHECK-NEXT: [[RC:%.*]] = insertelement <4 x float> undef, float [[TMP9]], i32 2			; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x float> undef, float [[B2]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1			; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[B3]], i32 1
	; CHECK-NEXT: [[RD:%.*]] = insertelement <4 x float> [[RC]], float [[TMP10]], i32 3			; CHECK-NEXT: [[TMP16:%.*]] = select <2 x i1> [[TMP6]], <2 x float> [[TMP13]], <2 x float> [[TMP15]]
				; CHECK-NEXT: [[TMP17:%.*]] = extractelement <2 x float> [[TMP11]], i32 0
				; CHECK-NEXT: [[RA:%.*]] = insertelement <4 x float> undef, float [[TMP17]], i32 0
				; CHECK-NEXT: [[TMP18:%.*]] = extractelement <2 x float> [[TMP11]], i32 1
				; CHECK-NEXT: [[RB:%.*]] = insertelement <4 x float> [[RA]], float [[TMP18]], i32 1
				; CHECK-NEXT: [[TMP19:%.*]] = extractelement <2 x float> [[TMP16]], i32 0
				; CHECK-NEXT: [[RC:%.*]] = insertelement <4 x float> undef, float [[TMP19]], i32 2
				; CHECK-NEXT: [[TMP20:%.*]] = extractelement <2 x float> [[TMP16]], i32 1
				; CHECK-NEXT: [[RD:%.*]] = insertelement <4 x float> [[RC]], float [[TMP20]], i32 3
	; CHECK-NEXT: ret <4 x float> [[RD]]			; CHECK-NEXT: ret <4 x float> [[RD]]
	;			;
	; ZEROTHRESH-LABEL: @simple_select_no_users(			; ZEROTHRESH-LABEL: @simple_select_no_users(
	; ZEROTHRESH-NEXT: [[C0:%.*]] = extractelement <4 x i32> %c, i32 0			; ZEROTHRESH-NEXT: [[C0:%.*]] = extractelement <4 x i32> %c, i32 0
	; ZEROTHRESH-NEXT: [[C1:%.*]] = extractelement <4 x i32> %c, i32 1			; ZEROTHRESH-NEXT: [[C1:%.*]] = extractelement <4 x i32> %c, i32 1
	; ZEROTHRESH-NEXT: [[C2:%.*]] = extractelement <4 x i32> %c, i32 2			; ZEROTHRESH-NEXT: [[C2:%.*]] = extractelement <4 x i32> %c, i32 2
	; ZEROTHRESH-NEXT: [[C3:%.*]] = extractelement <4 x i32> %c, i32 3			; ZEROTHRESH-NEXT: [[C3:%.*]] = extractelement <4 x i32> %c, i32 3
	; ZEROTHRESH-NEXT: [[A0:%.*]] = extractelement <4 x float> %a, i32 0			; ZEROTHRESH-NEXT: [[A0:%.*]] = extractelement <4 x float> %a, i32 0

Revision Contents

Diff 109576

include/llvm/Analysis/TargetTransformInfo.h

include/llvm/Analysis/TargetTransformInfoImpl.h

include/llvm/CodeGen/BasicTTIImpl.h

include/llvm/Transforms/Vectorize/SLPVectorizer.h

lib/Analysis/CostModel.cpp

lib/Analysis/TargetTransformInfo.cpp

lib/Target/X86/X86TargetTransformInfo.h

lib/Target/X86/X86TargetTransformInfo.cpp

lib/Transforms/Vectorize/SLPVectorizer.cpp

test/Transforms/SLPVectorizer/AArch64/gather-root.ll

test/Transforms/SLPVectorizer/X86/horizontal-list.ll

test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll

test/Transforms/SLPVectorizer/X86/horizontal.ll

test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll
