This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

llvm/trunk/
  include/llvm/Transforms/Vectorize/
    SLPVectorizer.h
  lib/Transforms/Vectorize/
    SLPVectorizer.cpp
  test/Transforms/SLPVectorizer/
    AArch64/
      gather-root.ll
      horizontal.ll
      spillcost-di.ll
    X86/
      PR31847.ll
      PR35628_1.ll
      PR35628_2.ll
      PR39774.ll
      PR40310.ll
      bad-reduction.ll
      horizontal-list.ll
      horizontal-minmax.ll
      horizontal.ll
      long_chains.ll
      reassociated-loads.ll
      reduction_loads.ll
      reduction_unrolled.ll
      remark_horcost.ll
      reorder_repeated_ops.ll
      undef_vect.ll
      vectorize-reorder-reuse.ll
Differential D29641

[SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!")
ClosedPublic

Authored by ABataev on Feb 7 2017, 7:41 AM.

Details

Reviewers

mzolotukhin
mkuper
hfinkel
RKSimon
davide
spatel

Commits

rG8b1eeafb9133: [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) &&…
rL373166: [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) &&…
rG6a278d9073bd: [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) &&…
rL372626: [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) &&…

Summary

Initially, the SLP vectorizer replaced all instructions that were going to be
vectorized with undef values. This may break ScalarEvolution and may
cause a crash.
The SLP vectorizer is reworked so that it no longer replaces vectorized
instructions with UndefValue. Instead, vectorized instructions are
marked for deletion inside the BoUpSLP class and deleted when the class
instance is destroyed.
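
The deferred-deletion scheme described in the summary can be illustrated outside LLVM. The following is a minimal sketch, not the patch's code: Node, Block, and DeferredEraser are hypothetical stand-ins for Instruction, BasicBlock, and BoUpSLP, and use tracking is simplified to an integer counter.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an IR instruction: it holds non-owning operand
// pointers and counts how many other nodes still use it.
struct Node {
  std::vector<Node *> Operands;
  int NumUses = 0;

  void addOperand(Node *Op) {
    Operands.push_back(Op);
    ++Op->NumUses;
  }
  void dropAllReferences() {
    for (Node *Op : Operands)
      --Op->NumUses;
    Operands.clear();
  }
};

// Hypothetical owner of all nodes, playing the role of a basic block.
struct Block {
  std::vector<std::unique_ptr<Node>> Nodes;

  Node *create() {
    Nodes.push_back(std::make_unique<Node>());
    return Nodes.back().get();
  }
  void erase(Node *N) {
    for (auto It = Nodes.begin(); It != Nodes.end(); ++It)
      if (It->get() == N) {
        Nodes.erase(It);
        return;
      }
  }
};

// Records nodes to delete and only frees them when it is destroyed.
class DeferredEraser {
  Block &BB;
  std::unordered_set<Node *> Deleted;

public:
  explicit DeferredEraser(Block &BB) : BB(BB) {}

  void markForDeletion(Node *N) { Deleted.insert(N); }
  bool isDeleted(Node *N) const { return Deleted.count(N) != 0; }

  ~DeferredEraser() {
    // Phase 1: break links between doomed nodes so none of them is a user.
    for (Node *N : Deleted)
      N->dropAllReferences();
    // Phase 2: every doomed node must now be unused and can be freed.
    for (Node *N : Deleted) {
      assert(N->NumUses == 0 && "trying to erase a node with users");
      BB.erase(N);
    }
  }
};

// Usage: retires the chain A <- B and returns how many nodes remain.
inline std::size_t demoDeferredErase() {
  Block BB;
  Node *A = BB.create();
  Node *B = BB.create();
  Node *Keep = BB.create();
  B->addOperand(A);
  {
    DeferredEraser E(BB);
    E.markForDeletion(A);
    E.markForDeletion(B);
    // Both nodes still exist here; they are only flagged.
    assert(E.isDeleted(A) && !E.isDeleted(Keep));
    assert(BB.Nodes.size() == 3);
  }
  return BB.Nodes.size();
}
```

Because nothing is freed until the eraser goes out of scope, the address of a retired node cannot be handed out again while pointer-keyed caches that mention it are still alive.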

Diff Detail

Repository: rL LLVM

Event Timeline

ABataev created this revision. Feb 7 2017, 7:41 AM

Herald added a subscriber: sanjoy. Feb 7 2017, 7:41 AM

Test case?

Also, this looks a bit weird. Could you explain what exactly is going wrong?
I'm asking because this somehow seems like it's the wrong granularity for this - after any change, you invalidate the disposition of the innermost loop that contains the basic block in which the root/seed lives. I'm not sure why this is the right thing to do.

majnemer added a subscriber: majnemer. Feb 8 2017, 4:56 PM

majnemer added inline comments.

lib/Transforms/Vectorize/SLPVectorizer.cpp
3737–3745 ↗	(On Diff #87432)	In general, we don't use this sort of overloading in LLVM...
3745 ↗	(On Diff #87432)	Probably should mark this as explicit.

Typically SCEV dispositions need to be changed if a loop variant value is made loop invariant (i.e. a LICM like transform, perhaps via makeLoopInvariant or something like that). As @mkuper said on the email thread, it is difficult to diagnose what's going on unless we have a reproducible test case, so I'd focus on that as a first priority.

Address comments + test

Added the test that crashes for me without the changes in this patch

So, do we know what exactly is going wrong here? Which disposition we're changing and why?

Also, any chance to (bugpoint-?)reduce the test case, or is this as small as it gets?

In D29641#675862, @mkuper wrote:

> Also, any chance to (bugpoint-?)reduce the test case, or is this as small as it gets?

That's the smallest test case that crashes for me, could not reduce it more.

In D29641#675860, @mkuper wrote:

> So, do we know what exactly is going wrong here? Which disposition we're changing and why?

We're not changing the disposition, but it seems to me that some SCEV nodes are returned to the cache without their dispositions being cleared. After that, during some internal SCEV optimizations, SCEV tries to simplify some SCEV nodes that are LoopInvariant. During this reassociation it creates a new SCEV node, which is not actually created but is taken from the cache of previously used SCEV nodes. But this new SCEV node for some reason has the disposition LoopVariant (it was used before, then freed, but nothing tracked that the disposition for this node should be cleared). So we have a situation where 2 LoopInvariant nodes produce 1 LoopVariant node, while the code expects only LoopInvariant nodes, of course.
It seems to me that we're breaking SCEV during vectorization of select instructions; I will try to check it somehow.

Here is what happens with this test:
Initially we have 2 SCEVs, (-1 * %cond) and (-1 * %cond14); they are combined into (-1 * (%cond + %cond14))<nsw>. All these SCEVs have LoopVariant dispositions. After vectorization, the scalars %cond and %cond14 are replaced by undef. The SCEVs for these undefs have the same addresses as those for the original instructions (they are reused), but their dispositions are cleared and recalculated as LoopInvariant. But after combining these SCEVs we get the SCEV previously used as (-1 * (%cond + %cond14))<nsw>, because the FoldingSetNodeID used as the key is rebuilt with the same arguments: the identifier scMulExpr and the 2 addresses of the SCEVs that were LoopVariant but are now LoopInvariant. But we did not clear the loop disposition for this SCEV, because ScalarEvolution is unable to do this. So, for two SCEVs with LoopInvariant dispositions, we get the resulting SCEV (-1 * (undef + undef))<nsw>, which has the disposition LoopVariant. That's why I believe we should clear loop dispositions after each successful vectorization attempt.
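
The mechanism described above can be reproduced with a toy model. This is a simplified sketch and not ScalarEvolution itself: MiniSE, Expr, Value, and forgetLeafOnly are hypothetical names, the uniquing table stands in for the FoldingSet, and a plain sum replaces the multiply-add expression from the test.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <tuple>

enum class Disposition { Variant, Invariant };

// Hypothetical IR value; InLoop says whether it is defined inside the loop.
struct Value {
  bool InLoop;
};

// Expression node: either a leaf wrapping a Value or the sum of two nodes.
struct Expr {
  const Value *Leaf;
  const Expr *LHS;
  const Expr *RHS;
};

class MiniSE {
  // Nodes are uniqued by the addresses of what they are built from.
  using Key = std::tuple<std::uintptr_t, std::uintptr_t, std::uintptr_t>;
  std::map<Key, std::unique_ptr<Expr>> Unique;
  // Memoized dispositions, keyed by node address.
  std::map<const Expr *, Disposition> Cache;

  static std::uintptr_t id(const void *P) {
    return reinterpret_cast<std::uintptr_t>(P);
  }
  const Expr *get(const Value *V, const Expr *L, const Expr *R) {
    std::unique_ptr<Expr> &Slot = Unique[Key(id(V), id(L), id(R))];
    if (!Slot)
      Slot.reset(new Expr{V, L, R});
    return Slot.get();
  }

public:
  const Expr *getLeaf(const Value *V) { return get(V, nullptr, nullptr); }
  const Expr *getAdd(const Expr *L, const Expr *R) { return get(nullptr, L, R); }

  Disposition getDisposition(const Expr *E) {
    auto It = Cache.find(E);
    if (It != Cache.end())
      return It->second;
    Disposition D;
    if (E->Leaf)
      D = E->Leaf->InLoop ? Disposition::Variant : Disposition::Invariant;
    else
      D = (getDisposition(E->LHS) == Disposition::Invariant &&
           getDisposition(E->RHS) == Disposition::Invariant)
              ? Disposition::Invariant
              : Disposition::Variant;
    Cache[E] = D;
    return D;
  }

  // Forgets only the leaf for V; expressions built on top keep their entry.
  void forgetLeafOnly(const Value *V) { Cache.erase(getLeaf(V)); }
  // Forgets every memoized disposition.
  void forgetAll() { Cache.clear(); }
};

struct DemoResult {
  bool SameNode;
  Disposition Stale;
  Disposition Fresh;
};

inline DemoResult demoStaleDisposition() {
  MiniSE SE;
  Value Cond{true}, Cond14{true};
  const Expr *Sum = SE.getAdd(SE.getLeaf(&Cond), SE.getLeaf(&Cond14));
  Disposition Before = SE.getDisposition(Sum); // memoizes all three nodes
  assert(Before == Disposition::Variant);
  (void)Before;

  // Change the values "in place": same addresses, now loop-invariant.
  Cond.InLoop = false;
  Cond14.InLoop = false;
  SE.forgetLeafOnly(&Cond);
  SE.forgetLeafOnly(&Cond14);

  // Rebuilding the sum hits the uniquing table and returns the old node.
  const Expr *Again = SE.getAdd(SE.getLeaf(&Cond), SE.getLeaf(&Cond14));
  DemoResult R{};
  R.SameNode = (Again == Sum);
  R.Stale = SE.getDisposition(Again); // stale memoized answer
  SE.forgetAll();
  R.Fresh = SE.getDisposition(Again); // recomputed from the operands
  return R;
}
```

In this model the stale answer disappears only after the whole memo table is cleared, which mirrors the proposal to clear loop dispositions after each vectorization; not mutating the values in the first place avoids the problem entirely.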

I believe this is exactly the situation @sanjoy described: LoopVariants are transformed to LoopInvariants.

You can't replace the operands of an instruction in a way that it computes a new value while still preserving SCEV, since SCEV keys off of the Instruction *. Loop disposition is only one of the things that this breaks. e.g. if you have 1 + %KnownPositive and you replace the %KnownPositive operand with %MayBeNegative ("in place"), that breaks the getRange cache.

Can the loop vectorizer do without replacing %cond and %cond14 to undef?

Reworked SLP vectorizer to not replace instructions to be deleted with UndefValue.

Harbormaster completed remote builds in B4042: Diff 88731. Feb 16 2017, 7:27 AM

ABataev edited the summary of this revision. Feb 16 2017, 7:29 AM

Ping, everyone :-)

This more or less makes sense to me, although I'm not sure it's the right solution.
One other option would be to actually delete things on the fly, and solve the AliasCache option in a different way. (Why does the SLP vectorizer even have its own alias cache?)
Yet another, probably more realistic, option would be to keep the WeakVHs to solve the reallocation issue, but, actually "delete" everything we need to delete, instead of leaving hanging undefs.

Regardless, I believe Hans wants to merge this into 4.0 to fix PR31847. This doesn't look at all safe, and I don't feel confident enough in this to endorse it for the branch, without baking in-tree for a while. And another pair of eyes would be great.

Hal?

lib/Transforms/Vectorize/SLPVectorizer.cpp
609 ↗	(On Diff #88731)	Is this change (and the others turning instructions into insertelementinst) related? If not, can they go into a separate patch?
2981 ↗	(On Diff #88731)	This comment no longer makes sense. Does the whole check?
4504 ↗	(On Diff #88731)	Is there any interaction with the "extra values" stuff?
5047 ↗	(On Diff #88731)	Again, this is an unrelated change, right?

Address Michael's comments

Harbormaster completed remote builds in B4444: Diff 90319. Mar 2 2017, 5:07 AM

ABataev added inline comments. Mar 6 2017, 11:55 PM

lib/Transforms/Vectorize/SLPVectorizer.cpp
2981 ↗	(On Diff #88731)	I'll fix the comment, but the check itself is important.
4504 ↗	(On Diff #88731)	I reworked this part. Extra args are not removed, but the users of extra args are stored as ReductionOps and will be automatically marked for deletion

anemet added a subscriber: anemet. Apr 16 2017, 11:56 AM

Hi,

I would like to understand this better. Could somebody explain what assumptions SCEV makes about its clients? Which assumption(s) is broken by SLP? It seems to me that this issue potentially touches fundamental design decisions/questions and I don't see any verifiers in place.

Thanks
Gerolf

test/Transforms/SLPVectorizer/X86/crash-SCEV.ll
1 ↗	(On Diff #90319)	What is this supposed to test? I'm wondering if there can be a smaller test for what this one is supposed to do. As is at the least it looks hard to maintain.

RKSimon added a subscriber: RKSimon. May 18 2017, 8:52 AM

In D29641#728047, @Gerolf wrote:

> Hi,
>
> I would like to understand this better. Could somebody explain what assumptions SCEV makes about its clients? Which assumption(s) is broken by SLP? It seems to me that this issue potentially touches fundamental design decisions/questions and I don't see any verifiers in place.
>
> Thanks
> Gerolf

Hi Gerolf, sorry for the delay with the answer.
I suppose Sanjoy answered your question already:

> You can't replace the operands of an instruction in a way that it computes a new value while still preserving SCEV, since SCEV keys off of the Instruction *. Loop disposition is only one of the things that this breaks. e.g. if you have 1 + %KnownPositive and you replace the %KnownPositive operand with %MayBeNegative ("in place"), that breaks the getRange cache.
>
> Can the loop vectorizer do without replacing %cond and %cond14 to undef?

So, I think you can't replace an instruction that has one disposition with an instruction that has another disposition. You can just remove it.

test/Transforms/SLPVectorizer/X86/crash-SCEV.ll
1 ↗	(On Diff #90319)	This is a reproducer that currently crashes the compiler for sure. I tried to make it as small as possible, but could not reduce it any further.

ABataev mentioned this in D29826: [SLP] General improvements of SLP vectorization process. Aug 3 2017, 6:10 AM

RKSimon added reviewers: RKSimon, davide, spatel. Aug 3 2017, 7:51 AM

What's the status here?

In D29641#830653, @hans wrote:

> What's the status here?

It is still under review

What state is this in? PR31847 was cleared as a 5.0.0 blocker, so it probably won't need to be merged for 5.0.1 either

test/Transforms/SLPVectorizer/X86/crash-SCEV.ll
1 ↗	(On Diff #90319)	Possibly rename the test file pr31847.ll? Tests called 'crash' etc. aren't that useful imo.

Update after review

Herald added a subscriber: javed.absar. Sep 12 2017, 1:34 PM

Harbormaster completed remote builds in B10147: Diff 114895. Sep 12 2017, 1:35 PM

RKSimon added inline comments. Oct 29 2017, 5:08 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
1327 ↗	(On Diff #114895)	What do we gain from using std::for_each instead of basic range for loops here and in ~BoUpSLP()?
4736 ↗	(On Diff #114895)	Would it be better to move the return into each min/max case, drop the RK_None case and just have the llvm_unreachable at the end of the function? Similar question for initReductionOps / addReductionOps below
4934 ↗	(On Diff #114895)	Don't have a break after return - some compilers will shout at you.
4960 ↗	(On Diff #114895)	Remove the break.
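
The switch layout suggested in the inline comments above (return from each case, no catch-all case, no break after return, an unreachable marker at the end) might look like the sketch below. It is an illustration only: ReductionKind and combine are hypothetical names rather than the patch's code, and assert plus std::abort stand in for llvm_unreachable.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical reduction kinds, loosely modeled on the RK_* names in the patch.
enum class ReductionKind { Arithmetic, Min, Max };

// Each case returns directly, so no `break` follows a `return`. With no
// catch-all case, GCC and Clang can warn (-Wswitch) when an enumerator is
// left unhandled, and the code after the switch catches it at run time.
inline int combine(ReductionKind Kind, int LHS, int RHS) {
  switch (Kind) {
  case ReductionKind::Arithmetic:
    return LHS + RHS;
  case ReductionKind::Min:
    return LHS < RHS ? LHS : RHS;
  case ReductionKind::Max:
    return LHS < RHS ? RHS : LHS;
  }
  // Stand-in for llvm_unreachable("Unknown reduction kind").
  assert(false && "unknown reduction kind");
  std::abort();
}
```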

ABataev marked 2 inline comments as done. Oct 31 2017, 9:18 AM

ABataev added inline comments.

lib/Transforms/Vectorize/SLPVectorizer.cpp
1327 ↗	(On Diff #114895)	Nothing special, changed it.
4736 ↗	(On Diff #114895)	Ok, reworked.

Update after review

Rebase

Herald added a project: Restricted Project. Sep 4 2019, 1:10 PM

Herald added subscribers: zzheng, qcolombet.

Harbormaster completed remote builds in B37741: Diff 218777. Sep 4 2019, 1:12 PM

RKSimon added inline comments. Sep 5 2019, 8:29 AM

include/llvm/Transforms/Vectorize/SLPVectorizer.h
27 ↗	(On Diff #218777)	Do we need this include any more? Or can it be moved to SLPVectorizer.cpp?
lib/Transforms/Vectorize/SLPVectorizer.cpp
5875 ↗	(On Diff #218777)	Why is this change necessary?
5895 ↗	(On Diff #218777)	Why is this change necessary?

@ABataev Other than those minors this patch looks almost ready

Update + fixes after comments.

ABataev marked 2 inline comments as done. Sep 23 2019, 7:37 AM

ABataev added inline comments.

lib/Transforms/Vectorize/SLPVectorizer.cpp
5895 ↗	(On Diff #218777)	Restored original code

Harbormaster completed remote builds in B38434: Diff 221328. Sep 23 2019, 7:37 AM

LGTM - cheers

This revision is now accepted and ready to land. Sep 23 2019, 8:46 AM

Closed by commit rL372626: [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) &&… (authored by ABataev). Sep 23 2019, 9:23 AM

This revision was automatically updated to reflect the committed changes.

RKSimon mentioned this in D43582: [SLP] Generalization of stores vectorization. Sep 24 2019, 4:29 AM

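Revision Contents

Path	Size
llvm/trunk/include/llvm/Transforms/Vectorize/SLPVectorizer.h	9 lines
llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp	141 lines
llvm/trunk/test/Transforms/SLPVectorizer/AArch64/gather-root.ll	102 lines
llvm/trunk/test/Transforms/SLPVectorizer/AArch64/horizontal.ll	16 lines
llvm/trunk/test/Transforms/SLPVectorizer/AArch64/spillcost-di.ll	4 lines
llvm/trunk/test/Transforms/SLPVectorizer/X86/PR31847.ll	153 lines
llvm/trunk/test/Transforms/SLPVectorizer/X86/PR35628_1.ll	13 lines
llvm/trunk/test/Transforms/SLPVectorizer/X86/PR35628_2.ll	5 lines
llvm/trunk/test/Transforms/SLPVectorizer/X86/PR39774.ll	72 lines
llvm/trunk/test/Transforms/SLPVectorizer/X86/PR40310.ll	16 lines
llvm/trunk/test/Transforms/SLPVectorizer/X86/bad-reduction.ll	28 lines
llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal-list.ll	354 lines
llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll	476 lines
llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal.ll	148 lines
llvm/trunk/test/Transforms/SLPVectorizer/X86/long_chains.ll	8 lines
llvm/trunk/test/Transforms/SLPVectorizer/X86/reassociated-loads.ll	31 lines
llvm/trunk/test/Transforms/SLPVectorizer/X86/reduction_loads.ll	24 lines
llvm/trunk/test/Transforms/SLPVectorizer/X86/reduction_unrolled.ll	35 lines
llvm/trunk/test/Transforms/SLPVectorizer/X86/remark_horcost.ll	4 lines
llvm/trunk/test/Transforms/SLPVectorizer/X86/reorder_repeated_ops.ll	22 lines
llvm/trunk/test/Transforms/SLPVectorizer/X86/undef_vect.ll	10 lines
llvm/trunk/test/Transforms/SLPVectorizer/X86/vectorize-reorder-reuse.ll	42 lines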

Diff 221354

llvm/trunk/include/llvm/Transforms/Vectorize/SLPVectorizer.h

[... 18 lines not shown ...]
#define LLVM_TRANSFORMS_VECTORIZE_SLPVECTORIZER_H		#define LLVM_TRANSFORMS_VECTORIZE_SLPVECTORIZER_H

#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/MapVector.h"		#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/None.h"		#include "llvm/ADT/None.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/AliasAnalysis.h"		#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/IR/PassManager.h"		#include "llvm/IR/PassManager.h"
#include "llvm/IR/ValueHandle.h"

namespace llvm {		namespace llvm {

class AssumptionCache;		class AssumptionCache;
class BasicBlock;		class BasicBlock;
class CmpInst;		class CmpInst;
class DataLayout;		class DataLayout;
class DemandedBits;		class DemandedBits;
[... 19 lines not shown ...]

} // end namespace slpvectorizer		} // end namespace slpvectorizer

extern cl::opt<bool> RunSLPVectorization;		extern cl::opt<bool> RunSLPVectorization;

struct SLPVectorizerPass : public PassInfoMixin<SLPVectorizerPass> {		struct SLPVectorizerPass : public PassInfoMixin<SLPVectorizerPass> {
using StoreList = SmallVector<StoreInst *, 8>;		using StoreList = SmallVector<StoreInst *, 8>;
using StoreListMap = MapVector<Value *, StoreList>;		using StoreListMap = MapVector<Value *, StoreList>;
using WeakTrackingVHList = SmallVector<WeakTrackingVH, 8>;		using GEPList = SmallVector<GetElementPtrInst *, 8>;
using WeakTrackingVHListMap = MapVector<Value *, WeakTrackingVHList>;		using GEPListMap = MapVector<Value *, GEPList>;

ScalarEvolution *SE = nullptr;		ScalarEvolution *SE = nullptr;
TargetTransformInfo *TTI = nullptr;		TargetTransformInfo *TTI = nullptr;
TargetLibraryInfo *TLI = nullptr;		TargetLibraryInfo *TLI = nullptr;
AliasAnalysis *AA = nullptr;		AliasAnalysis *AA = nullptr;
LoopInfo *LI = nullptr;		LoopInfo *LI = nullptr;
DominatorTree *DT = nullptr;		DominatorTree *DT = nullptr;
AssumptionCache *AC = nullptr;		AssumptionCache *AC = nullptr;
[... 53 lines not shown ...]	private:
bool vectorizeInsertElementInst(InsertElementInst *IEI, BasicBlock *BB,		bool vectorizeInsertElementInst(InsertElementInst *IEI, BasicBlock *BB,
slpvectorizer::BoUpSLP &R);		slpvectorizer::BoUpSLP &R);

/// Try to vectorize trees that start at compare instructions.		/// Try to vectorize trees that start at compare instructions.
bool vectorizeCmpInst(CmpInst *CI, BasicBlock *BB, slpvectorizer::BoUpSLP &R);		bool vectorizeCmpInst(CmpInst *CI, BasicBlock *BB, slpvectorizer::BoUpSLP &R);

/// Tries to vectorize constructs started from CmpInst, InsertValueInst or		/// Tries to vectorize constructs started from CmpInst, InsertValueInst or
/// InsertElementInst instructions.		/// InsertElementInst instructions.
bool vectorizeSimpleInstructions(SmallVectorImpl<WeakVH> &Instructions,		bool vectorizeSimpleInstructions(SmallVectorImpl<Instruction *> &Instructions,
BasicBlock *BB, slpvectorizer::BoUpSLP &R);		BasicBlock *BB, slpvectorizer::BoUpSLP &R);

/// Scan the basic block and look for patterns that are likely to start		/// Scan the basic block and look for patterns that are likely to start
/// a vectorization chain.		/// a vectorization chain.
bool vectorizeChainsInBlock(BasicBlock *BB, slpvectorizer::BoUpSLP &R);		bool vectorizeChainsInBlock(BasicBlock *BB, slpvectorizer::BoUpSLP &R);

bool vectorizeStoreChain(ArrayRef<Value *> Chain, slpvectorizer::BoUpSLP &R,		bool vectorizeStoreChain(ArrayRef<Value *> Chain, slpvectorizer::BoUpSLP &R,
unsigned VecRegSize);		unsigned VecRegSize);

bool vectorizeStores(ArrayRef<StoreInst *> Stores, slpvectorizer::BoUpSLP &R);		bool vectorizeStores(ArrayRef<StoreInst *> Stores, slpvectorizer::BoUpSLP &R);

/// The store instructions in a basic block organized by base pointer.		/// The store instructions in a basic block organized by base pointer.
StoreListMap Stores;		StoreListMap Stores;

/// The getelementptr instructions in a basic block organized by base pointer.		/// The getelementptr instructions in a basic block organized by base pointer.
WeakTrackingVHListMap GEPs;		GEPListMap GEPs;
};		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_TRANSFORMS_VECTORIZE_SLPVECTORIZER_H		#endif // LLVM_TRANSFORMS_VECTORIZE_SLPVECTORIZER_H

llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp

[... 1,115 lines not shown ...]	LLVM_DUMP_METHOD raw_ostream &print(raw_ostream &OS) const {
return OS;		return OS;
}		}

/// Debug print.		/// Debug print.
LLVM_DUMP_METHOD void dump() const { print(dbgs()); }		LLVM_DUMP_METHOD void dump() const { print(dbgs()); }
#endif		#endif
};		};

		/// Checks if the instruction is marked for deletion.
		bool isDeleted(Instruction *I) const { return DeletedInstructions.count(I); }

		/// Marks values for later deletion.
		void eraseInstructions(ArrayRef<Value *> AV);

		~BoUpSLP();

private:		private:
/// Checks if all users of \p I are the part of the vectorization tree.		/// Checks if all users of \p I are the part of the vectorization tree.
bool areAllUsersVectorized(Instruction *I) const;		bool areAllUsersVectorized(Instruction *I) const;

/// \returns the cost of the vectorizable entry.		/// \returns the cost of the vectorizable entry.
int getEntryCost(TreeEntry *E);		int getEntryCost(TreeEntry *E);

/// This is the recursive part of buildTree.		/// This is the recursive part of buildTree.
[... 354 lines not shown ...]	#endif

/// Removes an instruction from its block and eventually deletes it.		/// Removes an instruction from its block and eventually deletes it.
/// It's like Instruction::eraseFromParent() except that the actual deletion		/// It's like Instruction::eraseFromParent() except that the actual deletion
/// is delayed until BoUpSLP is destructed.		/// is delayed until BoUpSLP is destructed.
/// This is required to ensure that there are no incorrect collisions in the		/// This is required to ensure that there are no incorrect collisions in the
/// AliasCache, which can happen if a new instruction is allocated at the		/// AliasCache, which can happen if a new instruction is allocated at the
/// same address as a previously deleted instruction.		/// same address as a previously deleted instruction.
void eraseInstruction(Instruction *I) {		void eraseInstruction(Instruction *I) {
I->removeFromParent();		DeletedInstructions.insert(I);
I->dropAllReferences();
DeletedInstructions.emplace_back(I);
}		}

/// Temporary store for deleted instructions. Instructions will be deleted		/// Temporary store for deleted instructions. Instructions will be deleted
/// eventually when the BoUpSLP is destructed.		/// eventually when the BoUpSLP is destructed.
SmallVector<unique_value, 8> DeletedInstructions;		SmallPtrSet<Instruction *, 8> DeletedInstructions;

/// A list of values that need to extracted out of the tree.		/// A list of values that need to extracted out of the tree.
/// This list holds pairs of (Internal Scalar : External User). External User		/// This list holds pairs of (Internal Scalar : External User). External User
/// can be nullptr, it means that this Internal Scalar will be used later,		/// can be nullptr, it means that this Internal Scalar will be used later,
/// after vectorization.		/// after vectorization.
UserList ExternalUses;		UserList ExternalUses;

/// Values used only by @llvm.assume calls.		/// Values used only by @llvm.assume calls.
[... 540 lines not shown ...]	static std::string getNodeAttributes(const TreeEntry *Entry,
if (Entry->NeedToGather)		if (Entry->NeedToGather)
return "color=red";		return "color=red";
return "";		return "";
}		}
};		};

} // end namespace llvm		} // end namespace llvm

		BoUpSLP::~BoUpSLP() {
		for (auto *I : DeletedInstructions)
		I->dropAllReferences();
		for (auto *I : DeletedInstructions) {
		assert(I->use_empty() && "trying to erase instruction with users.");
		I->eraseFromParent();
		}
		}

		void BoUpSLP::eraseInstructions(ArrayRef<Value *> AV) {
		for (auto *V : AV) {
		if (auto *I = dyn_cast<Instruction>(V))
		eraseInstruction(I);
		};
		}

void BoUpSLP::buildTree(ArrayRef<Value *> Roots,		void BoUpSLP::buildTree(ArrayRef<Value *> Roots,
ArrayRef<Value *> UserIgnoreLst) {		ArrayRef<Value *> UserIgnoreLst) {
ExtraValueToDebugLocsMap ExternallyUsedValues;		ExtraValueToDebugLocsMap ExternallyUsedValues;
buildTree(Roots, ExternallyUsedValues, UserIgnoreLst);		buildTree(Roots, ExternallyUsedValues, UserIgnoreLst);
}		}

void BoUpSLP::buildTree(ArrayRef<Value *> Roots,		void BoUpSLP::buildTree(ArrayRef<Value *> Roots,
ExtraValueToDebugLocsMap &ExternallyUsedValues,		ExtraValueToDebugLocsMap &ExternallyUsedValues,
[... 1,470 lines not shown ...]	void BoUpSLP::setInsertPointAfterBundle(TreeEntry *E) {
Builder.SetCurrentDebugLocation(Front->getDebugLoc());		Builder.SetCurrentDebugLocation(Front->getDebugLoc());
}		}

Value *BoUpSLP::Gather(ArrayRef<Value *> VL, VectorType *Ty) {		Value *BoUpSLP::Gather(ArrayRef<Value *> VL, VectorType *Ty) {
Value *Vec = UndefValue::get(Ty);		Value *Vec = UndefValue::get(Ty);
// Generate the 'InsertElement' instruction.		// Generate the 'InsertElement' instruction.
for (unsigned i = 0; i < Ty->getNumElements(); ++i) {		for (unsigned i = 0; i < Ty->getNumElements(); ++i) {
Vec = Builder.CreateInsertElement(Vec, VL[i], Builder.getInt32(i));		Vec = Builder.CreateInsertElement(Vec, VL[i], Builder.getInt32(i));
if (Instruction *Insrt = dyn_cast<Instruction>(Vec)) {		if (auto *Insrt = dyn_cast<InsertElementInst>(Vec)) {
GatherSeq.insert(Insrt);		GatherSeq.insert(Insrt);
CSEBlocks.insert(Insrt->getParent());		CSEBlocks.insert(Insrt->getParent());

// Add to our 'need-to-extract' list.		// Add to our 'need-to-extract' list.
if (TreeEntry *E = getTreeEntry(VL[i])) {		if (TreeEntry *E = getTreeEntry(VL[i])) {
// Find which lane we need to extract.		// Find which lane we need to extract.
int FoundLane = -1;		int FoundLane = -1;
for (unsigned Lane = 0, LE = E->Scalars.size(); Lane != LE; ++Lane) {		for (unsigned Lane = 0, LE = E->Scalars.size(); Lane != LE; ++Lane) {
[... 732 lines not shown ...]	if (Entry->NeedToGather)
continue;		continue;

assert(Entry->VectorizedValue && "Can't find vectorizable value");		assert(Entry->VectorizedValue && "Can't find vectorizable value");

// For each lane:		// For each lane:
for (int Lane = 0, LE = Entry->Scalars.size(); Lane != LE; ++Lane) {		for (int Lane = 0, LE = Entry->Scalars.size(); Lane != LE; ++Lane) {
Value *Scalar = Entry->Scalars[Lane];		Value *Scalar = Entry->Scalars[Lane];

		#ifndef NDEBUG
Type *Ty = Scalar->getType();		Type *Ty = Scalar->getType();
if (!Ty->isVoidTy()) {		if (!Ty->isVoidTy()) {
#ifndef NDEBUG
for (User *U : Scalar->users()) {		for (User *U : Scalar->users()) {
LLVM_DEBUG(dbgs() << "SLP: \tvalidating user:" << *U << ".\n");		LLVM_DEBUG(dbgs() << "SLP: \tvalidating user:" << *U << ".\n");

// It is legal to replace users in the ignorelist by undef.		// It is legal to delete users in the ignorelist.
assert((getTreeEntry(U) || is_contained(UserIgnoreList, U)) &&		assert((getTreeEntry(U) || is_contained(UserIgnoreList, U)) &&
"Replacing out-of-tree value with undef");		"Deleting out-of-tree value");
}		}
#endif
Value *Undef = UndefValue::get(Ty);
Scalar->replaceAllUsesWith(Undef);
}		}
		#endif
LLVM_DEBUG(dbgs() << "SLP: \tErasing scalar:" << *Scalar << ".\n");		LLVM_DEBUG(dbgs() << "SLP: \tErasing scalar:" << *Scalar << ".\n");
eraseInstruction(cast<Instruction>(Scalar));		eraseInstruction(cast<Instruction>(Scalar));
}		}
}		}

Builder.ClearInsertionPoint();		Builder.ClearInsertionPoint();

return VectorizableTree[0]->VectorizedValue;		return VectorizableTree[0]->VectorizedValue;
}		}

void BoUpSLP::optimizeGatherSequence() {		void BoUpSLP::optimizeGatherSequence() {
LLVM_DEBUG(dbgs() << "SLP: Optimizing " << GatherSeq.size()		LLVM_DEBUG(dbgs() << "SLP: Optimizing " << GatherSeq.size()
<< " gather sequences instructions.\n");		<< " gather sequences instructions.\n");
// LICM InsertElementInst sequences.		// LICM InsertElementInst sequences.
for (Instruction *I : GatherSeq) {		for (Instruction *I : GatherSeq) {
if (!isa<InsertElementInst>(I) && !isa<ShuffleVectorInst>(I))		if (isDeleted(I))
continue;		continue;

// Check if this block is inside a loop.		// Check if this block is inside a loop.
Loop *L = LI->getLoopFor(I->getParent());		Loop *L = LI->getLoopFor(I->getParent());
if (!L)		if (!L)
continue;		continue;

// Check if it has a preheader.		// Check if it has a preheader.
[... 37 lines not shown ...]	void BoUpSLP::optimizeGatherSequence() {
SmallVector<Instruction *, 16> Visited;		SmallVector<Instruction *, 16> Visited;
for (auto I = CSEWorkList.begin(), E = CSEWorkList.end(); I != E; ++I) {		for (auto I = CSEWorkList.begin(), E = CSEWorkList.end(); I != E; ++I) {
assert((I == CSEWorkList.begin() || !DT->dominates(*I, *std::prev(I))) &&		assert((I == CSEWorkList.begin() || !DT->dominates(*I, *std::prev(I))) &&
"Worklist not sorted properly!");		"Worklist not sorted properly!");
BasicBlock *BB = (*I)->getBlock();		BasicBlock *BB = (*I)->getBlock();
// For all instructions in blocks containing gather sequences:		// For all instructions in blocks containing gather sequences:
for (BasicBlock::iterator it = BB->begin(), e = BB->end(); it != e;) {		for (BasicBlock::iterator it = BB->begin(), e = BB->end(); it != e;) {
Instruction *In = &*it++;		Instruction *In = &*it++;
		if (isDeleted(In))
		continue;
if (!isa<InsertElementInst>(In) && !isa<ExtractElementInst>(In))		if (!isa<InsertElementInst>(In) && !isa<ExtractElementInst>(In))
continue;		continue;

// Check if we can replace this instruction with any of the		// Check if we can replace this instruction with any of the
// visited instructions.		// visited instructions.
for (Instruction *v : Visited) {		for (Instruction *v : Visited) {
if (In->isIdenticalTo(v) &&		if (In->isIdenticalTo(v) &&
DT->dominates(v->getParent(), In->getParent())) {		DT->dominates(v->getParent(), In->getParent())) {
[... 866 lines not shown ...]	bool SLPVectorizerPass::runImpl(Function &F, ScalarEvolution *SE_,
if (Changed) {		if (Changed) {
R.optimizeGatherSequence();		R.optimizeGatherSequence();
LLVM_DEBUG(dbgs() << "SLP: vectorized \"" << F.getName() << "\"\n");		LLVM_DEBUG(dbgs() << "SLP: vectorized \"" << F.getName() << "\"\n");
LLVM_DEBUG(verifyFunction(F));		LLVM_DEBUG(verifyFunction(F));
}		}
return Changed;		return Changed;
}		}

/// Check that the Values in the slice in VL array are still existent in
/// the WeakTrackingVH array.
/// Vectorization of part of the VL array may cause later values in the VL array
/// to become invalid. We track when this has happened in the WeakTrackingVH
/// array.
static bool hasValueBeenRAUWed(ArrayRef<Value *> VL,
ArrayRef<WeakTrackingVH> VH, unsigned SliceBegin,
unsigned SliceSize) {
VL = VL.slice(SliceBegin, SliceSize);
VH = VH.slice(SliceBegin, SliceSize);
return !std::equal(VL.begin(), VL.end(), VH.begin());
}

bool SLPVectorizerPass::vectorizeStoreChain(ArrayRef<Value *> Chain, BoUpSLP &R,		bool SLPVectorizerPass::vectorizeStoreChain(ArrayRef<Value *> Chain, BoUpSLP &R,
unsigned VecRegSize) {		unsigned VecRegSize) {
const unsigned ChainLen = Chain.size();		const unsigned ChainLen = Chain.size();
LLVM_DEBUG(dbgs() << "SLP: Analyzing a store chain of length " << ChainLen		LLVM_DEBUG(dbgs() << "SLP: Analyzing a store chain of length " << ChainLen
<< "\n");		<< "\n");
const unsigned Sz = R.getVectorElementSize(Chain[0]);		const unsigned Sz = R.getVectorElementSize(Chain[0]);
const unsigned VF = VecRegSize / Sz;		const unsigned VF = VecRegSize / Sz;

if (!isPowerOf2_32(Sz) || VF < 2)		if (!isPowerOf2_32(Sz) || VF < 2)
return false;		return false;

// Keep track of values that were deleted by vectorizing in the loop below.
const SmallVector<WeakTrackingVH, 8> TrackValues(Chain.begin(), Chain.end());

bool Changed = false;		bool Changed = false;
// Look for profitable vectorizable trees at all offsets, starting at zero.		// Look for profitable vectorizable trees at all offsets, starting at zero.
for (unsigned i = 0, e = ChainLen; i + VF <= e; ++i) {		for (unsigned i = 0, e = ChainLen; i + VF <= e; ++i) {

		ArrayRef<Value *> Operands = Chain.slice(i, VF);
// Check that a previous iteration of this loop did not delete the Value.		// Check that a previous iteration of this loop did not delete the Value.
if (hasValueBeenRAUWed(Chain, TrackValues, i, VF))		if (llvm::any_of(Operands, [&R](Value *V) {
		auto *I = dyn_cast<Instruction>(V);
		return I && R.isDeleted(I);
		}))
continue;		continue;

LLVM_DEBUG(dbgs() << "SLP: Analyzing " << VF << " stores at offset " << i		LLVM_DEBUG(dbgs() << "SLP: Analyzing " << VF << " stores at offset " << i
<< "\n");		<< "\n");
ArrayRef<Value *> Operands = Chain.slice(i, VF);

R.buildTree(Operands);		R.buildTree(Operands);
if (R.isTreeTinyAndNotFullyVectorizable())		if (R.isTreeTinyAndNotFullyVectorizable())
continue;		continue;

R.computeMinimumValueSizes();		R.computeMinimumValueSizes();

int Cost = R.getTreeCost();		int Cost = R.getTreeCost();
[... 175 lines not shown ...]	if (!isValidElementType(Ty)) {
return false;		return false;
}		}
}		}

bool Changed = false;		bool Changed = false;
bool CandidateFound = false;		bool CandidateFound = false;
int MinCost = SLPCostThreshold;		int MinCost = SLPCostThreshold;

// Keep track of values that were deleted by vectorizing in the loop below.
SmallVector<WeakTrackingVH, 8> TrackValues(VL.begin(), VL.end());

unsigned NextInst = 0, MaxInst = VL.size();		unsigned NextInst = 0, MaxInst = VL.size();
for (unsigned VF = MaxVF; NextInst + 1 < MaxInst && VF >= MinVF; VF /= 2) {		for (unsigned VF = MaxVF; NextInst + 1 < MaxInst && VF >= MinVF; VF /= 2) {
// No actual vectorization should happen, if number of parts is the same as		// No actual vectorization should happen, if number of parts is the same as
// provided vectorization factor (i.e. the scalar type is used for vector		// provided vectorization factor (i.e. the scalar type is used for vector
// code during codegen).		// code during codegen).
auto *VecTy = VectorType::get(VL[0]->getType(), VF);		auto *VecTy = VectorType::get(VL[0]->getType(), VF);
if (TTI->getNumberOfParts(VecTy) == VF)		if (TTI->getNumberOfParts(VecTy) == VF)
continue;		continue;
for (unsigned I = NextInst; I < MaxInst; ++I) {		for (unsigned I = NextInst; I < MaxInst; ++I) {
unsigned OpsWidth = 0;		unsigned OpsWidth = 0;

if (I + VF > MaxInst)		if (I + VF > MaxInst)
OpsWidth = MaxInst - I;		OpsWidth = MaxInst - I;
else		else
OpsWidth = VF;		OpsWidth = VF;

if (!isPowerOf2_32(OpsWidth) || OpsWidth < 2)		if (!isPowerOf2_32(OpsWidth) || OpsWidth < 2)
break;		break;

		ArrayRef<Value *> Ops = VL.slice(I, OpsWidth);
// Check that a previous iteration of this loop did not delete the Value.		// Check that a previous iteration of this loop did not delete the Value.
if (hasValueBeenRAUWed(VL, TrackValues, I, OpsWidth))		if (llvm::any_of(Ops, [&R](Value *V) {
		auto *I = dyn_cast<Instruction>(V);
		return I && R.isDeleted(I);
		}))
continue;		continue;

LLVM_DEBUG(dbgs() << "SLP: Analyzing " << OpsWidth << " operations "		LLVM_DEBUG(dbgs() << "SLP: Analyzing " << OpsWidth << " operations "
<< "\n");		<< "\n");
ArrayRef<Value *> Ops = VL.slice(I, OpsWidth);

R.buildTree(Ops);		R.buildTree(Ops);
Optional<ArrayRef<unsigned>> Order = R.bestOrder();		Optional<ArrayRef<unsigned>> Order = R.bestOrder();
// TODO: check if we can allow reordering for more cases.		// TODO: check if we can allow reordering for more cases.
if (AllowReorder && Order) {		if (AllowReorder && Order) {
// TODO: reorder tree nodes without tree rebuilding.		// TODO: reorder tree nodes without tree rebuilding.
// Conceptually, there is nothing actually preventing us from trying to		// Conceptually, there is nothing actually preventing us from trying to
// reorder a larger list. In fact, we do exactly this when vectorizing		// reorder a larger list. In fact, we do exactly this when vectorizing
[... 204 lines not shown ...]	Value *createOp(IRBuilder<> &Builder, const Twine &Name) const {
Value *Cmp = nullptr;		Value *Cmp = nullptr;
switch (Kind) {		switch (Kind) {
case RK_Arithmetic:		case RK_Arithmetic:
return Builder.CreateBinOp((Instruction::BinaryOps)Opcode, LHS, RHS,		return Builder.CreateBinOp((Instruction::BinaryOps)Opcode, LHS, RHS,
Name);		Name);
case RK_Min:		case RK_Min:
Cmp = Opcode == Instruction::ICmp ? Builder.CreateICmpSLT(LHS, RHS)		Cmp = Opcode == Instruction::ICmp ? Builder.CreateICmpSLT(LHS, RHS)
: Builder.CreateFCmpOLT(LHS, RHS);		: Builder.CreateFCmpOLT(LHS, RHS);
break;		return Builder.CreateSelect(Cmp, LHS, RHS, Name);
case RK_Max:		case RK_Max:
Cmp = Opcode == Instruction::ICmp ? Builder.CreateICmpSGT(LHS, RHS)		Cmp = Opcode == Instruction::ICmp ? Builder.CreateICmpSGT(LHS, RHS)
: Builder.CreateFCmpOGT(LHS, RHS);		: Builder.CreateFCmpOGT(LHS, RHS);
break;		return Builder.CreateSelect(Cmp, LHS, RHS, Name);
case RK_UMin:		case RK_UMin:
assert(Opcode == Instruction::ICmp && "Expected integer types.");		assert(Opcode == Instruction::ICmp && "Expected integer types.");
Cmp = Builder.CreateICmpULT(LHS, RHS);		Cmp = Builder.CreateICmpULT(LHS, RHS);
break;		return Builder.CreateSelect(Cmp, LHS, RHS, Name);
case RK_UMax:		case RK_UMax:
assert(Opcode == Instruction::ICmp && "Expected integer types.");		assert(Opcode == Instruction::ICmp && "Expected integer types.");
Cmp = Builder.CreateICmpUGT(LHS, RHS);		Cmp = Builder.CreateICmpUGT(LHS, RHS);
break;		return Builder.CreateSelect(Cmp, LHS, RHS, Name);
case RK_None:		case RK_None:
llvm_unreachable("Unknown reduction operation.");		break;
}		}
return Builder.CreateSelect(Cmp, LHS, RHS, Name);		llvm_unreachable("Unknown reduction operation.");
}		}

public:		public:
explicit OperationData() = default;		explicit OperationData() = default;

/// Construction for reduced values. They are identified by opcode only and		/// Construction for reduced values. They are identified by opcode only and
/// don't have associated LHS/RHS values.		/// don't have associated LHS/RHS values.
explicit OperationData(Value *V) {		explicit OperationData(Value *V) {
[... 663 lines not shown ...]	if (VectorizedTree) {
OperationData VectReductionData(ReductionData.getOpcode(),		OperationData VectReductionData(ReductionData.getOpcode(),
VectorizedTree, Pair.first,		VectorizedTree, Pair.first,
ReductionData.getKind());		ReductionData.getKind());
VectorizedTree = VectReductionData.createOp(Builder, "op.extra", I);		VectorizedTree = VectReductionData.createOp(Builder, "op.extra", I);
}		}
}		}
// Update users.		// Update users.
ReductionRoot->replaceAllUsesWith(VectorizedTree);		ReductionRoot->replaceAllUsesWith(VectorizedTree);
		// Mark all scalar reduction ops for deletion, they are replaced by the
		// vector reductions.
		V.eraseInstructions(IgnoreList);
}		}
return VectorizedTree != nullptr;		return VectorizedTree != nullptr;
}		}

unsigned numReductionValues() const {		unsigned numReductionValues() const {
return ReducedVals.size();		return ReducedVals.size();
}		}

[... 238 lines not shown ...]	static bool tryToVectorizeHorReductionOrInstOperands(
// found, try to vectorize it. If it is not a horizontal reduction or		// found, try to vectorize it. If it is not a horizontal reduction or
// vectorization is not possible or not effective, and currently analyzed		// vectorization is not possible or not effective, and currently analyzed
// instruction is a binary operation, try to vectorize the operands, using		// instruction is a binary operation, try to vectorize the operands, using
// pre-order DFS traversal order. If the operands were not vectorized, repeat		// pre-order DFS traversal order. If the operands were not vectorized, repeat
// the same procedure considering each operand as a possible root of the		// the same procedure considering each operand as a possible root of the
// horizontal reduction.		// horizontal reduction.
// Interrupt the process if the Root instruction itself was vectorized or all		// Interrupt the process if the Root instruction itself was vectorized or all
// sub-trees not higher that RecursionMaxDepth were analyzed/vectorized.		// sub-trees not higher that RecursionMaxDepth were analyzed/vectorized.
SmallVector<std::pair<WeakTrackingVH, unsigned>, 8> Stack(1, {Root, 0});		SmallVector<std::pair<Instruction *, unsigned>, 8> Stack(1, {Root, 0});
SmallPtrSet<Value *, 8> VisitedInstrs;		SmallPtrSet<Value *, 8> VisitedInstrs;
bool Res = false;		bool Res = false;
while (!Stack.empty()) {		while (!Stack.empty()) {
Value *V;		Instruction *Inst;
unsigned Level;		unsigned Level;
std::tie(V, Level) = Stack.pop_back_val();		std::tie(Inst, Level) = Stack.pop_back_val();
if (!V)
continue;
auto *Inst = dyn_cast<Instruction>(V);
if (!Inst)
continue;
auto *BI = dyn_cast<BinaryOperator>(Inst);		auto *BI = dyn_cast<BinaryOperator>(Inst);
auto *SI = dyn_cast<SelectInst>(Inst);		auto *SI = dyn_cast<SelectInst>(Inst);
if (BI || SI) {		if (BI || SI) {
HorizontalReduction HorRdx;		HorizontalReduction HorRdx;
if (HorRdx.matchAssociativeReduction(P, Inst)) {		if (HorRdx.matchAssociativeReduction(P, Inst)) {
if (HorRdx.tryToReduce(R, TTI)) {		if (HorRdx.tryToReduce(R, TTI)) {
Res = true;		Res = true;
// Set P to nullptr to avoid re-analysis of phi node in		// Set P to nullptr to avoid re-analysis of phi node in
[... 24 lines not shown ...]	while (!Stack.empty()) {

// Try to vectorize operands.		// Try to vectorize operands.
// Continue analysis for the instruction from the same basic block only to		// Continue analysis for the instruction from the same basic block only to
// save compile time.		// save compile time.
if (++Level < RecursionMaxDepth)		if (++Level < RecursionMaxDepth)
for (auto *Op : Inst->operand_values())		for (auto *Op : Inst->operand_values())
if (VisitedInstrs.insert(Op).second)		if (VisitedInstrs.insert(Op).second)
if (auto *I = dyn_cast<Instruction>(Op))		if (auto *I = dyn_cast<Instruction>(Op))
if (!isa<PHINode>(I) && I->getParent() == BB)		if (!isa<PHINode>(I) && !R.isDeleted(I) && I->getParent() == BB)
Stack.emplace_back(Op, Level);		Stack.emplace_back(I, Level);
}		}
return Res;		return Res;
}		}

bool SLPVectorizerPass::vectorizeRootInstruction(PHINode *P, Value *V,		bool SLPVectorizerPass::vectorizeRootInstruction(PHINode *P, Value *V,
BasicBlock *BB, BoUpSLP &R,		BasicBlock *BB, BoUpSLP &R,
TargetTransformInfo *TTI) {		TargetTransformInfo *TTI) {
if (!V)		if (!V)
[... 52 lines not shown ...]	bool SLPVectorizerPass::vectorizeCmpInst(CmpInst *CI, BasicBlock *BB,
for (int Idx = 0; Idx < 2; ++Idx) {		for (int Idx = 0; Idx < 2; ++Idx) {
OpsChanged |=		OpsChanged |=
vectorizeRootInstruction(nullptr, CI->getOperand(Idx), BB, R, TTI);		vectorizeRootInstruction(nullptr, CI->getOperand(Idx), BB, R, TTI);
}		}
return OpsChanged;		return OpsChanged;
}		}

bool SLPVectorizerPass::vectorizeSimpleInstructions(		bool SLPVectorizerPass::vectorizeSimpleInstructions(
SmallVectorImpl<WeakVH> &Instructions, BasicBlock *BB, BoUpSLP &R) {		SmallVectorImpl<Instruction *> &Instructions, BasicBlock *BB, BoUpSLP &R) {
bool OpsChanged = false;		bool OpsChanged = false;
for (auto &VH : reverse(Instructions)) {		for (auto *I : reverse(Instructions)) {
auto *I = dyn_cast_or_null<Instruction>(VH);		if (R.isDeleted(I))
if (!I)
continue;		continue;
if (auto *LastInsertValue = dyn_cast<InsertValueInst>(I))		if (auto *LastInsertValue = dyn_cast<InsertValueInst>(I))
OpsChanged |= vectorizeInsertValueInst(LastInsertValue, BB, R);		OpsChanged |= vectorizeInsertValueInst(LastInsertValue, BB, R);
else if (auto *LastInsertElem = dyn_cast<InsertElementInst>(I))		else if (auto *LastInsertElem = dyn_cast<InsertElementInst>(I))
OpsChanged |= vectorizeInsertElementInst(LastInsertElem, BB, R);		OpsChanged |= vectorizeInsertElementInst(LastInsertElem, BB, R);
else if (auto *CI = dyn_cast<CmpInst>(I))		else if (auto *CI = dyn_cast<CmpInst>(I))
OpsChanged |= vectorizeCmpInst(CI, BB, R);		OpsChanged |= vectorizeCmpInst(CI, BB, R);
}		}
[... 12 lines not shown ...]	while (HaveVectorizedPhiNodes) {

// Collect the incoming values from the PHIs.		// Collect the incoming values from the PHIs.
Incoming.clear();		Incoming.clear();
for (Instruction &I : *BB) {		for (Instruction &I : *BB) {
PHINode *P = dyn_cast<PHINode>(&I);		PHINode *P = dyn_cast<PHINode>(&I);
if (!P)		if (!P)
break;		break;

if (!VisitedInstrs.count(P))		if (!VisitedInstrs.count(P) && !R.isDeleted(P))
Incoming.push_back(P);		Incoming.push_back(P);
}		}

// Sort by type.		// Sort by type.
llvm::stable_sort(Incoming, PhiTypeSorterFunc);		llvm::stable_sort(Incoming, PhiTypeSorterFunc);

// Try to vectorize elements base on their type.		// Try to vectorize elements base on their type.
for (SmallVector<Value *, 4>::iterator IncIt = Incoming.begin(),		for (SmallVector<Value *, 4>::iterator IncIt = Incoming.begin(),
[... 27 lines not shown ...]	for (SmallVector<Value *, 4>::iterator IncIt = Incoming.begin(),

// Start over at the next instruction of a different type (or the end).		// Start over at the next instruction of a different type (or the end).
IncIt = SameTypeIt;		IncIt = SameTypeIt;
}		}
}		}

VisitedInstrs.clear();		VisitedInstrs.clear();

SmallVector<WeakVH, 8> PostProcessInstructions;		SmallVector<Instruction *, 8> PostProcessInstructions;
SmallDenseSet<Instruction *, 4> KeyNodes;		SmallDenseSet<Instruction *, 4> KeyNodes;
for (BasicBlock::iterator it = BB->begin(), e = BB->end(); it != e; ++it) {		for (BasicBlock::iterator it = BB->begin(), e = BB->end(); it != e; ++it) {
		// Skip instructions marked for the deletion.
		if (R.isDeleted(&*it))
		continue;
// We may go through BB multiple times so skip the one we have checked.		// We may go through BB multiple times so skip the one we have checked.
if (!VisitedInstrs.insert(&*it).second) {		if (!VisitedInstrs.insert(&*it).second) {
if (it->use_empty() && KeyNodes.count(&*it) > 0 &&		if (it->use_empty() && KeyNodes.count(&*it) > 0 &&
vectorizeSimpleInstructions(PostProcessInstructions, BB, R)) {		vectorizeSimpleInstructions(PostProcessInstructions, BB, R)) {
// We would like to start over since some instructions are deleted		// We would like to start over since some instructions are deleted
// and the iterator may become invalid value.		// and the iterator may become invalid value.
Changed = true;		Changed = true;
it = BB->begin();		it = BB->begin();
[... 76 lines not shown ...]	for (unsigned BI = 0, BE = Entry.second.size(); BI < BE; BI += 16) {

// Initialize a set a candidate getelementptrs. Note that we use a		// Initialize a set a candidate getelementptrs. Note that we use a
// SetVector here to preserve program order. If the index computations		// SetVector here to preserve program order. If the index computations
// are vectorizable and begin with loads, we want to minimize the chance		// are vectorizable and begin with loads, we want to minimize the chance
// of having to reorder them later.		// of having to reorder them later.
SetVector<Value *> Candidates(GEPList.begin(), GEPList.end());		SetVector<Value *> Candidates(GEPList.begin(), GEPList.end());

// Some of the candidates may have already been vectorized after we		// Some of the candidates may have already been vectorized after we
// initially collected them. If so, the WeakTrackingVHs will have		// initially collected them. If so, they are marked as deleted, so remove
// nullified the		// them from the set of candidates.
// values, so remove them from the set of candidates.		Candidates.remove_if(
Candidates.remove(nullptr);		[&R](Value *I) { return R.isDeleted(cast<Instruction>(I)); });

// Remove from the set of candidates all pairs of getelementptrs with		// Remove from the set of candidates all pairs of getelementptrs with
// constant differences. Such getelementptrs are likely not good		// constant differences. Such getelementptrs are likely not good
// candidates for vectorization in a bottom-up phase since one can be		// candidates for vectorization in a bottom-up phase since one can be
// computed from the other. We also ensure all candidate getelementptr		// computed from the other. We also ensure all candidate getelementptr
// indices are unique.		// indices are unique.
for (int I = 0, E = GEPList.size(); I < E && Candidates.size() > 1; ++I) {		for (int I = 0, E = GEPList.size(); I < E && Candidates.size() > 1; ++I) {
auto *GEPI = cast<GetElementPtrInst>(GEPList[I]);		auto *GEPI = GEPList[I];
if (!Candidates.count(GEPI))		if (!Candidates.count(GEPI))
continue;		continue;
auto *SCEVI = SE->getSCEV(GEPList[I]);		auto *SCEVI = SE->getSCEV(GEPList[I]);
for (int J = I + 1; J < E && Candidates.size() > 1; ++J) {		for (int J = I + 1; J < E && Candidates.size() > 1; ++J) {
auto *GEPJ = cast<GetElementPtrInst>(GEPList[J]);		auto *GEPJ = GEPList[J];
auto *SCEVJ = SE->getSCEV(GEPList[J]);		auto *SCEVJ = SE->getSCEV(GEPList[J]);
if (isa<SCEVConstant>(SE->getMinusSCEV(SCEVI, SCEVJ))) {		if (isa<SCEVConstant>(SE->getMinusSCEV(SCEVI, SCEVJ))) {
Candidates.remove(GEPList[I]);		Candidates.remove(GEPI);
Candidates.remove(GEPList[J]);		Candidates.remove(GEPJ);
} else if (GEPI->idx_begin()->get() == GEPJ->idx_begin()->get()) {		} else if (GEPI->idx_begin()->get() == GEPJ->idx_begin()->get()) {
Candidates.remove(GEPList[J]);		Candidates.remove(GEPJ);
}		}
}		}
}		}

// We break out of the above computation as soon as we know there are		// We break out of the above computation as soon as we know there are
// fewer than two candidates remaining.		// fewer than two candidates remaining.
if (Candidates.size() < 2)		if (Candidates.size() < 2)
continue;		continue;
[... 66 lines not shown ...]
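
The hunks above repeatedly swap WeakTrackingVH/WeakVH bookkeeping for explicit `R.isDeleted(I)` checks, with the actual erasure postponed until the `BoUpSLP` object goes away. The following is only a minimal standalone sketch of that deferred-deletion idea, not the LLVM implementation: `Inst`, `InstPool`, `DeferredEraser`, `markDeleted`, and `countLive` are hypothetical names invented for illustration, and only `isDeleted` echoes a name that appears in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an IR instruction (not an LLVM type).
struct Inst {
  int Id;
};

using InstPool = std::vector<std::unique_ptr<Inst>>;

// Hypothetical helper: records instructions to drop and physically erases
// them only in its destructor, so raw pointers held by callers stay valid
// for as long as this object is alive.
class DeferredEraser {
  InstPool &Pool;
  std::unordered_set<const Inst *> Deleted;

public:
  explicit DeferredEraser(InstPool &P) : Pool(P) {}
  DeferredEraser(const DeferredEraser &) = delete;
  DeferredEraser &operator=(const DeferredEraser &) = delete;

  // Mark an instruction as dead without touching the pool yet.
  void markDeleted(const Inst *I) {
    assert(I && "cannot mark a null instruction");
    Deleted.insert(I);
  }

  bool isDeleted(const Inst *I) const { return Deleted.count(I) != 0; }

  // All marked instructions are removed from the pool here, in one sweep.
  ~DeferredEraser() {
    Pool.erase(std::remove_if(Pool.begin(), Pool.end(),
                              [this](const std::unique_ptr<Inst> &P) {
                                return Deleted.count(P.get()) != 0;
                              }),
               Pool.end());
  }
};

// Walks the pool the way the patched loops do: skip anything marked deleted.
inline std::size_t countLive(const InstPool &Pool, const DeferredEraser &E) {
  std::size_t N = 0;
  for (const auto &P : Pool)
    if (!E.isDeleted(P.get()))
      ++N;
  return N;
}
```

In this sketch a caller can keep plain `Inst *` pointers in its worklists (as the new side of the diff does with `Instruction *`) and filter them with `isDeleted`, because nothing is freed until the eraser is destroyed.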

llvm/trunk/test/Transforms/SLPVectorizer/AArch64/gather-root.ll

	[... 11 lines not shown ...]
	; DEFAULT-LABEL: @PR28330(			; DEFAULT-LABEL: @PR28330(
	; DEFAULT-NEXT: entry:			; DEFAULT-NEXT: entry:
	; DEFAULT-NEXT: [[TMP0:%.*]] = load <8 x i8>, <8 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <8 x i8>*), align 1			; DEFAULT-NEXT: [[TMP0:%.*]] = load <8 x i8>, <8 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <8 x i8>*), align 1
	; DEFAULT-NEXT: [[TMP1:%.*]] = icmp eq <8 x i8> [[TMP0]], zeroinitializer			; DEFAULT-NEXT: [[TMP1:%.*]] = icmp eq <8 x i8> [[TMP0]], zeroinitializer
	; DEFAULT-NEXT: br label [[FOR_BODY:%.*]]			; DEFAULT-NEXT: br label [[FOR_BODY:%.*]]
	; DEFAULT: for.body:			; DEFAULT: for.body:
	; DEFAULT-NEXT: [[P17:%.*]] = phi i32 [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; DEFAULT-NEXT: [[P17:%.*]] = phi i32 [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
	; DEFAULT-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i32> <i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720>, <8 x i32> <i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80>			; DEFAULT-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i32> <i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720>, <8 x i32> <i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80>
	; DEFAULT-NEXT: [[P20:%.*]] = add i32 [[P17]], undef
	; DEFAULT-NEXT: [[P22:%.*]] = add i32 [[P20]], undef
	; DEFAULT-NEXT: [[P24:%.*]] = add i32 [[P22]], undef
	; DEFAULT-NEXT: [[P26:%.*]] = add i32 [[P24]], undef
	; DEFAULT-NEXT: [[P28:%.*]] = add i32 [[P26]], undef
	; DEFAULT-NEXT: [[P30:%.*]] = add i32 [[P28]], undef
	; DEFAULT-NEXT: [[P32:%.*]] = add i32 [[P30]], undef
	; DEFAULT-NEXT: [[TMP3:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> [[TMP2]])			; DEFAULT-NEXT: [[TMP3:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> [[TMP2]])
	; DEFAULT-NEXT: [[OP_EXTRA]] = add i32 [[TMP3]], [[P17]]			; DEFAULT-NEXT: [[OP_EXTRA]] = add i32 [[TMP3]], [[P17]]
	; DEFAULT-NEXT: [[P34:%.*]] = add i32 [[P32]], undef
	; DEFAULT-NEXT: br label [[FOR_BODY]]			; DEFAULT-NEXT: br label [[FOR_BODY]]
	;			;
	; GATHER-LABEL: @PR28330(			; GATHER-LABEL: @PR28330(
	; GATHER-NEXT: entry:			; GATHER-NEXT: entry:
	; GATHER-NEXT: [[TMP0:%.*]] = load <8 x i8>, <8 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <8 x i8>*), align 1			; GATHER-NEXT: [[TMP0:%.*]] = load <8 x i8>, <8 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <8 x i8>*), align 1
	; GATHER-NEXT: [[TMP1:%.*]] = icmp eq <8 x i8> [[TMP0]], zeroinitializer			; GATHER-NEXT: [[TMP1:%.*]] = icmp eq <8 x i8> [[TMP0]], zeroinitializer
	; GATHER-NEXT: br label [[FOR_BODY:%.*]]			; GATHER-NEXT: br label [[FOR_BODY:%.*]]
	; GATHER: for.body:			; GATHER: for.body:
	; GATHER-NEXT: [[P17:%.*]] = phi i32 [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; GATHER-NEXT: [[P17:%.*]] = phi i32 [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
	; GATHER-NEXT: [[TMP2:%.*]] = extractelement <8 x i1> [[TMP1]], i32 0			; GATHER-NEXT: [[TMP2:%.*]] = extractelement <8 x i1> [[TMP1]], i32 7
	; GATHER-NEXT: [[TMP3:%.*]] = insertelement <8 x i1> undef, i1 [[TMP2]], i32 0			; GATHER-NEXT: [[TMP3:%.*]] = extractelement <8 x i1> [[TMP1]], i32 0
	; GATHER-NEXT: [[TMP4:%.*]] = extractelement <8 x i1> [[TMP1]], i32 1			; GATHER-NEXT: [[TMP4:%.*]] = insertelement <8 x i1> undef, i1 [[TMP3]], i32 0
	; GATHER-NEXT: [[TMP5:%.*]] = insertelement <8 x i1> [[TMP3]], i1 [[TMP4]], i32 1			; GATHER-NEXT: [[TMP5:%.*]] = extractelement <8 x i1> [[TMP1]], i32 1
	; GATHER-NEXT: [[TMP6:%.*]] = extractelement <8 x i1> [[TMP1]], i32 2			; GATHER-NEXT: [[TMP6:%.*]] = insertelement <8 x i1> [[TMP4]], i1 [[TMP5]], i32 1
	; GATHER-NEXT: [[TMP7:%.*]] = insertelement <8 x i1> [[TMP5]], i1 [[TMP6]], i32 2			; GATHER-NEXT: [[TMP7:%.*]] = extractelement <8 x i1> [[TMP1]], i32 2
	; GATHER-NEXT: [[TMP8:%.*]] = extractelement <8 x i1> [[TMP1]], i32 3			; GATHER-NEXT: [[TMP8:%.*]] = insertelement <8 x i1> [[TMP6]], i1 [[TMP7]], i32 2
	; GATHER-NEXT: [[TMP9:%.*]] = insertelement <8 x i1> [[TMP7]], i1 [[TMP8]], i32 3			; GATHER-NEXT: [[TMP9:%.*]] = extractelement <8 x i1> [[TMP1]], i32 3
	; GATHER-NEXT: [[TMP10:%.*]] = extractelement <8 x i1> [[TMP1]], i32 4			; GATHER-NEXT: [[TMP10:%.*]] = insertelement <8 x i1> [[TMP8]], i1 [[TMP9]], i32 3
	; GATHER-NEXT: [[TMP11:%.*]] = insertelement <8 x i1> [[TMP9]], i1 [[TMP10]], i32 4			; GATHER-NEXT: [[TMP11:%.*]] = extractelement <8 x i1> [[TMP1]], i32 4
	; GATHER-NEXT: [[TMP12:%.*]] = extractelement <8 x i1> [[TMP1]], i32 5			; GATHER-NEXT: [[TMP12:%.*]] = insertelement <8 x i1> [[TMP10]], i1 [[TMP11]], i32 4
	; GATHER-NEXT: [[TMP13:%.*]] = insertelement <8 x i1> [[TMP11]], i1 [[TMP12]], i32 5			; GATHER-NEXT: [[TMP13:%.*]] = extractelement <8 x i1> [[TMP1]], i32 5
	; GATHER-NEXT: [[TMP14:%.*]] = extractelement <8 x i1> [[TMP1]], i32 6			; GATHER-NEXT: [[TMP14:%.*]] = insertelement <8 x i1> [[TMP12]], i1 [[TMP13]], i32 5
	; GATHER-NEXT: [[TMP15:%.*]] = insertelement <8 x i1> [[TMP13]], i1 [[TMP14]], i32 6			; GATHER-NEXT: [[TMP15:%.*]] = extractelement <8 x i1> [[TMP1]], i32 6
	; GATHER-NEXT: [[TMP16:%.*]] = extractelement <8 x i1> [[TMP1]], i32 7			; GATHER-NEXT: [[TMP16:%.*]] = insertelement <8 x i1> [[TMP14]], i1 [[TMP15]], i32 6
	; GATHER-NEXT: [[TMP17:%.*]] = insertelement <8 x i1> [[TMP15]], i1 [[TMP16]], i32 7			; GATHER-NEXT: [[TMP17:%.*]] = insertelement <8 x i1> [[TMP16]], i1 [[TMP2]], i32 7
	; GATHER-NEXT: [[TMP18:%.*]] = select <8 x i1> [[TMP17]], <8 x i32> <i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720>, <8 x i32> <i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80>			; GATHER-NEXT: [[TMP18:%.*]] = select <8 x i1> [[TMP17]], <8 x i32> <i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720>, <8 x i32> <i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80>
	; GATHER-NEXT: [[TMP19:%.*]] = extractelement <8 x i32> [[TMP18]], i32 0			; GATHER-NEXT: [[TMP19:%.*]] = extractelement <8 x i32> [[TMP18]], i32 0
	; GATHER-NEXT: [[P20:%.*]] = add i32 [[P17]], [[TMP19]]
	; GATHER-NEXT: [[TMP20:%.*]] = extractelement <8 x i32> [[TMP18]], i32 1			; GATHER-NEXT: [[TMP20:%.*]] = extractelement <8 x i32> [[TMP18]], i32 1
	; GATHER-NEXT: [[P22:%.*]] = add i32 [[P20]], [[TMP20]]
	; GATHER-NEXT: [[TMP21:%.*]] = extractelement <8 x i32> [[TMP18]], i32 2			; GATHER-NEXT: [[TMP21:%.*]] = extractelement <8 x i32> [[TMP18]], i32 2
	; GATHER-NEXT: [[P24:%.*]] = add i32 [[P22]], [[TMP21]]
	; GATHER-NEXT: [[TMP22:%.*]] = extractelement <8 x i32> [[TMP18]], i32 3			; GATHER-NEXT: [[TMP22:%.*]] = extractelement <8 x i32> [[TMP18]], i32 3
	; GATHER-NEXT: [[P26:%.*]] = add i32 [[P24]], [[TMP22]]
	; GATHER-NEXT: [[TMP23:%.*]] = extractelement <8 x i32> [[TMP18]], i32 4			; GATHER-NEXT: [[TMP23:%.*]] = extractelement <8 x i32> [[TMP18]], i32 4
	; GATHER-NEXT: [[P28:%.*]] = add i32 [[P26]], [[TMP23]]
	; GATHER-NEXT: [[TMP24:%.*]] = extractelement <8 x i32> [[TMP18]], i32 5			; GATHER-NEXT: [[TMP24:%.*]] = extractelement <8 x i32> [[TMP18]], i32 5
	; GATHER-NEXT: [[P30:%.*]] = add i32 [[P28]], [[TMP24]]
	; GATHER-NEXT: [[TMP25:%.*]] = extractelement <8 x i32> [[TMP18]], i32 6			; GATHER-NEXT: [[TMP25:%.*]] = extractelement <8 x i32> [[TMP18]], i32 6
	; GATHER-NEXT: [[P32:%.*]] = add i32 [[P30]], [[TMP25]]
	; GATHER-NEXT: [[TMP26:%.*]] = insertelement <8 x i32> undef, i32 [[TMP19]], i32 0			; GATHER-NEXT: [[TMP26:%.*]] = insertelement <8 x i32> undef, i32 [[TMP19]], i32 0
	; GATHER-NEXT: [[TMP27:%.*]] = insertelement <8 x i32> [[TMP26]], i32 [[TMP20]], i32 1			; GATHER-NEXT: [[TMP27:%.*]] = insertelement <8 x i32> [[TMP26]], i32 [[TMP20]], i32 1
	; GATHER-NEXT: [[TMP28:%.*]] = insertelement <8 x i32> [[TMP27]], i32 [[TMP21]], i32 2			; GATHER-NEXT: [[TMP28:%.*]] = insertelement <8 x i32> [[TMP27]], i32 [[TMP21]], i32 2
	; GATHER-NEXT: [[TMP29:%.*]] = insertelement <8 x i32> [[TMP28]], i32 [[TMP22]], i32 3			; GATHER-NEXT: [[TMP29:%.*]] = insertelement <8 x i32> [[TMP28]], i32 [[TMP22]], i32 3
	; GATHER-NEXT: [[TMP30:%.*]] = insertelement <8 x i32> [[TMP29]], i32 [[TMP23]], i32 4			; GATHER-NEXT: [[TMP30:%.*]] = insertelement <8 x i32> [[TMP29]], i32 [[TMP23]], i32 4
	; GATHER-NEXT: [[TMP31:%.*]] = insertelement <8 x i32> [[TMP30]], i32 [[TMP24]], i32 5			; GATHER-NEXT: [[TMP31:%.*]] = insertelement <8 x i32> [[TMP30]], i32 [[TMP24]], i32 5
	; GATHER-NEXT: [[TMP32:%.*]] = insertelement <8 x i32> [[TMP31]], i32 [[TMP25]], i32 6			; GATHER-NEXT: [[TMP32:%.*]] = insertelement <8 x i32> [[TMP31]], i32 [[TMP25]], i32 6
	; GATHER-NEXT: [[TMP33:%.*]] = extractelement <8 x i32> [[TMP18]], i32 7			; GATHER-NEXT: [[TMP33:%.*]] = extractelement <8 x i32> [[TMP18]], i32 7
	; GATHER-NEXT: [[TMP34:%.*]] = insertelement <8 x i32> [[TMP32]], i32 [[TMP33]], i32 7			; GATHER-NEXT: [[TMP34:%.*]] = insertelement <8 x i32> [[TMP32]], i32 [[TMP33]], i32 7
	; GATHER-NEXT: [[TMP35:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> [[TMP34]])			; GATHER-NEXT: [[TMP35:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> [[TMP34]])
	; GATHER-NEXT: [[OP_EXTRA]] = add i32 [[TMP35]], [[P17]]			; GATHER-NEXT: [[OP_EXTRA]] = add i32 [[TMP35]], [[P17]]
	; GATHER-NEXT: [[P34:%.*]] = add i32 [[P32]], [[TMP33]]
	; GATHER-NEXT: br label [[FOR_BODY]]			; GATHER-NEXT: br label [[FOR_BODY]]
	;			;
	; MAX-COST-LABEL: @PR28330(			; MAX-COST-LABEL: @PR28330(
	; MAX-COST-NEXT: entry:			; MAX-COST-NEXT: entry:
	; MAX-COST-NEXT: [[P0:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1), align 1			; MAX-COST-NEXT: [[P0:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1), align 1
	; MAX-COST-NEXT: [[P1:%.*]] = icmp eq i8 [[P0]], 0			; MAX-COST-NEXT: [[P1:%.*]] = icmp eq i8 [[P0]], 0
	; MAX-COST-NEXT: [[P2:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 2), align 2			; MAX-COST-NEXT: [[P2:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 2), align 2
	; MAX-COST-NEXT: [[P3:%.*]] = icmp eq i8 [[P2]], 0			; MAX-COST-NEXT: [[P3:%.*]] = icmp eq i8 [[P2]], 0
	[... 74 lines not shown ...]
	; DEFAULT-LABEL: @PR32038(			; DEFAULT-LABEL: @PR32038(
	; DEFAULT-NEXT: entry:			; DEFAULT-NEXT: entry:
	; DEFAULT-NEXT: [[TMP0:%.*]] = load <8 x i8>, <8 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <8 x i8>*), align 1			; DEFAULT-NEXT: [[TMP0:%.*]] = load <8 x i8>, <8 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <8 x i8>*), align 1
	; DEFAULT-NEXT: [[TMP1:%.*]] = icmp eq <8 x i8> [[TMP0]], zeroinitializer			; DEFAULT-NEXT: [[TMP1:%.*]] = icmp eq <8 x i8> [[TMP0]], zeroinitializer
	; DEFAULT-NEXT: br label [[FOR_BODY:%.*]]			; DEFAULT-NEXT: br label [[FOR_BODY:%.*]]
	; DEFAULT: for.body:			; DEFAULT: for.body:
	; DEFAULT-NEXT: [[P17:%.*]] = phi i32 [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; DEFAULT-NEXT: [[P17:%.*]] = phi i32 [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
	; DEFAULT-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i32> <i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720>, <8 x i32> <i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80>			; DEFAULT-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i32> <i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720>, <8 x i32> <i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80>
	; DEFAULT-NEXT: [[P20:%.*]] = add i32 -5, undef
	; DEFAULT-NEXT: [[P22:%.*]] = add i32 [[P20]], undef
	; DEFAULT-NEXT: [[P24:%.*]] = add i32 [[P22]], undef
	; DEFAULT-NEXT: [[P26:%.*]] = add i32 [[P24]], undef
	; DEFAULT-NEXT: [[P28:%.*]] = add i32 [[P26]], undef
	; DEFAULT-NEXT: [[P30:%.*]] = add i32 [[P28]], undef
	; DEFAULT-NEXT: [[P32:%.*]] = add i32 [[P30]], undef
	; DEFAULT-NEXT: [[TMP3:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> [[TMP2]])			; DEFAULT-NEXT: [[TMP3:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> [[TMP2]])
	; DEFAULT-NEXT: [[OP_EXTRA]] = add i32 [[TMP3]], -5			; DEFAULT-NEXT: [[OP_EXTRA]] = add i32 [[TMP3]], -5
	; DEFAULT-NEXT: [[P34:%.*]] = add i32 [[P32]], undef
	; DEFAULT-NEXT: br label [[FOR_BODY]]			; DEFAULT-NEXT: br label [[FOR_BODY]]
	;			;
	; GATHER-LABEL: @PR32038(			; GATHER-LABEL: @PR32038(
	; GATHER-NEXT: entry:			; GATHER-NEXT: entry:
	; GATHER-NEXT: [[TMP0:%.*]] = load <8 x i8>, <8 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <8 x i8>*), align 1			; GATHER-NEXT: [[TMP0:%.*]] = load <8 x i8>, <8 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <8 x i8>*), align 1
	; GATHER-NEXT: [[TMP1:%.*]] = icmp eq <8 x i8> [[TMP0]], zeroinitializer			; GATHER-NEXT: [[TMP1:%.*]] = icmp eq <8 x i8> [[TMP0]], zeroinitializer
	; GATHER-NEXT: br label [[FOR_BODY:%.*]]			; GATHER-NEXT: br label [[FOR_BODY:%.*]]
	; GATHER: for.body:			; GATHER: for.body:
	; GATHER-NEXT: [[P17:%.*]] = phi i32 [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; GATHER-NEXT: [[P17:%.*]] = phi i32 [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
	; GATHER-NEXT: [[TMP2:%.*]] = extractelement <8 x i1> [[TMP1]], i32 0			; GATHER-NEXT: [[TMP2:%.*]] = extractelement <8 x i1> [[TMP1]], i32 7
	; GATHER-NEXT: [[TMP3:%.*]] = insertelement <8 x i1> undef, i1 [[TMP2]], i32 0			; GATHER-NEXT: [[TMP3:%.*]] = extractelement <8 x i1> [[TMP1]], i32 0
	; GATHER-NEXT: [[TMP4:%.*]] = extractelement <8 x i1> [[TMP1]], i32 1			; GATHER-NEXT: [[TMP4:%.*]] = insertelement <8 x i1> undef, i1 [[TMP3]], i32 0
	; GATHER-NEXT: [[TMP5:%.*]] = insertelement <8 x i1> [[TMP3]], i1 [[TMP4]], i32 1			; GATHER-NEXT: [[TMP5:%.*]] = extractelement <8 x i1> [[TMP1]], i32 1
	; GATHER-NEXT: [[TMP6:%.*]] = extractelement <8 x i1> [[TMP1]], i32 2			; GATHER-NEXT: [[TMP6:%.*]] = insertelement <8 x i1> [[TMP4]], i1 [[TMP5]], i32 1
	; GATHER-NEXT: [[TMP7:%.*]] = insertelement <8 x i1> [[TMP5]], i1 [[TMP6]], i32 2			; GATHER-NEXT: [[TMP7:%.*]] = extractelement <8 x i1> [[TMP1]], i32 2
	; GATHER-NEXT: [[TMP8:%.*]] = extractelement <8 x i1> [[TMP1]], i32 3			; GATHER-NEXT: [[TMP8:%.*]] = insertelement <8 x i1> [[TMP6]], i1 [[TMP7]], i32 2
	; GATHER-NEXT: [[TMP9:%.*]] = insertelement <8 x i1> [[TMP7]], i1 [[TMP8]], i32 3			; GATHER-NEXT: [[TMP9:%.*]] = extractelement <8 x i1> [[TMP1]], i32 3
	; GATHER-NEXT: [[TMP10:%.*]] = extractelement <8 x i1> [[TMP1]], i32 4			; GATHER-NEXT: [[TMP10:%.*]] = insertelement <8 x i1> [[TMP8]], i1 [[TMP9]], i32 3
	; GATHER-NEXT: [[TMP11:%.*]] = insertelement <8 x i1> [[TMP9]], i1 [[TMP10]], i32 4			; GATHER-NEXT: [[TMP11:%.*]] = extractelement <8 x i1> [[TMP1]], i32 4
	; GATHER-NEXT: [[TMP12:%.*]] = extractelement <8 x i1> [[TMP1]], i32 5			; GATHER-NEXT: [[TMP12:%.*]] = insertelement <8 x i1> [[TMP10]], i1 [[TMP11]], i32 4
	; GATHER-NEXT: [[TMP13:%.*]] = insertelement <8 x i1> [[TMP11]], i1 [[TMP12]], i32 5			; GATHER-NEXT: [[TMP13:%.*]] = extractelement <8 x i1> [[TMP1]], i32 5
	; GATHER-NEXT: [[TMP14:%.*]] = extractelement <8 x i1> [[TMP1]], i32 6			; GATHER-NEXT: [[TMP14:%.*]] = insertelement <8 x i1> [[TMP12]], i1 [[TMP13]], i32 5
	; GATHER-NEXT: [[TMP15:%.*]] = insertelement <8 x i1> [[TMP13]], i1 [[TMP14]], i32 6			; GATHER-NEXT: [[TMP15:%.*]] = extractelement <8 x i1> [[TMP1]], i32 6
	; GATHER-NEXT: [[TMP16:%.*]] = extractelement <8 x i1> [[TMP1]], i32 7			; GATHER-NEXT: [[TMP16:%.*]] = insertelement <8 x i1> [[TMP14]], i1 [[TMP15]], i32 6
	; GATHER-NEXT: [[TMP17:%.*]] = insertelement <8 x i1> [[TMP15]], i1 [[TMP16]], i32 7			; GATHER-NEXT: [[TMP17:%.*]] = insertelement <8 x i1> [[TMP16]], i1 [[TMP2]], i32 7
	; GATHER-NEXT: [[TMP18:%.*]] = select <8 x i1> [[TMP17]], <8 x i32> <i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720>, <8 x i32> <i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80>			; GATHER-NEXT: [[TMP18:%.*]] = select <8 x i1> [[TMP17]], <8 x i32> <i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720>, <8 x i32> <i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80>
	; GATHER-NEXT: [[TMP19:%.*]] = extractelement <8 x i32> [[TMP18]], i32 0			; GATHER-NEXT: [[TMP19:%.*]] = extractelement <8 x i32> [[TMP18]], i32 0
	; GATHER-NEXT: [[P20:%.*]] = add i32 -5, [[TMP19]]
	; GATHER-NEXT: [[TMP20:%.*]] = extractelement <8 x i32> [[TMP18]], i32 1			; GATHER-NEXT: [[TMP20:%.*]] = extractelement <8 x i32> [[TMP18]], i32 1
	; GATHER-NEXT: [[P22:%.*]] = add i32 [[P20]], [[TMP20]]
	; GATHER-NEXT: [[TMP21:%.*]] = extractelement <8 x i32> [[TMP18]], i32 2			; GATHER-NEXT: [[TMP21:%.*]] = extractelement <8 x i32> [[TMP18]], i32 2
	; GATHER-NEXT: [[P24:%.*]] = add i32 [[P22]], [[TMP21]]
	; GATHER-NEXT: [[TMP22:%.*]] = extractelement <8 x i32> [[TMP18]], i32 3			; GATHER-NEXT: [[TMP22:%.*]] = extractelement <8 x i32> [[TMP18]], i32 3
	; GATHER-NEXT: [[P26:%.*]] = add i32 [[P24]], [[TMP22]]
	; GATHER-NEXT: [[TMP23:%.*]] = extractelement <8 x i32> [[TMP18]], i32 4			; GATHER-NEXT: [[TMP23:%.*]] = extractelement <8 x i32> [[TMP18]], i32 4
	; GATHER-NEXT: [[P28:%.*]] = add i32 [[P26]], [[TMP23]]
	; GATHER-NEXT: [[TMP24:%.*]] = extractelement <8 x i32> [[TMP18]], i32 5			; GATHER-NEXT: [[TMP24:%.*]] = extractelement <8 x i32> [[TMP18]], i32 5
	; GATHER-NEXT: [[P30:%.*]] = add i32 [[P28]], [[TMP24]]
	; GATHER-NEXT: [[TMP25:%.*]] = extractelement <8 x i32> [[TMP18]], i32 6			; GATHER-NEXT: [[TMP25:%.*]] = extractelement <8 x i32> [[TMP18]], i32 6
	; GATHER-NEXT: [[P32:%.*]] = add i32 [[P30]], [[TMP25]]
	; GATHER-NEXT: [[TMP26:%.*]] = insertelement <8 x i32> undef, i32 [[TMP19]], i32 0			; GATHER-NEXT: [[TMP26:%.*]] = insertelement <8 x i32> undef, i32 [[TMP19]], i32 0
	; GATHER-NEXT: [[TMP27:%.*]] = insertelement <8 x i32> [[TMP26]], i32 [[TMP20]], i32 1			; GATHER-NEXT: [[TMP27:%.*]] = insertelement <8 x i32> [[TMP26]], i32 [[TMP20]], i32 1
	; GATHER-NEXT: [[TMP28:%.*]] = insertelement <8 x i32> [[TMP27]], i32 [[TMP21]], i32 2			; GATHER-NEXT: [[TMP28:%.*]] = insertelement <8 x i32> [[TMP27]], i32 [[TMP21]], i32 2
	; GATHER-NEXT: [[TMP29:%.*]] = insertelement <8 x i32> [[TMP28]], i32 [[TMP22]], i32 3			; GATHER-NEXT: [[TMP29:%.*]] = insertelement <8 x i32> [[TMP28]], i32 [[TMP22]], i32 3
	; GATHER-NEXT: [[TMP30:%.*]] = insertelement <8 x i32> [[TMP29]], i32 [[TMP23]], i32 4			; GATHER-NEXT: [[TMP30:%.*]] = insertelement <8 x i32> [[TMP29]], i32 [[TMP23]], i32 4
	; GATHER-NEXT: [[TMP31:%.*]] = insertelement <8 x i32> [[TMP30]], i32 [[TMP24]], i32 5			; GATHER-NEXT: [[TMP31:%.*]] = insertelement <8 x i32> [[TMP30]], i32 [[TMP24]], i32 5
	; GATHER-NEXT: [[TMP32:%.*]] = insertelement <8 x i32> [[TMP31]], i32 [[TMP25]], i32 6			; GATHER-NEXT: [[TMP32:%.*]] = insertelement <8 x i32> [[TMP31]], i32 [[TMP25]], i32 6
	; GATHER-NEXT: [[TMP33:%.*]] = extractelement <8 x i32> [[TMP18]], i32 7			; GATHER-NEXT: [[TMP33:%.*]] = extractelement <8 x i32> [[TMP18]], i32 7
	; GATHER-NEXT: [[TMP34:%.*]] = insertelement <8 x i32> [[TMP32]], i32 [[TMP33]], i32 7			; GATHER-NEXT: [[TMP34:%.*]] = insertelement <8 x i32> [[TMP32]], i32 [[TMP33]], i32 7
	; GATHER-NEXT: [[TMP35:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> [[TMP34]])			; GATHER-NEXT: [[TMP35:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> [[TMP34]])
	; GATHER-NEXT: [[OP_EXTRA]] = add i32 [[TMP35]], -5			; GATHER-NEXT: [[OP_EXTRA]] = add i32 [[TMP35]], -5
	; GATHER-NEXT: [[P34:%.*]] = add i32 [[P32]], [[TMP33]]
	; GATHER-NEXT: br label [[FOR_BODY]]			; GATHER-NEXT: br label [[FOR_BODY]]
	;			;
	; MAX-COST-LABEL: @PR32038(			; MAX-COST-LABEL: @PR32038(
	; MAX-COST-NEXT: entry:			; MAX-COST-NEXT: entry:
	; MAX-COST-NEXT: [[TMP0:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <2 x i8>*), align 1			; MAX-COST-NEXT: [[TMP0:%.*]] = load <2 x i8>, <2 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <2 x i8>*), align 1
	; MAX-COST-NEXT: [[TMP1:%.*]] = icmp eq <2 x i8> [[TMP0]], zeroinitializer			; MAX-COST-NEXT: [[TMP1:%.*]] = icmp eq <2 x i8> [[TMP0]], zeroinitializer
	; MAX-COST-NEXT: [[P4:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 3), align 1			; MAX-COST-NEXT: [[P4:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 3), align 1
	; MAX-COST-NEXT: [[P5:%.*]] = icmp eq i8 [[P4]], 0			; MAX-COST-NEXT: [[P5:%.*]] = icmp eq i8 [[P4]], 0
	; MAX-COST-NEXT: [[P17:%.*]] = phi i32 [ [[P34:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; MAX-COST-NEXT: [[P17:%.*]] = phi i32 [ [[P34:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
	; MAX-COST-NEXT: [[TMP2:%.*]] = extractelement <2 x i1> [[TMP1]], i32 0			; MAX-COST-NEXT: [[TMP2:%.*]] = extractelement <2 x i1> [[TMP1]], i32 0
	; MAX-COST-NEXT: [[TMP3:%.*]] = insertelement <4 x i1> undef, i1 [[TMP2]], i32 0			; MAX-COST-NEXT: [[TMP3:%.*]] = insertelement <4 x i1> undef, i1 [[TMP2]], i32 0
	; MAX-COST-NEXT: [[TMP4:%.*]] = extractelement <2 x i1> [[TMP1]], i32 1			; MAX-COST-NEXT: [[TMP4:%.*]] = extractelement <2 x i1> [[TMP1]], i32 1
	; MAX-COST-NEXT: [[TMP5:%.*]] = insertelement <4 x i1> [[TMP3]], i1 [[TMP4]], i32 1			; MAX-COST-NEXT: [[TMP5:%.*]] = insertelement <4 x i1> [[TMP3]], i1 [[TMP4]], i32 1
	; MAX-COST-NEXT: [[TMP6:%.*]] = insertelement <4 x i1> [[TMP5]], i1 [[P5]], i32 2			; MAX-COST-NEXT: [[TMP6:%.*]] = insertelement <4 x i1> [[TMP5]], i1 [[P5]], i32 2
	; MAX-COST-NEXT: [[TMP7:%.*]] = insertelement <4 x i1> [[TMP6]], i1 [[P7]], i32 3			; MAX-COST-NEXT: [[TMP7:%.*]] = insertelement <4 x i1> [[TMP6]], i1 [[P7]], i32 3
	; MAX-COST-NEXT: [[TMP8:%.*]] = select <4 x i1> [[TMP7]], <4 x i32> <i32 -720, i32 -720, i32 -720, i32 -720>, <4 x i32> <i32 -80, i32 -80, i32 -80, i32 -80>			; MAX-COST-NEXT: [[TMP8:%.*]] = select <4 x i1> [[TMP7]], <4 x i32> <i32 -720, i32 -720, i32 -720, i32 -720>, <4 x i32> <i32 -80, i32 -80, i32 -80, i32 -80>
	; MAX-COST-NEXT: [[P20:%.*]] = add i32 -5, undef
	; MAX-COST-NEXT: [[P22:%.*]] = add i32 [[P20]], undef
	; MAX-COST-NEXT: [[P24:%.*]] = add i32 [[P22]], undef
	; MAX-COST-NEXT: [[P26:%.*]] = add i32 [[P24]], undef
	; MAX-COST-NEXT: [[P27:%.*]] = select i1 [[P9]], i32 -720, i32 -80			; MAX-COST-NEXT: [[P27:%.*]] = select i1 [[P9]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[P28:%.*]] = add i32 [[P26]], [[P27]]
	; MAX-COST-NEXT: [[P29:%.*]] = select i1 [[P11]], i32 -720, i32 -80			; MAX-COST-NEXT: [[P29:%.*]] = select i1 [[P11]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[TMP9:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> [[TMP8]])			; MAX-COST-NEXT: [[TMP9:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> [[TMP8]])
	; MAX-COST-NEXT: [[TMP10:%.*]] = add i32 [[TMP9]], [[P27]]			; MAX-COST-NEXT: [[TMP10:%.*]] = add i32 [[TMP9]], [[P27]]
	; MAX-COST-NEXT: [[TMP11:%.*]] = add i32 [[TMP10]], [[P29]]			; MAX-COST-NEXT: [[TMP11:%.*]] = add i32 [[TMP10]], [[P29]]
	; MAX-COST-NEXT: [[OP_EXTRA:%.*]] = add i32 [[TMP11]], -5			; MAX-COST-NEXT: [[OP_EXTRA:%.*]] = add i32 [[TMP11]], -5
	; MAX-COST-NEXT: [[P30:%.*]] = add i32 [[P28]], [[P29]]
	; MAX-COST-NEXT: [[P31:%.*]] = select i1 [[P13]], i32 -720, i32 -80			; MAX-COST-NEXT: [[P31:%.*]] = select i1 [[P13]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[P32:%.*]] = add i32 [[OP_EXTRA]], [[P31]]			; MAX-COST-NEXT: [[P32:%.*]] = add i32 [[OP_EXTRA]], [[P31]]
	; MAX-COST-NEXT: [[P33:%.*]] = select i1 [[P15]], i32 -720, i32 -80			; MAX-COST-NEXT: [[P33:%.*]] = select i1 [[P15]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[P34]] = add i32 [[P32]], [[P33]]			; MAX-COST-NEXT: [[P34]] = add i32 [[P32]], [[P33]]
	; MAX-COST-NEXT: br label [[FOR_BODY]]			; MAX-COST-NEXT: br label [[FOR_BODY]]
	;			;
	entry:			entry:
	%p0 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1), align 1			%p0 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1), align 1

llvm/trunk/test/Transforms/SLPVectorizer/AArch64/horizontal.ll

	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds i32, i32* [[P2_024]], i64 3			; CHECK-NEXT: [[ARRAYIDX21:%.*]] = getelementptr inbounds i32, i32* [[P2_024]], i64 3
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[P2_024]] to <4 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[P2_024]] to <4 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = sub nsw <4 x i32> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = sub nsw <4 x i32> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = icmp slt <4 x i32> [[TMP4]], zeroinitializer			; CHECK-NEXT: [[TMP5:%.*]] = icmp slt <4 x i32> [[TMP4]], zeroinitializer
	; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <4 x i32> zeroinitializer, [[TMP4]]			; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <4 x i32> zeroinitializer, [[TMP4]]
	; CHECK-NEXT: [[TMP7:%.*]] = select <4 x i1> [[TMP5]], <4 x i32> [[TMP6]], <4 x i32> [[TMP4]]			; CHECK-NEXT: [[TMP7:%.*]] = select <4 x i1> [[TMP5]], <4 x i32> [[TMP6]], <4 x i32> [[TMP4]]
	; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 undef, [[S_026]]
	; CHECK-NEXT: [[ADD11:%.*]] = add nsw i32 [[ADD]], undef
	; CHECK-NEXT: [[ADD19:%.*]] = add nsw i32 [[ADD11]], undef
	; CHECK-NEXT: [[TMP8:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> [[TMP7]])			; CHECK-NEXT: [[TMP8:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> [[TMP7]])
	; CHECK-NEXT: [[OP_EXTRA]] = add nsw i32 [[TMP8]], [[S_026]]			; CHECK-NEXT: [[OP_EXTRA]] = add nsw i32 [[TMP8]], [[S_026]]
	; CHECK-NEXT: [[ADD27:%.*]] = add nsw i32 [[ADD19]], undef
	; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds i32, i32* [[P1_023]], i64 [[IDX_EXT]]			; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds i32, i32* [[P1_023]], i64 [[IDX_EXT]]
	; CHECK-NEXT: [[ADD_PTR29]] = getelementptr inbounds i32, i32* [[P2_024]], i64 [[IDX_EXT]]			; CHECK-NEXT: [[ADD_PTR29]] = getelementptr inbounds i32, i32* [[P2_024]], i64 [[IDX_EXT]]
	; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[J_025]], 1			; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[J_025]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[H]]			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[H]]
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END_LOOPEXIT:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END_LOOPEXIT:%.*]], label [[FOR_BODY]]
	; CHECK: for.end.loopexit:			; CHECK: for.end.loopexit:
	; CHECK-NEXT: br label [[FOR_END]]			; CHECK-NEXT: br label [[FOR_END]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, i32* [[P2_018]], i64 2			; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, i32* [[P2_018]], i64 2
	; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i32, i32* [[P1_017]], i64 3			; CHECK-NEXT: [[ARRAYIDX10:%.*]] = getelementptr inbounds i32, i32* [[P1_017]], i64 3
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P1_017]] to <4 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P1_017]] to <4 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[ARRAYIDX11:%.*]] = getelementptr inbounds i32, i32* [[P2_018]], i64 3			; CHECK-NEXT: [[ARRAYIDX11:%.*]] = getelementptr inbounds i32, i32* [[P2_018]], i64 3
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[P2_018]] to <4 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[P2_018]] to <4 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i32>, <4 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = mul nsw <4 x i32> [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP4:%.*]] = mul nsw <4 x i32> [[TMP3]], [[TMP1]]
	; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 undef, [[S_020]]
	; CHECK-NEXT: [[ADD5:%.*]] = add nsw i32 [[ADD]], undef
	; CHECK-NEXT: [[ADD9:%.*]] = add nsw i32 [[ADD5]], undef
	; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> [[TMP4]])			; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> [[TMP4]])
	; CHECK-NEXT: [[OP_EXTRA]] = add nsw i32 [[TMP5]], [[S_020]]			; CHECK-NEXT: [[OP_EXTRA]] = add nsw i32 [[TMP5]], [[S_020]]
	; CHECK-NEXT: [[ADD13:%.*]] = add nsw i32 [[ADD9]], undef
	; CHECK-NEXT: [[CMP14:%.*]] = icmp slt i32 [[OP_EXTRA]], [[LIM:%.*]]			; CHECK-NEXT: [[CMP14:%.*]] = icmp slt i32 [[OP_EXTRA]], [[LIM:%.*]]
	; CHECK-NEXT: br i1 [[CMP14]], label [[IF_END]], label [[FOR_END_LOOPEXIT:%.*]]			; CHECK-NEXT: br i1 [[CMP14]], label [[IF_END]], label [[FOR_END_LOOPEXIT:%.*]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds i32, i32* [[P1_017]], i64 [[IDX_EXT]]			; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds i32, i32* [[P1_017]], i64 [[IDX_EXT]]
	; CHECK-NEXT: [[ADD_PTR16]] = getelementptr inbounds i32, i32* [[P2_018]], i64 [[IDX_EXT]]			; CHECK-NEXT: [[ADD_PTR16]] = getelementptr inbounds i32, i32* [[P2_018]], i64 [[IDX_EXT]]
	; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[J_019]], 1			; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[J_019]], 1
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[INC]], [[H]]			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[INC]], [[H]]
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END_LOOPEXIT]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END_LOOPEXIT]]
	; CHECK-NEXT: [[ARRAYIDX74:%.*]] = getelementptr inbounds i8, i8* [[P2_045]], i64 7			; CHECK-NEXT: [[ARRAYIDX74:%.*]] = getelementptr inbounds i8, i8* [[P2_045]], i64 7
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i8* [[P2_045]] to <8 x i8>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i8* [[P2_045]] to <8 x i8>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[TMP3]], align 1			; CHECK-NEXT: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[TMP3]], align 1
	; CHECK-NEXT: [[TMP5:%.*]] = zext <8 x i8> [[TMP4]] to <8 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = zext <8 x i8> [[TMP4]] to <8 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <8 x i32> [[TMP2]], [[TMP5]]			; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <8 x i32> [[TMP2]], [[TMP5]]
	; CHECK-NEXT: [[TMP7:%.*]] = icmp slt <8 x i32> [[TMP6]], zeroinitializer			; CHECK-NEXT: [[TMP7:%.*]] = icmp slt <8 x i32> [[TMP6]], zeroinitializer
	; CHECK-NEXT: [[TMP8:%.*]] = sub nsw <8 x i32> zeroinitializer, [[TMP6]]			; CHECK-NEXT: [[TMP8:%.*]] = sub nsw <8 x i32> zeroinitializer, [[TMP6]]
	; CHECK-NEXT: [[TMP9:%.*]] = select <8 x i1> [[TMP7]], <8 x i32> [[TMP8]], <8 x i32> [[TMP6]]			; CHECK-NEXT: [[TMP9:%.*]] = select <8 x i1> [[TMP7]], <8 x i32> [[TMP8]], <8 x i32> [[TMP6]]
	; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 undef, [[S_047]]
	; CHECK-NEXT: [[ADD16:%.*]] = add nsw i32 [[ADD]], undef
	; CHECK-NEXT: [[ADD27:%.*]] = add nsw i32 [[ADD16]], undef
	; CHECK-NEXT: [[ADD38:%.*]] = add nsw i32 [[ADD27]], undef
	; CHECK-NEXT: [[ADD49:%.*]] = add nsw i32 [[ADD38]], undef
	; CHECK-NEXT: [[ADD60:%.*]] = add nsw i32 [[ADD49]], undef
	; CHECK-NEXT: [[ADD71:%.*]] = add nsw i32 [[ADD60]], undef
	; CHECK-NEXT: [[TMP10:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> [[TMP9]])			; CHECK-NEXT: [[TMP10:%.*]] = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> [[TMP9]])
	; CHECK-NEXT: [[OP_EXTRA]] = add nsw i32 [[TMP10]], [[S_047]]			; CHECK-NEXT: [[OP_EXTRA]] = add nsw i32 [[TMP10]], [[S_047]]
	; CHECK-NEXT: [[ADD82:%.*]] = add nsw i32 [[ADD71]], undef
	; CHECK-NEXT: [[CMP83:%.*]] = icmp slt i32 [[OP_EXTRA]], [[LIM:%.*]]			; CHECK-NEXT: [[CMP83:%.*]] = icmp slt i32 [[OP_EXTRA]], [[LIM:%.*]]
	; CHECK-NEXT: br i1 [[CMP83]], label [[IF_END_86]], label [[FOR_END_LOOPEXIT:%.*]]			; CHECK-NEXT: br i1 [[CMP83]], label [[IF_END_86]], label [[FOR_END_LOOPEXIT:%.*]]
	; CHECK: if.end.86:			; CHECK: if.end.86:
	; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds i8, i8* [[P1_044]], i64 [[IDX_EXT]]			; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds i8, i8* [[P1_044]], i64 [[IDX_EXT]]
	; CHECK-NEXT: [[ADD_PTR88]] = getelementptr inbounds i8, i8* [[P2_045]], i64 [[IDX_EXT]]			; CHECK-NEXT: [[ADD_PTR88]] = getelementptr inbounds i8, i8* [[P2_045]], i64 [[IDX_EXT]]
	; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[J_046]], 1			; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[J_046]], 1
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[INC]], [[H]]			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[INC]], [[H]]
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END_LOOPEXIT]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END_LOOPEXIT]]

llvm/trunk/test/Transforms/SLPVectorizer/AArch64/spillcost-di.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; Debug informations shouldn't affect spill cost.			; Debug informations shouldn't affect spill cost.
	; RUN: opt -S -slp-vectorizer %s -o - | FileCheck %s			; RUN: opt -S -slp-vectorizer %s -o - | FileCheck %s

	target triple = "aarch64"			target triple = "aarch64"

	%struct.S = type { i64, i64 }			%struct.S = type { i64, i64 }

	define void @patatino(i64 %n, i64 %i, %struct.S* %p) !dbg !7 {			define void @patatino(i64 %n, i64 %i, %struct.S* %p) !dbg !7 {
	; CHECK-LABEL: @patatino(			; CHECK-LABEL: @patatino(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 [[N:%.*]], metadata !18, metadata !DIExpression()), !dbg !23			; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 [[N:%.*]], metadata !18, metadata !DIExpression()), !dbg !23
	; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 [[I:%.*]], metadata !19, metadata !DIExpression()), !dbg !24			; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 [[I:%.*]], metadata !19, metadata !DIExpression()), !dbg !24
	; CHECK-NEXT: call void @llvm.dbg.value(metadata %struct.S* [[P:%.*]], metadata !20, metadata !DIExpression()), !dbg !25			; CHECK-NEXT: call void @llvm.dbg.value(metadata %struct.S* [[P:%.*]], metadata !20, metadata !DIExpression()), !dbg !25
	; CHECK-NEXT: [[X1:%.*]] = getelementptr inbounds [[STRUCT_S:%.*]], %struct.S* [[P]], i64 [[N]], i32 0, !dbg !26			; CHECK-NEXT: [[X1:%.*]] = getelementptr inbounds [[STRUCT_S:%.*]], %struct.S* [[P]], i64 [[N]], i32 0, !dbg !26
	; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 undef, metadata !21, metadata !DIExpression()), !dbg !27			; CHECK-NEXT: call void @llvm.dbg.value(metadata !2, metadata !21, metadata !DIExpression()), !dbg !27
	; CHECK-NEXT: [[Y3:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 [[N]], i32 1, !dbg !28			; CHECK-NEXT: [[Y3:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 [[N]], i32 1, !dbg !28
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i64* [[X1]] to <2 x i64>*, !dbg !26			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i64* [[X1]] to <2 x i64>*, !dbg !26
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* [[TMP0]], align 8, !dbg !26, !tbaa !29			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* [[TMP0]], align 8, !dbg !26, !tbaa !29
	; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 undef, metadata !22, metadata !DIExpression()), !dbg !33			; CHECK-NEXT: call void @llvm.dbg.value(metadata !2, metadata !22, metadata !DIExpression()), !dbg !33
	; CHECK-NEXT: [[X5:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 [[I]], i32 0, !dbg !34			; CHECK-NEXT: [[X5:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 [[I]], i32 0, !dbg !34
	; CHECK-NEXT: [[Y7:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 [[I]], i32 1, !dbg !35			; CHECK-NEXT: [[Y7:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 [[I]], i32 1, !dbg !35
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64* [[X5]] to <2 x i64>*, !dbg !36			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64* [[X5]] to <2 x i64>*, !dbg !36
	; CHECK-NEXT: store <2 x i64> [[TMP1]], <2 x i64>* [[TMP2]], align 8, !dbg !36, !tbaa !29			; CHECK-NEXT: store <2 x i64> [[TMP1]], <2 x i64>* [[TMP2]], align 8, !dbg !36, !tbaa !29
	; CHECK-NEXT: ret void, !dbg !37			; CHECK-NEXT: ret void, !dbg !37
	;			;
	entry:			entry:
	call void @llvm.dbg.value(metadata i64 %n, metadata !18, metadata !DIExpression()), !dbg !23			call void @llvm.dbg.value(metadata i64 %n, metadata !18, metadata !DIExpression()), !dbg !23

llvm/trunk/test/Transforms/SLPVectorizer/X86/PR31847.ll

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -slp-vectorizer -S -o - -mtriple=i386 -mcpu=haswell < %s | FileCheck %s
				target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"

				@shift = common local_unnamed_addr global [10 x i32] zeroinitializer, align 4
				@data = common local_unnamed_addr global [10 x i8*] zeroinitializer, align 4

				define void @flat(i32 %intensity) {
				; CHECK-LABEL: @flat(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* getelementptr inbounds ([10 x i32], [10 x i32]* @shift, i32 0, i32 0), align 4
				; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* getelementptr inbounds ([10 x i32], [10 x i32]* @shift, i32 0, i32 1), align 4
				; CHECK-NEXT: [[TMP2:%.*]] = load i8*, i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @data, i32 0, i32 0), align 4
				; CHECK-NEXT: [[TMP3:%.*]] = load i8*, i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @data, i32 0, i32 1), align 4
				; CHECK-NEXT: [[SHR:%.*]] = lshr i32 1, [[TMP0]]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i8, i8* [[TMP2]], i32 [[SHR]]
				; CHECK-NEXT: [[SHR1:%.*]] = lshr i32 1, [[TMP1]]
				; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i8, i8* [[TMP3]], i32 [[SHR1]]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.cond.cleanup:
				; CHECK-NEXT: ret void
				; CHECK: for.body:
				; CHECK-NEXT: [[D1_DATA_046:%.*]] = phi i8* [ [[TMP3]], [[ENTRY:%.*]] ], [ [[ADD_PTR23_1:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[Y_045:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[INC_1:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[TMP4:%.*]] = load i8, i8* [[ARRAYIDX]], align 1
				; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[TMP4]] to i32
				; CHECK-NEXT: [[SUB:%.*]] = add nsw i32 [[CONV]], -128
				; CHECK-NEXT: [[TMP5:%.*]] = load i8, i8* [[ARRAYIDX2]], align 1
				; CHECK-NEXT: [[CONV3:%.*]] = zext i8 [[TMP5]] to i32
				; CHECK-NEXT: [[SUB4:%.*]] = add nsw i32 [[CONV3]], -128
				; CHECK-NEXT: [[CMP5:%.*]] = icmp sgt i32 [[SUB]], -1
				; CHECK-NEXT: [[SUB7:%.*]] = sub nsw i32 128, [[CONV]]
				; CHECK-NEXT: [[COND:%.*]] = select i1 [[CMP5]], i32 [[SUB]], i32 [[SUB7]]
				; CHECK-NEXT: [[CMP8:%.*]] = icmp sgt i32 [[SUB4]], -1
				; CHECK-NEXT: [[SUB12:%.*]] = sub nsw i32 128, [[CONV3]]
				; CHECK-NEXT: [[COND14:%.*]] = select i1 [[CMP8]], i32 [[SUB4]], i32 [[SUB12]]
				; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[COND14]], [[COND]]
				; CHECK-NEXT: [[IDX_NEG:%.*]] = sub nsw i32 0, [[ADD]]
				; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8* [[D1_DATA_046]], i32 [[IDX_NEG]]
				; CHECK-NEXT: [[TMP6:%.*]] = load i8, i8* [[ADD_PTR]], align 1
				; CHECK-NEXT: [[CONV15:%.*]] = zext i8 [[TMP6]] to i32
				; CHECK-NEXT: [[ADD16:%.*]] = add nsw i32 [[CONV15]], [[INTENSITY:%.*]]
				; CHECK-NEXT: [[CONV17:%.*]] = trunc i32 [[ADD16]] to i8
				; CHECK-NEXT: store i8 [[CONV17]], i8* [[ADD_PTR]], align 1
				; CHECK-NEXT: [[ADD_PTR18:%.*]] = getelementptr inbounds i8, i8* [[D1_DATA_046]], i32 [[ADD]]
				; CHECK-NEXT: [[TMP7:%.*]] = load i8, i8* [[ADD_PTR18]], align 1
				; CHECK-NEXT: [[NOT_TOBOOL:%.*]] = icmp eq i8 [[TMP7]], 0
				; CHECK-NEXT: [[CONV21:%.*]] = zext i1 [[NOT_TOBOOL]] to i8
				; CHECK-NEXT: store i8 [[CONV21]], i8* [[ADD_PTR18]], align 1
				; CHECK-NEXT: [[ADD_PTR23:%.*]] = getelementptr inbounds i8, i8* [[D1_DATA_046]], i32 [[TMP1]]
				; CHECK-NEXT: [[TMP8:%.*]] = load i8, i8* [[ARRAYIDX]], align 1
				; CHECK-NEXT: [[CONV_1:%.*]] = zext i8 [[TMP8]] to i32
				; CHECK-NEXT: [[SUB_1:%.*]] = add nsw i32 [[CONV_1]], -128
				; CHECK-NEXT: [[TMP9:%.*]] = load i8, i8* [[ARRAYIDX2]], align 1
				; CHECK-NEXT: [[CONV3_1:%.*]] = zext i8 [[TMP9]] to i32
				; CHECK-NEXT: [[SUB4_1:%.*]] = add nsw i32 [[CONV3_1]], -128
				; CHECK-NEXT: [[CMP5_1:%.*]] = icmp sgt i32 [[SUB_1]], -1
				; CHECK-NEXT: [[SUB7_1:%.*]] = sub nsw i32 128, [[CONV_1]]
				; CHECK-NEXT: [[COND_1:%.*]] = select i1 [[CMP5_1]], i32 [[SUB_1]], i32 [[SUB7_1]]
				; CHECK-NEXT: [[CMP8_1:%.*]] = icmp sgt i32 [[SUB4_1]], -1
				; CHECK-NEXT: [[SUB12_1:%.*]] = sub nsw i32 128, [[CONV3_1]]
				; CHECK-NEXT: [[COND14_1:%.*]] = select i1 [[CMP8_1]], i32 [[SUB4_1]], i32 [[SUB12_1]]
				; CHECK-NEXT: [[ADD_1:%.*]] = add nsw i32 [[COND14_1]], [[COND_1]]
				; CHECK-NEXT: [[IDX_NEG_1:%.*]] = sub nsw i32 0, [[ADD_1]]
				; CHECK-NEXT: [[ADD_PTR_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR23]], i32 [[IDX_NEG_1]]
				; CHECK-NEXT: [[TMP10:%.*]] = load i8, i8* [[ADD_PTR_1]], align 1
				; CHECK-NEXT: [[CONV15_1:%.*]] = zext i8 [[TMP10]] to i32
				; CHECK-NEXT: [[ADD16_1:%.*]] = add nsw i32 [[CONV15_1]], [[INTENSITY]]
				; CHECK-NEXT: [[CONV17_1:%.*]] = trunc i32 [[ADD16_1]] to i8
				; CHECK-NEXT: store i8 [[CONV17_1]], i8* [[ADD_PTR_1]], align 1
				; CHECK-NEXT: [[ADD_PTR18_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR23]], i32 [[ADD_1]]
				; CHECK-NEXT: [[TMP11:%.*]] = load i8, i8* [[ADD_PTR18_1]], align 1
				; CHECK-NEXT: [[NOT_TOBOOL_1:%.*]] = icmp eq i8 [[TMP11]], 0
				; CHECK-NEXT: [[CONV21_1:%.*]] = zext i1 [[NOT_TOBOOL_1]] to i8
				; CHECK-NEXT: store i8 [[CONV21_1]], i8* [[ADD_PTR18_1]], align 1
				; CHECK-NEXT: [[ADD_PTR23_1]] = getelementptr inbounds i8, i8* [[ADD_PTR23]], i32 [[TMP1]]
				; CHECK-NEXT: [[INC_1]] = add nsw i32 [[Y_045]], 2
				; CHECK-NEXT: [[EXITCOND_1:%.*]] = icmp eq i32 [[INC_1]], 128
				; CHECK-NEXT: br i1 [[EXITCOND_1]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]
				;
				entry:
				%0 = load i32, i32* getelementptr inbounds ([10 x i32], [10 x i32]* @shift, i32 0, i32 0), align 4
				%1 = load i32, i32* getelementptr inbounds ([10 x i32], [10 x i32]* @shift, i32 0, i32 1), align 4
				%2 = load i8*, i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @data, i32 0, i32 0), align 4
				%3 = load i8*, i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @data, i32 0, i32 1), align 4
				%shr = lshr i32 1, %0
				%arrayidx = getelementptr inbounds i8, i8* %2, i32 %shr
				%shr1 = lshr i32 1, %1
				%arrayidx2 = getelementptr inbounds i8, i8* %3, i32 %shr1
				br label %for.body

				for.cond.cleanup: ; preds = %for.body
				ret void

				for.body: ; preds = %for.body, %entry
				%d1_data.046 = phi i8* [ %3, %entry ], [ %add.ptr23.1, %for.body ]
				%y.045 = phi i32 [ 0, %entry ], [ %inc.1, %for.body ]
				%4 = load i8, i8* %arrayidx, align 1
				%conv = zext i8 %4 to i32
				%sub = add nsw i32 %conv, -128
				%5 = load i8, i8* %arrayidx2, align 1
				%conv3 = zext i8 %5 to i32
				%sub4 = add nsw i32 %conv3, -128
				%cmp5 = icmp sgt i32 %sub, -1
				%sub7 = sub nsw i32 128, %conv
				%cond = select i1 %cmp5, i32 %sub, i32 %sub7
				%cmp8 = icmp sgt i32 %sub4, -1
				%sub12 = sub nsw i32 128, %conv3
				%cond14 = select i1 %cmp8, i32 %sub4, i32 %sub12
				%add = add nsw i32 %cond14, %cond
				%idx.neg = sub nsw i32 0, %add
				%add.ptr = getelementptr inbounds i8, i8* %d1_data.046, i32 %idx.neg
				%6 = load i8, i8* %add.ptr, align 1
				%conv15 = zext i8 %6 to i32
				%add16 = add nsw i32 %conv15, %intensity
				%conv17 = trunc i32 %add16 to i8
				store i8 %conv17, i8* %add.ptr, align 1
				%add.ptr18 = getelementptr inbounds i8, i8* %d1_data.046, i32 %add
				%7 = load i8, i8* %add.ptr18, align 1
				%not.tobool = icmp eq i8 %7, 0
				%conv21 = zext i1 %not.tobool to i8
				store i8 %conv21, i8* %add.ptr18, align 1
				%add.ptr23 = getelementptr inbounds i8, i8* %d1_data.046, i32 %1
				%8 = load i8, i8* %arrayidx, align 1
				%conv.1 = zext i8 %8 to i32
				%sub.1 = add nsw i32 %conv.1, -128
				%9 = load i8, i8* %arrayidx2, align 1
				%conv3.1 = zext i8 %9 to i32
				%sub4.1 = add nsw i32 %conv3.1, -128
				%cmp5.1 = icmp sgt i32 %sub.1, -1
				%sub7.1 = sub nsw i32 128, %conv.1
				%cond.1 = select i1 %cmp5.1, i32 %sub.1, i32 %sub7.1
				%cmp8.1 = icmp sgt i32 %sub4.1, -1
				%sub12.1 = sub nsw i32 128, %conv3.1
				%cond14.1 = select i1 %cmp8.1, i32 %sub4.1, i32 %sub12.1
				%add.1 = add nsw i32 %cond14.1, %cond.1
				%idx.neg.1 = sub nsw i32 0, %add.1
				%add.ptr.1 = getelementptr inbounds i8, i8* %add.ptr23, i32 %idx.neg.1
				%10 = load i8, i8* %add.ptr.1, align 1
				%conv15.1 = zext i8 %10 to i32
				%add16.1 = add nsw i32 %conv15.1, %intensity
				%conv17.1 = trunc i32 %add16.1 to i8
				store i8 %conv17.1, i8* %add.ptr.1, align 1
				%add.ptr18.1 = getelementptr inbounds i8, i8* %add.ptr23, i32 %add.1
				%11 = load i8, i8* %add.ptr18.1, align 1
				%not.tobool.1 = icmp eq i8 %11, 0
				%conv21.1 = zext i1 %not.tobool.1 to i8
				store i8 %conv21.1, i8* %add.ptr18.1, align 1
				%add.ptr23.1 = getelementptr inbounds i8, i8* %add.ptr23, i32 %1
				%inc.1 = add nsw i32 %y.045, 2
				%exitcond.1 = icmp eq i32 %inc.1, 128
				br i1 %exitcond.1, label %for.cond.cleanup, label %for.body
				}

llvm/trunk/test/Transforms/SLPVectorizer/X86/PR35628_1.ll

	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[PTR]], i64 2			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[PTR]], i64 2
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[PTR]], i64 3			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[PTR]], i64 3
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[PTR]] to <4 x i32>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[PTR]] to <4 x i32>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* [[TMP3]], align 4			; CHECK-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* [[TMP3]], align 4
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP4]], i32 3			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP4]], i32 3
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP4]], i32 2			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP4]], i32 2
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[TMP4]], i32 1			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[TMP4]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = mul <4 x i32> [[TMP4]], [[TMP4]]			; CHECK-NEXT: [[TMP8:%.*]] = mul <4 x i32> [[TMP4]], [[TMP4]]
	; CHECK-NEXT: [[TMP9:%.*]] = add i32 1, undef			; CHECK-NEXT: [[TMP9:%.*]] = sext i32 [[TMP6]] to i64
	; CHECK-NEXT: [[TMP10:%.*]] = add i32 [[TMP9]], [[TMP7]]
	; CHECK-NEXT: [[TMP11:%.*]] = add i32 [[TMP10]], undef
	; CHECK-NEXT: [[TMP12:%.*]] = add i32 [[TMP11]], [[TMP6]]
	; CHECK-NEXT: [[TMP13:%.*]] = add i32 [[TMP12]], undef
	; CHECK-NEXT: [[TMP14:%.*]] = sext i32 [[TMP6]] to i64
	; CHECK-NEXT: [[TMP15:%.*]] = add i32 [[TMP13]], [[TMP5]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP8]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP8]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP16:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = add i32 [[TMP16]], 1			; CHECK-NEXT: [[OP_EXTRA:%.*]] = add i32 [[TMP10]], 1
	; CHECK-NEXT: [[OP_EXTRA3:%.*]] = add i32 [[OP_EXTRA]], [[TMP7]]			; CHECK-NEXT: [[OP_EXTRA3:%.*]] = add i32 [[OP_EXTRA]], [[TMP7]]
	; CHECK-NEXT: [[OP_EXTRA4:%.*]] = add i32 [[OP_EXTRA3]], [[TMP6]]			; CHECK-NEXT: [[OP_EXTRA4:%.*]] = add i32 [[OP_EXTRA3]], [[TMP6]]
	; CHECK-NEXT: [[OP_EXTRA5]] = add i32 [[OP_EXTRA4]], [[TMP5]]			; CHECK-NEXT: [[OP_EXTRA5]] = add i32 [[OP_EXTRA4]], [[TMP5]]
	; CHECK-NEXT: [[TMP17:%.*]] = add i32 [[TMP15]], undef
	; CHECK-NEXT: br label [[LOOP]]			; CHECK-NEXT: br label [[LOOP]]
	; CHECK: bail_out:			; CHECK: bail_out:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%cmp = icmp eq i32* %ptr, null			%cmp = icmp eq i32* %ptr, null
	br i1 %cmp, label %loop, label %bail_out			br i1 %cmp, label %loop, label %bail_out


llvm/trunk/test/Transforms/SLPVectorizer/X86/PR35628_2.ll

	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i64> [[TMP2]], i64 [[TMP0]], i32 2			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i64> [[TMP2]], i64 [[TMP0]], i32 2
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i64> [[TMP3]], i64 [[TMP0]], i32 3			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i64> [[TMP3]], i64 [[TMP0]], i32 3
	; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i64> [[TMP4]], <i64 3, i64 2, i64 1, i64 0>			; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i64> [[TMP4]], <i64 3, i64 2, i64 1, i64 0>
	; CHECK-NEXT: [[TMP6]] = extractelement <4 x i64> [[TMP5]], i32 3			; CHECK-NEXT: [[TMP6]] = extractelement <4 x i64> [[TMP5]], i32 3
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i64> [[TMP5]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i64> [[TMP5]], i32 0
	; CHECK-NEXT: [[DUMMY_SHL:%.*]] = shl i64 [[TMP7]], 32			; CHECK-NEXT: [[DUMMY_SHL:%.*]] = shl i64 [[TMP7]], 32
	; CHECK-NEXT: [[TMP8:%.*]] = add <4 x i64> <i64 1, i64 1, i64 1, i64 1>, [[TMP5]]			; CHECK-NEXT: [[TMP8:%.*]] = add <4 x i64> <i64 1, i64 1, i64 1, i64 1>, [[TMP5]]
	; CHECK-NEXT: [[TMP9:%.*]] = ashr exact <4 x i64> [[TMP8]], <i64 32, i64 32, i64 32, i64 32>			; CHECK-NEXT: [[TMP9:%.*]] = ashr exact <4 x i64> [[TMP8]], <i64 32, i64 32, i64 32, i64 32>
	; CHECK-NEXT: [[SUM1:%.*]] = add i64 undef, undef
	; CHECK-NEXT: [[SUM2:%.*]] = add i64 [[SUM1]], undef
	; CHECK-NEXT: [[ZSUM:%.*]] = add i64 [[SUM2]], 0
	; CHECK-NEXT: [[JOIN:%.*]] = add i64 [[TMP6]], [[ZSUM]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i64> [[TMP9]], <4 x i64> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i64> [[TMP9]], <4 x i64> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i64> [[TMP9]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i64> [[TMP9]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i64> [[BIN_RDX]], <4 x i64> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i64> [[BIN_RDX]], <4 x i64> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <4 x i64> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <4 x i64> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x i64> [[BIN_RDX2]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x i64> [[BIN_RDX2]], i32 0
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = add i64 [[TMP10]], 0			; CHECK-NEXT: [[OP_EXTRA:%.*]] = add i64 [[TMP10]], 0
	; CHECK-NEXT: [[OP_EXTRA3]] = add i64 [[OP_EXTRA]], [[TMP6]]			; CHECK-NEXT: [[OP_EXTRA3]] = add i64 [[OP_EXTRA]], [[TMP6]]
	; CHECK-NEXT: [[LAST:%.*]] = add i64 [[JOIN]], undef
	; CHECK-NEXT: br label [[LOOP]]			; CHECK-NEXT: br label [[LOOP]]
	;			;
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%dummy_phi = phi i64 [ 1, %entry ], [ %last, %loop ]			%dummy_phi = phi i64 [ 1, %entry ], [ %last, %loop ]
	%0 = phi i64 [ 2, %entry ], [ %fork, %loop ]			%0 = phi i64 [ 2, %entry ], [ %fork, %loop ]

llvm/trunk/test/Transforms/SLPVectorizer/X86/PR39774.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-7 | FileCheck %s --check-prefixes=ALL,CHECK			; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-7 | FileCheck %s --check-prefixes=ALL,CHECK
	; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-8 -slp-min-tree-size=6 | FileCheck %s --check-prefixes=ALL,FORCE_REDUCTION			; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-8 -slp-min-tree-size=6 | FileCheck %s --check-prefixes=ALL,FORCE_REDUCTION

	define void @Test(i32) {			define void @Test(i32) {
	; CHECK-LABEL: @Test(			; CHECK-LABEL: @Test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP15:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP15:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> undef, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> undef, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240, i32 1496, i32 8555, i32 12529, i32 13685>			; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240, i32 1496, i32 8555, i32 12529, i32 13685>
	; CHECK-NEXT: [[VAL_1:%.*]] = and i32 [[TMP2]], undef
	; CHECK-NEXT: [[VAL_2:%.*]] = and i32 [[VAL_1]], [[TMP0:%.*]]
	; CHECK-NEXT: [[VAL_3:%.*]] = and i32 [[VAL_2]], [[TMP0]]
	; CHECK-NEXT: [[VAL_4:%.*]] = and i32 [[VAL_3]], [[TMP0]]
	; CHECK-NEXT: [[VAL_5:%.*]] = and i32 [[VAL_4]], [[TMP0]]
	; CHECK-NEXT: [[VAL_7:%.*]] = and i32 [[VAL_5]], undef
	; CHECK-NEXT: [[VAL_8:%.*]] = and i32 [[VAL_7]], [[TMP0]]
	; CHECK-NEXT: [[VAL_9:%.*]] = and i32 [[VAL_8]], [[TMP0]]
	; CHECK-NEXT: [[VAL_10:%.*]] = and i32 [[VAL_9]], [[TMP0]]
	; CHECK-NEXT: [[VAL_12:%.*]] = and i32 [[VAL_10]], undef
	; CHECK-NEXT: [[VAL_13:%.*]] = and i32 [[VAL_12]], [[TMP0]]
	; CHECK-NEXT: [[VAL_14:%.*]] = and i32 [[VAL_13]], [[TMP0]]
	; CHECK-NEXT: [[VAL_15:%.*]] = and i32 [[VAL_14]], [[TMP0]]
	; CHECK-NEXT: [[VAL_16:%.*]] = and i32 [[VAL_15]], [[TMP0]]
	; CHECK-NEXT: [[VAL_17:%.*]] = and i32 [[VAL_16]], [[TMP0]]
	; CHECK-NEXT: [[VAL_19:%.*]] = and i32 [[VAL_17]], undef
	; CHECK-NEXT: [[VAL_21:%.*]] = and i32 [[VAL_19]], undef
	; CHECK-NEXT: [[VAL_22:%.*]] = and i32 [[VAL_21]], [[TMP0]]
	; CHECK-NEXT: [[VAL_23:%.*]] = and i32 [[VAL_22]], [[TMP0]]
	; CHECK-NEXT: [[VAL_24:%.*]] = and i32 [[VAL_23]], [[TMP0]]
	; CHECK-NEXT: [[VAL_25:%.*]] = and i32 [[VAL_24]], [[TMP0]]
	; CHECK-NEXT: [[VAL_26:%.*]] = and i32 [[VAL_25]], [[TMP0]]
	; CHECK-NEXT: [[VAL_27:%.*]] = and i32 [[VAL_26]], [[TMP0]]
	; CHECK-NEXT: [[VAL_28:%.*]] = and i32 [[VAL_27]], [[TMP0]]
	; CHECK-NEXT: [[VAL_29:%.*]] = and i32 [[VAL_28]], [[TMP0]]
	; CHECK-NEXT: [[VAL_30:%.*]] = and i32 [[VAL_29]], [[TMP0]]
	; CHECK-NEXT: [[VAL_31:%.*]] = and i32 [[VAL_30]], [[TMP0]]
	; CHECK-NEXT: [[VAL_32:%.*]] = and i32 [[VAL_31]], [[TMP0]]
	; CHECK-NEXT: [[VAL_33:%.*]] = and i32 [[VAL_32]], [[TMP0]]
	; CHECK-NEXT: [[VAL_35:%.*]] = and i32 [[VAL_33]], undef
	; CHECK-NEXT: [[VAL_36:%.*]] = and i32 [[VAL_35]], [[TMP0]]
	; CHECK-NEXT: [[VAL_37:%.*]] = and i32 [[VAL_36]], [[TMP0]]
	; CHECK-NEXT: [[VAL_38:%.*]] = and i32 [[VAL_37]], [[TMP0]]
	; CHECK-NEXT: [[VAL_40:%.*]] = and i32 [[VAL_38]], undef
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP3]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP3]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = and <8 x i32> [[TMP3]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = and <8 x i32> [[TMP3]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = and <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = and <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = and <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]			; CHECK-NEXT: [[BIN_RDX4:%.*]] = and <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP4]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP4]], [[TMP0:%.*]]
	; CHECK-NEXT: [[OP_EXTRA5:%.*]] = and i32 [[OP_EXTRA]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA5:%.*]] = and i32 [[OP_EXTRA]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA6:%.*]] = and i32 [[OP_EXTRA5]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA6:%.*]] = and i32 [[OP_EXTRA5]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA7:%.*]] = and i32 [[OP_EXTRA6]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA7:%.*]] = and i32 [[OP_EXTRA6]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA8:%.*]] = and i32 [[OP_EXTRA7]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA8:%.*]] = and i32 [[OP_EXTRA7]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA9:%.*]] = and i32 [[OP_EXTRA8]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA9:%.*]] = and i32 [[OP_EXTRA8]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA10:%.*]] = and i32 [[OP_EXTRA9]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA10:%.*]] = and i32 [[OP_EXTRA9]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA11:%.*]] = and i32 [[OP_EXTRA10]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA11:%.*]] = and i32 [[OP_EXTRA10]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA12:%.*]] = and i32 [[OP_EXTRA11]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA12:%.*]] = and i32 [[OP_EXTRA11]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA23:%.*]] = and i32 [[OP_EXTRA22]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA23:%.*]] = and i32 [[OP_EXTRA22]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA24:%.*]] = and i32 [[OP_EXTRA23]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA24:%.*]] = and i32 [[OP_EXTRA23]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA25:%.*]] = and i32 [[OP_EXTRA24]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA25:%.*]] = and i32 [[OP_EXTRA24]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA26:%.*]] = and i32 [[OP_EXTRA25]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA26:%.*]] = and i32 [[OP_EXTRA25]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA27:%.*]] = and i32 [[OP_EXTRA26]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA27:%.*]] = and i32 [[OP_EXTRA26]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA28:%.*]] = and i32 [[OP_EXTRA27]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA28:%.*]] = and i32 [[OP_EXTRA27]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA29:%.*]] = and i32 [[OP_EXTRA28]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA29:%.*]] = and i32 [[OP_EXTRA28]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA30:%.*]] = and i32 [[OP_EXTRA29]], [[TMP0]]			; CHECK-NEXT: [[OP_EXTRA30:%.*]] = and i32 [[OP_EXTRA29]], [[TMP0]]
	; CHECK-NEXT: [[VAL_42:%.*]] = and i32 [[VAL_40]], undef
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> undef, i32 [[OP_EXTRA30]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> undef, i32 [[OP_EXTRA30]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 14910, i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 14910, i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> undef, i32 [[TMP2]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> undef, i32 [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> [[TMP7]], i32 [[TMP2]], i32 1			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> [[TMP7]], i32 [[TMP2]], i32 1
	; CHECK-NEXT: [[TMP9:%.*]] = and <2 x i32> [[TMP6]], [[TMP8]]			; CHECK-NEXT: [[TMP9:%.*]] = and <2 x i32> [[TMP6]], [[TMP8]]
	; CHECK-NEXT: [[TMP10:%.*]] = add <2 x i32> [[TMP6]], [[TMP8]]			; CHECK-NEXT: [[TMP10:%.*]] = add <2 x i32> [[TMP6]], [[TMP8]]
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> [[TMP10]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> [[TMP10]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i32> [[TMP11]], i32 0			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i32> [[TMP11]], i32 0
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x i32> undef, i32 [[TMP12]], i32 0			; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x i32> undef, i32 [[TMP12]], i32 0
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x i32> [[TMP11]], i32 1			; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x i32> [[TMP11]], i32 1
	; CHECK-NEXT: [[TMP15]] = insertelement <2 x i32> [[TMP13]], i32 [[TMP14]], i32 1			; CHECK-NEXT: [[TMP15]] = insertelement <2 x i32> [[TMP13]], i32 [[TMP14]], i32 1
	; CHECK-NEXT: br label [[LOOP]]			; CHECK-NEXT: br label [[LOOP]]
	;			;
	; FORCE_REDUCTION-LABEL: @Test(			; FORCE_REDUCTION-LABEL: @Test(
	; FORCE_REDUCTION-NEXT: entry:			; FORCE_REDUCTION-NEXT: entry:
	; FORCE_REDUCTION-NEXT: br label [[LOOP:%.*]]			; FORCE_REDUCTION-NEXT: br label [[LOOP:%.*]]
	; FORCE_REDUCTION: loop:			; FORCE_REDUCTION: loop:
	; FORCE_REDUCTION-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP13:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]			; FORCE_REDUCTION-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP13:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
	; FORCE_REDUCTION-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 1, i32 1>			; FORCE_REDUCTION-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 1, i32 1>
	; FORCE_REDUCTION-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[SHUFFLE]], i32 1			; FORCE_REDUCTION-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[SHUFFLE]], i32 1
	; FORCE_REDUCTION-NEXT: [[TMP3:%.*]] = add <4 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240>			; FORCE_REDUCTION-NEXT: [[TMP3:%.*]] = add <4 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240>
	; FORCE_REDUCTION-NEXT: [[VAL_1:%.*]] = and i32 [[TMP2]], undef
	; FORCE_REDUCTION-NEXT: [[VAL_2:%.*]] = and i32 [[VAL_1]], [[TMP0:%.*]]
	; FORCE_REDUCTION-NEXT: [[VAL_3:%.*]] = and i32 [[VAL_2]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[VAL_4:%.*]] = and i32 [[VAL_3]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[VAL_5:%.*]] = and i32 [[VAL_4]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[VAL_7:%.*]] = and i32 [[VAL_5]], undef
	; FORCE_REDUCTION-NEXT: [[VAL_8:%.*]] = and i32 [[VAL_7]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[VAL_9:%.*]] = and i32 [[VAL_8]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[VAL_10:%.*]] = and i32 [[VAL_9]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[VAL_12:%.*]] = and i32 [[VAL_10]], undef
	; FORCE_REDUCTION-NEXT: [[VAL_13:%.*]] = and i32 [[VAL_12]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[VAL_14:%.*]] = and i32 [[VAL_13]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[VAL_15:%.*]] = and i32 [[VAL_14]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[VAL_16:%.*]] = and i32 [[VAL_15]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[VAL_17:%.*]] = and i32 [[VAL_16]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[VAL_19:%.*]] = and i32 [[VAL_17]], undef
	; FORCE_REDUCTION-NEXT: [[VAL_20:%.*]] = add i32 [[TMP2]], 1496			; FORCE_REDUCTION-NEXT: [[VAL_20:%.*]] = add i32 [[TMP2]], 1496
	; FORCE_REDUCTION-NEXT: [[VAL_21:%.*]] = and i32 [[VAL_19]], [[VAL_20]]
	; FORCE_REDUCTION-NEXT: [[VAL_22:%.*]] = and i32 [[VAL_21]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[VAL_23:%.*]] = and i32 [[VAL_22]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[VAL_24:%.*]] = and i32 [[VAL_23]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[VAL_25:%.*]] = and i32 [[VAL_24]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[VAL_26:%.*]] = and i32 [[VAL_25]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[VAL_27:%.*]] = and i32 [[VAL_26]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[VAL_28:%.*]] = and i32 [[VAL_27]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[VAL_29:%.*]] = and i32 [[VAL_28]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[VAL_30:%.*]] = and i32 [[VAL_29]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[VAL_31:%.*]] = and i32 [[VAL_30]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[VAL_32:%.*]] = and i32 [[VAL_31]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[VAL_33:%.*]] = and i32 [[VAL_32]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[VAL_34:%.*]] = add i32 [[TMP2]], 8555			; FORCE_REDUCTION-NEXT: [[VAL_34:%.*]] = add i32 [[TMP2]], 8555
	; FORCE_REDUCTION-NEXT: [[VAL_35:%.*]] = and i32 [[VAL_33]], [[VAL_34]]
	; FORCE_REDUCTION-NEXT: [[VAL_36:%.*]] = and i32 [[VAL_35]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[VAL_37:%.*]] = and i32 [[VAL_36]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; FORCE_REDUCTION-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; FORCE_REDUCTION-NEXT: [[BIN_RDX:%.*]] = and <4 x i32> [[TMP3]], [[RDX_SHUF]]			; FORCE_REDUCTION-NEXT: [[BIN_RDX:%.*]] = and <4 x i32> [[TMP3]], [[RDX_SHUF]]
	; FORCE_REDUCTION-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; FORCE_REDUCTION-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; FORCE_REDUCTION-NEXT: [[BIN_RDX2:%.*]] = and <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]			; FORCE_REDUCTION-NEXT: [[BIN_RDX2:%.*]] = and <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; FORCE_REDUCTION-NEXT: [[TMP4:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0			; FORCE_REDUCTION-NEXT: [[TMP4:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0
	; FORCE_REDUCTION-NEXT: [[TMP5:%.*]] = and i32 [[TMP4]], [[VAL_20]]			; FORCE_REDUCTION-NEXT: [[TMP5:%.*]] = and i32 [[TMP4]], [[VAL_20]]
	; FORCE_REDUCTION-NEXT: [[TMP6:%.*]] = and i32 [[TMP5]], [[VAL_34]]			; FORCE_REDUCTION-NEXT: [[TMP6:%.*]] = and i32 [[TMP5]], [[VAL_34]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP6]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP6]], [[TMP0:%.*]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA3:%.*]] = and i32 [[OP_EXTRA]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA3:%.*]] = and i32 [[OP_EXTRA]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA4:%.*]] = and i32 [[OP_EXTRA3]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA4:%.*]] = and i32 [[OP_EXTRA3]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA5:%.*]] = and i32 [[OP_EXTRA4]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA5:%.*]] = and i32 [[OP_EXTRA4]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA6:%.*]] = and i32 [[OP_EXTRA5]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA6:%.*]] = and i32 [[OP_EXTRA5]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA7:%.*]] = and i32 [[OP_EXTRA6]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA7:%.*]] = and i32 [[OP_EXTRA6]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA8:%.*]] = and i32 [[OP_EXTRA7]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA8:%.*]] = and i32 [[OP_EXTRA7]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA9:%.*]] = and i32 [[OP_EXTRA8]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA9:%.*]] = and i32 [[OP_EXTRA8]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA10:%.*]] = and i32 [[OP_EXTRA9]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA10:%.*]] = and i32 [[OP_EXTRA9]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA22:%.*]] = and i32 [[OP_EXTRA21]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA22:%.*]] = and i32 [[OP_EXTRA21]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA23:%.*]] = and i32 [[OP_EXTRA22]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA23:%.*]] = and i32 [[OP_EXTRA22]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA24:%.*]] = and i32 [[OP_EXTRA23]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA24:%.*]] = and i32 [[OP_EXTRA23]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA25:%.*]] = and i32 [[OP_EXTRA24]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA25:%.*]] = and i32 [[OP_EXTRA24]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA26:%.*]] = and i32 [[OP_EXTRA25]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA26:%.*]] = and i32 [[OP_EXTRA25]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA27:%.*]] = and i32 [[OP_EXTRA26]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA27:%.*]] = and i32 [[OP_EXTRA26]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA28:%.*]] = and i32 [[OP_EXTRA27]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA28:%.*]] = and i32 [[OP_EXTRA27]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA29:%.*]] = and i32 [[OP_EXTRA28]], [[TMP2]]			; FORCE_REDUCTION-NEXT: [[OP_EXTRA29:%.*]] = and i32 [[OP_EXTRA28]], [[TMP2]]
	; FORCE_REDUCTION-NEXT: [[VAL_38:%.*]] = and i32 [[VAL_37]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[VAL_39:%.*]] = add i32 [[TMP2]], 12529			; FORCE_REDUCTION-NEXT: [[VAL_39:%.*]] = add i32 [[TMP2]], 12529
	; FORCE_REDUCTION-NEXT: [[VAL_40:%.*]] = and i32 [[OP_EXTRA29]], [[VAL_39]]			; FORCE_REDUCTION-NEXT: [[VAL_40:%.*]] = and i32 [[OP_EXTRA29]], [[VAL_39]]
	; FORCE_REDUCTION-NEXT: [[VAL_41:%.*]] = add i32 [[TMP2]], 13685			; FORCE_REDUCTION-NEXT: [[VAL_41:%.*]] = add i32 [[TMP2]], 13685
	; FORCE_REDUCTION-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> undef, i32 [[VAL_40]], i32 0			; FORCE_REDUCTION-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> undef, i32 [[VAL_40]], i32 0
	; FORCE_REDUCTION-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> [[TMP7]], i32 [[TMP2]], i32 1			; FORCE_REDUCTION-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> [[TMP7]], i32 [[TMP2]], i32 1
	; FORCE_REDUCTION-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> undef, i32 [[VAL_41]], i32 0			; FORCE_REDUCTION-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> undef, i32 [[VAL_41]], i32 0
	; FORCE_REDUCTION-NEXT: [[TMP10:%.*]] = insertelement <2 x i32> [[TMP9]], i32 14910, i32 1			; FORCE_REDUCTION-NEXT: [[TMP10:%.*]] = insertelement <2 x i32> [[TMP9]], i32 14910, i32 1
	; FORCE_REDUCTION-NEXT: [[TMP11:%.*]] = and <2 x i32> [[TMP8]], [[TMP10]]			; FORCE_REDUCTION-NEXT: [[TMP11:%.*]] = and <2 x i32> [[TMP8]], [[TMP10]]

llvm/trunk/test/Transforms/SLPVectorizer/X86/PR40310.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake < %s | FileCheck %s

	define void @mainTest(i32 %param, i32 * %vals, i32 %len) {			define void @mainTest(i32 %param, i32 * %vals, i32 %len) {
	; CHECK-LABEL: @mainTest(			; CHECK-LABEL: @mainTest(
	; CHECK-NEXT: bci_15.preheader:			; CHECK-NEXT: bci_15.preheader:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> <i32 31, i32 undef>, i32 [[PARAM:%.*]], i32 1			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> <i32 31, i32 undef>, i32 [[PARAM:%.*]], i32 1
	; CHECK-NEXT: br label [[BCI_15:%.*]]			; CHECK-NEXT: br label [[BCI_15:%.*]]
	; CHECK: bci_15:			; CHECK: bci_15:
	; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP7:%.*]], [[BCI_15]] ], [ [[TMP0]], [[BCI_15_PREHEADER:%.*]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP7:%.*]], [[BCI_15]] ], [ [[TMP0]], [[BCI_15_PREHEADER:%.*]] ]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> undef, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> undef, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <16 x i32> [[SHUFFLE]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <16 x i32> [[SHUFFLE]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <16 x i32> [[SHUFFLE]], i32 15			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <16 x i32> [[SHUFFLE]], i32 15
	; CHECK-NEXT: store atomic i32 [[TMP3]], i32* [[VALS:%.*]] unordered, align 4			; CHECK-NEXT: store atomic i32 [[TMP3]], i32* [[VALS:%.*]] unordered, align 4
	; CHECK-NEXT: [[TMP4:%.*]] = add <16 x i32> [[SHUFFLE]], <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 -1>			; CHECK-NEXT: [[TMP4:%.*]] = add <16 x i32> [[SHUFFLE]], <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 -1>
	; CHECK-NEXT: [[V14:%.*]] = and i32 [[TMP2]], undef
	; CHECK-NEXT: [[V16:%.*]] = and i32 undef, [[V14]]
	; CHECK-NEXT: [[V18:%.*]] = and i32 undef, [[V16]]
	; CHECK-NEXT: [[V20:%.*]] = and i32 undef, [[V18]]
	; CHECK-NEXT: [[V22:%.*]] = and i32 undef, [[V20]]
	; CHECK-NEXT: [[V24:%.*]] = and i32 undef, [[V22]]
	; CHECK-NEXT: [[V26:%.*]] = and i32 undef, [[V24]]
	; CHECK-NEXT: [[V28:%.*]] = and i32 undef, [[V26]]
	; CHECK-NEXT: [[V30:%.*]] = and i32 undef, [[V28]]
	; CHECK-NEXT: [[V32:%.*]] = and i32 undef, [[V30]]
	; CHECK-NEXT: [[V34:%.*]] = and i32 undef, [[V32]]
	; CHECK-NEXT: [[V36:%.*]] = and i32 undef, [[V34]]
	; CHECK-NEXT: [[V38:%.*]] = and i32 undef, [[V36]]
	; CHECK-NEXT: [[V40:%.*]] = and i32 undef, [[V38]]
	; CHECK-NEXT: [[V42:%.*]] = and i32 undef, [[V40]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <16 x i32> [[TMP4]], <16 x i32> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <16 x i32> [[TMP4]], <16 x i32> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = and <16 x i32> [[TMP4]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = and <16 x i32> [[TMP4]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x i32> [[BIN_RDX]], <16 x i32> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x i32> [[BIN_RDX]], <16 x i32> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = and <16 x i32> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = and <16 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x i32> [[BIN_RDX2]], <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x i32> [[BIN_RDX2]], <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = and <16 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]			; CHECK-NEXT: [[BIN_RDX4:%.*]] = and <16 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x i32> [[BIN_RDX4]], <16 x i32> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x i32> [[BIN_RDX4]], <16 x i32> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX6:%.*]] = and <16 x i32> [[BIN_RDX4]], [[RDX_SHUF5]]			; CHECK-NEXT: [[BIN_RDX6:%.*]] = and <16 x i32> [[BIN_RDX4]], [[RDX_SHUF5]]
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <16 x i32> [[BIN_RDX6]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <16 x i32> [[BIN_RDX6]], i32 0
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP5]], [[TMP2]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP5]], [[TMP2]]
	; CHECK-NEXT: [[V43:%.*]] = and i32 undef, [[V42]]
	; CHECK-NEXT: [[V44:%.*]] = add i32 [[TMP2]], 16			; CHECK-NEXT: [[V44:%.*]] = add i32 [[TMP2]], 16
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> undef, i32 [[V44]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> undef, i32 [[V44]], i32 0
	; CHECK-NEXT: [[TMP7]] = insertelement <2 x i32> [[TMP6]], i32 [[OP_EXTRA]], i32 1			; CHECK-NEXT: [[TMP7]] = insertelement <2 x i32> [[TMP6]], i32 [[OP_EXTRA]], i32 1
	; CHECK-NEXT: br i1 true, label [[BCI_15]], label [[LOOPEXIT:%.*]]			; CHECK-NEXT: br i1 true, label [[BCI_15]], label [[LOOPEXIT:%.*]]
	; CHECK: loopexit:			; CHECK: loopexit:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bci_15.preheader:			bci_15.preheader:

llvm/trunk/test/Transforms/SLPVectorizer/X86/bad-reduction.ll

	; CHECK-NEXT: [[Z4:%.*]] = zext i8 [[T4]] to i64			; CHECK-NEXT: [[Z4:%.*]] = zext i8 [[T4]] to i64
	; CHECK-NEXT: [[Z5:%.*]] = zext i8 [[T5]] to i64			; CHECK-NEXT: [[Z5:%.*]] = zext i8 [[T5]] to i64
	; CHECK-NEXT: [[Z6:%.*]] = zext i8 [[T6]] to i64			; CHECK-NEXT: [[Z6:%.*]] = zext i8 [[T6]] to i64
	; CHECK-NEXT: [[Z7:%.*]] = zext i8 [[T7]] to i64			; CHECK-NEXT: [[Z7:%.*]] = zext i8 [[T7]] to i64
	; CHECK-NEXT: [[TMP4:%.*]] = shl nuw <4 x i64> [[TMP3]], <i64 56, i64 48, i64 40, i64 32>			; CHECK-NEXT: [[TMP4:%.*]] = shl nuw <4 x i64> [[TMP3]], <i64 56, i64 48, i64 40, i64 32>
	; CHECK-NEXT: [[SH4:%.*]] = shl nuw nsw i64 [[Z4]], 24			; CHECK-NEXT: [[SH4:%.*]] = shl nuw nsw i64 [[Z4]], 24
	; CHECK-NEXT: [[SH5:%.*]] = shl nuw nsw i64 [[Z5]], 16			; CHECK-NEXT: [[SH5:%.*]] = shl nuw nsw i64 [[Z5]], 16
	; CHECK-NEXT: [[SH6:%.*]] = shl nuw nsw i64 [[Z6]], 8			; CHECK-NEXT: [[SH6:%.*]] = shl nuw nsw i64 [[Z6]], 8
	; CHECK-NEXT: [[OR01:%.*]] = or i64 undef, undef
	; CHECK-NEXT: [[OR012:%.*]] = or i64 [[OR01]], undef
	; CHECK-NEXT: [[OR0123:%.*]] = or i64 [[OR012]], undef
	; CHECK-NEXT: [[OR01234:%.*]] = or i64 [[OR0123]], [[SH4]]
	; CHECK-NEXT: [[OR012345:%.*]] = or i64 [[OR01234]], [[SH5]]
	; CHECK-NEXT: [[OR0123456:%.*]] = or i64 [[OR012345]], [[SH6]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i64> [[TMP4]], <4 x i64> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i64> [[TMP4]], <4 x i64> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = or <4 x i64> [[TMP4]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = or <4 x i64> [[TMP4]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i64> [[BIN_RDX]], <4 x i64> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i64> [[BIN_RDX]], <4 x i64> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = or <4 x i64> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = or <4 x i64> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i64> [[BIN_RDX2]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i64> [[BIN_RDX2]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = or i64 [[TMP5]], [[SH4]]			; CHECK-NEXT: [[TMP6:%.*]] = or i64 [[TMP5]], [[SH4]]
	; CHECK-NEXT: [[TMP7:%.*]] = or i64 [[TMP6]], [[SH5]]			; CHECK-NEXT: [[TMP7:%.*]] = or i64 [[TMP6]], [[SH5]]
	; CHECK-NEXT: [[TMP8:%.*]] = or i64 [[TMP7]], [[SH6]]			; CHECK-NEXT: [[TMP8:%.*]] = or i64 [[TMP7]], [[SH6]]
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = or i64 [[TMP8]], [[Z7]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = or i64 [[TMP8]], [[Z7]]
	; CHECK-NEXT: [[OR01234567:%.*]] = or i64 [[OR0123456]], [[Z7]]
	; CHECK-NEXT: ret i64 [[OP_EXTRA]]			; CHECK-NEXT: ret i64 [[OP_EXTRA]]
	;			;
	%g0 = getelementptr inbounds %v8i8, %v8i8* %p, i64 0, i32 0			%g0 = getelementptr inbounds %v8i8, %v8i8* %p, i64 0, i32 0
	%g1 = getelementptr inbounds %v8i8, %v8i8* %p, i64 0, i32 1			%g1 = getelementptr inbounds %v8i8, %v8i8* %p, i64 0, i32 1
	%g2 = getelementptr inbounds %v8i8, %v8i8* %p, i64 0, i32 2			%g2 = getelementptr inbounds %v8i8, %v8i8* %p, i64 0, i32 2
	%g3 = getelementptr inbounds %v8i8, %v8i8* %p, i64 0, i32 3			%g3 = getelementptr inbounds %v8i8, %v8i8* %p, i64 0, i32 3
	%g4 = getelementptr inbounds %v8i8, %v8i8* %p, i64 0, i32 4			%g4 = getelementptr inbounds %v8i8, %v8i8* %p, i64 0, i32 4
	%g5 = getelementptr inbounds %v8i8, %v8i8* %p, i64 0, i32 5			%g5 = getelementptr inbounds %v8i8, %v8i8* %p, i64 0, i32 5
	; CHECK-NEXT: [[G4:%.*]] = getelementptr inbounds [[V8I8]], %v8i8* [[P]], i64 0, i32 4			; CHECK-NEXT: [[G4:%.*]] = getelementptr inbounds [[V8I8]], %v8i8* [[P]], i64 0, i32 4
	; CHECK-NEXT: [[G5:%.*]] = getelementptr inbounds [[V8I8]], %v8i8* [[P]], i64 0, i32 5			; CHECK-NEXT: [[G5:%.*]] = getelementptr inbounds [[V8I8]], %v8i8* [[P]], i64 0, i32 5
	; CHECK-NEXT: [[G6:%.*]] = getelementptr inbounds [[V8I8]], %v8i8* [[P]], i64 0, i32 6			; CHECK-NEXT: [[G6:%.*]] = getelementptr inbounds [[V8I8]], %v8i8* [[P]], i64 0, i32 6
	; CHECK-NEXT: [[G7:%.*]] = getelementptr inbounds [[V8I8]], %v8i8* [[P]], i64 0, i32 7			; CHECK-NEXT: [[G7:%.*]] = getelementptr inbounds [[V8I8]], %v8i8* [[P]], i64 0, i32 7
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[G0]] to <8 x i8>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[G0]] to <8 x i8>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[TMP1]], align 1			; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[TMP1]], align 1
	; CHECK-NEXT: [[TMP3:%.*]] = zext <8 x i8> [[TMP2]] to <8 x i64>			; CHECK-NEXT: [[TMP3:%.*]] = zext <8 x i8> [[TMP2]] to <8 x i64>
	; CHECK-NEXT: [[TMP4:%.*]] = shl nuw <8 x i64> [[TMP3]], <i64 56, i64 48, i64 40, i64 32, i64 24, i64 16, i64 8, i64 0>			; CHECK-NEXT: [[TMP4:%.*]] = shl nuw <8 x i64> [[TMP3]], <i64 56, i64 48, i64 40, i64 32, i64 24, i64 16, i64 8, i64 0>
	; CHECK-NEXT: [[OR01:%.*]] = or i64 undef, undef
	; CHECK-NEXT: [[OR012:%.*]] = or i64 [[OR01]], undef
	; CHECK-NEXT: [[OR0123:%.*]] = or i64 [[OR012]], undef
	; CHECK-NEXT: [[OR01234:%.*]] = or i64 [[OR0123]], undef
	; CHECK-NEXT: [[OR012345:%.*]] = or i64 [[OR01234]], undef
	; CHECK-NEXT: [[OR0123456:%.*]] = or i64 [[OR012345]], undef
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i64> [[TMP4]], <8 x i64> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i64> [[TMP4]], <8 x i64> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = or <8 x i64> [[TMP4]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = or <8 x i64> [[TMP4]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i64> [[BIN_RDX]], <8 x i64> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i64> [[BIN_RDX]], <8 x i64> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = or <8 x i64> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = or <8 x i64> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i64> [[BIN_RDX2]], <8 x i64> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i64> [[BIN_RDX2]], <8 x i64> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = or <8 x i64> [[BIN_RDX2]], [[RDX_SHUF3]]			; CHECK-NEXT: [[BIN_RDX4:%.*]] = or <8 x i64> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <8 x i64> [[BIN_RDX4]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <8 x i64> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[OR01234567:%.*]] = or i64 [[OR0123456]], undef
	; CHECK-NEXT: ret i64 [[TMP5]]			; CHECK-NEXT: ret i64 [[TMP5]]
	;			;
	%g0 = getelementptr inbounds %v8i8, %v8i8* %p, i64 0, i32 0			%g0 = getelementptr inbounds %v8i8, %v8i8* %p, i64 0, i32 0
	%g1 = getelementptr inbounds %v8i8, %v8i8* %p, i64 0, i32 1			%g1 = getelementptr inbounds %v8i8, %v8i8* %p, i64 0, i32 1
	%g2 = getelementptr inbounds %v8i8, %v8i8* %p, i64 0, i32 2			%g2 = getelementptr inbounds %v8i8, %v8i8* %p, i64 0, i32 2
	%g3 = getelementptr inbounds %v8i8, %v8i8* %p, i64 0, i32 3			%g3 = getelementptr inbounds %v8i8, %v8i8* %p, i64 0, i32 3
	%g4 = getelementptr inbounds %v8i8, %v8i8* %p, i64 0, i32 4			%g4 = getelementptr inbounds %v8i8, %v8i8* %p, i64 0, i32 4
	%g5 = getelementptr inbounds %v8i8, %v8i8* %p, i64 0, i32 5			%g5 = getelementptr inbounds %v8i8, %v8i8* %p, i64 0, i32 5
	; CHECK-NEXT: [[TMP3:%.*]] = zext <4 x i8> [[TMP2]] to <4 x i64>			; CHECK-NEXT: [[TMP3:%.*]] = zext <4 x i8> [[TMP2]] to <4 x i64>
	; CHECK-NEXT: [[Z5:%.*]] = zext i8 [[LD5]] to i64			; CHECK-NEXT: [[Z5:%.*]] = zext i8 [[LD5]] to i64
	; CHECK-NEXT: [[Z6:%.*]] = zext i8 [[LD6]] to i64			; CHECK-NEXT: [[Z6:%.*]] = zext i8 [[LD6]] to i64
	; CHECK-NEXT: [[Z7:%.*]] = zext i8 [[LD7]] to i64			; CHECK-NEXT: [[Z7:%.*]] = zext i8 [[LD7]] to i64
	; CHECK-NEXT: [[TMP4:%.*]] = shl nuw nsw <4 x i64> [[TMP3]], <i64 8, i64 16, i64 24, i64 32>			; CHECK-NEXT: [[TMP4:%.*]] = shl nuw nsw <4 x i64> [[TMP3]], <i64 8, i64 16, i64 24, i64 32>
	; CHECK-NEXT: [[S5:%.*]] = shl nuw nsw i64 [[Z5]], 40			; CHECK-NEXT: [[S5:%.*]] = shl nuw nsw i64 [[Z5]], 40
	; CHECK-NEXT: [[S6:%.*]] = shl nuw nsw i64 [[Z6]], 48			; CHECK-NEXT: [[S6:%.*]] = shl nuw nsw i64 [[Z6]], 48
	; CHECK-NEXT: [[S7:%.*]] = shl nuw i64 [[Z7]], 56			; CHECK-NEXT: [[S7:%.*]] = shl nuw i64 [[Z7]], 56
	; CHECK-NEXT: [[O1:%.*]] = or i64 undef, [[Z0]]
	; CHECK-NEXT: [[O2:%.*]] = or i64 [[O1]], undef
	; CHECK-NEXT: [[O3:%.*]] = or i64 [[O2]], undef
	; CHECK-NEXT: [[O4:%.*]] = or i64 [[O3]], undef
	; CHECK-NEXT: [[O5:%.*]] = or i64 [[O4]], [[S5]]
	; CHECK-NEXT: [[O6:%.*]] = or i64 [[O5]], [[S6]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i64> [[TMP4]], <4 x i64> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i64> [[TMP4]], <4 x i64> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = or <4 x i64> [[TMP4]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = or <4 x i64> [[TMP4]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i64> [[BIN_RDX]], <4 x i64> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i64> [[BIN_RDX]], <4 x i64> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = or <4 x i64> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = or <4 x i64> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i64> [[BIN_RDX2]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i64> [[BIN_RDX2]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = or i64 [[TMP5]], [[S5]]			; CHECK-NEXT: [[TMP6:%.*]] = or i64 [[TMP5]], [[S5]]
	; CHECK-NEXT: [[TMP7:%.*]] = or i64 [[TMP6]], [[S6]]			; CHECK-NEXT: [[TMP7:%.*]] = or i64 [[TMP6]], [[S6]]
	; CHECK-NEXT: [[TMP8:%.*]] = or i64 [[TMP7]], [[S7]]			; CHECK-NEXT: [[TMP8:%.*]] = or i64 [[TMP7]], [[S7]]
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = or i64 [[TMP8]], [[Z0]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = or i64 [[TMP8]], [[Z0]]
	; CHECK-NEXT: [[O7:%.*]] = or i64 [[O6]], [[S7]]
	; CHECK-NEXT: ret i64 [[OP_EXTRA]]			; CHECK-NEXT: ret i64 [[OP_EXTRA]]
	;			;
	%g1 = getelementptr inbounds i8, i8* %arg, i64 1			%g1 = getelementptr inbounds i8, i8* %arg, i64 1
	%g2 = getelementptr inbounds i8, i8* %arg, i64 2			%g2 = getelementptr inbounds i8, i8* %arg, i64 2
	%g3 = getelementptr inbounds i8, i8* %arg, i64 3			%g3 = getelementptr inbounds i8, i8* %arg, i64 3
	%g4 = getelementptr inbounds i8, i8* %arg, i64 4			%g4 = getelementptr inbounds i8, i8* %arg, i64 4
	%g5 = getelementptr inbounds i8, i8* %arg, i64 5			%g5 = getelementptr inbounds i8, i8* %arg, i64 5
	%g6 = getelementptr inbounds i8, i8* %arg, i64 6			%g6 = getelementptr inbounds i8, i8* %arg, i64 6
	; CHECK-NEXT: [[G4:%.*]] = getelementptr inbounds i8, i8* [[ARG]], i64 4			; CHECK-NEXT: [[G4:%.*]] = getelementptr inbounds i8, i8* [[ARG]], i64 4
	; CHECK-NEXT: [[G5:%.*]] = getelementptr inbounds i8, i8* [[ARG]], i64 5			; CHECK-NEXT: [[G5:%.*]] = getelementptr inbounds i8, i8* [[ARG]], i64 5
	; CHECK-NEXT: [[G6:%.*]] = getelementptr inbounds i8, i8* [[ARG]], i64 6			; CHECK-NEXT: [[G6:%.*]] = getelementptr inbounds i8, i8* [[ARG]], i64 6
	; CHECK-NEXT: [[G7:%.*]] = getelementptr inbounds i8, i8* [[ARG]], i64 7			; CHECK-NEXT: [[G7:%.*]] = getelementptr inbounds i8, i8* [[ARG]], i64 7
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[ARG]] to <8 x i8>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[ARG]] to <8 x i8>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[TMP1]], align 1			; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i8>, <8 x i8>* [[TMP1]], align 1
	; CHECK-NEXT: [[TMP3:%.*]] = zext <8 x i8> [[TMP2]] to <8 x i64>			; CHECK-NEXT: [[TMP3:%.*]] = zext <8 x i8> [[TMP2]] to <8 x i64>
	; CHECK-NEXT: [[TMP4:%.*]] = shl nuw <8 x i64> [[TMP3]], <i64 0, i64 8, i64 16, i64 24, i64 32, i64 40, i64 48, i64 56>			; CHECK-NEXT: [[TMP4:%.*]] = shl nuw <8 x i64> [[TMP3]], <i64 0, i64 8, i64 16, i64 24, i64 32, i64 40, i64 48, i64 56>
	; CHECK-NEXT: [[O1:%.*]] = or i64 undef, undef
	; CHECK-NEXT: [[O2:%.*]] = or i64 [[O1]], undef
	; CHECK-NEXT: [[O3:%.*]] = or i64 [[O2]], undef
	; CHECK-NEXT: [[O4:%.*]] = or i64 [[O3]], undef
	; CHECK-NEXT: [[O5:%.*]] = or i64 [[O4]], undef
	; CHECK-NEXT: [[O6:%.*]] = or i64 [[O5]], undef
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i64> [[TMP4]], <8 x i64> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i64> [[TMP4]], <8 x i64> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = or <8 x i64> [[TMP4]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = or <8 x i64> [[TMP4]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i64> [[BIN_RDX]], <8 x i64> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i64> [[BIN_RDX]], <8 x i64> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = or <8 x i64> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = or <8 x i64> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i64> [[BIN_RDX2]], <8 x i64> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i64> [[BIN_RDX2]], <8 x i64> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = or <8 x i64> [[BIN_RDX2]], [[RDX_SHUF3]]			; CHECK-NEXT: [[BIN_RDX4:%.*]] = or <8 x i64> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <8 x i64> [[BIN_RDX4]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <8 x i64> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[O7:%.*]] = or i64 [[O6]], undef
	; CHECK-NEXT: ret i64 [[TMP5]]			; CHECK-NEXT: ret i64 [[TMP5]]
	;			;
	%g1 = getelementptr inbounds i8, i8* %arg, i64 1			%g1 = getelementptr inbounds i8, i8* %arg, i64 1
	%g2 = getelementptr inbounds i8, i8* %arg, i64 2			%g2 = getelementptr inbounds i8, i8* %arg, i64 2
	%g3 = getelementptr inbounds i8, i8* %arg, i64 3			%g3 = getelementptr inbounds i8, i8* %arg, i64 3
	%g4 = getelementptr inbounds i8, i8* %arg, i64 4			%g4 = getelementptr inbounds i8, i8* %arg, i64 4
	%g5 = getelementptr inbounds i8, i8* %arg, i64 5			%g5 = getelementptr inbounds i8, i8* %arg, i64 5
	%g6 = getelementptr inbounds i8, i8* %arg, i64 6			%g6 = getelementptr inbounds i8, i8* %arg, i64 6

llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal-list.ll

	; CHECK-LABEL: @bazz(			; CHECK-LABEL: @bazz(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4			; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
	; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3			; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3
	; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float			; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr to <8 x float>*), align 16			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr to <8 x float>*), align 16
	; CHECK-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr1 to <8 x float>*), align 16			; CHECK-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr1 to <8 x float>*), align 16
	; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <8 x float> [[TMP2]], [[TMP1]]			; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <8 x float> [[TMP2]], [[TMP1]]
	; CHECK-NEXT: [[ADD:%.*]] = fadd fast float undef, [[CONV]]
	; CHECK-NEXT: [[ADD_1:%.*]] = fadd fast float undef, [[ADD]]
	; CHECK-NEXT: [[ADD_2:%.*]] = fadd fast float undef, [[ADD_1]]
	; CHECK-NEXT: [[ADD_3:%.*]] = fadd fast float undef, [[ADD_2]]
	; CHECK-NEXT: [[MUL5:%.*]] = shl nsw i32 [[TMP0]], 2			; CHECK-NEXT: [[MUL5:%.*]] = shl nsw i32 [[TMP0]], 2
	; CHECK-NEXT: [[CONV6:%.*]] = sitofp i32 [[MUL5]] to float			; CHECK-NEXT: [[CONV6:%.*]] = sitofp i32 [[MUL5]] to float
	; CHECK-NEXT: [[ADD7:%.*]] = fadd fast float [[ADD_3]], [[CONV6]]
	; CHECK-NEXT: [[ADD19:%.*]] = fadd fast float undef, [[ADD7]]
	; CHECK-NEXT: [[ADD19_1:%.*]] = fadd fast float undef, [[ADD19]]
	; CHECK-NEXT: [[ADD19_2:%.*]] = fadd fast float undef, [[ADD19_1]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP3]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP3]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP3]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP3]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]			; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]
	; CHECK-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV6]]			; CHECK-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV6]]
	; CHECK-NEXT: [[ADD19_3:%.*]] = fadd fast float undef, [[ADD19_2]]
	; CHECK-NEXT: store float [[OP_EXTRA5]], float* @res, align 4			; CHECK-NEXT: store float [[OP_EXTRA5]], float* @res, align 4
	; CHECK-NEXT: ret float [[OP_EXTRA5]]			; CHECK-NEXT: ret float [[OP_EXTRA5]]
	;			;
	; THRESHOLD-LABEL: @bazz(			; THRESHOLD-LABEL: @bazz(
	; THRESHOLD-NEXT: entry:			; THRESHOLD-NEXT: entry:
	; THRESHOLD-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4			; THRESHOLD-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
	; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3			; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3
	; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float			; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
	; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr to <8 x float>*), align 16			; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr to <8 x float>*), align 16
	; THRESHOLD-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr1 to <8 x float>*), align 16			; THRESHOLD-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr1 to <8 x float>*), align 16
	; THRESHOLD-NEXT: [[TMP3:%.*]] = fmul fast <8 x float> [[TMP2]], [[TMP1]]			; THRESHOLD-NEXT: [[TMP3:%.*]] = fmul fast <8 x float> [[TMP2]], [[TMP1]]
	; THRESHOLD-NEXT: [[ADD:%.*]] = fadd fast float undef, [[CONV]]
	; THRESHOLD-NEXT: [[ADD_1:%.*]] = fadd fast float undef, [[ADD]]
	; THRESHOLD-NEXT: [[ADD_2:%.*]] = fadd fast float undef, [[ADD_1]]
	; THRESHOLD-NEXT: [[ADD_3:%.*]] = fadd fast float undef, [[ADD_2]]
	; THRESHOLD-NEXT: [[MUL5:%.*]] = shl nsw i32 [[TMP0]], 2			; THRESHOLD-NEXT: [[MUL5:%.*]] = shl nsw i32 [[TMP0]], 2
	; THRESHOLD-NEXT: [[CONV6:%.*]] = sitofp i32 [[MUL5]] to float			; THRESHOLD-NEXT: [[CONV6:%.*]] = sitofp i32 [[MUL5]] to float
	; THRESHOLD-NEXT: [[ADD7:%.*]] = fadd fast float [[ADD_3]], [[CONV6]]
	; THRESHOLD-NEXT: [[ADD19:%.*]] = fadd fast float undef, [[ADD7]]
	; THRESHOLD-NEXT: [[ADD19_1:%.*]] = fadd fast float undef, [[ADD19]]
	; THRESHOLD-NEXT: [[ADD19_2:%.*]] = fadd fast float undef, [[ADD19_1]]
	; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP3]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP3]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP3]], [[RDX_SHUF]]			; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP3]], [[RDX_SHUF]]
	; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]			; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]			; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; THRESHOLD-NEXT: [[TMP4:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0			; THRESHOLD-NEXT: [[TMP4:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
	; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]			; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]
	; THRESHOLD-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV6]]			; THRESHOLD-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV6]]
	; THRESHOLD-NEXT: [[ADD19_3:%.*]] = fadd fast float undef, [[ADD19_2]]
	; THRESHOLD-NEXT: store float [[OP_EXTRA5]], float* @res, align 4			; THRESHOLD-NEXT: store float [[OP_EXTRA5]], float* @res, align 4
	; THRESHOLD-NEXT: ret float [[OP_EXTRA5]]			; THRESHOLD-NEXT: ret float [[OP_EXTRA5]]
	;			;
	entry:			entry:
	%0 = load i32, i32* @n, align 4			%0 = load i32, i32* @n, align 4
	%mul = mul nsw i32 %0, 3			%mul = mul nsw i32 %0, 3
	%conv = sitofp i32 %mul to float			%conv = sitofp i32 %mul to float
	%1 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16			%1 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
	define float @bazzz() {			define float @bazzz() {
	; CHECK-LABEL: @bazzz(			; CHECK-LABEL: @bazzz(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4			; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
	; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float			; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16
	; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP1]]			; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP1]]
	; CHECK-NEXT: [[TMP4:%.*]] = fadd fast float undef, undef
	; CHECK-NEXT: [[TMP5:%.*]] = fadd fast float undef, [[TMP4]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = fadd fast float undef, [[TMP5]]			; CHECK-NEXT: [[TMP5:%.*]] = fmul fast float [[CONV]], [[TMP4]]
	; CHECK-NEXT: [[TMP8:%.*]] = fmul fast float [[CONV]], [[TMP6]]			; CHECK-NEXT: store float [[TMP5]], float* @res, align 4
	; CHECK-NEXT: store float [[TMP8]], float* @res, align 4			; CHECK-NEXT: ret float [[TMP5]]
	; CHECK-NEXT: ret float [[TMP8]]
	;			;
	; THRESHOLD-LABEL: @bazzz(			; THRESHOLD-LABEL: @bazzz(
	; THRESHOLD-NEXT: entry:			; THRESHOLD-NEXT: entry:
	; THRESHOLD-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4			; THRESHOLD-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
	; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float			; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float
	; THRESHOLD-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16			; THRESHOLD-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16
	; THRESHOLD-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16			; THRESHOLD-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16
	; THRESHOLD-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP1]]			; THRESHOLD-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP1]]
	; THRESHOLD-NEXT: [[TMP4:%.*]] = fadd fast float undef, undef
	; THRESHOLD-NEXT: [[TMP5:%.*]] = fadd fast float undef, [[TMP4]]
	; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]]			; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]]
	; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]			; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; THRESHOLD-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0			; THRESHOLD-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
	; THRESHOLD-NEXT: [[TMP7:%.*]] = fadd fast float undef, [[TMP5]]			; THRESHOLD-NEXT: [[TMP5:%.*]] = fmul fast float [[CONV]], [[TMP4]]
	; THRESHOLD-NEXT: [[TMP8:%.*]] = fmul fast float [[CONV]], [[TMP6]]			; THRESHOLD-NEXT: store float [[TMP5]], float* @res, align 4
	; THRESHOLD-NEXT: store float [[TMP8]], float* @res, align 4			; THRESHOLD-NEXT: ret float [[TMP5]]
	; THRESHOLD-NEXT: ret float [[TMP8]]
	;			;
	entry:			entry:
	%0 = load i32, i32* @n, align 4			%0 = load i32, i32* @n, align 4
	%conv = sitofp i32 %0 to float			%conv = sitofp i32 %0 to float
	%1 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16			%1 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
	%2 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16			%2 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16
	%mul = fmul fast float %2, %1			%mul = fmul fast float %2, %1
	%3 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 1), align 4			%3 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 1), align 4
	define i32 @foo() {			define i32 @foo() {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4			; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
	; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float			; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16
	; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP1]]			; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP1]]
	; CHECK-NEXT: [[TMP4:%.*]] = fadd fast float undef, undef
	; CHECK-NEXT: [[TMP5:%.*]] = fadd fast float undef, [[TMP4]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = fadd fast float undef, [[TMP5]]			; CHECK-NEXT: [[TMP5:%.*]] = fmul fast float [[CONV]], [[TMP4]]
	; CHECK-NEXT: [[TMP8:%.*]] = fmul fast float [[CONV]], [[TMP6]]			; CHECK-NEXT: [[CONV4:%.*]] = fptosi float [[TMP5]] to i32
	; CHECK-NEXT: [[CONV4:%.*]] = fptosi float [[TMP8]] to i32
	; CHECK-NEXT: store i32 [[CONV4]], i32* @n, align 4			; CHECK-NEXT: store i32 [[CONV4]], i32* @n, align 4
	; CHECK-NEXT: ret i32 [[CONV4]]			; CHECK-NEXT: ret i32 [[CONV4]]
	;			;
	; THRESHOLD-LABEL: @foo(			; THRESHOLD-LABEL: @foo(
	; THRESHOLD-NEXT: entry:			; THRESHOLD-NEXT: entry:
	; THRESHOLD-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4			; THRESHOLD-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
	; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float			; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float
	; THRESHOLD-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16			; THRESHOLD-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16
	; THRESHOLD-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16			; THRESHOLD-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16
	; THRESHOLD-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP1]]			; THRESHOLD-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP1]]
	; THRESHOLD-NEXT: [[TMP4:%.*]] = fadd fast float undef, undef
	; THRESHOLD-NEXT: [[TMP5:%.*]] = fadd fast float undef, [[TMP4]]
	; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]]			; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]]
	; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]			; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; THRESHOLD-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0			; THRESHOLD-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
	; THRESHOLD-NEXT: [[TMP7:%.*]] = fadd fast float undef, [[TMP5]]			; THRESHOLD-NEXT: [[TMP5:%.*]] = fmul fast float [[CONV]], [[TMP4]]
	; THRESHOLD-NEXT: [[TMP8:%.*]] = fmul fast float [[CONV]], [[TMP6]]			; THRESHOLD-NEXT: [[CONV4:%.*]] = fptosi float [[TMP5]] to i32
	; THRESHOLD-NEXT: [[CONV4:%.*]] = fptosi float [[TMP8]] to i32
	; THRESHOLD-NEXT: store i32 [[CONV4]], i32* @n, align 4			; THRESHOLD-NEXT: store i32 [[CONV4]], i32* @n, align 4
	; THRESHOLD-NEXT: ret i32 [[CONV4]]			; THRESHOLD-NEXT: ret i32 [[CONV4]]
	;			;
	entry:			entry:
	%0 = load i32, i32* @n, align 4			%0 = load i32, i32* @n, align 4
	%conv = sitofp i32 %0 to float			%conv = sitofp i32 %0 to float
	%1 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16			%1 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
	%2 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16			%2 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16
	[17 unchanged lines not shown]
	}			}

	define float @bar() {			define float @bar() {
	; CHECK-LABEL: @bar(			; CHECK-LABEL: @bar(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16			; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16
	; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <4 x float> [[TMP1]], [[TMP0]]			; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <4 x float> [[TMP1]], [[TMP0]]
	; CHECK-NEXT: [[CMP4:%.*]] = fcmp fast ogt float undef, undef
	; CHECK-NEXT: [[MAX_0_MUL3:%.*]] = select i1 [[CMP4]], float undef, float undef
	; CHECK-NEXT: [[CMP4_1:%.*]] = fcmp fast ogt float [[MAX_0_MUL3]], undef
	; CHECK-NEXT: [[MAX_0_MUL3_1:%.*]] = select i1 [[CMP4_1]], float [[MAX_0_MUL3]], float undef
	; CHECK-NEXT: [[CMP4_2:%.*]] = fcmp fast ogt float [[MAX_0_MUL3_1]], undef
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = fcmp fast ogt <4 x float> [[TMP2]], [[RDX_SHUF]]			; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = fcmp fast ogt <4 x float> [[TMP2]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x float> [[TMP2]], <4 x float> [[RDX_SHUF]]			; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x float> [[TMP2]], <4 x float> [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[RDX_MINMAX_SELECT]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[RDX_MINMAX_SELECT]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = fcmp fast ogt <4 x float> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]			; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = fcmp fast ogt <4 x float> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x float> [[RDX_MINMAX_SELECT]], <4 x float> [[RDX_SHUF1]]			; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x float> [[RDX_MINMAX_SELECT]], <4 x float> [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[RDX_MINMAX_SELECT3]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[RDX_MINMAX_SELECT3]], i32 0
	; CHECK-NEXT: [[MAX_0_MUL3_2:%.*]] = select i1 [[CMP4_2]], float [[MAX_0_MUL3_1]], float undef
	; CHECK-NEXT: store float [[TMP3]], float* @res, align 4			; CHECK-NEXT: store float [[TMP3]], float* @res, align 4
	; CHECK-NEXT: ret float [[TMP3]]			; CHECK-NEXT: ret float [[TMP3]]
	;			;
	; THRESHOLD-LABEL: @bar(			; THRESHOLD-LABEL: @bar(
	; THRESHOLD-NEXT: entry:			; THRESHOLD-NEXT: entry:
	; THRESHOLD-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16			; THRESHOLD-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16
	; THRESHOLD-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16			; THRESHOLD-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16
	; THRESHOLD-NEXT: [[TMP2:%.*]] = fmul fast <4 x float> [[TMP1]], [[TMP0]]			; THRESHOLD-NEXT: [[TMP2:%.*]] = fmul fast <4 x float> [[TMP1]], [[TMP0]]
	; THRESHOLD-NEXT: [[CMP4:%.*]] = fcmp fast ogt float undef, undef
	; THRESHOLD-NEXT: [[MAX_0_MUL3:%.*]] = select i1 [[CMP4]], float undef, float undef
	; THRESHOLD-NEXT: [[CMP4_1:%.*]] = fcmp fast ogt float [[MAX_0_MUL3]], undef
	; THRESHOLD-NEXT: [[MAX_0_MUL3_1:%.*]] = select i1 [[CMP4_1]], float [[MAX_0_MUL3]], float undef
	; THRESHOLD-NEXT: [[CMP4_2:%.*]] = fcmp fast ogt float [[MAX_0_MUL3_1]], undef
	; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[RDX_MINMAX_CMP:%.*]] = fcmp fast ogt <4 x float> [[TMP2]], [[RDX_SHUF]]			; THRESHOLD-NEXT: [[RDX_MINMAX_CMP:%.*]] = fcmp fast ogt <4 x float> [[TMP2]], [[RDX_SHUF]]
	; THRESHOLD-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x float> [[TMP2]], <4 x float> [[RDX_SHUF]]			; THRESHOLD-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x float> [[TMP2]], <4 x float> [[RDX_SHUF]]
	; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[RDX_MINMAX_SELECT]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[RDX_MINMAX_SELECT]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[RDX_MINMAX_CMP2:%.*]] = fcmp fast ogt <4 x float> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]			; THRESHOLD-NEXT: [[RDX_MINMAX_CMP2:%.*]] = fcmp fast ogt <4 x float> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
	; THRESHOLD-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x float> [[RDX_MINMAX_SELECT]], <4 x float> [[RDX_SHUF1]]			; THRESHOLD-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x float> [[RDX_MINMAX_SELECT]], <4 x float> [[RDX_SHUF1]]
	; THRESHOLD-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[RDX_MINMAX_SELECT3]], i32 0			; THRESHOLD-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[RDX_MINMAX_SELECT3]], i32 0
	; THRESHOLD-NEXT: [[MAX_0_MUL3_2:%.*]] = select i1 [[CMP4_2]], float [[MAX_0_MUL3_1]], float undef
	; THRESHOLD-NEXT: store float [[TMP3]], float* @res, align 4			; THRESHOLD-NEXT: store float [[TMP3]], float* @res, align 4
	; THRESHOLD-NEXT: ret float [[TMP3]]			; THRESHOLD-NEXT: ret float [[TMP3]]
	;			;
	entry:			entry:
	%0 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16			%0 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
	%1 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16			%1 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16
	%mul = fmul fast float %1, %0			%mul = fmul fast float %1, %0
	%2 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 1), align 4			%2 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 1), align 4
	[30 unchanged lines not shown]
	; CHECK-NEXT: [[ARRAYIDX_10:%.*]] = getelementptr inbounds float, float* [[X]], i64 10			; CHECK-NEXT: [[ARRAYIDX_10:%.*]] = getelementptr inbounds float, float* [[X]], i64 10
	; CHECK-NEXT: [[ARRAYIDX_11:%.*]] = getelementptr inbounds float, float* [[X]], i64 11			; CHECK-NEXT: [[ARRAYIDX_11:%.*]] = getelementptr inbounds float, float* [[X]], i64 11
	; CHECK-NEXT: [[ARRAYIDX_12:%.*]] = getelementptr inbounds float, float* [[X]], i64 12			; CHECK-NEXT: [[ARRAYIDX_12:%.*]] = getelementptr inbounds float, float* [[X]], i64 12
	; CHECK-NEXT: [[ARRAYIDX_13:%.*]] = getelementptr inbounds float, float* [[X]], i64 13			; CHECK-NEXT: [[ARRAYIDX_13:%.*]] = getelementptr inbounds float, float* [[X]], i64 13
	; CHECK-NEXT: [[ARRAYIDX_14:%.*]] = getelementptr inbounds float, float* [[X]], i64 14			; CHECK-NEXT: [[ARRAYIDX_14:%.*]] = getelementptr inbounds float, float* [[X]], i64 14
	; CHECK-NEXT: [[ARRAYIDX_15:%.*]] = getelementptr inbounds float, float* [[X]], i64 15			; CHECK-NEXT: [[ARRAYIDX_15:%.*]] = getelementptr inbounds float, float* [[X]], i64 15
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <16 x float>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <16 x float>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* [[TMP0]], align 4
	; CHECK-NEXT: [[ADD_1:%.*]] = fadd fast float undef, undef
	; CHECK-NEXT: [[ADD_2:%.*]] = fadd fast float undef, [[ADD_1]]
	; CHECK-NEXT: [[ADD_3:%.*]] = fadd fast float undef, [[ADD_2]]
	; CHECK-NEXT: [[ADD_4:%.*]] = fadd fast float undef, [[ADD_3]]
	; CHECK-NEXT: [[ADD_5:%.*]] = fadd fast float undef, [[ADD_4]]
	; CHECK-NEXT: [[ADD_6:%.*]] = fadd fast float undef, [[ADD_5]]
	; CHECK-NEXT: [[ADD_7:%.*]] = fadd fast float undef, [[ADD_6]]
	; CHECK-NEXT: [[ADD_8:%.*]] = fadd fast float undef, [[ADD_7]]
	; CHECK-NEXT: [[ADD_9:%.*]] = fadd fast float undef, [[ADD_8]]
	; CHECK-NEXT: [[ADD_10:%.*]] = fadd fast float undef, [[ADD_9]]
	; CHECK-NEXT: [[ADD_11:%.*]] = fadd fast float undef, [[ADD_10]]
	; CHECK-NEXT: [[ADD_12:%.*]] = fadd fast float undef, [[ADD_11]]
	; CHECK-NEXT: [[ADD_13:%.*]] = fadd fast float undef, [[ADD_12]]
	; CHECK-NEXT: [[ADD_14:%.*]] = fadd fast float undef, [[ADD_13]]
	; CHECK-NEXT: [[ADD_15:%.*]] = fadd fast float undef, [[ADD_14]]
	; CHECK-NEXT: [[ARRAYIDX_16:%.*]] = getelementptr inbounds float, float* [[X]], i64 16			; CHECK-NEXT: [[ARRAYIDX_16:%.*]] = getelementptr inbounds float, float* [[X]], i64 16
	; CHECK-NEXT: [[ARRAYIDX_17:%.*]] = getelementptr inbounds float, float* [[X]], i64 17			; CHECK-NEXT: [[ARRAYIDX_17:%.*]] = getelementptr inbounds float, float* [[X]], i64 17
	; CHECK-NEXT: [[ARRAYIDX_18:%.*]] = getelementptr inbounds float, float* [[X]], i64 18			; CHECK-NEXT: [[ARRAYIDX_18:%.*]] = getelementptr inbounds float, float* [[X]], i64 18
	; CHECK-NEXT: [[ARRAYIDX_19:%.*]] = getelementptr inbounds float, float* [[X]], i64 19			; CHECK-NEXT: [[ARRAYIDX_19:%.*]] = getelementptr inbounds float, float* [[X]], i64 19
	; CHECK-NEXT: [[ARRAYIDX_20:%.*]] = getelementptr inbounds float, float* [[X]], i64 20			; CHECK-NEXT: [[ARRAYIDX_20:%.*]] = getelementptr inbounds float, float* [[X]], i64 20
	; CHECK-NEXT: [[ARRAYIDX_21:%.*]] = getelementptr inbounds float, float* [[X]], i64 21			; CHECK-NEXT: [[ARRAYIDX_21:%.*]] = getelementptr inbounds float, float* [[X]], i64 21
	; CHECK-NEXT: [[ARRAYIDX_22:%.*]] = getelementptr inbounds float, float* [[X]], i64 22			; CHECK-NEXT: [[ARRAYIDX_22:%.*]] = getelementptr inbounds float, float* [[X]], i64 22
	; CHECK-NEXT: [[ARRAYIDX_23:%.*]] = getelementptr inbounds float, float* [[X]], i64 23			; CHECK-NEXT: [[ARRAYIDX_23:%.*]] = getelementptr inbounds float, float* [[X]], i64 23
	[18 unchanged lines not shown]
	; CHECK-NEXT: [[ARRAYIDX_42:%.*]] = getelementptr inbounds float, float* [[X]], i64 42			; CHECK-NEXT: [[ARRAYIDX_42:%.*]] = getelementptr inbounds float, float* [[X]], i64 42
	; CHECK-NEXT: [[ARRAYIDX_43:%.*]] = getelementptr inbounds float, float* [[X]], i64 43			; CHECK-NEXT: [[ARRAYIDX_43:%.*]] = getelementptr inbounds float, float* [[X]], i64 43
	; CHECK-NEXT: [[ARRAYIDX_44:%.*]] = getelementptr inbounds float, float* [[X]], i64 44			; CHECK-NEXT: [[ARRAYIDX_44:%.*]] = getelementptr inbounds float, float* [[X]], i64 44
	; CHECK-NEXT: [[ARRAYIDX_45:%.*]] = getelementptr inbounds float, float* [[X]], i64 45			; CHECK-NEXT: [[ARRAYIDX_45:%.*]] = getelementptr inbounds float, float* [[X]], i64 45
	; CHECK-NEXT: [[ARRAYIDX_46:%.*]] = getelementptr inbounds float, float* [[X]], i64 46			; CHECK-NEXT: [[ARRAYIDX_46:%.*]] = getelementptr inbounds float, float* [[X]], i64 46
	; CHECK-NEXT: [[ARRAYIDX_47:%.*]] = getelementptr inbounds float, float* [[X]], i64 47			; CHECK-NEXT: [[ARRAYIDX_47:%.*]] = getelementptr inbounds float, float* [[X]], i64 47
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[ARRAYIDX_16]] to <32 x float>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[ARRAYIDX_16]] to <32 x float>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <32 x float>, <32 x float>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <32 x float>, <32 x float>* [[TMP2]], align 4
	; CHECK-NEXT: [[ADD_16:%.*]] = fadd fast float undef, [[ADD_15]]
	; CHECK-NEXT: [[ADD_17:%.*]] = fadd fast float undef, [[ADD_16]]
	; CHECK-NEXT: [[ADD_18:%.*]] = fadd fast float undef, [[ADD_17]]
	; CHECK-NEXT: [[ADD_19:%.*]] = fadd fast float undef, [[ADD_18]]
	; CHECK-NEXT: [[ADD_20:%.*]] = fadd fast float undef, [[ADD_19]]
	; CHECK-NEXT: [[ADD_21:%.*]] = fadd fast float undef, [[ADD_20]]
	; CHECK-NEXT: [[ADD_22:%.*]] = fadd fast float undef, [[ADD_21]]
	; CHECK-NEXT: [[ADD_23:%.*]] = fadd fast float undef, [[ADD_22]]
	; CHECK-NEXT: [[ADD_24:%.*]] = fadd fast float undef, [[ADD_23]]
	; CHECK-NEXT: [[ADD_25:%.*]] = fadd fast float undef, [[ADD_24]]
	; CHECK-NEXT: [[ADD_26:%.*]] = fadd fast float undef, [[ADD_25]]
	; CHECK-NEXT: [[ADD_27:%.*]] = fadd fast float undef, [[ADD_26]]
	; CHECK-NEXT: [[ADD_28:%.*]] = fadd fast float undef, [[ADD_27]]
	; CHECK-NEXT: [[ADD_29:%.*]] = fadd fast float undef, [[ADD_28]]
	; CHECK-NEXT: [[ADD_30:%.*]] = fadd fast float undef, [[ADD_29]]
	; CHECK-NEXT: [[ADD_31:%.*]] = fadd fast float undef, [[ADD_30]]
	; CHECK-NEXT: [[ADD_32:%.*]] = fadd fast float undef, [[ADD_31]]
	; CHECK-NEXT: [[ADD_33:%.*]] = fadd fast float undef, [[ADD_32]]
	; CHECK-NEXT: [[ADD_34:%.*]] = fadd fast float undef, [[ADD_33]]
	; CHECK-NEXT: [[ADD_35:%.*]] = fadd fast float undef, [[ADD_34]]
	; CHECK-NEXT: [[ADD_36:%.*]] = fadd fast float undef, [[ADD_35]]
	; CHECK-NEXT: [[ADD_37:%.*]] = fadd fast float undef, [[ADD_36]]
	; CHECK-NEXT: [[ADD_38:%.*]] = fadd fast float undef, [[ADD_37]]
	; CHECK-NEXT: [[ADD_39:%.*]] = fadd fast float undef, [[ADD_38]]
	; CHECK-NEXT: [[ADD_40:%.*]] = fadd fast float undef, [[ADD_39]]
	; CHECK-NEXT: [[ADD_41:%.*]] = fadd fast float undef, [[ADD_40]]
	; CHECK-NEXT: [[ADD_42:%.*]] = fadd fast float undef, [[ADD_41]]
	; CHECK-NEXT: [[ADD_43:%.*]] = fadd fast float undef, [[ADD_42]]
	; CHECK-NEXT: [[ADD_44:%.*]] = fadd fast float undef, [[ADD_43]]
	; CHECK-NEXT: [[ADD_45:%.*]] = fadd fast float undef, [[ADD_44]]
	; CHECK-NEXT: [[ADD_46:%.*]] = fadd fast float undef, [[ADD_45]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <32 x float> [[TMP3]], <32 x float> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <32 x float> [[TMP3]], <32 x float> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <32 x float> [[TMP3]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <32 x float> [[TMP3]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x float> [[BIN_RDX]], <32 x float> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x float> [[BIN_RDX]], <32 x float> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <32 x float> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <32 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x float> [[BIN_RDX2]], <32 x float> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x float> [[BIN_RDX2]], <32 x float> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <32 x float> [[BIN_RDX2]], [[RDX_SHUF3]]			; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <32 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x float> [[BIN_RDX4]], <32 x float> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x float> [[BIN_RDX4]], <32 x float> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX6:%.*]] = fadd fast <32 x float> [[BIN_RDX4]], [[RDX_SHUF5]]			; CHECK-NEXT: [[BIN_RDX6:%.*]] = fadd fast <32 x float> [[BIN_RDX4]], [[RDX_SHUF5]]
	; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX8:%.*]] = fadd fast <32 x float> [[BIN_RDX6]], [[RDX_SHUF7]]			; CHECK-NEXT: [[BIN_RDX8:%.*]] = fadd fast <32 x float> [[BIN_RDX6]], [[RDX_SHUF7]]
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <32 x float> [[BIN_RDX8]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <32 x float> [[BIN_RDX8]], i32 0
	; CHECK-NEXT: [[RDX_SHUF9:%.*]] = shufflevector <16 x float> [[TMP1]], <16 x float> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF9:%.*]] = shufflevector <16 x float> [[TMP1]], <16 x float> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX10:%.*]] = fadd fast <16 x float> [[TMP1]], [[RDX_SHUF9]]			; CHECK-NEXT: [[BIN_RDX10:%.*]] = fadd fast <16 x float> [[TMP1]], [[RDX_SHUF9]]
	; CHECK-NEXT: [[RDX_SHUF11:%.*]] = shufflevector <16 x float> [[BIN_RDX10]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF11:%.*]] = shufflevector <16 x float> [[BIN_RDX10]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX12:%.*]] = fadd fast <16 x float> [[BIN_RDX10]], [[RDX_SHUF11]]			; CHECK-NEXT: [[BIN_RDX12:%.*]] = fadd fast <16 x float> [[BIN_RDX10]], [[RDX_SHUF11]]
	; CHECK-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <16 x float> [[BIN_RDX12]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <16 x float> [[BIN_RDX12]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX14:%.*]] = fadd fast <16 x float> [[BIN_RDX12]], [[RDX_SHUF13]]			; CHECK-NEXT: [[BIN_RDX14:%.*]] = fadd fast <16 x float> [[BIN_RDX12]], [[RDX_SHUF13]]
	; CHECK-NEXT: [[RDX_SHUF15:%.*]] = shufflevector <16 x float> [[BIN_RDX14]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF15:%.*]] = shufflevector <16 x float> [[BIN_RDX14]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX16:%.*]] = fadd fast <16 x float> [[BIN_RDX14]], [[RDX_SHUF15]]			; CHECK-NEXT: [[BIN_RDX16:%.*]] = fadd fast <16 x float> [[BIN_RDX14]], [[RDX_SHUF15]]
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <16 x float> [[BIN_RDX16]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <16 x float> [[BIN_RDX16]], i32 0
	; CHECK-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP4]], [[TMP5]]			; CHECK-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP4]], [[TMP5]]
	; CHECK-NEXT: [[ADD_47:%.*]] = fadd fast float undef, [[ADD_46]]
	; CHECK-NEXT: ret float [[OP_RDX]]			; CHECK-NEXT: ret float [[OP_RDX]]
	;			;
	; THRESHOLD-LABEL: @f(			; THRESHOLD-LABEL: @f(
	; THRESHOLD-NEXT: entry:			; THRESHOLD-NEXT: entry:
	; THRESHOLD-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1			; THRESHOLD-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
	; THRESHOLD-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 2			; THRESHOLD-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
	; THRESHOLD-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 3			; THRESHOLD-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
	; THRESHOLD-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 4			; THRESHOLD-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 4
	; THRESHOLD-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 5			; THRESHOLD-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 5
	; THRESHOLD-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 6			; THRESHOLD-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 6
	; THRESHOLD-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds float, float* [[X]], i64 7			; THRESHOLD-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds float, float* [[X]], i64 7
	; THRESHOLD-NEXT: [[ARRAYIDX_8:%.*]] = getelementptr inbounds float, float* [[X]], i64 8			; THRESHOLD-NEXT: [[ARRAYIDX_8:%.*]] = getelementptr inbounds float, float* [[X]], i64 8
	; THRESHOLD-NEXT: [[ARRAYIDX_9:%.*]] = getelementptr inbounds float, float* [[X]], i64 9			; THRESHOLD-NEXT: [[ARRAYIDX_9:%.*]] = getelementptr inbounds float, float* [[X]], i64 9
	; THRESHOLD-NEXT: [[ARRAYIDX_10:%.*]] = getelementptr inbounds float, float* [[X]], i64 10			; THRESHOLD-NEXT: [[ARRAYIDX_10:%.*]] = getelementptr inbounds float, float* [[X]], i64 10
	; THRESHOLD-NEXT: [[ARRAYIDX_11:%.*]] = getelementptr inbounds float, float* [[X]], i64 11			; THRESHOLD-NEXT: [[ARRAYIDX_11:%.*]] = getelementptr inbounds float, float* [[X]], i64 11
	; THRESHOLD-NEXT: [[ARRAYIDX_12:%.*]] = getelementptr inbounds float, float* [[X]], i64 12			; THRESHOLD-NEXT: [[ARRAYIDX_12:%.*]] = getelementptr inbounds float, float* [[X]], i64 12
	; THRESHOLD-NEXT: [[ARRAYIDX_13:%.*]] = getelementptr inbounds float, float* [[X]], i64 13			; THRESHOLD-NEXT: [[ARRAYIDX_13:%.*]] = getelementptr inbounds float, float* [[X]], i64 13
	; THRESHOLD-NEXT: [[ARRAYIDX_14:%.*]] = getelementptr inbounds float, float* [[X]], i64 14			; THRESHOLD-NEXT: [[ARRAYIDX_14:%.*]] = getelementptr inbounds float, float* [[X]], i64 14
	; THRESHOLD-NEXT: [[ARRAYIDX_15:%.*]] = getelementptr inbounds float, float* [[X]], i64 15			; THRESHOLD-NEXT: [[ARRAYIDX_15:%.*]] = getelementptr inbounds float, float* [[X]], i64 15
	; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <16 x float>*			; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <16 x float>*
	; THRESHOLD-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* [[TMP0]], align 4			; THRESHOLD-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* [[TMP0]], align 4
	; THRESHOLD-NEXT: [[ADD_1:%.*]] = fadd fast float undef, undef
	; THRESHOLD-NEXT: [[ADD_2:%.*]] = fadd fast float undef, [[ADD_1]]
	; THRESHOLD-NEXT: [[ADD_3:%.*]] = fadd fast float undef, [[ADD_2]]
	; THRESHOLD-NEXT: [[ADD_4:%.*]] = fadd fast float undef, [[ADD_3]]
	; THRESHOLD-NEXT: [[ADD_5:%.*]] = fadd fast float undef, [[ADD_4]]
	; THRESHOLD-NEXT: [[ADD_6:%.*]] = fadd fast float undef, [[ADD_5]]
	; THRESHOLD-NEXT: [[ADD_7:%.*]] = fadd fast float undef, [[ADD_6]]
	; THRESHOLD-NEXT: [[ADD_8:%.*]] = fadd fast float undef, [[ADD_7]]
	; THRESHOLD-NEXT: [[ADD_9:%.*]] = fadd fast float undef, [[ADD_8]]
	; THRESHOLD-NEXT: [[ADD_10:%.*]] = fadd fast float undef, [[ADD_9]]
	; THRESHOLD-NEXT: [[ADD_11:%.*]] = fadd fast float undef, [[ADD_10]]
	; THRESHOLD-NEXT: [[ADD_12:%.*]] = fadd fast float undef, [[ADD_11]]
	; THRESHOLD-NEXT: [[ADD_13:%.*]] = fadd fast float undef, [[ADD_12]]
	; THRESHOLD-NEXT: [[ADD_14:%.*]] = fadd fast float undef, [[ADD_13]]
	; THRESHOLD-NEXT: [[ADD_15:%.*]] = fadd fast float undef, [[ADD_14]]
	; THRESHOLD-NEXT: [[ARRAYIDX_16:%.*]] = getelementptr inbounds float, float* [[X]], i64 16			; THRESHOLD-NEXT: [[ARRAYIDX_16:%.*]] = getelementptr inbounds float, float* [[X]], i64 16
	; THRESHOLD-NEXT: [[ARRAYIDX_17:%.*]] = getelementptr inbounds float, float* [[X]], i64 17			; THRESHOLD-NEXT: [[ARRAYIDX_17:%.*]] = getelementptr inbounds float, float* [[X]], i64 17
	; THRESHOLD-NEXT: [[ARRAYIDX_18:%.*]] = getelementptr inbounds float, float* [[X]], i64 18			; THRESHOLD-NEXT: [[ARRAYIDX_18:%.*]] = getelementptr inbounds float, float* [[X]], i64 18
	; THRESHOLD-NEXT: [[ARRAYIDX_19:%.*]] = getelementptr inbounds float, float* [[X]], i64 19			; THRESHOLD-NEXT: [[ARRAYIDX_19:%.*]] = getelementptr inbounds float, float* [[X]], i64 19
	; THRESHOLD-NEXT: [[ARRAYIDX_20:%.*]] = getelementptr inbounds float, float* [[X]], i64 20			; THRESHOLD-NEXT: [[ARRAYIDX_20:%.*]] = getelementptr inbounds float, float* [[X]], i64 20
	; THRESHOLD-NEXT: [[ARRAYIDX_21:%.*]] = getelementptr inbounds float, float* [[X]], i64 21			; THRESHOLD-NEXT: [[ARRAYIDX_21:%.*]] = getelementptr inbounds float, float* [[X]], i64 21
	; THRESHOLD-NEXT: [[ARRAYIDX_22:%.*]] = getelementptr inbounds float, float* [[X]], i64 22			; THRESHOLD-NEXT: [[ARRAYIDX_22:%.*]] = getelementptr inbounds float, float* [[X]], i64 22
	; THRESHOLD-NEXT: [[ARRAYIDX_23:%.*]] = getelementptr inbounds float, float* [[X]], i64 23			; THRESHOLD-NEXT: [[ARRAYIDX_23:%.*]] = getelementptr inbounds float, float* [[X]], i64 23
	[18 unchanged lines not shown]
	; THRESHOLD-NEXT: [[ARRAYIDX_42:%.*]] = getelementptr inbounds float, float* [[X]], i64 42			; THRESHOLD-NEXT: [[ARRAYIDX_42:%.*]] = getelementptr inbounds float, float* [[X]], i64 42
	; THRESHOLD-NEXT: [[ARRAYIDX_43:%.*]] = getelementptr inbounds float, float* [[X]], i64 43			; THRESHOLD-NEXT: [[ARRAYIDX_43:%.*]] = getelementptr inbounds float, float* [[X]], i64 43
	; THRESHOLD-NEXT: [[ARRAYIDX_44:%.*]] = getelementptr inbounds float, float* [[X]], i64 44			; THRESHOLD-NEXT: [[ARRAYIDX_44:%.*]] = getelementptr inbounds float, float* [[X]], i64 44
	; THRESHOLD-NEXT: [[ARRAYIDX_45:%.*]] = getelementptr inbounds float, float* [[X]], i64 45			; THRESHOLD-NEXT: [[ARRAYIDX_45:%.*]] = getelementptr inbounds float, float* [[X]], i64 45
	; THRESHOLD-NEXT: [[ARRAYIDX_46:%.*]] = getelementptr inbounds float, float* [[X]], i64 46			; THRESHOLD-NEXT: [[ARRAYIDX_46:%.*]] = getelementptr inbounds float, float* [[X]], i64 46
	; THRESHOLD-NEXT: [[ARRAYIDX_47:%.*]] = getelementptr inbounds float, float* [[X]], i64 47			; THRESHOLD-NEXT: [[ARRAYIDX_47:%.*]] = getelementptr inbounds float, float* [[X]], i64 47
	; THRESHOLD-NEXT: [[TMP2:%.*]] = bitcast float* [[ARRAYIDX_16]] to <32 x float>*			; THRESHOLD-NEXT: [[TMP2:%.*]] = bitcast float* [[ARRAYIDX_16]] to <32 x float>*
	; THRESHOLD-NEXT: [[TMP3:%.*]] = load <32 x float>, <32 x float>* [[TMP2]], align 4			; THRESHOLD-NEXT: [[TMP3:%.*]] = load <32 x float>, <32 x float>* [[TMP2]], align 4
	; THRESHOLD-NEXT: [[ADD_16:%.*]] = fadd fast float undef, [[ADD_15]]
	; THRESHOLD-NEXT: [[ADD_17:%.*]] = fadd fast float undef, [[ADD_16]]
	; THRESHOLD-NEXT: [[ADD_18:%.*]] = fadd fast float undef, [[ADD_17]]
	; THRESHOLD-NEXT: [[ADD_19:%.*]] = fadd fast float undef, [[ADD_18]]
	; THRESHOLD-NEXT: [[ADD_20:%.*]] = fadd fast float undef, [[ADD_19]]
	; THRESHOLD-NEXT: [[ADD_21:%.*]] = fadd fast float undef, [[ADD_20]]
	; THRESHOLD-NEXT: [[ADD_22:%.*]] = fadd fast float undef, [[ADD_21]]
	; THRESHOLD-NEXT: [[ADD_23:%.*]] = fadd fast float undef, [[ADD_22]]
	; THRESHOLD-NEXT: [[ADD_24:%.*]] = fadd fast float undef, [[ADD_23]]
	; THRESHOLD-NEXT: [[ADD_25:%.*]] = fadd fast float undef, [[ADD_24]]
	; THRESHOLD-NEXT: [[ADD_26:%.*]] = fadd fast float undef, [[ADD_25]]
	; THRESHOLD-NEXT: [[ADD_27:%.*]] = fadd fast float undef, [[ADD_26]]
	; THRESHOLD-NEXT: [[ADD_28:%.*]] = fadd fast float undef, [[ADD_27]]
	; THRESHOLD-NEXT: [[ADD_29:%.*]] = fadd fast float undef, [[ADD_28]]
	; THRESHOLD-NEXT: [[ADD_30:%.*]] = fadd fast float undef, [[ADD_29]]
	; THRESHOLD-NEXT: [[ADD_31:%.*]] = fadd fast float undef, [[ADD_30]]
	; THRESHOLD-NEXT: [[ADD_32:%.*]] = fadd fast float undef, [[ADD_31]]
	; THRESHOLD-NEXT: [[ADD_33:%.*]] = fadd fast float undef, [[ADD_32]]
	; THRESHOLD-NEXT: [[ADD_34:%.*]] = fadd fast float undef, [[ADD_33]]
	; THRESHOLD-NEXT: [[ADD_35:%.*]] = fadd fast float undef, [[ADD_34]]
	; THRESHOLD-NEXT: [[ADD_36:%.*]] = fadd fast float undef, [[ADD_35]]
	; THRESHOLD-NEXT: [[ADD_37:%.*]] = fadd fast float undef, [[ADD_36]]
	; THRESHOLD-NEXT: [[ADD_38:%.*]] = fadd fast float undef, [[ADD_37]]
	; THRESHOLD-NEXT: [[ADD_39:%.*]] = fadd fast float undef, [[ADD_38]]
	; THRESHOLD-NEXT: [[ADD_40:%.*]] = fadd fast float undef, [[ADD_39]]
	; THRESHOLD-NEXT: [[ADD_41:%.*]] = fadd fast float undef, [[ADD_40]]
	; THRESHOLD-NEXT: [[ADD_42:%.*]] = fadd fast float undef, [[ADD_41]]
	; THRESHOLD-NEXT: [[ADD_43:%.*]] = fadd fast float undef, [[ADD_42]]
	; THRESHOLD-NEXT: [[ADD_44:%.*]] = fadd fast float undef, [[ADD_43]]
	; THRESHOLD-NEXT: [[ADD_45:%.*]] = fadd fast float undef, [[ADD_44]]
	; THRESHOLD-NEXT: [[ADD_46:%.*]] = fadd fast float undef, [[ADD_45]]
	; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <32 x float> [[TMP3]], <32 x float> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <32 x float> [[TMP3]], <32 x float> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <32 x float> [[TMP3]], [[RDX_SHUF]]			; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <32 x float> [[TMP3]], [[RDX_SHUF]]
	; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x float> [[BIN_RDX]], <32 x float> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x float> [[BIN_RDX]], <32 x float> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <32 x float> [[BIN_RDX]], [[RDX_SHUF1]]			; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <32 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x float> [[BIN_RDX2]], <32 x float> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x float> [[BIN_RDX2]], <32 x float> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <32 x float> [[BIN_RDX2]], [[RDX_SHUF3]]			; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <32 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; THRESHOLD-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x float> [[BIN_RDX4]], <32 x float> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x float> [[BIN_RDX4]], <32 x float> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX6:%.*]] = fadd fast <32 x float> [[BIN_RDX4]], [[RDX_SHUF5]]			; THRESHOLD-NEXT: [[BIN_RDX6:%.*]] = fadd fast <32 x float> [[BIN_RDX4]], [[RDX_SHUF5]]
	; THRESHOLD-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX8:%.*]] = fadd fast <32 x float> [[BIN_RDX6]], [[RDX_SHUF7]]			; THRESHOLD-NEXT: [[BIN_RDX8:%.*]] = fadd fast <32 x float> [[BIN_RDX6]], [[RDX_SHUF7]]
	; THRESHOLD-NEXT: [[TMP4:%.*]] = extractelement <32 x float> [[BIN_RDX8]], i32 0			; THRESHOLD-NEXT: [[TMP4:%.*]] = extractelement <32 x float> [[BIN_RDX8]], i32 0
	; THRESHOLD-NEXT: [[RDX_SHUF9:%.*]] = shufflevector <16 x float> [[TMP1]], <16 x float> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF9:%.*]] = shufflevector <16 x float> [[TMP1]], <16 x float> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX10:%.*]] = fadd fast <16 x float> [[TMP1]], [[RDX_SHUF9]]			; THRESHOLD-NEXT: [[BIN_RDX10:%.*]] = fadd fast <16 x float> [[TMP1]], [[RDX_SHUF9]]
	; THRESHOLD-NEXT: [[RDX_SHUF11:%.*]] = shufflevector <16 x float> [[BIN_RDX10]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF11:%.*]] = shufflevector <16 x float> [[BIN_RDX10]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX12:%.*]] = fadd fast <16 x float> [[BIN_RDX10]], [[RDX_SHUF11]]			; THRESHOLD-NEXT: [[BIN_RDX12:%.*]] = fadd fast <16 x float> [[BIN_RDX10]], [[RDX_SHUF11]]
	; THRESHOLD-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <16 x float> [[BIN_RDX12]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <16 x float> [[BIN_RDX12]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX14:%.*]] = fadd fast <16 x float> [[BIN_RDX12]], [[RDX_SHUF13]]			; THRESHOLD-NEXT: [[BIN_RDX14:%.*]] = fadd fast <16 x float> [[BIN_RDX12]], [[RDX_SHUF13]]
	; THRESHOLD-NEXT: [[RDX_SHUF15:%.*]] = shufflevector <16 x float> [[BIN_RDX14]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF15:%.*]] = shufflevector <16 x float> [[BIN_RDX14]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX16:%.*]] = fadd fast <16 x float> [[BIN_RDX14]], [[RDX_SHUF15]]			; THRESHOLD-NEXT: [[BIN_RDX16:%.*]] = fadd fast <16 x float> [[BIN_RDX14]], [[RDX_SHUF15]]
	; THRESHOLD-NEXT: [[TMP5:%.*]] = extractelement <16 x float> [[BIN_RDX16]], i32 0			; THRESHOLD-NEXT: [[TMP5:%.*]] = extractelement <16 x float> [[BIN_RDX16]], i32 0
	; THRESHOLD-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP4]], [[TMP5]]			; THRESHOLD-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP4]], [[TMP5]]
	; THRESHOLD-NEXT: [[ADD_47:%.*]] = fadd fast float undef, [[ADD_46]]
	; THRESHOLD-NEXT: ret float [[OP_RDX]]			; THRESHOLD-NEXT: ret float [[OP_RDX]]
	;			;
	entry:			entry:
	%0 = load float, float* %x, align 4			%0 = load float, float* %x, align 4
	%arrayidx.1 = getelementptr inbounds float, float* %x, i64 1			%arrayidx.1 = getelementptr inbounds float, float* %x, i64 1
	%1 = load float, float* %arrayidx.1, align 4			%1 = load float, float* %arrayidx.1, align 4
	%add.1 = fadd fast float %1, %0			%add.1 = fadd fast float %1, %0
	%arrayidx.2 = getelementptr inbounds float, float* %x, i64 2			%arrayidx.2 = getelementptr inbounds float, float* %x, i64 2
	[170 lines not shown]
	; CHECK-NEXT: [[ARRAYIDX_26:%.*]] = getelementptr inbounds float, float* [[X]], i64 26			; CHECK-NEXT: [[ARRAYIDX_26:%.*]] = getelementptr inbounds float, float* [[X]], i64 26
	; CHECK-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 27			; CHECK-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 27
	; CHECK-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 28			; CHECK-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 28
	; CHECK-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 29			; CHECK-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 29
	; CHECK-NEXT: [[ARRAYIDX_30:%.*]] = getelementptr inbounds float, float* [[X]], i64 30			; CHECK-NEXT: [[ARRAYIDX_30:%.*]] = getelementptr inbounds float, float* [[X]], i64 30
	; CHECK-NEXT: [[ARRAYIDX_31:%.*]] = getelementptr inbounds float, float* [[X]], i64 31			; CHECK-NEXT: [[ARRAYIDX_31:%.*]] = getelementptr inbounds float, float* [[X]], i64 31
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <32 x float>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <32 x float>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <32 x float>, <32 x float>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <32 x float>, <32 x float>* [[TMP0]], align 4
	; CHECK-NEXT: [[ADD:%.*]] = fadd fast float undef, [[CONV]]
	; CHECK-NEXT: [[ADD_1:%.*]] = fadd fast float undef, [[ADD]]
	; CHECK-NEXT: [[ADD_2:%.*]] = fadd fast float undef, [[ADD_1]]
	; CHECK-NEXT: [[ADD_3:%.*]] = fadd fast float undef, [[ADD_2]]
	; CHECK-NEXT: [[ADD_4:%.*]] = fadd fast float undef, [[ADD_3]]
	; CHECK-NEXT: [[ADD_5:%.*]] = fadd fast float undef, [[ADD_4]]
	; CHECK-NEXT: [[ADD_6:%.*]] = fadd fast float undef, [[ADD_5]]
	; CHECK-NEXT: [[ADD_7:%.*]] = fadd fast float undef, [[ADD_6]]
	; CHECK-NEXT: [[ADD_8:%.*]] = fadd fast float undef, [[ADD_7]]
	; CHECK-NEXT: [[ADD_9:%.*]] = fadd fast float undef, [[ADD_8]]
	; CHECK-NEXT: [[ADD_10:%.*]] = fadd fast float undef, [[ADD_9]]
	; CHECK-NEXT: [[ADD_11:%.*]] = fadd fast float undef, [[ADD_10]]
	; CHECK-NEXT: [[ADD_12:%.*]] = fadd fast float undef, [[ADD_11]]
	; CHECK-NEXT: [[ADD_13:%.*]] = fadd fast float undef, [[ADD_12]]
	; CHECK-NEXT: [[ADD_14:%.*]] = fadd fast float undef, [[ADD_13]]
	; CHECK-NEXT: [[ADD_15:%.*]] = fadd fast float undef, [[ADD_14]]
	; CHECK-NEXT: [[ADD_16:%.*]] = fadd fast float undef, [[ADD_15]]
	; CHECK-NEXT: [[ADD_17:%.*]] = fadd fast float undef, [[ADD_16]]
	; CHECK-NEXT: [[ADD_18:%.*]] = fadd fast float undef, [[ADD_17]]
	; CHECK-NEXT: [[ADD_19:%.*]] = fadd fast float undef, [[ADD_18]]
	; CHECK-NEXT: [[ADD_20:%.*]] = fadd fast float undef, [[ADD_19]]
	; CHECK-NEXT: [[ADD_21:%.*]] = fadd fast float undef, [[ADD_20]]
	; CHECK-NEXT: [[ADD_22:%.*]] = fadd fast float undef, [[ADD_21]]
	; CHECK-NEXT: [[ADD_23:%.*]] = fadd fast float undef, [[ADD_22]]
	; CHECK-NEXT: [[ADD_24:%.*]] = fadd fast float undef, [[ADD_23]]
	; CHECK-NEXT: [[ADD_25:%.*]] = fadd fast float undef, [[ADD_24]]
	; CHECK-NEXT: [[ADD_26:%.*]] = fadd fast float undef, [[ADD_25]]
	; CHECK-NEXT: [[ADD_27:%.*]] = fadd fast float undef, [[ADD_26]]
	; CHECK-NEXT: [[ADD_28:%.*]] = fadd fast float undef, [[ADD_27]]
	; CHECK-NEXT: [[ADD_29:%.*]] = fadd fast float undef, [[ADD_28]]
	; CHECK-NEXT: [[ADD_30:%.*]] = fadd fast float undef, [[ADD_29]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <32 x float> [[TMP1]], <32 x float> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <32 x float> [[TMP1]], <32 x float> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <32 x float> [[TMP1]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <32 x float> [[TMP1]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x float> [[BIN_RDX]], <32 x float> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x float> [[BIN_RDX]], <32 x float> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <32 x float> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <32 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x float> [[BIN_RDX2]], <32 x float> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x float> [[BIN_RDX2]], <32 x float> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <32 x float> [[BIN_RDX2]], [[RDX_SHUF3]]			; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <32 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x float> [[BIN_RDX4]], <32 x float> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x float> [[BIN_RDX4]], <32 x float> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX6:%.*]] = fadd fast <32 x float> [[BIN_RDX4]], [[RDX_SHUF5]]			; CHECK-NEXT: [[BIN_RDX6:%.*]] = fadd fast <32 x float> [[BIN_RDX4]], [[RDX_SHUF5]]
	; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX8:%.*]] = fadd fast <32 x float> [[BIN_RDX6]], [[RDX_SHUF7]]			; CHECK-NEXT: [[BIN_RDX8:%.*]] = fadd fast <32 x float> [[BIN_RDX6]], [[RDX_SHUF7]]
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <32 x float> [[BIN_RDX8]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <32 x float> [[BIN_RDX8]], i32 0
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[CONV]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[CONV]]
	; CHECK-NEXT: [[ADD_31:%.*]] = fadd fast float undef, [[ADD_30]]
	; CHECK-NEXT: ret float [[OP_EXTRA]]			; CHECK-NEXT: ret float [[OP_EXTRA]]
	;			;
	; THRESHOLD-LABEL: @f1(			; THRESHOLD-LABEL: @f1(
	; THRESHOLD-NEXT: entry:			; THRESHOLD-NEXT: entry:
	; THRESHOLD-NEXT: [[REM:%.*]] = srem i32 [[A:%.*]], [[B:%.*]]			; THRESHOLD-NEXT: [[REM:%.*]] = srem i32 [[A:%.*]], [[B:%.*]]
	; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[REM]] to float			; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[REM]] to float
	; THRESHOLD-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1			; THRESHOLD-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
	; THRESHOLD-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 2			; THRESHOLD-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
	[23 lines not shown]
	; THRESHOLD-NEXT: [[ARRAYIDX_26:%.*]] = getelementptr inbounds float, float* [[X]], i64 26			; THRESHOLD-NEXT: [[ARRAYIDX_26:%.*]] = getelementptr inbounds float, float* [[X]], i64 26
	; THRESHOLD-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 27			; THRESHOLD-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 27
	; THRESHOLD-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 28			; THRESHOLD-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 28
	; THRESHOLD-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 29			; THRESHOLD-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 29
	; THRESHOLD-NEXT: [[ARRAYIDX_30:%.*]] = getelementptr inbounds float, float* [[X]], i64 30			; THRESHOLD-NEXT: [[ARRAYIDX_30:%.*]] = getelementptr inbounds float, float* [[X]], i64 30
	; THRESHOLD-NEXT: [[ARRAYIDX_31:%.*]] = getelementptr inbounds float, float* [[X]], i64 31			; THRESHOLD-NEXT: [[ARRAYIDX_31:%.*]] = getelementptr inbounds float, float* [[X]], i64 31
	; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <32 x float>*			; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <32 x float>*
	; THRESHOLD-NEXT: [[TMP1:%.*]] = load <32 x float>, <32 x float>* [[TMP0]], align 4			; THRESHOLD-NEXT: [[TMP1:%.*]] = load <32 x float>, <32 x float>* [[TMP0]], align 4
	; THRESHOLD-NEXT: [[ADD:%.*]] = fadd fast float undef, [[CONV]]
	; THRESHOLD-NEXT: [[ADD_1:%.*]] = fadd fast float undef, [[ADD]]
	; THRESHOLD-NEXT: [[ADD_2:%.*]] = fadd fast float undef, [[ADD_1]]
	; THRESHOLD-NEXT: [[ADD_3:%.*]] = fadd fast float undef, [[ADD_2]]
	; THRESHOLD-NEXT: [[ADD_4:%.*]] = fadd fast float undef, [[ADD_3]]
	; THRESHOLD-NEXT: [[ADD_5:%.*]] = fadd fast float undef, [[ADD_4]]
	; THRESHOLD-NEXT: [[ADD_6:%.*]] = fadd fast float undef, [[ADD_5]]
	; THRESHOLD-NEXT: [[ADD_7:%.*]] = fadd fast float undef, [[ADD_6]]
	; THRESHOLD-NEXT: [[ADD_8:%.*]] = fadd fast float undef, [[ADD_7]]
	; THRESHOLD-NEXT: [[ADD_9:%.*]] = fadd fast float undef, [[ADD_8]]
	; THRESHOLD-NEXT: [[ADD_10:%.*]] = fadd fast float undef, [[ADD_9]]
	; THRESHOLD-NEXT: [[ADD_11:%.*]] = fadd fast float undef, [[ADD_10]]
	; THRESHOLD-NEXT: [[ADD_12:%.*]] = fadd fast float undef, [[ADD_11]]
	; THRESHOLD-NEXT: [[ADD_13:%.*]] = fadd fast float undef, [[ADD_12]]
	; THRESHOLD-NEXT: [[ADD_14:%.*]] = fadd fast float undef, [[ADD_13]]
	; THRESHOLD-NEXT: [[ADD_15:%.*]] = fadd fast float undef, [[ADD_14]]
	; THRESHOLD-NEXT: [[ADD_16:%.*]] = fadd fast float undef, [[ADD_15]]
	; THRESHOLD-NEXT: [[ADD_17:%.*]] = fadd fast float undef, [[ADD_16]]
	; THRESHOLD-NEXT: [[ADD_18:%.*]] = fadd fast float undef, [[ADD_17]]
	; THRESHOLD-NEXT: [[ADD_19:%.*]] = fadd fast float undef, [[ADD_18]]
	; THRESHOLD-NEXT: [[ADD_20:%.*]] = fadd fast float undef, [[ADD_19]]
	; THRESHOLD-NEXT: [[ADD_21:%.*]] = fadd fast float undef, [[ADD_20]]
	; THRESHOLD-NEXT: [[ADD_22:%.*]] = fadd fast float undef, [[ADD_21]]
	; THRESHOLD-NEXT: [[ADD_23:%.*]] = fadd fast float undef, [[ADD_22]]
	; THRESHOLD-NEXT: [[ADD_24:%.*]] = fadd fast float undef, [[ADD_23]]
	; THRESHOLD-NEXT: [[ADD_25:%.*]] = fadd fast float undef, [[ADD_24]]
	; THRESHOLD-NEXT: [[ADD_26:%.*]] = fadd fast float undef, [[ADD_25]]
	; THRESHOLD-NEXT: [[ADD_27:%.*]] = fadd fast float undef, [[ADD_26]]
	; THRESHOLD-NEXT: [[ADD_28:%.*]] = fadd fast float undef, [[ADD_27]]
	; THRESHOLD-NEXT: [[ADD_29:%.*]] = fadd fast float undef, [[ADD_28]]
	; THRESHOLD-NEXT: [[ADD_30:%.*]] = fadd fast float undef, [[ADD_29]]
	; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <32 x float> [[TMP1]], <32 x float> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <32 x float> [[TMP1]], <32 x float> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <32 x float> [[TMP1]], [[RDX_SHUF]]			; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <32 x float> [[TMP1]], [[RDX_SHUF]]
	; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x float> [[BIN_RDX]], <32 x float> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x float> [[BIN_RDX]], <32 x float> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <32 x float> [[BIN_RDX]], [[RDX_SHUF1]]			; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <32 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x float> [[BIN_RDX2]], <32 x float> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x float> [[BIN_RDX2]], <32 x float> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <32 x float> [[BIN_RDX2]], [[RDX_SHUF3]]			; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <32 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; THRESHOLD-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x float> [[BIN_RDX4]], <32 x float> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x float> [[BIN_RDX4]], <32 x float> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX6:%.*]] = fadd fast <32 x float> [[BIN_RDX4]], [[RDX_SHUF5]]			; THRESHOLD-NEXT: [[BIN_RDX6:%.*]] = fadd fast <32 x float> [[BIN_RDX4]], [[RDX_SHUF5]]
	; THRESHOLD-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float> [[BIN_RDX6]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX8:%.*]] = fadd fast <32 x float> [[BIN_RDX6]], [[RDX_SHUF7]]			; THRESHOLD-NEXT: [[BIN_RDX8:%.*]] = fadd fast <32 x float> [[BIN_RDX6]], [[RDX_SHUF7]]
	; THRESHOLD-NEXT: [[TMP2:%.*]] = extractelement <32 x float> [[BIN_RDX8]], i32 0			; THRESHOLD-NEXT: [[TMP2:%.*]] = extractelement <32 x float> [[BIN_RDX8]], i32 0
	; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[CONV]]			; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[CONV]]
	; THRESHOLD-NEXT: [[ADD_31:%.*]] = fadd fast float undef, [[ADD_30]]
	; THRESHOLD-NEXT: ret float [[OP_EXTRA]]			; THRESHOLD-NEXT: ret float [[OP_EXTRA]]
	;			;
	entry:			entry:
	%rem = srem i32 %a, %b			%rem = srem i32 %a, %b
	%conv = sitofp i32 %rem to float			%conv = sitofp i32 %rem to float
	%0 = load float, float* %x, align 4			%0 = load float, float* %x, align 4
	%add = fadd fast float %0, %conv			%add = fadd fast float %0, %conv
	%arrayidx.1 = getelementptr inbounds float, float* %x, i64 1			%arrayidx.1 = getelementptr inbounds float, float* %x, i64 1
	[94 lines not shown]

	define float @loadadd31(float* nocapture readonly %x) {			define float @loadadd31(float* nocapture readonly %x) {
	; CHECK-LABEL: @loadadd31(			; CHECK-LABEL: @loadadd31(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
	; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[ARRAYIDX]], align 4			; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[ARRAYIDX]], align 4
	; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2			; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
	; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX_1]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX_1]], align 4
	; CHECK-NEXT: [[ADD_1:%.*]] = fadd fast float [[TMP1]], [[TMP0]]
	; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3			; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
	; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4			; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4
	; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5			; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5
	; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6			; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[ARRAYIDX_2]] to <4 x float>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[ARRAYIDX_2]] to <4 x float>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* [[TMP2]], align 4
	; CHECK-NEXT: [[ADD_2:%.*]] = fadd fast float undef, [[ADD_1]]
	; CHECK-NEXT: [[ADD_3:%.*]] = fadd fast float undef, [[ADD_2]]
	; CHECK-NEXT: [[ADD_4:%.*]] = fadd fast float undef, [[ADD_3]]
	; CHECK-NEXT: [[ADD_5:%.*]] = fadd fast float undef, [[ADD_4]]
	; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7			; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7
	; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds float, float* [[X]], i64 8			; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds float, float* [[X]], i64 8
	; CHECK-NEXT: [[ARRAYIDX_8:%.*]] = getelementptr inbounds float, float* [[X]], i64 9			; CHECK-NEXT: [[ARRAYIDX_8:%.*]] = getelementptr inbounds float, float* [[X]], i64 9
	; CHECK-NEXT: [[ARRAYIDX_9:%.*]] = getelementptr inbounds float, float* [[X]], i64 10			; CHECK-NEXT: [[ARRAYIDX_9:%.*]] = getelementptr inbounds float, float* [[X]], i64 10
	; CHECK-NEXT: [[ARRAYIDX_10:%.*]] = getelementptr inbounds float, float* [[X]], i64 11			; CHECK-NEXT: [[ARRAYIDX_10:%.*]] = getelementptr inbounds float, float* [[X]], i64 11
	; CHECK-NEXT: [[ARRAYIDX_11:%.*]] = getelementptr inbounds float, float* [[X]], i64 12			; CHECK-NEXT: [[ARRAYIDX_11:%.*]] = getelementptr inbounds float, float* [[X]], i64 12
	; CHECK-NEXT: [[ARRAYIDX_12:%.*]] = getelementptr inbounds float, float* [[X]], i64 13			; CHECK-NEXT: [[ARRAYIDX_12:%.*]] = getelementptr inbounds float, float* [[X]], i64 13
	; CHECK-NEXT: [[ARRAYIDX_13:%.*]] = getelementptr inbounds float, float* [[X]], i64 14			; CHECK-NEXT: [[ARRAYIDX_13:%.*]] = getelementptr inbounds float, float* [[X]], i64 14
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[ARRAYIDX_6]] to <8 x float>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[ARRAYIDX_6]] to <8 x float>*
	; CHECK-NEXT: [[TMP5:%.*]] = load <8 x float>, <8 x float>* [[TMP4]], align 4			; CHECK-NEXT: [[TMP5:%.*]] = load <8 x float>, <8 x float>* [[TMP4]], align 4
	; CHECK-NEXT: [[ADD_6:%.*]] = fadd fast float undef, [[ADD_5]]
	; CHECK-NEXT: [[ADD_7:%.*]] = fadd fast float undef, [[ADD_6]]
	; CHECK-NEXT: [[ADD_8:%.*]] = fadd fast float undef, [[ADD_7]]
	; CHECK-NEXT: [[ADD_9:%.*]] = fadd fast float undef, [[ADD_8]]
	; CHECK-NEXT: [[ADD_10:%.*]] = fadd fast float undef, [[ADD_9]]
	; CHECK-NEXT: [[ADD_11:%.*]] = fadd fast float undef, [[ADD_10]]
	; CHECK-NEXT: [[ADD_12:%.*]] = fadd fast float undef, [[ADD_11]]
	; CHECK-NEXT: [[ADD_13:%.*]] = fadd fast float undef, [[ADD_12]]
	; CHECK-NEXT: [[ARRAYIDX_14:%.*]] = getelementptr inbounds float, float* [[X]], i64 15			; CHECK-NEXT: [[ARRAYIDX_14:%.*]] = getelementptr inbounds float, float* [[X]], i64 15
	; CHECK-NEXT: [[ARRAYIDX_15:%.*]] = getelementptr inbounds float, float* [[X]], i64 16			; CHECK-NEXT: [[ARRAYIDX_15:%.*]] = getelementptr inbounds float, float* [[X]], i64 16
	; CHECK-NEXT: [[ARRAYIDX_16:%.*]] = getelementptr inbounds float, float* [[X]], i64 17			; CHECK-NEXT: [[ARRAYIDX_16:%.*]] = getelementptr inbounds float, float* [[X]], i64 17
	; CHECK-NEXT: [[ARRAYIDX_17:%.*]] = getelementptr inbounds float, float* [[X]], i64 18			; CHECK-NEXT: [[ARRAYIDX_17:%.*]] = getelementptr inbounds float, float* [[X]], i64 18
	; CHECK-NEXT: [[ARRAYIDX_18:%.*]] = getelementptr inbounds float, float* [[X]], i64 19			; CHECK-NEXT: [[ARRAYIDX_18:%.*]] = getelementptr inbounds float, float* [[X]], i64 19
	; CHECK-NEXT: [[ARRAYIDX_19:%.*]] = getelementptr inbounds float, float* [[X]], i64 20			; CHECK-NEXT: [[ARRAYIDX_19:%.*]] = getelementptr inbounds float, float* [[X]], i64 20
	; CHECK-NEXT: [[ARRAYIDX_20:%.*]] = getelementptr inbounds float, float* [[X]], i64 21			; CHECK-NEXT: [[ARRAYIDX_20:%.*]] = getelementptr inbounds float, float* [[X]], i64 21
	; CHECK-NEXT: [[ARRAYIDX_21:%.*]] = getelementptr inbounds float, float* [[X]], i64 22			; CHECK-NEXT: [[ARRAYIDX_21:%.*]] = getelementptr inbounds float, float* [[X]], i64 22
	; CHECK-NEXT: [[ARRAYIDX_22:%.*]] = getelementptr inbounds float, float* [[X]], i64 23			; CHECK-NEXT: [[ARRAYIDX_22:%.*]] = getelementptr inbounds float, float* [[X]], i64 23
	; CHECK-NEXT: [[ARRAYIDX_23:%.*]] = getelementptr inbounds float, float* [[X]], i64 24			; CHECK-NEXT: [[ARRAYIDX_23:%.*]] = getelementptr inbounds float, float* [[X]], i64 24
	; CHECK-NEXT: [[ARRAYIDX_24:%.*]] = getelementptr inbounds float, float* [[X]], i64 25			; CHECK-NEXT: [[ARRAYIDX_24:%.*]] = getelementptr inbounds float, float* [[X]], i64 25
	; CHECK-NEXT: [[ARRAYIDX_25:%.*]] = getelementptr inbounds float, float* [[X]], i64 26			; CHECK-NEXT: [[ARRAYIDX_25:%.*]] = getelementptr inbounds float, float* [[X]], i64 26
	; CHECK-NEXT: [[ARRAYIDX_26:%.*]] = getelementptr inbounds float, float* [[X]], i64 27			; CHECK-NEXT: [[ARRAYIDX_26:%.*]] = getelementptr inbounds float, float* [[X]], i64 27
	; CHECK-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 28			; CHECK-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 28
	; CHECK-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 29			; CHECK-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 29
	; CHECK-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 30			; CHECK-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 30
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast float* [[ARRAYIDX_14]] to <16 x float>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast float* [[ARRAYIDX_14]] to <16 x float>*
	; CHECK-NEXT: [[TMP7:%.*]] = load <16 x float>, <16 x float>* [[TMP6]], align 4			; CHECK-NEXT: [[TMP7:%.*]] = load <16 x float>, <16 x float>* [[TMP6]], align 4
	; CHECK-NEXT: [[ADD_14:%.*]] = fadd fast float undef, [[ADD_13]]
	; CHECK-NEXT: [[ADD_15:%.*]] = fadd fast float undef, [[ADD_14]]
	; CHECK-NEXT: [[ADD_16:%.*]] = fadd fast float undef, [[ADD_15]]
	; CHECK-NEXT: [[ADD_17:%.*]] = fadd fast float undef, [[ADD_16]]
	; CHECK-NEXT: [[ADD_18:%.*]] = fadd fast float undef, [[ADD_17]]
	; CHECK-NEXT: [[ADD_19:%.*]] = fadd fast float undef, [[ADD_18]]
	; CHECK-NEXT: [[ADD_20:%.*]] = fadd fast float undef, [[ADD_19]]
	; CHECK-NEXT: [[ADD_21:%.*]] = fadd fast float undef, [[ADD_20]]
	; CHECK-NEXT: [[ADD_22:%.*]] = fadd fast float undef, [[ADD_21]]
	; CHECK-NEXT: [[ADD_23:%.*]] = fadd fast float undef, [[ADD_22]]
	; CHECK-NEXT: [[ADD_24:%.*]] = fadd fast float undef, [[ADD_23]]
	; CHECK-NEXT: [[ADD_25:%.*]] = fadd fast float undef, [[ADD_24]]
	; CHECK-NEXT: [[ADD_26:%.*]] = fadd fast float undef, [[ADD_25]]
	; CHECK-NEXT: [[ADD_27:%.*]] = fadd fast float undef, [[ADD_26]]
	; CHECK-NEXT: [[ADD_28:%.*]] = fadd fast float undef, [[ADD_27]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <16 x float> [[TMP7]], <16 x float> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <16 x float> [[TMP7]], <16 x float> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <16 x float> [[TMP7]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <16 x float> [[TMP7]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x float> [[BIN_RDX]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x float> [[BIN_RDX]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <16 x float> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <16 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x float> [[BIN_RDX2]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x float> [[BIN_RDX2]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <16 x float> [[BIN_RDX2]], [[RDX_SHUF3]]			; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <16 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x float> [[BIN_RDX4]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x float> [[BIN_RDX4]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX6:%.*]] = fadd fast <16 x float> [[BIN_RDX4]], [[RDX_SHUF5]]			; CHECK-NEXT: [[BIN_RDX6:%.*]] = fadd fast <16 x float> [[BIN_RDX4]], [[RDX_SHUF5]]
	; CHECK-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX14:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF13]]			; CHECK-NEXT: [[BIN_RDX14:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF13]]
	; CHECK-NEXT: [[RDX_SHUF15:%.*]] = shufflevector <4 x float> [[BIN_RDX14]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF15:%.*]] = shufflevector <4 x float> [[BIN_RDX14]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX16:%.*]] = fadd fast <4 x float> [[BIN_RDX14]], [[RDX_SHUF15]]			; CHECK-NEXT: [[BIN_RDX16:%.*]] = fadd fast <4 x float> [[BIN_RDX14]], [[RDX_SHUF15]]
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x float> [[BIN_RDX16]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x float> [[BIN_RDX16]], i32 0
	; CHECK-NEXT: [[OP_RDX17:%.*]] = fadd fast float [[OP_RDX]], [[TMP10]]			; CHECK-NEXT: [[OP_RDX17:%.*]] = fadd fast float [[OP_RDX]], [[TMP10]]
	; CHECK-NEXT: [[TMP11:%.*]] = fadd fast float [[OP_RDX17]], [[TMP1]]			; CHECK-NEXT: [[TMP11:%.*]] = fadd fast float [[OP_RDX17]], [[TMP1]]
	; CHECK-NEXT: [[TMP12:%.*]] = fadd fast float [[TMP11]], [[TMP0]]			; CHECK-NEXT: [[TMP12:%.*]] = fadd fast float [[TMP11]], [[TMP0]]
	; CHECK-NEXT: [[ADD_29:%.*]] = fadd fast float undef, [[ADD_28]]
	; CHECK-NEXT: ret float [[TMP12]]			; CHECK-NEXT: ret float [[TMP12]]
	;			;
	; THRESHOLD-LABEL: @loadadd31(			; THRESHOLD-LABEL: @loadadd31(
	; THRESHOLD-NEXT: entry:			; THRESHOLD-NEXT: entry:
	; THRESHOLD-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1			; THRESHOLD-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
	; THRESHOLD-NEXT: [[TMP0:%.*]] = load float, float* [[ARRAYIDX]], align 4			; THRESHOLD-NEXT: [[TMP0:%.*]] = load float, float* [[ARRAYIDX]], align 4
	; THRESHOLD-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2			; THRESHOLD-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
	; THRESHOLD-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX_1]], align 4			; THRESHOLD-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX_1]], align 4
	; THRESHOLD-NEXT: [[ADD_1:%.*]] = fadd fast float [[TMP1]], [[TMP0]]
	; THRESHOLD-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3			; THRESHOLD-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
	; THRESHOLD-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4			; THRESHOLD-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4
	; THRESHOLD-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5			; THRESHOLD-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5
	; THRESHOLD-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6			; THRESHOLD-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6
	; THRESHOLD-NEXT: [[TMP2:%.*]] = bitcast float* [[ARRAYIDX_2]] to <4 x float>*			; THRESHOLD-NEXT: [[TMP2:%.*]] = bitcast float* [[ARRAYIDX_2]] to <4 x float>*
	; THRESHOLD-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* [[TMP2]], align 4			; THRESHOLD-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* [[TMP2]], align 4
	; THRESHOLD-NEXT: [[ADD_2:%.*]] = fadd fast float undef, [[ADD_1]]
	; THRESHOLD-NEXT: [[ADD_3:%.*]] = fadd fast float undef, [[ADD_2]]
	; THRESHOLD-NEXT: [[ADD_4:%.*]] = fadd fast float undef, [[ADD_3]]
	; THRESHOLD-NEXT: [[ADD_5:%.*]] = fadd fast float undef, [[ADD_4]]
	; THRESHOLD-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7			; THRESHOLD-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7
	; THRESHOLD-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds float, float* [[X]], i64 8			; THRESHOLD-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds float, float* [[X]], i64 8
	; THRESHOLD-NEXT: [[ARRAYIDX_8:%.*]] = getelementptr inbounds float, float* [[X]], i64 9			; THRESHOLD-NEXT: [[ARRAYIDX_8:%.*]] = getelementptr inbounds float, float* [[X]], i64 9
	; THRESHOLD-NEXT: [[ARRAYIDX_9:%.*]] = getelementptr inbounds float, float* [[X]], i64 10			; THRESHOLD-NEXT: [[ARRAYIDX_9:%.*]] = getelementptr inbounds float, float* [[X]], i64 10
	; THRESHOLD-NEXT: [[ARRAYIDX_10:%.*]] = getelementptr inbounds float, float* [[X]], i64 11			; THRESHOLD-NEXT: [[ARRAYIDX_10:%.*]] = getelementptr inbounds float, float* [[X]], i64 11
	; THRESHOLD-NEXT: [[ARRAYIDX_11:%.*]] = getelementptr inbounds float, float* [[X]], i64 12			; THRESHOLD-NEXT: [[ARRAYIDX_11:%.*]] = getelementptr inbounds float, float* [[X]], i64 12
	; THRESHOLD-NEXT: [[ARRAYIDX_12:%.*]] = getelementptr inbounds float, float* [[X]], i64 13			; THRESHOLD-NEXT: [[ARRAYIDX_12:%.*]] = getelementptr inbounds float, float* [[X]], i64 13
	; THRESHOLD-NEXT: [[ARRAYIDX_13:%.*]] = getelementptr inbounds float, float* [[X]], i64 14			; THRESHOLD-NEXT: [[ARRAYIDX_13:%.*]] = getelementptr inbounds float, float* [[X]], i64 14
	; THRESHOLD-NEXT: [[TMP4:%.*]] = bitcast float* [[ARRAYIDX_6]] to <8 x float>*			; THRESHOLD-NEXT: [[TMP4:%.*]] = bitcast float* [[ARRAYIDX_6]] to <8 x float>*
	; THRESHOLD-NEXT: [[TMP5:%.*]] = load <8 x float>, <8 x float>* [[TMP4]], align 4			; THRESHOLD-NEXT: [[TMP5:%.*]] = load <8 x float>, <8 x float>* [[TMP4]], align 4
	; THRESHOLD-NEXT: [[ADD_6:%.*]] = fadd fast float undef, [[ADD_5]]
	; THRESHOLD-NEXT: [[ADD_7:%.*]] = fadd fast float undef, [[ADD_6]]
	; THRESHOLD-NEXT: [[ADD_8:%.*]] = fadd fast float undef, [[ADD_7]]
	; THRESHOLD-NEXT: [[ADD_9:%.*]] = fadd fast float undef, [[ADD_8]]
	; THRESHOLD-NEXT: [[ADD_10:%.*]] = fadd fast float undef, [[ADD_9]]
	; THRESHOLD-NEXT: [[ADD_11:%.*]] = fadd fast float undef, [[ADD_10]]
	; THRESHOLD-NEXT: [[ADD_12:%.*]] = fadd fast float undef, [[ADD_11]]
	; THRESHOLD-NEXT: [[ADD_13:%.*]] = fadd fast float undef, [[ADD_12]]
	; THRESHOLD-NEXT: [[ARRAYIDX_14:%.*]] = getelementptr inbounds float, float* [[X]], i64 15			; THRESHOLD-NEXT: [[ARRAYIDX_14:%.*]] = getelementptr inbounds float, float* [[X]], i64 15
	; THRESHOLD-NEXT: [[ARRAYIDX_15:%.*]] = getelementptr inbounds float, float* [[X]], i64 16			; THRESHOLD-NEXT: [[ARRAYIDX_15:%.*]] = getelementptr inbounds float, float* [[X]], i64 16
	; THRESHOLD-NEXT: [[ARRAYIDX_16:%.*]] = getelementptr inbounds float, float* [[X]], i64 17			; THRESHOLD-NEXT: [[ARRAYIDX_16:%.*]] = getelementptr inbounds float, float* [[X]], i64 17
	; THRESHOLD-NEXT: [[ARRAYIDX_17:%.*]] = getelementptr inbounds float, float* [[X]], i64 18			; THRESHOLD-NEXT: [[ARRAYIDX_17:%.*]] = getelementptr inbounds float, float* [[X]], i64 18
	; THRESHOLD-NEXT: [[ARRAYIDX_18:%.*]] = getelementptr inbounds float, float* [[X]], i64 19			; THRESHOLD-NEXT: [[ARRAYIDX_18:%.*]] = getelementptr inbounds float, float* [[X]], i64 19
	; THRESHOLD-NEXT: [[ARRAYIDX_19:%.*]] = getelementptr inbounds float, float* [[X]], i64 20			; THRESHOLD-NEXT: [[ARRAYIDX_19:%.*]] = getelementptr inbounds float, float* [[X]], i64 20
	; THRESHOLD-NEXT: [[ARRAYIDX_20:%.*]] = getelementptr inbounds float, float* [[X]], i64 21			; THRESHOLD-NEXT: [[ARRAYIDX_20:%.*]] = getelementptr inbounds float, float* [[X]], i64 21
	; THRESHOLD-NEXT: [[ARRAYIDX_21:%.*]] = getelementptr inbounds float, float* [[X]], i64 22			; THRESHOLD-NEXT: [[ARRAYIDX_21:%.*]] = getelementptr inbounds float, float* [[X]], i64 22
	; THRESHOLD-NEXT: [[ARRAYIDX_22:%.*]] = getelementptr inbounds float, float* [[X]], i64 23			; THRESHOLD-NEXT: [[ARRAYIDX_22:%.*]] = getelementptr inbounds float, float* [[X]], i64 23
	; THRESHOLD-NEXT: [[ARRAYIDX_23:%.*]] = getelementptr inbounds float, float* [[X]], i64 24			; THRESHOLD-NEXT: [[ARRAYIDX_23:%.*]] = getelementptr inbounds float, float* [[X]], i64 24
	; THRESHOLD-NEXT: [[ARRAYIDX_24:%.*]] = getelementptr inbounds float, float* [[X]], i64 25			; THRESHOLD-NEXT: [[ARRAYIDX_24:%.*]] = getelementptr inbounds float, float* [[X]], i64 25
	; THRESHOLD-NEXT: [[ARRAYIDX_25:%.*]] = getelementptr inbounds float, float* [[X]], i64 26			; THRESHOLD-NEXT: [[ARRAYIDX_25:%.*]] = getelementptr inbounds float, float* [[X]], i64 26
	; THRESHOLD-NEXT: [[ARRAYIDX_26:%.*]] = getelementptr inbounds float, float* [[X]], i64 27			; THRESHOLD-NEXT: [[ARRAYIDX_26:%.*]] = getelementptr inbounds float, float* [[X]], i64 27
	; THRESHOLD-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 28			; THRESHOLD-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 28
	; THRESHOLD-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 29			; THRESHOLD-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 29
	; THRESHOLD-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 30			; THRESHOLD-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 30
	; THRESHOLD-NEXT: [[TMP6:%.*]] = bitcast float* [[ARRAYIDX_14]] to <16 x float>*			; THRESHOLD-NEXT: [[TMP6:%.*]] = bitcast float* [[ARRAYIDX_14]] to <16 x float>*
	; THRESHOLD-NEXT: [[TMP7:%.*]] = load <16 x float>, <16 x float>* [[TMP6]], align 4			; THRESHOLD-NEXT: [[TMP7:%.*]] = load <16 x float>, <16 x float>* [[TMP6]], align 4
	; THRESHOLD-NEXT: [[ADD_14:%.*]] = fadd fast float undef, [[ADD_13]]
	; THRESHOLD-NEXT: [[ADD_15:%.*]] = fadd fast float undef, [[ADD_14]]
	; THRESHOLD-NEXT: [[ADD_16:%.*]] = fadd fast float undef, [[ADD_15]]
	; THRESHOLD-NEXT: [[ADD_17:%.*]] = fadd fast float undef, [[ADD_16]]
	; THRESHOLD-NEXT: [[ADD_18:%.*]] = fadd fast float undef, [[ADD_17]]
	; THRESHOLD-NEXT: [[ADD_19:%.*]] = fadd fast float undef, [[ADD_18]]
	; THRESHOLD-NEXT: [[ADD_20:%.*]] = fadd fast float undef, [[ADD_19]]
	; THRESHOLD-NEXT: [[ADD_21:%.*]] = fadd fast float undef, [[ADD_20]]
	; THRESHOLD-NEXT: [[ADD_22:%.*]] = fadd fast float undef, [[ADD_21]]
	; THRESHOLD-NEXT: [[ADD_23:%.*]] = fadd fast float undef, [[ADD_22]]
	; THRESHOLD-NEXT: [[ADD_24:%.*]] = fadd fast float undef, [[ADD_23]]
	; THRESHOLD-NEXT: [[ADD_25:%.*]] = fadd fast float undef, [[ADD_24]]
	; THRESHOLD-NEXT: [[ADD_26:%.*]] = fadd fast float undef, [[ADD_25]]
	; THRESHOLD-NEXT: [[ADD_27:%.*]] = fadd fast float undef, [[ADD_26]]
	; THRESHOLD-NEXT: [[ADD_28:%.*]] = fadd fast float undef, [[ADD_27]]
	; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <16 x float> [[TMP7]], <16 x float> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <16 x float> [[TMP7]], <16 x float> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <16 x float> [[TMP7]], [[RDX_SHUF]]			; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <16 x float> [[TMP7]], [[RDX_SHUF]]
	; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x float> [[BIN_RDX]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x float> [[BIN_RDX]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <16 x float> [[BIN_RDX]], [[RDX_SHUF1]]			; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <16 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x float> [[BIN_RDX2]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x float> [[BIN_RDX2]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <16 x float> [[BIN_RDX2]], [[RDX_SHUF3]]			; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <16 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; THRESHOLD-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x float> [[BIN_RDX4]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x float> [[BIN_RDX4]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX6:%.*]] = fadd fast <16 x float> [[BIN_RDX4]], [[RDX_SHUF5]]			; THRESHOLD-NEXT: [[BIN_RDX6:%.*]] = fadd fast <16 x float> [[BIN_RDX4]], [[RDX_SHUF5]]
	; THRESHOLD-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF13:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX14:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF13]]			; THRESHOLD-NEXT: [[BIN_RDX14:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF13]]
	; THRESHOLD-NEXT: [[RDX_SHUF15:%.*]] = shufflevector <4 x float> [[BIN_RDX14]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF15:%.*]] = shufflevector <4 x float> [[BIN_RDX14]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX16:%.*]] = fadd fast <4 x float> [[BIN_RDX14]], [[RDX_SHUF15]]			; THRESHOLD-NEXT: [[BIN_RDX16:%.*]] = fadd fast <4 x float> [[BIN_RDX14]], [[RDX_SHUF15]]
	; THRESHOLD-NEXT: [[TMP10:%.*]] = extractelement <4 x float> [[BIN_RDX16]], i32 0			; THRESHOLD-NEXT: [[TMP10:%.*]] = extractelement <4 x float> [[BIN_RDX16]], i32 0
	; THRESHOLD-NEXT: [[OP_RDX17:%.*]] = fadd fast float [[OP_RDX]], [[TMP10]]			; THRESHOLD-NEXT: [[OP_RDX17:%.*]] = fadd fast float [[OP_RDX]], [[TMP10]]
	; THRESHOLD-NEXT: [[TMP11:%.*]] = fadd fast float [[OP_RDX17]], [[TMP1]]			; THRESHOLD-NEXT: [[TMP11:%.*]] = fadd fast float [[OP_RDX17]], [[TMP1]]
	; THRESHOLD-NEXT: [[TMP12:%.*]] = fadd fast float [[TMP11]], [[TMP0]]			; THRESHOLD-NEXT: [[TMP12:%.*]] = fadd fast float [[TMP11]], [[TMP0]]
	; THRESHOLD-NEXT: [[ADD_29:%.*]] = fadd fast float undef, [[ADD_28]]
	; THRESHOLD-NEXT: ret float [[TMP12]]			; THRESHOLD-NEXT: ret float [[TMP12]]
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds float, float* %x, i64 1			%arrayidx = getelementptr inbounds float, float* %x, i64 1
	%0 = load float, float* %arrayidx, align 4			%0 = load float, float* %arrayidx, align 4
	%arrayidx.1 = getelementptr inbounds float, float* %x, i64 2			%arrayidx.1 = getelementptr inbounds float, float* %x, i64 2
	%1 = load float, float* %arrayidx.1, align 4			%1 = load float, float* %arrayidx.1, align 4
	%add.1 = fadd fast float %1, %0			%add.1 = fadd fast float %1, %0
	; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2			; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
	; CHECK-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3			; CHECK-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
	; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4			; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4
	; CHECK-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5			; CHECK-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5
	; CHECK-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6			; CHECK-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6
	; CHECK-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7			; CHECK-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4
	; CHECK-NEXT: [[ADD1:%.*]] = fadd fast float undef, [[ADD]]
	; CHECK-NEXT: [[ADD4:%.*]] = fadd fast float undef, [[ADD1]]
	; CHECK-NEXT: [[ADD5:%.*]] = fadd fast float [[ADD4]], [[CONV]]
	; CHECK-NEXT: [[ADD4_1:%.*]] = fadd fast float undef, [[ADD5]]
	; CHECK-NEXT: [[ADD4_2:%.*]] = fadd fast float undef, [[ADD4_1]]
	; CHECK-NEXT: [[ADD4_3:%.*]] = fadd fast float undef, [[ADD4_2]]
	; CHECK-NEXT: [[ADD4_4:%.*]] = fadd fast float undef, [[ADD4_3]]
	; CHECK-NEXT: [[ADD4_5:%.*]] = fadd fast float undef, [[ADD4_4]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]			; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
	; CHECK-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]			; CHECK-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]
	; CHECK-NEXT: [[ADD4_6:%.*]] = fadd fast float undef, [[ADD4_5]]
	; CHECK-NEXT: ret float [[OP_EXTRA5]]			; CHECK-NEXT: ret float [[OP_EXTRA5]]
	;			;
	; THRESHOLD-LABEL: @extra_args(			; THRESHOLD-LABEL: @extra_args(
	; THRESHOLD-NEXT: entry:			; THRESHOLD-NEXT: entry:
	; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]			; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]
	; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float			; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
	; THRESHOLD-NEXT: [[ADD:%.*]] = fadd fast float [[CONV]], 3.000000e+00			; THRESHOLD-NEXT: [[ADD:%.*]] = fadd fast float [[CONV]], 3.000000e+00
	; THRESHOLD-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1			; THRESHOLD-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
	; THRESHOLD-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2			; THRESHOLD-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
	; THRESHOLD-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3			; THRESHOLD-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
	; THRESHOLD-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4			; THRESHOLD-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4
	; THRESHOLD-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5			; THRESHOLD-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5
	; THRESHOLD-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6			; THRESHOLD-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6
	; THRESHOLD-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7			; THRESHOLD-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7
	; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*			; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*
	; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4			; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4
	; THRESHOLD-NEXT: [[ADD1:%.*]] = fadd fast float undef, [[ADD]]
	; THRESHOLD-NEXT: [[ADD4:%.*]] = fadd fast float undef, [[ADD1]]
	; THRESHOLD-NEXT: [[ADD5:%.*]] = fadd fast float [[ADD4]], [[CONV]]
	; THRESHOLD-NEXT: [[ADD4_1:%.*]] = fadd fast float undef, [[ADD5]]
	; THRESHOLD-NEXT: [[ADD4_2:%.*]] = fadd fast float undef, [[ADD4_1]]
	; THRESHOLD-NEXT: [[ADD4_3:%.*]] = fadd fast float undef, [[ADD4_2]]
	; THRESHOLD-NEXT: [[ADD4_4:%.*]] = fadd fast float undef, [[ADD4_3]]
	; THRESHOLD-NEXT: [[ADD4_5:%.*]] = fadd fast float undef, [[ADD4_4]]
	; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]			; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]
	; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]			; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]			; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; THRESHOLD-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0			; THRESHOLD-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
	; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]			; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
	; THRESHOLD-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]			; THRESHOLD-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]
	; THRESHOLD-NEXT: [[ADD4_6:%.*]] = fadd fast float undef, [[ADD4_5]]
	; THRESHOLD-NEXT: ret float [[OP_EXTRA5]]			; THRESHOLD-NEXT: ret float [[OP_EXTRA5]]
	;			;
	entry:			entry:
	%mul = mul nsw i32 %b, %a			%mul = mul nsw i32 %b, %a
	%conv = sitofp i32 %mul to float			%conv = sitofp i32 %mul to float
	%0 = load float, float* %x, align 4			%0 = load float, float* %x, align 4
	%add = fadd fast float %conv, 3.000000e+00			%add = fadd fast float %conv, 3.000000e+00
	%add1 = fadd fast float %0, %add			%add1 = fadd fast float %0, %add
	Show All 32 Lines
	; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2			; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
	; CHECK-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3			; CHECK-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
	; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4			; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4
	; CHECK-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5			; CHECK-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5
	; CHECK-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6			; CHECK-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6
	; CHECK-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7			; CHECK-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4
	; CHECK-NEXT: [[ADD1:%.*]] = fadd fast float undef, [[ADD]]
	; CHECK-NEXT: [[ADD4:%.*]] = fadd fast float undef, [[ADD1]]
	; CHECK-NEXT: [[ADD41:%.*]] = fadd fast float [[ADD4]], 5.000000e+00
	; CHECK-NEXT: [[ADD5:%.*]] = fadd fast float [[ADD41]], [[CONV]]
	; CHECK-NEXT: [[ADD4_1:%.*]] = fadd fast float undef, [[ADD5]]
	; CHECK-NEXT: [[ADD4_11:%.*]] = fadd fast float [[ADD4_1]], 5.000000e+00
	; CHECK-NEXT: [[ADD4_2:%.*]] = fadd fast float undef, [[ADD4_11]]
	; CHECK-NEXT: [[ADD4_3:%.*]] = fadd fast float undef, [[ADD4_2]]
	; CHECK-NEXT: [[ADD4_4:%.*]] = fadd fast float undef, [[ADD4_3]]
	; CHECK-NEXT: [[ADD4_5:%.*]] = fadd fast float undef, [[ADD4_4]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]			; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
	; CHECK-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], 5.000000e+00			; CHECK-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], 5.000000e+00
	; CHECK-NEXT: [[OP_EXTRA6:%.*]] = fadd fast float [[OP_EXTRA5]], 5.000000e+00			; CHECK-NEXT: [[OP_EXTRA6:%.*]] = fadd fast float [[OP_EXTRA5]], 5.000000e+00
	; CHECK-NEXT: [[OP_EXTRA7:%.*]] = fadd fast float [[OP_EXTRA6]], [[CONV]]			; CHECK-NEXT: [[OP_EXTRA7:%.*]] = fadd fast float [[OP_EXTRA6]], [[CONV]]
	; CHECK-NEXT: [[ADD4_6:%.*]] = fadd fast float undef, [[ADD4_5]]
	; CHECK-NEXT: ret float [[OP_EXTRA7]]			; CHECK-NEXT: ret float [[OP_EXTRA7]]
	;			;
	; THRESHOLD-LABEL: @extra_args_same_several_times(			; THRESHOLD-LABEL: @extra_args_same_several_times(
	; THRESHOLD-NEXT: entry:			; THRESHOLD-NEXT: entry:
	; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]			; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]
	; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float			; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
	; THRESHOLD-NEXT: [[ADD:%.*]] = fadd fast float [[CONV]], 3.000000e+00			; THRESHOLD-NEXT: [[ADD:%.*]] = fadd fast float [[CONV]], 3.000000e+00
	; THRESHOLD-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1			; THRESHOLD-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
	; THRESHOLD-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2			; THRESHOLD-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
	; THRESHOLD-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3			; THRESHOLD-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
	; THRESHOLD-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4			; THRESHOLD-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4
	; THRESHOLD-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5			; THRESHOLD-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5
	; THRESHOLD-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6			; THRESHOLD-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6
	; THRESHOLD-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7			; THRESHOLD-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7
	; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*			; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*
	; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4			; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4
	; THRESHOLD-NEXT: [[ADD1:%.*]] = fadd fast float undef, [[ADD]]
	; THRESHOLD-NEXT: [[ADD4:%.*]] = fadd fast float undef, [[ADD1]]
	; THRESHOLD-NEXT: [[ADD41:%.*]] = fadd fast float [[ADD4]], 5.000000e+00
	; THRESHOLD-NEXT: [[ADD5:%.*]] = fadd fast float [[ADD41]], [[CONV]]
	; THRESHOLD-NEXT: [[ADD4_1:%.*]] = fadd fast float undef, [[ADD5]]
	; THRESHOLD-NEXT: [[ADD4_11:%.*]] = fadd fast float [[ADD4_1]], 5.000000e+00
	; THRESHOLD-NEXT: [[ADD4_2:%.*]] = fadd fast float undef, [[ADD4_11]]
	; THRESHOLD-NEXT: [[ADD4_3:%.*]] = fadd fast float undef, [[ADD4_2]]
	; THRESHOLD-NEXT: [[ADD4_4:%.*]] = fadd fast float undef, [[ADD4_3]]
	; THRESHOLD-NEXT: [[ADD4_5:%.*]] = fadd fast float undef, [[ADD4_4]]
	; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]			; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]
	; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]			; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]			; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; THRESHOLD-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0			; THRESHOLD-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
	; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]			; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
	; THRESHOLD-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], 5.000000e+00			; THRESHOLD-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], 5.000000e+00
	; THRESHOLD-NEXT: [[OP_EXTRA6:%.*]] = fadd fast float [[OP_EXTRA5]], 5.000000e+00			; THRESHOLD-NEXT: [[OP_EXTRA6:%.*]] = fadd fast float [[OP_EXTRA5]], 5.000000e+00
	; THRESHOLD-NEXT: [[OP_EXTRA7:%.*]] = fadd fast float [[OP_EXTRA6]], [[CONV]]			; THRESHOLD-NEXT: [[OP_EXTRA7:%.*]] = fadd fast float [[OP_EXTRA6]], [[CONV]]
	; THRESHOLD-NEXT: [[ADD4_6:%.*]] = fadd fast float undef, [[ADD4_5]]
	; THRESHOLD-NEXT: ret float [[OP_EXTRA7]]			; THRESHOLD-NEXT: ret float [[OP_EXTRA7]]
	;			;
	entry:			entry:
	%mul = mul nsw i32 %b, %a			%mul = mul nsw i32 %b, %a
	%conv = sitofp i32 %mul to float			%conv = sitofp i32 %mul to float
	%0 = load float, float* %x, align 4			%0 = load float, float* %x, align 4
	%add = fadd fast float %conv, 3.000000e+00			%add = fadd fast float %conv, 3.000000e+00
	%add1 = fadd fast float %0, %add			%add1 = fadd fast float %0, %add
	Show All 36 Lines
	; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2			; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
	; CHECK-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3			; CHECK-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
	; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4			; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4
	; CHECK-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5			; CHECK-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5
	; CHECK-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6			; CHECK-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6
	; CHECK-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7			; CHECK-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4
	; CHECK-NEXT: [[ADD1:%.*]] = fadd fast float undef, [[ADD]]
	; CHECK-NEXT: [[ADD4:%.*]] = fadd fast float undef, [[ADD1]]
	; CHECK-NEXT: [[ADD4_1:%.*]] = fadd fast float undef, [[ADD4]]
	; CHECK-NEXT: [[ADD4_2:%.*]] = fadd fast float undef, [[ADD4_1]]
	; CHECK-NEXT: [[ADD4_3:%.*]] = fadd fast float undef, [[ADD4_2]]
	; CHECK-NEXT: [[ADD5:%.*]] = fadd fast float [[ADD4_3]], [[CONV]]
	; CHECK-NEXT: [[ADD4_4:%.*]] = fadd fast float undef, [[ADD5]]
	; CHECK-NEXT: [[ADD4_5:%.*]] = fadd fast float undef, [[ADD4_4]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]			; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
	; CHECK-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]			; CHECK-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]
	; CHECK-NEXT: [[ADD4_6:%.*]] = fadd fast float undef, [[ADD4_5]]
	; CHECK-NEXT: ret float [[OP_EXTRA5]]			; CHECK-NEXT: ret float [[OP_EXTRA5]]
	;			;
	; THRESHOLD-LABEL: @extra_args_no_replace(			; THRESHOLD-LABEL: @extra_args_no_replace(
	; THRESHOLD-NEXT: entry:			; THRESHOLD-NEXT: entry:
	; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]			; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]
	; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float			; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
	; THRESHOLD-NEXT: [[CONVC:%.*]] = sitofp i32 [[C:%.*]] to float			; THRESHOLD-NEXT: [[CONVC:%.*]] = sitofp i32 [[C:%.*]] to float
	; THRESHOLD-NEXT: [[ADDC:%.*]] = fadd fast float [[CONVC]], 3.000000e+00			; THRESHOLD-NEXT: [[ADDC:%.*]] = fadd fast float [[CONVC]], 3.000000e+00
	; THRESHOLD-NEXT: [[ADD:%.*]] = fadd fast float [[CONV]], [[ADDC]]			; THRESHOLD-NEXT: [[ADD:%.*]] = fadd fast float [[CONV]], [[ADDC]]
	; THRESHOLD-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1			; THRESHOLD-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
	; THRESHOLD-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2			; THRESHOLD-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
	; THRESHOLD-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3			; THRESHOLD-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
	; THRESHOLD-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4			; THRESHOLD-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4
	; THRESHOLD-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5			; THRESHOLD-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5
	; THRESHOLD-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6			; THRESHOLD-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6
	; THRESHOLD-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7			; THRESHOLD-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7
	; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*			; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*
	; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4			; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4
	; THRESHOLD-NEXT: [[ADD1:%.*]] = fadd fast float undef, [[ADD]]
	; THRESHOLD-NEXT: [[ADD4:%.*]] = fadd fast float undef, [[ADD1]]
	; THRESHOLD-NEXT: [[ADD4_1:%.*]] = fadd fast float undef, [[ADD4]]
	; THRESHOLD-NEXT: [[ADD4_2:%.*]] = fadd fast float undef, [[ADD4_1]]
	; THRESHOLD-NEXT: [[ADD4_3:%.*]] = fadd fast float undef, [[ADD4_2]]
	; THRESHOLD-NEXT: [[ADD5:%.*]] = fadd fast float [[ADD4_3]], [[CONV]]
	; THRESHOLD-NEXT: [[ADD4_4:%.*]] = fadd fast float undef, [[ADD5]]
	; THRESHOLD-NEXT: [[ADD4_5:%.*]] = fadd fast float undef, [[ADD4_4]]
	; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]			; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP1]], [[RDX_SHUF]]
	; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]			; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
	; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]			; THRESHOLD-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
	; THRESHOLD-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0			; THRESHOLD-NEXT: [[TMP2:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
	; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]			; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]
	; THRESHOLD-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]			; THRESHOLD-NEXT: [[OP_EXTRA5:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]
	; THRESHOLD-NEXT: [[ADD4_6:%.*]] = fadd fast float undef, [[ADD4_5]]
	; THRESHOLD-NEXT: ret float [[OP_EXTRA5]]			; THRESHOLD-NEXT: ret float [[OP_EXTRA5]]
	;			;
	entry:			entry:
	%mul = mul nsw i32 %b, %a			%mul = mul nsw i32 %b, %a
	%conv = sitofp i32 %mul to float			%conv = sitofp i32 %mul to float
	%0 = load float, float* %x, align 4			%0 = load float, float* %x, align 4
	%convc = sitofp i32 %c to float			%convc = sitofp i32 %c to float
	%addc = fadd fast float %convc, 3.000000e+00			%addc = fadd fast float %convc, 3.000000e+00
	Show All 34 Lines
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> undef, i32 [[BAR:%.*]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> undef, i32 [[BAR:%.*]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[BAR]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[BAR]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[BAR]], i32 2			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[BAR]], i32 2
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[BAR]], i32 3			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[BAR]], i32 3
	; CHECK-NEXT: [[TMP8:%.*]] = xor <4 x i32> [[TMP3]], [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = xor <4 x i32> [[TMP3]], [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[TMP8]], i32 3			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[TMP8]], i32 3
	; CHECK-NEXT: [[TMP10:%.*]] = icmp eq <4 x i32> [[TMP8]], zeroinitializer			; CHECK-NEXT: [[TMP10:%.*]] = icmp eq <4 x i32> [[TMP8]], zeroinitializer
	; CHECK-NEXT: [[TMP11:%.*]] = sext <4 x i1> [[TMP10]] to <4 x i32>			; CHECK-NEXT: [[TMP11:%.*]] = sext <4 x i1> [[TMP10]] to <4 x i32>
	; CHECK-NEXT: [[R1:%.*]] = add nuw i32 [[ARG]], undef
	; CHECK-NEXT: [[R2:%.*]] = add nsw i32 [[R1]], undef
	; CHECK-NEXT: [[R3:%.*]] = add nsw i32 [[R2]], undef
	; CHECK-NEXT: [[R4:%.*]] = add nsw i32 [[R3]], undef
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP11]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP11]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = add nuw i32 [[TMP12]], [[ARG]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = add nuw i32 [[TMP12]], [[ARG]]
	; CHECK-NEXT: [[OP_EXTRA3:%.*]] = add nsw i32 [[OP_EXTRA]], [[TMP9]]			; CHECK-NEXT: [[OP_EXTRA3:%.*]] = add nsw i32 [[OP_EXTRA]], [[TMP9]]
	; CHECK-NEXT: [[R5:%.*]] = add nsw i32 [[R4]], [[TMP9]]
	; CHECK-NEXT: ret i32 [[OP_EXTRA3]]			; CHECK-NEXT: ret i32 [[OP_EXTRA3]]
	;			;
	; THRESHOLD-LABEL: @wobble(			; THRESHOLD-LABEL: @wobble(
	; THRESHOLD-NEXT: bb:			; THRESHOLD-NEXT: bb:
	; THRESHOLD-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> undef, i32 [[ARG:%.*]], i32 0			; THRESHOLD-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> undef, i32 [[ARG:%.*]], i32 0
	; THRESHOLD-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> [[TMP0]], i32 [[ARG]], i32 1			; THRESHOLD-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> [[TMP0]], i32 [[ARG]], i32 1
	; THRESHOLD-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[ARG]], i32 2			; THRESHOLD-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[ARG]], i32 2
	; THRESHOLD-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[ARG]], i32 3			; THRESHOLD-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[ARG]], i32 3
	; THRESHOLD-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> undef, i32 [[BAR:%.*]], i32 0			; THRESHOLD-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> undef, i32 [[BAR:%.*]], i32 0
	; THRESHOLD-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[BAR]], i32 1			; THRESHOLD-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[BAR]], i32 1
	; THRESHOLD-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[BAR]], i32 2			; THRESHOLD-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[BAR]], i32 2
	; THRESHOLD-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[BAR]], i32 3			; THRESHOLD-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[BAR]], i32 3
	; THRESHOLD-NEXT: [[TMP8:%.*]] = xor <4 x i32> [[TMP3]], [[TMP7]]			; THRESHOLD-NEXT: [[TMP8:%.*]] = xor <4 x i32> [[TMP3]], [[TMP7]]
	; THRESHOLD-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[TMP8]], i32 3			; THRESHOLD-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[TMP8]], i32 3
	; THRESHOLD-NEXT: [[TMP10:%.*]] = icmp eq <4 x i32> [[TMP8]], zeroinitializer			; THRESHOLD-NEXT: [[TMP10:%.*]] = icmp eq <4 x i32> [[TMP8]], zeroinitializer
	; THRESHOLD-NEXT: [[TMP11:%.*]] = sext <4 x i1> [[TMP10]] to <4 x i32>			; THRESHOLD-NEXT: [[TMP11:%.*]] = sext <4 x i1> [[TMP10]] to <4 x i32>
	; THRESHOLD-NEXT: [[R1:%.*]] = add nuw i32 [[ARG]], undef
	; THRESHOLD-NEXT: [[R2:%.*]] = add nsw i32 [[R1]], undef
	; THRESHOLD-NEXT: [[R3:%.*]] = add nsw i32 [[R2]], undef
	; THRESHOLD-NEXT: [[R4:%.*]] = add nsw i32 [[R3]], undef
	; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP11]], [[RDX_SHUF]]			; THRESHOLD-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP11]], [[RDX_SHUF]]
	; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; THRESHOLD-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = add <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]			; THRESHOLD-NEXT: [[BIN_RDX2:%.*]] = add <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; THRESHOLD-NEXT: [[TMP12:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0			; THRESHOLD-NEXT: [[TMP12:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0
	; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = add nuw i32 [[TMP12]], [[ARG]]			; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = add nuw i32 [[TMP12]], [[ARG]]
	; THRESHOLD-NEXT: [[OP_EXTRA3:%.*]] = add nsw i32 [[OP_EXTRA]], [[TMP9]]			; THRESHOLD-NEXT: [[OP_EXTRA3:%.*]] = add nsw i32 [[OP_EXTRA]], [[TMP9]]
	; THRESHOLD-NEXT: [[R5:%.*]] = add nsw i32 [[R4]], [[TMP9]]
	; THRESHOLD-NEXT: ret i32 [[OP_EXTRA3]]			; THRESHOLD-NEXT: ret i32 [[OP_EXTRA3]]
	;			;
	bb:			bb:
	%x1 = xor i32 %arg, %bar			%x1 = xor i32 %arg, %bar
	%i1 = icmp eq i32 %x1, 0			%i1 = icmp eq i32 %x1, 0
	%s1 = sext i1 %i1 to i32			%s1 = sext i1 %i1 to i32
	%x2 = xor i32 %arg, %bar			%x2 = xor i32 %arg, %bar
	%i2 = icmp eq i32 %x2, 0			%i2 = icmp eq i32 %x2, 0
	Show All 15 Lines

llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -mtriple=x86_64-unknown-linux -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,SSE		; RUN: opt < %s -mtriple=x86_64-unknown-linux -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,SSE
; RUN: opt < %s -mtriple=x86_64-unknown-linux -mcpu=corei7-avx -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,AVX		; RUN: opt < %s -mtriple=x86_64-unknown-linux -mcpu=corei7-avx -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,AVX
; RUN: opt < %s -mtriple=x86_64-unknown-linux -mcpu=core-avx2 -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,AVX2		; RUN: opt < %s -mtriple=x86_64-unknown-linux -mcpu=core-avx2 -slp-vectorizer -S | FileCheck %s --check-prefixes=CHECK,AVX2
; RUN: opt < %s -mtriple=x86_64-unknown-linux -mcpu=skx -slp-vectorizer -S -slp-threshold=-100 | FileCheck %s --check-prefixes=CHECK,SKX		; RUN: opt < %s -mtriple=x86_64-unknown-linux -mcpu=skx -slp-vectorizer -S -slp-threshold=-100 | FileCheck %s --check-prefixes=CHECK,SKX

@arr = local_unnamed_addr global [32 x i32] zeroinitializer, align 16		@arr = local_unnamed_addr global [32 x i32] zeroinitializer, align 16
@arr1 = local_unnamed_addr global [32 x float] zeroinitializer, align 16		@arr1 = local_unnamed_addr global [32 x float] zeroinitializer, align 16
@arrp = local_unnamed_addr global [32 x i32*] zeroinitializer, align 16		@arrp = local_unnamed_addr global [32 x i32*] zeroinitializer, align 16
@var = global i32 zeroinitializer, align 8		@var = global i32 zeroinitializer, align 8

define i32 @maxi8(i32) {		define i32 @maxi8(i32) {
; CHECK-LABEL: @maxi8(		; CHECK-LABEL: @maxi8(
; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr to <8 x i32>*), align 16		; CHECK-NEXT: [[TMP2:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr to <8 x i32>*), align 16
; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt i32 undef, undef
; CHECK-NEXT: [[TMP4:%.*]] = select i1 [[TMP3]], i32 undef, i32 undef
; CHECK-NEXT: [[TMP5:%.*]] = icmp sgt i32 [[TMP4]], undef
; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP5]], i32 [[TMP4]], i32 undef
; CHECK-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP6]], undef
; CHECK-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP6]], i32 undef
; CHECK-NEXT: [[TMP9:%.*]] = icmp sgt i32 [[TMP8]], undef
; CHECK-NEXT: [[TMP10:%.*]] = select i1 [[TMP9]], i32 [[TMP8]], i32 undef
; CHECK-NEXT: [[TMP11:%.*]] = icmp sgt i32 [[TMP10]], undef
; CHECK-NEXT: [[TMP12:%.*]] = select i1 [[TMP11]], i32 [[TMP10]], i32 undef
; CHECK-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP12]], undef
; CHECK-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP12]], i32 undef
; CHECK-NEXT: [[TMP15:%.*]] = icmp sgt i32 [[TMP14]], undef
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <8 x i32> [[TMP2]], [[RDX_SHUF]]		; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <8 x i32> [[TMP2]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP]], <8 x i32> [[TMP2]], <8 x i32> [[RDX_SHUF]]		; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP]], <8 x i32> [[TMP2]], <8 x i32> [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <8 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]		; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <8 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP2]], <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> [[RDX_SHUF1]]		; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP2]], <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = icmp sgt <8 x i32> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]		; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = icmp sgt <8 x i32> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP5]], <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> [[RDX_SHUF4]]		; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP5]], <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> [[RDX_SHUF4]]
; CHECK-NEXT: [[TMP16:%.*]] = extractelement <8 x i32> [[RDX_MINMAX_SELECT6]], i32 0		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <8 x i32> [[RDX_MINMAX_SELECT6]], i32 0
; CHECK-NEXT: [[TMP17:%.*]] = select i1 [[TMP15]], i32 [[TMP14]], i32 undef		; CHECK-NEXT: ret i32 [[TMP3]]
; CHECK-NEXT: ret i32 [[TMP16]]
;		;
%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
%4 = icmp sgt i32 %2, %3		%4 = icmp sgt i32 %2, %3
%5 = select i1 %4, i32 %2, i32 %3		%5 = select i1 %4, i32 %2, i32 %3
%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8		%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
%7 = icmp sgt i32 %5, %6		%7 = icmp sgt i32 %5, %6
%8 = select i1 %7, i32 %5, i32 %6		%8 = select i1 %7, i32 %5, i32 %6
; [13 lines collapsed in the diff view]
%22 = icmp sgt i32 %20, %21		%22 = icmp sgt i32 %20, %21
%23 = select i1 %22, i32 %20, i32 %21		%23 = select i1 %22, i32 %20, i32 %21
ret i32 %23		ret i32 %23
}		}

define i32 @maxi16(i32) {		define i32 @maxi16(i32) {
; CHECK-LABEL: @maxi16(		; CHECK-LABEL: @maxi16(
; CHECK-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([32 x i32]* @arr to <16 x i32>*), align 16		; CHECK-NEXT: [[TMP2:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([32 x i32]* @arr to <16 x i32>*), align 16
; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt i32 undef, undef
; CHECK-NEXT: [[TMP4:%.*]] = select i1 [[TMP3]], i32 undef, i32 undef
; CHECK-NEXT: [[TMP5:%.*]] = icmp sgt i32 [[TMP4]], undef
; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP5]], i32 [[TMP4]], i32 undef
; CHECK-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP6]], undef
; CHECK-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP6]], i32 undef
; CHECK-NEXT: [[TMP9:%.*]] = icmp sgt i32 [[TMP8]], undef
; CHECK-NEXT: [[TMP10:%.*]] = select i1 [[TMP9]], i32 [[TMP8]], i32 undef
; CHECK-NEXT: [[TMP11:%.*]] = icmp sgt i32 [[TMP10]], undef
; CHECK-NEXT: [[TMP12:%.*]] = select i1 [[TMP11]], i32 [[TMP10]], i32 undef
; CHECK-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP12]], undef
; CHECK-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP12]], i32 undef
; CHECK-NEXT: [[TMP15:%.*]] = icmp sgt i32 [[TMP14]], undef
; CHECK-NEXT: [[TMP16:%.*]] = select i1 [[TMP15]], i32 [[TMP14]], i32 undef
; CHECK-NEXT: [[TMP17:%.*]] = icmp sgt i32 [[TMP16]], undef
; CHECK-NEXT: [[TMP18:%.*]] = select i1 [[TMP17]], i32 [[TMP16]], i32 undef
; CHECK-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP18]], undef
; CHECK-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP18]], i32 undef
; CHECK-NEXT: [[TMP21:%.*]] = icmp sgt i32 [[TMP20]], undef
; CHECK-NEXT: [[TMP22:%.*]] = select i1 [[TMP21]], i32 [[TMP20]], i32 undef
; CHECK-NEXT: [[TMP23:%.*]] = icmp sgt i32 [[TMP22]], undef
; CHECK-NEXT: [[TMP24:%.*]] = select i1 [[TMP23]], i32 [[TMP22]], i32 undef
; CHECK-NEXT: [[TMP25:%.*]] = icmp sgt i32 [[TMP24]], undef
; CHECK-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], i32 [[TMP24]], i32 undef
; CHECK-NEXT: [[TMP27:%.*]] = icmp sgt i32 [[TMP26]], undef
; CHECK-NEXT: [[TMP28:%.*]] = select i1 [[TMP27]], i32 [[TMP26]], i32 undef
; CHECK-NEXT: [[TMP29:%.*]] = icmp sgt i32 [[TMP28]], undef
; CHECK-NEXT: [[TMP30:%.*]] = select i1 [[TMP29]], i32 [[TMP28]], i32 undef
; CHECK-NEXT: [[TMP31:%.*]] = icmp sgt i32 [[TMP30]], undef
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <16 x i32> [[TMP2]], <16 x i32> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <16 x i32> [[TMP2]], <16 x i32> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <16 x i32> [[TMP2]], [[RDX_SHUF]]		; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <16 x i32> [[TMP2]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <16 x i1> [[RDX_MINMAX_CMP]], <16 x i32> [[TMP2]], <16 x i32> [[RDX_SHUF]]		; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <16 x i1> [[RDX_MINMAX_CMP]], <16 x i32> [[TMP2]], <16 x i32> [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x i32> [[RDX_MINMAX_SELECT]], <16 x i32> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x i32> [[RDX_MINMAX_SELECT]], <16 x i32> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <16 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]		; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <16 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <16 x i1> [[RDX_MINMAX_CMP2]], <16 x i32> [[RDX_MINMAX_SELECT]], <16 x i32> [[RDX_SHUF1]]		; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <16 x i1> [[RDX_MINMAX_CMP2]], <16 x i32> [[RDX_MINMAX_SELECT]], <16 x i32> [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <16 x i32> [[RDX_MINMAX_SELECT3]], <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <16 x i32> [[RDX_MINMAX_SELECT3]], <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = icmp sgt <16 x i32> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]		; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = icmp sgt <16 x i32> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <16 x i1> [[RDX_MINMAX_CMP5]], <16 x i32> [[RDX_MINMAX_SELECT3]], <16 x i32> [[RDX_SHUF4]]		; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <16 x i1> [[RDX_MINMAX_CMP5]], <16 x i32> [[RDX_MINMAX_SELECT3]], <16 x i32> [[RDX_SHUF4]]
; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <16 x i32> [[RDX_MINMAX_SELECT6]], <16 x i32> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <16 x i32> [[RDX_MINMAX_SELECT6]], <16 x i32> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP8:%.*]] = icmp sgt <16 x i32> [[RDX_MINMAX_SELECT6]], [[RDX_SHUF7]]		; CHECK-NEXT: [[RDX_MINMAX_CMP8:%.*]] = icmp sgt <16 x i32> [[RDX_MINMAX_SELECT6]], [[RDX_SHUF7]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT9:%.*]] = select <16 x i1> [[RDX_MINMAX_CMP8]], <16 x i32> [[RDX_MINMAX_SELECT6]], <16 x i32> [[RDX_SHUF7]]		; CHECK-NEXT: [[RDX_MINMAX_SELECT9:%.*]] = select <16 x i1> [[RDX_MINMAX_CMP8]], <16 x i32> [[RDX_MINMAX_SELECT6]], <16 x i32> [[RDX_SHUF7]]
; CHECK-NEXT: [[TMP32:%.*]] = extractelement <16 x i32> [[RDX_MINMAX_SELECT9]], i32 0		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <16 x i32> [[RDX_MINMAX_SELECT9]], i32 0
; CHECK-NEXT: [[TMP33:%.*]] = select i1 [[TMP31]], i32 [[TMP30]], i32 undef		; CHECK-NEXT: ret i32 [[TMP3]]
; CHECK-NEXT: ret i32 [[TMP32]]
;		;
%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
%4 = icmp sgt i32 %2, %3		%4 = icmp sgt i32 %2, %3
%5 = select i1 %4, i32 %2, i32 %3		%5 = select i1 %4, i32 %2, i32 %3
%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8		%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
%7 = icmp sgt i32 %5, %6		%7 = icmp sgt i32 %5, %6
%8 = select i1 %7, i32 %5, i32 %6		%8 = select i1 %7, i32 %5, i32 %6
; [37 lines collapsed in the diff view]
%46 = icmp sgt i32 %44, %45		%46 = icmp sgt i32 %44, %45
%47 = select i1 %46, i32 %44, i32 %45		%47 = select i1 %46, i32 %44, i32 %45
ret i32 %47		ret i32 %47
}		}

define i32 @maxi32(i32) {		define i32 @maxi32(i32) {
; CHECK-LABEL: @maxi32(		; CHECK-LABEL: @maxi32(
; CHECK-NEXT: [[TMP2:%.*]] = load <32 x i32>, <32 x i32>* bitcast ([32 x i32]* @arr to <32 x i32>*), align 16		; CHECK-NEXT: [[TMP2:%.*]] = load <32 x i32>, <32 x i32>* bitcast ([32 x i32]* @arr to <32 x i32>*), align 16
; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt i32 undef, undef
; CHECK-NEXT: [[TMP4:%.*]] = select i1 [[TMP3]], i32 undef, i32 undef
; CHECK-NEXT: [[TMP5:%.*]] = icmp sgt i32 [[TMP4]], undef
; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP5]], i32 [[TMP4]], i32 undef
; CHECK-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP6]], undef
; CHECK-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP6]], i32 undef
; CHECK-NEXT: [[TMP9:%.*]] = icmp sgt i32 [[TMP8]], undef
; CHECK-NEXT: [[TMP10:%.*]] = select i1 [[TMP9]], i32 [[TMP8]], i32 undef
; CHECK-NEXT: [[TMP11:%.*]] = icmp sgt i32 [[TMP10]], undef
; CHECK-NEXT: [[TMP12:%.*]] = select i1 [[TMP11]], i32 [[TMP10]], i32 undef
; CHECK-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP12]], undef
; CHECK-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP12]], i32 undef
; CHECK-NEXT: [[TMP15:%.*]] = icmp sgt i32 [[TMP14]], undef
; CHECK-NEXT: [[TMP16:%.*]] = select i1 [[TMP15]], i32 [[TMP14]], i32 undef
; CHECK-NEXT: [[TMP17:%.*]] = icmp sgt i32 [[TMP16]], undef
; CHECK-NEXT: [[TMP18:%.*]] = select i1 [[TMP17]], i32 [[TMP16]], i32 undef
; CHECK-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP18]], undef
; CHECK-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP18]], i32 undef
; CHECK-NEXT: [[TMP21:%.*]] = icmp sgt i32 [[TMP20]], undef
; CHECK-NEXT: [[TMP22:%.*]] = select i1 [[TMP21]], i32 [[TMP20]], i32 undef
; CHECK-NEXT: [[TMP23:%.*]] = icmp sgt i32 [[TMP22]], undef
; CHECK-NEXT: [[TMP24:%.*]] = select i1 [[TMP23]], i32 [[TMP22]], i32 undef
; CHECK-NEXT: [[TMP25:%.*]] = icmp sgt i32 [[TMP24]], undef
; CHECK-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], i32 [[TMP24]], i32 undef
; CHECK-NEXT: [[TMP27:%.*]] = icmp sgt i32 [[TMP26]], undef
; CHECK-NEXT: [[TMP28:%.*]] = select i1 [[TMP27]], i32 [[TMP26]], i32 undef
; CHECK-NEXT: [[TMP29:%.*]] = icmp sgt i32 [[TMP28]], undef
; CHECK-NEXT: [[TMP30:%.*]] = select i1 [[TMP29]], i32 [[TMP28]], i32 undef
; CHECK-NEXT: [[TMP31:%.*]] = icmp sgt i32 [[TMP30]], undef
; CHECK-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], i32 [[TMP30]], i32 undef
; CHECK-NEXT: [[TMP33:%.*]] = icmp sgt i32 [[TMP32]], undef
; CHECK-NEXT: [[TMP34:%.*]] = select i1 [[TMP33]], i32 [[TMP32]], i32 undef
; CHECK-NEXT: [[TMP35:%.*]] = icmp sgt i32 [[TMP34]], undef
; CHECK-NEXT: [[TMP36:%.*]] = select i1 [[TMP35]], i32 [[TMP34]], i32 undef
; CHECK-NEXT: [[TMP37:%.*]] = icmp sgt i32 [[TMP36]], undef
; CHECK-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], i32 [[TMP36]], i32 undef
; CHECK-NEXT: [[TMP39:%.*]] = icmp sgt i32 [[TMP38]], undef
; CHECK-NEXT: [[TMP40:%.*]] = select i1 [[TMP39]], i32 [[TMP38]], i32 undef
; CHECK-NEXT: [[TMP41:%.*]] = icmp sgt i32 [[TMP40]], undef
; CHECK-NEXT: [[TMP42:%.*]] = select i1 [[TMP41]], i32 [[TMP40]], i32 undef
; CHECK-NEXT: [[TMP43:%.*]] = icmp sgt i32 [[TMP42]], undef
; CHECK-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], i32 [[TMP42]], i32 undef
; CHECK-NEXT: [[TMP45:%.*]] = icmp sgt i32 [[TMP44]], undef
; CHECK-NEXT: [[TMP46:%.*]] = select i1 [[TMP45]], i32 [[TMP44]], i32 undef
; CHECK-NEXT: [[TMP47:%.*]] = icmp sgt i32 [[TMP46]], undef
; CHECK-NEXT: [[TMP48:%.*]] = select i1 [[TMP47]], i32 [[TMP46]], i32 undef
; CHECK-NEXT: [[TMP49:%.*]] = icmp sgt i32 [[TMP48]], undef
; CHECK-NEXT: [[TMP50:%.*]] = select i1 [[TMP49]], i32 [[TMP48]], i32 undef
; CHECK-NEXT: [[TMP51:%.*]] = icmp sgt i32 [[TMP50]], undef
; CHECK-NEXT: [[TMP52:%.*]] = select i1 [[TMP51]], i32 [[TMP50]], i32 undef
; CHECK-NEXT: [[TMP53:%.*]] = icmp sgt i32 [[TMP52]], undef
; CHECK-NEXT: [[TMP54:%.*]] = select i1 [[TMP53]], i32 [[TMP52]], i32 undef
; CHECK-NEXT: [[TMP55:%.*]] = icmp sgt i32 [[TMP54]], undef
; CHECK-NEXT: [[TMP56:%.*]] = select i1 [[TMP55]], i32 [[TMP54]], i32 undef
; CHECK-NEXT: [[TMP57:%.*]] = icmp sgt i32 [[TMP56]], undef
; CHECK-NEXT: [[TMP58:%.*]] = select i1 [[TMP57]], i32 [[TMP56]], i32 undef
; CHECK-NEXT: [[TMP59:%.*]] = icmp sgt i32 [[TMP58]], undef
; CHECK-NEXT: [[TMP60:%.*]] = select i1 [[TMP59]], i32 [[TMP58]], i32 undef
; CHECK-NEXT: [[TMP61:%.*]] = icmp sgt i32 [[TMP60]], undef
; CHECK-NEXT: [[TMP62:%.*]] = select i1 [[TMP61]], i32 [[TMP60]], i32 undef
; CHECK-NEXT: [[TMP63:%.*]] = icmp sgt i32 [[TMP62]], undef
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <32 x i32> [[TMP2]], <32 x i32> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <32 x i32> [[TMP2]], <32 x i32> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <32 x i32> [[TMP2]], [[RDX_SHUF]]		; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <32 x i32> [[TMP2]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP]], <32 x i32> [[TMP2]], <32 x i32> [[RDX_SHUF]]		; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP]], <32 x i32> [[TMP2]], <32 x i32> [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x i32> [[RDX_MINMAX_SELECT]], <32 x i32> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x i32> [[RDX_MINMAX_SELECT]], <32 x i32> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <32 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]		; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <32 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP2]], <32 x i32> [[RDX_MINMAX_SELECT]], <32 x i32> [[RDX_SHUF1]]		; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP2]], <32 x i32> [[RDX_MINMAX_SELECT]], <32 x i32> [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <32 x i32> [[RDX_MINMAX_SELECT3]], <32 x i32> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <32 x i32> [[RDX_MINMAX_SELECT3]], <32 x i32> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = icmp sgt <32 x i32> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]		; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = icmp sgt <32 x i32> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP5]], <32 x i32> [[RDX_MINMAX_SELECT3]], <32 x i32> [[RDX_SHUF4]]		; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP5]], <32 x i32> [[RDX_MINMAX_SELECT3]], <32 x i32> [[RDX_SHUF4]]
; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x i32> [[RDX_MINMAX_SELECT6]], <32 x i32> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x i32> [[RDX_MINMAX_SELECT6]], <32 x i32> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP8:%.*]] = icmp sgt <32 x i32> [[RDX_MINMAX_SELECT6]], [[RDX_SHUF7]]		; CHECK-NEXT: [[RDX_MINMAX_CMP8:%.*]] = icmp sgt <32 x i32> [[RDX_MINMAX_SELECT6]], [[RDX_SHUF7]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT9:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP8]], <32 x i32> [[RDX_MINMAX_SELECT6]], <32 x i32> [[RDX_SHUF7]]		; CHECK-NEXT: [[RDX_MINMAX_SELECT9:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP8]], <32 x i32> [[RDX_MINMAX_SELECT6]], <32 x i32> [[RDX_SHUF7]]
; CHECK-NEXT: [[RDX_SHUF10:%.*]] = shufflevector <32 x i32> [[RDX_MINMAX_SELECT9]], <32 x i32> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF10:%.*]] = shufflevector <32 x i32> [[RDX_MINMAX_SELECT9]], <32 x i32> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP11:%.*]] = icmp sgt <32 x i32> [[RDX_MINMAX_SELECT9]], [[RDX_SHUF10]]		; CHECK-NEXT: [[RDX_MINMAX_CMP11:%.*]] = icmp sgt <32 x i32> [[RDX_MINMAX_SELECT9]], [[RDX_SHUF10]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT12:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP11]], <32 x i32> [[RDX_MINMAX_SELECT9]], <32 x i32> [[RDX_SHUF10]]		; CHECK-NEXT: [[RDX_MINMAX_SELECT12:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP11]], <32 x i32> [[RDX_MINMAX_SELECT9]], <32 x i32> [[RDX_SHUF10]]
; CHECK-NEXT: [[TMP64:%.*]] = extractelement <32 x i32> [[RDX_MINMAX_SELECT12]], i32 0		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <32 x i32> [[RDX_MINMAX_SELECT12]], i32 0
; CHECK-NEXT: [[TMP65:%.*]] = select i1 [[TMP63]], i32 [[TMP62]], i32 undef		; CHECK-NEXT: ret i32 [[TMP3]]
; CHECK-NEXT: ret i32 [[TMP64]]
;		;
%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
%4 = icmp sgt i32 %2, %3		%4 = icmp sgt i32 %2, %3
%5 = select i1 %4, i32 %2, i32 %3		%5 = select i1 %4, i32 %2, i32 %3
%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8		%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
%7 = icmp sgt i32 %5, %6		%7 = icmp sgt i32 %5, %6
%8 = select i1 %7, i32 %5, i32 %6		%8 = select i1 %7, i32 %5, i32 %6
; [85 lines collapsed in the diff view]
%94 = icmp sgt i32 %92, %93		%94 = icmp sgt i32 %92, %93
%95 = select i1 %94, i32 %92, i32 %93		%95 = select i1 %94, i32 %92, i32 %93
ret i32 %95		ret i32 %95
}		}

define float @maxf8(float) {		define float @maxf8(float) {
; CHECK-LABEL: @maxf8(		; CHECK-LABEL: @maxf8(
; CHECK-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([32 x float]* @arr1 to <8 x float>*), align 16		; CHECK-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([32 x float]* @arr1 to <8 x float>*), align 16
; CHECK-NEXT: [[TMP3:%.*]] = fcmp fast ogt float undef, undef
; CHECK-NEXT: [[TMP4:%.*]] = select i1 [[TMP3]], float undef, float undef
; CHECK-NEXT: [[TMP5:%.*]] = fcmp fast ogt float [[TMP4]], undef
; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP5]], float [[TMP4]], float undef
; CHECK-NEXT: [[TMP7:%.*]] = fcmp fast ogt float [[TMP6]], undef
; CHECK-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], float [[TMP6]], float undef
; CHECK-NEXT: [[TMP9:%.*]] = fcmp fast ogt float [[TMP8]], undef
; CHECK-NEXT: [[TMP10:%.*]] = select i1 [[TMP9]], float [[TMP8]], float undef
; CHECK-NEXT: [[TMP11:%.*]] = fcmp fast ogt float [[TMP10]], undef
; CHECK-NEXT: [[TMP12:%.*]] = select i1 [[TMP11]], float [[TMP10]], float undef
; CHECK-NEXT: [[TMP13:%.*]] = fcmp fast ogt float [[TMP12]], undef
; CHECK-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], float [[TMP12]], float undef
; CHECK-NEXT: [[TMP15:%.*]] = fcmp fast ogt float [[TMP14]], undef
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP2]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP2]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = fcmp fast ogt <8 x float> [[TMP2]], [[RDX_SHUF]]		; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = fcmp fast ogt <8 x float> [[TMP2]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP]], <8 x float> [[TMP2]], <8 x float> [[RDX_SHUF]]		; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP]], <8 x float> [[TMP2]], <8 x float> [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[RDX_MINMAX_SELECT]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[RDX_MINMAX_SELECT]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = fcmp fast ogt <8 x float> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]		; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = fcmp fast ogt <8 x float> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP2]], <8 x float> [[RDX_MINMAX_SELECT]], <8 x float> [[RDX_SHUF1]]		; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP2]], <8 x float> [[RDX_MINMAX_SELECT]], <8 x float> [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <8 x float> [[RDX_MINMAX_SELECT3]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <8 x float> [[RDX_MINMAX_SELECT3]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = fcmp fast ogt <8 x float> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]		; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = fcmp fast ogt <8 x float> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP5]], <8 x float> [[RDX_MINMAX_SELECT3]], <8 x float> [[RDX_SHUF4]]		; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP5]], <8 x float> [[RDX_MINMAX_SELECT3]], <8 x float> [[RDX_SHUF4]]
; CHECK-NEXT: [[TMP16:%.*]] = extractelement <8 x float> [[RDX_MINMAX_SELECT6]], i32 0		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <8 x float> [[RDX_MINMAX_SELECT6]], i32 0
; CHECK-NEXT: [[TMP17:%.*]] = select i1 [[TMP15]], float [[TMP14]], float undef		; CHECK-NEXT: ret float [[TMP3]]
; CHECK-NEXT: ret float [[TMP16]]
;		;
%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16		%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16
%3 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4		%3 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4
%4 = fcmp fast ogt float %2, %3		%4 = fcmp fast ogt float %2, %3
%5 = select i1 %4, float %2, float %3		%5 = select i1 %4, float %2, float %3
%6 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8		%6 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8
%7 = fcmp fast ogt float %5, %6		%7 = fcmp fast ogt float %5, %6
%8 = select i1 %7, float %5, float %6		%8 = select i1 %7, float %5, float %6
; [13 lines collapsed in the diff view]
%22 = fcmp fast ogt float %20, %21		%22 = fcmp fast ogt float %20, %21
%23 = select i1 %22, float %20, float %21		%23 = select i1 %22, float %20, float %21
ret float %23		ret float %23
}		}

define float @maxf16(float) {		define float @maxf16(float) {
; CHECK-LABEL: @maxf16(		; CHECK-LABEL: @maxf16(
; CHECK-NEXT: [[TMP2:%.*]] = load <16 x float>, <16 x float>* bitcast ([32 x float]* @arr1 to <16 x float>*), align 16		; CHECK-NEXT: [[TMP2:%.*]] = load <16 x float>, <16 x float>* bitcast ([32 x float]* @arr1 to <16 x float>*), align 16
; CHECK-NEXT: [[TMP3:%.*]] = fcmp fast ogt float undef, undef
; CHECK-NEXT: [[TMP4:%.*]] = select i1 [[TMP3]], float undef, float undef
; CHECK-NEXT: [[TMP5:%.*]] = fcmp fast ogt float [[TMP4]], undef
; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP5]], float [[TMP4]], float undef
; CHECK-NEXT: [[TMP7:%.*]] = fcmp fast ogt float [[TMP6]], undef
; CHECK-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], float [[TMP6]], float undef
; CHECK-NEXT: [[TMP9:%.*]] = fcmp fast ogt float [[TMP8]], undef
; CHECK-NEXT: [[TMP10:%.*]] = select i1 [[TMP9]], float [[TMP8]], float undef
; CHECK-NEXT: [[TMP11:%.*]] = fcmp fast ogt float [[TMP10]], undef
; CHECK-NEXT: [[TMP12:%.*]] = select i1 [[TMP11]], float [[TMP10]], float undef
; CHECK-NEXT: [[TMP13:%.*]] = fcmp fast ogt float [[TMP12]], undef
; CHECK-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], float [[TMP12]], float undef
; CHECK-NEXT: [[TMP15:%.*]] = fcmp fast ogt float [[TMP14]], undef
; CHECK-NEXT: [[TMP16:%.*]] = select i1 [[TMP15]], float [[TMP14]], float undef
; CHECK-NEXT: [[TMP17:%.*]] = fcmp fast ogt float [[TMP16]], undef
; CHECK-NEXT: [[TMP18:%.*]] = select i1 [[TMP17]], float [[TMP16]], float undef
; CHECK-NEXT: [[TMP19:%.*]] = fcmp fast ogt float [[TMP18]], undef
; CHECK-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], float [[TMP18]], float undef
; CHECK-NEXT: [[TMP21:%.*]] = fcmp fast ogt float [[TMP20]], undef
; CHECK-NEXT: [[TMP22:%.*]] = select i1 [[TMP21]], float [[TMP20]], float undef
; CHECK-NEXT: [[TMP23:%.*]] = fcmp fast ogt float [[TMP22]], undef
; CHECK-NEXT: [[TMP24:%.*]] = select i1 [[TMP23]], float [[TMP22]], float undef
; CHECK-NEXT: [[TMP25:%.*]] = fcmp fast ogt float [[TMP24]], undef
; CHECK-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], float [[TMP24]], float undef
; CHECK-NEXT: [[TMP27:%.*]] = fcmp fast ogt float [[TMP26]], undef
; CHECK-NEXT: [[TMP28:%.*]] = select i1 [[TMP27]], float [[TMP26]], float undef
; CHECK-NEXT: [[TMP29:%.*]] = fcmp fast ogt float [[TMP28]], undef
; CHECK-NEXT: [[TMP30:%.*]] = select i1 [[TMP29]], float [[TMP28]], float undef
; CHECK-NEXT: [[TMP31:%.*]] = fcmp fast ogt float [[TMP30]], undef
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <16 x float> [[TMP2]], <16 x float> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <16 x float> [[TMP2]], <16 x float> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = fcmp fast ogt <16 x float> [[TMP2]], [[RDX_SHUF]]		; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = fcmp fast ogt <16 x float> [[TMP2]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <16 x i1> [[RDX_MINMAX_CMP]], <16 x float> [[TMP2]], <16 x float> [[RDX_SHUF]]		; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <16 x i1> [[RDX_MINMAX_CMP]], <16 x float> [[TMP2]], <16 x float> [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x float> [[RDX_MINMAX_SELECT]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x float> [[RDX_MINMAX_SELECT]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = fcmp fast ogt <16 x float> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]		; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = fcmp fast ogt <16 x float> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <16 x i1> [[RDX_MINMAX_CMP2]], <16 x float> [[RDX_MINMAX_SELECT]], <16 x float> [[RDX_SHUF1]]		; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <16 x i1> [[RDX_MINMAX_CMP2]], <16 x float> [[RDX_MINMAX_SELECT]], <16 x float> [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <16 x float> [[RDX_MINMAX_SELECT3]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <16 x float> [[RDX_MINMAX_SELECT3]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = fcmp fast ogt <16 x float> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]		; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = fcmp fast ogt <16 x float> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <16 x i1> [[RDX_MINMAX_CMP5]], <16 x float> [[RDX_MINMAX_SELECT3]], <16 x float> [[RDX_SHUF4]]		; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <16 x i1> [[RDX_MINMAX_CMP5]], <16 x float> [[RDX_MINMAX_SELECT3]], <16 x float> [[RDX_SHUF4]]
; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <16 x float> [[RDX_MINMAX_SELECT6]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <16 x float> [[RDX_MINMAX_SELECT6]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP8:%.*]] = fcmp fast ogt <16 x float> [[RDX_MINMAX_SELECT6]], [[RDX_SHUF7]]		; CHECK-NEXT: [[RDX_MINMAX_CMP8:%.*]] = fcmp fast ogt <16 x float> [[RDX_MINMAX_SELECT6]], [[RDX_SHUF7]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT9:%.*]] = select <16 x i1> [[RDX_MINMAX_CMP8]], <16 x float> [[RDX_MINMAX_SELECT6]], <16 x float> [[RDX_SHUF7]]		; CHECK-NEXT: [[RDX_MINMAX_SELECT9:%.*]] = select <16 x i1> [[RDX_MINMAX_CMP8]], <16 x float> [[RDX_MINMAX_SELECT6]], <16 x float> [[RDX_SHUF7]]
; CHECK-NEXT: [[TMP32:%.*]] = extractelement <16 x float> [[RDX_MINMAX_SELECT9]], i32 0		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <16 x float> [[RDX_MINMAX_SELECT9]], i32 0
; CHECK-NEXT: [[TMP33:%.*]] = select i1 [[TMP31]], float [[TMP30]], float undef		; CHECK-NEXT: ret float [[TMP3]]
; CHECK-NEXT: ret float [[TMP32]]
;		;
%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16		%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16
%3 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4		%3 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4
%4 = fcmp fast ogt float %2, %3		%4 = fcmp fast ogt float %2, %3
%5 = select i1 %4, float %2, float %3		%5 = select i1 %4, float %2, float %3
%6 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8		%6 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8
%7 = fcmp fast ogt float %5, %6		%7 = fcmp fast ogt float %5, %6
%8 = select i1 %7, float %5, float %6		%8 = select i1 %7, float %5, float %6
[37 unchanged lines not shown]
%46 = fcmp fast ogt float %44, %45		%46 = fcmp fast ogt float %44, %45
%47 = select i1 %46, float %44, float %45		%47 = select i1 %46, float %44, float %45
ret float %47		ret float %47
}		}

define float @maxf32(float) {		define float @maxf32(float) {
; CHECK-LABEL: @maxf32(		; CHECK-LABEL: @maxf32(
; CHECK-NEXT: [[TMP2:%.*]] = load <32 x float>, <32 x float>* bitcast ([32 x float]* @arr1 to <32 x float>*), align 16		; CHECK-NEXT: [[TMP2:%.*]] = load <32 x float>, <32 x float>* bitcast ([32 x float]* @arr1 to <32 x float>*), align 16
; CHECK-NEXT: [[TMP3:%.*]] = fcmp fast ogt float undef, undef
; CHECK-NEXT: [[TMP4:%.*]] = select i1 [[TMP3]], float undef, float undef
; CHECK-NEXT: [[TMP5:%.*]] = fcmp fast ogt float [[TMP4]], undef
; CHECK-NEXT: [[TMP6:%.*]] = select i1 [[TMP5]], float [[TMP4]], float undef
; CHECK-NEXT: [[TMP7:%.*]] = fcmp fast ogt float [[TMP6]], undef
; CHECK-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], float [[TMP6]], float undef
; CHECK-NEXT: [[TMP9:%.*]] = fcmp fast ogt float [[TMP8]], undef
; CHECK-NEXT: [[TMP10:%.*]] = select i1 [[TMP9]], float [[TMP8]], float undef
; CHECK-NEXT: [[TMP11:%.*]] = fcmp fast ogt float [[TMP10]], undef
; CHECK-NEXT: [[TMP12:%.*]] = select i1 [[TMP11]], float [[TMP10]], float undef
; CHECK-NEXT: [[TMP13:%.*]] = fcmp fast ogt float [[TMP12]], undef
; CHECK-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], float [[TMP12]], float undef
; CHECK-NEXT: [[TMP15:%.*]] = fcmp fast ogt float [[TMP14]], undef
; CHECK-NEXT: [[TMP16:%.*]] = select i1 [[TMP15]], float [[TMP14]], float undef
; CHECK-NEXT: [[TMP17:%.*]] = fcmp fast ogt float [[TMP16]], undef
; CHECK-NEXT: [[TMP18:%.*]] = select i1 [[TMP17]], float [[TMP16]], float undef
; CHECK-NEXT: [[TMP19:%.*]] = fcmp fast ogt float [[TMP18]], undef
; CHECK-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], float [[TMP18]], float undef
; CHECK-NEXT: [[TMP21:%.*]] = fcmp fast ogt float [[TMP20]], undef
; CHECK-NEXT: [[TMP22:%.*]] = select i1 [[TMP21]], float [[TMP20]], float undef
; CHECK-NEXT: [[TMP23:%.*]] = fcmp fast ogt float [[TMP22]], undef
; CHECK-NEXT: [[TMP24:%.*]] = select i1 [[TMP23]], float [[TMP22]], float undef
; CHECK-NEXT: [[TMP25:%.*]] = fcmp fast ogt float [[TMP24]], undef
; CHECK-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], float [[TMP24]], float undef
; CHECK-NEXT: [[TMP27:%.*]] = fcmp fast ogt float [[TMP26]], undef
; CHECK-NEXT: [[TMP28:%.*]] = select i1 [[TMP27]], float [[TMP26]], float undef
; CHECK-NEXT: [[TMP29:%.*]] = fcmp fast ogt float [[TMP28]], undef
; CHECK-NEXT: [[TMP30:%.*]] = select i1 [[TMP29]], float [[TMP28]], float undef
; CHECK-NEXT: [[TMP31:%.*]] = fcmp fast ogt float [[TMP30]], undef
; CHECK-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], float [[TMP30]], float undef
; CHECK-NEXT: [[TMP33:%.*]] = fcmp fast ogt float [[TMP32]], undef
; CHECK-NEXT: [[TMP34:%.*]] = select i1 [[TMP33]], float [[TMP32]], float undef
; CHECK-NEXT: [[TMP35:%.*]] = fcmp fast ogt float [[TMP34]], undef
; CHECK-NEXT: [[TMP36:%.*]] = select i1 [[TMP35]], float [[TMP34]], float undef
; CHECK-NEXT: [[TMP37:%.*]] = fcmp fast ogt float [[TMP36]], undef
; CHECK-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], float [[TMP36]], float undef
; CHECK-NEXT: [[TMP39:%.*]] = fcmp fast ogt float [[TMP38]], undef
; CHECK-NEXT: [[TMP40:%.*]] = select i1 [[TMP39]], float [[TMP38]], float undef
; CHECK-NEXT: [[TMP41:%.*]] = fcmp fast ogt float [[TMP40]], undef
; CHECK-NEXT: [[TMP42:%.*]] = select i1 [[TMP41]], float [[TMP40]], float undef
; CHECK-NEXT: [[TMP43:%.*]] = fcmp fast ogt float [[TMP42]], undef
; CHECK-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], float [[TMP42]], float undef
; CHECK-NEXT: [[TMP45:%.*]] = fcmp fast ogt float [[TMP44]], undef
; CHECK-NEXT: [[TMP46:%.*]] = select i1 [[TMP45]], float [[TMP44]], float undef
; CHECK-NEXT: [[TMP47:%.*]] = fcmp fast ogt float [[TMP46]], undef
; CHECK-NEXT: [[TMP48:%.*]] = select i1 [[TMP47]], float [[TMP46]], float undef
; CHECK-NEXT: [[TMP49:%.*]] = fcmp fast ogt float [[TMP48]], undef
; CHECK-NEXT: [[TMP50:%.*]] = select i1 [[TMP49]], float [[TMP48]], float undef
; CHECK-NEXT: [[TMP51:%.*]] = fcmp fast ogt float [[TMP50]], undef
; CHECK-NEXT: [[TMP52:%.*]] = select i1 [[TMP51]], float [[TMP50]], float undef
; CHECK-NEXT: [[TMP53:%.*]] = fcmp fast ogt float [[TMP52]], undef
; CHECK-NEXT: [[TMP54:%.*]] = select i1 [[TMP53]], float [[TMP52]], float undef
; CHECK-NEXT: [[TMP55:%.*]] = fcmp fast ogt float [[TMP54]], undef
; CHECK-NEXT: [[TMP56:%.*]] = select i1 [[TMP55]], float [[TMP54]], float undef
; CHECK-NEXT: [[TMP57:%.*]] = fcmp fast ogt float [[TMP56]], undef
; CHECK-NEXT: [[TMP58:%.*]] = select i1 [[TMP57]], float [[TMP56]], float undef
; CHECK-NEXT: [[TMP59:%.*]] = fcmp fast ogt float [[TMP58]], undef
; CHECK-NEXT: [[TMP60:%.*]] = select i1 [[TMP59]], float [[TMP58]], float undef
; CHECK-NEXT: [[TMP61:%.*]] = fcmp fast ogt float [[TMP60]], undef
; CHECK-NEXT: [[TMP62:%.*]] = select i1 [[TMP61]], float [[TMP60]], float undef
; CHECK-NEXT: [[TMP63:%.*]] = fcmp fast ogt float [[TMP62]], undef
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <32 x float> [[TMP2]], <32 x float> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <32 x float> [[TMP2]], <32 x float> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = fcmp fast ogt <32 x float> [[TMP2]], [[RDX_SHUF]]		; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = fcmp fast ogt <32 x float> [[TMP2]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP]], <32 x float> [[TMP2]], <32 x float> [[RDX_SHUF]]		; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP]], <32 x float> [[TMP2]], <32 x float> [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x float> [[RDX_MINMAX_SELECT]], <32 x float> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x float> [[RDX_MINMAX_SELECT]], <32 x float> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = fcmp fast ogt <32 x float> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]		; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = fcmp fast ogt <32 x float> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP2]], <32 x float> [[RDX_MINMAX_SELECT]], <32 x float> [[RDX_SHUF1]]		; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP2]], <32 x float> [[RDX_MINMAX_SELECT]], <32 x float> [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <32 x float> [[RDX_MINMAX_SELECT3]], <32 x float> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <32 x float> [[RDX_MINMAX_SELECT3]], <32 x float> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = fcmp fast ogt <32 x float> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]		; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = fcmp fast ogt <32 x float> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP5]], <32 x float> [[RDX_MINMAX_SELECT3]], <32 x float> [[RDX_SHUF4]]		; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP5]], <32 x float> [[RDX_MINMAX_SELECT3]], <32 x float> [[RDX_SHUF4]]
; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float> [[RDX_MINMAX_SELECT6]], <32 x float> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x float> [[RDX_MINMAX_SELECT6]], <32 x float> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP8:%.*]] = fcmp fast ogt <32 x float> [[RDX_MINMAX_SELECT6]], [[RDX_SHUF7]]		; CHECK-NEXT: [[RDX_MINMAX_CMP8:%.*]] = fcmp fast ogt <32 x float> [[RDX_MINMAX_SELECT6]], [[RDX_SHUF7]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT9:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP8]], <32 x float> [[RDX_MINMAX_SELECT6]], <32 x float> [[RDX_SHUF7]]		; CHECK-NEXT: [[RDX_MINMAX_SELECT9:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP8]], <32 x float> [[RDX_MINMAX_SELECT6]], <32 x float> [[RDX_SHUF7]]
; CHECK-NEXT: [[RDX_SHUF10:%.*]] = shufflevector <32 x float> [[RDX_MINMAX_SELECT9]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF10:%.*]] = shufflevector <32 x float> [[RDX_MINMAX_SELECT9]], <32 x float> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[RDX_MINMAX_CMP11:%.*]] = fcmp fast ogt <32 x float> [[RDX_MINMAX_SELECT9]], [[RDX_SHUF10]]		; CHECK-NEXT: [[RDX_MINMAX_CMP11:%.*]] = fcmp fast ogt <32 x float> [[RDX_MINMAX_SELECT9]], [[RDX_SHUF10]]
; CHECK-NEXT: [[RDX_MINMAX_SELECT12:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP11]], <32 x float> [[RDX_MINMAX_SELECT9]], <32 x float> [[RDX_SHUF10]]		; CHECK-NEXT: [[RDX_MINMAX_SELECT12:%.*]] = select <32 x i1> [[RDX_MINMAX_CMP11]], <32 x float> [[RDX_MINMAX_SELECT9]], <32 x float> [[RDX_SHUF10]]
; CHECK-NEXT: [[TMP64:%.*]] = extractelement <32 x float> [[RDX_MINMAX_SELECT12]], i32 0		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <32 x float> [[RDX_MINMAX_SELECT12]], i32 0
; CHECK-NEXT: [[TMP65:%.*]] = select i1 [[TMP63]], float [[TMP62]], float undef		; CHECK-NEXT: ret float [[TMP3]]
; CHECK-NEXT: ret float [[TMP64]]
;		;
%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16		%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 0), align 16
%3 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4		%3 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 1), align 4
%4 = fcmp fast ogt float %2, %3		%4 = fcmp fast ogt float %2, %3
%5 = select i1 %4, float %2, float %3		%5 = select i1 %4, float %2, float %3
%6 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8		%6 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr1, i64 0, i64 2), align 8
%7 = fcmp fast ogt float %5, %6		%7 = fcmp fast ogt float %5, %6
%8 = select i1 %7, float %5, float %6		%8 = select i1 %7, float %5, float %6
[89 unchanged lines not shown]

define i32 @maxi8_mutiple_uses(i32) {		define i32 @maxi8_mutiple_uses(i32) {
; SSE-LABEL: @maxi8_mutiple_uses(		; SSE-LABEL: @maxi8_mutiple_uses(
; SSE-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		; SSE-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
; SSE-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		; SSE-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
; SSE-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]		; SSE-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
; SSE-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]		; SSE-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]
; SSE-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8		; SSE-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8
; SSE-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP5]], undef		; SSE-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
; SSE-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP5]], i32 undef
; SSE-NEXT: [[TMP9:%.*]] = icmp sgt i32 [[TMP8]], undef
; SSE-NEXT: [[TMP10:%.*]] = select i1 [[TMP9]], i32 [[TMP8]], i32 undef
; SSE-NEXT: [[TMP11:%.*]] = icmp sgt i32 [[TMP10]], undef
; SSE-NEXT: [[TMP12:%.*]] = select i1 [[TMP11]], i32 [[TMP10]], i32 undef
; SSE-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP12]], undef
; SSE-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP12]], i32 undef
; SSE-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
; SSE-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]
; SSE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; SSE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; SSE-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i32> [[TMP6]], [[RDX_SHUF]]		; SSE-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i32> [[TMP6]], [[RDX_SHUF]]
; SSE-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i32> [[TMP6]], <4 x i32> [[RDX_SHUF]]		; SSE-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i32> [[TMP6]], <4 x i32> [[RDX_SHUF]]
; SSE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <4 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]		; SSE-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <4 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
; SSE-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> [[RDX_SHUF1]]		; SSE-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> [[RDX_SHUF1]]
; SSE-NEXT: [[TMP17:%.*]] = extractelement <4 x i32> [[RDX_MINMAX_SELECT3]], i32 0		; SSE-NEXT: [[TMP8:%.*]] = extractelement <4 x i32> [[RDX_MINMAX_SELECT3]], i32 0
; SSE-NEXT: [[TMP18:%.*]] = icmp sgt i32 [[TMP17]], [[TMP15]]		; SSE-NEXT: [[TMP9:%.*]] = icmp sgt i32 [[TMP8]], [[TMP7]]
; SSE-NEXT: [[TMP19:%.*]] = select i1 [[TMP18]], i32 [[TMP17]], i32 [[TMP15]]		; SSE-NEXT: [[TMP10:%.*]] = select i1 [[TMP9]], i32 [[TMP8]], i32 [[TMP7]]
; SSE-NEXT: [[TMP20:%.*]] = icmp sgt i32 [[TMP19]], [[TMP5]]		; SSE-NEXT: [[TMP11:%.*]] = icmp sgt i32 [[TMP10]], [[TMP5]]
; SSE-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP20]], i32 [[TMP19]], i32 [[TMP5]]		; SSE-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP11]], i32 [[TMP10]], i32 [[TMP5]]
; SSE-NEXT: [[TMP21:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32 [[TMP15]]		; SSE-NEXT: [[TMP12:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
; SSE-NEXT: [[TMP22:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4		; SSE-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[OP_EXTRA]], [[TMP12]]
; SSE-NEXT: [[TMP23:%.*]] = icmp sgt i32 [[OP_EXTRA]], [[TMP22]]		; SSE-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[OP_EXTRA]], i32 [[TMP12]]
; SSE-NEXT: [[TMP24:%.*]] = select i1 [[TMP23]], i32 [[OP_EXTRA]], i32 [[TMP22]]		; SSE-NEXT: [[TMP15:%.*]] = select i1 [[TMP4]], i32 3, i32 4
; SSE-NEXT: [[TMP25:%.*]] = select i1 [[TMP4]], i32 3, i32 4		; SSE-NEXT: store i32 [[TMP15]], i32* @var, align 8
; SSE-NEXT: store i32 [[TMP25]], i32* @var, align 8		; SSE-NEXT: ret i32 [[TMP14]]
; SSE-NEXT: ret i32 [[TMP24]]
;		;
; AVX-LABEL: @maxi8_mutiple_uses(		; AVX-LABEL: @maxi8_mutiple_uses(
; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
; AVX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]		; AVX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]		; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]
; AVX-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8		; AVX-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8
; AVX-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP5]], undef		; AVX-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
; AVX-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP5]], i32 undef
; AVX-NEXT: [[TMP9:%.*]] = icmp sgt i32 [[TMP8]], undef
; AVX-NEXT: [[TMP10:%.*]] = select i1 [[TMP9]], i32 [[TMP8]], i32 undef
; AVX-NEXT: [[TMP11:%.*]] = icmp sgt i32 [[TMP10]], undef
; AVX-NEXT: [[TMP12:%.*]] = select i1 [[TMP11]], i32 [[TMP10]], i32 undef
; AVX-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP12]], undef
; AVX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP12]], i32 undef
; AVX-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
; AVX-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]
; AVX-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; AVX-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; AVX-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i32> [[TMP6]], [[RDX_SHUF]]		; AVX-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i32> [[TMP6]], [[RDX_SHUF]]
; AVX-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i32> [[TMP6]], <4 x i32> [[RDX_SHUF]]		; AVX-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i32> [[TMP6]], <4 x i32> [[RDX_SHUF]]
; AVX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; AVX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <4 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]		; AVX-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <4 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
; AVX-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> [[RDX_SHUF1]]		; AVX-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> [[RDX_SHUF1]]
; AVX-NEXT: [[TMP17:%.*]] = extractelement <4 x i32> [[RDX_MINMAX_SELECT3]], i32 0		; AVX-NEXT: [[TMP8:%.*]] = extractelement <4 x i32> [[RDX_MINMAX_SELECT3]], i32 0
; AVX-NEXT: [[TMP18:%.*]] = icmp sgt i32 [[TMP17]], [[TMP15]]		; AVX-NEXT: [[TMP9:%.*]] = icmp sgt i32 [[TMP8]], [[TMP7]]
; AVX-NEXT: [[TMP19:%.*]] = select i1 [[TMP18]], i32 [[TMP17]], i32 [[TMP15]]		; AVX-NEXT: [[TMP10:%.*]] = select i1 [[TMP9]], i32 [[TMP8]], i32 [[TMP7]]
; AVX-NEXT: [[TMP20:%.*]] = icmp sgt i32 [[TMP19]], [[TMP5]]		; AVX-NEXT: [[TMP11:%.*]] = icmp sgt i32 [[TMP10]], [[TMP5]]
; AVX-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP20]], i32 [[TMP19]], i32 [[TMP5]]		; AVX-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP11]], i32 [[TMP10]], i32 [[TMP5]]
; AVX-NEXT: [[TMP21:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32 [[TMP15]]		; AVX-NEXT: [[TMP12:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
; AVX-NEXT: [[TMP22:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4		; AVX-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[OP_EXTRA]], [[TMP12]]
; AVX-NEXT: [[TMP23:%.*]] = icmp sgt i32 [[OP_EXTRA]], [[TMP22]]		; AVX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[OP_EXTRA]], i32 [[TMP12]]
; AVX-NEXT: [[TMP24:%.*]] = select i1 [[TMP23]], i32 [[OP_EXTRA]], i32 [[TMP22]]		; AVX-NEXT: [[TMP15:%.*]] = select i1 [[TMP4]], i32 3, i32 4
; AVX-NEXT: [[TMP25:%.*]] = select i1 [[TMP4]], i32 3, i32 4		; AVX-NEXT: store i32 [[TMP15]], i32* @var, align 8
; AVX-NEXT: store i32 [[TMP25]], i32* @var, align 8		; AVX-NEXT: ret i32 [[TMP14]]
; AVX-NEXT: ret i32 [[TMP24]]
;		;
; AVX2-LABEL: @maxi8_mutiple_uses(		; AVX2-LABEL: @maxi8_mutiple_uses(
; AVX2-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		; AVX2-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
; AVX2-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		; AVX2-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
; AVX2-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]		; AVX2-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
; AVX2-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]		; AVX2-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]
; AVX2-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8		; AVX2-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8
; AVX2-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP5]], undef		; AVX2-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
; AVX2-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP5]], i32 undef
; AVX2-NEXT: [[TMP9:%.*]] = icmp sgt i32 [[TMP8]], undef
; AVX2-NEXT: [[TMP10:%.*]] = select i1 [[TMP9]], i32 [[TMP8]], i32 undef
; AVX2-NEXT: [[TMP11:%.*]] = icmp sgt i32 [[TMP10]], undef
; AVX2-NEXT: [[TMP12:%.*]] = select i1 [[TMP11]], i32 [[TMP10]], i32 undef
; AVX2-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP12]], undef
; AVX2-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP12]], i32 undef
; AVX2-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
; AVX2-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]
; AVX2-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; AVX2-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; AVX2-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i32> [[TMP6]], [[RDX_SHUF]]		; AVX2-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i32> [[TMP6]], [[RDX_SHUF]]
; AVX2-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i32> [[TMP6]], <4 x i32> [[RDX_SHUF]]		; AVX2-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i32> [[TMP6]], <4 x i32> [[RDX_SHUF]]
; AVX2-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <4 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]		; AVX2-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <4 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
; AVX2-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> [[RDX_SHUF1]]		; AVX2-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> [[RDX_SHUF1]]
; AVX2-NEXT: [[TMP17:%.*]] = extractelement <4 x i32> [[RDX_MINMAX_SELECT3]], i32 0		; AVX2-NEXT: [[TMP8:%.*]] = extractelement <4 x i32> [[RDX_MINMAX_SELECT3]], i32 0
; AVX2-NEXT: [[TMP18:%.*]] = icmp sgt i32 [[TMP17]], [[TMP15]]		; AVX2-NEXT: [[TMP9:%.*]] = icmp sgt i32 [[TMP8]], [[TMP7]]
; AVX2-NEXT: [[TMP19:%.*]] = select i1 [[TMP18]], i32 [[TMP17]], i32 [[TMP15]]		; AVX2-NEXT: [[TMP10:%.*]] = select i1 [[TMP9]], i32 [[TMP8]], i32 [[TMP7]]
; AVX2-NEXT: [[TMP20:%.*]] = icmp sgt i32 [[TMP19]], [[TMP5]]		; AVX2-NEXT: [[TMP11:%.*]] = icmp sgt i32 [[TMP10]], [[TMP5]]
; AVX2-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP20]], i32 [[TMP19]], i32 [[TMP5]]		; AVX2-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP11]], i32 [[TMP10]], i32 [[TMP5]]
; AVX2-NEXT: [[TMP21:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32 [[TMP15]]		; AVX2-NEXT: [[TMP12:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
; AVX2-NEXT: [[TMP22:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4		; AVX2-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[OP_EXTRA]], [[TMP12]]
; AVX2-NEXT: [[TMP23:%.*]] = icmp sgt i32 [[OP_EXTRA]], [[TMP22]]		; AVX2-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[OP_EXTRA]], i32 [[TMP12]]
; AVX2-NEXT: [[TMP24:%.*]] = select i1 [[TMP23]], i32 [[OP_EXTRA]], i32 [[TMP22]]		; AVX2-NEXT: [[TMP15:%.*]] = select i1 [[TMP4]], i32 3, i32 4
; AVX2-NEXT: [[TMP25:%.*]] = select i1 [[TMP4]], i32 3, i32 4		; AVX2-NEXT: store i32 [[TMP15]], i32* @var, align 8
; AVX2-NEXT: store i32 [[TMP25]], i32* @var, align 8		; AVX2-NEXT: ret i32 [[TMP14]]
; AVX2-NEXT: ret i32 [[TMP24]]
;		;
; SKX-LABEL: @maxi8_mutiple_uses(		; SKX-LABEL: @maxi8_mutiple_uses(
; SKX-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16		; SKX-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16
; SKX-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0		; SKX-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0
; SKX-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1		; SKX-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1
; SKX-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8		; SKX-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8
; SKX-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8		; SKX-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
; SKX-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; SKX-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; SKX-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i32> [[TMP5]], [[RDX_SHUF]]		; SKX-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i32> [[TMP5]], [[RDX_SHUF]]
; SKX-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i32> [[TMP5]], <4 x i32> [[RDX_SHUF]]		; SKX-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i32> [[TMP5]], <4 x i32> [[RDX_SHUF]]
; SKX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; SKX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; SKX-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <4 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]		; SKX-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <4 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
; SKX-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> [[RDX_SHUF1]]		; SKX-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> [[RDX_SHUF1]]
; SKX-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[RDX_MINMAX_SELECT3]], i32 0		; SKX-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[RDX_MINMAX_SELECT3]], i32 0
; SKX-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> undef, i32 [[TMP7]], i32 0		; SKX-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> undef, i32 [[TMP7]], i32 0
; SKX-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> [[TMP8]], i32 [[TMP3]], i32 1		; SKX-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> [[TMP8]], i32 [[TMP3]], i32 1
; SKX-NEXT: [[TMP10:%.*]] = insertelement <2 x i32> undef, i32 [[TMP6]], i32 0		; SKX-NEXT: [[TMP10:%.*]] = insertelement <2 x i32> undef, i32 [[TMP6]], i32 0
; SKX-NEXT: [[TMP11:%.*]] = insertelement <2 x i32> [[TMP10]], i32 [[TMP4]], i32 1		; SKX-NEXT: [[TMP11:%.*]] = insertelement <2 x i32> [[TMP10]], i32 [[TMP4]], i32 1
; SKX-NEXT: [[TMP12:%.*]] = icmp sgt <2 x i32> [[TMP9]], [[TMP11]]		; SKX-NEXT: [[TMP12:%.*]] = icmp sgt <2 x i32> [[TMP9]], [[TMP11]]
; SKX-NEXT: [[TMP13:%.*]] = select <2 x i1> [[TMP12]], <2 x i32> [[TMP9]], <2 x i32> [[TMP11]]		; SKX-NEXT: [[TMP13:%.*]] = select <2 x i1> [[TMP12]], <2 x i32> [[TMP9]], <2 x i32> [[TMP11]]
; SKX-NEXT: [[TMP14:%.*]] = extractelement <2 x i32> [[TMP13]], i32 1		; SKX-NEXT: [[TMP14:%.*]] = extractelement <2 x i32> [[TMP13]], i32 1
; SKX-NEXT: [[TMP15:%.*]] = icmp sgt i32 [[TMP14]], undef		; SKX-NEXT: [[TMP15:%.*]] = extractelement <2 x i32> [[TMP13]], i32 0
; SKX-NEXT: [[TMP16:%.*]] = select i1 [[TMP15]], i32 [[TMP14]], i32 undef		; SKX-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP15]], [[TMP14]]
; SKX-NEXT: [[TMP17:%.*]] = icmp sgt i32 [[TMP16]], undef		; SKX-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP16]], i32 [[TMP15]], i32 [[TMP14]]
; SKX-NEXT: [[TMP18:%.*]] = select i1 [[TMP17]], i32 [[TMP16]], i32 undef		; SKX-NEXT: [[TMP17:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
; SKX-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP18]], undef		; SKX-NEXT: [[TMP18:%.*]] = icmp sgt i32 [[OP_EXTRA]], [[TMP17]]
; SKX-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 [[TMP18]], i32 undef		; SKX-NEXT: [[TMP19:%.*]] = select i1 [[TMP18]], i32 [[OP_EXTRA]], i32 [[TMP17]]
; SKX-NEXT: [[TMP21:%.*]] = icmp sgt i32 [[TMP20]], undef		; SKX-NEXT: [[TMP20:%.*]] = extractelement <2 x i1> [[TMP12]], i32 1
; SKX-NEXT: [[TMP22:%.*]] = select i1 [[TMP21]], i32 [[TMP20]], i32 undef		; SKX-NEXT: [[TMP21:%.*]] = select i1 [[TMP20]], i32 3, i32 4
; SKX-NEXT: [[TMP23:%.*]] = icmp sgt i32 [[TMP22]], [[TMP6]]		; SKX-NEXT: store i32 [[TMP21]], i32* @var, align 8
; SKX-NEXT: [[TMP24:%.*]] = extractelement <2 x i32> [[TMP13]], i32 0		; SKX-NEXT: ret i32 [[TMP19]]
; SKX-NEXT: [[TMP25:%.*]] = icmp sgt i32 [[TMP24]], [[TMP14]]
; SKX-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP25]], i32 [[TMP24]], i32 [[TMP14]]
; SKX-NEXT: [[TMP26:%.*]] = select i1 [[TMP23]], i32 [[TMP22]], i32 [[TMP6]]
; SKX-NEXT: [[TMP27:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
; SKX-NEXT: [[TMP28:%.*]] = icmp sgt i32 [[OP_EXTRA]], [[TMP27]]
; SKX-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], i32 [[OP_EXTRA]], i32 [[TMP27]]
; SKX-NEXT: [[TMP30:%.*]] = extractelement <2 x i1> [[TMP12]], i32 1
; SKX-NEXT: [[TMP31:%.*]] = select i1 [[TMP30]], i32 3, i32 4
; SKX-NEXT: store i32 [[TMP31]], i32* @var, align 8
; SKX-NEXT: ret i32 [[TMP29]]
;		;
%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
%4 = icmp sgt i32 %2, %3		%4 = icmp sgt i32 %2, %3
%5 = select i1 %4, i32 %2, i32 %3		%5 = select i1 %4, i32 %2, i32 %3
%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8		%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
%7 = icmp sgt i32 %5, %6		%7 = icmp sgt i32 %5, %6
%8 = select i1 %7, i32 %5, i32 %6		%8 = select i1 %7, i32 %5, i32 %6
; SSE-LABEL: @maxi8_wrong_parent(		; SSE-LABEL: @maxi8_wrong_parent(
; SSE-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		; SSE-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
; SSE-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		; SSE-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
; SSE-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]		; SSE-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
; SSE-NEXT: br label [[PP:%.*]]		; SSE-NEXT: br label [[PP:%.*]]
; SSE: pp:		; SSE: pp:
; SSE-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]		; SSE-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]
; SSE-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8		; SSE-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8
; SSE-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP5]], undef		; SSE-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
; SSE-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP5]], i32 undef		; SSE-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
; SSE-NEXT: [[TMP9:%.*]] = icmp sgt i32 [[TMP8]], undef
; SSE-NEXT: [[TMP10:%.*]] = select i1 [[TMP9]], i32 [[TMP8]], i32 undef
; SSE-NEXT: [[TMP11:%.*]] = icmp sgt i32 [[TMP10]], undef
; SSE-NEXT: [[TMP12:%.*]] = select i1 [[TMP11]], i32 [[TMP10]], i32 undef
; SSE-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP12]], undef
; SSE-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP12]], i32 undef
; SSE-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
; SSE-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]
; SSE-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32 [[TMP15]]
; SSE-NEXT: [[TMP18:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
; SSE-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]
; SSE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; SSE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; SSE-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i32> [[TMP6]], [[RDX_SHUF]]		; SSE-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i32> [[TMP6]], [[RDX_SHUF]]
; SSE-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i32> [[TMP6]], <4 x i32> [[RDX_SHUF]]		; SSE-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i32> [[TMP6]], <4 x i32> [[RDX_SHUF]]
; SSE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; SSE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; SSE-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <4 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]		; SSE-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <4 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
; SSE-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> [[RDX_SHUF1]]		; SSE-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> [[RDX_SHUF1]]
; SSE-NEXT: [[TMP20:%.*]] = extractelement <4 x i32> [[RDX_MINMAX_SELECT3]], i32 0		; SSE-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[RDX_MINMAX_SELECT3]], i32 0
; SSE-NEXT: [[TMP21:%.*]] = icmp sgt i32 [[TMP20]], [[TMP15]]		; SSE-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP9]], [[TMP7]]
; SSE-NEXT: [[TMP22:%.*]] = select i1 [[TMP21]], i32 [[TMP20]], i32 [[TMP15]]		; SSE-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP9]], i32 [[TMP7]]
; SSE-NEXT: [[TMP23:%.*]] = icmp sgt i32 [[TMP22]], [[TMP18]]		; SSE-NEXT: [[TMP12:%.*]] = icmp sgt i32 [[TMP11]], [[TMP8]]
; SSE-NEXT: [[TMP24:%.*]] = select i1 [[TMP23]], i32 [[TMP22]], i32 [[TMP18]]		; SSE-NEXT: [[TMP13:%.*]] = select i1 [[TMP12]], i32 [[TMP11]], i32 [[TMP8]]
; SSE-NEXT: [[TMP25:%.*]] = icmp sgt i32 [[TMP24]], [[TMP5]]		; SSE-NEXT: [[TMP14:%.*]] = icmp sgt i32 [[TMP13]], [[TMP5]]
; SSE-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP25]], i32 [[TMP24]], i32 [[TMP5]]		; SSE-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP14]], i32 [[TMP13]], i32 [[TMP5]]
; SSE-NEXT: [[TMP26:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32 [[TMP18]]
; SSE-NEXT: ret i32 [[OP_EXTRA]]		; SSE-NEXT: ret i32 [[OP_EXTRA]]
;		;
; AVX-LABEL: @maxi8_wrong_parent(		; AVX-LABEL: @maxi8_wrong_parent(
; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
; AVX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]		; AVX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
; AVX-NEXT: br label [[PP:%.*]]		; AVX-NEXT: br label [[PP:%.*]]
; AVX: pp:		; AVX: pp:
; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]		; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]
; AVX-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8		; AVX-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8
; AVX-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP5]], undef		; AVX-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
; AVX-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP5]], i32 undef		; AVX-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
; AVX-NEXT: [[TMP9:%.*]] = icmp sgt i32 [[TMP8]], undef
; AVX-NEXT: [[TMP10:%.*]] = select i1 [[TMP9]], i32 [[TMP8]], i32 undef
; AVX-NEXT: [[TMP11:%.*]] = icmp sgt i32 [[TMP10]], undef
; AVX-NEXT: [[TMP12:%.*]] = select i1 [[TMP11]], i32 [[TMP10]], i32 undef
; AVX-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP12]], undef
; AVX-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP12]], i32 undef
; AVX-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
; AVX-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]
; AVX-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32 [[TMP15]]
; AVX-NEXT: [[TMP18:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
; AVX-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]
; AVX-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; AVX-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; AVX-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i32> [[TMP6]], [[RDX_SHUF]]		; AVX-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i32> [[TMP6]], [[RDX_SHUF]]
; AVX-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i32> [[TMP6]], <4 x i32> [[RDX_SHUF]]		; AVX-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i32> [[TMP6]], <4 x i32> [[RDX_SHUF]]
; AVX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; AVX-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; AVX-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <4 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]		; AVX-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <4 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
; AVX-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> [[RDX_SHUF1]]		; AVX-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> [[RDX_SHUF1]]
; AVX-NEXT: [[TMP20:%.*]] = extractelement <4 x i32> [[RDX_MINMAX_SELECT3]], i32 0		; AVX-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[RDX_MINMAX_SELECT3]], i32 0
; AVX-NEXT: [[TMP21:%.*]] = icmp sgt i32 [[TMP20]], [[TMP15]]		; AVX-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP9]], [[TMP7]]
; AVX-NEXT: [[TMP22:%.*]] = select i1 [[TMP21]], i32 [[TMP20]], i32 [[TMP15]]		; AVX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP9]], i32 [[TMP7]]
; AVX-NEXT: [[TMP23:%.*]] = icmp sgt i32 [[TMP22]], [[TMP18]]		; AVX-NEXT: [[TMP12:%.*]] = icmp sgt i32 [[TMP11]], [[TMP8]]
; AVX-NEXT: [[TMP24:%.*]] = select i1 [[TMP23]], i32 [[TMP22]], i32 [[TMP18]]		; AVX-NEXT: [[TMP13:%.*]] = select i1 [[TMP12]], i32 [[TMP11]], i32 [[TMP8]]
; AVX-NEXT: [[TMP25:%.*]] = icmp sgt i32 [[TMP24]], [[TMP5]]		; AVX-NEXT: [[TMP14:%.*]] = icmp sgt i32 [[TMP13]], [[TMP5]]
; AVX-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP25]], i32 [[TMP24]], i32 [[TMP5]]		; AVX-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP14]], i32 [[TMP13]], i32 [[TMP5]]
; AVX-NEXT: [[TMP26:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32 [[TMP18]]
; AVX-NEXT: ret i32 [[OP_EXTRA]]		; AVX-NEXT: ret i32 [[OP_EXTRA]]
;		;
; AVX2-LABEL: @maxi8_wrong_parent(		; AVX2-LABEL: @maxi8_wrong_parent(
; AVX2-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		; AVX2-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
; AVX2-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		; AVX2-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
; AVX2-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]		; AVX2-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
; AVX2-NEXT: br label [[PP:%.*]]		; AVX2-NEXT: br label [[PP:%.*]]
; AVX2: pp:		; AVX2: pp:
; AVX2-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]		; AVX2-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]
; AVX2-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8		; AVX2-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8
; AVX2-NEXT: [[TMP7:%.*]] = icmp sgt i32 [[TMP5]], undef		; AVX2-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
; AVX2-NEXT: [[TMP8:%.*]] = select i1 [[TMP7]], i32 [[TMP5]], i32 undef		; AVX2-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
; AVX2-NEXT: [[TMP9:%.*]] = icmp sgt i32 [[TMP8]], undef
; AVX2-NEXT: [[TMP10:%.*]] = select i1 [[TMP9]], i32 [[TMP8]], i32 undef
; AVX2-NEXT: [[TMP11:%.*]] = icmp sgt i32 [[TMP10]], undef
; AVX2-NEXT: [[TMP12:%.*]] = select i1 [[TMP11]], i32 [[TMP10]], i32 undef
; AVX2-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP12]], undef
; AVX2-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP12]], i32 undef
; AVX2-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
; AVX2-NEXT: [[TMP16:%.*]] = icmp sgt i32 [[TMP14]], [[TMP15]]
; AVX2-NEXT: [[TMP17:%.*]] = select i1 [[TMP16]], i32 [[TMP14]], i32 [[TMP15]]
; AVX2-NEXT: [[TMP18:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
; AVX2-NEXT: [[TMP19:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]
; AVX2-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; AVX2-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; AVX2-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i32> [[TMP6]], [[RDX_SHUF]]		; AVX2-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <4 x i32> [[TMP6]], [[RDX_SHUF]]
; AVX2-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i32> [[TMP6]], <4 x i32> [[RDX_SHUF]]		; AVX2-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i32> [[TMP6]], <4 x i32> [[RDX_SHUF]]
; AVX2-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; AVX2-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; AVX2-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <4 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]		; AVX2-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <4 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
; AVX2-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> [[RDX_SHUF1]]		; AVX2-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> [[RDX_SHUF1]]
; AVX2-NEXT: [[TMP20:%.*]] = extractelement <4 x i32> [[RDX_MINMAX_SELECT3]], i32 0		; AVX2-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[RDX_MINMAX_SELECT3]], i32 0
; AVX2-NEXT: [[TMP21:%.*]] = icmp sgt i32 [[TMP20]], [[TMP15]]		; AVX2-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP9]], [[TMP7]]
; AVX2-NEXT: [[TMP22:%.*]] = select i1 [[TMP21]], i32 [[TMP20]], i32 [[TMP15]]		; AVX2-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP9]], i32 [[TMP7]]
; AVX2-NEXT: [[TMP23:%.*]] = icmp sgt i32 [[TMP22]], [[TMP18]]		; AVX2-NEXT: [[TMP12:%.*]] = icmp sgt i32 [[TMP11]], [[TMP8]]
; AVX2-NEXT: [[TMP24:%.*]] = select i1 [[TMP23]], i32 [[TMP22]], i32 [[TMP18]]		; AVX2-NEXT: [[TMP13:%.*]] = select i1 [[TMP12]], i32 [[TMP11]], i32 [[TMP8]]
; AVX2-NEXT: [[TMP25:%.*]] = icmp sgt i32 [[TMP24]], [[TMP5]]		; AVX2-NEXT: [[TMP14:%.*]] = icmp sgt i32 [[TMP13]], [[TMP5]]
; AVX2-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP25]], i32 [[TMP24]], i32 [[TMP5]]		; AVX2-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP14]], i32 [[TMP13]], i32 [[TMP5]]
; AVX2-NEXT: [[TMP26:%.*]] = select i1 [[TMP19]], i32 [[TMP17]], i32 [[TMP18]]
; AVX2-NEXT: ret i32 [[OP_EXTRA]]		; AVX2-NEXT: ret i32 [[OP_EXTRA]]
;		;
; SKX-LABEL: @maxi8_wrong_parent(		; SKX-LABEL: @maxi8_wrong_parent(
; SKX-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16		; SKX-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16
; SKX-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0		; SKX-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0
; SKX-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1		; SKX-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1
; SKX-NEXT: [[TMP5:%.*]] = icmp sgt i32 [[TMP3]], [[TMP4]]		; SKX-NEXT: [[TMP5:%.*]] = icmp sgt i32 [[TMP3]], [[TMP4]]
; SKX-NEXT: br label [[PP:%.*]]		; SKX-NEXT: br label [[PP:%.*]]
; SKX-NEXT: [[TMP13:%.*]] = insertelement <2 x i1> undef, i1 [[TMP12]], i32 0		; SKX-NEXT: [[TMP13:%.*]] = insertelement <2 x i1> undef, i1 [[TMP12]], i32 0
; SKX-NEXT: [[TMP14:%.*]] = insertelement <2 x i1> [[TMP13]], i1 [[TMP5]], i32 1		; SKX-NEXT: [[TMP14:%.*]] = insertelement <2 x i1> [[TMP13]], i1 [[TMP5]], i32 1
; SKX-NEXT: [[TMP15:%.*]] = insertelement <2 x i32> undef, i32 [[TMP11]], i32 0		; SKX-NEXT: [[TMP15:%.*]] = insertelement <2 x i32> undef, i32 [[TMP11]], i32 0
; SKX-NEXT: [[TMP16:%.*]] = insertelement <2 x i32> [[TMP15]], i32 [[TMP3]], i32 1		; SKX-NEXT: [[TMP16:%.*]] = insertelement <2 x i32> [[TMP15]], i32 [[TMP3]], i32 1
; SKX-NEXT: [[TMP17:%.*]] = insertelement <2 x i32> undef, i32 [[TMP8]], i32 0		; SKX-NEXT: [[TMP17:%.*]] = insertelement <2 x i32> undef, i32 [[TMP8]], i32 0
; SKX-NEXT: [[TMP18:%.*]] = insertelement <2 x i32> [[TMP17]], i32 [[TMP4]], i32 1		; SKX-NEXT: [[TMP18:%.*]] = insertelement <2 x i32> [[TMP17]], i32 [[TMP4]], i32 1
; SKX-NEXT: [[TMP19:%.*]] = select <2 x i1> [[TMP14]], <2 x i32> [[TMP16]], <2 x i32> [[TMP18]]		; SKX-NEXT: [[TMP19:%.*]] = select <2 x i1> [[TMP14]], <2 x i32> [[TMP16]], <2 x i32> [[TMP18]]
; SKX-NEXT: [[TMP20:%.*]] = extractelement <2 x i32> [[TMP19]], i32 1		; SKX-NEXT: [[TMP20:%.*]] = extractelement <2 x i32> [[TMP19]], i32 1
; SKX-NEXT: [[TMP21:%.*]] = icmp sgt i32 [[TMP20]], undef		; SKX-NEXT: [[TMP21:%.*]] = extractelement <2 x i32> [[TMP19]], i32 0
; SKX-NEXT: [[TMP22:%.*]] = select i1 [[TMP21]], i32 [[TMP20]], i32 undef		; SKX-NEXT: [[TMP22:%.*]] = icmp sgt i32 [[TMP21]], [[TMP20]]
; SKX-NEXT: [[TMP23:%.*]] = icmp sgt i32 [[TMP22]], undef		; SKX-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP22]], i32 [[TMP21]], i32 [[TMP20]]
; SKX-NEXT: [[TMP24:%.*]] = select i1 [[TMP23]], i32 [[TMP22]], i32 undef
; SKX-NEXT: [[TMP25:%.*]] = icmp sgt i32 [[TMP24]], undef
; SKX-NEXT: [[TMP26:%.*]] = select i1 [[TMP25]], i32 [[TMP24]], i32 undef
; SKX-NEXT: [[TMP27:%.*]] = icmp sgt i32 [[TMP26]], undef
; SKX-NEXT: [[TMP28:%.*]] = select i1 [[TMP27]], i32 [[TMP26]], i32 undef
; SKX-NEXT: [[TMP29:%.*]] = icmp sgt i32 [[TMP28]], [[TMP7]]
; SKX-NEXT: [[TMP30:%.*]] = select i1 [[TMP29]], i32 [[TMP28]], i32 [[TMP7]]
; SKX-NEXT: [[TMP31:%.*]] = icmp sgt i32 [[TMP30]], [[TMP8]]
; SKX-NEXT: [[TMP32:%.*]] = extractelement <2 x i32> [[TMP19]], i32 0
; SKX-NEXT: [[TMP33:%.*]] = icmp sgt i32 [[TMP32]], [[TMP20]]
; SKX-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP33]], i32 [[TMP32]], i32 [[TMP20]]
; SKX-NEXT: [[TMP34:%.*]] = select i1 [[TMP31]], i32 [[TMP30]], i32 [[TMP8]]
; SKX-NEXT: ret i32 [[OP_EXTRA]]		; SKX-NEXT: ret i32 [[OP_EXTRA]]
;		;
%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16		%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4		%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
%4 = icmp sgt i32 %2, %3		%4 = icmp sgt i32 %2, %3
br label %pp		br label %pp

pp:		pp:

llvm/trunk/test/Transforms/SLPVectorizer/X86/horizontal.ll

; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD28]]		; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD28]]
; CHECK-NEXT: [[ADD829:%.*]] = or i64 [[MUL]], 2		; CHECK-NEXT: [[ADD829:%.*]] = or i64 [[MUL]], 2
; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD829]]		; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD829]]
; CHECK-NEXT: [[ADD1330:%.*]] = or i64 [[MUL]], 3		; CHECK-NEXT: [[ADD1330:%.*]] = or i64 [[MUL]], 3
; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1330]]		; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1330]]
; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[ARRAYIDX]] to <4 x float>*		; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[ARRAYIDX]] to <4 x float>*
; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
; CHECK-NEXT: [[TMP3:%.*]] = fmul <4 x float> [[TMP2]], <float 7.000000e+00, float 7.000000e+00, float 7.000000e+00, float 7.000000e+00>		; CHECK-NEXT: [[TMP3:%.*]] = fmul <4 x float> [[TMP2]], <float 7.000000e+00, float 7.000000e+00, float 7.000000e+00, float 7.000000e+00>
; CHECK-NEXT: [[ADD6:%.*]] = fadd fast float undef, undef
; CHECK-NEXT: [[ADD11:%.*]] = fadd fast float [[ADD6]], undef
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]]		; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]		; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
; CHECK-NEXT: [[ADD16:%.*]] = fadd fast float [[ADD11]], undef
; CHECK-NEXT: [[ADD17]] = fadd fast float [[SUM_032]], [[TMP4]]		; CHECK-NEXT: [[ADD17]] = fadd fast float [[SUM_032]], [[TMP4]]
; CHECK-NEXT: [[INC]] = add nsw i64 [[I_033]], 1		; CHECK-NEXT: [[INC]] = add nsw i64 [[I_033]], 1
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP0]]		; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP0]]
; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]		; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]
; CHECK: for.cond.for.end_crit_edge:		; CHECK: for.cond.for.end_crit_edge:
; CHECK-NEXT: [[PHITMP:%.*]] = fptosi float [[ADD17]] to i32		; CHECK-NEXT: [[PHITMP:%.*]] = fptosi float [[ADD17]] to i32
; CHECK-NEXT: br label [[FOR_END]]		; CHECK-NEXT: br label [[FOR_END]]
; CHECK: for.end:		; CHECK: for.end:
; STORE-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD28]]		; STORE-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD28]]
; STORE-NEXT: [[ADD829:%.*]] = or i64 [[MUL]], 2		; STORE-NEXT: [[ADD829:%.*]] = or i64 [[MUL]], 2
; STORE-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD829]]		; STORE-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD829]]
; STORE-NEXT: [[ADD1330:%.*]] = or i64 [[MUL]], 3		; STORE-NEXT: [[ADD1330:%.*]] = or i64 [[MUL]], 3
; STORE-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1330]]		; STORE-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1330]]
; STORE-NEXT: [[TMP1:%.*]] = bitcast float* [[ARRAYIDX]] to <4 x float>*		; STORE-NEXT: [[TMP1:%.*]] = bitcast float* [[ARRAYIDX]] to <4 x float>*
; STORE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4		; STORE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
; STORE-NEXT: [[TMP3:%.*]] = fmul <4 x float> [[TMP2]], <float 7.000000e+00, float 7.000000e+00, float 7.000000e+00, float 7.000000e+00>		; STORE-NEXT: [[TMP3:%.*]] = fmul <4 x float> [[TMP2]], <float 7.000000e+00, float 7.000000e+00, float 7.000000e+00, float 7.000000e+00>
; STORE-NEXT: [[ADD6:%.*]] = fadd fast float undef, undef
; STORE-NEXT: [[ADD11:%.*]] = fadd fast float [[ADD6]], undef
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]]		; STORE-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP3]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]		; STORE-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0		; STORE-NEXT: [[TMP4:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
; STORE-NEXT: [[ADD16:%.*]] = fadd fast float [[ADD11]], undef
; STORE-NEXT: [[ADD17]] = fadd fast float [[SUM_032]], [[TMP4]]		; STORE-NEXT: [[ADD17]] = fadd fast float [[SUM_032]], [[TMP4]]
; STORE-NEXT: [[INC]] = add nsw i64 [[I_033]], 1		; STORE-NEXT: [[INC]] = add nsw i64 [[I_033]], 1
; STORE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP0]]		; STORE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP0]]
; STORE-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]		; STORE-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]
; STORE: for.cond.for.end_crit_edge:		; STORE: for.cond.for.end_crit_edge:
; STORE-NEXT: [[PHITMP:%.*]] = fptosi float [[ADD17]] to i32		; STORE-NEXT: [[PHITMP:%.*]] = fptosi float [[ADD17]] to i32
; STORE-NEXT: br label [[FOR_END]]		; STORE-NEXT: br label [[FOR_END]]
; STORE: for.end:		; STORE: for.end:
; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD35]]		; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD35]]
; CHECK-NEXT: [[ADD1136:%.*]] = or i64 [[MUL]], 2		; CHECK-NEXT: [[ADD1136:%.*]] = or i64 [[MUL]], 2
; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1136]]		; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1136]]
; CHECK-NEXT: [[ADD1737:%.*]] = or i64 [[MUL]], 3		; CHECK-NEXT: [[ADD1737:%.*]] = or i64 [[MUL]], 3
; CHECK-NEXT: [[ARRAYIDX18:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1737]]		; CHECK-NEXT: [[ARRAYIDX18:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1737]]
; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[ARRAYIDX2]] to <4 x float>*		; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[ARRAYIDX2]] to <4 x float>*
; CHECK-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4		; CHECK-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4
; CHECK-NEXT: [[TMP5:%.*]] = fmul <4 x float> [[TMP1]], [[TMP4]]		; CHECK-NEXT: [[TMP5:%.*]] = fmul <4 x float> [[TMP1]], [[TMP4]]
; CHECK-NEXT: [[ADD8:%.*]] = fadd fast float undef, undef
; CHECK-NEXT: [[ADD14:%.*]] = fadd fast float [[ADD8]], undef
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP5]], [[RDX_SHUF]]		; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP5]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]		; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
; CHECK-NEXT: [[ADD20:%.*]] = fadd fast float [[ADD14]], undef
; CHECK-NEXT: [[MUL21]] = fmul float [[SUM_039]], [[TMP6]]		; CHECK-NEXT: [[MUL21]] = fmul float [[SUM_039]], [[TMP6]]
; CHECK-NEXT: [[INC]] = add nsw i64 [[I_040]], 1		; CHECK-NEXT: [[INC]] = add nsw i64 [[I_040]], 1
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP2]]		; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP2]]
; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]		; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]
; CHECK: for.cond.for.end_crit_edge:		; CHECK: for.cond.for.end_crit_edge:
; CHECK-NEXT: [[PHITMP:%.*]] = fptosi float [[MUL21]] to i32		; CHECK-NEXT: [[PHITMP:%.*]] = fptosi float [[MUL21]] to i32
; CHECK-NEXT: br label [[FOR_END]]		; CHECK-NEXT: br label [[FOR_END]]
; CHECK: for.end:		; CHECK: for.end:
; STORE-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD35]]		; STORE-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD35]]
; STORE-NEXT: [[ADD1136:%.*]] = or i64 [[MUL]], 2		; STORE-NEXT: [[ADD1136:%.*]] = or i64 [[MUL]], 2
; STORE-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1136]]		; STORE-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1136]]
; STORE-NEXT: [[ADD1737:%.*]] = or i64 [[MUL]], 3		; STORE-NEXT: [[ADD1737:%.*]] = or i64 [[MUL]], 3
; STORE-NEXT: [[ARRAYIDX18:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1737]]		; STORE-NEXT: [[ARRAYIDX18:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1737]]
; STORE-NEXT: [[TMP3:%.*]] = bitcast float* [[ARRAYIDX2]] to <4 x float>*		; STORE-NEXT: [[TMP3:%.*]] = bitcast float* [[ARRAYIDX2]] to <4 x float>*
; STORE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4		; STORE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4
; STORE-NEXT: [[TMP5:%.*]] = fmul <4 x float> [[TMP1]], [[TMP4]]		; STORE-NEXT: [[TMP5:%.*]] = fmul <4 x float> [[TMP1]], [[TMP4]]
; STORE-NEXT: [[ADD8:%.*]] = fadd fast float undef, undef
; STORE-NEXT: [[ADD14:%.*]] = fadd fast float [[ADD8]], undef
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP5]], [[RDX_SHUF]]		; STORE-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP5]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]		; STORE-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0		; STORE-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
; STORE-NEXT: [[ADD20:%.*]] = fadd fast float [[ADD14]], undef
; STORE-NEXT: [[MUL21]] = fmul float [[SUM_039]], [[TMP6]]		; STORE-NEXT: [[MUL21]] = fmul float [[SUM_039]], [[TMP6]]
; STORE-NEXT: [[INC]] = add nsw i64 [[I_040]], 1		; STORE-NEXT: [[INC]] = add nsw i64 [[I_040]], 1
; STORE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP2]]		; STORE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP2]]
; STORE-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]		; STORE-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]
; STORE: for.cond.for.end_crit_edge:		; STORE: for.cond.for.end_crit_edge:
; STORE-NEXT: [[PHITMP:%.*]] = fptosi float [[MUL21]] to i32		; STORE-NEXT: [[PHITMP:%.*]] = fptosi float [[MUL21]] to i32
; STORE-NEXT: br label [[FOR_END]]		; STORE-NEXT: br label [[FOR_END]]
; STORE: for.end:		; STORE: for.end:
; CHECK-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD29]]		; CHECK-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD29]]
; CHECK-NEXT: [[ADD35:%.*]] = add nsw i64 [[MUL]], 6		; CHECK-NEXT: [[ADD35:%.*]] = add nsw i64 [[MUL]], 6
; CHECK-NEXT: [[ARRAYIDX36:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD35]]		; CHECK-NEXT: [[ARRAYIDX36:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD35]]
; CHECK-NEXT: [[ADD41:%.*]] = add nsw i64 [[MUL]], 7		; CHECK-NEXT: [[ADD41:%.*]] = add nsw i64 [[MUL]], 7
; CHECK-NEXT: [[ARRAYIDX42:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD41]]		; CHECK-NEXT: [[ARRAYIDX42:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD41]]
; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[ARRAYIDX2]] to <8 x float>*		; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[ARRAYIDX2]] to <8 x float>*
; CHECK-NEXT: [[TMP5:%.*]] = load <8 x float>, <8 x float>* [[TMP4]], align 4		; CHECK-NEXT: [[TMP5:%.*]] = load <8 x float>, <8 x float>* [[TMP4]], align 4
; CHECK-NEXT: [[TMP6:%.*]] = fmul fast <8 x float> [[TMP1]], [[TMP5]]		; CHECK-NEXT: [[TMP6:%.*]] = fmul fast <8 x float> [[TMP1]], [[TMP5]]
; CHECK-NEXT: [[ADD8:%.*]] = fadd fast float undef, undef
; CHECK-NEXT: [[ADD14:%.*]] = fadd fast float [[ADD8]], undef
; CHECK-NEXT: [[ADD20:%.*]] = fadd fast float [[ADD14]], undef
; CHECK-NEXT: [[ADD26:%.*]] = fadd fast float [[ADD20]], undef
; CHECK-NEXT: [[ADD32:%.*]] = fadd fast float [[ADD26]], undef
; CHECK-NEXT: [[ADD38:%.*]] = fadd fast float [[ADD32]], undef
; CHECK-NEXT: [[ADD44:%.*]] = fadd fast float [[ADD38]], undef
; CHECK-NEXT: [[ADD47:%.*]] = add nsw i64 [[MUL]], 8		; CHECK-NEXT: [[ADD47:%.*]] = add nsw i64 [[MUL]], 8
; CHECK-NEXT: [[ARRAYIDX48:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD47]]		; CHECK-NEXT: [[ARRAYIDX48:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD47]]
; CHECK-NEXT: [[TMP7:%.*]] = load float, float* [[ARRAYIDX48]], align 4		; CHECK-NEXT: [[TMP7:%.*]] = load float, float* [[ARRAYIDX48]], align 4
; CHECK-NEXT: [[MUL49:%.*]] = fmul fast float [[TMP2]], [[TMP7]]		; CHECK-NEXT: [[MUL49:%.*]] = fmul fast float [[TMP2]], [[TMP7]]
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP6]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP6]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP6]], [[RDX_SHUF]]		; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP6]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]		; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]		; CHECK-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; CHECK-NEXT: [[TMP8:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0		; CHECK-NEXT: [[TMP8:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
; CHECK-NEXT: [[TMP9:%.*]] = fadd fast float [[TMP8]], [[MUL49]]		; CHECK-NEXT: [[TMP9:%.*]] = fadd fast float [[TMP8]], [[MUL49]]
; CHECK-NEXT: [[ADD50:%.*]] = fadd fast float [[ADD44]], [[MUL49]]
; CHECK-NEXT: [[ADD51]] = fadd fast float [[SUM_082]], [[TMP9]]		; CHECK-NEXT: [[ADD51]] = fadd fast float [[SUM_082]], [[TMP9]]
; CHECK-NEXT: [[INC]] = add nsw i64 [[I_083]], 1		; CHECK-NEXT: [[INC]] = add nsw i64 [[I_083]], 1
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP3]]		; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP3]]
; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]		; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]
; CHECK: for.cond.for.end_crit_edge:		; CHECK: for.cond.for.end_crit_edge:
; CHECK-NEXT: [[PHITMP:%.*]] = fptosi float [[ADD51]] to i32		; CHECK-NEXT: [[PHITMP:%.*]] = fptosi float [[ADD51]] to i32
; CHECK-NEXT: br label [[FOR_END]]		; CHECK-NEXT: br label [[FOR_END]]
; CHECK: for.end:		; CHECK: for.end:
; STORE-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD29]]		; STORE-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD29]]
; STORE-NEXT: [[ADD35:%.*]] = add nsw i64 [[MUL]], 6		; STORE-NEXT: [[ADD35:%.*]] = add nsw i64 [[MUL]], 6
; STORE-NEXT: [[ARRAYIDX36:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD35]]		; STORE-NEXT: [[ARRAYIDX36:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD35]]
; STORE-NEXT: [[ADD41:%.*]] = add nsw i64 [[MUL]], 7		; STORE-NEXT: [[ADD41:%.*]] = add nsw i64 [[MUL]], 7
; STORE-NEXT: [[ARRAYIDX42:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD41]]		; STORE-NEXT: [[ARRAYIDX42:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD41]]
; STORE-NEXT: [[TMP4:%.*]] = bitcast float* [[ARRAYIDX2]] to <8 x float>*		; STORE-NEXT: [[TMP4:%.*]] = bitcast float* [[ARRAYIDX2]] to <8 x float>*
; STORE-NEXT: [[TMP5:%.*]] = load <8 x float>, <8 x float>* [[TMP4]], align 4		; STORE-NEXT: [[TMP5:%.*]] = load <8 x float>, <8 x float>* [[TMP4]], align 4
; STORE-NEXT: [[TMP6:%.*]] = fmul fast <8 x float> [[TMP1]], [[TMP5]]		; STORE-NEXT: [[TMP6:%.*]] = fmul fast <8 x float> [[TMP1]], [[TMP5]]
; STORE-NEXT: [[ADD8:%.*]] = fadd fast float undef, undef
; STORE-NEXT: [[ADD14:%.*]] = fadd fast float [[ADD8]], undef
; STORE-NEXT: [[ADD20:%.*]] = fadd fast float [[ADD14]], undef
; STORE-NEXT: [[ADD26:%.*]] = fadd fast float [[ADD20]], undef
; STORE-NEXT: [[ADD32:%.*]] = fadd fast float [[ADD26]], undef
; STORE-NEXT: [[ADD38:%.*]] = fadd fast float [[ADD32]], undef
; STORE-NEXT: [[ADD44:%.*]] = fadd fast float [[ADD38]], undef
; STORE-NEXT: [[ADD47:%.*]] = add nsw i64 [[MUL]], 8		; STORE-NEXT: [[ADD47:%.*]] = add nsw i64 [[MUL]], 8
; STORE-NEXT: [[ARRAYIDX48:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD47]]		; STORE-NEXT: [[ARRAYIDX48:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD47]]
; STORE-NEXT: [[TMP7:%.*]] = load float, float* [[ARRAYIDX48]], align 4		; STORE-NEXT: [[TMP7:%.*]] = load float, float* [[ARRAYIDX48]], align 4
; STORE-NEXT: [[MUL49:%.*]] = fmul fast float [[TMP2]], [[TMP7]]		; STORE-NEXT: [[MUL49:%.*]] = fmul fast float [[TMP2]], [[TMP7]]
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP6]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP6]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP6]], [[RDX_SHUF]]		; STORE-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP6]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]		; STORE-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]		; STORE-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; STORE-NEXT: [[TMP8:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0		; STORE-NEXT: [[TMP8:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
; STORE-NEXT: [[TMP9:%.*]] = fadd fast float [[TMP8]], [[MUL49]]		; STORE-NEXT: [[TMP9:%.*]] = fadd fast float [[TMP8]], [[MUL49]]
; STORE-NEXT: [[ADD50:%.*]] = fadd fast float [[ADD44]], [[MUL49]]
; STORE-NEXT: [[ADD51]] = fadd fast float [[SUM_082]], [[TMP9]]		; STORE-NEXT: [[ADD51]] = fadd fast float [[SUM_082]], [[TMP9]]
; STORE-NEXT: [[INC]] = add nsw i64 [[I_083]], 1		; STORE-NEXT: [[INC]] = add nsw i64 [[I_083]], 1
; STORE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP3]]		; STORE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP3]]
; STORE-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]		; STORE-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]
; STORE: for.cond.for.end_crit_edge:		; STORE: for.cond.for.end_crit_edge:
; STORE-NEXT: [[PHITMP:%.*]] = fptosi float [[ADD51]] to i32		; STORE-NEXT: [[PHITMP:%.*]] = fptosi float [[ADD51]] to i32
; STORE-NEXT: br label [[FOR_END]]		; STORE-NEXT: br label [[FOR_END]]
; STORE: for.end:		; STORE: for.end:
; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD638]]		; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD638]]
; CHECK-NEXT: [[ADD1239:%.*]] = or i64 [[MUL]], 2		; CHECK-NEXT: [[ADD1239:%.*]] = or i64 [[MUL]], 2
; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1239]]		; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1239]]
; CHECK-NEXT: [[ADD1840:%.*]] = or i64 [[MUL]], 3		; CHECK-NEXT: [[ADD1840:%.*]] = or i64 [[MUL]], 3
; CHECK-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1840]]		; CHECK-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1840]]
; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[ARRAYIDX2]] to <4 x float>*		; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[ARRAYIDX2]] to <4 x float>*
; CHECK-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4		; CHECK-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4
; CHECK-NEXT: [[TMP5:%.*]] = fmul fast <4 x float> [[TMP1]], [[TMP4]]		; CHECK-NEXT: [[TMP5:%.*]] = fmul fast <4 x float> [[TMP1]], [[TMP4]]
; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[SUM_042]], undef
; CHECK-NEXT: [[ADD9:%.*]] = fadd fast float [[ADD]], undef
; CHECK-NEXT: [[ADD15:%.*]] = fadd fast float [[ADD9]], undef
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP5]], [[RDX_SHUF]]		; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP5]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]		; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
; CHECK-NEXT: [[OP_EXTRA]] = fadd fast float [[TMP6]], [[SUM_042]]		; CHECK-NEXT: [[OP_EXTRA]] = fadd fast float [[TMP6]], [[SUM_042]]
; CHECK-NEXT: [[ADD21:%.*]] = fadd fast float [[ADD15]], undef
; CHECK-NEXT: [[INC]] = add nsw i64 [[I_043]], 1		; CHECK-NEXT: [[INC]] = add nsw i64 [[I_043]], 1
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP2]]		; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP2]]
; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]		; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]
; CHECK: for.cond.for.end_crit_edge:		; CHECK: for.cond.for.end_crit_edge:
; CHECK-NEXT: [[PHITMP:%.*]] = fptosi float [[OP_EXTRA]] to i32		; CHECK-NEXT: [[PHITMP:%.*]] = fptosi float [[OP_EXTRA]] to i32
; CHECK-NEXT: br label [[FOR_END]]		; CHECK-NEXT: br label [[FOR_END]]
; CHECK: for.end:		; CHECK: for.end:
; CHECK-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ [[PHITMP]], [[FOR_COND_FOR_END_CRIT_EDGE]] ], [ 0, [[ENTRY:%.*]] ]		; CHECK-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ [[PHITMP]], [[FOR_COND_FOR_END_CRIT_EDGE]] ], [ 0, [[ENTRY:%.*]] ]
; STORE-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD638]]		; STORE-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD638]]
; STORE-NEXT: [[ADD1239:%.*]] = or i64 [[MUL]], 2		; STORE-NEXT: [[ADD1239:%.*]] = or i64 [[MUL]], 2
; STORE-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1239]]		; STORE-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1239]]
; STORE-NEXT: [[ADD1840:%.*]] = or i64 [[MUL]], 3		; STORE-NEXT: [[ADD1840:%.*]] = or i64 [[MUL]], 3
; STORE-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1840]]		; STORE-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1840]]
; STORE-NEXT: [[TMP3:%.*]] = bitcast float* [[ARRAYIDX2]] to <4 x float>*		; STORE-NEXT: [[TMP3:%.*]] = bitcast float* [[ARRAYIDX2]] to <4 x float>*
; STORE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4		; STORE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4
; STORE-NEXT: [[TMP5:%.*]] = fmul fast <4 x float> [[TMP1]], [[TMP4]]		; STORE-NEXT: [[TMP5:%.*]] = fmul fast <4 x float> [[TMP1]], [[TMP4]]
; STORE-NEXT: [[ADD:%.*]] = fadd fast float [[SUM_042]], undef
; STORE-NEXT: [[ADD9:%.*]] = fadd fast float [[ADD]], undef
; STORE-NEXT: [[ADD15:%.*]] = fadd fast float [[ADD9]], undef
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP5]], [[RDX_SHUF]]		; STORE-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP5]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]		; STORE-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0		; STORE-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
; STORE-NEXT: [[OP_EXTRA]] = fadd fast float [[TMP6]], [[SUM_042]]		; STORE-NEXT: [[OP_EXTRA]] = fadd fast float [[TMP6]], [[SUM_042]]
; STORE-NEXT: [[ADD21:%.*]] = fadd fast float [[ADD15]], undef
; STORE-NEXT: [[INC]] = add nsw i64 [[I_043]], 1		; STORE-NEXT: [[INC]] = add nsw i64 [[I_043]], 1
; STORE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP2]]		; STORE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP2]]
; STORE-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]		; STORE-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_FOR_END_CRIT_EDGE:%.*]], label [[FOR_BODY]]
; STORE: for.cond.for.end_crit_edge:		; STORE: for.cond.for.end_crit_edge:
; STORE-NEXT: [[PHITMP:%.*]] = fptosi float [[OP_EXTRA]] to i32		; STORE-NEXT: [[PHITMP:%.*]] = fptosi float [[OP_EXTRA]] to i32
; STORE-NEXT: br label [[FOR_END]]		; STORE-NEXT: br label [[FOR_END]]
; STORE: for.end:		; STORE: for.end:
; STORE-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ [[PHITMP]], [[FOR_COND_FOR_END_CRIT_EDGE]] ], [ 0, [[ENTRY:%.*]] ]		; STORE-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ [[PHITMP]], [[FOR_COND_FOR_END_CRIT_EDGE]] ], [ 0, [[ENTRY:%.*]] ]
; STORE-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1135]]		; STORE-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1135]]
; STORE-NEXT: [[TMP1:%.*]] = bitcast float* [[B]] to <4 x float>*		; STORE-NEXT: [[TMP1:%.*]] = bitcast float* [[B]] to <4 x float>*
; STORE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4		; STORE-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
; STORE-NEXT: [[ADD1736:%.*]] = or i64 [[MUL]], 3		; STORE-NEXT: [[ADD1736:%.*]] = or i64 [[MUL]], 3
; STORE-NEXT: [[ARRAYIDX18:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1736]]		; STORE-NEXT: [[ARRAYIDX18:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[ADD1736]]
; STORE-NEXT: [[TMP3:%.*]] = bitcast float* [[ARRAYIDX2]] to <4 x float>*		; STORE-NEXT: [[TMP3:%.*]] = bitcast float* [[ARRAYIDX2]] to <4 x float>*
; STORE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4		; STORE-NEXT: [[TMP4:%.*]] = load <4 x float>, <4 x float>* [[TMP3]], align 4
; STORE-NEXT: [[TMP5:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP4]]		; STORE-NEXT: [[TMP5:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP4]]
; STORE-NEXT: [[ADD8:%.*]] = fadd fast float undef, undef
; STORE-NEXT: [[ADD14:%.*]] = fadd fast float [[ADD8]], undef
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP5]], [[RDX_SHUF]]		; STORE-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP5]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]		; STORE-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0		; STORE-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
; STORE-NEXT: [[ADD20:%.*]] = fadd fast float [[ADD14]], undef
; STORE-NEXT: store float [[TMP6]], float* [[C_ADDR_038]], align 4		; STORE-NEXT: store float [[TMP6]], float* [[C_ADDR_038]], align 4
; STORE-NEXT: [[INCDEC_PTR]] = getelementptr inbounds float, float* [[C_ADDR_038]], i64 1		; STORE-NEXT: [[INCDEC_PTR]] = getelementptr inbounds float, float* [[C_ADDR_038]], i64 1
; STORE-NEXT: [[INC]] = add nsw i64 [[I_039]], 1		; STORE-NEXT: [[INC]] = add nsw i64 [[I_039]], 1
; STORE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP0]]		; STORE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INC]], [[TMP0]]
; STORE-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]]		; STORE-NEXT: br i1 [[EXITCOND]], label [[FOR_END]], label [[FOR_BODY]]
; STORE: for.end:		; STORE: for.end:
; STORE-NEXT: ret i32 0		; STORE-NEXT: ret i32 0
;		;
; CHECK-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 3), align 4		; CHECK-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 3), align 4
; CHECK-NEXT: [[ADD_2:%.*]] = fadd fast float [[TMP3]], [[ADD_1]]		; CHECK-NEXT: [[ADD_2:%.*]] = fadd fast float [[TMP3]], [[ADD_1]]
; CHECK-NEXT: store float [[ADD_2]], float* [[RES:%.*]], align 16		; CHECK-NEXT: store float [[ADD_2]], float* [[RES:%.*]], align 16
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
; STORE-LABEL: @float_red_example4(		; STORE-LABEL: @float_red_example4(
; STORE-NEXT: entry:		; STORE-NEXT: entry:
; STORE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* bitcast ([32 x float]* @arr_float to <4 x float>*), align 16		; STORE-NEXT: [[TMP0:%.*]] = load <4 x float>, <4 x float>* bitcast ([32 x float]* @arr_float to <4 x float>*), align 16
; STORE-NEXT: [[ADD:%.*]] = fadd fast float undef, undef
; STORE-NEXT: [[ADD_1:%.*]] = fadd fast float undef, [[ADD]]
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP0]], [[RDX_SHUF]]		; STORE-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP0]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]		; STORE-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0		; STORE-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
; STORE-NEXT: [[ADD_2:%.*]] = fadd fast float undef, [[ADD_1]]
; STORE-NEXT: store float [[TMP1]], float* [[RES:%.*]], align 16		; STORE-NEXT: store float [[TMP1]], float* [[RES:%.*]], align 16
; STORE-NEXT: ret void		; STORE-NEXT: ret void
;		;
entry:		entry:
%0 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 0), align 16		%0 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 0), align 16
%1 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 1), align 4		%1 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 1), align 4
%add = fadd fast float %1, %0		%add = fadd fast float %1, %0
%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 2), align 8		%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 2), align 8
[23 unchanged lines not shown]
; CHECK-NEXT: [[TMP7:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 7), align 4		; CHECK-NEXT: [[TMP7:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 7), align 4
; CHECK-NEXT: [[ADD_6:%.*]] = fadd fast float [[TMP7]], [[ADD_5]]		; CHECK-NEXT: [[ADD_6:%.*]] = fadd fast float [[TMP7]], [[ADD_5]]
; CHECK-NEXT: store float [[ADD_6]], float* [[RES:%.*]], align 16		; CHECK-NEXT: store float [[ADD_6]], float* [[RES:%.*]], align 16
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
; STORE-LABEL: @float_red_example8(		; STORE-LABEL: @float_red_example8(
; STORE-NEXT: entry:		; STORE-NEXT: entry:
; STORE-NEXT: [[TMP0:%.*]] = load <8 x float>, <8 x float>* bitcast ([32 x float]* @arr_float to <8 x float>*), align 16		; STORE-NEXT: [[TMP0:%.*]] = load <8 x float>, <8 x float>* bitcast ([32 x float]* @arr_float to <8 x float>*), align 16
; STORE-NEXT: [[ADD:%.*]] = fadd fast float undef, undef
; STORE-NEXT: [[ADD_1:%.*]] = fadd fast float undef, [[ADD]]
; STORE-NEXT: [[ADD_2:%.*]] = fadd fast float undef, [[ADD_1]]
; STORE-NEXT: [[ADD_3:%.*]] = fadd fast float undef, [[ADD_2]]
; STORE-NEXT: [[ADD_4:%.*]] = fadd fast float undef, [[ADD_3]]
; STORE-NEXT: [[ADD_5:%.*]] = fadd fast float undef, [[ADD_4]]
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP0]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x float> [[TMP0]], <8 x float> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP0]], [[RDX_SHUF]]		; STORE-NEXT: [[BIN_RDX:%.*]] = fadd fast <8 x float> [[TMP0]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x float> [[BIN_RDX]], <8 x float> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]		; STORE-NEXT: [[BIN_RDX2:%.*]] = fadd fast <8 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x float> [[BIN_RDX2]], <8 x float> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]		; STORE-NEXT: [[BIN_RDX4:%.*]] = fadd fast <8 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; STORE-NEXT: [[TMP1:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0		; STORE-NEXT: [[TMP1:%.*]] = extractelement <8 x float> [[BIN_RDX4]], i32 0
; STORE-NEXT: [[ADD_6:%.*]] = fadd fast float undef, [[ADD_5]]
; STORE-NEXT: store float [[TMP1]], float* [[RES:%.*]], align 16		; STORE-NEXT: store float [[TMP1]], float* [[RES:%.*]], align 16
; STORE-NEXT: ret void		; STORE-NEXT: ret void
;		;
entry:		entry:
%0 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 0), align 16		%0 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 0), align 16
%1 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 1), align 4		%1 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 1), align 4
%add = fadd fast float %1, %0		%add = fadd fast float %1, %0
%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 2), align 8		%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 2), align 8
[47 unchanged lines not shown]
; CHECK-NEXT: [[TMP15:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 15), align 4		; CHECK-NEXT: [[TMP15:%.*]] = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 15), align 4
; CHECK-NEXT: [[ADD_14:%.*]] = fadd fast float [[TMP15]], [[ADD_13]]		; CHECK-NEXT: [[ADD_14:%.*]] = fadd fast float [[TMP15]], [[ADD_13]]
; CHECK-NEXT: store float [[ADD_14]], float* [[RES:%.*]], align 16		; CHECK-NEXT: store float [[ADD_14]], float* [[RES:%.*]], align 16
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
; STORE-LABEL: @float_red_example16(		; STORE-LABEL: @float_red_example16(
; STORE-NEXT: entry:		; STORE-NEXT: entry:
; STORE-NEXT: [[TMP0:%.*]] = load <16 x float>, <16 x float>* bitcast ([32 x float]* @arr_float to <16 x float>*), align 16		; STORE-NEXT: [[TMP0:%.*]] = load <16 x float>, <16 x float>* bitcast ([32 x float]* @arr_float to <16 x float>*), align 16
; STORE-NEXT: [[ADD:%.*]] = fadd fast float undef, undef
; STORE-NEXT: [[ADD_1:%.*]] = fadd fast float undef, [[ADD]]
; STORE-NEXT: [[ADD_2:%.*]] = fadd fast float undef, [[ADD_1]]
; STORE-NEXT: [[ADD_3:%.*]] = fadd fast float undef, [[ADD_2]]
; STORE-NEXT: [[ADD_4:%.*]] = fadd fast float undef, [[ADD_3]]
; STORE-NEXT: [[ADD_5:%.*]] = fadd fast float undef, [[ADD_4]]
; STORE-NEXT: [[ADD_6:%.*]] = fadd fast float undef, [[ADD_5]]
; STORE-NEXT: [[ADD_7:%.*]] = fadd fast float undef, [[ADD_6]]
; STORE-NEXT: [[ADD_8:%.*]] = fadd fast float undef, [[ADD_7]]
; STORE-NEXT: [[ADD_9:%.*]] = fadd fast float undef, [[ADD_8]]
; STORE-NEXT: [[ADD_10:%.*]] = fadd fast float undef, [[ADD_9]]
; STORE-NEXT: [[ADD_11:%.*]] = fadd fast float undef, [[ADD_10]]
; STORE-NEXT: [[ADD_12:%.*]] = fadd fast float undef, [[ADD_11]]
; STORE-NEXT: [[ADD_13:%.*]] = fadd fast float undef, [[ADD_12]]
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <16 x float> [[TMP0]], <16 x float> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <16 x float> [[TMP0]], <16 x float> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX:%.*]] = fadd fast <16 x float> [[TMP0]], [[RDX_SHUF]]		; STORE-NEXT: [[BIN_RDX:%.*]] = fadd fast <16 x float> [[TMP0]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x float> [[BIN_RDX]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x float> [[BIN_RDX]], <16 x float> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = fadd fast <16 x float> [[BIN_RDX]], [[RDX_SHUF1]]		; STORE-NEXT: [[BIN_RDX2:%.*]] = fadd fast <16 x float> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x float> [[BIN_RDX2]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x float> [[BIN_RDX2]], <16 x float> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX4:%.*]] = fadd fast <16 x float> [[BIN_RDX2]], [[RDX_SHUF3]]		; STORE-NEXT: [[BIN_RDX4:%.*]] = fadd fast <16 x float> [[BIN_RDX2]], [[RDX_SHUF3]]
; STORE-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x float> [[BIN_RDX4]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x float> [[BIN_RDX4]], <16 x float> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX6:%.*]] = fadd fast <16 x float> [[BIN_RDX4]], [[RDX_SHUF5]]		; STORE-NEXT: [[BIN_RDX6:%.*]] = fadd fast <16 x float> [[BIN_RDX4]], [[RDX_SHUF5]]
; STORE-NEXT: [[TMP1:%.*]] = extractelement <16 x float> [[BIN_RDX6]], i32 0		; STORE-NEXT: [[TMP1:%.*]] = extractelement <16 x float> [[BIN_RDX6]], i32 0
; STORE-NEXT: [[ADD_14:%.*]] = fadd fast float undef, [[ADD_13]]
; STORE-NEXT: store float [[TMP1]], float* [[RES:%.*]], align 16		; STORE-NEXT: store float [[TMP1]], float* [[RES:%.*]], align 16
; STORE-NEXT: ret void		; STORE-NEXT: ret void
;		;
entry:		entry:
%0 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 0), align 16		%0 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 0), align 16
%1 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 1), align 4		%1 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 1), align 4
%add = fadd fast float %1, %0		%add = fadd fast float %1, %0
%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 2), align 8		%2 = load float, float* getelementptr inbounds ([32 x float], [32 x float]* @arr_float, i64 0, i64 2), align 8
[39 unchanged lines not shown]
; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 3), align 4		; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 3), align 4
; CHECK-NEXT: [[ADD_2:%.*]] = add nsw i32 [[TMP3]], [[ADD_1]]		; CHECK-NEXT: [[ADD_2:%.*]] = add nsw i32 [[TMP3]], [[ADD_1]]
; CHECK-NEXT: store i32 [[ADD_2]], i32* [[RES:%.*]], align 16		; CHECK-NEXT: store i32 [[ADD_2]], i32* [[RES:%.*]], align 16
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
; STORE-LABEL: @i32_red_example4(		; STORE-LABEL: @i32_red_example4(
; STORE-NEXT: entry:		; STORE-NEXT: entry:
; STORE-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([32 x i32]* @arr_i32 to <4 x i32>*), align 16		; STORE-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([32 x i32]* @arr_i32 to <4 x i32>*), align 16
; STORE-NEXT: [[ADD:%.*]] = add nsw i32 undef, undef
; STORE-NEXT: [[ADD_1:%.*]] = add nsw i32 undef, [[ADD]]
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX:%.*]] = add nsw <4 x i32> [[TMP0]], [[RDX_SHUF]]		; STORE-NEXT: [[BIN_RDX:%.*]] = add nsw <4 x i32> [[TMP0]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = add nsw <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]		; STORE-NEXT: [[BIN_RDX2:%.*]] = add nsw <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0		; STORE-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0
; STORE-NEXT: [[ADD_2:%.*]] = add nsw i32 undef, [[ADD_1]]
; STORE-NEXT: store i32 [[TMP1]], i32* [[RES:%.*]], align 16		; STORE-NEXT: store i32 [[TMP1]], i32* [[RES:%.*]], align 16
; STORE-NEXT: ret void		; STORE-NEXT: ret void
;		;
entry:		entry:
%0 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 0), align 16		%0 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 0), align 16
%1 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 1), align 4		%1 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 1), align 4
%add = add nsw i32 %1, %0		%add = add nsw i32 %1, %0
%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 2), align 8		%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 2), align 8
[23 unchanged lines not shown]
; CHECK-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 7), align 4		; CHECK-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 7), align 4
; CHECK-NEXT: [[ADD_6:%.*]] = add nsw i32 [[TMP7]], [[ADD_5]]		; CHECK-NEXT: [[ADD_6:%.*]] = add nsw i32 [[TMP7]], [[ADD_5]]
; CHECK-NEXT: store i32 [[ADD_6]], i32* [[RES:%.*]], align 16		; CHECK-NEXT: store i32 [[ADD_6]], i32* [[RES:%.*]], align 16
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
; STORE-LABEL: @i32_red_example8(		; STORE-LABEL: @i32_red_example8(
; STORE-NEXT: entry:		; STORE-NEXT: entry:
; STORE-NEXT: [[TMP0:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr_i32 to <8 x i32>*), align 16		; STORE-NEXT: [[TMP0:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr_i32 to <8 x i32>*), align 16
; STORE-NEXT: [[ADD:%.*]] = add nsw i32 undef, undef
; STORE-NEXT: [[ADD_1:%.*]] = add nsw i32 undef, [[ADD]]
; STORE-NEXT: [[ADD_2:%.*]] = add nsw i32 undef, [[ADD_1]]
; STORE-NEXT: [[ADD_3:%.*]] = add nsw i32 undef, [[ADD_2]]
; STORE-NEXT: [[ADD_4:%.*]] = add nsw i32 undef, [[ADD_3]]
; STORE-NEXT: [[ADD_5:%.*]] = add nsw i32 undef, [[ADD_4]]
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX:%.*]] = add nsw <8 x i32> [[TMP0]], [[RDX_SHUF]]		; STORE-NEXT: [[BIN_RDX:%.*]] = add nsw <8 x i32> [[TMP0]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = add nsw <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]		; STORE-NEXT: [[BIN_RDX2:%.*]] = add nsw <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX4:%.*]] = add nsw <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]		; STORE-NEXT: [[BIN_RDX4:%.*]] = add nsw <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
; STORE-NEXT: [[TMP1:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0		; STORE-NEXT: [[TMP1:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
; STORE-NEXT: [[ADD_6:%.*]] = add nsw i32 undef, [[ADD_5]]
; STORE-NEXT: store i32 [[TMP1]], i32* [[RES:%.*]], align 16		; STORE-NEXT: store i32 [[TMP1]], i32* [[RES:%.*]], align 16
; STORE-NEXT: ret void		; STORE-NEXT: ret void
;		;
entry:		entry:
%0 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 0), align 16		%0 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 0), align 16
%1 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 1), align 4		%1 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 1), align 4
%add = add nsw i32 %1, %0		%add = add nsw i32 %1, %0
%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 2), align 8		%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 2), align 8
[47 unchanged lines not shown]
; CHECK-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 15), align 4		; CHECK-NEXT: [[TMP15:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 15), align 4
; CHECK-NEXT: [[ADD_14:%.*]] = add nsw i32 [[TMP15]], [[ADD_13]]		; CHECK-NEXT: [[ADD_14:%.*]] = add nsw i32 [[TMP15]], [[ADD_13]]
; CHECK-NEXT: store i32 [[ADD_14]], i32* [[RES:%.*]], align 16		; CHECK-NEXT: store i32 [[ADD_14]], i32* [[RES:%.*]], align 16
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
; STORE-LABEL: @i32_red_example16(		; STORE-LABEL: @i32_red_example16(
; STORE-NEXT: entry:		; STORE-NEXT: entry:
; STORE-NEXT: [[TMP0:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([32 x i32]* @arr_i32 to <16 x i32>*), align 16		; STORE-NEXT: [[TMP0:%.*]] = load <16 x i32>, <16 x i32>* bitcast ([32 x i32]* @arr_i32 to <16 x i32>*), align 16
; STORE-NEXT: [[ADD:%.*]] = add nsw i32 undef, undef
; STORE-NEXT: [[ADD_1:%.*]] = add nsw i32 undef, [[ADD]]
; STORE-NEXT: [[ADD_2:%.*]] = add nsw i32 undef, [[ADD_1]]
; STORE-NEXT: [[ADD_3:%.*]] = add nsw i32 undef, [[ADD_2]]
; STORE-NEXT: [[ADD_4:%.*]] = add nsw i32 undef, [[ADD_3]]
; STORE-NEXT: [[ADD_5:%.*]] = add nsw i32 undef, [[ADD_4]]
; STORE-NEXT: [[ADD_6:%.*]] = add nsw i32 undef, [[ADD_5]]
; STORE-NEXT: [[ADD_7:%.*]] = add nsw i32 undef, [[ADD_6]]
; STORE-NEXT: [[ADD_8:%.*]] = add nsw i32 undef, [[ADD_7]]
; STORE-NEXT: [[ADD_9:%.*]] = add nsw i32 undef, [[ADD_8]]
; STORE-NEXT: [[ADD_10:%.*]] = add nsw i32 undef, [[ADD_9]]
; STORE-NEXT: [[ADD_11:%.*]] = add nsw i32 undef, [[ADD_10]]
; STORE-NEXT: [[ADD_12:%.*]] = add nsw i32 undef, [[ADD_11]]
; STORE-NEXT: [[ADD_13:%.*]] = add nsw i32 undef, [[ADD_12]]
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <16 x i32> [[TMP0]], <16 x i32> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <16 x i32> [[TMP0]], <16 x i32> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX:%.*]] = add nsw <16 x i32> [[TMP0]], [[RDX_SHUF]]		; STORE-NEXT: [[BIN_RDX:%.*]] = add nsw <16 x i32> [[TMP0]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x i32> [[BIN_RDX]], <16 x i32> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <16 x i32> [[BIN_RDX]], <16 x i32> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = add nsw <16 x i32> [[BIN_RDX]], [[RDX_SHUF1]]		; STORE-NEXT: [[BIN_RDX2:%.*]] = add nsw <16 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x i32> [[BIN_RDX2]], <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <16 x i32> [[BIN_RDX2]], <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX4:%.*]] = add nsw <16 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]		; STORE-NEXT: [[BIN_RDX4:%.*]] = add nsw <16 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
; STORE-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x i32> [[BIN_RDX4]], <16 x i32> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <16 x i32> [[BIN_RDX4]], <16 x i32> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX6:%.*]] = add nsw <16 x i32> [[BIN_RDX4]], [[RDX_SHUF5]]		; STORE-NEXT: [[BIN_RDX6:%.*]] = add nsw <16 x i32> [[BIN_RDX4]], [[RDX_SHUF5]]
; STORE-NEXT: [[TMP1:%.*]] = extractelement <16 x i32> [[BIN_RDX6]], i32 0		; STORE-NEXT: [[TMP1:%.*]] = extractelement <16 x i32> [[BIN_RDX6]], i32 0
; STORE-NEXT: [[ADD_14:%.*]] = add nsw i32 undef, [[ADD_13]]
; STORE-NEXT: store i32 [[TMP1]], i32* [[RES:%.*]], align 16		; STORE-NEXT: store i32 [[TMP1]], i32* [[RES:%.*]], align 16
; STORE-NEXT: ret void		; STORE-NEXT: ret void
;		;
entry:		entry:
%0 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 0), align 16		%0 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 0), align 16
%1 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 1), align 4		%1 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 1), align 4
%add = add nsw i32 %1, %0		%add = add nsw i32 %1, %0
%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 2), align 8		%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 2), align 8
[95 unchanged lines not shown]
; CHECK-NEXT: [[TMP31:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 31), align 4		; CHECK-NEXT: [[TMP31:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 31), align 4
; CHECK-NEXT: [[ADD_30:%.*]] = add nsw i32 [[TMP31]], [[ADD_29]]		; CHECK-NEXT: [[ADD_30:%.*]] = add nsw i32 [[TMP31]], [[ADD_29]]
; CHECK-NEXT: store i32 [[ADD_30]], i32* [[RES:%.*]], align 16		; CHECK-NEXT: store i32 [[ADD_30]], i32* [[RES:%.*]], align 16
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
; STORE-LABEL: @i32_red_example32(		; STORE-LABEL: @i32_red_example32(
; STORE-NEXT: entry:		; STORE-NEXT: entry:
; STORE-NEXT: [[TMP0:%.*]] = load <32 x i32>, <32 x i32>* bitcast ([32 x i32]* @arr_i32 to <32 x i32>*), align 16		; STORE-NEXT: [[TMP0:%.*]] = load <32 x i32>, <32 x i32>* bitcast ([32 x i32]* @arr_i32 to <32 x i32>*), align 16
; STORE-NEXT: [[ADD:%.*]] = add nsw i32 undef, undef
; STORE-NEXT: [[ADD_1:%.*]] = add nsw i32 undef, [[ADD]]
; STORE-NEXT: [[ADD_2:%.*]] = add nsw i32 undef, [[ADD_1]]
; STORE-NEXT: [[ADD_3:%.*]] = add nsw i32 undef, [[ADD_2]]
; STORE-NEXT: [[ADD_4:%.*]] = add nsw i32 undef, [[ADD_3]]
; STORE-NEXT: [[ADD_5:%.*]] = add nsw i32 undef, [[ADD_4]]
; STORE-NEXT: [[ADD_6:%.*]] = add nsw i32 undef, [[ADD_5]]
; STORE-NEXT: [[ADD_7:%.*]] = add nsw i32 undef, [[ADD_6]]
; STORE-NEXT: [[ADD_8:%.*]] = add nsw i32 undef, [[ADD_7]]
; STORE-NEXT: [[ADD_9:%.*]] = add nsw i32 undef, [[ADD_8]]
; STORE-NEXT: [[ADD_10:%.*]] = add nsw i32 undef, [[ADD_9]]
; STORE-NEXT: [[ADD_11:%.*]] = add nsw i32 undef, [[ADD_10]]
; STORE-NEXT: [[ADD_12:%.*]] = add nsw i32 undef, [[ADD_11]]
; STORE-NEXT: [[ADD_13:%.*]] = add nsw i32 undef, [[ADD_12]]
; STORE-NEXT: [[ADD_14:%.*]] = add nsw i32 undef, [[ADD_13]]
; STORE-NEXT: [[ADD_15:%.*]] = add nsw i32 undef, [[ADD_14]]
; STORE-NEXT: [[ADD_16:%.*]] = add nsw i32 undef, [[ADD_15]]
; STORE-NEXT: [[ADD_17:%.*]] = add nsw i32 undef, [[ADD_16]]
; STORE-NEXT: [[ADD_18:%.*]] = add nsw i32 undef, [[ADD_17]]
; STORE-NEXT: [[ADD_19:%.*]] = add nsw i32 undef, [[ADD_18]]
; STORE-NEXT: [[ADD_20:%.*]] = add nsw i32 undef, [[ADD_19]]
; STORE-NEXT: [[ADD_21:%.*]] = add nsw i32 undef, [[ADD_20]]
; STORE-NEXT: [[ADD_22:%.*]] = add nsw i32 undef, [[ADD_21]]
; STORE-NEXT: [[ADD_23:%.*]] = add nsw i32 undef, [[ADD_22]]
; STORE-NEXT: [[ADD_24:%.*]] = add nsw i32 undef, [[ADD_23]]
; STORE-NEXT: [[ADD_25:%.*]] = add nsw i32 undef, [[ADD_24]]
; STORE-NEXT: [[ADD_26:%.*]] = add nsw i32 undef, [[ADD_25]]
; STORE-NEXT: [[ADD_27:%.*]] = add nsw i32 undef, [[ADD_26]]
; STORE-NEXT: [[ADD_28:%.*]] = add nsw i32 undef, [[ADD_27]]
; STORE-NEXT: [[ADD_29:%.*]] = add nsw i32 undef, [[ADD_28]]
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <32 x i32> [[TMP0]], <32 x i32> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <32 x i32> [[TMP0]], <32 x i32> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX:%.*]] = add nsw <32 x i32> [[TMP0]], [[RDX_SHUF]]		; STORE-NEXT: [[BIN_RDX:%.*]] = add nsw <32 x i32> [[TMP0]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x i32> [[BIN_RDX]], <32 x i32> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x i32> [[BIN_RDX]], <32 x i32> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = add nsw <32 x i32> [[BIN_RDX]], [[RDX_SHUF1]]		; STORE-NEXT: [[BIN_RDX2:%.*]] = add nsw <32 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x i32> [[BIN_RDX2]], <32 x i32> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x i32> [[BIN_RDX2]], <32 x i32> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX4:%.*]] = add nsw <32 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]		; STORE-NEXT: [[BIN_RDX4:%.*]] = add nsw <32 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
; STORE-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x i32> [[BIN_RDX4]], <32 x i32> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x i32> [[BIN_RDX4]], <32 x i32> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX6:%.*]] = add nsw <32 x i32> [[BIN_RDX4]], [[RDX_SHUF5]]		; STORE-NEXT: [[BIN_RDX6:%.*]] = add nsw <32 x i32> [[BIN_RDX4]], [[RDX_SHUF5]]
; STORE-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x i32> [[BIN_RDX6]], <32 x i32> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x i32> [[BIN_RDX6]], <32 x i32> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX8:%.*]] = add nsw <32 x i32> [[BIN_RDX6]], [[RDX_SHUF7]]		; STORE-NEXT: [[BIN_RDX8:%.*]] = add nsw <32 x i32> [[BIN_RDX6]], [[RDX_SHUF7]]
; STORE-NEXT: [[TMP1:%.*]] = extractelement <32 x i32> [[BIN_RDX8]], i32 0		; STORE-NEXT: [[TMP1:%.*]] = extractelement <32 x i32> [[BIN_RDX8]], i32 0
; STORE-NEXT: [[ADD_30:%.*]] = add nsw i32 undef, [[ADD_29]]
; STORE-NEXT: store i32 [[TMP1]], i32* [[RES:%.*]], align 16		; STORE-NEXT: store i32 [[TMP1]], i32* [[RES:%.*]], align 16
; STORE-NEXT: ret void		; STORE-NEXT: ret void
;		;
entry:		entry:
%0 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 0), align 16		%0 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 0), align 16
%1 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 1), align 4		%1 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 1), align 4
%add = add nsw i32 %1, %0		%add = add nsw i32 %1, %0
%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 2), align 8		%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 2), align 8
[61 unchanged lines not shown]
}		}

declare i32 @foobar(i32)		declare i32 @foobar(i32)

define void @i32_red_call(i32 %val) {		define void @i32_red_call(i32 %val) {
; CHECK-LABEL: @i32_red_call(		; CHECK-LABEL: @i32_red_call(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr_i32 to <8 x i32>*), align 16		; CHECK-NEXT: [[TMP0:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr_i32 to <8 x i32>*), align 16
; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 undef, undef
; CHECK-NEXT: [[ADD_1:%.*]] = add nsw i32 undef, [[ADD]]
; CHECK-NEXT: [[ADD_2:%.*]] = add nsw i32 undef, [[ADD_1]]
; CHECK-NEXT: [[ADD_3:%.*]] = add nsw i32 undef, [[ADD_2]]
; CHECK-NEXT: [[ADD_4:%.*]] = add nsw i32 undef, [[ADD_3]]
; CHECK-NEXT: [[ADD_5:%.*]] = add nsw i32 undef, [[ADD_4]]
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX:%.*]] = add nsw <8 x i32> [[TMP0]], [[RDX_SHUF]]		; CHECK-NEXT: [[BIN_RDX:%.*]] = add nsw <8 x i32> [[TMP0]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX2:%.*]] = add nsw <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]		; CHECK-NEXT: [[BIN_RDX2:%.*]] = add nsw <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX4:%.*]] = add nsw <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]		; CHECK-NEXT: [[BIN_RDX4:%.*]] = add nsw <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
; CHECK-NEXT: [[TMP1:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
; CHECK-NEXT: [[ADD_6:%.*]] = add nsw i32 undef, [[ADD_5]]
; CHECK-NEXT: [[RES:%.*]] = call i32 @foobar(i32 [[TMP1]])		; CHECK-NEXT: [[RES:%.*]] = call i32 @foobar(i32 [[TMP1]])
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
; STORE-LABEL: @i32_red_call(		; STORE-LABEL: @i32_red_call(
; STORE-NEXT: entry:		; STORE-NEXT: entry:
; STORE-NEXT: [[TMP0:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr_i32 to <8 x i32>*), align 16		; STORE-NEXT: [[TMP0:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr_i32 to <8 x i32>*), align 16
; STORE-NEXT: [[ADD:%.*]] = add nsw i32 undef, undef
; STORE-NEXT: [[ADD_1:%.*]] = add nsw i32 undef, [[ADD]]
; STORE-NEXT: [[ADD_2:%.*]] = add nsw i32 undef, [[ADD_1]]
; STORE-NEXT: [[ADD_3:%.*]] = add nsw i32 undef, [[ADD_2]]
; STORE-NEXT: [[ADD_4:%.*]] = add nsw i32 undef, [[ADD_3]]
; STORE-NEXT: [[ADD_5:%.*]] = add nsw i32 undef, [[ADD_4]]
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX:%.*]] = add nsw <8 x i32> [[TMP0]], [[RDX_SHUF]]		; STORE-NEXT: [[BIN_RDX:%.*]] = add nsw <8 x i32> [[TMP0]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = add nsw <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]		; STORE-NEXT: [[BIN_RDX2:%.*]] = add nsw <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX4:%.*]] = add nsw <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]		; STORE-NEXT: [[BIN_RDX4:%.*]] = add nsw <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
; STORE-NEXT: [[TMP1:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0		; STORE-NEXT: [[TMP1:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
; STORE-NEXT: [[ADD_6:%.*]] = add nsw i32 undef, [[ADD_5]]
; STORE-NEXT: [[RES:%.*]] = call i32 @foobar(i32 [[TMP1]])		; STORE-NEXT: [[RES:%.*]] = call i32 @foobar(i32 [[TMP1]])
; STORE-NEXT: ret void		; STORE-NEXT: ret void
;		;
entry:		entry:
%0 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 0), align 16		%0 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 0), align 16
%1 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 1), align 4		%1 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 1), align 4
%add = add nsw i32 %1, %0		%add = add nsw i32 %1, %0
%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 2), align 8		%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr_i32, i64 0, i64 2), align 8
%res = call i32 @foobar(i32 %add.6)		%res = call i32 @foobar(i32 %add.6)
ret void		ret void
}		}

define void @i32_red_invoke(i32 %val) personality i32 (...)* @__gxx_personality_v0 {		define void @i32_red_invoke(i32 %val) personality i32 (...)* @__gxx_personality_v0 {
; CHECK-LABEL: @i32_red_invoke(		; CHECK-LABEL: @i32_red_invoke(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr_i32 to <8 x i32>*), align 16		; CHECK-NEXT: [[TMP0:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr_i32 to <8 x i32>*), align 16
; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 undef, undef
; CHECK-NEXT: [[ADD_1:%.*]] = add nsw i32 undef, [[ADD]]
; CHECK-NEXT: [[ADD_2:%.*]] = add nsw i32 undef, [[ADD_1]]
; CHECK-NEXT: [[ADD_3:%.*]] = add nsw i32 undef, [[ADD_2]]
; CHECK-NEXT: [[ADD_4:%.*]] = add nsw i32 undef, [[ADD_3]]
; CHECK-NEXT: [[ADD_5:%.*]] = add nsw i32 undef, [[ADD_4]]
; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX:%.*]] = add nsw <8 x i32> [[TMP0]], [[RDX_SHUF]]		; CHECK-NEXT: [[BIN_RDX:%.*]] = add nsw <8 x i32> [[TMP0]], [[RDX_SHUF]]
; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX2:%.*]] = add nsw <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]		; CHECK-NEXT: [[BIN_RDX2:%.*]] = add nsw <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[BIN_RDX4:%.*]] = add nsw <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]		; CHECK-NEXT: [[BIN_RDX4:%.*]] = add nsw <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
; CHECK-NEXT: [[TMP1:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
; CHECK-NEXT: [[ADD_6:%.*]] = add nsw i32 undef, [[ADD_5]]
; CHECK-NEXT: [[RES:%.*]] = invoke i32 @foobar(i32 [[TMP1]])		; CHECK-NEXT: [[RES:%.*]] = invoke i32 @foobar(i32 [[TMP1]])
; CHECK-NEXT: to label [[NORMAL:%.*]] unwind label [[EXCEPTION:%.*]]		; CHECK-NEXT: to label [[NORMAL:%.*]] unwind label [[EXCEPTION:%.*]]
; CHECK: exception:		; CHECK: exception:
; CHECK-NEXT: [[CLEANUP:%.*]] = landingpad i8		; CHECK-NEXT: [[CLEANUP:%.*]] = landingpad i8
; CHECK-NEXT: cleanup		; CHECK-NEXT: cleanup
; CHECK-NEXT: br label [[NORMAL]]		; CHECK-NEXT: br label [[NORMAL]]
; CHECK: normal:		; CHECK: normal:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
; STORE-LABEL: @i32_red_invoke(		; STORE-LABEL: @i32_red_invoke(
; STORE-NEXT: entry:		; STORE-NEXT: entry:
; STORE-NEXT: [[TMP0:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr_i32 to <8 x i32>*), align 16		; STORE-NEXT: [[TMP0:%.*]] = load <8 x i32>, <8 x i32>* bitcast ([32 x i32]* @arr_i32 to <8 x i32>*), align 16
; STORE-NEXT: [[ADD:%.*]] = add nsw i32 undef, undef
; STORE-NEXT: [[ADD_1:%.*]] = add nsw i32 undef, [[ADD]]
; STORE-NEXT: [[ADD_2:%.*]] = add nsw i32 undef, [[ADD_1]]
; STORE-NEXT: [[ADD_3:%.*]] = add nsw i32 undef, [[ADD_2]]
; STORE-NEXT: [[ADD_4:%.*]] = add nsw i32 undef, [[ADD_3]]
; STORE-NEXT: [[ADD_5:%.*]] = add nsw i32 undef, [[ADD_4]]
; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX:%.*]] = add nsw <8 x i32> [[TMP0]], [[RDX_SHUF]]		; STORE-NEXT: [[BIN_RDX:%.*]] = add nsw <8 x i32> [[TMP0]], [[RDX_SHUF]]
; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX2:%.*]] = add nsw <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]		; STORE-NEXT: [[BIN_RDX2:%.*]] = add nsw <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
; STORE-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		; STORE-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
; STORE-NEXT: [[BIN_RDX4:%.*]] = add nsw <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]		; STORE-NEXT: [[BIN_RDX4:%.*]] = add nsw <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
; STORE-NEXT: [[TMP1:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0		; STORE-NEXT: [[TMP1:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
; STORE-NEXT: [[ADD_6:%.*]] = add nsw i32 undef, [[ADD_5]]
; STORE-NEXT: [[RES:%.*]] = invoke i32 @foobar(i32 [[TMP1]])		; STORE-NEXT: [[RES:%.*]] = invoke i32 @foobar(i32 [[TMP1]])
; STORE-NEXT: to label [[NORMAL:%.*]] unwind label [[EXCEPTION:%.*]]		; STORE-NEXT: to label [[NORMAL:%.*]] unwind label [[EXCEPTION:%.*]]
; STORE: exception:		; STORE: exception:
; STORE-NEXT: [[CLEANUP:%.*]] = landingpad i8		; STORE-NEXT: [[CLEANUP:%.*]] = landingpad i8
; STORE-NEXT: cleanup		; STORE-NEXT: cleanup
; STORE-NEXT: br label [[NORMAL]]		; STORE-NEXT: br label [[NORMAL]]
; STORE: normal:		; STORE: normal:
; STORE-NEXT: ret void		; STORE-NEXT: ret void

llvm/trunk/test/Transforms/SLPVectorizer/X86/long_chains.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basicaa -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s			; RUN: opt < %s -basicaa -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.8.0"			target triple = "x86_64-apple-macosx10.8.0"

	; At this point we can't vectorize only parts of the tree.			; At this point we can't vectorize only parts of the tree.

	define i32 @test(double* nocapture %A, i8* nocapture %B) {			define i32 @test(double* nocapture %A, i8* nocapture %B) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[B:%.*]] to <2 x i8>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[B:%.*]] to <2 x i8>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i8>, <2 x i8>* [[TMP0]], align 1			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i8>, <2 x i8>* [[TMP0]], align 1
	; CHECK-NEXT: [[TMP2:%.*]] = add <2 x i8> [[TMP1]], <i8 3, i8 3>			; CHECK-NEXT: [[TMP2:%.*]] = add <2 x i8> [[TMP1]], <i8 3, i8 3>
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i8> [[TMP2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i8> [[TMP2]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i8> undef, i8 [[TMP3]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i8> [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i8> [[TMP2]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i8> undef, i8 [[TMP4]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i8> [[TMP4]], i8 [[TMP5]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i8> [[TMP5]], i8 [[TMP3]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = sitofp <2 x i8> [[TMP6]] to <2 x double>			; CHECK-NEXT: [[TMP7:%.*]] = sitofp <2 x i8> [[TMP6]] to <2 x double>
	; CHECK-NEXT: [[TMP8:%.*]] = fmul <2 x double> [[TMP7]], [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = fmul <2 x double> [[TMP7]], [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = fadd <2 x double> [[TMP8]], <double 1.000000e+00, double 1.000000e+00>			; CHECK-NEXT: [[TMP9:%.*]] = fadd <2 x double> [[TMP8]], <double 1.000000e+00, double 1.000000e+00>
	; CHECK-NEXT: [[TMP10:%.*]] = fmul <2 x double> [[TMP9]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = fmul <2 x double> [[TMP9]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = fadd <2 x double> [[TMP10]], <double 1.000000e+00, double 1.000000e+00>			; CHECK-NEXT: [[TMP11:%.*]] = fadd <2 x double> [[TMP10]], <double 1.000000e+00, double 1.000000e+00>
	; CHECK-NEXT: [[TMP12:%.*]] = fmul <2 x double> [[TMP11]], [[TMP11]]			; CHECK-NEXT: [[TMP12:%.*]] = fmul <2 x double> [[TMP11]], [[TMP11]]
	; CHECK-NEXT: [[TMP13:%.*]] = fadd <2 x double> [[TMP12]], <double 1.000000e+00, double 1.000000e+00>			; CHECK-NEXT: [[TMP13:%.*]] = fadd <2 x double> [[TMP12]], <double 1.000000e+00, double 1.000000e+00>
	; CHECK-NEXT: [[TMP14:%.*]] = fmul <2 x double> [[TMP13]], [[TMP13]]			; CHECK-NEXT: [[TMP14:%.*]] = fmul <2 x double> [[TMP13]], [[TMP13]]

llvm/trunk/test/Transforms/SLPVectorizer/X86/reassociated-loads.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -reassociate -slp-vectorizer -slp-vectorize-hor -slp-vectorize-hor-store -S < %s -mtriple=x86_64-apple-macosx -mcpu=corei7-avx -mattr=+avx2 | FileCheck %s			; RUN: opt -reassociate -slp-vectorizer -slp-vectorize-hor -slp-vectorize-hor-store -S < %s -mtriple=x86_64-apple-macosx -mcpu=corei7-avx -mattr=+avx2 | FileCheck %s

	define signext i8 @Foo(<32 x i8>* %__v) {			define signext i8 @Foo(<32 x i8>* %__v) {
	; CHECK-LABEL: @Foo(			; CHECK-LABEL: @Foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load <32 x i8>, <32 x i8>* [[__V:%.*]], align 32			; CHECK-NEXT: [[TMP0:%.*]] = load <32 x i8>, <32 x i8>* [[__V:%.*]], align 32
	; CHECK-NEXT: [[ADD_I_1_I:%.*]] = add i8 undef, undef
	; CHECK-NEXT: [[ADD_I_2_I:%.*]] = add i8 [[ADD_I_1_I]], undef
	; CHECK-NEXT: [[ADD_I_3_I:%.*]] = add i8 [[ADD_I_2_I]], undef
	; CHECK-NEXT: [[ADD_I_4_I:%.*]] = add i8 [[ADD_I_3_I]], undef
	; CHECK-NEXT: [[ADD_I_5_I:%.*]] = add i8 [[ADD_I_4_I]], undef
	; CHECK-NEXT: [[ADD_I_6_I:%.*]] = add i8 [[ADD_I_5_I]], undef
	; CHECK-NEXT: [[ADD_I_7_I:%.*]] = add i8 [[ADD_I_6_I]], undef
	; CHECK-NEXT: [[ADD_I_8_I:%.*]] = add i8 [[ADD_I_7_I]], undef
	; CHECK-NEXT: [[ADD_I_9_I:%.*]] = add i8 [[ADD_I_8_I]], undef
	; CHECK-NEXT: [[ADD_I_10_I:%.*]] = add i8 [[ADD_I_9_I]], undef
	; CHECK-NEXT: [[ADD_I_11_I:%.*]] = add i8 [[ADD_I_10_I]], undef
	; CHECK-NEXT: [[ADD_I_12_I:%.*]] = add i8 [[ADD_I_11_I]], undef
	; CHECK-NEXT: [[ADD_I_13_I:%.*]] = add i8 [[ADD_I_12_I]], undef
	; CHECK-NEXT: [[ADD_I_14_I:%.*]] = add i8 [[ADD_I_13_I]], undef
	; CHECK-NEXT: [[ADD_I_15_I:%.*]] = add i8 [[ADD_I_14_I]], undef
	; CHECK-NEXT: [[ADD_I_16_I:%.*]] = add i8 [[ADD_I_15_I]], undef
	; CHECK-NEXT: [[ADD_I_17_I:%.*]] = add i8 [[ADD_I_16_I]], undef
	; CHECK-NEXT: [[ADD_I_18_I:%.*]] = add i8 [[ADD_I_17_I]], undef
	; CHECK-NEXT: [[ADD_I_19_I:%.*]] = add i8 [[ADD_I_18_I]], undef
	; CHECK-NEXT: [[ADD_I_20_I:%.*]] = add i8 [[ADD_I_19_I]], undef
	; CHECK-NEXT: [[ADD_I_21_I:%.*]] = add i8 [[ADD_I_20_I]], undef
	; CHECK-NEXT: [[ADD_I_22_I:%.*]] = add i8 [[ADD_I_21_I]], undef
	; CHECK-NEXT: [[ADD_I_23_I:%.*]] = add i8 [[ADD_I_22_I]], undef
	; CHECK-NEXT: [[ADD_I_24_I:%.*]] = add i8 [[ADD_I_23_I]], undef
	; CHECK-NEXT: [[ADD_I_25_I:%.*]] = add i8 [[ADD_I_24_I]], undef
	; CHECK-NEXT: [[ADD_I_26_I:%.*]] = add i8 [[ADD_I_25_I]], undef
	; CHECK-NEXT: [[ADD_I_27_I:%.*]] = add i8 [[ADD_I_26_I]], undef
	; CHECK-NEXT: [[ADD_I_28_I:%.*]] = add i8 [[ADD_I_27_I]], undef
	; CHECK-NEXT: [[ADD_I_29_I:%.*]] = add i8 [[ADD_I_28_I]], undef
	; CHECK-NEXT: [[ADD_I_30_I:%.*]] = add i8 [[ADD_I_29_I]], undef
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <32 x i8> [[TMP0]], <32 x i8> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <32 x i8> [[TMP0]], <32 x i8> undef, <32 x i32> <i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <32 x i8> [[TMP0]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = add <32 x i8> [[TMP0]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x i8> [[BIN_RDX]], <32 x i8> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <32 x i8> [[BIN_RDX]], <32 x i8> undef, <32 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <32 x i8> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <32 x i8> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x i8> [[BIN_RDX2]], <32 x i8> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <32 x i8> [[BIN_RDX2]], <32 x i8> undef, <32 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <32 x i8> [[BIN_RDX2]], [[RDX_SHUF3]]			; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <32 x i8> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x i8> [[BIN_RDX4]], <32 x i8> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF5:%.*]] = shufflevector <32 x i8> [[BIN_RDX4]], <32 x i8> undef, <32 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX6:%.*]] = add <32 x i8> [[BIN_RDX4]], [[RDX_SHUF5]]			; CHECK-NEXT: [[BIN_RDX6:%.*]] = add <32 x i8> [[BIN_RDX4]], [[RDX_SHUF5]]
	; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x i8> [[BIN_RDX6]], <32 x i8> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF7:%.*]] = shufflevector <32 x i8> [[BIN_RDX6]], <32 x i8> undef, <32 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX8:%.*]] = add <32 x i8> [[BIN_RDX6]], [[RDX_SHUF7]]			; CHECK-NEXT: [[BIN_RDX8:%.*]] = add <32 x i8> [[BIN_RDX6]], [[RDX_SHUF7]]
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <32 x i8> [[BIN_RDX8]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = extractelement <32 x i8> [[BIN_RDX8]], i32 0
	; CHECK-NEXT: [[ADD_I_31_I:%.*]] = add i8 [[ADD_I_30_I]], undef
	; CHECK-NEXT: ret i8 [[TMP1]]			; CHECK-NEXT: ret i8 [[TMP1]]
	;			;
	entry:			entry:
	%0 = load <32 x i8>, <32 x i8>* %__v, align 32			%0 = load <32 x i8>, <32 x i8>* %__v, align 32
	%vecext.i.i.i = extractelement <32 x i8> %0, i64 0			%vecext.i.i.i = extractelement <32 x i8> %0, i64 0
	%vecext.i.i.1.i = extractelement <32 x i8> %0, i64 1			%vecext.i.i.1.i = extractelement <32 x i8> %0, i64 1
	%add.i.1.i = add i8 %vecext.i.i.1.i, %vecext.i.i.i			%add.i.1.i = add i8 %vecext.i.i.1.i, %vecext.i.i.i
	%vecext.i.i.2.i = extractelement <32 x i8> %0, i64 2			%vecext.i.i.2.i = extractelement <32 x i8> %0, i64 2

llvm/trunk/test/Transforms/SLPVectorizer/X86/reduction_loads.ll

	; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 6			; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 6
	; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 7			; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 7
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[SUM:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[SUM:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = mul <8 x i32> [[TMP1]], <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42>			; CHECK-NEXT: [[TMP2:%.*]] = mul <8 x i32> [[TMP1]], <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42>
	; CHECK-NEXT: [[ADD:%.*]] = add i32 undef, [[SUM]]
	; CHECK-NEXT: [[ADD_1:%.*]] = add i32 undef, [[ADD]]
	; CHECK-NEXT: [[ADD_2:%.*]] = add i32 undef, [[ADD_1]]
	; CHECK-NEXT: [[ADD_3:%.*]] = add i32 undef, [[ADD_2]]
	; CHECK-NEXT: [[ADD_4:%.*]] = add i32 undef, [[ADD_3]]
	; CHECK-NEXT: [[ADD_5:%.*]] = add i32 undef, [[ADD_4]]
	; CHECK-NEXT: [[ADD_6:%.*]] = add i32 undef, [[ADD_5]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <8 x i32> [[TMP2]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = add <8 x i32> [[TMP2]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]			; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[OP_EXTRA]] = add i32 [[TMP3]], [[SUM]]			; CHECK-NEXT: [[OP_EXTRA]] = add i32 [[TMP3]], [[SUM]]
	; CHECK-NEXT: [[ADD_7:%.*]] = add i32 undef, [[ADD_6]]
	; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[FOR_BODY]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret i32 [[OP_EXTRA]]			; CHECK-NEXT: ret i32 [[OP_EXTRA]]
	;			;
	entry:			entry:
	%arrayidx.1 = getelementptr inbounds i32, i32* %p, i64 1			%arrayidx.1 = getelementptr inbounds i32, i32* %p, i64 1
	%arrayidx.2 = getelementptr inbounds i32, i32* %p, i64 2			%arrayidx.2 = getelementptr inbounds i32, i32* %p, i64 2
	%arrayidx.3 = getelementptr inbounds i32, i32* %p, i64 3			%arrayidx.3 = getelementptr inbounds i32, i32* %p, i64 3
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[SUM:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[SUM:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[Q]] to <8 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[Q]] to <8 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = mul <8 x i32> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = mul <8 x i32> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[ADD:%.*]] = add i32 undef, [[SUM]]
	; CHECK-NEXT: [[ADD_1:%.*]] = add i32 undef, [[ADD]]
	; CHECK-NEXT: [[ADD_2:%.*]] = add i32 undef, [[ADD_1]]
	; CHECK-NEXT: [[ADD_3:%.*]] = add i32 undef, [[ADD_2]]
	; CHECK-NEXT: [[ADD_4:%.*]] = add i32 undef, [[ADD_3]]
	; CHECK-NEXT: [[ADD_5:%.*]] = add i32 undef, [[ADD_4]]
	; CHECK-NEXT: [[ADD_6:%.*]] = add i32 undef, [[ADD_5]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <8 x i32> [[TMP4]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = add <8 x i32> [[TMP4]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]			; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[OP_EXTRA]] = add i32 [[TMP5]], [[SUM]]			; CHECK-NEXT: [[OP_EXTRA]] = add i32 [[TMP5]], [[SUM]]
	; CHECK-NEXT: [[ADD_7:%.*]] = add i32 undef, [[ADD_6]]
	; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[FOR_BODY]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret i32 [[OP_EXTRA]]			; CHECK-NEXT: ret i32 [[OP_EXTRA]]
	;			;
	entry:			entry:
	%arrayidx.p.1 = getelementptr inbounds i32, i32* %p, i64 1			%arrayidx.p.1 = getelementptr inbounds i32, i32* %p, i64 1
	%arrayidx.p.2 = getelementptr inbounds i32, i32* %p, i64 2			%arrayidx.p.2 = getelementptr inbounds i32, i32* %p, i64 2
	%arrayidx.p.3 = getelementptr inbounds i32, i32* %p, i64 3			%arrayidx.p.3 = getelementptr inbounds i32, i32* %p, i64 3
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[SUM:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[SUM:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[REORDER_SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>			; CHECK-NEXT: [[REORDER_SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[Q]] to <8 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[Q]] to <8 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = mul <8 x i32> [[REORDER_SHUFFLE]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = mul <8 x i32> [[REORDER_SHUFFLE]], [[TMP3]]
	; CHECK-NEXT: [[ADD:%.*]] = add i32 undef, [[SUM]]
	; CHECK-NEXT: [[ADD_1:%.*]] = add i32 undef, [[ADD]]
	; CHECK-NEXT: [[ADD_2:%.*]] = add i32 undef, [[ADD_1]]
	; CHECK-NEXT: [[ADD_3:%.*]] = add i32 undef, [[ADD_2]]
	; CHECK-NEXT: [[ADD_4:%.*]] = add i32 undef, [[ADD_3]]
	; CHECK-NEXT: [[ADD_5:%.*]] = add i32 undef, [[ADD_4]]
	; CHECK-NEXT: [[ADD_6:%.*]] = add i32 undef, [[ADD_5]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <8 x i32> [[TMP4]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = add <8 x i32> [[TMP4]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]			; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[OP_EXTRA]] = add i32 [[TMP5]], [[SUM]]			; CHECK-NEXT: [[OP_EXTRA]] = add i32 [[TMP5]], [[SUM]]
	; CHECK-NEXT: [[ADD_7:%.*]] = add i32 undef, [[ADD_6]]
	; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[FOR_BODY]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret i32 [[OP_EXTRA]]			; CHECK-NEXT: ret i32 [[OP_EXTRA]]
	;			;
	entry:			entry:
	%arrayidx.p.1 = getelementptr inbounds i32, i32* %p, i64 1			%arrayidx.p.1 = getelementptr inbounds i32, i32* %p, i64 1
	%arrayidx.p.2 = getelementptr inbounds i32, i32* %p, i64 2			%arrayidx.p.2 = getelementptr inbounds i32, i32* %p, i64 2
	%arrayidx.p.3 = getelementptr inbounds i32, i32* %p, i64 3			%arrayidx.p.3 = getelementptr inbounds i32, i32* %p, i64 3
	(53 lines not shown)

llvm/trunk/test/Transforms/SLPVectorizer/X86/reduction_unrolled.ll

	(20 lines not shown)
	; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 2			; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 2
	; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 3			; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 3
	; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 4			; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 4
	; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 5			; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 5
	; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 6			; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 6
	; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 7			; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 7
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[MUL_18:%.*]] = add i32 undef, undef
	; CHECK-NEXT: [[MUL_29:%.*]] = add i32 undef, [[MUL_18]]
	; CHECK-NEXT: [[MUL_310:%.*]] = add i32 undef, [[MUL_29]]
	; CHECK-NEXT: [[MUL_411:%.*]] = add i32 undef, [[MUL_310]]
	; CHECK-NEXT: [[MUL_512:%.*]] = add i32 undef, [[MUL_411]]
	; CHECK-NEXT: [[MUL_613:%.*]] = add i32 undef, [[MUL_512]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add <8 x i32> [[TMP1]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = add <8 x i32> [[TMP1]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]			; CHECK-NEXT: [[BIN_RDX4:%.*]] = add <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[MUL_714:%.*]] = add i32 undef, [[MUL_613]]
	; CHECK-NEXT: ret i32 [[TMP2]]			; CHECK-NEXT: ret i32 [[TMP2]]
	;			;
	entry:			entry:
	%0 = load i32, i32* %p, align 4			%0 = load i32, i32* %p, align 4
	%arrayidx.1 = getelementptr inbounds i32, i32* %p, i64 1			%arrayidx.1 = getelementptr inbounds i32, i32* %p, i64 1
	%1 = load i32, i32* %arrayidx.1, align 4			%1 = load i32, i32* %arrayidx.1, align 4
	%mul.18 = add i32 %1, %0			%mul.18 = add i32 %1, %0
	%arrayidx.2 = getelementptr inbounds i32, i32* %p, i64 2			%arrayidx.2 = getelementptr inbounds i32, i32* %p, i64 2
	(91 lines not shown)
	; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 2			; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 2
	; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 3			; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 3
	; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 4			; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 4
	; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 5			; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 5
	; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 6			; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 6
	; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 7			; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 7
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[MUL_18:%.*]] = and i32 undef, undef
	; CHECK-NEXT: [[MUL_29:%.*]] = and i32 undef, [[MUL_18]]
	; CHECK-NEXT: [[MUL_310:%.*]] = and i32 undef, [[MUL_29]]
	; CHECK-NEXT: [[MUL_411:%.*]] = and i32 undef, [[MUL_310]]
	; CHECK-NEXT: [[MUL_512:%.*]] = and i32 undef, [[MUL_411]]
	; CHECK-NEXT: [[MUL_613:%.*]] = and i32 undef, [[MUL_512]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = and <8 x i32> [[TMP1]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = and <8 x i32> [[TMP1]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = and <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = and <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = and <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]			; CHECK-NEXT: [[BIN_RDX4:%.*]] = and <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[MUL_714:%.*]] = and i32 undef, [[MUL_613]]
	; CHECK-NEXT: ret i32 [[TMP2]]			; CHECK-NEXT: ret i32 [[TMP2]]
	;			;
	entry:			entry:
	%0 = load i32, i32* %p, align 4			%0 = load i32, i32* %p, align 4
	%arrayidx.1 = getelementptr inbounds i32, i32* %p, i64 1			%arrayidx.1 = getelementptr inbounds i32, i32* %p, i64 1
	%1 = load i32, i32* %arrayidx.1, align 4			%1 = load i32, i32* %arrayidx.1, align 4
	%mul.18 = and i32 %1, %0			%mul.18 = and i32 %1, %0
	%arrayidx.2 = getelementptr inbounds i32, i32* %p, i64 2			%arrayidx.2 = getelementptr inbounds i32, i32* %p, i64 2
	(31 lines not shown)
	; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 2			; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 2
	; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 3			; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 3
	; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 4			; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 4
	; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 5			; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 5
	; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 6			; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 6
	; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 7			; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 7
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[MUL_18:%.*]] = or i32 undef, undef
	; CHECK-NEXT: [[MUL_29:%.*]] = or i32 undef, [[MUL_18]]
	; CHECK-NEXT: [[MUL_310:%.*]] = or i32 undef, [[MUL_29]]
	; CHECK-NEXT: [[MUL_411:%.*]] = or i32 undef, [[MUL_310]]
	; CHECK-NEXT: [[MUL_512:%.*]] = or i32 undef, [[MUL_411]]
	; CHECK-NEXT: [[MUL_613:%.*]] = or i32 undef, [[MUL_512]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = or <8 x i32> [[TMP1]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = or <8 x i32> [[TMP1]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = or <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = or <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = or <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]			; CHECK-NEXT: [[BIN_RDX4:%.*]] = or <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[MUL_714:%.*]] = or i32 undef, [[MUL_613]]
	; CHECK-NEXT: ret i32 [[TMP2]]			; CHECK-NEXT: ret i32 [[TMP2]]
	;			;
	entry:			entry:
	%0 = load i32, i32* %p, align 4			%0 = load i32, i32* %p, align 4
	%arrayidx.1 = getelementptr inbounds i32, i32* %p, i64 1			%arrayidx.1 = getelementptr inbounds i32, i32* %p, i64 1
	%1 = load i32, i32* %arrayidx.1, align 4			%1 = load i32, i32* %arrayidx.1, align 4
	%mul.18 = or i32 %1, %0			%mul.18 = or i32 %1, %0
	%arrayidx.2 = getelementptr inbounds i32, i32* %p, i64 2			%arrayidx.2 = getelementptr inbounds i32, i32* %p, i64 2
	(31 lines not shown)
	; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 2			; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 2
	; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 3			; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 3
	; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 4			; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 4
	; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 5			; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 5
	; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 6			; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 6
	; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 7			; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 7
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[MUL_18:%.*]] = xor i32 undef, undef
	; CHECK-NEXT: [[MUL_29:%.*]] = xor i32 undef, [[MUL_18]]
	; CHECK-NEXT: [[MUL_310:%.*]] = xor i32 undef, [[MUL_29]]
	; CHECK-NEXT: [[MUL_411:%.*]] = xor i32 undef, [[MUL_310]]
	; CHECK-NEXT: [[MUL_512:%.*]] = xor i32 undef, [[MUL_411]]
	; CHECK-NEXT: [[MUL_613:%.*]] = xor i32 undef, [[MUL_512]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = xor <8 x i32> [[TMP1]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = xor <8 x i32> [[TMP1]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = xor <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = xor <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = xor <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]			; CHECK-NEXT: [[BIN_RDX4:%.*]] = xor <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0
	; CHECK-NEXT: [[MUL_714:%.*]] = xor i32 undef, [[MUL_613]]
	; CHECK-NEXT: ret i32 [[TMP2]]			; CHECK-NEXT: ret i32 [[TMP2]]
	;			;
	entry:			entry:
	%0 = load i32, i32* %p, align 4			%0 = load i32, i32* %p, align 4
	%arrayidx.1 = getelementptr inbounds i32, i32* %p, i64 1			%arrayidx.1 = getelementptr inbounds i32, i32* %p, i64 1
	%1 = load i32, i32* %arrayidx.1, align 4			%1 = load i32, i32* %arrayidx.1, align 4
	%mul.18 = xor i32 %1, %0			%mul.18 = xor i32 %1, %0
	%arrayidx.2 = getelementptr inbounds i32, i32* %p, i64 2			%arrayidx.2 = getelementptr inbounds i32, i32* %p, i64 2
	(23 lines not shown)
	; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[SELF:%.*]], align 16			; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* [[SELF:%.*]], align 16
	; CHECK-NEXT: [[TMP1:%.*]] = shl <4 x i32> [[TMP0]], <i32 6, i32 2, i32 13, i32 3>			; CHECK-NEXT: [[TMP1:%.*]] = shl <4 x i32> [[TMP0]], <i32 6, i32 2, i32 13, i32 3>
	; CHECK-NEXT: [[TMP2:%.*]] = xor <4 x i32> [[TMP1]], [[TMP0]]			; CHECK-NEXT: [[TMP2:%.*]] = xor <4 x i32> [[TMP1]], [[TMP0]]
	; CHECK-NEXT: [[TMP3:%.*]] = lshr <4 x i32> [[TMP2]], <i32 13, i32 27, i32 21, i32 12>			; CHECK-NEXT: [[TMP3:%.*]] = lshr <4 x i32> [[TMP2]], <i32 13, i32 27, i32 21, i32 12>
	; CHECK-NEXT: [[TMP4:%.*]] = and <4 x i32> [[TMP0]], <i32 -2, i32 -8, i32 -16, i32 -128>			; CHECK-NEXT: [[TMP4:%.*]] = and <4 x i32> [[TMP0]], <i32 -2, i32 -8, i32 -16, i32 -128>
	; CHECK-NEXT: [[TMP5:%.*]] = shl <4 x i32> [[TMP4]], <i32 18, i32 2, i32 7, i32 13>			; CHECK-NEXT: [[TMP5:%.*]] = shl <4 x i32> [[TMP4]], <i32 18, i32 2, i32 7, i32 13>
	; CHECK-NEXT: [[TMP6:%.*]] = xor <4 x i32> [[TMP3]], [[TMP5]]			; CHECK-NEXT: [[TMP6:%.*]] = xor <4 x i32> [[TMP3]], [[TMP5]]
	; CHECK-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* [[SELF]], align 16			; CHECK-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* [[SELF]], align 16
	; CHECK-NEXT: [[TMP7:%.*]] = xor i32 undef, undef
	; CHECK-NEXT: [[TMP8:%.*]] = xor i32 [[TMP7]], undef
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP6]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = xor <4 x i32> [[TMP6]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = xor <4 x i32> [[TMP6]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = xor <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = xor <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = xor i32 [[TMP8]], undef			; CHECK-NEXT: ret i32 [[TMP7]]
	; CHECK-NEXT: ret i32 [[TMP9]]
	;			;
	entry:			entry:
	%0 = load <4 x i32>, <4 x i32>* %self, align 16			%0 = load <4 x i32>, <4 x i32>* %self, align 16
	%1 = shl <4 x i32> %0, <i32 6, i32 2, i32 13, i32 3>			%1 = shl <4 x i32> %0, <i32 6, i32 2, i32 13, i32 3>
	%2 = xor <4 x i32> %1, %0			%2 = xor <4 x i32> %1, %0
	%3 = lshr <4 x i32> %2, <i32 13, i32 27, i32 21, i32 12>			%3 = lshr <4 x i32> %2, <i32 13, i32 27, i32 21, i32 12>
	%4 = and <4 x i32> %0, <i32 -2, i32 -8, i32 -16, i32 -128>			%4 = and <4 x i32> %0, <i32 -2, i32 -8, i32 -16, i32 -128>
	%5 = shl <4 x i32> %4, <i32 18, i32 2, i32 7, i32 13>			%5 = shl <4 x i32> %4, <i32 18, i32 2, i32 7, i32 13>
	(11 lines not shown)

llvm/trunk/test/Transforms/SLPVectorizer/X86/remark_horcost.ll

	(27 lines not shown)
	; CHECK-NEXT: [[ARRAYIDX41:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP7]]			; CHECK-NEXT: [[ARRAYIDX41:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP7]]
	; CHECK-NEXT: [[TMP8:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*			; CHECK-NEXT: [[TMP8:%.*]] = bitcast i32* [[ARRAYIDX]] to <4 x i32>*
	; CHECK-NEXT: [[TMP9:%.*]] = load <4 x i32>, <4 x i32>* [[TMP8]], align 4			; CHECK-NEXT: [[TMP9:%.*]] = load <4 x i32>, <4 x i32>* [[TMP8]], align 4
	; CHECK-NEXT: [[TMP10:%.*]] = or i64 [[TMP1]], 7			; CHECK-NEXT: [[TMP10:%.*]] = or i64 [[TMP1]], 7
	; CHECK-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP10]]			; CHECK-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds i32, i32* [[DIFF]], i64 [[TMP10]]
	; CHECK-NEXT: [[TMP11:%.*]] = bitcast i32* [[ARRAYIDX2]] to <4 x i32>*			; CHECK-NEXT: [[TMP11:%.*]] = bitcast i32* [[ARRAYIDX2]] to <4 x i32>*
	; CHECK-NEXT: [[TMP12:%.*]] = load <4 x i32>, <4 x i32>* [[TMP11]], align 4			; CHECK-NEXT: [[TMP12:%.*]] = load <4 x i32>, <4 x i32>* [[TMP11]], align 4
	; CHECK-NEXT: [[TMP13:%.*]] = add nsw <4 x i32> [[TMP12]], [[TMP9]]			; CHECK-NEXT: [[TMP13:%.*]] = add nsw <4 x i32> [[TMP12]], [[TMP9]]
	; CHECK-NEXT: [[ADD10:%.*]] = add nsw i32 undef, [[A_088]]
	; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 1			; CHECK-NEXT: [[ARRAYIDX20:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 1
	; CHECK-NEXT: [[ADD24:%.*]] = add nsw i32 [[ADD10]], undef
	; CHECK-NEXT: [[ARRAYIDX34:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 2			; CHECK-NEXT: [[ARRAYIDX34:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 2
	; CHECK-NEXT: [[ADD38:%.*]] = add nsw i32 [[ADD24]], undef
	; CHECK-NEXT: [[ARRAYIDX48:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 3			; CHECK-NEXT: [[ARRAYIDX48:%.*]] = getelementptr inbounds [8 x [8 x i32]], [8 x [8 x i32]]* [[M2]], i64 0, i64 [[INDVARS_IV]], i64 3
	; CHECK-NEXT: [[TMP14:%.*]] = bitcast i32* [[ARRAYIDX6]] to <4 x i32>*			; CHECK-NEXT: [[TMP14:%.*]] = bitcast i32* [[ARRAYIDX6]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP13]], <4 x i32>* [[TMP14]], align 16			; CHECK-NEXT: store <4 x i32> [[TMP13]], <4 x i32>* [[TMP14]], align 16
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP13]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP13]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX:%.*]] = add nsw <4 x i32> [[TMP13]], [[RDX_SHUF]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = add nsw <4 x i32> [[TMP13]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[BIN_RDX]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = add nsw <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = add nsw <4 x i32> [[BIN_RDX]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0			; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x i32> [[BIN_RDX2]], i32 0
	; CHECK-NEXT: [[OP_EXTRA]] = add nsw i32 [[TMP15]], [[A_088]]			; CHECK-NEXT: [[OP_EXTRA]] = add nsw i32 [[TMP15]], [[A_088]]
	; CHECK-NEXT: [[ADD52:%.*]] = add nsw i32 [[ADD38]], undef
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 8			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 8
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret i32 [[OP_EXTRA]]			; CHECK-NEXT: ret i32 [[OP_EXTRA]]
	;			;
	entry:			entry:
	%m2 = alloca [8 x [8 x i32]], align 16			%m2 = alloca [8 x [8 x i32]], align 16
	(79 lines not shown)

llvm/trunk/test/Transforms/SLPVectorizer/X86/reorder_repeated_ops.ll

	(13 lines not shown)
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i16> undef, i16 [[TMP]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i16> undef, i16 [[TMP]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i16> [[TMP0]], i16 undef, i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i16> [[TMP0]], i16 undef, i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = sext <2 x i16> [[TMP1]] to <2 x i32>			; CHECK-NEXT: [[TMP2:%.*]] = sext <2 x i16> [[TMP1]] to <2 x i32>
	; CHECK-NEXT: [[REORDER_SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> undef, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[REORDER_SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> undef, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP3:%.*]] = sub nsw <2 x i32> <i32 63, i32 undef>, [[REORDER_SHUFFLE]]			; CHECK-NEXT: [[TMP3:%.*]] = sub nsw <2 x i32> <i32 63, i32 undef>, [[REORDER_SHUFFLE]]
	; CHECK-NEXT: [[TMP4:%.*]] = sub <2 x i32> [[TMP3]], undef			; CHECK-NEXT: [[TMP4:%.*]] = sub <2 x i32> [[TMP3]], undef
	; CHECK-NEXT: [[SHUFFLE8:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[SHUFFLE8:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i32> [[SHUFFLE8]], <i32 undef, i32 15, i32 31, i32 47>			; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i32> [[SHUFFLE8]], <i32 undef, i32 15, i32 31, i32 47>
	; CHECK-NEXT: [[TMP11:%.*]] = icmp sgt i32 undef, undef
	; CHECK-NEXT: [[TMP12:%.*]] = select i1 [[TMP11]], i32 undef, i32 undef
	; CHECK-NEXT: [[TMP14:%.*]] = icmp sgt i32 [[TMP12]], undef
	; CHECK-NEXT: [[TMP15:%.*]] = select i1 [[TMP14]], i32 [[TMP12]], i32 undef
	; CHECK-NEXT: [[TMP17:%.*]] = icmp sgt i32 [[TMP15]], undef
	; CHECK-NEXT: [[RDX_SHUF9:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF9:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP10:%.*]] = icmp sgt <4 x i32> [[TMP5]], [[RDX_SHUF9]]			; CHECK-NEXT: [[RDX_MINMAX_CMP10:%.*]] = icmp sgt <4 x i32> [[TMP5]], [[RDX_SHUF9]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT11:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP10]], <4 x i32> [[TMP5]], <4 x i32> [[RDX_SHUF9]]			; CHECK-NEXT: [[RDX_MINMAX_SELECT11:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP10]], <4 x i32> [[TMP5]], <4 x i32> [[RDX_SHUF9]]
	; CHECK-NEXT: [[RDX_SHUF12:%.*]] = shufflevector <4 x i32> [[RDX_MINMAX_SELECT11]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF12:%.*]] = shufflevector <4 x i32> [[RDX_MINMAX_SELECT11]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP13:%.*]] = icmp sgt <4 x i32> [[RDX_MINMAX_SELECT11]], [[RDX_SHUF12]]			; CHECK-NEXT: [[RDX_MINMAX_CMP13:%.*]] = icmp sgt <4 x i32> [[RDX_MINMAX_SELECT11]], [[RDX_SHUF12]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT14:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP13]], <4 x i32> [[RDX_MINMAX_SELECT11]], <4 x i32> [[RDX_SHUF12]]			; CHECK-NEXT: [[RDX_MINMAX_SELECT14:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP13]], <4 x i32> [[RDX_MINMAX_SELECT11]], <4 x i32> [[RDX_SHUF12]]
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[RDX_MINMAX_SELECT14]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[RDX_MINMAX_SELECT14]], i32 0
	; CHECK-NEXT: [[TMP18:%.*]] = select i1 [[TMP17]], i32 [[TMP15]], i32 undef
	; CHECK-NEXT: [[TMP19:%.*]] = select i1 undef, i32 [[TMP6]], i32 undef			; CHECK-NEXT: [[TMP19:%.*]] = select i1 undef, i32 [[TMP6]], i32 undef
	; CHECK-NEXT: [[TMP20:%.*]] = icmp sgt i32 [[TMP19]], 63			; CHECK-NEXT: [[TMP20:%.*]] = icmp sgt i32 [[TMP19]], 63
	; CHECK-NEXT: [[TMP7:%.*]] = sub nsw <2 x i32> undef, [[TMP2]]			; CHECK-NEXT: [[TMP7:%.*]] = sub nsw <2 x i32> undef, [[TMP2]]
	; CHECK-NEXT: [[TMP8:%.*]] = sub <2 x i32> [[TMP7]], undef			; CHECK-NEXT: [[TMP8:%.*]] = sub <2 x i32> [[TMP7]], undef
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
	; CHECK-NEXT: [[TMP9:%.*]] = add nsw <4 x i32> [[SHUFFLE]], <i32 -49, i32 -33, i32 -33, i32 -17>			; CHECK-NEXT: [[TMP9:%.*]] = add nsw <4 x i32> [[SHUFFLE]], <i32 -49, i32 -33, i32 -33, i32 -17>
	; CHECK-NEXT: [[TMP26:%.*]] = icmp sgt i32 undef, undef
	; CHECK-NEXT: [[TMP27:%.*]] = select i1 [[TMP26]], i32 undef, i32 undef
	; CHECK-NEXT: [[TMP28:%.*]] = icmp sgt i32 [[TMP27]], undef
	; CHECK-NEXT: [[TMP29:%.*]] = select i1 [[TMP28]], i32 undef, i32 [[TMP27]]
	; CHECK-NEXT: [[TMP31:%.*]] = icmp sgt i32 undef, undef
	; CHECK-NEXT: [[TMP32:%.*]] = select i1 [[TMP31]], i32 undef, i32 undef
	; CHECK-NEXT: [[TMP33:%.*]] = icmp sgt i32 [[TMP32]], [[TMP29]]
	; CHECK-NEXT: [[TMP34:%.*]] = select i1 [[TMP33]], i32 [[TMP29]], i32 [[TMP32]]
	; CHECK-NEXT: [[TMP36:%.*]] = icmp sgt i32 undef, undef
	; CHECK-NEXT: [[TMP37:%.*]] = select i1 [[TMP36]], i32 undef, i32 undef
	; CHECK-NEXT: [[TMP38:%.*]] = icmp sgt i32 [[TMP37]], [[TMP34]]
	; CHECK-NEXT: [[TMP39:%.*]] = select i1 [[TMP38]], i32 [[TMP34]], i32 [[TMP37]]
	; CHECK-NEXT: [[TMP41:%.*]] = icmp sgt i32 undef, undef
	; CHECK-NEXT: [[TMP42:%.*]] = select i1 [[TMP41]], i32 undef, i32 undef
	; CHECK-NEXT: [[TMP43:%.*]] = icmp sgt i32 [[TMP42]], [[TMP39]]
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP9]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x i32> [[TMP9]], <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp slt <4 x i32> [[TMP9]], [[RDX_SHUF]]			; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp slt <4 x i32> [[TMP9]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i32> [[TMP9]], <4 x i32> [[RDX_SHUF]]			; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP]], <4 x i32> [[TMP9]], <4 x i32> [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp slt <4 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]			; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp slt <4 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> [[RDX_SHUF1]]			; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <4 x i1> [[RDX_MINMAX_CMP2]], <4 x i32> [[RDX_MINMAX_SELECT]], <4 x i32> [[RDX_SHUF1]]
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x i32> [[RDX_MINMAX_SELECT3]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x i32> [[RDX_MINMAX_SELECT3]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = icmp slt i32 [[TMP10]], undef			; CHECK-NEXT: [[TMP11:%.*]] = icmp slt i32 [[TMP10]], undef
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP11]], i32 [[TMP10]], i32 undef			; CHECK-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP11]], i32 [[TMP10]], i32 undef
	; CHECK-NEXT: [[TMP12:%.*]] = icmp slt i32 [[OP_EXTRA]], undef			; CHECK-NEXT: [[TMP12:%.*]] = icmp slt i32 [[OP_EXTRA]], undef
	; CHECK-NEXT: [[OP_EXTRA4:%.*]] = select i1 [[TMP12]], i32 [[OP_EXTRA]], i32 undef			; CHECK-NEXT: [[OP_EXTRA4:%.*]] = select i1 [[TMP12]], i32 [[OP_EXTRA]], i32 undef
	; CHECK-NEXT: [[TMP13:%.*]] = icmp slt i32 [[OP_EXTRA4]], undef			; CHECK-NEXT: [[TMP13:%.*]] = icmp slt i32 [[OP_EXTRA4]], undef
	; CHECK-NEXT: [[OP_EXTRA5:%.*]] = select i1 [[TMP13]], i32 [[OP_EXTRA4]], i32 undef			; CHECK-NEXT: [[OP_EXTRA5:%.*]] = select i1 [[TMP13]], i32 [[OP_EXTRA4]], i32 undef
	; CHECK-NEXT: [[TMP14:%.*]] = icmp slt i32 [[OP_EXTRA5]], undef			; CHECK-NEXT: [[TMP14:%.*]] = icmp slt i32 [[OP_EXTRA5]], undef
	; CHECK-NEXT: [[OP_EXTRA6:%.*]] = select i1 [[TMP14]], i32 [[OP_EXTRA5]], i32 undef			; CHECK-NEXT: [[OP_EXTRA6:%.*]] = select i1 [[TMP14]], i32 [[OP_EXTRA5]], i32 undef
	; CHECK-NEXT: [[TMP15:%.*]] = icmp slt i32 [[OP_EXTRA6]], undef			; CHECK-NEXT: [[TMP15:%.*]] = icmp slt i32 [[OP_EXTRA6]], undef
	; CHECK-NEXT: [[OP_EXTRA7:%.*]] = select i1 [[TMP15]], i32 [[OP_EXTRA6]], i32 undef			; CHECK-NEXT: [[OP_EXTRA7:%.*]] = select i1 [[TMP15]], i32 [[OP_EXTRA6]], i32 undef
	; CHECK-NEXT: [[TMP44:%.*]] = select i1 [[TMP43]], i32 [[TMP39]], i32 [[TMP42]]
	; CHECK-NEXT: [[TMP45:%.*]] = icmp sgt i32 undef, [[OP_EXTRA7]]			; CHECK-NEXT: [[TMP45:%.*]] = icmp sgt i32 undef, [[OP_EXTRA7]]
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	;			;
	bb:			bb:
	br i1 undef, label %bb1, label %bb2			br i1 undef, label %bb1, label %bb2

	bb1: ; preds = %bb			bb1: ; preds = %bb
	ret void			ret void
	▲ Show 20 Lines • Show All 49 Lines • Show Last 20 Lines

llvm/trunk/test/Transforms/SLPVectorizer/X86/undef_vect.ll

	Show All 10 Lines
	; CHECK-NEXT: [[DOTSROA_CAST_5:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 5, i32 0			; CHECK-NEXT: [[DOTSROA_CAST_5:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 5, i32 0
	; CHECK-NEXT: [[DOTSROA_RAW_IDX_5:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 5, i32 1			; CHECK-NEXT: [[DOTSROA_RAW_IDX_5:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 5, i32 1
	; CHECK-NEXT: [[DOTSROA_CAST_6:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 6, i32 0			; CHECK-NEXT: [[DOTSROA_CAST_6:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 6, i32 0
	; CHECK-NEXT: [[DOTSROA_RAW_IDX_6:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 6, i32 1			; CHECK-NEXT: [[DOTSROA_RAW_IDX_6:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 6, i32 1
	; CHECK-NEXT: [[DOTSROA_CAST_7:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 7, i32 0			; CHECK-NEXT: [[DOTSROA_CAST_7:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 7, i32 0
	; CHECK-NEXT: [[DOTSROA_RAW_IDX_7:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 7, i32 1			; CHECK-NEXT: [[DOTSROA_RAW_IDX_7:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 7, i32 1
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[DOTSROA_CAST_4]] to <8 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[DOTSROA_CAST_4]] to <8 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[CMP_I1_4:%.*]] = icmp slt i32 undef, undef
	; CHECK-NEXT: [[DOTSROA_SPECULATED_4:%.*]] = select i1 [[CMP_I1_4]], i32 undef, i32 undef
	; CHECK-NEXT: [[CMP_I1_5:%.*]] = icmp slt i32 [[DOTSROA_SPECULATED_4]], undef
	; CHECK-NEXT: [[DOTSROA_SPECULATED_5:%.*]] = select i1 [[CMP_I1_5]], i32 undef, i32 [[DOTSROA_SPECULATED_4]]
	; CHECK-NEXT: [[CMP_I1_6:%.*]] = icmp slt i32 [[DOTSROA_SPECULATED_5]], undef
	; CHECK-NEXT: [[DOTSROA_SPECULATED_6:%.*]] = select i1 [[CMP_I1_6]], i32 undef, i32 [[DOTSROA_SPECULATED_5]]
	; CHECK-NEXT: [[CMP_I1_7:%.*]] = icmp slt i32 [[DOTSROA_SPECULATED_6]], undef
	; CHECK-NEXT: [[DOTSROA_SPECULATED_7:%.*]] = select i1 [[CMP_I1_7]], i32 undef, i32 [[DOTSROA_SPECULATED_6]]
	; CHECK-NEXT: [[CMP_I1_8:%.*]] = icmp slt i32 undef, undef
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <8 x i32> [[TMP1]], [[RDX_SHUF]]			; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp sgt <8 x i32> [[TMP1]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP]], <8 x i32> [[TMP1]], <8 x i32> [[RDX_SHUF]]			; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP]], <8 x i32> [[TMP1]], <8 x i32> [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <8 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]			; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp sgt <8 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP2]], <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> [[RDX_SHUF1]]			; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP2]], <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = icmp sgt <8 x i32> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]			; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = icmp sgt <8 x i32> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP5]], <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> [[RDX_SHUF4]]			; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP5]], <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> [[RDX_SHUF4]]
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[RDX_MINMAX_SELECT6]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[RDX_MINMAX_SELECT6]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt i32 [[TMP2]], undef			; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt i32 [[TMP2]], undef
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP3]], i32 [[TMP2]], i32 undef			; CHECK-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP3]], i32 [[TMP2]], i32 undef
	; CHECK-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[OP_EXTRA]], undef			; CHECK-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[OP_EXTRA]], undef
	; CHECK-NEXT: [[OP_EXTRA7:%.*]] = select i1 [[TMP4]], i32 [[OP_EXTRA]], i32 undef			; CHECK-NEXT: [[OP_EXTRA7:%.*]] = select i1 [[TMP4]], i32 [[OP_EXTRA]], i32 undef
	; CHECK-NEXT: [[DOTSROA_SPECULATED_8:%.*]] = select i1 [[CMP_I1_8]], i32 undef, i32 undef
	; CHECK-NEXT: [[DOTSROA_SPECULATED_9:%.*]] = select i1 undef, i32 undef, i32 [[OP_EXTRA7]]			; CHECK-NEXT: [[DOTSROA_SPECULATED_9:%.*]] = select i1 undef, i32 undef, i32 [[OP_EXTRA7]]
	; CHECK-NEXT: [[CMP_I1_10:%.*]] = icmp slt i32 [[DOTSROA_SPECULATED_9]], undef			; CHECK-NEXT: [[CMP_I1_10:%.*]] = icmp slt i32 [[DOTSROA_SPECULATED_9]], undef
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	for.body.lr.ph:			for.body.lr.ph:
	%.sroa_cast.4 = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 4, i32 0			%.sroa_cast.4 = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 4, i32 0
	%retval.sroa.0.0.copyload.i5.4 = load i32, i32* %.sroa_cast.4, align 4			%retval.sroa.0.0.copyload.i5.4 = load i32, i32* %.sroa_cast.4, align 4
	%.sroa_raw_idx.4 = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 4, i32 1			%.sroa_raw_idx.4 = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 4, i32 1
	Show All 36 Lines

llvm/trunk/test/Transforms/SLPVectorizer/X86/vectorize-reorder-reuse.ll

	Show All 12 Lines
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A2:%.*]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A2:%.*]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A3:%.*]], i32 2			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A3:%.*]], i32 2
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A4:%.*]], i32 3			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A4:%.*]], i32 3
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A5:%.*]], i32 4			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A5:%.*]], i32 4
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A6:%.*]], i32 5			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A6:%.*]], i32 5
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A7:%.*]], i32 6			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A7:%.*]], i32 6
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A8:%.*]], i32 7			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A8:%.*]], i32 7
	; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]
	; CHECK-NEXT: [[CMP:%.*]] = icmp ult i32 undef, undef
	; CHECK-NEXT: [[COND:%.*]] = select i1 [[CMP]], i32 undef, i32 undef
	; CHECK-NEXT: [[CMP15:%.*]] = icmp ult i32 [[COND]], undef
	; CHECK-NEXT: [[COND19:%.*]] = select i1 [[CMP15]], i32 [[COND]], i32 undef
	; CHECK-NEXT: [[CMP20:%.*]] = icmp ult i32 [[COND19]], undef
	; CHECK-NEXT: [[COND24:%.*]] = select i1 [[CMP20]], i32 [[COND19]], i32 undef
	; CHECK-NEXT: [[CMP25:%.*]] = icmp ult i32 [[COND24]], undef
	; CHECK-NEXT: [[COND29:%.*]] = select i1 [[CMP25]], i32 [[COND24]], i32 undef
	; CHECK-NEXT: [[CMP30:%.*]] = icmp ult i32 [[COND29]], undef
	; CHECK-NEXT: [[COND34:%.*]] = select i1 [[CMP30]], i32 [[COND29]], i32 undef
	; CHECK-NEXT: [[CMP35:%.*]] = icmp ult i32 [[COND34]], undef
	; CHECK-NEXT: [[COND39:%.*]] = select i1 [[CMP35]], i32 [[COND34]], i32 undef
	; CHECK-NEXT: [[CMP40:%.*]] = icmp ult i32 [[COND39]], undef
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP10]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP10]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp ult <8 x i32> [[TMP10]], [[RDX_SHUF]]			; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp ult <8 x i32> [[TMP10]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP]], <8 x i32> [[TMP10]], <8 x i32> [[RDX_SHUF]]			; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP]], <8 x i32> [[TMP10]], <8 x i32> [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp ult <8 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]			; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp ult <8 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP2]], <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> [[RDX_SHUF1]]			; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP2]], <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = icmp ult <8 x i32> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]			; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = icmp ult <8 x i32> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP5]], <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> [[RDX_SHUF4]]			; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP5]], <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> [[RDX_SHUF4]]
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[RDX_MINMAX_SELECT6]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[RDX_MINMAX_SELECT6]], i32 0
	; CHECK-NEXT: [[COND44:%.*]] = select i1 [[CMP40]], i32 [[COND39]], i32 undef
	; CHECK-NEXT: ret i32 [[TMP11]]			; CHECK-NEXT: ret i32 [[TMP11]]
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds i32, i32* %arr, i64 1			%arrayidx = getelementptr inbounds i32, i32* %arr, i64 1
	%0 = load i32, i32* %arrayidx, align 4			%0 = load i32, i32* %arrayidx, align 4
	%add = add i32 %0, %a1			%add = add i32 %0, %a1
	%add2 = add i32 %0, %a2			%add2 = add i32 %0, %a2
	%add4 = add i32 %0, %a3			%add4 = add i32 %0, %a3
	Show All 34 Lines
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A2:%.*]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A2:%.*]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A3:%.*]], i32 2			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A3:%.*]], i32 2
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A4:%.*]], i32 3			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A4:%.*]], i32 3
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A5:%.*]], i32 4			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A5:%.*]], i32 4
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A6:%.*]], i32 5			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A6:%.*]], i32 5
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A7:%.*]], i32 6			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A7:%.*]], i32 6
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A8:%.*]], i32 7			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A8:%.*]], i32 7
	; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]
	; CHECK-NEXT: [[CMP:%.*]] = icmp ult i32 undef, undef
	; CHECK-NEXT: [[COND:%.*]] = select i1 [[CMP]], i32 undef, i32 undef
	; CHECK-NEXT: [[CMP15:%.*]] = icmp ult i32 [[COND]], undef
	; CHECK-NEXT: [[COND19:%.*]] = select i1 [[CMP15]], i32 [[COND]], i32 undef
	; CHECK-NEXT: [[CMP20:%.*]] = icmp ult i32 [[COND19]], undef
	; CHECK-NEXT: [[COND24:%.*]] = select i1 [[CMP20]], i32 [[COND19]], i32 undef
	; CHECK-NEXT: [[CMP25:%.*]] = icmp ult i32 [[COND24]], undef
	; CHECK-NEXT: [[COND29:%.*]] = select i1 [[CMP25]], i32 [[COND24]], i32 undef
	; CHECK-NEXT: [[CMP30:%.*]] = icmp ult i32 [[COND29]], undef
	; CHECK-NEXT: [[COND34:%.*]] = select i1 [[CMP30]], i32 [[COND29]], i32 undef
	; CHECK-NEXT: [[CMP35:%.*]] = icmp ult i32 [[COND34]], undef
	; CHECK-NEXT: [[COND39:%.*]] = select i1 [[CMP35]], i32 [[COND34]], i32 undef
	; CHECK-NEXT: [[CMP40:%.*]] = icmp ult i32 [[COND39]], undef
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP10]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP10]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp ult <8 x i32> [[TMP10]], [[RDX_SHUF]]			; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp ult <8 x i32> [[TMP10]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP]], <8 x i32> [[TMP10]], <8 x i32> [[RDX_SHUF]]			; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP]], <8 x i32> [[TMP10]], <8 x i32> [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp ult <8 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]			; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp ult <8 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP2]], <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> [[RDX_SHUF1]]			; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP2]], <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = icmp ult <8 x i32> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]			; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = icmp ult <8 x i32> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP5]], <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> [[RDX_SHUF4]]			; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP5]], <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> [[RDX_SHUF4]]
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[RDX_MINMAX_SELECT6]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[RDX_MINMAX_SELECT6]], i32 0
	; CHECK-NEXT: [[COND44:%.*]] = select i1 [[CMP40]], i32 [[COND39]], i32 undef
	; CHECK-NEXT: ret i32 [[TMP11]]			; CHECK-NEXT: ret i32 [[TMP11]]
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds i32, i32* %arr, i64 1			%arrayidx = getelementptr inbounds i32, i32* %arr, i64 1
	%0 = load i32, i32* %arrayidx, align 4			%0 = load i32, i32* %arrayidx, align 4
	%add = add i32 %0, %a1			%add = add i32 %0, %a1
	%arrayidx1 = getelementptr inbounds i32, i32* %arr, i64 2			%arrayidx1 = getelementptr inbounds i32, i32* %arr, i64 2
	%1 = load i32, i32* %arrayidx1, align 4			%1 = load i32, i32* %arrayidx1, align 4
	Show All 38 Lines
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A2:%.*]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A2:%.*]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A3:%.*]], i32 2			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A3:%.*]], i32 2
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A4:%.*]], i32 3			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A4:%.*]], i32 3
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A5:%.*]], i32 4			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A5:%.*]], i32 4
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A6:%.*]], i32 5			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A6:%.*]], i32 5
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A7:%.*]], i32 6			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A7:%.*]], i32 6
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A8:%.*]], i32 7			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A8:%.*]], i32 7
	; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]
	; CHECK-NEXT: [[CMP:%.*]] = icmp ult i32 undef, undef
	; CHECK-NEXT: [[COND:%.*]] = select i1 [[CMP]], i32 undef, i32 undef
	; CHECK-NEXT: [[CMP15:%.*]] = icmp ult i32 [[COND]], undef
	; CHECK-NEXT: [[COND19:%.*]] = select i1 [[CMP15]], i32 [[COND]], i32 undef
	; CHECK-NEXT: [[CMP20:%.*]] = icmp ult i32 [[COND19]], undef
	; CHECK-NEXT: [[COND24:%.*]] = select i1 [[CMP20]], i32 [[COND19]], i32 undef
	; CHECK-NEXT: [[CMP25:%.*]] = icmp ult i32 [[COND24]], undef
	; CHECK-NEXT: [[COND29:%.*]] = select i1 [[CMP25]], i32 [[COND24]], i32 undef
	; CHECK-NEXT: [[CMP30:%.*]] = icmp ult i32 [[COND29]], undef
	; CHECK-NEXT: [[COND34:%.*]] = select i1 [[CMP30]], i32 [[COND29]], i32 undef
	; CHECK-NEXT: [[CMP35:%.*]] = icmp ult i32 [[COND34]], undef
	; CHECK-NEXT: [[COND39:%.*]] = select i1 [[CMP35]], i32 [[COND34]], i32 undef
	; CHECK-NEXT: [[CMP40:%.*]] = icmp ult i32 [[COND39]], undef
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP10]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP10]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp ult <8 x i32> [[TMP10]], [[RDX_SHUF]]			; CHECK-NEXT: [[RDX_MINMAX_CMP:%.*]] = icmp ult <8 x i32> [[TMP10]], [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP]], <8 x i32> [[TMP10]], <8 x i32> [[RDX_SHUF]]			; CHECK-NEXT: [[RDX_MINMAX_SELECT:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP]], <8 x i32> [[TMP10]], <8 x i32> [[RDX_SHUF]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp ult <8 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]			; CHECK-NEXT: [[RDX_MINMAX_CMP2:%.*]] = icmp ult <8 x i32> [[RDX_MINMAX_SELECT]], [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP2]], <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> [[RDX_SHUF1]]			; CHECK-NEXT: [[RDX_MINMAX_SELECT3:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP2]], <8 x i32> [[RDX_MINMAX_SELECT]], <8 x i32> [[RDX_SHUF1]]
	; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[RDX_SHUF4:%.*]] = shufflevector <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = icmp ult <8 x i32> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]			; CHECK-NEXT: [[RDX_MINMAX_CMP5:%.*]] = icmp ult <8 x i32> [[RDX_MINMAX_SELECT3]], [[RDX_SHUF4]]
	; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP5]], <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> [[RDX_SHUF4]]			; CHECK-NEXT: [[RDX_MINMAX_SELECT6:%.*]] = select <8 x i1> [[RDX_MINMAX_CMP5]], <8 x i32> [[RDX_MINMAX_SELECT3]], <8 x i32> [[RDX_SHUF4]]
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[RDX_MINMAX_SELECT6]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <8 x i32> [[RDX_MINMAX_SELECT6]], i32 0
	; CHECK-NEXT: [[COND44:%.*]] = select i1 [[CMP40]], i32 [[COND39]], i32 undef
	; CHECK-NEXT: ret i32 [[TMP11]]			; CHECK-NEXT: ret i32 [[TMP11]]
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds i32, i32* %arr, i64 3			%arrayidx = getelementptr inbounds i32, i32* %arr, i64 3
	%0 = load i32, i32* %arrayidx, align 4			%0 = load i32, i32* %arrayidx, align 4
	%add = add i32 %0, %a1			%add = add i32 %0, %a1
	%arrayidx1 = getelementptr inbounds i32, i32* %arr, i64 2			%arrayidx1 = getelementptr inbounds i32, i32* %arr, i64 2
	%1 = load i32, i32* %arrayidx1, align 4			%1 = load i32, i32* %arrayidx1, align 4
	Show All 26 Lines

Revision Contents

Diff 221354