This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/
  lib/Transforms/Vectorize/
    SLPVectorizer.cpp
  test/Transforms/
    PhaseOrdering/X86/
      vector-reductions-logical.ll
      vector-reductions.ll
    SLPVectorizer/
      AArch64/
        gather-root.ll
        transpose-inseltpoison.ll
        transpose.ll
      AMDGPU/
        horizontal-store.ll
      X86/
        PR35628_1.ll
        PR35628_2.ll
        PR39774.ll
        PR40310.ll
        blending-shuffle-inseltpoison.ll
        blending-shuffle.ll
        crash_reordering_undefs.ll
        horizontal-list.ll
        horizontal-minmax.ll
        matched-shuffled-entries.ll
        reduction-logical.ll
        reduction_loads.ll
        reduction_unrolled.ll
        reorder_repeated_ops.ll
        revectorized_rdx_crash.ll
        undef_vect.ll
        used-reduced-op.ll
        vectorize-reorder-reuse.ll
      slp-umax-rdx-matcher-crash.ll
Differential D111574

[SLP]Improve reductions vectorization.
ClosedPublic

Authored by ABataev on Oct 11 2021, 1:07 PM.


Details

Reviewers

RKSimon
spatel
dtemirbulatov
anton-afanasyev

Commits

rG7d8060bc19e9: [SLP]Improve reductions vectorization.

Summary

The pattern matching and vectorization for reductions was not very
effective. Some of the possible reduction values were marked as
external arguments, SLP could not find some reduction patterns because
it attempted to vectorize pairs of binop arguments too early, and the
cost of constant reductions was not correct. The patch addresses these
issues and improves the analysis, cost estimation, and vectorization of
reductions.

The most significant changes in SLP.NumVectorInstructions:

Metric: SLP.NumVectorInstructions [140/14396]

Program                                                                           results  results0    diff
test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test                   920.00   3548.00  285.7%
test-suite :: SingleSource/Benchmarks/BenchmarkGame/n-body.test                     66.00    122.00   84.8%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test          100.00    128.00   28.0%
test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test     664.00    810.00   22.0%
test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test                     592.00    687.00   16.0%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test      402.00    426.00    6.0%
test-suite :: MultiSource/Applications/JM/lencod/lencod.test                      1665.00   1745.00    4.8%
test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test      135.00    139.00    3.0%
test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test     135.00    139.00    3.0%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test                      388.00    397.00    2.3%
test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test                       895.00    914.00    2.1%
test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test        240.00    244.00    1.7%
test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test               240.00    244.00    1.7%
test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test                 820.00    832.00    1.5%
test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test                  820.00    832.00    1.5%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test         14804.00  14914.00    0.7%
test-suite :: MultiSource/Benchmarks/Bullet/bullet.test                           8125.00   8183.00    0.7%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test              1330.00   1338.00    0.6%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test               1330.00   1338.00    0.6%
test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test            9832.00   9880.00    0.5%
test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test            5267.00   5291.00    0.5%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test          4018.00   4024.00    0.1%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test         4018.00   4024.00    0.1%
test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test                  426.00    424.00   -0.5%
test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test                   426.00    424.00   -0.5%
test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test              201.00    192.00   -4.5%
test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test             201.00    192.00   -4.5%

644.nab_s and 544.nab_r - the number of shuffles is reduced, but the
number of useful vectorized instructions is increased.

641.leela_s and 541.leela_r - the function
@_ZN9FastBoard25get_pattern3_augment_specEiib is no longer inlined,
but its body is vectorized successfully. Before, the function was
inlined twice and vectorized just after inlining; this is no longer
required. The vector code looks much the same as it did before.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
1,770 ms	x64 debian > SanitizerCommon-asan-x86_64-Linux.SanitizerCommon-asan-x86_64-Linux::onprint.cpp
710 ms	x64 debian > SanitizerCommon-lsan-x86_64-Linux.SanitizerCommon-lsan-x86_64-Linux::onprint.cpp
2,920 ms	x64 debian > SanitizerCommon-msan-x86_64-Linux.SanitizerCommon-msan-x86_64-Linux::onprint.cpp
4,300 ms	x64 debian > SanitizerCommon-tsan-x86_64-Linux.SanitizerCommon-tsan-x86_64-Linux::onprint.cpp
3,710 ms	x64 debian > SanitizerCommon-ubsan-x86_64-Linux.SanitizerCommon-ubsan-x86_64-Linux::onprint.cpp

Event Timeline

ABataev created this revision. · Oct 11 2021, 1:07 PM

Herald added subscribers: kerbowa, dmgreen, zzheng and 3 others. · Oct 11 2021, 1:07 PM

ABataev requested review of this revision. · Oct 11 2021, 1:07 PM

Herald added a project: Restricted Project. · Oct 11 2021, 1:07 PM

Harbormaster completed remote builds in B128180: Diff 378768. · Oct 11 2021, 2:17 PM

Some minor comments, but it's a heavy patch to review tbh...

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3883	Can this 2-iteration for-loop be simplified/split?
5040	Can you transfer some of this explanation up to AreVectorizableGathers? It's not very obvious what is being checked there.
5482	precommit this?
8458	vectorization
8472	Split this to stop clang-format heroics?
8480	Split this to stop clang-format heroics?
8485	Split this to stop clang-format heroics?
8610	Can you use another variable name to avoid Wshadow?
8805	for-range loop?
8808	for-range loop?

In D111574#3075574, @RKSimon wrote:

Some minor comments, but it's a heavy patch to review tbh...

I understand. I'll try to split it.
The main idea behind this patch is to improve the reduction vectorization process. Currently, the SLPVectorizer gathers 3 kinds of instructions:
reduction operations (those with the root RdxKind kind), reduction values (the very first non-RdxKind instruction with the same opcodes) and extra args (instructions with different parents, non-RdxKind or non-reduced value opcode, etc.). First, this complicates the reduction analysis (some of the potential reduction operands may turn into extra args because their operands are also extra args). It also throws away some potentially beneficial reductions, such as reductions of constants, reductions with repeated values, and reductions with same/alternate opcodes.
The patch simplifies the reduction analysis process (we just do a simple BFS in operand order) and gathers all potential reduced values (without checking for the reduced value opcode) and extra args (without any extra transformations, since we can detect such args immediately). Then it sorts the potential reduction values by their value/instruction opcodes (same and/or alternate ones too) and tries to generate the reduction for all these potentially reduced values/instructions.
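The BFS gathering and opcode grouping described in the paragraph above can be sketched on a toy expression tree. This is only an illustration under stated assumptions: `Node`, `Gathered` and `gatherReduction` are hypothetical names that are not part of LLVM, the sketch assumes a tree (no shared operands), and it ignores extra args, use counts, and the load/extractelement/compare sub-keys that the real patch handles.

```cpp
#include <algorithm>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy node standing in for an IR instruction or leaf value.
struct Node {
  std::string Opcode;            // e.g. "add", "load", "mul"
  std::vector<const Node *> Ops; // operands, in operand order
};

struct Gathered {
  std::vector<const Node *> ReductionOps;             // nodes with the root opcode
  std::vector<std::vector<const Node *>> ReducedVals; // leaf groups, largest first
};

// Breadth-first walk from Root. Operands that share the root opcode are
// treated as reduction operations and visited later; every other operand is
// a candidate reduced value and is bucketed by its own opcode.
inline Gathered gatherReduction(const Node &Root) {
  Gathered G;
  std::map<std::string, std::vector<const Node *>> Groups;
  std::queue<const Node *> Worklist;
  Worklist.push(&Root);
  while (!Worklist.empty()) {
    const Node *N = Worklist.front();
    Worklist.pop();
    G.ReductionOps.push_back(N);
    for (const Node *Op : N->Ops) {
      if (Op->Opcode == Root.Opcode)
        Worklist.push(Op);
      else
        Groups[Op->Opcode].push_back(Op);
    }
  }
  for (auto &Entry : Groups)
    G.ReducedVals.push_back(std::move(Entry.second));
  // Largest group first; ties keep the map's opcode order.
  std::stable_sort(G.ReducedVals.begin(), G.ReducedVals.end(),
                   [](const std::vector<const Node *> &A,
                      const std::vector<const Node *> &B) {
                     return A.size() > B.size();
                   });
  return G;
}
```

For a tree such as `(load + load) + (load + mul)`, this yields three reduction operations and two candidate groups, with the three loads ahead of the single `mul`, so a vectorization attempt would start with the group most likely to fill a vector.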
Also, it changes the order of the reduction/args vectorization attempts. First we try to find reductions, and only if there are none do we try to vectorize the args of the binops.
Also, it tries to generate the final scalar code for the non-reduced/extra args in a way that avoids extra dependencies between the last scalar instructions, so that the CPU can schedule more instructions to execute independently.
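One common way to shorten the dependency chain between the final scalar operations is to combine the leftover values pairwise instead of in a single linear chain. The sketch below assumes that technique purely for illustration; the comment above does not say how the patch orders the final scalar code, and `reducePairwise` is a hypothetical helper, not an LLVM function. It also assumes the operation may be reassociated, as reduction operations must be.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Combines Vals with Op in pairwise rounds and returns {result, rounds}.
// Each round only depends on the previous one, so the longest dependency
// chain is about log2(N) operations instead of N - 1 for a linear chain.
// Precondition: Vals is not empty and Op is associative.
template <typename T, typename BinOp>
std::pair<T, unsigned> reducePairwise(std::vector<T> Vals, BinOp Op) {
  assert(!Vals.empty() && "need at least one value to reduce");
  unsigned Rounds = 0;
  while (Vals.size() > 1) {
    std::vector<T> Next;
    Next.reserve((Vals.size() + 1) / 2);
    for (std::size_t I = 0; I + 1 < Vals.size(); I += 2)
      Next.push_back(Op(Vals[I], Vals[I + 1]));
    if (Vals.size() % 2 != 0)
      Next.push_back(Vals.back()); // odd element carries over to the next round
    Vals = std::move(Next);
    ++Rounds;
  }
  return {Vals.front(), Rounds};
}
```

With five values, the pairwise form needs three dependent rounds where a linear chain needs four dependent operations; with eight values it is three rounds against seven.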
That's the first patch in the series. I have another one, which should add support for reduction operations with many uses; it may help to vectorize something like this:

bool Res = false;
for (int i = 0; i < 15; ++i) {
  bool Cmp = a[i] < a[i+1];
  int min = Cmp ? a[i] : a[i+1];
  Res |= Cmp;
}

and similar patterns I saw in real user code.

ABataev mentioned this in D112224: [SLP]Change the order of the reduction/binops args pair vectorization attempts. · Oct 21 2021, 8:03 AM

ABataev mentioned this in rGeb9b75dd4da8: [SLP]Change the order of the reduction/binops args pair vectorization attempts. · Oct 25 2021, 6:28 AM

Rebase

Harbormaster completed remote builds in B130472: Diff 382002. · Oct 25 2021, 9:19 AM

ABataev mentioned this in D112467: [SLP]Do not reorder reduction nodes. · Oct 25 2021, 10:07 AM

ABataev mentioned this in rGce14d1b690d8: [SLP]Do not reorder reduction nodes. · Oct 26 2021, 7:44 AM

Rebase

Harbormaster completed remote builds in B130953: Diff 382664. · Oct 27 2021, 8:29 AM

Address comments

Harbormaster completed remote builds in B130967: Diff 382689. · Oct 27 2021, 9:54 AM

vporpo added a subscriber: vporpo. · Nov 11 2021, 7:57 PM

Rebase

Harbormaster completed remote builds in B139233: Diff 394266. · Dec 14 2021, 9:18 AM

Rebase

Herald added a project: Restricted Project. · May 17 2022, 8:32 AM

Herald added a subscriber: kosarev.

Use range-based for

Harbormaster completed remote builds in B164909: Diff 430080. · May 17 2022, 9:38 AM

LGTM

llvm/test/Transforms/SLPVectorizer/X86/bool-mask.ll
3 ↗	(On Diff #430080)	You might be able to add a common SSE check prefix?

This revision is now accepted and ready to land. · May 18 2022, 3:35 AM

This revision was landed with ongoing or failed builds. · May 18 2022, 1:24 PM

Closed by commit rG7d8060bc19e9: [SLP]Improve reductions vectorization. (authored by ABataev).

This revision was automatically updated to reflect the committed changes.

ABataev added a commit: rG7d8060bc19e9: [SLP]Improve reductions vectorization.

ABataev marked an inline comment as done. · May 18 2022, 1:25 PM

@ABataev, we are getting an assertion failure downstream in setInsertPointAfterBundle which is bisected to this patch and reproducible with it.

I have created a PR here with my analysis along with an upstream reproducer: https://github.com/llvm/llvm-project/issues/55796
Here's the (potential) fix for review: https://reviews.llvm.org/D126713.

If this is not the correct fix, can we please revert this patch and fix the issue?

anna mentioned this in D126713: [SLPVectorizer] Fix extractelement insertion point. · May 31 2022, 11:45 AM

In D111574#3548038, @anna wrote:

@ABataev, we are getting an assertion failure downstream in setInsertPointAfterBundle which is bisected to this patch and reproducible with it.

I have created a PR here with my analysis along with an upstream reproducer: https://github.com/llvm/llvm-project/issues/55796
Here's the (potential) fix for review: https://reviews.llvm.org/D126713.

If this is not the correct fix, can we please revert this patch and fix the issue?

I saw it. Was waiting for a patch from you. If you don't have much time, I can prepare a small fix for this issue.

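Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	710 lines
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll	56 lines
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll	2 lines
llvm/test/Transforms/SLPVectorizer/AArch64/gather-root.ll	89 lines
llvm/test/Transforms/SLPVectorizer/AArch64/transpose-inseltpoison.ll	12 lines
llvm/test/Transforms/SLPVectorizer/AArch64/transpose.ll	12 lines
llvm/test/Transforms/SLPVectorizer/AMDGPU/horizontal-store.ll	18 lines
llvm/test/Transforms/SLPVectorizer/X86/PR35628_1.ll	10 lines
llvm/test/Transforms/SLPVectorizer/X86/PR35628_2.ll	10 lines
llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll	173 lines
llvm/test/Transforms/SLPVectorizer/X86/PR40310.ll	16 lines
llvm/test/Transforms/SLPVectorizer/X86/blending-shuffle-inseltpoison.ll	2 lines
llvm/test/Transforms/SLPVectorizer/X86/blending-shuffle.ll	2 lines
llvm/test/Transforms/SLPVectorizer/X86/crash_reordering_undefs.ll	16 lines
llvm/test/Transforms/SLPVectorizer/X86/horizontal-list.ll	263 lines
llvm/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll	147 lines
llvm/test/Transforms/SLPVectorizer/X86/matched-shuffled-entries.ll	18 lines
llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll	246 lines
llvm/test/Transforms/SLPVectorizer/X86/reduction_loads.ll	22 lines
llvm/test/Transforms/SLPVectorizer/X86/reduction_unrolled.ll	4 lines
llvm/test/Transforms/SLPVectorizer/X86/reorder_repeated_ops.ll	23 lines
llvm/test/Transforms/SLPVectorizer/X86/revectorized_rdx_crash.ll	23 lines
llvm/test/Transforms/SLPVectorizer/X86/undef_vect.ll	8 lines
llvm/test/Transforms/SLPVectorizer/X86/used-reduced-op.ll	8 lines
llvm/test/Transforms/SLPVectorizer/X86/vectorize-reorder-reuse.ll	16 lines
llvm/test/Transforms/SLPVectorizer/slp-umax-rdx-matcher-crash.ll	2 lines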

Diff 382664

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp


[... 3,848 lines not shown ...]	case Instruction::GetElementPtr: {
return;		return;
}		}
}		}

TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,		TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
LLVM_DEBUG(dbgs() << "SLP: added a vector of GEPs.\n");		LLVM_DEBUG(dbgs() << "SLP: added a vector of GEPs.\n");
TE->setOperandsInOrder();		TE->setOperandsInOrder();
for (unsigned i = 0, e = 2; i < e; ++i) {		for (unsigned I = 0, E = 2; I < E; ++I) {
ValueList Operands;		ValueList Operands;
		if (I >= 1) {
		// Need to cast all elements to the same type before vectorization to
		// avoid crash.
		Type *VL0Ty = VL0->getOperand(I)->getType();
		Type *Ty =
		all_of(VL, [VL0Ty](Value *V) { return VL0Ty == V->getType(); })
		? VL0Ty
		: DL->getIndexType(cast<GetElementPtrInst>(VL0)
		->getPointerOperandType()
		->getScalarType());
		// Prepare the operand vector.
		for (Value *V : VL) {
		auto *Op = cast<Instruction>(V)->getOperand(I);
		auto *CI = cast<ConstantInt>(Op);
		Operands.push_back(ConstantExpr::getIntegerCast(
		CI, Ty, CI->getValue().isSignBitSet()));
		}
		} else {
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *V : VL)		for (Value *V : VL)
Operands.push_back(cast<Instruction>(V)->getOperand(i));		Operands.push_back(cast<Instruction>(V)->getOperand(I));
		}

buildTree_rec(Operands, Depth + 1, {TE, i});		buildTree_rec(Operands, Depth + 1, {TE, I});
}		}
		RKSimon: Can this 2-iteration for-loop be simplified/split?
return;		return;
}		}
case Instruction::Store: {		case Instruction::Store: {
// Check if the stores are consecutive or if we need to swizzle them.		// Check if the stores are consecutive or if we need to swizzle them.
llvm::Type *ScalarTy = cast<StoreInst>(VL0)->getValueOperand()->getType();		llvm::Type *ScalarTy = cast<StoreInst>(VL0)->getValueOperand()->getType();
// Avoid types that are padded when being allocated as scalars, while		// Avoid types that are padded when being allocated as scalars, while
// being packed together in a vector (such as i1).		// being packed together in a vector (such as i1).
if (DL->getTypeSizeInBits(ScalarTy) !=		if (DL->getTypeSizeInBits(ScalarTy) !=
[... 1,140 lines not shown ...]	if (VectorizableTree.size() == 1 &&
return true;		return true;

if (VectorizableTree.size() != 2)		if (VectorizableTree.size() != 2)
return false;		return false;

// Handle splat and all-constants stores. Also try to vectorize tiny trees		// Handle splat and all-constants stores. Also try to vectorize tiny trees
// with the second gather nodes if they have less scalar operands rather than		// with the second gather nodes if they have less scalar operands rather than
// the initial tree element (may be profitable to shuffle the second gather)		// the initial tree element (may be profitable to shuffle the second gather)
// or they are extractelements, which form shuffle.		// or they are extractelements, which form shuffle.
		RKSimon: Can you transfer some of this explanation up to AreVectorizableGathers? It's not very obvious what is being checked there.
SmallVector<int> Mask;		SmallVector<int> Mask;
if (VectorizableTree[0]->State == TreeEntry::Vectorize &&		if (VectorizableTree[0]->State == TreeEntry::Vectorize &&
AreVectorizableGathers(VectorizableTree[1].get(),		AreVectorizableGathers(VectorizableTree[1].get(),
VectorizableTree[0]->Scalars.size()))		VectorizableTree[0]->Scalars.size()))
return true;		return true;

// Gathering cost would be too much for tiny trees.		// Gathering cost would be too much for tiny trees.
if (VectorizableTree[0]->State == TreeEntry::NeedToGather ||		if (VectorizableTree[0]->State == TreeEntry::NeedToGather ||
[... 421 lines not shown ...]	if (UsedTEs.empty()) {
return None;		return None;
UsedTEs.push_back(SavedVToTEs);		UsedTEs.push_back(SavedVToTEs);
Idx = UsedTEs.size() - 1;		Idx = UsedTEs.size() - 1;
}		}
UsedValuesEntry.try_emplace(V, Idx);		UsedValuesEntry.try_emplace(V, Idx);
}		}
}		}

		if (UsedTEs.empty()) {
		assert(all_of(TE->Scalars, UndefValue::classof) &&
		"Expected vector of undefs only.");
		return None;
		}
		RKSimon: precommit this?

unsigned VF = 0;		unsigned VF = 0;
if (UsedTEs.size() == 1) {		if (UsedTEs.size() == 1) {
// Try to find the perfect match in another gather node at first.		// Try to find the perfect match in another gather node at first.
auto It = find_if(UsedTEs.front(), [TE](const TreeEntry *EntryPtr) {		auto It = find_if(UsedTEs.front(), [TE](const TreeEntry *EntryPtr) {
return EntryPtr->isSame(TE->Scalars);		return EntryPtr->isSame(TE->Scalars);
});		});
if (It != UsedTEs.front().end()) {		if (It != UsedTEs.front().end()) {
Entries.push_back(*It);		Entries.push_back(*It);
[... 2,549 lines not shown ...]
/// +		/// +
/// |		/// |
/// *p =		/// *p =
///		///
class HorizontalReduction {		class HorizontalReduction {
using ReductionOpsType = SmallVector<Value *, 16>;		using ReductionOpsType = SmallVector<Value *, 16>;
using ReductionOpsListType = SmallVector<ReductionOpsType, 2>;		using ReductionOpsListType = SmallVector<ReductionOpsType, 2>;
ReductionOpsListType ReductionOps;		ReductionOpsListType ReductionOps;
SmallVector<Value *, 32> ReducedVals;		SmallVector<SmallVector<Value *>> ReducedVals;
		DenseMap<Value *, Instruction *> ReducedValsToOps;
// Use map vector to make stable output.		// Use map vector to make stable output.
MapVector<Instruction *, Value *> ExtraArgs;		MapVector<Instruction *, Value *> ExtraArgs;
WeakTrackingVH ReductionRoot;		WeakTrackingVH ReductionRoot;
/// The type of reduction operation.		/// The type of reduction operation.
RecurKind RdxKind;		RecurKind RdxKind;

const unsigned INVALID_OPERAND_INDEX = std::numeric_limits<unsigned>::max();

static bool isCmpSelMinMax(Instruction *I) {		static bool isCmpSelMinMax(Instruction *I) {
return match(I, m_Select(m_Cmp(), m_Value(), m_Value())) &&		return match(I, m_Select(m_Cmp(), m_Value(), m_Value())) &&
RecurrenceDescriptor::isMinMaxRecurrenceKind(getRdxKind(I));		RecurrenceDescriptor::isMinMaxRecurrenceKind(getRdxKind(I));
}		}

// And/or are potentially poison-safe logical patterns like:		// And/or are potentially poison-safe logical patterns like:
// select x, y, false		// select x, y, false
// select x, true, y		// select x, true, y
[... 27 lines not shown ...]	static Value *getRdxOperand(Instruction *I, unsigned Index) {
// To make that work with the normal operand processing, we skip the		// To make that work with the normal operand processing, we skip the
// true value operand.		// true value operand.
// TODO: Change the code and data structures to handle this without a hack.		// TODO: Change the code and data structures to handle this without a hack.
if (getRdxKind(I) == RecurKind::Or && isa<SelectInst>(I) && Index == 1)		if (getRdxKind(I) == RecurKind::Or && isa<SelectInst>(I) && Index == 1)
return I->getOperand(2);		return I->getOperand(2);
return I->getOperand(Index);		return I->getOperand(Index);
}		}

/// Checks if the ParentStackElem.first should be marked as a reduction
/// operation with an extra argument or as extra argument itself.
void markExtraArg(std::pair<Instruction *, unsigned> &ParentStackElem,
Value *ExtraArg) {
if (ExtraArgs.count(ParentStackElem.first)) {
ExtraArgs[ParentStackElem.first] = nullptr;
// We ran into something like:
// ParentStackElem.first = ExtraArgs[ParentStackElem.first] + ExtraArg.
// The whole ParentStackElem.first should be considered as an extra value
// in this case.
// Do not perform analysis of remaining operands of ParentStackElem.first
// instruction, this whole instruction is an extra argument.
ParentStackElem.second = INVALID_OPERAND_INDEX;
} else {
// We ran into something like:
// ParentStackElem.first += ... + ExtraArg + ...
ExtraArgs[ParentStackElem.first] = ExtraArg;
}
}

/// Creates reduction operation with the current opcode.		/// Creates reduction operation with the current opcode.
static Value *createOp(IRBuilder<> &Builder, RecurKind Kind, Value *LHS,		static Value *createOp(IRBuilder<> &Builder, RecurKind Kind, Value *LHS,
Value *RHS, const Twine &Name, bool UseSelect) {		Value *RHS, const Twine &Name, bool UseSelect) {
unsigned RdxOpcode = RecurrenceDescriptor::getOpcode(Kind);		unsigned RdxOpcode = RecurrenceDescriptor::getOpcode(Kind);
switch (Kind) {		switch (Kind) {
case RecurKind::Or:		case RecurKind::Or:
if (UseSelect &&		if (UseSelect &&
LHS->getType() == CmpInst::makeCmpResultType(LHS->getType()))		LHS->getType() == CmpInst::makeCmpResultType(LHS->getType()))
[... 68 lines not shown ...]	static Value *createOp(IRBuilder<> &Builder, RecurKind RdxKind, Value *LHS,
}		}
propagateIRFlags(Op, ReductionOps[0]);		propagateIRFlags(Op, ReductionOps[0]);
return Op;		return Op;
}		}

/// Creates reduction operation with the current opcode with the IR flags		/// Creates reduction operation with the current opcode with the IR flags
/// from \p I.		/// from \p I.
static Value *createOp(IRBuilder<> &Builder, RecurKind RdxKind, Value *LHS,		static Value *createOp(IRBuilder<> &Builder, RecurKind RdxKind, Value *LHS,
Value *RHS, const Twine &Name, Instruction *I) {		Value *RHS, const Twine &Name, Value *I) {
auto *SelI = dyn_cast<SelectInst>(I);		auto *SelI = dyn_cast<SelectInst>(I);
Value *Op = createOp(Builder, RdxKind, LHS, RHS, Name, SelI != nullptr);		Value *Op = createOp(Builder, RdxKind, LHS, RHS, Name, SelI != nullptr);
if (SelI && RecurrenceDescriptor::isIntMinMaxRecurrenceKind(RdxKind)) {		if (SelI && RecurrenceDescriptor::isIntMinMaxRecurrenceKind(RdxKind)) {
if (auto *Sel = dyn_cast<SelectInst>(Op))		if (auto *Sel = dyn_cast<SelectInst>(Op))
propagateIRFlags(Sel->getCondition(), SelI->getCondition());		propagateIRFlags(Sel->getCondition(), SelI->getCondition());
}		}
propagateIRFlags(Op, I);		propagateIRFlags(Op, I);
return Op;		return Op;
}		}

static RecurKind getRdxKind(Instruction *I) {		static RecurKind getRdxKind(Value *V) {
assert(I && "Expected instruction for reduction matching");		auto *I = dyn_cast<Instruction>(V);
		if (!I)
		return RecurKind::None;
TargetTransformInfo::ReductionFlags RdxFlags;		TargetTransformInfo::ReductionFlags RdxFlags;
if (match(I, m_Add(m_Value(), m_Value())))		if (match(I, m_Add(m_Value(), m_Value())))
return RecurKind::Add;		return RecurKind::Add;
if (match(I, m_Mul(m_Value(), m_Value())))		if (match(I, m_Mul(m_Value(), m_Value())))
return RecurKind::Mul;		return RecurKind::Mul;
if (match(I, m_And(m_Value(), m_Value())) ||		if (match(I, m_And(m_Value(), m_Value())) ||
match(I, m_LogicalAnd(m_Value(), m_Value())))		match(I, m_LogicalAnd(m_Value(), m_Value())))
return RecurKind::And;		return RecurKind::And;
[... 147 lines not shown ...]	if (Kind == RecurKind::None)
return nullptr;		return nullptr;
return I->getOperand(getFirstOperandIndex(I) + 1);		return I->getOperand(getFirstOperandIndex(I) + 1);
}		}

public:		public:
HorizontalReduction() = default;		HorizontalReduction() = default;

/// Try to find a reduction tree.		/// Try to find a reduction tree.
bool matchAssociativeReduction(PHINode *Phi, Instruction *Inst) {		bool matchAssociativeReduction(PHINode *Phi, Instruction *Inst,
		ScalarEvolution &SE, const DataLayout &DL) {
assert((!Phi || is_contained(Phi->operands(), Inst)) &&		assert((!Phi || is_contained(Phi->operands(), Inst)) &&
"Phi needs to use the binary operator");		"Phi needs to use the binary operator");
assert((isa<BinaryOperator>(Inst) || isa<SelectInst>(Inst) ||		assert((isa<BinaryOperator>(Inst) || isa<SelectInst>(Inst) ||
isa<IntrinsicInst>(Inst)) &&		isa<IntrinsicInst>(Inst)) &&
"Expected binop, select, or intrinsic for reduction matching");		"Expected binop, select, or intrinsic for reduction matching");
RdxKind = getRdxKind(Inst);		RdxKind = getRdxKind(Inst);

// We could have a initial reductions that is not an add.		// We could have a initial reductions that is not an add.
[... 27 lines not shown ...]	bool matchAssociativeReduction(PHINode *Phi, Instruction *Inst,
// Though the ultimate reduction may have multiple uses, its condition must		// Though the ultimate reduction may have multiple uses, its condition must
// have only single use.		// have only single use.
if (auto *Sel = dyn_cast<SelectInst>(Inst))		if (auto *Sel = dyn_cast<SelectInst>(Inst))
if (!Sel->getCondition()->hasOneUse())		if (!Sel->getCondition()->hasOneUse())
return false;		return false;

ReductionRoot = Inst;		ReductionRoot = Inst;

// The opcode for leaf values that we perform a reduction on.		// Iterate through all the operands of the possible reduction tree and
// For example: load(x) + load(y) + load(z) + fptoui(w)		// gather all the reduced values, sorting them by their value id.
// The leaf opcode for 'w' does not match, so we don't include it as a		BasicBlock *BB = Inst->getParent();
// potential candidate for the reduction.		bool IsCmpSelMinMax = isCmpSelMinMax(Inst);
unsigned LeafOpcode = 0;		std::queue<Instruction *> Worklist;
		Worklist.push(Inst);
// Post-order traverse the reduction tree starting at Inst. We only handle		auto &&CheckOperands = [this, IsCmpSelMinMax,
// true trees containing binary operators or selects.		BB](Instruction *TreeN,
SmallVector<std::pair<Instruction *, unsigned>, 32> Stack;		SmallVectorImpl<Value *> &ExtraArgs,
Stack.push_back(std::make_pair(Inst, getFirstOperandIndex(Inst)));		SmallVectorImpl<Value *> &PossibleReducedVals,
		SmallVectorImpl<Instruction *> &ReductionOps) {
		for (int I = getFirstOperandIndex(TreeN),
		End = getNumberOfOperands(TreeN);
		I < End; ++I) {
		Value *EdgeVal = getRdxOperand(TreeN, I);
		ReducedValsToOps.try_emplace(EdgeVal, TreeN);
		auto *EdgeInst = dyn_cast<Instruction>(EdgeVal);
		// Edge has wrong parent - mark as an extra argument.
		if (EdgeInst && !isVectorLikeInstWithConstOps(EdgeInst) &&
		!hasSameParent(EdgeInst, BB)) {
		ExtraArgs.push_back(EdgeVal);
		continue;
		}
		// If the edge is not an instruction, or it is different from main
		// reduction opcode or has too many uses - possible reduced value.
		if (!EdgeInst || getRdxKind(EdgeInst) != RdxKind ||
		!hasRequiredNumberOfUses(IsCmpSelMinMax, EdgeInst) ||
		!isVectorizable(getRdxKind(EdgeInst), EdgeInst)) {
		PossibleReducedVals.push_back(EdgeVal);
		continue;
		}
		ReductionOps.push_back(EdgeInst);
		}
		};
		MapVector<unsigned, MapVector<uintptr_t, SmallVector<Value *>>>
		PossibleReducedVals;
initReductionOps(Inst);		initReductionOps(Inst);
while (!Stack.empty()) {		while (!Worklist.empty()) {
Instruction *TreeN = Stack.back().first;		Instruction *TreeN = Worklist.front();
unsigned EdgeToVisit = Stack.back().second++;		Worklist.pop();
const RecurKind TreeRdxKind = getRdxKind(TreeN);		SmallVector<Value *> Args;
bool IsReducedValue = TreeRdxKind != RdxKind;		SmallVector<Value *> PossibleRedVals;
		SmallVector<Instruction *> PossibleReductionOps;
// Postorder visit.		CheckOperands(TreeN, Args, PossibleRedVals, PossibleReductionOps);
if (IsReducedValue || EdgeToVisit >= getNumberOfOperands(TreeN)) {		// If too many extra args - mark the instruction itself as a reduction
if (IsReducedValue)		// value, not a reduction operation.
ReducedVals.push_back(TreeN);		if (Args.size() < 2) {
else {
auto ExtraArgsIter = ExtraArgs.find(TreeN);
if (ExtraArgsIter != ExtraArgs.end() && !ExtraArgsIter->second) {
// Check if TreeN is an extra argument of its parent operation.
if (Stack.size() <= 1) {
// TreeN can't be an extra argument as it is a root reduction
// operation.
return false;
}
// Yes, TreeN is an extra argument, do not add it to a list of
// reduction operations.
// Stack[Stack.size() - 2] always points to the parent operation.
markExtraArg(Stack[Stack.size() - 2], TreeN);
ExtraArgs.erase(TreeN);
} else
addReductionOps(TreeN);		addReductionOps(TreeN);
		// Add extra args.
		for (Value *V : Args)
		ExtraArgs[TreeN] = V;
		// Add reduction values. The values are sorted for better vetorization
		RKSimon: vectorization
		// results.
		for (Value *V : PossibleRedVals) {
		unsigned Key = V->getValueID() + 1;
		// Sort the loads by the distance between the pointers.
		if (auto *LI = dyn_cast<LoadInst>(V)) {
		bool Found = false;
		for (const auto &LoadData : PossibleReducedVals[Key]) {
		auto *RLI = cast<LoadInst>(LoadData.second.front());
		if (getPointersDiff(RLI->getType(), RLI->getPointerOperand(),
		LI->getType(), LI->getPointerOperand(), DL,
		SE, /*StrictCheck=*/true)) {
		PossibleReducedVals[Key][reinterpret_cast<uintptr_t>(
		RLI->getPointerOperand())]
		.push_back(V);
		RKSimon: Split this to stop clang-format heroics?
		Found = true;
		break;
}		}
// Retract.
Stack.pop_back();
continue;
}		}
		if (!Found)
// Visit operands.		PossibleReducedVals[Key][reinterpret_cast<uintptr_t>(
Value *EdgeVal = getRdxOperand(TreeN, EdgeToVisit);		LI->getPointerOperand())]
auto *EdgeInst = dyn_cast<Instruction>(EdgeVal);		.push_back(V);
		RKSimon: Split this to stop clang-format heroics?
if (!EdgeInst) {		} else if (auto *EI = dyn_cast<ExtractElementInst>(V)) {
// Edge value is not a reduction instruction or a leaf instruction.		// Sort extracts by the vector operands.
// (It may be a constant, function argument, or something else.)		PossibleReducedVals[Key][reinterpret_cast<uintptr_t>(
markExtraArg(Stack.back(), EdgeVal);		EI->getVectorOperand())]
continue;		.push_back(V);
		RKSimon: Split this to stop clang-format heroics?
}		} else if (auto *I = dyn_cast<Instruction>(V)) {
RecurKind EdgeRdxKind = getRdxKind(EdgeInst);		// Sort other instructions just by the opcodes except for CMPInst.
// Continue analysis if the next operand is a reduction operation or		// For CMP also sort by the predicate kind.
// (possibly) a leaf value. If the leaf value opcode is not set,		if (isValidForAlternation(I->getOpcode()) && !isa<CmpInst>(I))
// the first met operation != reduction operation is considered as the		PossibleReducedVals[0][I->getOpcode()].push_back(V);
// leaf opcode.		else if (auto *CI = dyn_cast<CmpInst>(I))
// Only handle trees in the current basic block.		PossibleReducedVals[Key]
// Each tree node needs to have minimal number of users except for the		[hash_combine(hash_value(I->getOpcode()),
// ultimate reduction.		hash_value(CI->getPredicate()))]
const bool IsRdxInst = EdgeRdxKind == RdxKind;		.push_back(V);
if (EdgeInst != Phi && EdgeInst != Inst &&		else
hasSameParent(EdgeInst, Inst->getParent()) &&		PossibleReducedVals[Key][I->getOpcode()].push_back(V);
hasRequiredNumberOfUses(isCmpSelMinMax(Inst), EdgeInst) &&		} else {
(!LeafOpcode || LeafOpcode == EdgeInst->getOpcode() || IsRdxInst)) {		PossibleReducedVals[Key][0].push_back(V);
if (IsRdxInst) {
// We need to be able to reassociate the reduction operations.
if (!isVectorizable(EdgeRdxKind, EdgeInst)) {
// I is an extra argument for TreeN (its parent operation).
markExtraArg(Stack.back(), EdgeInst);
continue;
}
} else if (!LeafOpcode) {
LeafOpcode = EdgeInst->getOpcode();
}		}
Stack.push_back(
std::make_pair(EdgeInst, getFirstOperandIndex(EdgeInst)));
continue;
}		}
// I is an extra argument for TreeN (its parent operation).		for (Instruction *I : PossibleReductionOps)
markExtraArg(Stack.back(), EdgeInst);		Worklist.push(I);
		} else {
		PossibleReducedVals[0][TreeN->getOpcode()].push_back(TreeN);
		}
		}
		auto PossibleReducedValsVect = PossibleReducedVals.takeVector();
		// Sort the reduced values by number of same/alternate opcode and/or pointer
		// operand.
		auto Cmp = [](ArrayRef<Value *> P1, ArrayRef<Value *> P2) {
		return P1.size() < P2.size();
		};
		std::priority_queue<SmallVector<Value *>, SmallVector<SmallVector<Value *>>,
		decltype(Cmp)>
		OrderedVals(Cmp);
		// Sort values by total number of values kinds.
		for (auto &PossibleReducedVals : PossibleReducedValsVect) {
		auto PossibleRedVals = PossibleReducedVals.second.takeVector();
		stable_sort(PossibleRedVals, [](const auto &P1, const auto &P2) {
		return P1.second.size() > P2.second.size();
		});
		for (auto &Data : PossibleRedVals)
		OrderedVals.emplace(Data.second);
		}
		while (!OrderedVals.empty()) {
		ReducedVals.emplace_back(OrderedVals.top().rbegin(),
		OrderedVals.top().rend());
		OrderedVals.pop();
}		}
return true;		return true;
}		}
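
The grouping step at the end of the matcher above can be pictured with a simplified, standalone sketch. It assumes a single bucketing level keyed by opcode name (the patch uses a two-level map and LLVM containers), and `Val` and `groupBySize` are hypothetical names used only for illustration. It does not reproduce the per-group reversal done with rbegin()/rend().

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-in for an LLVM Value: an opcode name plus an id.
struct Val {
  std::string Opcode;
  int Id;
};

// Bucket values by opcode, then emit the buckets largest-first, so the
// most populous group of similar values is tried for vectorization first.
std::vector<std::vector<Val>> groupBySize(const std::vector<Val> &Vals) {
  std::map<std::string, std::vector<Val>> Buckets;
  for (const Val &V : Vals)
    Buckets[V.Opcode].push_back(V);

  // Max-heap on group size, analogous to the Cmp lambda in the patch.
  auto Cmp = [](const std::vector<Val> &A, const std::vector<Val> &B) {
    return A.size() < B.size();
  };
  std::priority_queue<std::vector<Val>, std::vector<std::vector<Val>>,
                      decltype(Cmp)>
      Ordered(Cmp);
  for (const auto &KV : Buckets)
    Ordered.push(KV.second);

  std::vector<std::vector<Val>> Result;
  while (!Ordered.empty()) {
    Result.push_back(Ordered.top());
    Ordered.pop();
  }
  return Result;
}
```

With three loads, two muls and one add as input, the sketch returns the load group first, then the mul group, then the add group.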

/// Attempt to vectorize the tree found by matchAssociativeReduction.		/// Attempt to vectorize the tree found by matchAssociativeReduction.
Value *tryToReduce(BoUpSLP &V, TargetTransformInfo *TTI) {		Value *tryToReduce(BoUpSLP &V, TargetTransformInfo *TTI) {
// If there are a sufficient number of reduction values, reduce		// If there are a sufficient number of reduction values, reduce
// to a nearby power-of-2. We can safely generate oversized		// to a nearby power-of-2. We can safely generate oversized
// vectors and rely on the backend to split them to legal sizes.		// vectors and rely on the backend to split them to legal sizes.
unsigned NumReducedVals = ReducedVals.size();		unsigned NumReducedVals = std::accumulate(
		ReducedVals.begin(), ReducedVals.end(), 0,
		[](int Num, ArrayRef<Value *> Vals) { return Num += Vals.size(); });
if (NumReducedVals < 4)		if (NumReducedVals < 4)
return nullptr;		return nullptr;

// Intersect the fast-math-flags from all reduction operations.
FastMathFlags RdxFMF;
RdxFMF.set();
for (ReductionOpsType &RdxOp : ReductionOps) {
for (Value *RdxVal : RdxOp) {
if (auto *FPMO = dyn_cast<FPMathOperator>(RdxVal))
RdxFMF &= FPMO->getFastMathFlags();
}
}

IRBuilder<> Builder(cast<Instruction>(ReductionRoot));		IRBuilder<> Builder(cast<Instruction>(ReductionRoot));
Builder.setFastMathFlags(RdxFMF);

		// Track the reduced values in case if they are replaced by extractelement
		// because of the vectorization.
		DenseMap<Value *, WeakTrackingVH> TrackedVals;
BoUpSLP::ExtraValueToDebugLocsMap ExternallyUsedValues;		BoUpSLP::ExtraValueToDebugLocsMap ExternallyUsedValues;
// The same extra argument may be used several times, so log each attempt		// The same extra argument may be used several times, so log each attempt
// to use it.		// to use it.
for (const std::pair<Instruction *, Value *> &Pair : ExtraArgs) {		for (const std::pair<Instruction *, Value *> &Pair : ExtraArgs) {
assert(Pair.first && "DebugLoc must be set.");		assert(Pair.first && "DebugLoc must be set.");
ExternallyUsedValues[Pair.second].push_back(Pair.first);		ExternallyUsedValues[Pair.second].push_back(Pair.first);
		TrackedVals.try_emplace(Pair.second, Pair.second);
}		}

// The compare instruction of a min/max is the insertion point for new		// The compare instruction of a min/max is the insertion point for new
// instructions and may be replaced with a new compare instruction.		// instructions and may be replaced with a new compare instruction.
auto getCmpForMinMaxReduction = [](Instruction *RdxRootInst) {		auto &&GetCmpForMinMaxReduction = [](Instruction *RdxRootInst) {
assert(isa<SelectInst>(RdxRootInst) &&		assert(isa<SelectInst>(RdxRootInst) &&
"Expected min/max reduction to have select root instruction");		"Expected min/max reduction to have select root instruction");
Value *ScalarCond = cast<SelectInst>(RdxRootInst)->getCondition();		Value *ScalarCond = cast<SelectInst>(RdxRootInst)->getCondition();
assert(isa<Instruction>(ScalarCond) &&		assert(isa<Instruction>(ScalarCond) &&
"Expected min/max reduction to have compare condition");		"Expected min/max reduction to have compare condition");
return cast<Instruction>(ScalarCond);		return cast<Instruction>(ScalarCond);
};		};

// The reduction root is used as the insertion point for new instructions,		// The reduction root is used as the insertion point for new instructions,
// so set it as externally used to prevent it from being deleted.		// so set it as externally used to prevent it from being deleted.
ExternallyUsedValues[ReductionRoot];		ExternallyUsedValues[ReductionRoot];
SmallVector<Value *, 16> IgnoreList;		SmallVector<Value *> IgnoreList;
for (ReductionOpsType &RdxOp : ReductionOps)		for (ReductionOpsType &RdxOps : ReductionOps)
IgnoreList.append(RdxOp.begin(), RdxOp.end());		for (Value *RdxOp : RdxOps) {
		if (!RdxOp)
		continue;
		IgnoreList.push_back(RdxOp);
		}

unsigned ReduxWidth = PowerOf2Floor(NumReducedVals);		// Need to track reduced vals, they may be changed during vectorization of
if (NumReducedVals > ReduxWidth) {		// subvectors.
// In the loop below, we are building a tree based on a window of		for (ArrayRef<Value *> Candidates : ReducedVals)
// 'ReduxWidth' values.		for (Value *V : Candidates)
// If the operands of those values have common traits (compare predicate,		TrackedVals.try_emplace(V, V);
// constant operand, etc), then we want to group those together to
// minimize the cost of the reduction.		DenseMap<Value *, unsigned> VectorizedVals;
		Value *VectorizedTree = nullptr;
// TODO: This should be extended to count common operands for		// Try to vectorize elements based on their type.
// compares and binops.		for (unsigned I = 0, E = ReducedVals.size(); I < E; ++I) {
		ArrayRef<Value *> OrigReducedVals = ReducedVals[I];
// Step 1: Count the number of times each compare predicate occurs.		InstructionsState S = getSameOpcode(OrigReducedVals);
SmallDenseMap<unsigned, unsigned> PredCountMap;		SmallVector<Value *> Candidates;
for (Value *RdxVal : ReducedVals) {		DenseMap<Value *, Value *> TrackedToOrig;
CmpInst::Predicate Pred;		for (unsigned Cnt = 0, Sz = OrigReducedVals.size(); Cnt < Sz; ++Cnt) {
if (match(RdxVal, m_Cmp(Pred, m_Value(), m_Value())))		Value *RdxVal = TrackedVals.find(OrigReducedVals[Cnt])->second;
++PredCountMap[Pred];		// Check if the reduction value was not overridden by the extractelement
		// instruction because of the vectorization and exclude it, if it is not
		// compatible with other values.
		if (auto *Inst = dyn_cast<Instruction>(RdxVal))
		if (!S.getOpcode() || !S.isOpcodeOrAlt(Inst))
		continue;
		Candidates.push_back(RdxVal);
		TrackedToOrig.try_emplace(RdxVal, OrigReducedVals[Cnt]);
		}
		bool ShuffledExtracts = false;
		// Try to handle shuffled extractelements.
		if (S.getOpcode() == Instruction::ExtractElement && !S.isAltShuffle() &&
		I + 1 < E) {
		InstructionsState S = getSameOpcode(ReducedVals[I + 1]);
		RKSimon: Can you use another variable name to avoid Wshadow?
		if (S.getOpcode() == Instruction::ExtractElement && !S.isAltShuffle()) {
		SmallVector<Value *> CommonCandidates(Candidates);
		for (unsigned Cnt = 0, Sz = ReducedVals[I + 1].size(); Cnt < Sz;
		++Cnt) {
		Value *RdxVal = TrackedVals.find(ReducedVals[I + 1][Cnt])->second;
		// Check if the reduction value was not overridden by the
		// extractelement instruction because of the vectorization and
		// exclude it, if it is not compatible with other values.
		if (auto *Inst = dyn_cast<Instruction>(RdxVal))
		if (!S.getOpcode() || !S.isOpcodeOrAlt(Inst))
		continue;
		CommonCandidates.push_back(RdxVal);
		TrackedToOrig.try_emplace(RdxVal, ReducedVals[I + 1][Cnt]);
}		}
// Step 2: Sort the values so the most common predicates come first.		SmallVector<int> Mask;
stable_sort(ReducedVals, [&PredCountMap](Value *A, Value *B) {		if (isFixedVectorShuffle(CommonCandidates, Mask)) {
CmpInst::Predicate PredA, PredB;		++I;
if (match(A, m_Cmp(PredA, m_Value(), m_Value())) &&		Candidates.swap(CommonCandidates);
match(B, m_Cmp(PredB, m_Value(), m_Value()))) {		ShuffledExtracts = true;
return PredCountMap[PredA] > PredCountMap[PredB];
}		}
return false;
});
}		}
		}
		unsigned NumReducedVals = Candidates.size();
		if (NumReducedVals < 4)
		continue;

Value *VectorizedTree = nullptr;		unsigned ReduxWidth = PowerOf2Floor(NumReducedVals);
unsigned i = 0;		unsigned Start = 0;
while (i < NumReducedVals - ReduxWidth + 1 && ReduxWidth > 2) {		unsigned i = Start;
ArrayRef<Value *> VL(&ReducedVals[i], ReduxWidth);		// Restarts vectorization attempt with lower vector factor.
		auto &&AdjustReducedVals = [&i, &Start, &ReduxWidth, NumReducedVals]() {
		if (ReduxWidth == 4 || i >= NumReducedVals - ReduxWidth + 1) {
		++Start;
		ReduxWidth = PowerOf2Floor(NumReducedVals - Start) * 2;
		}
		i = Start;
		ReduxWidth /= 2;
		};
		while (i < NumReducedVals - ReduxWidth + 1 && ReduxWidth >= 4) {
		ArrayRef<Value *> VL(std::next(Candidates.begin(), i), ReduxWidth);
V.buildTree(VL, IgnoreList);		V.buildTree(VL, IgnoreList);
if (V.isTreeTinyAndNotFullyVectorizable(/*ForReduction=*/true))		if (V.isTreeTinyAndNotFullyVectorizable(/*ForReduction=*/true)) {
break;		AdjustReducedVals();
if (V.isLoadCombineReductionCandidate(RdxKind))		continue;
break;		}
		if (V.isLoadCombineReductionCandidate(RdxKind)) {
		AdjustReducedVals();
		continue;
		}
V.reorderTopToBottom();		V.reorderTopToBottom();
		// No need to reorder the root node at all.
V.reorderBottomToTop(/*IgnoreReorder=*/true);		V.reorderBottomToTop(/*IgnoreReorder=*/true);
V.buildExternalUses(ExternallyUsedValues);		// Keep extracted other reduction values, if they are used in the
		// vectorization trees.
// For a poison-safe boolean logic reduction, do not replace select		BoUpSLP::ExtraValueToDebugLocsMap LocalExternallyUsedValues(
// instructions with logic ops. All reduced values will be frozen (see		ExternallyUsedValues);
// below) to prevent leaking poison.		for (unsigned Cnt = 0, Sz = ReducedVals.size(); Cnt < Sz; ++Cnt) {
if (isa<SelectInst>(ReductionRoot) &&		if (Cnt == I || (ShuffledExtracts && Cnt == I - 1))
isBoolLogicOp(cast<Instruction>(ReductionRoot)) &&		continue;
NumReducedVals != ReduxWidth)		for_each(ReducedVals[Cnt],
break;		[&LocalExternallyUsedValues, &TrackedVals](Value *V) {
		if (isa<Instruction>(V))
		LocalExternallyUsedValues[TrackedVals[V]];
		});
		}
		for (unsigned Cnt = 0; Cnt < NumReducedVals; ++Cnt) {
		if (Cnt >= i && Cnt < i + ReduxWidth)
		continue;
		if (VectorizedVals.count(Candidates[Cnt]))
		continue;
		LocalExternallyUsedValues[Candidates[Cnt]];
		}
		V.buildExternalUses(LocalExternallyUsedValues);

V.computeMinimumValueSizes();		V.computeMinimumValueSizes();

		// Intersect the fast-math-flags from all reduction operations.
		FastMathFlags RdxFMF;
		RdxFMF.set();
		for (Value *RdxVal : VL) {
		if (auto *FPMO = dyn_cast<FPMathOperator>(
		ReducedValsToOps.find(RdxVal)->second))
		RdxFMF &= FPMO->getFastMathFlags();
		}
// Estimate cost.		// Estimate cost.
InstructionCost TreeCost =		InstructionCost TreeCost = V.getTreeCost(VL);
V.getTreeCost(makeArrayRef(&ReducedVals[i], ReduxWidth));
InstructionCost ReductionCost =		InstructionCost ReductionCost =
getReductionCost(TTI, ReducedVals[i], ReduxWidth, RdxFMF);		getReductionCost(TTI, VL, ReduxWidth, RdxFMF);
InstructionCost Cost = TreeCost + ReductionCost;		InstructionCost Cost = TreeCost + ReductionCost;
if (!Cost.isValid()) {		if (!Cost.isValid()) {
LLVM_DEBUG(dbgs() << "Encountered invalid baseline cost.\n");		LLVM_DEBUG(dbgs() << "Encountered invalid baseline cost.\n");
return nullptr;		return nullptr;
}		}
if (Cost >= -SLPCostThreshold) {		if (Cost >= -SLPCostThreshold) {
V.getORE()->emit([&]() {		V.getORE()->emit([&]() {
return OptimizationRemarkMissed(SV_NAME, "HorSLPNotBeneficial",		return OptimizationRemarkMissed(SV_NAME, "HorSLPNotBeneficial",
cast<Instruction>(VL[0]))		cast<Instruction>(VL[0]))
<< "Vectorizing horizontal reduction is possible"		<< "Vectorizing horizontal reduction is possible"
<< "but not beneficial with cost " << ore::NV("Cost", Cost)		<< "but not beneficial with cost " << ore::NV("Cost", Cost)
<< " and threshold "		<< " and threshold "
<< ore::NV("Threshold", -SLPCostThreshold);		<< ore::NV("Threshold", -SLPCostThreshold);
});		});
break;		AdjustReducedVals();
		continue;
}		}

LLVM_DEBUG(dbgs() << "SLP: Vectorizing horizontal reduction at cost:"		LLVM_DEBUG(dbgs() << "SLP: Vectorizing horizontal reduction at cost:"
<< Cost << ". (HorRdx)\n");		<< Cost << ". (HorRdx)\n");
V.getORE()->emit([&]() {		V.getORE()->emit([&]() {
return OptimizationRemark(SV_NAME, "VectorizedHorizontalReduction",		return OptimizationRemark(SV_NAME, "VectorizedHorizontalReduction",
cast<Instruction>(VL[0]))		cast<Instruction>(VL[0]))
<< "Vectorized horizontal reduction with cost "		<< "Vectorized horizontal reduction with cost "
<< ore::NV("Cost", Cost) << " and with tree size "		<< ore::NV("Cost", Cost) << " and with tree size "
<< ore::NV("TreeSize", V.getTreeSize());		<< ore::NV("TreeSize", V.getTreeSize());
});		});

		Builder.setFastMathFlags(RdxFMF);

// Vectorize a tree.		// Vectorize a tree.
DebugLoc Loc = cast<Instruction>(ReducedVals[i])->getDebugLoc();		Value *VectorizedRoot = V.vectorizeTree(LocalExternallyUsedValues);
Value *VectorizedRoot = V.vectorizeTree(ExternallyUsedValues);

// Emit a reduction. If the root is a select (min/max idiom), the insert		// Emit a reduction. If the root is a select (min/max idiom), the insert
// point is the compare condition of that select.		// point is the compare condition of that select.
Instruction *RdxRootInst = cast<Instruction>(ReductionRoot);		Instruction *RdxRootInst = cast<Instruction>(ReductionRoot);
if (isCmpSelMinMax(RdxRootInst))		if (isCmpSelMinMax(RdxRootInst))
Builder.SetInsertPoint(getCmpForMinMaxReduction(RdxRootInst));		Builder.SetInsertPoint(GetCmpForMinMaxReduction(RdxRootInst));
else		else
Builder.SetInsertPoint(RdxRootInst);		Builder.SetInsertPoint(RdxRootInst);

// To prevent poison from leaking across what used to be sequential, safe,		// To prevent poison from leaking across what used to be sequential,
// scalar boolean logic operations, the reduction operand must be frozen.		// safe, scalar boolean logic operations, the reduction operand must be
		// frozen.
if (isa<SelectInst>(RdxRootInst) && isBoolLogicOp(RdxRootInst))		if (isa<SelectInst>(RdxRootInst) && isBoolLogicOp(RdxRootInst))
VectorizedRoot = Builder.CreateFreeze(VectorizedRoot);		VectorizedRoot = Builder.CreateFreeze(VectorizedRoot);

Value *ReducedSubTree =		Value *ReducedSubTree =
emitReduction(VectorizedRoot, Builder, ReduxWidth, TTI);		emitReduction(VectorizedRoot, Builder, ReduxWidth, TTI);

if (!VectorizedTree) {		if (!VectorizedTree) {
// Initialize the final value in the reduction.		// Initialize the final value in the reduction.
VectorizedTree = ReducedSubTree;		VectorizedTree = ReducedSubTree;
} else {		} else {
// Update the final value in the reduction.		// Update the final value in the reduction.
Builder.SetCurrentDebugLocation(Loc);		Builder.SetCurrentDebugLocation(
		cast<Instruction>(ReductionOps.front().front())->getDebugLoc());
VectorizedTree = createOp(Builder, RdxKind, VectorizedTree,		VectorizedTree = createOp(Builder, RdxKind, VectorizedTree,
ReducedSubTree, "op.rdx", ReductionOps);		ReducedSubTree, "op.rdx", ReductionOps);
}		}
		// Count vectorized reduced values to exclude them from final reduction.
		for (Value *V : VL)
		++VectorizedVals.try_emplace(TrackedToOrig.find(V)->second, 0)
		.first->getSecond();
i += ReduxWidth;		i += ReduxWidth;
		Start = i;
ReduxWidth = PowerOf2Floor(NumReducedVals - i);		ReduxWidth = PowerOf2Floor(NumReducedVals - i);
}		}
		}
if (VectorizedTree) {		if (VectorizedTree) {
// Finish the reduction.		// Need to add extra arguments and not vectorized possible reduction
for (; i < NumReducedVals; ++i) {		// values.
auto *I = cast<Instruction>(ReducedVals[i]);		// Try to avoid dependencies between the scalar reductions.
Builder.SetCurrentDebugLocation(I->getDebugLoc());		auto &&FinalGen =
VectorizedTree =		[this, &Builder,
createOp(Builder, RdxKind, VectorizedTree, I, "", ReductionOps);		&TrackedVals](ArrayRef<std::pair<Instruction *, Value *>> InstVals) {
		unsigned Sz = InstVals.size();
		SmallVector<std::pair<Instruction *, Value *>> ExtraReds(Sz / 2 +
		Sz % 2);
		for (unsigned I = 0, E = (Sz / 2) * 2; I < E; I += 2) {
		Instruction *RedOp = InstVals[I + 1].first;
		Builder.SetCurrentDebugLocation(RedOp->getDebugLoc());
		ReductionOpsListType Ops;
		if (auto *Sel = dyn_cast<SelectInst>(RedOp))
		Ops.emplace_back().push_back(Sel->getCondition());
		Ops.emplace_back().push_back(RedOp);
		Value *RdxVal1 = InstVals[I].second;
		Value *StableRdxVal1 = RdxVal1;
		auto It1 = TrackedVals.find(RdxVal1);
		if (It1 != TrackedVals.end())
		StableRdxVal1 = It1->second;
		Value *RdxVal2 = InstVals[I + 1].second;
		Value *StableRdxVal2 = RdxVal2;
		auto It2 = TrackedVals.find(RdxVal2);
		if (It2 != TrackedVals.end())
		StableRdxVal2 = It2->second;
		Value *ExtraRed = createOp(Builder, RdxKind, StableRdxVal1,
		StableRdxVal2, "op.rdx", Ops);
		ExtraReds[I / 2] = std::make_pair(InstVals[I].first, ExtraRed);
		}
		if (Sz % 2 == 1)
		ExtraReds[Sz / 2] = InstVals.back();
		return ExtraReds;
		};
		SmallVector<std::pair<Instruction *, Value *>> ExtraReductions;
		// Final reduction of not vectorized reduced values.
		for (unsigned I = 0, E = ReducedVals.size(); I < E; ++I) {
		RKSimon: for-range loop?
		ArrayRef<Value *> Candidates = ReducedVals[I];
		for (unsigned Cnt = 0, NumReducedVals = Candidates.size();
		Cnt < NumReducedVals; ++Cnt) {
		RKSimon: for-range loop?
		Value *RdxVal = Candidates[Cnt];
		auto It = VectorizedVals.find(RdxVal);
		if (It != VectorizedVals.end()) {
		--It->getSecond();
		if (It->second == 0)
		VectorizedVals.erase(It);
		continue;
}		}
for (auto &Pair : ExternallyUsedValues) {		Instruction *RedOp = ReducedValsToOps.find(RdxVal)->second;
		ExtraReductions.emplace_back(RedOp, RdxVal);
		}
		}
		for (const auto &Pair : ExtraArgs) {
// Add each externally used value to the final reduction.		// Add each externally used value to the final reduction.
for (auto *I : Pair.second) {		ExtraReductions.emplace_back(Pair.first, Pair.second);
Builder.SetCurrentDebugLocation(I->getDebugLoc());
VectorizedTree = createOp(Builder, RdxKind, VectorizedTree,
Pair.first, "op.extra", I);
}		}
		// Iterate through all not-vectorized reduction values/extra arguments.
		while (ExtraReductions.size() > 1) {
		SmallVector<std::pair<Instruction *, Value *>> NewReds =
		FinalGen(ExtraReductions);
		ExtraReductions.swap(NewReds);
		}
		// Final reduction.
		if (ExtraReductions.size() == 1) {
		Instruction *RedOp = ExtraReductions.back().first;
		Builder.SetCurrentDebugLocation(RedOp->getDebugLoc());
		ReductionOpsListType Ops;
		if (auto *Sel = dyn_cast<SelectInst>(RedOp))
		Ops.emplace_back().push_back(Sel->getCondition());
		Ops.emplace_back().push_back(RedOp);
		Value *RdxVal = ExtraReductions.back().second;
		Value *StableRdxVal = RdxVal;
		auto It = TrackedVals.find(RdxVal);
		if (It != TrackedVals.end())
		StableRdxVal = It->second;
		VectorizedTree = createOp(Builder, RdxKind, VectorizedTree,
		StableRdxVal, "op.rdx", Ops);
}		}

ReductionRoot->replaceAllUsesWith(VectorizedTree);		ReductionRoot->replaceAllUsesWith(VectorizedTree);

// Mark all scalar reduction ops for deletion, they are replaced by the		// Mark all scalar reduction ops for deletion, they are replaced by the
// vector reductions.		// vector reductions.
V.eraseInstructions(IgnoreList);		V.eraseInstructions(IgnoreList);
}		}
return VectorizedTree;		return VectorizedTree;
}		}
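
The retry logic built around AdjustReducedVals in tryToReduce above can be followed more easily in isolation. The sketch below is a standalone approximation: `enumerateWindows`, `powerOf2Floor` and the `TryWindow` callback are hypothetical names, and the callback stands in for the tree-building and cost checks. It records each (start, width) window that would be attempted.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Largest power of two that is <= N, or 0 when N == 0.
unsigned powerOf2Floor(unsigned N) {
  if (N == 0)
    return 0;
  unsigned P = 1;
  while (P <= N / 2)
    P *= 2;
  return P;
}

// Returns the (start, width) windows attempted, in order. TryWindow
// reports whether vectorizing that window would have succeeded.
std::vector<std::pair<unsigned, unsigned>>
enumerateWindows(unsigned NumVals,
                 const std::function<bool(unsigned, unsigned)> &TryWindow) {
  std::vector<std::pair<unsigned, unsigned>> Attempts;
  unsigned Width = powerOf2Floor(NumVals);
  unsigned Start = 0;
  unsigned I = Start;
  // On failure: halve the width, or, once width 4 has failed (or the
  // window no longer fits), move the start forward by one and retry
  // with the widest power of two that still fits.
  auto Adjust = [&]() {
    if (Width == 4 || I >= NumVals - Width + 1) {
      ++Start;
      Width = powerOf2Floor(NumVals - Start) * 2;
    }
    I = Start;
    Width /= 2;
  };
  while (I < NumVals - Width + 1 && Width >= 4) {
    Attempts.emplace_back(I, Width);
    if (!TryWindow(I, Width)) {
      Adjust();
      continue;
    }
    // Success: consume the window and continue with the remainder.
    I += Width;
    Start = I;
    Width = powerOf2Floor(NumVals - I);
  }
  return Attempts;
}
```

The visible difference from the old code is that a failed attempt no longer ends the search (the old loop used break). It falls back to a narrower or shifted window instead.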

unsigned numReductionValues() const { return ReducedVals.size(); }		unsigned numReductionValues() const { return ReducedVals.size(); }
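
The FinalGen lambda in tryToReduce combines the leftover scalars pairwise rather than chaining them onto the accumulated result one by one, which keeps the dependency chain short. A minimal sketch of that shape follows, using plain integer addition in place of createOp; `combinePairs` and `finishReduction` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// One round: combine neighbours (0,1), (2,3), ...; an odd trailing
// element is carried over unchanged to the next round.
std::vector<int> combinePairs(const std::vector<int> &Vals) {
  std::vector<int> Out;
  Out.reserve(Vals.size() / 2 + Vals.size() % 2);
  for (std::size_t I = 0, E = (Vals.size() / 2) * 2; I < E; I += 2)
    Out.push_back(Vals[I] + Vals[I + 1]);
  if (Vals.size() % 2 == 1)
    Out.push_back(Vals.back());
  return Out;
}

// Repeat rounds until at most one leftover remains, then fold it into
// the value produced by the vectorized part of the reduction.
int finishReduction(int VectorizedTree, std::vector<int> Extra) {
  while (Extra.size() > 1)
    Extra = combinePairs(Extra);
  if (Extra.size() == 1)
    VectorizedTree += Extra.back();
  return VectorizedTree;
}
```

For five leftovers this takes three rounds of pairing and one final operation, instead of a chain of five dependent operations.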

private:		private:
/// Calculate the cost of a reduction.		/// Calculate the cost of a reduction.
InstructionCost getReductionCost(TargetTransformInfo *TTI,		InstructionCost getReductionCost(TargetTransformInfo *TTI,
Value *FirstReducedVal, unsigned ReduxWidth,		ArrayRef<Value *> ReducedVals,
FastMathFlags FMF) {		unsigned ReduxWidth, FastMathFlags FMF) {
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;		TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
		Value *FirstReducedVal = ReducedVals.front();
Type *ScalarTy = FirstReducedVal->getType();		Type *ScalarTy = FirstReducedVal->getType();
FixedVectorType *VectorTy = FixedVectorType::get(ScalarTy, ReduxWidth);		FixedVectorType *VectorTy = FixedVectorType::get(ScalarTy, ReduxWidth);
InstructionCost VectorCost, ScalarCost;		InstructionCost VectorCost = 0, ScalarCost;
		// If all of the reduced values are constant, the vector cost is 0, since
		// the reduction value can be calculated at the compile time.
		bool AllConsts = all_of(ReducedVals, isConstant);
switch (RdxKind) {		switch (RdxKind) {
case RecurKind::Add:		case RecurKind::Add:
case RecurKind::Mul:		case RecurKind::Mul:
case RecurKind::Or:		case RecurKind::Or:
case RecurKind::And:		case RecurKind::And:
case RecurKind::Xor:		case RecurKind::Xor:
case RecurKind::FAdd:		case RecurKind::FAdd:
case RecurKind::FMul: {		case RecurKind::FMul: {
unsigned RdxOpcode = RecurrenceDescriptor::getOpcode(RdxKind);		unsigned RdxOpcode = RecurrenceDescriptor::getOpcode(RdxKind);
		if (!AllConsts)
VectorCost =		VectorCost =
TTI->getArithmeticReductionCost(RdxOpcode, VectorTy, FMF, CostKind);		TTI->getArithmeticReductionCost(RdxOpcode, VectorTy, FMF, CostKind);
ScalarCost = TTI->getArithmeticInstrCost(RdxOpcode, ScalarTy, CostKind);		ScalarCost = TTI->getArithmeticInstrCost(RdxOpcode, ScalarTy, CostKind);
break;		break;
}		}
case RecurKind::FMax:		case RecurKind::FMax:
case RecurKind::FMin: {		case RecurKind::FMin: {
auto *SclCondTy = CmpInst::makeCmpResultType(ScalarTy);		auto *SclCondTy = CmpInst::makeCmpResultType(ScalarTy);
auto *VecCondTy = cast<VectorType>(CmpInst::makeCmpResultType(VectorTy));		if (!AllConsts) {
		auto *VecCondTy =
		cast<VectorType>(CmpInst::makeCmpResultType(VectorTy));
VectorCost = TTI->getMinMaxReductionCost(VectorTy, VecCondTy,		VectorCost = TTI->getMinMaxReductionCost(VectorTy, VecCondTy,
/*unsigned=*/false, CostKind);		/*unsigned=*/false, CostKind);
		}
CmpInst::Predicate RdxPred = getMinMaxReductionPredicate(RdxKind);		CmpInst::Predicate RdxPred = getMinMaxReductionPredicate(RdxKind);
ScalarCost = TTI->getCmpSelInstrCost(Instruction::FCmp, ScalarTy,		ScalarCost = TTI->getCmpSelInstrCost(Instruction::FCmp, ScalarTy,
SclCondTy, RdxPred, CostKind) +		SclCondTy, RdxPred, CostKind) +
TTI->getCmpSelInstrCost(Instruction::Select, ScalarTy,		TTI->getCmpSelInstrCost(Instruction::Select, ScalarTy,
SclCondTy, RdxPred, CostKind);		SclCondTy, RdxPred, CostKind);
break;		break;
}		}
case RecurKind::SMax:		case RecurKind::SMax:
case RecurKind::SMin:		case RecurKind::SMin:
case RecurKind::UMax:		case RecurKind::UMax:
case RecurKind::UMin: {		case RecurKind::UMin: {
auto *SclCondTy = CmpInst::makeCmpResultType(ScalarTy);		auto *SclCondTy = CmpInst::makeCmpResultType(ScalarTy);
auto *VecCondTy = cast<VectorType>(CmpInst::makeCmpResultType(VectorTy));		if (!AllConsts) {
		auto *VecCondTy =
		cast<VectorType>(CmpInst::makeCmpResultType(VectorTy));
bool IsUnsigned =		bool IsUnsigned =
RdxKind == RecurKind::UMax || RdxKind == RecurKind::UMin;		RdxKind == RecurKind::UMax || RdxKind == RecurKind::UMin;
VectorCost = TTI->getMinMaxReductionCost(VectorTy, VecCondTy, IsUnsigned,		VectorCost = TTI->getMinMaxReductionCost(VectorTy, VecCondTy,
CostKind);		IsUnsigned, CostKind);
		}
CmpInst::Predicate RdxPred = getMinMaxReductionPredicate(RdxKind);		CmpInst::Predicate RdxPred = getMinMaxReductionPredicate(RdxKind);
ScalarCost = TTI->getCmpSelInstrCost(Instruction::ICmp, ScalarTy,		ScalarCost = TTI->getCmpSelInstrCost(Instruction::ICmp, ScalarTy,
SclCondTy, RdxPred, CostKind) +		SclCondTy, RdxPred, CostKind) +
TTI->getCmpSelInstrCost(Instruction::Select, ScalarTy,		TTI->getCmpSelInstrCost(Instruction::Select, ScalarTy,
SclCondTy, RdxPred, CostKind);		SclCondTy, RdxPred, CostKind);
break;		break;
}		}
default:		default:
/// attempted.		/// attempted.
/// \returns true if a horizontal reduction was matched and reduced or operands		/// \returns true if a horizontal reduction was matched and reduced or operands
/// of one of the binary instruction were vectorized.		/// of one of the binary instruction were vectorized.
/// \returns false if a horizontal reduction was not matched (or not possible)		/// \returns false if a horizontal reduction was not matched (or not possible)
/// or no vectorization of any binary operation feeding \a Root instruction was		/// or no vectorization of any binary operation feeding \a Root instruction was
/// performed.		/// performed.
static bool tryToVectorizeHorReductionOrInstOperands(		static bool tryToVectorizeHorReductionOrInstOperands(
PHINode *P, Instruction *Root, BasicBlock *BB, BoUpSLP &R,		PHINode *P, Instruction *Root, BasicBlock *BB, BoUpSLP &R,
TargetTransformInfo *TTI,		TargetTransformInfo *TTI, ScalarEvolution &SE, const DataLayout &DL,
const function_ref<bool(Instruction *, BoUpSLP &)> Vectorize) {		const function_ref<bool(Instruction *, BoUpSLP &)> Vectorize) {
if (!ShouldVectorizeHor)		if (!ShouldVectorizeHor)
return false;		return false;

if (!Root)		if (!Root)
return false;		return false;

if (Root->getParent() != BB || isa<PHINode>(Root))		if (Root->getParent() != BB || isa<PHINode>(Root))
Show All 10 Lines	static bool tryToVectorizeHorReductionOrInstOperands(
// Skip the analysis of CmpInsts. Compiler implements postanalysis of the		// Skip the analysis of CmpInsts. Compiler implements postanalysis of the
// CmpInsts so we can skip extra attempts in		// CmpInsts so we can skip extra attempts in
// tryToVectorizeHorReductionOrInstOperands and save compile time.		// tryToVectorizeHorReductionOrInstOperands and save compile time.
std::queue<std::pair<Instruction *, unsigned>> Stack;		std::queue<std::pair<Instruction *, unsigned>> Stack;
Stack.emplace(Root, 0);		Stack.emplace(Root, 0);
SmallPtrSet<Value *, 8> VisitedInstrs;		SmallPtrSet<Value *, 8> VisitedInstrs;
SmallVector<WeakTrackingVH> PostponedInsts;		SmallVector<WeakTrackingVH> PostponedInsts;
bool Res = false;		bool Res = false;
auto &&TryToReduce = [TTI, &P, &R](Instruction *Inst, Value *&B0,		auto &&TryToReduce = [TTI, &SE, &DL, &P, &R](Instruction *Inst, Value *&B0,
Value *&B1) -> Value * {		Value *&B1) -> Value * {
bool IsBinop = matchRdxBop(Inst, B0, B1);		bool IsBinop = matchRdxBop(Inst, B0, B1);
bool IsSelect = match(Inst, m_Select(m_Value(), m_Value(), m_Value()));		bool IsSelect = match(Inst, m_Select(m_Value(), m_Value(), m_Value()));
if (IsBinop || IsSelect) {		if (IsBinop || IsSelect) {
HorizontalReduction HorRdx;		HorizontalReduction HorRdx;
if (HorRdx.matchAssociativeReduction(P, Inst))		if (HorRdx.matchAssociativeReduction(P, Inst, SE, DL))
return HorRdx.tryToReduce(R, TTI);		return HorRdx.tryToReduce(R, TTI);
}		}
return nullptr;		return nullptr;
};		};
while (!Stack.empty()) {		while (!Stack.empty()) {
Instruction *Inst;		Instruction *Inst;
unsigned Level;		unsigned Level;
std::tie(Inst, Level) = Stack.front();		std::tie(Inst, Level) = Stack.front();
▲ Show 20 Lines • Show All 66 Lines • ▼ Show 20 Lines	if (!I)
return false;		return false;

if (!isa<BinaryOperator>(I))		if (!isa<BinaryOperator>(I))
P = nullptr;		P = nullptr;
// Try to match and vectorize a horizontal reduction.		// Try to match and vectorize a horizontal reduction.
auto &&ExtraVectorization = [this](Instruction *I, BoUpSLP &R) -> bool {		auto &&ExtraVectorization = [this](Instruction *I, BoUpSLP &R) -> bool {
return tryToVectorize(I, R);		return tryToVectorize(I, R);
};		};
return tryToVectorizeHorReductionOrInstOperands(P, I, BB, R, TTI,		return tryToVectorizeHorReductionOrInstOperands(P, I, BB, R, TTI, SE, DL,
ExtraVectorization);		ExtraVectorization);
}		}

bool SLPVectorizerPass::vectorizeInsertValueInst(InsertValueInst *IVI,		bool SLPVectorizerPass::vectorizeInsertValueInst(InsertValueInst *IVI,
BasicBlock *BB, BoUpSLP &R) {		BasicBlock *BB, BoUpSLP &R) {
const DataLayout &DL = BB->getModule()->getDataLayout();		const DataLayout &DL = BB->getModule()->getDataLayout();
if (!R.canMapToVector(IVI->getType(), DL))		if (!R.canMapToVector(IVI->getType(), DL))
return false;		return false;

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll

	return:			return:
	%retval.0 = phi float [ 0.000000e+00, %if.then ], [ %add, %if.end ]			%retval.0 = phi float [ 0.000000e+00, %if.then ], [ %add, %if.end ]
	ret float %retval.0			ret float %retval.0
	}			}

	define float @test_merge_anyof_v4sf(<4 x float> %t) {			define float @test_merge_anyof_v4sf(<4 x float> %t) {
	; CHECK-LABEL: @test_merge_anyof_v4sf(			; CHECK-LABEL: @test_merge_anyof_v4sf(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = extractelement <4 x float> [[T:%.*]], i32 3			; CHECK-NEXT: [[T_FR7:%.*]] = freeze <4 x float> [[T:%.*]]
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[T]], i32 2			; CHECK-NEXT: [[TMP0:%.*]] = fcmp olt <4 x float> [[T_FR7]], zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[T]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = fcmp ogt <4 x float> [[T_FR7]], <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[T]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = or <4 x i1> [[TMP1]], [[TMP0]]
	; CHECK-NEXT: [[T_FR:%.*]] = freeze <4 x float> [[T]]			; CHECK-NEXT: [[TMP3:%.*]] = bitcast <4 x i1> [[TMP2]] to i4
	; CHECK-NEXT: [[TMP4:%.*]] = fcmp olt <4 x float> [[T_FR]], zeroinitializer			; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i4 [[TMP3]], 0
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast <4 x i1> [[TMP4]] to i4			; CHECK-NEXT: [[SHIFT:%.*]] = shufflevector <4 x float> [[T_FR7]], <4 x float> poison, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP6:%.*]] = icmp ne i4 [[TMP5]], 0			; CHECK-NEXT: [[TMP4:%.*]] = fadd <4 x float> [[SHIFT]], [[T_FR7]]
	; CHECK-NEXT: [[CMP19:%.*]] = fcmp ogt float [[TMP3]], 1.000000e+00			; CHECK-NEXT: [[ADD:%.*]] = extractelement <4 x float> [[TMP4]], i32 0
	; CHECK-NEXT: [[OR_COND3:%.*]] = select i1 [[TMP6]], i1 true, i1 [[CMP19]]			; CHECK-NEXT: [[RETVAL_0:%.*]] = select i1 [[DOTNOT]], float [[ADD]], float 0.000000e+00
	; CHECK-NEXT: [[CMP24:%.*]] = fcmp ogt float [[TMP2]], 1.000000e+00
	; CHECK-NEXT: [[OR_COND4:%.*]] = select i1 [[OR_COND3]], i1 true, i1 [[CMP24]]
	; CHECK-NEXT: [[CMP29:%.*]] = fcmp ogt float [[TMP1]], 1.000000e+00
	; CHECK-NEXT: [[OR_COND5:%.*]] = select i1 [[OR_COND4]], i1 true, i1 [[CMP29]]
	; CHECK-NEXT: [[CMP34:%.*]] = fcmp ogt float [[TMP0]], 1.000000e+00
	; CHECK-NEXT: [[OR_COND6:%.*]] = select i1 [[OR_COND5]], i1 true, i1 [[CMP34]]
	; CHECK-NEXT: [[ADD:%.*]] = fadd float [[TMP3]], [[TMP2]]
	; CHECK-NEXT: [[RETVAL_0:%.*]] = select i1 [[OR_COND6]], float 0.000000e+00, float [[ADD]]
	; CHECK-NEXT: ret float [[RETVAL_0]]			; CHECK-NEXT: ret float [[RETVAL_0]]
	;			;
	entry:			entry:
	%vecext = extractelement <4 x float> %t, i32 0			%vecext = extractelement <4 x float> %t, i32 0
	%conv = fpext float %vecext to double			%conv = fpext float %vecext to double
	%cmp = fcmp olt double %conv, 0.000000e+00			%cmp = fcmp olt double %conv, 0.000000e+00
	br i1 %cmp, label %if.then, label %lor.lhs.false			br i1 %cmp, label %if.then, label %lor.lhs.false

	return:			return:
	%retval.0 = phi float [ 0.000000e+00, %if.then ], [ %conv, %if.end ]			%retval.0 = phi float [ 0.000000e+00, %if.then ], [ %conv, %if.end ]
	ret float %retval.0			ret float %retval.0
	}			}

	define float @test_merge_anyof_v4si(<4 x i32> %t) {			define float @test_merge_anyof_v4si(<4 x i32> %t) {
	; CHECK-LABEL: @test_merge_anyof_v4si(			; CHECK-LABEL: @test_merge_anyof_v4si(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = extractelement <4 x i32> [[T:%.*]], i32 3			; CHECK-NEXT: [[T_FR7:%.*]] = freeze <4 x i32> [[T:%.*]]
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[T]], i32 2			; CHECK-NEXT: [[TMP0:%.*]] = icmp slt <4 x i32> [[T_FR7]], <i32 1, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[T]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <4 x i32> [[T_FR7]], <i32 255, i32 255, i32 255, i32 255>
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[T]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = or <4 x i1> [[TMP1]], [[TMP0]]
	; CHECK-NEXT: [[T_FR:%.*]] = freeze <4 x i32> [[T]]			; CHECK-NEXT: [[TMP3:%.*]] = bitcast <4 x i1> [[TMP2]] to i4
	; CHECK-NEXT: [[TMP4:%.*]] = icmp slt <4 x i32> [[T_FR]], <i32 1, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i4 [[TMP3]], 0
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast <4 x i1> [[TMP4]] to i4			; CHECK-NEXT: [[SHIFT:%.*]] = shufflevector <4 x i32> [[T_FR7]], <4 x i32> poison, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP6:%.*]] = icmp ne i4 [[TMP5]], 0			; CHECK-NEXT: [[TMP4:%.*]] = add nsw <4 x i32> [[SHIFT]], [[T_FR7]]
	; CHECK-NEXT: [[CMP11:%.*]] = icmp sgt i32 [[TMP3]], 255			; CHECK-NEXT: [[ADD:%.*]] = extractelement <4 x i32> [[TMP4]], i32 0
	; CHECK-NEXT: [[OR_COND3:%.*]] = select i1 [[TMP6]], i1 true, i1 [[CMP11]]
	; CHECK-NEXT: [[CMP14:%.*]] = icmp sgt i32 [[TMP2]], 255
	; CHECK-NEXT: [[OR_COND4:%.*]] = select i1 [[OR_COND3]], i1 true, i1 [[CMP14]]
	; CHECK-NEXT: [[CMP17:%.*]] = icmp sgt i32 [[TMP1]], 255
	; CHECK-NEXT: [[OR_COND5:%.*]] = select i1 [[OR_COND4]], i1 true, i1 [[CMP17]]
	; CHECK-NEXT: [[CMP20:%.*]] = icmp sgt i32 [[TMP0]], 255
	; CHECK-NEXT: [[OR_COND6:%.*]] = select i1 [[OR_COND5]], i1 true, i1 [[CMP20]]
	; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP3]], [[TMP2]]
	; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[ADD]] to float			; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[ADD]] to float
	; CHECK-NEXT: [[RETVAL_0:%.*]] = select i1 [[OR_COND6]], float 0.000000e+00, float [[CONV]]			; CHECK-NEXT: [[RETVAL_0:%.*]] = select i1 [[DOTNOT]], float [[CONV]], float 0.000000e+00
	; CHECK-NEXT: ret float [[RETVAL_0]]			; CHECK-NEXT: ret float [[RETVAL_0]]
	;			;
	entry:			entry:
	%vecext = extractelement <4 x i32> %t, i32 0			%vecext = extractelement <4 x i32> %t, i32 0
	%cmp = icmp slt i32 %vecext, 1			%cmp = icmp slt i32 %vecext, 1
	br i1 %cmp, label %if.then, label %lor.lhs.false			br i1 %cmp, label %if.then, label %lor.lhs.false

	lor.lhs.false:			lor.lhs.false:

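The updated checks for @test_merge_anyof_v4si above fold both range tests into one vector mask and then test that mask against zero. The following is a minimal C++ sketch of why that form agrees with the lane-by-lane chain in the old checks. It is only a scalar model: the function names scalar_anyof and mask_anyof are hypothetical, and the bounds 1 and 255 and the lane 0 + lane 1 sum are read off the CHECK lines, not taken from SLPVectorizer.cpp.

```cpp
#include <array>
#include <cstdint>

// Hypothetical scalar model of the old checks: each lane is tested in turn
// and any out-of-range lane forces the result to 0.0.
inline float scalar_anyof(const std::array<std::int32_t, 4> &t) {
  for (std::int32_t v : t) {
    if (v < 1 || v > 255)
      return 0.0f;
  }
  return static_cast<float>(t[0] + t[1]);
}

// Hypothetical model of the new checks: build a 4-bit mask (the
// "bitcast <4 x i1> ... to i4" step), then compare the mask with zero.
inline float mask_anyof(const std::array<std::int32_t, 4> &t) {
  std::uint8_t mask = 0;
  for (int i = 0; i < 4; ++i) {
    const bool lt = t[i] < 1;   // models icmp slt <4 x i32> ..., 1
    const bool gt = t[i] > 255; // models icmp sgt <4 x i32> ..., 255
    if (lt || gt)
      mask |= static_cast<std::uint8_t>(1u << i);
  }
  const bool none_set = (mask == 0); // models icmp eq i4 ..., 0
  return none_set ? static_cast<float>(t[0] + t[1]) : 0.0f;
}
```

Both functions return the converted sum of lanes 0 and 1 only when every lane lies in [1, 255], which is the condition the final select in the new checks encodes.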
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll

%x10 = add i32 %x1, %x0		%x10 = add i32 %x1, %x0
%x2 = extractelement <4 x i32> %x, i32 2		%x2 = extractelement <4 x i32> %x, i32 2
%x210 = add i32 %x2, %x10		%x210 = add i32 %x2, %x10
ret i32 %x210		ret i32 %x210
}		}

define i32 @ext_ext_partial_add_reduction_and_extra_add_v4i32(<4 x i32> %x, <4 x i32> %y) {		define i32 @ext_ext_partial_add_reduction_and_extra_add_v4i32(<4 x i32> %x, <4 x i32> %y) {
; CHECK-LABEL: @ext_ext_partial_add_reduction_and_extra_add_v4i32(		; CHECK-LABEL: @ext_ext_partial_add_reduction_and_extra_add_v4i32(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[X:%.*]], <4 x i32> [[Y:%.*]], <4 x i32> <i32 4, i32 2, i32 5, i32 6>		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i32> [[Y:%.*]], <4 x i32> [[X:%.*]], <4 x i32> <i32 0, i32 1, i32 2, i32 6>
; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP1]])		; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP1]])
; CHECK-NEXT: ret i32 [[TMP2]]		; CHECK-NEXT: ret i32 [[TMP2]]
;		;
%y0 = extractelement <4 x i32> %y, i32 0		%y0 = extractelement <4 x i32> %y, i32 0
%y1 = extractelement <4 x i32> %y, i32 1		%y1 = extractelement <4 x i32> %y, i32 1
%y10 = add i32 %y1, %y0		%y10 = add i32 %y1, %y0
%y2 = extractelement <4 x i32> %y, i32 2		%y2 = extractelement <4 x i32> %y, i32 2
%y210 = add i32 %y2, %y10		%y210 = add i32 %y2, %y10

llvm/test/Transforms/SLPVectorizer/AArch64/gather-root.ll


	define void @PR28330(i32 %n) {			define void @PR28330(i32 %n) {
	; DEFAULT-LABEL: @PR28330(			; DEFAULT-LABEL: @PR28330(
	; DEFAULT-NEXT: entry:			; DEFAULT-NEXT: entry:
	; DEFAULT-NEXT: [[TMP0:%.*]] = load <8 x i8>, <8 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <8 x i8>*), align 1			; DEFAULT-NEXT: [[TMP0:%.*]] = load <8 x i8>, <8 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <8 x i8>*), align 1
	; DEFAULT-NEXT: [[TMP1:%.*]] = icmp eq <8 x i8> [[TMP0]], zeroinitializer			; DEFAULT-NEXT: [[TMP1:%.*]] = icmp eq <8 x i8> [[TMP0]], zeroinitializer
	; DEFAULT-NEXT: br label [[FOR_BODY:%.*]]			; DEFAULT-NEXT: br label [[FOR_BODY:%.*]]
	; DEFAULT: for.body:			; DEFAULT: for.body:
	; DEFAULT-NEXT: [[P17:%.*]] = phi i32 [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; DEFAULT-NEXT: [[P17:%.*]] = phi i32 [ [[OP_RDX:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
	; DEFAULT-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i32> <i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720>, <8 x i32> <i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80>			; DEFAULT-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i32> <i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720>, <8 x i32> <i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80>
	; DEFAULT-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP2]])			; DEFAULT-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP2]])
	; DEFAULT-NEXT: [[OP_EXTRA]] = add i32 [[TMP3]], [[P17]]			; DEFAULT-NEXT: [[OP_RDX]] = add i32 [[TMP3]], [[P17]]
	; DEFAULT-NEXT: br label [[FOR_BODY]]			; DEFAULT-NEXT: br label [[FOR_BODY]]
	;			;
	; GATHER-LABEL: @PR28330(			; GATHER-LABEL: @PR28330(
	; GATHER-NEXT: entry:			; GATHER-NEXT: entry:
	; GATHER-NEXT: [[TMP0:%.*]] = load <8 x i8>, <8 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <8 x i8>*), align 1			; GATHER-NEXT: [[TMP0:%.*]] = load <8 x i8>, <8 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <8 x i8>*), align 1
	; GATHER-NEXT: [[TMP1:%.*]] = icmp eq <8 x i8> [[TMP0]], zeroinitializer			; GATHER-NEXT: [[TMP1:%.*]] = icmp eq <8 x i8> [[TMP0]], zeroinitializer
	; GATHER-NEXT: br label [[FOR_BODY:%.*]]			; GATHER-NEXT: br label [[FOR_BODY:%.*]]
	; GATHER: for.body:			; GATHER: for.body:
	; GATHER-NEXT: [[P17:%.*]] = phi i32 [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; GATHER-NEXT: [[P17:%.*]] = phi i32 [ [[OP_RDX:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
	; GATHER-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i32> <i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720>, <8 x i32> <i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80>			; GATHER-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i32> <i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720>, <8 x i32> <i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80>
	; GATHER-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP2]])			; GATHER-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP2]])
	; GATHER-NEXT: [[OP_EXTRA]] = add i32 [[TMP3]], [[P17]]			; GATHER-NEXT: [[OP_RDX]] = add i32 [[TMP3]], [[P17]]
	; GATHER-NEXT: br label [[FOR_BODY]]			; GATHER-NEXT: br label [[FOR_BODY]]
	;			;
	; MAX-COST-LABEL: @PR28330(			; MAX-COST-LABEL: @PR28330(
	; MAX-COST-NEXT: entry:			; MAX-COST-NEXT: entry:
	; MAX-COST-NEXT: [[P0:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1), align 1			; MAX-COST-NEXT: [[TMP0:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <4 x i8>*), align 1
	; MAX-COST-NEXT: [[P1:%.*]] = icmp eq i8 [[P0]], 0			; MAX-COST-NEXT: [[TMP1:%.*]] = icmp eq <4 x i8> [[TMP0]], zeroinitializer
	; MAX-COST-NEXT: [[P2:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 2), align 2			; MAX-COST-NEXT: [[TMP2:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 5) to <4 x i8>*), align 1
	; MAX-COST-NEXT: [[P3:%.*]] = icmp eq i8 [[P2]], 0			; MAX-COST-NEXT: [[TMP3:%.*]] = icmp eq <4 x i8> [[TMP2]], zeroinitializer
	; MAX-COST-NEXT: [[P4:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 3), align 1
	; MAX-COST-NEXT: [[P5:%.*]] = icmp eq i8 [[P4]], 0
	; MAX-COST-NEXT: [[P6:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 4), align 4
	; MAX-COST-NEXT: [[P7:%.*]] = icmp eq i8 [[P6]], 0
	; MAX-COST-NEXT: [[P8:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 5), align 1
	; MAX-COST-NEXT: [[P9:%.*]] = icmp eq i8 [[P8]], 0
	; MAX-COST-NEXT: [[P10:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 6), align 2
	; MAX-COST-NEXT: [[P11:%.*]] = icmp eq i8 [[P10]], 0
	; MAX-COST-NEXT: [[P12:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 7), align 1
	; MAX-COST-NEXT: [[P13:%.*]] = icmp eq i8 [[P12]], 0
	; MAX-COST-NEXT: [[P14:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 8), align 8
	; MAX-COST-NEXT: [[P15:%.*]] = icmp eq i8 [[P14]], 0
	; MAX-COST-NEXT: br label [[FOR_BODY:%.*]]			; MAX-COST-NEXT: br label [[FOR_BODY:%.*]]
	; MAX-COST: for.body:			; MAX-COST: for.body:
	; MAX-COST-NEXT: [[P17:%.*]] = phi i32 [ [[P34:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; MAX-COST-NEXT: [[P17:%.*]] = phi i32 [ [[OP_RDX1:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
	; MAX-COST-NEXT: [[P19:%.*]] = select i1 [[P1]], i32 -720, i32 -80			; MAX-COST-NEXT: [[TMP4:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> <i32 -720, i32 -720, i32 -720, i32 -720>, <4 x i32> <i32 -80, i32 -80, i32 -80, i32 -80>
	; MAX-COST-NEXT: [[P20:%.*]] = add i32 [[P17]], [[P19]]			; MAX-COST-NEXT: [[TMP5:%.*]] = select <4 x i1> [[TMP3]], <4 x i32> <i32 -720, i32 -720, i32 -720, i32 -720>, <4 x i32> <i32 -80, i32 -80, i32 -80, i32 -80>
	; MAX-COST-NEXT: [[P21:%.*]] = select i1 [[P3]], i32 -720, i32 -80			; MAX-COST-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP4]])
	; MAX-COST-NEXT: [[P22:%.*]] = add i32 [[P20]], [[P21]]			; MAX-COST-NEXT: [[TMP7:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP5]])
	; MAX-COST-NEXT: [[P23:%.*]] = select i1 [[P5]], i32 -720, i32 -80			; MAX-COST-NEXT: [[OP_RDX:%.*]] = add i32 [[TMP6]], [[TMP7]]
	; MAX-COST-NEXT: [[P24:%.*]] = add i32 [[P22]], [[P23]]			; MAX-COST-NEXT: [[OP_RDX1]] = add i32 [[OP_RDX]], [[P17]]
	; MAX-COST-NEXT: [[P25:%.*]] = select i1 [[P7]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[P26:%.*]] = add i32 [[P24]], [[P25]]
	; MAX-COST-NEXT: [[P27:%.*]] = select i1 [[P9]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[P28:%.*]] = add i32 [[P26]], [[P27]]
	; MAX-COST-NEXT: [[P29:%.*]] = select i1 [[P11]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[P30:%.*]] = add i32 [[P28]], [[P29]]
	; MAX-COST-NEXT: [[P31:%.*]] = select i1 [[P13]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[P32:%.*]] = add i32 [[P30]], [[P31]]
	; MAX-COST-NEXT: [[P33:%.*]] = select i1 [[P15]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[P34]] = add i32 [[P32]], [[P33]]
	; MAX-COST-NEXT: br label [[FOR_BODY]]			; MAX-COST-NEXT: br label [[FOR_BODY]]
	;			;
	entry:			entry:
	%p0 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1), align 1			%p0 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1), align 1
	%p1 = icmp eq i8 %p0, 0			%p1 = icmp eq i8 %p0, 0
	%p2 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 2), align 2			%p2 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 2), align 2
	%p3 = icmp eq i8 %p2, 0			%p3 = icmp eq i8 %p2, 0
	%p4 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 3), align 1			%p4 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 3), align 1

	define void @PR32038(i32 %n) {			define void @PR32038(i32 %n) {
	; DEFAULT-LABEL: @PR32038(			; DEFAULT-LABEL: @PR32038(
	; DEFAULT-NEXT: entry:			; DEFAULT-NEXT: entry:
	; DEFAULT-NEXT: [[TMP0:%.*]] = load <8 x i8>, <8 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <8 x i8>*), align 1			; DEFAULT-NEXT: [[TMP0:%.*]] = load <8 x i8>, <8 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <8 x i8>*), align 1
	; DEFAULT-NEXT: [[TMP1:%.*]] = icmp eq <8 x i8> [[TMP0]], zeroinitializer			; DEFAULT-NEXT: [[TMP1:%.*]] = icmp eq <8 x i8> [[TMP0]], zeroinitializer
	; DEFAULT-NEXT: br label [[FOR_BODY:%.*]]			; DEFAULT-NEXT: br label [[FOR_BODY:%.*]]
	; DEFAULT: for.body:			; DEFAULT: for.body:
	; DEFAULT-NEXT: [[P17:%.*]] = phi i32 [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; DEFAULT-NEXT: [[P17:%.*]] = phi i32 [ [[OP_RDX:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
	; DEFAULT-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i32> <i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720>, <8 x i32> <i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80>			; DEFAULT-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i32> <i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720>, <8 x i32> <i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80>
	; DEFAULT-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP2]])			; DEFAULT-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP2]])
	; DEFAULT-NEXT: [[OP_EXTRA]] = add i32 [[TMP3]], -5			; DEFAULT-NEXT: [[OP_RDX]] = add i32 [[TMP3]], -5
	; DEFAULT-NEXT: br label [[FOR_BODY]]			; DEFAULT-NEXT: br label [[FOR_BODY]]
	;			;
	; GATHER-LABEL: @PR32038(			; GATHER-LABEL: @PR32038(
	; GATHER-NEXT: entry:			; GATHER-NEXT: entry:
	; GATHER-NEXT: [[TMP0:%.*]] = load <8 x i8>, <8 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <8 x i8>*), align 1			; GATHER-NEXT: [[TMP0:%.*]] = load <8 x i8>, <8 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <8 x i8>*), align 1
	; GATHER-NEXT: [[TMP1:%.*]] = icmp eq <8 x i8> [[TMP0]], zeroinitializer			; GATHER-NEXT: [[TMP1:%.*]] = icmp eq <8 x i8> [[TMP0]], zeroinitializer
	; GATHER-NEXT: br label [[FOR_BODY:%.*]]			; GATHER-NEXT: br label [[FOR_BODY:%.*]]
	; GATHER: for.body:			; GATHER: for.body:
	; GATHER-NEXT: [[P17:%.*]] = phi i32 [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; GATHER-NEXT: [[P17:%.*]] = phi i32 [ [[OP_RDX:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
	; GATHER-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i32> <i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720>, <8 x i32> <i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80>			; GATHER-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i32> <i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720>, <8 x i32> <i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80>
	; GATHER-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP2]])			; GATHER-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP2]])
	; GATHER-NEXT: [[OP_EXTRA]] = add i32 [[TMP3]], -5			; GATHER-NEXT: [[OP_RDX]] = add i32 [[TMP3]], -5
	; GATHER-NEXT: br label [[FOR_BODY]]			; GATHER-NEXT: br label [[FOR_BODY]]
	;			;
	; MAX-COST-LABEL: @PR32038(			; MAX-COST-LABEL: @PR32038(
	; MAX-COST-NEXT: entry:			; MAX-COST-NEXT: entry:
	; MAX-COST-NEXT: [[TMP0:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <4 x i8>*), align 1			; MAX-COST-NEXT: [[TMP0:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <4 x i8>*), align 1
	; MAX-COST-NEXT: [[TMP1:%.*]] = icmp eq <4 x i8> [[TMP0]], zeroinitializer			; MAX-COST-NEXT: [[TMP1:%.*]] = icmp eq <4 x i8> [[TMP0]], zeroinitializer
	; MAX-COST-NEXT: [[P8:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 5), align 1			; MAX-COST-NEXT: [[TMP2:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 5) to <4 x i8>*), align 1
	; MAX-COST-NEXT: [[P9:%.*]] = icmp eq i8 [[P8]], 0			; MAX-COST-NEXT: [[TMP3:%.*]] = icmp eq <4 x i8> [[TMP2]], zeroinitializer
	; MAX-COST-NEXT: [[P10:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 6), align 2
	; MAX-COST-NEXT: [[P11:%.*]] = icmp eq i8 [[P10]], 0
	; MAX-COST-NEXT: [[P12:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 7), align 1
	; MAX-COST-NEXT: [[P13:%.*]] = icmp eq i8 [[P12]], 0
	; MAX-COST-NEXT: [[P14:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 8), align 8
	; MAX-COST-NEXT: [[P15:%.*]] = icmp eq i8 [[P14]], 0
	; MAX-COST-NEXT: br label [[FOR_BODY:%.*]]			; MAX-COST-NEXT: br label [[FOR_BODY:%.*]]
	; MAX-COST: for.body:			; MAX-COST: for.body:
	; MAX-COST-NEXT: [[P17:%.*]] = phi i32 [ [[P34:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; MAX-COST-NEXT: [[P17:%.*]] = phi i32 [ [[OP_RDX1:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
	; MAX-COST-NEXT: [[TMP2:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> <i32 -720, i32 -720, i32 -720, i32 -720>, <4 x i32> <i32 -80, i32 -80, i32 -80, i32 -80>			; MAX-COST-NEXT: [[TMP4:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> <i32 -720, i32 -720, i32 -720, i32 -720>, <4 x i32> <i32 -80, i32 -80, i32 -80, i32 -80>
	; MAX-COST-NEXT: [[P27:%.*]] = select i1 [[P9]], i32 -720, i32 -80			; MAX-COST-NEXT: [[TMP5:%.*]] = select <4 x i1> [[TMP3]], <4 x i32> <i32 -720, i32 -720, i32 -720, i32 -720>, <4 x i32> <i32 -80, i32 -80, i32 -80, i32 -80>
	; MAX-COST-NEXT: [[P29:%.*]] = select i1 [[P11]], i32 -720, i32 -80			; MAX-COST-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP4]])
	; MAX-COST-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP2]])			; MAX-COST-NEXT: [[TMP7:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP5]])
	; MAX-COST-NEXT: [[TMP4:%.*]] = add i32 [[TMP3]], [[P27]]			; MAX-COST-NEXT: [[OP_RDX:%.*]] = add i32 [[TMP6]], [[TMP7]]
	; MAX-COST-NEXT: [[TMP5:%.*]] = add i32 [[TMP4]], [[P29]]			; MAX-COST-NEXT: [[OP_RDX1]] = add i32 [[OP_RDX]], -5
	; MAX-COST-NEXT: [[OP_EXTRA:%.*]] = add i32 [[TMP5]], -5
	; MAX-COST-NEXT: [[P31:%.*]] = select i1 [[P13]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[P32:%.*]] = add i32 [[OP_EXTRA]], [[P31]]
	; MAX-COST-NEXT: [[P33:%.*]] = select i1 [[P15]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[P34]] = add i32 [[P32]], [[P33]]
	; MAX-COST-NEXT: br label [[FOR_BODY]]			; MAX-COST-NEXT: br label [[FOR_BODY]]
	;			;
	entry:			entry:
	%p0 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1), align 1			%p0 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1), align 1
	%p1 = icmp eq i8 %p0, 0			%p1 = icmp eq i8 %p0, 0
	%p2 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 2), align 2			%p2 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 2), align 2
	%p3 = icmp eq i8 %p2, 0			%p3 = icmp eq i8 %p2, 0
	%p4 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 3), align 1			%p4 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 3), align 1

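The MAX-COST checks for @PR28330 and @PR32038 in gather-root.ll above replace a chain of eight scalar adds with two 4-wide llvm.vector.reduce.add calls, one add joining them (OP_RDX), and a final add of the incoming value (OP_RDX1). The following is a minimal C++ sketch of why the two shapes compute the same i32 result. It is a scalar model only: the names lane_value, scalar_chain and split_reduction are hypothetical, and unsigned 32-bit arithmetic stands in for the wrapping i32 add in the IR.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of the select feeding the reduction: -720 when the
// loaded byte is zero, -80 otherwise (constants as in the test).
inline std::uint32_t lane_value(std::uint8_t byte) {
  return byte == 0 ? static_cast<std::uint32_t>(-720)
                   : static_cast<std::uint32_t>(-80);
}

// Old MAX-COST shape: one serial chain p17 + v0 + v1 + ... + v7.
inline std::uint32_t scalar_chain(const std::array<std::uint8_t, 8> &a,
                                  std::uint32_t p17) {
  std::uint32_t acc = p17;
  for (std::uint8_t b : a)
    acc += lane_value(b);
  return acc;
}

// New MAX-COST shape: two 4-lane partial sums, joined by one add
// (OP_RDX), then combined with the incoming value (OP_RDX1).
inline std::uint32_t split_reduction(const std::array<std::uint8_t, 8> &a,
                                     std::uint32_t p17) {
  std::uint32_t lo = 0;
  std::uint32_t hi = 0;
  for (int i = 0; i < 4; ++i) {
    lo += lane_value(a[i]);
    hi += lane_value(a[i + 4]);
  }
  const std::uint32_t op_rdx = lo + hi;
  return op_rdx + p17;
}
```

Because wrapping integer addition is associative and commutative, regrouping the eight terms into two halves does not change the result; the same argument would not carry over to floating-point adds without fast-math flags.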
llvm/test/Transforms/SLPVectorizer/AArch64/transpose-inseltpoison.ll

%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1		%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1
%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.2, i32 2		%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.2, i32 2
%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.3, i32 3		%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.3, i32 3
ret <4 x i32> %tmp3.3		ret <4 x i32> %tmp3.3
}		}

define i32 @reduction_v4i32(<4 x i32> %v0, <4 x i32> %v1) {		define i32 @reduction_v4i32(<4 x i32> %v0, <4 x i32> %v1) {
; CHECK-LABEL: @reduction_v4i32(		; CHECK-LABEL: @reduction_v4i32(
; CHECK-NEXT: [[TMP1:%.*]] = sub <4 x i32> [[V0:%.*]], [[V1:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = add <4 x i32> [[V0:%.*]], [[V1:%.*]]
; CHECK-NEXT: [[TMP2:%.*]] = add <4 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP2:%.*]] = sub <4 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 1, i32 4, i32 7, i32 2>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 1, i32 4, i32 3, i32 6>
; CHECK-NEXT: [[TMP4:%.*]] = sub <4 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP4:%.*]] = add <4 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP5:%.*]] = sub <4 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> [[TMP5]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> [[TMP5]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
; CHECK-NEXT: [[TMP7:%.*]] = add <4 x i32> [[TMP6]], [[TMP3]]		; CHECK-NEXT: [[TMP7:%.*]] = add <4 x i32> [[TMP6]], [[TMP3]]
; CHECK-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP7]], <i32 15, i32 15, i32 15, i32 15>		; CHECK-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP7]], <i32 15, i32 15, i32 15, i32 15>
; CHECK-NEXT: [[TMP9:%.*]] = and <4 x i32> [[TMP8]], <i32 65537, i32 65537, i32 65537, i32 65537>		; CHECK-NEXT: [[TMP9:%.*]] = and <4 x i32> [[TMP8]], <i32 65537, i32 65537, i32 65537, i32 65537>
; CHECK-NEXT: [[TMP10:%.*]] = mul nuw <4 x i32> [[TMP9]], <i32 65535, i32 65535, i32 65535, i32 65535>		; CHECK-NEXT: [[TMP10:%.*]] = mul nuw <4 x i32> [[TMP9]], <i32 65535, i32 65535, i32 65535, i32 65535>
; CHECK-NEXT: [[TMP11:%.*]] = add <4 x i32> [[TMP10]], [[TMP7]]		; CHECK-NEXT: [[TMP11:%.*]] = add <4 x i32> [[TMP10]], [[TMP7]]
; CHECK-NEXT: [[TMP12:%.*]] = xor <4 x i32> [[TMP11]], [[TMP10]]		; CHECK-NEXT: [[TMP12:%.*]] = xor <4 x i32> [[TMP11]], [[TMP10]]
; CHECK-NEXT: [[TMP13:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP12]])		; CHECK-NEXT: [[TMP13:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP12]])
; CHECK-NEXT: ret i32 [[TMP13]]		; CHECK-NEXT: ret i32 [[TMP13]]

llvm/test/Transforms/SLPVectorizer/AArch64/transpose.ll

%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1		%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1
%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.2, i32 2		%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.2, i32 2
%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.3, i32 3		%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.3, i32 3
ret <4 x i32> %tmp3.3		ret <4 x i32> %tmp3.3
}		}

define i32 @reduction_v4i32(<4 x i32> %v0, <4 x i32> %v1) {		define i32 @reduction_v4i32(<4 x i32> %v0, <4 x i32> %v1) {
; CHECK-LABEL: @reduction_v4i32(		; CHECK-LABEL: @reduction_v4i32(
; CHECK-NEXT: [[TMP1:%.*]] = sub <4 x i32> [[V0:%.*]], [[V1:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = add <4 x i32> [[V0:%.*]], [[V1:%.*]]
; CHECK-NEXT: [[TMP2:%.*]] = add <4 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP2:%.*]] = sub <4 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 1, i32 4, i32 7, i32 2>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 1, i32 4, i32 3, i32 6>
; CHECK-NEXT: [[TMP4:%.*]] = sub <4 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP4:%.*]] = add <4 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP5:%.*]] = sub <4 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> [[TMP5]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x i32> [[TMP4]], <4 x i32> [[TMP5]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
; CHECK-NEXT: [[TMP7:%.*]] = add <4 x i32> [[TMP6]], [[TMP3]]		; CHECK-NEXT: [[TMP7:%.*]] = add <4 x i32> [[TMP6]], [[TMP3]]
; CHECK-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP7]], <i32 15, i32 15, i32 15, i32 15>		; CHECK-NEXT: [[TMP8:%.*]] = lshr <4 x i32> [[TMP7]], <i32 15, i32 15, i32 15, i32 15>
; CHECK-NEXT: [[TMP9:%.*]] = and <4 x i32> [[TMP8]], <i32 65537, i32 65537, i32 65537, i32 65537>		; CHECK-NEXT: [[TMP9:%.*]] = and <4 x i32> [[TMP8]], <i32 65537, i32 65537, i32 65537, i32 65537>
; CHECK-NEXT: [[TMP10:%.*]] = mul nuw <4 x i32> [[TMP9]], <i32 65535, i32 65535, i32 65535, i32 65535>		; CHECK-NEXT: [[TMP10:%.*]] = mul nuw <4 x i32> [[TMP9]], <i32 65535, i32 65535, i32 65535, i32 65535>
; CHECK-NEXT: [[TMP11:%.*]] = add <4 x i32> [[TMP10]], [[TMP7]]		; CHECK-NEXT: [[TMP11:%.*]] = add <4 x i32> [[TMP10]], [[TMP7]]
; CHECK-NEXT: [[TMP12:%.*]] = xor <4 x i32> [[TMP11]], [[TMP10]]		; CHECK-NEXT: [[TMP12:%.*]] = xor <4 x i32> [[TMP11]], [[TMP10]]
; CHECK-NEXT: [[TMP13:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP12]])		; CHECK-NEXT: [[TMP13:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP12]])
; CHECK-NEXT: ret i32 [[TMP13]]		; CHECK-NEXT: ret i32 [[TMP13]]

llvm/test/Transforms/SLPVectorizer/AMDGPU/horizontal-store.ll

	; GFX9-LABEL: @smaxv6(			; GFX9-LABEL: @smaxv6(
	; GFX9-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16			; GFX9-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16
	; GFX9-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; GFX9-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; GFX9-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; GFX9-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; GFX9-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]			; GFX9-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
	; GFX9-NEXT: [[SELECT1:%.*]] = select i1 [[CMP1]], i32 [[TMP2]], i32 [[TMP3]]			; GFX9-NEXT: [[SELECT1:%.*]] = select i1 [[CMP1]], i32 [[TMP2]], i32 [[TMP3]]
	; GFX9-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8			; GFX9-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8
	; GFX9-NEXT: [[TMP5:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP4]])			; GFX9-NEXT: [[TMP5:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP4]])
	; GFX9-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP5]], [[SELECT1]]			; GFX9-NEXT: [[OP_RDX:%.*]] = icmp sgt i32 [[TMP5]], [[SELECT1]]
	; GFX9-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP5]], i32 [[SELECT1]]			; GFX9-NEXT: [[OP_RDX1:%.*]] = select i1 [[OP_RDX]], i32 [[TMP5]], i32 [[SELECT1]]
	; GFX9-NEXT: [[STORE_SELECT:%.*]] = select i1 [[CMP1]], i32 3, i32 4			; GFX9-NEXT: [[STORE_SELECT:%.*]] = select i1 [[CMP1]], i32 3, i32 4
	; GFX9-NEXT: store i32 [[STORE_SELECT]], i32* @var, align 8			; GFX9-NEXT: store i32 [[STORE_SELECT]], i32* @var, align 8
	; GFX9-NEXT: ret i32 [[OP_EXTRA1]]			; GFX9-NEXT: ret i32 [[OP_RDX1]]
	;			;
	%load1 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16			%load1 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
	%load2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4			%load2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
	%cmp1 = icmp sgt i32 %load1, %load2			%cmp1 = icmp sgt i32 %load1, %load2
	%select1 = select i1 %cmp1, i32 %load1, i32 %load2			%select1 = select i1 %cmp1, i32 %load1, i32 %load2

	%load3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8			%load3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
	%cmp2 = icmp sgt i32 %select1, %load3			%cmp2 = icmp sgt i32 %select1, %load3
	; GFX9-LABEL: @sminv6(			; GFX9-LABEL: @sminv6(
	; GFX9-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([32 x i64]* @arr64 to <2 x i64>*), align 16			; GFX9-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([32 x i64]* @arr64 to <2 x i64>*), align 16
	; GFX9-NEXT: [[TMP2:%.*]] = extractelement <2 x i64> [[TMP1]], i32 0			; GFX9-NEXT: [[TMP2:%.*]] = extractelement <2 x i64> [[TMP1]], i32 0
	; GFX9-NEXT: [[TMP3:%.*]] = extractelement <2 x i64> [[TMP1]], i32 1			; GFX9-NEXT: [[TMP3:%.*]] = extractelement <2 x i64> [[TMP1]], i32 1
	; GFX9-NEXT: [[CMP1:%.*]] = icmp slt i64 [[TMP2]], [[TMP3]]			; GFX9-NEXT: [[CMP1:%.*]] = icmp slt i64 [[TMP2]], [[TMP3]]
	; GFX9-NEXT: [[SELECT1:%.*]] = select i1 [[CMP1]], i64 [[TMP2]], i64 [[TMP3]]			; GFX9-NEXT: [[SELECT1:%.*]] = select i1 [[CMP1]], i64 [[TMP2]], i64 [[TMP3]]
	; GFX9-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([32 x i64], [32 x i64]* @arr64, i64 0, i64 2) to <4 x i64>*), align 16			; GFX9-NEXT: [[TMP4:%.*]] = load <4 x i64>, <4 x i64>* bitcast (i64* getelementptr inbounds ([32 x i64], [32 x i64]* @arr64, i64 0, i64 2) to <4 x i64>*), align 16
	; GFX9-NEXT: [[TMP5:%.*]] = call i64 @llvm.vector.reduce.smin.v4i64(<4 x i64> [[TMP4]])			; GFX9-NEXT: [[TMP5:%.*]] = call i64 @llvm.vector.reduce.smin.v4i64(<4 x i64> [[TMP4]])
	; GFX9-NEXT: [[OP_EXTRA:%.*]] = icmp slt i64 [[TMP5]], [[SELECT1]]			; GFX9-NEXT: [[OP_RDX:%.*]] = icmp slt i64 [[TMP5]], [[SELECT1]]
	; GFX9-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i64 [[TMP5]], i64 [[SELECT1]]			; GFX9-NEXT: [[OP_RDX1:%.*]] = select i1 [[OP_RDX]], i64 [[TMP5]], i64 [[SELECT1]]
	; GFX9-NEXT: [[STORE_SELECT:%.*]] = select i1 [[CMP1]], i64 3, i64 4			; GFX9-NEXT: [[STORE_SELECT:%.*]] = select i1 [[CMP1]], i64 3, i64 4
	; GFX9-NEXT: store i64 [[STORE_SELECT]], i64* @var64, align 8			; GFX9-NEXT: store i64 [[STORE_SELECT]], i64* @var64, align 8
	; GFX9-NEXT: ret i64 [[OP_EXTRA1]]			; GFX9-NEXT: ret i64 [[OP_RDX1]]
	;			;
	%load1 = load i64, i64* getelementptr inbounds ([32 x i64], [32 x i64]* @arr64, i64 0, i64 0), align 16			%load1 = load i64, i64* getelementptr inbounds ([32 x i64], [32 x i64]* @arr64, i64 0, i64 0), align 16
	%load2 = load i64, i64* getelementptr inbounds ([32 x i64], [32 x i64]* @arr64, i64 0, i64 1), align 8			%load2 = load i64, i64* getelementptr inbounds ([32 x i64], [32 x i64]* @arr64, i64 0, i64 1), align 8
	%cmp1 = icmp slt i64 %load1, %load2			%cmp1 = icmp slt i64 %load1, %load2
	%select1 = select i1 %cmp1, i64 %load1, i64 %load2			%select1 = select i1 %cmp1, i64 %load1, i64 %load2

	%load3 = load i64, i64* getelementptr inbounds ([32 x i64], [32 x i64]* @arr64, i64 0, i64 2), align 16			%load3 = load i64, i64* getelementptr inbounds ([32 x i64], [32 x i64]* @arr64, i64 0, i64 2), align 16
	%cmp2 = icmp slt i64 %select1, %load3			%cmp2 = icmp slt i64 %select1, %load3
	[122 lines not shown]
	; GFX9-LABEL: @smax_wdiff_valuenum(			; GFX9-LABEL: @smax_wdiff_valuenum(
	; GFX9-NEXT: [[VLOAD:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16			; GFX9-NEXT: [[VLOAD:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16
	; GFX9-NEXT: [[ELT1:%.*]] = extractelement <2 x i32> [[VLOAD]], i32 0			; GFX9-NEXT: [[ELT1:%.*]] = extractelement <2 x i32> [[VLOAD]], i32 0
	; GFX9-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[ELT1]], [[V1:%.*]]			; GFX9-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[ELT1]], [[V1:%.*]]
	; GFX9-NEXT: [[EX0:%.*]] = extractelement <2 x i32> [[VLOAD]], i32 0			; GFX9-NEXT: [[EX0:%.*]] = extractelement <2 x i32> [[VLOAD]], i32 0
	; GFX9-NEXT: [[SELECT1:%.*]] = select i1 [[CMP1]], i32 [[EX0]], i32 [[V1]]			; GFX9-NEXT: [[SELECT1:%.*]] = select i1 [[CMP1]], i32 [[EX0]], i32 [[V1]]
	; GFX9-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8			; GFX9-NEXT: [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8
	; GFX9-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP2]])			; GFX9-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP2]])
	; GFX9-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP3]], [[SELECT1]]			; GFX9-NEXT: [[OP_RDX:%.*]] = icmp sgt i32 [[TMP3]], [[SELECT1]]
	; GFX9-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP3]], i32 [[SELECT1]]			; GFX9-NEXT: [[OP_RDX1:%.*]] = select i1 [[OP_RDX]], i32 [[TMP3]], i32 [[SELECT1]]
	; GFX9-NEXT: [[STOREVAL:%.*]] = select i1 [[CMP1]], i32 3, i32 4			; GFX9-NEXT: [[STOREVAL:%.*]] = select i1 [[CMP1]], i32 3, i32 4
	; GFX9-NEXT: store i32 [[STOREVAL]], i32* @var, align 8			; GFX9-NEXT: store i32 [[STOREVAL]], i32* @var, align 8
	; GFX9-NEXT: ret i32 [[OP_EXTRA1]]			; GFX9-NEXT: ret i32 [[OP_RDX1]]
	;			;
	%vload = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16			%vload = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16
	%elt1 = extractelement <2 x i32> %vload, i32 0			%elt1 = extractelement <2 x i32> %vload, i32 0
	%cmp1 = icmp sgt i32 %elt1, %v1			%cmp1 = icmp sgt i32 %elt1, %v1
	%ex0 = extractelement <2 x i32> %vload, i32 0			%ex0 = extractelement <2 x i32> %vload, i32 0
	%select1 = select i1 %cmp1, i32 %ex0, i32 %v1			%select1 = select i1 %cmp1, i32 %ex0, i32 %v1

	%load3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8			%load3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
	[19 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/PR35628_1.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -slp-vectorize-hor -slp-vectorize-hor-store -S < %s -mtriple=x86_64-unknown-linux-gnu | FileCheck %s			; RUN: opt -slp-vectorizer -slp-vectorize-hor -slp-vectorize-hor-store -S < %s -mtriple=x86_64-unknown-linux-gnu | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:1"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:1"

	define void @mainTest(i32* %ptr) #0 {			define void @mainTest(i32* %ptr) #0 {
	; CHECK-LABEL: @mainTest(			; CHECK-LABEL: @mainTest(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32* [[PTR:%.*]], null			; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32* [[PTR:%.*]], null
	; CHECK-NEXT: br i1 [[CMP]], label [[LOOP:%.*]], label [[BAIL_OUT:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[LOOP:%.*]], label [[BAIL_OUT:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[DUMMY_PHI:%.*]] = phi i32 [ 1, [[ENTRY:%.*]] ], [ [[OP_EXTRA3:%.*]], [[LOOP]] ]			; CHECK-NEXT: [[DUMMY_PHI:%.*]] = phi i32 [ 1, [[ENTRY:%.*]] ], [ [[OP_RDX3:%.*]], [[LOOP]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[PTR]], i64 1			; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, i32* [[PTR]], i64 1
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[PTR]], i64 2			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, i32* [[PTR]], i64 2
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[PTR]], i64 3			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[PTR]], i64 3
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[PTR]] to <4 x i32>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[PTR]] to <4 x i32>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* [[TMP3]], align 4			; CHECK-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* [[TMP3]], align 4
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP4]], i32 3			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i32> [[TMP4]], i32 3
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP4]], i32 2			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP4]], i32 2
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[TMP4]], i32 1			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x i32> [[TMP4]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = mul <4 x i32> [[TMP4]], [[TMP4]]			; CHECK-NEXT: [[TMP8:%.*]] = mul <4 x i32> [[TMP4]], [[TMP4]]
	; CHECK-NEXT: [[TMP9:%.*]] = sext i32 [[TMP6]] to i64			; CHECK-NEXT: [[TMP9:%.*]] = sext i32 [[TMP6]] to i64
	; CHECK-NEXT: [[TMP10:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP8]])			; CHECK-NEXT: [[TMP10:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP8]])
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = add i32 [[TMP10]], 1			; CHECK-NEXT: [[OP_RDX:%.*]] = add i32 [[TMP7]], [[TMP6]]
	; CHECK-NEXT: [[OP_EXTRA1:%.*]] = add i32 [[OP_EXTRA]], [[TMP7]]			; CHECK-NEXT: [[OP_RDX1:%.*]] = add i32 [[TMP5]], 1
	; CHECK-NEXT: [[OP_EXTRA2:%.*]] = add i32 [[OP_EXTRA1]], [[TMP6]]			; CHECK-NEXT: [[OP_RDX2:%.*]] = add i32 [[OP_RDX]], [[OP_RDX1]]
	; CHECK-NEXT: [[OP_EXTRA3]] = add i32 [[OP_EXTRA2]], [[TMP5]]			; CHECK-NEXT: [[OP_RDX3]] = add i32 [[TMP10]], [[OP_RDX2]]
	; CHECK-NEXT: br label [[LOOP]]			; CHECK-NEXT: br label [[LOOP]]
	; CHECK: bail_out:			; CHECK: bail_out:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%cmp = icmp eq i32* %ptr, null			%cmp = icmp eq i32* %ptr, null
	br i1 %cmp, label %loop, label %bail_out			br i1 %cmp, label %loop, label %bail_out

	[29 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/PR35628_2.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -slp-vectorize-hor -slp-vectorize-hor-store -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=haswell | FileCheck %s			; RUN: opt -slp-vectorizer -slp-vectorize-hor -slp-vectorize-hor-store -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=haswell | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:1"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:1"

	define void @test() #0 {			define void @test() #0 {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[DUMMY_PHI:%.*]] = phi i64 [ 1, [[ENTRY:%.*]] ], [ [[OP_EXTRA1:%.*]], [[LOOP]] ]			; CHECK-NEXT: [[DUMMY_PHI:%.*]] = phi i64 [ 1, [[ENTRY:%.*]] ], [ [[OP_RDX1:%.*]], [[LOOP]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = phi i64 [ 2, [[ENTRY]] ], [ [[TMP3:%.*]], [[LOOP]] ]			; CHECK-NEXT: [[TMP0:%.*]] = phi i64 [ 2, [[ENTRY]] ], [ [[TMP3:%.*]], [[LOOP]] ]
	; CHECK-NEXT: [[DUMMY_ADD:%.*]] = add i16 0, 0			; CHECK-NEXT: [[DUMMY_ADD:%.*]] = add i16 0, 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i64> poison, i64 [[TMP0]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i64> poison, i64 [[TMP0]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i64> [[TMP1]], <4 x i64> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i64> [[TMP1]], <4 x i64> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = add <4 x i64> [[SHUFFLE]], <i64 3, i64 2, i64 1, i64 0>			; CHECK-NEXT: [[TMP2:%.*]] = add <4 x i64> [[SHUFFLE]], <i64 2, i64 3, i64 1, i64 0>
	; CHECK-NEXT: [[TMP3]] = extractelement <4 x i64> [[TMP2]], i32 3			; CHECK-NEXT: [[TMP3]] = extractelement <4 x i64> [[TMP2]], i32 3
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i64> [[TMP2]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i64> [[TMP2]], i32 1
	; CHECK-NEXT: [[DUMMY_SHL:%.*]] = shl i64 [[TMP4]], 32			; CHECK-NEXT: [[DUMMY_SHL:%.*]] = shl i64 [[TMP4]], 32
	; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i64> <i64 1, i64 1, i64 1, i64 1>, [[TMP2]]			; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i64> <i64 1, i64 1, i64 1, i64 1>, [[TMP2]]
	; CHECK-NEXT: [[TMP6:%.*]] = ashr exact <4 x i64> [[TMP5]], <i64 32, i64 32, i64 32, i64 32>			; CHECK-NEXT: [[TMP6:%.*]] = ashr exact <4 x i64> [[TMP5]], <i64 32, i64 32, i64 32, i64 32>
	; CHECK-NEXT: [[TMP7:%.*]] = call i64 @llvm.vector.reduce.add.v4i64(<4 x i64> [[TMP6]])			; CHECK-NEXT: [[TMP7:%.*]] = call i64 @llvm.vector.reduce.add.v4i64(<4 x i64> [[TMP6]])
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = add i64 [[TMP7]], 0			; CHECK-NEXT: [[OP_RDX:%.*]] = add i64 [[TMP3]], 0
	; CHECK-NEXT: [[OP_EXTRA1]] = add i64 [[OP_EXTRA]], [[TMP3]]			; CHECK-NEXT: [[OP_RDX1]] = add i64 [[TMP7]], [[OP_RDX]]
	; CHECK-NEXT: br label [[LOOP]]			; CHECK-NEXT: br label [[LOOP]]
	;			;
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%dummy_phi = phi i64 [ 1, %entry ], [ %last, %loop ]			%dummy_phi = phi i64 [ 1, %entry ], [ %last, %loop ]
	%0 = phi i64 [ 2, %entry ], [ %fork, %loop ]			%0 = phi i64 [ 2, %entry ], [ %fork, %loop ]
	[22 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-7 | FileCheck %s --check-prefix=CHECK			; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-7 | FileCheck %s --check-prefix=CHECK
	; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-8 -slp-min-tree-size=6 | FileCheck %s --check-prefix=FORCE_REDUCTION			; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-8 -slp-min-tree-size=6 | FileCheck %s --check-prefix=FORCE_REDUCTION

	define void @Test(i32) {			define void @Test(i32) {
	; CHECK-LABEL: @Test(			; CHECK-LABEL: @Test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[TMP0:%.*]], i32 0
				; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[TMP0]], i32 1
				; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[TMP0]], i32 2
				; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[TMP0]], i32 3
				; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[TMP0]], i32 4
				; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[TMP0]], i32 5
				; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[TMP0]], i32 6
				; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[TMP0]], i32 7
				; CHECK-NEXT: [[TMP9:%.*]] = insertelement <16 x i32> poison, i32 [[TMP0]], i32 0
				; CHECK-NEXT: [[TMP10:%.*]] = insertelement <16 x i32> [[TMP9]], i32 [[TMP0]], i32 1
				; CHECK-NEXT: [[TMP11:%.*]] = insertelement <16 x i32> [[TMP10]], i32 [[TMP0]], i32 2
				; CHECK-NEXT: [[TMP12:%.*]] = insertelement <16 x i32> [[TMP11]], i32 [[TMP0]], i32 3
				; CHECK-NEXT: [[TMP13:%.*]] = insertelement <16 x i32> [[TMP12]], i32 [[TMP0]], i32 4
				; CHECK-NEXT: [[TMP14:%.*]] = insertelement <16 x i32> [[TMP13]], i32 [[TMP0]], i32 5
				; CHECK-NEXT: [[TMP15:%.*]] = insertelement <16 x i32> [[TMP14]], i32 [[TMP0]], i32 6
				; CHECK-NEXT: [[TMP16:%.*]] = insertelement <16 x i32> [[TMP15]], i32 [[TMP0]], i32 7
				; CHECK-NEXT: [[TMP17:%.*]] = insertelement <16 x i32> [[TMP16]], i32 [[TMP0]], i32 8
				; CHECK-NEXT: [[TMP18:%.*]] = insertelement <16 x i32> [[TMP17]], i32 [[TMP0]], i32 9
				; CHECK-NEXT: [[TMP19:%.*]] = insertelement <16 x i32> [[TMP18]], i32 [[TMP0]], i32 10
				; CHECK-NEXT: [[TMP20:%.*]] = insertelement <16 x i32> [[TMP19]], i32 [[TMP0]], i32 11
				; CHECK-NEXT: [[TMP21:%.*]] = insertelement <16 x i32> [[TMP20]], i32 [[TMP0]], i32 12
				; CHECK-NEXT: [[TMP22:%.*]] = insertelement <16 x i32> [[TMP21]], i32 [[TMP0]], i32 13
				; CHECK-NEXT: [[TMP23:%.*]] = insertelement <16 x i32> [[TMP22]], i32 [[TMP0]], i32 14
				; CHECK-NEXT: [[TMP24:%.*]] = insertelement <16 x i32> [[TMP23]], i32 [[TMP0]], i32 15
				; CHECK-NEXT: [[TMP25:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0]], i32 1
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP10:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP26:%.*]] = phi <2 x i32> [ [[TMP45:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP26]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 1			; CHECK-NEXT: [[TMP27:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240, i32 1496, i32 8555, i32 12529, i32 13685>			; CHECK-NEXT: [[TMP28:%.*]] = add <8 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240, i32 1496, i32 8555, i32 12529, i32 13685>
	; CHECK-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP3]])			; CHECK-NEXT: [[TMP29:%.*]] = call i32 @llvm.vector.reduce.and.v16i32(<16 x i32> [[TMP24]])
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP4]], [[TMP0:%.*]]			; CHECK-NEXT: [[TMP30:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP8]])
	; CHECK-NEXT: [[OP_EXTRA1:%.*]] = and i32 [[OP_EXTRA]], [[TMP0]]			; CHECK-NEXT: [[TMP31:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP28]])
	; CHECK-NEXT: [[OP_EXTRA2:%.*]] = and i32 [[OP_EXTRA1]], [[TMP0]]			; CHECK-NEXT: [[TMP32:%.*]] = insertelement <2 x i32> [[TMP25]], i32 [[TMP29]], i32 0
	; CHECK-NEXT: [[OP_EXTRA3:%.*]] = and i32 [[OP_EXTRA2]], [[TMP0]]			; CHECK-NEXT: [[TMP33:%.*]] = insertelement <2 x i32> [[TMP25]], i32 [[TMP30]], i32 0
	; CHECK-NEXT: [[OP_EXTRA4:%.*]] = and i32 [[OP_EXTRA3]], [[TMP0]]			; CHECK-NEXT: [[TMP34:%.*]] = and <2 x i32> [[TMP32]], [[TMP33]]
	; CHECK-NEXT: [[OP_EXTRA5:%.*]] = and i32 [[OP_EXTRA4]], [[TMP0]]			; CHECK-NEXT: [[OP_RDX3:%.*]] = and i32 [[TMP0]], [[TMP27]]
	; CHECK-NEXT: [[OP_EXTRA6:%.*]] = and i32 [[OP_EXTRA5]], [[TMP0]]			; CHECK-NEXT: [[TMP35:%.*]] = insertelement <2 x i32> poison, i32 [[TMP31]], i32 0
	; CHECK-NEXT: [[OP_EXTRA7:%.*]] = and i32 [[OP_EXTRA6]], [[TMP0]]			; CHECK-NEXT: [[TMP36:%.*]] = insertelement <2 x i32> [[TMP35]], i32 [[OP_RDX3]], i32 1
	; CHECK-NEXT: [[OP_EXTRA8:%.*]] = and i32 [[OP_EXTRA7]], [[TMP0]]			; CHECK-NEXT: [[TMP37:%.*]] = and <2 x i32> [[TMP34]], [[TMP36]]
	; CHECK-NEXT: [[OP_EXTRA9:%.*]] = and i32 [[OP_EXTRA8]], [[TMP0]]			; CHECK-NEXT: [[TMP38:%.*]] = extractelement <2 x i32> [[TMP37]], i32 0
	; CHECK-NEXT: [[OP_EXTRA10:%.*]] = and i32 [[OP_EXTRA9]], [[TMP0]]			; CHECK-NEXT: [[TMP39:%.*]] = extractelement <2 x i32> [[TMP37]], i32 1
	; CHECK-NEXT: [[OP_EXTRA11:%.*]] = and i32 [[OP_EXTRA10]], [[TMP0]]			; CHECK-NEXT: [[TMP40:%.*]] = insertelement <2 x i32> poison, i32 [[TMP38]], i32 0
	; CHECK-NEXT: [[OP_EXTRA12:%.*]] = and i32 [[OP_EXTRA11]], [[TMP0]]			; CHECK-NEXT: [[TMP41:%.*]] = insertelement <2 x i32> [[TMP40]], i32 [[TMP27]], i32 1
	; CHECK-NEXT: [[OP_EXTRA13:%.*]] = and i32 [[OP_EXTRA12]], [[TMP0]]			; CHECK-NEXT: [[TMP42:%.*]] = insertelement <2 x i32> <i32 poison, i32 14910>, i32 [[TMP39]], i32 0
	; CHECK-NEXT: [[OP_EXTRA14:%.*]] = and i32 [[OP_EXTRA13]], [[TMP0]]			; CHECK-NEXT: [[TMP43:%.*]] = and <2 x i32> [[TMP41]], [[TMP42]]
	; CHECK-NEXT: [[OP_EXTRA15:%.*]] = and i32 [[OP_EXTRA14]], [[TMP0]]			; CHECK-NEXT: [[TMP44:%.*]] = add <2 x i32> [[TMP41]], [[TMP42]]
	; CHECK-NEXT: [[OP_EXTRA16:%.*]] = and i32 [[OP_EXTRA15]], [[TMP0]]			; CHECK-NEXT: [[TMP45]] = shufflevector <2 x i32> [[TMP43]], <2 x i32> [[TMP44]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[OP_EXTRA17:%.*]] = and i32 [[OP_EXTRA16]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA18:%.*]] = and i32 [[OP_EXTRA17]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA19:%.*]] = and i32 [[OP_EXTRA18]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA20:%.*]] = and i32 [[OP_EXTRA19]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA21:%.*]] = and i32 [[OP_EXTRA20]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA22:%.*]] = and i32 [[OP_EXTRA21]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA23:%.*]] = and i32 [[OP_EXTRA22]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA24:%.*]] = and i32 [[OP_EXTRA23]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA25:%.*]] = and i32 [[OP_EXTRA24]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA26:%.*]] = and i32 [[OP_EXTRA25]], [[TMP0]]
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> <i32 poison, i32 14910>, i32 [[OP_EXTRA26]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> [[TMP6]], i32 [[TMP2]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = and <2 x i32> [[TMP5]], [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = add <2 x i32> [[TMP5]], [[TMP7]]
	; CHECK-NEXT: [[TMP10]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> [[TMP9]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: br label [[LOOP]]			; CHECK-NEXT: br label [[LOOP]]
	;			;
	; FORCE_REDUCTION-LABEL: @Test(			; FORCE_REDUCTION-LABEL: @Test(
	; FORCE_REDUCTION-NEXT: entry:			; FORCE_REDUCTION-NEXT: entry:
				; FORCE_REDUCTION-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[TMP0:%.*]], i32 0
				; FORCE_REDUCTION-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[TMP0]], i32 1
				; FORCE_REDUCTION-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[TMP0]], i32 2
				; FORCE_REDUCTION-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[TMP0]], i32 3
				; FORCE_REDUCTION-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[TMP0]], i32 4
				; FORCE_REDUCTION-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[TMP0]], i32 5
				; FORCE_REDUCTION-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[TMP0]], i32 6
				; FORCE_REDUCTION-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[TMP0]], i32 7
				; FORCE_REDUCTION-NEXT: [[TMP9:%.*]] = insertelement <16 x i32> poison, i32 [[TMP0]], i32 0
				; FORCE_REDUCTION-NEXT: [[TMP10:%.*]] = insertelement <16 x i32> [[TMP9]], i32 [[TMP0]], i32 1
				; FORCE_REDUCTION-NEXT: [[TMP11:%.*]] = insertelement <16 x i32> [[TMP10]], i32 [[TMP0]], i32 2
				; FORCE_REDUCTION-NEXT: [[TMP12:%.*]] = insertelement <16 x i32> [[TMP11]], i32 [[TMP0]], i32 3
				; FORCE_REDUCTION-NEXT: [[TMP13:%.*]] = insertelement <16 x i32> [[TMP12]], i32 [[TMP0]], i32 4
				; FORCE_REDUCTION-NEXT: [[TMP14:%.*]] = insertelement <16 x i32> [[TMP13]], i32 [[TMP0]], i32 5
				; FORCE_REDUCTION-NEXT: [[TMP15:%.*]] = insertelement <16 x i32> [[TMP14]], i32 [[TMP0]], i32 6
				; FORCE_REDUCTION-NEXT: [[TMP16:%.*]] = insertelement <16 x i32> [[TMP15]], i32 [[TMP0]], i32 7
				; FORCE_REDUCTION-NEXT: [[TMP17:%.*]] = insertelement <16 x i32> [[TMP16]], i32 [[TMP0]], i32 8
				; FORCE_REDUCTION-NEXT: [[TMP18:%.*]] = insertelement <16 x i32> [[TMP17]], i32 [[TMP0]], i32 9
				; FORCE_REDUCTION-NEXT: [[TMP19:%.*]] = insertelement <16 x i32> [[TMP18]], i32 [[TMP0]], i32 10
				; FORCE_REDUCTION-NEXT: [[TMP20:%.*]] = insertelement <16 x i32> [[TMP19]], i32 [[TMP0]], i32 11
				; FORCE_REDUCTION-NEXT: [[TMP21:%.*]] = insertelement <16 x i32> [[TMP20]], i32 [[TMP0]], i32 12
				; FORCE_REDUCTION-NEXT: [[TMP22:%.*]] = insertelement <16 x i32> [[TMP21]], i32 [[TMP0]], i32 13
				; FORCE_REDUCTION-NEXT: [[TMP23:%.*]] = insertelement <16 x i32> [[TMP22]], i32 [[TMP0]], i32 14
				; FORCE_REDUCTION-NEXT: [[TMP24:%.*]] = insertelement <16 x i32> [[TMP23]], i32 [[TMP0]], i32 15
	; FORCE_REDUCTION-NEXT: br label [[LOOP:%.*]]			; FORCE_REDUCTION-NEXT: br label [[LOOP:%.*]]
	; FORCE_REDUCTION: loop:			; FORCE_REDUCTION: loop:
	; FORCE_REDUCTION-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP12:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]			; FORCE_REDUCTION-NEXT: [[TMP25:%.*]] = phi <2 x i32> [ [[TMP36:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
	; FORCE_REDUCTION-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 1>			; FORCE_REDUCTION-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP25]], <2 x i32> poison, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 0, i32 0>
	; FORCE_REDUCTION-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[SHUFFLE]], i32 1			; FORCE_REDUCTION-NEXT: [[TMP26:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 0
	; FORCE_REDUCTION-NEXT: [[TMP3:%.*]] = add <4 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240>			; FORCE_REDUCTION-NEXT: [[TMP27:%.*]] = add <8 x i32> [[SHUFFLE]], <i32 1496, i32 1240, i32 285, i32 55, i32 0, i32 13685, i32 12529, i32 8555>
	; FORCE_REDUCTION-NEXT: [[VAL_20:%.*]] = add i32 [[TMP2]], 1496			; FORCE_REDUCTION-NEXT: [[TMP28:%.*]] = call i32 @llvm.vector.reduce.and.v16i32(<16 x i32> [[TMP24]])
	; FORCE_REDUCTION-NEXT: [[VAL_34:%.*]] = add i32 [[TMP2]], 8555			; FORCE_REDUCTION-NEXT: [[TMP29:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP8]])
	; FORCE_REDUCTION-NEXT: [[TMP4:%.*]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[TMP3]])			; FORCE_REDUCTION-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP28]], [[TMP29]]
	; FORCE_REDUCTION-NEXT: [[TMP5:%.*]] = and i32 [[TMP4]], [[VAL_20]]			; FORCE_REDUCTION-NEXT: [[TMP30:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP27]])
	; FORCE_REDUCTION-NEXT: [[TMP6:%.*]] = and i32 [[TMP5]], [[VAL_34]]			; FORCE_REDUCTION-NEXT: [[OP_RDX13:%.*]] = and i32 [[TMP0]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP6]], [[TMP0:%.*]]			; FORCE_REDUCTION-NEXT: [[OP_RDX14:%.*]] = and i32 [[TMP0]], [[TMP26]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA1:%.*]] = and i32 [[OP_EXTRA]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_RDX15:%.*]] = and i32 [[OP_RDX13]], [[OP_RDX14]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA2:%.*]] = and i32 [[OP_EXTRA1]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[OP_RDX16:%.*]] = and i32 [[TMP30]], [[OP_RDX15]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA3:%.*]] = and i32 [[OP_EXTRA2]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[TMP31:%.*]] = insertelement <2 x i32> poison, i32 [[TMP26]], i32 0
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA4:%.*]] = and i32 [[OP_EXTRA3]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[TMP32:%.*]] = insertelement <2 x i32> [[TMP31]], i32 [[OP_RDX]], i32 1
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA5:%.*]] = and i32 [[OP_EXTRA4]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[TMP33:%.*]] = insertelement <2 x i32> <i32 14910, i32 poison>, i32 [[OP_RDX16]], i32 1
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA6:%.*]] = and i32 [[OP_EXTRA5]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[TMP34:%.*]] = add <2 x i32> [[TMP32]], [[TMP33]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA7:%.*]] = and i32 [[OP_EXTRA6]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[TMP35:%.*]] = and <2 x i32> [[TMP32]], [[TMP33]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA8:%.*]] = and i32 [[OP_EXTRA7]], [[TMP0]]			; FORCE_REDUCTION-NEXT: [[TMP36]] = shufflevector <2 x i32> [[TMP34]], <2 x i32> [[TMP35]], <2 x i32> <i32 0, i32 3>
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA9:%.*]] = and i32 [[OP_EXTRA8]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA10:%.*]] = and i32 [[OP_EXTRA9]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA11:%.*]] = and i32 [[OP_EXTRA10]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA12:%.*]] = and i32 [[OP_EXTRA11]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA13:%.*]] = and i32 [[OP_EXTRA12]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA14:%.*]] = and i32 [[OP_EXTRA13]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA15:%.*]] = and i32 [[OP_EXTRA14]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA16:%.*]] = and i32 [[OP_EXTRA15]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA17:%.*]] = and i32 [[OP_EXTRA16]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA18:%.*]] = and i32 [[OP_EXTRA17]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA19:%.*]] = and i32 [[OP_EXTRA18]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA20:%.*]] = and i32 [[OP_EXTRA19]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA21:%.*]] = and i32 [[OP_EXTRA20]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA22:%.*]] = and i32 [[OP_EXTRA21]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA23:%.*]] = and i32 [[OP_EXTRA22]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA24:%.*]] = and i32 [[OP_EXTRA23]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA25:%.*]] = and i32 [[OP_EXTRA24]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA26:%.*]] = and i32 [[OP_EXTRA25]], [[TMP0]]
	; FORCE_REDUCTION-NEXT: [[OP_EXTRA27:%.*]] = and i32 [[OP_EXTRA26]], [[TMP2]]
	; FORCE_REDUCTION-NEXT: [[VAL_39:%.*]] = add i32 [[TMP2]], 12529
	; FORCE_REDUCTION-NEXT: [[VAL_40:%.*]] = and i32 [[OP_EXTRA27]], [[VAL_39]]
	; FORCE_REDUCTION-NEXT: [[VAL_41:%.*]] = add i32 [[TMP2]], 13685
	; FORCE_REDUCTION-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> poison, i32 [[VAL_40]], i32 0
	; FORCE_REDUCTION-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> [[TMP7]], i32 [[TMP2]], i32 1
	; FORCE_REDUCTION-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> <i32 poison, i32 14910>, i32 [[VAL_41]], i32 0
	; FORCE_REDUCTION-NEXT: [[TMP10:%.*]] = and <2 x i32> [[TMP8]], [[TMP9]]
	; FORCE_REDUCTION-NEXT: [[TMP11:%.*]] = add <2 x i32> [[TMP8]], [[TMP9]]
	; FORCE_REDUCTION-NEXT: [[TMP12]] = shufflevector <2 x i32> [[TMP10]], <2 x i32> [[TMP11]], <2 x i32> <i32 0, i32 3>
	; FORCE_REDUCTION-NEXT: br label [[LOOP]]			; FORCE_REDUCTION-NEXT: br label [[LOOP]]
	;			;
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%local_4_39.us = phi i32 [ %val_42, %loop ], [ 0, %entry ]			%local_4_39.us = phi i32 [ %val_42, %loop ], [ 0, %entry ]
	%local_8_43.us = phi i32 [ %val_43, %loop ], [ 0, %entry ]			%local_8_43.us = phi i32 [ %val_43, %loop ], [ 0, %entry ]
	[46 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/PR40310.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake < %s | FileCheck %s

	define void @mainTest(i32 %param, i32 * %vals, i32 %len) {			define void @mainTest(i32 %param, i32 * %vals, i32 %len) {
	; CHECK-LABEL: @mainTest(			; CHECK-LABEL: @mainTest(
	; CHECK-NEXT: bci_15.preheader:			; CHECK-NEXT: bci_15.preheader:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> <i32 31, i32 poison>, i32 [[PARAM:%.*]], i32 1			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> <i32 poison, i32 31>, i32 [[PARAM:%.*]], i32 0
	; CHECK-NEXT: br label [[BCI_15:%.*]]			; CHECK-NEXT: br label [[BCI_15:%.*]]
	; CHECK: bci_15:			; CHECK: bci_15:
	; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP7:%.*]], [[BCI_15]] ], [ [[TMP0]], [[BCI_15_PREHEADER:%.*]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP7:%.*]], [[BCI_15]] ], [ [[TMP0]], [[BCI_15_PREHEADER:%.*]] ]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <16 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <16 x i32> [[SHUFFLE]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <16 x i32> [[SHUFFLE]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <16 x i32> [[SHUFFLE]], i32 15			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <16 x i32> [[SHUFFLE]], i32 0
	; CHECK-NEXT: store atomic i32 [[TMP3]], i32* [[VALS:%.*]] unordered, align 4			; CHECK-NEXT: store atomic i32 [[TMP3]], i32* [[VALS:%.*]] unordered, align 4
	; CHECK-NEXT: [[TMP4:%.*]] = add <16 x i32> [[SHUFFLE]], <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 -1>			; CHECK-NEXT: [[TMP4:%.*]] = add <16 x i32> [[SHUFFLE]], <i32 -1, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.vector.reduce.and.v16i32(<16 x i32> [[TMP4]])			; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.vector.reduce.and.v16i32(<16 x i32> [[TMP4]])
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP5]], [[TMP2]]			; CHECK-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP5]], [[TMP2]]
	; CHECK-NEXT: [[V44:%.*]] = add i32 [[TMP2]], 16			; CHECK-NEXT: [[V44:%.*]] = add i32 [[TMP2]], 16
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[V44]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[OP_RDX]], i32 0
	; CHECK-NEXT: [[TMP7]] = insertelement <2 x i32> [[TMP6]], i32 [[OP_EXTRA]], i32 1			; CHECK-NEXT: [[TMP7]] = insertelement <2 x i32> [[TMP6]], i32 [[V44]], i32 1
	; CHECK-NEXT: br i1 true, label [[BCI_15]], label [[LOOPEXIT:%.*]]			; CHECK-NEXT: br i1 true, label [[BCI_15]], label [[LOOPEXIT:%.*]]
	; CHECK: loopexit:			; CHECK: loopexit:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bci_15.preheader:			bci_15.preheader:
	br label %bci_15			br label %bci_15

	bci_15: ; preds = %bci_15.preheader, %bci_15			bci_15: ; preds = %bci_15.preheader, %bci_15
	[41 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/blending-shuffle-inseltpoison.ll

[54 lines not shown]	;
%ins2 = insertelement <4 x i8> %ins1, i8 %x3x3, i32 1		%ins2 = insertelement <4 x i8> %ins1, i8 %x3x3, i32 1
%ins3 = insertelement <4 x i8> %ins2, i8 %y1y1, i32 2		%ins3 = insertelement <4 x i8> %ins2, i8 %y1y1, i32 2
%ins4 = insertelement <4 x i8> %ins3, i8 %y2y2, i32 3		%ins4 = insertelement <4 x i8> %ins3, i8 %y2y2, i32 3
ret <4 x i8> %ins4		ret <4 x i8> %ins4
}		}

define i8 @i(<4 x i8> %x, <4 x i8> %y) {		define i8 @i(<4 x i8> %x, <4 x i8> %y) {
; CHECK-LABEL: @i(		; CHECK-LABEL: @i(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i8> [[X:%.*]], <4 x i8> [[Y:%.*]], <4 x i32> <i32 0, i32 3, i32 5, i32 6>		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i8> [[Y:%.*]], <4 x i8> [[X:%.*]], <4 x i32> <i32 2, i32 1, i32 7, i32 4>
; CHECK-NEXT: [[TMP2:%.*]] = mul <4 x i8> [[TMP1]], [[TMP1]]		; CHECK-NEXT: [[TMP2:%.*]] = mul <4 x i8> [[TMP1]], [[TMP1]]
; CHECK-NEXT: [[TMP3:%.*]] = call i8 @llvm.vector.reduce.add.v4i8(<4 x i8> [[TMP2]])		; CHECK-NEXT: [[TMP3:%.*]] = call i8 @llvm.vector.reduce.add.v4i8(<4 x i8> [[TMP2]])
; CHECK-NEXT: ret i8 [[TMP3]]		; CHECK-NEXT: ret i8 [[TMP3]]
;		;
%x0 = extractelement <4 x i8> %x, i32 0		%x0 = extractelement <4 x i8> %x, i32 0
%x3 = extractelement <4 x i8> %x, i32 3		%x3 = extractelement <4 x i8> %x, i32 3
%y1 = extractelement <4 x i8> %y, i32 1		%y1 = extractelement <4 x i8> %y, i32 1
%y2 = extractelement <4 x i8> %y, i32 2		%y2 = extractelement <4 x i8> %y, i32 2

llvm/test/Transforms/SLPVectorizer/X86/blending-shuffle.ll

%ins2 = insertelement <4 x i8> %ins1, i8 %x3x3, i32 1		%ins2 = insertelement <4 x i8> %ins1, i8 %x3x3, i32 1
%ins3 = insertelement <4 x i8> %ins2, i8 %y1y1, i32 2		%ins3 = insertelement <4 x i8> %ins2, i8 %y1y1, i32 2
%ins4 = insertelement <4 x i8> %ins3, i8 %y2y2, i32 3		%ins4 = insertelement <4 x i8> %ins3, i8 %y2y2, i32 3
ret <4 x i8> %ins4		ret <4 x i8> %ins4
}		}

define i8 @i(<4 x i8> %x, <4 x i8> %y) {		define i8 @i(<4 x i8> %x, <4 x i8> %y) {
; CHECK-LABEL: @i(		; CHECK-LABEL: @i(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i8> [[X:%.*]], <4 x i8> [[Y:%.*]], <4 x i32> <i32 0, i32 3, i32 5, i32 6>		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x i8> [[Y:%.*]], <4 x i8> [[X:%.*]], <4 x i32> <i32 2, i32 1, i32 7, i32 4>
; CHECK-NEXT: [[TMP2:%.*]] = mul <4 x i8> [[TMP1]], [[TMP1]]		; CHECK-NEXT: [[TMP2:%.*]] = mul <4 x i8> [[TMP1]], [[TMP1]]
; CHECK-NEXT: [[TMP3:%.*]] = call i8 @llvm.vector.reduce.add.v4i8(<4 x i8> [[TMP2]])		; CHECK-NEXT: [[TMP3:%.*]] = call i8 @llvm.vector.reduce.add.v4i8(<4 x i8> [[TMP2]])
; CHECK-NEXT: ret i8 [[TMP3]]		; CHECK-NEXT: ret i8 [[TMP3]]
;		;
%x0 = extractelement <4 x i8> %x, i32 0		%x0 = extractelement <4 x i8> %x, i32 0
%x3 = extractelement <4 x i8> %x, i32 3		%x3 = extractelement <4 x i8> %x, i32 3
%y1 = extractelement <4 x i8> %y, i32 1		%y1 = extractelement <4 x i8> %y, i32 1
%y2 = extractelement <4 x i8> %y, i32 2		%y2 = extractelement <4 x i8> %y, i32 2

llvm/test/Transforms/SLPVectorizer/X86/crash_reordering_undefs.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux -mcpu=corei7-avx | FileCheck %s			; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux -mcpu=corei7-avx | FileCheck %s

	define i32 @crash_reordering_undefs() {			define i32 @crash_reordering_undefs() {
	; CHECK-LABEL: @crash_reordering_undefs(			; CHECK-LABEL: @crash_reordering_undefs(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[OR0:%.*]] = or i64 undef, undef			; CHECK-NEXT: [[OR0:%.*]] = or i64 undef, undef
	; CHECK-NEXT: [[CMP0:%.*]] = icmp eq i64 undef, [[OR0]]			; CHECK-NEXT: [[CMP0:%.*]] = icmp eq i64 undef, [[OR0]]
	; CHECK-NEXT: [[ADD0:%.*]] = select i1 [[CMP0]], i32 65536, i32 65537			; CHECK-NEXT: [[ADD0:%.*]] = select i1 [[CMP0]], i32 65536, i32 65537
	; CHECK-NEXT: [[ADD1:%.*]] = add i32 undef, [[ADD0]]
	; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i64 undef, undef			; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i64 undef, undef
	; CHECK-NEXT: [[ADD2:%.*]] = select i1 [[CMP1]], i32 65536, i32 65537			; CHECK-NEXT: [[ADD2:%.*]] = select i1 [[CMP1]], i32 65536, i32 65537
	; CHECK-NEXT: [[ADD3:%.*]] = add i32 [[ADD1]], [[ADD2]]
	; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i64 undef, undef			; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i64 undef, undef
	; CHECK-NEXT: [[ADD4:%.*]] = select i1 [[CMP2]], i32 65536, i32 65537			; CHECK-NEXT: [[ADD4:%.*]] = select i1 [[CMP2]], i32 65536, i32 65537
	; CHECK-NEXT: [[ADD5:%.*]] = add i32 [[ADD3]], [[ADD4]]
	; CHECK-NEXT: [[ADD6:%.*]] = add i32 [[ADD5]], undef
	; CHECK-NEXT: [[ADD7:%.*]] = add i32 [[ADD6]], undef
	; CHECK-NEXT: [[ADD8:%.*]] = add i32 [[ADD7]], undef
	; CHECK-NEXT: [[OR1:%.*]] = or i64 undef, undef			; CHECK-NEXT: [[OR1:%.*]] = or i64 undef, undef
	; CHECK-NEXT: [[CMP3:%.*]] = icmp eq i64 undef, [[OR1]]			; CHECK-NEXT: [[CMP3:%.*]] = icmp eq i64 undef, [[OR1]]
	; CHECK-NEXT: [[ADD9:%.*]] = select i1 [[CMP3]], i32 65536, i32 65537			; CHECK-NEXT: [[ADD9:%.*]] = select i1 [[CMP3]], i32 65536, i32 65537
	; CHECK-NEXT: [[ADD10:%.*]] = add i32 [[ADD8]], [[ADD9]]			; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> undef)
	; CHECK-NEXT: [[ADD11:%.*]] = add i32 [[ADD10]], undef			; CHECK-NEXT: [[OP_RDX:%.*]] = add i32 undef, [[ADD0]]
	; CHECK-NEXT: ret i32 [[ADD11]]			; CHECK-NEXT: [[OP_RDX1:%.*]] = add i32 [[ADD2]], [[ADD4]]
				; CHECK-NEXT: [[OP_RDX2:%.*]] = add i32 [[OP_RDX]], [[OP_RDX1]]
				; CHECK-NEXT: [[OP_RDX3:%.*]] = add i32 [[OP_RDX2]], [[ADD9]]
				; CHECK-NEXT: [[OP_RDX4:%.*]] = add i32 [[TMP0]], [[OP_RDX3]]
				; CHECK-NEXT: ret i32 [[OP_RDX4]]
	;			;
	entry:			entry:
	%or0 = or i64 undef, undef			%or0 = or i64 undef, undef
	%cmp0 = icmp eq i64 undef, %or0			%cmp0 = icmp eq i64 undef, %or0
	%add0 = select i1 %cmp0, i32 65536, i32 65537			%add0 = select i1 %cmp0, i32 65536, i32 65537
	%add1 = add i32 undef, %add0			%add1 = add i32 undef, %add0
	%cmp1 = icmp eq i64 undef, undef			%cmp1 = icmp eq i64 undef, undef
	%add2 = select i1 %cmp1, i32 65536, i32 65537			%add2 = select i1 %cmp1, i32 65536, i32 65537

llvm/test/Transforms/SLPVectorizer/X86/horizontal-list.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt -slp-vectorizer -slp-vectorize-hor -slp-vectorize-hor-store -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 | FileCheck %s		; RUN: opt -slp-vectorizer -slp-vectorize-hor -slp-vectorize-hor-store -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 | FileCheck %s
; RUN: opt -slp-vectorizer -slp-vectorize-hor -slp-vectorize-hor-store -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 -slp-threshold=-10 | FileCheck %s --check-prefix=THRESHOLD		; RUN: opt -slp-vectorizer -slp-vectorize-hor -slp-vectorize-hor-store -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 -slp-threshold=-10 | FileCheck %s --check-prefix=THRESHOLD

@n = external local_unnamed_addr global i32, align 4		@n = external local_unnamed_addr global i32, align 4
@arr = common local_unnamed_addr global [20 x float] zeroinitializer, align 16		@arr = common local_unnamed_addr global [20 x float] zeroinitializer, align 16
@arr1 = common local_unnamed_addr global [20 x float] zeroinitializer, align 16		@arr1 = common local_unnamed_addr global [20 x float] zeroinitializer, align 16
@res = external local_unnamed_addr global float, align 4		@res = external local_unnamed_addr global float, align 4

define float @baz() {		define float @baz() {
; CHECK-LABEL: @baz(		; CHECK-LABEL: @baz(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4		; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3		; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3
; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float		; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr to <2 x float>*), align 16		; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr1 to <2 x float>*), align 16		; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16
; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <2 x float> [[TMP2]], [[TMP1]]		; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP1]]
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 0		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP3]], i32 1		; CHECK-NEXT: [[TMP4:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[SHUFFLE]])
; CHECK-NEXT: [[TMP6:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2) to <2 x float>*), align 8		; CHECK-NEXT: [[OP_RDX:%.*]] = fadd fast float [[CONV]], [[CONV]]
; CHECK-NEXT: [[TMP7:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2) to <2 x float>*), align 8		; CHECK-NEXT: [[OP_RDX1:%.*]] = fadd fast float [[TMP4]], [[OP_RDX]]
; CHECK-NEXT: [[TMP8:%.*]] = fmul fast <2 x float> [[TMP7]], [[TMP6]]		; CHECK-NEXT: store float [[OP_RDX1]], float* @res, align 4
; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0		; CHECK-NEXT: ret float [[OP_RDX1]]
; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1
; CHECK-NEXT: [[TMP11:%.*]] = insertelement <8 x float> poison, float [[TMP10]], i32 0
; CHECK-NEXT: [[TMP12:%.*]] = insertelement <8 x float> [[TMP11]], float [[TMP9]], i32 1
; CHECK-NEXT: [[TMP13:%.*]] = insertelement <8 x float> [[TMP12]], float [[TMP5]], i32 2
; CHECK-NEXT: [[TMP14:%.*]] = insertelement <8 x float> [[TMP13]], float [[TMP4]], i32 3
; CHECK-NEXT: [[TMP15:%.*]] = insertelement <8 x float> [[TMP14]], float [[TMP10]], i32 4
; CHECK-NEXT: [[TMP16:%.*]] = insertelement <8 x float> [[TMP15]], float [[TMP9]], i32 5
; CHECK-NEXT: [[TMP17:%.*]] = insertelement <8 x float> [[TMP16]], float [[TMP5]], i32 6
; CHECK-NEXT: [[TMP18:%.*]] = insertelement <8 x float> [[TMP17]], float [[TMP4]], i32 7
; CHECK-NEXT: [[TMP19:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP18]])
; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP19]], [[CONV]]
; CHECK-NEXT: [[OP_EXTRA1:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]
; CHECK-NEXT: store float [[OP_EXTRA1]], float* @res, align 4
; CHECK-NEXT: ret float [[OP_EXTRA1]]
;		;
; THRESHOLD-LABEL: @baz(		; THRESHOLD-LABEL: @baz(
; THRESHOLD-NEXT: entry:		; THRESHOLD-NEXT: entry:
; THRESHOLD-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4		; THRESHOLD-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3		; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3
; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float		; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
; THRESHOLD-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr to <2 x float>*), align 16		; THRESHOLD-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr to <4 x float>*), align 16
; THRESHOLD-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* bitcast ([20 x float]* @arr1 to <2 x float>*), align 16		; THRESHOLD-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* bitcast ([20 x float]* @arr1 to <4 x float>*), align 16
; THRESHOLD-NEXT: [[TMP3:%.*]] = fmul fast <2 x float> [[TMP2]], [[TMP1]]		; THRESHOLD-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[TMP2]], [[TMP1]]
; THRESHOLD-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 0		; THRESHOLD-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
; THRESHOLD-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP3]], i32 1		; THRESHOLD-NEXT: [[TMP4:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[SHUFFLE]])
; THRESHOLD-NEXT: [[TMP6:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 2) to <2 x float>*), align 8		; THRESHOLD-NEXT: [[OP_RDX:%.*]] = fadd fast float [[CONV]], [[CONV]]
; THRESHOLD-NEXT: [[TMP7:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 2) to <2 x float>*), align 8		; THRESHOLD-NEXT: [[OP_RDX1:%.*]] = fadd fast float [[TMP4]], [[OP_RDX]]
; THRESHOLD-NEXT: [[TMP8:%.*]] = fmul fast <2 x float> [[TMP7]], [[TMP6]]		; THRESHOLD-NEXT: store float [[OP_RDX1]], float* @res, align 4
; THRESHOLD-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0		; THRESHOLD-NEXT: ret float [[OP_RDX1]]
; THRESHOLD-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1
; THRESHOLD-NEXT: [[TMP11:%.*]] = insertelement <8 x float> poison, float [[TMP10]], i32 0
; THRESHOLD-NEXT: [[TMP12:%.*]] = insertelement <8 x float> [[TMP11]], float [[TMP9]], i32 1
; THRESHOLD-NEXT: [[TMP13:%.*]] = insertelement <8 x float> [[TMP12]], float [[TMP5]], i32 2
; THRESHOLD-NEXT: [[TMP14:%.*]] = insertelement <8 x float> [[TMP13]], float [[TMP4]], i32 3
; THRESHOLD-NEXT: [[TMP15:%.*]] = insertelement <8 x float> [[TMP14]], float [[TMP10]], i32 4
; THRESHOLD-NEXT: [[TMP16:%.*]] = insertelement <8 x float> [[TMP15]], float [[TMP9]], i32 5
; THRESHOLD-NEXT: [[TMP17:%.*]] = insertelement <8 x float> [[TMP16]], float [[TMP5]], i32 6
; THRESHOLD-NEXT: [[TMP18:%.*]] = insertelement <8 x float> [[TMP17]], float [[TMP4]], i32 7
; THRESHOLD-NEXT: [[TMP19:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP18]])
; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP19]], [[CONV]]
; THRESHOLD-NEXT: [[OP_EXTRA1:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]
; THRESHOLD-NEXT: store float [[OP_EXTRA1]], float* @res, align 4
; THRESHOLD-NEXT: ret float [[OP_EXTRA1]]
;		;
entry:		entry:
%0 = load i32, i32* @n, align 4		%0 = load i32, i32* @n, align 4
%mul = mul nsw i32 %0, 3		%mul = mul nsw i32 %0, 3
%conv = sitofp i32 %mul to float		%conv = sitofp i32 %mul to float
%1 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16		%1 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
%2 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16		%2 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16
%mul4 = fmul fast float %2, %1		%mul4 = fmul fast float %2, %1
[26 lines not shown]
; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3		; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3
; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float		; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr to <8 x float>*), align 16		; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr to <8 x float>*), align 16
; CHECK-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr1 to <8 x float>*), align 16		; CHECK-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr1 to <8 x float>*), align 16
; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <8 x float> [[TMP2]], [[TMP1]]		; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <8 x float> [[TMP2]], [[TMP1]]
; CHECK-NEXT: [[MUL5:%.*]] = shl nsw i32 [[TMP0]], 2		; CHECK-NEXT: [[MUL5:%.*]] = shl nsw i32 [[TMP0]], 2
; CHECK-NEXT: [[CONV6:%.*]] = sitofp i32 [[MUL5]] to float		; CHECK-NEXT: [[CONV6:%.*]] = sitofp i32 [[MUL5]] to float
; CHECK-NEXT: [[TMP4:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP3]])		; CHECK-NEXT: [[TMP4:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP3]])
; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]		; CHECK-NEXT: [[OP_RDX:%.*]] = fadd fast float [[CONV]], [[CONV6]]
; CHECK-NEXT: [[OP_EXTRA1:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV6]]		; CHECK-NEXT: [[OP_RDX1:%.*]] = fadd fast float [[TMP4]], [[OP_RDX]]
; CHECK-NEXT: store float [[OP_EXTRA1]], float* @res, align 4		; CHECK-NEXT: store float [[OP_RDX1]], float* @res, align 4
; CHECK-NEXT: ret float [[OP_EXTRA1]]		; CHECK-NEXT: ret float [[OP_RDX1]]
;		;
; THRESHOLD-LABEL: @bazz(		; THRESHOLD-LABEL: @bazz(
; THRESHOLD-NEXT: entry:		; THRESHOLD-NEXT: entry:
; THRESHOLD-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4		; THRESHOLD-NEXT: [[TMP0:%.*]] = load i32, i32* @n, align 4
; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 3
; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr to <8 x float>*), align 16		; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr to <8 x float>*), align 16
; THRESHOLD-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr1 to <8 x float>*), align 16		; THRESHOLD-NEXT: [[TMP2:%.*]] = load <8 x float>, <8 x float>* bitcast ([20 x float]* @arr1 to <8 x float>*), align 16
; THRESHOLD-NEXT: [[TMP3:%.*]] = fmul fast <8 x float> [[TMP2]], [[TMP1]]		; THRESHOLD-NEXT: [[TMP3:%.*]] = fmul fast <8 x float> [[TMP2]], [[TMP1]]
; THRESHOLD-NEXT: [[MUL5:%.*]] = shl nsw i32 [[TMP0]], 2		; THRESHOLD-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> poison, i32 [[TMP0]], i32 0
; THRESHOLD-NEXT: [[CONV6:%.*]] = sitofp i32 [[MUL5]] to float		; THRESHOLD-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> [[TMP4]], i32 [[TMP0]], i32 1
; THRESHOLD-NEXT: [[TMP4:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP3]])		; THRESHOLD-NEXT: [[TMP6:%.*]] = mul nsw <2 x i32> [[TMP5]], <i32 3, i32 2>
; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP4]], [[CONV]]		; THRESHOLD-NEXT: [[TMP7:%.*]] = shl nsw <2 x i32> [[TMP5]], <i32 3, i32 2>
; THRESHOLD-NEXT: [[OP_EXTRA1:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV6]]		; THRESHOLD-NEXT: [[TMP8:%.*]] = shufflevector <2 x i32> [[TMP6]], <2 x i32> [[TMP7]], <2 x i32> <i32 0, i32 3>
; THRESHOLD-NEXT: store float [[OP_EXTRA1]], float* @res, align 4		; THRESHOLD-NEXT: [[TMP9:%.*]] = sitofp <2 x i32> [[TMP8]] to <2 x float>
; THRESHOLD-NEXT: ret float [[OP_EXTRA1]]		; THRESHOLD-NEXT: [[TMP10:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP3]])
		; THRESHOLD-NEXT: [[TMP11:%.*]] = extractelement <2 x float> [[TMP9]], i32 0
		; THRESHOLD-NEXT: [[TMP12:%.*]] = extractelement <2 x float> [[TMP9]], i32 1
		; THRESHOLD-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP11]], [[TMP12]]
		; THRESHOLD-NEXT: [[OP_RDX1:%.*]] = fadd fast float [[TMP10]], [[OP_RDX]]
		; THRESHOLD-NEXT: store float [[OP_RDX1]], float* @res, align 4
		; THRESHOLD-NEXT: ret float [[OP_RDX1]]
;		;
entry:		entry:
%0 = load i32, i32* @n, align 4		%0 = load i32, i32* @n, align 4
%mul = mul nsw i32 %0, 3		%mul = mul nsw i32 %0, 3
%conv = sitofp i32 %mul to float		%conv = sitofp i32 %mul to float
%1 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16		%1 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr, i64 0, i64 0), align 16
%2 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16		%2 = load float, float* getelementptr inbounds ([20 x float], [20 x float]* @arr1, i64 0, i64 0), align 16
%mul4 = fmul fast float %2, %1		%mul4 = fmul fast float %2, %1
[215 lines not shown]
; CHECK-NEXT: [[ARRAYIDX_8:%.*]] = getelementptr inbounds float, float* [[X]], i64 8		; CHECK-NEXT: [[ARRAYIDX_8:%.*]] = getelementptr inbounds float, float* [[X]], i64 8
; CHECK-NEXT: [[ARRAYIDX_9:%.*]] = getelementptr inbounds float, float* [[X]], i64 9		; CHECK-NEXT: [[ARRAYIDX_9:%.*]] = getelementptr inbounds float, float* [[X]], i64 9
; CHECK-NEXT: [[ARRAYIDX_10:%.*]] = getelementptr inbounds float, float* [[X]], i64 10		; CHECK-NEXT: [[ARRAYIDX_10:%.*]] = getelementptr inbounds float, float* [[X]], i64 10
; CHECK-NEXT: [[ARRAYIDX_11:%.*]] = getelementptr inbounds float, float* [[X]], i64 11		; CHECK-NEXT: [[ARRAYIDX_11:%.*]] = getelementptr inbounds float, float* [[X]], i64 11
; CHECK-NEXT: [[ARRAYIDX_12:%.*]] = getelementptr inbounds float, float* [[X]], i64 12		; CHECK-NEXT: [[ARRAYIDX_12:%.*]] = getelementptr inbounds float, float* [[X]], i64 12
; CHECK-NEXT: [[ARRAYIDX_13:%.*]] = getelementptr inbounds float, float* [[X]], i64 13		; CHECK-NEXT: [[ARRAYIDX_13:%.*]] = getelementptr inbounds float, float* [[X]], i64 13
; CHECK-NEXT: [[ARRAYIDX_14:%.*]] = getelementptr inbounds float, float* [[X]], i64 14		; CHECK-NEXT: [[ARRAYIDX_14:%.*]] = getelementptr inbounds float, float* [[X]], i64 14
; CHECK-NEXT: [[ARRAYIDX_15:%.*]] = getelementptr inbounds float, float* [[X]], i64 15		; CHECK-NEXT: [[ARRAYIDX_15:%.*]] = getelementptr inbounds float, float* [[X]], i64 15
; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <16 x float>*
; CHECK-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* [[TMP0]], align 4
; CHECK-NEXT: [[ARRAYIDX_16:%.*]] = getelementptr inbounds float, float* [[X]], i64 16		; CHECK-NEXT: [[ARRAYIDX_16:%.*]] = getelementptr inbounds float, float* [[X]], i64 16
; CHECK-NEXT: [[ARRAYIDX_17:%.*]] = getelementptr inbounds float, float* [[X]], i64 17		; CHECK-NEXT: [[ARRAYIDX_17:%.*]] = getelementptr inbounds float, float* [[X]], i64 17
; CHECK-NEXT: [[ARRAYIDX_18:%.*]] = getelementptr inbounds float, float* [[X]], i64 18		; CHECK-NEXT: [[ARRAYIDX_18:%.*]] = getelementptr inbounds float, float* [[X]], i64 18
; CHECK-NEXT: [[ARRAYIDX_19:%.*]] = getelementptr inbounds float, float* [[X]], i64 19		; CHECK-NEXT: [[ARRAYIDX_19:%.*]] = getelementptr inbounds float, float* [[X]], i64 19
; CHECK-NEXT: [[ARRAYIDX_20:%.*]] = getelementptr inbounds float, float* [[X]], i64 20		; CHECK-NEXT: [[ARRAYIDX_20:%.*]] = getelementptr inbounds float, float* [[X]], i64 20
; CHECK-NEXT: [[ARRAYIDX_21:%.*]] = getelementptr inbounds float, float* [[X]], i64 21		; CHECK-NEXT: [[ARRAYIDX_21:%.*]] = getelementptr inbounds float, float* [[X]], i64 21
; CHECK-NEXT: [[ARRAYIDX_22:%.*]] = getelementptr inbounds float, float* [[X]], i64 22		; CHECK-NEXT: [[ARRAYIDX_22:%.*]] = getelementptr inbounds float, float* [[X]], i64 22
; CHECK-NEXT: [[ARRAYIDX_23:%.*]] = getelementptr inbounds float, float* [[X]], i64 23		; CHECK-NEXT: [[ARRAYIDX_23:%.*]] = getelementptr inbounds float, float* [[X]], i64 23
; CHECK-NEXT: [[ARRAYIDX_24:%.*]] = getelementptr inbounds float, float* [[X]], i64 24		; CHECK-NEXT: [[ARRAYIDX_24:%.*]] = getelementptr inbounds float, float* [[X]], i64 24
; CHECK-NEXT: [[ARRAYIDX_25:%.*]] = getelementptr inbounds float, float* [[X]], i64 25		; CHECK-NEXT: [[ARRAYIDX_25:%.*]] = getelementptr inbounds float, float* [[X]], i64 25
; CHECK-NEXT: [[ARRAYIDX_26:%.*]] = getelementptr inbounds float, float* [[X]], i64 26		; CHECK-NEXT: [[ARRAYIDX_26:%.*]] = getelementptr inbounds float, float* [[X]], i64 26
; CHECK-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 27		; CHECK-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 27
; CHECK-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 28		; CHECK-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 28
; CHECK-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 29		; CHECK-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 29
; CHECK-NEXT: [[ARRAYIDX_30:%.*]] = getelementptr inbounds float, float* [[X]], i64 30		; CHECK-NEXT: [[ARRAYIDX_30:%.*]] = getelementptr inbounds float, float* [[X]], i64 30
; CHECK-NEXT: [[ARRAYIDX_31:%.*]] = getelementptr inbounds float, float* [[X]], i64 31		; CHECK-NEXT: [[ARRAYIDX_31:%.*]] = getelementptr inbounds float, float* [[X]], i64 31
		; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <32 x float>*
		; CHECK-NEXT: [[TMP1:%.*]] = load <32 x float>, <32 x float>* [[TMP0]], align 4
; CHECK-NEXT: [[ARRAYIDX_32:%.*]] = getelementptr inbounds float, float* [[X]], i64 32		; CHECK-NEXT: [[ARRAYIDX_32:%.*]] = getelementptr inbounds float, float* [[X]], i64 32
; CHECK-NEXT: [[ARRAYIDX_33:%.*]] = getelementptr inbounds float, float* [[X]], i64 33		; CHECK-NEXT: [[ARRAYIDX_33:%.*]] = getelementptr inbounds float, float* [[X]], i64 33
; CHECK-NEXT: [[ARRAYIDX_34:%.*]] = getelementptr inbounds float, float* [[X]], i64 34		; CHECK-NEXT: [[ARRAYIDX_34:%.*]] = getelementptr inbounds float, float* [[X]], i64 34
; CHECK-NEXT: [[ARRAYIDX_35:%.*]] = getelementptr inbounds float, float* [[X]], i64 35		; CHECK-NEXT: [[ARRAYIDX_35:%.*]] = getelementptr inbounds float, float* [[X]], i64 35
; CHECK-NEXT: [[ARRAYIDX_36:%.*]] = getelementptr inbounds float, float* [[X]], i64 36		; CHECK-NEXT: [[ARRAYIDX_36:%.*]] = getelementptr inbounds float, float* [[X]], i64 36
; CHECK-NEXT: [[ARRAYIDX_37:%.*]] = getelementptr inbounds float, float* [[X]], i64 37		; CHECK-NEXT: [[ARRAYIDX_37:%.*]] = getelementptr inbounds float, float* [[X]], i64 37
; CHECK-NEXT: [[ARRAYIDX_38:%.*]] = getelementptr inbounds float, float* [[X]], i64 38		; CHECK-NEXT: [[ARRAYIDX_38:%.*]] = getelementptr inbounds float, float* [[X]], i64 38
; CHECK-NEXT: [[ARRAYIDX_39:%.*]] = getelementptr inbounds float, float* [[X]], i64 39		; CHECK-NEXT: [[ARRAYIDX_39:%.*]] = getelementptr inbounds float, float* [[X]], i64 39
; CHECK-NEXT: [[ARRAYIDX_40:%.*]] = getelementptr inbounds float, float* [[X]], i64 40		; CHECK-NEXT: [[ARRAYIDX_40:%.*]] = getelementptr inbounds float, float* [[X]], i64 40
; CHECK-NEXT: [[ARRAYIDX_41:%.*]] = getelementptr inbounds float, float* [[X]], i64 41		; CHECK-NEXT: [[ARRAYIDX_41:%.*]] = getelementptr inbounds float, float* [[X]], i64 41
; CHECK-NEXT: [[ARRAYIDX_42:%.*]] = getelementptr inbounds float, float* [[X]], i64 42		; CHECK-NEXT: [[ARRAYIDX_42:%.*]] = getelementptr inbounds float, float* [[X]], i64 42
; CHECK-NEXT: [[ARRAYIDX_43:%.*]] = getelementptr inbounds float, float* [[X]], i64 43		; CHECK-NEXT: [[ARRAYIDX_43:%.*]] = getelementptr inbounds float, float* [[X]], i64 43
; CHECK-NEXT: [[ARRAYIDX_44:%.*]] = getelementptr inbounds float, float* [[X]], i64 44		; CHECK-NEXT: [[ARRAYIDX_44:%.*]] = getelementptr inbounds float, float* [[X]], i64 44
; CHECK-NEXT: [[ARRAYIDX_45:%.*]] = getelementptr inbounds float, float* [[X]], i64 45		; CHECK-NEXT: [[ARRAYIDX_45:%.*]] = getelementptr inbounds float, float* [[X]], i64 45
; CHECK-NEXT: [[ARRAYIDX_46:%.*]] = getelementptr inbounds float, float* [[X]], i64 46		; CHECK-NEXT: [[ARRAYIDX_46:%.*]] = getelementptr inbounds float, float* [[X]], i64 46
; CHECK-NEXT: [[ARRAYIDX_47:%.*]] = getelementptr inbounds float, float* [[X]], i64 47		; CHECK-NEXT: [[ARRAYIDX_47:%.*]] = getelementptr inbounds float, float* [[X]], i64 47
; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[ARRAYIDX_16]] to <32 x float>*		; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[ARRAYIDX_32]] to <16 x float>*
; CHECK-NEXT: [[TMP3:%.*]] = load <32 x float>, <32 x float>* [[TMP2]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = load <16 x float>, <16 x float>* [[TMP2]], align 4
; CHECK-NEXT: [[TMP4:%.*]] = call fast float @llvm.vector.reduce.fadd.v32f32(float -0.000000e+00, <32 x float> [[TMP3]])		; CHECK-NEXT: [[TMP4:%.*]] = call fast float @llvm.vector.reduce.fadd.v32f32(float -0.000000e+00, <32 x float> [[TMP1]])
; CHECK-NEXT: [[TMP5:%.*]] = call fast float @llvm.vector.reduce.fadd.v16f32(float -0.000000e+00, <16 x float> [[TMP1]])		; CHECK-NEXT: [[TMP5:%.*]] = call fast float @llvm.vector.reduce.fadd.v16f32(float -0.000000e+00, <16 x float> [[TMP3]])
; CHECK-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP4]], [[TMP5]]		; CHECK-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP4]], [[TMP5]]
; CHECK-NEXT: ret float [[OP_RDX]]		; CHECK-NEXT: ret float [[OP_RDX]]
;		;
; THRESHOLD-LABEL: @f(		; THRESHOLD-LABEL: @f(
; THRESHOLD-NEXT: entry:		; THRESHOLD-NEXT: entry:
; THRESHOLD-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1		; THRESHOLD-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
; THRESHOLD-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 2		; THRESHOLD-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
; THRESHOLD-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 3		; THRESHOLD-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
; THRESHOLD-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 4		; THRESHOLD-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 4
; THRESHOLD-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 5		; THRESHOLD-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 5
; THRESHOLD-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 6		; THRESHOLD-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 6
; THRESHOLD-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds float, float* [[X]], i64 7		; THRESHOLD-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds float, float* [[X]], i64 7
; THRESHOLD-NEXT: [[ARRAYIDX_8:%.*]] = getelementptr inbounds float, float* [[X]], i64 8		; THRESHOLD-NEXT: [[ARRAYIDX_8:%.*]] = getelementptr inbounds float, float* [[X]], i64 8
; THRESHOLD-NEXT: [[ARRAYIDX_9:%.*]] = getelementptr inbounds float, float* [[X]], i64 9		; THRESHOLD-NEXT: [[ARRAYIDX_9:%.*]] = getelementptr inbounds float, float* [[X]], i64 9
; THRESHOLD-NEXT: [[ARRAYIDX_10:%.*]] = getelementptr inbounds float, float* [[X]], i64 10		; THRESHOLD-NEXT: [[ARRAYIDX_10:%.*]] = getelementptr inbounds float, float* [[X]], i64 10
; THRESHOLD-NEXT: [[ARRAYIDX_11:%.*]] = getelementptr inbounds float, float* [[X]], i64 11		; THRESHOLD-NEXT: [[ARRAYIDX_11:%.*]] = getelementptr inbounds float, float* [[X]], i64 11
; THRESHOLD-NEXT: [[ARRAYIDX_12:%.*]] = getelementptr inbounds float, float* [[X]], i64 12		; THRESHOLD-NEXT: [[ARRAYIDX_12:%.*]] = getelementptr inbounds float, float* [[X]], i64 12
; THRESHOLD-NEXT: [[ARRAYIDX_13:%.*]] = getelementptr inbounds float, float* [[X]], i64 13		; THRESHOLD-NEXT: [[ARRAYIDX_13:%.*]] = getelementptr inbounds float, float* [[X]], i64 13
; THRESHOLD-NEXT: [[ARRAYIDX_14:%.*]] = getelementptr inbounds float, float* [[X]], i64 14		; THRESHOLD-NEXT: [[ARRAYIDX_14:%.*]] = getelementptr inbounds float, float* [[X]], i64 14
; THRESHOLD-NEXT: [[ARRAYIDX_15:%.*]] = getelementptr inbounds float, float* [[X]], i64 15		; THRESHOLD-NEXT: [[ARRAYIDX_15:%.*]] = getelementptr inbounds float, float* [[X]], i64 15
; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <16 x float>*
; THRESHOLD-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* [[TMP0]], align 4
; THRESHOLD-NEXT: [[ARRAYIDX_16:%.*]] = getelementptr inbounds float, float* [[X]], i64 16		; THRESHOLD-NEXT: [[ARRAYIDX_16:%.*]] = getelementptr inbounds float, float* [[X]], i64 16
; THRESHOLD-NEXT: [[ARRAYIDX_17:%.*]] = getelementptr inbounds float, float* [[X]], i64 17		; THRESHOLD-NEXT: [[ARRAYIDX_17:%.*]] = getelementptr inbounds float, float* [[X]], i64 17
; THRESHOLD-NEXT: [[ARRAYIDX_18:%.*]] = getelementptr inbounds float, float* [[X]], i64 18		; THRESHOLD-NEXT: [[ARRAYIDX_18:%.*]] = getelementptr inbounds float, float* [[X]], i64 18
; THRESHOLD-NEXT: [[ARRAYIDX_19:%.*]] = getelementptr inbounds float, float* [[X]], i64 19		; THRESHOLD-NEXT: [[ARRAYIDX_19:%.*]] = getelementptr inbounds float, float* [[X]], i64 19
; THRESHOLD-NEXT: [[ARRAYIDX_20:%.*]] = getelementptr inbounds float, float* [[X]], i64 20		; THRESHOLD-NEXT: [[ARRAYIDX_20:%.*]] = getelementptr inbounds float, float* [[X]], i64 20
; THRESHOLD-NEXT: [[ARRAYIDX_21:%.*]] = getelementptr inbounds float, float* [[X]], i64 21		; THRESHOLD-NEXT: [[ARRAYIDX_21:%.*]] = getelementptr inbounds float, float* [[X]], i64 21
; THRESHOLD-NEXT: [[ARRAYIDX_22:%.*]] = getelementptr inbounds float, float* [[X]], i64 22		; THRESHOLD-NEXT: [[ARRAYIDX_22:%.*]] = getelementptr inbounds float, float* [[X]], i64 22
; THRESHOLD-NEXT: [[ARRAYIDX_23:%.*]] = getelementptr inbounds float, float* [[X]], i64 23		; THRESHOLD-NEXT: [[ARRAYIDX_23:%.*]] = getelementptr inbounds float, float* [[X]], i64 23
; THRESHOLD-NEXT: [[ARRAYIDX_24:%.*]] = getelementptr inbounds float, float* [[X]], i64 24		; THRESHOLD-NEXT: [[ARRAYIDX_24:%.*]] = getelementptr inbounds float, float* [[X]], i64 24
; THRESHOLD-NEXT: [[ARRAYIDX_25:%.*]] = getelementptr inbounds float, float* [[X]], i64 25		; THRESHOLD-NEXT: [[ARRAYIDX_25:%.*]] = getelementptr inbounds float, float* [[X]], i64 25
; THRESHOLD-NEXT: [[ARRAYIDX_26:%.*]] = getelementptr inbounds float, float* [[X]], i64 26		; THRESHOLD-NEXT: [[ARRAYIDX_26:%.*]] = getelementptr inbounds float, float* [[X]], i64 26
; THRESHOLD-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 27		; THRESHOLD-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 27
; THRESHOLD-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 28		; THRESHOLD-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 28
; THRESHOLD-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 29		; THRESHOLD-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 29
; THRESHOLD-NEXT: [[ARRAYIDX_30:%.*]] = getelementptr inbounds float, float* [[X]], i64 30		; THRESHOLD-NEXT: [[ARRAYIDX_30:%.*]] = getelementptr inbounds float, float* [[X]], i64 30
; THRESHOLD-NEXT: [[ARRAYIDX_31:%.*]] = getelementptr inbounds float, float* [[X]], i64 31		; THRESHOLD-NEXT: [[ARRAYIDX_31:%.*]] = getelementptr inbounds float, float* [[X]], i64 31
		; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <32 x float>*
		; THRESHOLD-NEXT: [[TMP1:%.*]] = load <32 x float>, <32 x float>* [[TMP0]], align 4
; THRESHOLD-NEXT: [[ARRAYIDX_32:%.*]] = getelementptr inbounds float, float* [[X]], i64 32		; THRESHOLD-NEXT: [[ARRAYIDX_32:%.*]] = getelementptr inbounds float, float* [[X]], i64 32
; THRESHOLD-NEXT: [[ARRAYIDX_33:%.*]] = getelementptr inbounds float, float* [[X]], i64 33		; THRESHOLD-NEXT: [[ARRAYIDX_33:%.*]] = getelementptr inbounds float, float* [[X]], i64 33
; THRESHOLD-NEXT: [[ARRAYIDX_34:%.*]] = getelementptr inbounds float, float* [[X]], i64 34		; THRESHOLD-NEXT: [[ARRAYIDX_34:%.*]] = getelementptr inbounds float, float* [[X]], i64 34
; THRESHOLD-NEXT: [[ARRAYIDX_35:%.*]] = getelementptr inbounds float, float* [[X]], i64 35		; THRESHOLD-NEXT: [[ARRAYIDX_35:%.*]] = getelementptr inbounds float, float* [[X]], i64 35
; THRESHOLD-NEXT: [[ARRAYIDX_36:%.*]] = getelementptr inbounds float, float* [[X]], i64 36		; THRESHOLD-NEXT: [[ARRAYIDX_36:%.*]] = getelementptr inbounds float, float* [[X]], i64 36
; THRESHOLD-NEXT: [[ARRAYIDX_37:%.*]] = getelementptr inbounds float, float* [[X]], i64 37		; THRESHOLD-NEXT: [[ARRAYIDX_37:%.*]] = getelementptr inbounds float, float* [[X]], i64 37
; THRESHOLD-NEXT: [[ARRAYIDX_38:%.]] = getelementptr inbounds float, float [[X]], i64 38		; THRESHOLD-NEXT: [[ARRAYIDX_38:%.]] = getelementptr inbounds float, float [[X]], i64 38
; THRESHOLD-NEXT: [[ARRAYIDX_39:%.]] = getelementptr inbounds float, float [[X]], i64 39		; THRESHOLD-NEXT: [[ARRAYIDX_39:%.]] = getelementptr inbounds float, float [[X]], i64 39
; THRESHOLD-NEXT: [[ARRAYIDX_40:%.]] = getelementptr inbounds float, float [[X]], i64 40		; THRESHOLD-NEXT: [[ARRAYIDX_40:%.]] = getelementptr inbounds float, float [[X]], i64 40
; THRESHOLD-NEXT: [[ARRAYIDX_41:%.]] = getelementptr inbounds float, float [[X]], i64 41		; THRESHOLD-NEXT: [[ARRAYIDX_41:%.]] = getelementptr inbounds float, float [[X]], i64 41
; THRESHOLD-NEXT: [[ARRAYIDX_42:%.]] = getelementptr inbounds float, float [[X]], i64 42		; THRESHOLD-NEXT: [[ARRAYIDX_42:%.]] = getelementptr inbounds float, float [[X]], i64 42
; THRESHOLD-NEXT: [[ARRAYIDX_43:%.]] = getelementptr inbounds float, float [[X]], i64 43		; THRESHOLD-NEXT: [[ARRAYIDX_43:%.]] = getelementptr inbounds float, float [[X]], i64 43
; THRESHOLD-NEXT: [[ARRAYIDX_44:%.]] = getelementptr inbounds float, float [[X]], i64 44		; THRESHOLD-NEXT: [[ARRAYIDX_44:%.]] = getelementptr inbounds float, float [[X]], i64 44
; THRESHOLD-NEXT: [[ARRAYIDX_45:%.]] = getelementptr inbounds float, float [[X]], i64 45		; THRESHOLD-NEXT: [[ARRAYIDX_45:%.]] = getelementptr inbounds float, float [[X]], i64 45
; THRESHOLD-NEXT: [[ARRAYIDX_46:%.]] = getelementptr inbounds float, float [[X]], i64 46		; THRESHOLD-NEXT: [[ARRAYIDX_46:%.]] = getelementptr inbounds float, float [[X]], i64 46
; THRESHOLD-NEXT: [[ARRAYIDX_47:%.]] = getelementptr inbounds float, float [[X]], i64 47		; THRESHOLD-NEXT: [[ARRAYIDX_47:%.]] = getelementptr inbounds float, float [[X]], i64 47
; THRESHOLD-NEXT: [[TMP2:%.]] = bitcast float [[ARRAYIDX_16]] to <32 x float>*		; THRESHOLD-NEXT: [[TMP2:%.]] = bitcast float [[ARRAYIDX_32]] to <16 x float>*
; THRESHOLD-NEXT: [[TMP3:%.]] = load <32 x float>, <32 x float> [[TMP2]], align 4		; THRESHOLD-NEXT: [[TMP3:%.]] = load <16 x float>, <16 x float> [[TMP2]], align 4
; THRESHOLD-NEXT: [[TMP4:%.*]] = call fast float @llvm.vector.reduce.fadd.v32f32(float -0.000000e+00, <32 x float> [[TMP3]])		; THRESHOLD-NEXT: [[TMP4:%.*]] = call fast float @llvm.vector.reduce.fadd.v32f32(float -0.000000e+00, <32 x float> [[TMP1]])
; THRESHOLD-NEXT: [[TMP5:%.*]] = call fast float @llvm.vector.reduce.fadd.v16f32(float -0.000000e+00, <16 x float> [[TMP1]])		; THRESHOLD-NEXT: [[TMP5:%.*]] = call fast float @llvm.vector.reduce.fadd.v16f32(float -0.000000e+00, <16 x float> [[TMP3]])
; THRESHOLD-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP4]], [[TMP5]]		; THRESHOLD-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP4]], [[TMP5]]
; THRESHOLD-NEXT: ret float [[OP_RDX]]		; THRESHOLD-NEXT: ret float [[OP_RDX]]
;		;
entry:		entry:
%0 = load float, float* %x, align 4		%0 = load float, float* %x, align 4
%arrayidx.1 = getelementptr inbounds float, float* %x, i64 1		%arrayidx.1 = getelementptr inbounds float, float* %x, i64 1
%1 = load float, float* %arrayidx.1, align 4		%1 = load float, float* %arrayidx.1, align 4
%add.1 = fadd fast float %1, %0		%add.1 = fadd fast float %1, %0
[... 172 lines not shown ...]
; CHECK-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 27		; CHECK-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 27
; CHECK-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 28		; CHECK-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 28
; CHECK-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 29		; CHECK-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 29
; CHECK-NEXT: [[ARRAYIDX_30:%.*]] = getelementptr inbounds float, float* [[X]], i64 30		; CHECK-NEXT: [[ARRAYIDX_30:%.*]] = getelementptr inbounds float, float* [[X]], i64 30
; CHECK-NEXT: [[ARRAYIDX_31:%.*]] = getelementptr inbounds float, float* [[X]], i64 31		; CHECK-NEXT: [[ARRAYIDX_31:%.*]] = getelementptr inbounds float, float* [[X]], i64 31
; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <32 x float>*		; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <32 x float>*
; CHECK-NEXT: [[TMP1:%.*]] = load <32 x float>, <32 x float>* [[TMP0]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <32 x float>, <32 x float>* [[TMP0]], align 4
; CHECK-NEXT: [[TMP2:%.*]] = call fast float @llvm.vector.reduce.fadd.v32f32(float -0.000000e+00, <32 x float> [[TMP1]])		; CHECK-NEXT: [[TMP2:%.*]] = call fast float @llvm.vector.reduce.fadd.v32f32(float -0.000000e+00, <32 x float> [[TMP1]])
; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[CONV]]		; CHECK-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP2]], [[CONV]]
; CHECK-NEXT: ret float [[OP_EXTRA]]		; CHECK-NEXT: ret float [[OP_RDX]]
;		;
; THRESHOLD-LABEL: @f1(		; THRESHOLD-LABEL: @f1(
; THRESHOLD-NEXT: entry:		; THRESHOLD-NEXT: entry:
; THRESHOLD-NEXT: [[REM:%.*]] = srem i32 [[A:%.*]], [[B:%.*]]		; THRESHOLD-NEXT: [[REM:%.*]] = srem i32 [[A:%.*]], [[B:%.*]]
; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[REM]] to float		; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[REM]] to float
; THRESHOLD-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1		; THRESHOLD-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
; THRESHOLD-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 2		; THRESHOLD-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
; THRESHOLD-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 3		; THRESHOLD-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
[... 23 lines not shown ...]
; THRESHOLD-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 27		; THRESHOLD-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 27
; THRESHOLD-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 28		; THRESHOLD-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 28
; THRESHOLD-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 29		; THRESHOLD-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 29
; THRESHOLD-NEXT: [[ARRAYIDX_30:%.*]] = getelementptr inbounds float, float* [[X]], i64 30		; THRESHOLD-NEXT: [[ARRAYIDX_30:%.*]] = getelementptr inbounds float, float* [[X]], i64 30
; THRESHOLD-NEXT: [[ARRAYIDX_31:%.*]] = getelementptr inbounds float, float* [[X]], i64 31		; THRESHOLD-NEXT: [[ARRAYIDX_31:%.*]] = getelementptr inbounds float, float* [[X]], i64 31
; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <32 x float>*		; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <32 x float>*
; THRESHOLD-NEXT: [[TMP1:%.*]] = load <32 x float>, <32 x float>* [[TMP0]], align 4		; THRESHOLD-NEXT: [[TMP1:%.*]] = load <32 x float>, <32 x float>* [[TMP0]], align 4
; THRESHOLD-NEXT: [[TMP2:%.*]] = call fast float @llvm.vector.reduce.fadd.v32f32(float -0.000000e+00, <32 x float> [[TMP1]])		; THRESHOLD-NEXT: [[TMP2:%.*]] = call fast float @llvm.vector.reduce.fadd.v32f32(float -0.000000e+00, <32 x float> [[TMP1]])
; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[CONV]]		; THRESHOLD-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP2]], [[CONV]]
; THRESHOLD-NEXT: ret float [[OP_EXTRA]]		; THRESHOLD-NEXT: ret float [[OP_RDX]]
;		;
entry:		entry:
%rem = srem i32 %a, %b		%rem = srem i32 %a, %b
%conv = sitofp i32 %rem to float		%conv = sitofp i32 %rem to float
%0 = load float, float* %x, align 4		%0 = load float, float* %x, align 4
%add = fadd fast float %0, %conv		%add = fadd fast float %0, %conv
%arrayidx.1 = getelementptr inbounds float, float* %x, i64 1		%arrayidx.1 = getelementptr inbounds float, float* %x, i64 1
%1 = load float, float* %arrayidx.1, align 4		%1 = load float, float* %arrayidx.1, align 4
[... 90 lines not shown ...]
%add.31 = fadd fast float %31, %add.30		%add.31 = fadd fast float %31, %add.30
ret float %add.31		ret float %add.31
}		}

define float @loadadd31(float* nocapture readonly %x) {		define float @loadadd31(float* nocapture readonly %x) {
; CHECK-LABEL: @loadadd31(		; CHECK-LABEL: @loadadd31(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1		; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[ARRAYIDX]], align 4
; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2		; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX_1]], align 4
; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3		; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4		; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4
; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5		; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5
; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6		; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6
; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[ARRAYIDX_2]] to <4 x float>*
; CHECK-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* [[TMP2]], align 4
; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7		; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7
; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds float, float* [[X]], i64 8		; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds float, float* [[X]], i64 8
; CHECK-NEXT: [[ARRAYIDX_8:%.*]] = getelementptr inbounds float, float* [[X]], i64 9		; CHECK-NEXT: [[ARRAYIDX_8:%.*]] = getelementptr inbounds float, float* [[X]], i64 9
; CHECK-NEXT: [[ARRAYIDX_9:%.*]] = getelementptr inbounds float, float* [[X]], i64 10		; CHECK-NEXT: [[ARRAYIDX_9:%.*]] = getelementptr inbounds float, float* [[X]], i64 10
; CHECK-NEXT: [[ARRAYIDX_10:%.*]] = getelementptr inbounds float, float* [[X]], i64 11		; CHECK-NEXT: [[ARRAYIDX_10:%.*]] = getelementptr inbounds float, float* [[X]], i64 11
; CHECK-NEXT: [[ARRAYIDX_11:%.*]] = getelementptr inbounds float, float* [[X]], i64 12		; CHECK-NEXT: [[ARRAYIDX_11:%.*]] = getelementptr inbounds float, float* [[X]], i64 12
; CHECK-NEXT: [[ARRAYIDX_12:%.*]] = getelementptr inbounds float, float* [[X]], i64 13		; CHECK-NEXT: [[ARRAYIDX_12:%.*]] = getelementptr inbounds float, float* [[X]], i64 13
; CHECK-NEXT: [[ARRAYIDX_13:%.*]] = getelementptr inbounds float, float* [[X]], i64 14		; CHECK-NEXT: [[ARRAYIDX_13:%.*]] = getelementptr inbounds float, float* [[X]], i64 14
; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[ARRAYIDX_6]] to <8 x float>*
; CHECK-NEXT: [[TMP5:%.*]] = load <8 x float>, <8 x float>* [[TMP4]], align 4
; CHECK-NEXT: [[ARRAYIDX_14:%.*]] = getelementptr inbounds float, float* [[X]], i64 15		; CHECK-NEXT: [[ARRAYIDX_14:%.*]] = getelementptr inbounds float, float* [[X]], i64 15
; CHECK-NEXT: [[ARRAYIDX_15:%.*]] = getelementptr inbounds float, float* [[X]], i64 16		; CHECK-NEXT: [[ARRAYIDX_15:%.*]] = getelementptr inbounds float, float* [[X]], i64 16
		; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[ARRAYIDX]] to <16 x float>*
		; CHECK-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* [[TMP0]], align 4
; CHECK-NEXT: [[ARRAYIDX_16:%.*]] = getelementptr inbounds float, float* [[X]], i64 17		; CHECK-NEXT: [[ARRAYIDX_16:%.*]] = getelementptr inbounds float, float* [[X]], i64 17
; CHECK-NEXT: [[ARRAYIDX_17:%.*]] = getelementptr inbounds float, float* [[X]], i64 18		; CHECK-NEXT: [[ARRAYIDX_17:%.*]] = getelementptr inbounds float, float* [[X]], i64 18
; CHECK-NEXT: [[ARRAYIDX_18:%.*]] = getelementptr inbounds float, float* [[X]], i64 19		; CHECK-NEXT: [[ARRAYIDX_18:%.*]] = getelementptr inbounds float, float* [[X]], i64 19
; CHECK-NEXT: [[ARRAYIDX_19:%.*]] = getelementptr inbounds float, float* [[X]], i64 20		; CHECK-NEXT: [[ARRAYIDX_19:%.*]] = getelementptr inbounds float, float* [[X]], i64 20
; CHECK-NEXT: [[ARRAYIDX_20:%.*]] = getelementptr inbounds float, float* [[X]], i64 21		; CHECK-NEXT: [[ARRAYIDX_20:%.*]] = getelementptr inbounds float, float* [[X]], i64 21
; CHECK-NEXT: [[ARRAYIDX_21:%.*]] = getelementptr inbounds float, float* [[X]], i64 22		; CHECK-NEXT: [[ARRAYIDX_21:%.*]] = getelementptr inbounds float, float* [[X]], i64 22
; CHECK-NEXT: [[ARRAYIDX_22:%.*]] = getelementptr inbounds float, float* [[X]], i64 23		; CHECK-NEXT: [[ARRAYIDX_22:%.*]] = getelementptr inbounds float, float* [[X]], i64 23
; CHECK-NEXT: [[ARRAYIDX_23:%.*]] = getelementptr inbounds float, float* [[X]], i64 24		; CHECK-NEXT: [[ARRAYIDX_23:%.*]] = getelementptr inbounds float, float* [[X]], i64 24
		; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[ARRAYIDX_16]] to <8 x float>*
		; CHECK-NEXT: [[TMP3:%.*]] = load <8 x float>, <8 x float>* [[TMP2]], align 4
; CHECK-NEXT: [[ARRAYIDX_24:%.*]] = getelementptr inbounds float, float* [[X]], i64 25		; CHECK-NEXT: [[ARRAYIDX_24:%.*]] = getelementptr inbounds float, float* [[X]], i64 25
; CHECK-NEXT: [[ARRAYIDX_25:%.*]] = getelementptr inbounds float, float* [[X]], i64 26		; CHECK-NEXT: [[ARRAYIDX_25:%.*]] = getelementptr inbounds float, float* [[X]], i64 26
; CHECK-NEXT: [[ARRAYIDX_26:%.*]] = getelementptr inbounds float, float* [[X]], i64 27		; CHECK-NEXT: [[ARRAYIDX_26:%.*]] = getelementptr inbounds float, float* [[X]], i64 27
; CHECK-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 28		; CHECK-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 28
		; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[ARRAYIDX_24]] to <4 x float>*
		; CHECK-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* [[TMP4]], align 4
; CHECK-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 29		; CHECK-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 29
		; CHECK-NEXT: [[TMP6:%.*]] = load float, float* [[ARRAYIDX_28]], align 4
; CHECK-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 30		; CHECK-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 30
; CHECK-NEXT: [[TMP6:%.*]] = bitcast float* [[ARRAYIDX_14]] to <16 x float>*		; CHECK-NEXT: [[TMP7:%.*]] = load float, float* [[ARRAYIDX_29]], align 4
; CHECK-NEXT: [[TMP7:%.*]] = load <16 x float>, <16 x float>* [[TMP6]], align 4		; CHECK-NEXT: [[TMP8:%.*]] = call fast float @llvm.vector.reduce.fadd.v16f32(float -0.000000e+00, <16 x float> [[TMP1]])
; CHECK-NEXT: [[TMP8:%.*]] = call fast float @llvm.vector.reduce.fadd.v16f32(float -0.000000e+00, <16 x float> [[TMP7]])		; CHECK-NEXT: [[TMP9:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP3]])
; CHECK-NEXT: [[TMP9:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP5]])
; CHECK-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP8]], [[TMP9]]		; CHECK-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP8]], [[TMP9]]
; CHECK-NEXT: [[TMP10:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> [[TMP3]])		; CHECK-NEXT: [[TMP10:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> [[TMP5]])
; CHECK-NEXT: [[OP_RDX1:%.*]] = fadd fast float [[OP_RDX]], [[TMP10]]		; CHECK-NEXT: [[OP_RDX1:%.*]] = fadd fast float [[OP_RDX]], [[TMP10]]
; CHECK-NEXT: [[TMP11:%.*]] = fadd fast float [[OP_RDX1]], [[TMP1]]		; CHECK-NEXT: [[OP_RDX2:%.*]] = fadd fast float [[TMP6]], [[TMP7]]
; CHECK-NEXT: [[TMP12:%.*]] = fadd fast float [[TMP11]], [[TMP0]]		; CHECK-NEXT: [[OP_RDX3:%.*]] = fadd fast float [[OP_RDX1]], [[OP_RDX2]]
; CHECK-NEXT: ret float [[TMP12]]		; CHECK-NEXT: ret float [[OP_RDX3]]
;		;
; THRESHOLD-LABEL: @loadadd31(		; THRESHOLD-LABEL: @loadadd31(
; THRESHOLD-NEXT: entry:		; THRESHOLD-NEXT: entry:
; THRESHOLD-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1		; THRESHOLD-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
; THRESHOLD-NEXT: [[TMP0:%.*]] = load float, float* [[ARRAYIDX]], align 4
; THRESHOLD-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2		; THRESHOLD-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
; THRESHOLD-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX_1]], align 4
; THRESHOLD-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3		; THRESHOLD-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
; THRESHOLD-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4		; THRESHOLD-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4
; THRESHOLD-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5		; THRESHOLD-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5
; THRESHOLD-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6		; THRESHOLD-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6
; THRESHOLD-NEXT: [[TMP2:%.*]] = bitcast float* [[ARRAYIDX_2]] to <4 x float>*
; THRESHOLD-NEXT: [[TMP3:%.*]] = load <4 x float>, <4 x float>* [[TMP2]], align 4
; THRESHOLD-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7		; THRESHOLD-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7
; THRESHOLD-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds float, float* [[X]], i64 8		; THRESHOLD-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds float, float* [[X]], i64 8
; THRESHOLD-NEXT: [[ARRAYIDX_8:%.*]] = getelementptr inbounds float, float* [[X]], i64 9		; THRESHOLD-NEXT: [[ARRAYIDX_8:%.*]] = getelementptr inbounds float, float* [[X]], i64 9
; THRESHOLD-NEXT: [[ARRAYIDX_9:%.*]] = getelementptr inbounds float, float* [[X]], i64 10		; THRESHOLD-NEXT: [[ARRAYIDX_9:%.*]] = getelementptr inbounds float, float* [[X]], i64 10
; THRESHOLD-NEXT: [[ARRAYIDX_10:%.*]] = getelementptr inbounds float, float* [[X]], i64 11		; THRESHOLD-NEXT: [[ARRAYIDX_10:%.*]] = getelementptr inbounds float, float* [[X]], i64 11
; THRESHOLD-NEXT: [[ARRAYIDX_11:%.*]] = getelementptr inbounds float, float* [[X]], i64 12		; THRESHOLD-NEXT: [[ARRAYIDX_11:%.*]] = getelementptr inbounds float, float* [[X]], i64 12
; THRESHOLD-NEXT: [[ARRAYIDX_12:%.*]] = getelementptr inbounds float, float* [[X]], i64 13		; THRESHOLD-NEXT: [[ARRAYIDX_12:%.*]] = getelementptr inbounds float, float* [[X]], i64 13
; THRESHOLD-NEXT: [[ARRAYIDX_13:%.*]] = getelementptr inbounds float, float* [[X]], i64 14		; THRESHOLD-NEXT: [[ARRAYIDX_13:%.*]] = getelementptr inbounds float, float* [[X]], i64 14
; THRESHOLD-NEXT: [[TMP4:%.*]] = bitcast float* [[ARRAYIDX_6]] to <8 x float>*
; THRESHOLD-NEXT: [[TMP5:%.*]] = load <8 x float>, <8 x float>* [[TMP4]], align 4
; THRESHOLD-NEXT: [[ARRAYIDX_14:%.*]] = getelementptr inbounds float, float* [[X]], i64 15		; THRESHOLD-NEXT: [[ARRAYIDX_14:%.*]] = getelementptr inbounds float, float* [[X]], i64 15
; THRESHOLD-NEXT: [[ARRAYIDX_15:%.*]] = getelementptr inbounds float, float* [[X]], i64 16		; THRESHOLD-NEXT: [[ARRAYIDX_15:%.*]] = getelementptr inbounds float, float* [[X]], i64 16
		; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[ARRAYIDX]] to <16 x float>*
		; THRESHOLD-NEXT: [[TMP1:%.*]] = load <16 x float>, <16 x float>* [[TMP0]], align 4
; THRESHOLD-NEXT: [[ARRAYIDX_16:%.*]] = getelementptr inbounds float, float* [[X]], i64 17		; THRESHOLD-NEXT: [[ARRAYIDX_16:%.*]] = getelementptr inbounds float, float* [[X]], i64 17
; THRESHOLD-NEXT: [[ARRAYIDX_17:%.*]] = getelementptr inbounds float, float* [[X]], i64 18		; THRESHOLD-NEXT: [[ARRAYIDX_17:%.*]] = getelementptr inbounds float, float* [[X]], i64 18
; THRESHOLD-NEXT: [[ARRAYIDX_18:%.*]] = getelementptr inbounds float, float* [[X]], i64 19		; THRESHOLD-NEXT: [[ARRAYIDX_18:%.*]] = getelementptr inbounds float, float* [[X]], i64 19
; THRESHOLD-NEXT: [[ARRAYIDX_19:%.*]] = getelementptr inbounds float, float* [[X]], i64 20		; THRESHOLD-NEXT: [[ARRAYIDX_19:%.*]] = getelementptr inbounds float, float* [[X]], i64 20
; THRESHOLD-NEXT: [[ARRAYIDX_20:%.*]] = getelementptr inbounds float, float* [[X]], i64 21		; THRESHOLD-NEXT: [[ARRAYIDX_20:%.*]] = getelementptr inbounds float, float* [[X]], i64 21
; THRESHOLD-NEXT: [[ARRAYIDX_21:%.*]] = getelementptr inbounds float, float* [[X]], i64 22		; THRESHOLD-NEXT: [[ARRAYIDX_21:%.*]] = getelementptr inbounds float, float* [[X]], i64 22
; THRESHOLD-NEXT: [[ARRAYIDX_22:%.*]] = getelementptr inbounds float, float* [[X]], i64 23		; THRESHOLD-NEXT: [[ARRAYIDX_22:%.*]] = getelementptr inbounds float, float* [[X]], i64 23
; THRESHOLD-NEXT: [[ARRAYIDX_23:%.*]] = getelementptr inbounds float, float* [[X]], i64 24		; THRESHOLD-NEXT: [[ARRAYIDX_23:%.*]] = getelementptr inbounds float, float* [[X]], i64 24
		; THRESHOLD-NEXT: [[TMP2:%.*]] = bitcast float* [[ARRAYIDX_16]] to <8 x float>*
		; THRESHOLD-NEXT: [[TMP3:%.*]] = load <8 x float>, <8 x float>* [[TMP2]], align 4
; THRESHOLD-NEXT: [[ARRAYIDX_24:%.*]] = getelementptr inbounds float, float* [[X]], i64 25		; THRESHOLD-NEXT: [[ARRAYIDX_24:%.*]] = getelementptr inbounds float, float* [[X]], i64 25
; THRESHOLD-NEXT: [[ARRAYIDX_25:%.*]] = getelementptr inbounds float, float* [[X]], i64 26		; THRESHOLD-NEXT: [[ARRAYIDX_25:%.*]] = getelementptr inbounds float, float* [[X]], i64 26
; THRESHOLD-NEXT: [[ARRAYIDX_26:%.*]] = getelementptr inbounds float, float* [[X]], i64 27		; THRESHOLD-NEXT: [[ARRAYIDX_26:%.*]] = getelementptr inbounds float, float* [[X]], i64 27
; THRESHOLD-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 28		; THRESHOLD-NEXT: [[ARRAYIDX_27:%.*]] = getelementptr inbounds float, float* [[X]], i64 28
		; THRESHOLD-NEXT: [[TMP4:%.*]] = bitcast float* [[ARRAYIDX_24]] to <4 x float>*
		; THRESHOLD-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* [[TMP4]], align 4
; THRESHOLD-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 29		; THRESHOLD-NEXT: [[ARRAYIDX_28:%.*]] = getelementptr inbounds float, float* [[X]], i64 29
		; THRESHOLD-NEXT: [[TMP6:%.*]] = load float, float* [[ARRAYIDX_28]], align 4
; THRESHOLD-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 30		; THRESHOLD-NEXT: [[ARRAYIDX_29:%.*]] = getelementptr inbounds float, float* [[X]], i64 30
; THRESHOLD-NEXT: [[TMP6:%.*]] = bitcast float* [[ARRAYIDX_14]] to <16 x float>*		; THRESHOLD-NEXT: [[TMP7:%.*]] = load float, float* [[ARRAYIDX_29]], align 4
; THRESHOLD-NEXT: [[TMP7:%.*]] = load <16 x float>, <16 x float>* [[TMP6]], align 4		; THRESHOLD-NEXT: [[TMP8:%.*]] = call fast float @llvm.vector.reduce.fadd.v16f32(float -0.000000e+00, <16 x float> [[TMP1]])
; THRESHOLD-NEXT: [[TMP8:%.*]] = call fast float @llvm.vector.reduce.fadd.v16f32(float -0.000000e+00, <16 x float> [[TMP7]])		; THRESHOLD-NEXT: [[TMP9:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP3]])
; THRESHOLD-NEXT: [[TMP9:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP5]])
; THRESHOLD-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP8]], [[TMP9]]		; THRESHOLD-NEXT: [[OP_RDX:%.*]] = fadd fast float [[TMP8]], [[TMP9]]
; THRESHOLD-NEXT: [[TMP10:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> [[TMP3]])		; THRESHOLD-NEXT: [[TMP10:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> [[TMP5]])
; THRESHOLD-NEXT: [[OP_RDX1:%.*]] = fadd fast float [[OP_RDX]], [[TMP10]]		; THRESHOLD-NEXT: [[TMP11:%.*]] = insertelement <2 x float> poison, float [[OP_RDX]], i32 0
; THRESHOLD-NEXT: [[TMP11:%.*]] = fadd fast float [[OP_RDX1]], [[TMP1]]		; THRESHOLD-NEXT: [[TMP12:%.*]] = insertelement <2 x float> [[TMP11]], float [[TMP6]], i32 1
; THRESHOLD-NEXT: [[TMP12:%.*]] = fadd fast float [[TMP11]], [[TMP0]]		; THRESHOLD-NEXT: [[TMP13:%.*]] = insertelement <2 x float> poison, float [[TMP10]], i32 0
; THRESHOLD-NEXT: ret float [[TMP12]]		; THRESHOLD-NEXT: [[TMP14:%.*]] = insertelement <2 x float> [[TMP13]], float [[TMP7]], i32 1
		; THRESHOLD-NEXT: [[TMP15:%.*]] = fadd fast <2 x float> [[TMP12]], [[TMP14]]
		; THRESHOLD-NEXT: [[TMP16:%.*]] = extractelement <2 x float> [[TMP15]], i32 0
		; THRESHOLD-NEXT: [[TMP17:%.*]] = extractelement <2 x float> [[TMP15]], i32 1
		; THRESHOLD-NEXT: [[OP_RDX3:%.*]] = fadd fast float [[TMP16]], [[TMP17]]
		; THRESHOLD-NEXT: ret float [[OP_RDX3]]
;		;
entry:		entry:
%arrayidx = getelementptr inbounds float, float* %x, i64 1		%arrayidx = getelementptr inbounds float, float* %x, i64 1
%0 = load float, float* %arrayidx, align 4		%0 = load float, float* %arrayidx, align 4
%arrayidx.1 = getelementptr inbounds float, float* %x, i64 2		%arrayidx.1 = getelementptr inbounds float, float* %x, i64 2
%1 = load float, float* %arrayidx.1, align 4		%1 = load float, float* %arrayidx.1, align 4
%add.1 = fadd fast float %1, %0		%add.1 = fadd fast float %1, %0
%arrayidx.2 = getelementptr inbounds float, float* %x, i64 3		%arrayidx.2 = getelementptr inbounds float, float* %x, i64 3
[... 83 lines not shown ...]
ret float %add.29		ret float %add.29
}		}

define float @extra_args(float* nocapture readonly %x, i32 %a, i32 %b) {		define float @extra_args(float* nocapture readonly %x, i32 %a, i32 %b) {
; CHECK-LABEL: @extra_args(		; CHECK-LABEL: @extra_args(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]		; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]
; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float		; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[CONV]], 3.000000e+00
; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1		; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2		; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
; CHECK-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3		; CHECK-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4		; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4
; CHECK-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5		; CHECK-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5
; CHECK-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6		; CHECK-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6
; CHECK-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7		; CHECK-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7
; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*		; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*
; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4
; CHECK-NEXT: [[TMP2:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP1]])		; CHECK-NEXT: [[TMP2:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP1]])
; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]		; CHECK-NEXT: [[OP_RDX:%.*]] = fadd fast float [[CONV]], [[CONV]]
; CHECK-NEXT: [[OP_EXTRA1:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]		; CHECK-NEXT: [[OP_RDX1:%.*]] = fadd fast float [[OP_RDX]], 3.000000e+00
; CHECK-NEXT: ret float [[OP_EXTRA1]]		; CHECK-NEXT: [[OP_RDX2:%.*]] = fadd fast float [[TMP2]], [[OP_RDX1]]
		; CHECK-NEXT: ret float [[OP_RDX2]]
;		;
; THRESHOLD-LABEL: @extra_args(		; THRESHOLD-LABEL: @extra_args(
; THRESHOLD-NEXT: entry:		; THRESHOLD-NEXT: entry:
; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]		; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]
; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float		; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
; THRESHOLD-NEXT: [[ADD:%.*]] = fadd fast float [[CONV]], 3.000000e+00
; THRESHOLD-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1		; THRESHOLD-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
; THRESHOLD-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2		; THRESHOLD-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
; THRESHOLD-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3		; THRESHOLD-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
; THRESHOLD-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4		; THRESHOLD-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4
; THRESHOLD-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5		; THRESHOLD-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5
; THRESHOLD-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6		; THRESHOLD-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6
; THRESHOLD-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7		; THRESHOLD-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7
; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*		; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*
; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4		; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4
; THRESHOLD-NEXT: [[TMP2:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP1]])		; THRESHOLD-NEXT: [[TMP2:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP1]])
; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]		; THRESHOLD-NEXT: [[OP_RDX:%.*]] = fadd fast float [[CONV]], [[CONV]]
; THRESHOLD-NEXT: [[OP_EXTRA1:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]		; THRESHOLD-NEXT: [[OP_RDX1:%.*]] = fadd fast float [[OP_RDX]], 3.000000e+00
; THRESHOLD-NEXT: ret float [[OP_EXTRA1]]		; THRESHOLD-NEXT: [[OP_RDX2:%.*]] = fadd fast float [[TMP2]], [[OP_RDX1]]
		; THRESHOLD-NEXT: ret float [[OP_RDX2]]
;		;
entry:		entry:
%mul = mul nsw i32 %b, %a		%mul = mul nsw i32 %b, %a
%conv = sitofp i32 %mul to float		%conv = sitofp i32 %mul to float
%0 = load float, float* %x, align 4		%0 = load float, float* %x, align 4
%add = fadd fast float %conv, 3.000000e+00		%add = fadd fast float %conv, 3.000000e+00
%add1 = fadd fast float %0, %add		%add1 = fadd fast float %0, %add
%arrayidx3 = getelementptr inbounds float, float* %x, i64 1		%arrayidx3 = getelementptr inbounds float, float* %x, i64 1
[... 21 lines not shown ...]
ret float %add4.6		ret float %add4.6
}		}

define float @extra_args_same_several_times(float* nocapture readonly %x, i32 %a, i32 %b) {		define float @extra_args_same_several_times(float* nocapture readonly %x, i32 %a, i32 %b) {
; CHECK-LABEL: @extra_args_same_several_times(		; CHECK-LABEL: @extra_args_same_several_times(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]		; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]
; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float		; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[CONV]], 3.000000e+00
; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1		; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2		; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
; CHECK-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3		; CHECK-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4		; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4
; CHECK-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5		; CHECK-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5
; CHECK-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6		; CHECK-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6
; CHECK-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7		; CHECK-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7
; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*		; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*
; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4
; CHECK-NEXT: [[TMP2:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP1]])		; CHECK-NEXT: [[TMP2:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP1]])
; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]		; CHECK-NEXT: [[OP_RDX:%.*]] = fadd fast float 5.000000e+00, [[CONV]]
; CHECK-NEXT: [[OP_EXTRA1:%.*]] = fadd fast float [[OP_EXTRA]], 5.000000e+00		; CHECK-NEXT: [[OP_RDX1:%.*]] = fadd fast float 8.000000e+00, [[OP_RDX]]
; CHECK-NEXT: [[OP_EXTRA2:%.*]] = fadd fast float [[OP_EXTRA1]], 5.000000e+00		; CHECK-NEXT: [[OP_RDX2:%.*]] = fadd fast float [[OP_RDX1]], [[CONV]]
; CHECK-NEXT: [[OP_EXTRA3:%.*]] = fadd fast float [[OP_EXTRA2]], [[CONV]]		; CHECK-NEXT: [[OP_RDX3:%.*]] = fadd fast float [[TMP2]], [[OP_RDX2]]
; CHECK-NEXT: ret float [[OP_EXTRA3]]		; CHECK-NEXT: ret float [[OP_RDX3]]
;		;
; THRESHOLD-LABEL: @extra_args_same_several_times(		; THRESHOLD-LABEL: @extra_args_same_several_times(
; THRESHOLD-NEXT: entry:		; THRESHOLD-NEXT: entry:
; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]		; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]
; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float		; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
; THRESHOLD-NEXT: [[ADD:%.*]] = fadd fast float [[CONV]], 3.000000e+00
; THRESHOLD-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1		; THRESHOLD-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
; THRESHOLD-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2		; THRESHOLD-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
; THRESHOLD-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3		; THRESHOLD-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
; THRESHOLD-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4		; THRESHOLD-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4
; THRESHOLD-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5		; THRESHOLD-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5
; THRESHOLD-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6		; THRESHOLD-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6
; THRESHOLD-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7		; THRESHOLD-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7
; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*		; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*
; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4		; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4
; THRESHOLD-NEXT: [[TMP2:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP1]])		; THRESHOLD-NEXT: [[TMP2:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP1]])
; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]		; THRESHOLD-NEXT: [[OP_RDX:%.*]] = fadd fast float 5.000000e+00, [[CONV]]
; THRESHOLD-NEXT: [[OP_EXTRA1:%.*]] = fadd fast float [[OP_EXTRA]], 5.000000e+00		; THRESHOLD-NEXT: [[OP_RDX1:%.*]] = fadd fast float 8.000000e+00, [[OP_RDX]]
; THRESHOLD-NEXT: [[OP_EXTRA2:%.*]] = fadd fast float [[OP_EXTRA1]], 5.000000e+00		; THRESHOLD-NEXT: [[OP_RDX2:%.*]] = fadd fast float [[OP_RDX1]], [[CONV]]
; THRESHOLD-NEXT: [[OP_EXTRA3:%.*]] = fadd fast float [[OP_EXTRA2]], [[CONV]]		; THRESHOLD-NEXT: [[OP_RDX3:%.*]] = fadd fast float [[TMP2]], [[OP_RDX2]]
; THRESHOLD-NEXT: ret float [[OP_EXTRA3]]		; THRESHOLD-NEXT: ret float [[OP_RDX3]]
;		;
entry:		entry:
%mul = mul nsw i32 %b, %a		%mul = mul nsw i32 %b, %a
%conv = sitofp i32 %mul to float		%conv = sitofp i32 %mul to float
%0 = load float, float* %x, align 4		%0 = load float, float* %x, align 4
%add = fadd fast float %conv, 3.000000e+00		%add = fadd fast float %conv, 3.000000e+00
%add1 = fadd fast float %0, %add		%add1 = fadd fast float %0, %add
%arrayidx3 = getelementptr inbounds float, float* %x, i64 1		%arrayidx3 = getelementptr inbounds float, float* %x, i64 1
[... 24 lines not shown ...]
}		}

define float @extra_args_no_replace(float* nocapture readonly %x, i32 %a, i32 %b, i32 %c) {		define float @extra_args_no_replace(float* nocapture readonly %x, i32 %a, i32 %b, i32 %c) {
; CHECK-LABEL: @extra_args_no_replace(		; CHECK-LABEL: @extra_args_no_replace(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]		; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]
; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float		; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
; CHECK-NEXT: [[CONVC:%.*]] = sitofp i32 [[C:%.*]] to float		; CHECK-NEXT: [[CONVC:%.*]] = sitofp i32 [[C:%.*]] to float
; CHECK-NEXT: [[ADDC:%.*]] = fadd fast float [[CONVC]], 3.000000e+00
; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[CONV]], [[ADDC]]
; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1		; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2		; CHECK-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
; CHECK-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3		; CHECK-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4		; CHECK-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4
; CHECK-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5		; CHECK-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5
; CHECK-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6		; CHECK-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6
; CHECK-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7		; CHECK-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7
; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*		; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*
; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4
; CHECK-NEXT: [[TMP2:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP1]])		; CHECK-NEXT: [[TMP2:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP1]])
; CHECK-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]		; CHECK-NEXT: [[OP_RDX:%.*]] = fadd fast float [[CONVC]], [[CONV]]
; CHECK-NEXT: [[OP_EXTRA1:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]		; CHECK-NEXT: [[OP_RDX1:%.*]] = fadd fast float [[CONV]], 3.000000e+00
; CHECK-NEXT: ret float [[OP_EXTRA1]]		; CHECK-NEXT: [[OP_RDX2:%.*]] = fadd fast float [[OP_RDX]], [[OP_RDX1]]
		; CHECK-NEXT: [[OP_RDX3:%.*]] = fadd fast float [[TMP2]], [[OP_RDX2]]
		; CHECK-NEXT: ret float [[OP_RDX3]]
;		;
; THRESHOLD-LABEL: @extra_args_no_replace(		; THRESHOLD-LABEL: @extra_args_no_replace(
; THRESHOLD-NEXT: entry:		; THRESHOLD-NEXT: entry:
; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]		; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]
; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float		; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
; THRESHOLD-NEXT: [[CONVC:%.*]] = sitofp i32 [[C:%.*]] to float		; THRESHOLD-NEXT: [[CONVC:%.*]] = sitofp i32 [[C:%.*]] to float
; THRESHOLD-NEXT: [[ADDC:%.*]] = fadd fast float [[CONVC]], 3.000000e+00
; THRESHOLD-NEXT: [[ADD:%.*]] = fadd fast float [[CONV]], [[ADDC]]
; THRESHOLD-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1		; THRESHOLD-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[X:%.*]], i64 1
; THRESHOLD-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2		; THRESHOLD-NEXT: [[ARRAYIDX3_1:%.*]] = getelementptr inbounds float, float* [[X]], i64 2
; THRESHOLD-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3		; THRESHOLD-NEXT: [[ARRAYIDX3_2:%.*]] = getelementptr inbounds float, float* [[X]], i64 3
; THRESHOLD-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4		; THRESHOLD-NEXT: [[ARRAYIDX3_3:%.*]] = getelementptr inbounds float, float* [[X]], i64 4
; THRESHOLD-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5		; THRESHOLD-NEXT: [[ARRAYIDX3_4:%.*]] = getelementptr inbounds float, float* [[X]], i64 5
; THRESHOLD-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6		; THRESHOLD-NEXT: [[ARRAYIDX3_5:%.*]] = getelementptr inbounds float, float* [[X]], i64 6
; THRESHOLD-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7		; THRESHOLD-NEXT: [[ARRAYIDX3_6:%.*]] = getelementptr inbounds float, float* [[X]], i64 7
; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*		; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X]] to <8 x float>*
; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4		; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4
; THRESHOLD-NEXT: [[TMP2:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP1]])		; THRESHOLD-NEXT: [[TMP2:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP1]])
; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = fadd fast float [[TMP2]], [[ADD]]		; THRESHOLD-NEXT: [[TMP3:%.*]] = insertelement <2 x float> <float poison, float 3.000000e+00>, float [[CONVC]], i32 0
; THRESHOLD-NEXT: [[OP_EXTRA1:%.*]] = fadd fast float [[OP_EXTRA]], [[CONV]]		; THRESHOLD-NEXT: [[TMP4:%.*]] = insertelement <2 x float> poison, float [[CONV]], i32 0
; THRESHOLD-NEXT: ret float [[OP_EXTRA1]]		; THRESHOLD-NEXT: [[TMP5:%.*]] = insertelement <2 x float> [[TMP4]], float [[CONV]], i32 1
		; THRESHOLD-NEXT: [[TMP6:%.*]] = fadd fast <2 x float> [[TMP3]], [[TMP5]]
		; THRESHOLD-NEXT: [[TMP7:%.*]] = extractelement <2 x float> [[TMP6]], i32 0
		; THRESHOLD-NEXT: [[TMP8:%.*]] = extractelement <2 x float> [[TMP6]], i32 1
		; THRESHOLD-NEXT: [[OP_RDX2:%.*]] = fadd fast float [[TMP7]], [[TMP8]]
		; THRESHOLD-NEXT: [[OP_RDX3:%.*]] = fadd fast float [[TMP2]], [[OP_RDX2]]
		; THRESHOLD-NEXT: ret float [[OP_RDX3]]
;		;
entry:		entry:
%mul = mul nsw i32 %b, %a		%mul = mul nsw i32 %b, %a
%conv = sitofp i32 %mul to float		%conv = sitofp i32 %mul to float
%0 = load float, float* %x, align 4		%0 = load float, float* %x, align 4
%convc = sitofp i32 %c to float		%convc = sitofp i32 %c to float
%addc = fadd fast float %convc, 3.000000e+00		%addc = fadd fast float %convc, 3.000000e+00
%add = fadd fast float %conv, %addc		%add = fadd fast float %conv, %addc
[... 82 lines not shown ...]
; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[BAR:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[BAR:%.*]], i32 0
; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> zeroinitializer		; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> zeroinitializer
; CHECK-NEXT: [[TMP2:%.*]] = xor <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]		; CHECK-NEXT: [[TMP2:%.*]] = xor <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[TMP2]], i32 3		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[TMP2]], i32 3
; CHECK-NEXT: [[TMP4:%.*]] = icmp eq <4 x i32> [[TMP2]], zeroinitializer		; CHECK-NEXT: [[TMP4:%.*]] = icmp eq <4 x i32> [[TMP2]], zeroinitializer
; CHECK-NEXT: [[TMP5:%.*]] = sext <4 x i1> [[TMP4]] to <4 x i32>		; CHECK-NEXT: [[TMP5:%.*]] = sext <4 x i1> [[TMP4]] to <4 x i32>
; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP5]])		; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP5]])
; CHECK-NEXT: [[OP_EXTRA:%.*]] = add nuw i32 [[TMP6]], [[ARG]]		; CHECK-NEXT: [[OP_RDX:%.*]] = add nuw i32 [[TMP3]], [[ARG]]
; CHECK-NEXT: [[OP_EXTRA2:%.*]] = add nsw i32 [[OP_EXTRA]], [[TMP3]]		; CHECK-NEXT: [[OP_RDX2:%.*]] = add nsw i32 [[TMP6]], [[OP_RDX]]
; CHECK-NEXT: ret i32 [[OP_EXTRA2]]		; CHECK-NEXT: ret i32 [[OP_RDX2]]
;		;
; THRESHOLD-LABEL: @wobble(		; THRESHOLD-LABEL: @wobble(
; THRESHOLD-NEXT: bb:		; THRESHOLD-NEXT: bb:
; THRESHOLD-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[ARG:%.*]], i32 0		; THRESHOLD-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[ARG:%.*]], i32 0
; THRESHOLD-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer		; THRESHOLD-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer
; THRESHOLD-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[BAR:%.*]], i32 0		; THRESHOLD-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[BAR:%.*]], i32 0
; THRESHOLD-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> zeroinitializer		; THRESHOLD-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> zeroinitializer
; THRESHOLD-NEXT: [[TMP2:%.*]] = xor <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]		; THRESHOLD-NEXT: [[TMP2:%.*]] = xor <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
; THRESHOLD-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[TMP2]], i32 3		; THRESHOLD-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[TMP2]], i32 3
; THRESHOLD-NEXT: [[TMP4:%.*]] = icmp eq <4 x i32> [[TMP2]], zeroinitializer		; THRESHOLD-NEXT: [[TMP4:%.*]] = icmp eq <4 x i32> [[TMP2]], zeroinitializer
; THRESHOLD-NEXT: [[TMP5:%.*]] = sext <4 x i1> [[TMP4]] to <4 x i32>		; THRESHOLD-NEXT: [[TMP5:%.*]] = sext <4 x i1> [[TMP4]] to <4 x i32>
; THRESHOLD-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP5]])		; THRESHOLD-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP5]])
; THRESHOLD-NEXT: [[OP_EXTRA:%.*]] = add nuw i32 [[TMP6]], [[ARG]]		; THRESHOLD-NEXT: [[OP_RDX:%.*]] = add nuw i32 [[TMP3]], [[ARG]]
; THRESHOLD-NEXT: [[OP_EXTRA2:%.*]] = add nsw i32 [[OP_EXTRA]], [[TMP3]]		; THRESHOLD-NEXT: [[OP_RDX2:%.*]] = add nsw i32 [[TMP6]], [[OP_RDX]]
; THRESHOLD-NEXT: ret i32 [[OP_EXTRA2]]		; THRESHOLD-NEXT: ret i32 [[OP_RDX2]]
;		;
bb:		bb:
%x1 = xor i32 %arg, %bar		%x1 = xor i32 %arg, %bar
%i1 = icmp eq i32 %x1, 0		%i1 = icmp eq i32 %x1, 0
%s1 = sext i1 %i1 to i32		%s1 = sext i1 %i1 to i32
%x2 = xor i32 %arg, %bar		%x2 = xor i32 %arg, %bar
%i2 = icmp eq i32 %x2, 0		%i2 = icmp eq i32 %x2, 0
%s2 = sext i1 %i2 to i32		%s2 = sext i1 %i2 to i32
[... 14 lines not shown ...]

llvm/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll

	[... 867 lines not shown ...]
	; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16			; AVX-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
	; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4			; AVX-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
	; AVX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]			; AVX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
	; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]			; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]
	; AVX-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8			; AVX-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8
	; AVX-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8			; AVX-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
	; AVX-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4			; AVX-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
	; AVX-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP6]])			; AVX-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP6]])
	; AVX-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP9]], [[TMP7]]			; AVX-NEXT: [[OP_RDX:%.*]] = icmp sgt i32 [[TMP7]], [[TMP8]]
	; AVX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP9]], i32 [[TMP7]]			; AVX-NEXT: [[OP_RDX1:%.*]] = select i1 [[OP_RDX]], i32 [[TMP7]], i32 [[TMP8]]
	; AVX-NEXT: [[TMP12:%.*]] = icmp sgt i32 [[TMP11]], [[TMP8]]			; AVX-NEXT: [[OP_RDX2:%.*]] = icmp sgt i32 [[OP_RDX1]], [[TMP5]]
	; AVX-NEXT: [[TMP13:%.*]] = select i1 [[TMP12]], i32 [[TMP11]], i32 [[TMP8]]			; AVX-NEXT: [[OP_RDX3:%.*]] = select i1 [[OP_RDX2]], i32 [[OP_RDX1]], i32 [[TMP5]]
	; AVX-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP13]], [[TMP5]]			; AVX-NEXT: [[OP_RDX4:%.*]] = icmp sgt i32 [[TMP9]], [[OP_RDX3]]
	; AVX-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP13]], i32 [[TMP5]]			; AVX-NEXT: [[OP_RDX5:%.*]] = select i1 [[OP_RDX4]], i32 [[TMP9]], i32 [[OP_RDX3]]
	; AVX-NEXT: [[TMP14:%.*]] = select i1 [[TMP4]], i32 3, i32 4			; AVX-NEXT: [[TMP10:%.*]] = select i1 [[TMP4]], i32 3, i32 4
	; AVX-NEXT: store i32 [[TMP14]], i32* @var, align 8			; AVX-NEXT: store i32 [[TMP10]], i32* @var, align 8
	; AVX-NEXT: ret i32 [[OP_EXTRA1]]			; AVX-NEXT: ret i32 [[OP_RDX5]]
	;			;
	; AVX2-LABEL: @maxi8_mutiple_uses(			; AVX2-LABEL: @maxi8_mutiple_uses(
	; AVX2-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16			; AVX2-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
	; AVX2-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4			; AVX2-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
	; AVX2-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]			; AVX2-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
	; AVX2-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]			; AVX2-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]
	; AVX2-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8			; AVX2-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8
	; AVX2-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8			; AVX2-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
	; AVX2-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4			; AVX2-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
	; AVX2-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP6]])			; AVX2-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP6]])
	; AVX2-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP9]], [[TMP7]]			; AVX2-NEXT: [[OP_RDX:%.*]] = icmp sgt i32 [[TMP7]], [[TMP8]]
	; AVX2-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP9]], i32 [[TMP7]]			; AVX2-NEXT: [[OP_RDX1:%.*]] = select i1 [[OP_RDX]], i32 [[TMP7]], i32 [[TMP8]]
	; AVX2-NEXT: [[TMP12:%.*]] = icmp sgt i32 [[TMP11]], [[TMP8]]			; AVX2-NEXT: [[OP_RDX2:%.*]] = icmp sgt i32 [[OP_RDX1]], [[TMP5]]
	; AVX2-NEXT: [[TMP13:%.*]] = select i1 [[TMP12]], i32 [[TMP11]], i32 [[TMP8]]			; AVX2-NEXT: [[OP_RDX3:%.*]] = select i1 [[OP_RDX2]], i32 [[OP_RDX1]], i32 [[TMP5]]
	; AVX2-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP13]], [[TMP5]]			; AVX2-NEXT: [[OP_RDX4:%.*]] = icmp sgt i32 [[TMP9]], [[OP_RDX3]]
	; AVX2-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP13]], i32 [[TMP5]]			; AVX2-NEXT: [[OP_RDX5:%.*]] = select i1 [[OP_RDX4]], i32 [[TMP9]], i32 [[OP_RDX3]]
	; AVX2-NEXT: [[TMP14:%.*]] = select i1 [[TMP4]], i32 3, i32 4			; AVX2-NEXT: [[TMP10:%.*]] = select i1 [[TMP4]], i32 3, i32 4
	; AVX2-NEXT: store i32 [[TMP14]], i32* @var, align 8			; AVX2-NEXT: store i32 [[TMP10]], i32* @var, align 8
	; AVX2-NEXT: ret i32 [[OP_EXTRA1]]			; AVX2-NEXT: ret i32 [[OP_RDX5]]
	;			;
	; THRESH-LABEL: @maxi8_mutiple_uses(			; THRESH-LABEL: @maxi8_mutiple_uses(
	; THRESH-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16			; THRESH-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16
	; THRESH-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0			; THRESH-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0
	; THRESH-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1			; THRESH-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1
	; THRESH-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8			; THRESH-NEXT: [[TMP5:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8
	; THRESH-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8			; THRESH-NEXT: [[TMP6:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
	; THRESH-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4			; THRESH-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
	; THRESH-NEXT: [[TMP8:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP5]])			; THRESH-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> poison, i32 [[TMP6]], i32 0
	; THRESH-NEXT: [[TMP9:%.*]] = icmp sgt i32 [[TMP8]], [[TMP6]]			; THRESH-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> [[TMP8]], i32 [[TMP3]], i32 1
	; THRESH-NEXT: [[TMP10:%.*]] = select i1 [[TMP9]], i32 [[TMP8]], i32 [[TMP6]]			; THRESH-NEXT: [[TMP10:%.*]] = insertelement <2 x i32> poison, i32 [[TMP7]], i32 0
	; THRESH-NEXT: [[TMP11:%.*]] = insertelement <2 x i32> poison, i32 [[TMP10]], i32 0			; THRESH-NEXT: [[TMP11:%.*]] = insertelement <2 x i32> [[TMP10]], i32 [[TMP4]], i32 1
	; THRESH-NEXT: [[TMP12:%.*]] = insertelement <2 x i32> [[TMP11]], i32 [[TMP3]], i32 1			; THRESH-NEXT: [[TMP12:%.*]] = icmp sgt <2 x i32> [[TMP9]], [[TMP11]]
	; THRESH-NEXT: [[TMP13:%.*]] = insertelement <2 x i32> poison, i32 [[TMP7]], i32 0			; THRESH-NEXT: [[TMP13:%.*]] = select <2 x i1> [[TMP12]], <2 x i32> [[TMP9]], <2 x i32> [[TMP11]]
	; THRESH-NEXT: [[TMP14:%.*]] = insertelement <2 x i32> [[TMP13]], i32 [[TMP4]], i32 1			; THRESH-NEXT: [[TMP14:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP5]])
	; THRESH-NEXT: [[TMP15:%.*]] = icmp sgt <2 x i32> [[TMP12]], [[TMP14]]			; THRESH-NEXT: [[TMP15:%.*]] = extractelement <2 x i32> [[TMP13]], i32 0
	; THRESH-NEXT: [[TMP16:%.*]] = select <2 x i1> [[TMP15]], <2 x i32> [[TMP12]], <2 x i32> [[TMP14]]			; THRESH-NEXT: [[TMP16:%.*]] = extractelement <2 x i32> [[TMP13]], i32 1
	; THRESH-NEXT: [[TMP17:%.*]] = extractelement <2 x i32> [[TMP16]], i32 0			; THRESH-NEXT: [[OP_RDX2:%.*]] = icmp sgt i32 [[TMP15]], [[TMP16]]
	; THRESH-NEXT: [[TMP18:%.*]] = extractelement <2 x i32> [[TMP16]], i32 1			; THRESH-NEXT: [[OP_RDX3:%.*]] = select i1 [[OP_RDX2]], i32 [[TMP15]], i32 [[TMP16]]
	; THRESH-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP17]], [[TMP18]]			; THRESH-NEXT: [[OP_RDX4:%.*]] = icmp sgt i32 [[TMP14]], [[OP_RDX3]]
	; THRESH-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP17]], i32 [[TMP18]]			; THRESH-NEXT: [[OP_RDX5:%.*]] = select i1 [[OP_RDX4]], i32 [[TMP14]], i32 [[OP_RDX3]]
	; THRESH-NEXT: [[TMP19:%.*]] = extractelement <2 x i1> [[TMP15]], i32 1			; THRESH-NEXT: [[TMP17:%.*]] = extractelement <2 x i1> [[TMP12]], i32 1
	; THRESH-NEXT: [[TMP20:%.*]] = select i1 [[TMP19]], i32 3, i32 4			; THRESH-NEXT: [[TMP18:%.*]] = select i1 [[TMP17]], i32 3, i32 4
	; THRESH-NEXT: store i32 [[TMP20]], i32* @var, align 8			; THRESH-NEXT: store i32 [[TMP18]], i32* @var, align 8
	; THRESH-NEXT: ret i32 [[OP_EXTRA1]]			; THRESH-NEXT: ret i32 [[OP_RDX5]]
	;			;
	%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16			%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
	%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4			%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
	%4 = icmp sgt i32 %2, %3			%4 = icmp sgt i32 %2, %3
	%5 = select i1 %4, i32 %2, i32 %3			%5 = select i1 %4, i32 %2, i32 %3
	%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8			%6 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2), align 8
	%7 = icmp sgt i32 %5, %6			%7 = icmp sgt i32 %5, %6
	%8 = select i1 %7, i32 %5, i32 %6			%8 = select i1 %7, i32 %5, i32 %6
	[... 116 lines not shown ...]
	; AVX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]			; AVX-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
	; AVX-NEXT: br label [[PP:%.*]]			; AVX-NEXT: br label [[PP:%.*]]
	; AVX: pp:			; AVX: pp:
	; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]			; AVX-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]
	; AVX-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8			; AVX-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8
	; AVX-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8			; AVX-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
	; AVX-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4			; AVX-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
	; AVX-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP6]])			; AVX-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP6]])
	; AVX-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP9]], [[TMP7]]			; AVX-NEXT: [[OP_RDX:%.*]] = icmp sgt i32 [[TMP7]], [[TMP8]]
	; AVX-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP9]], i32 [[TMP7]]			; AVX-NEXT: [[OP_RDX1:%.*]] = select i1 [[OP_RDX]], i32 [[TMP7]], i32 [[TMP8]]
	; AVX-NEXT: [[TMP12:%.*]] = icmp sgt i32 [[TMP11]], [[TMP8]]			; AVX-NEXT: [[OP_RDX2:%.*]] = icmp sgt i32 [[OP_RDX1]], [[TMP5]]
	; AVX-NEXT: [[TMP13:%.*]] = select i1 [[TMP12]], i32 [[TMP11]], i32 [[TMP8]]			; AVX-NEXT: [[OP_RDX3:%.*]] = select i1 [[OP_RDX2]], i32 [[OP_RDX1]], i32 [[TMP5]]
	; AVX-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP13]], [[TMP5]]			; AVX-NEXT: [[OP_RDX4:%.*]] = icmp sgt i32 [[TMP9]], [[OP_RDX3]]
	; AVX-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP13]], i32 [[TMP5]]			; AVX-NEXT: [[OP_RDX5:%.*]] = select i1 [[OP_RDX4]], i32 [[TMP9]], i32 [[OP_RDX3]]
	; AVX-NEXT: ret i32 [[OP_EXTRA1]]			; AVX-NEXT: ret i32 [[OP_RDX5]]
	;			;
	; AVX2-LABEL: @maxi8_wrong_parent(			; AVX2-LABEL: @maxi8_wrong_parent(
	; AVX2-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16			; AVX2-NEXT: [[TMP2:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
	; AVX2-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4			; AVX2-NEXT: [[TMP3:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
	; AVX2-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]			; AVX2-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[TMP2]], [[TMP3]]
	; AVX2-NEXT: br label [[PP:%.*]]			; AVX2-NEXT: br label [[PP:%.*]]
	; AVX2: pp:			; AVX2: pp:
	; AVX2-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]			; AVX2-NEXT: [[TMP5:%.*]] = select i1 [[TMP4]], i32 [[TMP2]], i32 [[TMP3]]
	; AVX2-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8			; AVX2-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8
	; AVX2-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8			; AVX2-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8
	; AVX2-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4			; AVX2-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4
	; AVX2-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP6]])			; AVX2-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP6]])
	; AVX2-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP9]], [[TMP7]]			; AVX2-NEXT: [[OP_RDX:%.*]] = icmp sgt i32 [[TMP7]], [[TMP8]]
	; AVX2-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP9]], i32 [[TMP7]]			; AVX2-NEXT: [[OP_RDX1:%.*]] = select i1 [[OP_RDX]], i32 [[TMP7]], i32 [[TMP8]]
	; AVX2-NEXT: [[TMP12:%.*]] = icmp sgt i32 [[TMP11]], [[TMP8]]			; AVX2-NEXT: [[OP_RDX2:%.*]] = icmp sgt i32 [[OP_RDX1]], [[TMP5]]
	; AVX2-NEXT: [[TMP13:%.*]] = select i1 [[TMP12]], i32 [[TMP11]], i32 [[TMP8]]			; AVX2-NEXT: [[OP_RDX3:%.*]] = select i1 [[OP_RDX2]], i32 [[OP_RDX1]], i32 [[TMP5]]
	; AVX2-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP13]], [[TMP5]]			; AVX2-NEXT: [[OP_RDX4:%.*]] = icmp sgt i32 [[TMP9]], [[OP_RDX3]]
	; AVX2-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP13]], i32 [[TMP5]]			; AVX2-NEXT: [[OP_RDX5:%.*]] = select i1 [[OP_RDX4]], i32 [[TMP9]], i32 [[OP_RDX3]]
	; AVX2-NEXT: ret i32 [[OP_EXTRA1]]			; AVX2-NEXT: ret i32 [[OP_RDX5]]
	;			;
	; THRESH-LABEL: @maxi8_wrong_parent(			; THRESH-LABEL: @maxi8_wrong_parent(
	; THRESH-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16			; THRESH-NEXT: [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([32 x i32]* @arr to <2 x i32>*), align 16
	; THRESH-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0			; THRESH-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0
	; THRESH-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1			; THRESH-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1
	; THRESH-NEXT: [[TMP5:%.*]] = icmp sgt i32 [[TMP3]], [[TMP4]]			; THRESH-NEXT: [[TMP5:%.*]] = icmp sgt i32 [[TMP3]], [[TMP4]]
	; THRESH-NEXT: br label [[PP:%.*]]			; THRESH-NEXT: br label [[PP:%.*]]
	; THRESH: pp:			; THRESH: pp:
	; THRESH-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8			; THRESH-NEXT: [[TMP6:%.*]] = load <4 x i32>, <4 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 2) to <4 x i32>*), align 8
	; THRESH-NEXT: [[TMP7:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6), align 8			; THRESH-NEXT: [[TMP7:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 6) to <2 x i32>*), align 8
	; THRESH-NEXT: [[TMP8:%.*]] = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 7), align 4			; THRESH-NEXT: [[TMP8:%.*]] = extractelement <2 x i32> [[TMP7]], i32 0
	; THRESH-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP6]])			; THRESH-NEXT: [[TMP9:%.*]] = extractelement <2 x i32> [[TMP7]], i32 1
	; THRESH-NEXT: [[TMP10:%.*]] = icmp sgt i32 [[TMP9]], [[TMP7]]			; THRESH-NEXT: [[OP_RDX:%.*]] = icmp sgt i32 [[TMP8]], [[TMP9]]
	; THRESH-NEXT: [[TMP11:%.*]] = select i1 [[TMP10]], i32 [[TMP9]], i32 [[TMP7]]			; THRESH-NEXT: [[TMP10:%.*]] = insertelement <2 x i1> poison, i1 [[OP_RDX]], i32 0
	; THRESH-NEXT: [[TMP12:%.*]] = icmp sgt i32 [[TMP11]], [[TMP8]]			; THRESH-NEXT: [[TMP11:%.*]] = insertelement <2 x i1> [[TMP10]], i1 [[TMP5]], i32 1
	; THRESH-NEXT: [[TMP13:%.*]] = insertelement <2 x i1> poison, i1 [[TMP12]], i32 0			; THRESH-NEXT: [[TMP12:%.*]] = insertelement <2 x i32> poison, i32 [[TMP8]], i32 0
	; THRESH-NEXT: [[TMP14:%.*]] = insertelement <2 x i1> [[TMP13]], i1 [[TMP5]], i32 1			; THRESH-NEXT: [[TMP13:%.*]] = insertelement <2 x i32> [[TMP12]], i32 [[TMP3]], i32 1
	; THRESH-NEXT: [[TMP15:%.*]] = insertelement <2 x i32> poison, i32 [[TMP11]], i32 0			; THRESH-NEXT: [[TMP14:%.*]] = insertelement <2 x i32> poison, i32 [[TMP9]], i32 0
	; THRESH-NEXT: [[TMP16:%.*]] = insertelement <2 x i32> [[TMP15]], i32 [[TMP3]], i32 1			; THRESH-NEXT: [[TMP15:%.*]] = insertelement <2 x i32> [[TMP14]], i32 [[TMP4]], i32 1
	; THRESH-NEXT: [[TMP17:%.*]] = insertelement <2 x i32> poison, i32 [[TMP8]], i32 0			; THRESH-NEXT: [[TMP16:%.*]] = select <2 x i1> [[TMP11]], <2 x i32> [[TMP13]], <2 x i32> [[TMP15]]
	; THRESH-NEXT: [[TMP18:%.*]] = insertelement <2 x i32> [[TMP17]], i32 [[TMP4]], i32 1			; THRESH-NEXT: [[TMP17:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP6]])
	; THRESH-NEXT: [[TMP19:%.*]] = select <2 x i1> [[TMP14]], <2 x i32> [[TMP16]], <2 x i32> [[TMP18]]			; THRESH-NEXT: [[TMP18:%.*]] = extractelement <2 x i32> [[TMP16]], i32 0
	; THRESH-NEXT: [[TMP20:%.*]] = extractelement <2 x i32> [[TMP19]], i32 0			; THRESH-NEXT: [[TMP19:%.*]] = extractelement <2 x i32> [[TMP16]], i32 1
	; THRESH-NEXT: [[TMP21:%.*]] = extractelement <2 x i32> [[TMP19]], i32 1			; THRESH-NEXT: [[OP_RDX2:%.*]] = icmp sgt i32 [[TMP18]], [[TMP19]]
	; THRESH-NEXT: [[OP_EXTRA:%.*]] = icmp sgt i32 [[TMP20]], [[TMP21]]			; THRESH-NEXT: [[OP_RDX3:%.*]] = select i1 [[OP_RDX2]], i32 [[TMP18]], i32 [[TMP19]]
	; THRESH-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP20]], i32 [[TMP21]]			; THRESH-NEXT: [[OP_RDX4:%.*]] = icmp sgt i32 [[TMP17]], [[OP_RDX3]]
	; THRESH-NEXT: ret i32 [[OP_EXTRA1]]			; THRESH-NEXT: [[OP_RDX5:%.*]] = select i1 [[OP_RDX4]], i32 [[TMP17]], i32 [[OP_RDX3]]
				; THRESH-NEXT: ret i32 [[OP_RDX5]]
	;			;
	%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16			%2 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 0), align 16
	%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4			%3 = load i32, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @arr, i64 0, i64 1), align 4
	%4 = icmp sgt i32 %2, %3			%4 = icmp sgt i32 %2, %3
	br label %pp			br label %pp

	pp:			pp:
	%5 = select i1 %4, i32 %2, i32 %3			%5 = select i1 %4, i32 %2, i32 %3
	[353 lines not shown]
	; SSE-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[T13]], i32 93)			; SSE-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[T13]], i32 93)
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @PR49730(			; AVX-LABEL: @PR49730(
	; AVX-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> poison, <4 x i32> <i32 2, i32 2, i32 1, i32 1>)			; AVX-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> poison, <4 x i32> <i32 2, i32 2, i32 1, i32 1>)
	; AVX-NEXT: [[TMP2:%.*]] = sub nsw <4 x i32> poison, [[TMP1]]			; AVX-NEXT: [[TMP2:%.*]] = sub nsw <4 x i32> poison, [[TMP1]]
	; AVX-NEXT: [[T12:%.*]] = sub nsw i32 undef, undef			; AVX-NEXT: [[T12:%.*]] = sub nsw i32 undef, undef
	; AVX-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> [[TMP2]])			; AVX-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> [[TMP2]])
	; AVX-NEXT: [[TMP4:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP3]], i32 [[T12]])			; AVX-NEXT: [[TMP4:%.*]] = call i32 @llvm.umin.i32(i32 [[T12]], i32 undef)
	; AVX-NEXT: [[TMP5:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP4]], i32 undef)			; AVX-NEXT: [[TMP5:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP3]], i32 [[TMP4]])
	; AVX-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP5]], i32 93)			; AVX-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP5]], i32 93)
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	; AVX2-LABEL: @PR49730(			; AVX2-LABEL: @PR49730(
	; AVX2-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> poison, <4 x i32> <i32 2, i32 2, i32 1, i32 1>)			; AVX2-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> poison, <4 x i32> <i32 2, i32 2, i32 1, i32 1>)
	; AVX2-NEXT: [[TMP2:%.*]] = sub nsw <4 x i32> poison, [[TMP1]]			; AVX2-NEXT: [[TMP2:%.*]] = sub nsw <4 x i32> poison, [[TMP1]]
	; AVX2-NEXT: [[T12:%.*]] = sub nsw i32 undef, undef			; AVX2-NEXT: [[T12:%.*]] = sub nsw i32 undef, undef
	; AVX2-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> [[TMP2]])			; AVX2-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> [[TMP2]])
	; AVX2-NEXT: [[TMP4:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP3]], i32 [[T12]])			; AVX2-NEXT: [[TMP4:%.*]] = call i32 @llvm.umin.i32(i32 [[T12]], i32 undef)
	; AVX2-NEXT: [[TMP5:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP4]], i32 undef)			; AVX2-NEXT: [[TMP5:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP3]], i32 [[TMP4]])
	; AVX2-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP5]], i32 93)			; AVX2-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP5]], i32 93)
	; AVX2-NEXT: ret void			; AVX2-NEXT: ret void
	;			;
	; THRESH-LABEL: @PR49730(			; THRESH-LABEL: @PR49730(
	; THRESH-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> poison, <4 x i32> <i32 2, i32 2, i32 1, i32 1>)			; THRESH-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> poison, <4 x i32> <i32 2, i32 2, i32 1, i32 1>)
	; THRESH-NEXT: [[TMP2:%.*]] = sub nsw <4 x i32> poison, [[TMP1]]			; THRESH-NEXT: [[TMP2:%.*]] = sub nsw <4 x i32> poison, [[TMP1]]
	; THRESH-NEXT: [[T12:%.*]] = sub nsw i32 undef, undef			; THRESH-NEXT: [[T12:%.*]] = sub nsw i32 undef, undef
	; THRESH-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> [[TMP2]])			; THRESH-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> [[TMP2]])
	; THRESH-NEXT: [[TMP4:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP3]], i32 [[T12]])			; THRESH-NEXT: [[TMP4:%.*]] = call i32 @llvm.umin.i32(i32 [[T12]], i32 undef)
	; THRESH-NEXT: [[TMP5:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP4]], i32 undef)			; THRESH-NEXT: [[TMP5:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP3]], i32 [[TMP4]])
	; THRESH-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP5]], i32 93)			; THRESH-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP5]], i32 93)
	; THRESH-NEXT: ret void			; THRESH-NEXT: ret void
	;			;
	%t = call i32 @llvm.smin.i32(i32 undef, i32 2)			%t = call i32 @llvm.smin.i32(i32 undef, i32 2)
	%t1 = sub nsw i32 undef, %t			%t1 = sub nsw i32 undef, %t
	%t2 = call i32 @llvm.umin.i32(i32 undef, i32 %t1)			%t2 = call i32 @llvm.umin.i32(i32 undef, i32 %t1)
	%t3 = call i32 @llvm.smin.i32(i32 undef, i32 2)			%t3 = call i32 @llvm.smin.i32(i32 undef, i32 2)
	%t4 = sub nsw i32 undef, %t3			%t4 = sub nsw i32 undef, %t3
	[12 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/matched-shuffled-entries.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu -mattr=+avx2 -slp-threshold=50 -slp-recursion-max-depth=6 < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu -mattr=+avx2 -slp-threshold=50 -slp-recursion-max-depth=6 < %s | FileCheck %s

	define i32 @bar() local_unnamed_addr {			define i32 @bar() local_unnamed_addr {
	; CHECK-LABEL: @bar(			; CHECK-LABEL: @bar(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ADD78_1:%.*]] = add nsw i32 undef, undef			; CHECK-NEXT: [[ADD78_1:%.*]] = add nsw i32 undef, undef
	; CHECK-NEXT: [[SUB86_1:%.*]] = sub nsw i32 undef, undef			; CHECK-NEXT: [[SUB86_1:%.*]] = sub nsw i32 undef, undef
	; CHECK-NEXT: [[ADD94_1:%.*]] = add nsw i32 undef, undef			; CHECK-NEXT: [[ADD94_1:%.*]] = add nsw i32 undef, undef
	; CHECK-NEXT: [[SUB102_1:%.*]] = sub nsw i32 undef, undef			; CHECK-NEXT: [[SUB102_1:%.*]] = sub nsw i32 undef, undef
	; CHECK-NEXT: [[ADD78_2:%.*]] = add nsw i32 undef, undef			; CHECK-NEXT: [[ADD78_2:%.*]] = add nsw i32 undef, undef
	; CHECK-NEXT: [[SUB102_3:%.*]] = sub nsw i32 undef, undef			; CHECK-NEXT: [[SUB102_3:%.*]] = sub nsw i32 undef, undef
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <16 x i32> poison, i32 [[SUB102_3]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <16 x i32> poison, i32 [[SUB102_1]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <16 x i32> [[TMP0]], i32 [[SUB102_1]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <16 x i32> [[TMP0]], i32 [[ADD94_1]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <16 x i32> [[TMP1]], i32 [[ADD94_1]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <16 x i32> [[TMP1]], i32 [[ADD78_1]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <16 x i32> [[TMP2]], i32 [[ADD78_1]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <16 x i32> [[TMP2]], i32 [[SUB86_1]], i32 3
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <16 x i32> [[TMP3]], i32 [[SUB86_1]], i32 4			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <16 x i32> [[TMP3]], i32 [[ADD78_2]], i32 4
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <16 x i32> [[TMP4]], i32 [[ADD78_2]], i32 5			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <16 x i32> [[TMP4]], i32 [[SUB102_3]], i32 5
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i32> [[TMP5]], <16 x i32> poison, <16 x i32> <i32 0, i32 undef, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 2, i32 3, i32 4, i32 undef, i32 5, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i32> [[TMP5]], <16 x i32> poison, <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 4, i32 undef, i32 5, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <16 x i32> poison, i32 [[SUB86_1]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <16 x i32> poison, i32 [[SUB86_1]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <16 x i32> [[TMP6]], i32 [[ADD78_1]], i32 1			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <16 x i32> [[TMP6]], i32 [[ADD78_1]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <16 x i32> [[TMP7]], i32 [[ADD94_1]], i32 2			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <16 x i32> [[TMP7]], i32 [[ADD94_1]], i32 2
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <16 x i32> [[TMP8]], i32 [[SUB102_1]], i32 3			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <16 x i32> [[TMP8]], i32 [[SUB102_1]], i32 3
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <16 x i32> [[TMP9]], i32 [[ADD78_2]], i32 4			; CHECK-NEXT: [[TMP10:%.*]] = insertelement <16 x i32> [[TMP9]], i32 [[ADD78_2]], i32 4
	; CHECK-NEXT: [[TMP11:%.*]] = insertelement <16 x i32> [[TMP10]], i32 [[SUB102_3]], i32 5			; CHECK-NEXT: [[TMP11:%.*]] = insertelement <16 x i32> [[TMP10]], i32 [[SUB102_3]], i32 5
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <16 x i32> [[TMP11]], <16 x i32> poison, <16 x i32> <i32 undef, i32 undef, i32 0, i32 undef, i32 undef, i32 undef, i32 undef, i32 1, i32 2, i32 3, i32 4, i32 undef, i32 undef, i32 undef, i32 undef, i32 5>			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <16 x i32> [[TMP11]], <16 x i32> poison, <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 1, i32 2, i32 3, i32 undef, i32 4, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 5>
	; CHECK-NEXT: [[TMP12:%.*]] = add nsw <16 x i32> [[SHUFFLE]], [[SHUFFLE1]]			; CHECK-NEXT: [[TMP12:%.*]] = add nsw <16 x i32> [[SHUFFLE]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP13:%.*]] = sub nsw <16 x i32> [[SHUFFLE]], [[SHUFFLE1]]			; CHECK-NEXT: [[TMP13:%.*]] = sub nsw <16 x i32> [[SHUFFLE]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <16 x i32> [[TMP12]], <16 x i32> [[TMP13]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 21, i32 22, i32 7, i32 24, i32 25, i32 10, i32 27, i32 28, i32 13, i32 30, i32 31>			; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <16 x i32> [[TMP12]], <16 x i32> [[TMP13]], <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 4, i32 5, i32 22, i32 23, i32 8, i32 9, i32 26, i32 27, i32 12, i32 13, i32 30, i32 31>
	; CHECK-NEXT: [[TMP15:%.*]] = lshr <16 x i32> [[TMP14]], <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>			; CHECK-NEXT: [[TMP15:%.*]] = lshr <16 x i32> [[TMP14]], <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>
	; CHECK-NEXT: [[TMP16:%.*]] = and <16 x i32> [[TMP15]], <i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537>			; CHECK-NEXT: [[TMP16:%.*]] = and <16 x i32> [[TMP15]], <i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537>
	; CHECK-NEXT: [[TMP17:%.*]] = mul nuw <16 x i32> [[TMP16]], <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>			; CHECK-NEXT: [[TMP17:%.*]] = mul nuw <16 x i32> [[TMP16]], <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
	; CHECK-NEXT: [[TMP18:%.*]] = add <16 x i32> [[TMP17]], [[TMP14]]			; CHECK-NEXT: [[TMP18:%.*]] = add <16 x i32> [[TMP17]], [[TMP14]]
	; CHECK-NEXT: [[TMP19:%.*]] = xor <16 x i32> [[TMP18]], [[TMP17]]			; CHECK-NEXT: [[TMP19:%.*]] = xor <16 x i32> [[TMP18]], [[TMP17]]
	; CHECK-NEXT: [[TMP20:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP19]])			; CHECK-NEXT: [[TMP20:%.*]] = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP19]])
	; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[TMP20]], 16			; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[TMP20]], 16
	; CHECK-NEXT: [[ADD119:%.*]] = add nuw nsw i32 undef, [[SHR]]			; CHECK-NEXT: [[ADD119:%.*]] = add nuw nsw i32 undef, [[SHR]]
	[126 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -slp-vectorizer -mtriple=x86_64-- -S | FileCheck %s		; RUN: opt < %s -slp-vectorizer -mtriple=x86_64-- -S | FileCheck %s --check-prefixes=CHECK,SSE
; RUN: opt < %s -slp-vectorizer -mtriple=x86_64-- -mattr=avx512vl -S | FileCheck %s		; RUN: opt < %s -slp-vectorizer -mtriple=x86_64-- -mattr=avx512vl -S | FileCheck %s --check-prefixes=CHECK,AVX512VL

declare void @use1(i1)		declare void @use1(i1)

define i1 @logical_and_icmp(<4 x i32> %x) {		define i1 @logical_and_icmp(<4 x i32> %x) {
; CHECK-LABEL: @logical_and_icmp(		; CHECK-LABEL: @logical_and_icmp(
; CHECK-NEXT: [[TMP1:%.*]] = icmp slt <4 x i32> [[X:%.*]], zeroinitializer		; CHECK-NEXT: [[TMP1:%.*]] = icmp slt <4 x i32> [[X:%.*]], zeroinitializer
; CHECK-NEXT: [[TMP2:%.*]] = freeze <4 x i1> [[TMP1]]		; CHECK-NEXT: [[TMP2:%.*]] = freeze <4 x i1> [[TMP1]]
; CHECK-NEXT: [[TMP3:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP2]])		; CHECK-NEXT: [[TMP3:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP2]])
[183 lines not shown]
}		}

; TODO: This is better than all-scalar and still safe,		; TODO: This is better than all-scalar and still safe,
; but we want this to be 2 reductions with glue		; but we want this to be 2 reductions with glue
; logic...or a wide reduction?		; logic...or a wide reduction?

define i1 @logical_and_icmp_clamp(<4 x i32> %x) {		define i1 @logical_and_icmp_clamp(<4 x i32> %x) {
; CHECK-LABEL: @logical_and_icmp_clamp(		; CHECK-LABEL: @logical_and_icmp_clamp(
; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 3		; CHECK-NEXT: [[TMP1:%.*]] = icmp slt <4 x i32> [[X:%.*]], <i32 42, i32 42, i32 42, i32 42>
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[X]], i32 2		; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[X]], <i32 17, i32 17, i32 17, i32 17>
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[X]], i32 1		; CHECK-NEXT: [[TMP3:%.*]] = freeze <4 x i1> [[TMP2]]
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i32> [[X]], i32 0		; CHECK-NEXT: [[TMP4:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP3]])
; CHECK-NEXT: [[TMP5:%.*]] = icmp slt <4 x i32> [[X]], <i32 42, i32 42, i32 42, i32 42>		; CHECK-NEXT: [[TMP5:%.*]] = freeze <4 x i1> [[TMP1]]
; CHECK-NEXT: [[D0:%.*]] = icmp sgt i32 [[TMP4]], 17		; CHECK-NEXT: [[TMP6:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP5]])
; CHECK-NEXT: [[D1:%.*]] = icmp sgt i32 [[TMP3]], 17		; CHECK-NEXT: [[OP_RDX:%.*]] = select i1 [[TMP4]], i1 [[TMP6]], i1 false
; CHECK-NEXT: [[D2:%.*]] = icmp sgt i32 [[TMP2]], 17		; CHECK-NEXT: ret i1 [[OP_RDX]]
; CHECK-NEXT: [[D3:%.*]] = icmp sgt i32 [[TMP1]], 17
; CHECK-NEXT: [[TMP6:%.*]] = freeze <4 x i1> [[TMP5]]
; CHECK-NEXT: [[TMP7:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP6]])
; CHECK-NEXT: [[S4:%.*]] = select i1 [[TMP7]], i1 [[D0]], i1 false
; CHECK-NEXT: [[S5:%.*]] = select i1 [[S4]], i1 [[D1]], i1 false
; CHECK-NEXT: [[S6:%.*]] = select i1 [[S5]], i1 [[D2]], i1 false
; CHECK-NEXT: [[S7:%.*]] = select i1 [[S6]], i1 [[D3]], i1 false
; CHECK-NEXT: ret i1 [[S7]]
;		;
%x0 = extractelement <4 x i32> %x, i32 0		%x0 = extractelement <4 x i32> %x, i32 0
%x1 = extractelement <4 x i32> %x, i32 1		%x1 = extractelement <4 x i32> %x, i32 1
%x2 = extractelement <4 x i32> %x, i32 2		%x2 = extractelement <4 x i32> %x, i32 2
%x3 = extractelement <4 x i32> %x, i32 3		%x3 = extractelement <4 x i32> %x, i32 3
%c0 = icmp slt i32 %x0, 42		%c0 = icmp slt i32 %x0, 42
%c1 = icmp slt i32 %x1, 42		%c1 = icmp slt i32 %x1, 42
%c2 = icmp slt i32 %x2, 42		%c2 = icmp slt i32 %x2, 42
[9 lines not shown]	;
%s5 = select i1 %s4, i1 %d1, i1 false		%s5 = select i1 %s4, i1 %d1, i1 false
%s6 = select i1 %s5, i1 %d2, i1 false		%s6 = select i1 %s5, i1 %d2, i1 false
%s7 = select i1 %s6, i1 %d3, i1 false		%s7 = select i1 %s6, i1 %d3, i1 false
ret i1 %s7		ret i1 %s7
}		}

define i1 @logical_and_icmp_clamp_extra_use_cmp(<4 x i32> %x) {		define i1 @logical_and_icmp_clamp_extra_use_cmp(<4 x i32> %x) {
; CHECK-LABEL: @logical_and_icmp_clamp_extra_use_cmp(		; CHECK-LABEL: @logical_and_icmp_clamp_extra_use_cmp(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = icmp slt <4 x i32> [[X:%.*]], <i32 42, i32 42, i32 42, i32 42>
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i1> [[TMP1]], i32 2
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2		; CHECK-NEXT: call void @use1(i1 [[TMP2]])
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i32> [[X]], i32 3		; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt <4 x i32> [[X]], <i32 17, i32 17, i32 17, i32 17>
; CHECK-NEXT: [[C0:%.*]] = icmp slt i32 [[X0]], 42		; CHECK-NEXT: [[TMP4:%.*]] = freeze <4 x i1> [[TMP3]]
; CHECK-NEXT: [[C1:%.*]] = icmp slt i32 [[X1]], 42		; CHECK-NEXT: [[TMP5:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP4]])
; CHECK-NEXT: [[C2:%.*]] = icmp slt i32 [[X2]], 42		; CHECK-NEXT: [[TMP6:%.*]] = freeze <4 x i1> [[TMP1]]
; CHECK-NEXT: call void @use1(i1 [[C2]])		; CHECK-NEXT: [[TMP7:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP6]])
; CHECK-NEXT: [[C3:%.*]] = icmp slt i32 [[X3]], 42		; CHECK-NEXT: [[OP_RDX:%.*]] = select i1 [[TMP5]], i1 [[TMP7]], i1 false
; CHECK-NEXT: [[D0:%.*]] = icmp sgt i32 [[X0]], 17		; CHECK-NEXT: ret i1 [[OP_RDX]]
; CHECK-NEXT: [[D1:%.*]] = icmp sgt i32 [[X1]], 17
; CHECK-NEXT: [[D2:%.*]] = icmp sgt i32 [[X2]], 17
; CHECK-NEXT: [[D3:%.*]] = icmp sgt i32 [[X3]], 17
; CHECK-NEXT: [[S1:%.*]] = select i1 [[C0]], i1 [[C1]], i1 false
; CHECK-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 [[C2]], i1 false
; CHECK-NEXT: [[S3:%.*]] = select i1 [[S2]], i1 [[C3]], i1 false
; CHECK-NEXT: [[S4:%.*]] = select i1 [[S3]], i1 [[D0]], i1 false
; CHECK-NEXT: [[S5:%.*]] = select i1 [[S4]], i1 [[D1]], i1 false
; CHECK-NEXT: [[S6:%.*]] = select i1 [[S5]], i1 [[D2]], i1 false
; CHECK-NEXT: [[S7:%.*]] = select i1 [[S6]], i1 [[D3]], i1 false
; CHECK-NEXT: ret i1 [[S7]]
;		;
%x0 = extractelement <4 x i32> %x, i32 0		%x0 = extractelement <4 x i32> %x, i32 0
%x1 = extractelement <4 x i32> %x, i32 1		%x1 = extractelement <4 x i32> %x, i32 1
%x2 = extractelement <4 x i32> %x, i32 2		%x2 = extractelement <4 x i32> %x, i32 2
%x3 = extractelement <4 x i32> %x, i32 3		%x3 = extractelement <4 x i32> %x, i32 3
%c0 = icmp slt i32 %x0, 42		%c0 = icmp slt i32 %x0, 42
%c1 = icmp slt i32 %x1, 42		%c1 = icmp slt i32 %x1, 42
%c2 = icmp slt i32 %x2, 42		%c2 = icmp slt i32 %x2, 42
[10 lines not shown]	;
%s5 = select i1 %s4, i1 %d1, i1 false		%s5 = select i1 %s4, i1 %d1, i1 false
%s6 = select i1 %s5, i1 %d2, i1 false		%s6 = select i1 %s5, i1 %d2, i1 false
%s7 = select i1 %s6, i1 %d3, i1 false		%s7 = select i1 %s6, i1 %d3, i1 false
ret i1 %s7		ret i1 %s7
}		}

define i1 @logical_and_icmp_clamp_extra_use_select(<4 x i32> %x) {		define i1 @logical_and_icmp_clamp_extra_use_select(<4 x i32> %x) {
; CHECK-LABEL: @logical_and_icmp_clamp_extra_use_select(		; CHECK-LABEL: @logical_and_icmp_clamp_extra_use_select(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 3
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[X]], i32 2
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[X]], i32 1
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i32> [[X]], i32 3		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i32> [[X]], i32 0
; CHECK-NEXT: [[C0:%.*]] = icmp slt i32 [[X0]], 42		; CHECK-NEXT: [[C0:%.*]] = icmp slt i32 [[TMP4]], 42
; CHECK-NEXT: [[C1:%.*]] = icmp slt i32 [[X1]], 42		; CHECK-NEXT: [[C1:%.*]] = icmp slt i32 [[TMP3]], 42
; CHECK-NEXT: [[C2:%.*]] = icmp slt i32 [[X2]], 42		; CHECK-NEXT: [[C2:%.*]] = icmp slt i32 [[TMP2]], 42
; CHECK-NEXT: [[C3:%.*]] = icmp slt i32 [[X3]], 42		; CHECK-NEXT: [[C3:%.*]] = icmp slt i32 [[TMP1]], 42
; CHECK-NEXT: [[D0:%.*]] = icmp sgt i32 [[X0]], 17		; CHECK-NEXT: [[TMP5:%.*]] = icmp sgt <4 x i32> [[X]], <i32 17, i32 17, i32 17, i32 17>
; CHECK-NEXT: [[D1:%.*]] = icmp sgt i32 [[X1]], 17
; CHECK-NEXT: [[D2:%.*]] = icmp sgt i32 [[X2]], 17
; CHECK-NEXT: [[D3:%.*]] = icmp sgt i32 [[X3]], 17
; CHECK-NEXT: [[S1:%.*]] = select i1 [[C0]], i1 [[C1]], i1 false		; CHECK-NEXT: [[S1:%.*]] = select i1 [[C0]], i1 [[C1]], i1 false
; CHECK-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 [[C2]], i1 false		; CHECK-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 [[C2]], i1 false
; CHECK-NEXT: call void @use1(i1 [[S2]])		; CHECK-NEXT: call void @use1(i1 [[S2]])
; CHECK-NEXT: [[S3:%.*]] = select i1 [[S2]], i1 [[C3]], i1 false		; CHECK-NEXT: [[TMP6:%.*]] = freeze <4 x i1> [[TMP5]]
; CHECK-NEXT: [[S4:%.*]] = select i1 [[S3]], i1 [[D0]], i1 false		; CHECK-NEXT: [[TMP7:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP6]])
; CHECK-NEXT: [[S5:%.*]] = select i1 [[S4]], i1 [[D1]], i1 false		; CHECK-NEXT: [[OP_RDX:%.*]] = select i1 [[C3]], i1 [[S2]], i1 false
; CHECK-NEXT: [[S6:%.*]] = select i1 [[S5]], i1 [[D2]], i1 false		; CHECK-NEXT: [[OP_RDX1:%.*]] = select i1 [[TMP7]], i1 [[OP_RDX]], i1 false
; CHECK-NEXT: [[S7:%.*]] = select i1 [[S6]], i1 [[D3]], i1 false		; CHECK-NEXT: ret i1 [[OP_RDX1]]
; CHECK-NEXT: ret i1 [[S7]]
;		;
%x0 = extractelement <4 x i32> %x, i32 0		%x0 = extractelement <4 x i32> %x, i32 0
%x1 = extractelement <4 x i32> %x, i32 1		%x1 = extractelement <4 x i32> %x, i32 1
%x2 = extractelement <4 x i32> %x, i32 2		%x2 = extractelement <4 x i32> %x, i32 2
%x3 = extractelement <4 x i32> %x, i32 3		%x3 = extractelement <4 x i32> %x, i32 3
%c0 = icmp slt i32 %x0, 42		%c0 = icmp slt i32 %x0, 42
%c1 = icmp slt i32 %x1, 42		%c1 = icmp slt i32 %x1, 42
%c2 = icmp slt i32 %x2, 42		%c2 = icmp slt i32 %x2, 42
[9 lines not shown]	;
%s4 = select i1 %s3, i1 %d0, i1 false		%s4 = select i1 %s3, i1 %d0, i1 false
%s5 = select i1 %s4, i1 %d1, i1 false		%s5 = select i1 %s4, i1 %d1, i1 false
%s6 = select i1 %s5, i1 %d2, i1 false		%s6 = select i1 %s5, i1 %d2, i1 false
%s7 = select i1 %s6, i1 %d3, i1 false		%s7 = select i1 %s6, i1 %d3, i1 false
ret i1 %s7		ret i1 %s7
}		}

define i1 @logical_and_icmp_clamp_v8i32(<8 x i32> %x, <8 x i32> %y) {		define i1 @logical_and_icmp_clamp_v8i32(<8 x i32> %x, <8 x i32> %y) {
; CHECK-LABEL: @logical_and_icmp_clamp_v8i32(		; SSE-LABEL: @logical_and_icmp_clamp_v8i32(
; CHECK-NEXT: [[X0:%.*]] = extractelement <8 x i32> [[X:%.*]], i32 0		; SSE-NEXT: [[X0:%.*]] = extractelement <8 x i32> [[X:%.*]], i32 0
; CHECK-NEXT: [[X1:%.*]] = extractelement <8 x i32> [[X]], i32 1		; SSE-NEXT: [[X1:%.*]] = extractelement <8 x i32> [[X]], i32 1
; CHECK-NEXT: [[X2:%.*]] = extractelement <8 x i32> [[X]], i32 2		; SSE-NEXT: [[X2:%.*]] = extractelement <8 x i32> [[X]], i32 2
; CHECK-NEXT: [[X3:%.*]] = extractelement <8 x i32> [[X]], i32 3		; SSE-NEXT: [[X3:%.*]] = extractelement <8 x i32> [[X]], i32 3
; CHECK-NEXT: [[Y0:%.*]] = extractelement <8 x i32> [[Y:%.*]], i32 0		; SSE-NEXT: [[Y0:%.*]] = extractelement <8 x i32> [[Y:%.*]], i32 0
; CHECK-NEXT: [[Y1:%.*]] = extractelement <8 x i32> [[Y]], i32 1		; SSE-NEXT: [[Y1:%.*]] = extractelement <8 x i32> [[Y]], i32 1
; CHECK-NEXT: [[Y2:%.*]] = extractelement <8 x i32> [[Y]], i32 2		; SSE-NEXT: [[Y2:%.*]] = extractelement <8 x i32> [[Y]], i32 2
; CHECK-NEXT: [[Y3:%.*]] = extractelement <8 x i32> [[Y]], i32 3		; SSE-NEXT: [[Y3:%.*]] = extractelement <8 x i32> [[Y]], i32 3
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[X0]], i32 0		; SSE-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[X1]], i32 0
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[X1]], i32 1		; SSE-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[X0]], i32 1
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[X2]], i32 2		; SSE-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[X2]], i32 2
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[X3]], i32 3		; SSE-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[X3]], i32 3
; CHECK-NEXT: [[TMP5:%.*]] = icmp slt <4 x i32> [[TMP4]], <i32 42, i32 42, i32 42, i32 42>		; SSE-NEXT: [[TMP5:%.*]] = icmp slt <4 x i32> [[TMP4]], <i32 42, i32 42, i32 42, i32 42>
; CHECK-NEXT: [[D0:%.*]] = icmp slt i32 [[X0]], [[Y0]]		; SSE-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> poison, i32 [[X0]], i32 0
; CHECK-NEXT: [[D1:%.*]] = icmp slt i32 [[X1]], [[Y1]]		; SSE-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[X1]], i32 1
; CHECK-NEXT: [[D2:%.*]] = icmp slt i32 [[X2]], [[Y2]]		; SSE-NEXT: [[TMP8:%.*]] = insertelement <4 x i32> [[TMP7]], i32 [[X2]], i32 2
; CHECK-NEXT: [[D3:%.*]] = icmp slt i32 [[X3]], [[Y3]]		; SSE-NEXT: [[TMP9:%.*]] = insertelement <4 x i32> [[TMP8]], i32 [[X3]], i32 3
; CHECK-NEXT: [[TMP6:%.*]] = freeze <4 x i1> [[TMP5]]		; SSE-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> poison, i32 [[Y0]], i32 0
; CHECK-NEXT: [[TMP7:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP6]])		; SSE-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> [[TMP10]], i32 [[Y1]], i32 1
; CHECK-NEXT: [[S4:%.*]] = select i1 [[TMP7]], i1 [[D0]], i1 false		; SSE-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[Y2]], i32 2
; CHECK-NEXT: [[S5:%.*]] = select i1 [[S4]], i1 [[D1]], i1 false		; SSE-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[Y3]], i32 3
; CHECK-NEXT: [[S6:%.*]] = select i1 [[S5]], i1 [[D2]], i1 false		; SSE-NEXT: [[TMP14:%.*]] = icmp slt <4 x i32> [[TMP9]], [[TMP13]]
; CHECK-NEXT: [[S7:%.*]] = select i1 [[S6]], i1 [[D3]], i1 false		; SSE-NEXT: [[TMP15:%.*]] = freeze <4 x i1> [[TMP5]]
; CHECK-NEXT: ret i1 [[S7]]		; SSE-NEXT: [[TMP16:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP15]])
		; SSE-NEXT: [[TMP17:%.*]] = freeze <4 x i1> [[TMP14]]
		; SSE-NEXT: [[TMP18:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP17]])
		; SSE-NEXT: [[OP_RDX:%.*]] = select i1 [[TMP16]], i1 [[TMP18]], i1 false
		; SSE-NEXT: ret i1 [[OP_RDX]]
		;
		; AVX512VL-LABEL: @logical_and_icmp_clamp_v8i32(
		; AVX512VL-NEXT: [[X0:%.*]] = extractelement <8 x i32> [[X:%.*]], i32 0
		; AVX512VL-NEXT: [[X1:%.*]] = extractelement <8 x i32> [[X]], i32 1
		; AVX512VL-NEXT: [[X2:%.*]] = extractelement <8 x i32> [[X]], i32 2
		; AVX512VL-NEXT: [[X3:%.*]] = extractelement <8 x i32> [[X]], i32 3
		; AVX512VL-NEXT: [[Y0:%.*]] = extractelement <8 x i32> [[Y:%.*]], i32 0
		; AVX512VL-NEXT: [[Y1:%.*]] = extractelement <8 x i32> [[Y]], i32 1
		; AVX512VL-NEXT: [[Y2:%.*]] = extractelement <8 x i32> [[Y]], i32 2
		; AVX512VL-NEXT: [[Y3:%.*]] = extractelement <8 x i32> [[Y]], i32 3
		; AVX512VL-NEXT: [[C0:%.*]] = icmp slt i32 [[X0]], 42
		; AVX512VL-NEXT: [[C1:%.*]] = icmp slt i32 [[X1]], 42
		; AVX512VL-NEXT: [[C2:%.*]] = icmp slt i32 [[X2]], 42
		; AVX512VL-NEXT: [[C3:%.*]] = icmp slt i32 [[X3]], 42
		; AVX512VL-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[X0]], i32 0
		; AVX512VL-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[X1]], i32 1
		; AVX512VL-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[X2]], i32 2
		; AVX512VL-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[X3]], i32 3
		; AVX512VL-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> poison, i32 [[Y0]], i32 0
		; AVX512VL-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[Y1]], i32 1
		; AVX512VL-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[Y2]], i32 2
		; AVX512VL-NEXT: [[TMP8:%.*]] = insertelement <4 x i32> [[TMP7]], i32 [[Y3]], i32 3
		; AVX512VL-NEXT: [[TMP9:%.*]] = icmp slt <4 x i32> [[TMP4]], [[TMP8]]
		; AVX512VL-NEXT: [[TMP10:%.*]] = freeze <4 x i1> [[TMP9]]
		; AVX512VL-NEXT: [[TMP11:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP10]])
		; AVX512VL-NEXT: [[OP_RDX:%.*]] = select i1 [[C1]], i1 [[C0]], i1 false
		; AVX512VL-NEXT: [[OP_RDX1:%.*]] = select i1 [[C2]], i1 [[C3]], i1 false
		; AVX512VL-NEXT: [[OP_RDX2:%.*]] = select i1 [[OP_RDX]], i1 [[OP_RDX1]], i1 false
		; AVX512VL-NEXT: [[OP_RDX3:%.*]] = select i1 [[TMP11]], i1 [[OP_RDX2]], i1 false
		; AVX512VL-NEXT: ret i1 [[OP_RDX3]]
;		;
%x0 = extractelement <8 x i32> %x, i32 0		%x0 = extractelement <8 x i32> %x, i32 0
%x1 = extractelement <8 x i32> %x, i32 1		%x1 = extractelement <8 x i32> %x, i32 1
%x2 = extractelement <8 x i32> %x, i32 2		%x2 = extractelement <8 x i32> %x, i32 2
%x3 = extractelement <8 x i32> %x, i32 3		%x3 = extractelement <8 x i32> %x, i32 3
%y0 = extractelement <8 x i32> %y, i32 0		%y0 = extractelement <8 x i32> %y, i32 0
%y1 = extractelement <8 x i32> %y, i32 1		%y1 = extractelement <8 x i32> %y, i32 1
%y2 = extractelement <8 x i32> %y, i32 2		%y2 = extractelement <8 x i32> %y, i32 2
%s5 = select i1 %s4, i1 %d1, i1 false		%s5 = select i1 %s4, i1 %d1, i1 false
%s6 = select i1 %s5, i1 %d2, i1 false		%s6 = select i1 %s5, i1 %d2, i1 false
%s7 = select i1 %s6, i1 %d3, i1 false		%s7 = select i1 %s6, i1 %d3, i1 false
ret i1 %s7		ret i1 %s7
}		}

define i1 @logical_and_icmp_clamp_partial(<4 x i32> %x) {		define i1 @logical_and_icmp_clamp_partial(<4 x i32> %x) {
; CHECK-LABEL: @logical_and_icmp_clamp_partial(		; CHECK-LABEL: @logical_and_icmp_clamp_partial(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 2
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[X]], i32 1
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[X]], i32 0
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i32> [[X]], i32 3		; CHECK-NEXT: [[C0:%.*]] = icmp slt i32 [[TMP3]], 42
; CHECK-NEXT: [[C0:%.*]] = icmp slt i32 [[X0]], 42		; CHECK-NEXT: [[C1:%.*]] = icmp slt i32 [[TMP2]], 42
; CHECK-NEXT: [[C1:%.*]] = icmp slt i32 [[X1]], 42		; CHECK-NEXT: [[C2:%.*]] = icmp slt i32 [[TMP1]], 42
; CHECK-NEXT: [[C2:%.*]] = icmp slt i32 [[X2]], 42		; CHECK-NEXT: [[TMP4:%.*]] = icmp sgt <4 x i32> [[X]], <i32 17, i32 17, i32 17, i32 17>
; CHECK-NEXT: [[D0:%.*]] = icmp sgt i32 [[X0]], 17		; CHECK-NEXT: [[TMP5:%.*]] = freeze <4 x i1> [[TMP4]]
; CHECK-NEXT: [[D1:%.*]] = icmp sgt i32 [[X1]], 17		; CHECK-NEXT: [[TMP6:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP5]])
; CHECK-NEXT: [[D2:%.*]] = icmp sgt i32 [[X2]], 17		; CHECK-NEXT: [[OP_RDX:%.*]] = select i1 [[C1]], i1 [[C0]], i1 false
; CHECK-NEXT: [[D3:%.*]] = icmp sgt i32 [[X3]], 17		; CHECK-NEXT: [[OP_RDX1:%.*]] = select i1 [[OP_RDX]], i1 [[C2]], i1 false
; CHECK-NEXT: [[S1:%.*]] = select i1 [[C0]], i1 [[C1]], i1 false		; CHECK-NEXT: [[OP_RDX2:%.*]] = select i1 [[TMP6]], i1 [[OP_RDX1]], i1 false
; CHECK-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 [[C2]], i1 false		; CHECK-NEXT: ret i1 [[OP_RDX2]]
; CHECK-NEXT: [[S4:%.*]] = select i1 [[S2]], i1 [[D0]], i1 false
; CHECK-NEXT: [[S5:%.*]] = select i1 [[S4]], i1 [[D1]], i1 false
; CHECK-NEXT: [[S6:%.*]] = select i1 [[S5]], i1 [[D2]], i1 false
; CHECK-NEXT: [[S7:%.*]] = select i1 [[S6]], i1 [[D3]], i1 false
; CHECK-NEXT: ret i1 [[S7]]
;		;
%x0 = extractelement <4 x i32> %x, i32 0		%x0 = extractelement <4 x i32> %x, i32 0
%x1 = extractelement <4 x i32> %x, i32 1		%x1 = extractelement <4 x i32> %x, i32 1
%x2 = extractelement <4 x i32> %x, i32 2		%x2 = extractelement <4 x i32> %x, i32 2
%x3 = extractelement <4 x i32> %x, i32 3		%x3 = extractelement <4 x i32> %x, i32 3
%c0 = icmp slt i32 %x0, 42		%c0 = icmp slt i32 %x0, 42
%c1 = icmp slt i32 %x1, 42		%c1 = icmp slt i32 %x1, 42
%c2 = icmp slt i32 %x2, 42		%c2 = icmp slt i32 %x2, 42
%s5 = select i1 %s4, i1 %d1, i1 false		%s5 = select i1 %s4, i1 %d1, i1 false
%s6 = select i1 %s5, i1 %d2, i1 false		%s6 = select i1 %s5, i1 %d2, i1 false
%s7 = select i1 %s6, i1 %d3, i1 false		%s7 = select i1 %s6, i1 %d3, i1 false
ret i1 %s7		ret i1 %s7
}		}

define i1 @logical_and_icmp_clamp_pred_diff(<4 x i32> %x) {		define i1 @logical_and_icmp_clamp_pred_diff(<4 x i32> %x) {
; CHECK-LABEL: @logical_and_icmp_clamp_pred_diff(		; CHECK-LABEL: @logical_and_icmp_clamp_pred_diff(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 3
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[X]], i32 2
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[X]], i32 1
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i32> [[X]], i32 3		; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i32> [[X]], i32 0
; CHECK-NEXT: [[C0:%.*]] = icmp slt i32 [[X0]], 42		; CHECK-NEXT: [[C0:%.*]] = icmp slt i32 [[TMP4]], 42
; CHECK-NEXT: [[C1:%.*]] = icmp slt i32 [[X1]], 42		; CHECK-NEXT: [[C1:%.*]] = icmp slt i32 [[TMP3]], 42
; CHECK-NEXT: [[C2:%.*]] = icmp slt i32 [[X2]], 42		; CHECK-NEXT: [[C2:%.*]] = icmp slt i32 [[TMP2]], 42
; CHECK-NEXT: [[C3:%.*]] = icmp ult i32 [[X3]], 42		; CHECK-NEXT: [[C3:%.*]] = icmp ult i32 [[TMP1]], 42
; CHECK-NEXT: [[D0:%.*]] = icmp sgt i32 [[X0]], 17		; CHECK-NEXT: [[TMP5:%.*]] = icmp sgt <4 x i32> [[X]], <i32 17, i32 17, i32 17, i32 17>
; CHECK-NEXT: [[D1:%.*]] = icmp sgt i32 [[X1]], 17		; CHECK-NEXT: [[TMP6:%.*]] = freeze <4 x i1> [[TMP5]]
; CHECK-NEXT: [[D2:%.*]] = icmp sgt i32 [[X2]], 17		; CHECK-NEXT: [[TMP7:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP6]])
; CHECK-NEXT: [[D3:%.*]] = icmp sgt i32 [[X3]], 17		; CHECK-NEXT: [[OP_RDX:%.*]] = select i1 [[C1]], i1 [[C0]], i1 false
; CHECK-NEXT: [[S1:%.*]] = select i1 [[C0]], i1 [[C1]], i1 false		; CHECK-NEXT: [[OP_RDX1:%.*]] = select i1 [[C2]], i1 [[C3]], i1 false
; CHECK-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 [[C2]], i1 false		; CHECK-NEXT: [[OP_RDX2:%.*]] = select i1 [[OP_RDX]], i1 [[OP_RDX1]], i1 false
; CHECK-NEXT: [[S3:%.*]] = select i1 [[S2]], i1 [[C3]], i1 false		; CHECK-NEXT: [[OP_RDX3:%.*]] = select i1 [[TMP7]], i1 [[OP_RDX2]], i1 false
; CHECK-NEXT: [[S4:%.*]] = select i1 [[S3]], i1 [[D0]], i1 false		; CHECK-NEXT: ret i1 [[OP_RDX3]]
; CHECK-NEXT: [[S5:%.*]] = select i1 [[S4]], i1 [[D1]], i1 false
; CHECK-NEXT: [[S6:%.*]] = select i1 [[S5]], i1 [[D2]], i1 false
; CHECK-NEXT: [[S7:%.*]] = select i1 [[S6]], i1 [[D3]], i1 false
; CHECK-NEXT: ret i1 [[S7]]
;		;
%x0 = extractelement <4 x i32> %x, i32 0		%x0 = extractelement <4 x i32> %x, i32 0
%x1 = extractelement <4 x i32> %x, i32 1		%x1 = extractelement <4 x i32> %x, i32 1
%x2 = extractelement <4 x i32> %x, i32 2		%x2 = extractelement <4 x i32> %x, i32 2
%x3 = extractelement <4 x i32> %x, i32 3		%x3 = extractelement <4 x i32> %x, i32 3
%c0 = icmp slt i32 %x0, 42		%c0 = icmp slt i32 %x0, 42
%c1 = icmp slt i32 %x1, 42		%c1 = icmp slt i32 %x1, 42
%c2 = icmp slt i32 %x2, 42		%c2 = icmp slt i32 %x2, 42
}		}

define i1 @logical_and_icmp_extra_op(<4 x i32> %x, <4 x i32> %y, i1 %c) {		define i1 @logical_and_icmp_extra_op(<4 x i32> %x, <4 x i32> %y, i1 %c) {
; CHECK-LABEL: @logical_and_icmp_extra_op(		; CHECK-LABEL: @logical_and_icmp_extra_op(
; CHECK-NEXT: [[TMP1:%.*]] = icmp slt <4 x i32> [[X:%.*]], [[Y:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = icmp slt <4 x i32> [[X:%.*]], [[Y:%.*]]
; CHECK-NEXT: [[S3:%.*]] = select i1 [[C:%.*]], i1 [[C]], i1 false		; CHECK-NEXT: [[S3:%.*]] = select i1 [[C:%.*]], i1 [[C]], i1 false
; CHECK-NEXT: [[TMP2:%.*]] = freeze <4 x i1> [[TMP1]]		; CHECK-NEXT: [[TMP2:%.*]] = freeze <4 x i1> [[TMP1]]
; CHECK-NEXT: [[TMP3:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP2]])		; CHECK-NEXT: [[TMP3:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP2]])
; CHECK-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP3]], i1 [[S3]], i1 false		; CHECK-NEXT: [[OP_RDX:%.*]] = select i1 [[TMP3]], i1 [[S3]], i1 false
; CHECK-NEXT: ret i1 [[OP_EXTRA]]		; CHECK-NEXT: ret i1 [[OP_RDX]]
;		;
%x0 = extractelement <4 x i32> %x, i32 0		%x0 = extractelement <4 x i32> %x, i32 0
%x1 = extractelement <4 x i32> %x, i32 1		%x1 = extractelement <4 x i32> %x, i32 1
%x2 = extractelement <4 x i32> %x, i32 2		%x2 = extractelement <4 x i32> %x, i32 2
%x3 = extractelement <4 x i32> %x, i32 3		%x3 = extractelement <4 x i32> %x, i32 3
%y0 = extractelement <4 x i32> %y, i32 0		%y0 = extractelement <4 x i32> %y, i32 0
%y1 = extractelement <4 x i32> %y, i32 1		%y1 = extractelement <4 x i32> %y, i32 1
%y2 = extractelement <4 x i32> %y, i32 2		%y2 = extractelement <4 x i32> %y, i32 2
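
A note for readers of the `logical_and_icmp_clamp_partial` hunk above: the new CHECK lines regroup the select chain into one 4-wide AND reduction over the `sgt 17` compares, combined with the three leftover `slt 42` compares. The C++ sketch below is only an illustrative model of that regrouping on plain `bool` values. The function names are hypothetical, and the model deliberately ignores poison, which is what the `freeze` in the IR is there to handle.

```cpp
#include <array>
#include <cstddef>

// Scalar chain as in the test input:
// c0 && c1 && c2 && d0 && d1 && d2 && d3,
// where c_i = (x[i] < 42) and d_i = (x[i] > 17).
constexpr bool clampPartialScalar(const std::array<int, 4> &x) {
  bool r = true;
  for (std::size_t i = 0; i < 3; ++i)
    r = r && (x[i] < 42);
  for (std::size_t i = 0; i < 4; ++i)
    r = r && (x[i] > 17);
  return r;
}

// Regrouped form mirroring the new CHECK lines (poison is NOT modeled).
constexpr bool clampPartialRegrouped(const std::array<int, 4> &x) {
  bool vecRdx = true; // stands in for llvm.vector.reduce.and.v4i1
  for (std::size_t i = 0; i < 4; ++i)
    vecRdx = vecRdx && (x[i] > 17);
  bool opRdx = (x[1] < 42) && (x[0] < 42); // OP_RDX
  bool opRdx1 = opRdx && (x[2] < 42);      // OP_RDX1
  return vecRdx && opRdx1;                 // OP_RDX2
}

// Both forms agree on a passing and a failing input.
static_assert(clampPartialScalar(std::array<int, 4>{20, 30, 41, 100}) ==
                  clampPartialRegrouped(std::array<int, 4>{20, 30, 41, 100}),
              "regrouping keeps the result");
static_assert(clampPartialScalar(std::array<int, 4>{20, 30, 42, 100}) ==
                  clampPartialRegrouped(std::array<int, 4>{20, 30, 42, 100}),
              "regrouping keeps the result");
```

On defined (non-poison) inputs the two forms compute the same value because logical AND is associative and commutative; the IR needs `freeze` because reordering select-based ANDs is otherwise not poison-safe.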

llvm/test/Transforms/SLPVectorizer/X86/reduction_loads.ll

	; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 2			; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 2
	; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 3			; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 3
	; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 4			; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 4
	; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 5			; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 5
	; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 6			; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 6
	; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 7			; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 7
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[SUM:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[SUM:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[OP_RDX:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = mul <8 x i32> [[TMP1]], <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42>			; CHECK-NEXT: [[TMP2:%.*]] = mul <8 x i32> [[TMP1]], <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42>
	; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP2]])			; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP2]])
	; CHECK-NEXT: [[OP_EXTRA]] = add i32 [[TMP3]], [[SUM]]			; CHECK-NEXT: [[OP_RDX]] = add i32 [[TMP3]], [[SUM]]
	; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[FOR_BODY]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret i32 [[OP_EXTRA]]			; CHECK-NEXT: ret i32 [[OP_RDX]]
	;			;
	entry:			entry:
	%arrayidx.1 = getelementptr inbounds i32, i32* %p, i64 1			%arrayidx.1 = getelementptr inbounds i32, i32* %p, i64 1
	%arrayidx.2 = getelementptr inbounds i32, i32* %p, i64 2			%arrayidx.2 = getelementptr inbounds i32, i32* %p, i64 2
	%arrayidx.3 = getelementptr inbounds i32, i32* %p, i64 3			%arrayidx.3 = getelementptr inbounds i32, i32* %p, i64 3
	%arrayidx.4 = getelementptr inbounds i32, i32* %p, i64 4			%arrayidx.4 = getelementptr inbounds i32, i32* %p, i64 4
	%arrayidx.5 = getelementptr inbounds i32, i32* %p, i64 5			%arrayidx.5 = getelementptr inbounds i32, i32* %p, i64 5
	%arrayidx.6 = getelementptr inbounds i32, i32* %p, i64 6			%arrayidx.6 = getelementptr inbounds i32, i32* %p, i64 6
	; CHECK-NEXT: [[ARRAYIDX_Q_2:%.*]] = getelementptr inbounds i32, i32* [[Q]], i64 2			; CHECK-NEXT: [[ARRAYIDX_Q_2:%.*]] = getelementptr inbounds i32, i32* [[Q]], i64 2
	; CHECK-NEXT: [[ARRAYIDX_Q_3:%.*]] = getelementptr inbounds i32, i32* [[Q]], i64 3			; CHECK-NEXT: [[ARRAYIDX_Q_3:%.*]] = getelementptr inbounds i32, i32* [[Q]], i64 3
	; CHECK-NEXT: [[ARRAYIDX_Q_4:%.*]] = getelementptr inbounds i32, i32* [[Q]], i64 4			; CHECK-NEXT: [[ARRAYIDX_Q_4:%.*]] = getelementptr inbounds i32, i32* [[Q]], i64 4
	; CHECK-NEXT: [[ARRAYIDX_Q_5:%.*]] = getelementptr inbounds i32, i32* [[Q]], i64 5			; CHECK-NEXT: [[ARRAYIDX_Q_5:%.*]] = getelementptr inbounds i32, i32* [[Q]], i64 5
	; CHECK-NEXT: [[ARRAYIDX_Q_6:%.*]] = getelementptr inbounds i32, i32* [[Q]], i64 6			; CHECK-NEXT: [[ARRAYIDX_Q_6:%.*]] = getelementptr inbounds i32, i32* [[Q]], i64 6
	; CHECK-NEXT: [[ARRAYIDX_Q_7:%.*]] = getelementptr inbounds i32, i32* [[Q]], i64 7			; CHECK-NEXT: [[ARRAYIDX_Q_7:%.*]] = getelementptr inbounds i32, i32* [[Q]], i64 7
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[SUM:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[SUM:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[OP_RDX:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[Q]] to <8 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[Q]] to <8 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = mul <8 x i32> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = mul <8 x i32> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP4]])			; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP4]])
	; CHECK-NEXT: [[OP_EXTRA]] = add i32 [[TMP5]], [[SUM]]			; CHECK-NEXT: [[OP_RDX]] = add i32 [[TMP5]], [[SUM]]
	; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[FOR_BODY]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret i32 [[OP_EXTRA]]			; CHECK-NEXT: ret i32 [[OP_RDX]]
	;			;
	entry:			entry:
	%arrayidx.p.1 = getelementptr inbounds i32, i32* %p, i64 1			%arrayidx.p.1 = getelementptr inbounds i32, i32* %p, i64 1
	%arrayidx.p.2 = getelementptr inbounds i32, i32* %p, i64 2			%arrayidx.p.2 = getelementptr inbounds i32, i32* %p, i64 2
	%arrayidx.p.3 = getelementptr inbounds i32, i32* %p, i64 3			%arrayidx.p.3 = getelementptr inbounds i32, i32* %p, i64 3
	%arrayidx.p.4 = getelementptr inbounds i32, i32* %p, i64 4			%arrayidx.p.4 = getelementptr inbounds i32, i32* %p, i64 4
	%arrayidx.p.5 = getelementptr inbounds i32, i32* %p, i64 5			%arrayidx.p.5 = getelementptr inbounds i32, i32* %p, i64 5
	%arrayidx.p.6 = getelementptr inbounds i32, i32* %p, i64 6			%arrayidx.p.6 = getelementptr inbounds i32, i32* %p, i64 6
	; CHECK-NEXT: [[ARRAYIDX_Q_2:%.*]] = getelementptr inbounds i32, i32* [[Q]], i64 2			; CHECK-NEXT: [[ARRAYIDX_Q_2:%.*]] = getelementptr inbounds i32, i32* [[Q]], i64 2
	; CHECK-NEXT: [[ARRAYIDX_Q_3:%.*]] = getelementptr inbounds i32, i32* [[Q]], i64 3			; CHECK-NEXT: [[ARRAYIDX_Q_3:%.*]] = getelementptr inbounds i32, i32* [[Q]], i64 3
	; CHECK-NEXT: [[ARRAYIDX_Q_4:%.*]] = getelementptr inbounds i32, i32* [[Q]], i64 4			; CHECK-NEXT: [[ARRAYIDX_Q_4:%.*]] = getelementptr inbounds i32, i32* [[Q]], i64 4
	; CHECK-NEXT: [[ARRAYIDX_Q_5:%.*]] = getelementptr inbounds i32, i32* [[Q]], i64 5			; CHECK-NEXT: [[ARRAYIDX_Q_5:%.*]] = getelementptr inbounds i32, i32* [[Q]], i64 5
	; CHECK-NEXT: [[ARRAYIDX_Q_6:%.*]] = getelementptr inbounds i32, i32* [[Q]], i64 6			; CHECK-NEXT: [[ARRAYIDX_Q_6:%.*]] = getelementptr inbounds i32, i32* [[Q]], i64 6
	; CHECK-NEXT: [[ARRAYIDX_Q_7:%.*]] = getelementptr inbounds i32, i32* [[Q]], i64 7			; CHECK-NEXT: [[ARRAYIDX_Q_7:%.*]] = getelementptr inbounds i32, i32* [[Q]], i64 7
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[SUM:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[SUM:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[OP_RDX:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[P]] to <8 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[Q]] to <8 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[Q]] to <8 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* [[TMP2]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <8 x i32>, <8 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = mul <8 x i32> [[SHUFFLE]], [[TMP3]]			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP3]], <8 x i32> poison, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
				; CHECK-NEXT: [[TMP4:%.*]] = mul <8 x i32> [[TMP1]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP4]])			; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP4]])
	; CHECK-NEXT: [[OP_EXTRA]] = add i32 [[TMP5]], [[SUM]]			; CHECK-NEXT: [[OP_RDX]] = add i32 [[TMP5]], [[SUM]]
	; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[FOR_BODY]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret i32 [[OP_EXTRA]]			; CHECK-NEXT: ret i32 [[OP_RDX]]
	;			;
	entry:			entry:
	%arrayidx.p.1 = getelementptr inbounds i32, i32* %p, i64 1			%arrayidx.p.1 = getelementptr inbounds i32, i32* %p, i64 1
	%arrayidx.p.2 = getelementptr inbounds i32, i32* %p, i64 2			%arrayidx.p.2 = getelementptr inbounds i32, i32* %p, i64 2
	%arrayidx.p.3 = getelementptr inbounds i32, i32* %p, i64 3			%arrayidx.p.3 = getelementptr inbounds i32, i32* %p, i64 3
	%arrayidx.p.4 = getelementptr inbounds i32, i32* %p, i64 4			%arrayidx.p.4 = getelementptr inbounds i32, i32* %p, i64 4
	%arrayidx.p.5 = getelementptr inbounds i32, i32* %p, i64 5			%arrayidx.p.5 = getelementptr inbounds i32, i32* %p, i64 5
	%arrayidx.p.6 = getelementptr inbounds i32, i32* %p, i64 6			%arrayidx.p.6 = getelementptr inbounds i32, i32* %p, i64 6
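
In the last `reduction_loads.ll` hunk above, the reversing `shufflevector` moves from the `[[P]]` load to the `[[Q]]` load, yet the reduced sum is unchanged. The C++ sketch below illustrates why, assuming (from the `<i32 7, ..., i32 0>` mask) that the test pairs one array in reverse order with the other. The function names are hypothetical and are not part of the patch.

```cpp
#include <array>
#include <cstddef>

using Vec8 = std::array<int, 8>;

// Old CHECK lines: reverse the p vector, then multiply by q and reduce.
constexpr int dotReverseLhs(const Vec8 &p, const Vec8 &q) {
  int sum = 0;
  for (std::size_t i = 0; i < 8; ++i)
    sum += p[7 - i] * q[i];
  return sum;
}

// New CHECK lines: keep p as loaded, reverse the q vector instead.
constexpr int dotReverseRhs(const Vec8 &p, const Vec8 &q) {
  int sum = 0;
  for (std::size_t i = 0; i < 8; ++i)
    sum += p[i] * q[7 - i];
  return sum;
}

// Same set of products, only visited in a different order.
static_assert(dotReverseLhs(Vec8{1, 2, 3, 4, 5, 6, 7, 8},
                            Vec8{8, 7, 6, 5, 4, 3, 2, 1}) ==
                  dotReverseRhs(Vec8{1, 2, 3, 4, 5, 6, 7, 8},
                                Vec8{8, 7, 6, 5, 4, 3, 2, 1}),
              "reversing either operand gives the same sum");
```

Both loops accumulate the same eight products `p[j] * q[7 - j]`, so which operand carries the shuffle is purely a matter of where the vectorizer chose to place the reordering.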

llvm/test/Transforms/SLPVectorizer/X86/reduction_unrolled.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -slp-vectorize-hor -S -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 -debug < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,AVX			; RUN: opt -slp-vectorizer -slp-vectorize-hor -S -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 -debug < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,AVX
	; RUN: opt -slp-vectorizer -slp-vectorize-hor -S -mtriple=x86_64-unknown-linux-gnu -mcpu=core2 -debug < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,SSE			; RUN: opt -slp-vectorizer -slp-vectorize-hor -S -mtriple=x86_64-unknown-linux-gnu -mcpu=core2 -debug < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,SSE
	; REQUIRES: asserts			; REQUIRES: asserts

	; int test_add(unsigned int *p) {			; int test_add(unsigned int *p) {
	; int result = 0;			; int result = 0;
	; for (int i = 0; i < 8; i++)			; for (int i = 0; i < 8; i++)
	; result += p[i];			; result += p[i];
	; return result;			; return result;
	; }			; }

	; Vector cost is 5, Scalar cost is 7			; Vector cost is 5, Scalar cost is 7
	; AVX: Adding cost -2 for reduction that starts with %7 = load i32, i32* %arrayidx.7, align 4 (It is a splitting reduction)			; AVX: Adding cost -2 for reduction that starts with %0 = load i32, i32* %p, align 4 (It is a splitting reduction)
	; Vector cost is 6, Scalar cost is 7			; Vector cost is 6, Scalar cost is 7
	; SSE: Adding cost -1 for reduction that starts with %7 = load i32, i32* %arrayidx.7, align 4 (It is a splitting reduction)			; SSE: Adding cost -1 for reduction that starts with %0 = load i32, i32* %p, align 4 (It is a splitting reduction)
	define i32 @test_add(i32* nocapture readonly %p) {			define i32 @test_add(i32* nocapture readonly %p) {
	; CHECK-LABEL: @test_add(			; CHECK-LABEL: @test_add(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i32, i32* [[P:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i32, i32* [[P:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 2			; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 2
	; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 3			; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 3
	; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 4			; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 4
	; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 5			; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, i32* [[P]], i64 5
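
The `-debug` lines checked at the top of `reduction_unrolled.ll` report a reduction cost that matches the difference of the two quoted costs: 5 - 7 = -2 for the `bdver2` run and 6 - 7 = -1 for the `core2` run. The C++ sketch below spells out that arithmetic. The helper names are hypothetical, and the profitability rule with a threshold of 0 is an assumption about how the vectorizer consumes the number, not something shown in this diff.

```cpp
// Cost delta as printed in the debug output: vector cost minus scalar cost.
constexpr int reductionCostDelta(int vectorCost, int scalarCost) {
  return vectorCost - scalarCost;
}

// Hypothetical helper: treat the reduction as profitable when the delta is
// below the negated threshold (threshold assumed to be 0 here).
constexpr bool looksProfitable(int costDelta, int threshold = 0) {
  return costDelta < -threshold;
}

static_assert(reductionCostDelta(5, 7) == -2, "AVX (bdver2) check line");
static_assert(reductionCostDelta(6, 7) == -1, "SSE (core2) check line");
static_assert(looksProfitable(reductionCostDelta(6, 7)),
              "a negative delta favours the vector form");
```

Under that assumption both runs vectorize `test_add`, with the `bdver2` target seeing a slightly larger gain.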

llvm/test/Transforms/SLPVectorizer/X86/reorder_repeated_ops.ll

	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[T:%.*]] = select i1 undef, i16 undef, i16 15			; CHECK-NEXT: [[T:%.*]] = select i1 undef, i16 undef, i16 15
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i16> <i16 poison, i16 undef>, i16 [[T]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i16> <i16 poison, i16 undef>, i16 [[T]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = sext <2 x i16> [[TMP0]] to <2 x i32>			; CHECK-NEXT: [[TMP1:%.*]] = sext <2 x i16> [[TMP0]] to <2 x i32>
	; CHECK-NEXT: [[TMP2:%.*]] = sub nsw <2 x i32> <i32 undef, i32 63>, [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = sub nsw <2 x i32> <i32 undef, i32 63>, [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = sub <2 x i32> [[TMP2]], undef			; CHECK-NEXT: [[TMP3:%.*]] = sub <2 x i32> [[TMP2]], undef
	; CHECK-NEXT: [[SHUFFLE10:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> poison, <4 x i32> <i32 1, i32 0, i32 0, i32 0>			; CHECK-NEXT: [[SHUFFLE4:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 0>
	; CHECK-NEXT: [[TMP4:%.*]] = add <4 x i32> [[SHUFFLE10]], <i32 undef, i32 15, i32 31, i32 47>			; CHECK-NEXT: [[TMP4:%.*]] = add <4 x i32> [[SHUFFLE4]], <i32 15, i32 undef, i32 31, i32 47>
	; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP4]])			; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[TMP4]])
	; CHECK-NEXT: [[T19:%.*]] = select i1 undef, i32 [[TMP5]], i32 undef			; CHECK-NEXT: [[T19:%.*]] = select i1 undef, i32 [[TMP5]], i32 undef
	; CHECK-NEXT: [[T20:%.*]] = icmp sgt i32 [[T19]], 63			; CHECK-NEXT: [[T20:%.*]] = icmp sgt i32 [[T19]], 63
	; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <2 x i32> undef, [[TMP1]]			; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <2 x i32> undef, [[TMP1]]
	; CHECK-NEXT: [[TMP7:%.*]] = sub <2 x i32> [[TMP6]], undef			; CHECK-NEXT: [[TMP7:%.*]] = sub <2 x i32> [[TMP6]], undef
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
	; CHECK-NEXT: [[TMP8:%.*]] = add nsw <4 x i32> [[SHUFFLE]], <i32 -49, i32 -33, i32 -33, i32 -17>			; CHECK-NEXT: [[TMP8:%.*]] = add nsw <4 x i32> [[SHUFFLE]], <i32 -49, i32 -33, i32 -33, i32 -17>
	; CHECK-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.smin.v4i32(<4 x i32> [[TMP8]])			; CHECK-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.smin.v4i32(<4 x i32> undef)
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = icmp slt i32 [[TMP9]], undef			; CHECK-NEXT: [[TMP10:%.*]] = call i32 @llvm.vector.reduce.smin.v4i32(<4 x i32> [[TMP8]])
	; CHECK-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP9]], i32 undef			; CHECK-NEXT: [[OP_RDX:%.*]] = icmp slt i32 [[TMP9]], [[TMP10]]
	; CHECK-NEXT: [[OP_EXTRA2:%.*]] = icmp slt i32 [[OP_EXTRA1]], undef			; CHECK-NEXT: [[OP_RDX1:%.*]] = select i1 [[OP_RDX]], i32 [[TMP9]], i32 [[TMP10]]
	; CHECK-NEXT: [[OP_EXTRA3:%.*]] = select i1 [[OP_EXTRA2]], i32 [[OP_EXTRA1]], i32 undef			; CHECK-NEXT: [[OP_RDX2:%.*]] = icmp slt i32 [[OP_RDX1]], undef
	; CHECK-NEXT: [[OP_EXTRA4:%.*]] = icmp slt i32 [[OP_EXTRA3]], undef			; CHECK-NEXT: [[OP_RDX3:%.*]] = select i1 [[OP_RDX2]], i32 [[OP_RDX1]], i32 undef
	; CHECK-NEXT: [[OP_EXTRA5:%.*]] = select i1 [[OP_EXTRA4]], i32 [[OP_EXTRA3]], i32 undef			; CHECK-NEXT: [[T45:%.*]] = icmp sgt i32 undef, [[OP_RDX3]]
	; CHECK-NEXT: [[OP_EXTRA6:%.*]] = icmp slt i32 [[OP_EXTRA5]], undef
	; CHECK-NEXT: [[OP_EXTRA7:%.*]] = select i1 [[OP_EXTRA6]], i32 [[OP_EXTRA5]], i32 undef
	; CHECK-NEXT: [[OP_EXTRA8:%.*]] = icmp slt i32 [[OP_EXTRA7]], undef
	; CHECK-NEXT: [[OP_EXTRA9:%.*]] = select i1 [[OP_EXTRA8]], i32 [[OP_EXTRA7]], i32 undef
	; CHECK-NEXT: [[T45:%.*]] = icmp sgt i32 undef, [[OP_EXTRA9]]
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	;			;
	bb:			bb:
	br i1 undef, label %bb1, label %bb2			br i1 undef, label %bb1, label %bb2

	bb1: ; preds = %bb			bb1: ; preds = %bb
	ret void			ret void


llvm/test/Transforms/SLPVectorizer/X86/revectorized_rdx_crash.ll

	; CHECK: for.cond.preheader:			; CHECK: for.cond.preheader:
	; CHECK-NEXT: [[I:%.*]] = getelementptr inbounds [100 x i32], [100 x i32]* undef, i64 0, i64 2			; CHECK-NEXT: [[I:%.*]] = getelementptr inbounds [100 x i32], [100 x i32]* undef, i64 0, i64 2
	; CHECK-NEXT: [[I1:%.*]] = getelementptr inbounds [100 x i32], [100 x i32]* undef, i64 0, i64 3			; CHECK-NEXT: [[I1:%.*]] = getelementptr inbounds [100 x i32], [100 x i32]* undef, i64 0, i64 3
	; CHECK-NEXT: [[I2:%.*]] = getelementptr inbounds [100 x i32], [100 x i32]* undef, i64 0, i64 4			; CHECK-NEXT: [[I2:%.*]] = getelementptr inbounds [100 x i32], [100 x i32]* undef, i64 0, i64 4
	; CHECK-NEXT: [[I3:%.*]] = getelementptr inbounds [100 x i32], [100 x i32]* undef, i64 0, i64 5			; CHECK-NEXT: [[I3:%.*]] = getelementptr inbounds [100 x i32], [100 x i32]* undef, i64 0, i64 5
	; CHECK-NEXT: [[I4:%.*]] = getelementptr inbounds [100 x i32], [100 x i32]* undef, i64 0, i64 6			; CHECK-NEXT: [[I4:%.*]] = getelementptr inbounds [100 x i32], [100 x i32]* undef, i64 0, i64 6
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[I]] to <4 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[I]] to <4 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 8
	; CHECK-NEXT: [[I5:%.*]] = add i32 undef, undef
	; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP1]])			; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP1]])
	; CHECK-NEXT: [[OP_EXTRA2:%.*]] = add i32 [[TMP2]], [[I5]]			; CHECK-NEXT: [[OP_RDX6:%.*]] = add i32 [[TMP2]], undef
	; CHECK-NEXT: [[I10:%.*]] = add i32 [[OP_EXTRA2]], undef
	; CHECK-NEXT: [[I11:%.*]] = add i32 [[OP_EXTRA2]], [[I10]]
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[I1]] to <4 x i32>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[I1]] to <4 x i32>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* [[TMP3]], align 4			; CHECK-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* [[TMP3]], align 4
	; CHECK-NEXT: [[I12:%.*]] = add i32 undef, undef
	; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP4]])			; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP4]])
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = add i32 [[TMP5]], [[I12]]			; CHECK-NEXT: [[OP_RDX5:%.*]] = add i32 [[TMP5]], undef
	; CHECK-NEXT: [[OP_EXTRA1:%.*]] = add i32 [[OP_EXTRA]], undef			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> undef)
	; CHECK-NEXT: [[I18:%.*]] = add i32 [[OP_EXTRA1]], [[I11]]			; CHECK-NEXT: [[OP_RDX:%.*]] = add i32 undef, [[OP_RDX6]]
	; CHECK-NEXT: [[I19:%.*]] = add i32 [[OP_EXTRA1]], [[I18]]			; CHECK-NEXT: [[OP_RDX1:%.*]] = add i32 [[OP_RDX6]], [[OP_RDX5]]
	; CHECK-NEXT: [[I20:%.*]] = add i32 undef, [[I19]]			; CHECK-NEXT: [[OP_RDX2:%.*]] = add i32 [[OP_RDX]], [[OP_RDX1]]
	; CHECK-NEXT: [[I21:%.*]] = add i32 undef, [[I20]]			; CHECK-NEXT: [[OP_RDX3:%.*]] = add i32 [[OP_RDX2]], [[OP_RDX5]]
	; CHECK-NEXT: [[I22:%.*]] = add i32 undef, [[I21]]			; CHECK-NEXT: [[OP_RDX4:%.*]] = add i32 [[TMP6]], [[OP_RDX3]]
	; CHECK-NEXT: [[I23:%.*]] = add i32 undef, [[I22]]
	; CHECK-NEXT: br label [[IF_END]]			; CHECK-NEXT: br label [[IF_END]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[R:%.*]] = phi i32 [ [[I23]], [[FOR_COND_PREHEADER]] ], [ undef, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[R:%.*]] = phi i32 [ [[OP_RDX4]], [[FOR_COND_PREHEADER]] ], [ undef, [[ENTRY:%.*]] ]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br i1 undef, label %if.end, label %for.cond.preheader			br i1 undef, label %if.end, label %for.cond.preheader

	for.cond.preheader: ; preds = %entry			for.cond.preheader: ; preds = %entry
	%i = getelementptr inbounds [100 x i32], [100 x i32]* undef, i64 0, i64 2			%i = getelementptr inbounds [100 x i32], [100 x i32]* undef, i64 0, i64 2
	%i1 = getelementptr inbounds [100 x i32], [100 x i32]* undef, i64 0, i64 3			%i1 = getelementptr inbounds [100 x i32], [100 x i32]* undef, i64 0, i64 3
	Show All 36 Lines

llvm/test/Transforms/SLPVectorizer/X86/undef_vect.ll

	Show All 11 Lines
	; CHECK-NEXT: [[DOTSROA_RAW_IDX_5:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 5, i32 1			; CHECK-NEXT: [[DOTSROA_RAW_IDX_5:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 5, i32 1
	; CHECK-NEXT: [[DOTSROA_CAST_6:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 6, i32 0			; CHECK-NEXT: [[DOTSROA_CAST_6:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 6, i32 0
	; CHECK-NEXT: [[DOTSROA_RAW_IDX_6:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 6, i32 1			; CHECK-NEXT: [[DOTSROA_RAW_IDX_6:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 6, i32 1
	; CHECK-NEXT: [[DOTSROA_CAST_7:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 7, i32 0			; CHECK-NEXT: [[DOTSROA_CAST_7:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 7, i32 0
	; CHECK-NEXT: [[DOTSROA_RAW_IDX_7:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 7, i32 1			; CHECK-NEXT: [[DOTSROA_RAW_IDX_7:%.*]] = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 7, i32 1
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[DOTSROA_CAST_4]] to <8 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[DOTSROA_CAST_4]] to <8 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.smax.v8i32(<8 x i32> [[TMP1]])			; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.smax.v8i32(<8 x i32> [[TMP1]])
	; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt i32 [[TMP2]], undef			; CHECK-NEXT: [[OP_RDX:%.*]] = icmp sgt i32 [[TMP2]], undef
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = select i1 [[TMP3]], i32 [[TMP2]], i32 undef			; CHECK-NEXT: [[OP_RDX1:%.*]] = select i1 [[OP_RDX]], i32 [[TMP2]], i32 undef
	; CHECK-NEXT: [[TMP4:%.*]] = icmp sgt i32 [[OP_EXTRA]], undef			; CHECK-NEXT: [[DOTSROA_SPECULATED_9:%.*]] = select i1 undef, i32 undef, i32 [[OP_RDX1]]
	; CHECK-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[TMP4]], i32 [[OP_EXTRA]], i32 undef
	; CHECK-NEXT: [[DOTSROA_SPECULATED_9:%.*]] = select i1 undef, i32 undef, i32 [[OP_EXTRA1]]
	; CHECK-NEXT: [[CMP_I1_10:%.*]] = icmp slt i32 [[DOTSROA_SPECULATED_9]], undef			; CHECK-NEXT: [[CMP_I1_10:%.*]] = icmp slt i32 [[DOTSROA_SPECULATED_9]], undef
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	for.body.lr.ph:			for.body.lr.ph:
	%.sroa_cast.4 = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 4, i32 0			%.sroa_cast.4 = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 4, i32 0
	%retval.sroa.0.0.copyload.i5.4 = load i32, i32* %.sroa_cast.4, align 4			%retval.sroa.0.0.copyload.i5.4 = load i32, i32* %.sroa_cast.4, align 4
	%.sroa_raw_idx.4 = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 4, i32 1			%.sroa_raw_idx.4 = getelementptr inbounds %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76", %"struct.std::h.0.4.8.12.16.20.24.28.248.0.1.2.3.76"* undef, i64 4, i32 1
	%retval.sroa.0.0.copyload.i7.4 = load i32, i32* %.sroa_raw_idx.4, align 4			%retval.sroa.0.0.copyload.i7.4 = load i32, i32* %.sroa_raw_idx.4, align 4
	Show All 35 Lines

llvm/test/Transforms/SLPVectorizer/X86/used-reduced-op.ll

	Show All 44 Lines
	; CHECK-NEXT: [[TMP30:%.*]] = add i32 [[TMP29]], -183			; CHECK-NEXT: [[TMP30:%.*]] = add i32 [[TMP29]], -183
	; CHECK-NEXT: [[TMP31:%.*]] = insertelement <4 x i32> poison, i32 [[TMP30]], i32 0			; CHECK-NEXT: [[TMP31:%.*]] = insertelement <4 x i32> poison, i32 [[TMP30]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP31]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP31]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP32:%.*]] = sub <4 x i32> [[SHUFFLE]], [[TMP0]]			; CHECK-NEXT: [[TMP32:%.*]] = sub <4 x i32> [[SHUFFLE]], [[TMP0]]
	; CHECK-NEXT: [[TMP33:%.*]] = icmp slt <4 x i32> [[TMP32]], zeroinitializer			; CHECK-NEXT: [[TMP33:%.*]] = icmp slt <4 x i32> [[TMP32]], zeroinitializer
	; CHECK-NEXT: [[TMP34:%.*]] = sub nsw <4 x i32> zeroinitializer, [[TMP32]]			; CHECK-NEXT: [[TMP34:%.*]] = sub nsw <4 x i32> zeroinitializer, [[TMP32]]
	; CHECK-NEXT: [[TMP35:%.*]] = select <4 x i1> [[TMP33]], <4 x i32> [[TMP34]], <4 x i32> [[TMP32]]			; CHECK-NEXT: [[TMP35:%.*]] = select <4 x i1> [[TMP33]], <4 x i32> [[TMP34]], <4 x i32> [[TMP32]]
	; CHECK-NEXT: [[TMP36:%.*]] = call i32 @llvm.vector.reduce.smin.v4i32(<4 x i32> [[TMP35]])			; CHECK-NEXT: [[TMP36:%.*]] = call i32 @llvm.vector.reduce.smin.v4i32(<4 x i32> [[TMP35]])
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = icmp slt i32 [[TMP36]], [[B_0]]			; CHECK-NEXT: [[OP_RDX:%.*]] = icmp slt i32 [[TMP36]], [[B_0]]
	; CHECK-NEXT: [[OP_EXTRA1:%.*]] = select i1 [[OP_EXTRA]], i32 [[TMP36]], i32 [[B_0]]			; CHECK-NEXT: [[OP_RDX1:%.*]] = select i1 [[OP_RDX]], i32 [[TMP36]], i32 [[B_0]]
	; CHECK-NEXT: [[SUB_116:%.*]] = sub i32 [[TMP30]], [[TMP1]]			; CHECK-NEXT: [[SUB_116:%.*]] = sub i32 [[TMP30]], [[TMP1]]
	; CHECK-NEXT: [[TMP37:%.*]] = icmp slt i32 [[SUB_116]], 0			; CHECK-NEXT: [[TMP37:%.*]] = icmp slt i32 [[SUB_116]], 0
	; CHECK-NEXT: [[NEG_117:%.*]] = sub nsw i32 0, [[SUB_116]]			; CHECK-NEXT: [[NEG_117:%.*]] = sub nsw i32 0, [[SUB_116]]
	; CHECK-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], i32 [[NEG_117]], i32 [[SUB_116]]			; CHECK-NEXT: [[TMP38:%.*]] = select i1 [[TMP37]], i32 [[NEG_117]], i32 [[SUB_116]]
	; CHECK-NEXT: [[CMP12_118:%.*]] = icmp slt i32 [[TMP38]], [[OP_EXTRA1]]			; CHECK-NEXT: [[CMP12_118:%.*]] = icmp slt i32 [[TMP38]], [[OP_RDX1]]
	; CHECK-NEXT: [[SPEC_SELECT8_120:%.*]] = select i1 [[CMP12_118]], i32 [[TMP38]], i32 [[OP_EXTRA1]]			; CHECK-NEXT: [[SPEC_SELECT8_120:%.*]] = select i1 [[CMP12_118]], i32 [[TMP38]], i32 [[OP_RDX1]]
	; CHECK-NEXT: [[SUB_1_1:%.*]] = sub i32 [[TMP30]], [[TMP2]]			; CHECK-NEXT: [[SUB_1_1:%.*]] = sub i32 [[TMP30]], [[TMP2]]
	; CHECK-NEXT: [[TMP39:%.*]] = icmp slt i32 [[SUB_1_1]], 0			; CHECK-NEXT: [[TMP39:%.*]] = icmp slt i32 [[SUB_1_1]], 0
	; CHECK-NEXT: [[NEG_1_1:%.*]] = sub nsw i32 0, [[SUB_1_1]]			; CHECK-NEXT: [[NEG_1_1:%.*]] = sub nsw i32 0, [[SUB_1_1]]
	; CHECK-NEXT: [[TMP40:%.*]] = select i1 [[TMP39]], i32 [[NEG_1_1]], i32 [[SUB_1_1]]			; CHECK-NEXT: [[TMP40:%.*]] = select i1 [[TMP39]], i32 [[NEG_1_1]], i32 [[SUB_1_1]]
	; CHECK-NEXT: [[CMP12_1_1:%.*]] = icmp slt i32 [[TMP40]], [[SPEC_SELECT8_120]]			; CHECK-NEXT: [[CMP12_1_1:%.*]] = icmp slt i32 [[TMP40]], [[SPEC_SELECT8_120]]
	; CHECK-NEXT: [[NARROW:%.*]] = or i1 [[CMP12_1_1]], [[CMP12_118]]			; CHECK-NEXT: [[NARROW:%.*]] = or i1 [[CMP12_1_1]], [[CMP12_118]]
	; CHECK-NEXT: [[SPEC_SELECT8_1_1:%.*]] = select i1 [[CMP12_1_1]], i32 [[TMP40]], i32 [[SPEC_SELECT8_120]]			; CHECK-NEXT: [[SPEC_SELECT8_1_1:%.*]] = select i1 [[CMP12_1_1]], i32 [[TMP40]], i32 [[SPEC_SELECT8_120]]
	; CHECK-NEXT: [[SUB_2_1:%.*]] = sub i32 [[TMP30]], [[TMP3]]			; CHECK-NEXT: [[SUB_2_1:%.*]] = sub i32 [[TMP30]], [[TMP3]]
	Show All 453 Lines

llvm/test/Transforms/SLPVectorizer/X86/vectorize-reorder-reuse.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 < %s | FileCheck %s

	define i32 @foo(i32* nocapture readonly %arr, i32 %a1, i32 %a2, i32 %a3, i32 %a4, i32 %a5, i32 %a6, i32 %a7, i32 %a8) {			define i32 @foo(i32* nocapture readonly %arr, i32 %a1, i32 %a2, i32 %a3, i32 %a4, i32 %a5, i32 %a6, i32 %a7, i32 %a8) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[ARR:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[ARR:%.*]], i64 1
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[ARR]] to <2 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[ARR]] to <2 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 0, i32 0>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 0, i32 0>
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> poison, i32 [[A1:%.*]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> poison, i32 [[A2:%.*]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A2:%.*]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A1:%.*]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A3:%.*]], i32 2			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A3:%.*]], i32 2
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A4:%.*]], i32 3			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A4:%.*]], i32 3
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A5:%.*]], i32 4			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A5:%.*]], i32 4
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A6:%.*]], i32 5			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A6:%.*]], i32 5
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A7:%.*]], i32 6			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A7:%.*]], i32 6
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A8:%.*]], i32 7			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A8:%.*]], i32 7
	; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.umin.v8i32(<8 x i32> [[TMP10]])			; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.umin.v8i32(<8 x i32> [[TMP10]])
	Show All 31 Lines
	define i32 @foo1(i32* nocapture readonly %arr, i32 %a1, i32 %a2, i32 %a3, i32 %a4, i32 %a5, i32 %a6, i32 %a7, i32 %a8) {			define i32 @foo1(i32* nocapture readonly %arr, i32 %a1, i32 %a2, i32 %a3, i32 %a4, i32 %a5, i32 %a6, i32 %a7, i32 %a8) {
	; CHECK-LABEL: @foo1(			; CHECK-LABEL: @foo1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[ARR:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[ARR:%.*]], i64 1
	; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, i32* [[ARR]], i64 2			; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, i32* [[ARR]], i64 2
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[ARR]], i64 3			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[ARR]], i64 3
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[ARR]] to <4 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[ARR]] to <4 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 1, i32 2, i32 3, i32 1, i32 1, i32 0, i32 2, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 2, i32 1, i32 3, i32 1, i32 1, i32 0, i32 2, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> poison, i32 [[A1:%.*]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> poison, i32 [[A2:%.*]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A2:%.*]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A1:%.*]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A3:%.*]], i32 2			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A3:%.*]], i32 2
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A4:%.*]], i32 3			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A4:%.*]], i32 3
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A5:%.*]], i32 4			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A5:%.*]], i32 4
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A6:%.*]], i32 5			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A6:%.*]], i32 5
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A7:%.*]], i32 6			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A7:%.*]], i32 6
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A8:%.*]], i32 7			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A8:%.*]], i32 7
	; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.umin.v8i32(<8 x i32> [[TMP10]])			; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.umin.v8i32(<8 x i32> [[TMP10]])
	Show All 35 Lines
	define i32 @foo2(i32* nocapture readonly %arr, i32 %a1, i32 %a2, i32 %a3, i32 %a4, i32 %a5, i32 %a6, i32 %a7, i32 %a8) {			define i32 @foo2(i32* nocapture readonly %arr, i32 %a1, i32 %a2, i32 %a3, i32 %a4, i32 %a5, i32 %a6, i32 %a7, i32 %a8) {
	; CHECK-LABEL: @foo2(			; CHECK-LABEL: @foo2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[ARR:%.*]], i64 3			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[ARR:%.*]], i64 3
	; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, i32* [[ARR]], i64 2			; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, i32* [[ARR]], i64 2
	; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, i32* [[ARR]], i64 1			; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds i32, i32* [[ARR]], i64 1
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[ARR]] to <4 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[ARR]] to <4 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 3, i32 2, i32 3, i32 0, i32 1, i32 0, i32 2, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 2, i32 3, i32 3, i32 0, i32 1, i32 0, i32 2, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> poison, i32 [[A1:%.*]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> poison, i32 [[A2:%.*]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A2:%.*]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[A1:%.*]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A3:%.*]], i32 2			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[A3:%.*]], i32 2
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A4:%.*]], i32 3			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[A4:%.*]], i32 3
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A5:%.*]], i32 4			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[A5:%.*]], i32 4
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A6:%.*]], i32 5			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[A6:%.*]], i32 5
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A7:%.*]], i32 6			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[A7:%.*]], i32 6
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A8:%.*]], i32 7			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <8 x i32> [[TMP8]], i32 [[A8:%.*]], i32 7
	; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = add <8 x i32> [[SHUFFLE]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.umin.v8i32(<8 x i32> [[TMP10]])			; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.umin.v8i32(<8 x i32> [[TMP10]])
	Show All 34 Lines

llvm/test/Transforms/SLPVectorizer/slp-umax-rdx-matcher-crash.ll

	Show All 37 Lines

	declare i32 @llvm.smin.i32(i32, i32)			declare i32 @llvm.smin.i32(i32, i32)
	declare i32 @llvm.umin.i32(i32, i32)			declare i32 @llvm.umin.i32(i32, i32)

	; Given LLVM IR caused crash in SLP.			; Given LLVM IR caused crash in SLP.
	define void @test2() {			define void @test2() {
	; CHECK-LABEL: @test2(			; CHECK-LABEL: @test2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>)			; CHECK-NEXT: [[TMP0:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> poison, <4 x i32> <i32 2, i32 3, i32 1, i32 0>)
	; CHECK-NEXT: [[TMP1:%.*]] = sub nsw <4 x i32> poison, [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = sub nsw <4 x i32> poison, [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> [[TMP1]])			; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> [[TMP1]])
	; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP2]], i32 77)			; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP2]], i32 77)
	; CHECK-NEXT: [[E:%.*]] = icmp ugt i32 [[TMP3]], 1			; CHECK-NEXT: [[E:%.*]] = icmp ugt i32 [[TMP3]], 1
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%smin0 = call i32 @llvm.smin.i32(i32 undef, i32 0)			%smin0 = call i32 @llvm.smin.i32(i32 undef, i32 0)
	Show All 14 Lines


Unit Tests: Failed


Diff 382664
